mlir::quant::UniformQuantizedValueConverter Class Reference

#include <UniformSupport.h>

Public Member Functions

 UniformQuantizedValueConverter (UniformQuantizedType uniformType)
 
 UniformQuantizedValueConverter (double scale, double zeroPoint, double clampMin, double clampMax, uint32_t storageBitWidth, bool isSigned)
 
 UniformQuantizedValueConverter (double scale, double zeroPoint, APFloat clampMin, APFloat clampMax, uint32_t storageBitWidth, bool isSigned)
 
virtual APInt quantizeFloatToInt (APFloat expressedValue) const
 
int64_t quantizeFloatToInt64 (APFloat expressedValue) const
 
virtual ~UniformQuantizedValueConverter ()
 

Detailed Description

Reference implementation for converting between real numbers and values represented by a UniformQuantizedType. It is not expected to be fast and may eventually be superseded by a more optimized implementation. The interface also assumes that quantization is done per-layer, so it will need to be widened to support per-channel schemes. As such, this class is a placeholder.
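
The conversion described above can be illustrated with the usual affine quantization formula, quantized = clamp(round(expressed / scale) + zeroPoint, clampMin, clampMax). The following is a minimal standalone sketch under that assumption, not the class's actual code: `quantizeSketch` is a hypothetical helper, it works on plain `double` instead of `APFloat`, and the rounding mode (nearest, ties away from zero) is assumed rather than taken from this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not part of MLIR: maps a real ("expressed") value
// onto the integer storage grid defined by scale and zeroPoint, then
// clamps the result to [clampMin, clampMax].
inline int64_t quantizeSketch(double expressed, double scale, double zeroPoint,
                              double clampMin, double clampMax) {
  assert(scale > 0.0 && "scale must be positive");
  // Assumed rounding mode: nearest, ties away from zero (std::round).
  double scaled = std::round(expressed / scale) + zeroPoint;
  // Saturate instead of wrapping when the value falls outside the range.
  double clamped = std::min(std::max(scaled, clampMin), clampMax);
  return static_cast<int64_t>(clamped);
}
```

With scale 0.5, zero point 0, and a signed 8-bit range of [-128, 127], an input of 1.0 lands on storage value 2, while an input of 100.0 saturates at 127.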

Constructor & Destructor Documentation

◆ UniformQuantizedValueConverter() [1/3]

mlir::quant::UniformQuantizedValueConverter::UniformQuantizedValueConverter ( UniformQuantizedType  uniformType)
inline explicit

◆ UniformQuantizedValueConverter() [2/3]

mlir::quant::UniformQuantizedValueConverter::UniformQuantizedValueConverter ( double  scale,
double  zeroPoint,
double  clampMin,
double  clampMax,
uint32_t  storageBitWidth,
bool  isSigned 
)
inline

◆ UniformQuantizedValueConverter() [3/3]

mlir::quant::UniformQuantizedValueConverter::UniformQuantizedValueConverter ( double  scale,
double  zeroPoint,
APFloat  clampMin,
APFloat  clampMax,
uint32_t  storageBitWidth,
bool  isSigned 
)
inline

◆ ~UniformQuantizedValueConverter()

virtual mlir::quant::UniformQuantizedValueConverter::~UniformQuantizedValueConverter ( )
inline virtual

Member Function Documentation

◆ quantizeFloatToInt()

virtual APInt mlir::quant::UniformQuantizedValueConverter::quantizeFloatToInt ( APFloat  expressedValue) const
inline virtual

◆ quantizeFloatToInt64()

int64_t mlir::quant::UniformQuantizedValueConverter::quantizeFloatToInt64 ( APFloat  expressedValue) const
inline
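
This page does not say how the `APInt` result of quantizeFloatToInt() relates to the `int64_t` returned here. One plausible reading, assumed for this sketch, is that the storage value is widened according to `isSigned`: sign-extended for signed storage and zero-extended for unsigned storage. `widenStorageSketch` is a hypothetical helper that shows that widening on a raw bit pattern and is not part of MLIR.

```cpp
#include <cstdint>

// Hypothetical helper, not part of MLIR: interprets the low `bitWidth` bits
// of `raw` as a storage value and widens it to int64_t.
inline int64_t widenStorageSketch(uint64_t raw, uint32_t bitWidth,
                                  bool isSigned) {
  if (bitWidth == 0)
    return 0;
  if (bitWidth >= 64)
    return static_cast<int64_t>(raw);
  uint64_t mask = (uint64_t{1} << bitWidth) - 1;
  uint64_t value = raw & mask;
  // For signed storage, replicate the top storage bit into the upper bits.
  bool topBitSet = ((value >> (bitWidth - 1)) & 1) != 0;
  if (isSigned && topBitSet)
    value |= ~mask;
  return static_cast<int64_t>(value);
}
```

Under this assumption the 8-bit pattern 0xFF widens to -1 for signed storage and to 255 for unsigned storage.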

The documentation for this class was generated from the following file: