TensorFlow Notes 04 - Wrapping a finished TensorRT model in a class, including the calibrator needed for int8 optimization


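Wrap the finished TensorRT model in a class so that each function and class member has a single responsibility, instead of the purely procedural style of the earlier programs, where every variable was laid out at top level.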

● Code: program entry point, enter.py

    import os
    import sys
    from glob import glob
    from datetime import datetime as dt

    import numpy as np
    import tensorrt as trt
    import pycuda.autoinit
    import pycuda.driver as cuda

    import loadPara as ld
    import calibrator

    DEBUG         = True
    testDataPath  = "./"
    calibDataPath = "./"
    tempPath      = "./"
    paraFile      = tempPath + "para.h5"
    cacheFile     = tempPath + "calib.cache"
    outputFile    = tempPath + "output.txt"

    iGpu          = 0
    calibCount    = 10                      # number of int8 calibration batches
    inputSize     = (1, 1, 1)               # input shape, CHW

    class TrtPredictor:
        def __init__(self, batchSize, dataType):
            self.logger    = trt.Logger(trt.Logger.ERROR)               # create the logger
            self.batchSize = batchSize
            self.dataType  = dataType
            self.engine    = None
            self.context   = None
            self.h5f, ...  = ld.loadPara(paraFile)                      # load the trained parameters ("..." is elided in the original)

            trtFilePath = tempPath + "engine-" + self.dataType + ".trt"
            if os.path.isfile(trtFilePath) and not DEBUG:               # reuse a saved engine if there is one
                with open(trtFilePath, 'rb') as f:
                    engineStr = f.read()                                # engineStr is deliberately not a member
                self.runtime = trt.Runtime(self.logger)                 # the runtime deserializes the engine
                self.engine = self.runtime.deserialize_cuda_engine(engineStr)
                print("succeeded loading engine!")
            else:
                self.create_engine()                                    # build the engine and save it for next time
                if self.engine is None:
                    print("failed building engine!")
                    return
                with open(trtFilePath, 'wb') as f:
                    f.write(self.engine.serialize())
                print("succeeded building engine!")

            self.context = self.engine.create_execution_context()       # create the execution context and the CUDA stream
            self.stream  = cuda.Stream()

        def __del__(self):
            self.context = None
            self.engine  = None
            ld.close(self.h5f)

        def create_engine(self):                                        # build the engine (builder attributes as used with TensorRT 5/6)
            self.builder = trt.Builder(self.logger)
            self.builder.max_batch_size     = 16
            self.builder.max_workspace_size = 1 << 30
            self.builder.fp16_mode          = self.dataType == 'float16'
            self.builder.int8_mode          = self.dataType == 'int8'
            self.network                    = self.builder.create_network()
            self.builder.strict_type_constraints = True

            # force N to 1; extra data is stacked along the higher dimensions
            h0 = self.network.add_input("h0", trt.DataType.FLOAT, (1,) + inputSize)

            # ...                                                       # intermediate layers (elided in the original), h0 ends up as the last layer

            self.network.mark_output(h0.get_output(0))                  # mark the output layer

            if self.dataType == 'int8':                                 # int8 needs a calibrator, attached to the builder
                self.builder.int8_calibrator = calibrator.MyCalibrator(
                    calibCount, (self.batchSize,) + inputSize, calibDataPath, cacheFile)

            # the step most likely to fail; the constructor checks the result afterwards
            self.engine = self.builder.build_cuda_engine(self.network)

        def infer(self, hInPart, dIn, dOut, hOut):                      # run one batch
            cuda.memcpy_htod_async(dIn, np.ascontiguousarray(hInPart), self.stream)
            self.context.execute_async(len(hInPart), [int(dIn), int(dOut)], self.stream.handle)
            cuda.memcpy_dtoh_async(hOut, dOut, self.stream)
            self.stream.synchronize()

    def predict(hIn, batchSize, dataType):
        predictor = TrtPredictor(batchSize, dataType)                   # build a predictor

        dIn  = cuda.mem_alloc(hIn[0].nbytes * batchSize)                # host and device buffers
        hOut = np.empty((batchSize,) + tuple(predictor.engine.get_binding_shape(1)), dtype=np.float32)
        dOut = cuda.mem_alloc(hOut.nbytes)                              # dOut and hOut always have the same size
        res = []
        for i in range(0, len(hIn), batchSize):                         # feed the data batch by batch
            part = hIn[i:i + batchSize]
            predictor.infer(part, dIn, dOut, hOut)
            res.append(hOut[:len(part)].copy())                         # copy: hOut is overwritten by the next batch
        return res

    if __name__ == "__main__":                                          # main manages cuda.Device and cuda.Context
        _ = os.system("clear")
        batchSize = int(sys.argv[1])     if len(sys.argv) > 1 and sys.argv[1].isdigit() else 1
        dataType  = sys.argv[2]          if len(sys.argv) > 2 and sys.argv[2] in ['float32', 'float16', 'int8'] else 'float32'
        DEBUG     = int(sys.argv[3]) > 0 if len(sys.argv) > 3 and sys.argv[3].isdigit() else False
        if DEBUG:                                                       # delete saved engines and calibration caches, rebuild from scratch
            for path in glob(tempPath + "*.trt") + glob(tempPath + "*.cache"):
                os.remove(path)
        print("%s, start! GPU = %s, batchSize = %2d, dataType = %s" % (dt.now(), cuda.Device(iGpu).name(), batchSize, dataType))

        inputData = loadData(testDataPath)                              # read the data; loadData is not shown in the original and must be supplied
        oF = open(outputFile, 'w')
        cuda.Device(iGpu).make_context()

        res = predict(inputData, batchSize, dataType)
        for i in range(len(res)):
            print("%d -> %s" % (i, res[i]))
            oF.write(str(res[i]) + '\n')

        oF.close()
        cuda.Context.pop()
        print("%s, finish!" % (dt.now()))
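The batching loop in predict reuses a single output buffer for every batch, so each result has to be copied out before the next batch overwrites it, and a short final batch should only contribute its valid rows. The following is a minimal NumPy-only sketch of that pattern. The names run_batched and fake_infer are hypothetical stand-ins for illustration (fake_infer replaces the GPU call), not part of the original program.

```python
import numpy as np

def run_batched(data, batch_size, infer_fn):
    """Feed data to infer_fn batch by batch, reusing one output buffer."""
    out_buf = np.empty((batch_size,) + data.shape[1:], dtype=np.float32)
    results = []
    for start in range(0, len(data), batch_size):
        chunk = data[start:start + batch_size]   # the last chunk may be shorter
        infer_fn(chunk, out_buf)                 # writes len(chunk) valid rows
        # Copy the valid rows: out_buf is overwritten on the next iteration.
        results.append(out_buf[:len(chunk)].copy())
    return results

def fake_infer(chunk, out_buf):
    """Hypothetical stand-in for the GPU inference call: doubles the input."""
    out_buf[:len(chunk)] = chunk * 2.0

samples = np.arange(5, dtype=np.float32).reshape(5, 1)
outputs = run_batched(samples, 2, fake_infer)
print([o.ravel().tolist() for o in outputs])  # [[0.0, 2.0], [4.0, 6.0], [8.0]]
```

Appending the buffer itself instead of a copy would leave every list entry pointing at the same array, so all entries would show the values of the last batch.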

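● Code: the calibrator, calibrator.py. The core idea is to hand-write a data generator for TensorRT to call. Each call draws batchSize samples from the calibration data set; all of the actual computation is done by TensorRT.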

    import os
    import numpy as np
    import tensorrt as trt
    import pycuda.driver as cuda
    import pycuda.autoinit

    class MyCalibrator(trt.IInt8EntropyCalibrator2):
        def __init__(self, calibCount, inputShape, calibDataPath, cacheFile):
            trt.IInt8EntropyCalibrator2.__init__(self)                  # base-class default constructor
            self.calibCount   = calibCount
            self.shape        = inputShape
            self.calibDataSet = self.loadData(calibDataPath)            # you have to implement this data-loading function yourself
            self.cacheFile    = cacheFile
            self.calibData    = np.zeros(self.shape, dtype=np.float32)
            self.dIn          = cuda.mem_alloc(trt.volume(self.shape) * trt.float32.itemsize)  # device memory for calibration
            self.oneBatch     = self.batchGenerator()

        def batchGenerator(self):                                       # the heart of the calibrator: a generator that supplies data
            for i in range(self.calibCount):
                print("> calibration ", i)
                # pick samples at random; choosing indices also works for multi-dimensional data
                idx = np.random.choice(len(self.calibDataSet), self.shape[0], replace=False)
                self.calibData = np.asarray(self.calibDataSet)[idx]
                yield np.ascontiguousarray(self.calibData, dtype=np.float32)  # fix the memory layout, then hand the batch out

        def get_batch_size(self):                                       # called by TensorRT, the name must not change
            return self.shape[0]

        def get_batch(self, names):                                     # called by TensorRT, the name must not change; older versions may take different arguments
            try:
                data = next(self.oneBatch)                              # next calibration batch: copy to device, return the device address
                cuda.memcpy_htod(self.dIn, data)
                return [int(self.dIn)]
            except StopIteration:                                       # no data left: tell TensorRT to stop
                return None

        def read_calibration_cache(self):                               # called by TensorRT, the name must not change
            if os.path.exists(self.cacheFile):
                print("cache file: %s" % (self.cacheFile))
                with open(self.cacheFile, "rb") as f:
                    return f.read()
            return None

        def write_calibration_cache(self, cache):                       # called by TensorRT, the name must not change
            print("cache file: %s" % (self.cacheFile))
            with open(self.cacheFile, "wb") as f:
                f.write(cache)
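The data-feeding half of the calibrator can be tried without TensorRT or a GPU. The sketch below keeps only the generator and the "return None when the data runs out" convention used by get_batch above. BatchFeeder is a hypothetical class written for illustration, and it returns the host array directly rather than a device address.

```python
import numpy as np

class BatchFeeder:
    """Hypothetical stand-in showing only the data-feeding part of a calibrator."""

    def __init__(self, data_set, batch_size, n_batches, seed=0):
        self.data_set = np.asarray(data_set, dtype=np.float32)
        self.batch_size = batch_size
        self.n_batches = n_batches
        self.rng = np.random.RandomState(seed)
        self.batches = self._generate()

    def _generate(self):
        for _ in range(self.n_batches):
            # Draw sample indices without replacement, then gather the rows.
            idx = self.rng.choice(len(self.data_set), self.batch_size, replace=False)
            yield np.ascontiguousarray(self.data_set[idx])

    def get_batch(self):
        # Mirrors the convention above: a batch while data remains, then None.
        try:
            return next(self.batches)
        except StopIteration:
            return None

feeder = BatchFeeder(np.arange(12).reshape(6, 2), batch_size=4, n_batches=3)
shapes = []
while True:
    batch = feeder.get_batch()
    if batch is None:
        break
    shapes.append(batch.shape)
print(shapes)  # [(4, 2), (4, 2), (4, 2)]
```

Sampling indices rather than the data set itself matters because np.random.choice only accepts a one-dimensional array (or an integer) as its first argument.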

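With TensorRT 5 my program works correctly in float32 and float16, but int8 does not compute correct results. The symptoms: the calibrator is loaded and called correctly; some intermediate layers produce results identical to float32 (identical bit for bit, so float32 was evidently used for those layers in place of int8); other layers produce results that all deviate from float32 (on the order of 10^-2 to 10^-3); the error grows over the following layers, and the final result is far from the float32 one. After upgrading to TensorRT 6 the problem disappeared: int8 also produces correct results and delivers a speedup.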
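One way to run the kind of per-layer check described here is to dump each layer's output in both modes and compare the arrays, both at the bit level and by maximum absolute difference. The sketch below assumes the two outputs are already available as NumPy arrays. compare_layer and the sample values are made up for illustration and are not measurements from the original program.

```python
import numpy as np

def compare_layer(ref, test):
    """Return (bit_identical, max_abs_diff) for two float32 layer outputs."""
    ref = np.ascontiguousarray(ref, dtype=np.float32)
    test = np.ascontiguousarray(test, dtype=np.float32)
    # Reinterpret the float32 bits as integers to test for bitwise equality.
    bit_identical = np.array_equal(ref.view(np.uint32), test.view(np.uint32))
    max_abs_diff = float(np.max(np.abs(ref - test)))
    return bit_identical, max_abs_diff

ref = np.array([0.5, -1.25, 3.0], dtype=np.float32)        # made-up float32 output
same = ref.copy()                                           # a layer that fell back to float32
shifted = ref + np.array([0.004, -0.002, 0.0], dtype=np.float32)  # a layer with small deviations

print(compare_layer(ref, same))      # (True, 0.0)
print(compare_layer(ref, shifted))   # not bit-identical, difference on the order of 1e-3
```

A layer that reports bitwise equality was most likely not quantized at all, while a small nonzero difference is what genuine int8 execution is expected to show.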

Original: https://www.cnblogs.com/cuancuancuanhao/p/11758908.html
