OptTc: A method of optimizing memory access latency for convolutional accelerators
  • Hongzhi Zhao,
  • Yongchang Wang,
  • Fang Zhang,
  • Jie Wu
Hongzhi Zhao

Corresponding Author: [email protected]


Abstract

One important approach to improving the performance of a convolutional accelerator is to reduce its memory access latency. When the DDR memory connected to the accelerator is fixed, the memory access latency can be decreased by reducing the number of memory accesses, by reducing the number of memory row conflicts, or by using DMA to optimize the intervals between memory accesses. Convolution involves accessing a large amount of data of different sizes and types, which are usually stored in different memory rows. Accessing such data causes a significant number of memory row conflicts, resulting in high memory access latency for the accelerator. This paper first analyzes the composition of DDR access latency for the convolutional accelerator and finds that the number of memory row conflicts is determined mainly by the number of loop tiling columns and the padding size. It then proposes a method, OptTc, for calculating the optimal number of loop tiling columns. FPGA experiments on the proposed convolutional accelerator show that, for VGG-16, using the optimal number of loop tiling columns reduces memory access latency by 24% with a negligible increase in FPGA resources. The proposed OptTc method can effectively assist in the design of convolutional accelerators.
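The dependence of row conflicts on the number of tiling columns can be illustrated with a toy model. The sketch below is not the paper's OptTc calculation: it assumes a row-major feature map, a single DRAM bank with an open-page policy, and no padding, and it simply counts how often consecutive reads land in a different DRAM row. The function names (`row_switches`, `best_tc`) and the default element and row sizes are hypothetical.

```python
def row_switches(width, height, tc, tr, elem_bytes=2, row_bytes=2048):
    """Count DRAM row changes when a row-major feature map is read tile by tile.

    Assumed model (not from the paper): one bank, open-page policy, no padding.
    tc and tr are the tile width and height in elements.
    """
    switches = 0
    open_row = None  # DRAM row currently open; None before the first access
    for r0 in range(0, height, tr):
        for c0 in range(0, width, tc):
            for r in range(r0, min(r0 + tr, height)):
                # Byte range covered by this tile row in the linear address space
                start = (r * width + c0) * elem_bytes
                end = (r * width + min(c0 + tc, width)) * elem_bytes - 1
                first_row = start // row_bytes
                last_row = end // row_bytes
                # Jumping to a different DRAM row than the one left open
                if open_row is not None and first_row != open_row:
                    switches += 1
                # Row boundaries crossed inside this contiguous burst
                switches += last_row - first_row
                open_row = last_row
    return switches


def best_tc(width, height, tr, candidates, **kwargs):
    """Return the candidate tile width with the fewest row changes in this model."""
    return min(candidates, key=lambda tc: row_switches(width, height, tc, tr, **kwargs))


if __name__ == "__main__":
    for tc in (256, 512, 1024):
        print(tc, row_switches(1024, 4, tc, 4))
```

In this model, narrower tiles revisit the same DRAM rows once per column tile, so the count of row changes grows as the tile width shrinks. A real accelerator also has to respect on-chip buffer limits, multiple banks, and padding, none of which are modeled here.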
17 Dec 2023 Submitted to TechRxiv
22 Dec 2023 Published in TechRxiv