Buffer Sizes Reduction for Memory-efficient CNN Inference on Mobile and Embedded Devices

In this video, Svetlana Minakova and Todor Stefanov, from the Leiden Institute of Advanced Computer Science, present the paper entitled "Buffer Sizes Reduction for Memory-efficient CNN Inference on Mobile and Embedded Devices", which was accepted at the Euromicro Conference on Digital System Design 2020.

Details of the publication:

S. Minakova and T. Stefanov, «Buffer Sizes Reduction for Memory-efficient CNN Inference on Mobile and Embedded Devices», in Proceedings of the DSD 2020 Euromicro Conference on Digital System Design, Portroz, Slovenia, virtual event, August 26-28, 2020

Abstract:

Nowadays, Convolutional Neural Networks (CNNs) are the core of many intelligent systems, including those that run on mobile and embedded devices. However, the execution of computationally demanding and memory-hungry CNNs on resource-limited mobile and embedded devices is quite challenging. One of the main problems, when running CNNs on such devices, is the limited amount of memory available. Thus, reduction of the CNN memory footprint is crucial for the CNN inference on mobile and embedded devices. The CNN memory footprint is determined by the amount of memory required to store CNN parameters (weights and biases) and intermediate data, exchanged between CNN operators. The most common approaches, utilized to reduce the CNN memory footprint, such as pruning and quantization, reduce the memory required to store the CNN parameters. However, these approaches decrease the CNN accuracy. Moreover, with the increasing depth of the state-of-the-art CNNs, the intermediate data exchanged between CNN operators takes even more space than the CNN parameters. Therefore, in this paper, we propose a novel approach, which allows to reduce the memory, required to store intermediate data, exchanged between CNN operators. Unlike pruning and quantization approaches, our proposed approach preserves the CNN accuracy and reduces the CNN memory footprint at the cost of decreasing the CNN throughput. Thus, our approach is orthogonal to the pruning and quantization approaches, and can be combined with these approaches for further CNN memory footprint reduction.

