
Accelerating Digit Classification on FPGA with Pruned Binarized Neural Networks

Syamantak Payra
Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305, USA; Institute for Soldier Nanotechnologies, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Gabriel Loke
Dept. of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Yoel Fink
Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Institute for Soldier Nanotechnologies, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Joseph D. Steinmeyer
Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

Published 2024-04-10

Keywords

  • Algorithms Implemented in Hardware,
  • Combinational Logic,
  • Cost/Performance,
  • Neural Nets,
  • Optical Character Recognition

How to Cite

Payra, S., Loke, G., Fink, Y., & Steinmeyer, J. D. (2024). Accelerating Digit Classification on FPGA with Pruned Binarized Neural Networks. Optimizations in Applied Machine Learning, 4(1), 1–17. https://doi.org/10.71070/oaml.v4i1.111

Abstract

As neural networks are increasingly deployed on mobile and distributed computing platforms, there is a need to lower latency and increase computational speed while decreasing power and memory usage. Rather than using FPGAs as accelerators in tandem with CPUs or GPUs, we directly encode individual neural network layers as combinational logic within FPGA hardware. Utilizing binarized neural networks minimizes the arithmetic computation required, shrinking latency to only the signal propagation delay. We evaluate size-optimization strategies and demonstrate network compression via weight quantization and weight-model unification, achieving 96% of the accuracy of baseline MNIST digit classification models while using only 3% of the memory. We further achieve an 86% decrease in model footprint, 8 mW dynamic power consumption, and <9 ns latency, demonstrating that feature-strength-based pruning of binarized neural networks can flexibly meet performance requirements under application resource constraints.
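
As a brief illustration of why binarization lets a layer collapse into combinational logic, the sketch below (illustrative Python/NumPy, not code from the paper; function names and layer sizes are hypothetical) shows the standard XNOR-and-popcount identity for binarized networks: when weights and activations are constrained to ±1, a dot product reduces to bitwise XNOR followed by a bit count, so no multipliers are needed.

```python
# Illustrative sketch, not the authors' implementation: a binarized dense layer
# computed two equivalent ways -- once with +1/-1 integer arithmetic, and once
# with the XNOR + popcount form that maps directly onto combinational logic.
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} with the sign function (0 maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int32)

def binarized_dense(activations, weights):
    """Reference forward pass for one dense layer using +1/-1 arithmetic."""
    a = binarize(activations)            # shape (n_in,)
    w = binarize(weights)                # shape (n_out, n_in)
    return w @ a                         # integer pre-activations, shape (n_out,)

def binarized_dense_xnor(activations, weights):
    """Same result computed the way hardware would: XNOR, popcount, rescale."""
    a_bits = binarize(activations) > 0   # +1 -> True, -1 -> False
    w_bits = binarize(weights) > 0
    n_in = a_bits.shape[0]
    # Popcount of XNOR per output neuron = number of matching bits.
    matches = np.sum(~np.logical_xor(w_bits, a_bits), axis=1)
    return 2 * matches - n_in            # matches - mismatches == +1/-1 dot product

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(784)         # e.g. one flattened 28x28 MNIST digit
    W = rng.standard_normal((10, 784))   # hypothetical 10-class output layer
    assert np.array_equal(binarized_dense(x, W), binarized_dense_xnor(x, W))
```

In hardware, the XNOR and popcount stages are pure combinational logic, so a layer's latency is governed by signal propagation delay rather than by clocked multiply-accumulate cycles, consistent with the latency figures reported in the abstract.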
