Investigation of dynamic control ML algorithms on existing and future Arm microNPU systems
(2021) EITM01 20211 - Department of Electrical and Information Technology
- Abstract
- In this thesis, dynamically controlled machine learning algorithms running on state-of-the-art Arm microNPUs, with an attached Cortex-M CPU, were investigated. The machine learning framework used was TensorFlow and subsets of it, such as TensorFlow Lite Micro. Compiling the network to run on the microNPU was done with the open-source compiler Vela.
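As a rough illustration of this toolchain step, the sketch below shows how one of the networks might be converted to a fully integer-quantized TensorFlow Lite model before being handed to Vela. The model name, calibration data, and accelerator configuration shown here are assumptions for illustration, not the exact settings used in the thesis.

```python
import numpy as np
import tensorflow as tf

# Hypothetical example: convert a trained Keras model to a fully
# integer-quantized .tflite file, which is the form Vela expects as input.
def convert_for_vela(keras_model, calibration_images):
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # A representative dataset drives post-training int8 quantization.
    def rep_dataset():
        for img in calibration_images:
            yield [np.expand_dims(img.astype(np.float32), 0)]

    converter.representative_dataset = rep_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()

# with open("pnet_int8.tflite", "wb") as f:
#     f.write(convert_for_vela(pnet, calibration_images))
#
# The quantized model is then compiled for the Ethos-U microNPU with the
# Vela command-line tool, e.g. (flags shown are illustrative):
#   vela pnet_int8.tflite --accelerator-config ethos-u55-128 --output-dir out
```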
To investigate the dynamic support, the MTCNN algorithm, as proposed by K. Zhang et al. [1], was implemented on the aforementioned hardware. MTCNN was chosen for its dynamic properties: the structure of the algorithm changes depending on the input it receives, and some parts of the algorithm may not be executed as frequently, or at all, depending on the results of earlier stages.
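The sketch below illustrates, in plain Python, the kind of data-dependent control flow this refers to: each MTCNN stage only processes the candidates that survive the previous stage, so later stages may run many times, once, or not at all for a given image. The function names are hypothetical stand-ins for the three stage networks.

```python
# Hypothetical sketch of MTCNN's cascaded, data-dependent control flow.
# run_pnet / run_rnet / run_onet stand in for the three stage networks.
def detect_faces(image, run_pnet, run_rnet, run_onet):
    # Stage 1: the proposal network scans the image pyramid for candidates.
    candidates = run_pnet(image)
    if not candidates:          # nothing found: later stages never execute
        return []

    # Stage 2: the refinement network runs once per surviving candidate,
    # so its invocation count depends entirely on the input image.
    refined = [box for box in (run_rnet(image, c) for c in candidates) if box]
    if not refined:
        return []

    # Stage 3: the output network produces the final boxes and landmarks.
    return [run_onet(image, box) for box in refined]
```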
Full dynamic support was possible on the CPU through custom kernel implementations. Several of the components needed for full use of dynamic control on the microNPU were unsupported throughout the toolchain. Because of this, MTCNN was divided into several parts, removing the need for dynamic support, to further investigate the performance gain from natively supporting the dynamic control operators on the microNPU.
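For context, such dynamic control can also be expressed inside a single model: a conditional written in TensorFlow is lowered by the converter to a TensorFlow Lite control-flow operator, which is the kind of operator that would need native support in the toolchain and on the microNPU. The threshold, shapes, and branch bodies below are made up for illustration, assuming a TensorFlow version whose converter handles control-flow operators.

```python
import tensorflow as tf

# Hypothetical sketch: a tf.function whose branch taken depends on the data.
# When converted, the conditional becomes a TFLite control-flow operator.
@tf.function(input_signature=[tf.TensorSpec([1, 12, 12, 3], tf.float32)])
def gated_stage(x):
    score = tf.reduce_mean(x)
    # Only run the (stand-in) heavy branch when the score clears a threshold.
    return tf.cond(score > 0.5,
                   lambda: x * 2.0,          # stand-in for a later stage
                   lambda: tf.zeros_like(x))

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [gated_stage.get_concrete_function()])
tflite_model = converter.convert()
```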
The results obtained clearly outline a general performance gain from adding dynamic control support and the ability to run and schedule an algorithm as a single ML model. However, for MTCNN in particular, it was concluded that a major speedup was achieved by executing the static parts of the algorithm on the microNPU; the dynamic parts of MTCNN account for only a small percentage of the total run-time.
- Popular Abstract
- When it comes to computer science in general, dynamic control flow, such as conditional statements and loops, is a common tool for algorithm development. However, the same cannot be said for machine learning applications, even though there is a deep connection between the two. The world is becoming increasingly connected, with intelligent devices present in everyday products ranging from security cameras to LED lights. The need for smarter edge devices is growing, and more demands are put on those devices. A combination of these two topics is touched upon in this thesis.
The possibility of implementing machine learning algorithms that are dynamic in nature on embedded devices was investigated. This was done by implementing an algorithm called MTCNN on a system containing a general-purpose computer with an attached accelerator. The accelerator is specialized to perform tasks often found in machine learning algorithms, and running an algorithm on it required extensive use of different tools.
The fundamental approach to this investigation was reminiscent of the famous phrase "divide and conquer", although "divide and analyze" is a better description: MTCNN was deconstructed into smaller parts to analyze what was, and what was not, supported by the accelerator.
This made it possible to run performance estimates for the complete algorithm. A resounding increase in performance was achieved when larger parts of the algorithm were able to run on the accelerator; however, the dynamic parts had to be executed on the general-purpose computer. The reason for this was the lack of support for dynamic control flow in the different tools and hardware. As a result, running algorithms like MTCNN entirely on the accelerator is currently not possible.
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9060936
- author
- Åhlund, Joel and Cordesius, David
- supervisor
- organization
- course
- EITM01 20211
- year
- 2021
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Dynamic control, NPU, MTCNN, Machine learning, Neural networks
- report number
- LU/LTH-EIT 2021-835
- language
- English
- id
- 9060936
- date added to LUP
- 2021-08-12 10:09:16
- date last changed
- 2021-08-12 10:09:16