Investigation of dynamic control ML algorithms on existing and future Arm microNPU systems
(2021) EITM01 20211 - Department of Electrical and Information Technology
- Abstract
- In this thesis, dynamically controlled machine learning algorithms running on state-of-the-art Arm microNPUs, with an attached Cortex-M CPU, were investigated. The machine learning framework used was TensorFlow and subsets of it, such as TensorFlow Lite Micro. Compiling the network to run on the microNPU was done with the open-source compiler Vela.
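As a rough illustration of this toolchain step, the sketch below shows how one of the networks might be converted to a fully integer-quantized TensorFlow Lite model before being handed to Vela. The model name, calibration data, and accelerator configuration shown here are assumptions for illustration, not the exact settings used in the thesis.

```python
import numpy as np
import tensorflow as tf

# Hypothetical example: convert a trained Keras model to a fully
# integer-quantized .tflite file, which is the form Vela expects as input.
def convert_for_vela(keras_model, calibration_images):
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # A representative dataset drives post-training int8 quantization.
    def rep_dataset():
        for img in calibration_images:
            yield [np.expand_dims(img.astype(np.float32), 0)]

    converter.representative_dataset = rep_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()

# with open("pnet_int8.tflite", "wb") as f:
#     f.write(convert_for_vela(pnet, calibration_images))
#
# The quantized model is then compiled for the Ethos-U microNPU with the
# Vela command-line tool, e.g. (flags shown are illustrative):
#   vela pnet_int8.tflite --accelerator-config ethos-u55-128 --output-dir out
```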
To investigate the dynamic support, the MTCNN algorithm, as proposed by K. Zhang et al. [1], was implemented on the aforementioned hardware. MTCNN was chosen for its dynamic properties: the structure of the algorithm changes depending on the input it receives, and some parts of the algorithm may not be executed as frequently, or at all, depending on the results of earlier stages.
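The sketch below illustrates, in plain Python, the kind of data-dependent control flow this refers to: each MTCNN stage only processes the candidates that survive the previous stage, so later stages may run many times, once, or not at all for a given image. The function names are hypothetical stand-ins for the three stage networks.

```python
# Hypothetical sketch of MTCNN's cascaded, data-dependent control flow.
# run_pnet / run_rnet / run_onet stand in for the three stage networks.
def detect_faces(image, run_pnet, run_rnet, run_onet):
    # Stage 1: the proposal network scans the image pyramid for candidates.
    candidates = run_pnet(image)
    if not candidates:          # nothing found: later stages never execute
        return []

    # Stage 2: the refinement network runs once per surviving candidate,
    # so its invocation count depends entirely on the input image.
    refined = [box for box in (run_rnet(image, c) for c in candidates) if box]
    if not refined:
        return []

    # Stage 3: the output network produces the final boxes and landmarks.
    return [run_onet(image, box) for box in refined]
```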
Full dynamic support was possible on the CPU through custom kernel implementations. Several of the components needed for full use of dynamic control on the microNPU were unsupported throughout the toolchain. Because of this, MTCNN was divided into several parts, removing the need for dynamic support, to further investigate the performance gain from natively supporting the dynamic control operators on the microNPU.
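For context, such dynamic control can also be expressed inside a single model: a conditional written in TensorFlow is lowered by the converter to a TensorFlow Lite control-flow operator, which is the kind of operator that would need native support in the toolchain and on the microNPU. The threshold, shapes, and branch bodies below are made up for illustration, assuming a TensorFlow version whose converter handles control-flow operators.

```python
import tensorflow as tf

# Hypothetical sketch: a tf.function whose branch taken depends on the data.
# When converted, the conditional becomes a TFLite control-flow operator.
@tf.function(input_signature=[tf.TensorSpec([1, 12, 12, 3], tf.float32)])
def gated_stage(x):
    score = tf.reduce_mean(x)
    # Only run the (stand-in) heavy branch when the score clears a threshold.
    return tf.cond(score > 0.5,
                   lambda: x * 2.0,          # stand-in for a later stage
                   lambda: tf.zeros_like(x))

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [gated_stage.get_concrete_function()])
tflite_model = converter.convert()
```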
The results obtained clearly outline a general performance gain from adding dynamic control support and the ability to run and schedule an algorithm as a single ML model. However, for MTCNN in particular, it was concluded that a major speedup was achieved by executing the static parts of the algorithm on the microNPU; the dynamic parts of MTCNN account for only a small percentage of the total run-time.
- Popular Abstract
- When it comes to computer science in general, dynamic control flow, such as conditional statements and loops, is a common tool for algorithm development. However, the same cannot be said for machine learning applications, even though there is a deep connection between the two. The world is becoming increasingly connected, with intelligent devices present in everyday products ranging from security cameras to LED lights. The need for smarter edge devices is growing, and more demands are put on those devices. A combination of these two topics is touched upon in this thesis.
The possibility of implementing machine learning algorithms that are dynamic in nature on embedded devices was investigated. This was done by implementing an algorithm called MTCNN on a system containing a general-purpose computer with an attached accelerator. The accelerator is specialized to perform tasks often found in machine learning algorithms, and running an algorithm on it required extensive use of different tools.
The fundamental approach to this investigation was reminiscent of the famous phrase "divide and conquer", although "divide and analyze" is a better description: MTCNN was deconstructed into smaller parts to analyze what was, and what was not, supported by the accelerator.
This made it possible to run performance estimates for the complete algorithm. A resounding increase in performance was achieved when larger parts of the algorithm were able to run on the accelerator; however, the dynamic parts had to be executed on the general-purpose computer. The reason for this was the lack of support for dynamic control flow in the different tools and hardware. As a result, running algorithms like MTCNN entirely on the accelerator is currently not possible.
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9060936
- author
- Åhlund, Joel and Cordesius, David
- supervisor
- organization
- course
- EITM01 20211
- year
- 2021
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Dynamic control, NPU, MTCNN, Machine learning, Neural networks
- report number
- LU/LTH-EIT 2021-835
- language
- English
- id
- 9060936
- date added to LUP
- 2021-08-12 10:09:16
- date last changed
- 2021-08-12 10:09:16