In this tutorial, we evaluate the performance of the model on an STM32. We use X-CUBE-AI to generate a project for performance evaluation. X-CUBE-AI can generate three types of project:
Validation (used in the previous tutorial),
SystemPerformance (this tutorial), and
ApplicationTemplate (we will use this in the next tutorial).
The performance application has a minimal interactive console, which connects to the host serial terminal through a COM port. We can use a virtual COM port over USB, such as the one provided by the ST-LINK/V2 probe.
New Project (Performance)
Run STM32CubeMX and follow the same steps as in the previous tutorial to create a new project. In the Additional Software Components selection, select X-CUBE-AI/Application and choose SystemPerformance. Then select X-CUBE-AI/X-CUBE-AI, enable the Core component, and confirm the selection.
Follow the same steps as in the previous tutorial to set up the UART, import the model, and generate the code.
In STM32CubeIDE, with your project open, open the Run menu and select the Debug Configurations... option. In the dialog that appears, click the Debug button. Then click the Resume button or press F8 to start the application.
Start your serial terminal application and connect it to the STM32 board's COM port. The performance application sends its log to the serial console. The first part of the log reports the system run-time information:
# AI system performance measurement 3.0
Compiled with GCC 7.3.1
STM32 Runtime configuration...
Device : DevID:0x00000421 (UNKNOWN) RevID:0x00001000
Core Arch. : M4 - FPU PRESENT and used
HAL version : 0x01070600
system clock : 84 MHz
FLASH conf. : ACR=0x00000702 - Prefetch=True $I/$D=(True,True) latency=2
AI Network (AI platform API 1.1.0)...
This part describes the STM32 run-time environment: device ID, core architecture, HAL version, system clock frequency, the toolchain used (here GCC 7.3.1), the flash configuration, and the AI platform API version.
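The FLASH conf. line prints both the raw FLASH_ACR register value and its decoded fields. The Python sketch below shows how those fields can be recovered from the raw value on the host. It assumes the STM32F4-series FLASH_ACR bit layout: LATENCY in bits [3:0], PRFTEN in bit 8, ICEN in bit 9, and DCEN in bit 10. Check the reference manual of your device before relying on it. The helper name `decode_flash_acr` is hypothetical.

```python
# Assumed STM32F4-series FLASH_ACR layout; verify against your device's reference manual.
ACR_LATENCY_MASK = 0xF  # bits [3:0]: flash wait states
ACR_PRFTEN = 1 << 8     # prefetch enable
ACR_ICEN = 1 << 9       # instruction cache enable
ACR_DCEN = 1 << 10      # data cache enable


def decode_flash_acr(acr: int) -> dict:
    """Split a raw FLASH_ACR value (hypothetical helper) into the fields printed in the log."""
    return {
        "latency": acr & ACR_LATENCY_MASK,
        "prefetch": bool(acr & ACR_PRFTEN),
        "icache": bool(acr & ACR_ICEN),
        "dcache": bool(acr & ACR_DCEN),
    }


if __name__ == "__main__":
    conf = decode_flash_acr(0x00000702)  # raw value from the "FLASH conf." log line
    print(f"Prefetch={conf['prefetch']} $I/$D=({conf['icache']},{conf['dcache']}) "
          f"latency={conf['latency']}")
```

For 0x00000702 this prints `Prefetch=True $I/$D=(True,True) latency=2`, the same fields the application reports.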
The second part of the log is the C-model network information:
Found network "network"
Creating the network "network"..
Model name : network
Model signature : 2e3da697972fc8fab341d62973f1249b
Model datetime : Tue Jul 23 20:59:33 2019
Compile datetime : Jul 23 2019 21:01:16
Runtime revision : (4.0.0)
Tool revision : (rev-) (4.0.0)
nodes : 6
complexity : 34 MACC
activation : 24 bytes
params : 100 bytes
inputs/outputs : 1/1
IN  : shape(HWC):(1,1,2) format=FLOAT (32bits, signed) size=8bytes
OUT  : shape(HWC):(1,1,1) format=FLOAT (32bits, signed) size=4bytes
Initializing the network
This part lists the main static characteristics of the generated C model. The complexity field gives the functional complexity of the imported model in multiply-and-accumulate operations (MACC). The activation and params fields give the sizes, in bytes, of the activation buffer and of the weights. Activations are stored in RAM, while params are stored in ROM/flash.
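The IN and OUT lines give each tensor's shape (HWC) and element format, and the buffer size follows directly from them. For example, (1,1,2) FLOAT gives 2 × 4 = 8 bytes. The sketch below reproduces that arithmetic in Python. For illustration only, it also counts MACC and parameter bytes for a hypothetical fully connected layer.

The layer sizes and the helper names `tensor_bytes` and `dense_cost` are assumptions. X-CUBE-AI's own MACC count may include other operations, so this sketch is not meant to reproduce the 34 MACC reported for the tutorial model.

```python
from math import prod

# Bytes per element for the formats seen in the log (FLOAT = 32 bits)
FORMAT_BYTES = {"FLOAT": 4}


def tensor_bytes(shape_hwc: tuple, fmt: str = "FLOAT") -> int:
    """Buffer size of a tensor, as in the IN/OUT lines of the log."""
    return prod(shape_hwc) * FORMAT_BYTES[fmt]


def dense_cost(n_in: int, n_out: int, elem_bytes: int = 4) -> tuple:
    """MACC and parameter bytes for a hypothetical fully connected layer.

    Each of the n_out outputs needs n_in multiply-accumulates, and the layer
    stores n_in * n_out weights plus n_out biases.
    """
    macc = n_in * n_out
    params_bytes = (n_in * n_out + n_out) * elem_bytes
    return macc, params_bytes


if __name__ == "__main__":
    print(tensor_bytes((1, 1, 2)))  # IN buffer
    print(tensor_bytes((1, 1, 1)))  # OUT buffer
    print(dense_cost(2, 4))         # hypothetical 2->4 dense layer
```

Running it prints 8 and 4, matching the IN and OUT sizes in the log, followed by (8, 48) for the hypothetical layer.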
The last part of the log is the C-model run-time performance:
Running PerfTest on "network" with random inputs (16 iterations)...
Results for "network", 16 inferences @84MHz/84MHz (complexity: 34 MACC)
duration : 0.029 ms (average)
CPU cycles : 2512 -22/+3 (average,-/+)
CPU Workload : 0%
cycles/MACC : 73.88 (average for all layers)
used stack : 177 bytes
used heap : 0:0 0:0 (req:allocated,req:released) cfg=0
This part reports the measured system performance for one inference (duration, CPU cycles, and CPU workload), together with the stack and heap usage in bytes:
duration: the average duration, in ms, of one inference.
CPU cycles: the average number of CPU cycles for one inference.
CPU workload: the CPU workload associated with the inference, measured over a 1 s period.
cycles/MACC: the average number of CPU cycles per MACC operation, over all layers.
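The duration and cycles/MACC figures are related to the cycle count, the system clock, and the model complexity:

- duration = cycles / f_clk
- cycles/MACC = cycles / MACC

This minimal Python sketch applies those relations to the values in the log. The function name `derived_metrics` is hypothetical.

```python
def derived_metrics(cycles: float, macc: int, sysclk_hz: float) -> dict:
    """Relate a measured cycle count to inference time and cycles per MACC."""
    return {
        "duration_ms": cycles / sysclk_hz * 1e3,  # time = cycles / clock frequency
        "cycles_per_macc": cycles / macc,         # average cost of one MACC
    }


if __name__ == "__main__":
    m = derived_metrics(cycles=2512, macc=34, sysclk_hz=84e6)
    print(f"duration    : {m['duration_ms']:.4f} ms")
    print(f"cycles/MACC : {m['cycles_per_macc']:.2f}")
```

With 2512 cycles at 84 MHz, this gives about 0.0299 ms, close to the reported 0.029 ms. It also gives 2512 / 34 ≈ 73.88 cycles/MACC, matching the log.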
In short, the X-CUBE-AI SystemPerformance project is used to measure the model's inference time, CPU load, and memory usage on the target.
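When comparing several models or build options, it can help to capture the console output to a file and extract the results programmatically. The sketch below assumes the "key : value" line format of the results block in the log. The function `parse_perf_log` and its output key names are hypothetical.

```python
import re

# Matches "key : value" lines such as "CPU cycles : 2512 -22/+3 (average,-/+)"
LINE_RE = re.compile(r"^\s*([A-Za-z/ ]+?)\s*:\s*(.+?)\s*$")


def parse_perf_log(text: str) -> dict:
    """Extract the numeric results of a SystemPerformance run from captured console text."""
    fields = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            fields[m.group(1).strip()] = m.group(2)

    results = {}
    # Keep only the leading number of each value (drop units and annotations)
    if "duration" in fields:
        results["duration_ms"] = float(fields["duration"].split()[0])
    if "CPU cycles" in fields:
        results["cpu_cycles"] = int(fields["CPU cycles"].split()[0])
    if "cycles/MACC" in fields:
        results["cycles_per_macc"] = float(fields["cycles/MACC"].split()[0])
    if "used stack" in fields:
        results["stack_bytes"] = int(fields["used stack"].split()[0])
    return results
```

For example, `parse_perf_log(open("perf.log").read())` parses a captured log. The file name is hypothetical. For the results block in this tutorial, it returns `{'duration_ms': 0.029, 'cpu_cycles': 2512, 'cycles_per_macc': 73.88, 'stack_bytes': 177}`.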