In this tutorial, we evaluate the performance of the model on an STM32. We use X-CUBE-AI to generate a project for performance evaluation. X-CUBE-AI can generate three types of project: Validation (used in the previous tutorial), SystemPerformance (this tutorial), and ApplicationTemplate (used in the next tutorial).

The performance application has a minimal interactive console. It uses a COM port to connect to a serial terminal on the host. We can use a virtual COM port over a USB connection, such as the one provided by the ST-LINK/V2.

New Project (Performance)

Run STM32CubeMX and follow the same steps as in the previous tutorial to create a new project. In the Additional Software Components selection, select X-CUBE-AI/Application and choose SystemPerformance. Then select X-CUBE-AI/X-CUBE-AI, enable the core, and click OK.

Follow the same steps as in the previous tutorial to set up the UART, import the model, and generate the code.

In STM32CubeIDE, with your project open, open the Run menu, and then click the Debug Configurations... option. In the dialog that pops up, click the Debug button. After that, click the Resume button or press F8 to start the application.

Run-Time Performance

Run your serial terminal application, and connect it to your STM32. The performance application will send the log to the serial console. The first part of the log is the system run-time information:

It describes the STM32 run-time (execution) environment, such as the device ID, core architecture, HAL version, system clock frequency, toolchain used, and so on.

The second part of the log is the C-model network information:

It describes the main static characteristics of the generated C-model. First, it reports the functional complexity of the imported model in multiply-and-accumulate operations (MACC). Second, it reports the sizes of the activations and the params. Activations are stored in RAM, while params are stored in ROM/Flash.
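As a rough illustration of how these two sizes can be used, the sketch below checks whether a model's reported footprint fits into the memory left on a device. The type `model_footprint_t`, the function `model_fits`, and any byte counts passed to it are hypothetical and chosen for this example; they are not part of X-CUBE-AI. The check is deliberately simplified: it ignores input/output buffers, stack, heap, and the code size of the runtime library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two sizes reported in the log. */
typedef struct {
    uint32_t activations_bytes; /* working buffers, placed in RAM */
    uint32_t params_bytes;      /* weights and biases, placed in ROM/Flash */
} model_footprint_t;

/* Returns true if the activations fit in the free RAM and the params
 * fit in the free Flash. Assumption: nothing else competes for the
 * space passed in as "free". */
bool model_fits(model_footprint_t fp,
                uint32_t free_ram_bytes,
                uint32_t free_flash_bytes)
{
    return fp.activations_bytes <= free_ram_bytes &&
           fp.params_bytes <= free_flash_bytes;
}
```

With made-up numbers, a model with 1024 bytes of activations and 20000 bytes of params would fit a device with 4096 bytes of free RAM and 65536 bytes of free Flash, but not one with only 512 bytes of free RAM.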

The last part of the log is the C-model run-time performance:

It reports the measured system performance: the CPU cost per inference (duration in ms, CPU cycles, CPU workload) and the stack and heap used (in bytes):

  • duration: the time in ms taken by one inference.
  • CPU cycles: the number of CPU cycles taken by one inference.
  • CPU workload: an indicator of the associated CPU workload over a period of 1 s.
  • cycles/MACC: the number of CPU cycles per MACC operation.
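The four values above are related to each other, and the relationship can be sketched in a few lines of C. This is an illustrative reconstruction, not the code of the performance application: the type `perf_metrics_t` and the function `perf_from_cycles` are hypothetical names, and the workload formula assumes that the indicator is the percentage of a 1 s window occupied by a single inference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the metrics derived from one inference. */
typedef struct {
    double duration_ms;     /* time for one inference, in ms */
    double workload_pct;    /* assumed: share of a 1 s window, in percent */
    double cycles_per_macc; /* CPU cycles spent per MACC operation */
} perf_metrics_t;

/* Derives the reported metrics from the measured cycle count, the
 * system clock frequency in Hz, and the model complexity in MACC. */
perf_metrics_t perf_from_cycles(uint64_t cpu_cycles,
                                uint32_t clock_hz,
                                uint64_t macc)
{
    assert(clock_hz > 0 && macc > 0);

    perf_metrics_t m;
    /* cycles / (cycles per second) gives seconds; scale to ms. */
    m.duration_ms = (double)cpu_cycles * 1000.0 / (double)clock_hz;
    /* Assumption: workload = fraction of 1000 ms used by one inference. */
    m.workload_pct = m.duration_ms * 100.0 / 1000.0;
    m.cycles_per_macc = (double)cpu_cycles / (double)macc;
    return m;
}
```

For example, with made-up inputs of 800000 cycles, an 80 MHz clock, and a 100000 MACC model, `perf_from_cycles(800000, 80000000, 100000)` gives a duration of 10 ms, a workload of 1 %, and 8 cycles/MACC. These numbers are illustrative only and were not measured on a board.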


The X-CUBE-AI system performance project is used to measure the model’s inference CPU load and memory usage.

Next: Run the Model as an Application
