Tuning Multicore Code for Real-Time Performance

October 11, 2017 Kenton Williston

Embedded systems are rapidly evolving. Modern solutions juggle dynamic workloads instead of tasks with predictable timing. Code is no longer static. Changing security and business requirements necessitate frequent software updates.

Perhaps most important, the move to multicore fundamentally changes software design. Optimizing code on these processors requires techniques once reserved for high-end servers.

Particularly challenging? Running real-time and non-real-time code on the same CPU. That calls for specialized tools that can ensure hard determinism in a dynamic environment. That's where a new real-time SDK for Intel® VTune Amplifier comes in.

Resource Sharing and Determinism

Some quick background: Resource sharing is one of the key factors that complicate multicore design. In a typical processor, multiple cores share a last-level cache (LLC), DRAM controller, I/O controller, and other hardware.

This sharing can lead to resource contention. Figure 1 shows how a “noisy neighbor” on Core 0 can overuse the cache, starving an app on Core 1 and impacting its performance.

Figure 1. Cache contention can impact performance. (Source: Intel)
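To make the noisy-neighbor effect concrete, here is a minimal sketch that models the shared cache as a single least-recently-used (LRU) pool. It is a toy model under stated assumptions, not a description of how any Intel LLC behaves: the capacity, the working-set size, and the "rt" and "noisy" owner labels are all hypothetical values chosen for illustration.

```python
from collections import OrderedDict


def miss_rate(trace, capacity, watched_owner):
    """Replay (owner, line) accesses through one shared LRU cache and
    return the miss rate seen by watched_owner."""
    cache = OrderedDict()
    hits = misses = 0
    for owner, line in trace:
        key = (owner, line)
        if key in cache:
            cache.move_to_end(key)          # refresh LRU position
            hit = True
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used line
            hit = False
        if owner == watched_owner:
            if hit:
                hits += 1
            else:
                misses += 1
    total = hits + misses
    return misses / total if total else 0.0


def real_time_trace(rounds, working_set):
    """Hypothetical real-time task that loops over a small working set."""
    return [("rt", i) for _ in range(rounds) for i in range(working_set)]


def interleave_with_noise(rt_trace, noise_per_access):
    """Insert streaming accesses from a noisy neighbor after each access."""
    out = []
    next_line = 0
    for access in rt_trace:
        out.append(access)
        for _ in range(noise_per_access):
            out.append(("noisy", next_line))  # never reused, pure pollution
            next_line += 1
    return out


rt = real_time_trace(rounds=50, working_set=8)
alone = miss_rate(rt, capacity=16, watched_owner="rt")
shared = miss_rate(interleave_with_noise(rt, 4), capacity=16, watched_owner="rt")
print(f"real-time miss rate alone:  {alone:.2f}")
print(f"real-time miss rate shared: {shared:.2f}")
```

In this toy setup the real-time task misses only on its first pass when it runs alone (a 2% miss rate), but misses on every access once the streaming neighbor is interleaved, because its lines are evicted before they are reused.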

Tools like Intel's Cache Allocation Technology can help address these problems by reserving specific cache blocks for specific cores. But cache contention is just one type of problem that can reduce performance. Modern embedded systems need comprehensive tools to take full advantage of their hardware platforms.
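The idea of reserving cache for specific cores can be sketched with a toy model in which each owner gets its own LRU partition. This is only an illustration of the partitioning concept; it does not use or emulate the actual Cache Allocation Technology interface, and the `PartitionedCache` class, the partition sizes, and the access pattern are hypothetical.

```python
from collections import OrderedDict


class PartitionedCache:
    """Toy cache in which each owner can only evict its own lines."""

    def __init__(self, reserved):
        # reserved maps an owner label to the number of lines set aside for it
        self.capacity = dict(reserved)
        self.lines = {owner: OrderedDict() for owner in reserved}
        self.hits = {owner: 0 for owner in reserved}
        self.misses = {owner: 0 for owner in reserved}

    def access(self, owner, line):
        part = self.lines[owner]
        if line in part:
            part.move_to_end(line)           # refresh LRU position
            self.hits[owner] += 1
        else:
            part[line] = None
            if len(part) > self.capacity[owner]:
                part.popitem(last=False)     # evict within this partition only
            self.misses[owner] += 1

    def miss_rate(self, owner):
        total = self.hits[owner] + self.misses[owner]
        return self.misses[owner] / total if total else 0.0


# Hypothetical workload: a real-time task looping over 8 lines while a
# noisy neighbor streams through lines it never reuses.
cache = PartitionedCache({"rt": 8, "noisy": 8})
noise_id = 0
for _ in range(50):
    for line in range(8):
        cache.access("rt", line)
        for _ in range(4):
            cache.access("noisy", noise_id)
            noise_id += 1

rt_rate = cache.miss_rate("rt")
noisy_rate = cache.miss_rate("noisy")
print(f"real-time miss rate: {rt_rate:.2f}")
print(f"noisy miss rate:     {noisy_rate:.2f}")
```

With its own partition, the real-time task in this model keeps its working set resident (a 2% miss rate from the first pass only), while the streaming neighbor misses on every access without disturbing it.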

The VTune Advantage

To tackle this need, TenAsys, developer of the INtime Distributed RTOS and INtime for Windows, is adding support for Intel® VTune to both of these products. This pairing brings together comprehensive optimization with real-time determinism.

Intel® VTune Amplifier is a performance profiler that can analyze code to find hot spots that consume excess CPU time. Figure 2 illustrates how the tool calls out problem code.

Figure 2. Intel® VTune Amplifier identifies code hot spots. (Source: Intel)

Importantly, Intel® VTune Amplifier supports hot-spot analysis across the Intel® processor lineup. This stands in stark contrast to Cache Allocation Technology, which is currently supported on only six processors. Other features of Intel® VTune Amplifier include:

  • Identifying long synchronization waits that leave a CPU underutilized
  • Stepping through source code to identify which functions use the most CPU time
  • Interfacing with the on-chip performance monitoring unit (PMU) to gather low-level, fine-grained performance data

Low-level hot-spot analysis and CPU-specific profiling aren’t VTune’s only uses. It’s also an effective way to examine how an RTOS node performs in more general mixed workloads.

Figure 3 illustrates the wide range of workloads that may run on a multicore platform. In this example, Windows runs on one core, while INtime runs on the other.

Figure 3. A multicore platform can run heterogeneous workloads. (Source: TenAsys)

INtime runs on bare metal, without using a hypervisor. It creates an entirely separate hardware partition for itself that the conventional Windows installation doesn’t manage. This arrangement ensures that Windows-based applications will not directly interfere with the RTOS, but the real-time applications can still run into resource contention or undesirable crosstalk effects.

To deal with these issues, TenAsys has added support for Intel® VTune Amplifier to its INtime products. The “noisy neighbor” scenario mentioned above illustrates how this can be useful.

Suppose the real-time applications meet their timing deadlines when run alone. But once the Windows applications are added, the system behaves erratically. By profiling the workloads individually and then again in parallel, a developer can discover that the real-time workload has a much higher cache miss rate when Windows applications are running at the same time.
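The comparison described above boils down to computing a miss rate for each profiling run and checking how much it grows. The following is a minimal sketch of that arithmetic, assuming the developer has already read cache reference and miss counts out of the profiler. The `CacheCounters` type, the 2x threshold, and the counter values are hypothetical and are not taken from any real measurement or from the VTune API.

```python
from dataclasses import dataclass


@dataclass
class CacheCounters:
    """Cache counts collected for the real-time workload in one run."""
    references: int
    misses: int

    @property
    def miss_rate(self):
        return self.misses / self.references if self.references else 0.0


def contention_suspected(alone, combined, ratio_threshold=2.0):
    """Flag contention when the miss rate grows by ratio_threshold or more
    after the other workload is added. The threshold is an arbitrary choice."""
    if alone.miss_rate == 0.0:
        return combined.miss_rate > 0.0
    return combined.miss_rate / alone.miss_rate >= ratio_threshold


# Made-up counts for illustration only.
alone = CacheCounters(references=1_000_000, misses=20_000)
combined = CacheCounters(references=1_000_000, misses=150_000)

print(f"miss rate alone:    {alone.miss_rate:.1%}")
print(f"miss rate combined: {combined.miss_rate:.1%}")
print("contention suspected:", contention_suspected(alone, combined))
```

A ratio test is used here rather than an absolute difference so that the same check works for workloads with very different baseline miss rates; a real project would tune the threshold to its own timing budget.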

Next, the developer can turn to the TenAsys SDK, which includes:

  • INtime Explorer, a dynamic object browser
  • INscope system timing analyzer
  • Spider multi-thread aware debugger
  • A fault manager for dictating which actions a node should take after a fault occurs

All of these features are supported through the Microsoft Visual Studio development environment.

Make the Most of Multicore

As embedded systems become more complex, software developers need tools that can keep pace. By adding support for VTune to its INtime RTOS and INtime for Windows, TenAsys is giving developers the software resources they need to take full advantage of the Intel embedded platforms that will power advances in robotics, machine learning, industrial IoT, and a vast array of other applications.

About the Author

Kenton Williston

Kenton Williston is the Editor-in-Chief of insight.tech and served as the editor of its predecessor publication, the Embedded Innovator magazine. Kenton received his B.S. in Electrical Engineering in 2000 and has been writing about embedded computing and IoT ever since.
