

Simplify Safety-Critical System Design

As industrial systems become more complex, safety certification is a growing challenge. Building integrated systems that combine safety-critical and non-critical features is particularly difficult.

Developers can tackle the challenge with the Intel® Xeon® processor D-1529 for industrial IEC 61508 certification, the first Intel® processor designed for SIL 2 certification. This functional safety solution provides a tightly integrated package comprising hardware, software, and certification documentation that simplifies safety certification, and it enables developers to replace redundant systems with a single hardware platform – significantly reducing costs.

The Basics of Functional Safety

Before we get to the new processor, let's review the fundamentals of safety-critical design. Designing safe systems begins with risk assessment. Consider the way IEC 61508 categorizes risk based on the likelihood of the event and the consequence if the event occurs (see Figure 1).

Figure 1. IEC 61508 places risk into four categories. (Source: Wikipedia)

To meet IEC 61508 requirements – or those of any safety regulation – developers must assess the risk of each identified hazardous event. Then they must design the system to avoid unacceptable risks, and put mechanisms in place to handle the events that do occur.
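
The likelihood-and-consequence lookup can be sketched as a small table in Python. The class assignments below follow the example risk matrix commonly reproduced from IEC 61508 (the one shown in the Wikipedia article); they are illustrative only, the name `risk_class` is hypothetical, and a real project would take its matrix from its own hazard and risk analysis.

```python
# Illustrative risk matrix: likelihood x consequence -> risk class.
# Class I   = unacceptable
# Class II  = undesirable, tolerable only if risk reduction is impracticable
# Class III = tolerable if the cost of risk reduction would exceed the benefit
# Class IV  = acceptable, possibly with monitoring
CONSEQUENCES = ("catastrophic", "critical", "marginal", "negligible")

RISK_MATRIX = {
    "frequent":   ("I",   "I",   "I",   "II"),
    "probable":   ("I",   "I",   "II",  "III"),
    "occasional": ("I",   "II",  "III", "III"),
    "remote":     ("II",  "III", "III", "IV"),
    "improbable": ("III", "III", "IV",  "IV"),
    "incredible": ("IV",  "IV",  "IV",  "IV"),
}


def risk_class(likelihood: str, consequence: str) -> str:
    """Look up the risk class for one hazardous event (hypothetical helper)."""
    row = RISK_MATRIX[likelihood.lower()]
    column = CONSEQUENCES.index(consequence.lower())
    return row[column]


# Example: a hazard judged "remote" with "critical" consequences.
example = risk_class("Remote", "Critical")
```

Any event that lands in Class I must be designed out or mitigated until it falls into a lower class.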

A First in Certified Processors

To create robust systems, safety must be considered from the outset of design. Thus, safety begins with the hardware architecture.

As Intel's first processor for IEC 61508 certification, the Intel® Xeon® processor D-1529 comes with a variety of features not found on general-purpose chips. These include hardware upgrades along with software, tools, and documentation to accelerate development of IEC 61508 safety integrity level (SIL) 1 and 2 applications.

On the silicon front, the processor integrates extensive hardware diagnostics. These include:

  • Over-voltage/over-current detection
  • Processor temperature reporting
  • Machine check exceptions
  • PCIe advanced error reporting
  • Platform controller hub (PCH) error logic
  • SATA and AHCI diagnostics

Safety features are also integrated into the software architecture supporting the processor. Programmable error exceptions and software-generated exceptions enable developers to ensure the reliable operation of application code.

Supporting tools include the Intel® Software Test Library (Intel® STL). This library simplifies enabling offline and online software diagnostics, software validation, and fault injection.
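
To give a feel for what fault injection checks, here is a minimal Python sketch. It does not use or imitate the Intel® STL API, which this article does not describe. Instead it assumes a stand-in diagnostic (a CRC-32 check over a block of data) and hypothetical helpers `flip_bit` and `memory_check`, then injects every possible single-bit fault to measure how many the diagnostic catches.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted (the injected fault)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def memory_check(data: bytes, expected_crc: int) -> bool:
    """Stand-in diagnostic: True if the data still matches its reference CRC-32."""
    return zlib.crc32(data) == expected_crc


# Reference ("golden") data and its checksum, captured while known good.
golden = bytes(range(64))
golden_crc = zlib.crc32(golden)

# Inject each single-bit fault in turn and record whether it was detected.
detected = [
    not memory_check(flip_bit(golden, i), golden_crc)
    for i in range(len(golden) * 8)
]
coverage = sum(detected) / len(detected)
```

The fraction in `coverage` is the kind of evidence a fault-injection campaign produces: it shows that the diagnostic actually reacts to the faults it is supposed to detect.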

Replacing Redundant Systems

A key advantage of the Intel approach is its ability to replace redundant systems with a single platform. As an example of this approach, Laurent Remont, CTO of Kontron, points to safety-certified computers for rail.

The traditional approach is to use redundant processing cards, explains Remont. For example, Kontron offers a 3U VPX system with redundant blades (Channel A and Channel B blades in Figure 2) linked through an Ethernet switch and monitored by a gateway CPU blade. To further guarantee availability, the entire 3U VPX system is then duplicated.

Figure 2. Kontron's rail computer includes multiple layers of redundancy. (Source: Kontron)

With the Intel® Xeon® processor D-1529, developers can shrink Channel A, Channel B, the Ethernet switch, and the gateway blade onto a single blade. Instead of running the redundant code on different blades, developers can now run the code on two different cores of the same processor.

According to Remont, this hardware consolidation relies on the fact that the processor cores have been pre-certified by Intel for their independence and robustness, as well as the redundancy ensured by the Intel safety library.
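
The idea of running redundant code as two channels and comparing the results can be sketched in Python. This is only an illustration under stated assumptions: the overspeed check, the function names, and the fail-safe reaction are hypothetical, the sketch does not use the Intel safety library, and pinning each channel to its own core is platform-specific and omitted here.

```python
from typing import Callable


class ChannelMismatch(Exception):
    """Raised when the two redundant channels disagree."""


def overspeed_a(speed_kmh: float, limit_kmh: float) -> bool:
    """Channel A: direct comparison."""
    return speed_kmh > limit_kmh


def overspeed_b(speed_kmh: float, limit_kmh: float) -> bool:
    """Channel B: same requirement, deliberately coded differently."""
    return (limit_kmh - speed_kmh) < 0


def cross_check(channel_a: Callable[..., bool],
                channel_b: Callable[..., bool],
                *args: float) -> bool:
    """Run both channels on the same inputs and accept only matching results."""
    result_a = channel_a(*args)
    result_b = channel_b(*args)
    if result_a != result_b:
        raise ChannelMismatch(f"A={result_a}, B={result_b}")
    return result_a


def brake_demand(speed_kmh: float, limit_kmh: float,
                 channel_b: Callable[..., bool] = overspeed_b) -> bool:
    """Return True if braking is required; any disagreement fails safe."""
    try:
        return cross_check(overspeed_a, channel_b, speed_kmh, limit_kmh)
    except ChannelMismatch:
        return True  # assumed safe state for this example: demand braking
```

On the consolidated platform each channel would execute on a separate core; the comparison step is what turns two independent computations into a detectable-fault architecture.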

Remont notes that the overall reliability and cost of a single-blade solution are significantly better than those of the traditional architecture. This is particularly true considering that many traditional systems use a mix of CPU architectures to ensure reliability. By running all workloads on the same architecture instead, developers can speed time-to-market, reduce total cost of ownership, and accelerate the certification process.

Running Workloads with Mixed Criticality

One of the most important issues complicating safety certification is mixed criticality. Complex systems run a variety of workloads that must meet different levels of safety requirements, based on how likely each workload is to fail and the impact if it does. Further complicating design is the use of virtualization, which allows a single hardware controller to accommodate workloads for multiple machines.

The introduction of IT/business workloads and IoT connectivity to applications like factory automation adds yet another layer of complexity. Being able to monitor factory equipment remotely can significantly improve operating efficiency by turning real-time data into actionable intelligence.

But smarter manufacturing requires a robust and secure implementation that does not compromise the reliable operation of the system. The workloads that enable these advanced features could be third-party software and may not have been written to meet stringent safety requirements. Consider how a communications library that prioritizes guaranteed delivery could violate critical real-time deadlines as a consequence.

To mitigate such risks, systems need to be able to isolate and protect the safe workloads from "non-safe" workloads. In other words, if the user interface (UI) crashes, the failure must be contained to prevent the main system from acting unpredictably. Similarly, if a machine in a virtualized environment goes down, this should not impact the other virtual machines sharing that environment.
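
As a rough illustration of fault containment, the Python sketch below runs a "non-safe" workload in a separate operating-system process so that its crash cannot take the control code down with it. An ordinary process boundary is a much weaker barrier than the certified partitioning a safety-critical OS provides, and the names `run_isolated` and `control_step` are hypothetical.

```python
import subprocess
import sys


def run_isolated(code: str, timeout_s: float = 5.0) -> bool:
    """Run non-safety code in a child process; return True only on a clean exit."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # a hung workload is treated the same as a crashed one
    return result.returncode == 0


def control_step(setpoint: float, measured: float) -> float:
    """Stand-in for the safety-relevant work: a simple proportional correction."""
    return 0.5 * (setpoint - measured)


# The simulated UI crashes, but the failure stays inside the child process.
ui_ok = run_isolated("raise RuntimeError('UI crashed')")

# The control code keeps running regardless of the UI's fate.
correction = control_step(10.0, 8.0)
```

The supervising code learns that the UI failed (`ui_ok` is false) and can restart or disable it, while the control calculation proceeds unaffected.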

Using a Safety-Critical OS

For these and other reasons, safety needs to be an integral part of a system's OS. As such, Wind River supports the Intel® Xeon® processor D-1529 with its safety-focused Linux and VxWorks OSs. These OSs provide a virtualized environment with advanced time and space partitioning capabilities that can support mixed criticality across diverse workloads.
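
Time partitioning is often implemented as a fixed, repeating schedule in which each partition owns a guaranteed window. The Python sketch below builds such a schedule; it is a conceptual illustration only, not the VxWorks or Wind River Linux configuration format, and the partition names, budgets, and the helper `build_major_frame` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Partition:
    name: str
    budget_ms: int  # CPU time guaranteed to this partition in every frame


def build_major_frame(partitions: List[Partition],
                      frame_ms: int) -> List[Tuple[str, int, int]]:
    """Lay partitions out back to back; return (name, start_ms, end_ms) windows."""
    total = sum(p.budget_ms for p in partitions)
    if total > frame_ms:
        # Reject at design time rather than let one workload starve another.
        raise ValueError(f"budgets need {total} ms but the frame is {frame_ms} ms")
    windows = []
    start = 0
    for p in partitions:
        windows.append((p.name, start, start + p.budget_ms))
        start += p.budget_ms
    return windows


schedule = build_major_frame(
    [
        Partition("safety_control", 4),
        Partition("hmi", 3),
        Partition("cloud_agent", 2),
    ],
    frame_ms=10,
)
```

Because the windows are fixed ahead of time, a misbehaving `cloud_agent` can at worst waste its own 2 ms; it cannot eat into the time reserved for `safety_control`.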

Wind River also supports the platform through its Simics virtual development environment. This platform allows developers to model the Intel® Xeon® processor D-1529 along with the surrounding system ahead of hardware availability (Figure 3).

Figure 3. Wind River Simics enables whole-system simulation. (Source: Wind River)

Managing Complexity

Designing safe systems is only going to get more complicated as the Internet of Things becomes more pervasive. By considering safety at the outset of design and integrating safety mechanisms from the ground up, OEMs can have confidence their products will meet safety requirements today and tomorrow.

About the Author

Nicholas Cravotta is a veteran of the electronics industry. He has been technical editor for EDN, Embedded Systems Programming, and Communications Systems Design, and was the founding editor-in-chief of Multimedia Systems Design. During his years as an engineer, he designed hard real-time embedded systems, wrote application software for PCs and workstations, built an operating system from the ground up, and developed in-house software and hardware development and test tools, among many other projects. He has written over 600 published articles and has taught programming and technical writing at UC Berkeley. When he isn’t writing about engineering, he is an award-winning game designer for BlueMatter Games, where he focuses on innovative ways to engage people, including the home version of Escape the Room and Houdini, the reconfigurable disentanglement puzzle. He was recently a contestant on the reality TV show “The Toy Box,” where he presented the Pinata Backpack.
