

September 11, 2009

# Reliability Requirements for Automotive



Tony Reipold, Kevin Kemp, Claude Moughanni Freescale Semiconductor



# **Reliability Requirements for Automotive**

- Market Characteristics
- Safety Standards
- Research Needs /Discussion Points



#### **Market Characteristics**

- Automotive reliability needs driven by
  - Consumer expectation: JD Power, Consumer Reports ...
  - Warranty / service / recall cost
  - Liability, litigation, regulatory
- Automotive electronics growth Competitive new features
  - Powertrain: Engine control, transmission, instrumentation
  - Chassis / safety: ABS, ESC, active suspension, airbag ...
  - Body electrical: Door, window, lock, alarms ...
  - Comfort / convenience: Climate control, memory seat ...
  - Energy efficiency: Hybrid, EV, electric steering (weight reduction) ...
  - Infotainment, telematics
  - Driver aid: Radar, night vision, lane departure, active cruise ...
  - Intelligent highway, autonomous vehicle …
- Risk averse
  - Long technology adoption cycles
- Operating Environment
  - Temperature (-40 to 175°C), moisture, vibration, EM noise
- Extreme Cost Sensitivity



# **Reliability Threats**

- Device wear-out (hard fail or degradation)
- Manufacturing defects (test escapes)
- HW and SW design errors (verification escapes)
- Transient faults (alpha particles, cosmic rays, noise, EMI)
- Overstress failure (ESD, power surge, short circuit load)
- Parametric variability (performance marginality, increased failure susceptibility)
- Analog and sensor components as well as digital
- Over reaction to transient or correctable faults (unnecessary system shut down)
- Improper maintenance

Basic Concepts and Taxonomy of Dependable and Secure Computing A. Avizienis et. al., IEEE Trans on Dependable and Secure Computing, 2004



## Safety Systems versus Safety-Critical Systems

|                                 |                                      | Functional safety                      |                                          |  |
|---------------------------------|--------------------------------------|----------------------------------------|------------------------------------------|--|
|                                 |                                      | Non-safety critical                    | Safety-critical                          |  |
|                                 |                                      | [failure $\Rightarrow$ no imm. danger] | [failure $\Rightarrow$ immediate danger] |  |
| Automotive<br>Safety<br>Systems | Active Safety<br>[avoid accident]    | Braking Assistant                      | ESP                                      |  |
|                                 | Passive Safety<br>[survive accident] | Safety Belt<br>Pretensioner            | Airbag                                   |  |
| Non-Safety Systems              |                                      | Lighting                               | Electronic<br>Throttle Control           |  |

⇒ Functional Safety is a property rather than an application domain

- ⇒ many automotive systems require this property
- ⇒ differences in criticality and moment of activation



# **Safety Standards**

#### IEC 61508

- General safety standard for E/E/PE systems
- First edition 2000
- Metrics:
  - Probability of dangerous failure per hour (PFH)
  - Safe Failure Fraction (SFF)
- 4 Safety Integrity Levels (SIL)
- Hardware redundancy in formulas (HFT)

|           | SIL 1             | SIL 2             | SIL 3             |
|-----------|-------------------|-------------------|-------------------|
| PFH [1/h] | <10 <sup>-5</sup> | <10 <sup>-6</sup> | <10 <sup>-7</sup> |
| SFF       | >=60%             | >=90%             | >=99%             |

Note: specialised table for typical automotive application with HFT=0

## ISO 26262

- Adaption of IEC 61508 for the automotive industry
- First edition emerging now
- Metrics:
  - Probability of violation of safety goals (PFH)
  - Single Point Fault Metric (SPFM)
  - Latent Fault Metric (LFM)
- 4 Automotive SILs (ASIL)

| Hardware modeling | radyndan                      | ୢ୵ୠ୲ୄଽୄୄ୷ୣ        | CAUSAL D          |
|-------------------|-------------------------------|-------------------|-------------------|
| PFH [1/h]         | <10 <sup>-7</sup><br>(recom.) | <10 <sup>-7</sup> | <10 <sup>-8</sup> |
| SPFM              | >90%                          | >97%              | >99%              |
| LFM               | >60%                          | >80%              | >90%              |



# **Deriving MCU Safety Level Measures**

#### Self test measures

- Ensure that the device is free from dormant faults
- Core self-test
- Device self-test

#### Error detection measures

- Stop errors from propagating beyond component boundary
  - Error correction (compensation)
  - Shut down (fail-silent)
- · HW plausibility based
  - Illegal address/op-code detection, Supervisor & user modes, memory error detection, ECC, clock monitors, voltage supervision, watchdogs,
- HW redundancy based
  - redundant peripherals, dual-core

#### Development process measures

- Avoid systematic failures
- Follow IEC 61508 process requirements





# Fault Tolerance/Resiliency Techniques

|                                                                                                                                                                                                        | Device | Circuit /<br>uArch                  | System<br>Arch                     | Firmware /<br>Driver / OS | Application<br>SW |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|-------------------------------------|------------------------------------|---------------------------|-------------------|
| Guardbanding / Uprating / Overdesign                                                                                                                                                                   | х      | х                                   | x                                  |                           |                   |
| Fail-over machanisms (e.g ABS)                                                                                                                                                                         |        |                                     |                                    |                           |                   |
| Communication channel coding (ECC)                                                                                                                                                                     |        | Low cost / very<br>effective        |                                    |                           |                   |
| Memory Error Correction                                                                                                                                                                                |        | low to med cost /<br>very effective |                                    |                           |                   |
| Self-checking digital circuits (Error correcting buses and logic) self-<br>Checking and Fault-Tolerant Digital Design, Parag K. Lala; Academic Press, 2001                                             |        |                                     |                                    |                           |                   |
| Triple modular redundancy + voting (TMR)                                                                                                                                                               |        |                                     | Very high cost /<br>very effective |                           |                   |
| Hardware replicate / diversity + checker                                                                                                                                                               |        |                                     | Med to high cost<br>/ effective    |                           |                   |
| Software diversity + checker                                                                                                                                                                           |        |                                     |                                    |                           | High cost         |
| On line BIST<br>http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04700583                                                                                                                           |        | X (memory/logic)                    | x                                  | x                         |                   |
| Checkpoint – test – recover<br>(BulletProof Silicon - Austin/Bertacco)<br>http://www.eecs.umich.edu/-taustin/papers/dtco-25-04-aust.pdf<br>http://www.eecs.umich.edu/-taustin/papers/HPCA06-bullet.pdf |        | x                                   |                                    | x                         |                   |
| Asymmetric reliability - control vs. data<br>(ERSA – Mitra)<br>http://selse3.selse.org/Papers/17_Bau_P.pdf                                                                                             |        |                                     | x                                  |                           |                   |
| Monitors / diagnostics<br>(Clock source/quality monitors, open/short circuit tests, cpu tests,<br>voltage/current monitors, multiple A2D test voltages)                                                |        | Low cost /<br>effective             | x                                  |                           |                   |
| NoC w. redundancy + resilient routing                                                                                                                                                                  |        |                                     | x                                  |                           |                   |
| On-chip sensors (thermal, NBTI, TDDB)                                                                                                                                                                  | х      |                                     |                                    | x                         |                   |
| Selective component hardening based on criticality (VGER - Seshia)                                                                                                                                     | х      |                                     |                                    |                           |                   |

## **Chassis / Safety Applications**





#### **Metrics for Fault Tolerance Mechanisms**





## **Fault Tolerant Architectures**





# **Automotive System Level Example**

| Implementation<br>Abstraction | Example Application (for Illustration)                                         |                              |                            | System level goals:                                                                  |
|-------------------------------|--------------------------------------------------------------------------------|------------------------------|----------------------------|--------------------------------------------------------------------------------------|
|                               |                                                                                |                              |                            | Optimize total system resilience and cost                                            |
| Application SW                | Auto Drive Vehicle: Integrate all data sources and control direction and speed |                              |                            | Optimize component level resilience ar<br>cost in context of a the system            |
|                               |                                                                                |                              |                            | Compare alternate and innovative architectural solutions                             |
| Firmware / Driver / OS        | Read road sensors                                                              | Read car<br>radar            | GPS, direction, speed, Etc | Use tools to enable accurate modeling,<br>repeatable processes and faster time to    |
|                               |                                                                                |                              |                            | <ul><li>market.</li><li>Tools should cover system requirements</li></ul>             |
| System / Interfaces           | Communicate sense results                                                      | Communicate<br>radar results |                            | capture phase through product delivery                                               |
|                               |                                                                                |                              |                            | Resilient system design requires                                                     |
| Board / Subsystem             | Sensor to control logic                                                        | radar to control logic       |                            | cross layer and within layer<br>communication of :                                   |
|                               |                                                                                |                              |                            | Failure occurrence                                                                   |
| Packaged IC / Sensors         | Sensor with<br>lens/package                                                    | radar with<br>package        |                            | <ul> <li>Recovery implemented</li> <li>Probability of failure recurrence</li> </ul>  |
|                               |                                                                                |                              |                            | <ul> <li>Distance from lowest allowable safety<br/>critical degraded mode</li> </ul> |
| Manufacturing / Device        | Manufacture<br>sensor                                                          | Manufacture<br>radar x/r     |                            |                                                                                      |
|                               |                                                                                |                              |                            |                                                                                      |



### **Research Needs / Discussion Points**

- Requirements capture and architecture exploration methodology & tools
- New, low cost resilient architectures / methods
  - Apply to "mature" technologies to achieve very high reliability levels
- Software resiliency
  - Software (faults) are part of the problem
  - Software (may be) part of the solution
- Provable compliance (SIL / ASIL) of fault tolerant designs
  - At the system level
  - At each level in the system
  - Design, manufacturing, HW, SW, power supply, packaging, documentation...
  - Cross discipline resiliency models
- Commonalities with Aerospace (- rad hard, + low cost)
  - Life-critical applications
  - Conservative / risk-averse
  - Application follower (dynamics, navigation telematics,, autonomy ...)
- Communication and interfaces
  - Unambiguous system communication within a level and across levels
  - Optimized local communication overhead requirements (cost optimization)
  - Resilient interfaces

