Reliability engineering of an electric scooter as the 28th engineering axis: meta-axis of all engineering axes — MIL-HDBK-217F Notice 2 + IEC 61709:2017 + FIDES Guide 2009 Edition A + Telcordia SR-332 Issue 4 + IEEE 1413-2010 + JEDEC JEP122H + IEC 62308:2006 + ISO/IEC 25023:2016 + IEC 60300 + IEC 60812:2018 FMEA + IEC 61025 FTA + MIL-STD-1629A FMECA + Hobbs HALT/HASS + Weibull/Arrhenius/Eyring/Coffin-Manson/Norris-Landzberg
In the engineering-guide series we have described the lithium-ion battery with BMS and a thermal-runaway intro, the brake system, the motor and the controller, the suspension, the tyre, lighting and visibility, frame and fork, display + HMI, the SMPS CC/CV charger, connector + wiring harness, IP protection, bearings under ISO 281 L10, the stem and folding mechanism, the deck, handgrip + lever + throttle, the wheel as an assembly, fastener engineering as the joining axis, thermal management as the heat-dissipation axis, EMC/EMI as the interference-mitigation axis, cybersecurity as the interconnect-trust axis, NVH as the acoustic-vibration-emission axis, functional safety as the safety-integrity axis, battery lifecycle as the sustainability axis, repairability as the repairability axis, environmental robustness as the environmental-conditioning axis and privacy and personal-data protection as the privacy-preservation axis. These 27 engineering axes described subsystems, joining methods, thermal and electromagnetic phenomena, safety, sustainability, repairability, environmental conditioning and privacy — each episodically referred to reliability concepts (L10 in bearings, IFR in BMS, MTBF in motors, ALT in connectors), but none of them described the reliability-engineering toolkit itself: how system MTBF is computed from component-level FIT rates, how it is validated through ALT/HALT, how Weibull analysis of field returns is interpreted.
Reliability engineering is the meta-axis of all other engineering axes. It supplies the formal apparatus (probability distributions, hazard functions, RBD), standards for quantitative prediction (MIL-HDBK-217F + IEC 61709 + FIDES + Telcordia SR-332), validation protocols (ALT/HALT/HASS per Hobbs) and process tools (FMEA + FTA + FRACAS + DRBFM) that allow one to predict and validate the reliability of each of the 27 previous axes before market release and throughout the lifecycle.
This is the twenty-eighth engineering-axis deep-dive in the guide series — and the eleventh cross-cutting infrastructure axis (parallel to joining DT + heat-dissipation DV + interference-mitigation DX + interconnect-trust DZ + acoustic-vibration-emission EB + safety-integrity ED + sustainability EF + repairability EH + environmental-conditioning EJ + privacy-preservation EL, now reliability-prediction EN). Unlike previous axes that described a separate subsystem or a particular aspect, the reliability axis is integral: it has no hardware “node” of its own — instead it is a methodology layered on top of every other axis.
1. Reliability ≠ functional safety ≠ maintenance: a separate axis
Reliability, functional safety and maintenance are often conflated but solve different problems:
| Dimension | Reliability (EN) | Functional safety (ED) | Maintenance |
|---|---|---|---|
| Question | How many hours until failure? | What happens upon failure? | How quickly can it be restored after failure? |
| Metric | MTBF, FIT, R(t) | SIL/ASIL level, PFD/PFH | MTTR, availability |
| Foundational standard | MIL-HDBK-217F + IEC 61709 + FIDES + Telcordia SR-332 | IEC 61508 + ISO 26262 + ISO 13849 | IEC 60300-3-14 + EN 13306 |
| Analytical tool | FMEA + FTA + RBD + Weibull | HARA + PHA + SIL decomposition | RCM (Reliability-Centered Maintenance) |
| Engineering goal | Prevent failure statistically | If a failure occurs — fail safe | Reduce downtime |
| Validation cycle | ALT/HALT/HASS + field MTBF | SIL audit + safety case | MTBF/MTTR ratio measurement |
| Trigger | “How long will it last?” | “What if it fails?” | “How to repair it?” |
A canonical example of the distinction: an e-scooter brake system with SIL-2 hardware (functional safety) and MTBF 50 000 hours (reliability). The functional-safety axis is perfectly satisfied — on any detected failure the system transitions to a failsafe state (the mechanical brake takes over). The reliability axis is at the same time a separate problem: what is the probability that a detected failure occurs within the first year (Weibull β < 1 — infant mortality) versus after the fifth year (β > 1 — wear-out)? That is not a functional-safety question — it is a reliability-engineering question.
2. Reliability function R(t), failure rate λ(t), MTBF, MTTF, MTTR, FIT
The reliability function R(t) is the probability that a component operates without failure during the interval [0, t]:
R(t) = P(T > t), where T is the random time-to-failure
The cumulative distribution function (CDF) F(t) is the probability of failure by time t:
F(t) = 1 − R(t) = P(T ≤ t)
The probability density function (PDF) f(t) is the density of the time-to-failure distribution:
f(t) = dF(t)/dt = −dR(t)/dt
The failure rate (hazard rate) λ(t) is the instantaneous rate of failure per unit time, conditional on survival to time t:
λ(t) = f(t)/R(t)
It is not a probability — it is an intensity (1/time), and it is precisely the failure rate that gives the bathtub curve its physical meaning (see § 3).
MTBF (Mean Time Between Failures) — for repairable systems, the average time between successive failures:
MTBF = ∫₀^∞ R(t) dt (for repairable systems restored to as-good-as-new)
MTTF (Mean Time To Failure) — for non-repairable components, the expected time to first failure:
MTTF = E[T] = ∫₀^∞ t · f(t) dt = ∫₀^∞ R(t) dt
(For an exponential distribution MTBF = MTTF = 1/λ; for other distributions the difference matters.)
MTTR (Mean Time To Repair) — the average time to restore the system after a failure. Not reliability per se, but a component of availability:
A = MTBF / (MTBF + MTTR)
FIT (Failures In Time) — the number of failures per 10⁹ hours of operation (the standardised unit for component-level reliability):
FIT = λ × 10⁹ (failures per billion hours)
Typical orders of magnitude: passive resistor — 0.1 FIT, silicon MOSFET — 5–50 FIT, electrolytic capacitor — 100–500 FIT, BLDC motor — 5 000–20 000 FIT, lithium-ion cell — 1 000–10 000 FIT (per FIDES Guide 2009A + Telcordia SR-332 Issue 4).
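As a quick numerical check of the definitions above, the following minimal sketch converts a FIT value into λ and MTBF, evaluates R(t) under the constant-failure-rate assumption, and computes availability. The function names and the sample values (200 FIT, MTBF 5 000 h, MTTR 4 h) are illustrative assumptions, not figures taken from any handbook.

```python
import math

HOURS_PER_BILLION = 1e9  # FIT is defined per 10^9 device-hours


def fit_to_lambda(fit: float) -> float:
    """Convert a FIT value to a failure rate in failures per hour."""
    return fit / HOURS_PER_BILLION


def mtbf_from_fit(fit: float) -> float:
    """MTBF in hours; valid only for a constant failure rate (exponential model)."""
    return HOURS_PER_BILLION / fit


def reliability_exponential(lam: float, t_hours: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-lam * t_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    fit = 200.0  # hypothetical electrolytic capacitor
    lam = fit_to_lambda(fit)
    print(f"lambda = {lam:.2e} per hour, MTBF = {mtbf_from_fit(fit):,.0f} h")
    print(f"R(2000 h) = {reliability_exponential(lam, 2000.0):.6f}")
    print(f"A = {availability(5000.0, 4.0):.5f}")  # hypothetical MTBF and MTTR
```

Note that the FIT-to-MTBF conversion silently assumes β = 1; for wear-out or infant-mortality regimes the Weibull form is the better model.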
3. The bathtub curve: three life phases
The empirically observed bathtub curve describes λ(t) of a typical electronic/electromechanical component in three phases:
| Phase | Name | Duration | λ(t) behaviour | Weibull β | Dominant mechanism |
|---|---|---|---|---|---|
| 1 | Infant mortality (early failure) | 0 – 1 000 hr | Decreasing failure rate (DFR) | β < 1 | Manufacturing defects: solder voids, contamination, weak die-attach |
| 2 | Useful life (steady state) | 1 000 – 100 000 hr | Constant failure rate (CFR) | β = 1 | Random triggers: ESD, overstress, transients |
| 3 | Wear-out | > 100 000 hr | Increasing failure rate (IFR) | β > 1 | Cumulative: electromigration, capacitor dry-out, bearing fatigue |
The Weibull β shape parameter (see § 4) is the most compact summary of which phase the component is in. Field-return data plotted on Weibull paper immediately reveals which phase contains most of the returns:
- β < 1 → early failures dominate, which points to manufacturing defects; solution: strengthen screening (burn-in / HASS).
- β ≈ 1 → failures are random; solution: increase stress margin / redundancy.
- β > 1 → wear-out within the warranty window; solution: revisit derating / materials / tolerances.
The engineering goal is to push the whole curve down and extend phase 2, achieved through three practices: (a) derating (operate components at ≤ 50 % of rated stress); (b) burn-in screening (filter phase 1 in the factory); (c) wear-out lifetime > intended life (select components whose rated wear-out life exceeds 5 × the warranty period).
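The link between β and the three phases can be made concrete with the two-parameter Weibull hazard λ(t) = (β/η)(t/η)^(β−1), which follows from dividing the Weibull PDF by R(t). The sketch below is illustrative only: η = 10 000 h and the function names are assumptions.

```python
import math


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Hazard rate of a two-parameter Weibull: (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def hazard_trend(beta: float, eta: float = 10_000.0) -> str:
    """Classify the hazard as DFR, CFR or IFR by comparing an early and a late time."""
    early = weibull_hazard(0.1 * eta, beta, eta)
    late = weibull_hazard(1.0 * eta, beta, eta)
    if math.isclose(early, late, rel_tol=1e-9):
        return "CFR"
    return "DFR" if late < early else "IFR"


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.5):
        print(f"beta = {beta}: {hazard_trend(beta)}")
```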
4. Time-to-failure probability distributions
| Distribution | Parameters | PDF f(t) | When it applies |
|---|---|---|---|
| Exponential | λ (rate) | λe^(−λt) | Constant failure rate (bathtub phase 2), random failures, memoryless |
| Weibull (2-parameter) | β (shape), η (scale) | (β/η)(t/η)^(β−1) · exp(−(t/η)^β) | Universal: β < 1 = infant, β = 1 = CFR, β > 1 = wear-out |
| Weibull (3-parameter) | β, η, γ (location) | same with shift γ | When a “guaranteed” failure-free interval exists (γ > 0) |
| Lognormal | μ (location), σ (shape) | (1/(tσ√(2π))) · exp(−(ln t − μ)²/(2σ²)) | Fatigue, crack growth, corrosion, semiconductor diffusion |
| Normal | μ, σ | (1/(σ√(2π))) · exp(−(t−μ)²/(2σ²)) | Wear-out with symmetric scatter (rare in electronics) |
The Weibull distribution (Waloddi Weibull, “A Statistical Distribution Function of Wide Applicability”, Journal of Applied Mechanics, 1951) is the canonical reliability distribution, because a single parameter (β) describes all three bathtub phases. On a Weibull-paper plot (ln(ln(1/(1−F(t)))) versus ln(t)) the data form a straight line whose slope is β; the time at which that line crosses the F = 63.2 % level is η (in the two-parameter form).
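The Weibull-paper construction can be reproduced numerically with median-rank regression. The sketch below assumes a complete sample (every unit failed, no censoring) and uses Bernard's approximation F_i ≈ (i − 0.3)/(n + 0.4) for the plotting positions; the failure times are hypothetical.

```python
import math


def median_ranks(n: int) -> list[float]:
    """Bernard's approximation to the median rank of the i-th ordered failure."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def fit_weibull_mrr(failure_times: list[float]) -> tuple[float, float]:
    """Least-squares line through (ln t, ln(-ln(1 - F))); returns (beta, eta)."""
    times = sorted(failure_times)
    n = len(times)
    if n < 2:
        raise ValueError("need at least two failure times")
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - f)) for f in median_ranks(n)]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope of the Weibull plot
    intercept = y_mean - beta * x_mean    # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


if __name__ == "__main__":
    hours = [310.0, 720.0, 1150.0, 1600.0, 2400.0, 3100.0]  # hypothetical returns
    beta, eta = fit_weibull_mrr(hours)
    print(f"beta = {beta:.2f}, eta = {eta:.0f} h")
```

Real field data is usually censored (most units are still running), in which case rank adjustment or maximum-likelihood fitting is needed instead of this simple regression.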
The exponential distribution is the special case of Weibull at β = 1. It has the unique memoryless property: P(T > s+t | T > s) = P(T > t). This means “having operated for 1 000 hours, it remains as fresh as new for prediction purposes” — which is not realistic for wear-out regimes but accurate for random overstress triggers in phase 2.
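The memoryless property is easy to verify numerically: for β = 1 the conditional survival after s hours equals the survival of a fresh unit, while for β > 1 the aged unit fares worse. A small sketch with assumed values (η = 10 000 h, s = 1 000 h, t = 500 h):

```python
import math


def weibull_survival(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))


def conditional_survival(s: float, t: float, beta: float, eta: float) -> float:
    """P(T > s + t | T > s): survival for t more hours given s hours already survived."""
    return weibull_survival(s + t, beta, eta) / weibull_survival(s, beta, eta)


if __name__ == "__main__":
    eta, s, t = 10_000.0, 1_000.0, 500.0  # illustrative values
    for beta in (1.0, 2.0):
        fresh = weibull_survival(t, beta, eta)
        aged = conditional_survival(s, t, beta, eta)
        print(f"beta = {beta}: fresh = {fresh:.6f}, aged = {aged:.6f}")
```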
The lognormal distribution applies to mechanisms in which damage accumulates multiplicatively rather than additively: Paris-Erdogan crack growth, corrosion, IMC growth in solder joints. Coffin-Manson cycles-to-failure often follows a lognormal distribution.
5. Standards corpus — 9-row reliability matrix
| Standard | Year | Origin | Scope | Key metric |
|---|---|---|---|---|
| MIL-HDBK-217F Notice 2 | 1995 (Notice 2) | US DoD | Parts-stress + parts-count prediction for military electronics | Component FIT with 27 stress factors π |
| IEC 61709:2017 | 2017 | IEC TC56 | Reference conditions + stress models for failure-rate conversion | λ_ref + multipliers (π_T, π_U, π_I, π_S) |
| FIDES Guide 2009 Edition A | 2009 | French defence consortium (DGA + Airbus + Thales + Sagem + MBDA) | Industrial reliability handbook covering modern EEE components | Process factor (manufacturing quality) integrated |
| Telcordia SR-332 Issue 4 | 2016 | Bellcore/Telcordia (telecom origin) | Component reliability prediction, telecom equipment | Methods I/II/III (proprietary multiplicative factors) |
| IEEE 1413-2010 | 2010 | IEEE Reliability Society | Framework standard: how to perform reliability prediction (not what values to predict) | Quality criteria for the prediction methodology |
| JEDEC JEP122H | 2016 | JEDEC JC-14 | Failure mechanisms and models for semiconductor devices | TDDB, EM, HCI, NBTI, TC acceleration models |
| IEC 62308:2006 | 2006 | IEC TC56 | Equipment reliability — assessment methods | Decision tree: prediction vs test vs field data |
| ISO/IEC 25023:2016 | 2016 | ISO/IEC JTC1 SC7 | Software product quality measurement (includes reliability) | Software failure intensity, maturity, fault tolerance |
| IEC 60300 series | 2014–2024 | IEC TC56 | Dependability management (umbrella for reliability + availability + maintainability + safety = RAMS) | Programme + processes |
For an e-scooter the most operationally relevant standards are: MIL-HDBK-217F Notice 2 and FIDES Guide 2009A — for component-level FIT computations (BMS controller IC, motor-controller MOSFET, charger SMPS); IEC 61709:2017 — for normalising datasheet λ to actual operating conditions; Telcordia SR-332 — for laboratory burn-in screening models; JEDEC JEP122H — for concrete failure mechanisms in semiconductors (electromigration in power MOSFETs, NBTI in MCU CMOS); IEC 62308 — as a decision framework: when to do prediction vs ALT testing vs field tracking.
MIL-HDBK-217F vs IEC 61709 vs FIDES are three competing prediction methods with different failure-rate values for the same components (up to 10× discrepancy). IEEE 1413-2010 does not pick a winner — instead it demands transparency: any reliability claim must document the method, the data source, the assumptions and the uncertainty.
6. Acceleration models — 5-row matrix
ALT (Accelerated Life Test) works because an accelerator (elevated temperature, voltage or cycle frequency) accelerates the very same physical reaction that produces failure at use conditions. The acceleration factor (AF) is the ratio of TTF at use conditions to TTF at stress conditions:
AF = TTF_use / TTF_stress
If an ALT at stress conditions yields TTF = 100 hours and AF = 1 000, then at use conditions TTF = 100 × 1 000 = 100 000 hours (≈ 11 years of continuous operation).
| Model | Stressor | AF formula | Applies to | Origin |
|---|---|---|---|---|
| Arrhenius | Temperature | exp((E_a/k_B)·(1/T_use − 1/T_stress)) | Chemical-reaction–driven: IMC growth, corrosion, NBTI, oxide breakdown | Svante Arrhenius, “Über die Reaktionsgeschwindigkeit”, Z. Physik. Chem. 4, 1889 |
| Eyring | Temperature + secondary stress | (T_stress/T_use) · exp((ΔH/k_B)·(1/T_use − 1/T_stress)) · f(stress) | Rate processes with a non-thermal co-stress (humidity, voltage) | Henry Eyring, “The Activated Complex in Chemical Reactions”, J. Chem. Phys. 3, 1935 |
| Inverse Power Law | Voltage / mechanical stress | (V_stress/V_use)^n | Capacitor dielectric, insulation, bearing fatigue | Power-law fits across many domains, formalised in Nelson, “Accelerated Testing”, 1990 |
| Norris-Landzberg | Thermal cycling | (f_use/f_stress)^m · (ΔT_stress/ΔT_use)^n · exp((E_a/k_B)·(1/T_max_use − 1/T_max_stress)), with f the cycling frequency | Solder-joint thermal fatigue (SnPb, SAC305) | Norris & Landzberg, “Reliability of Controlled Collapse Interconnections”, IBM J. Res. Dev. 13, 1969 |
| Coffin-Manson | Plastic-strain amplitude / thermal cycling | (Δε_p_stress/Δε_p_use)^n or (ΔT_stress/ΔT_use)^n | Low-cycle fatigue (solder joints, ductile metals), thermal-expansion mismatch | L. F. Coffin, “A Study of the Effects of Cyclic Thermal Stresses…”, Trans. ASME 76, 1954; S. S. Manson, “Behaviour of Materials Under Conditions of Thermal Stress”, NACA Report 1170, 1954 |
For an e-scooter the most important are: Arrhenius (BMS-MCU NBTI, motor-controller MOSFET TDDB, electrolytic capacitor dry-out), Norris-Landzberg (solder joints on the controller PCB at every heating/cooling cycle = one ride), Coffin-Manson (BMS cell-connection tabs as cells expand and contract), Inverse Power Law (the Y-capacitor in the EMI filter during line-surge events).
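The non-Arrhenius models can be sketched as plain functions. Each returns AF = life at use / life at stress, so a harsher stress condition always gives AF > 1. The default Norris-Landzberg constants (n = 1.9, m = 1/3, E_a = 0.122 eV) are the commonly quoted SnPb values and are assumptions here; lead-free alloys use different fitted constants.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def af_inverse_power_law(v_stress: float, v_use: float, n: float) -> float:
    """AF = (V_stress / V_use)**n."""
    return (v_stress / v_use) ** n


def af_coffin_manson(dt_stress: float, dt_use: float, n: float) -> float:
    """AF = (dT_stress / dT_use)**n for thermal-cycling fatigue."""
    return (dt_stress / dt_use) ** n


def af_norris_landzberg(dt_stress: float, dt_use: float,
                        f_stress: float, f_use: float,
                        tmax_stress_c: float, tmax_use_c: float,
                        n: float = 1.9, m: float = 1.0 / 3.0,
                        ea_ev: float = 0.122) -> float:
    """Coffin-Manson term times a cycling-frequency term and an Arrhenius term."""
    t_use_k = tmax_use_c + 273.15
    t_stress_k = tmax_stress_c + 273.15
    freq_term = (f_use / f_stress) ** m
    range_term = (dt_stress / dt_use) ** n
    temp_term = math.exp((ea_ev / K_B_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))
    return freq_term * range_term * temp_term


if __name__ == "__main__":
    # Hypothetical chamber test: -40..+125 C at 24 cycles/day versus
    # a field profile of +10..+50 C at 2 cycles/day.
    af = af_norris_landzberg(165.0, 40.0, 24.0, 2.0, 125.0, 50.0)
    print(f"Norris-Landzberg AF = {af:.1f}")
```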
Activation energy E_a for typical mechanisms (per JEDEC JEP122H):
- Electromigration (Al / Cu interconnects): 0.5–0.9 eV
- Time-Dependent Dielectric Breakdown (TDDB): 0.6–0.9 eV
- Hot Carrier Injection (HCI): the apparent E_a is negative, about −0.1 to −0.2 eV (counterintuitively, lower T accelerates)
- NBTI (Negative Bias Temperature Instability): 0.2–0.5 eV
- Electrolytic capacitor dry-out: 0.5–0.7 eV
- Solder fatigue (SnPb / SAC): ~0.123 eV (Norris-Landzberg)
- Aluminium corrosion: ~0.7 eV
Rule of thumb: with E_a = 0.7 eV a 10 °C temperature increase gives AF ≈ 2 around 50–60 °C (the doubling rule). This is the physical basis of derating: by the same rule, running a component 25 °C below its maximum rating extends MTBF by about 2^2.5 ≈ 5.7×.
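The doubling rule can be checked directly against the Arrhenius formula. The sketch below is a minimal illustration; the temperatures are assumed examples, and the factor per 10 °C drifts with the starting temperature (it is somewhat above 2 near room temperature and falls as the temperature rises).

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def af_arrhenius(t_use_c: float, t_stress_c: float, ea_ev: float) -> float:
    """AF = exp((Ea/k) * (1/T_use - 1/T_stress)), temperatures given in Celsius."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / K_B_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))


def doubling_rule(delta_t_c: float) -> float:
    """Rule-of-thumb factor: life changes by 2x per 10 C."""
    return 2.0 ** (delta_t_c / 10.0)


if __name__ == "__main__":
    print(f"Arrhenius 50 -> 60 C, 0.7 eV: {af_arrhenius(50.0, 60.0, 0.7):.2f}")
    print(f"Arrhenius 25 -> 35 C, 0.7 eV: {af_arrhenius(25.0, 35.0, 0.7):.2f}")
    print(f"Doubling rule for 25 C:       {doubling_rule(25.0):.2f}")
```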
7. Parts-count vs parts-stress prediction workflow
MIL-HDBK-217F (like the other reliability-prediction handbooks) offers two methods of increasing precision:
Parts-count method (early design) — used when detailed component-level stress data is unavailable (concept stage):
λ_equip = Σᵢ Nᵢ · (λ_g,ᵢ · π_Q,ᵢ)
where Nᵢ is the number of components of type i, λ_g,ᵢ is the generic failure rate from a MIL-HDBK-217F table, and π_Q,ᵢ is the quality factor (commercial / industrial / military / space).
Parts-stress method (detailed design) — used when detailed stress data is available for every component:
λ_part = λ_b · π_T · π_S · π_E · π_Q · π_A · …
where λ_b is the base failure rate and π are multiplicative stress factors (Temperature, Stress, Environment, Quality, Application…).
For a typical e-scooter BLDC controller PCB (illustrative):
| Component | N | λ_g (FIT) | π_Q | Sub-total FIT |
|---|---|---|---|---|
| MOSFET (power, 6×) | 6 | 50 | 1.0 (industrial) | 300 |
| Gate-driver IC (3×) | 3 | 15 | 1.0 | 45 |
| MCU (1×) | 1 | 20 | 1.0 | 20 |
| Electrolytic capacitor (4×) | 4 | 200 | 1.0 | 800 |
| Ceramic capacitor (40×) | 40 | 0.3 | 1.0 | 12 |
| Resistor (60×) | 60 | 0.1 | 1.0 | 6 |
| Connector (3×) | 3 | 30 | 1.0 | 90 |
| Sum | | | | 1 273 |
MTBF = 10⁹ / 1 273 ≈ 785 000 hours ≈ 89 years of continuous operation. That is the prediction at reference conditions (25 °C, no humidity, no shock). At actual use conditions (40–60 °C average, vibration, daily thermal cycling) the failure rate is multiplied by an environmental factor π_E (~5–10), so MTBF falls to roughly 79 000–157 000 hours (≈ 9–18 years of continuous operation), still comfortably longer than the typical 2-year warranty period of an e-scooter.
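The parts-count roll-up is a weighted sum, so it is easy to reproduce and to re-run when the bill of materials changes. The sketch below uses the illustrative table values; the `BOM` structure, the function names and the way π_E is applied as a single multiplier are assumptions for illustration.

```python
# (component, quantity N, generic failure rate in FIT, quality factor pi_Q)
BOM = [
    ("MOSFET (power)", 6, 50.0, 1.0),
    ("Gate-driver IC", 3, 15.0, 1.0),
    ("MCU", 1, 20.0, 1.0),
    ("Electrolytic capacitor", 4, 200.0, 1.0),
    ("Ceramic capacitor", 40, 0.3, 1.0),
    ("Resistor", 60, 0.1, 1.0),
    ("Connector", 3, 30.0, 1.0),
]


def parts_count_fit(bom) -> float:
    """lambda_equip = sum over part types of N * lambda_g * pi_Q, in FIT."""
    return sum(n * lam_g * pi_q for _, n, lam_g, pi_q in bom)


def mtbf_hours(total_fit: float, pi_e: float = 1.0) -> float:
    """MTBF in hours after scaling the failure rate by an environmental factor."""
    return 1e9 / (total_fit * pi_e)


if __name__ == "__main__":
    total = parts_count_fit(BOM)
    print(f"Total: {total:.0f} FIT")
    for pi_e in (1.0, 5.0, 10.0):
        print(f"pi_E = {pi_e:>4}: MTBF = {mtbf_hours(total, pi_e):,.0f} h")
```

Sorting the per-line contributions shows at a glance that the electrolytic capacitors and power MOSFETs dominate the budget, which is where derating effort pays off first.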
8. Stress-strength interference + derating
The classical reliability model: a component has a strength (capability to withstand stress) — a random variable with distribution P(strength). The operating environment imposes stress — a random variable with distribution P(stress). Failure occurs when stress > strength (the interference region):
P(failure) = ∫₀^∞ f_stress(x) · F_strength(x) dx
P(failure) can be reduced in three ways:
- Increase mean(strength) — pick a more expensive component with a higher rating.
- Decrease mean(stress) — derating (operate at ≤ 50 % of rated).
- Decrease σ(stress) or σ(strength) — quality control + screening.
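For the common textbook case where stress and strength are independent and normally distributed, the interference integral has a closed form: P(failure) = Φ(−(μ_strength − μ_stress)/√(σ_stress² + σ_strength²)). The sketch below relies on that normality assumption, and the numbers are purely illustrative.

```python
import math
from statistics import NormalDist


def interference_probability(mu_stress: float, sd_stress: float,
                             mu_strength: float, sd_strength: float) -> float:
    """P(stress > strength) for independent normal stress and strength."""
    margin_mean = mu_strength - mu_stress
    margin_sd = math.hypot(sd_stress, sd_strength)  # sqrt(sd1^2 + sd2^2)
    return NormalDist().cdf(-margin_mean / margin_sd)


if __name__ == "__main__":
    # Hypothetical MOSFET drain current in amperes: capability vs applied load.
    base = interference_probability(40.0, 3.0, 55.0, 4.0)
    derated = interference_probability(30.0, 3.0, 55.0, 4.0)   # lower mean stress
    screened = interference_probability(40.0, 3.0, 55.0, 2.0)  # tighter strength spread
    print(f"baseline {base:.2e}, derated {derated:.2e}, screened {screened:.2e}")
```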
Derating practices (industry standard per NASA EEE-INST-002, ECSS-Q-30-11):
| Component | Derating ratio (operating / rated) | Rationale |
|---|---|---|
| Resistor power | ≤ 50 % | Temperature rise + drift |
| Capacitor voltage | ≤ 50 % (electrolytic), ≤ 80 % (ceramic) | Dielectric stress + leakage |
| Diode forward current | ≤ 50 % | Junction temperature |
| Power MOSFET V_DS | ≤ 80 % | Avalanche safety margin |
| Power MOSFET I_D | ≤ 80 % | RDS(on) thermal headroom |
| IC junction temperature | ≤ Tj_max − 25 °C | Arrhenius doubling rule |
| Connector contact current | ≤ 75 % | Contact-resistance heating |
| Bearing dynamic load | C/P ≥ 4 (L10 = (C/P)³ ≥ 64 million revolutions for a ball bearing; life in hours depends on speed) | ISO 281 L10 life |
For an e-scooter the most critical case is power MOSFETs in the motor controller: continuous current at start/hill-climb approaches 80 % of I_D rating → junction temperature approaches 150 °C → Arrhenius AF against a reference 75 °C = 2^(75/10) ≈ 180× shorter MTBF. Industrial-grade controllers therefore use 2× MOSFET parallelisation and active gate-driver thermal monitoring.
9. Reliability Block Diagrams (RBD)
An RBD is a graphical representation of a system showing how subsystems combine to form overall reliability:
Series configuration — the system operates only if all components operate:
R_series(t) = R₁(t) · R₂(t) · … · Rₙ(t)
For exponential distributions: λ_series = λ₁ + λ₂ + … + λₙ. Every component lowers total reliability — series is the “weakest link” architecture.
Parallel (active redundancy) — the system operates if at least one component operates:
R_parallel(t) = 1 − (1 − R₁(t)) · (1 − R₂(t)) · … · (1 − Rₙ(t))
Two identical units (R₁ = R₂ = R) give R_parallel = 2R − R². At R = 0.99, R_parallel = 0.9999. This is an expensive improvement (2× cost), so it is reserved for safety-critical paths.
k-out-of-n — the system operates if at least k of n components operate. Binomial summation:
R_{k/n}(t) = Σ_{j=k}^{n} C(n,j) · R(t)^j · (1 − R(t))^{n−j}
Bridge network — a non-decomposable topology that requires either the pivotal-decomposition method or enumeration of minimal path / cut sets.
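The four configurations above translate directly into code. The sketch below assumes independent component failures; the bridge layout (components 1 and 2 on the upper path, 3 and 4 on the lower path, 5 as the cross-link) is one conventional labelling chosen for illustration.

```python
from math import comb, prod


def series(rs: list[float]) -> float:
    """All components must work."""
    return prod(rs)


def parallel(rs: list[float]) -> float:
    """At least one component must work (active redundancy)."""
    return 1.0 - prod(1.0 - r for r in rs)


def k_out_of_n(k: int, n: int, r: float) -> float:
    """At least k of n identical components must work."""
    return sum(comb(n, j) * r**j * (1.0 - r) ** (n - j) for j in range(k, n + 1))


def bridge(r1: float, r2: float, r3: float, r4: float, r5: float) -> float:
    """Pivotal decomposition on the cross-link r5."""
    link_works = series([parallel([r1, r3]), parallel([r2, r4])])
    link_fails = parallel([series([r1, r2]), series([r3, r4])])
    return r5 * link_works + (1.0 - r5) * link_fails


if __name__ == "__main__":
    print(f"series   : {series([0.99, 0.98, 0.97]):.6f}")
    print(f"parallel : {parallel([0.99, 0.99]):.6f}")
    print(f"2-of-3   : {k_out_of_n(2, 3, 0.95):.6f}")
    print(f"bridge   : {bridge(0.9, 0.9, 0.9, 0.9, 0.9):.6f}")
```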
E-scooter as a series-parallel RBD (simplified):
Battery → BMS → [Controller A || Controller B (redundant)] → Motor → Wheel
\→ Charger (off-board, not in series during the ride)
↓
[Lighting] — paralleled in the safety path
A typical e-scooter has no redundancy on the critical path: the chain battery → BMS → controller → motor → wheel is usually single-channel, so the redundant controller pair in the diagram above is illustrative rather than typical. This is a deliberate compromise: for personal mobility the weight and cost of redundancy outweigh its value. Reliability is instead pursued through derating + screening + ALT validation.
10. FMEA (MIL-STD-1629A → IEC 60812:2018)
FMEA (Failure Mode and Effects Analysis) is a bottom-up systematic analysis: for every component, identify the possible failure modes, the effect, severity, probability of occurrence and detectability. It originated in US military procedure MIL-P-1629 (1949) and was codified by the US DoD as MIL-STD-1629A (1980); that standard was cancelled by Notice 3 in 1998 but remains a de facto industry reference. The current formal references are IEC 60812:2018 and the AIAG-VDA FMEA Handbook 2019 (automotive industry consensus, replacing the AIAG FMEA 4th edition and VDA Volume 4).
The Risk Priority Number (RPN) is a multiplicative score:
RPN = Severity × Occurrence × Detection (each 1–10)
Mitigation is prioritised by descending RPN. AIAG-VDA 2019 replaced RPN with Action Priority (AP), a three-tier classification (High / Medium / Low) read from a Severity-Occurrence-Detection table rather than computed by multiplication. This fixes a known defect of the old RPN: S = 10, O = 2, D = 5 and S = 2, O = 10, D = 5 both give RPN = 100, yet the first is a rare catastrophic failure and the second a frequent nuisance.
FMEA for an e-scooter BMS (excerpt, illustrative):
| Component | Failure mode | Effect | S | O | D | RPN |
|---|---|---|---|---|---|---|
| BMS MOSFET (charge gate) | Stuck-on (short) | Cannot disconnect → overcharge → thermal runaway | 10 | 3 | 5 | 150 |
| BMS MOSFET (charge gate) | Stuck-off (open) | Cannot charge → user complaint | 4 | 4 | 2 | 32 |
| Cell-voltage sense wire | Open | Loss of monitoring → individual cell overvoltage possible | 9 | 5 | 4 | 180 |
| Temperature sensor (NTC) | Open | BMS reads −∞ → no thermal cutoff | 9 | 3 | 3 | 81 |
| Temperature sensor (NTC) | Short | BMS reads +∞ → false trip | 4 | 4 | 2 | 32 |
The highest RPN (cell-voltage sense wire open, 180) → mitigation: redundant sense lines + a plausibility check (compare the sum of cell voltages against the pack-voltage measurement).
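The ranking step is mechanical and can be sketched in a few lines. The rows below are the illustrative BMS entries from the table; the `FmeaRow` class and `ranked` helper are hypothetical names, not part of any FMEA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaRow:
    component: str
    failure_mode: str
    severity: int    # 1-10
    occurrence: int  # 1-10
    detection: int   # 1-10

    @property
    def rpn(self) -> int:
        """Risk Priority Number = S x O x D."""
        return self.severity * self.occurrence * self.detection


ROWS = [
    FmeaRow("BMS MOSFET (charge gate)", "Stuck-on (short)", 10, 3, 5),
    FmeaRow("BMS MOSFET (charge gate)", "Stuck-off (open)", 4, 4, 2),
    FmeaRow("Cell-voltage sense wire", "Open", 9, 5, 4),
    FmeaRow("Temperature sensor (NTC)", "Open", 9, 3, 3),
    FmeaRow("Temperature sensor (NTC)", "Short", 4, 4, 2),
]


def ranked(rows: list[FmeaRow]) -> list[FmeaRow]:
    """Sort by descending RPN; ties are broken by severity."""
    return sorted(rows, key=lambda r: (r.rpn, r.severity), reverse=True)


if __name__ == "__main__":
    for row in ranked(ROWS):
        print(f"{row.rpn:>4}  {row.component}: {row.failure_mode}")
```

Breaking ties by severity is a small hedge against the multiplication defect described above; it does not replace an Action Priority table.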
11. FTA (IEC 61025)
FTA (Fault Tree Analysis) is a top-down deductive analysis: given a top event (e.g. “Battery thermal runaway”), construct a logical tree of causes with AND/OR gates that decomposes down to basic events (atomic component failures). Created by H. A. Watson at Bell Labs (1962, Minuteman ICBM safety analysis), formalised as IEC 61025:2006 (US analogue NUREG-0492).
A minimal cut set is the smallest combination of basic events that triggers the top event. The order of a cut set is the number of basic events: order 1 = a single point of failure, order ≥ 2 = redundancy exists.
FTA for the top event “Thermal runaway” (excerpt):
TOP: Battery thermal runaway
OR
├── Overcharge during charge cycle
│ AND
│ ├── BMS charge MOSFET stuck-on (basic event)
│ └── Charger overvoltage-protection failure (basic event)
├── Internal short (cell-level)
│ OR
│ ├── Manufacturing defect (basic event)
│ ├── Mechanical damage (impact / vibration) (basic event)
│ └── Dendrite growth (overcharge / ageing) (basic event)
├── External short
│ AND
│ ├── Insulation breach (basic event)
│ └── Both BMS discharge MOSFETs stuck-on (basic event)
└── Thermal abuse
    OR
    ├── External heat source > 60 °C (basic event)
    └── Cooling failure + high discharge load (basic event)
Order-1 cut sets (single points of failure) — manufacturing defect, mechanical damage, dendrite growth, external heat. Order-2 cut sets — BMS + charger combined, insulation + both MOSFETs.
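Minimal cut sets can be derived mechanically from the gate structure: an OR gate takes the union of its children's cut sets, and an AND gate combines one cut set from each child. The sketch below encodes the excerpt as nested tuples; as an interpretation, it models “Cooling failure + high discharge load” as an AND of two separate events, so that pair appears as an order-2 cut set.

```python
from itertools import product

# A node is either a basic-event name (str) or a (gate, [children]) tuple.
TREE = ("OR", [
    ("AND", ["BMS charge MOSFET stuck-on", "Charger OVP failure"]),
    ("OR", ["Manufacturing defect", "Mechanical damage", "Dendrite growth"]),
    ("AND", ["Insulation breach", "Both discharge MOSFETs stuck-on"]),
    ("OR", ["External heat source",
            ("AND", ["Cooling failure", "High discharge load"])]),
])


def cut_sets(node) -> list[frozenset]:
    """Expand the tree into (not necessarily minimal) cut sets."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate: {gate}")


def minimal(sets: list[frozenset]) -> list[frozenset]:
    """Drop duplicates and any cut set that strictly contains another."""
    unique = set(sets)
    kept = [s for s in unique if not any(other < s for other in unique)]
    return sorted(kept, key=lambda s: (len(s), sorted(s)))


if __name__ == "__main__":
    for cs in minimal(cut_sets(TREE)):
        print(f"order {len(cs)}: {', '.join(sorted(cs))}")
```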
Quantitative FTA: substitute component failure probabilities → top-event probability via AND (multiply, assuming independent events) / OR (sum for rare events). If BMS MOSFET stuck-on = 10⁻⁴/year and charger overvoltage = 10⁻³/year, the “Overcharge” subtree = 10⁻⁷/year (one incident per ten million scooters per year). As a benchmark, that is about 10⁻¹¹ per hour, far below the < 10⁻⁷ per hour random-hardware-failure target that ISO 26262 sets for ASIL C.
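The gate arithmetic can be sketched as follows, assuming independent basic events. Only the two overcharge probabilities come from the text; the other branch value is a hypothetical placeholder used to show how the exact OR and the rare-event sum compare.

```python
import math

HOURS_PER_YEAR = 8760.0


def and_gate(probs: list[float]) -> float:
    """All inputs must occur: product of probabilities (independence assumed)."""
    return math.prod(probs)


def or_gate_exact(probs: list[float]) -> float:
    """At least one input occurs: 1 - product of complements."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def or_gate_rare(probs: list[float]) -> float:
    """Rare-event approximation: simple sum, always an upper bound."""
    return sum(probs)


if __name__ == "__main__":
    overcharge = and_gate([1e-4, 1e-3])   # per year, values from the text
    internal_short = 2e-6                 # per year, hypothetical placeholder
    top_exact = or_gate_exact([overcharge, internal_short])
    top_rare = or_gate_rare([overcharge, internal_short])
    print(f"overcharge = {overcharge:.1e}/year = {overcharge / HOURS_PER_YEAR:.1e}/hour")
    print(f"top event: exact = {top_exact:.4e}, rare-event sum = {top_rare:.4e}")
```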
12. FRACAS + DRBFM
FRACAS (Failure Reporting, Analysis, and Corrective Action System) is a closed-loop process for detecting recurring failure modes among field returns. Steps:
- Report: a warranty claim becomes a standardised failure ticket (component + symptom + serial number + use conditions).
- Analyse: root-cause analysis (5-Why, fishbone / Ishikawa, 8D problem solving).
- Corrective action: design change / supplier change / process change.
- Verify: a pilot batch with the fix confirms a reduced failure rate.
- Close: document the change in the master FMEA, update design rules.
DRBFM (Design Review Based on Failure Mode) is a Toyota practice (developed by Tatsuhiko Yoshimura) for change reviews: on any modification to an existing design, a formal review focuses only on the changes and their interaction with the unchanged parts. It is cheaper than rebuilding a full FMEA from scratch yet catches the regression bugs that the change introduces.
13. ALT / HALT / HASS — the Hobbs method
ALT (Accelerated Life Test) applies elevated stress to compress lifetime. Two protocols:
- Constant-stress ALT: 3+ samples at each of 3+ stress levels (constant during the test) → fit a Weibull-Arrhenius model → extrapolate to use conditions. Rigorous statistical foundation, conservative (Nelson, Accelerated Testing, Wiley 1990).
- Step-stress ALT: the same samples are tested at progressively higher stress steps. Faster but harder to analyse (Nelson’s cumulative-damage model).
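The extrapolation step of a constant-stress ALT can be sketched as a straight-line fit of ln(life) against 1/T. This is a deliberately reduced version of the Weibull-Arrhenius fit: it takes one characteristic life per temperature as given (in practice each would come from a Weibull fit at that level), and the three data points are hypothetical.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fit_arrhenius(temps_c: list[float], lives_h: list[float]) -> tuple[float, float]:
    """Least-squares fit of ln(life) = intercept + (Ea/k) / T; returns (Ea in eV, intercept)."""
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(life) for life in lives_h]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals Ea / k_B
    intercept = y_mean - slope * x_mean
    return slope * K_B_EV, intercept


def predict_life(temp_c: float, ea_ev: float, intercept: float) -> float:
    """Extrapolated characteristic life in hours at the given temperature."""
    return math.exp(intercept + (ea_ev / K_B_EV) / (temp_c + 273.15))


if __name__ == "__main__":
    temps = [85.0, 105.0, 125.0]     # stress levels in Celsius (hypothetical)
    lives = [4000.0, 1500.0, 600.0]  # characteristic life at each level (hypothetical)
    ea, c = fit_arrhenius(temps, lives)
    print(f"fitted Ea = {ea:.2f} eV, life at 40 C = {predict_life(40.0, ea, c):,.0f} h")
```

Extrapolating far below the lowest stress level is only defensible if the same failure mechanism operates across the whole range, which is why three or more stress levels are recommended.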
HALT (Highly Accelerated Life Test) is a Gregg Hobbs technique (Accelerated Reliability Engineering: HALT and HASS, Wiley 2000). It is not used for a quantitative MTBF — it is used for discovery of design weaknesses through step-stress to destruction:
- Cold step stress: −10 °C every 10 min until non-operational → operating limit; continue to destruct.
- Hot step stress: +10 °C every 10 min until non-operational; continue to destruct.
- Rapid thermal cycling: ±X °C/min ramp rate, capture intermediate failures.
- Vibration step stress: 5 G_rms increments to destruct.
- Combined: temperature + vibration + voltage simultaneously.
The output: an operating limit (≥ specification + margin) and a destruct limit (catastrophic stress level). Design changes reinforce the weak points until the operating margin > 50 % above worst-case use conditions.
HASS (Highly Accelerated Stress Screening) is a production-line screening based on HALT-derived limits. Typically 80 % of the operating limit, applied to every unit produced. It is designed to catch phase-1 infant mortality (manufacturing defects) without consuming the useful life of healthy units.
For an e-scooter HALT is typically performed by the tier-1 controller manufacturer (motor controller / BMS PCB):
- Cold: −40 °C (Arctic winter operating envelope).
- Hot: +85 °C (motor-controller compartment summer heat soak).
- Vibration: 30 G_rms random (road shock + curb drop).
- Thermal cycling: −30 °C ↔ +60 °C × 100 cycles (storage + use).
- Voltage: ±20 % of rated battery voltage (low-cell + full-charge corners).
ESS (Environmental Stress Screening) is the older term, often used interchangeably with HASS. IEC 61163-1:2006 specifies ESS protocols.
14. Cross-axis matrix: reliability concepts across the 27 previous axes
| Engineering axis | Reliability concept | Metric | Acceleration model | Standard |
|---|---|---|---|---|
| Battery cell + BMS | Cycle life, calendar life | C/3 cycles to 80 % SoH | Arrhenius (calendar) + cycle throughput (Bazant) | IEC 62660-2, UL 1973 |
| Battery lifecycle | Second-life capability | RPT (Reference Performance Test) | Calendar + cycle combined | IEC 62902, ISO 12405-4 |
| Motor + controller | MOSFET TDDB, motor bearing | FIT (semiconductor), L10 (bearing) | Arrhenius + Eyring + Norris-Landzberg | JEDEC JEP122H, MIL-HDBK-217F |
| Brake system | Pad wear, hydraulic seal | mm/1000 km, leak rate | Inverse Power Law (load) | ECE R78, ISO 11157 |
| Suspension | Damper seal life, spring fatigue | leak/cycles, S-N curve | Coffin-Manson, IPL | DIN 53513, ISO 12131 |
| Tyre | Tread depth, casing fatigue | mm/1000 km, TWI | IPL (load) + Arrhenius (rubber ageing) | ISO 28580, UTQG |
| Lighting | LED L70/L80/L90 (lumen maintenance) | hours to 70 %/80 %/90 % initial lumen | Arrhenius (junction T) | LM-80, TM-21 |
| Frame + fork | HCF (high-cycle fatigue) | S-N curve, endurance limit | Basquin’s law (S-N) | EN 17128, ISO 4210 |
| Display + HMI | LCD/OLED degradation | nits to 50 %, dead-pixel count | Arrhenius + photon dose | IEC 62977 |
| Charger SMPS | Electrolytic capacitor dry-out, MOSFET TDDB | FIT, ESR drift | Arrhenius (E_a ~0.7 eV) | IEC 62368-1, JEDEC |
| Connector + harness | Contact fretting, insulation ageing | mΩ drift, IR drop | Arrhenius + Inverse Power Law | EIA-364-23, IEC 60512 |
| IP protection | Seal compression set | leak/cycles | Arrhenius (rubber) | ISO 815, IEC 60529 |
| Bearings (ISO 281 L10) | L10 dynamic life | million revolutions to 90 % survival | Lundberg-Palmgren (a₁·a₂·a₃) | ISO 281, ISO 16281 |
| Stem + folding | Hinge wear, fold-cycle fatigue | cycles to failure | Coffin-Manson (low-cycle) | EN 17128 |
| Deck + footboard | Composite fatigue, surface wear | strain cycles, μ deg | Basquin, IPL | EN 17128 |
| Handgrip + lever + throttle | Polymer fatigue, hall-sensor drift | cycles, output drift | Coffin-Manson + Arrhenius | ISO 11421 |
| Wheel + rim + spoke | Spoke-tension fatigue, rim corrosion | cycles, μm/year | S-N + Arrhenius | EN 17128, ASTM B117 |
| Fastener + bolted joint | Preload loss (embedment + relaxation) | %/cycles | Logarithmic (relaxation) | VDI 2230 |
| Thermal management | Cooling fan MTBF, TIM degradation | hours, °C·m²/W drift | Arrhenius (TIM) + IPL (fan) | JEDEC JESD51 |
| EMC/EMI | Y-capacitor degradation, choke insulation | μA leakage drift | Arrhenius + IPL (voltage) | IEC 60384-14, IEC 60938 |
| Cybersecurity | Cryptographic obsolescence (post-quantum migration) | years to algorithm sunset | (Non-statistical: planned per the NIST PQC roadmap) | NIST FIPS 203/204/205 |
| NVH | Damping-element ageing | tan δ drift | Arrhenius (rubber) | ISO 6721 |
| Functional safety | PFD/PFH (safety integrity) | failures per demand / per hour | Constant-rate exponential | IEC 61508, ISO 26262 |
| Repair + reparability | MTTR (mean time to repair) | hours | Operational, not predicted | EN 45554 |
| Environmental robustness | Combined-stress ageing | composite metric | Multi-stress Eyring | MIL-STD-810H, IEC 60068 |
| Privacy | Cryptographic key lifetime | years | (Non-statistical: per NIST SP 800-57 cryptoperiod) | NIST SP 800-57 |
| Helmet + protective gear | EPS foam ageing, polycarbonate UV | years to brittleness | Arrhenius + photon dose | EN 1078, EN 17128 |
27 engineering axes + 1 reliability meta-axis (this article) = the complete engineering corpus. Reliability, as a meta-axis, provides a unified apparatus for the quantification of every other axis.
15. Eight DIY owner reliability practices
The owner of an e-scooter does not perform ALT/HALT but can extend the field MTBF through simple practices:
- Avoid thermal-cycling extremes: do not leave the scooter in a +50 °C sunlit boot in summer or move it abruptly from −20 °C frost into a +25 °C room (Norris-Landzberg solder fatigue: a 70 °C ΔT swing shortens solder-joint life roughly 5–10× compared with a 30 °C swing, depending on the ΔT exponent n of the solder alloy).
- Charge at room temperature — the Arrhenius rule: 10 °C lower → 2× longer battery calendar life. Do not charge immediately after a ride (the battery is hot) — wait 30 minutes.
- Storage SoC 40–60 %, not 100 %: calendar fade depends on both temperature (Arrhenius) and voltage stress (Eyring); a pack stored at 100 % SoC and 40 °C loses 20 % of capacity per year, versus 2 % per year at 50 % SoC and 20 °C (IEC 62660-2 calendar-test methodology).
- Do not overload during sustained climbs — when power-MOSFET I_D approaches its rated I_D continuously → Tj approaches 150 °C → Arrhenius doubling: a 25 °C Tj overshoot = 5× shorter MOSFET MTBF.
- Rain + vibration = corrosion + fretting: the IP rating is not infinite. After every ride in the rain wipe with a dry cloth, especially around connectors (moisture plus vibration-induced micro-motion drives fretting corrosion, which shows up as rising contact resistance in EIA-364-23 measurements).
- Stop bolted-joint loosening — preload loss (per VDI 2230 + the fastener axis) accelerates with vibration cycles. Check torque on critical joints (stem, fork, axle) every 200 km.
- A field-return signal is a pattern, not a single event — if two or three failures of the same type appear in a short period, it is not chance but β > 1 wear-out or β < 1 batch defect. Contact the manufacturer with the serial numbers and failure dates — that is genuine FRACAS input.
- Document dates and mileage at every subsystem replacement — these become your personal warranty data. In three years’ time, when the next defect appears, you will have Weibull-actionable data for negotiation with the manufacturer.
16. Recap — 10 key statements
- Reliability engineering is the meta-axis of all 27 other engineering axes; it provides the quantitative apparatus (R(t), λ(t), MTBF, FIT) for predicting and validating the reliability of each of them.
- Reliability ≠ functional safety ≠ maintenance — three different axes: reliability asks “how long until failure?”, functional safety asks “what happens upon failure?”, maintenance asks “how to restore?”.
- The bathtub curve has three phases: infant mortality (β < 1, DFR), useful life (β ≈ 1, CFR), wear-out (β > 1, IFR). The engineering goal is to push the whole curve downward and extend phase 2.
- The Weibull distribution (Waloddi Weibull 1951) is the canonical reliability distribution: a single parameter β captures all three bathtub phases.
- The standards corpus: MIL-HDBK-217F Notice 2 + IEC 61709:2017 + FIDES Guide 2009A + Telcordia SR-332 Issue 4 — the four leading prediction methods (different values up to 10× for the same components; IEEE 1413 mandates transparency).
- Acceleration models: Arrhenius (T), Eyring (T + other), Inverse Power Law (V), Norris-Landzberg (TC), Coffin-Manson (plastic strain) — the physical foundation of both ALT testing and derating.
- Reliability Block Diagrams — series (multiply), parallel (1 − product of unreliabilities), k-out-of-n (binomial). An e-scooter is mostly a series RBD without redundancy on the critical path.
- FMEA + FTA + FRACAS + DRBFM — the process toolset: FMEA (bottom-up, MIL-STD-1629A → IEC 60812:2018 → AIAG-VDA 2019), FTA (top-down, IEC 61025), FRACAS (closed-loop field returns), DRBFM (change-driven Toyota practice).
- HALT/HASS (Hobbs method) — qualitative discovery of design weak points through step-stress to destruct + production-line screening at 80 % of the operating limit. Not ALT — ALT is for quantitative MTBF; HALT is for design hardening.
- DIY owner practices — derating through behaviour: avoid thermal extremes, charge cool, store at 40–60 % SoC, do not run sustained max-load, document subsystem-replacement dates for your personal Weibull dataset.
Reliability engineering is the twenty-eighth engineering axis and the eleventh cross-cutting infrastructure axis after privacy. It does not exist as a separate hardware “node” inside the scooter — it is a methodology layered on top of every one of the 27 previous axes that lets us answer the question each of them only hints at: how long will this subsystem last under these operating conditions, and what will the failure mode look like when it finally fails.