MISC-TN-019: Post-mortem analysis of embedded Linux systems — Part 1

From DAVE Developer's Wiki
Revision as of 09:18, 25 June 2021 by U0001



History

Version Date Notes
1.0.0 June 2021 First public release

Introduction

One of the most challenging problems related to embedded Linux systems is the so-called post-mortem analysis. Post-mortem is a Latin expression that means "after death". In this context, death refers to an event after which the system becomes unstable or even gets stuck. Post-mortem analysis therefore refers to the tasks carried out after such an event has occurred in order to figure out its root cause.

Post-mortem analysis is even harder when these events occur randomly and there is apparently no way to trigger them in a controlled fashion. Unfortunately, in spite of thorough testing at the qualification stage, such events may occur after the system has already been deployed in the field and is in use by end customers, which makes the analysis extremely troublesome.

Several techniques are available for post-mortem analysis. Software tools, hardware tools, or a combination of both can be leveraged. This article is the first of a series of Technical Notes (TNs) describing some of these techniques in more detail. Some TNs refer to real-world cases in which DAVE Embedded Systems deployed its expertise to support several customers reporting on-field failures that they were unable to analyze with traditional debugging tools and approaches. In these cases, the information reported by customers is inevitably so limited and fragmented that it is generally impossible to determine a priori whether the root cause is software or hardware related. Thus, no assumption about the root cause domain can be made, and engineers need to keep an open mind and consider every possible cause.

Articles in this series

TBD