MISC-TN-019: Post-mortem analysis of embedded Linux systems - Part 1

From DAVE Developer's Wiki
Warning: This Technical Note was validated against specific versions of hardware and software. What is described here may not work with other versions.


History

Version Date Notes
1.0.0 June 2021 First public release

Introduction

One of the most challenging problems with embedded Linux systems is so-called post-mortem analysis. Post-mortem is a Latin expression meaning "after death". In this context, death is an event after which the system becomes unstable or even hangs. Post-mortem analysis therefore refers to the tasks carried out after such an event to determine its root cause.

Post-mortem analysis is harder still when these events occur randomly and there is no apparent way to trigger them in a controlled fashion. Sometimes these situations arise only after the system has been deployed in the field and is in use by end customers, which makes the analysis considerably more troublesome.

Several techniques are available for post-mortem analysis. Software tools, hardware tools, or a combination of both can be leveraged. This article is the first in a series of Technical Notes (TN) describing real-world cases in which DAVE Embedded Systems put its expertise into practice and leveraged some of these techniques to support several customers reporting field failures that they were unable to analyze with traditional debugging tools.