MISC-TN-019: Post-mortem analysis of embedded Linux systems - Part 1

From DAVE Developer's Wiki

Revision as of 16:04, 21 June 2021

Info Box
This Technical Note was validated against specific versions of hardware and software. What is described here may not work with other versions.


History

Version Date Notes
1.0.0 June 2021 First public release

Introduction

One of the most challenging problems related to embedded Linux systems is the so-called post-mortem analysis. Post-mortem is a Latin expression that means "after death". In this context, death refers to an event after which the system becomes unstable or even gets stuck. Post-mortem analysis therefore refers to the tasks carried out after such an event to figure out its root cause.

Worse still, post-mortem analysis becomes harder when these events occur randomly and it is apparently impossible to trigger them in a controlled fashion. Unfortunately, despite thorough testing at the qualification stage, such events may even occur after the system has been deployed in the field and is in use by end customers, which makes the analysis extremely troublesome.

Several techniques are available for post-mortem analysis. Software tools, hardware tools, or a combination of both can be leveraged. This article is the first of a series of Technical Notes (TNs) describing some of these techniques in more detail. Some TNs refer to real-world cases in which DAVE Embedded Systems deployed its expertise to support customers reporting on-field failures that they were unable to analyze with traditional debugging tools and approaches. In these cases, the information reported by customers is often so limited and fragmented that it is impossible to determine a priori whether the root cause is software or hardware. Thus, no assumption about the root cause domain can be made, and engineers need to be open-minded enough to consider every possible cause.