Failure of a Circuit
Circuit board failures
When systems fail in the field because a critical PCBA (Printed Circuit Board Assembly) suffers an event such as dendrite shorting, open solder joints, or creep corrosion from sulfur in the environment, it is important to determine the root cause and to establish what part each of the following elements played in a failure that the system was not protected against, both for the PFMEA and beyond:
Design - Were critical circuits placed too close together, creating an easy short path for moisture or residue?
Fabrication - Were fabrication residues on components or PCBs absorbing moisture, or were they corrosive?
Assembly - Were interacting processes creating residues that caused parasitic or shorting conditions?
Environment - Did the shipping, storage, or operational environment introduce gases or moisture that caused performance problems, whether from outside the system or from outgassing inside a sealed system?
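One way to keep these four elements visible during failure analysis is to tag every field return with the elements judged to have contributed and then tally them. The sketch below is only an illustration: the `FieldReturn` record, its field names, the serial numbers, and the sample data are all made up for the example and do not come from any real return database.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Element(Enum):
    """The four elements listed above."""
    DESIGN = "Design"
    FABRICATION = "Fabrication"
    ASSEMBLY = "Assembly"
    ENVIRONMENT = "Environment"


@dataclass(frozen=True)
class FieldReturn:
    serial: str          # hypothetical unit identifier
    failure_mode: str    # e.g. "dendrite short", "open solder joint"
    contributors: tuple  # Elements judged to have contributed


def tally_contributors(returns):
    """Count how many returns implicate each element (once per return)."""
    counts = Counter()
    for r in returns:
        counts.update(set(r.contributors))
    return counts


# Illustrative data only.
returns = [
    FieldReturn("SN-001", "dendrite short", (Element.ASSEMBLY, Element.ENVIRONMENT)),
    FieldReturn("SN-002", "creep corrosion", (Element.ENVIRONMENT,)),
    FieldReturn("SN-003", "open solder joint", (Element.FABRICATION,)),
]
counts = tally_contributors(returns)
for element in Element:
    print(f"{element.value}: {counts[element]}")
```

A return can implicate more than one element, which is why the contributors are stored as a tuple rather than a single value.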
Electronic hardware is designed, fabricated, assembled, and qualified per the requirements of each OEM in their industrial application. When hardware fails in the field, it is assessed against what was qualified and against historical performance to see whether anything changed. It is often assumed that everything was built just as it was when the qualifications were run, but variation does occur in the fabrication and manufacturing processes. Understanding this variation is critical, and this is why the PFMEA (Potential Failure Mode and Effects Analysis) and fish-bone diagrams are so important.
There are too many failure modes to discuss in this short blog, but the key question to assess is whether the conditions that caused the failure were external to the hardware (e.g., a gaseous sulfur environment causing creep corrosion) or whether the failure stemmed from open leads due to corrosion or from shorted leads, vias, or traces caused by process residues creating electrochemical shorts. The information we learn from field returns supports design improvements and hardening of the system to reduce these risks in future builds, and points to ways of protecting current systems in the field. The reality in 2022 is that over the last 15 years the number of recalls for electronic systems has been the highest in industrial history (although we are also putting electronics in places where electronics were never expected to be). Just this month an automotive manufacturer announced a recall of 485,000 vehicles due to a non-crash fire risk in cars from 2014 - 2019 and instructed owners to park the vehicles outside and not drive them.
Electronic hardware is never designed to fail. Precautions that have worked historically are incorporated, building on what is known about the operating environment and conditions, but today's electronics have changed so much that these precautions may not be completely effective. Conformal coatings, housings, and protection systems used with today's more sensitive and more densely spaced components often do not provide complete protection.
The full range of electronics, from wearable tech, advanced medical systems, autonomous vehicles, drones, and advanced space flight to garage door openers, toys, and consumer appliances, has problems of its own. Now environmental fogging with disinfectants (all but a couple of which are very corrosive or flammable) has been introduced and is impacting field performance. The risk of viruses and the need to maintain a safe working environment add a new risk to everyday electronics.
This is why it is critical to learn from each and every return, including the "No Trouble Found" (NTF) units, which may have failed due to a parasitic path on a clock circuit that is not seen at the bench because conditions have changed since the time of the event. To ensure it is not just a software glitch, we suggest placing some of these returns in a humid, non-condensing environment (40 °C / 85% RH) for 4 hours and retesting to see whether a parasitic event occurred due to process residues or environmental exposure.
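The NTF retest described above can be written down as a simple triage rule. This is a minimal sketch, assuming each returned unit gets a pass/fail functional test at the bench and again after the humidity soak. The `NtfResult` record and the wording of the three outcomes are hypothetical; only the soak conditions come from the text.

```python
from dataclasses import dataclass

# Soak conditions from the text: non-condensing, 40 C / 85% RH, 4 hours.
SOAK_TEMP_C = 40
SOAK_RH_PERCENT = 85
SOAK_HOURS = 4


@dataclass
class NtfResult:
    serial: str               # hypothetical unit identifier
    passed_bench_test: bool   # functional test as received
    passed_after_soak: bool   # functional test after the humidity soak


def classify_ntf(result):
    """Suggest a next step for a returned unit (illustrative wording)."""
    if not result.passed_bench_test:
        return "fails at bench: proceed with normal failure analysis"
    if not result.passed_after_soak:
        return "fails after soak: suspect parasitic path from residue or exposure"
    return "passes after soak: keep software glitch and other causes open"


print(classify_ntf(NtfResult("SN-104", True, False)))
```

A unit that passes at the bench but fails after the soak is the case of interest here, since it points toward a moisture-activated parasitic path rather than software.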
Electronic hardware is qualified to perform in the field and meet operational expectations, but when it doesn't, it becomes critical to understand whether variation in the process, rework, or the environment created the condition that caused the performance issue. On a daily basis, electronic systems perform as intended for the expected life of the hardware, but more cases are showing failures that lead to critical recalls. It is therefore important to understand what is right when things are going well and there are no field returns, so that when it hits the fan, data is present to understand what changed.
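As a rough illustration of having data on hand to see what changed, a baseline process record captured while builds are performing well can be compared with the record for a suspect build. The parameter names and values below are invented for the example; a real record would hold whatever the process actually tracks.

```python
def changed_parameters(baseline, current):
    """Return {name: (baseline_value, current_value)} for each parameter that differs."""
    names = sorted(set(baseline) | set(current))
    return {
        name: (baseline.get(name), current.get(name))
        for name in names
        if baseline.get(name) != current.get(name)
    }


# Hypothetical process records.
baseline = {"flux": "no-clean A", "wash_temp_c": 60, "rework": False}
suspect = {"flux": "no-clean A", "wash_temp_c": 50, "rework": True}

for name, (before, after) in changed_parameters(baseline, suspect).items():
    print(f"{name}: {before} -> {after}")
```

Parameters missing from one record show up with `None` on that side, so a step that was added or dropped is flagged as a change as well.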