Reliability physics has historically focused on time-to-failure models, but that approach is reaching its limits. Those models were generally developed from data gathered on very simple test structures that could be stressed to failure. Today, with electronics playing such a critical role in everyday life, failures are no longer an option. Modern ICs combine massive functionality, nanoscale manufacturing processes and advanced packaging, and are expected to operate continuously. Manufacturers must ramp to volume production quickly and efficiently while meeting strict requirements, especially in applications that demand high field reliability, such as automotive, datacenter and telecom. A data-centric paradigm shift is needed on a much broader scale.
So what will be the next big step in ensuring that sudden chip failures never occur? This vital question must be answered as mission-critical electronics grow in complexity and scale and take center stage in almost every aspect of our lives.
The key is not just to identify failures, but to predict them. This is all about the Physics of Failure: estimating the remaining time-to-failure and raising alerts in advance. The next paradigm shift in reliability assurance is performance-degradation monitoring and analysis as a precursor of failure.
Multiple physical mechanisms, such as hot-carrier injection (HCI), bias temperature instability (BTI), electromigration (EM) and stress migration (SM), exhibit continuous degradation well in advance of failure. Relatively small monitoring circuits, strategically placed at many locations on the chip, can warn of circuit degradation and alert the user to impending failure.
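Degradation from mechanisms such as BTI is often modeled as a power law in stress time, shift(t) = a·t^n. As a minimal sketch, assuming a monitored parameter shift follows that form, the remaining time-to-failure can be estimated by fitting a and n as a straight line in log-log space and extrapolating to a failure criterion. The function names, the units (hours, mV) and the critical-shift value are hypothetical illustrations, not the method from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(times: Sequence[float], shifts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of shift = a * t**n, done as a line in log-log space.

    Assumes every time and shift is strictly positive. Returns (a, n).
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in shifts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope in log-log space = time exponent
    a = math.exp(mean_y - n * mean_x)  # intercept gives the prefactor
    return a, n


def remaining_life(a: float, n: float, critical_shift: float, now: float) -> float:
    """Time left until a * t**n reaches critical_shift (a hypothetical failure criterion)."""
    t_fail = (critical_shift / a) ** (1.0 / n)
    return max(t_fail - now, 0.0)


# Synthetic readouts (hypothetical): shift in mV after t hours, generated from a=2.0, n=0.2.
hours = [1.0, 10.0, 100.0, 1000.0]
shift_mv = [2.0 * t ** 0.2 for t in hours]
a, n = fit_power_law(hours, shift_mv)
left = remaining_life(a, n, critical_shift=2.0 * 10_000 ** 0.2, now=1000.0)
print(f"a={a:.3f} n={n:.3f} remaining={left:.0f} h")
```

Fitting in log space turns the power law into ordinary linear regression; real monitor data would also need noise handling and a validated failure criterion.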
One such approach combines circuits embedded in the IC (proteanTecs calls these Agents) with off-chip machine learning algorithms that make inferences from the Agents' digital readouts throughout the chip's operational lifetime. The ICs' margin degradation, along with other vital parameters and the environmental stress they experience, is continuously monitored. This makes it possible to predict and prevent potential failures before they occur and to point to the underlying Physics of Failure, providing an estimate of the time-to-failure.
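Digital readouts from on-chip monitors are noisy, so some filtering is typically needed before a reading is treated as real degradation. As a minimal sketch, assuming each Agent reports a scalar margin value, an exponential moving average (EMA) can smooth the stream, and an alert fires only when the smoothed margin drops below a guard band. The class, function, alpha and guard-band values are hypothetical and are not proteanTecs' algorithm.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class MarginMonitor:
    """Hypothetical per-Agent filter: EMA smoothing plus a guard-band check."""
    alpha: float        # weight of the newest readout, 0 < alpha <= 1
    guard_band: float   # smoothed margin below which an alert is raised
    smoothed: Optional[float] = None

    def update(self, readout: float) -> bool:
        """Fold one readout into the EMA; return True if an alert should fire."""
        if self.smoothed is None:
            self.smoothed = readout  # seed the average with the first sample
        else:
            self.smoothed = self.alpha * readout + (1.0 - self.alpha) * self.smoothed
        return self.smoothed < self.guard_band


def first_alert(readouts: Iterable[float], alpha: float, guard_band: float) -> Optional[int]:
    """Index of the first readout that triggers an alert, or None if none does."""
    monitor = MarginMonitor(alpha, guard_band)
    for i, value in enumerate(readouts):
        if monitor.update(value):
            return i
    return None


# A single noisy dip is absorbed; a sustained drop in margin is flagged.
print(first_alert([10, 2, 10, 10], alpha=0.3, guard_band=7.0))      # → None
print(first_alert([10, 10, 10, 5, 5, 5], alpha=0.5, guard_band=7.0))  # → 4
```

A smaller alpha suppresses more noise but delays the alert, so in practice the weight would be tuned against how quickly a given degradation mechanism evolves.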
There are several types of Agents:
Throughout the lifetime of a product embedded with a suite of Agents, a software-based platform feeds their combined outputs into machine learning algorithms. Correlating readouts across the full population of a given product further enables highly reliable predictive maintenance in autonomous vehicles, hyperscale datacenters, medical instrumentation and other sectors where reliability is of prime importance.
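Correlating a unit's readouts with those of its population can expose units that degrade faster than their peers. As a minimal sketch, assuming each unit's Agent data has already been reduced to a single degradation figure, a robust outlier test such as the modified z-score (based on the median and the median absolute deviation, MAD) can flag such units. This is a generic statistical technique swapped in for illustration, not the machine learning used by the platform described here; the unit IDs and values are hypothetical.

```python
import statistics
from typing import Dict, List


def population_outliers(degradation: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return IDs of units whose degradation is anomalous relative to the population.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation; 3.5 is a commonly used cutoff.
    """
    values = list(degradation.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # MAD is zero when most units report identical values; the score is
        # undefined, so this sketch flags nothing in that case.
        return []
    return sorted(
        unit for unit, v in degradation.items()
        if abs(0.6745 * (v - med) / mad) > threshold
    )


# Hypothetical per-unit margin loss (%) reported for six units of the same product.
fleet = {"u1": 1.0, "u2": 1.1, "u3": 0.9, "u4": 1.05, "u5": 0.95, "u6": 3.0}
print(population_outliers(fleet))  # → ['u6']
```

The median and MAD are used instead of the mean and standard deviation so that a few badly degraded units cannot hide themselves by inflating the population's spread.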
The paper was originally presented at the 2019 IEEE International Reliability Physics Symposium (IRPS) and is available under DOI 10.1109/IRPS.2019.8720527.