In-Chip Monitoring for Safer Electronics

Safety-critical applications plus process complexity mean higher risk

If you’re involved in designing or manufacturing chips, you have a front-row seat for two simultaneous, independent trends that are creating big challenges both for you and for the folks building systems out of your ICs.

The first trend is the electrification of systems that may have fatal outcomes if they fail. Historically, this has been a consideration for systems that are clearly safety-critical, like automobiles and airplanes with their “drive-by-wire” and “fly-by-wire” initiatives. But, increasingly, electronics are showing up in all kinds of systems everywhere, and these devices will be widely interconnected. So even a failure within an innocuous-seeming system could lead, through a chain of interactions, to a serious negative outcome, like a service outage.

The second trend comes from the complexity of advanced silicon process nodes, making it much harder to hit aggressive performance, power, and cost targets. Process-parameter variation makes your job even harder.

Put these two trends together, and you have safety-critical systems containing semiconductor chips that have been built on the most advanced silicon process nodes. The increasing risk of failure and harm creates a need to monitor how each individual integrated circuit is operating. This will help with three critical phases of a chip’s life.

  • When a chip is manufactured and tested, precise measurements of the internal state of delicate circuits tell you exactly how the chip is performing. You might see this as a job for traditional chip testing, but such tests reveal only what’s available on die pads (at wafer sort) and package pins (at final test); the chip itself remains a “black box.” And many tests give you only a pass/fail result, providing no deeper clues about why the test failed. You can get much better insight by monitoring internal parameters. This can help to improve yields and eliminate devices that are defective or close to, but not quite, defective (“walking wounded”; the sketch after this list makes the idea concrete).
  • Once shipped, the chip is then integrated into a system, and that system is tested. Here again, the testing determines whether the system can be shipped, but it doesn’t give a more nuanced picture of how each IC is individually performing. In fact, even less test information is available than during chip test. For instance, if the current drawn by the power pins on one chip has changed as a result of some assembly step, there’s no way to isolate that current on a circuit board. You need to have a way of measuring the chip internally, independently of the rest of the system.
  • Finally, a system containing the chip is deployed into the field. Having passed IC and system tests, the system operates as expected at time zero. But degradation and environmental factors affect that performance, and those factors may differ from the testing environments. The system may encounter conditions or events outside the range the service providers and system designers anticipated. Or perhaps the external conditions are suitable, but the chip is marginal and fails in the field. There may also be “latent defects” that testing doesn’t detect; the right field circumstances can trigger an unexpected failure. If a chip fails for any of these reasons, internal monitoring gives direct evidence of what went wrong and why. The chip can also serve as an IoT-like sensor, warning of system issues external to the chip. Even if there is no outright failure, monitoring can give system service providers a red flag about faults in the system, allowing a controlled replacement before anything goes wrong. This is the promise of predictive maintenance.
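To make the “walking wounded” idea concrete, here is a minimal sketch of how marginal-but-passing parts might be flagged from an internal parametric measurement. The parameter, spec limit, and guardband below are hypothetical examples for illustration, not actual product values or proteanTecs’ method:

```python
# Illustrative sketch: screening "walking wounded" parts from an internal
# parametric measurement. All names, limits, and guardbands here are
# hypothetical examples.

SPEC_LIMIT_PS = 120.0   # hypothetical upper spec limit on a path delay, in ps
GUARDBAND_PS = 10.0     # hypothetical margin below the limit that we require

def classify_device(path_delay_ps: float) -> str:
    """Classify a device from one internal path-delay measurement."""
    if path_delay_ps > SPEC_LIMIT_PS:
        return "defective"          # outright test failure
    if path_delay_ps > SPEC_LIMIT_PS - GUARDBAND_PS:
        return "walking wounded"    # passes today, but with little margin
    return "healthy"

measurements = {"chip_001": 98.2, "chip_002": 114.7, "chip_003": 126.3}
for chip_id, delay in measurements.items():
    print(chip_id, classify_device(delay))
```

A pass/fail test would accept chip_002 without comment; the parametric view shows it sitting inside the guardband, which is exactly the insight the internal measurements add.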

Agents monitor in situ and machine learning infers in vivo

In order to make such monitoring possible, we need to start at chip design. This is when Agents are inserted into the design: low-footprint circuits embedded in the chip, with corresponding algorithms running on a server in the cloud or in a server farm. proteanTecs’ patented technology determines, through detailed analysis of process-parameter variability and design models, the type and in-chip location of these Agents.

Agents take direct parametric measurements and communicate them to a cloud-based analytics platform through existing communication channels on the testers or in the systems. Machine learning is applied to convert the raw measurements into intelligence that lets any number of stakeholders take action where and when needed. The aggregated data will reflect the entire lifecycle of the chip, from design and manufacturing through its life in the field.
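As a rough illustration of that measurement-to-intelligence step, the sketch below trains a generic anomaly detector on Agent-style parametric readings from known-good parts and scores new parts against that population. The data, feature set, and choice of model are assumptions made for illustration; proteanTecs’ actual analytics are proprietary and not described here:

```python
# Illustrative sketch of converting raw parametric readings into a signal.
# A generic anomaly detector stands in for the proprietary analytics;
# all data and feature names are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical Agent readings per chip: [path delay (ps), leakage (mA), Vmin (V)]
known_good = rng.normal(loc=[100.0, 5.0, 0.72], scale=[3.0, 0.4, 0.01],
                        size=(500, 3))

# Fit on a population of known-good parts, then score new parts.
model = IsolationForest(random_state=0).fit(known_good)

new_parts = np.array([
    [101.2, 5.1, 0.72],   # looks typical of the population
    [118.9, 7.8, 0.76],   # drifted on all three parameters
])
print(model.predict(new_parts))   # 1 = inlier, -1 = outlier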

Depending on where you are in the lifecycle, you can

  • implement precise fine binning within a family of products
  • assess internal margins and see how they vary with time and with operating parameters like temperature
  • look at actual measurements vs. expected values
  • identify outliers, and
  • monitor degradation over time (see the sketch after this list).
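As a simple illustration of the last item, the following sketch fits a linear trend to periodic in-field margin readings and extrapolates when the margin would cross a maintenance threshold. The data, units, and threshold are invented for illustration:

```python
# Illustrative sketch of degradation monitoring: fit a trend to periodic
# in-field margin readings and estimate when the margin will cross a
# threshold. Data, units, and the threshold are hypothetical examples.
import numpy as np

months = np.array([0, 6, 12, 18, 24], dtype=float)
margin_ps = np.array([20.0, 18.9, 17.7, 16.8, 15.5])  # timing margin, ps
ALERT_THRESHOLD_PS = 10.0   # hypothetical maintenance trigger

slope, intercept = np.polyfit(months, margin_ps, deg=1)
if slope < 0:
    months_to_alert = (ALERT_THRESHOLD_PS - intercept) / slope
    print(f"Degrading at {slope:.2f} ps/month; "
          f"threshold reached around month {months_to_alert:.0f}")
else:
    print("No degradation trend detected")
```

Extrapolating the trend is what turns a stream of raw readings into an actionable maintenance date, rather than a surprise failure.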

Each individual chip can be monitored from production through its useful life. Adverse events can be anticipated before they happen.

Chip manufacturers can feed measurements from early in the lifecycle forward, helping to drive decisions as to whether, for example, a specific marginal device should be used in a particular system or reserved for a less-demanding one. This can be an effective way of avoiding expensive system recalls.
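A toy example of such a feed-forward decision: route each device to an application class based on a margin measured at chip test. The classes and cutoffs below are hypothetical, not an actual binning policy:

```python
# Illustrative sketch of a feed-forward decision: using a margin measured
# at chip test to route each device to an appropriate application class.
# The classes and cutoffs are hypothetical examples.
def route_device(vmin_margin_mv: float) -> str:
    """Pick a deployment class from a hypothetical Vmin margin (mV)."""
    if vmin_margin_mv >= 50.0:
        return "safety-critical system"      # ample margin
    if vmin_margin_mv >= 20.0:
        return "less-demanding system"       # usable, but marginal
    return "do not ship"                     # insufficient margin

for chip_id, margin in [("chip_A", 63.0), ("chip_B", 31.5), ("chip_C", 12.0)]:
    print(chip_id, "->", route_device(margin))
```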

Manufacturing engineers can feed ongoing learning from any field failures back into the manufacturing line to speed root-cause identification and correction, improving the quality and reliability of subsequent chips. Process engineers can also use that data to improve the design models used during design verification. System integrators will be comforted not only by the increased manufacturing scrutiny, but also by the ongoing monitoring that provides a heads-up before anything goes wrong in the field.

Interested in learning more about our solutions?
