Chips today are under immense pressure. With wider process variation at the wafer and die levels in single-digit nodes, highly complex designs, and the effects of application and system integration, it’s no wonder the electronics value chain is becoming ever more reliant on expensive guard-bands. The ecosystem is not yet equipped to find all existing defects during test. So while quality escapes are accepted as a given, system maintenance in the field must rely on redundancy to ensure uncompromising uptime.
Why are so many defects seeping through?
Today’s Best-Known Methods (BKMs) are failing to find “difficult-to-detect” defects that, although they pass standard tests, will eventually lead to functional failure once deployed. Structural tests find many of the defects, but some very small defects go unnoticed throughout the whole testing process. These innocent-looking chips will result in random system failures, leading to costly returns and, more importantly, a damaged reputation for the manufacturers. This calls for advanced techniques such as outlier detection.
Part Average Testing is one of the most commonly used methods for outlier detection. It is based on the notion that outliers can be found using high coverage measurements, such as leakage current (iDDQ), on smaller populations, such as a wafer.
The iDDQ distribution of each wafer is plotted, and chips whose measured iDDQ falls outside the robust-sigma limits (±4 sigma) are disqualified. The inherent limitation of Part Average Testing is that in advanced process nodes the distribution inside a wafer is very wide and can be close to the whole process distribution, making this method ineffective at finding outliers.
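As a rough illustration of the wafer-level screen described above, here is a minimal Python sketch. It assumes one common definition of the robust statistics (median as the robust mean, interquartile range divided by 1.35 as the robust sigma); the article does not say which definition is used, and the function names and the iDDQ readings are made up for the example.

```python
import statistics

def pat_limits(iddq_values, k=4.0):
    """Return (lower, upper) limits as robust mean +/- k * robust sigma."""
    q1, _q2, q3 = statistics.quantiles(iddq_values, n=4)
    robust_mean = statistics.median(iddq_values)
    # Assumption: robust sigma = IQR / 1.35 (the IQR of a normal
    # distribution is roughly 1.35 standard deviations).
    robust_sigma = (q3 - q1) / 1.35
    return robust_mean - k * robust_sigma, robust_mean + k * robust_sigma

def pat_rejects(iddq_by_die, k=4.0):
    """Return the sorted ids of dies whose iDDQ falls outside the limits."""
    lower, upper = pat_limits(list(iddq_by_die.values()), k)
    return sorted(
        die for die, value in iddq_by_die.items()
        if not lower <= value <= upper
    )

# Made-up iDDQ readings for one small wafer (arbitrary units).
wafer = {
    "die_01": 9.6, "die_02": 9.8, "die_03": 9.9,
    "die_04": 10.0, "die_05": 10.0, "die_06": 10.1,
    "die_07": 10.2, "die_08": 10.4, "die_09": 25.0,
}
lower, upper = pat_limits(list(wafer.values()))
rejects = pat_rejects(wafer)
print(f"limits: {lower:.2f} .. {upper:.2f}, rejected: {rejects}")
```

The wider the spread of the wafer population, the wider these limits become, which is exactly why a single wafer-level window struggles in advanced nodes.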
As a result, test engineers often let chips through that, unbeknownst to them, should be labeled as rejects. This is one of many problems that are addressed by proteanTecs’ new Universal Chip Telemetry, or UCTᵀᴹ.
Defect detection in the single-digit era
Fortunately, Deep Data has reached a level of sophistication that allows it to be used to test chip parameters. Deep Data monitors the health and performance of chips from within, by means of chip telemetry. Machine learning is applied to millions of monitoring points for each of the hundreds of chips on a silicon wafer and can find relationships between them, at time zero and over time. For once there is a motivation to increase the number of parameters measured rather than to reduce them to the smallest acceptable set.
So how do we obtain input that serves as the basis of the analytics? By meticulously learning the design and process interaction, then embedding monitoring circuits into the ICs during design. These circuits are tiny, consuming a negligible part of the chip area, and even fitting into spots that might otherwise remain blank. Each of these tiny monitors, called Agentsᵀᴹ, carefully watches over a specific set of parameters. By using a large number of them, a great deal can be discovered about the inner workings of the chip. This includes parameters that might otherwise have eluded measurement because they are buried so deep within the chip, yet are crucial to understanding defect patterns and root cause.
It then becomes important to distill this massive amount of information into something that makes sense to the manufacturer. proteanTecs has developed the concept of “Families”.
Readouts from the Agents are used by Machine Learning algorithms to profile and classify the chips into high-resolution clusters, each with a 1σ (one standard deviation) distribution. Chips that fit into a specific Family are likely to perform consistently during all production stages, regardless of operating conditions. Those that start to stray from their Family’s distribution are a red flag, indicating quality issues that may arise later in the chip’s life. It’s almost like having many mini production lines, each of which is expected to be well behaved.
For chips in the same Family, measured parameters now travel together. These can include leakage current, dynamic power, delay, VDDmin and more. An understanding of these relationships is the key to unlocking much available, yet seemingly invisible, data.
What happens when a measured parameter does not correlate to its designated Family? Does that tell us anything? Indeed it does. Chips that don’t behave the way their respective Family does are an indication of a small defect and should be weeded out at test. These anomalous chips, or outliers, may happily pass today’s standard production testing but should be considered as “walking wounded” – chips that are quite likely to fail once they are put into continuous day-to-day operation.
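To make the idea of a chip that does not behave like its Family concrete, here is a small Python sketch. It is not proteanTecs’ algorithm: it assumes the Family labels have already been assigned by an upstream classifier (not shown), uses the Family median as the reference point, and flags any chip that deviates from it by more than one standard deviation of the whole population. The function name, the threshold rule, and the leakage readings are all assumptions made for illustration.

```python
import statistics
from collections import defaultdict

def family_outliers(chips, k=1.0):
    """Flag chips that stray from their Family's median by more than
    k standard deviations of the whole population.

    chips: list of (chip_id, family, value) tuples. The Family labels are
    assumed to come from an upstream classifier that is not shown here.
    """
    sigma_total = statistics.pstdev([value for _, _, value in chips])
    by_family = defaultdict(list)
    for _, family, value in chips:
        by_family[family].append(value)
    centers = {
        family: statistics.median(values)
        for family, values in by_family.items()
    }
    return sorted(
        chip_id
        for chip_id, family, value in chips
        if abs(value - centers[family]) > k * sigma_total
    )

# Made-up leakage readings (arbitrary units) for five hypothetical Families.
chips = [
    ("c01", "F1", 9.8), ("c02", "F1", 10.0), ("c03", "F1", 10.2),
    ("c04", "F2", 11.8), ("c05", "F2", 12.0), ("c06", "F2", 12.2),
    ("c07", "F2", 16.5),  # far from the rest of F2
    ("c08", "F3", 13.8), ("c09", "F3", 14.0), ("c10", "F3", 14.2),
    ("c11", "F4", 15.8), ("c12", "F4", 16.0), ("c13", "F4", 16.2),
    ("c14", "F5", 17.8), ("c15", "F5", 18.0), ("c16", "F5", 18.2),
]
flagged = family_outliers(chips)
highest = max(value for _, _, value in chips)
print(f"flagged: {flagged}, highest reading in population: {highest}")
```

Note that the flagged chip’s reading is lower than several healthy readings elsewhere in the population, so no single population-wide limit could single it out; only the comparison against its own Family does.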
Family-based outlier detection
The chart below gives an example of Deep Data outlier detection. In this case the measured leakage current is plotted against the Family classification. The leakage current measurement was not used to create the Families, but rather to compare and find chips that do not behave as expected.
Each dot below represents one or more chips. For the sake of clarity, the Families, which are listed on the horizontal axis, are sorted in this particular chart according to their mean leakage current so that they form a rising line. Note that all the gray dots in any single vertical column represent individual chips that are members of the same Family. All of the gray chips of the same Family have a leakage current that is within 1σ of the total distribution.
The red horizontal lines indicate the limits used by today’s best-known test methods. The dotted lines are the upper and lower Part Average Testing limits, which are based on the normal distribution of the parts, usually for a wafer-level population. The two solid lines, the upper and lower Spec Limits, represent the levels that the chip manufacturer guarantees in the chip specification.
All of the dots but one fit between the upper and lower Part Average Testing limits. All of these chips will pass standard functional and structural tests that use today’s best-known methods. Only the one that lies outside of the upper Part Average Testing limit (circled below in Family 18) will be disqualified and weeded out, even though it’s within specification. The rest of the chips will be considered good chips that passed all tests.
The orange dots are more than 1σ away from the mean; you can see such outliers in Family 11 and Family 18. The one in Family 18 (circled below) will pass the standard functional test, even though it has a small defect that caused its leakage to be higher than expected relative to its Family. Whatever has caused this one chip to deviate from Family 18 is likely to result in issues later on, and it should be disqualified. But standard functional and structural testing will let it pass, since its leakage current falls between the upper and lower limits and the defect’s impact on the structural test results will go undetected.
Since it is an outlier, this chip is trying to tell us that something is different, which might be a warning of a latent defect. Note, though, that not only is the leakage for this chip within the Part Average Testing limits, it is also lower than the leakage of the chip with the highest leakage at the far right-hand side of the chart, as can be seen by following the thin black line from the orange dot to the right side of the chart. Seemingly, this chip is well within the distribution of the full or wafer population. If we were to try to weed out the outlier in Family 18 by lowering the Part Average Testing limit, then we would necessarily reject the completely functional chips on the upper right.
The two outliers in Family 11 are even more challenging. They are circled below.
As with the outlier in Family 18, the upper Family 11 outlier has a leakage current that is slightly lower than a few of the perfectly healthy chips at the right side of the chart, as illustrated by the thin black lines in the chart below.
Lowering the test limit to capture the top outlier would also result in rejecting a few good chips in the upper right side of the gray dots. If the Part Average Testing limits were tightened to weed out both of these outliers, then almost 20% of the good chips would be disqualified. This is a completely unacceptable solution, but allowing the outliers to pass the test is also unacceptable.
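The trade-off described above is simple arithmetic, and a short Python sketch can make it explicit. The helper below is hypothetical: it lowers a single upper limit just far enough to reject every known outlier and then counts the good chips that are rejected along with them. The chip ids and leakage values are invented and are not taken from the chart.

```python
def collateral_rejects(leakage_by_chip, outlier_ids):
    """Lower one upper limit until it catches every listed outlier and
    report which good chips are rejected as a side effect.

    Returns (new_limit, lost_good_chips, fraction_of_good_chips_lost).
    """
    # A chip is rejected when its reading is at or above the new limit.
    new_limit = min(leakage_by_chip[chip] for chip in outlier_ids)
    good = {
        chip: value
        for chip, value in leakage_by_chip.items()
        if chip not in outlier_ids
    }
    lost = sorted(chip for chip, value in good.items() if value >= new_limit)
    return new_limit, lost, len(lost) / len(good)

# Invented leakage readings (arbitrary units); "b3" is the known outlier.
leakage = {
    "a1": 10.0, "a2": 10.2,
    "b1": 12.0, "b2": 12.1, "b3": 16.5,
    "c1": 14.0, "c2": 14.1, "c3": 14.2,
    "d1": 16.0, "d2": 16.2, "d3": 16.1,
    "e1": 17.9, "e2": 18.1,
}
new_limit, lost, fraction = collateral_rejects(leakage, {"b3"})
print(f"limit lowered to {new_limit}: lose {lost} ({fraction:.0%} of good chips)")
```

In this toy data the healthy chips with the highest natural leakage are the ones sacrificed, which mirrors the problem in the chart: a single limit cannot separate an outlier from good chips that legitimately sit higher in the distribution.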
So how do you find the outliers without compromising yield?
The answer is to move from today’s standard “wide distribution” methods to a more sophisticated form of outlier detection, in which chips are classified into Families and outliers are then searched for within each Family. The result is numerous product populations, each of which is unlikely to fail, while the risk of quality escapes is removed. Field tests have determined that defective parts per million (DPPM) can be slashed to one tenth of the existing level by shifting to Family-based outlier detection, all without sacrificing good chips as collateral damage.
Thanks to this approach, chip manufacturers know exactly what they are shipping, and that the number of field failures will be as low as guaranteed.