<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Blog</title>
    <link>https://www.proteantecs.com/blog</link>
    <description>Explore the latest trends and innovations in technology, focusing on optimizing power and performance, on-chip monitoring, semiconductor testing, and enhancing electronics and system reliability.</description>
    <language>en</language>
    <pubDate>Tue, 07 Apr 2026 09:02:37 GMT</pubDate>
    <dc:date>2026-04-07T09:02:37Z</dc:date>
    <dc:language>en</dc:language>
    <item>
      <title>Why Hardware Monitoring Needs Infrastructure, Not Just Sensors</title>
      <link>https://www.proteantecs.com/blog/why-hardware-monitoring-needs-infrastructure-not-just-sensors</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/why-hardware-monitoring-needs-infrastructure-not-just-sensors" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/blog%20Why%20Hardware%20Monitoring%20Needs%20Infrastructure%2c%20Not%20Just%20Sensors%20thumbnail.jpg" alt="Why Hardware Monitoring Needs Infrastructure, Not Just Sensors" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h5 style="text-align: center;"&gt;&lt;strong&gt;&lt;span style="line-height: 18.3458px;"&gt;proteanTecs Hardware Monitoring System from Agents &amp;amp; Sensors to Insights&lt;/span&gt;&lt;/strong&gt;&lt;span style="background-color: #606060; line-height: 18.3458px;"&gt; &lt;/span&gt;&lt;/h5&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/why-hardware-monitoring-needs-infrastructure-not-just-sensors" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/blog%20Why%20Hardware%20Monitoring%20Needs%20Infrastructure%2c%20Not%20Just%20Sensors%20thumbnail.jpg" alt="Why Hardware Monitoring Needs Infrastructure, Not Just Sensors" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h5 style="text-align: center;"&gt;&lt;strong&gt;&lt;span style="line-height: 18.3458px;"&gt;proteanTecs Hardware Monitoring System from Agents &amp;amp; Sensors to Insights&lt;/span&gt;&lt;/strong&gt;&lt;span style="background-color: #606060; line-height: 18.3458px;"&gt; &lt;/span&gt;&lt;/h5&gt;  
</content:encoded>
      <pubDate>Wed, 18 Mar 2026 09:20:16 GMT</pubDate>
      <guid>https://www.proteantecs.com/blog/why-hardware-monitoring-needs-infrastructure-not-just-sensors</guid>
      <dc:date>2026-03-18T09:20:16Z</dc:date>
      <dc:creator>Admin</dc:creator>
    </item>
    <item>
      <title>OCP Warns AI Is Compromised by SDC. proteanTecs In-Chip Monitoring Restores Trust | proteanTecs Blog</title>
      <link>https://www.proteantecs.com/blog/ocp-warns-ai-is-compromised-by-sdc.-proteantecs-in-chip-monitoring-restores-trust</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/ocp-warns-ai-is-compromised-by-sdc.-proteantecs-in-chip-monitoring-restores-trust" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/blog%20response%20to%20OCP%20by%20proteantecs.jpg" alt="OCP Warns AI Is Compromised by SDC. proteanTecs In-Chip Monitoring Restores Trust | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;Ensuring AI Reliability: Mitigating OCP's Silent Data Corruption Risks.&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;br&gt;Silent Data Corruption (SDC) is an industry challenge affecting data centers worldwide with increasing frequency. This phenomenon stems from untraceable hardware failures that make detection notoriously difficult. SDCs don’t leave any record in system logs or trigger exception mechanisms. The corrupted data they produce can propagate unnoticed, causing cascading failures that often demand extensive resources to root cause.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;A recent &lt;a href="https://www.opencompute.org/documents/sdc-in-ai-ocp-whitepaper-final-pdf" style="color: #697694;"&gt;Open Compute Project (OCP) whitepaper&lt;/a&gt;, authored by experts from NVIDIA, Google, Meta, Microsoft, and others, underscores the critical impact of SDC on large-scale AI/ML systems.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;OCP Says SDC Is on the Rise, Compromising AI Workload Integrity in Data Centers&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;SDC has emerged as a critical reliability threat to scaling AI training and inference, as it corrupts computations without triggering alerts. Unlike memory bit flips, which error correction codes (ECC) can mitigate, SDCs originate from subtle timing violations, aging effects, or marginal defects that escape standard semiconductor testing and data center monitoring.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The problem has grown worse with GenAI's explosive growth and increasingly complex chip architectures, leading the paper to regard SDC as a "needle in a haystack" challenge. New process nodes push semiconductor boundaries, while the unprecedented scale of intensive AI workloads stresses chipsets to their thermal and timing limits.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The OCP paper walks through multiple stress factors that increase SDC probability in AI hardware, with several significant ones outlined below:&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Shrinking Process Geometries Increase Fault Susceptibility&lt;/strong&gt;&lt;br&gt;Smaller transistors tighten device margins, increasing vulnerability to transient faults and permanent failures.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Aggressive Voltage and Frequency Scaling Reduces Timing Margins&lt;/strong&gt;&lt;br&gt;Dynamic scaling improves performance but narrows timing headroom, making small-delay defects more likely to escape detection.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Increased Current Draw and Power-Delivery Noise Raise Timing Issues&lt;/strong&gt;&lt;br&gt;Wider parallel execution and higher clock frequencies increase current draw and PDN noise, making systems more prone to timing-related faults.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Progressive Wear-Out Introduces Time-Dependent Failures&lt;/strong&gt;&lt;br&gt;Over time, defects such as electromigration or process marginalities can cause transistors or interconnects to fail occasionally. As a result, hardware that initially passes validation gradually degrades under intensive AI workloads until it fails.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Hardware Faults Remain Hidden Across Software Layers&lt;/strong&gt;&lt;br&gt;Errors introduced at the hardware level may surface only after several software transformations, making detection more difficult.&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Combined Stress Conditions Amplify Reliability Challenges&lt;/strong&gt;&lt;br&gt;The likelihood of SDC grows when several factors, such as voltage droop and high temperature, combine under intensive workloads, making silent errors hard to detect and reproduce. This factor is more significant in AI accelerators, which often operate near the limits of their power and thermal envelopes.&lt;/li&gt; 
 &lt;/ul&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;SDC Impact on AI Training and Inference&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The OCP whitepaper emphasizes that SDCs pose distinct challenges depending on the type of AI workload. During training, even a single undetected fault can waste months of valuable computational resources by silently corrupting the learning process. In inference deployments, SDCs directly undermine the reliability of AI services by producing incorrect outputs. The impact is especially severe in safety-critical applications such as autonomous vehicles and medical diagnostics.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Workload-specific SDC impacts:&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Training:&lt;span&gt; &lt;/span&gt;&lt;/strong&gt;Corrupted Gradients Create an Illusion of Progress&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;When SDC corrupts values without triggering Not-a-Number (NaN) errors, distributed training propagates this invalid data as legitimate results across multiple cluster accelerators. This contamination can lead to gradient explosion, implosion, or convergence at an incorrect local minimum. Such problems may take a very long time to detect while the training appears to be making forward progress.&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Inference:&lt;span&gt; &lt;/span&gt;&lt;/strong&gt;Persistent Defects Contaminate Thousands of Predictions&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Faulty hardware in inference clusters might generate corrupted outputs, potentially affecting thousands of users per hour. Debugging these errors can be highly challenging, as they can bypass detection mechanisms while compromising privacy and integrity policies. Moreover, this troubleshooting process can affect production capacity until the offending node is identified and quarantined.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Why Traditional Controls Miss SDC in AI Fleets&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Standard testing methodologies, whether executed in situ or via scheduled maintenance, exhibit notable deficiencies:&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;In-situ testing&lt;/strong&gt;, when relying on canary circuits, fails to account for the actual critical-path timing margins, which might decrease due to aging and process variations. This is a particularly pressing concern given the rising levels of on-chip variation within a device, a trend highlighted in the 2024 paper, "Manufacturing Roadmap for Heterogeneous Integration and Electronics Packaging."&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Periodic maintenance testing&lt;/strong&gt;&lt;span&gt; &lt;/span&gt;often lacks sufficient sensitivity, tending to detect only distinct failures while missing the more subtle issues related to SDC. Furthermore, this method lacks the real-life operational conditions that characterize in-situ monitoring, as the tested devices are temporarily removed from the active fleet.&lt;br&gt;‍&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;sub style="line-height: 0;"&gt;A canary circuit that monitors design margins is a critical path replicator, which cannot provide accurate data about actual critical path timing.&lt;/sub&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;Given the limited efficacy of current best-known methods, the OCP paper dedicates a whole section to multiple open research questions. It regards SDC as an unresolved challenge with a critical impact on AI systems, calling for novel approaches that capture the nuanced ways in which silent errors occur.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;proteanTecs’ In-Chip Monitoring Restores Trust With Real-Time SDC Prevention&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Conventional SDC prevention methods typically rely on periodic maintenance, which incurs costly overhead by testing all servers regardless of their health. However, even fleet operators who accept this expense remain exposed: they still face many SDC cases, often detected only after the faults have already impacted the production environment.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs takes a different approach, offering predictive maintenance instead of preventive maintenance. This novel technology can identify issues in real time and even correct them. The detected events are not actual faults yet, but they might accumulate to a low chip Health Index, which often precedes SDCs. proteanTecs uses dedicated thresholds to deduce when margins get dangerously low, as depicted below.&lt;/p&gt; 
 &lt;sub style="line-height: 0;"&gt;proteanTecs provides a real-time indication of a severe margin drop that might cause SDC in a 5nm data center chip (visualization of embedded firmware application).&lt;/sub&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;br&gt;Unlike canary circuits, proteanTecs uses on-chip Agents that monitor the timing margins of millions of real paths for more informed decisions. These Agents can provide very high coverage of the design’s logic and pinpoint the real critical paths that traditional methods often miss. This approach allows precise action based on real workloads, aging, and IR drops.&lt;/p&gt;  
 &lt;sub style="line-height: 0;"&gt;Unlike canary circuits (right, in yellow), proteanTecs uses on-chip Agents (left, in blue) that monitor true critical paths.&lt;/sub&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs provides the Health Index by processing on-chip Agent readings alongside other inputs using advanced real-time algorithms. A low index score might trigger an interrupt, allowing the Baseboard Management Controller (BMC) to decide whether to take corrective action given the current system status.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;In some configurations, the proteanTecs solutions take corrective action on their own without the BMC, offering prescriptive maintenance as well. Chips equipped with this technology can automatically adjust voltage or frequency to compensate for aging, adapt to workload demands, and help prevent SDC.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Ensuring AI Reliability: Chip Monitoring as the Answer to SDC&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;As AI systems continue to scale and process nodes shrink further, SDC will only become more prevalent. The OCP whitepaper makes clear that traditional approaches to mitigating SDC are insufficient for the RAS (reliability, availability, serviceability) demands of modern AI infrastructure.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs' runtime monitoring technology represents a fundamental shift in how the industry can address this challenge. By monitoring millions of real critical paths during actual workload execution, it transforms SDC from an invisible threat into a manageable risk.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The ability to detect margin degradation before it causes corruption protects months of training investment and prevents corrupted outputs from reaching inference customers. At AI's current scale and intensity, this capability is no longer optional.&lt;/p&gt; 
&lt;/div&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/ocp-warns-ai-is-compromised-by-sdc.-proteantecs-in-chip-monitoring-restores-trust" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/blog%20response%20to%20OCP%20by%20proteantecs.jpg" alt="OCP Warns AI Is Compromised by SDC. proteanTecs In-Chip Monitoring Restores Trust | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;Ensuring AI Reliability: Mitigating OCP's Silent Data Corruption Risks.&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;br&gt;Silent Data Corruption (SDC) is an industry challenge affecting data centers worldwide with increasing frequency. This phenomenon stems from untraceable hardware failures that make detection notoriously difficult. SDCs don’t leave any record in system logs or trigger exception mechanisms. The corrupted data they produce can propagate unnoticed, causing cascading failures that often demand extensive resources to root cause.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;A recent &lt;a href="https://www.opencompute.org/documents/sdc-in-ai-ocp-whitepaper-final-pdf" style="color: #697694;"&gt;Open Compute Project (OCP) whitepaper&lt;/a&gt;, authored by experts from NVIDIA, Google, Meta, Microsoft, and others, underscores the critical impact of SDC on large-scale AI/ML systems.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;OCP Says SDC Is on the Rise, Compromising AI Workload Integrity in Data Centers&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;SDC has emerged as a critical reliability threat to scaling AI training and inference, as it corrupts computations without triggering alerts. Unlike memory bit flips, which error correction codes (ECC) can mitigate, SDCs originate from subtle timing violations, aging effects, or marginal defects that escape standard semiconductor testing and data center monitoring.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The problem has grown worse with GenAI's explosive growth and increasingly complex chip architectures, leading the paper to regard SDC as a "needle in a haystack" challenge. New process nodes push semiconductor boundaries, while the unprecedented scale of intensive AI workloads stresses chipsets to their thermal and timing limits.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The OCP paper walks through multiple stress factors that increase SDC probability in AI hardware, with several significant ones outlined below:&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Shrinking Process Geometries Increase Fault Susceptibility&lt;/strong&gt;&lt;br&gt;Smaller transistors tighten device margins, increasing vulnerability to transient faults and permanent failures.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Aggressive Voltage and Frequency Scaling Reduces Timing Margins&lt;/strong&gt;&lt;br&gt;Dynamic scaling improves performance but narrows timing headroom, making small-delay defects more likely to escape detection.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Increased Current Draw and Power-Delivery Noise Raise Timing Issues&lt;/strong&gt;&lt;br&gt;Wider parallel execution and higher clock frequencies increase current draw and PDN noise, making systems more prone to timing-related faults.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Progressive Wear-Out Introduces Time-Dependent Failures&lt;/strong&gt;&lt;br&gt;Over time, defects such as electromigration or process marginalities can cause transistors or interconnects to fail occasionally. As a result, hardware that initially passes validation gradually degrades under intensive AI workloads until it fails.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Hardware Faults Remain Hidden Across Software Layers&lt;/strong&gt;&lt;br&gt;Errors introduced at the hardware level may surface only after several software transformations, making detection more difficult.&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Combined Stress Conditions Amplify Reliability Challenges&lt;/strong&gt;&lt;br&gt;The likelihood of SDC grows when several factors, such as voltage droop and high temperature, combine under intensive workloads, making silent errors hard to detect and reproduce. This factor is more significant in AI accelerators, which often operate near the limits of their power and thermal envelopes.&lt;/li&gt; 
 &lt;/ul&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;SDC Impact on AI Training and Inference&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The OCP whitepaper emphasizes that SDCs pose distinct challenges depending on the type of AI workload. During training, even a single undetected fault can waste months of valuable computational resources by silently corrupting the learning process. In inference deployments, SDCs directly undermine the reliability of AI services by producing incorrect outputs. The impact is especially severe in safety-critical applications such as autonomous vehicles and medical diagnostics.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Workload-specific SDC impacts:&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Training:&lt;span&gt; &lt;/span&gt;&lt;/strong&gt;Corrupted Gradients Create an Illusion of Progress&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;When SDC corrupts values without triggering Not-a-Number (NaN) errors, distributed training propagates this invalid data as legitimate results across multiple cluster accelerators. This contamination can lead to gradient explosion, implosion, or convergence at an incorrect local minimum. Such problems may take a very long time to detect while the training appears to be making forward progress.&lt;/p&gt; 
 &lt;br&gt; 
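 &lt;p style="color: #697694; line-height: 27px;"&gt;The training failure mode above involves values that stay finite, so a NaN check alone would not notice them. As a rough illustration only, and not something taken from the OCP paper or from proteanTecs, a training loop could also compare each step's gradient norm against recent history. The function name, the median-based rule, and the factor of 10 are all assumptions made for this sketch.&lt;/p&gt; 

```python
import math
from statistics import median


def is_suspicious_gradient(norm: float, recent_norms: list[float], factor: float = 10.0) -> bool:
    """Return True when a gradient norm looks implausible (hypothetical heuristic)."""
    # Non-finite values (NaN, inf) are the easy case that ordinary checks already catch.
    if not math.isfinite(norm):
        return True
    # With no history there is nothing to compare against.
    if not recent_norms:
        return False
    typical = median(recent_norms)
    # Flag finite values that explode or collapse relative to recent steps.
    return norm > factor * typical or typical > factor * norm


history = [1.0, 1.2, 0.9, 1.1]  # made-up norms from earlier steps
print(is_suspicious_gradient(1.05, history))   # False
print(is_suspicious_gradient(250.0, history))  # True
print(is_suspicious_gradient(0.001, history))  # True
```

 &lt;p style="color: #697694; line-height: 27px;"&gt;A software check like this can only flag gross outliers after the corruption has already occurred. Corruption that keeps the norm within a plausible range would still pass unnoticed.&lt;/p&gt; 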
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Inference:&lt;span&gt; &lt;/span&gt;&lt;/strong&gt;Persistent Defects Contaminate Thousands of Predictions&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Faulty hardware in inference clusters might generate corrupted outputs, potentially affecting thousands of users per hour. Debugging these errors can be highly challenging, as they can bypass detection mechanisms while compromising privacy and integrity policies. Moreover, this troubleshooting process can affect production capacity until the offending node is identified and quarantined.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Why Traditional Controls Miss SDC in AI Fleets&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Standard testing methodologies, whether executed in situ or via scheduled maintenance, exhibit notable deficiencies:&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;In-situ testing&lt;/strong&gt;, when relying on canary circuits, fails to account for the actual critical-path timing margins, which might decrease due to aging and process variations. This is a particularly pressing concern given the rising levels of on-chip variation within a device, a trend highlighted in the 2024 paper, "Manufacturing Roadmap for Heterogeneous Integration and Electronics Packaging."&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Periodic maintenance testing&lt;/strong&gt;&lt;span&gt; &lt;/span&gt;often lacks sufficient sensitivity, tending to detect only distinct failures while missing the more subtle issues related to SDC. Furthermore, this method lacks the real-life operational conditions that characterize in-situ monitoring, as the tested devices are temporarily removed from the active fleet.&lt;br&gt;‍&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;sub style="line-height: 0;"&gt;A canary circuit that monitors design margins is a critical path replicator, which cannot provide accurate data about actual critical path timing.&lt;/sub&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;Given the limited efficacy of current best-known methods, the OCP paper dedicates a whole section to multiple open research questions. It regards SDC as an unresolved challenge with a critical impact on AI systems, calling for novel approaches that capture the nuanced ways in which silent errors occur.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;proteanTecs’ In-Chip Monitoring Restores Trust With Real-Time SDC Prevention&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Conventional SDC prevention methods typically rely on periodic maintenance, which incurs costly overhead by testing all servers regardless of their health. However, even fleet operators who accept this expense remain exposed: they still face many SDC cases, often detected only after the faults have already impacted the production environment.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs takes a different approach, offering predictive maintenance instead of preventive maintenance. This novel technology can identify issues in real time and even correct them. The detected events are not actual faults yet, but they might accumulate to a low chip Health Index, which often precedes SDCs. proteanTecs uses dedicated thresholds to deduce when margins get dangerously low, as depicted below.&lt;/p&gt; 
 &lt;sub style="line-height: 0;"&gt;proteanTecs provides a real-time indication of a severe margin drop that might cause SDC in a 5nm data center chip (visualization of embedded firmware application).&lt;/sub&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;br&gt;Unlike canary circuits, proteanTecs uses on-chip Agents that monitor the timing margins of millions of real paths for more informed decisions. These Agents can provide very high coverage of the design’s logic and pinpoint the real critical paths that traditional methods often miss. This approach allows precise action based on real workloads, aging, and IR drops.&lt;/p&gt;  
 &lt;sub style="line-height: 0;"&gt;Unlike canary circuits (right, in yellow), proteanTecs uses on-chip Agents (left, in blue) that monitor true critical paths.&lt;/sub&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs provides the Health Index by processing on-chip Agent readings alongside other inputs using advanced real-time algorithms. A low index score might trigger an interrupt, allowing the Baseboard Management Controller (BMC) to decide whether to take corrective action given the current system status.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;In some configurations, the proteanTecs solutions take corrective action on their own without the BMC, offering prescriptive maintenance as well. Chips equipped with this technology can automatically adjust voltage or frequency to compensate for aging, adapt to workload demands, and help prevent SDC.&lt;/p&gt; 
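 &lt;p style="color: #697694; line-height: 27px;"&gt;The two response paths described above, an interrupt that lets the BMC decide versus on-chip corrective action, can be pictured as a simple threshold policy. The sketch below is purely illustrative: the 0-to-100 score scale, the threshold of 50, the &lt;code&gt;autonomous&lt;/code&gt; flag, and every name in it are assumptions, not a proteanTecs interface.&lt;/p&gt; 

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    INTERRUPT_BMC = "interrupt_bmc"
    SELF_ADJUST = "self_adjust"


def decide_action(health_index: float, threshold: float = 50.0, autonomous: bool = False) -> Action:
    """Map a health score to a response (scale and threshold are assumptions)."""
    # A score at or above the threshold means margins look healthy.
    if health_index >= threshold:
        return Action.NONE
    # Below the threshold: either act locally or raise an interrupt for the BMC.
    return Action.SELF_ADJUST if autonomous else Action.INTERRUPT_BMC


print(decide_action(82.0))                   # Action.NONE
print(decide_action(35.0))                   # Action.INTERRUPT_BMC
print(decide_action(35.0, autonomous=True))  # Action.SELF_ADJUST
```

 &lt;p style="color: #697694; line-height: 27px;"&gt;In this sketch the interrupt path leaves the final decision to the BMC, while the autonomous path stands in for a configuration where the chip adjusts voltage or frequency itself.&lt;/p&gt; 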
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Ensuring AI Reliability: Chip Monitoring as the Answer to SDC&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;As AI systems continue to scale and process nodes shrink further, SDC will only become more prevalent. The OCP whitepaper makes clear that traditional approaches to mitigating SDC are insufficient for the RAS (reliability, availability, serviceability) demands of modern AI infrastructure.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs' runtime monitoring technology represents a fundamental shift in how the industry can address this challenge. By monitoring millions of real critical paths during actual workload execution, it transforms SDC from an invisible threat into a manageable risk.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The ability to detect margin degradation before it causes corruption protects months of training investment and prevents corrupted outputs from reaching inference customers. At AI's current scale and intensity, this capability is no longer optional.&lt;/p&gt; 
&lt;/div&gt;  
&lt;img src="https://track.hubspot.com/__ptq.gif?a=7884687&amp;amp;k=14&amp;amp;r=https%3A%2F%2Fwww.proteantecs.com%2Fblog%2Focp-warns-ai-is-compromised-by-sdc.-proteantecs-in-chip-monitoring-restores-trust&amp;amp;bu=https%253A%252F%252Fwww.proteantecs.com%252Fblog&amp;amp;bvt=rss" alt="" width="1" height="1" style="min-height:1px!important;width:1px!important;border-width:0!important;margin-top:0!important;margin-bottom:0!important;margin-right:0!important;margin-left:0!important;padding-top:0!important;padding-bottom:0!important;padding-right:0!important;padding-left:0!important; "&gt;</content:encoded>
      <pubDate>Tue, 24 Feb 2026 16:00:00 GMT</pubDate>
      <guid>https://www.proteantecs.com/blog/ocp-warns-ai-is-compromised-by-sdc.-proteantecs-in-chip-monitoring-restores-trust</guid>
      <dc:date>2026-02-24T16:00:00Z</dc:date>
      <dc:creator>Admin</dc:creator>
    </item>
    <item>
      <title>Resilient and Optimized GenAI Systems with proteanTecs and Arm’s Neoverse CSS | proteanTecs Blog</title>
      <link>https://www.proteantecs.com/blog/resilient-and-optimized-genai-systems-with-proteantecs-and-arms-neoverse-css</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/resilient-and-optimized-genai-systems-with-proteantecs-and-arms-neoverse-css" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/2%20-%20Resilient%20and%20Optimized%20GenAI.png" alt="Resilient and Optimized GenAI Systems with proteanTecs and Arm’s Neoverse CSS | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;Next-gen AI demands real-time insight. Discover proteanTecs and Arm integration.&lt;br&gt;&lt;br&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;AI and datacenter systems are being pushed to their limits, with soaring complexity, nonstop inference workloads, and rising energy demands. Addressing these pressures requires more than incremental improvements; it calls for collaboration across the ecosystem. That’s why proteanTecs has joined forces with Arm, bringing our real-time monitoring technology into Arm’s Neoverse Compute Subsystems (CSS). The successful integration yields a customer-ready solution designed to improve power efficiency, performance, and reliability at scale.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Challenges Facing Next-Gen AI Infrastructure&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The cloud AI landscape is at an inflection point. Explosive growth in model complexity, inference demand, and system scale has strained the very fabric of compute infrastructure. Training runs that once required thousands of GPUs now demand tens of thousands, with costs reaching hundreds of millions of dollars. Inference, once considered “easier,” now drives massive daily workloads that &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-optimizing-system-production-with-on-chip-telemetry-and-ml-driven-analytics" style="color: #697694;"&gt;push energy budgets and hardware reliability to the brink&lt;/a&gt;.&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Power efficiency&lt;/strong&gt;: AI data centers will consume over 90 TWh annually by 2026. Excessive voltage guard bands, designed for worst-case scenarios, drive unnecessary energy waste.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Performance at scale&lt;/strong&gt;: Even small throughput inefficiencies cascade at hyperscale. A 10% gain in throughput can reduce training times by weeks and save millions in infrastructure costs.&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Reliability and resilience&lt;/strong&gt;: Silent Data Corruption (SDC) is an invisible risk. A single undetected error can corrupt weights across thousands of GPUs, invalidating billion-dollar training runs.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;For hyperscalers, the stakes are clear: every watt saved, every percentage point of performance reclaimed, and every silent error prevented translates into millions of dollars and competitive advantage.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Meeting these challenges requires more than node upgrades or incremental optimizations. It demands in-situ visibility into how chips behave under real workloads and operating conditions, and the ability to act on that knowledge in real time.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt;&lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;br&gt;&lt;em&gt;Growth in transistor density versus the PFLOPS required to train AI models from a 2021 baseline. By 2024, AI compute requirements surged by 6847%, while transistor density grew by only 183%. 2025 value is based on the projected PFLOPS required to train GPT-5. Source: Mollick, E. (2024). Scaling: The state of play in AI. One Useful Thing.&lt;/em&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Deep Data Needed to Face these Challenges&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Current methods for optimizing performance, power, and reliability all share the same blind spot: they don’t see how chips behave under actual workloads in the field. GenAI cloud operators pay for this lack of real-time visibility through higher power draw, lower throughput, and increased risk of failure. Performance tuning relies on static margins. Power controls are triggered by basic telemetry. Reliability checks happen too late, after failure is already underway. None of these approaches adapts to actual stress and environmental conditions during live operation.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;That’s the gap.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs closes this gap by providing deep data monitoring solutions that give system designers and operators unprecedented visibility into chip health and performance throughout the lifecycle.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The technology delivers a complete monitoring solution spanning silicon to system. At the hardware level, an on-chip HW IP Monitoring System combines lightweight Agents with built-in infrastructure for seamless access, control, and integration, enabling deep visibility from within the silicon. Complementing this are advanced EDA-based integration and implementation tools that ensure high coverage and smooth deployment with no design impact. On top of the hardware, a suite of machine learning–driven software applications runs in the field and in real time, providing predictive monitoring.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;By embedding Agents within the silicon, we enable performance improvements, power reduction, and diagnostics throughout the device’s mission.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The on-chip Agents provide parametric measurements in situ and in functional mode to detect timing issues, operational and environmental effects, aging, and application stress. Among the suite of Agents are the Margin Agents, which monitor the timing margins of millions of real paths for more informed decisions. Margin Agents provide very high coverage of the design’s logic and monitor the real performance-limiting paths that traditional methods often miss. Coverage of these paths (the ones that set the minimum voltage or maximum frequency) is ensured for all devices in the process distribution and for all operating conditions and functional workloads.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt;&lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;em&gt;Unlike canary circuits (right, in yellow), proteanTecs uses &lt;/em&gt;&lt;em&gt;on-chip Margin Agents (left, in blue) that monitor true critical paths.&lt;/em&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;proteanTecs and Arm CSS: Customer-Ready Integration&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Now, in collaboration with Arm, we’re bringing these capabilities directly into the heart of next-generation datacenter and AI infrastructure. &lt;a href="https://www.proteantecs.com/pressroom/proteantecs-joins-arm-total-design-brings-lifecycle-health-and-performance-monitoring-to-arm-based-custom-socs" style="color: #697694;"&gt;As part of Arm Total Design&lt;/a&gt;, proteanTecs has successfully integrated its monitoring solutions into Arm’s Neoverse Compute Subsystems (CSS). Our Agent integration &lt;strong&gt;is validated and optimized for Neoverse CSS&lt;/strong&gt;, enabling mutual customers to benefit from seamless integration into their custom SoCs.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;This milestone means:&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;Customer-ready integration: proteanTecs monitoring solutions are now natively available within Neoverse CSS-based custom SoCs.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;Preferential access: As a member of Arm Total Design, proteanTecs gains early access to Neoverse CSS, enabling deep integration and joint validation.&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;Faster time-to-market: Mutual customers benefit from seamless adoption - cutting integration effort, validation cycles, and deployment risk.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The result: system designers can bring powerful AI/datacenter SoCs to market faster, with embedded visibility, power/performance optimization, and reliability monitoring built-in.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Demonstrating Coverage, Efficiency, and Seamless Integration&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The integration of proteanTecs monitoring solutions into Arm’s Neoverse CSS has now been validated in practice, and the results underscore the value of a customer-ready reference design.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;In this implementation, on an advanced process node, 200 Margin Agents (MAs) were integrated into one of the most advanced Arm Neoverse CPU cores. proteanTecs’ proprietary algorithms, part of the proteanTecs EDA tools, decide which endpoints each Margin Agent should monitor. This ensures that the true performance-limiting paths are monitored.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;This strategic monitoring achieved a coverage result of 96.63% (based on proteanTecs’ proprietary coverage metrics), a level of visibility that allows customers to make confident, data-driven decisions. For more information about proteanTecs’ coverage methodology, customers are encouraged &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/contact" style="color: #697694;"&gt;to reach out to our support team&lt;/a&gt;.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt;&lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Equally important, the addition of monitoring capability had virtually no effect on the design itself. Timing and power measurements remained stable and well within normal run-to-run variation, confirming that the integration does not compromise efficiency. Max timing and power results are shown in the table below.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt;&lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;No manual timing fixes were applied, so the results reflect the unmodified output of the synthesis and place-and-route tools, ensuring transparency and reliability in the process.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Taken together, these findings provide customers with a reference implementation that demonstrates how proteanTecs can be embedded seamlessly into high-speed designs at advanced process nodes, without introducing overhead or risk.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs’ solution is an open architecture and can work under partner monitoring frameworks. Among the supported frameworks is the &lt;a href="https://developer.arm.com/community/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/system-monitoring-control-framework-arm-neoverse-css" style="color: #697694;"&gt;Arm System Monitoring Control Framework (SMCF)&lt;/a&gt;, which enhances monitoring for Arm CSS solutions. You can learn more about proteanTecs’ integration with SMCF &lt;a href="https://www.proteantecs.com/blog/expanding-the-horizon-of-system-monitoring-with-the-arm-smcf" style="color: #697694;"&gt;here&lt;/a&gt;.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Unlocking Efficiency, Performance, and Reliability&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs’ suite of applications, now enabled for Neoverse CSS, ensures datacenter operators can optimize at runtime:&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;AVS Pro™&lt;/strong&gt;: Workload- and reliability-aware, real-time power reduction - delivering up to 14% lower power with no performance loss, while extending the device’s remaining useful life (RUL) by ~20%. To learn more, read the &lt;a href="https://hubs.la/Q03JqyC40" style="color: #697694;"&gt;white paper&lt;/a&gt;.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;AFS Pro™&lt;/strong&gt;: Workload- and reliability-aware, real-time frequency increase - capturing frequency headroom for up to a 10% performance boost.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;RTHM™&lt;/strong&gt;: Monitors health in real time, flagging risks before they cascade into SDC or system failures. &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-redefining-ras-in-datacenters-with-real-time-health-monitoring-white-paper_extending-chip-lifetime-with-safer-voltage-scaling" style="color: #697694;"&gt;Read more here&lt;/a&gt;.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt;&lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;By embedding these capabilities into Neoverse CSS-based SoCs, mutual customers gain a powerful edge: the ability to scale AI infrastructure with greater power efficiency, performance, and reliability.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Conclusion: Real-Time Monitoring for Scalable GenAI Chips&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;As GenAI chips reach unprecedented levels of complexity, chipmakers need visibility into how each chip truly behaves under live workloads.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs delivers exactly that, with a new class of in-chip monitoring and applications that dynamically tune each device in real time for optimal efficiency, performance, and RAS. Now, through successful integration with Arm’s Neoverse Compute Subsystems (CSS) as part of Arm Total Design, proteanTecs’ real-time monitoring solutions are validated, optimized, and customer-ready. This seamless integration enables mutual customers to accelerate time-to-market while benefiting from power reduction, performance improvement, and built-in resilience at hyperscale.&lt;/p&gt; 
&lt;/div&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/resilient-and-optimized-genai-systems-with-proteantecs-and-arms-neoverse-css" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/2%20-%20Resilient%20and%20Optimized%20GenAI.png" alt="Resilient and Optimized GenAI Systems with proteanTecs and Arm’s Neoverse CSS | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;Next-gen AI demands real-time insight. Discover proteanTecs and Arm integration.&lt;br&gt;&lt;br&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;AI and datacenter systems are being pushed to their limits, with soaring complexity, nonstop inference workloads, and rising energy demands. Addressing these pressures requires more than incremental improvements; it calls for collaboration across the ecosystem. That’s why proteanTecs has joined forces with Arm, bringing our real-time monitoring technology into Arm’s Neoverse Compute Subsystems (CSS). The successful integration yields a customer-ready solution designed to improve power efficiency, performance, and reliability at scale.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Challenges Facing Next-Gen AI Infrastructure&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The cloud AI landscape is at an inflection point. Explosive growth in model complexity, inference demand, and system scale has strained the very fabric of compute infrastructure. Training runs that once required thousands of GPUs now demand tens of thousands, with costs reaching hundreds of millions of dollars. Inference, once considered “easier,” now drives massive daily workloads that &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-optimizing-system-production-with-on-chip-telemetry-and-ml-driven-analytics" style="color: #697694;"&gt;push energy budgets and hardware reliability to the brink&lt;/a&gt;.&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Power efficiency&lt;/strong&gt;: AI data centers will consume over 90 TWh annually by 2026. Excessive voltage guard bands, designed for worst-case scenarios, drive unnecessary energy waste.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Performance at scale&lt;/strong&gt;: Even small throughput inefficiencies cascade at hyperscale. A 10% gain in throughput can reduce training times by weeks and save millions in infrastructure costs.&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;strong&gt;Reliability and resilience&lt;/strong&gt;: Silent Data Corruption (SDC) is an invisible risk. A single undetected error can corrupt weights across thousands of GPUs, invalidating billion-dollar training runs.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;For hyperscalers, the stakes are clear: every watt saved, every percentage point of performance reclaimed, and every silent error prevented translates into millions of dollars and competitive advantage.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Meeting these challenges requires more than node upgrades or incremental optimizations. It demands in-situ visibility into how chips behave under real workloads and operating conditions, and the ability to act on that knowledge in real time.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt;&lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;br&gt;&lt;em&gt;Growth in transistor density versus the PFLOPS required to train AI models from a 2021 baseline. By 2024, AI compute requirements surged by 6847%, while transistor density grew by only 183%. 2025 value is based on the projected PFLOPS required to train GPT-5. Source: Mollick, E. (2024). Scaling: The state of play in AI. One Useful Thing.&lt;/em&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Deep Data Needed to Face these Challenges&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Current methods for optimizing performance, power, and reliability all share the same blind spot: they don’t see how chips behave under actual workloads in the field. GenAI cloud operators pay for this lack of real-time visibility through higher power draw, lower throughput, and increased risk of failure. Performance tuning relies on static margins. Power controls are triggered by basic telemetry. Reliability checks happen too late, after failure is already underway. None of these approaches adapts to actual stress and environmental conditions during live operation.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;That’s the gap.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs closes this gap by providing deep data monitoring solutions that give system designers and operators unprecedented visibility into chip health and performance throughout the lifecycle.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The technology delivers a complete monitoring solution spanning silicon to system. At the hardware level, an on-chip HW IP Monitoring System combines lightweight Agents with built-in infrastructure for seamless access, control, and integration, enabling deep visibility from within the silicon. Complementing this are advanced EDA-based integration and implementation tools that ensure high coverage and smooth deployment with no design impact. On top of the hardware, a suite of machine learning–driven software applications runs in the field and in real time, providing predictive monitoring.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;By embedding Agents within the silicon, we enable performance improvements, power reduction, and diagnostics throughout the device’s mission.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The on-chip Agents provide parametric measurements in situ and in functional mode to detect timing issues, operational and environmental effects, aging, and application stress. Among the suite of Agents are the Margin Agents, which monitor the timing margins of millions of real paths for more informed decisions. Margin Agents provide very high coverage of the design’s logic and monitor the real performance-limiting paths that traditional methods often miss. Coverage of these paths (the ones that set the minimum voltage or maximum frequency) is ensured for all devices in the process distribution and for all operating conditions and functional workloads.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt;&lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;em&gt;Unlike canary circuits (right, in yellow), proteanTecs uses &lt;/em&gt;&lt;em&gt;on-chip Margin Agents (left, in blue) that monitor true critical paths.&lt;/em&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;proteanTecs and Arm CSS: Customer-Ready Integration&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Now, in collaboration with Arm, we’re bringing these capabilities directly into the heart of next-generation datacenter and AI infrastructure. &lt;a href="https://www.proteantecs.com/pressroom/proteantecs-joins-arm-total-design-brings-lifecycle-health-and-performance-monitoring-to-arm-based-custom-socs" style="color: #697694;"&gt;As part of Arm Total Design&lt;/a&gt;, proteanTecs has successfully integrated its monitoring solutions into Arm’s Neoverse Compute Subsystems (CSS). Our Agent integration &lt;strong&gt;is validated and optimized for Neoverse CSS&lt;/strong&gt;, enabling mutual customers to benefit from seamless integration into their custom SoCs.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;This milestone means:&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;Customer-ready integration: proteanTecs monitoring solutions are now natively available within Neoverse CSS-based custom SoCs.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;Preferential access: As a member of Arm Total Design, proteanTecs gains early access to Neoverse CSS, enabling deep integration and joint validation.&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;Faster time-to-market: Mutual customers benefit from seamless adoption - cutting integration effort, validation cycles, and deployment risk.&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The result: system designers can bring powerful AI/datacenter SoCs to market faster, with embedded visibility, power/performance optimization, and reliability monitoring built-in.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Demonstrating Coverage, Efficiency, and Seamless Integration&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;The integration of proteanTecs monitoring solutions into Arm’s Neoverse CSS has now been validated in practice, and the results underscore the value of a customer-ready reference design.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;In this implementation, on an advanced process node, 200 Margin Agents (MAs) were integrated into one of the most advanced Arm Neoverse CPU cores. proteanTecs’ proprietary algorithms, part of the proteanTecs EDA tools, decide which endpoints each Margin Agent should monitor. This ensures that the true performance-limiting paths are monitored.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;This strategic monitoring achieved a coverage result of 96.63% (based on proteanTecs’ proprietary coverage metrics), a level of visibility that allows customers to make confident, data-driven decisions. For more information about proteanTecs’ coverage methodology, customers are encouraged &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/contact" style="color: #697694;"&gt;to reach out to our support team&lt;/a&gt;.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt;&lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Equally important, the addition of monitoring capability had virtually no effect on the design itself. Timing and power measurements remained stable and well within normal run-to-run variation, confirming that the integration does not compromise efficiency. Max timing and power results are shown in the table below.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt;&lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;No manual timing fixes were applied, so the results reflect the unmodified output of the synthesis and place-and-route tools, ensuring transparency and reliability in the process.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;Taken together, these findings provide customers with a reference implementation that demonstrates how proteanTecs can be embedded seamlessly into high-speed designs at advanced process nodes, without introducing overhead or risk.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs’ solution is an open architecture and can work under partner monitoring frameworks. Among the supported frameworks is the &lt;a href="https://developer.arm.com/community/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/system-monitoring-control-framework-arm-neoverse-css" style="color: #697694;"&gt;Arm System Monitoring Control Framework (SMCF)&lt;/a&gt;, which enhances monitoring for Arm CSS solutions. You can learn more about proteanTecs’ integration with SMCF &lt;a href="https://www.proteantecs.com/blog/expanding-the-horizon-of-system-monitoring-with-the-arm-smcf" style="color: #697694;"&gt;here&lt;/a&gt;.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Unlocking Efficiency, Performance, and Reliability&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs’ suite of applications, now enabled for Neoverse CSS, ensures datacenter operators can optimize at runtime:&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;AVS Pro™&lt;/strong&gt;: Workload- and reliability-aware, real-time power reduction - delivering up to 14% lower power with no performance loss, while extending the device’s remaining useful life (RUL) by ~20%. To learn more, read the &lt;a href="https://hubs.la/Q03JqyC40" style="color: #697694;"&gt;white paper&lt;/a&gt;.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;AFS Pro™&lt;/strong&gt;: Workload- and reliability-aware, real-time frequency increase - capturing frequency headroom for up to a 10% performance boost.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;RTHM™&lt;/strong&gt;: Monitors health in real time, flagging risks before they cascade into SDC or system failures. &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-redefining-ras-in-datacenters-with-real-time-health-monitoring-white-paper_extending-chip-lifetime-with-safer-voltage-scaling" style="color: #697694;"&gt;Read more here&lt;/a&gt;.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt;&lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;By embedding these capabilities into Neoverse CSS-based SoCs, mutual customers gain a powerful edge: the ability to scale AI infrastructure with greater power efficiency, performance, and reliability.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;strong&gt;Conclusion: Real-Time Monitoring for Scalable GenAI Chips&lt;/strong&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;As GenAI chips reach unprecedented levels of complexity, chipmakers need visibility into how each chip truly behaves under live workloads.&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;proteanTecs delivers exactly that, with a new class of in-chip monitoring and applications that dynamically tune each device in real time for optimal efficiency, performance, and RAS. Now, through successful integration with Arm’s Neoverse Compute Subsystems (CSS) as part of Arm Total Design, proteanTecs’ real-time monitoring solutions are validated, optimized, and customer-ready. This seamless integration enables mutual customers to accelerate time-to-market while benefiting from power reduction, performance improvement, and built-in resilience at hyperscale.&lt;/p&gt; 
&lt;/div&gt;  
</content:encoded>
      <pubDate>Mon, 03 Nov 2025 16:00:00 GMT</pubDate>
      <guid>https://www.proteantecs.com/blog/resilient-and-optimized-genai-systems-with-proteantecs-and-arms-neoverse-css</guid>
      <dc:date>2025-11-03T16:00:00Z</dc:date>
      <dc:creator>Admin</dc:creator>
    </item>
    <item>
      <title>Same Chip, Two Destinies: How Power Profiles Improve With On-Chip Monitoring | proteanTecs Blog</title>
      <link>https://www.proteantecs.com/blog/same-chip-two-destinies-how-power-profiles-improve-with-on-chip-monitoring</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/same-chip-two-destinies-how-power-profiles-improve-with-on-chip-monitoring" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/3%20-%20Same%20Chip%2c%20Two%20Destinies.png" alt="Same Chip, Two Destinies: How Power Profiles Improve With On-Chip Monitoring | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;&lt;span style="color: #474c68;"&gt;The Impact of On-Chip Telemetry on Peak Power, Average Power, and Di/Dt Noise&lt;/span&gt;&lt;br&gt;&lt;br&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;What happens to critical power-related considerations when the same chip is handled two different ways, with or without visibility from within?&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This article begins by examining how the absence of on-chip monitoring impacts peak power, average power, and Di/Dt noise (rate of current change), as illustrated in the diagram below and the subsequent discussion. It then details how these aspects change when in-chip telemetry is available.&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Fig. 1: As the power profile shifts with different modes and switching activity, high Di/Dt noise, peak power, and average power introduce thermal, cost, and reliability penalties.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;On-Chip Telemetry OFF: Excessive Peak Power&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To improve power and performance specs while reducing chip operational costs, engineers must determine the lowest reliable voltage, known as VDDmin, at a certain frequency of operation, which varies significantly between dies due to the process distribution.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Without on-chip telemetry, chipmakers typically detect VDDmin using VDD search testing, which lowers the voltage step by step until chip failure occurs to identify the last functional VDD. However, this method presents a difficult tradeoff:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Smaller voltage steps improve accuracy but increase test time.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Larger voltage steps are quicker but might overshoot the optimal point.&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;br&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Fig. 2: Voltage search plots. Determining an accurate VDDmin using this method often requires an impractically long time and high test cost, leading to painful compromises.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
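 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The step-size tradeoff can be illustrated with a small simulation. The Python sketch below is not a production test flow: the pass/fail check, the function names, and the 683 mV, 800 mV, 5 mV, and 25 mV values are all hypothetical, chosen only to show how a coarser step cuts the number of test steps while leaving the die at a higher voltage than it needs.&lt;/span&gt;&lt;/p&gt; 
 &lt;pre&gt;
```python
def passes(vdd_mv, true_vddmin_mv):
    """Hypothetical stand-in for running a test pattern at vdd_mv."""
    return vdd_mv >= true_vddmin_mv


def vdd_search(true_vddmin_mv, start_mv, step_mv):
    """Lower VDD step by step until the simulated die fails.

    Returns (last_passing_mv, tests_run). All values are in millivolts.
    """
    last_pass = None
    tests = 0
    vdd = start_mv
    while vdd > 0:
        tests += 1
        if not passes(vdd, true_vddmin_mv):
            break  # first failing voltage ends the search
        last_pass = vdd
        vdd -= step_mv
    return last_pass, tests


# Assumed example: a die whose true VDDmin is 683 mV, searched from 800 mV.
for step in (5, 25):
    found, tests = vdd_search(683, 800, step)
    print(f"step={step} mV: VDDmin={found} mV, tests={tests}, excess={found - 683} mV")
```
 &lt;/pre&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In this toy case, the 5 mV search needs 25 test steps and lands 2 mV above the true minimum, while the 25 mV search needs only 6 steps but leaves 17 mV of unnecessary voltage.&lt;/span&gt;&lt;/p&gt; 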
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As a compromise, many chipmakers divide all chips into a few bins, such as slow/fast/typical, setting a single voltage level per bin. However, due to the substantial variation in each bin, many units are assigned higher-than-required VDDmin, leading to excessive peak power and power density that have significant downsides, including:&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Higher case temperature (Tcase)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Higher Thermal Design Power (TDP)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;More expensive cooling&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Reduced reliability&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Shorter product lifetime&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;TDP dictates the form factor, cooling architecture, and rack density. When chips operate above their true minimum voltage, dynamic power increases sharply. That power converts to heat, resulting in higher TDP, expensive cooling solutions, higher failure rates, and shorter lifetime.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;span style="font-weight: bold;"&gt;On-Chip Telemetry OFF: Excessive Di/Dt noise&lt;/span&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Current spikes go undetected without on-chip telemetry, forcing engineers to compensate with increasing chip cost due to more expensive packaging, on-die/off-die decoupling capacitance, and on-die active droop mitigation solutions that are designed to absorb Di/Dt noise and reduce voltage droop. But that cost is only part of the tradeoff. Without visibility into current transients, designers must raise voltage or apply large safety margins to prevent failures in marginal paths.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These decisions suppress frequency and harm performance. Meanwhile, higher power turns into heat, increasing cooling demands and pushing thermal limits.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;What begins as an invisible current fluctuation ends in performance loss and higher costs:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Higher risk of droop&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Cost&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Performance penalty&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;On-Chip Telemetry OFF: Excessive Average Power&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Without on-chip monitoring, voltage adjustment in the field relies on guesswork rather than real timing data. Typically, Adaptive Voltage Scaling (AVS) uses canary circuits based on ring oscillators (ROSC). This method attempts to mimic critical paths but fails to reflect actual workload and reliability stress, or aging effects on the real logic.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Fig. 3: A canary circuit that monitors design margins is a critical path replicator, which cannot provide accurate data about actual critical path timing.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To compensate for the inaccuracy, designers must apply conservative guard bands to prevent failures, leading to higher voltages that cause excessive average power and reduced performance.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These overprotective settings inflate operational costs and compromise long-term reliability, while offering no visibility into when and where timing issues may arise.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Excessive average power also affects performance by raising thermal load and limiting voltage-frequency optimization. Both effects force the system to reduce operating frequency to remain within power and thermal limits.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The effects of excessive average power carry several long-term drawbacks:&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Inefficient power-performance solution&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;High power cost&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Shorter battery life (when applicable)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Shorter product lifetime&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Reliability degradation&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;Power optimization: A solution that sees what others can’t&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Chipmakers face three power-related constraints in every design: peak power, average power, and Di/Dt. Without visibility into real device behavior, these factors are managed through best known assumptions and worst-case settings.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To compensate for these blind spots, engineers divide dies into broad voltage bins, apply conservative voltage guard bands, and use expensive packages designed to absorb transient noise.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These choices increase test time, inflate cost, reduce performance, and shorten system life, among other drawbacks.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To address these severe implications, proteanTecs has introduced a novel approach with its on-chip Agents, which are specialized monitoring IPs embedded during design. These Agents provide accurate measurements of critical parameters such as real logic timing margins during actual operation.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The rich telemetry data can also feed the proteanTecs advanced data analytics software, including ML models, to guide vital decisions throughout the device production lifecycle. This level of accuracy enables meaningful reductions in cost, greater reliability, and measurable improvements in power and performance.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Table 1: The impact of the proteanTecs on-chip monitoring solutions on three key optimization goals.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;On-Chip Telemetry ON: Optimized Peak Power&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;With proteanTecs VDDmin Prediction for static operational voltage setting per device, voltage is accurately predicted per die and mapped to a much finer bin based on actual measured behavior. No more time-consuming voltage sweeps that lead to unnecessary overhead. Production cases have demonstrated a ~70% reduction in test steps with no accuracy impact, resulting in decreased costs and accelerated time-to-market. This VDDmin prediction can be done both at the tester level and at the system level, using real application workloads.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt; 
 &lt;p style="width: 800px; text-align: left;"&gt;&lt;span style="color: #474c68;"&gt;Fig. 4: proteanTecs VDDmin Prediction: Measured VDDmin (Y-axis) vs. predicted VDDmin (X-axis) comparison demonstrates exceptional accuracy with 0.15 NRMSE.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;VDDmin Prediction is based on an ML model, trained on accurate data from the on-chip Agents. During chip-level high volume production testing, the model is integrated into the test program software and used in real time, per device, on the test floor – to predict the optimal voltage. The prediction is tested and after a minimal number of search steps, the operational voltage is fused in the device.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Voltage reduction has a substantial effect on peak power, which in turn lowers the Tcase and the cooling solution cost:&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As peak power reduction translates to lower thermal load, it has a system-wide impact:&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Lower Tcase&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Lower TDP&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Cheaper cooling solution&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Better reliability&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Increased lifetime&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;For example, these are the quantified benefits when VDDmin Prediction reduces voltage by 3%-5%:&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;P&lt;/em&gt; [&lt;em&gt;W&lt;/em&gt;] is within -6% to -10%&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;T&lt;sub style="line-height: 0;"&gt;case&lt;/sub&gt;&lt;/em&gt; [&lt;em&gt;°C&lt;/em&gt;] is within -3% to -5%&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;TDP&lt;/em&gt; [&lt;em&gt;W&lt;/em&gt;] is within 3% to 5%&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;Cooling Cost&lt;/em&gt; [$] is within -3% to -5%&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
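 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The power figure above is consistent with the standard first-order CMOS relation in which dynamic power scales with C·V²·f. The article does not state which model was used, so the short Python sketch below is an assumption: it holds capacitance and frequency constant, ignores leakage, and uses an illustrative function name.&lt;/span&gt;&lt;/p&gt; 
 &lt;pre&gt;
```python
def dynamic_power_change(voltage_reduction):
    """Fractional change in dynamic power for a fractional VDD reduction.

    Assumes P is proportional to C * V**2 * f with C and f held constant,
    so leakage and any frequency change are ignored.
    """
    return (1.0 - voltage_reduction) ** 2 - 1.0


for reduction in (0.03, 0.05):
    change = dynamic_power_change(reduction)
    print(f"VDD -{reduction:.0%} gives dynamic power {change:+.2%}")
```
 &lt;/pre&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Under these assumptions, a 3% voltage reduction gives about 5.9% lower dynamic power and a 5% reduction gives about 9.75%, in line with the -6% to -10% range listed above.&lt;/span&gt;&lt;/p&gt; 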
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These optimizations are critical because cooling systems already consume 30% to 55% of datacenter power budgets. Reducing chip power directly cuts thermal load, which translates into real savings in infrastructure.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In high-density racks, advanced liquid cooling can cost between $1,000 and $2,000 per kW cooled, which can add up to millions of dollars annually. Every watt saved at the silicon level reduces that burden.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;On-Chip Telemetry ON: Optimized Di/Dt Noise&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;High current swings trigger voltage droop, which can disrupt timing and cause failures. In the absence of accurate real-time monitoring, engineers compensate by using higher voltages, wider margins, on-die droop mitigation solutions (which incur a performance penalty), and added die cost to absorb these transients.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs VDDmin Prediction makes these compensations unnecessary by lowering VDD per die, which improves signal integrity through reduced current swings and Di/Dt noise. Lower voltage also makes room for higher frequencies that boost performance, as captured by this equation:&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These improvements enable:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Safer operation (reduced noise)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Cheaper package&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Better performance&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;For example, these are the quantified benefits when VDDmin Prediction reduces voltage by 3%-5%:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;I&lt;/em&gt; [&lt;em&gt;mA&lt;/em&gt;] is within -3% to -5%&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;V&lt;sub style="line-height: 0;"&gt;noise&lt;/sub&gt;&lt;/em&gt; [&lt;em&gt;mV&lt;/em&gt;] is within -3% to -5%&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;F&lt;/em&gt; [&lt;em&gt;MHz&lt;/em&gt;] is within 3% to 5%&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In addition, proteanTecs provides real-time voltage droop sensors to protect the device in mission-mode. They provide real-time hardware signals that can trigger a clock throttling event to avoid failure and reduce Di/Dt.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;On-Chip Telemetry ON: Optimized Average Power&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Unlike canary circuits, proteanTecs AVS Pro uses Agents that monitor true logic paths for more informed decisions. proteanTecs’ technology allows high coverage of the performance limiters, allowing precise guard-band tuning based on real workloads, aging, and IR drops.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This approach enables safer voltage scaling, avoiding worst-case guard-bands and allowing the device to operate closer to its actual limits without compromising functionality, performance, or reliability. As demonstrated below, &lt;a href="https://www.proteantecs.com/hubfs/Resources%20-%20outbound/proteanTecs%20AVS%20Pro%20case%20study.pdf" style="color: #474c68;"&gt;AVS Pro safely reduced power consumption of a production 5nm SoC by 12.5%&lt;/a&gt;. At the same time, it extended the predicted lifetime by 18%.&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Fig. 5: AVS Pro, visualized here, enables 12.51% power saving through safer voltage scaling, leading to 18% projected lifetime extension.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs AVS Pro continuously adjusts voltage based on real-time Agent data. When the device operates with a surplus of timing margin, AVS Pro reduces the voltage. When more stressful functional workloads run or degradation reduces timing margins, AVS Pro increases the voltage only as much as needed to maintain safe operation.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This continuous response avoids both oversupply and undershoot, providing substantial benefits:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Optimized power-performance&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Reduced power cost&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Longer battery life (when applicable)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Increased lifetime&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Better reliability&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
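 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The margin-driven behavior described above can be pictured as a simple closed loop. The Python sketch below is a generic illustration and not the AVS Pro algorithm: the function name, the target margin, the dead band, the step size, and the voltage limits are all hypothetical values picked for the example.&lt;/span&gt;&lt;/p&gt; 
 &lt;pre&gt;
```python
def avs_step(vdd_mv, margin_ps, target_ps=20.0, band_ps=5.0,
             step_mv=5, vmin_mv=600, vmax_mv=900):
    """Return the next VDD (mV) given the worst measured timing margin (ps).

    Every parameter value here is an assumed placeholder, not a product setting.
    """
    if margin_ps > target_ps + band_ps:
        # Surplus margin: lower the voltage one step to save power.
        return max(vdd_mv - step_mv, vmin_mv)
    if target_ps - band_ps > margin_ps:
        # Margin eroded by workload or aging: raise the voltage one step.
        return min(vdd_mv + step_mv, vmax_mv)
    return vdd_mv  # inside the dead band: hold the current voltage


# Assumed sequence of margin readings, from a light to a heavy workload.
vdd = 750
for margin in (40.0, 32.0, 22.0, 12.0):
    vdd = avs_step(vdd, margin)
    print(f"margin={margin} ps, next VDD={vdd} mV")
```
 &lt;/pre&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;With these assumed readings the voltage steps down from 750 mV to 740 mV while margin is plentiful, holds inside the dead band, and steps back up to 745 mV once the margin falls below the target. The dead band is a design choice in this sketch that keeps the loop from toggling on every small change in margin.&lt;/span&gt;&lt;/p&gt; 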
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The chart below shows how AVS Pro delays degradation over time. The device maintains safe performance levels for longer, pushing the wear-out point further into the product lifecycle.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This type of lifetime extension has significant financial implications. &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-extending-chip-lifetime-with-safer-voltage-scaling" style="color: #474c68;"&gt;Hyperscalers like Amazon, Alphabet, and Microsoft&lt;/a&gt; publicly attribute billions in annual net income to extending server lifespans by just one to two years. proteanTecs AVS Pro supports similar CAPEX reduction strategies by delaying degradation without compromising performance.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To learn more about the benefits of using AVS Pro for chip lifetime extension, read the full paper &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-application-specific-power-performance-optimizer-with-on-chip-monitoring" style="color: #474c68;"&gt;here&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Fig. 6: An example of chip lifetime extension enabled by AVS Pro: 5nm delay degradation simulations [%] at nominal conditions: T junction 85 °C, V=0.75V&lt;br&gt;&lt;br&gt;&lt;/span&gt;   
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;Conclusion – A tale of two chips&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Many love underdogs, but as this article shows, products with on-chip telemetry win by a knockout.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs provides visibility from within that spans production and deployment. With VDDmin Prediction and AVS Pro, power optimization begins at production test and continues throughout system operation.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;VDDmin Prediction reduces peak power and Di/Dt noise by tuning VDD with personalization and precision. Dedicated voltage droop sensors protect the device in real time, when unexpected workloads arrive. AVS Pro cuts average power through safer voltage scaling in the field. Together, they improve critical aspects of power, performance, and cost:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;For end users&lt;/strong&gt; such as data center operators: lower energy costs, better performance, improved reliability, and longer lifetime.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;For system providers&lt;/strong&gt;: lower system power, lower TDP, a cheaper and more compact system, improved reliability, and longer lifetime.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;For chip vendors&lt;/strong&gt;: improved power-performance, VDD noise reduction, cheaper package, improved reliability, and longer lifetime.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Ready to realize the full benefits of on-chip telemetry? Contact our team &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/contact" style="color: #474c68;"&gt;here&lt;/a&gt; or download our &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-application-specific-power-performance-optimizer-with-on-chip-monitoring" style="color: #474c68;"&gt;whitepaper&lt;/a&gt; to see how proteanTecs enhances performance, efficiency, reliability, and product lifetime from test to deployment.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
&lt;/div&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/same-chip-two-destinies-how-power-profiles-improve-with-on-chip-monitoring" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/3%20-%20Same%20Chip%2c%20Two%20Destinies.png" alt="Same Chip, Two Destinies: How Power Profiles Improve With On-Chip Monitoring | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;&lt;span style="color: #474c68;"&gt;The Impact of On-Chip Telemetry on Peak Power, Average Power, and Di/Dt Noise&lt;/span&gt;&lt;br&gt;&lt;br&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;What happens to critical power-related considerations when the same chip is handled two different ways, with or without visibility from within?&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This article begins by examining how the absence of on-chip monitoring impacts peak power, average power, and Di/Dt noise (rate of current change), as illustrated in the diagram below and the subsequent discussion. It then details how these aspects change when in-chip telemetry is available.&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Fig. 1: As the power profile shifts with different modes and switching activity, high Di/Dt noise, peak power, and average power introduce thermal, cost, and reliability penalties.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;On-Chip Telemetry OFF: Excessive Peak Power&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To improve power and performance specs while reducing chip operational costs, engineers must determine the lowest reliable voltage, known as VDDmin, at a certain frequency of operation, which varies significantly between dies due to the process distribution.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Without on-chip telemetry, chipmakers typically detect VDDmin using VDD search testing, which lowers the voltage step by step until chip failure occurs to identify the last functional VDD. However, this method presents a difficult tradeoff:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Smaller voltage steps improve accuracy but increase test time.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Larger voltage steps are quicker but might overshoot the optimal point.&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt;  
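 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The tradeoff above can be illustrated with a minimal Python sketch. The function name, the 662 mV "true" VDDmin, the 800 mV starting point, and the step sizes are hypothetical values chosen for illustration; the pass/fail check stands in for a real functional test and does not represent any specific test flow.&lt;/span&gt;&lt;/p&gt; 
 &lt;pre&gt;
```python
def vdd_search(true_vddmin_mv: int, start_mv: int, step_mv: int) -> tuple[int, int]:
    """Lower the supply step by step until the (simulated) chip fails.

    Returns (last passing voltage in mV, number of test steps executed).
    All voltages are integers in millivolts to avoid rounding noise.
    """
    if true_vddmin_mv > start_mv:
        raise ValueError("the starting voltage must be a passing voltage")
    voltage = start_mv
    last_pass = start_mv
    steps = 0
    while True:
        steps += 1
        passed = voltage >= true_vddmin_mv  # hypothetical pass criterion
        if not passed:
            break
        last_pass = voltage
        voltage -= step_mv
    return last_pass, steps


if __name__ == "__main__":
    TRUE_VDDMIN_MV = 662  # illustrative value, not measured data
    for step in (5, 25, 50):
        found, steps = vdd_search(TRUE_VDDMIN_MV, start_mv=800, step_mv=step)
        print(f"step={step} mV: estimate={found} mV, "
              f"excess={found - TRUE_VDDMIN_MV} mV, test steps={steps}")
```
 &lt;/pre&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In this toy example, 5 mV steps land within 3 mV of the true VDDmin but need 29 test steps, while 50 mV steps finish in 4 test steps and leave 38 mV of unnecessary voltage.&lt;/span&gt;&lt;/p&gt; 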
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;br&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Fig. 2: Voltage search plots. Determining an accurate VDDmin using this method often requires an impractically long time and high test cost, leading to painful compromises.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As a compromise, many chipmakers divide all chips into a few bins, such as slow/fast/typical, setting a single voltage level per bin. However, due to the substantial variation in each bin, many units are assigned higher-than-required VDDmin, leading to excessive peak power and power density that have significant downsides, including:&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Higher case temperature (Tcase)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Higher Thermal Design Power (TDP)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;More expensive cooling&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Reduced reliability&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Shorter product lifetime&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
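 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;A short Python sketch shows how coarse binning leaves voltage on the table. The per-die VDDmin values and the bin voltages below are hypothetical, and the helper names are ours; the only point is that every die in a bin receives the voltage needed by the worst die that bin must cover.&lt;/span&gt;&lt;/p&gt; 
 &lt;pre&gt;
```python
import bisect


def assigned_voltage(vddmin_mv: int, bin_ceilings_mv: list[int]) -> int:
    """Return the lowest bin voltage that still covers the die's VDDmin.

    bin_ceilings_mv must be sorted in ascending order.
    """
    if vddmin_mv > bin_ceilings_mv[-1]:
        raise ValueError("die VDDmin exceeds the highest bin voltage")
    return bin_ceilings_mv[bisect.bisect_left(bin_ceilings_mv, vddmin_mv)]


def total_excess_mv(die_vddmin_mv: list[int], bin_ceilings_mv: list[int]) -> int:
    """Sum of (assigned voltage - true VDDmin) over all dies, in mV."""
    return sum(assigned_voltage(v, bin_ceilings_mv) - v for v in die_vddmin_mv)


# Hypothetical per-die VDDmin values in mV (illustrative, not measured data).
DIES_MV = [652, 671, 698, 704, 733, 749]
COARSE_BINS_MV = [700, 750, 800]          # three coarse bins
FINE_BINS_MV = list(range(650, 801, 10))  # 10 mV granularity

if __name__ == "__main__":
    print("coarse bins:", total_excess_mv(DIES_MV, COARSE_BINS_MV), "mV total excess")
    print("fine bins:  ", total_excess_mv(DIES_MV, FINE_BINS_MV), "mV total excess")
```
 &lt;/pre&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;For these six illustrative dies, three coarse bins add 143 mV of total excess voltage, compared with 33 mV at 10 mV granularity.&lt;/span&gt;&lt;/p&gt; 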
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;TDP dictates the form factor, cooling architecture, and rack density. When chips operate above their true minimum voltage, dynamic power increases sharply. That power converts to heat, resulting in higher TDP, expensive cooling solutions, higher failure rates, and shorter lifetime.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;span style="font-weight: bold;"&gt;On-Chip Telemetry OFF: Excessive Di/Dt noise&lt;/span&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Without on-chip telemetry, current spikes go undetected. Engineers compensate with more expensive packaging, additional on-die and off-die decoupling capacitance, and on-die active droop mitigation solutions, all designed to absorb Di/Dt noise and reduce voltage droop, and all of which increase chip cost. But that cost is only part of the tradeoff. Without visibility into current transients, designers must raise voltage or apply large safety margins to prevent failures in marginal paths.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These decisions suppress frequency and harm performance. Meanwhile, higher power turns into heat, increasing cooling demands and pushing thermal limits.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;What begins as an invisible current fluctuation ends in performance loss and higher costs:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Higher risk of droop&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Higher cost&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Performance penalty&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;On-Chip Telemetry OFF: Excessive Average Power&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Without on-chip monitoring, voltage adjustment in the field relies on guesswork rather than real timing data. Typically, Adaptive Voltage Scaling (AVS) uses canary circuits based on ring oscillators (ROSC). This method attempts to mimic critical paths but fails to reflect actual workload and reliability stress, or aging effects on the real logic.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Fig. 3: A canary circuit that monitors design margins is a critical path replicator, which cannot provide accurate data about actual critical path timing.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To compensate for the inaccuracy, designers must apply conservative guard bands to prevent failures, leading to higher voltages that cause excessive average power and reduced performance.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These overprotective settings inflate operational costs and compromise long-term reliability, while offering no visibility into when and where timing issues may arise.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Excessive average power also affects performance by raising thermal load and limiting voltage-frequency optimization. Both effects force the system to reduce operating frequency to remain within power and thermal limits.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The effects of excessive average power carry several long-term drawbacks:&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Inefficient power-performance solution&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;High power cost&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Shorter battery life (when applicable)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Shorter product lifetime&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Reliability degradation&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;Power optimization: A solution that sees what others can’t&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Chipmakers face three power-related constraints in every design: peak power, average power, and Di/Dt noise. Without visibility into real device behavior, these factors are managed through best-known assumptions and worst-case settings.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To compensate for these blind spots, engineers divide dies into broad voltage bins, apply conservative voltage guard bands, and use expensive packages designed to absorb transient noise.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These choices increase test time, inflate cost, reduce performance, and shorten system life, among other drawbacks.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To address these severe implications, proteanTecs has introduced a novel approach with its on-chip Agents, which are specialized monitoring IPs embedded during design. These Agents provide accurate measurements of critical parameters such as real logic timing margins during actual operation.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The rich telemetry data can also feed the proteanTecs advanced data analytics software, including ML models, to guide vital decisions throughout the device production lifecycle. This level of accuracy enables meaningful reductions in cost, greater reliability, and measurable improvements in power and performance.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Table 1: The impact of the proteanTecs on-chip monitoring solutions on three key optimization goals.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;On-Chip Telemetry ON: Optimized Peak Power&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;With proteanTecs VDDmin Prediction for static operational voltage setting per device, voltage is accurately predicted per die and mapped to a much finer bin based on actual measured behavior. This eliminates time-consuming voltage sweeps and the overhead they add. Production cases have demonstrated a ~70% reduction in test steps with no accuracy impact, resulting in decreased costs and accelerated time-to-market. This VDDmin prediction can be done both at the tester level and at the system level, using real application workloads.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt; 
 &lt;p style="width: 800px; text-align: left;"&gt;&lt;span style="color: #474c68;"&gt;Fig. 4: proteanTecs VDDmin Prediction: Measured VDDmin (Y-axis) vs. predicted VDDmin (X-axis) comparison demonstrates exceptional accuracy with 0.15 NRMSE.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;VDDmin Prediction is based on an ML model trained on accurate data from the on-chip Agents. During chip-level high-volume production testing, the model is integrated into the test program software and used in real time, per device, on the test floor to predict the optimal voltage. The prediction is then tested, and after a minimal number of search steps, the operational voltage is fused into the device.&lt;/span&gt;&lt;/p&gt; 
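 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The idea of testing a prediction and then refining it with a few search steps can be sketched in Python as follows. This is an illustration under our own assumptions, not the proteanTecs test program: the function name, the 5 mV step, and the voltages are hypothetical, and the pass/fail check stands in for a real functional test.&lt;/span&gt;&lt;/p&gt; 
 &lt;pre&gt;
```python
def seeded_search(true_vddmin_mv: int, predicted_mv: int, step_mv: int) -> tuple[int, int]:
    """Search around a predicted VDDmin instead of sweeping down from the top.

    Returns (lowest passing voltage found in mV, number of test steps).
    """
    def passes(voltage_mv: int) -> bool:
        return voltage_mv >= true_vddmin_mv  # stand-in for a real functional test

    steps = 1  # the first test is run at the predicted voltage itself
    voltage = predicted_mv
    if passes(voltage):
        # Prediction is safe: walk down until the first failure.
        while True:
            steps += 1
            if not passes(voltage - step_mv):
                return voltage, steps
            voltage -= step_mv
    # Prediction was optimistic: walk up until the first pass.
    while True:
        voltage += step_mv
        steps += 1
        if passes(voltage):
            return voltage, steps


if __name__ == "__main__":
    for predicted in (670, 655):  # hypothetical model outputs in mV
        found, steps = seeded_search(true_vddmin_mv=662, predicted_mv=predicted, step_mv=5)
        print(f"predicted={predicted} mV: settled at {found} mV after {steps} test steps")
```
 &lt;/pre&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;With a hypothetical true VDDmin of 662 mV, both a slightly conservative prediction (670 mV) and a slightly optimistic one (655 mV) settle at 665 mV after 3 test steps in this sketch, because the search starts close to the answer.&lt;/span&gt;&lt;/p&gt; 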
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Voltage reduction has a substantial effect on peak power, which in turn lowers the Tcase and the cooling solution cost:&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As peak power reduction translates to lower thermal load, it has a system-wide impact:&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Lower Tcase&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Lower TDP&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Cheaper cooling solution&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Better reliability&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Increased lifetime&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;For example, these are the quantified benefits when VDDmin Prediction reduces voltage by 3%-5%:&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;P&lt;/em&gt; [&lt;em&gt;W&lt;/em&gt;] is within -6% to -10%&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;T&lt;sub style="line-height: 0;"&gt;case&lt;/sub&gt;&lt;/em&gt; [&lt;em&gt;°C&lt;/em&gt;] is within -3% to -5%&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;TDP&lt;/em&gt; [&lt;em&gt;W&lt;/em&gt;] is within -3% to -5%&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;Cooling Cost&lt;/em&gt; [$] is within -3% to -5%&lt;/span&gt;&lt;br&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
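 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The -6% to -10% power range is consistent with the standard CMOS dynamic power relation, P = αCV&lt;sup&gt;2&lt;/sup&gt;f. The short Python check below assumes that dynamic power dominates and that switching activity, capacitance, and frequency stay constant; those assumptions are ours and are not stated in the figures above.&lt;/span&gt;&lt;/p&gt; 
 &lt;pre&gt;
```python
def dynamic_power_change(voltage_change: float) -> float:
    """Relative change in dynamic power for a relative change in supply voltage.

    Assumes P = alpha * C * V**2 * f with alpha, C, and f held constant.
    """
    return (1.0 + voltage_change) ** 2 - 1.0


if __name__ == "__main__":
    for dv in (-0.03, -0.05):
        print(f"dV = {dv:+.0%} gives dP = {dynamic_power_change(dv):+.2%}")
```
 &lt;/pre&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Under these assumptions, a 3% voltage reduction yields about 5.9% lower dynamic power, and a 5% reduction yields about 9.75%.&lt;/span&gt;&lt;/p&gt; 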
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These optimizations are critical because cooling systems already consume 30% to 55% of datacenter power budgets. Reducing chip power directly cuts thermal load, which translates into real savings in infrastructure.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In high-density racks, advanced liquid cooling can cost between $1,000 and $2,000 per kW cooled, which can add up to millions of dollars annually. Every watt saved at the silicon level reduces that burden.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;On-Chip Telemetry ON: Optimized Di/Dt Noise&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;High current swings trigger voltage droop, which can disrupt timing and cause failures. In the absence of accurate real-time monitoring, engineers compensate with higher voltages, wider margins, and on-die droop mitigation solutions (which incur a performance penalty), adding die cost to absorb these transients.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs VDDmin Prediction makes these compensations unnecessary by lowering VDD per die, which improves signal integrity through reduced current swings and Di/Dt noise. Lower voltage also makes room for higher frequencies that boost performance, as captured by this equation:&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These improvements enable:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Safer operation (reduced noise)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Cheaper package&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Better performance&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;For example, these are the quantified benefits when VDDmin Prediction reduces voltage by 3%-5%:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;I&lt;/em&gt; [&lt;em&gt;mA&lt;/em&gt;] is within -3% to -5%&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;V&lt;sub style="line-height: 0;"&gt;noise&lt;/sub&gt;&lt;/em&gt; [&lt;em&gt;mV&lt;/em&gt;] is within -3% to -5%&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;∆&lt;em&gt;F&lt;/em&gt; [&lt;em&gt;MHz&lt;/em&gt;] is within 3% to 5%&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In addition, proteanTecs provides real-time voltage droop sensors to protect the device in mission-mode. They provide real-time hardware signals that can trigger a clock throttling event to avoid failure and reduce Di/Dt.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;On-Chip Telemetry ON: Optimized Average Power&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Unlike canary circuits, proteanTecs AVS Pro uses Agents that monitor true logic paths for more informed decisions. proteanTecs’ technology provides high coverage of the performance limiters, enabling precise guard-band tuning based on real workloads, aging, and IR drops.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This approach enables safer voltage scaling, avoiding worst-case guard-bands and allowing the device to operate closer to its actual limits without compromising functionality, performance, or reliability. As demonstrated below, &lt;a href="https://www.proteantecs.com/hubfs/Resources%20-%20outbound/proteanTecs%20AVS%20Pro%20case%20study.pdf" style="color: #474c68;"&gt;AVS Pro safely reduced power consumption of a production 5nm SoC by 12.5%&lt;/a&gt;. At the same time, it extended the predicted lifetime by 18%.&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Fig. 5: AVS Pro, visualized here, enables 12.51% power saving through safer voltage scaling, leading to 18% projected lifetime extension.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs AVS Pro continuously adjusts voltage based on real-time Agent data. When the device operates with a surplus of timing margin, AVS Pro reduces the voltage. When more stressful functional workloads run or degradation reduces timing margins, AVS Pro increases the voltage only as much as needed to maintain safe operation.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This continuous response avoids both oversupply and undershoot, providing substantial benefits:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Optimized power-performance&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Reduced power cost&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Longer battery life (when applicable)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Increased lifetime&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Better reliability&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
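 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The margin-driven behavior described above can be pictured as a simple closed loop. The Python sketch below is a hypothetical illustration, not the AVS Pro algorithm: the function name, the use of picoseconds for timing margin, the deadband, and all numeric values are assumptions made for the example.&lt;/span&gt;&lt;/p&gt; 
 &lt;pre&gt;
```python
def avs_step(voltage_mv: int, margin_ps: float, target_margin_ps: float,
             deadband_ps: float, step_mv: int, vmin_mv: int, vmax_mv: int) -> int:
    """One iteration of a hypothetical margin-driven voltage control loop.

    Returns the next supply voltage in mV, clamped to [vmin_mv, vmax_mv].
    """
    if margin_ps > target_margin_ps + deadband_ps:
        voltage_mv -= step_mv  # surplus timing margin: lower the supply
    elif target_margin_ps - deadband_ps > margin_ps:
        voltage_mv += step_mv  # margin eroded by workload or aging: raise it
    # Inside the deadband the voltage is left unchanged to avoid oscillation.
    return max(vmin_mv, min(vmax_mv, voltage_mv))


if __name__ == "__main__":
    voltage = 750
    for margin in (40.0, 38.0, 22.0, 9.0):  # hypothetical margin readings in ps
        voltage = avs_step(voltage, margin, target_margin_ps=20.0, deadband_ps=5.0,
                           step_mv=5, vmin_mv=650, vmax_mv=800)
        print(f"margin={margin} ps, next voltage={voltage} mV")
```
 &lt;/pre&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The deadband is a design choice in this sketch: it keeps the loop from toggling the voltage on every small margin fluctuation.&lt;/span&gt;&lt;/p&gt; 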
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The chart below shows how AVS Pro delays degradation over time. The device maintains safe performance levels for longer, pushing the wear-out point further into the product lifecycle.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This type of lifetime extension has significant financial implications. &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-extending-chip-lifetime-with-safer-voltage-scaling" style="color: #474c68;"&gt;Hyperscalers like Amazon, Alphabet, and Microsoft&lt;/a&gt; publicly attribute billions in annual net income to extending server lifespans by just one to two years. proteanTecs AVS Pro supports similar CAPEX reduction strategies by delaying degradation without compromising performance.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To learn more about the benefits of using AVS Pro for chip lifetime extension, read the full paper &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-application-specific-power-performance-optimizer-with-on-chip-monitoring" style="color: #474c68;"&gt;here&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Fig. 6: An example of chip lifetime extension enabled by AVS Pro: 5nm delay degradation simulations [%] at nominal conditions: T junction 85 °C, V=0.75V&lt;br&gt;&lt;br&gt;&lt;/span&gt;   
 &lt;p style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;Conclusion – A tale of two chips&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Many love underdogs, but as this article shows, products with on-chip telemetry win by a knockout.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs provides visibility from within that spans production and deployment. With VDDmin Prediction and AVS Pro, power optimization begins at production test and continues throughout system operation.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;VDDmin Prediction reduces peak power and Di/Dt noise by tuning VDD per die with precision. Dedicated voltage droop sensors protect the device in real time when unexpected workloads arrive. AVS Pro cuts average power through safer voltage scaling in the field. Together, they improve critical aspects of power, performance, and cost:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;For end users&lt;/strong&gt; such as data center operators: lower energy costs, better performance, improved reliability, and longer lifetime.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;For system providers&lt;/strong&gt;: lower system power, lower TDP, a cheaper and more compact system, improved reliability, and longer lifetime.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;For chip vendors&lt;/strong&gt;: improved power-performance, VDD noise reduction, a cheaper package, improved reliability, and longer lifetime.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Ready to realize the full benefits of on-chip telemetry? Contact our team &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/contact" style="color: #474c68;"&gt;here&lt;/a&gt; or download our &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-application-specific-power-performance-optimizer-with-on-chip-monitoring" style="color: #474c68;"&gt;whitepaper&lt;/a&gt; to see how proteanTecs enhances performance, efficiency, reliability, and product lifetime from test to deployment.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
&lt;/div&gt;  
</content:encoded>
      <pubDate>Tue, 09 Sep 2025 15:00:00 GMT</pubDate>
      <guid>https://www.proteantecs.com/blog/same-chip-two-destinies-how-power-profiles-improve-with-on-chip-monitoring</guid>
      <dc:date>2025-09-09T15:00:00Z</dc:date>
      <dc:creator>Admin</dc:creator>
    </item>
    <item>
      <title>Thermal Sensing Headache Finally Over for 2nm and Beyond | proteanTecs Blog</title>
      <link>https://www.proteantecs.com/blog/thermal-sensing-headache-finally-over-for-2nm-and-beyond-proteantecs</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/thermal-sensing-headache-finally-over-for-2nm-and-beyond-proteantecs" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/4%20-%20Thermal%20Sensing%20Headache.png" alt="Thermal Sensing Headache Finally Over for 2nm and Beyond | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;&lt;span style="color: #474c68;"&gt;Silicon-Proven LVTS for 2nm: A New Era of Accuracy and Integration in Thermal Monitoring&lt;/span&gt;&lt;br&gt;&lt;br&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Effective thermal management is crucial to prevent overheating and optimize performance in modern SoCs. Inadequate temperature control due to inaccurate thermal sensing compromises power management, reliability, processing speed, and lifespan, leading to issues like electromigration, hot carrier injection, and even thermal runaway.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Unfortunately, precise thermal monitoring reached an inflection point at 2nm, with traditional solutions proving less practical below 3nm. To tackle the issue, this article delves into a novel approach, accurate to ±1.0°C, that overcomes this critical challenge.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;proteanTecs now offers a customer-ready, silicon-proven solution for 5nm, 3nm and 2nm nodes.&lt;/strong&gt; In fact, our latest silicon reports demonstrate robust performance, validating that accurate and scalable thermal sensing is achievable in the most advanced nodes.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Accurate Thermal Sensing in Advanced Process Nodes: A Growing Challenge&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As process nodes scale to 2nm and below, accurately measuring on-chip temperature has become increasingly difficult. Traditional voltage and temperature sensors based on diodes are less practical in these nodes due to their high-voltage requirements. This gap in temperature measurement creates risks that compel chipmakers to seek future-ready solutions. The challenge is magnified in designs that rely on Dynamic Voltage and Frequency Scaling (DVFS) techniques.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Why Traditional Solutions Fall Short&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Traditional thermal sensing technologies are hitting hard limitations in precision and overall feasibility when moving beyond 3nm:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Temperature sensors based on BJT diodes&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Analog thermal diodes with Bipolar Junction Transistors (BJTs) have been a go-to option for accurate thermal sensing. However, their reliance on high I/O voltages makes them inapplicable for nodes beyond 3nm based on Gate-All-Around (GAA) technology, which doesn't support high I/O (analog) voltages. BJT support may also be discontinued in the future.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;p style="line-height: 25px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;em&gt;PNP BJT in a diode-connected configuration. The base-emitter junction has a predictable transfer function that depends on temperature, making it suitable for thermal sensing. However, analog thermal diodes are a no-go for nodes beyond 3nm.&lt;/em&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Even before GAA, thermal diodes suffered from low coverage as they were hard to integrate. Their design restricted placement to chip edges near the I/O power supply, leaving vital internal areas unmonitored due to analog routing limitations. Furthermore, they consumed more power than low-voltage alternatives due to their high-voltage requirement.&lt;/span&gt;&lt;/p&gt; 
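 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To make the temperature dependence of the base-emitter junction concrete, the Python sketch below uses the textbook relation for two diode-connected BJTs biased at a current ratio N: the difference of their base-emitter voltages is (kT/q)·ln(N), which is proportional to absolute temperature. This is a standard idealized model (ideality factor of 1) that we assume here for illustration; the function names and the ratio of 8 are hypothetical and do not describe any specific sensor.&lt;/span&gt;&lt;/p&gt; 
 &lt;pre&gt;
```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23       # exact SI value
ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact SI value


def delta_vbe_volts(temp_kelvin: float, current_ratio: float) -> float:
    """Ideal base-emitter voltage difference for two BJTs at a given current ratio."""
    thermal_voltage = BOLTZMANN_J_PER_K * temp_kelvin / ELEMENTARY_CHARGE_C
    return thermal_voltage * math.log(current_ratio)


def temperature_kelvin(delta_vbe: float, current_ratio: float) -> float:
    """Invert the ideal relation to recover absolute temperature from delta VBE."""
    return ELEMENTARY_CHARGE_C * delta_vbe / (BOLTZMANN_J_PER_K * math.log(current_ratio))


if __name__ == "__main__":
    ratio = 8.0  # hypothetical bias current ratio
    for temp_c in (25.0, 85.0, 125.0):
        dvbe = delta_vbe_volts(temp_c + 273.15, ratio)
        print(f"{temp_c:.0f} C: delta VBE = {dvbe * 1e3:.2f} mV")
```
 &lt;/pre&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;At 300 K and a current ratio of 8, the ideal relation gives roughly 54 mV, a small signal that illustrates why such sensors have depended on analog circuitry.&lt;/span&gt;&lt;/p&gt; 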
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Digital Temperature Measurements based on Ring oscillators&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Ring oscillators are scalable to advanced nodes, but their temperature measurement error can be as high as ±10°C. They are inadequate where accuracy is paramount. One example concern using thermal sensing to determine voltage or frequency adjustments (e.g. DVFS), as even slight temperature variations can significantly degrade performance.&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Ring oscillator temperature error of different calibration techniques. Can be greater than -10°C, which is too high for many use cases.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The limitations above underscore the need for an accurate thermal sensing solution designed with core transistors only to fit advanced nodes.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;A Thermal Sensor Built for the Future&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs LVTS&lt;strong&gt;™&lt;/strong&gt; (Local Voltage and Thermal Sensor) is purpose-built for precision thermal sensing in advanced nodes without relying on I/O transistors and high analog I/O voltages and even BJTs. It measures temperature with accuracy of ±1.0°C while using core transistors exclusively and operating in a wide range of core voltages, combining precision with future readiness for GAA nodes.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Key features of LVTS:&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Temperature measurement accuracy of ±1°C (3-sigma)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Voltage measurement accuracy of ±1.5% (3-sigma)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Fast over-temperature alert&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Wide range of operational voltages (650-950 mV)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;High-speed measurement&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;proteanTecs LVTS measurements demonstrate an accuracy of ±1°C in a wide range of voltages (0.65V SSG – 1.05V FFG) and temperatures (-40°C - 125°C.)&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;strong&gt;Unmatched Benefits Across All Critical Parameters&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;LVTS operates with low VDD core rather than high I/O voltage while maintaining superb accuracy, unlike Digital Thermal sensors based on ring oscillators. This unique design enables easy integration anywhere on the chip, providing more granular voltage and temperature monitoring than thermal diodes. Additionally, its smaller size and lower power consumption minimize the impact on PPA compared to BJT-based solutions.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;LVTS compared with thermal diodes and ring oscillators (ROSC)&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;An additional capability of LVTS provides &lt;strong&gt;real-time warnings and critical alerts&lt;/strong&gt; in the form of HW signals when predetermined thermal thresholds are breached. This feature enables immediate corrective action, reducing the risk of overheating to maintain chip integrity.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;LVTS Flavors for Enhanced Flexibility&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In addition to the standard LVTS described above, proteanTecs offers two specialized variants to address diverse design needs:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;An extended flavor: adds external voltage measurement, extending the measured voltage range down to zero volts.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;A distributed flavor: designed as a core-VDD-only analog thermal and DC voltage level sensor hub, it supports extremely small remote thermal sensors for precise temperature measurements at hot spots.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These two versions complement the regular LVTS, allowing chipmakers to tailor their thermal sensing approach for maximum coverage, precision, and responsiveness in critical areas of the design.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Complementing Deep Data Analytics with Accurate Voltage and Temperature Sensing&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;LVTS is already silicon-proven in 5nm, 3nm, and now also in 2nm, with a detailed silicon report available, making it the industry-leading, future-proof, customer-ready solution.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This innovation was warmly embraced by multiple chipmakers concerned about the absence of accurate and reliable thermal sensing in next-generation silicon.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These customers use LVTS alongside other proteanTecs products, as it complements the broader deep data monitoring and analytics solutions &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/knowledge-center" style="color: #474c68;"&gt;explored here&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;LVTS is seamlessly integrated into proteanTecs’ HW Monitoring System, enabling accurate DC voltage and thermal measurements real-time, making LVTS a vital addition to chipmaker power and reliability strategies.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Want to know more about how LVTS can help scale your design to advanced nodes with accurate voltage and temperature sensing? &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/contact" style="color: #474c68;"&gt;Contact us here&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt; 
&lt;/div&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/thermal-sensing-headache-finally-over-for-2nm-and-beyond-proteantecs" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/4%20-%20Thermal%20Sensing%20Headache.png" alt="Thermal Sensing Headache Finally Over for 2nm and Beyond | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;&lt;span style="color: #474c68;"&gt;Silicon-Proven LVTS for 2nm: A New Era of Accuracy and Integration in Thermal Monitoring&lt;/span&gt;&lt;br&gt;&lt;br&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Effective thermal management is crucial to prevent overheating and optimize performance in modern SoCs. Inadequate temperature control due to inaccurate thermal sensing compromises power management, reliability, processing speed, and lifespan, leading to issues like electromigration, and hot carrier injection and even thermal runaway.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Unfortunately, precise thermal monitoring reached an inflection point at 2nm, with traditional solutions proving less practical below 3nm. To tackle the issue, this article delves into a novel approach, accurate to ±1.0°C, that overcomes this critical challenge.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;proteanTecs now offers a customer-ready, silicon-proven solution for 5nm, 3nm and 2nm nodes.&lt;/strong&gt; In fact, our latest silicon reports demonstrate robust performance, validating that accurate and scalable thermal sensing is achievable in the most advanced nodes.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Accurate Thermal Sensing in Advanced Process Nodes: A Growing Challenge&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As process nodes scale to 2nm and below, accurately measuring on-chip temperature has become increasingly difficult. Traditional Voltage and Temperature sensors based on diodes are less practical in these nodes due to their high-voltage requirements. This gap in temperature measurement creates risks that compel chipmakers to seek future-ready solutions. The challenge is magnified in designs that leverage DVFS techniques.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Why Traditional Solutions Fall Short&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Traditional thermal sensing technologies are hitting hard limitations in precision and overall feasibility when moving beyond 3nm:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Temperature sensors based on BJT diodes&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Analog thermal diodes with Bipolar Junction Transistors (BJTs) have been a go-to option for accurate thermal sensing. However, their reliance on high I/O voltages makes them inapplicable for nodes beyond 3nm based on Gate-All-Around (GAA) technology, which doesn't support high I/O (analog) voltages, and BJT support may be discontinued as well in the future.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;p style="line-height: 25px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;em&gt;PNPBJT in a diode-connected configuration. The base-emitter junction has a predictable &lt;/em&gt;‍&lt;em&gt;transfer function that depends on temperature, making it suitable for thermal sensing. &lt;/em&gt;‍&lt;em&gt;However, analog thermal diodes are a no-go for nodes beyond 3nm.&lt;/em&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Even before GAA, thermal diodes suffered from low coverage as they were hard to integrate. Their design restricted placement to chip edges near the I/O power supply, leaving vital internal areas unmonitored due to analog routing limitations. Furthermore, they consumed more power than low-voltage alternatives due to their high-voltage requirement.&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Digital Temperature Measurements based on Ring oscillators&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Ring oscillators are scalable to advanced nodes, but their temperature measurement error can be as high as ±10°C. They are inadequate where accuracy is paramount. One example concern using thermal sensing to determine voltage or frequency adjustments (e.g. DVFS), as even slight temperature variations can significantly degrade performance.&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Ring oscillator temperature error of different calibration techniques. Can be greater than -10°C, which is too high for many use cases.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The limitations above underscore the need for an accurate thermal sensing solution designed with core transistors only to fit advanced nodes.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;A Thermal Sensor Built for the Future&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs LVTS&lt;strong&gt;™&lt;/strong&gt; (Local Voltage and Thermal Sensor) is purpose-built for precision thermal sensing in advanced nodes without relying on I/O transistors and high analog I/O voltages and even BJTs. It measures temperature with accuracy of ±1.0°C while using core transistors exclusively and operating in a wide range of core voltages, combining precision with future readiness for GAA nodes.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Key features of LVTS:&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Temperature measurement accuracy of ±1°C (3-sigma)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Voltage measurement accuracy of ±1.5% (3-sigma)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Fast over-temperature alert&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Wide range of operational voltages (650-950 mV)&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;High-speed measurement&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;proteanTecs LVTS measurements demonstrate an accuracy of ±1°C in a wide range of voltages (0.65V SSG – 1.05V FFG) and temperatures (-40°C - 125°C.)&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;strong&gt;Unmatched Benefits Across All Critical Parameters&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;LVTS operates with low VDD core rather than high I/O voltage while maintaining superb accuracy, unlike Digital Thermal sensors based on ring oscillators. This unique design enables easy integration anywhere on the chip, providing more granular voltage and temperature monitoring than thermal diodes. Additionally, its smaller size and lower power consumption minimize the impact on PPA compared to BJT-based solutions.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;LVTS compared with thermal diodes and ring oscillators (ROSC)&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;An additional capability of LVTS provides &lt;strong&gt;real-time warnings and critical alerts&lt;/strong&gt; in the form of HW signals when predetermined thermal thresholds are breached. This feature enables immediate corrective action, reducing the risk of overheating to maintain chip integrity.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;LVTS Flavors for Enhanced Flexibility&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In addition to the standard LVTS described above, proteanTecs offers two specialized variants to address diverse design needs:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;An extended flavor: adds external voltage measurement, extending the measured voltage range down to zero volts.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;A distributed flavor: designed as a core-VDD-only analog thermal and DC voltage level sensor hub, it supports extremely small remote thermal sensors for precise temperature measurements at hot spots.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These two versions complement the regular LVTS, allowing chipmakers to tailor their thermal sensing approach for maximum coverage, precision, and responsiveness in critical areas of the design.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Complementing Deep Data Analytics with Accurate Voltage and Temperature Sensing&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;LVTS is already silicon-proven in 5nm, 3nm, and now also in 2nm, with a detailed silicon report available, making it the industry-leading, future-proof, customer-ready solution.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This innovation was warmly embraced by multiple chipmakers concerned about the absence of accurate and reliable thermal sensing in next-generation silicon.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These customers use LVTS alongside other proteanTecs products, as it complements the broader deep data monitoring and analytics solutions &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/knowledge-center" style="color: #474c68;"&gt;explored here&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;LVTS is seamlessly integrated into proteanTecs’ HW Monitoring System, enabling accurate DC voltage and thermal measurements real-time, making LVTS a vital addition to chipmaker power and reliability strategies.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Want to know more about how LVTS can help scale your design to advanced nodes with accurate voltage and temperature sensing? &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/contact" style="color: #474c68;"&gt;Contact us here&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt; 
&lt;/div&gt;  
&lt;img src="https://track.hubspot.com/__ptq.gif?a=7884687&amp;amp;k=14&amp;amp;r=https%3A%2F%2Fwww.proteantecs.com%2Fblog%2Fthermal-sensing-headache-finally-over-for-2nm-and-beyond-proteantecs&amp;amp;bu=https%253A%252F%252Fwww.proteantecs.com%252Fblog&amp;amp;bvt=rss" alt="" width="1" height="1" style="min-height:1px!important;width:1px!important;border-width:0!important;margin-top:0!important;margin-bottom:0!important;margin-right:0!important;margin-left:0!important;padding-top:0!important;padding-bottom:0!important;padding-right:0!important;padding-left:0!important; "&gt;</content:encoded>
      <pubDate>Mon, 01 Sep 2025 15:00:00 GMT</pubDate>
      <guid>https://www.proteantecs.com/blog/thermal-sensing-headache-finally-over-for-2nm-and-beyond-proteantecs</guid>
      <dc:date>2025-09-01T15:00:00Z</dc:date>
      <dc:creator>Admin</dc:creator>
    </item>
    <item>
      <title>Critical Optimization Factors for GenAI Chipmakers | proteanTecs Blog</title>
      <link>https://www.proteantecs.com/blog/critical-optimization-factors-for-genai-chipmakers</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/critical-optimization-factors-for-genai-chipmakers" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/5%20-%20Critical%20Optimization%20Factors.png" alt="Critical Optimization Factors for GenAI Chipmakers | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;&lt;span style="color: #474c68;"&gt;The diverse approaches and innovative solutions shaping the future of AI hardware, essential for win&lt;/span&gt;&lt;br&gt;&lt;br&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Today’s GenAI arms race is fought with novel chip architectures and packaging. Specialized hardware designs are proliferating in the form of GPUs, TPUs, NPUs, and more, all tuned for parallelism and matrix-heavy AI math.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In this hyper-competitive landscape, chip vendors scramble to differentiate their products on multiple fronts. They promise some mix of better performance, efficiency, or scalability, but the specific strategies vary widely:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Some chipmakers aim to outgun the competition with sheer performance. Flagship GPUs, for example, focus on FLOPS and huge memory throughput. While memory is a critical factor in GenAI performance, this paper focuses on compute throughput bottlenecks.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;One approach that chipmakers employ to win this category is advanced packaging, connecting multiple silicon chiplets in a single heterogeneous device to increase performance density.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Even a 10% speed improvement will have a profound impact due to the immense scale. For example, training a model like LLaMA 3.1 405B involved 16,000 GPUs, consumed approximately 27 megawatts, and required an estimated 40 billion PFLOPS [23]. That level of optimization can &lt;strong&gt;reduce training time by several weeks&lt;/strong&gt; and eliminate the need for thousands of GPU-days, &lt;strong&gt;translating to millions of dollars&lt;/strong&gt; in infrastructure savings.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;​In large-scale AI inference operations, even modest throughput enhancements can lead to significant cost reductions. For instance, OpenAI's GPT-4 processes approximately 50 billion queries annually, incurring an estimated $144 million in compute costs [24]. Implementing a 10% throughput improvement could decrease the number of required servers, resulting in an estimated $14.4 million in annual savings.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;A dramatic increase in inference latency from 73 ms/token in&lt;/em&gt;‍&lt;em&gt;OpenAI gpt-3.5-turbo to 196 ms/token in OpenAI gpt-4 [25].&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Throughput optimization also reduces inference latency, which is a critical factor in user experience. For example, ​the response time of OpenAI's GPT-4 model has been measured at approximately 196 milliseconds per generated token [25]. Enhancing throughput by 10% could proportionally &lt;strong&gt;reduce this latency&lt;/strong&gt;, leading to faster response times and improved user satisfaction.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Performance improvements typically begin with design-time architecture exploration and RTL optimization, such as pipeline depth, compute unit allocation, and dataflow design. On top of that, chipmakers apply techniques like standard Adaptive Frequency Scaling (AFS) to push efficiency under dynamic conditions in the field.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;However, these runtime methods are generally static and not workload-aware, leading to suboptimal performance in real-world deployments. Frequency scaling is also done conservatively to preserve thermal and functional stability. While these approaches help extract more performance within safe limits, they may fall short of what GenAI workloads demand.&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Power Efficiency&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;GenAI’s exponential growth in computational requirements urges chipmakers to pay closer attention to power consumption. Beyond immediate consequences, such as thermal problems, excessive wattage has severe implications for customers’ operational costs.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As a consequence, design wins increasingly revolve around &lt;strong&gt;Total Cost of Ownership&lt;/strong&gt; (TCO). This metric factors in not only the upfront hardware cost but also ongoing expenses like power, cooling, and infrastructure. Solutions that deliver more compute per watt can significantly reduce TCO and make large-scale AI deployments more sustainable.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Furthermore, reducing the power consumption of individual devices directly expands infrastructure performance. Every watt saved per chip frees up headroom within the data center’s fixed power budget, enabling higher system utilization across the fleet.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This power reduction allows operators to run more workloads, serve more users, or deploy additional systems without breaching energy limits. Improving PPW at the chip level becomes a strategic lever for maximizing performance within existing power constraints.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To explore how this dynamic plays out across real data center deployments, read &lt;a href="https://www.proteantecs.com/blog/examining-the-impact-of-chip-power-reduction-on-datacenter-economics" style="color: #474c68;"&gt;the full blog post here.&lt;/a&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;PPW can grow by increasing performance within the power &lt;/em&gt;‍&lt;em&gt;envelope or by reducing wattage without impacting FLOPS.&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Power efficiency is typically optimized through a combination of design-time techniques and runtime control. Clock gating, power gating, and multi-voltage domains are widely used at the architecture and implementation levels to reduce dynamic and leakage power.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;At runtime, methods like Dynamic Voltage and Frequency Scaling (DVFS) and Adaptive Voltage Scaling (AVS) are applied to adjust power consumption based on static models or basic telemetry, such as temperature or process variation. These standard techniques are not workload-aware and typically apply uniform guard bands across all chips to ensure stability across all devices and workloads.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As a result, they leave significant excess guard bands that cause unnecessary power consumption, undermining PPW. This inefficiency calls for more precise, real-time approaches that optimize power without compromising performance or reliability.&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Reliability&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;A chip’s reliability at large scales is just as critical as its raw performance. DPPM measures the fraction of chips that exhibit failures post-manufacturing, directly impacting system uptime and operational costs. While semiconductor testing filters out detectable defects, latent issues stay hidden until real workloads expose them. As GenAI compute infrastructure scales to millions of deployed chips, even a low DPPM might translate to frequent failures with substantial consequences.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Furthermore, Silent Data Corruption (SDC) has emerged as a critical reliability threat to scaling GenAI training, as it corrupts computations without triggering alerts. Unlike memory bit flips, which error correction codes (ECC) can mitigate, SDCs originate from subtle timing violations, aging effects, or marginal defects that escape standard semiconductor testing.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These errors leave no trace, yet a single one can distort model weights across interdependent nodes, quietly derailing a training run that may span weeks, involve over 25,000 GPUs, and cost more than $100 million [12]. In training clusters, even a single faulty processor can jeopardize the entire job. These workloads run across tightly coupled systems, each contributing to shared model parameters. If one chip introduces a silent error during synchronization, that corruption spreads throughout the cluster.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Download White Paper: &lt;/strong&gt;&lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-outsmarting-silent-data-corruption-in-ai-processors-with-two-stage-detection" style="color: #474c68;"&gt;Outsmarting Silent Data Corruption in AI Processors with Two-Stage Detection&lt;/a&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Ensuring reliability has traditionally relied on periodic field testing to uncover potential failures. While effective for basic quality assurance, these methods may miss latent defects, workload-driven faults, accelerated aging, and SDCs. They are also time-consuming and difficult to streamline within data center environments running high-intensity GenAI workloads. The limitations of these offline techniques point to the need for continuous, in-situ monitoring to maintain reliability at hyperscale.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Despite these diverse optimization strategies, all chipmakers share a common challenge. They must set conservative operating guard bands to ensure reliability. This necessity presents an overlooked opportunity for significant optimization that can shape who wins the GenAI race.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;proteanTecs Real-Time Monitoring for Scalable GenAI Chips&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As GenAI chips reach unprecedented levels of complexity, traditional design-time assumptions and static controls are no longer enough. Standard runtime methods such as AVS, DVFS, and AFS are static and rely on conservative guard bands. These approaches waste power, limit throughput, and fail to detect real-time reliability issues.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;What chipmakers need is visibility into how each chip behaves under actual workloads. Not just design-time guard bands or environmental telemetry, but in-situ insights into timing margins, aging, and stress.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs closes this critical gap by enabling a new class of in-chip applications that optimize each chip by tuning it in real time according to actual workloads.&lt;/span&gt;&lt;/p&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;By embedding agents inside the chip, proteanTecs delivers precise monitoring of the timing margins of real performance-limiting paths, along with application stress, operational and environmental effects, aging, latent defects, and process variation. This approach uncovers insights undetectable by legacy methods. With dedicated algorithms, these insights power three breakthrough applications:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;AVS Pro&lt;/strong&gt; dramatically reduces power consumption by safely trimming excess voltage guard bands, improving Performance-Per-Watt (PPW) and lowering TCO while guaranteeing reliability.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;RTHM&lt;/strong&gt; provides continuous health tracking of the device, detecting marginal behavior before it leads to functional failures or SDC. This capability is especially crucial for billion-dollar model training runs.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;AFS Pro&lt;/strong&gt; extracts extra performance by reclaiming hidden frequency headroom, dynamically tuning each chip closer to its unique threshold for maximum throughput, while maintaining a functionality safety net.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In the GenAI era, chipmakers must strategically balance unprecedented performance, stringent power efficiency, and rock-solid reliability. The complexity of achieving these goals calls for real-time, workload-aware optimization techniques beyond conventional guard bands and static methods. As GenAI continues its rapid evolution, embedding advanced monitoring and dynamic tuning capabilities directly within chips emerges not only as a differentiator but a necessity—shaping who will ultimately lead this high-stakes technological revolution.&lt;/span&gt;&lt;/p&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Together, these solutions turn conservative margins into a competitive advantage, allowing GenAI chipmakers and cloud operators to scale faster, safer, and smarter.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;span style="font-weight: normal;"&gt;Want to learn how these capabilities deliver up to 12.5% power reduction and 8% higher performance?&lt;/span&gt;&#x1f449; &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-scaling-genai-training-and-inference-chips-with-runtime-monitoring" style="color: #474c68; font-weight: bold;"&gt;Read the full white paper &lt;/a&gt;to see how real-time in-chip optimization is redefining what’s possible in GenAI infrastructure.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This is part 3 of a 3-part series:&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 25px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;a href="https://www.proteantecs.com/blog/genais-breakneck-pace-is-reshaping-the-semiconductor-industry" style="color: #474c68;"&gt;Click here for part 1&lt;/a&gt; - GenAI's Breakneck Pace is Reshaping the Semiconductor Industry&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Unpacks how generative AI is outpacing Moore’s Law and the semiconductor shake-up driven by its explosive rise, as generative models race toward superintelligence and chipmakers scramble to keep up.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 25px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;a href="https://www.proteantecs.com/blog/the-painful-reality-of-scaling-cloud-ai" style="color: #474c68;"&gt;Click here for part 2 &lt;/a&gt;- The Painful Reality of Scaling Cloud AI&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Delves deeper into the painful realities of scaling cloud AI infrastructure, examining the practical obstacles chipmakers face, including hardware failures and reliability issues such as Silent Data Corruption (SDC), surging power demands, and workload growth that continues to outpace Moore's Law.&lt;/span&gt;&lt;/p&gt; 
&lt;/div&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/critical-optimization-factors-for-genai-chipmakers" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/5%20-%20Critical%20Optimization%20Factors.png" alt="Critical Optimization Factors for GenAI Chipmakers | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;&lt;span style="color: #474c68;"&gt;The diverse approaches and innovative solutions shaping the future of AI hardware, essential for win&lt;/span&gt;&lt;br&gt;&lt;br&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Today’s GenAI arms race is fought with novel chip architectures and packaging. Specialized hardware designs are proliferating in the form of GPUs, TPUs, NPUs, and more, all tuned for parallelism and matrix-heavy AI math.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In this hyper-competitive landscape, chip vendors scramble to differentiate their products on multiple fronts. They promise some mix of better performance, efficiency, or scalability, but the specific strategies vary widely:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Some chipmakers aim to outgun the competition with sheer performance. Flagship GPUs, for example, focus on FLOPS and huge memory throughput. While memory is a critical factor in GenAI performance, this paper focuses on compute throughput bottlenecks.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;One approach that chipmakers employ to win this category is advanced packaging, connecting multiple silicon chiplets in a single heterogeneous device to increase performance density.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Even a 10% speed improvement will have a profound impact due to the immense scale. For example, training a model like Llama 3.1 405B involved 16,000 GPUs, consumed approximately 27 megawatts, and required an estimated 40 billion petaFLOPs of total compute [23]. That level of optimization can &lt;strong&gt;reduce training time by several weeks&lt;/strong&gt; and eliminate the need for thousands of GPU-days, &lt;strong&gt;translating to millions of dollars&lt;/strong&gt; in infrastructure savings.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In large-scale AI inference operations, even modest throughput enhancements can lead to significant cost reductions. For instance, OpenAI's GPT-4 processes approximately 50 billion queries annually, incurring an estimated $144 million in compute costs [24]. Implementing a 10% throughput improvement could decrease the number of required servers, resulting in an estimated $14.4 million in annual savings.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;em&gt;A dramatic increase in inference latency from 73 ms/token in OpenAI gpt-3.5-turbo to 196 ms/token in OpenAI gpt-4 [25].&lt;/em&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Throughput optimization also reduces inference latency, which is a critical factor in user experience. For example, the response time of OpenAI's GPT-4 model has been measured at approximately 196 milliseconds per generated token [25]. Enhancing throughput by 10% could proportionally &lt;strong&gt;reduce this latency&lt;/strong&gt;, leading to faster response times and improved user satisfaction.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Performance improvements typically begin with design-time architecture exploration and RTL optimization, such as pipeline depth, compute unit allocation, and dataflow design. On top of that, chipmakers apply techniques like standard Adaptive Frequency Scaling (AFS) to push efficiency under dynamic conditions in the field.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;However, these runtime methods are generally static and not workload-aware, leading to suboptimal performance in real-world deployments. Frequency scaling is also done conservatively to preserve thermal and functional stability. While these approaches help extract more performance within safe limits, they may fall short of what GenAI workloads demand.&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Power Efficiency&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;GenAI’s exponential growth in computational requirements urges chipmakers to pay closer attention to power consumption. Beyond immediate consequences, such as thermal problems, excessive wattage has severe implications for customers’ operational costs.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As a consequence, design wins increasingly revolve around &lt;strong&gt;Total Cost of Ownership&lt;/strong&gt; (TCO). This metric factors in not only the upfront hardware cost but also ongoing expenses like power, cooling, and infrastructure. Solutions that deliver more compute per watt can significantly reduce TCO and make large-scale AI deployments more sustainable.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Furthermore, reducing the power consumption of individual devices directly expands infrastructure performance. Every watt saved per chip frees up headroom within the data center’s fixed power budget, enabling higher system utilization across the fleet.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This power reduction allows operators to run more workloads, serve more users, or deploy additional systems without breaching energy limits. Improving PPW at the chip level becomes a strategic lever for maximizing performance within existing power constraints.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To explore how this dynamic plays out across real data center deployments, read &lt;a href="https://www.proteantecs.com/blog/examining-the-impact-of-chip-power-reduction-on-datacenter-economics" style="color: #474c68;"&gt;the full blog post here.&lt;/a&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;em&gt;PPW can grow by increasing performance within the power envelope or by reducing wattage without impacting FLOPS.&lt;/em&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Power efficiency is typically optimized through a combination of design-time techniques and runtime control. Clock gating, power gating, and multi-voltage domains are widely used at the architecture and implementation levels to reduce dynamic and leakage power.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;At runtime, methods like Dynamic Voltage and Frequency Scaling (DVFS) and Adaptive Voltage Scaling (AVS) are applied to adjust power consumption based on static models or basic telemetry, such as temperature or process variation. These standard techniques are not workload-aware and typically apply uniform guard bands across all chips to ensure stability across all devices and workloads.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As a result, they leave significant excess guard bands that cause unnecessary power consumption, undermining PPW. This inefficiency calls for more precise, real-time approaches that optimize power without compromising performance or reliability.&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Reliability&lt;/strong&gt;&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;A chip’s reliability at large scales is just as critical as its raw performance. DPPM measures the fraction of chips that exhibit failures post-manufacturing, directly impacting system uptime and operational costs. While semiconductor testing filters out detectable defects, latent issues stay hidden until real workloads expose them. As GenAI compute infrastructure scales to millions of deployed chips, even a low DPPM might translate to frequent failures with substantial consequences.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Furthermore, Silent Data Corruption (SDC) has emerged as a critical reliability threat to scaling GenAI training, as it corrupts computations without triggering alerts. Unlike memory bit flips, which error correction codes (ECC) can mitigate, SDCs originate from subtle timing violations, aging effects, or marginal defects that escape standard semiconductor testing.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These errors leave no trace, yet a single one can distort model weights across interdependent nodes, quietly derailing a training run that may span weeks, involve over 25,000 GPUs, and cost more than $100 million [12]. In training clusters, even a single faulty processor can jeopardize the entire job. These workloads run across tightly coupled systems, each contributing to shared model parameters. If one chip introduces a silent error during synchronization, that corruption spreads throughout the cluster.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Download White Paper: &lt;/strong&gt;&lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-outsmarting-silent-data-corruption-in-ai-processors-with-two-stage-detection" style="color: #474c68;"&gt;Outsmarting Silent Data Corruption in AI Processors with Two-Stage Detection&lt;/a&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Ensuring reliability has traditionally relied on periodic field testing to uncover potential failures. While effective for basic quality assurance, these methods may miss latent defects, workload-driven faults, accelerated aging, and SDCs. They are also time-consuming and difficult to streamline within data center environments running high-intensity GenAI workloads. The limitations of these offline techniques point to the need for continuous, in-situ monitoring to maintain reliability at hyperscale.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Despite these diverse optimization strategies, all chipmakers share a common challenge. They must set conservative operating guard bands to ensure reliability. This necessity presents an overlooked opportunity for significant optimization that can shape who wins the GenAI race.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;proteanTecs Real-Time Monitoring for Scalable GenAI Chips&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As GenAI chips reach unprecedented levels of complexity, traditional design-time assumptions and static controls are no longer enough. Standard runtime methods such as AVS, DVFS, and AFS are static and rely on conservative guard bands. These approaches waste power, limit throughput, and fail to detect real-time reliability issues.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;What chipmakers need is visibility into how each chip behaves under actual workloads. Not just design-time guard bands or environmental telemetry, but in-situ insights into timing margins, aging, and stress.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs closes this critical gap by enabling a new class of in-chip applications that optimize each chip by tuning it in real time according to actual workloads.&lt;/span&gt;&lt;/p&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;By embedding agents inside the chip, proteanTecs delivers precise monitoring of the timing margins of real performance-limiting paths, along with application stress, operational and environmental effects, aging, latent defects, and process variation. This approach uncovers insights undetectable by legacy methods. With dedicated algorithms, these insights power three breakthrough applications:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;AVS Pro&lt;/strong&gt; dramatically reduces power consumption by safely trimming excess voltage guard bands, improving Performance-Per-Watt (PPW) and lowering TCO while guaranteeing reliability.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;RTHM&lt;/strong&gt; provides continuous health tracking of the device, detecting marginal behavior before it leads to functional failures or SDC. This capability is especially crucial for billion-dollar model training runs.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;AFS Pro&lt;/strong&gt; extracts extra performance by reclaiming hidden frequency headroom, dynamically tuning each chip closer to its unique threshold for maximum throughput, while maintaining a functionality safety net.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In the GenAI era, chipmakers must strategically balance unprecedented performance, stringent power efficiency, and rock-solid reliability. The complexity of achieving these goals calls for real-time, workload-aware optimization techniques beyond conventional guard bands and static methods. As GenAI continues its rapid evolution, embedding advanced monitoring and dynamic tuning capabilities directly within chips emerges not only as a differentiator but a necessity—shaping who will ultimately lead this high-stakes technological revolution.&lt;/span&gt;&lt;/p&gt;  
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Together, these solutions turn conservative margins into a competitive advantage, allowing GenAI chipmakers and cloud operators to scale faster, safer, and smarter.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;span style="font-weight: normal;"&gt;Want to learn how these capabilities deliver up to 12.5% power reduction and 8% higher performance?&lt;/span&gt;&#x1f449; &lt;a href="http://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-scaling-genai-training-and-inference-chips-with-runtime-monitoring" style="color: #474c68; font-weight: bold;"&gt;Read the full white paper &lt;/a&gt;to see how real-time in-chip optimization is redefining what’s possible in GenAI infrastructure.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This is part 3 of a 3-part series:&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 25px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;a href="https://www.proteantecs.com/blog/genais-breakneck-pace-is-reshaping-the-semiconductor-industry" style="color: #474c68;"&gt;Click here for part 1&lt;/a&gt; - GenAI's Breakneck Pace is Reshaping the Semiconductor Industry&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Unpacks how generative AI is outpacing Moore’s Law and the semiconductor shake-up driven by its explosive rise, as generative models race toward superintelligence and chipmakers scramble to keep up.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 25px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;a href="https://www.proteantecs.com/blog/the-painful-reality-of-scaling-cloud-ai" style="color: #474c68;"&gt;Click here for part 2 &lt;/a&gt;- The Painful Reality of Scaling Cloud AI&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Delves deeper into the painful realities of scaling cloud AI infrastructure, examining the practical obstacles chipmakers face, including hardware failures and reliability issues such as Silent Data Corruption (SDC), surging power demands, and workload growth that continues to outpace Moore's Law.&lt;/span&gt;&lt;/p&gt; 
&lt;/div&gt;  
</content:encoded>
      <pubDate>Sat, 05 Jul 2025 15:00:00 GMT</pubDate>
      <guid>https://www.proteantecs.com/blog/critical-optimization-factors-for-genai-chipmakers</guid>
      <dc:date>2025-07-05T15:00:00Z</dc:date>
      <dc:creator>Admin</dc:creator>
    </item>
    <item>
      <title>The Painful Reality of Scaling Cloud AI | proteanTecs Blog</title>
      <link>https://www.proteantecs.com/blog/the-painful-reality-of-scaling-cloud-ai</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/the-painful-reality-of-scaling-cloud-ai" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/6%20-%20The%20Painful%20Reality.png" alt="The Painful Reality of Scaling Cloud AI | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;&lt;span style="color: #474c68;"&gt;GenAI Workload Demands are Growing Orders of Magnitude Faster than Transistor Density&lt;br&gt;&lt;br&gt;&lt;/span&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The shift to Generative AI (GenAI) has overwhelmed existing infrastructure, transforming previously rare issues into daily operational realities. Skyrocketing costs, intense energy consumption, and hardware failures at unprecedented scales illustrate the strain of current AI workloads. With models like GPT-4 costing tens of millions and GPT-5 projected to surpass a billion-dollar threshold, the economic and energy implications are staggering. In this section, we'll explore these critical challenges, detailing the escalating pressure on infrastructure as GenAI rapidly evolves and highlighting the urgent need for innovative solutions to scale AI sustainably and reliably.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The shift to GenAI has outpaced the infrastructure it runs on. What were once rare exceptions are now daily operations: high model complexity, non-stop inference demand, and intolerable cost structures. The numbers are no longer abstract. They’re a warning.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Training a model like GPT-4 reportedly consumed 25,000 GPUs over nearly 100 days, with costs reaching $100 million [12]. GPT-5 is expected to break the $1 billion mark [13]. Energy usage is just as daunting. Training GPT-4 drew an estimated 50 GWh, enough to power over 23,000 U.S. homes for a year [14]. Even with all that investment, reliability is fragile. A 16,384-GPU run experienced hardware failures every three hours, posing a threat to the integrity of weeks-long workloads [15].&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Projected AI power consumption grows from 8 TWh in 2024 to 652 TWh by 2030 (8,050%), driven by both training and a rapidly growing share of inference. Based on Wells Fargo data via IO Fund [16].&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Inference isn’t easier. ChatGPT now serves more than one billion queries daily, with operational costs nearing $700K per day [17]. Responses priced at just fractions of a cent each add up to an infrastructure bill that outpaces most business models. That pressure is made worse by performance gaps: users frequently report delays of more than 20 seconds for answers [18]. At this scale, even slight inefficiencies multiply into real dollars and a degraded user experience.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These are not isolated incidents. They are signs of systemic strain. Massive training runs, crushing query volumes, rising failure rates, and mounting electricity costs—this is the environment GenAI must thrive in. What's needed isn’t incremental optimization. It’s a way to reclaim control and scale effectively.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The table below outlines the core challenges behind these risks. Each is backed by hard data. Together, they show just how steep the hill has become.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Key operational challenges in cloud AI workloads.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;strong&gt;Why Moore’s Law Is No Longer Enough&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Moore’s Law predicts that the number of transistors in an IC doubles approximately every two years. The law held for decades, but recent fabrication challenges have slowed the cadence to around 2.5 years for each new node [19]. More importantly, even the original rate couldn’t keep up with GenAI's computational requirements, which double much faster than transistor density.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;It took 2.6 years to move from 5nm to 3nm, yet the reported performance gain at the same power was only about 10-15%, with 25-30% improvements in power efficiency at the same speed [20]. &lt;strong&gt;Meanwhile, GenAI workload demands are growing orders of magnitude faster.&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;Growth in transistor density versus the PFLOPS required to train AI models from a 2021 baseline. By 2024, AI compute requirements surged by 6,847%, while transistor density grew by only 183%. The 2025 value is based on the projected PFLOPS required to train GPT-5 [21].&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Still, chipmakers manage to keep up with GenAI advancements, which marks a departure from the traditional scaling model. In some cases, a chip can be 30 times faster than its predecessor, which was announced less than a year earlier [22]. Such relentless demands force chipmakers to constantly seek new ways to optimize their products.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In &lt;a href="https://www.proteantecs.com/blog/critical-optimization-factors-for-genai-chipmakers" style="color: #474c68;"&gt;Part III of this series&lt;/a&gt;, we will discuss the critical optimization factors for GenAI chipmakers. We will explore how chipmakers differentiate their products using novel architectures, packaging strategies, and optimization techniques that target performance, power efficiency, and reliability. This next installment will detail the diverse approaches and innovative solutions shaping the future of AI hardware, essential for winning in today's hyper-competitive GenAI arms race.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="line-height: 31px;"&gt;&lt;span style="color: #474c68;"&gt;This is part 2 of a 3-part blog series:&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 25px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;a href="https://www.proteantecs.com/blog/genais-breakneck-pace-is-reshaping-the-semiconductor-industry" style="color: #474c68;"&gt;Click here for part 1 &lt;/a&gt;- GenAI's Breakneck Pace is Reshaping the Semiconductor Industry&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Unpacks how generative AI is outpacing Moore’s Law and shaking up the semiconductor industry, as generative models race toward superintelligence and chipmakers scramble to keep up.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 25px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;a href="https://www.proteantecs.com/blog/critical-optimization-factors-for-genai-chipmakers" style="color: #474c68;"&gt;Click here for part 3&lt;/a&gt; - Critical Optimization Factors for GenAI Chipmakers&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Discusses the critical optimization factors for GenAI chipmakers and explores how chipmakers differentiate their products using novel architectures, packaging strategies, and optimization techniques that target performance, power efficiency, and reliability.&lt;/span&gt;&lt;/p&gt; 
&lt;/div&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/the-painful-reality-of-scaling-cloud-ai" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/6%20-%20The%20Painful%20Reality.png" alt="The Painful Reality of Scaling Cloud AI | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;&lt;span style="color: #474c68;"&gt;GenAI Workload Demands are Growing Orders of Magnitude Faster than Transistor Density&lt;br&gt;&lt;br&gt;&lt;/span&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The shift to Generative AI (GenAI) has overwhelmed existing infrastructure, transforming previously rare issues into daily operational realities. Skyrocketing costs, intense energy consumption, and hardware failures at unprecedented scales illustrate the strain of current AI workloads. With training costs for models like GPT-4 reportedly reaching $100 million and GPT-5 projected to surpass the billion-dollar threshold, the economic and energy implications are staggering. In this section, we'll explore these critical challenges, detailing the escalating pressure on infrastructure as GenAI rapidly evolves and highlighting the urgent need for innovative solutions to scale AI sustainably and reliably.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The shift to GenAI has outpaced the infrastructure it runs on. What were once rare exceptions are now daily operations: high model complexity, non-stop inference demand, and intolerable cost structures. The numbers are no longer abstract. They’re a warning.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Training a model like GPT-4 reportedly consumed 25,000 GPUs over nearly 100 days, with costs reaching $100 million [12]. GPT-5 is expected to break the $1 billion mark [13]. Energy usage is just as daunting. Training GPT-4 drew an estimated 50 GWh, enough to power over 23,000 U.S. homes for a year [14]. Even with all that investment, reliability is fragile. A 16,384-GPU run experienced hardware failures every three hours, posing a threat to the integrity of weeks-long workloads [15].&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&amp;nbsp;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Projected AI power consumption grows from 8 TWh in 2024 to 652 TWh by 2030 (8,050%), driven by both training and a rapidly growing share of inference. Based on Wells Fargo data via IO Fund [16].&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Inference isn’t easier. ChatGPT now serves more than one billion queries daily, with operational costs nearing $700K per day [17]. Responses priced at just fractions of a cent each add up to an infrastructure bill that outpaces most business models. That pressure is made worse by performance gaps: users frequently report delays of more than 20 seconds for answers [18]. At this scale, even slight inefficiencies multiply into real dollars and a degraded user experience.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These are not isolated incidents. They are signs of systemic strain. Massive training runs, crushing query volumes, rising failure rates, and mounting electricity costs—this is the environment GenAI must thrive in. What's needed isn’t incremental optimization. It’s a way to reclaim control and scale effectively.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The table below outlines the core challenges behind these risks. Each is backed by hard data. Together, they show just how steep the hill has become.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;Key operational challenges in cloud AI workloads.&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;strong&gt;Why Moore’s Law Is No Longer Enough&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Moore’s Law predicts that the number of transistors in an IC doubles approximately every two years. The law held for decades, but recent fabrication challenges have slowed the cadence to around 2.5 years for each new node [19]. More importantly, even the original rate couldn’t keep up with GenAI's computational requirements, which double much faster than transistor density.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;It took 2.6 years to move from 5nm to 3nm, yet the reported performance gain at the same power was only about 10-15%, with 25-30% improvements in power efficiency at the same speed [20]. &lt;strong&gt;Meanwhile, GenAI workload demands are growing orders of magnitude faster.&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;Growth in transistor density versus the PFLOPS required to train AI models from a 2021 baseline. By 2024, AI compute requirements surged by 6,847%, while transistor density grew by only 183%. The 2025 value is based on the projected PFLOPS required to train GPT-5 [21].&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Still, chipmakers manage to keep up with GenAI advancements, which marks a departure from the traditional scaling model. In some cases, a chip can be 30 times faster than its predecessor, which was announced less than a year earlier [22]. Such relentless demands force chipmakers to constantly seek new ways to optimize their products.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;In &lt;a href="https://www.proteantecs.com/blog/critical-optimization-factors-for-genai-chipmakers" style="color: #474c68;"&gt;Part III of this series&lt;/a&gt;, we will discuss the critical optimization factors for GenAI chipmakers. We will explore how chipmakers differentiate their products using novel architectures, packaging strategies, and optimization techniques that target performance, power efficiency, and reliability. This next installment will detail the diverse approaches and innovative solutions shaping the future of AI hardware, essential for winning in today's hyper-competitive GenAI arms race.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="line-height: 31px;"&gt;&lt;span style="color: #474c68;"&gt;This is part 2 of a 3-part blog series:&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 25px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;a href="https://www.proteantecs.com/blog/genais-breakneck-pace-is-reshaping-the-semiconductor-industry" style="color: #474c68;"&gt;Click here for part 1 &lt;/a&gt;- GenAI's Breakneck Pace is Reshaping the Semiconductor Industry&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Unpacks how generative AI is outpacing Moore’s Law and shaking up the semiconductor industry, as generative models race toward superintelligence and chipmakers scramble to keep up.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 25px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;a href="https://www.proteantecs.com/blog/critical-optimization-factors-for-genai-chipmakers" style="color: #474c68;"&gt;Click here for part 3&lt;/a&gt; - Critical Optimization Factors for GenAI Chipmakers&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Discusses the critical optimization factors for GenAI chipmakers and explores how chipmakers differentiate their products using novel architectures, packaging strategies, and optimization techniques that target performance, power efficiency, and reliability.&lt;/span&gt;&lt;/p&gt; 
&lt;/div&gt;  
&lt;img src="https://track.hubspot.com/__ptq.gif?a=7884687&amp;amp;k=14&amp;amp;r=https%3A%2F%2Fwww.proteantecs.com%2Fblog%2Fthe-painful-reality-of-scaling-cloud-ai&amp;amp;bu=https%253A%252F%252Fwww.proteantecs.com%252Fblog&amp;amp;bvt=rss" alt="" width="1" height="1" style="min-height:1px!important;width:1px!important;border-width:0!important;margin-top:0!important;margin-bottom:0!important;margin-right:0!important;margin-left:0!important;padding-top:0!important;padding-bottom:0!important;padding-right:0!important;padding-left:0!important; "&gt;</content:encoded>
      <pubDate>Mon, 30 Jun 2025 15:00:00 GMT</pubDate>
      <guid>https://www.proteantecs.com/blog/the-painful-reality-of-scaling-cloud-ai</guid>
      <dc:date>2025-06-30T15:00:00Z</dc:date>
      <dc:creator>Admin</dc:creator>
    </item>
    <item>
      <title>GenAI's Breakneck Pace is Reshaping the Semiconductor Industry | proteanTecs Blog</title>
      <link>https://www.proteantecs.com/blog/genais-breakneck-pace-is-reshaping-the-semiconductor-industry</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/genais-breakneck-pace-is-reshaping-the-semiconductor-industry" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/7%20-%20GenAIs%20Breakneck%20Pace.png" alt="GenAI's Breakneck Pace is Reshaping the Semiconductor Industry | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;&lt;span style="color: #474c68;"&gt;GenAI’s Explosive Pace Is Shattering the Semiconductor Landscape&lt;/span&gt;&lt;/h4&gt; 
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/genais-breakneck-pace-is-reshaping-the-semiconductor-industry" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/7%20-%20GenAIs%20Breakneck%20Pace.png" alt="GenAI's Breakneck Pace is Reshaping the Semiconductor Industry | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff;"&gt;&lt;span style="color: #474c68;"&gt;GenAI’s Explosive Pace Is Shattering the Semiconductor Landscape&lt;/span&gt;&lt;/h4&gt; 
&lt;p&gt;&amp;nbsp;&lt;/p&gt;  
&lt;img src="https://track.hubspot.com/__ptq.gif?a=7884687&amp;amp;k=14&amp;amp;r=https%3A%2F%2Fwww.proteantecs.com%2Fblog%2Fgenais-breakneck-pace-is-reshaping-the-semiconductor-industry&amp;amp;bu=https%253A%252F%252Fwww.proteantecs.com%252Fblog&amp;amp;bvt=rss" alt="" width="1" height="1" style="min-height:1px!important;width:1px!important;border-width:0!important;margin-top:0!important;margin-bottom:0!important;margin-right:0!important;margin-left:0!important;padding-top:0!important;padding-bottom:0!important;padding-right:0!important;padding-left:0!important; "&gt;</content:encoded>
      <pubDate>Wed, 25 Jun 2025 15:00:00 GMT</pubDate>
      <guid>https://www.proteantecs.com/blog/genais-breakneck-pace-is-reshaping-the-semiconductor-industry</guid>
      <dc:date>2025-06-25T15:00:00Z</dc:date>
      <dc:creator>Admin</dc:creator>
    </item>
    <item>
      <title>Can Your ATPG Do This? Cut Defects Escaping Detection With ML | proteanTecs Blog</title>
      <link>https://www.proteantecs.com/blog/can-your-atpg-do-this-cut-defects-escaping-detection-with-ml</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/can-your-atpg-do-this-cut-defects-escaping-detection-with-ml" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/8%20-%20Can%20Your%20ATPG%20Do%20This.png" alt="Can Your ATPG Do This? Cut Defects Escaping Detection With ML | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff; text-align: center;"&gt;&lt;span style="color: #474c68;"&gt;Identify early indicators of risk by analyzing timing margin data from within the chip.&lt;br&gt;&lt;br&gt;&lt;/span&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Chipmakers worldwide consider Automatic Test Pattern Generation (ATPG) their go-to method for achieving high test coverage in production. ATPG generates test patterns designed to detect faults in the silicon and ensures they are applied effectively using the chip’s Design-for-Test (DFT) infrastructure. This combination enhances fault detection while optimizing test efficiency.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These patterns are injected by Automatic Test Equipment (ATE) into each die during high-volume manufacturing (HVM), enabling solid quality control through large-scale testing of all chips.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;ATPG at-speed tests target different kinds of faults (e.g., transition faults, small delay faults) and have earned their spot in the semiconductor testing hall of fame. But what about their limitations? This article explores the risks posed by ATPG’s drawbacks, and the remedies for them, to help you create a robust test program that cuts defects without affecting yield.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;Understanding ATPG’s limitations and their impact&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;If you’re worried about your test patterns letting defects slip through, you’re not alone. Despite its advantages, conventional ATPG may miss small, latent, and marginal defects, and can even produce false positives and false negatives:&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Latent/marginal defects: A threat to product reliability&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;‍&lt;br&gt;&lt;/strong&gt;One of the major concerns is defects that are too subtle for the pass/fail granularity of ATPG results. The marginal performance of such chips is just enough to pass all patterns on ATE, yet they are “walking-wounded” devices.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These issues often escape detection until customers discover them in the field. For example, undetected defects that can cause Silent Data Corruption (SDC) may lead to costly post-release issues that jeopardize product reliability and customer trust. They can also cost as much as $50,000 per RMA, not counting lost reputation and the resources diverted from other projects to investigate. You can read more about such faults and their remedies in &lt;a href="https://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-outsmarting-silent-data-corruption-in-ai-processors-with-two-stage-detection" style="color: #474c68;"&gt;this whitepaper&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Misalignment between ATPG and real-life conditions&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Another inherent limitation is the potential misalignment between test patterns and real-world scenarios, raising doubts about whether ATPG truly reflects the conditions a chip will face during lifetime operation.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To compensate for this limitation, chipmakers may tighten test thresholds, but this can lead to two risks. Overly stringent testing (overkill) may generate unrealistic patterns that cause unnecessary failures at ATE, reducing yield without real benefit. On the other hand, insufficiently representative patterns (underkill) may overlook defects that could emerge under actual workloads, leading to field failures.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Striking the right balance is critical to ensuring both high yield and long-term reliability.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;What if you had high coverage timing margin data from within the chip?&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Many latent faults in the field exhibit abnormal behavior that can evolve into future timing violations. These defects often escape detection due to ATPG’s limitations in capturing subtleties. Thankfully, by analyzing timing margin data from within the chip, it’s possible to identify early indicators of risk, addressing blind spots and strengthening confidence in the test program.&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0); padding-left: 200px;"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;br&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;Parametric margin data from within the chip mitigates ATPG limitations by tackling their causes.&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The result? Imagine a robust test program that catches all those marginal issues in advance. Thanks to powerful machine learning (ML) algorithms, you could analyze high-coverage timing margin data with unprecedented visibility into every die. The ML model can be loaded onto the ATE to eliminate the blind spots of your ATPG patterns automatically.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;Timing margin visibility: Enhancing quality with ML precision&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Using proteanTecs’ Margin Agents (MA), designed to boost quality without compromising yield during structural tests, the minimum margin to operating frequency of millions of paths is measured, and critical issues are pinpointed per die. By analyzing parametric timing data, these Margin Agents tackle the inherent limitations of ATPG head-on. &lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The solution learns the normal behavior by processing margin agent readings using ML and can identify anomalies undetectable by ATPG.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The solution includes a cloud-based deep data analytics platform and edge software deployed on the ATE. It leverages advanced machine learning algorithms in the cloud to analyze timing margin measurements, training on extensive data to profile normal behavior across different operating conditions and the process distribution. The trained models are then deployed to the edge for inline decisions on the test floor. By generating highly accurate predicted timing margin values across the chip, the solution can detect subtle deviations that ATPG would miss. If the measured timing margin deviates from the predicted value, the chip is flagged as an outlier, allowing preventive action before it reaches the field.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt;  
 &lt;div style="color: rgba(0, 0, 0, 0);"&gt; 
  &lt;span style="color: #474c68;"&gt;&lt;/span&gt; 
 &lt;/div&gt;  
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;Combining on-chip agent readings with precise machine learning models deployed at the ATE.&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;‍&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The solution integrates seamlessly with your workflow:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;On-chip timing margin monitors: &lt;/strong&gt;proteanTecs Margin Agents capture real-time timing margin data from millions of logic paths, which serves as a baseline for ML model creation.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Cloud-based deep data analytics platform&lt;/strong&gt;: Processes massive datasets with ML to train a model that learns the normal behavior, enabling the detection of anomalies beyond the scope of ATPG’s pass/fail metrics.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Edge software on the ATE&lt;/strong&gt;: Automates the detection and classification of faulty dies on the ATE by combining real-time margin measurements with a trained model. This enables identification of latent defects and eliminates ATPG blind spots during high-volume manufacturing.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This&amp;nbsp;powerful combination ensures unprecedented visibility into every die, reducing DPPM, preventing costly RMAs, and driving confidence in your test program.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;Eliminating your ATPG blind spots to reduce DPPM and RMA-related costs&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs MA-based outlier detection can prevent the escapes of marginal and latent defects characteristic of complex designs and advanced nodes. Such issues might pass conventional ATPG tests as they are too subtle to detect, yet they can cause hardware failures in the field. The shift left that timing margin measurements enable directly reduces DPPM and RMA-related costs, by moving detection from the field to production testing.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As depicted below, the new data can help chipmakers make informed decisions regarding quality. A close examination of the wafer-level testing results on the left reveals a faulty die that had enough margin to pass all ATPG patterns, including at-speed patterns, yet deviates from the expected behavior. Following the detection of the outlier die, the software pinpoints the location in the chip where the problem occurred.&lt;/span&gt;&lt;/p&gt; 
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;Reducing DPPM while simplifying defect investigation: proteanTecs MA-based outlier detection uses ML to identify faulty outliers undetectable by ATPG and then pinpoints the exact location of the problem in the chip.&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Customers report a significant DPPM reduction thanks to proteanTecs MA-based outlier detection, which flags chips with timing margin issues that passed all ATPG tests. For one datacenter chipmaker, some devices passed all ATPG tests, and in fact all production tests, despite their high risk of failure, because their performance was marginal rather than unacceptable. After the chipmaker integrated proteanTecs’ solution, the same chips showed lower-than-expected timing margin measurements, leading to their disqualification. Had they gone undetected, these units would likely have suffered timing violations that could cause Silent Data Errors after some in-field usage.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;Correlating your ATPG and functional tests to reflect real-life conditions&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;During New Product Introduction (NPI), it is essential to establish a solid test program for High Volume Manufacturing (HVM), combining ATPG patterns with functional system-level tests (SLT) or even full system tests. As explained above, ATPG patterns, unlike functional tests, might not reflect real workloads, potentially hurting yield and DPPM.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To mitigate this misalignment, proteanTecs helps to correlate ATPG patterns and functional workloads by comparing their timing margin measurements, provided by the Margin Agents, on the same devices. There are two options for the alignment process depending on the comparison results:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;ATPG timing margins are worse than functional test ones:&lt;/strong&gt; In this case, the ATPG patterns may be overstressing the chip (from a performance point of view). For example, running ATPG at-speed patterns on the entire chip can cause unnatural IR drops that won’t occur in functional tests. To fix the problem, the patterns can be adjusted to reduce false fallout without compromising quality.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Functional timing margins are worse than ATPG ones:&lt;/strong&gt; This case is dangerously misleading: the chip appears healthy because it passed all test patterns. Timing margin measurements, however, reveal insufficient ATPG at-speed coverage, calling for additional test patterns that reflect actual functionality.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt;  
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;The proteanTecs solution correlates Margin Agent data from wafer-level chip probing (left bar) and system-level test (right bar) to help reflect real-life conditions in ATPG patterns.&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;For example, the Margin Agent measurements above show that wafer-level ATPG timing margins are, on average, much higher than the functional ones. These results imply that the ATPG patterns fail to reflect real workloads, potentially leading to systematic failures in the field. Once the chipmaker noticed the gap, its test engineering team extended the ATPG patterns until their margins aligned with the functional ones.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;Taking ATPG to the field&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;You can also use timing margin monitoring once the chip is in the field, beyond NPI and HVM. This aligns with the trend of running ATPG in the field at pre-defined test cycles or during SLT, an approach called “In-System Test.”&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;During In-System Test in the field, the timing margin information provided by the Margin Agents can once again show how close a device is to failure, even if it passes the test. Because the Margin Agents can also measure while the device runs real workloads, timing margin monitoring is available both during normal operation and while deterministic ATPG tests run in In-System Test cycles.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;If a malfunctioning chip is returned as an RMA, you can compare its timing margins across three measurements:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Original ATPG results during HVM&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Functional mode in the field&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Post-RMA ATPG results&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This approach can accelerate root-cause analysis, supporting test program improvements and design optimizations.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Ready to cut your DPPM and shift left defect detection? Download our exclusive &lt;a href="https://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-cut-defects_not-yield_outlier-detection-with-ml-precision" style="color: #474c68;"&gt;whitepaper&lt;/a&gt; or contact our team today at this &lt;a href="https://www.proteantecs.com/contact-us/?utm_source=atpg_blog&amp;amp;utm_campaign=stma&amp;amp;utm_medium=contact-us" style="color: #474c68;"&gt;link&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt; 
&lt;/div&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/can-your-atpg-do-this-cut-defects-escaping-detection-with-ml" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/Website/Pages/Blogs%20Images%20and%20Thumbnail/8%20-%20Can%20Your%20ATPG%20Do%20This.png" alt="Can Your ATPG Do This? Cut Defects Escaping Detection With ML | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4 style="line-height: 31px; color: #474c68; background-color: #ffffff; text-align: center;"&gt;&lt;span style="color: #474c68;"&gt;Identify early indicators of risk by analyzing timing margin data from within the chip.&lt;br&gt;&lt;br&gt;&lt;/span&gt;&lt;/h4&gt; 
&lt;div style="color: #474c68; background-color: #ffffff;"&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Chipmakers worldwide consider Automatic Test Pattern Generation (ATPG) their go-to method for achieving high test coverage in production. ATPG generates test patterns designed to detect faults in the silicon and ensures they are applied effectively using the chip’s Design-for-Test (DFT) infrastructure. This combination enhances fault detection while optimizing test efficiency.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These patterns are injected by Automatic Test Equipment (ATE) into each die during high-volume manufacturing (HVM), enabling solid quality control through large-scale testing of all chips.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;ATPG at-speed tests target different kinds of faults (e.g., transition faults, small delay faults) and have earned their spot in the semiconductor testing hall of fame. But what about their limitations? This article explores the risks of ATPG’s drawbacks and their remedies to help you create a robust test program that cuts defects without affecting yield.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px; font-weight: bold;"&gt;&lt;span style="color: #474c68;"&gt;Understanding ATPG’s limitations and their impact&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;If you’re worried about your test patterns letting defects slip through, you’re not alone. Despite its advantages, conventional ATPG may miss small, latent, and marginal defects, and can even produce false positives and false negatives:&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Latent/marginal defects: A threat to product reliability&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;A major concern is defects that are too subtle for the pass/fail granularity of ATPG results. Chips with such defects perform just well enough to pass all patterns on the ATE, yet they are “walking-wounded” devices.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;These issues often escape detection until customers discover them in the field. For example, undetected defects that can cause Silent Data Corruption (SDC) may lead to costly post-release issues that jeopardize product reliability and customer trust. They can also cost as much as $50,000 per RMA, not counting reputational damage and the resources diverted from other projects to investigate. You can read more about such faults and their remedies in &lt;a href="https://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-outsmarting-silent-data-corruption-in-ai-processors-with-two-stage-detection" style="color: #474c68;"&gt;this whitepaper&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Misalignment between ATPG and real-life conditions&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Another inherent limitation is the potential misalignment between test patterns and real-world scenarios, raising doubts about whether ATPG truly reflects the conditions a chip will face during lifetime operation.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To compensate for this limitation, chipmakers may tighten test thresholds, but this can lead to two risks. Overly stringent testing (overkill) may generate unrealistic patterns that cause unnecessary failures at ATE, reducing yield without real benefit. On the other hand, insufficiently representative patterns (underkill) may overlook defects that could emerge under actual workloads, leading to field failures.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Striking the right balance is critical to ensuring both high yield and long-term reliability.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;What if you had high coverage timing margin data from within the chip?&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Many latent faults in the field exhibit abnormal behavior that can evolve into future timing violations. These defects often escape detection due to ATPG’s limitations in capturing subtleties. Thankfully, by analyzing timing margin data from within the chip, it’s possible to identify early indicators of risk, addressing blind spots and strengthening confidence in the test program.&lt;/span&gt;&lt;/p&gt;  
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;Parametric margin data from within the chip mitigates ATPG limitations by tackling their causes.&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The result? Imagine a robust test program that catches all those marginal issues in advance. Thanks to powerful machine learning (ML) algorithms, you could analyze high-coverage timing margin data with unprecedented visibility into every die. The ML model can be loaded onto the ATE to eliminate the blind spots of your ATPG patterns automatically.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;Timing margin visibility: Enhancing quality with ML precision&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs’ Margin Agents (MA) are designed to boost quality without compromising yield. During structural tests, they measure the minimum margin to the operating frequency across millions of paths and pinpoint critical issues per die. By analyzing parametric timing data, these Margin Agents tackle the inherent limitations of ATPG head-on.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The solution uses ML to process Margin Agent readings, learn normal behavior, and identify anomalies undetectable by ATPG.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The solution includes a cloud-based deep data analytics platform and edge software deployed on the ATE. In the cloud, advanced machine learning algorithms analyze timing margin measurements, training on extensive data to profile normal behavior across different operating conditions and the process distribution. The trained models are then deployed to the edge for inline decisions on the test floor. By generating highly accurate predicted timing margin values across the chip, the solution can detect subtle deviations that ATPG would miss. If the measured timing margin deviates from the predicted value, the chip is flagged as an outlier, allowing preventive action before it reaches the field.&lt;/span&gt;&lt;/p&gt; 
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;Combining on-chip agent readings with precise Machine Learning models deployed on the ATE.&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;The solution integrates seamlessly with your workflow:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;On-chip timing margin monitors: &lt;/strong&gt;proteanTecs Margin Agents capture real-time timing margin data from millions of logic paths, which serves as a baseline for ML model creation.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Cloud-based deep data analytics platform&lt;/strong&gt;: Processes massive datasets with ML to train a model that learns the normal behavior, enabling the detection of anomalies beyond the scope of ATPG’s pass/fail metrics.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Edge software on the ATE&lt;/strong&gt;: Automates the detection and classification of faulty dies on the ATE by combining real-time margin measurements with a trained model. This enables identification of latent defects and eliminates ATPG blind spots during high-volume manufacturing.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This powerful combination ensures unprecedented visibility into every die, reducing DPPM, preventing costly RMAs, and driving confidence in your test program.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;Eliminating your ATPG blind spots to reduce DPPM and RMA-related costs&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;proteanTecs MA-based outlier detection can prevent escapes of the marginal and latent defects that are characteristic of complex designs and advanced nodes. Such defects can pass conventional ATPG tests because they are too subtle to detect, yet still cause hardware failures in the field. The shift left enabled by timing margin measurements directly reduces DPPM and RMA-related costs by moving detection from the field to production testing.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;As depicted below, the new data supports informed quality decisions. A close examination of the wafer-level test results on the left reveals a faulty outlier: a die that had enough margin to pass all ATPG patterns, including at-speed patterns, yet deviates from the expected behavior. Once the outlier die is detected, the software pinpoints the location in the chip where the problem occurred.&lt;/span&gt;&lt;/p&gt; 
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;Reducing DPPM while simplifying defect investigation: proteanTecs MA-based outlier detection uses ML to identify faulty outliers undetectable by ATPG and then pinpoints the exact location of the problem in the chip.&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Customers report a significant DPPM reduction thanks to proteanTecs MA-based outlier detection, which flags chips with timing margin issues that passed all ATPG tests. For one datacenter chipmaker, some devices passed all ATPG tests, and in fact all production tests, despite their high risk of failure, because their performance was marginal rather than unacceptable. After the chipmaker integrated proteanTecs’ solution, the same chips showed lower-than-expected timing margin measurements, leading to their disqualification. Had they gone undetected, these units would likely have suffered timing violations that could cause Silent Data Errors after some in-field usage.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;Correlating your ATPG and functional tests to reflect real-life conditions&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;During New Product Introduction (NPI), it is essential to establish a solid test program for High Volume Manufacturing (HVM), combining ATPG patterns with functional system-level tests (SLT) or even full system tests. As explained above, ATPG patterns, unlike functional tests, might not reflect real workloads, potentially hurting yield and DPPM.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;To mitigate this misalignment, proteanTecs helps to correlate ATPG patterns and functional workloads by comparing their timing margin measurements, provided by the Margin Agents, on the same devices. There are two options for the alignment process depending on the comparison results:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;ATPG timing margins are worse than functional test ones:&lt;/strong&gt; In this case, the ATPG patterns may be overstressing the chip (from a performance point of view). For example, running ATPG at-speed patterns on the entire chip can cause unnatural IR drops that won’t occur in functional tests. To fix the problem, the patterns can be adjusted to reduce false fallout without compromising quality.&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;&lt;strong&gt;Functional timing margins are worse than ATPG ones:&lt;/strong&gt; This case is dangerously misleading: the chip appears healthy because it passed all test patterns. Timing margin measurements, however, reveal insufficient ATPG at-speed coverage, calling for additional test patterns that reflect actual functionality.&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt;  
 &lt;span style="color: #474c68;"&gt;&lt;em&gt;The proteanTecs solution correlates Margin Agent data from wafer-level chip probing (left bar) and system-level test (right bar) to help reflect real-life conditions in ATPG patterns.&lt;/em&gt;&lt;/span&gt;   
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;For example, the Margin Agent measurements above show that wafer-level ATPG timing margins are, on average, much higher than the functional ones. These results imply that the ATPG patterns fail to reflect real workloads, potentially leading to systematic failures in the field. Once the chipmaker noticed the gap, its test engineering team extended the ATPG patterns until their margins aligned with the functional ones.&lt;/span&gt;&lt;/p&gt; 
 &lt;h5 style="line-height: 44px;"&gt;&lt;span style="color: #474c68;"&gt;Taking ATPG to the field&lt;/span&gt;&lt;/h5&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;You can also use timing margin monitoring once the chip is in the field, beyond NPI and HVM. This aligns with the trend of running ATPG in the field at pre-defined test cycles or during SLT, an approach called “In-System Test.”&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;During In-System Test in the field, the timing margin information provided by the Margin Agents can once again show how close a device is to failure, even if it passes the test. Because the Margin Agents can also measure while the device runs real workloads, timing margin monitoring is available both during normal operation and while deterministic ATPG tests run in In-System Test cycles.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;If a malfunctioning chip is returned as an RMA, you can compare its timing margins across three measurements:&lt;/span&gt;&lt;/p&gt; 
 &lt;ul&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Original ATPG results during HVM&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Functional mode in the field&lt;/span&gt;&lt;/li&gt; 
  &lt;li style="color: #697694;"&gt;&lt;span style="color: #474c68;"&gt;Post-RMA ATPG results&lt;/span&gt;&lt;/li&gt; 
 &lt;/ul&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;This approach can accelerate root-cause analysis, supporting test program improvements and design optimizations.&lt;/span&gt;&lt;/p&gt; 
 &lt;p style="color: #697694; line-height: 27px;"&gt;&lt;span style="color: #474c68;"&gt;Ready to cut your DPPM and shift left defect detection? Download our exclusive &lt;a href="https://proteantecs-7884687-hs-sites-com.sandbox.hs-sites.com/resources/whitepaper-cut-defects_not-yield_outlier-detection-with-ml-precision" style="color: #474c68;"&gt;whitepaper&lt;/a&gt; or contact our team today at this &lt;a href="https://www.proteantecs.com/contact-us/?utm_source=atpg_blog&amp;amp;utm_campaign=stma&amp;amp;utm_medium=contact-us" style="color: #474c68;"&gt;link&lt;/a&gt;.&lt;/span&gt;&lt;/p&gt; 
&lt;/div&gt;  
&lt;img src="https://track.hubspot.com/__ptq.gif?a=7884687&amp;amp;k=14&amp;amp;r=https%3A%2F%2Fwww.proteantecs.com%2Fblog%2Fcan-your-atpg-do-this-cut-defects-escaping-detection-with-ml&amp;amp;bu=https%253A%252F%252Fwww.proteantecs.com%252Fblog&amp;amp;bvt=rss" alt="" width="1" height="1" style="min-height:1px!important;width:1px!important;border-width:0!important;margin-top:0!important;margin-bottom:0!important;margin-right:0!important;margin-left:0!important;padding-top:0!important;padding-bottom:0!important;padding-right:0!important;padding-left:0!important; "&gt;</content:encoded>
      <pubDate>Thu, 01 May 2025 15:00:00 GMT</pubDate>
      <guid>https://www.proteantecs.com/blog/can-your-atpg-do-this-cut-defects-escaping-detection-with-ml</guid>
      <dc:date>2025-05-01T15:00:00Z</dc:date>
      <dc:creator>Admin</dc:creator>
    </item>
    <item>
      <title>By integrating with Arm SMCF, proteanTecs strengthens its offering of predictive deep data solutions that span the entire deployment of Neoverse CSS-based custom SoCs.  | proteanTecs Blog</title>
      <link>https://www.proteantecs.com/blog/expanding-the-horizon-of-system-monitoring-with-the-arm-smcf</link>
      <description>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/expanding-the-horizon-of-system-monitoring-with-the-arm-smcf" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/blog%20-%20Expanding%20the%20Horizon%20of%20System%20Monitoring%20with%20the%20Arm%20SMCF.jpg" alt="By integrating with Arm SMCF, proteanTecs strengthens its offering of predictive deep data solutions that span the entire deployment of Neoverse CSS-based custom SoCs.  | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4&gt;Rising Complexity in System Design&lt;/h4&gt; 
&lt;p&gt;In an era where system complexity is scaling rapidly, real-time monitoring and predictive analytics play a pivotal role in maintaining lifetime performance and reliability. At proteanTecs, we are committed to enabling advanced diagnostics, predictive maintenance, and on-chip actionable visibility for today’s mission-critical systems, across high-performance industries.&lt;/p&gt;</description>
      <content:encoded>&lt;div class="hs-featured-image-wrapper"&gt; 
 &lt;a href="https://www.proteantecs.com/blog/expanding-the-horizon-of-system-monitoring-with-the-arm-smcf" title="" class="hs-featured-image-link"&gt; &lt;img src="https://www.proteantecs.com/hubfs/blog%20-%20Expanding%20the%20Horizon%20of%20System%20Monitoring%20with%20the%20Arm%20SMCF.jpg" alt="By integrating with Arm SMCF, proteanTecs strengthens its offering of predictive deep data solutions that span the entire deployment of Neoverse CSS-based custom SoCs.  | proteanTecs Blog" class="hs-featured-image" style="width:auto !important; max-width:50%; float:left; margin:0 15px 15px 0;"&gt; &lt;/a&gt; 
&lt;/div&gt; 
&lt;h4&gt;Rising Complexity in System Design&lt;/h4&gt; 
&lt;p&gt;In an era where system complexity is scaling rapidly, real-time monitoring and predictive analytics play a pivotal role in maintaining lifetime performance and reliability. At proteanTecs, we are committed to enabling advanced diagnostics, predictive maintenance, and on-chip actionable visibility for today’s mission-critical systems, across high-performance industries.&lt;/p&gt;  
&lt;img src="https://track.hubspot.com/__ptq.gif?a=7884687&amp;amp;k=14&amp;amp;r=https%3A%2F%2Fwww.proteantecs.com%2Fblog%2Fexpanding-the-horizon-of-system-monitoring-with-the-arm-smcf&amp;amp;bu=https%253A%252F%252Fwww.proteantecs.com%252Fblog&amp;amp;bvt=rss" alt="" width="1" height="1" style="min-height:1px!important;width:1px!important;border-width:0!important;margin-top:0!important;margin-bottom:0!important;margin-right:0!important;margin-left:0!important;padding-top:0!important;padding-bottom:0!important;padding-right:0!important;padding-left:0!important; "&gt;</content:encoded>
      <pubDate>Mon, 20 Jan 2025 11:15:00 GMT</pubDate>
      <guid>https://www.proteantecs.com/blog/expanding-the-horizon-of-system-monitoring-with-the-arm-smcf</guid>
      <dc:date>2025-01-20T11:15:00Z</dc:date>
      <dc:creator>Admin</dc:creator>
    </item>
  </channel>
</rss>
