GoldenFuzz: Generative Golden Reference Hardware Fuzzing

Dec 25, 2025 · 10:32
Cryptography and Security

Abstract

Modern hardware systems, driven by demands for high performance and application-specific functionality, have grown increasingly complex, introducing large surfaces for bugs and security-critical vulnerabilities. Fuzzing has emerged as a scalable solution for discovering such flaws. Yet, existing hardware fuzzers suffer from limited semantic awareness, inefficient test refinement, and high computational overhead due to reliance on slow device simulation. In this paper, we present GoldenFuzz, a novel two-stage hardware fuzzing framework that partially decouples test case refinement from coverage and vulnerability exploration. GoldenFuzz leverages a fast, ISA-compliant Golden Reference Model (GRM) as a "digital twin" of the Device Under Test (DUT). It fuzzes the GRM first, enabling rapid, low-cost test case refinement and accelerating deep architectural exploration and vulnerability discovery on the DUT. During the fuzzing pipeline, GoldenFuzz iteratively constructs test cases by concatenating carefully chosen instruction blocks that balance inter- and intra-instruction quality. A feedback-driven mechanism leveraging insights from both high- and low-coverage samples further enhances GoldenFuzz's capability in hardware state exploration. Our evaluation of three RISC-V processors, RocketChip, BOOM, and CVA6, demonstrates that GoldenFuzz significantly outperforms existing fuzzers in achieving the highest coverage with minimal test case length and computational overhead. GoldenFuzz uncovers all known vulnerabilities and discovers five new ones, four of which are classified as highly severe with CVSS v3 severity scores exceeding seven out of ten. It also identifies two previously unknown vulnerabilities in the commercial BA51-H core extension.

Summary

The paper "GoldenFuzz: Generative Golden Reference Hardware Fuzzing" addresses the challenge of efficiently discovering bugs and security vulnerabilities in complex hardware systems. Existing hardware fuzzers struggle with limited semantic awareness, inefficient test refinement, and high computational overhead because they rely on slow device simulation. The authors introduce GoldenFuzz, a two-stage hardware fuzzing framework that partially decouples test case refinement from coverage and vulnerability exploration. It does so by using a fast, ISA-compliant Golden Reference Model (GRM) as a "digital twin" of the Device Under Test (DUT).

In the first stage, GoldenFuzz fuzzes the GRM, which allows rapid, low-cost test case refinement. A language model generates instruction blocks, and test cases are built by concatenating carefully chosen blocks that balance inter- and intra-instruction quality. A feedback-driven mechanism that draws on both high- and low-coverage samples further improves hardware state exploration. The optimized fuzzing policy is then transferred to the second stage, DUT fuzzing, for vulnerability detection. Because refinement runs on the GRM rather than on the slow DUT simulation, the hardware state space is explored faster than with fuzzers that interact with the DUT directly.

The evaluation on three RISC-V processors (RocketChip, BOOM, and CVA6) shows that GoldenFuzz significantly outperforms existing fuzzers, achieving higher coverage with minimal test case length and computational overhead. GoldenFuzz uncovers all known vulnerabilities in the tested processors and discovers five new ones, four of which are classified as highly severe (CVSS v3 score above 7). It also identifies two previously unknown vulnerabilities in the commercial BA51-H core extension. These findings highlight the practical significance of GoldenFuzz for hardware security verification.
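
The two-stage flow described in this summary can be illustrated with a small toy. This is only a sketch under loud assumptions: the four-opcode accumulator "ISA", the planted bug in `dut_step`, the weight-based "policy", and every function name below are hypothetical stand-ins, not the paper's language model, GRM, or processors. The point is the shape of the pipeline: refine the generator against the cheap reference model first, then reuse it against the device and flag any divergence from the reference.

```python
import random

OPCODES = ["add", "sub", "xor", "shl"]  # hypothetical toy ISA, not RISC-V

def grm_step(acc, instr):
    """Reference semantics: stand-in for a golden reference model."""
    op, x = instr
    if op == "add":
        return (acc + x) & 0xFF
    if op == "sub":
        return (acc - x) & 0xFF
    if op == "xor":
        return acc ^ x
    return (acc << (x & 7)) & 0xFF  # "shl"

def dut_step(acc, instr):
    """Toy device model with a planted bug: a shift by 7 returns 0."""
    op, x = instr
    if op == "shl" and (x & 7) == 7:
        return 0
    return grm_step(acc, instr)

def run(step, test):
    """Execute a test; return the state trace and toy coverage points."""
    acc, trace, cov = 0, [], set()
    for instr in test:
        acc = step(acc, instr)
        trace.append(acc)
        cov.add((instr[0], acc >> 6))  # (opcode, coarse state bucket)
    return trace, cov

def sample_test(weights, rng, length=8):
    """Draw opcodes according to the current policy weights."""
    ops = rng.choices(OPCODES, weights=[weights[o] for o in OPCODES], k=length)
    return [(op, rng.randrange(256)) for op in ops]

def refine_on_grm(rng, rounds=200):
    """Stage 1: adjust the policy using only the fast reference model."""
    weights = {op: 1.0 for op in OPCODES}
    seen = set()
    for _ in range(rounds):
        _, cov = run(grm_step, sample_test(weights, rng))
        for op, _bucket in cov - seen:
            weights[op] += 0.5  # reward opcodes that reached new coverage
        seen |= cov
    return weights, seen

def fuzz_dut(weights, rng, rounds=200):
    """Stage 2: reuse the policy on the device; keep diverging tests."""
    mismatches = []
    for _ in range(rounds):
        test = sample_test(weights, rng)
        if run(grm_step, test)[0] != run(dut_step, test)[0]:
            mismatches.append(test)
    return mismatches

if __name__ == "__main__":
    rng = random.Random(0)
    weights, seen = refine_on_grm(rng)
    found = fuzz_dut(weights, rng)
    print(len(seen), "coverage points;", len(found), "mismatching tests")
```

In this toy the expensive part (the device) is only touched in `fuzz_dut`, which mirrors the motivation given above: refinement cost is paid on the fast model.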

Key Insights

  • GoldenFuzz partially decouples test case refinement from coverage and vulnerability exploration by fuzzing a fast GRM first, which improves fuzzing efficiency.
  • The framework employs a customized language model to generate semantically aware instruction sequences, leading to more effective test cases.
  • GoldenFuzz uses a block-wise test case generation scheme that iteratively concatenates carefully chosen instruction blocks.
  • A dual-layer scoring system (intra- and inter-test case scoring) is used in the DUT fuzzing stage to incentivize the exploration of new coverage points while penalizing redundant coverage.
  • The paper demonstrates that GoldenFuzz achieves higher hardware coverage than state-of-the-art fuzzers with shorter test cases and reduced computational overhead. For example, GoldenFuzz achieves coverage comparable to that of other fuzzers with test cases of only 30 instructions, while using fewer than 1% of the number of test cases.
  • GoldenFuzz discovers five new vulnerabilities in the tested cores (RocketChip, BOOM, and CVA6), four of which are rated highly severe, with CVSS v3 scores above 7.
  • A limitation is the manual effort required for vulnerability confirmation and analysis, although the authors propose a filtering approach to mitigate this.
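
The block-wise generation and dual-layer scoring ideas in the list above can be sketched as a greedy loop. Everything here is an assumption made for illustration: `toy_coverage` fakes simulator coverage from mnemonics, the `penalty` value is arbitrary, and the greedy selection replaces whatever the paper's language model and scoring formulas actually do. The sketch only shows the principle of rewarding new coverage points and penalizing redundant ones at two levels, within one test case and across the corpus.

```python
from typing import List, Sequence, Set, Tuple

Block = Tuple[str, ...]  # a short run of assembly instructions

def toy_coverage(block: Block) -> Set[str]:
    """Hypothetical coverage: mnemonics plus adjacent mnemonic pairs."""
    mnems = [ins.split()[0] for ins in block]
    pairs = [f"{a}>{b}" for a, b in zip(mnems, mnems[1:])]
    return set(mnems) | set(pairs)

def intra_score(block_cov: Set[str], test_cov: Set[str],
                penalty: float = 0.5) -> float:
    """Score a block against what the current test case already covers."""
    return len(block_cov - test_cov) - penalty * len(block_cov & test_cov)

def inter_score(test_cov: Set[str], corpus_cov: Set[str],
                penalty: float = 0.5) -> float:
    """Score a whole test case against what earlier test cases covered."""
    return len(test_cov - corpus_cov) - penalty * len(test_cov & corpus_cov)

def build_test(candidates: Sequence[Block],
               n_blocks: int) -> Tuple[List[str], Set[str]]:
    """Greedily concatenate the best-scoring block, n_blocks times."""
    test: List[str] = []
    test_cov: Set[str] = set()
    for _ in range(n_blocks):
        best = max(candidates,
                   key=lambda b: intra_score(toy_coverage(b), test_cov))
        test.extend(best)
        test_cov |= toy_coverage(best)
    return test, test_cov

if __name__ == "__main__":
    blocks = [
        ("addi x1, x0, 1", "add x2, x1, x1"),
        ("lw x3, 0(x2)", "sw x3, 4(x2)"),
        ("addi x1, x0, 1", "addi x1, x1, 1"),
    ]
    test, cov = build_test(blocks, n_blocks=2)
    print(test)
    print(inter_score(cov, corpus_cov={"addi", "lw"}))
```

Note that this toy ignores coverage created at the boundary between two concatenated blocks; a real setup would measure coverage by executing the whole test case.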

Practical Implications

  • GoldenFuzz can be used by hardware designers and verification engineers to improve the security and reliability of their designs.
  • The two-stage fuzzing approach can be adapted to other hardware architectures and instruction sets.
  • The language model-based test case generation technique can be further refined to incorporate more domain-specific knowledge and constraints.
  • The filtering approach for mismatch analysis can be extended to incorporate more sophisticated pattern recognition techniques.
  • Future research directions include exploring the use of more advanced language models and reinforcement learning techniques to further optimize the fuzzing process.
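
As one way to picture the mismatch filtering mentioned above, the sketch below groups GRM/DUT mismatches by a coarse signature so that an analyst reviews one representative per group instead of every report. It is not the authors' filter: the `Mismatch` record, the `(mnemonic, field)` signature, and the `known_benign` set are all hypothetical, and the example entries are invented purely to exercise the code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple

Signature = Tuple[str, str]

@dataclass(frozen=True)
class Mismatch:
    test_id: int
    instr: str      # instruction at the first divergence
    field: str      # which architectural field differed, e.g. "rd" or "csr"
    grm_value: int
    dut_value: int

def signature(m: Mismatch) -> Signature:
    """Coarse grouping key: mnemonic plus the field that diverged."""
    return (m.instr.split()[0], m.field)

def filter_mismatches(
    mismatches: Iterable[Mismatch],
    known_benign: FrozenSet[Signature] = frozenset(),
) -> Dict[Signature, Tuple[Mismatch, int]]:
    """Drop known-benign signatures; keep (first example, count) per group."""
    groups = defaultdict(list)
    for m in mismatches:
        sig = signature(m)
        if sig not in known_benign:
            groups[sig].append(m)
    return {sig: (ms[0], len(ms)) for sig, ms in groups.items()}

if __name__ == "__main__":
    # Invented example reports, for illustration only.
    reports = [
        Mismatch(1, "csrrw x1, mstatus, x2", "csr", 0x8, 0x0),
        Mismatch(2, "csrrw x5, mstatus, x6", "csr", 0x88, 0x80),
        Mismatch(3, "rdcycle x1", "rd", 10, 12),
        Mismatch(4, "div x1, x2, x0", "rd", 0xFFFFFFFF, 0),
    ]
    # Assumption: cycle-counter differences are treated as benign here.
    benign = frozenset({("rdcycle", "rd")})
    for sig, (example, count) in filter_mismatches(reports, benign).items():
        print(sig, "x", count, "first seen in test", example.test_id)
```

A more sophisticated pattern recognizer, as the list above suggests, would replace `signature` with something richer than a mnemonic and a field name.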

Authors