[Read Paper USENIX 2023] DAFL: Directed Grey-box Fuzzing Guided by Data Dependency

Background#

The method of Fuzzing is a way to find errors in the tested program by generating a large amount of input.

One of the key points of Fuzzing is the input generation method, the second is the input sorting method, and the third is the acquisition and application of internal software information.

If the input is completely self-constructed, this method is called generational fuzzing.

If the input is obtained by slightly modifying the original seed, this type of fuzzing is called Mutational fuzzing.

If the Mutator can be guided by certain program information, it is called GrayBox Fuzzing, such as coverage-guided fuzz testing.

If the Seed is allocated energy through Power Schedule and then the Mutator is sorted based on the energy of the Seed, this method is called Boosted GrayBox Fuzzing.

What is grey-box fuzzing#

White-box largely utilizes source code for constraint solving, program analysis, and other operations.

Black-box does not obtain any information about the source code.

Gray-box, on the other hand, obtains partial information about the program, commonly the program's execution coverage. The most common method is binary instrumentation (inserting custom recording code after each branch at compile time). For example, AFL is a gray-box fuzzer.

What is DGF (directed grey-box fuzzing)#

Why is there DGF#

Most gray-box fuzz testing uses seed coverage as guidance to cover as many paths as possible within a limited time. The main reason for doing this is that code coverage and bug coverage are positively correlated. When a fuzz testing tool can cover more code, it often finds more bugs.

However, not all code in the program should be treated equally or with equal weight, as the vast majority of code covered by the fuzz testing tool does not contain bugs. Moreover, in practical applications, it is very difficult or even impossible to achieve complete code coverage.

To overcome this problem, directed fuzz testing is proposed.

How is it different from traditional gray-box fuzzing#

Compared with traditional gray-box fuzz testing:

Different goals: Traditional gray-box fuzz testing aims to improve code coverage, while DGF tends to test specific code block bugs.
Different techniques: DGF generally requires data flow and control flow graphs to calculate the distance between seeds and target code segments.

The problem that DGF ultimately solves#

Use the distance between the seed and the target as the fitness function to help select the seed.

Seeds that are closer to the target location have more opportunities for mutation, guiding gray-box fuzz testing to reach the target location.

The entire directed gray-box fuzz testing converts the previous reachability problem into an optimization problem.

That is, optimizing the distance between the generated seeds and the target location.

After understanding the entire process of directed gray-box fuzz testing, it can be found that the key to the entire problem is how to define the distance between the seed and the target location for optimization.

Background#

DGF is currently mainly based on two key mechanisms:

DGF can support the evolution of test cases guided by coverage guidance in undirected gray-box fuzzing.
DGF measures the distance between execution nodes and target nodes in the CFG (control flow graph) to prioritize and sort each test case, providing guidance for the fuzzer to reach the target program point.
There are two scalability issues with the current DFG:
Traditional coverage feedback does not always provide meaningful guidance to find target program points (segments that may cause crashes).
When the program has complex control structures, the existing seed distance algorithms perform poorly.

Regarding coverage guidance, sometimes the direction of coverage guidance deviates from the target program point due to the large size of the program. Beacon was the first to attempt to solve this problem and used a weakest precondition to determine whether the target program point would be reached. However, this method is ineffective when the program has complex loops.

When the program is large, the length of the execution node chain will be long and will contain many nodes that are semantically unrelated to the target program point. WindRanger is the first to use Deviation Basic Blocks (DBBs) to solve the deviation problem, but it is not a good form in loop structures.

Motivation#

Simply put, it is an array out-of-bounds error in C language. The user can input an arbitrarily sized integer as an index, and a vulnerable function does not validate the input index.

There is only one path to reach this vulnerable function, but from the main function, there are many branching paths along the way, which greatly disperses the energy allocation of test cases and leads to misdirection.

By comparing the seed distance calculation of AFLGo and WindRanger, it is found that the results are the same. When dealing with complex loop structures, the distance measurement based on DBBs will also mislead.

Solution#

Selective coverage instrumentation#

In order to reach the specified code point t of program P, a Def-Use Graph is constructed backward at t to collect all dependent statements and use Thin Slicing.

About Thin Slicing: Dereference pointers (remove redundant variables caused by pointers) to reduce the size of the slice (understood as a backward data flow graph of data dependencies), thereby providing more efficient guidance for directed fuzzing.

Why use DUG? Because DUG can automatically ignore many irrelevant nodes, leaving only nodes with data dependencies. Moreover, even if the corresponding CFG has a loop, DUG does not have a loop (as long as there is no loop data dependency).

Semantic relevance scoring#

The farthest distance from the target node to the DUG is recorded as 1, and each time a node is approached, the shortest distance from all subsequent nodes to the target node is increased by 1. In this way, the sum of the scores of the execution node paths is the semantic relevance score.

To achieve the effect: If a seed executes more nodes or nodes closer to the target program point, the comprehensive score will be higher (as shown in the 3c graph).

Exp#

Five fuzzers, AFL, AFLGo, Beacon, WindRanger, and DAFL, were used to fuzz 41 vulnerabilities in the Beacon paper, and the metric was TTE (Time-to-Exposure), which is the time it takes to fuzz the first crash in 24 hours.

But I don't know why Beacon is not in the figure.

The result is that DFL is at least 1.93 times faster on average than the other fuzzers.

Subsequent ablation experiments were also conducted:

Thin slicing can make it 2.01 times faster on average.

Selective coverage instrumentation can make it 1.73 times faster on average (compared to AFL).

Semantic relevance scoring can make it 1.39 times faster on average (compared to AFL).

The combination of these two techniques can achieve 1.93 times faster on average (compared to AFL).

Discussion#

Discusses the support of DAFL for multiple targets, that is, when there are multiple target program points on the execution node chain, the average distance to them is taken as the relevance score.

Also discussed the issues of supporting other languages such as C/C++, Go, etc.

Conclusion#

Due to the misleading problem of the existing DGF distance calculation, DUG replaces CFG, and data flow information replaces control flow information, so that irrelevant nodes do not exist, thereby not misleading the fuzzer's direction.

Using thin slicing for pruning, DUG becomes more concise, improving the efficiency of the fuzzer.

Reference Links#

Summary of Fuzzing Techniques (Brief Surveys on Fuzz Testing)
https://zhuanlan.zhihu.com/p/43432370
Summary of Directed Greybox Fuzzing Related Content
https://zhuanlan.zhihu.com/p/415316061
Hawkeye: Directed Greybox Fuzzing Technology
https://zhuanlan.zhihu.com/p/130767857
Get Enlightened about Your Fuzz Testing - Overview of Directed Greybox Fuzzing
https://www.cnblogs.com/welkinchan/p/17685797.html
[Study Notes] Fuzzing Study Notes 3 - Gray Box Fuzzing
https://superlova.github.io/2020/08/20/%E3%80%90%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0%E3%80%91Fuzzing%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B03%E2%80%94%E2%80%94%E7%81%B0%E7%9B%92Fuzzing