To add a new target, one can take inspiration from `libtiff`.
## Preliminary steps
One first needs to understand how to successfully compile the library. This may mean dealing with different build systems, e.g., CMake, make, or ninja. I recommend spending some time building the library on your local machine and running a minimal example (a simple `hello.cc` that links against the library).
### What is important?
You need to understand:
- how to control `CFLAGS` and `CXXFLAGS`
- how to set the compilers `CC` and `CXX`
- how to indicate the installation folder
- how to compile statically, e.g., producing a `.a` file
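As a minimal sketch of these four points, assuming an autotools-based library and an install prefix of your choice (the exact commands depend on the build system):
```bash
# Sketch for an autotools-based library; adapt to CMake/make/ninja as needed.
export CC=clang CXX=clang++                   # set the compilers
export CFLAGS="-g -O0" CXXFLAGS="-g -O0"      # control the flags
./configure --prefix="$HOME/mylib-install" \
            --enable-static --disable-shared  # static build, produces .a files
make && make install                          # .a files land in $HOME/mylib-install/lib
```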
Once you have mastered these skills (!), you should be able to create the following scripts.
## Create a new folder
Add a new folder in `./targets`:
```bash
NEWLIBRARY=<a-library>
mkdir -p ./targets/$NEWLIBRARY
```
## Scripts
We need seven scripts/files for the full pipeline:
- `analysis.sh`
- `build_library.sh`
- `fetch.sh`
- `fuzz_driver.sh`
- `generator.toml`
- `preinstall.sh`
- `public_headers.txt`
### preinstall.sh
This bash script runs an `apt install` (or similar) to install system dependencies. For instance:
```bash
#!/bin/bash
apt-get update && \
apt-get install -y git make autoconf automake libtool cmake nasm \
    zlib1g-dev liblzma-dev libjpeg-turbo8-dev wget
```
The script will be automatically invoked by `analysis.sh` and `build_library.sh`.
### fetch.sh
This downloads the source files and stores them into `$TARGET`. You must assume `$TARGET` is an absolute path. A possible example is:
```bash
#!/bin/bash
git clone --no-checkout https://gitlab.com/libtiff/libtiff.git \
    "$TARGET/repo"
git -C "$TARGET/repo" checkout c145a6c14978f73bb484c955eb9f84203efcb12e
```
If the library is not hosted on git/Mercurial/svn, you may want to fall back to good old `wget` + `tar`.
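For instance, a sketch with a hypothetical tarball URL (adapt it to the actual release archive):
```bash
#!/bin/bash
# Hypothetical fallback: fetch and unpack a release tarball into $TARGET/repo.
wget -O "$TARGET/source.tar.gz" https://example.org/mylib-1.0.tar.gz
mkdir -p "$TARGET/repo"
tar -xzf "$TARGET/source.tar.gz" -C "$TARGET/repo" --strip-components=1
```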
The script will be automatically invoked during the Docker build.
### public_headers.txt
This file indicates which library headers are meant to be included in the consumer/driver. One just needs to indicate the actual file names; subfolders are not handled yet. Example:
```
htp.h
htp_transaction.h
bstr.h
```
### analysis.sh
This contains the steps to analyze the library and produce the metadata for generating the drivers later. Make sure the build system adheres to the following setup:
- Create the working dir:
  ```bash
  WORK="$TARGET/work"
  rm -rf "$WORK"
  mkdir -p "$WORK"
  mkdir -p "$WORK/lib" "$WORK/include"
  ```
- Set `wllvm` as the compiler, along with its respective flags. The dev container (i.e., the Docker image) should already contain a custom clang in `$LLVM_DIR`. Setting `CC`/`CXX` might differ for your library.
  ```bash
  export CC=wllvm
  export CXX=wllvm++
  export LLVM_COMPILER=clang
  export LLVM_COMPILER_PATH=$LLVM_DIR/bin
  export LIBFUZZ_LOG_PATH=$WORK/apipass
  ```
- Make sure you create the folder where the analysis results are stored:
  ```bash
  mkdir -p $LIBFUZZ_LOG_PATH
  ```
- Configure the library build by setting `CFLAGS` and `CXXFLAGS` to include `-mllvm -get-api-pass -g -O0`. This step depends on the library's build system; for autotools, for instance:
  ```bash
  CC=wllvm CXX=wllvm++ \
  CFLAGS="-mllvm -get-api-pass -g -O0" \
  CXXFLAGS="-mllvm -get-api-pass -g -O0" \
  ./configure --prefix="$WORK" --enable-static
  ```
- Rule of thumb: create empty files to record the results:
  ```bash
  touch $LIBFUZZ_LOG_PATH/exported_functions.txt
  touch $LIBFUZZ_LOG_PATH/incomplete_types.txt
  touch $LIBFUZZ_LOG_PATH/apis_clang.json
  touch $LIBFUZZ_LOG_PATH/apis_llvm.json
  touch $LIBFUZZ_LOG_PATH/coerce.log
  ```
- Compile/install, e.g., `make`/`make install`.
  NOTE! Sometimes setting `CFLAGS`/`CXXFLAGS` is not sufficient to force `-O0`. For Makefile-based builds, for instance, a command like this might turn out handy:
  ```bash
  find . -name Makefile -exec sed -i 's/-O2/-O0/g' {} \;
  ```
- Extract the `.bc` files, for instance:
  ```bash
  extract-bc -b $WORK/lib/libtiffxx.a
  extract-bc -b $WORK/lib/libtiff.a
  ```
- Invoke the header analysis, e.g.:
  ```bash
  $TOOLS_DIR/tool/misc/extract_included_functions.py -i "$WORK/include" \
      -p "$LIBFUZZ/targets/${TARGET_NAME}/public_headers.txt" \
      -e "$LIBFUZZ_LOG_PATH/exported_functions.txt" \
      -t "$LIBFUZZ_LOG_PATH/incomplete_types.txt" \
      -a "$LIBFUZZ_LOG_PATH/apis_clang.json" \
      -n "$LIBFUZZ_LOG_PATH/enum_types.txt"
  ```
  It is important that the `-e`, `-t`, and `-a` flags refer to files in `$LIBFUZZ_LOG_PATH`. The `-p` option, instead, points to the `public_headers.txt` in the `$TARGET_NAME` folder.
- Run the static analyzer, e.g.:
  ```bash
  $TOOLS_DIR/condition_extractor/bin/extractor \
      $WORK/lib/libtiff.a.bc \
      -interface "$LIBFUZZ_LOG_PATH/apis_clang.json" \
      -output "$LIBFUZZ_LOG_PATH/conditions.json" \
      -minimize_api "$LIBFUZZ_LOG_PATH/apis_minimized.txt" \
      -v v0 -t json -do_indirect_jumps \
      -data_layout "$LIBFUZZ_LOG_PATH/data_layout.txt"
  ```
  Again, the analyzer takes the `.bc` file extracted before, while `-interface`, `-output`, `-minimize_api`, and `-data_layout` point to `$LIBFUZZ_LOG_PATH`. Moreover, always use `-v v0`, `-t json`, and `-do_indirect_jumps`.
#### Entry point
`./docker/run_analysis.sh` is the analysis entry point. To run the analysis:
```bash
# to run outside the docker, for local usage see (*)
cd ./docker
TARGET=libtiff ./run_analysis.sh
```
If everything works fine, the analyzer should generate a folder in `./analysis/$TARGET_NAME`, such as:
```bash
ls ./analysis/libtiff
repo  work
```
`repo` contains the cloned library source code. `work` contains the analysis results and serves as the installation folder for the library. Important folders:
```bash
ls libtiff/work/apipass  # results of the analysis
apis_clang.json  conditions.json  incomplete_types.txt
apis_llvm.json   data_layout.txt  apis_minimized.txt
coerce.log       exported_functions.txt

ls libtiff/work/lib  # .a and .bc files after installation
libtiff.a  libtiff.a.bc  libtiff.la  libtiffxx.a  libtiffxx.a.bc  libtiffxx.la  pkgconfig

ls libtiff/work/include  # header files / exposed functions
tiff.h  tiffconf.h  tiffio.h  tiffio.hxx  tiffvers.h
```
If you really want to know more about the options etc., check here.
### generator.toml
This controls how to generate drivers. Simplest approach: copy-paste from `libtiff`'s and change the folder paths. The paths must be absolute and refer to the in-docker file system.
NOTE: Remember that some parameters could be overwritten by a global configuration. In `$LIBFUZZ`, there is `overwrite.toml`:
```toml
[generator]
pool_size = 40
driver_size = 10
num_seeds = 20
```
`./tool/main.py` also checks this file and replaces the parameters accordingly. This is used to control bulk driver generation. I am sure people will get confused, so I had better write this here.
#### Entry point
After running the analysis(!), you can run the driver generation phase:
```bash
# to run outside the docker, for local usage see (*)
cd ./docker
TARGET=libtiff ./run_drivergeneration.sh
```
This will create a `workdir/$TARGET`, something like:
```bash
ls workdir/libtiff
corpus   # all the initial corpus
drivers  # all the drivers
```
If you want to know more about driver generation, check here.
### build_library.sh
This compiles and prepares the library to be fuzzed with libFuzzer. We have to compile the library twice: first with fuzzing instrumentation (`fuzzer-no-link`, ASan), and second with coverage instrumentation.
This is very similar to `analysis.sh`. The difference is in the `CFLAGS` and `CXXFLAGS`, which for the fuzzing build must be set as:
```bash
export CXXFLAGS="-fsanitize=fuzzer-no-link,address -g"
export CFLAGS="-fsanitize=fuzzer-no-link,address -g"
```
`-fsanitize=fuzzer-no-link` prepares the library to be used with libFuzzer, while `-fsanitize=address` enables ASan.
For the coverage build, instead, we need:
```bash
export CXXFLAGS="-fprofile-instr-generate -fcoverage-mapping -g"
export CFLAGS="-fprofile-instr-generate -fcoverage-mapping -g"
```
It is important to save the profile version of the library before running the fuzzing build:
```bash
mv $WORK/lib/{library_name}.a $WORK/lib/{library_name}_profile.a
```
Moreover, we do not need `wllvm` here. A simpler configuration such as the following is preferable:
```bash
export CC=$LLVM_DIR/bin/clang
export CXX=$LLVM_DIR/bin/clang++
```
The rest depends on the library's build system. I (aka Flavio) suggest copying `analysis.sh`, modifying only `CC`, `CXX`, `CFLAGS`, and `CXXFLAGS`, and removing the analysis part.
Similar to `analysis.sh`, the script assumes `$TARGET` is an absolute path in the system, and `$WORK` is the installation path for the library. The Dockerfile changes the `$WORK` directory, making it point to a different location. In practice, you will have two library installations, one for analysis and another for fuzzing. However, this should be transparent to you, meaning you should not have to care.
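Putting the pieces together, a skeleton of `build_library.sh` might look like the following. This is only a sketch assuming an autotools-based library like `libtiff`; the configure/make invocations are assumptions to adapt to your build system.
```bash
#!/bin/bash
WORK="$TARGET/work"
export CC=$LLVM_DIR/bin/clang
export CXX=$LLVM_DIR/bin/clang++
cd "$TARGET/repo"

# Pass 1: coverage build, then stash the profile version of the library.
export CFLAGS="-fprofile-instr-generate -fcoverage-mapping -g"
export CXXFLAGS="$CFLAGS"
./configure --prefix="$WORK" --enable-static
make clean && make && make install
mv "$WORK/lib/libtiff.a" "$WORK/lib/libtiff_profile.a"

# Pass 2: fuzzing build (libFuzzer hooks + ASan).
export CFLAGS="-fsanitize=fuzzer-no-link,address -g"
export CXXFLAGS="$CFLAGS"
./configure --prefix="$WORK" --enable-static
make clean && make && make install
```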
For more info, refer here.
### compile_driver.sh
This script is a simple loop that links the drivers in `./workdir/${TARGET}/drivers` against the library compiled by `build_library.sh`.
Similarly, copy `compile_driver.sh` from `libtiff` and adjust the compilation step.
For reference, I left two `[TAG]` markers in `libtiff`'s `compile_driver.sh` script that indicate the parts to modify.
Similar to building the library, we have to compile each driver twice: once for fuzzing and once for profiling (with the usual `-fprofile-instr-generate -fcoverage-mapping` flags).
The important thing is to compile the drivers against the `.a` library from `build_library.sh` (the profile driver should be compiled against the respective profile library), and to save the fuzzing binaries in the same folder as the `.cc` files and the coverage-instrumented binaries inside the `profiles` folder. A sketch of the loop is shown below.
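For illustration, a minimal sketch of such a loop, assuming the `libtiff` artifacts from above and drivers as single `.cc` files (names, paths, and exact flags are assumptions; copy the real logic from `libtiff`'s `compile_driver.sh`):
```bash
#!/bin/bash
DRIVERS="./workdir/${TARGET}/drivers"
mkdir -p "$DRIVERS/profiles"
for src in "$DRIVERS"/*.cc; do
    name=$(basename "$src" .cc)
    # Fuzzing build: libFuzzer + ASan, linked against the fuzzing library.
    $LLVM_DIR/bin/clang++ -fsanitize=fuzzer,address -g \
        -I "$WORK/include" "$src" "$WORK/lib/libtiff.a" \
        -o "$DRIVERS/$name"
    # Coverage build: keep the libFuzzer main so the binary can replay inputs,
    # and link against the profile version of the library.
    $LLVM_DIR/bin/clang++ -fsanitize=fuzzer \
        -fprofile-instr-generate -fcoverage-mapping -g \
        -I "$WORK/include" "$src" "$WORK/lib/libtiff_profile.a" \
        -o "$DRIVERS/profiles/$name"
done
```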
The rest should work transparently.
The actual fuzzing campaign is handled by the script `./targets/start_fuzz_driver.sh`, which will look up the correct `compile_driver.sh` according to the `$TARGET` set.
#### Entry point for compile_driver.sh and build_library.sh
They are used together, just run:
```bash
# to run outside the docker, for local usage see (*)
cd ./docker
TARGET=libtiff TIMEOUT=1m DRIVER=driver8 ./run_fuzzing.sh
```
- `TARGET` -- the library to fuzz
- `TIMEOUT` -- campaign length; accepts minutes [m], seconds [s], hours [h]
- `DRIVER` -- the actual driver to fuzz; if omitted, compile/fuzz all drivers (equivalent to `DRIVER=*`)
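For example, to compile and fuzz all the drivers for two hours, simply omit `DRIVER`:
```bash
cd ./docker
TARGET=libtiff TIMEOUT=2h ./run_fuzzing.sh   # DRIVER omitted: all drivers
```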
The script additionally creates a `crashes` folder for the, guess what, crashes! The initial corpus is also copied into a new folder, `corpus_new`, to separate generated and initial seeds.
For more info, refer here.
## (*) Debug Locally