PAPA is a minimalistic CAPA-like tool implemented in C++ that analyzes Windows Portable Executable (PE) files for specific capabilities. By disassembling the binary and scanning for strings, Windows API calls, and opcode patterns, the tool identifies indicators of various capabilities. It leverages a custom rule engine that uses a simple domain-specific language (DSL) to query disassembled code, making it an effective and lightweight alternative to more heavyweight analysis frameworks.
PAPA uses LIEF to parse Windows PE files and extract important sections, and Capstone to disassemble code into functions and basic blocks. A custom rule engine is integrated, which:
- Loads rule files from a specified directory.
- Parses rules written in a simple DSL supporting logical expressions (using AND/OR operators), regex with wildcards, case-insensitivity, and even base64 searches.
- Evaluates each rule against the disassembled code at three different scopes: the entire file, individual functions, or individual basic blocks.
This design allows the tool to quickly and efficiently identify areas of interest in the binary.
- Speed: By focusing on disassembled code and leveraging efficient in-memory string and regex matching, this tool is much faster than the original CAPA tool.
- Lightweight: The project avoids heavy dependencies for rule parsing by using a custom DSL, and relies only on well-established libraries (LIEF and Capstone) for PE parsing and disassembly.
- Flexibility: Users can write custom rules to search at different scopes (file, function, basic block) and employ advanced matching techniques (regex with wildcards, case-insensitivity, and base64 decoding).
- Ease of Use: The rule engine is designed to be user-friendly. New rules can be added simply by dropping text files into the rules directory without needing to recompile the tool.
Ensure you have a C++17-compliant compiler installed along with the following dependencies:
- LIEF: For parsing and analyzing PE files.
- Capstone: For disassembling binary code.
The tool is run from the command line with the following syntax: PAPA.exe {PE file path} {rule directory} [--scan-only]
- PE file path: Path to the Windows PE file you want to analyze.
- rule directory: Directory containing the rule files.
- --scan-only: Optional flag to run in scan-only mode, which outputs only the names of matched rules instead of printing the full disassembly.
Each rule file uses a simple DSL with the following format:
name: <rule name>
scope: <scope>
condition: (<expression>)
- name: A descriptive identifier for the rule.
- scope: Defines the level at which the rule is applied. Valid values are:
  - file: The rule is evaluated against the entire disassembled output.
  - function: The rule is evaluated per function.
  - basicblock: The rule is evaluated on each basic block.
- condition: A logical expression that combines conditions with the operators AND and OR. Conditions can target:
  - API calls: e.g., api:"<pattern>"
  - Strings: e.g., string:"<pattern>"
  - Opcodes: e.g., opcode:"<pattern>"
- Regex and Wildcards: Enclose the pattern in forward slashes to enable regex matching. For instance, api:"/MessageBoxA/i" matches the API call MessageBoxA case-insensitively. Wildcards can be expressed with standard regex constructs (e.g., /Hello.*/).
- Base64 Search: Prefix the pattern with b64: to enable base64 matching. For example, string:"b64:/Hello World/i" instructs the tool to scan for base64-encoded substrings, decode them, and then perform a case-insensitive regex search for "Hello World".
Example rules:
- Detect MessageBoxA with "Hello World" in the same basic block:
  name: Detect MessageBoxA with Hello world
  scope: basicblock
  condition: (api:"/MessageBoxA/i" AND string:"/Hello World/i")
- Detect a function-level capability to capture screenshots:
  name: capture screenshot
  scope: function
  condition: ((api:"GetWindowDC" OR api:"GetDC" OR api:"CreateDC") AND (api:"BitBlt" OR api:"GetDIBits")) OR (api:"System.Drawing.Graphics::CopyFromScreen")
PAPA includes over 600 official rules, adapted from the official CAPA rules to support PAPA's DSL syntax. Currently, only rules that do not use CAPA's "match", "section", "offset", and "number" features are supported; support for these features may be added in the future.
Planned improvements:
- Support Disassembly of Multiple Sections: Disassemble all sections that have execute permissions.
- Extend Regex & Encoding Support: Extend the current base64 matching to additional encoding formats (e.g., base32, hex), and add full regex support for more robust detection.
- Matching Other Rules as a Condition: Allow a rule to require that other rules match, as CAPA does.
- Export Results to JSON: Add an option to output scan results in JSON format.
- Add More Condition Scopes: Support exports, section names, mnemonics, offsets, etc.
- Support CAPA Rule Syntax: Parse CAPA-compatible rule syntax, making it possible to use existing CAPA rules.