Extract engines into external libs #226
After a brainstorming, the split will look like this:

External lib: libtriton. Basically, the library will look like this: [diagram of libtriton's namespaces]

Triton-Pin: the pintool will extract the information used by libtriton.

Why this big step forward? In several cases, we have to perform analysis on a trace. This trace may come from anywhere (not necessarily Pin-based). For example, it may come from a database (offline analysis), from another DBI engine like DynamoRIO, or from emulators like QEMU, Bochs, and Medusa. Extracting all Triton features into an external library (independent of any trace extractor) allows the user to plug Triton in anywhere. This is also a good way to support more architectures, like ARM.
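The split described above can be sketched in miniature. Everything below is hypothetical (none of these function names belong to Triton's API); it only illustrates the idea that the analysis core consumes (address, opcode) pairs without caring which tracer produced them:

```python
# Toy sketch of the decoupling: any trace producer (DBI, emulator,
# database) yields (address, opcode_bytes) pairs, and the analysis
# core consumes them without knowing where they came from.

def pin_trace():
    # Stand-in for a Pin-recorded (online) trace
    yield (0x400000, b"\x89\xd0")   # mov eax, edx

def database_trace(rows):
    # Stand-in for an offline trace loaded from a database
    for addr, opcodes in rows:
        yield (addr, opcodes)

def process(trace):
    """Tracer-agnostic core: here it only counts instructions; the
    real library would build the IR and update its engines."""
    count = 0
    for addr, opcodes in trace:
        count += 1
    return count

print(process(pin_trace()))                                # 1
print(process(database_trace([(0x400000, b"\x89\xd0")])))  # 1
```

The point of the design is exactly this boundary: the core never imports anything tracer-specific.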
Sounds great. I don't understand the last sentence, though. Could you explain in a few words what needs to be done in order to add ARM support? Trace recording plus the translation of the trace into a format which libtriton understands? Or will Triton be able to do the ARM translation itself?
Hey,

Yes.

Nope. In fact, the new design allows us to plug in any arch. Currently, we will release only the x86 semantics, but you can easily add ARM semantics if you want. The new design looks like this:

$ tree src
src
├── api
│ └── api.cpp
├── arch
│ ├── architecture.cpp
│ ├── bitsVector.cpp
│ ├── immediateOperand.cpp
│ ├── instruction.cpp
│ ├── memoryOperand.cpp
│ ├── operandWrapper.cpp
│ ├── registerOperand.cpp
│ └── x86
│ ├── x8664Cpu.cpp
│ ├── x86Cpu.cpp
│ ├── x86Semantics.cpp
│ └── x86Specifications.cpp
├── bindings
│ └── python
│ ├── init.cpp
│ ├── modules
│ │ ├── smt2libCallbacks.cpp
│ │ └── tritonCallbacks.cpp
│ ├── namespaces
│ │ ├── initArchNamespace.cpp
│ │ ├── initCpuSizeNamespace.cpp
│ │ ├── initOperandNamespace.cpp
│ │ ├── initRegNamespace.cpp
│ │ ├── initSmtAstNodeNamespace.cpp
│ │ └── initX86OpcodesNamespace.cpp
│ ├── objects
│ │ ├── PyBitvector.cpp
│ │ ├── PyImmediate.cpp
│ │ ├── PyInstruction.cpp
│ │ ├── PyMemory.cpp
│ │ ├── PyRegister.cpp
│ │ └── PySmtAstNode.cpp
│ ├── pyXFunctions.cpp
│ └── utils.cpp
├── ctx
│ └── context.cpp
├── engines
│ ├── symbolic
│ │ ├── symbolicEngine.cpp
│ │ ├── symbolicExpression.cpp
│ │ └── symbolicVariable.cpp
│ └── taint
│ └── taintEngine.cpp
├── includes
│   └── *.hpp
├── os
│ └── unix
│ ├── syscallNumberToString.cpp
│ └── syscalls.cpp
└── smt2lib
    └── smt2lib.cpp

As you can see, everything lives under the triton namespace:

namespace triton {
  namespace arch {
    namespace x86 {
      namespace semantics {

        [...]

        void xor_s(triton::arch::Instruction& inst) {
          auto dst = inst.operands[0];
          auto src = inst.operands[1];

          auto op1 = api.buildSymbolicOperand(dst);
          auto op2 = api.buildSymbolicOperand(src);

          auto node = smt2lib::bvxor(op1, op2);

          auto expr = api.createSymbolicExpression(inst, node, dst);

          api.taintUnion(expr, dst, src);

          clearFlag(inst, ID_TMP_CF, "Clears carry flag");
          clearFlag(inst, ID_TMP_OF, "Clears overflow flag");
          pf(inst, expr, dst);
          sf(inst, expr, dst);
          zf(inst, expr, dst);
        }

        [...]

      }; /* semantics namespace */
    }; /* x86 namespace */
  }; /* arch namespace */
}; /* triton namespace */

If you want to add the ARM semantics, you have to keep the same structure as x86 (when v0.3 is released, I will write a blog post about that). Then, as you said, with the new design you can plug in any tracer to extract semantics (online, offline, or whatever). An offline example:

import sys
from triton import *
trace = [
(0x400000, "\x48\x8b\x05\xb8\x13\x00\x00"), # mov rax, QWORD PTR [rip+0x13b8]
(0x400007, "\x48\x8d\x34\xc3"), # lea rsi, [rbx+rax*8]
(0x40000b, "\x67\x48\x8D\x74\xC3\x0A"), # lea rsi, [ebx+eax*8+0xa]
(0x400011, "\x66\x0F\xD7\xD1"), # pmovmskb edx, xmm1
(0x400015, "\x89\xd0"), # mov eax, edx
(0x400017, "\x80\xf4\x99"), # xor ah, 0x99
]
if __name__ == '__main__':

    # Set the architecture
    setArchitecture(ARCH.X86_64)

    for (addr, opcodes) in trace:
        # Build an instruction
        inst = Instruction()

        # Setup opcodes
        inst.setOpcodes(opcodes)

        # Setup address
        inst.setAddress(addr)

        # Optional - update the register state at each program point
        inst.updateContext(Register(REG.RAX, 0x4444444455555555))
        inst.updateContext(Register(REG.RBX, 0x1111111122222222))

        # Optional - add a memory access <addr, size, content> at each program point
        inst.updateContext(Memory(0x66666666, 4, 0x31323334))

        # Process everything: builds the IR and updates the engines
        processing(inst)

    sys.exit(0)

As you can imagine, the new design allows you to plug in any tracer (DBI engines, emulators, databases, and so on).
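The xor_s handler shown earlier is C++, but its three steps (build a bvxor node, union the taint, clear CF and OF) can be mocked in a few lines of plain Python. This is a toy sketch with made-up helpers, not Triton's API:

```python
# Toy mock of the xor_s semantics (hypothetical helpers, not the
# Triton API): build an SMT2-Lib node and union the taint.

def bvxor(op1, op2):
    # Render the operation as an SMT2-Lib s-expression
    return "(bvxor %s %s)" % (op1, op2)

def xor_s(dst, src, taint):
    node = bvxor(dst, src)
    # Taint union: the destination becomes tainted if either operand was
    taint[dst] = taint.get(dst, False) or taint.get(src, False)
    # xor always clears the carry and overflow flags
    flags = {"CF": 0, "OF": 0}
    return node, flags

taint = {"ah": False, "imm": True}
node, flags = xor_s("ah", "imm", taint)
print(node)          # (bvxor ah imm)
print(taint["ah"])   # True
print(flags)         # {'CF': 0, 'OF': 0}
```

In the real library the node would feed the symbolic engine and the flag semantics (pf, sf, zf) would each build their own expressions; the mock only shows the shape of a per-instruction handler.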
I'll do my best to release v0.3 as soon as possible. I hope it will be available in January 2016. Cheers,
Thank you for your answer and for taking the time to add a more detailed explanation. At the bottom of the last code snippet your comment states "Builds IR". What do you mean by that? Given the new upcoming design, what do you think of the idea of supporting IR traces, i.e., a trace already translated to, e.g., REIL or VEX? Do you see any particular challenges or disadvantages? An obvious advantage would be the support of ARM and other platforms with a one-time effort.
The SMT2-Lib representation.
Disadvantages more than challenges. I'm not really convinced by REIL, VEX, BAP, or whatever. Intermediate representations are useful for generic analysis. In the case of Triton, what we want is to represent the control flow with a symbolic representation, solve constraints, and perform simplifications. In order to do that, the best representation is SMT2-Lib. Why SMT2-Lib? Because it's an international initiative aimed at facilitating research and development in Satisfiability Modulo Theories, which means that we can use any SAT/SMT solver which supports this format. If we use another IR (REIL, VEX, or BAP), the translation chain looks like this: instruction → IR → SMT2-Lib.

Obviously, it's less performant than: instruction → SMT2-Lib.

It's already a one-time effort. You should consider SMT2-Lib like any other IR, and like every IR, there is no magic: we must convert each instruction's semantics into the targeted representation :).
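To make "solve constraints" concrete: an instruction like xor ah, 0x99 from the earlier trace translates into a small SMT2-Lib formula that any SMT2-compatible solver can consume. The sketch below prints such a formula and, in place of a real solver, brute-forces the model; the target value 0x3c is invented for the example.

```python
# Sketch: the SMT2-Lib constraint "ah xor 0x99 == 0x3c" as text,
# as an SMT solver would receive it, plus a brute-force "solver"
# for illustration (the target value 0x3c is made up).

formula = """(declare-fun ah () (_ BitVec 8))
(assert (= (bvxor ah #x99) #x3c))
(check-sat)"""
print(formula)

# Brute-force model: find ah such that ah ^ 0x99 == 0x3c
model = next(x for x in range(256) if x ^ 0x99 == 0x3c)
print(hex(model))  # 0xa5
```

Because the representation is standard SMT2-Lib, the same text could be piped to Z3, CVC4, Boolector, or any other conforming solver with no per-solver translation layer.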
The development of v0.3 is now available on a new branch. Doxygen documentation is also available.