Emulating File I/O for In-Memory Fuzzing

October 12, 2020

By: Christopher Vella

One problem I’ve encountered during fuzzing is how best to fuzz an application that performs multiple reads on an input file, while staying performant (e.g. in-memory, without actually touching disk). For example, say an application takes an input file path from the user and parses that file. If the application loads the entire file into a single buffer before parsing, fuzzing in-memory is simple: we modify the buffer in memory and resume. However, if the target performs multiple reads on the file from disk, how can we fuzz it performantly?

Of course, if we fuzz by replacing the file on disk for each fuzz case, we can fuzz such a target. But for performance, if we’re fuzzing entirely in memory (or using a snapshot fuzzer that doesn’t support disk-based I/O), we need to ensure that each read the target performs on our input does not actually touch disk, and is instead served from memory.

The method I decided to implement for my fuzzing was to hook the file I/O operations (e.g. ReadFile) and implement custom handlers for these functions that redirect reads to memory instead of disk. This has multiple benefits:

  1. We eliminate syscalls: many file operations result in syscalls, but my custom handlers make none, so we avoid context switches into the kernel and get better performance.
  2. We still track the individual file operations, but they all operate on a memory-mapped copy of our input file. This means we can mutate the entire memory-mapped file once and guarantee that every ReadFile call sees the mutated copy.

The normal operation of reading a file (without using my hooks) is:

  1. CreateFile is called on a file target
  2. ReadFile is called on the target to read into a buffer (resulting in syscalls and disk I/O)
  3. Process parses the buffer
  4. ReadFile is called again to read more from the file on disk
  5. Process continues to parse the buffer

Process Reading from Disk without Hooks

With our hooks, the operations instead look like:

  1. CreateFile is called on a file target (our hook maps the entire target file into memory, once)
  2. ReadFile is called on the target to read into a buffer (our hook redirects this to our custom ReadFile implementation, which serves the request from our in-memory copy of the file, with no syscalls or disk I/O)
  3. Process parses the buffer
  4. ReadFile is called again to read more from the file (served from memory again, just like the first ReadFile)
  5. Process continues to parse the buffer

Process Reading a File with our Hooks (In-Memory)
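To make the hooked flow concrete, here is a minimal Python model of the state a CreateFile/ReadFile hook pair might keep. It is a sketch under stated assumptions, not the FileHook code: real hooks would be native functions matching the Win32 signatures, and the class name, the fake-handle scheme and the `(ok, data)` return convention are hypothetical simplifications. It shows the two properties described above: the file is loaded into memory once, and every later read (including reads after a single mutation of that copy) is served from memory with a per-handle file pointer.

```python
import os
import tempfile

# Hypothetical base for fake handle values; a real hook must pick values that
# cannot collide with genuine kernel handles.
FAKE_HANDLE_BASE = 0x7F000000


class InMemoryFileTable:
    """Hypothetical state shared by emulated CreateFile/ReadFile/CloseHandle."""

    def __init__(self):
        self._files = {}  # fake handle -> [in-memory buffer, file pointer]
        self._next = FAKE_HANDLE_BASE

    def create_file(self, path):
        # Emulated CreateFile: load the whole file once, return a fake handle.
        with open(path, "rb") as f:
            data = bytearray(f.read())
        handle = self._next
        self._next += 4
        self._files[handle] = [data, 0]
        return handle

    def buffer(self, handle):
        # Expose the in-memory copy so a fuzzer can mutate it once per case.
        return self._files[handle][0]

    def read_file(self, handle, n):
        # Emulated ReadFile: copy up to n bytes from the current file pointer.
        entry = self._files.get(handle)
        if entry is None:
            # Not a handle we own; a real hook would forward to the original API.
            return False, b""
        data, pos = entry
        chunk = bytes(data[pos:pos + n])
        entry[1] = pos + len(chunk)
        # Like a synchronous ReadFile at end of file, this succeeds with 0 bytes.
        return True, chunk

    def close_handle(self, handle):
        return self._files.pop(handle, None) is not None


# Usage: a "target" that performs several reads on the same input file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HDRpayload")

table = InMemoryFileTable()
h = table.create_file(tmp.name)      # file is read from disk exactly once here
table.buffer(h)[0] = ord("X")        # mutate the in-memory copy once
ok1, header = table.read_file(h, 3)  # first read, served from memory
ok2, body = table.read_file(h, 100)  # second read: short read up to EOF
ok3, tail = table.read_file(h, 1)    # at EOF: success with zero bytes
table.close_handle(h)
os.remove(tmp.name)
```

Because both reads come from the same buffer, the single mutation is visible to every subsequent read, which is what makes per-case mutation cheap.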

This greatly simplifies mutation and eliminates syscalls for the file I/O operations.

The implementation wasn’t complex: MSDN documents how these APIs behave well enough that we can emulate them, and I wrote a test suite alongside to verify the accuracy of the emulation.
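Accurate emulation means matching the documented edge cases, not just the happy path. Per MSDN, a synchronous ReadFile that hits end of file returns TRUE with zero bytes read, SetFilePointer fails without moving the pointer if the new position would be negative, and moving the pointer past the end of the file is not an error. One possible way to test this (an assumption, not necessarily how the original test suite works) is differential testing: replay the same random seek/read operations against the emulation and against a real file, and compare. In this sketch, Python's `os.lseek`/`os.read` stand in as the reference oracle instead of the real Win32 calls, and `EmulatedFile` and `differential_test` are hypothetical names.

```python
import os
import random
import tempfile

# Move methods; 0/1/2 mirror Win32 FILE_BEGIN/FILE_CURRENT/FILE_END.
FILE_BEGIN, FILE_CURRENT, FILE_END = 0, 1, 2
TO_OS_WHENCE = {FILE_BEGIN: os.SEEK_SET, FILE_CURRENT: os.SEEK_CUR, FILE_END: os.SEEK_END}


class EmulatedFile:
    """Hypothetical in-memory file with SetFilePointer/ReadFile-like semantics."""

    def __init__(self, data):
        self.data = bytes(data)
        self.pos = 0

    def set_file_pointer(self, distance, method):
        base = {FILE_BEGIN: 0, FILE_CURRENT: self.pos, FILE_END: len(self.data)}[method]
        new_pos = base + distance
        if new_pos < 0:
            return None  # fails (ERROR_NEGATIVE_SEEK); pointer is not moved
        self.pos = new_pos  # positions past EOF are allowed
        return new_pos

    def read_file(self, n):
        chunk = self.data[self.pos:self.pos + n]  # empty at or past EOF
        self.pos += len(chunk)
        return chunk


def differential_test(data, ops=1000, seed=0):
    """Replay random seek/read ops on the emulation and a real file; return mismatches."""
    rng = random.Random(seed)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    fd = os.open(tmp.name, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    emu = EmulatedFile(data)
    mismatches = []
    try:
        for i in range(ops):
            if rng.random() < 0.5:
                method = rng.choice([FILE_BEGIN, FILE_CURRENT, FILE_END])
                distance = rng.randint(-len(data) - 8, len(data) + 8)
                expected = emu.set_file_pointer(distance, method)
                try:
                    actual = os.lseek(fd, distance, TO_OS_WHENCE[method])
                except OSError:
                    actual = None  # OS rejected a negative resulting offset
            else:
                n = rng.randint(0, 16)
                expected, actual = emu.read_file(n), os.read(fd, n)
            if expected != actual:
                mismatches.append((i, expected, actual))
    finally:
        os.close(fd)
        os.remove(tmp.name)
    return mismatches
```

An empty list from `differential_test(sample_input)` means the emulation agreed with the reference on every operation; a mismatch records the step index and both results, which makes divergences easy to reproduce from the seed.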

The code for this can be found on my GitHub: https://github.com/Kharos102/FileHook

Empowering Cyber Defense with Advanced Offensive Security Capabilities

Signal Labs provides self-paced and live training solutions, empowering our learners to acquire the latest cutting-edge skills in this rapidly evolving field. Improve your vulnerability research campaigns and adversary simulation capabilities with the latest in offensive research and techniques.

Stay Connected

We'll let you know when our next live training is scheduled.