Extracting Memory Data with PTRACE_PEEKDATA
Using PTRACE_GETREGS, we can inspect the system call number, arguments, and return value stored in registers at the time of a system call.
However, some system calls take pointers to memory addresses rather than simple integer values as arguments.
In such cases, the value stored in the register is not the actual data, but merely an address in the tracee’s virtual memory space.
Therefore, we must explicitly read the memory contents pointed to by that address.
A representative example is the write system call.

    ssize_t write(int fd, const void *buf, size_t count);

The second argument, buf, is a memory address pointing to the data to be written; the register stores only that address, not the string itself.
In this article, we will use PTRACE_PEEKDATA to demonstrate how to extract data from the tracee process’s virtual memory.
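
To make the register-to-argument mapping concrete, here is a minimal sketch for x86_64 Linux. The `struct write_args` type and the `decode_write_args` helper are hypothetical names introduced for illustration; the field names of `struct user_regs_struct` come from `<sys/user.h>`, and the sketch assumes the registers were captured at syscall entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_write */
#include <sys/user.h>      /* struct user_regs_struct (x86_64 layout) */

/* Hypothetical helper type: the decoded arguments of one write() call. */
struct write_args {
    int       fd;
    uintptr_t buf;    /* address inside the tracee, not usable in the tracer */
    size_t    count;
};

/* Returns 1 and fills *out if regs describe a write() call, else 0.
 * Assumes the x86_64 syscall convention: number in orig_rax,
 * arguments in rdi, rsi, rdx. */
static int decode_write_args(const struct user_regs_struct *regs,
                             struct write_args *out)
{
    if ((long)regs->orig_rax != SYS_write)
        return 0;
    out->fd    = (int)regs->rdi;
    out->buf   = (uintptr_t)regs->rsi;
    out->count = (size_t)regs->rdx;
    return 1;
}
```

Note that `buf` is kept as an integer on purpose: dereferencing it in the tracer would read the tracer's own memory, not the tracee's.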
Overview of PTRACE_PEEKDATA

    long ptrace(PTRACE_PEEKDATA, pid_t pid, void *addr, void *data);

PTRACE_PEEKDATA reads from the tracee process’s virtual memory, returning sizeof(long) bytes (i.e., one machine word) starting at the given address (addr).
PTRACE_PEEKTEXT, PTRACE_PEEKDATA
Read a word at the address addr in the tracee’s memory, returning the word as the result of the ptrace() call.
— man 2 ptrace
The important points to note are:
- The return unit is a word (sizeof(long)), not a single byte
- On x86_64 systems, sizeof(long) == 8
- On x86_64 the returned value must be interpreted in little-endian format: the byte stored at addr is the least significant byte of the word
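
The points above can be tried in isolation with a small round trip. This is only a sketch under a few assumptions: it runs on Linux, ptrace is permitted in the environment, and the tracee is a forked child, so the `message` array sits at the same virtual address in both processes. The names `message` and `peek_child_word` are made up for this example.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char message[] = "Hello, World\n";

/* Forks a child, lets it stop itself, and peeks one word at `message`
 * in the child's memory. Returns 0 and stores the word in *out on
 * success, -1 on failure. */
static int peek_child_word(long *out)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: ask to be traced, then stop so the parent can inspect us. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;   /* child exited before stopping and is already reaped */

    int rc = -1;
    errno = 0;       /* -1 can be a valid word, so errno decides */
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)message, NULL);
    if (!(word == -1 && errno != 0)) {
        *out = word;
        rc = 0;
    }

    /* The demo is over: kill the stopped child and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return rc;
}
```

On success, the bytes of `*out` compare equal to the first `sizeof(long)` bytes of `message`, which is exactly the "one word per call" behavior described in the man page.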
Inspecting write(buf) with PTRACE_PEEKDATA
Below is a simple example that inspects the buf argument of a write system call printing "Hello, World\n",
at the system call entry point.
In this example, PTRACE_O_TRACESYSGOOD is enabled, so system call stops are delivered as SIGTRAP | 0x80.

    static void peek_word(pid_t pid, unsigned long addr) {
        errno = 0;  /* -1 is also a valid word, so errno must be checked */
        long ret = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
        if (ret == -1 && errno) {
            perror("PTRACE_PEEKDATA");
            return;
        }
        printf("peekdata: 0x%lx\n", (unsigned long)ret);
    }

The helper is called from the tracer's stop handler:

    case SIGTRAP | 0x80:
        if (!in_syscall) { /* syscall entry */
            peek_word(pid, regs.rsi); /* write(fd, buf, count) */
        }
        break;

Output:

    peekdata: 0x57202c6f6c6c6548

Since the word size on x86_64 is 8 bytes, 8 bytes are returned at once from the memory pointed to by buf.
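
The `SIGTRAP | 0x80` value in the case label comes from the status that waitpid reports for the tracee. As a rough sketch, the check can be written as a small predicate; `is_syscall_stop` is a hypothetical name, and the sketch assumes PTRACE_O_TRACESYSGOOD has been set on the tracee.

```c
#include <signal.h>
#include <sys/wait.h>

/* Returns nonzero if `status` (as filled in by waitpid) describes a
 * syscall stop. With PTRACE_O_TRACESYSGOOD set, the stop signal is
 * SIGTRAP with bit 7 set, which distinguishes it from a plain SIGTRAP. */
static int is_syscall_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}
```

Without PTRACE_O_TRACESYSGOOD, a syscall stop and an ordinary SIGTRAP look the same at this level, which is why the option is enabled in the example.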
Little-Endian Interpretation
Breaking down the returned value 0x57202c6f6c6c6548 byte by byte yields:
| Memory Address | Value |
|---|---|
| addr + 0 | 0x48 (H) |
| addr + 1 | 0x65 (e) |
| addr + 2 | 0x6c (l) |
| addr + 3 | 0x6c (l) |
| addr + 4 | 0x6f (o) |
| addr + 5 | 0x2c (,) |
| addr + 6 | 0x20 (SPACE) |
| addr + 7 | 0x57 (W) |
In other words, the returned word holds the first 8 bytes of the buffer: "Hello, W".
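
The table can be reproduced with a few shifts. The following is a minimal sketch, assuming a little-endian tracee with 8-byte words; `word_to_bytes_le` is a name chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Splits a peeked 64-bit word into the bytes it occupied in tracee
 * memory, assuming little-endian layout: out[i] is the byte that was
 * stored at addr + i. */
static void word_to_bytes_le(uint64_t word, unsigned char out[8])
{
    for (size_t i = 0; i < 8; i++)
        out[i] = (unsigned char)((word >> (8 * i)) & 0xff);
}
```

Feeding it 0x57202c6f6c6c6548 yields the bytes 'H', 'e', 'l', 'l', 'o', ',', ' ', 'W' in that order, matching the rows of the table.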
The Simplest Way to Read the Entire String (Not Recommended)
The following approach is for demonstration purposes only and is not recommended due to performance and alignment concerns.

    static void peek_bytewise(pid_t pid, unsigned long addr, size_t sz) {
        for (size_t i = 0; i < sz; i++) {
            errno = 0;
            /* Each call still fetches a whole word; on little-endian x86_64
               its low byte is the byte stored at addr + i. */
            long ret = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + i), NULL);
            if (ret == -1 && errno) {
                perror("PTRACE_PEEKDATA");
                return;
            }
            putchar(ret & 0xff);
        }
    }

- Calls ptrace() once per byte
- Inefficient for a syscall tracer
- Implicitly assumes safe unaligned access
Recommended Approach: Read by Words and Copy to a Buffer
By using the byte count and the word size, we can minimize the number of ptrace calls.

    #define WORD_BYTES (sizeof(long))

    static void peek_buffer(pid_t pid, unsigned long addr, size_t size)
    {
        size_t off;
        char *buf;

        if (size == 0)
            return;
        buf = calloc(size, 1);
        if (buf == NULL) {
            perror("calloc");
            return;
        }
        for (off = 0; off < size; off += WORD_BYTES) {
            errno = 0;
            long word = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + off), NULL);
            if (word == -1 && errno) {
                perror("PTRACE_PEEKDATA");
                break;
            }
            /* The last word may be partial: copy only the bytes inside the buffer. */
            size_t n = size - off;
            if (n > WORD_BYTES)
                n = WORD_BYTES;
            memcpy(buf + off, &word, n);
        }
        /* After an error, off is the number of bytes actually fetched. */
        fwrite(buf, 1, off < size ? off : size, stdout);
        free(buf);
    }

This approach has the following characteristics:
- Number of ptrace calls is ceil(size / WORD_BYTES)
- Does not rely on memory alignment
- Safe for arbitrary binary buffers, not just strings
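
The call count from the first bullet can be checked with a line of integer arithmetic. This is a small sketch; `peek_calls_needed` is a hypothetical helper, and WORD_BYTES is defined as in the example above.

```c
#include <stddef.h>

#define WORD_BYTES (sizeof(long))

/* Number of PTRACE_PEEKDATA calls needed to cover `size` bytes when
 * each call returns one word: ceil(size / WORD_BYTES), computed
 * without floating point and without overflowing for large sizes. */
static size_t peek_calls_needed(size_t size)
{
    return size / WORD_BYTES + (size % WORD_BYTES != 0);
}
```

For the 13-byte "Hello, World\n" buffer and 8-byte words, this gives 2 calls, compared with 13 calls for the byte-wise loop.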
Important Note: write Is Not a String System Call
The write system call is not a string-output API.
write() writes up to count bytes from the buffer starting at buf
— man 2 write
This means:
- buf is not guaranteed to be NUL-terminated
- It may contain binary data
- Using printf("%s", buf) is logically incorrect
Only a limited set of system calls can safely assume string arguments.
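
One way to display such a buffer safely is to walk exactly count bytes and escape anything that is not printable. The sketch below is one possible approach, not the only one; `escape_bytes` is a name invented here, and the escaping rules (`\n`, `\\`, `\xNN`) are an arbitrary choice for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Renders buf[0..len) into out as a NUL-terminated, printable string.
 * Newlines become \n, backslashes become \\, and other non-printable
 * bytes become \xNN. Output is truncated if out is too small.
 * Returns the number of characters written, excluding the NUL. */
static size_t escape_bytes(const unsigned char *buf, size_t len,
                           char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        char piece[5];   /* longest piece is "\xNN" plus the NUL */
        unsigned char c = buf[i];

        if (c == '\n')
            snprintf(piece, sizeof piece, "\\n");
        else if (c == '\\')
            snprintf(piece, sizeof piece, "\\\\");
        else if (isprint(c))
            snprintf(piece, sizeof piece, "%c", c);
        else
            snprintf(piece, sizeof piece, "\\x%02x", c);

        size_t n = strlen(piece);
        if (used + n >= outsz)
            break;       /* no room left for this piece and the NUL */
        memcpy(out + used, piece, n + 1);
        used += n;
    }
    return used;
}
```

Because the loop is driven by the length rather than by a terminator, embedded zero bytes are shown as `\x00` instead of silently ending the output.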
General Classification of Pointer Arguments
In ptrace-based syscall tracers, pointer arguments typically fall into three categories:
- Buffer + Length
  - read, write
  - Meaning depends on syscall entry vs exit
- NUL-terminated Strings
  - open, unlink, execve
- Structs / Arrays / Pointer Arrays
  - stat, uname, execve(argv)
write belongs to the simplest category.
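
For the string category, the length is not known in advance, so a tracer has to inspect each fetched word for the terminator before deciding whether to fetch the next one. A minimal sketch of that per-word check follows; `nul_offset_in_word` is a hypothetical helper, and it assumes tracer and tracee run on the same machine and therefore share a byte order.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first NUL byte inside a peeked word, or
 * sizeof(long) if the word contains none. The word is scanned in
 * memory order, so offset i corresponds to the byte at addr + i. */
static size_t nul_offset_in_word(long word)
{
    const char *base = (const char *)&word;
    const char *p = memchr(base, '\0', sizeof word);

    return p ? (size_t)(p - base) : sizeof word;
}
```

A string reader built on this would copy that many bytes from each word and stop at the first word for which the result is smaller than sizeof(long).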