I am an engineer at Systems & Technology Research (STR), where I build emulation, instrumentation, and vulnerability research tooling for embedded and real-time systems. Most of my day-to-day involves extending QEMU to model new processor architectures and SoCs, writing eBPF-based tracing frameworks, and developing fuzzers that operate at the full-system level.
I hold a combined Master's and Bachelor's in Computer Science from Northwestern University, where I founded the CTF team, served on the Blockchain Club executive board, and TA'd courses in system security, digital forensics, and deep learning.
The bulk of my work at STR falls into a few related areas.
I extend QEMU to support targets that don't have upstream support, at both the ISA and SoC levels. This includes writing a full TCG translation frontend for the TMS320C6000, a VLIW DSP from Texas Instruments with 8-way parallel execution, branch delay slots, and cross-path register constraints that make it significantly harder to model than a conventional scalar architecture. I've also built a complete peripheral model for the EFM32HG (ARM Cortex-M0+), including its clock management unit with propagation to GPIO, USART, and RTC peripherals, USB device-side callbacks, and external USART-attached flash and GPS modules. The goal is full-system emulation accurate enough to run and analyze real firmware without the physical hardware; for example, running the entire SnapperGPS firmware stack in QEMU.
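To make the delay-slot problem concrete, here is a toy, illustrative C++ sketch (not the actual QEMU TCG frontend, and a scalar interpreter rather than a translator) of the core mechanic: a branch latches its target, and the instructions in its delay slots still execute before control transfers.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Toy scalar interpreter showing the delay-slot idea. On the C6000 a
// branch takes effect several cycles after it issues, so the instructions
// in those slots execute before control actually transfers.
struct Cpu {
    uint32_t pc = 0;
    int r0 = 0;                               // scratch register for the demo
    std::optional<uint32_t> pending_branch;   // latched branch target
    int delay_slots_left = 0;
};

using Insn = std::function<void(Cpu&)>;

// Issue a branch: the target is latched, but control only transfers after
// `slots` further instructions have executed.
Insn branch_to(uint32_t target, int slots) {
    return [=](Cpu& c) { c.pending_branch = target; c.delay_slots_left = slots; };
}

void run(Cpu& cpu, const std::vector<Insn>& program, int steps) {
    for (int i = 0; i < steps && cpu.pc < program.size(); ++i) {
        bool in_shadow = cpu.pending_branch.has_value(); // branch already issued?
        program[cpu.pc](cpu);
        ++cpu.pc;                                        // fall-through by default
        if (in_shadow && --cpu.delay_slots_left == 0) {  // slots drained: take it
            cpu.pc = *cpu.pending_branch;
            cpu.pending_branch.reset();
        }
    }
}
```

A TCG frontend faces the same bookkeeping, except the latched state must survive translation-block boundaries, which is a large part of what makes delay-slot ISAs awkward to model.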
I modernized a significant portion of STR's QEMU instrumentation layer from C to C++23, using concepts for type constraints, ranges for composable data pipelines, and std::expected<T,E> for error handling in memory tracing paths. The rewrite achieved full API compatibility with existing tooling while reducing runtime overhead by 30%, mainly by eliminating unnecessary allocations and virtual dispatch in hot paths.
Separately, I built an eBPF-based tracing framework that uses C++ template metaprogramming to generate BPF programs at compile time with zero-cost abstractions for shared object and syscall tracking. This replaced an earlier approach based on ptrace (Linux) and Detours (Windows), cutting overhead from 12% to under 1% and per-syscall latency from 25μs to 800ns. The template approach means adding a new tracepoint is a type-safe, compile-checked operation rather than hand-writing BPF bytecode.
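A minimal sketch of the compile-time idea, heavily simplified: real code would emit BPF instructions, while this toy only shows how a tracepoint can be described as a type whose layout is fixed at compile time, so adding one is a compile-checked operation. The tracepoint names and argument sizes here are illustrative.

```cpp
#include <cstddef>
#include <string_view>

// Toy model of compile-time tracepoint generation. Each tracepoint is a
// type carrying its syscall name and per-argument sizes as constants, so
// the event layout the generated BPF program copies is a compile-time fact.
template <std::size_t... ArgSizes>
struct Tracepoint {
    std::string_view name;
    static constexpr std::size_t nargs = sizeof...(ArgSizes);
    // Total bytes the generated program would copy per event.
    static constexpr std::size_t payload_bytes = (0 + ... + ArgSizes);
};

// Adding a new tracepoint is a type-level operation: get the sizes wrong
// and the static_asserts below fail at compile time, not in production.
inline constexpr Tracepoint<8, 8, 4> openat_tp{"sys_enter_openat"};
inline constexpr Tracepoint<8, 8>    write_tp{"sys_enter_write"};

static_assert(openat_tp.payload_bytes == 20);
static_assert(write_tp.nargs == 2);
```

The zero-cost claim follows from everything above being `constexpr`: nothing about the tracepoint description survives to runtime except the emitted program itself.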
I developed a full-system fuzzer built on top of QEMU's TCG that works by injecting test inputs directly into guest memory and instrumenting translation blocks for coverage feedback. Each fuzzed process gets isolated coverage tracking, which is important when you're fuzzing an entire OS image and don't want kernel noise polluting your signal. This has found 15+ vulnerabilities across several targets. To deal with the throughput problem inherent to full-system emulation, I built a snapshot-and-restore pipeline using KVM migration with custom DMA/PCI ivshmem drivers on both Windows and Linux guests, which brought fuzzing throughput up by roughly 100x.
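Translation-block coverage feedback can be illustrated with the classic AFL-style edge-coverage scheme, which is a reasonable sketch of the mechanism (not STR's exact implementation): each executed block hashes into a byte map, and the previous block's hash is shifted so that the edges A→B and B→A land on different counters.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative per-process coverage map for translation-block feedback.
constexpr std::size_t kMapSize = 1 << 16;   // power of two so XOR stays in range

struct Coverage {
    std::array<uint8_t, kMapSize> map{};
    uint32_t prev = 0;

    // Called from the TB-execution hook with the block's guest PC.
    void on_block(uint64_t pc) {
        uint32_t cur = static_cast<uint32_t>(pc * 0x9E3779B1u) % kMapSize; // cheap hash
        map[cur ^ prev]++;      // bump the edge counter for (prev -> cur)
        prev = cur >> 1;        // shift so edge direction matters
    }

    // Number of distinct edges seen so far (coverage signal for the fuzzer).
    int edges_hit() const {
        int n = 0;
        for (uint8_t b : map) n += (b != 0);
        return n;
    }
};
```

Keeping one `Coverage` instance per guest process is the isolation mentioned above: kernel and unrelated-process blocks update their own maps instead of polluting the target's signal.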
To address the bottleneck of manual reverse engineering on unfamiliar firmware, I architected a custom QEMU plugin that tracks process-isolated code coverage at the translation block level and exports it into Ghidra for visualization. This gives analysts an immediate heatmap of which code paths have been exercised during emulation or fuzzing, reducing the manual analysis time on new targets by roughly 70% compared to starting from a raw disassembly.
For a timing-critical RTOS target, I reverse engineered the firmware and its board support package to understand the scheduler well enough to inject custom validation tasks that run alongside the native workload. Combined with FPGA-based external monitoring, this approach eliminated roughly 95% of the overhead compared to traditional JTAG-based validation, which matters when the system under test has hard real-time deadlines that intrusive debugging would violate.
A bare-metal x86-64 kernel written in C++26 with modules, designed as a vehicle for exploring modern C++ in a freestanding environment and for understanding what it takes to build a system from nothing. The kernel boots into 64-bit long mode via Multiboot2 with a higher-half mapping through 4-level paging, has a physical memory manager with bitmap allocation across DMA/Normal/High zones, a heap allocator, and a PIT-driven interrupt system. The project is currently in Phase 1; the next steps are a virtio-net driver and the beginnings of a network stack, which would eventually support market data parsing and order execution as the motivating use case. The real point is less about trading and more about having a concrete target that forces you to care about every microsecond from boot to packet.
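The bitmap physical memory manager mentioned above can be sketched in a few lines. This is a hosted, simplified illustration (the real kernel tracks DMA/Normal/High zones and uses fixed in-place storage rather than `std::vector`): one bit per 4 KiB frame, set meaning in use.

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of a bitmap physical-frame allocator.
class FrameBitmap {
    std::vector<uint64_t> bits_;   // one bit per frame: set = in use
    std::size_t nframes_;
public:
    explicit FrameBitmap(std::size_t nframes)
        : bits_((nframes + 63) / 64, 0), nframes_(nframes) {}

    // First-fit scan for a free frame; returns its index, or SIZE_MAX if full.
    std::size_t alloc() {
        for (std::size_t w = 0; w < bits_.size(); ++w) {
            if (bits_[w] == ~0ull) continue;            // word fully used
            for (std::size_t b = 0; b < 64; ++b) {
                std::size_t f = w * 64 + b;
                if (f >= nframes_) return SIZE_MAX;
                if (!(bits_[w] & (1ull << b))) {
                    bits_[w] |= (1ull << b);            // mark in use
                    return f;
                }
            }
        }
        return SIZE_MAX;
    }

    void free(std::size_t f) { bits_[f / 64] &= ~(1ull << (f % 64)); }

    uint64_t frame_to_phys(std::size_t f) const { return f * 4096; }
};
```

The word-at-a-time skip over fully used words is what keeps the linear scan tolerable; a production allocator would typically add a free-frame hint or buddy structure on top.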
A framework for writing kernel components in Rust for Northwestern's Nautilus aerokernel. Wrapped the kernel's C IRQ, threading, and character device subsystems with idiomatic Rust APIs that use RAII to bind resource lifecycles to object lifetimes and Send + Sync trait bounds to make data races on shared handler state a compile-time error. Built an async executor for cooperative multitasking and a Virtio GPU driver. Validated the framework by porting the original DOOM engine to run in the Nautilus shell.
An autonomous Cyber Reasoning System that discovers vulnerabilities and generates patches without human intervention. The system combines BandFuzz (a custom fuzzer with selective instrumentation) with a multi-LLM pipeline using GPT-4 and Claude 3 for patch generation, achieving a 92% success rate across 178 known vulnerabilities including PyTorch CVEs. ML-guided fuzzing reduced discovery time by 85%. We deployed it as a microservices architecture with cost-optimized model selection to stay within the competition's compute budget. Several patches were accepted upstream into PyTorch and other open-source projects.
Built a Northwestern University knowledge assistant by fine-tuning LLaMA 7B on ~1,000 instruction pairs sourced from the Daily Northwestern, r/Northwestern, Stack Exchange, and WikiHow. GPT-4 generated the university-specific Q&A pairs after GPT-3.5 proved too shallow, producing only short, verbatim extractions rather than the synthesized answers needed for instruction tuning. LoRA kept training to ~4% of parameters, preserving the base model's general capability (MMLU 34.18 vs. 35.1 baseline) while teaching domain-specific knowledge. DeepSpeed brought training time from 12 hours to 20 minutes. The main challenge was data quality: uncleaned HTML entities in the training data caused the model to waste parameters learning tokens like `&amp;`, and the Daily Northwestern alone lacked the basic institutional facts needed for a university assistant, a gap we filled by mixing in Wikipedia articles under the Northwestern category.
Built tooling to streamline launching microservices across IRAD projects by exploiting shared structure in forked codebases. Adapted a microservice to subscribe to a Kafka topic carrying LINK16 messages, extract assets from track data, and republish them for display on a map within a C2 application.
Built a data model over ~10M tweets per month in ClickHouse to surface sentiment and trend signals across crypto markets. Expanded a public figures database of 20K+ entries by linking Twitter accounts with on-chain activity for simultaneous tracking. Retrieved 170K labels for tokens, accounts, blocks, and transactions to improve query coverage, and grew the live and historical pricing database from 5 cryptocurrencies to 577 tokens. Prototyped an MLP regressor using scikit-learn trained on 20K Twitter-to-Ethereum address mappings to estimate account net worth, ultimately limited by the noisiness of the feature space.
| Area | Skills |
| --- | --- |
| Languages | C/C++ (primary), Rust, x86/ARM/PPC/SH4 assembly, Python, Java, TypeScript |
| Systems | Linux and Windows internals, RTOS development, QEMU/KVM (TCG frontend development, device modeling, migration), eBPF (kprobes, tracepoints, XDP), DMA and PCI device drivers, FPGA integration, JTAG/SWD debugging, UART/SPI/I2C bus protocols |
| Security | Binary analysis (Ghidra, IDA), fuzzing (AFL++, custom harnesses, coverage-guided full-system), static analysis (CodeQL), reverse engineering (firmware, RTOS, embedded), exploit development |
| Tools | GDB, perf, Wireshark, tcpdump, strace/ltrace, Make/CMake, Git, Docker |
| Networking | TCP/IP, UDP, multicast, Bluetooth LE, IEEE 802.15.4/Thread, LoRaWAN |
| ML/Data | PyTorch, LoRA/PEFT, DeepSpeed, ClickHouse, Kafka |
Founded the university CTF team. Blockchain Club executive board member. Teaching assistant for System Security, Digital Forensics, and Deep Learning. Coursework included low-level software development, microprocessor system design, operating systems, computer networking, electronics design, algorithms, database systems, wireless protocols, and generative models.
Most of my work so far has been at the firmware and OS level, but I'm increasingly drawn toward the hardware side of that boundary. HFT-Zero is a current outlet for this: a bare-metal x86-64 kernel in C++26 whose next milestone is a virtio-net driver and a packet processing path targeting sub-microsecond latency. I'm actively learning Verilog/VHDL with the goal of working on projects at the hardware-software interface: custom accelerators, hardware-in-the-loop security validation, SoC prototyping on FPGA, and instrumentation that reaches below what software alone can observe. The long-term interest is in closing the gap between the people who design silicon and the people who write the code that runs on it.