Jason Lau - A Full-Stack Developer

Jason Lau is a full-stack developer with expertise in hardware design, compilers, DevOps, HPC, algorithms, and web frontend and backend.

Professional Experience

Hardware Engineer

Jump Trading Group
Aug 2025 – Present | Sunnyvale, CA

Researching and developing high-level synthesis algorithms for high-frequency trading, and creating high-performance FPGA and ARM hardware accelerators based on the developed toolchain. Developing low-latency hardware logic and integration for modern multi-die FPGAs to optimize trading performance.

Chief Technology Officer & Co-Founder

RapidStream Design Automation (Acquired by a Leading Global HFT Firm)
Jan 2023 – Aug 2025 | Sunnyvale, CA

Devised the technical strategy for developing compiler solutions that deliver low-latency, high-performance FPGA accelerators. Engineered systems enabling efficient high-level physical designs directly from software specifications, significantly reducing compilation cycles.

Compilation Researcher (Intern)

Advanced Micro Devices (AMD), Xilinx
Jun 2022 – Sep 2022 | Longmont, CO

Researched and implemented a physically aware MLIR compiler for AMD Versal devices, targeting the AI Engine array. Designed flows enabling high-performance application acceleration on next-generation SoCs.

Cloud Software Engineer (Intern)

Google
Jun 2016 – Sep 2016 | Cambridge, MA

Created "AwesomeChart," a cloud performance data analytics platform and visualization library capable of rendering millions of time series smoothly. The platform was integrated into Google Cloud Platform and remains in use by customers for monitoring critical metrics.

Key Engineering & Systems Projects

TAPA: Task-Parallel FPGA Compiler

Maintainer and Co-Author

In collaboration with lab members at UCLA, implemented TAPA, a task-parallel FPGA compiler based on Clang that enables high-level programming of FPGAs using C++ dataflow graphs. Extended the compiler with nested tasks to support complex applications. Implemented, with lab members, a high-performance parallel RTL co-simulation framework that allows real-world applications to mix HLS and RTL designs. Extended a high-level synthesis tool, AutoSA, to generate TAPA programs that provide higher performance than Vitis HLS.

HeteroRefactor: Source-to-Source Code Refactoring

Automated Code Refactoring Developer

Developed HeteroRefactor, a ROSE-based source-to-source code refactoring tool that automatically transforms C++ code to use heterogeneous programming models, translating the code's recursion, pointers, and dynamic memory allocation into equivalents supported by vendor tools.

TUNA Open Source Mirror Site

Maintainer and Organizational Chair

Maintained the largest open-source mirror site in China (mirrors.tuna.tsinghua.edu.cn). Built a high-performance, reliable multi-server system with fast data storage, serving users across China at an average throughput of 3.1 Gbps.

NGINXwise: RL-based Congestion Control

Network Software Developer

Built a production-ready NGINX-based testbed to evaluate reinforcement learning for optimizing the initial congestion window (CWND). The testbed is used to measure HTTP request latency on live traffic from a top Chinese website.

MPIfuse: Distributed File System

Computer System Researcher

In collaboration with team members, developed a high-performance distributed file system prototype using MPI and FUSE. Identified bottlenecks in FUSE and achieved 3.6 GB/s parallel write speeds on an 8-machine cluster, outperforming NFS on tmpfs.

CachedMIPS: FPGA-Based Processor

Hardware Design Engineer

Developed a pipelined write-back L1 cache with AHB protocol for a MIPS32r1-compatible CPU on a Xilinx FPGA. Improved interrupt logic and peripheral support to enable full Linux booting in 8.4 seconds, achieving a 50x speedup on general applications.

Cherry: Cluster Management System

Full Stack Developer

Developed a network management system for a cluster used by a Chinese backbone network. Implemented deployment configuration (Puppet), resource management (virtualization, bandwidth), and metrics monitoring using Node.js, Redis, and Nagios.

iTunet: Automated Network Configuration

iOS Developer

Developed an iOS widget application that automatically configures settings for the Tsinghua University campus network. The app automates the connection setup and logs back in transparently after disconnections.

Iodine: Online Program Judge System

Full Stack Developer

In collaboration with the HUSTOJ author, developed an online judge system for programming competitions. Refactored the code to support modern web standards, including a new Bootstrap frontend and a Laravel (PHP) backend. Grading is isolated in Docker containers, allowing secure and efficient execution of user-submitted code.