Portfolio

FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems

A standardized, closed-loop framework that connects kernel generation, benchmarking, and deployment.

Nimbus: Burst-Resilient Hybrid Inference for LLMs

Routing algorithm for real-world LLM services that balances TTFT SLOs against cost via selective API offload.

Clock2Q+: Correlation-Aware Metadata Caching Replacement Algorithm for Enterprise Storage Systems

Production-oriented cache replacement algorithm for VMware vSAN.

WebLLM Assistant: Browser Agents Powered by In-Browser LLMs

A middle-layer API that bridges local web agents with the browser environment, with Overleaf and Google Workspace integrations.