Software Engineer · ML Engineer · AI Engineer · M.S. @ Pitt
5+ years engineering production systems at scale — from distributed data pipelines and ML models to high-concurrency backend services.
About
I'm Tsung-Han (Johnson) Jao, currently pursuing my M.S. at the University of Pittsburgh with a perfect 4.0 GPA. I have 5+ years of industry experience building scalable backend systems and production ML pipelines across cybersecurity and e-commerce.
At Trend Micro, I architected distributed data pipelines processing billions of events and built high-concurrency scoring services. At Tagtoo, I developed deep learning models and recommendation systems serving 10M+ users. I bridge the gap between software engineering and machine learning.
Where ideas become code.
At Work
Experience
Architected distributed pipelines with PySpark & Airflow processing billions of security events in TB-scale Parquet on AWS S3.
Developed 10 detection filters in one quarter via statistical & sequence analysis on ADX, achieving a 10× increase in alert volume with a false positive rate under 5%.
Designed discriminative features for a deep learning phishing detection pipeline from email content.
Mentored interns to raise test coverage from 5% to 100% and build CI/CD pipelines.
Built deep learning models (CNN, DNN, Autoencoder) for identity resolution, unifying 10M+ user profiles across devices.
Developed SVD-based recommendation systems achieving a 50% lift in conversion rate, validated through A/B testing.
Orchestrated end-to-end model lifecycle via Docker & Kubeflow — weekly retraining, daily inference, drift monitoring.
Refactored legacy scripts, reducing report generation from 3 days to under 2 hours across 500+ e-commerce sites.
Education
GPA 4.0/4.0. TA for Mathematical Foundations of ML. Coursework in AI, Algorithm Design, Cloud Computing, Network Security.
Thesis: Deep hashing neural network for content-based image retrieval — 4.96% mAP increase on CIFAR-10/100.
Foundation in algorithms, data structures, and systems programming.
Never stop learning.
Pittsburgh, PA
Selected Work
A multi-agent system where a CEO agent autonomously recruits and orchestrates specialized sub-agents to collaboratively solve user problems — a virtual office powered by AI.
Full-stack application that integrates the Gmail API with LLM-powered parsing to automatically track and organize job applications. Built with a Python backend, frontend UI, and database layer.
Designed discriminative feature sets from email content for a deep learning classification model, contributing to a 10× increase in detected threats with a false positive rate under 5%.
Built CNN/DNN/Autoencoder models to consolidate fragmented browser & device footprints into unified profiles for 10M+ users, enabling precision-targeted advertising.
Developed a matrix factorization-based recommendation system optimizing latent factors, achieving a 50% conversion rate increase validated through A/B testing.
Designed a CNN with residual layers for content-based image retrieval, achieving a 4.96% mAP improvement on CIFAR-10/100 through feature extraction optimization.
Speaking
Spoke on leveraging Kaggle competitions to sharpen practical ML engineering skills and bridge the gap between competition and production.
October 2022 · Taipei
"We don't stop until we cross the finish line."
Overheard at Tokyo Marathon, 2025
Race Log
Other Races
"Without data, you're just another person with an opinion." - W. Edwards Deming
Blog