arunkumar saravanan — performance engineer

# about

i'm a performance engineer who enjoys the boring details — context switches, cache misses, branch mispredictions, stalls, and the occasional questionable JIT decision. i spend most of my time making large-scale distributed systems predictable under load.

currently the first and only full-time performance engineer at startree, driving all performance initiatives end-to-end: from the release certification framework for startree pinot to benchmarking, regression detection, and customer-impacting optimizations in apache pinot.

previously at salesforce (hyperforce coreapp, graviton / gp3 migrations, jvm/jdk tuning) and zoho (application server scalability, sre).

now

driving startree iceberg benchmarking & scale tests · building a real-time database benchmarking framework (releasing soon) · shipping the release certification service for startree pinot · writing up every investigation as it happens.

recent landings: iceberg benchmarking blog, apache/pinot#17485 (40% json parse), #17593 (protobuf extractor caching).

# selected impact

40% · JSON parsing speedup · apache pinot ingestion
35% · Pinot reload optimization · thread-level caching
15–20% · Protobuf ingestion CPU · production at 2–3M events/sec
$1.8M · Cost savings · GP2 → GP3 migration (hyperforce)
15+ · Customer workload replicas · onboarded via query replay
7+ · Years · since 2019

# case studies

production perf investigations — hypothesis, tooling, finding, numbers. read the long-form versions on the case studies page.

a 40% json parsing speedup in apache pinot · ~35–55% @ decode · merged

jackson was parsing every message twice. two-line fix to bind bytes directly to a Map. apache/pinot#17485.
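
a rough python analogue of that pattern, not the pinot or jackson code: the slow path parses the payload, then re-serializes and parses it again to reach the final map, while the fast path binds the bytes to a dict in one pass. `decode_twice` and `decode_once` are names made up for this sketch, and how the original double parse was written is an assumption.

```python
import json


def decode_twice(payload: bytes) -> dict:
    # pass 1: parse the bytes into a generic structure
    tree = json.loads(payload)
    # pass 2: serialize and parse again to reach the final map (the redundant work)
    return json.loads(json.dumps(tree))


def decode_once(payload: bytes) -> dict:
    # single pass: bind the bytes directly to a dict
    return json.loads(payload)


payload = b'{"user": "a1", "clicks": 3, "tags": ["x", "y"]}'
# both paths produce the same record; only the amount of work differs
assert decode_twice(payload) == decode_once(payload)
```

the result is identical either way, which is why this kind of waste tends to show up in a profile rather than in a test.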

3 optimizations in apache pinot's protobuf ingestion · ~15–20% ingestion cpu

customer flame graph at 2–3M events/sec surfaced three things: rocksdb data-block cache misses, a redundant findFieldByName() in the hot loop, and a code-gen decoder that is 2.9× faster.
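
a small python sketch of the by-name-lookup-in-a-hot-loop problem, assuming the fix is to resolve field names once and reuse the result. `Descriptor` is a hypothetical stand-in that only counts lookups; it is not the protobuf api and not the code from the pinot change.

```python
class Descriptor:
    """Hypothetical message descriptor that counts by-name lookups."""

    def __init__(self, field_names):
        self._index = {name: i for i, name in enumerate(field_names)}
        self.lookups = 0

    def find_field_by_name(self, name):
        self.lookups += 1
        return self._index[name]


def extract_naive(descriptor, rows, wanted):
    out = []
    for row in rows:
        # resolves every wanted field name again for every row
        out.append({n: row[descriptor.find_field_by_name(n)] for n in wanted})
    return out


def extract_cached(descriptor, rows, wanted):
    # resolve each name once, outside the hot loop
    resolved = [(n, descriptor.find_field_by_name(n)) for n in wanted]
    return [{n: row[i] for n, i in resolved} for row in rows]


fields = ["user", "clicks", "country"]
rows = [("a1", 3, "in"), ("b2", 5, "us")]
naive_desc, cached_desc = Descriptor(fields), Descriptor(fields)

naive = extract_naive(naive_desc, rows, ["user", "country"])
cached = extract_cached(cached_desc, rows, ["user", "country"])
assert naive == cached
print(naive_desc.lookups, cached_desc.lookups)  # lookups grow with rows vs. stay fixed
```

in the naive version the lookup count scales with rows × fields; in the cached version it scales with fields only.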

# projects

heaplens

instant jvm heap dump analysis, right in your browser

github · docker hub

a fast, web-based HPROF heap dump analyzer that starts showing results the moment you upload a file. class histograms appear in under a second, the full dominator tree builds in the background, and the entire experience runs from a single static binary with zero external dependencies.

built with rust to explore streaming heap dump analysis and performance tooling. no jvm, no database, no config files.

key features

streaming results — histogram updates live as bytes arrive
dominator tree with retained-size analysis & gc root paths
oql — sql-like queries over heap objects
mat-style leak suspects with dominator-chain dedup
thread view with stack traces and retained object correlation
import from browser, s3, k8s pods, or local path

perf (5 gb / ~72m objects)

histogram (streaming): < 1s
reference extraction: ~60s
dominator tree: ~25s
retained sizes: ~15s

architecture

parser — streaming visitor, zero-copy mmap
graph — csr adjacency, lengauer-tarjan dominators
import — frontier file reader for concurrent parse
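
a minimal python sketch of two ideas named above, not heaplens's rust code: packing an edge list into csr arrays, and rolling shallow sizes up a dominator tree to get retained sizes. the immediate-dominator array is taken as given (lengauer-tarjan is not implemented here), and every function name is invented for the sketch.

```python
def build_csr(num_nodes, edges):
    """Pack (src, dst) edges so targets[offsets[n]:offsets[n + 1]] are n's successors."""
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # out-degree of each node
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]   # prefix sum: counts become start positions
    targets = [0] * len(edges)
    cursor = offsets[:-1]              # next free slot per node (slicing copies)
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def successors(offsets, targets, node):
    return targets[offsets[node]:offsets[node + 1]]


def retained_sizes(idom, shallow, root=0):
    """idom[n] is n's immediate dominator; the root dominates itself."""
    children = [[] for _ in idom]
    for node, parent in enumerate(idom):
        if node != root:
            children[parent].append(node)
    order, stack = [], [root]
    while stack:                        # pre-order walk of the dominator tree
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])
    retained = list(shallow)
    for node in reversed(order):        # children are folded in before their parents
        if node != root:
            retained[idom[node]] += retained[node]
    return retained


# diamond graph: object 3 is reachable through both 1 and 2
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
offsets, targets = build_csr(4, edges)
idom = [0, 0, 0, 0]                     # neither 1 nor 2 alone dominates 3
shallow = [16, 24, 32, 40]
retained = retained_sizes(idom, shallow)
```

in the diamond, object 3 counts toward the root's retained size but not toward 1's or 2's, because freeing either one alone would not free it.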

rust · axum · tokio · memmap2 · docker · k8s · apache 2.0 · tested up to 22 gb dumps (~450m objects)

# experience

senior performance engineer

startree · remote, chennai

mar 2025 — present

founding perf · first and only full-time performance engineer at startree, owning every perf initiative end-to-end.

benchmarks, certification & regression

in flight · Driving StarTree Iceberg benchmarking & scale tests; writeup coming shortly.
in flight · Building a real-time database benchmarking framework (releasing soon).
Certify StarTree Pinot releases and sign off on production deployments; contributed to feature testing such as OOM protection.
Built deployment prechecks to detect ingestion-transformation breakages caused by backward incompatibilities in new releases.
Built tooling to generate customer-replica queries & synthetic datasets by parsing broker logs; onboarded 15+ customer replica query sets.
Identified and proposed a fix that improved Apache Pinot reload time by ~35% via thread-level caching.
Identified 3 performance bugs in Pinot's protobuf ingestion pipeline with potential to save 15–20% CPU — see case study.
Found and fixed a JSON parsing regression in the Kafka → Pinot ingestion path, for a ~40% parsing speedup (apache/pinot#17485, case study).
Conducted Graviton benchmarking for StarTree Pinot.
Upstream Apache Pinot contributions: #17485, #17593.
Participated in on-call rotation to ensure availability of production Pinot clusters.
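
A minimal Python sketch of thread-level caching as a general technique, not the Pinot reload change itself: each worker thread builds an expensive object once and reuses it on later calls, so no locking is needed on the hot path. What the real fix cached is not stated here; `expensive_build` and `get_context` are hypothetical names.

```python
import threading

_local = threading.local()      # one independent attribute namespace per thread
_count_lock = threading.Lock()
build_count = 0                 # how many times the expensive path actually ran


def expensive_build():
    global build_count
    with _count_lock:
        build_count += 1
    return {"buffer": bytearray(1024)}


def get_context():
    ctx = getattr(_local, "ctx", None)
    if ctx is None:             # first call on this thread only
        ctx = expensive_build()
        _local.ctx = ctx
    return ctx


def worker(n_calls):
    for _ in range(n_calls):
        get_context()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(build_count)  # one build per thread, regardless of calls per thread
```

The trade-off is memory: every thread holds its own copy, which is usually acceptable for a bounded worker pool.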

certification framework as a service

Designed and built an end-to-end testing framework to certify StarTree releases, automating functional, performance, and operational validations. It comprises a reusable automation framework, an internal utilities SDK, and deployment-as-a-service, with native real-time ingestion validation (via ShadowTraffic) and an AI-powered comparison assistant that surfaces regressions and strengthens release confidence. Write-up: startree.ai.

member of technical staff

salesforce · bangalore

aug 2023 — mar 2025

ec2 selection & cloud performance

Led Graviton 3rd and 4th generation processor performance evaluations for Hyperforce CoreApp in collaboration with AWS, using perf and vTune to analyze system-level hotspots beyond the JVM (context switches, branch prediction and TLB effectiveness, stalls, IPC, L1/L2/L3 cache effectiveness) and drive optimizations.
Performance assessments across App and DB instance types (m6i.24xl, u-6tb1).
Led GP2 → GP3 migration evaluation in Hyperforce, contributing to ~$1.8M in cost savings.
Vertical scaling performance evaluation for m6a.32xl.
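
A small Python sketch of how raw hardware counters turn into the ratios mentioned above (IPC, branch and cache miss rates). The event names match common `perf stat` events; the counts are made up for illustration and are not measurements from these evaluations.

```python
def derived_metrics(counters):
    """Turn raw event counts into the ratios usually compared across instance types."""
    return {
        "ipc": counters["instructions"] / counters["cycles"],
        "branch_miss_rate": counters["branch-misses"] / counters["branches"],
        "cache_miss_rate": counters["cache-misses"] / counters["cache-references"],
    }


# made-up counts for one run of a workload
sample = {
    "cycles": 4_000_000,
    "instructions": 6_000_000,
    "branches": 1_000_000,
    "branch-misses": 20_000,
    "cache-references": 500_000,
    "cache-misses": 50_000,
}

metrics = derived_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing these ratios for the same workload on two instance types helps separate "the CPU is doing more work" from "the CPU is waiting more".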

jvm / jdk optimization

JVM tuning (THP, StringDedup, escape analysis) — contributing to 5–10% CPU utilization improvement.
JVM heap-size reduction effort landing at 48 GB — ~3 GB improvement.
Evaluated Zing JVM performance in collaboration with the Azul team.
Performance evaluation for JDK 11 → JDK 17 migration.

ai & llm

Enhanced an internal AI agent with LLM-based intent recognition and chat history integration.
Built a POC tackling the broken-selector problem using an LLM for UI-based testing.

associate member of technical staff

salesforce · hyderabad

jan 2022 — jul 2023

database performance, release certification, tooling

Performance load execution and DB analysis for migrating a premium customer to Salesforce's in-house database.
Evaluated performance of a key scaling feature in Salesforce's in-house database.
Salesforce release and patch certification on Hyperforce and first-party environments.
RHEL 9 OS performance assessment in Hyperforce — profile debugging and analysis.
Built data-transfer pipeline to Tableau Analytics using Python and shell.
Developed and stabilized JMeter performance workloads; built payload-validation utility to boost productivity.

earlier — zoho corporation

chennai · feb 2019 — dec 2021

performance & scalability engineer → site reliability engineer → project trainee

performance & scalability engineer (aug 2020 – dec 2021): improved application server scalability through JVM, JIT and heap tunings. Reduced ParNew GC STW duration by ~70%. API scaling and capacity planning. Set up monitoring team and tooling for premium customers.
site reliability engineer (may 2019 – jul 2020): monitored Zoho CRM staging to sign off on builds for production; debugging and code optimization in both staging and production environments; used VisualVM and Async Profiler for performance analysis; collaborated with dev teams to ensure product scalability prior to live releases.
project trainee (feb 2019 – mar 2019): first stint at Zoho before joining full-time.

# reliability

Reliability Highlights

Built StarTree Release Certification Service
Production deployment validation
Customer workload replay platform
On-call support for Pinot production clusters
AWS Graviton migration assessments
GP2 → GP3 migration ($1.8M savings)

# tools & signals

the instruments i actually use, and the signals i actually look at.

profilers: async-profiler · jfr · visualvm · intel vtune · perf (record/stat/annotate)
benchmarking: jmh · jmeter · shadowtraffic · custom harnesses for ingestion / query replay
jvm signals: gc pause & stw (parnew, g1, zgc) · jit compile queue & deopts · escape analysis · string dedup · thp · heap & off-heap growth · safepoint latency
cpu microarchitecture analysis: ipc · l1/l2/l3 cache hierarchy efficiency · tlb effectiveness · branch prediction behavior · frontend/backend stalls · context switches · numa locality · false sharing · perf counters (vTune, perf)
memory signals: heap & off-heap growth · page cache behavior · false sharing · numa locality
storage / os signals: page cache behavior · iostat / bpftrace i/o · rocksdb block cache hit rate (overall vs data block) · disk queue depth
languages: java · python · javascript · shell
platforms: apache pinot · kafka · hyperforce · aws ec2/ebs (graviton, gp2/gp3, nitro)
observability: grafana · prometheus · flame graphs · custom dashboards for ingestion / query health
infra: docker · kubernetes · linux · jenkins
other: llms / generative ai (coursera · cred)
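
why "overall vs data block" matters for the rocksdb signal above: index and filter blocks can hit the cache almost every time and hide a poor data-block hit rate. a minimal python sketch, using rocksdb's statistics ticker names as keys with made-up counts (not numbers from any real cluster):

```python
def hit_rate(hits, misses):
    total = hits + misses
    return hits / total if total else 0.0


# made-up ticker values; data-block counts are a subset of the overall counts
tickers = {
    "rocksdb.block.cache.hit": 9_500,
    "rocksdb.block.cache.miss": 500,
    "rocksdb.block.cache.data.hit": 300,
    "rocksdb.block.cache.data.miss": 450,
}

overall = hit_rate(tickers["rocksdb.block.cache.hit"],
                   tickers["rocksdb.block.cache.miss"])
data_only = hit_rate(tickers["rocksdb.block.cache.data.hit"],
                     tickers["rocksdb.block.cache.data.miss"])

print(f"overall hit rate:    {overall:.0%}")
print(f"data-block hit rate: {data_only:.0%}")
```

with these numbers the overall rate looks healthy while most data-block reads still miss, which is the case a single dashboard number will not show.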

# writing

40% JSON parsing speedup in apache pinot

linkedin · jan 2026

problem: ingestion regression in kafka → pinot pipeline · scale: production workloads · outcome: ~40% decode improvement via two-line fix

3 performance optimizations in apache pinot's protobuf ingestion

linkedin · jan 2026

problem: customer flame graph at 2–3M events/sec showing hotspots · scale: production-scale ingestion · outcome: 15–20% CPU reduction across rocksdb, lookup, and code-gen

building startree's release certification service

startree

problem: manual release validation is error-prone and time-consuming · scale: startree pinot production deployments · outcome: end-to-end automation framework with real-time validation & ai-powered regression detection

iceberg benchmarking: startree vs. trino vs. clickhouse

startree

problem: no standardized benchmarks for lakehouse query performance · scale: multi-system comparative analysis · outcome: comprehensive benchmark suite driving startree iceberg roadmap

# recommendations

selected, verbatim, from linkedin.

jimmy lam · senior director of software engineering, central performance services · salesforce · senior to arunkumar, did not manage directly · apr 2026

i highly recommend arun for any performance role. i have had the pleasure of indirectly leading arun and observing his work firsthand. what stands out most is how incredibly methodical he is in his investigations. when faced with complex technical issues, arun applies a rigorous, step-by-step approach that ensures no detail is overlooked and the root cause is thoroughly addressed. his systematic mindset and commitment to quality make him a tremendous asset to any engineering organization.

rupesh bandre · principal software performance engineer · salesforce · worked on the same team · apr 2025

i had the pleasure of mentoring arunkumar during his early days in salesforce, and i was immediately impressed by his rapid learning curve. he quickly grasped complex concepts and seamlessly transitioned into contributing to our performance engineering activities.

arunkumar truly embodies the spirit of a performance engineer. he possesses a remarkable ability to delve into challenging problem spaces, dissect them with precision, and implement effective solutions swiftly. his exceptional analytical skills have been instrumental in troubleshooting and resolving critical issues, significantly contributing to the success of our high-visibility deliverables.

beyond his technical prowess, arunkumar is a fantastic team player. he is always eager to learn from others, generously shares his knowledge, and consistently prioritizes team success. his collaborative spirit and positive attitude made him a valuable asset to our team.

vijaykumar mohandass · member leadership staff · zoho crm · was arunkumar's mentor · sep 2024

arun had both the strong technical skills and the drive to get things done. he was an asset to the team. be it a simple optimisation problem or an uphill climb, he was passionate and dedicated in the tasks picked. good attention to detail and very good reasoning abilities. perfect sre material.

thirumal kumar d · phd bioinformatics · researcher · was arunkumar's mentor · jun 2018

a young innovator who grasps concepts very quickly. a hard worker who could complete work in limited time.

# education & certifications

2015 — 2019: b.tech, information technology · anna university, tindivanam
oct 2024: generative ai with large language models · coursera · credential

# contact

$ cat contact.txt
email    = arunkumarucet@gmail.com
linkedin = linkedin.com/in/arunkumarucet
github   = github.com/arunkumarucet
location = "chennai, tamil nadu, india"