Experience
Columnar file format used by multiple analytics-oriented storage systems, delivering order-of-magnitude latency and cost reductions for analytical queries.
The Artus reader library (~27,000 transitive dependencies) was too large to embed in downstream client applications, causing build failures and blocking a key storage system client's rollout. Analyzed the full dependency tree, identified the top contributors, and refactored them to cut library size by 40%, unblocking adoption across hundreds of client applications and resolving company-wide build-health issues.
Designed and implemented support for schema-mismatched fields in the Artus columnar reader/writer, a common problem in distributed systems where proto descriptors diverge. For write-time unknowns (fields absent from the writer schema), data is preserved as serialized blobs instead of being columnar-shredded; for read-time unknowns, resolution relies on the file's column datatype metadata. Led the read-path implementation and guided a teammate on the write path. Ensured lossless data integrity during schema evolution, unblocking petabyte-scale migrations from legacy formats.
Initiated and designed a next-generation modular, stateless data transformation library to replace a tightly coupled, stateful reader. The new architecture emphasizes separation of concerns and minimal dependencies, enabling comprehensive unit testing, faster rollouts, and support for large data rows across major data warehousing platforms.
Became the go-to expert for the core data-reading stack. Mentored new team members and resolved critical production bugs, including high-priority customer escalations and memory issues, to maintain system stability.
Distributed storage system that ingests, processes, and serves infrastructure data as live state and historical time series across cluster, regional, and global failure domains. Replacement for legacy system (Infrastore).
Led migration of 8 high-volume PII data streams from Infrastore to IDX, requiring a 6.75× traffic scale-up (8 GB to 54 GB every 10 min) that the legacy system could not handle. Designed and built the pseudonymization service end-to-end (HLD/LLD), scaling it from 20 QPS to 2M QPS, and made targeted performance changes across other pipeline services to support the increased load. Received an internal engineering award for this project.
Collation copies data across replicas and failure domains (cluster → regional → global). Reduced replica collation latency from 4–8 min to under 30s by implementing asynchronous scheduling and removing sequential timestamp dependencies.
Resolved file-deduplication issues across replicas that caused 1.5–2× redundant storage, preventing unnecessary data propagation from regional to global domains.
Designed and executed migration of 80+ query users across 73 tables from Infrastore to IDX, solving rollback/rollforward challenges for query libraries integrated into client binaries.
Software Engineer Intern, Google
Built dollar-cost tracking features with permission-based access controls for a resource planning tool (Java/Angular).
Software Engineer Intern, Amazon
Built a Partner Onboarding Dashboard for product managers to onboard customer surveys (Spring MVC/React).
Skills
Achievements
C++ Readability — Granted Google C++ readability, a rigorous internal code-quality certification awarded after expert review
Competitive Programming — 4-star on CodeChef, Specialist on Codeforces
Education
Jawaharlal Nehru Technological University
Integrated B.Tech + M.Tech (Dual Degree), Computer Science & Engineering