Experience
Columnar file format used by multiple analytics-oriented storage systems, delivering order-of-magnitude latency and cost reductions for analytical queries.
The Artus reader library (~27,000 transitive dependencies) was too large to embed in downstream client applications, causing build failures and blocking a key storage system client's rollout. Analyzed the full dependency tree, identified the top contributors, and refactored them to cut library size by 40%, unblocking adoption across hundreds of client applications and resolving company-wide build-health issues.
Designed and implemented support for schema-mismatched fields in the Artus columnar reader/writer, a common problem in distributed systems where proto descriptors diverge. For write-time unknowns (fields absent from the writer schema), data is preserved as serialized blobs instead of being columnar-shredded; for read-time unknowns, resolution relies on the file's column datatype metadata. Led the read-path implementation and guided a teammate on the write path. Ensured lossless data integrity during schema evolution, unblocking petabyte-scale migrations from legacy formats.
Initiated and designed a next-generation modular, stateless data transformation library to replace a tightly coupled, stateful reader. The new architecture emphasizes separation of concerns and minimal dependencies, enabling comprehensive unit testing, faster rollouts, and support for large data rows across major data warehousing platforms.
Became the go-to expert for the core data-reading stack. Mentored new team members and resolved critical production bugs, including high-priority customer escalations and memory issues, to maintain system stability.
Distributed storage system that ingests, processes, and serves infrastructure data as live state and historical time series across cluster, regional, and global failure domains. Replacement for legacy system (Infrastore).
Led migration of 8 high-volume PII data streams from Infrastore to IDX, requiring a 6.75× traffic scale-up (8 GB to 54 GB every 10 min) that the legacy system could not handle. Designed and built the pseudonymization service end-to-end (HLD/LLD), scaling it from 20 QPS to 2M QPS, and made targeted performance changes across other pipeline services to support the increased load. Received an internal engineering award for this project.
Collation copies data across replicas and failure domains (cluster → regional → global). Reduced replica collation latency from 4–8 min to under 30s by implementing asynchronous scheduling and removing sequential timestamp dependencies.
Resolved file-deduplication issues across replicas that caused 1.5–2× redundant storage, preventing unnecessary data propagation from regional to global domains.
Designed and executed migration of 80+ query users across 73 tables from Infrastore to IDX, solving rollback/rollforward challenges for query libraries integrated into client binaries.
Software Engineer Intern, Google
Built dollar-cost tracking features with permission-based access controls for a resource planning tool (Java/Angular).
Software Engineer Intern, Amazon
Built a Partner Onboarding Dashboard for product managers to onboard customer surveys (Spring MVC/React).
Skills
Achievements
C++ Readability — Granted Google C++ readability, a rigorous internal code-quality certification awarded after expert review
Competitive Programming — 4-star on CodeChef, Specialist on Codeforces
Education
Jawaharlal Nehru Technological University
Integrated B.Tech + M.Tech (Dual Degree), Computer Science & Engineering