Kamala Ramasubramanian webpage

About

Researcher & Software Engineer

Degree: PhD (Expected: 2022)
Research Focus: Understanding, improving and troubleshooting large-scale distributed systems

As part of my PhD, I have been involved in efforts to:

Troubleshoot failures in production systems
Understand and improve fault-tolerance properties in large systems

Prior to this, I worked as a software engineer at Arista Networks and Hewlett-Packard.

Work & Education

Summary

Kamala Ram

email: kamala.ramas@gmail.com

Broadly, I analyze and troubleshoot large distributed systems by posing and answering questions based on observed system executions. For example, to determine the next fault to inject when testing a system for fault tolerance, system designers may ask: What do all successful executions have in common? To troubleshoot an issue, operators may ask: How do unsuccessful executions differ from successful executions? To obtain an overall understanding of the system for feature development, programmers may ask: What can successful executions teach us about how they succeed? In my dissertation research, I answer these three questions by deriving insights from observed system executions (distributed traces and provenance) and building software tools to demonstrate their applicability.

Education

PhD (Computer Science)

2015 - 2022 (Expected)

University of California, Santa Cruz

MSc[Engg] (Computer Science)

2010 - 2013

Indian Institute of Science, Bangalore

B.E. (Computer Science)

2004 - 2008

Visvesvaraya Technological University, Belgaum

Programming Languages

Skill Level	Programming Languages
Proficient	C, Python
Familiar	C++, Java, Go, TypeScript, Haskell, Perl

Work Experience

Intern, eBay

2018 - 2022

Aggregate comparison of traces for incident localization
Understanding fault-tolerance properties of microservices using traces and fault injection

Intern, Intel Labs

June 2017 - August 2017

Worked on an experimental framework to induce Service Level Agreement (SLA) violations in the GET path of OpenStack Swift

Intern, Elastic

July 2016 - September 2016

Modeled the data replication protocol at Elastic (a flavor of primary-backup)
For a pre-defined class of faults, our approach demonstrates how data lineage can be used to ensure that the expected invariants are upheld even as the system evolves

Software Engineer, Arista Networks

2013 - 2015

Worked on supporting and enhancing the network protocol stack (Programming language: C)
Upstreamed a patch to the Linux kernel to fix support for blackhole and prohibit routes (Link to patch)

Software Engineer, Hewlett Packard

2008 - 2010

As an engineer on the Photo team (responsible for everything from displaying thumbnails on-screen to printing photos), I was involved in feature development and maintenance for a variety of product lines. (Programming language: C)

Publications & Talks

Posters & Publications

Dissertation PDF
Aggregate Comparison of Traces for Incident Localization PDF
Identifying microservice design patterns PDF
SoCC 2018, Poster: Does your fault-tolerant system tolerate faults?
HotCloud 2017, Paper: Growing a protocol PDF, Code

Talks & Write-ups

HPTS, 2019: Automated Fault Diagnostics
HPTS, 2017: Growing a protocol
Chaos Community Day, 2017: SLA violations: How and Why?
Chaos Community Day, 2017: Growing a Protocol
Blog Post: Model and test data replication at Elasticsearch Link to post