Kamala Ram


Researcher & Software Engineer

  • Degree: PhD (Expected: 2022)
  • Research Focus: Understanding, improving and troubleshooting large-scale distributed systems

As part of my PhD, I have been involved in efforts to:

  • Troubleshoot failures in production systems
  • Understand and improve fault-tolerance properties in large systems

Prior to this, I worked as a Software Engineer at Arista Networks and Hewlett-Packard.

Work & Education


email: kamala.ramas@gmail.com

Broadly, I analyze and troubleshoot large distributed systems by posing and answering questions about observed system executions. For example, to decide which fault to inject next when testing a system for fault tolerance, system designers may ask: What do all successful executions have in common? To troubleshoot an issue, operators may ask: How do unsuccessful executions differ from successful ones? To gain an overall understanding of the system for feature development, programmers may ask: What can successful executions teach us about how they succeed? In my dissertation research, I answer these three questions by deriving insights from observed system executions (distributed traces and provenance) and by building software tools that demonstrate the applicability of those insights.
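As a rough illustration of the first two questions, the sketch below treats each observed execution as a set of events and answers the questions with set operations. It is a simplified sketch under assumptions of my own choosing here, not the tooling from the dissertation: the representation of a trace as a set of strings, the function names, and the event labels are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumption: a trace is reduced to the set of events (e.g. service calls)
# it contains; ordering and timing are ignored in this sketch.
Trace = Set[str]


def common_to_all(traces: Iterable[Trace]) -> Set[str]:
    """Events that appear in every trace ("what do successes have in common?")."""
    traces = list(traces)
    if not traces:
        return set()
    return set(traces[0]).intersection(*traces[1:])


def differing_events(failed: Trace, successful: Iterable[Trace]) -> Tuple[Set[str], Set[str]]:
    """Compare one failed trace against a group of successful traces.

    Returns (extra, missing):
      extra   - events in the failed trace that no successful trace contains
      missing - events every successful trace contains but the failed one lacks
    """
    successful = list(successful)
    seen_in_any_success = set().union(*successful)
    extra = set(failed) - seen_in_any_success
    missing = common_to_all(successful) - set(failed)
    return extra, missing


# Hypothetical example data: event labels are invented for illustration.
successful_traces = [
    {"client->lb", "lb->api", "api->db", "db->replica"},
    {"client->lb", "lb->api", "api->db", "api->cache"},
]
failed_trace = {"client->lb", "lb->api", "api->cache", "api->retry"}

shared = common_to_all(successful_traces)
extra, missing = differing_events(failed_trace, successful_traces)
```

Here `shared` holds the events every successful execution relies on (candidates for the next fault to inject), while `extra` and `missing` point at where the failed execution diverged.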


PhD (Computer Science)

2015 - 2022 (Expected)

University of California, Santa Cruz

MSc[Engg] (Computer Science)

2010 - 2013

Indian Institute of Science, Bangalore

B.E. (Computer Science)

2004 - 2008

Visvesvaraya Technological University, Belgaum

Programming Languages

Proficient: C, Python
Familiar: C, C++, Java, Go, TypeScript, Haskell, Perl

Work Experience

Intern, eBay

2018 - 2022
  • Aggregate comparison of traces for incident localization
  • Understanding fault-tolerance properties of microservices using traces and fault injection

Intern, Intel Labs

June 2017 - August 2017
  • Worked on an experimental framework to induce Service Level Agreement (SLA) violations in the GET path of OpenStack Swift

Intern, Elastic

July 2016 - September 2016
  • Modeled data replication protocol at Elastic (flavor of primary backup)
  • For a pre-defined class of faults, our approach demonstrates how data lineage can be used to ensure that the expected invariants are upheld even as the system evolves

Software Engineer, Arista Networks

2013 - 2015
  • Worked on supporting and enhancing the network protocol stack (Programming language: C)
  • Upstreamed a patch to the Linux kernel to fix support for blackhole and prohibit routes (Link to patch)

Software Engineer, Hewlett Packard

2008 - 2010
  • As an engineer on the Photo team (responsible for everything from displaying thumbnails on-screen to printing photos), I was involved in feature development and maintenance for a variety of product lines (Programming language: C)

Publications & Talks

Posters & Publications

  • Dissertation PDF
  • Aggregate Comparison of Traces for Incident Localization PDF
  • Identifying microservice design patterns PDF
  • SoCC 2018, Poster: Does your fault-tolerant system tolerate faults?
  • HotCloud 2017, Paper: Growing a protocol PDF, Code

Talks & Write-ups

  • HPTS, 2019: Automated Fault Diagnostics
  • HPTS, 2017: Growing a protocol
  • Chaos Community Day, 2017: SLA violations: How and Why?
  • Chaos Community Day, 2017: Growing a Protocol
  • Blog Post: Model and test data replication at Elasticsearch Link to post