Data Versioning, Data Pipelines, and Data Lineage

DevOps Engineer

Location
San Francisco, CA, or remote (anywhere in the US)
Job Type
Full-time

About the role

About Pachyderm

At Pachyderm, we're building an open-source, enterprise-grade data science platform that lets you deploy and manage multi-stage, language-agnostic data pipelines while maintaining complete reproducibility and provenance. If you want to learn more about our grand vision, read what has become our "manifesto." Our system, developed with open-source roots, shifts the paradigm of data science workflows by providing reproducibility, data provenance, and the opportunity for true collaboration. Pachyderm uses modern technologies like Docker and Kubernetes to build an entirely new method of analyzing data. Offered both as an in-house solution and as a hosted service, Pachyderm brings together version control for data with the tools to build scalable end-to-end ML/AI pipelines, while empowering users to use any language, framework, or tool they want.

What it’s like being part of The Pach

Pachyderm is a rapidly growing Series B company funded by top VCs: Benchmark, Decibel, M12, and Y Combinator. Pachyderm has always embraced, and always will embrace, a "remote-first" approach to growing our team. This allows us to hire a diverse group of individuals across the country (and world!) while giving our team members the flexibility to work from anywhere.

Being a member of The Pach means joining a supportive team that cares about you, values kindness, and works hard to create an open and transparent workplace.

Pachyderm is still small, so joining means you are getting in right at the ground floor and will have an enormous impact on the success and direction of the company and product.

The Role

Love Docker, Python, Golang, and distributed systems?

Pachyderm is hiring a DevOps expert to join our team as a senior member and help improve our infrastructure, deployment, and testing processes. Pachyderm has a rapidly growing engineering team, and we're long overdue for some major improvements to our internal infrastructure and engineering methodologies.

While your primary focus will be building and maintaining various internal systems for the hosted platform, you'll also have the opportunity to contribute to the core product and work directly with users/customers who have complex deployment environments. At Pachyderm, OSS user and customer feedback is a major driver of our product roadmap and we believe that everyone within the company should experience that first-hand.

Pachyderm is just a small team right now, so you'd be getting in right at the ground floor and would have an enormous impact on the success and direction of the company and product.

We offer significant equity, full benefits, and all the usual startup perks.

Your responsibilities will include:

  • Develop our internal Go backend for the hosted platform.
  • Manage and maintain internal Kubernetes clusters and hosted Pachyderm clusters.
  • Optimize Pachyderm's CI to improve our development workflow and increase developer velocity.
  • Develop Pachyderm's internal testing/benchmarking framework (probably in Go) to perform large-scale benchmarks on a regular cadence.
  • Improve, test, script, and document the multitude of deployment options for Pachyderm's core product, including all cloud providers and various permutations of on-prem k8s and object stores.
  • Build standard monitoring, logging, and deployment (e.g. Helm chart) packages so that Pachyderm users can get up and running faster.
  • Work closely with our front-end, back-end, and systems teams to improve hosted cluster stability and uptime.

Qualifications:

  • Golang or other programming experience is required. While much of the job is automation and scripting, our testing frameworks, product backend, and internal automation work (e.g. k8s operators/CRDs) are all written in Go.
  • 4+ years of experience building, maintaining, and automating distributed systems, data infrastructure, back-end systems, or related infrastructure.
  • Experience working in a continuous integration / continuous delivery development environment.
  • Expertise running and managing Kubernetes and Docker on one or more cloud providers, preferably as part of a large-scale, enterprise-class product related to storage, processing, networking, and/or virtualization.
  • Expertise running and managing build, test, and release processes for 10+ person engineering orgs.
  • Must have strong communication skills when talking about technical concepts. Our interview process strongly tests for communication, as we have a very collaborative work environment where many parts of the codebase interact in complex ways.

Benefits:

  • Significant equity, 401(k), and full benefits (100% medical, 99% dental and vision, 50% for all dependents).
  • Flexible PTO - work/life balance is important and we want you to take time off to rejuvenate!
  • Remote friendly - we were remote before remote was cool, and we intend to continue to invest in a remote-first culture.
  • Tons of fun swag and surprise packages sent to your doorstep.
  • Tech and office stipends - what you buy is yours to keep.
  • Education and donation stipends - we want to support your career growth and the community.
  • Supportive parental leave (see also: work/life balance).
  • Encouraged fun - game days, fun activities, Zoom hangouts, and more (and - when responsible - visits to our home base for team on-sites).

We can’t wait to meet you and hope you’ll join The Pach!

Pachyderm
Founded: 2014
Team Size: 60
Location: San Francisco
Founders
Joe Doliner, CEO
Joey Zwicker, Founder