What You’ll Do
As a Senior Data Engineer on the data platform team, you’ll apply your expertise across multiple disciplines to develop, deploy, and support data systems, data pipelines, data lakes, and lakehouses. Your ability to automate, performance tune, and scale the data platform will be key to your success.
Your initial areas of focus will include:
Collaborate with stakeholders to make effective use of core data assets
Load both streaming and batch data using the Spark and PySpark libraries
Engineer lakehouse models to support defined data patterns and use cases
Leverage a combination of tools, engines, libraries, and code to build scalable data pipelines
Work within an IT-managed AWS account and VPC to stand up and maintain data platform development, staging, and production environments
Document data pipelines, cloud infrastructure, and standard operating procedures
Express data platform cloud infrastructure, services, and configuration as code
Automate load, scaling, and performance testing of data platform pipelines and infrastructure
Monitor, operate, and optimize data pipelines and distributed applications
Help ensure appropriate data privacy and security
Automate continuous upgrades and testing of data platform infrastructure and services
Build data pipeline unit, integration, quality, and performance tests
Participate in peer code reviews, code approvals, and pull requests
Identify, recommend, and implement opportunities for improvement in efficiency, resilience, scale, security, and performance
What You Need to Get the Job Done (if you don’t have all, apply anyway!)
Experience developing, scaling, and tuning data pipelines in Spark with PySpark
Understanding of data lake, lakehouse, and data warehouse systems, and related technologies
Knowledge and understanding of data formats, data patterns, models, and methodologies
Experience storing data objects in Hadoop or Hadoop-like environments such as S3
Demonstrated ability to deploy, configure, secure, performance tune, and scale EMR and Spark
Experience working with streaming technologies such as Kafka and Kinesis
Experience with the administration, configuration, performance tuning, and security of database engines like Snowflake, Databricks, Redshift, Vertica, or Greenplum
Ability to work with cloud infrastructure including resource scaling, S3, RDS, IAM, security groups, AMIs, CloudWatch, CloudTrail, and Secrets Manager
Understanding of security around cloud infrastructure and data systems
Git-based team coding workflows
Bonus Skills (Not Required, So Apply Anyway!)
Experience deploying and implementing lakehouse technologies such as Hudi, Iceberg, and Delta
Experience with Flink, Presto, Dremio, Databricks, or Kubernetes
Experience with expressing infrastructure as code leveraging tools like Terraform
Experience with and understanding of a zero-trust security framework
Experience developing CI/CD pipelines for automated testing and code deployment
Experience with QA and test automation
Exposure to visualization tools like Tableau
Beyond the technical skills, we’re looking for individuals who are:
Clear communicators with team members and stakeholders
Analytical and perceptive of patterns
Creative in coding
Detail-oriented and persistent
Productive in a dynamic setting
If you love to learn, you’ll be in good company. You’ll likely have a Bachelor’s degree in computer science, information systems, or equivalent working experience.
$62,500–$120,000/year
To apply for this job, please visit the application page
