The API Services team is responsible for engineering and delivering cutting-edge services that aid in content delivery to end customers. These services support 110 news brands and more than 110 million unique monthly visitors.
The Principal Data Engineer will play a key role in architecting, developing, and maintaining the data architecture for Gannett’s new Content Platform, which supports the content production & delivery systems consumed by both our network of 3,000 journalists & our customer-facing products. You will be expected to design & consume large-scale, fault-tolerant, and highly available architectures. A large part of your role will be forward-looking, with an emphasis on optimizing content structures & relationships. If you have a passion for rapid development, automation, learning, and challenging and bettering your peers, along with a strong desire to operate in a full-stack environment, you’d probably fit in well here.
Responsibilities:
Collaborate with stakeholders & developers to identify data needs & ideal implementation.
Contribute to the architecture and vision of Gannett’s content data pipeline.
Draw on a track record of evolving complex data environments.
Continuously evaluate data usage patterns and identify areas of improvement.
Interface closely with data scientists and engineers to ensure the reliability and scalability of the data environment.
Drive future state technologies, designs and ideas across the organization.
Provide planning for two-week sprints.
Provide day-to-day operational support for our applications.
Improve and establish best practices around our application and infrastructure monitoring.
Automate everything:
Containerizing applications with Docker
Scripting new solutions/APIs/services to reduce toil
Researching new tools to optimize cost, deployment speed, and resource usage
Assist in improving our onboarding structure and documentation.
Responsibility Breakdown:
30% – Data architecture design / review
20% – Mentoring
15% – Application support
15% – Planning / documentation
10% – Application design / recommendations / proofs of concept
10% – New technology evaluation
Technologies:
Systems:
Linux
Couchbase
Elasticsearch
Solr
Neo4j
Other NoSQL Databases
Exciting things you get to do:
Engineering high-performance applications with an emphasis on concurrency
Agile
Amazon Web Services, Google Compute Engine
Google Cloud Datastore, Spanner, DynamoDB
Docker, Kubernetes
Database testing
GraphQL
Fastly
Terraform
Monitoring with New Relic
Minimum Qualifications:
Deep experience in ETL design, schema design and dimensional data modeling.
Ability to match business requirements to technical ETL design and data infrastructure needs.
Experience using search technologies such as Elasticsearch and Solr, and designing the integration of search with a persistent data store.
Deep understanding of data normalization methodologies.
Deep understanding of both relational and NoSQL databases.
Experience with data solutions such as Hadoop, Teradata, and Oracle.
Proven expertise with query languages such as SQL, T-SQL, NRQL, and Solr query syntax.
Self-starter who can operate in a remote-friendly environment.
Experience with Agile (Scrum), test-driven development, continuous integration, and version control (Git).
Experience deploying to cloud compute or container hosting.
Experience working with data modeling tools.
Basic understanding of REST APIs, SDKs and CLI toolsets.
Understanding of web technologies.
Experience with data in the media industry is a plus.