Principal Data Engineer - Data Platform & Ingress
Movable Ink - Remote US
As a Principal Data Engineer, you will help drive the direction of our Data Warehouse and Hybrid Data Lake infrastructure. You will work hands-on with teammates across all departments, enabling them to access the data they need and empowering teams to make data-driven decisions about the direction of our business. You will own the infrastructure and code for the data pipelines within our Data Platform, with a focus on handling data at scale, designing, implementing, and optimizing ingestion pipelines that extract and load data from sources such as customer warehouses, business systems, and event messaging buses. Movable Ink collects campaign data from billions of requests served each day; come help us manage and make sense of the massive amount of data we're ingesting!
Responsibilities:
- Partner with internal operations teams to identify, collect, and integrate data from various business systems, ensuring comprehensive and accurate data capture.
- Design, implement, and maintain robust batch and real-time data pipelines, leveraging tools like Apache Airflow, Apache Flink, and Terraform for infrastructure as code (a minimal DAG sketch follows this list).
- Build and optimize Hybrid Data Lake / Data Warehouse infrastructure with solutions like Apache Iceberg for scalable and cost-effective storage (see the Iceberg sketch after this list).
- Ensure data pipelines adhere to best practices and are optimized for performance, scalability, and reliability.
- Conduct thorough testing of data pipelines to validate data accuracy and integrity.
- Monitor data pipelines, implement telemetry and alerting, troubleshoot any issues that arise, and proactively improve system reliability.
- Establish and track SLAs for data processing and delivery, ensuring timely and reliable access to data for all users.
- Mentor less experienced team members and establish patterns and practices that increase the quality, accuracy, and efficiency of the solutions the team produces.
- Design and implement Change Data Capture (CDC) solutions to support real-time data replication and point-in-time data queries (see the CDC sketch after this list).
- Work with other teams to ensure secure data access and compliance with regulatory requirements (e.g., GDPR, CCPA).
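For a concrete sense of the pipeline work above, here is a minimal sketch of a daily batch ingestion DAG, assuming Airflow 2.x with the TaskFlow API; the source system and target table (raw.orders) are illustrative placeholders, not Movable Ink systems:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def ingest_orders():
    @task
    def extract() -> list[dict]:
        # Pull rows from a hypothetical business-system API.
        return [{"order_id": 1, "amount": 42.0}]

    @task
    def load(rows: list[dict]) -> None:
        # In a real pipeline this would use a warehouse hook
        # (e.g. Redshift) instead of printing.
        print(f"loading {len(rows)} rows into raw.orders")

    load(extract())


ingest_orders()
```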
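On the lake side, an Iceberg table definition through PyIceberg might look like the following sketch; it assumes a catalog named "default" is already configured, and the schema and partitioning are illustrative:

```python
from pyiceberg.catalog import load_catalog
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.transforms import DayTransform
from pyiceberg.types import LongType, NestedField, StringType, TimestamptzType

catalog = load_catalog("default")  # assumes catalog config exists

schema = Schema(
    NestedField(1, "event_id", LongType(), required=True),
    NestedField(2, "event_ts", TimestamptzType(), required=True),
    NestedField(3, "payload", StringType(), required=False),
)

# Partition by day on event_ts so scans can prune by date.
spec = PartitionSpec(
    PartitionField(source_id=2, field_id=1000,
                   transform=DayTransform(), name="event_day")
)

events = catalog.create_table(
    "analytics.raw_events", schema=schema, partition_spec=spec
)
```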
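And at its core, a CDC apply step turns a stream of change events into upserts and deletes. The sketch below assumes unwrapped Debezium-style envelopes on a hypothetical Kafka topic, and uses an in-memory dict as a stand-in for a warehouse MERGE:

```python
import json

from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "cdc.public.orders",  # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

state: dict[int, dict] = {}  # stand-in for the target table, keyed by PK

for message in consumer:
    event = message.value
    op = event["op"]  # "c"=create, "u"=update, "d"=delete, "r"=snapshot read
    if op in ("c", "u", "r"):
        row = event["after"]
        state[row["id"]] = row
    elif op == "d":
        state.pop(event["before"]["id"], None)
```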
Qualifications:
- 12+ years of professional experience in data engineering, software engineering, database administration, business intelligence, or related fields, with 8+ years as a Data Engineer focused on cloud-based data warehouses (Redshift, Snowflake, Firebolt, BigQuery). We currently use Redshift.
- Deep experience working with multi-petabyte, mission-critical databases, optimizing for high availability, performance, and reliability, informed by a strong understanding of database internals.
- Expert proficiency with Python and SQL, and significant experience building robust data pipelines with these languages.
- Expert proficiency in deploying and managing data pipeline orchestration frameworks such as Apache Airflow or Prefect. We currently use Apache Airflow.
- Significant experience with Infrastructure-as-Code (Terraform) and automating cloud infrastructure management.
- Significant experience with stream processing technologies such as Apache Flink, Apache Kafka, or Apache Pulsar.
- Significant experience in building telemetry, monitoring, and alerting solutions for large-scale data pipelines (see the alerting sketch after this list).
- Significant experience in implementing Hybrid Data Lake / Data Warehouse architectures, with a focus on Apache Iceberg or similar technologies.
- Significant experience in designing and implementing solutions that comply with regulatory requirements such as GDPR and CCPA.
- Experience in Agile/Scrum environments, working with technical managers and product owners to break down high-level requirements into actionable work.
- Excellent communication skills, with the ability to effectively collaborate across technical and business teams.
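As one concrete shape the telemetry and alerting work can take, an Airflow failure callback can notify an on-call channel. This is a minimal sketch assuming Airflow 2.x and a placeholder webhook URL, not Movable Ink's actual alerting stack:

```python
from datetime import datetime

import requests
from airflow.decorators import dag, task

ALERT_WEBHOOK = "https://hooks.example.com/alerts"  # placeholder URL


def alert_on_failure(context):
    # Airflow passes the task context to failure callbacks.
    ti = context["task_instance"]
    requests.post(
        ALERT_WEBHOOK,
        json={"text": f"Task {ti.task_id} in DAG {ti.dag_id} failed"},
        timeout=10,
    )


@dag(
    schedule="@hourly",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"on_failure_callback": alert_on_failure},
)
def monitored_pipeline():
    @task
    def might_fail() -> None:
        ...  # a task whose failures should page

    might_fail()


monitored_pipeline()
```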
The base pay range for this position is $230,000 - $250,000 USD per year. The base pay offered may vary depending on job-related knowledge, skills, and experience. Stock options and other incentive pay may be provided as part of the compensation package, in addition to a full range of medical, financial, and/or other benefits, depending on the position ultimately offered.
Studies have shown that women, communities of color, and historically underrepresented people are less likely to apply to jobs unless they meet every single qualification. We are committed to building a diverse and inclusive culture where all Inkers can thrive. If you're excited about the role but don't meet all of the qualifications listed above, we encourage you to apply. Our differences bring a breadth of knowledge and perspectives that makes us collectively stronger.
We welcome and employ people regardless of race, color, gender identity or expression, religion, genetic information, parental or pregnancy status, national origin, sexual orientation, age, citizenship, marital status, ethnicity, family status, physical and mental ability, political affiliation, disability, Veteran status, or other protected characteristics. We are proud to be an equal opportunity employer.
Job Profile
Location: Remote US
Benefits/Perks: Collaborative environment, diverse and inclusive culture, flexible hours, incentive pay, remote work, stock options
Tasks:
- Design and implement data pipelines
- Ensure data compliance
- Mentor team members
- Monitor and troubleshoot data systems
- Optimize data warehouse infrastructure
Skills: AI, AI decisioning, Apache Airflow, Apache Flink, BigQuery, Business Intelligence, cloud-based data warehouses, cloud infrastructure, communication, compliance, content personalization, data-activated content generation, data compliance, data engineering, data integration, data monitoring, data pipeline orchestration, data quality, infrastructure as code, Kafka, Python, Redshift, Snowflake, software engineering, SQL, stream processing, Terraform
Experience: 12+ years
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu (UTC-10 through UTC-5)