Data Engineer - Stream Data Processing - Distributed Data Processing
San Francisco, California, United States - Remote
About Pathway
Deeptech start-up, founded in March 2020.
- Our primary developer offering is an ultra-performant Data Processing Framework (unified streaming + batch) with a Python API, distributed Rust engine, and capabilities for data source integration & transformation at scale (Kafka, S3, databases/CDC,...).
- The single-machine version is provided under a free-to-use license (`pip install pathway`); a minimal usage sketch follows this list.
- Major data use cases are around event-stream data (including real-world data such as IoT), and graph data that changes over time.
- Our enterprise offering is currently used by leaders of the logistics industry, such as DB Schenker and La Poste, and is being tested across multiple industries. Pathway has been featured in Gartner's market guide for Event Stream Processing.
- Learn more at http://pathway.com/ and https://github.com/pathwaycom/.
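For a concrete feel of the developer experience, here is a minimal sketch of a Pathway pipeline, modeled on the public quickstart. The directory, schema, and column names are illustrative, and connector signatures may differ across versions; refer to the documentation at pathway.com for the current API.

```python
import pathway as pw

# Schema of the incoming rows (illustrative field name).
class InputSchema(pw.Schema):
    value: int

# Watch a directory of CSV files; new files and rows are picked up as a stream.
table = pw.io.csv.read("./input_data/", schema=InputSchema, mode="streaming")

# The same declarative transformation works in both batch and streaming modes.
total = table.reduce(total=pw.reducers.sum(table.value))

# Continuously write the updated result.
pw.io.csv.write(total, "./output.csv")

# Launch the computation.
pw.run()
```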
Pathway is VC-funded, with business angels from the AI space and industry. We have operations across Europe and in the US. We are headquartered in Paris, with significant support from the French ecosystem (BPI, Agoranov, WILCO,...).
The Team
Pathway is built by and for overachievers. Its co-founders and employees have worked in some of the best AI labs in the world (Microsoft Research, Google Brain, ETH Zurich), worked at Google, and graduated from top universities (Polytechnique, ENSAE, Sciences Po, HEC Paris, a PhD obtained at the age of 20, etc.). Pathway's CTO has co-authored work with Geoffrey Hinton and Yoshua Bengio. The management team also includes the co-founder of Spoj.com (1M+ developer users) and NK.pl (13.5M+ users), and an experienced growth leader who has scaled companies through multiple exits.
The opportunity
We are searching for a person with a Data Processing or Data Engineering profile, willing to work with live client datasets, and to test, benchmark, and showcase our brand-new stream data processing technology.
The end-users of our product are mostly developers and data engineers working in corporate environments. We expect our development framework to become part of their preferred stack for analytics projects at work – their daily bread and butter.
You Will
You will work closely with our CTO and Head of Product, as well as key developers. You will be expected to:
- Implement the flow of data from its location in clients' warehouses to Pathway's ingress.
- Set up CDC interfaces for change streams between client data stores and the data Pathway reads and writes, ensuring persistence of Pathway's outputs.
- Design ETL pipelines within Pathway (a sketch follows this list).
- …
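To illustrate the kind of work described above, here is a hedged sketch of ingesting a change stream from a client's Kafka topic into Pathway, applying a small ETL step, and persisting the output. Topic, field, and file names are placeholders, and the exact connector parameters (for example, for Debezium-based CDC versus plain Kafka) may vary by version; consult the Pathway connector documentation.

```python
import pathway as pw

# Illustrative schema of the change events arriving from the client's store.
class OrderSchema(pw.Schema):
    order_id: str
    amount: float

# Kafka connection settings follow librdkafka conventions; values are placeholders.
rdkafka_settings = {
    "bootstrap.servers": "kafka:9092",
    "group.id": "pathway-ingress",
    "auto.offset.reset": "earliest",
}

# Ingest the change stream into Pathway's ingress (assumed topic name "orders").
orders = pw.io.kafka.read(
    rdkafka_settings,
    topic="orders",
    format="json",
    schema=OrderSchema,
)

# A simple ETL step: keep a running total per order as updates arrive.
totals = orders.groupby(orders.order_id).reduce(
    orders.order_id,
    total=pw.reducers.sum(orders.amount),
)

# Persist Pathway's output so downstream consumers see an up-to-date view.
pw.io.jsonlines.write(totals, "./totals.jsonl")

pw.run()
```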
Job Profile
Skills: Apache Beam, Apache Spark, AWS, Azure, Dask, Data engineering, Data processing, Distributed Systems, ETL, Flink, Kafka, Kubernetes, Message Queues, Protobuf, Python, Ray, SQL, Stream processing
Experience: 1 year
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-9, UTC-8, UTC-7, UTC-6, UTC-5