We are looking for a PySpark Data Engineer to design, build, and optimize scalable data pipelines and distributed data processing systems. You will work on high-performance ETL workflows using Spark and cloud-native technologies, ensuring reliable and efficient data delivery across platforms.
This is a hands-on individual contributor role focused on building robust data products in a fast-paced, engineering-driven environment.
Key responsibilities:
Design, build, and maintain scalable ETL pipelines using PySpark
Develop high-performance batch and real-time data processing systems
Extract, transform, and load data from multiple sources (databases, APIs, files)
Work extensively with large and complex datasets, including nested JSON structures
Build efficient data transformation logic using Python, Spark, NumPy, and Pandas
Write and optimize complex SQL queries for data processing and analytics
Collaborate with data scientists, analysts, and engineers to support data-driven applications
Ensure data quality, consistency, and performance across pipelines
Optimize Spark jobs for scalability and performance
Required skills and experience:
Strong hands-on experience with PySpark (mandatory)
Advanced Python skills with NumPy and Pandas
Strong SQL skills (complex queries, joins, transformations)
Experience building data pipelines and ETL workflows
Ability to handle complex data transformations and nested JSON structures
Experience integrating data from APIs, databases, and flat files
Strong problem-solving skills in data processing and manipulation
Preferred qualifications:
Experience with Apache Airflow or similar workflow orchestration tools
Familiarity with AWS or GCP cloud platforms
Knowledge of modern data lake technologies (e.g., Iceberg, Delta Lake)
Experience working in startup or fast-paced product environments
Exposure to distributed systems and large-scale data platforms
We are looking for engineers who are:
Strong in hands-on coding (not just theoretical knowledge)
Passionate about data engineering and scalable systems
Comfortable working in fast-paced, high-ownership environments
Highly analytical, detail-oriented, and driven to solve problems
Proactive, energetic, and committed to delivering high-quality data solutions