Data Engineer
Hybrid
IGS GLOBAL
Small/Medium Business
Product & Service
B2B
₹ 14-17 Lacs PA
Series C
Information Technology
Bangalore, Karnataka, India
Post Status: Active
Permanent
144 applications
Experience: 4-6 Years
Skills
SQL
JSON
Numpy
AWS
Pandas
Big Data
PySpark
Apache Airflow
ETL
Data Lakes
Posted 46 days ago

About the job

We are looking for a PySpark Data Engineer to design, build, and optimize scalable data pipelines and distributed data processing systems. You will work on high-performance ETL workflows using Spark and cloud-native technologies, ensuring reliable and efficient data delivery across platforms.

This is a hands-on individual contributor role focused on building robust data products in a fast-paced, engineering-driven environment.

Key Responsibilities

  • Design, build, and maintain scalable ETL pipelines using PySpark

  • Develop high-performance batch and real-time data processing systems

  • Extract, transform, and load data from multiple sources (databases, APIs, files)

  • Work extensively with large and complex datasets, including nested JSON structures

  • Build efficient data transformation logic using Python, Spark, NumPy, and Pandas

  • Write and optimize complex SQL queries for data processing and analytics

  • Collaborate with data scientists, analysts, and engineers to support data-driven applications

  • Ensure data quality, consistency, and performance across pipelines

  • Optimize Spark jobs for scalability and performance

Must-Have Skills

  • Strong hands-on experience with PySpark (mandatory)

  • Advanced Python skills with NumPy and Pandas

  • Strong SQL skills (complex queries, joins, transformations)

  • Experience building data pipelines and ETL workflows

  • Ability to handle complex data transformations and nested JSON structures

  • Experience integrating data from APIs, databases, and flat files

  • Strong problem-solving skills in data processing and manipulation

Good-to-Have Skills

  • Experience with Apache Airflow or similar workflow orchestration tools

  • Familiarity with AWS or GCP cloud platforms

  • Knowledge of modern data lake technologies (e.g., Iceberg, Delta Lake)

  • Experience working in startup or fast-paced product environments

  • Exposure to distributed systems and large-scale data platforms

Ideal Candidate Profile

We are looking for engineers who are:

  • Strong in hands-on coding (not just theoretical knowledge)

  • Passionate about data engineering and scalable systems

  • Comfortable working in fast-paced, high-ownership environments

  • Highly analytical, detail-oriented, and driven to solve problems

  • Proactive, energetic, and committed to delivering high-quality data solutions