Senior Data Engineer
On-site
Fanatics
Enterprise
Product
B2C
₹ 25-50 Lacs PA
IPO/Public
Internet
Hyderabad, Telangana, India
Post Status: Active
Permanent
26 applications
Experience: 5-9 Years
Skills
Apache Spark
Snowflake
Apache Kafka
Java
Amazon Redshift
Scala
AWS Glue
Data Engineering
Big Data
ETL
Posted 101 days ago

About the job

We are looking for a Senior Data Engineer with a deep understanding of Apache Spark (Scala & PySpark), Kafka Streams (Java), AWS services, Snowflake, Apache Iceberg, Tableau, and Data Lake architectures.
As a senior member of our team, you will be responsible for leading the design, implementation, and optimization of large-scale data systems, real-time streaming solutions, and cloud-based data platforms.
You will work with other engineers to deliver high-quality data solutions, mentor junior team members, and collaborate closely with cross-functional teams to solve complex business problems.

Key Responsibilities:

  • Lead the design and development of scalable, high-performance data architectures on AWS, leveraging services such as S3, EMR, Glue, Redshift, Lambda, and Kinesis.

  • Architect and manage Data Lakes for handling structured, semi-structured, and unstructured data.

  • Design and build complex data pipelines using Apache Spark (Scala & PySpark), Kafka Streams (Java), and cloud-native technologies for batch and real-time data processing.

  • Optimize these pipelines for high performance, scalability, and cost-effectiveness.

  • Develop and optimize real-time data streaming applications using Kafka Streams in Java.

  • Build reliable, low-latency streaming solutions that handle high-throughput data and ensure smooth data flow from sources to sinks in real time.

  • Manage Snowflake for cloud data warehousing, ensuring seamless data integration, well-optimized queries, and support for advanced analytics.

  • Implement Apache Iceberg in Data Lakes for managing large-scale datasets with ACID compliance, schema evolution, and versioning.

  • Design and maintain highly scalable Data Lakes on AWS using S3, Glue, and Apache Iceberg.

  • Ensure data is easily accessible, stored in optimal formats, and well-integrated with downstream analytics systems.

  • Work with business stakeholders to create actionable insights using Tableau.

  • Build data models and dashboards that drive key business decisions, ensuring that data is easily accessible and interpretable.

  • Continuously monitor and optimize Spark jobs, Kafka Streams processing, and other cloud-based data systems for performance, scalability, and cost.

  • Implement best practices for stream processing, batch processing, and cloud resource management.

  • Lead and mentor junior engineers, fostering a culture of collaboration, continuous learning, and technical excellence.

  • Ensure high-quality code delivery, adherence to best practices, and optimal use of resources.

  • Work closely with Data Scientists, Product Managers, and DevOps teams to understand business needs and deliver impactful data solutions.

  • Participate in technical discussions, from system design to data governance.

  • Ensure that data pipelines, architectures, and systems are thoroughly documented and follow coding and design best practices.

  • Promote knowledge-sharing across the team to maintain high standards for quality and scalability.

Required Skills & Qualifications:

Education: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent work experience).

Experience:
• 5+ years of experience in Data Engineering or a related field, with a proven track record of designing, implementing, and maintaining large-scale distributed data systems.
• Proficiency in Apache Spark (Scala & PySpark) for distributed data processing and real-time analytics.
• Hands-on experience with Kafka Streams in Java for real-time data streaming applications.
• Strong experience with Data Lake architectures on AWS, using services such as S3, Glue, and EMR, and table formats such as Apache Iceberg.
• Proficiency in Snowflake for cloud-based data warehousing, data modeling, and query optimization.
• Expertise in SQL for querying relational and NoSQL databases, along with experience in database design and optimization.

Technical Skills:
• Strong experience building and maintaining ETL pipelines using Spark (Scala & PySpark).
• Proficiency in Java, particularly for building and optimizing Kafka Streams applications for real-time data processing.
• Experience with AWS services (e.g., Lambda, Redshift, Athena, Glue, S3) and managing cloud infrastructure.
• Expertise with Apache Iceberg for handling large-scale, transactional data in Data Lakes, including versioning, schema evolution, and partitioning.
• Experience with Tableau for business intelligence, dashboard creation, and data visualization is a plus.
• Knowledge of CI/CD tools and practices, particularly in data engineering environments.
• Familiarity with containerization and orchestration tools such as Docker and Kubernetes for managing cloud-based services.

Soft Skills:
• Excellent problem-solving skills, with a strong ability to debug and optimize large-scale distributed systems.
• Strong communication skills to engage with both technical and non-technical stakeholders.
• Proven leadership ability, including mentoring and guiding junior engineers.
• A collaborative mindset and the ability to work across teams to deliver integrated solutions.

Preferred Qualifications:
• Experience with stream processing frameworks such as Apache Flink or Apache Beam.
• Knowledge of machine learning workflows and the integration of ML models into data pipelines.
• Familiarity with data governance, security, and compliance practices in cloud environments.
• Experience with DevOps practices and infrastructure automation tools such as Terraform or CloudFormation.