Senior Data Engineer
Hybrid
TechGrove
Small/Medium Business
Product
B2B
₹ 35-55 Lacs PA
Private Equity
Information Technology
Chennai, Tamil Nadu, India
Post Status: Active
Permanent
Experience: 6-12 Years
Skills
Data Analysis
Apache Kafka
AWS
Machine Learning
PostgreSQL
Python
Query Optimization
ETL
Data Modeling
Database Migration

About the job

TechGrove is the Centre of Excellence for Banyan Software, based in Chennai, India. It plays a key role in supporting Banyan’s global businesses through technology, security, and software development. TechGrove brings together India’s deep pool of technical talent with Banyan’s long-term approach to growth, creating a trusted, developer-focused environment where people can do their best work.

This is a Senior Data Engineer – ML Systems role focused on rebuilding Touchstream’s data platform from the ground up to enable AI-driven streaming observability. Today, data is fragmented across systems and stored in PostgreSQL in a way that doesn’t support ML. Your role is to unify, scale, and redesign the data layer, then build intelligent anomaly detection on top of it.

What will you be responsible for?

1. Rebuild the Data Foundation

  • Consolidate two existing PostgreSQL systems into a single, coherent data model

  • Resolve inconsistent identifiers and clean up real-world data inconsistencies

  • Execute a zero-downtime migration strategy, including dual-write, backfilling, validation, and rollback planning

2. Build a Scalable Time-Series Platform

  • Migrate 800M+ rows of streaming metrics into a purpose-built time-series architecture

  • Design a system that supports:

    • Real-time queries for dashboards and alerting

    • Historical data access for ML training and analysis

    • Data retention, aggregation, and rollup strategies

  • Handle high-cardinality metrics efficiently at scale

3. Enable ML-Driven Anomaly Detection

  • Replace manual monitoring workflows with automated anomaly detection systems

  • Develop models that:

    • Learn normal behavior patterns per stream over time

    • Handle noise, seasonality, and evolving baselines

    • Balance false positives against missed incidents

  • Implement robust statistical baselines (e.g., EWMA, Z-score) alongside ML-based approaches
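
For candidates unfamiliar with the terms, a statistical baseline of the kind named above can be sketched in a few lines of Python. This is only an illustration, not TechGrove's or Touchstream's implementation: the function name `ewma_zscores`, the `alpha` and `warmup` parameters, and the sample values are all assumptions made for the example. Each point is scored against an exponentially weighted mean and variance built from the points before it.

```python
import math

def ewma_zscores(values, alpha=0.3, warmup=5):
    """Score each value against an EWMA baseline built from earlier values.

    alpha and warmup are illustrative defaults, not tuned settings.
    """
    mean = values[0]   # running EWMA of the series
    var = 0.0          # running exponentially weighted variance
    scores = [0.0]     # the first point has no history to compare against
    for i, x in enumerate(values[1:], start=1):
        std = math.sqrt(var)
        # Score against the baseline *before* folding the new point in,
        # so an outlier cannot mask itself.
        if i >= warmup and std > 0:
            scores.append((x - mean) / std)
        else:
            scores.append(0.0)
        # Standard incremental update of EWMA mean and variance.
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return scores

# Hypothetical metric samples: a steady value followed by one large spike.
samples = [100, 101, 99, 100, 102, 100, 101, 99, 100, 500]
scores = ewma_zscores(samples)
flagged = [i for i, s in enumerate(scores) if abs(s) > 3]
```

A threshold such as `abs(z) > 3` is a common starting point. A production system would also need to handle seasonality and to decide whether flagged points should update the baseline.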

4. Optimize and Own the Backend Data Systems

  • Tune PostgreSQL for high performance (query optimization, indexing, partitioning)

  • Build and maintain reliable data pipelines with strong observability

  • Ensure the platform is production-ready for real-time, AI-driven features

What are we looking for?

Core Requirements

  • Strong PostgreSQL expertise (performance tuning, large-scale systems)

  • Experience with production data migrations (zero/low downtime)

  • Hands-on experience with time-series data or time-series databases

  • Experience building anomaly detection or time-series ML systems in production

  • Strong Python engineering (production-quality code)

  • Comfortable working with AWS infrastructure

Nice to Have

  • Experience with multi-tenant data systems

  • Exposure to ML pipelines or synthetic data

  • Background in streaming, CDN, or video systems