Senior Data Engineer position at TechGrove in Chennai

Senior Data Engineer

Hybrid

TechGrove

Small/Medium Business

Product

B2B

₹ 35-55 Lacs PA

Private Equity

Information Technology

Chennai, Tamil Nadu, India

Post Status: Active

Permanent

9 applications

Experience: 6-12 Years

Skills

Data Analysis

Apache Kafka

AWS

Machine Learning

PostgreSQL

Python

Query Optimization

ETL

Data Modeling

Database Migration

Posted 18 days ago

About the job

TechGrove is the Centre of Excellence for Banyan Software, based in Chennai, India. It plays a key role in supporting Banyan’s global businesses through technology, security, and software development. TechGrove brings together India’s deep pool of technical talent with Banyan’s long-term approach to growth, creating a trusted, developer-focused environment where people can do their best work.

This is a Senior Data Engineer – ML Systems role focused on rebuilding Touchstream’s data platform from the ground up to enable AI-driven streaming observability. Today, data is fragmented across systems and stored in PostgreSQL in a way that doesn’t support ML. Your role is to unify, scale, and redesign the data layer, then build intelligent anomaly detection on top of it.

What will you be responsible for?

1. Rebuild the Data Foundation

Consolidate two existing PostgreSQL systems into a single, coherent data model
Resolve inconsistent identifiers and clean up real-world data inconsistencies
Execute a zero-downtime migration strategy, including dual-write, backfilling, validation, and rollback planning

2. Build a Scalable Time-Series Platform

Migrate ~800M+ rows of streaming metrics into a purpose-built time-series architecture
Design a system that supports:
- Real-time queries for dashboards and alerting
- Historical data access for ML training and analysis
- Data retention, aggregation, and rollup strategies
Handle high-cardinality metrics efficiently at scale
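
As an illustration of the rollup idea above, here is a minimal sketch that aggregates raw per-stream metric points into fixed time buckets. The function name `rollup`, the `(stream_id, epoch_seconds, value)` tuple layout, and the 60-second default bucket are assumptions made for the example, not details of the actual platform.

```python
from collections import defaultdict


def rollup(points, bucket_seconds=60):
    """Aggregate (stream_id, epoch_seconds, value) points into fixed buckets.

    Hypothetical helper for illustration only; the real schema is not
    described in this posting.
    """
    # Per (stream, bucket) accumulator: [count, sum, min, max]
    acc = defaultdict(lambda: [0, 0.0, float("inf"), float("-inf")])
    for stream_id, ts, value in points:
        # Align the timestamp to the start of its bucket.
        bucket_start = int(ts) - int(ts) % bucket_seconds
        a = acc[(stream_id, bucket_start)]
        a[0] += 1
        a[1] += value
        a[2] = min(a[2], value)
        a[3] = max(a[3], value)
    return {
        key: {"count": c, "avg": s / c, "min": lo, "max": hi}
        for key, (c, s, lo, hi) in sorted(acc.items())
    }


points = [("a", 0, 1.0), ("a", 30, 3.0), ("a", 61, 5.0), ("b", 10, 2.0)]
result = rollup(points)
```

Keeping count, min, and max alongside the average lets coarser rollups (for example hourly from minutely) be derived from the finer ones without re-reading raw rows, provided the sum is recovered as `avg * count`.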

3. Enable ML-Driven Anomaly Detection

Replace manual monitoring workflows with automated anomaly detection systems
Develop models that:
- Learn normal behavior patterns per stream over time
- Handle noise, seasonality, and evolving baselines
- Balance false positives against missed incidents
Implement robust statistical baselines (e.g., EWMA, Z-score) alongside ML-based approaches
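
For readers unfamiliar with the statistical baselines mentioned above, the following is a minimal sketch of an EWMA-based Z-score detector. The function names, the smoothing factor `alpha=0.3`, the warm-up length, and the threshold of 3 are illustrative assumptions, not the approach TechGrove or Touchstream actually uses.

```python
import math


def ewma_zscores(values, alpha=0.3, warmup=5):
    """Score each value against the EWMA mean/variance of earlier values.

    Hypothetical helper: alpha and warmup are illustrative defaults.
    """
    mean = None
    var = 0.0
    scores = []
    for i, x in enumerate(values):
        if mean is None:
            # First point seeds the baseline and cannot be scored.
            mean = float(x)
            scores.append(0.0)
            continue
        std = math.sqrt(var)
        # Skip scoring during warm-up or while the baseline has no spread.
        z = 0.0 if (i < warmup or std == 0.0) else (x - mean) / std
        scores.append(z)
        # Standard incremental update of the exponentially weighted mean and variance.
        diff = x - mean
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return scores


def flag_anomalies(values, threshold=3.0, **kwargs):
    """Return indices whose absolute Z-score exceeds the threshold."""
    return [i for i, z in enumerate(ewma_zscores(values, **kwargs)) if abs(z) > threshold]


series = [10, 11, 10, 9, 10, 11, 10, 9, 10, 50, 10]
anomalies = flag_anomalies(series)
```

Note that in this simple form the spike itself is folded into the baseline, which inflates the variance and can mask follow-up anomalies; production systems typically guard against that, for example by excluding flagged points from the update.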

4. Optimize and Own the Backend Data Systems

Tune PostgreSQL for high performance (query optimization, indexing, partitioning)
Build and maintain reliable data pipelines with strong observability
Ensure the platform is production-ready for real-time, AI-driven features

What are we looking for?

Core Requirements

Strong PostgreSQL expertise (performance tuning, large-scale systems)
Experience with production data migrations (zero/low downtime)
Hands-on experience with time-series data or time-series databases
Experience building anomaly detection or time-series ML systems in production
Strong Python engineering (production-quality code)
Comfortable working with AWS infrastructure

Nice to Have

Experience with multi-tenant data systems
Exposure to ML pipelines or synthetic data
Background in streaming, CDN, or video systems