Job description

About the company

Zillwork is a technology company headquartered in Singapore. The team builds solutions for parts of the economy that have historically been neglected, applying strong engineering practices along with modern AI and machine learning to messy, complex problems that many businesses have ignored.

The company is focused on creating infrastructure for a large and underserved global workforce that remains in demand but has not been well supported by existing systems.

If you are looking for work that tackles difficult technical challenges with meaningful real-world impact, this role may be a strong fit.

Role overview

As a Data Engineer, you will take ownership of core data infrastructure and pipelines. The role spans batch and streaming data movement, workflow orchestration, storage design, preprocessing for voice and text data, and privacy-aware data handling.

Key responsibilities

Build and maintain ingestion pipelines for both batch and streaming data using tools in the Kafka / Spark / Flink category.
Design and run ETL/ELT workflows with orchestrators such as Airflow, Prefect, or Dagster.
Plan data storage and schema structures across transactional databases like PostgreSQL as well as object and column-based storage such as S3 and Parquet.
Support feature store development and dataset version control using systems in the DVC / LakeFS category.
Handle voice and text preprocessing tasks including audio resampling, voice activity detection, transcript alignment, and tokenization/normalization for an Indic language.
Implement consent-based, DPDP-compliant data practices, including separation of personally identifiable information, retention policies, and data lineage tracking.
Set up and improve data quality controls such as validation, deduplication, and drift monitoring with tools similar to Great Expectations.

Requirements

At least 4 years of experience in data engineering.
Strong hands-on ability in Python and SQL, including window functions and query optimization.
Proven ownership of production data pipelines covering both batch and streaming systems.
Practical experience with distributed processing frameworks such as Spark, Flink, or Beam, along with a workflow orchestration tool.
Solid understanding of data modeling, including partitioning, indexing, and the trade-offs between normalization and denormalization.

Preferred experience

Experience working on audio/speech or NLP data pipelines.
Familiarity with CDC systems such as Debezium.
Exposure to columnar data warehouses like BigQuery, Snowflake, or ClickHouse.
Experience with message queues and infrastructure as code.
Applicants whose background is limited to BI or dashboard work, without pipeline ownership or distributed/streaming experience, are not a fit.
Submissions generated using AI are not acceptable.

Location

This position is based in Coimbatore, Tamil Nadu, India.
