- Experience
- 4+ yrs
- Salary
- —
- Openings
- 1
- Posted
- 5 hours ago
- Work mode
- In office
- Eligibility
- Candidates with strong data engineering experience who can own production batch and streaming pipelines, work on distributed systems, and contribute to privacy-compliant data infrastructure are suitable. BI/dashboard-only professionals, applicants without pipeline ownership, and those lacking distr…
- Resume
- Required to apply
Job description
About the company
Zillwork is a technology company headquartered in Singapore. The team builds solutions for parts of the economy that have historically been neglected, applying strong engineering practices along with modern AI and machine learning to messy, complex problems that many businesses have ignored.
The company is focused on creating infrastructure for a large and underserved global workforce that remains in demand but has not been well supported by existing systems.
If you are looking for work that tackles difficult technical challenges with meaningful real-world impact, this role may be a strong fit.
Role overview
As a Data Engineer, you will take ownership of core data infrastructure and pipelines. The role spans batch and streaming data movement, workflow orchestration, storage design, preprocessing for voice and text data, and privacy-aware data handling.
Key responsibilities
- Build and maintain ingestion pipelines for both batch and streaming data using tools such as Kafka, Spark, or Flink.
- Design and run ETL/ELT workflows with orchestrators such as Airflow, Prefect, or Dagster.
- Plan data storage and schema structures across transactional databases like PostgreSQL as well as object and column-based storage such as S3 and Parquet.
- Support feature store development and dataset version control using tools such as DVC or LakeFS.
- Handle voice and text preprocessing tasks including audio resampling, voice activity detection, transcript alignment, and tokenization/normalization for an Indic language.
- Implement consent-based, DPDP-compliant data practices, including separation of personally identifiable information, retention policies, and data lineage tracking.
- Set up and improve data quality controls such as validation, deduplication, and drift monitoring with tools such as Great Expectations.
Requirements
- At least 4 years of experience in data engineering.
- Strong hands-on ability in Python and SQL, including window functions and query optimization.
- Proven ownership of production data pipelines covering both batch and streaming systems.
- Practical experience with distributed processing frameworks such as Spark, Flink, or Beam, along with a workflow orchestration tool.
- Solid understanding of data modeling, including partitioning, indexing, and the trade-offs between normalization and denormalization.
Preferred experience
- Experience working on audio/speech or NLP data pipelines.
- Familiarity with CDC systems such as Debezium.
- Exposure to columnar data warehouses like BigQuery, Snowflake, or ClickHouse.
- Experience with message queues and infrastructure as code.
Who should not apply
- Applicants whose background is limited to BI or dashboard work, without pipeline ownership or distributed/streaming experience, are not a fit.
- Submissions generated using AI are not acceptable.
Location
This position is based in Coimbatore, Tamil Nadu, India.