Requirements:
Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
8+ years of hands-on experience designing and building data pipelines and infrastructure.
Strong proficiency in data engineering tools and technologies, such as Apache Spark, Hadoop, Kafka, and cloud-based platforms (AWS, Azure, or GCP).
Hands-on experience with data modeling, ETL processes, and distributed data processing frameworks.
Proficiency in programming languages such as Python, R, or Java, and expertise in SQL.
Solid understanding of data architecture principles, microservices, and data governance frameworks.
Excellent problem-solving skills and the ability to drive solutions in a fast-paced environment.
Strong communication and collaboration skills, with a proven ability to work with both technical and non-technical stakeholders.
Responsibilities:
Architect, implement, and optimize ETL/ELT workflows using tools such as Databricks and Informatica to ingest and process large datasets.
Deploy and manage data infrastructure on Azure and AWS, including containerized services and serverless components.
Partner with data scientists, software engineers, and business stakeholders to translate requirements into scalable data solutions.
Monitor and tune pipeline performance, implement robust monitoring/alerting (e.g., Prometheus, DataDog), and troubleshoot production issues.
Define and enforce data validation rules, lineage tracking, and governance standards to ensure accurate and compliant data usage.
Build CI/CD pipelines for data code, automate deployments, and support model operationalization workflows.
Share knowledge of data engineering best practices, conduct code reviews, and mentor junior engineers.