The candidate will be responsible for designing, building, and maintaining efficient, scalable, and reliable data infrastructure.
If you have a passion for building and optimizing data pipelines, and enjoy working with a team of skilled professionals, we encourage you to apply for this position.
The candidate should have a solid understanding of Databricks and its various components to be able to design, build, and optimize data pipelines on the Databricks platform. They should be able to leverage Databricks notebooks and clusters to develop ETL processes and perform data transformations. They should also be familiar with Databricks SQL and Databricks Delta for querying and managing data on Databricks.
In addition to the programming languages and big data processing frameworks listed below, the candidate should also have experience working with Databricks APIs and SDKs to automate various aspects of Databricks workflows. This could include automating cluster provisioning, job scheduling, and workflow orchestration using Python and tools like Apache Airflow.
Overall, the ideal candidate should be a well-rounded data engineer with expertise in developing and architecting data pipelines, as well as specific experience working with Databricks.
· Design, build, and maintain data pipelines for various data sources and destinations
· Architect data pipelines for scalability, reliability, and performance
· Develop ETL processes to integrate data from multiple sources
· Implement data quality checks and monitoring for data pipeline health
· Work with data analysts and data scientists to provide them with clean, reliable data for analysis
· Collaborate with other teams to integrate data across multiple systems
· Optimize and tune the performance of data pipelines
· Automate the deployment and management of data pipelines
· Experience working with Databricks, including knowledge of Databricks notebooks, clusters, jobs, and workflows
· Familiarity with Databricks data engineering best practices and optimization techniques
· Knowledge of Databricks SQL, Databricks Delta, and MLflow on Databricks
· Experience with integrating Databricks with other data processing systems and tools
· Bachelor's degree in Computer Science, Information Technology, or related field
· Minimum of 3 years of experience as a data engineer
· Strong experience with programming languages like Python, Java, and Scala
· Experience with big data processing frameworks like Apache Spark, Hadoop, or Flink
· Strong knowledge of SQL and NoSQL databases
· Experience with cloud-based data processing services like AWS Glue, Azure Data Factory, or Google Cloud Dataflow
· Familiarity with data modeling and data warehousing concepts
· Experience with source control systems like Git
· Strong analytical and problem-solving skills
· Excellent communication and collaboration skills
· Master's degree in Computer Science, Information Technology, or related field
· Experience with distributed data processing frameworks like Apache Kafka or Apache Storm
· Familiarity with containerization and container orchestration systems like Docker and Kubernetes
· Experience with data visualization tools like Tableau, Power BI, or QlikView
Being part of the pioneer batch of data science hires, you will have to take full ownership of the platform’s entir