Orchestration using Databricks Jobs
Databricks has built-in features for orchestrating data processing workloads so that you can coordinate and run multiple tasks as part of a larger workflow. You can optimize and schedule the execution of frequent, repeatable tasks and manage complex workflows. This article introduces concepts and choices related to managing production workloads using Databricks jobs.

What are jobs?

In Databricks, a job is used to schedule and orchestrate tasks in a workflow. Common data processing workflows include ETL workflows, running notebooks, and machine learning (ML) workflows, as well as integrating with external systems like dbt.

Jobs consist of one or more tasks and support custom control flow logic such as branching (if/else statements) or looping (for each statements) using a visual authoring UI. Tasks can load or transform data in an ETL workflow, or build, train, and deploy ML models in a controlled and repeatable way as part of your machine learning pipelines.

Example: Daily data processing and validation job

The example below shows a job in Databricks with the following characteristics:

- The first task ingests revenue data.
- The second task is an if/else check for nulls in the ingested data. If no nulls are found, a transformation task runs; otherwise, a notebook task performs data quality validation.
- The job is scheduled to run every day at 11:29 AM.

To get a quick introduction to creating your own job, see Create your first workflow with a Databricks job.
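To make the example concrete, the sketch below defines this daily job programmatically against the Jobs API 2.1 (/api/2.1/jobs/create) rather than the visual authoring UI. The job name, notebook paths, task keys, the null_count task value, the environment variables, and the UTC timezone are illustrative assumptions, and compute settings are omitted for brevity (serverless job compute is assumed).

```python
# Minimal sketch: create the daily processing and validation job via the Jobs API 2.1.
# Notebook paths, task keys, the null_count task value, and the timezone are
# illustrative assumptions; compute settings are omitted (serverless assumed).
import os

import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # personal access token

job_spec = {
    "name": "daily-revenue-processing",
    "schedule": {
        # Quartz cron expression: every day at 11:29 AM
        "quartz_cron_expression": "0 29 11 * * ?",
        "timezone_id": "UTC",
    },
    "tasks": [
        {
            # Task 1: ingest revenue data
            "task_key": "ingest_revenue",
            "notebook_task": {"notebook_path": "/Workspace/jobs/ingest_revenue"},
        },
        {
            # Task 2: if/else check for nulls, based on a task value the ingest
            # notebook is assumed to publish (e.g. dbutils.jobs.taskValues.set("null_count", n))
            "task_key": "check_for_nulls",
            "depends_on": [{"task_key": "ingest_revenue"}],
            "condition_task": {
                "op": "EQUAL_TO",
                "left": "{{tasks.ingest_revenue.values.null_count}}",
                "right": "0",
            },
        },
        {
            # Runs only when the check passes (no nulls found)
            "task_key": "transform_revenue",
            "depends_on": [{"task_key": "check_for_nulls", "outcome": "true"}],
            "notebook_task": {"notebook_path": "/Workspace/jobs/transform_revenue"},
        },
        {
            # Runs only when nulls were found: data quality validation notebook
            "task_key": "validate_data_quality",
            "depends_on": [{"task_key": "check_for_nulls", "outcome": "false"}],
            "notebook_task": {"notebook_path": "/Workspace/jobs/validate_data_quality"},
        },
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job_id:", resp.json()["job_id"])
```

In this sketch the if/else branching hinges on the depends_on outcome values: the transformation task runs when the condition task evaluates to true, and the validation notebook runs when it evaluates to false. The same structure can be built in the visual authoring UI described above.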