Orchestration using Databricks Jobs

Location

remote

About the job

About the role

Databricks has built-in features for orchestrating data processing workloads so that you can coordinate and run multiple tasks as part of a larger workflow. You can optimize and schedule the execution of frequent, repeatable tasks and manage complex workflows. This article introduces concepts and choices related to managing production workloads using Databricks jobs.

What are jobs?

In Databricks, a job is used to schedule and orchestrate tasks in a workflow. Common data processing workflows include ETL workflows, running notebooks, and machine learning (ML) workflows, as well as integrating with external systems like dbt. Jobs consist of one or more tasks, and support custom control flow logic like branching (if/else statements) or looping (for each statements) using a visual authoring UI. Tasks can load or transform data in an ETL workflow, or build, train, and deploy ML models in a controlled and repeatable way as part of your machine learning pipelines.

Example: Daily data processing and validation job

The example below shows a job in Databricks with the following characteristics:

The first task ingests revenue data.
The second task is an if/else check for nulls.
If no nulls are found, a transformation task runs. Otherwise, a notebook task runs a data quality validation.
The job is scheduled to run every day at 11:29 AM.

To get a quick introduction to creating your own job, see Create your first workflow with a Databricks job.
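
To make the example concrete, here is a minimal sketch of how a job like this could be defined through the Databricks Jobs API (version 2.1) from Python. The workspace URL, token, notebook paths, cluster ID, and the null_count task value are placeholders introduced for illustration, not part of the example above; a real job would reference your own notebooks and compute, and the ingest notebook would be assumed to publish the null count as a task value.

    import os
    import requests

    # Placeholder workspace URL and token, read from the environment for this sketch.
    HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
    TOKEN = os.environ["DATABRICKS_TOKEN"]

    job_spec = {
        "name": "daily-revenue-processing",
        # Quartz cron expression: run every day at 11:29 AM.
        "schedule": {
            "quartz_cron_expression": "0 29 11 * * ?",
            "timezone_id": "UTC",
        },
        "tasks": [
            {
                # First task: ingest revenue data (notebook path is a placeholder).
                "task_key": "ingest_revenue",
                "notebook_task": {"notebook_path": "/Workspace/jobs/ingest_revenue"},
                "existing_cluster_id": "<cluster-id>",
            },
            {
                # Second task: if/else check for nulls, based on a task value the
                # ingest notebook is assumed to set (e.g. via dbutils.jobs.taskValues.set).
                "task_key": "check_for_nulls",
                "depends_on": [{"task_key": "ingest_revenue"}],
                "condition_task": {
                    "op": "EQUAL_TO",
                    "left": "{{tasks.ingest_revenue.values.null_count}}",
                    "right": "0",
                },
            },
            {
                # Runs only when the condition is true (no nulls found).
                "task_key": "transform_revenue",
                "depends_on": [{"task_key": "check_for_nulls", "outcome": "true"}],
                "notebook_task": {"notebook_path": "/Workspace/jobs/transform_revenue"},
                "existing_cluster_id": "<cluster-id>",
            },
            {
                # Runs only when the condition is false (nulls found): data quality validation.
                "task_key": "validate_data_quality",
                "depends_on": [{"task_key": "check_for_nulls", "outcome": "false"}],
                "notebook_task": {"notebook_path": "/Workspace/jobs/validate_data_quality"},
                "existing_cluster_id": "<cluster-id>",
            },
        ],
    }

    resp = requests.post(
        f"{HOST}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=job_spec,
    )
    resp.raise_for_status()
    print("Created job", resp.json()["job_id"])

The same definition can also be built in the Databricks Jobs UI or with the Databricks SDKs; the key ideas are the task dependency graph, the condition task for branching, and the cron schedule.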

Skills

data processing
workflow
orchestration
Databricks