About the role
This is a remote position.
We are looking for a Databricks developer to assist with developing the ETL process and cleaning up notebooks. The project involves resolving several inconsistencies in the notebooks and report design, so we need solid experience with big data and Databricks and the ability to review and clean up notebook design and code.
2+ years of experience preferred. This is a work-from-home opportunity. We are hiring for both full-time and part-time roles.
Responsibilities:
Designing and implementing highly performant data ingestion pipelines from multiple sources using Apache Spark and/or Azure Databricks
Delivering and presenting proofs of concept of key technology components to project stakeholders
Developing scalable and re-usable frameworks for ingesting assorted data sets
Integrating the end-to-end data pipeline to take data from source systems to target data repositories, ensuring that data quality and consistency are always maintained
Working with event-based / streaming technologies to ingest and process data
Working with other members of the project team to support the delivery of additional project components (API interfaces, Search)
Working with Azure CI/CD pipelines
Working with Azure DevOps to support and maintain source control
Evaluating the performance and applicability of multiple tools against customer requirements
Working within an Agile delivery / DevOps methodology to deliver proofs of concept and production implementations in iterative sprints
Knowledge:
Strong knowledge of Data Management principles
Experience in building ETL / data warehouse transformation processes
Direct experience in building data pipelines using Azure Data Factory and Apache Spark (preferably Databricks)
Experience using geospatial frameworks on Apache Spark and associated design and development patterns
Microsoft Azure Big Data Architecture certification
Hands-on experience designing and delivering solutions using the Azure Data Analytics platform (Cortana Intelligence Platform), including Azure Storage, Azure SQL Data Warehouse, Azure Data Lake, Azure Cosmos DB, and Azure Stream Analytics