Data Cleaning Process Using Databricks


This is my 11th YouTube video for the data community, sharing my programming experience with PySpark on Azure Databricks. My objective here is to show how to build an automated process for cleaning data in a data lake, using functions such as fillna and regular expressions on Azure Databricks. The main cleaning steps are replacing null values, fixing incorrect data, and converting data types. A configuration file defines and maintains the required metadata (column names, etc.) and conditions, which makes it possible to run the same steps across multiple tables and files.
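As a rough illustration of the config-driven idea, here is a minimal sketch rather than the code from the video. The table name, column names, rule keys (`fill_nulls`, `regex_fixes`, `casts`) and the helper functions `validate_rules` and `clean_table` are all assumptions made up for this example; only `fillna`, `regexp_replace`, `withColumn` and `cast` are real PySpark calls.

```python
# Hypothetical cleaning config: table names, column names, and rule keys are
# illustrative assumptions, not taken from the video.
CLEANING_CONFIG = {
    "customers": {
        "fill_nulls": {"country": "Unknown"},
        "regex_fixes": [
            {"column": "phone", "pattern": "[^0-9]", "replacement": ""},
        ],
        "casts": {"age": "int"},
    },
}

ALLOWED_KEYS = {"fill_nulls", "regex_fixes", "casts"}


def validate_rules(rules):
    """Reject unknown rule keys so a typo in the config fails early."""
    unknown = set(rules) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown rule keys: {sorted(unknown)}")
    return rules


def clean_table(df, rules):
    """Apply one table's rules to a PySpark DataFrame and return the result."""
    # Imported here so the config can be validated without a Spark install.
    from pyspark.sql import functions as F

    rules = validate_rules(rules)

    # Step 1: replace nulls, using a column -> replacement value mapping.
    if rules.get("fill_nulls"):
        df = df.fillna(rules["fill_nulls"])

    # Step 2: fix incorrect values with regular expressions.
    for fix in rules.get("regex_fixes", []):
        df = df.withColumn(
            fix["column"],
            F.regexp_replace(F.col(fix["column"]), fix["pattern"], fix["replacement"]),
        )

    # Step 3: convert data types.
    for column, dtype in rules.get("casts", {}).items():
        df = df.withColumn(column, F.col(column).cast(dtype))

    return df
```

On Databricks this could be called once per table, for example by looping over `CLEANING_CONFIG.items()` and passing each DataFrame read from the lake to `clean_table`. In practice the dictionary would probably be loaded from a JSON or YAML file so that new tables can be added without changing the code.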

#datacleaning #Azure #databricks #python #pyspark #datalake #dataengineering #parquet #ETL
