Complete Master Class on Pydeequ & AWS Glue Data Quality for ETL Pipelines


You generally write unit tests for your code, but do you also test your data? Incoming data quality can make or break your application. Incorrect, missing, or malformed data can have a large impact on production systems. Examples of data quality issues include the following:
• Missing values can cause failures in production systems that require non-null values (for example, a NullPointerException).
• Changes in the distribution of data can lead to unexpected outputs from machine learning (ML) models.
• Aggregations of incorrect data can lead to misguided business decisions.
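
Each of the issues above can be turned into a measurable metric. The following is a minimal plain-Python sketch, not PyDeequ code. The helper names `completeness` and `mean_shift_exceeds` and the 10% tolerance are hypothetical choices for illustration. Completeness (the fraction of non-null values) catches missing data, and comparing a column's mean between two batches is one crude way to flag a change in distribution.

```python
from statistics import mean
from typing import Any, Optional, Sequence


def completeness(values: Sequence[Optional[Any]]) -> float:
    """Fraction of values that are not None (1.0 means nothing is missing)."""
    if not values:
        return 1.0  # hypothetical convention: an empty column has nothing missing
    return sum(v is not None for v in values) / len(values)


def mean_shift_exceeds(
    baseline: Sequence[float], current: Sequence[float], tolerance: float = 0.10
) -> bool:
    """True if the mean of `current` differs from the mean of `baseline`
    by more than `tolerance` (relative change); a crude drift signal."""
    base = mean(baseline)
    if base == 0:
        # Relative change is undefined; treat any non-zero mean as a shift.
        return mean(current) != 0
    return abs(mean(current) - base) / abs(base) > tolerance


# Usage: a column with one missing value out of four, and a shifted batch.
prices = [10.0, None, 12.0, 11.0]
print(completeness(prices))  # 0.75
print(mean_shift_exceeds([10.0, 11.0, 12.0], [14.0, 15.0, 16.0]))  # True
```

In practice a distribution check would usually compare more than the mean (for example, quantiles or histograms); the mean is used here only to keep the sketch short.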

In this video, we explore PyDeequ, an open-source Python wrapper around Deequ (an open-source tool developed and used at Amazon). Deequ is written in Scala; PyDeequ lets you use its data quality and testing capabilities from Python and PySpark, the languages of choice for many data scientists.
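
Deequ's core idea is declarative: you state constraints about the data (for example, "column `id` is complete and unique"), and a verification run evaluates all of them and reports which ones failed. PyDeequ exposes this through classes such as `Check` and `VerificationSuite` (in `pydeequ.checks` and `pydeequ.verification`), which operate on Spark DataFrames. The sketch below is not that API. It is a hypothetical, dependency-free `MiniCheck` class over a list of dicts that only mimics the chained-constraint style, so the workflow can be followed without a Spark cluster. The column names and sample rows are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class MiniCheck:
    """Hypothetical stand-in for a Deequ-style check: constraints are
    registered by chained calls and evaluated together by `run`."""
    description: str
    constraints: List[Tuple[str, Callable[[List[Row]], bool]]] = field(default_factory=list)

    def is_complete(self, column: str) -> "MiniCheck":
        # Every row has a non-null value in `column`.
        self.constraints.append(
            (f"{column} is complete",
             lambda rows: all(r.get(column) is not None for r in rows)))
        return self

    def is_unique(self, column: str) -> "MiniCheck":
        # No value in `column` appears more than once (values must be hashable).
        def unique(rows: List[Row]) -> bool:
            values = [r.get(column) for r in rows]
            return len(values) == len(set(values))
        self.constraints.append((f"{column} is unique", unique))
        return self

    def is_non_negative(self, column: str) -> "MiniCheck":
        # Non-null values in `column` are >= 0 (nulls are ignored here).
        self.constraints.append(
            (f"{column} is non-negative",
             lambda rows: all(r[column] >= 0 for r in rows if r.get(column) is not None)))
        return self

    def run(self, rows: List[Row]) -> Dict[str, bool]:
        """Evaluate every constraint and map its name to pass (True) or fail (False)."""
        return {name: check(rows) for name, check in self.constraints}


# Usage with made-up rows: a missing name, a duplicate id, a negative amount.
rows = [
    {"id": 1, "name": "a", "amount": 5},
    {"id": 2, "name": None, "amount": -3},
    {"id": 2, "name": "c", "amount": 7},
]
report = (MiniCheck("review check")
          .is_complete("id")
          .is_complete("name")
          .is_unique("id")
          .is_non_negative("amount")
          .run(rows))
for name, passed in report.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Collecting every result instead of stopping at the first failure reflects how a verification run reports the status of each constraint. In Deequ itself, the underlying metrics are computed on Spark rather than with Python loops, which is what lets it scale to large datasets.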

Code:
======
https://github.com/SatadruMukherjee/D...
https://github.com/SatadruMukherjee/D...

Check this playlist for more Data Engineering related videos:
   • Demystifying Data Engineering with Cl...  

Apache Kafka from scratch
   • Apache Kafka for Python Developers  

Messaging Made Easy: AWS SQS Playlist
   • Messaging Made Easy: AWS SQS Playlist  

Snowflake Complete Course from Scratch, with an End-to-End Project and In-Depth Explanation:
https://doc.clickup.com/37466271/d/h/...

Explore our vlog channel:
   / @funwithourfam  

Your Queries:
===========
Testing data quality at scale with PyDeequ
Monitor data quality in your data lake using PyDeequ
Test data quality at scale with Deequ
How to use PyDeequ for Testing Data Quality at Scale
Data Quality with Pydeequ
Data Quality with PyDeequ: A Comprehensive Guide
Getting started with AWS Glue Data Quality
Getting started with AWS Glue Data Quality for ETL Pipelines
AWS Glue Data Quality Overview | Amazon Web Services
Building Data Quality in ETL pipelines using AWS Glue
Monitor & manage data quality in your data lake with AWS Glue
Guaranteeing Data Quality SLAs with Deequ
Data quality, the secret of good analytics
Using PyDeequ with AWS Glue
