What Is A Data Catalog And Why Do People Use Them?


Special Thanks To Atlan For Partnering With Me On This Video. Learn more about them here: https://bit.ly/3VMCCXV

What is a data catalog?

iData was Facebook’s data discoverability tool. It provided a lot of functionality that I have started to miss, including the baseline functions you would expect: the ability to find tables, trace lineage, and track down the owners of those tables.

But there were also other beneficial features like cost tracking, data quality assessments, and table certification. All of these features made it easy for a new data engineer to quickly orient themselves as they started on new projects.
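To make the baseline functions above concrete (finding tables, tracing lineage, tracking down owners), here is a minimal sketch of a catalog in Python. It is not how iData or any real catalog is built. The class names, fields, and sample tables are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """One catalog record; every field name here is hypothetical."""
    name: str
    owner: str
    description: str = ""
    upstream: list[str] = field(default_factory=list)  # tables this one is built from
    certified: bool = False


class Catalog:
    def __init__(self) -> None:
        self._tables: dict[str, TableEntry] = {}

    def register(self, entry: TableEntry) -> None:
        self._tables[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Find tables whose name or description mentions the keyword."""
        kw = keyword.lower()
        return sorted(
            t.name for t in self._tables.values()
            if kw in t.name.lower() or kw in t.description.lower()
        )

    def owner_of(self, name: str) -> str:
        return self._tables[name].owner

    def lineage(self, name: str) -> list[str]:
        """Return every upstream table, direct or indirect, in sorted order."""
        seen: set[str] = set()
        stack = list(self._tables[name].upstream)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in self._tables:
                stack.extend(self._tables[current].upstream)
        return sorted(seen)


# Made-up sample data.
catalog = Catalog()
catalog.register(TableEntry("raw_events", owner="ingest-team", description="Raw click events"))
catalog.register(TableEntry("daily_sessions", owner="alice", description="Sessions per day",
                            upstream=["raw_events"]))
catalog.register(TableEntry("weekly_retention", owner="bob", description="Retention by week",
                            upstream=["daily_sessions"], certified=True))
```

With this in place, `catalog.search("session")` finds tables by keyword, `catalog.lineage("weekly_retention")` walks back to the raw source, and `catalog.owner_of(...)` tells you who to ping.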

My Favorite iData Feature
My favorite feature was being able to see how other users were using the data at the query level. This provided a lot more context than commented fields alone. ERDs and data lineage are great, but seeing exactly how other users queried the data made it easy to understand (and those users were great people to ping if you had questions).

It was so easy to quickly understand how the data was already being used. This provided several benefits including:

Reducing the duplication of work

Providing context on how data could join together (even across multiple data sources)

Letting you know who to ask questions about the data. Sure, the owner is one great place to start, but owners sometimes move away from datasets over time.
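The query-level view described above can be approximated by mining a query log. The Python sketch below is a rough illustration, not iData's actual approach. The log contents and table names are made up, and the regex is a naive stand-in for a real SQL parser.

```python
import re
from collections import Counter

# Hypothetical query log of (user, SQL text) pairs. A real catalog would
# pull this from the warehouse's query history, which is not shown here.
QUERY_LOG = [
    ("alice", "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id"),
    ("bob", "SELECT count(*) FROM orders"),
    ("alice", "SELECT * FROM orders o JOIN payments p ON o.id = p.order_id"),
    ("carol", "SELECT * FROM customers"),
    ("alice", "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id"),
]

# Naive pattern: grabs the identifier right after FROM or JOIN.
# It will miss subqueries, CTEs, and quoted names.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in(sql):
    """Return the set of lowercased table names referenced in one query."""
    return {name.lower() for name in TABLE_PATTERN.findall(sql)}


def usage_summary(table, log=QUERY_LOG):
    """Count who queries `table` and which tables appear alongside it."""
    users = Counter()
    joined_with = Counter()
    for user, sql in log:
        tables = tables_in(sql)
        if table in tables:
            users[user] += 1
            joined_with.update(tables - {table})
    return users, joined_with


users, joined_with = usage_summary("orders")
```

Here `users.most_common()` ranks the people who query `orders` most often (good candidates to ask when the listed owner has moved on), and `joined_with` shows which tables it is usually joined to.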

Upon leaving the company formerly known as Facebook, I felt like I kept stumbling on a new data catalog or discoverability tool every week. At this point, I am sure I have come across at least 3-5 dozen data discovery tools, all of which add their own flair to helping teams manage their metadata.


If you enjoyed this video, check out some of my other top videos.

Top Courses To Become A Data Engineer In 2022

What Is The Modern Data Stack - Intro To Data Infrastructure Part 1

If you would like to learn more about data engineering, then check out Google's GCP certificate
https://bit.ly/3NQVn7V

If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.

https://seattledataguy.substack.com/

Or check out my blog
https://www.theseattledataguy.com/

And if you want to support the channel, then you can become a paid member of my newsletter
https://seattledataguy.substack.com/s...


Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio

_____________________________________________________________
Subscribe: @seattledataguy
_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission, and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems, both solo and with a company called Acheron Analytics. I have experience both working hands-on with technical problems and helping leadership teams develop strategies to maximize their data.

*I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
