how to compare thousands & millions of rows using python | datacompy library | pandas | pyspark

import datacompy
import pandas as pd


# Load the two versions of the loan file to compare
df1 = pd.read_csv("C:\\Users\\Public\\Data\\loan_100k_v1.csv")
df2 = pd.read_csv("C:\\Users\\Public\\Data\\loan_100k_v2.csv")
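
Side note (not from the video): when the CSVs grow to millions of rows, memory often becomes the bottleneck before datacompy does. One common mitigation is to pin dtypes and read only the columns you actually compare; a minimal sketch, assuming hypothetical column names ("loan_amount", "status") that may not match your file:

df1_lean = pd.read_csv(
    "C:\\Users\\Public\\Data\\loan_100k_v1.csv",
    usecols=["loan_number", "loan_amount", "status"],      # hypothetical column names
    dtype={"loan_number": "int64", "status": "category"},  # smaller in-memory footprint
)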
compare = datacompy.Compare(
    df1,
    df2,
    join_columns='loan_number',   # you can also pass a list of columns
    abs_tol=0,                    # optional, defaults to 0
    rel_tol=0,                    # optional, defaults to 0
    df1_name='source',            # optional, defaults to 'df1'
    df2_name='destination'        # optional, defaults to 'df2'
)

# The same comparison with only the required arguments (this re-creates the
# Compare object, so df1_name/df2_name fall back to their defaults):
compare = datacompy.Compare(df1, df2, join_columns=['loan_number'])
print(compare.report())        # print a human-readable comparison summary
print(compare.all_mismatch())  # DataFrame of rows that mismatch between source and destination
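
Besides report() and all_mismatch(), the Compare object also exposes the rows that exist on only one side of the join, plus a boolean matches() check that is handy in automated pipelines. A short sketch using the compare object built above:

print(compare.df1_unq_rows.head())   # rows present only in the source file
print(compare.df2_unq_rows.head())   # rows present only in the destination file

if compare.matches():
    print("source and destination match on the compared columns")
else:
    print("differences found - see compare.report() for details")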

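The video title also mentions PySpark. datacompy ships Spark support for datasets that do not fit in pandas, but the class name and constructor arguments have changed across datacompy releases, so treat the snippet below as an unverified sketch (the class name SparkSQLCompare and its signature are assumptions; check the datacompy docs for your installed version):

from pyspark.sql import SparkSession
import datacompy

spark = SparkSession.builder.appName("loan-compare").getOrCreate()

sdf1 = spark.read.csv("C:\\Users\\Public\\Data\\loan_100k_v1.csv", header=True, inferSchema=True)
sdf2 = spark.read.csv("C:\\Users\\Public\\Data\\loan_100k_v2.csv", header=True, inferSchema=True)

# Assumption: recent datacompy versions expose a Spark comparer with a
# pandas-like API; older versions used datacompy.SparkCompare instead.
spark_compare = datacompy.SparkSQLCompare(
    spark,
    sdf1,
    sdf2,
    join_columns="loan_number",
)
print(spark_compare.report())
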