Data Science Project: Engineer Time Series Data For Classification With Machine Learning

In this data science project in Python, I walk you through my reasoning when I have to build a binary classification model but the data is time series or transaction data.
From problem framing to feature engineering, modeling, and cross-validation, I go step by step in this beginner-friendly data science tutorial.
I also explain how to split your dataset into train and test sets while ensuring you don't have data leakage, and how I think about measuring model performance and interpreting the metrics.
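
One common way to keep future information out of the training set is to split by date instead of at random. The snippet below is a minimal sketch of that idea, not necessarily the split used in the video; the function name `time_based_split`, the column names, and the toy data are assumptions made for illustration.

```python
import pandas as pd


def time_based_split(df, date_col, cutoff):
    """Rows strictly before `cutoff` go to train; rows on or after it go to test."""
    cutoff = pd.Timestamp(cutoff)
    train = df[df[date_col] < cutoff]
    test = df[df[date_col] >= cutoff]
    return train, test


# Hypothetical toy data: one row per customer per day.
toy = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "bought": [0, 1, 0, 1, 0, 0],
})

train, test = time_based_split(toy, "date", "2021-01-03")
```

Every training row is older than every test row, so the model is evaluated only on days it has not seen. If the label looks several days ahead, training rows close to the cutoff can still carry test-period outcomes, so leaving a gap of that many days before the cutoff is worth considering.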

Will a customer buy at least once in the next three days?
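
The question above can be turned into a label by first building one row per customer per day (filling days without purchases with zeros) and then looking ahead over the next three days. This is a rough sketch under assumptions: the column names `customer_id` and `timestamp`, the helper `daily_purchases`, and the toy transactions are hypothetical and may differ from the dataset and code in the video.

```python
import pandas as pd


def daily_purchases(tx, horizon=3):
    """Build a daily per-customer table with a look-ahead target.

    `tx` is assumed to have columns `customer_id` and `timestamp` (datetime).
    """
    # Discard the time of day so that purchases can be grouped per calendar day.
    tx = tx.assign(date=tx["timestamp"].dt.normalize())
    counts = tx.groupby(["customer_id", "date"]).size().rename("n_purchases")

    # Fill in the missing days: every customer gets every day in the range.
    all_days = pd.date_range(tx["date"].min(), tx["date"].max(), freq="D")
    full_index = pd.MultiIndex.from_product(
        [tx["customer_id"].unique(), all_days], names=["customer_id", "date"]
    )
    daily = counts.reindex(full_index, fill_value=0).reset_index()
    daily["bought"] = (daily["n_purchases"] > 0).astype(int)

    def future_any(s):
        # Max over days t..t+horizon-1, then shift so it covers t+1..t+horizon.
        # Days without a full look-ahead window become NaN.
        return s[::-1].rolling(horizon).max()[::-1].shift(-1)

    daily["target"] = daily.groupby("customer_id")["bought"].transform(future_any)
    return daily


# Hypothetical toy transactions.
toy_tx = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "timestamp": pd.to_datetime(
        ["2021-01-01 10:00", "2021-01-05 09:30", "2021-01-02 18:45"]
    ),
})
daily = daily_purchases(toy_tx, horizon=3)
```

Rows at the end of the time range have no complete three-day window, so their target is left as NaN and would be dropped before training rather than guessed.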
00:00 - Introduction
00:58 - Load a csv with pandas
01:57 - Problem framing
03:23 - Discard time information from datetime column
04:51 - Aggregate pandas data frame by two conditions
06:50 - Fill in the missing days in time series
13:38 - How to use time series data in machine learning
16:08 - Feature engineering from time series data
21:09 - What is data leakage and how to avoid it?
22:45 - Remove duplicate rows from dataframe by subset of columns
23:25 - Data balancing
26:02 - Split data in train and test
27:15 - Binary classification with XGBoost
29:10 - Metrics for binary classification
37:47 - Cross validation
42:20 - Improvement potential
43:02 - See you next time!
#datascience #datascienceproject #machinelearning #machinelearningproject #timeseriesanalysis

Dataset: https://data.mendeley.com/datasets/9j...
Same dataset is used in:
Part 1: Exploratory Data Analysis Tutorial: Exploratory Data Analysis In Python: ...
Give a 🌟 to the code repository: https://github.com/giraffa-analytics/...
