Where Does Cross Validation Fail? (K-Fold)

As data science and machine learning have become more popular in finance, I see all kinds of attempts to apply them without understanding either the data science concepts or, in this case, basic time-series theory. In this example I explain why you shouldn't use k-fold cross validation on time-series data and how it goes wrong. The two main issues are information leakage and serial correlation.

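Information leakage occurs more often with panel data because people assume the data is cross-sectional at each point in time. That is true, but with a large enough sample from each time period you basically have all the information needed about that period. A random split that places observations from the same period in both the training and test folds therefore lets the model see the test period during training. If this doesn't make sense, review the central limit theorem.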
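A minimal sketch of the leakage problem, using only the Python standard library. The panel dimensions (12 periods, 50 entities), the fold count, and the cutoff period are hypothetical values chosen for illustration, and the helper names are not from any library. It counts how many time periods appear in both the training and test sets under a shuffled k-fold split versus a simple time-ordered split.

```python
import random

N_PERIODS, N_ENTITIES, N_SPLITS = 12, 50, 5  # hypothetical panel dimensions

# Each row is (period, entity); rows are stored in time order.
rows = [(t, i) for t in range(N_PERIODS) for i in range(N_ENTITIES)]


def shuffled_kfold(rows, n_splits, seed=0):
    """Yield (train, test) row lists from a randomly shuffled k-fold split."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    fold_size = len(shuffled) // n_splits
    for k in range(n_splits):
        start = k * fold_size
        stop = len(shuffled) if k == n_splits - 1 else start + fold_size
        yield shuffled[:start] + shuffled[stop:], shuffled[start:stop]


def time_ordered_split(rows, cutoff):
    """Train on periods before the cutoff, test on the cutoff period and later."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test


def shared_periods(train, test):
    """Return the set of periods that appear in both train and test."""
    return {t for t, _ in train} & {t for t, _ in test}


# Number of periods shared between train and test, one entry per fold.
kfold_shared = [
    len(shared_periods(train, test))
    for train, test in shuffled_kfold(rows, N_SPLITS)
]
ordered_shared = len(shared_periods(*time_ordered_split(rows, cutoff=9)))

print("periods shared per shuffled fold:", kfold_shared)
print("periods shared in time-ordered split:", ordered_shared)
```

With the shuffled split, every fold's test set shares periods with its training set, so the model has already seen a large sample from the periods it is being tested on. The time-ordered split shares none, because it keeps each period entirely on one side of the cutoff.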

Serial correlation is correlation across time, and it is a core concept for most financial data. I have come across people who think you can randomly scramble the data and then model it. They did this because time-series modeling is hard and has many assumptions that need to be tested to create a robust and meaningful model. Instead of doing the hard work, they put their heads in the sand and pretended it would work in practice. After it failed miserably, their new approach was to refit the model every month to "fix" this "mysterious" problem.
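To make the effect of scrambling concrete, here is a small sketch that simulates a serially correlated series and measures its lag-1 autocorrelation before and after shuffling. The AR(1) process, its coefficient of 0.9, the sample size, and the seed are all assumptions chosen for illustration; real financial series will behave differently.

```python
import random


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence of floats."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


rng = random.Random(42)
PHI, N_OBS = 0.9, 2000  # assumed AR(1) coefficient and sample size

# Simulate x_t = PHI * x_{t-1} + noise, a simple serially correlated series.
series = [0.0]
for _ in range(N_OBS - 1):
    series.append(PHI * series[-1] + rng.gauss(0.0, 1.0))

# Randomly scramble the observations, as a shuffled split effectively does.
scrambled = series[:]
rng.shuffle(scrambled)

rho_original = lag1_autocorr(series)
rho_scrambled = lag1_autocorr(scrambled)

print(f"lag-1 autocorrelation, original order: {rho_original:.3f}")
print(f"lag-1 autocorrelation, scrambled:      {rho_scrambled:.3f}")
```

The original ordering shows strong dependence between consecutive observations, while the scrambled copy shows almost none. The values are identical in both lists; only the ordering differs, and that ordering is exactly the structure a time-series model needs.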


Support me with Ko-Fi (coffee) ☕
https://ko-fi.com/fancyquant


Website:
https://www.FancyQuantNation.com

Quant t-shirts, mugs, and hoodies:
teespring.com/stores/fancy-quant

Connect with me:
  / dimitri-bianco  
  / dimitribianco  
