Cikla & Zhutovsky - Transfer Learning in Boosting Models | PyData Amsterdam 2023

Did you know that you can do transfer learning on boosted forests too? Even today, we face business cases where the modelling sample is very small. This adds uncertainty to the modelling results and, in some cases, makes modelling impossible altogether. To counter this, we investigated transfer learning approaches for boosting models. In this talk, we show the methods used and the results from a real case applied to the credit risk domain.

Transfer learning (TL), a form of machine learning, involves leveraging knowledge acquired while addressing one task and applying it to a related task. While TL is mainly associated with deep learning, it is also applicable to boosting algorithms, which are commonly used in advanced credit risk modelling.

During the talk, we present a real use case: building a probability of default (PD) model for a customer segment with a limited data history within the bank. There are several ways to benefit from data on other customer segments for which the bank already has rich data.

Simple approaches would be:
Fit a model on the rich data only and apply it directly to the limited data
Fit a model on both data sets, but tune it on the limited data

More complex (TL) approaches:
Fit a model on the rich data with sample weights derived from a resemblance analysis that measures the similarity between the two data sources
Refit the model trained on the rich data using the limited data
Start from an initial pre-trained model when modelling on the limited data
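
The last two approaches above can be illustrated with a toy sketch. This is not the speakers' implementation: it is a minimal pure-Python gradient booster with one-split stumps and squared-error loss on synthetic data, whereas a real PD model would use a classification loss and a production boosting library. All function names (`boost`, `refit`, `fit_stump`), the `decay` parameter, and the data are hypothetical, and reading "refitting" as "keep the tree structure, re-estimate the leaf values" is an assumption.

```python
import random

def predict(model, x):
    """Model is (base_value, [(threshold, left_value, right_value), ...])."""
    base, stumps = model
    return base + sum(lv if x <= t else rv for t, lv, rv in stumps)

def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_stump(xs, residuals):
    """Find the single split that minimises squared error on the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every split leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def boost(xs, ys, n_rounds, init_model=None, lr=0.1):
    """Gradient boosting; with init_model, boosting continues from that model."""
    if init_model is None:
        model = (sum(ys) / len(ys), [])
    else:
        model = (init_model[0], list(init_model[1]))  # copy, keep source intact
    for _ in range(n_rounds):
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        t, lv, rv = fit_stump(xs, residuals)
        model[1].append((t, lr * lv, lr * rv))
    return model

def refit(model, xs, ys, decay=0.5, lr=0.1):
    """Keep every split threshold; blend old leaf values with new estimates."""
    base, stumps = model
    new = (base, [])
    for t, old_lv, old_rv in stumps:
        residuals = [y - predict(new, x) for x, y in zip(xs, ys)]
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        new_lv = lr * sum(left) / len(left) if left else old_lv
        new_rv = lr * sum(right) / len(right) if right else old_rv
        new[1].append((t,
                       decay * old_lv + (1 - decay) * new_lv,
                       decay * old_rv + (1 - decay) * new_rv))
    return new

# Synthetic stand-ins: a rich segment and a small, shifted but related segment.
random.seed(0)
rich_x = [random.random() for _ in range(200)]
rich_y = [2 * x + random.gauss(0, 0.1) for x in rich_x]
small_x = [random.random() for _ in range(20)]
small_y = [2 * x + 0.5 + random.gauss(0, 0.1) for x in small_x]

source = boost(rich_x, rich_y, n_rounds=30)                  # rich data only
continued = boost(small_x, small_y, 20, init_model=source)   # pre-trained start
refitted = refit(source, small_x, small_y)                   # same splits, new leaves

for name, m in [("source", source), ("continued", continued), ("refitted", refitted)]:
    print(name, round(mse(m, small_x, small_y), 4))
```

The printed errors are measured on the same small sample used for adaptation, so they only show that the adapted models move towards the limited data; a fair comparison would need a held-out set from the limited segment.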

Join us for an engaging session where we share the outcomes of our experiments and the lessons learned. These approaches are relevant beyond the presented use case and can be applied to similar scenarios in your own domain.

Bios:
Busra Cikla
Busra is an experienced data scientist with a passion for analytics at ING’s Risk & Pricing Advanced Analytics Team in Amsterdam. Over the last 5 years at ING, she has designed and developed end-to-end advanced analytics solutions to business problems in different domains. Currently, she is working on real-time credit risk models using ML. Busra has a background in optimisation and operational research from her B.Sc. studies, and she has an M.Sc. degree in Data Science.

Paul Zhutovsky

===

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVi...
