Towards Explainable Clustering: A Constrained Declarative based Approach (Matthieu Guilbert)


Interpretability matters across machine learning domains, including clustering: the results of unsupervised clustering often need to be validated and understood by domain experts. Our work introduces a novel interpretable clustering approach that seeks both high-quality clustering according to classic criteria and cluster explainability. In our context, a good cluster explanation should highlight properties that are frequent in the cluster (coverage) and that distinguish it from the other clusters (discrimination). Our work can be compared to clustering ensemble methods, which generate many base partitions and return a single final partition. While several approaches for integrating expert knowledge into clustering ensemble methods have been introduced in recent years, none has focused on selecting clusters from a set of candidate clusters, nor on interpretability. The interpretable constrained clustering method that we propose leverages two views of the data: one for clustering and a second, Boolean view for generating explanations. The method starts by building a pool of candidate clusters and, for each of these clusters, a set of covering patterns. The subsequent step relies on Constraint Programming (CP) to select a combination of clusters and patterns that satisfies various constraints, resulting in a final clustering where each cluster is explained by covering and discriminant patterns. Expert knowledge can be integrated as structural constraints, where for example Must-Link and Cannot-Link constraints can be used to reduce the number of candidate clusters, or as explanation-based constraints, where the expert specifies coverage and discrimination requirements, and other constraints such as the allowed overlap between clusters. Contributions include a formalization of interpretable clustering, a novel CP clustering model, and the introduction of three novel clustering explanation quality measures.
The method also allows expert knowledge to be used at different stages, enhancing the cluster selection process. In this presentation, we will detail each step of our process and present an evaluation along with example results. We will also compare the impact of different parameters on different datasets. This work was funded by the ANR project InvolvD (Interactive constraint elicitation for unsupervised and semi-supervised data mining) (ANR-20-CE23-0023).
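
To make the notions of coverage, discrimination and constrained cluster selection more concrete, here is a minimal Python sketch on toy data. It is an illustration under stated assumptions, not the model presented in the talk: the formulas used for coverage and discrimination, the thresholds `theta` and `phi`, the toy Boolean view, and all function names are hypothetical, and an exhaustive search stands in for the Constraint Programming model. The sketch also assumes that the final clustering must be a partition (no overlap allowed).

```python
from itertools import combinations

# Toy Boolean view (assumption): each instance maps to the set of properties it satisfies.
boolean_view = {
    0: {"a", "b"}, 1: {"a", "b", "c"}, 2: {"a"},
    3: {"c", "d"}, 4: {"d"}, 5: {"c", "d"},
}

# Hypothetical pools of candidate clusters and candidate patterns.
candidate_clusters = [frozenset({0, 1, 2}), frozenset({3, 4, 5}),
                      frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]
candidate_patterns = [frozenset({"a"}), frozenset({"d"}),
                      frozenset({"c"}), frozenset({"a", "b"})]


def coverage(pattern, cluster):
    """Assumed measure: fraction of the cluster's instances satisfying the whole pattern."""
    return sum(pattern <= boolean_view[i] for i in cluster) / len(cluster)


def discrimination(pattern, others):
    """Assumed measure: 1 minus the fraction of instances in the other clusters
    that satisfy the pattern."""
    outside = [i for c in others for i in c]
    if not outside:
        return 1.0
    return 1.0 - sum(pattern <= boolean_view[i] for i in outside) / len(outside)


def respects(cluster, must_link=(), cannot_link=()):
    """Structural constraints: must-linked pairs stay together, cannot-linked pairs apart."""
    ml_ok = all((a in cluster) == (b in cluster) for a, b in must_link)
    cl_ok = not any(a in cluster and b in cluster for a, b in cannot_link)
    return ml_ok and cl_ok


def select(pool, k, theta, phi):
    """Exhaustive stand-in for the CP step: pick k clusters that partition the data,
    each explained by at least one covering and discriminant pattern."""
    instances = set(boolean_view)
    for combo in combinations(pool, k):
        # Keep only combinations that form a partition of all instances.
        if sum(len(c) for c in combo) != len(instances) or set().union(*combo) != instances:
            continue
        explanations = {}
        for cluster in combo:
            others = [o for o in combo if o != cluster]
            patterns = [p for p in candidate_patterns
                        if coverage(p, cluster) >= theta
                        and discrimination(p, others) >= phi]
            if not patterns:
                break  # this cluster cannot be explained, reject the combination
            explanations[cluster] = patterns
        else:
            return explanations
    return None


# A Cannot-Link constraint between instances 2 and 3 shrinks the candidate pool.
pool = [c for c in candidate_clusters if respects(c, cannot_link=[(2, 3)])]
result = select(pool, k=2, theta=0.9, phi=0.9)
```

In this toy setting the Cannot-Link constraint removes the candidate cluster {2, 3} before selection, and the selected clustering pairs {0, 1, 2} with the pattern {a} and {3, 4, 5} with the pattern {d}. A CP solver would replace the exhaustive loop in `select` when the pool of candidates is large.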

