Pattern discovery

3. Discovering interpretable patterns, correlations, and causality

Spokesperson: Jilles Vreeken (MPI for Informatics)

Co-Spokespersons: Gerhard Weikum (MPI for Informatics), Matthias Scheffler, Luca Ghiringhelli (Fritz Haber Institute), Jan Michael Rost (MPI for the Physics of Complex Systems)

For our part within BiGmax, we focus on knowledge discovery in scientific data. Our goal is to point the scientist to surprisingly structured aspects of the data, together with easily understandable descriptions of the discovered structure. In other words, we want to provide the scientist with potential building blocks for novel hypotheses about the process behind the data.

To obtain interpretable insights, we investigate pattern-based modeling for scientific knowledge discovery. Pattern mining is a key branch of data mining. Loosely speaking, a pattern is a “query” that selects a subset of the records in our data, together with an explanation of why this subgroup is interesting. The traditional goal is to discover all such patterns, which often makes the result set highly redundant. Modern pattern mining avoids this by asking instead for the set of patterns that best describes the data without redundancy.
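To make the notion of a pattern as a “query” concrete, here is a minimal Python sketch. It is an illustration only, not the method used in the projects: the toy records, the attribute names (`structure`, `doped`, `band_gap`), and the quality measure (coverage times the shift of the subgroup mean from the global mean) are all assumptions chosen for the example.

```python
from statistics import mean

# Hypothetical toy records: two descriptive attributes and one numeric target.
records = [
    {"structure": "cubic", "doped": True, "band_gap": 3.0},
    {"structure": "cubic", "doped": True, "band_gap": 3.2},
    {"structure": "cubic", "doped": False, "band_gap": 1.0},
    {"structure": "hexagonal", "doped": True, "band_gap": 1.2},
    {"structure": "hexagonal", "doped": False, "band_gap": 0.8},
    {"structure": "hexagonal", "doped": False, "band_gap": 0.8},
]


def select(pattern, data):
    """Return the records that satisfy every attribute == value condition."""
    return [r for r in data if all(r[k] == v for k, v in pattern.items())]


def quality(pattern, data, target):
    """Score a pattern as coverage times the shift of its subgroup mean.

    This is an assumed, simple interestingness measure: a pattern scores
    high when it covers many records whose target deviates upward from
    the global average.
    """
    subgroup = select(pattern, data)
    if not subgroup:
        return 0.0
    coverage = len(subgroup) / len(data)
    shift = mean(r[target] for r in subgroup) - mean(r[target] for r in data)
    return coverage * shift


# The pattern is the query; the score is the reason it may be interesting.
pattern = {"structure": "cubic", "doped": True}
subgroup = select(pattern, records)
score = quality(pattern, records, "band_gap")
print(pattern, len(subgroup), round(score, 3))
```

Enumerating and scoring every such conjunction would yield many overlapping patterns, which is the redundancy that selecting a small, jointly descriptive set of patterns is meant to avoid.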

We see two main challenges. The first is the efficient discovery of descriptive pattern-based models from very large and highly complex data. The second is the discovery of causal patterns directly from empirical data.

The individual projects and the corresponding members are listed here:

3.1. Multidimensional momentum maps, pulse shape optimization with Gaussian processes and XFEL diffraction patterns - Jan Michael Rost, Ulf Saalmann (MPI for the Physics of Complex Systems)

3.2. Exploiting causal knowledge for machine learning - Bernhard Schölkopf, Stefan Bauer (MPI for Intelligent Systems)

3.3. Discovery of interpretable patterns, correlation and causality in scientific data - Jilles Vreeken, Gerhard Weikum (MPI for Informatics)

3.4. Information-theory-based feature selection for material properties - Matthias Scheffler, Luca Ghiringhelli (Fritz Haber Institute)