- spokesperson: J. Vreeken (Max Planck Institute for Informatics)
- G. Weikum (Max Planck Institute for Informatics)
- L. Ghiringhelli, M. Scheffler (Fritz Haber Institute)
- J.M. Rost (Max Planck Institute for the Physics of Complex Systems)
Within BigMax, we focus on knowledge discovery in scientific data. Our goal is to point the scientist to surprisingly structured aspects of the data, together with easily understandable descriptions of the discovered structure. In other words, we want to provide the scientist with potential building blocks for novel hypotheses about the process behind the data.
To obtain interpretable insights, we investigate pattern-based modeling for scientific knowledge discovery. Pattern mining is a key branch of data mining. Loosely speaking, a pattern is a “query” that selects a subset of the records in our data, together with an explanation of why this subgroup is interesting. The traditional goal is to discover all such patterns, which often makes the result set highly redundant. Modern pattern mining avoids this by asking instead for the set of patterns that best describes the data without redundancy.
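To make the notion of a pattern concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method used in the project: the toy records, the attribute names, and the quality score (a size-weighted shift of the subgroup mean, one common choice in subgroup discovery) are all hypothetical.

```python
from itertools import combinations
from math import sqrt

# Toy records (hypothetical data, for illustration only).
RECORDS = [
    {"structure": "cubic", "doped": "yes", "target": 0.9},
    {"structure": "cubic", "doped": "no", "target": 0.4},
    {"structure": "cubic", "doped": "yes", "target": 1.0},
    {"structure": "hexagonal", "doped": "yes", "target": 0.3},
    {"structure": "hexagonal", "doped": "no", "target": 0.2},
    {"structure": "hexagonal", "doped": "no", "target": 0.2},
]


def select(records, pattern):
    """The 'query' part: records that match every (attribute, value) condition."""
    return [r for r in records if all(r[a] == v for a, v in pattern)]


def quality(records, pattern):
    """The 'why is it interesting' part (assumed score, not the project's):
    sqrt(relative subgroup size) * (subgroup mean - global mean)."""
    subgroup = select(records, pattern)
    if not subgroup:
        return 0.0
    global_mean = sum(r["target"] for r in records) / len(records)
    sub_mean = sum(r["target"] for r in subgroup) / len(subgroup)
    return sqrt(len(subgroup) / len(records)) * (sub_mean - global_mean)


def candidates(records, attributes, max_len=2):
    """Enumerate all conjunctions of up to max_len conditions."""
    conditions = sorted({(a, r[a]) for r in records for a in attributes})
    for k in range(1, max_len + 1):
        for combo in combinations(conditions, k):
            # Skip conjunctions that test the same attribute twice.
            if len({a for a, _ in combo}) == k:
                yield combo


def best_pattern(records, attributes):
    """Return the single highest-scoring candidate pattern."""
    return max(candidates(records, attributes), key=lambda p: quality(records, p))


if __name__ == "__main__":
    best = best_pattern(RECORDS, ["structure", "doped"])
    print(best, len(select(RECORDS, best)), round(quality(RECORDS, best), 3))
```

Enumerating every candidate, as `candidates` does, also illustrates the redundancy problem: many of the generated patterns select overlapping subsets of records, which is why one would rather ask for a small set of patterns that jointly describes the data.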
We see two main challenges. The first is the efficient discovery of descriptive pattern-based models from very large and highly complex data. The second is the discovery of causal patterns directly from empirical data.