Big data on innovative materials
Ten institutions of the Max Planck Society and Humboldt-Universität zu Berlin combine their know-how in data-driven materials science. The aim is a better use of the possibilities associated with analyzing large amounts of data.
Which alloying constituents lend a steel unique bending strength, extreme hardness and non-rusting properties? Are semiconductors that promise greater efficiencies for solar modules available, and do they offer greater flexibility than silicon? What would be the best catalyst for a very specific chemical reaction? Or, how should a surface be coated to achieve the best possible thermal protection? To find answers to these typical materials science problems more easily in the future, researchers from the institutions mentioned above hope to better exploit the opportunities presented by analyzing large volumes of data. To this end, they cooperate in the MaxNet on Big-Data-Driven Materials Science or, simply, BiGmax.
Until now, scientists searching for a new material for a specific purpose have generally had to rely on the results of experiments on selected materials. And yet they never know whether a better solution exists. How practical would it be, then, if researchers from both academia and industry could simply refer to a table to find the optimal material for their purpose? However, this is still far from the reality. "To date, around 240,000 inorganic materials alone are known; yet we have knowledge of only some of the properties of fewer than 100 of these substances", says Matthias Scheffler, Director at the Max Planck Society's Fritz Haber Institute in Berlin. As a theoretical physicist, he is certain that the large volumes of data being collected everywhere, also referred to as Big Data, can help to move closer to the table mentioned above. He imagines this table, though, more as a kind of multi-dimensional materials map.
Scheffler is a co-initiator of the cross-institutional alliance MaxNet on Big-Data-Driven Materials Science within the Max Planck Society. The declared aim of BiGmax is to make innovative use of the large volumes of data, some of which already exist, and thereby turn them into a driving force in materials research. In addition to the Humboldt-Universität zu Berlin, ten facilities of the Max Planck Society (MPG) are collaborating: the Max Planck Institutes for Dynamics of Complex Technical Systems (Magdeburg), for Colloid and Interface Research (Potsdam-Golm), for Polymer Research (Mainz), for Eisenforschung (Düsseldorf), for Physics of Complex Systems (Dresden), for Structure and Dynamics of Matter (Hamburg), for Intelligent Systems (Tübingen) and for Computer Science (Saarbrücken), as well as the Fritz Haber Institute (Berlin) and the Max Planck Computing and Data Facility (Garching).
Patterns in large amounts of data provide completely new information
"All of these facilities are already working with large data volumes collected during experiments or computer simulations", explains Peter Benner. At the Max Planck Institute for Dynamics of Complex Technical Systems in Magdeburg, the mathematician leads the Computational Methods in Systems and Control Theory Research Group. For example, Benner says, procedures such as x-ray structural analysis or atom probe tomography alone deliver millions of data values per minute; data from which researchers gain insights into the configuration of atoms in solids, for example. Enormous data volumes also result from the quantum mechanics analyses commonplace in solid-state physics and chemistry. The researchers can now draw conclusions on material properties from these data.
However, the new alliance aims to gain even more insights from these data. New methods will be developed to this end, and existing methods refined. "For example, in materials research the data present highly specific challenges to the computer algorithms", explains Benner, who coordinates the new collaboration together with Matthias Scheffler. "This can all be achieved better jointly", says Benner. "Because although we conduct research in different disciplines, the methodological problems are the same for the respective data analyses".
One of the central objectives is to investigate the data for particular structures or patterns, which will then allow completely new information to be extracted in addition to what is already known. Matthias Scheffler points to other disciplines where this is already the case. Epidemiologists, for example, were able to determine the regions in which the flu was prevalent from user queries in Internet search engines. On this basis, they were able to follow the outbreak's propagation and even forecast its future spread. As Scheffler says, one only needs to recognize the patterns in the data.
A new paradigm in materials science
The cooperating Max Planck scientists are therefore hopeful that, in the future, materials researchers can gain new insights from their existing data. The network aims to concentrate joint activities on five different topics. The objective is to be able to theoretically predict the properties of metals and alloys, determine the causal relationships between material properties and data structures, develop data diagnostics methodologies to convert collected experimental data even more quickly to image information, and facilitate the design of polymer materials with specific, desired properties. In the fifth topic area, the network aims to continue the Materials Encyclopaedia that has already been started. The Novel Materials Discovery Laboratory (NOMAD Centre of Excellence) had previously worked on this encyclopaedia, using exclusively theoretically computed entries. Experimental data will now also be included as part of BiGmax.
For Peter Benner, there is no question that the cross-institutional cooperation will bring together complementary information and thus substantially simplify the researchers' work. One example he sees is data diagnostics, in which his Magdeburg-based group collaborates with colleagues in Potsdam-Golm. "In Golm, they are researching imaging methods that allow new insights into the nanostructures of biomaterials such as bones, for example", explains Benner. "Here, we mathematicians can help to suitably compress the accrued data so that they can be quickly converted to informative images."
There is still a long way to go before the dream of the multi-dimensional materials map, in which one simply looks up the best material to use, is fulfilled. But Matthias Scheffler does not doubt that Big Data will help to reach this target. Here, he sees a new paradigm in materials science: "Previously, researchers have investigated selected systems and developed models based on a general theoretical understanding", says Scheffler. "I believe that the future quest in terms of Big Data analyses will be the search for structures and patterns in large data volumes. And once we have finally developed the equations to describe them, we can then apply them to materials that we have not even analyzed yet."
With data from solar cells to new thermoelectrics
The physicist believes that unconventional solutions can also be reached much more easily using this method. "In individual experiments one usually begins with established criteria", says Scheffler. "This means: one predominantly searches for superconductors in the substance group in which one was previously successful." But it is exactly this that makes revolutionary developments more difficult. Here, the structural analysis of large data volumes is much more impartial. Matthias Scheffler can therefore readily envisage new thermoelectric materials (that is, materials that convert undesirable waste heat into useful electricity) being discovered in the future, for example in data generated during solar cell research.
If, one day, it is finally possible to theoretically derive material properties, Peter Benner also sees an additional advantage. "This would save the time and money expended on some experiments", says the mathematician from the Max Planck Institute in Magdeburg. And the patience of the researchers, who currently are often forced to approach solutions using the trial-and-error method, would also be less taxed.
4 V Challenge
Materials science is entering an era in which the volume of data from experiments and simulations is growing beyond what established scientific methods can properly process. At several MPIs of the Section for Chemistry, Physics, and Technology (CPTS), the so-called "4 V challenge" is becoming evident:
- Volume (the amount of data)
- Variety (the heterogeneity of form and meaning of data)
- Velocity (the rate at which data may change or new data arrive)
- Veracity (uncertainty of quality)
Exploiting these data requires new and dedicated technology based on approaches in statistical and machine learning, compressed sensing, and other recent technologies from mathematics, computer science, statistics and information technology. The PIs of the proposed MaxNet cover a significant breadth of research areas, and they are convinced that the envisioned synergy will enable them and their MPIs to develop novel, domain-specific and property-specific methods to enter and shape the era of data-driven materials research. The goal of the proposed MaxNet is to fully exploit the scientific potential of the materials science activities of the CPTS and to raise the consortium to world leadership in data-driven materials science.