Crystal structure prediction in the context of inverse materials design

删除或更新信息，请邮件至freekaoyan#163.com(#换成@)

本站小编 Free考研考试/2022-01-01

1.
Crystal structure prediction as a key step in inverse materials design

In traditional computational materials science research, the central problem that is solved is predicting the properties of a material given its composition and structure. Materials properties are functions of the chemical composition and atoms position degrees of freedom. These degrees of freedom can be expressed by the set of variables σ = {a, b, c; r₁, s₁, r₂, s₂, …, r_N, s_N}, where {a, b, c} are the lattice vectors of the unit cell, {r₁, r₂, …, r_N } are the vector positions of the N atoms in the unit cell, and the labels s_i specify the chemical species of the atom at each site r_i. A property P is therefore a function P(σ) of σ and a large range of properties can nowadays be accurately calculated by the modern methods of ab initio electronic structure theory^{[1, 2]}. Inverse design^[3] turns the traditional paradigm of computational materials science on its head and addresses the problem of predicting materials that exhibit a given target property P^target. This inverse prediction problem translates into an optimization problem in which a space of possible solutions (i.e, a space of candidate materials) is explored in search for the optimal solution (i.e, the material that meets the design target).

For the materials predicted by inverse design to stand a chance to be useful in applications, they should be stable and synthesizable. To guarantee stability, it is reasonable to explore series of solids for which there is experimental evidence of stability and whose crystal structure has been measured and reported in the literature. Series of known solids can be retrieved, for instance, from experimental databases of inorganic materials such as the ICSD^[4]. However, the materials that have been synthesized thus far (and that are reported in the literature), are only a fraction of all the compounds that can be conceived just by following the basic chemical rules of valence. For instance, if one considered the ternary “principal” compositions ABX, ABX₂, ABX₃, and A₂BX₄, and substituted to A, B, and X species with valence states that insure overall charge neutrality of the compound, then it would be possible to generate large families of viable compositions. Out of the chemically viable compositions often a number, or, in some cases, even most them, are missing from the experimental databases. As an example, Fig. 1 shows schematically the I-I-VI series of ABX compounds^{[5, 6]} in which A and B are two distinct monovalent species selected either from the alkali elements (column I in the periodic table) or the d¹⁰ noble metals, i.e, Cu, Ag, and Au, and X is a chalcogen species (column VI of the periodic table), i.e, S, Se, or Te. The checkmarks indicate the I-I-VI that are reported to be stable. Several entries are missing in this I-I-VI series and correspond to compounds that are not reported in the literature and experimental databases. Similar compilations of missing compounds have been reported for other widely studied families of solids, such as the A₂BX₄^[7] and ABX₃ families.

onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/7/PIC/17090022-1.jpg'"
class="figure_img" id="Figure1"/>

Download

Larger image

PowerPoint slide

Figure1.
(Color online) (a) Schematic summarizing which of the series of possible I-I-VI ABX compounds are reported in the ICSD (shown by check marks) and those that are missing from the ICSD and whose existence was unknown. Here, A and B are column I metals, i.e, Li, Na, K, Rb, and the d¹⁰ noble metals Cu, Ag, and Au (which predominantly exhibit a monovalent character). The X species are the column VI elements S, Se and Te. (b) Compounds predicted to be stable (denoted by + signs) or unstable (denoted by – signs) in the half-Heusler alloy structure. (c) Compounds predicted to be stable or unstable in Ref. [20] performing DFT total energy calculations of each ABX system using a set of 41 structure types. Adapted with permission from Ref. [5].

The fact that numerous hypothetical, yet chemically viable compounds are missing from the experimental databases begs the question as to which of these missing materials are thermodynamically stable and which are not. To determine whether a hypothetical compound is stable, one must first predict its lowest-energy crystal structure and then assess whether the compound, in that crystal structure, is thermodynamically stable against decomposition into competing phases of different stoichiometry. Computational crystal structure prediction is, thus, an essential step in inverse materials design whenever the search for optimal materials is performed on families of compounds comprising both known and hypothetical materials.

In this paper, we give an overview of the application of biologically-inspired global optimization methods combined with modern density functional theory (DFT) to predict^{[8, 9]} the stability and crystal structure of solids. We consider two general classes of structure prediction problems: First, configuration search problems that arise in systems characterized by a given underlying lattice (e.g, fcc) and in which the objective is to predict the stable configurations formed by two or more species on the lattice. Second, lattice-type search problems that arise in systems in which the structure is not constrained to a given underlying lattice and the goal is to predict the lattice sites, as well as the distribution of the different species of atoms on the lattice. We show how biologically-inspired global optimization algorithms^{[10, 11]} can overcome some serious limitations of the standard prediction methods based on configuration enumeration and screening of fixed lists of candidate structure types. Specifically, we describe a method that combines exhaustive enumeration and a genetic algorithm^[10] to search for optimal atom arrangements in large configuration spaces and global optimization method based on an evolutionary algorithms^[11] to address, using DFT, lattice-type search problems. We then discuss the application of these optimization methods to problems of interest in alloy theory and to predict the stable phases of inorganic solids that is a central step in the inverse design of inorganic functional materials.

2.
Structure types and conditions for thermodynamic stability of a solid

One of the most striking facts in solid-state chemistry^[12] is the observation that numerous solids, even formed by wildly different chemical species, might show similar crystal structures. These similarities are not obvious when one compares the numerical values of the atom coordinates but emerge when the space-group symmetry and set of Wyckoff sites as used ad descriptors of the structure. In this respect, it is useful to introduce the concept of structure type. Following the classic definition given by Ewald and Hermann in the Struckturbericht^[13] a structure type is defined by the space-group symmetry, the type of occupied Wyckoff sites, and the chemical formula. We will refer to a structure type so defined as to a formal structure type (FST). Two structures that coincide with respect to these descriptors are considered to exhibit the same structure type. Note that some, or all, the Wyckoff sites (u, v, w) might include symmetry-unconstrained coordinates. Only when the coordinates of the Wyckoff sites are constant numbers mandated by the symmetry operations, does an FST provide the information needed to specify the topology of the periodic atom packing and of the bonding network in the crystal. In the case where one or more of the Wyckoff coordinates (u, v, w) are free parameters, it will be necessary to specify their numerical values to determine the atom arrangement and local order in a crystal with that structure type. In this case, it is even possible that a material with a certain FST might exhibit multiple local total-energy minima corresponding to somewhat different atomic arrangements.

The relevant quantity to assess the thermodynamic stability of a compound is its formation energy ΔH with respect to its elemental constituents. For simplicity, here, we will consider the case of binary systems. In this case, the formation energy of a binary phase of composition A_p B_q is

$${
m{Delta }}Hleft( {{{
m A}_p}{{
m B}_q}}
ight) = Eleft( {{{
m A}_p}{{
m B}_q}}
ight)-left( {1-x}
ight)Eleft({
m A}
ight)-xEleft( {
m B}
ight),$$

(1)

where E(A_p B_q), E(A), and E(B) are the equilibrium total energies per atom, respectively, of the A_pB_q phase with structure σ and of the elements A and B in their stable solid phases, and $x = frac{q}{{p ,,+,, q}}$

is the concentration of species B in mole fraction. For a structure σ with B concentration x_σ to be thermodynamically stable, its formation energy must be negative, i.e, ΔH(σ) < 0, and the phase must be stable against decomposition into any pairs of phases σ' and σ'', respectively, at lower and higher concentration than σ, that is, x_(σ') < x_σ < x_(σ''). This is formally expressed by the inequality

$${
m{Delta }}Hleft( sigma
ight) leqslant {
m{Delta }}Hleft( {sigma ''}
ight)frac{{{x_sigma }-{x_{sigma '}}}}{{{x_{sigma ''}}-{x_{sigma '}}}} + {
m{Delta }}Hleft( {sigma '}
ight)frac{{{x_{sigma ''}}-{x_sigma }}}{{{x_{sigma ''}}-{x_{sigma '}}}},$$

(2)

in which the term on the right-hand side defines the tie line between the phases σ' and σ'' at concentration x_(σ') and x_(σ'') and the condition of stability translates into the requirement that the formation energy ΔH(σ) of σ lie below (or on) the tie line that connects the formation energies of any pairs of σ' and σ'' phases. The phases across the composition range that satisfy this condition define the ground-state convex hull line, which is a continuous piece-wise function C(x) in the ΔH versus x plane.

3.
Configuration search problems

3.1
Search by exhaustive enumeration versus search by sampling

A large class of crystal structure prediction problems are characterized by a given underlying lattice, and the space of possible crystal structures is defined by the way the atoms of different species occupy the available lattice sites. We will consider here binary systems in which atoms of A and B species distribute over the given lattice, for instance an fcc or a bcc lattice, forming periodic patterns. A periodic cell with N lattice sites can accommodate the binary compositions A_pB_N-p, where p = 1, …, N ? 1. The patterns formed by the A and B atoms can be represented by the set of configurational variables {s₁, s₂, …, s_N } where at each site i the variable s_i takes two values, for instance, +1 and ?1, corresponding respectively to the A and B species. A total of 2^N binary configurations can be accommodated within a periodic cell comprising N lattice sites, which means the number of possible configurations increases exponentially with N. Binary configuration spaces emerge in the context of alloy theory. For example, when one searches for the ground-state configurations of a binary system at temperature T = 0 one must consider the formation energy ΔH in principle of all possible binary configurations A_pB_N-p across the composition range to search for the configurations corresponding to the breaking points on the stability convex-hull C(x).

The structures forming a binary configuration space can be organized into a hierarchy of subsets^[14]: (i) the configurations can be grouped by the number N of lattice sites in the unit cell, (ii) the configurations with N sites can be divided into groups with distinct “inequivalent cell shapes” (ICSs), and (iii) the configurations within each ICS can be grouped into same-shape structures (SSSs) each corresponding to one of the possible compositions A_pB_N-p. There are two general strategies to search for configurations with a target property within a binary configuration space: (a) exhaustive enumeration, where all the distinct A_pB_q configurations are enumerated and the target property P is calculated for each configuration; (b) a global optimization algorithm in which a sampling strategy is used to search for the optimal configuration.

Ferreira, Wei, and Zunger (FWZ)^[15] developed an algorithm to perform the exhaustive enumeration of all the distinct binary configurations on fcc lattices up to N = Q lattice sites per cell. This algorithm makes it easy to estimate the number of ICSs and SSSs as functions of the number N of sites per cell. As Fig. 2 shows, the number N_ICS of ICSs scales with N following a power law, i.e, N_ICS ≈ BN^α where α ≈ 1.5. This power dependence is characterized by a prefactor B that assures a relatively slow growth of N_ICS with N. For instance, when N = 50 (that correspond already to relatively large cells) there are about 300 ICSs. Going up to cells of 100 sites the number of distinct ICSs grows to a few thousands that can be generated with a modest computational effort.The number N_SSS of symmetry-inequivalent SSSs scales with N following the exponential law as N_SSS ≈ Ae^0.6N ^[14], which results in a rapid growth of N_SSS with N. In prior studies (see, e.g, Ref. [16]) the enumeration has been typically performed up to structures with 20 sites cell giving a total of ~3 000 000 symmetry-distinct configurations. This number of structures is between two to three orders of magnitude beyond the 10³ and 10⁴ structures that is practical to evaluate by DFT in the usual computational campaigns on modern supercomputers. The screening of ~10⁶ structures becomes feasible, even using only one core on modern processors, when an analytic function like a cluster expansion^[17] is used to evaluate the physical property to optimize. However, the exhaustive enumeration of the configurations becomes increasingly less practical going beyond 20 sites per cell. For instance, when the number of lattice sites per cell N = 32, the number of SSSs in a single ICS is larger than 10⁶ and one can estimate that the exhaustive enumeration of all structures up to 32 sites will generate more than 10⁸ distinct configurations. This will pose a challenge because the exhaustive enumeration and evaluation would be computationally intensive tasks will require specialized “big data” procedures and platforms to handle the vast amount of data produced.

onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/7/PIC/17090022-2.jpg'"
class="figure_img" id="Figure2"/>

Download

Larger image

PowerPoint slide

Figure2.
(Color online) (a) The circles indicate the number of inequivalent cell shapes of a system of binary compounds A_xB_1-x as a function of the number of atoms N per unit cell as obtained applying the exhaustive direct enumeration algorithm of Ref. [21]. The interpolating solid red line shows that the number N_ICS of ICSs obtained by exhaustive enumeration increases as a function of N following a power law BN^α where α ≈ 1.5. (b) The largest number of same-shape structures (SSSs), is a function of the number of sites N per cell was not full sentence. Note that the enumeration algorithm of Ref. [21] applies symmetry checks to discard structures that are symmetry-equivalent to structures previously generated during the enumeration. Even with the symmetry filters applied, the number N_SSS ≈ increases as a function of the number of atoms per cell following the exponential law Ae^γN. Adapted with permission from Ref. [14].

The alternative to exhaustive enumeration is to search for the optimal configuration among the SSSs by a global optimization algorithm such as simulated annealing^{[8, 18]} or a genetic algorithm^[19]. Global optimization algorithms sample a subset of configurations instead of enumerating all the configurations to find the optimal solution and, therefore, are preferable to exhaustive enumeration when dealing with large configuration space. On the other hand, sampling algorithms like simulated annealing and genetic algorithms work best when the configuration “genes” refer to structures with common cell shape.

3.2
Combining the strengths of exhaustive enumeration and sampling

To formulate an improved search method able to explore spaces of configurations with a number of sites per cell larger than 20, we proposed a procedure^[14] that leverages the strengths of the exhaustive enumeration method and of global optimization methods like the genetic algorithm. We call this combined approach different sampling for different cell shapes^[14] (DSDS, for short). This procedure uses the exhaustive enumeration procedure to generate all the possible ICSs for a given lattice type (e.g, fcc) containing up to a given number Q of sites per cell. The exhaustive enumeration of the ICSs, as we saw earlier, can be performed easily up to a large number of sites per cell. Then, the configurations that exhibit the optimal values of the target property are searched for in correspondence of all possible compositions within each distinct ICS using a global optimization algorithm, for instance a genetic algorithm.

In the concrete implementation of the DSDS method presented in Ref. [14] we searched for the global optimum configuration by a bit-string genetic algorithm^[19]. In a genetic algorithm, one uses a “population” of distinct configurations, called “individuals”, which represent candidate solutions of the optimization problem. A distinctive character of genetic algorithms for global optimization is that the configurations are represented by strings (i.e, lists) of numbers. In the case of the search for optimal binary alloy configurations, one uses strings of –1 and 1 values to represent atoms of species A and B, respectively. Each individual is assigned a “fitness” that measures how close that individual is to the optimal solution. In the case of the prediction of stable binary configurations, the measure of fitness is given by the formation energy Δ H. The initial population is made of configurations that are generated randomly. The individuals forming the population are sorted in ascending order of formation energy with the fittest configuration having the lowest formation energies. A new population is then generated by selecting individuals among the fittest ones and performing “crossover” operations on a couple of individuals or “mutations” on single individuals. The crossover operation combines two individuals into a new one by populating the new string by single bits or sequences of bits taken from each parent individuals. The mutation operation creates a new individual from an existing one by randomly swapping a subset of bits. A segment of the population, e.g, 20 %, including the least fit individuals is then replaced by the newly generated ones thus producing the next “generation” of the population. This process of applying the genetic operators to the current population to produce a new one is repeated iteratively until a stopping criterion is reached, for instance, the best configuration is not improved by the algorithm for a given minimum number of generations. The advantage of the DSDS approach with respect to the exhaustive evaluation is that there is no need to enumerate and calculate all the SSSs, which is the most time-consuming part of the enumeration, thus decreasing the number of configurations that should be evaluated to find the optimal ones.

4.
Lattice-type search problems

4.1
High-throughput screening of a fixed list of candidate structure types

An intuitive approach to predicting the lowest energy structure of a solid of a given composition is to perform structural relaxation and total energy calculations of a set of hypothetical structure types previously reported in other materials and take as predicted crystal structure the lowest-energy structure type among those calculated. This approach of screening a fixed list of structure types is based on the hypothesis that a missing compound in a certain chemical family, for instance the series of ABX solids, will form in one of the structure types experimentally observed in the known compounds belonging to that same family. This hypothesis originates from the empirical observation that a finite, number of structure types are observed in the experimentally known compounds within a certain chemical (e.g, ABX, ABX₂, ABX₃, A₂BX₄, etc.). Currently, this type of screening of candidate structures is performed by high-throughput methods^{[19, 20]}, that is, computational pipelines that automatize the workflow of the preparation, execution, and analysis of the output of ab initio calculations, delivering a high-throughput in the generation of results.

The space group of the given formal structure type determines which of the lattice parameters {a, b, c} (the lengths of the lattice vectors {a, b, c}) and lattice angles {α, β, γ} are not constrained by symmetry. The optimal values of the symmetry unconstrained lattice parameters and Wyckoff coordinates are obtained by local total energy minimization. Usually, as initial values for the variable Wyckoff coordinates one uses the values that they assume in solid compounds used as prototype materials exhibiting that formal structure type. Gradient-following minimization methods, such as the conjugated gradient method and improvements of it, are applied to iteratively search for the closest local minimum using the derivatives of the self-consistent DFT total energy^[20] with respect to the atom positions and lattice parameters.

4.2
Symmetry-unconstrained structure search at fixed composition

The total energy E_tot (a, b, c; r₁, …, r_N) of a solid is a highly non-linear function with multiple local minima. Therefore, predicting the lowest energy structure is a global optimization problem that should be addressed with a search method that allows the exploration of the Born-Oppenheimer surface defined by E_tot without getting stuck into local minima. A crystal structure search approach based on the high-throughput screening visits only the local minima corresponding to the finite set of formal structure types considered and runs the risk of missing the global minimum if its structure type has never been observed before within that family of solids (or was missed searching the literature). To overcome this drawback the challenge is to formulate a truly global minimization method for E_tot (a, b, c; r₁, …, r_N) that can visit candidate solutions without constraints on the space group and the connectivity of the network of bonds in the structure. We refer to this type of search problem as the global space-group optimization (GSGO) problem.

Several global optimization methods have been formulated so far to tackle the GSGO problem. These include simulated annealing^[21], metadynamics^{[22, 23]}, basin hopping^{[24, 25]}, minima hopping^{[26, 27]}, and random search^[28–30]. However, these methods explore the Born-Oppenheimer surface by moving between contiguous local minima and can achieve a global view of the search space only through long simulation runs. Even with the state-of-the-art software packages implementing the DFT total-energy method, it continues to be very computationally intensive to perform the calculation of the hundreds or even thousands of structures needed for a truly global search of the Born-Oppenheimer surface with the above mentioned global optimization methods. An important advance in the field of crystal structure prediction in the last decade has been the application of biologically-inspired global optimization to symmetry-unconstrained structure search^[9]. These methods include genetic algorithms^{[8, 31, 32]}, evolutionary algorithms^[33–39], and particle swarm optimization^{[40, 41]}. A unifying feature of these methods is that they sample the search space with a “population” of candidate solutions. The set of candidate solutions is then evolved by replacing a fraction of the population corresponding to the highest energy structures with new ones produced by evolutionary operators that are designed to explore new parts of the phase space in search of the optimal solution.

To tackle the GSGO problem we apply^{[42, 43]} the ideas underlying a few evolutionary algorithms^{[33–37, 44]} developed to predict the global total energy minimum of a crystal. The evolutionary search starts from a population of candidate structures with randomly generated lattice vectors and atom positions. The population of candidate structures is evolved through a series of generations in which at each generation a subset formed by the highest-energy structures (e.g, 25% of the whole population) are removed from the population and replaced with new ones produced by mutation and crossover operations. The crossover operation is performed on two structures amongst the most fit within the current population. These two structures, in general, have different cell shapes, a fact that renders not obvious how to establish a correspondence between the cells that could enable a process of inheritance from the two parent structures to the child structure. To circumvent this problem, the two cells are first reduced to unitary dimensions, a transformation that makes the cells have the same shape^[33–37]. The crossover is then performed by a cut-and-splice operation^[45] in which the two cells are split into two slabs that are swapped and then spliced together to generate a child structure. The lattice parameters and atom positions of each new structure are fully relaxed to the closest DFT local total energy minimum. The fully relaxed structures are finally added to the “surviving” part of the population from the previous generation and the new population is sorted in ascending order according to the total energies of the individuals. The evolutionary search continues iteratively through a series of generations until a certain stopping criterion is met, for instance the total number of structures evaluated has reached a pre-set maximum.

4.3
Variable-composition symmetry-unconstrained structure search

So far, we have considered lattice search problems in which the species of the elements are given along with the overall composition, e.g, A_pB_qC_r. In principle, the most general crystal structure prediction problem is that in which one seeks the composition and crystal structure of the stable compounds, for instance, containing the A, B, and C species. Addressing this problem requires exploring the space of possible compositions and crystal structures and applying to the candidate compounds the condition of thermodynamic stability (which, e.g, for binary systems, is given in Section 2).

In the evolutionary algorithm described in Section 4.2 the composition is fixed and the objective function to minimize is the total energy. To tackle the search not only for the stable crystal structure, but also for the compositions of the stable compounds in a multi-component system, we need to relax the constraint of fixed composition in the evolutionary search. Furthermore, the quantity which is relevant to assess the thermodynamic stability of a solid, is its formation of energy ΔH(σ) calculated with respect to the phases of the elemental constituents. For instance, in the case of a binary phase A_pB_q the formation energy will be calculated using Eq. (1) taking the total energies of solid A and B in their respective stable phases. Following these requirements for an optimization method that be able to predict stoichiometry and structure of the stable phase, in Ref. [46], we extended the ab initio evolutionary GSGO algorithm by lifting the constraints on composition. Moreover, to optimize the convex hull, instead of minimizing the total energy, we minimize the objective function δ⁽ⁿ⁾ (σ) = ΔH(σ) – C⁽ⁿ⁾ (x), that is, the difference between the formation energy of a phase in the population at the generation n and the convex hull C⁽ⁿ⁾ (x) at that generation. For the structures on the convex hull the objective function will be zero. By removing from the population the structures farther away from the convex hull (with largest value of the objective function) and replacing them with newly generated ones, one puts pressure on the population to evolve towards structures with ever lower energy. This in turn tends to refine the convex hull toward its optimal shape with the breaking points corresponding to the ground-state structures. We call X-GSGO^[46] this extended GSGO algorithm.

5.
Applications

5.1
Configuration search problems: prediction of stable alloy configurations

The prediction of the stable atom configurations on a given underlying lattice is an important problem in the theory of metal and semiconductor alloys. Here, we consider Au–Pd^[16] and Mo–Ta^[47–49] as two examples of binary metal alloys in which the ordered phases exhibit the same underlying lattice, specifically the fcc lattice in the case of Au–Pd and the bcc lattice in the case of Mo–Ta. To calculate the formation energies of the configurations in these alloys we used the cluster expansion reported in Ref. [16] for Au–Pd and in Refs. [47–49] for Mo-Ta. The terms included in the cluster expansion and the respective coefficients were determined by an iterative fitting of the DFT formation energies of a few dozens of binary structures. The cluster expansions so obtained reproduce with a high accuracy the DFT formation energies at a fraction of the computational cost and therefore can be applied to rapidly screen millions of configurations. Nevertheless, the exponential explosion of the number of configurations as a function of the number of lattice sites N has limited the exhaustive evaluation even with cluster expansion typically to configurations with up to 20 atoms per cell.

The DSDS method allows one to explore the configuration space for Au–Pd and Mo–Ta extending the search to configurations with more than 20 atoms per cell. Importantly, the DSDS protocol applied to structures with up to N = 20 atoms per cell gives the same result as exhaustive evaluation both for Mo–Ta and Au–Pd. The result of the DSDS search for all ICSs up to 32 atoms per cell in the case of Au-Pd is presented in Fig. 3(a) with the resulting convex hull shown in Fig. 3(b). The DSDS search identifies two new ground-state structures (with respect to the exhaustive enumeration up to 20 atoms per cell) with composition Au₅Pd and Au₅Pd₃ and 24 atoms in the unit cell (see Fig. 3(b)). The depth of these ground states relative to the tie line on the convex hull is small, i.e, 0.1 meV/atom, and as a result the DSDS ground-state line appears almost unchanged with respect to that obtained by exhaustive evaluation of the structures with up to 20 atoms per cell. The structure with composition Au₅Pd₃ is not a superlattice. The Au₅Pd is a superlattice along the (310) direction with the sequence of layers (PdAu₄)(PdAu₅)(PdAu₆)(PdAu₅). Interestingly, this new Au₅Pd ground-state configuration is an example of the class of “adaptive structures”^[50], that is, structures that allow an efficient adaptation to infinitely small changes in the concentration within a fairly large range of alloy compositions. This new adaptive structure emerges in the region of the composition range where other adaptive structures were identified by exhaustive evaluation in Ref. [16]. For Mo–Ta, the DSDS search (see Fig. 3(c)) didn’t identify new ground-state structures when it was extended to comprise up to N = 32 atoms per cell.

onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/7/PIC/17090022-3.jpg'"
class="figure_img" id="Figure3"/>

Download

Larger image

PowerPoint slide

Figure3.
(Color online) The result of the ground state search performed by the DSDS procedure for the Au–Pd metallic alloy exploring structures with up to 32 atoms per cell: each circle corresponds to the structure of the lowest formation energy found by performing a genetic algorithm search for each of the directly enumerated inequivalent cell shapes. Up to four genetic algorithm runs were performed per ICS starting from independent random populations and the lowest among the optimal structures was retained for the given ICS. Adapted with permission from Ref. [14].

5.2
Lattice-type search problems

5.2.1
High-throughput structure type prediction of selected ABX missing compounds

Figs. 1(b) and 1(c) show the result of the application of the high-throughput method to predict which of the missing I-I-VI ABX compounds are stable and in which structure type. The missing materials that are predicted to be stable or unstable are shown, respectively, by plus and minus symbols. Fig. 1(b) shows the predictions made by calculating only the half-Heusler alloy structure (C1_b), which is one the most frequently observed ABX structure types. Testing only one structure type obviously runs the risk of missing structure types of even lower energy. A better approach is to test as many of the observed ABX structure types as possible. A survey^{[6, 51]} of the literature and crystallographic databases shows that the vast majority of ABX compounds exhibit one of 41 structure types, C1_b being one of the most common of these. Fig. 1(c) shows the missing materials predicted stable and unstable when these 41 structure types are used high-throughput calculations. Several of the compounds that were predicted to be thermodynamically unstable in the half-Heusler alloy structure are now predicted to be thermodynamically stable in one of the other 40 structure types tested. In the next sections, we will show how the evolutionary search is able to look beyond the local minima that are visited by high-throughput calculations of a fixed list of structure types.

5.2.2
Predicting the correct realization of a formal structure type

RbCuS and RbCuSe are two of the compounds (of which RbCuSe was previously unreported) that were studied in Ref. [6] by high-throughput DFT calculations of 41 ABX formal structure types. To each formal structure type is associated as a prototype a solid compound reported in the literature. The atoms in the prototype structures are substituted with the A, B and X atoms of a hypothetical ABX compound to obtain the initial atom coordinates from which the full local relaxation of this hypothetical ABX system, in that structure type, is started. The lowest-energy structure type predicted for RbCuS and RbCuSe has the Cmcm space group symmetry and is isotypical to RbAuS. The local relaxation of RbCuS and RbCuSe in this RbAuS-type structure was started from the experimental structure of RbAuS in which Au was replaced with Cu (and S was replaced with Se in the case of RbCuSe). Later synthesis work^[5] confirmed the stability of RbCuS and RbCuSe in a Cmcm structure isotypical to RbAuS, that is, in which the atoms occupy the same Wyckoff positions as in the RbAuS prototype. However, the experimental structure exhibits a qualitatively different local coordination environment around Rb than in the structure predicted starting the local relaxation from the RbAuS atom positions. The GSGO evolutionary algorithm discussed in Section 4.2 is instead able to predict for RbCuS and RbCuSe structures with the observed Cmcm space group symmetry and the atoms in the correct Wyckoff positions but with the numerical value of the fractional coordinates of Rb in agreement with the experimental ones. measured.

5.2.3
Exploring the diversity of lattice types in metallic compounds

Binary metallic alloys A–B are often formed by elements A and B that in their pure solid phases have different structure types, for instance A forms in the fcc structure and B in the hcp structure. In this case it is difficult to guess if the mixed phases A_pB_q form in structure types based on the underlying lattice of pure solid A or that of pure solid B, or if their structure types are altogether unrelated to the structure of solid A and solid B. An example of a binary system with hard to guess structure types is Al–Sc^[52–54]. Solid Al and Sc exhibit respectively the fcc and hcp structure. The intermediate stable Al–Sc phases exhibit a variety of structures, for instance, Al₃Sc forms in the fcc-based L1₂ structure and AlSc in the bcc-based B2 structure. In addition, Curtarolo et al.^[55] studied this system by high-throughput DFT calculations of 176 binary structure types and predicted a new stable phase with composition AlSc₃ and the hcp-based D0₁₉ structure in a region where the experimental phase diagram indicates instead the coexistence of the Al and AlSc₂-B8₂ solid phases at T = 0 K. The crystal structure of these phases could still be predicted by building multiple cluster expansions, one for each of the possible underlying lattices, then searching for the convex hulls for each lattice by exhaustive evaluation or the DSDS method, and finally combining the convex hulls so obtained to determine the lowest energy structures across the composition range. The evolutionary GSGO method is not limited to any given underlying lattice and can be applied using the DFT total energy functional. Fig. 4 shows the result of the evolutionary search applied to Al₃Sc. In this case a population of 20 structures with two formula units per cell, that is, with the composition Al₆Sc₂ was used. The lowest energy-structure was found after 14 generations (for a total of 76 structures relaxed). The application of a space-group symmetry detection algorithm to this structure shows that it is isotypical to the D0₁₉ structure predicted by high-throughput calculations.

onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/7/PIC/17090022-4.jpg'"
class="figure_img" id="Figure4"/>

Download

Larger image

PowerPoint slide

Figure4.
(Color online) Evolutionary optimization run for Al₂Sc₆ performed with a population of 20 individuals evolved replacing the 4 highest energy individuals at each generation. The optimal structure is displayed in panel (b) and exhibits the D0₁₉ structure type and was found after 76 structures were generated and evaluated by full structural relaxation during the search run. Adapted with permission from Ref. [42].

The prediction problem becomes even more challenging for intermetallic phases A_pB_q with crystal structures that are not daughter structures of the underlying lattice of either solid A or solid B. One such example is PdTi₃^[56] that forms in the A15-type structure in which the Pd atoms order into a bcc sub-lattice and the Ti atoms form pairs at the center of each face of the conventional (cubic) cell of the Pd sub-lattice (see Fig. 5). The evolutionary GSGO search performed on PdTi₃ using a population of structures of composition Pd₂Ti₆ (i.e, two formula units per cell). The history plot of the evolutionary search and the predicted lowest-energy structure are shown in Fig. 5. The lowest energy structure was found after ~100 structures evaluation and the symmetry detection algorithm confirmed that it is isotypical to the experimental ground-state A15 structure.

onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/7/PIC/17090022-5.jpg'"
class="figure_img" id="Figure5"/>

Download

Larger image

PowerPoint slide

Figure5.
(Color online) (a) Evolutionary optimization run for Pd₂Ti₆ performed with a population of 20 individuals evolved replacing the 5 highest energy individuals at each generation. The optimal structure displayed in panel (b) exhibits the A15 structure type and was found after approximately 100 structures were generated and evaluated by full structural relaxation during the search run. Adapted with permission from Ref. [42].

5.2.4
Predicting the stable compositions and structures in binary systems

Other than the phases mentioned in the previous section, the Al–Sc^{[52, 53]} system includes the Al₂Sc phase with the C15 structure and the AlSc₂ phase with the hcp-based B8₂ structure. In Ref. [46], we applied the ab initio X-GSGO algorithm to Al–Sc. Indeed, the variety of structure types in this system and the fact that C15 and B2 cannot be guessed as daughter structures of the fcc lattice of solid Al or the hcp lattice of solid Sc, makes Al–Sc a challenging system to test the ability of the X-GSGO algorithm to predict composition and structure of the stable phases across the full composition range. A total of N_at – 1 binary compositions Al _pSc_Nat-p, with p = 1, …, N_at – 1, can be represented in a cell containing N_at atoms. This means that cells with a given N_at might accommodate only some of the compositions that are observed in the phase diagram. We considered N_at = 6 and 8 in the periodic cell and performed, respectively. Each X-GSGO run started with a randomly generated population of structures and was usually carried out for at least 100 structural evaluations. The convex hulls reconstructed from the runs with N_at = 6 and 8 are shown in Fig. 6. The convex hull for the case of N_at = 6 has a vertex at Al₅Sc with corresponding to a Cmmm structure and includes the AlSc₂-B8₂, AlSc-B2, and Al₄Sc₂-C15 ground states. All the three runs performed for N_at = 8 retrieved the Al₄Sc₄-B2 and Al₆Sc₂-L1₂; the Al₂Sc₆-D0₁₉ structure was found in one restart while Al₂Sc₆-L1₂ was found in two of the three restarts. With the convex hulls built from the range of lowest formation energy structures found across the composition range with N_at = 6 and 8, we built the overall convex hull that includes all the phases and structures previously known from experiment plus the AlSc₃-D0₁₉ phase previously found by DFT high-throughput calculations.

onerror="this.onerror=null;this.src='http://www.jos.ac.cn/fileBDTXB/journal/article/jos/2018/7/PIC/17090022-6.jpg'"
class="figure_img" id="Figure6"/>

Download

Larger image

PowerPoint slide

Figure6.
(Color online) Central panel: the solid line shows the ground-state convex hull of Al–Sc with the breaking points (shown by the red dots) corresponding to the formation energies of the ground-state structures. The overall ground state convex hull line is determined by combining the convex hull lines obtained by X-GSGO searches run on cells containing 6 and 8 atoms shown respectively by the dash line and the dot-dashed line. Adapted with permission from Ref. [46].

6.
Discussion and conclusion

The ability of biologically inspired algorithms to predict crystal structures makes it possible to extend inverse design beyond the universe of structure types that are currently known and that are used by high throughput methods. One might speculate whether global optimization algorithms once deployed on the next generation of peta-scale and exa-scale supercomputers could substitute altogether high-throughput screening protocols to predict the structure of hypothetical materials. It is, instead, reasonable to envisage that future crystal structure prediction protocols will be computational pipelines that will combine both high-throughput and global optimization methods. For instance, a first stage of the prediction pipeline might be the high-throughput screening of a pool of known structure types. These predictions would then be refined by biologically-inspired optimization algorithms to achieve a truly global view of the search space. The new structure types that global optimization might identify would then become part of the list of structure types to screen at the outset.

In conclusion, in this paper we gave an overview of selected implementations of genetic and evolutionary algorithms to predict the stable phases and the corresponding crystal structures of crystalline solid materials. For configuration search problems defined on a given lattice, we showed how a genetic algorithm combined with the exhaustive enumeration of the cell shapes enables the search of optimal configurations with a number of sites per cell larger than previously possible. This combined approach avoids the computational complexity of the exhaustive enumeration of all ordered atom configurations on the lattice. For lattice-type search problems, biologically-inspired optimization algorithms have emerged as effective methods to address seamlessly the prediction of the underlying lattice and of the composition of stable solid phases going beyond the high-throughput method that screens a finite set of candidate structure types.

Acknowledgments

It is a pleasure to acknowledge the collaboration with Alex Zunger on all the projects that produced the results presented here and the collaborations on one or multiple of these projects over the years with Sergey Barabash, Zhe Liu, Peter Graf, Xiuwen Zhang, (the late) Arthur Freeman, Michael Vermeer, and Ken Poeppelmeier.