Polynomial-time algorithms for isomorphism, automorphism partitioning, and cannonical labeling of molecular graphs

The graph isomorphism problem consists in deciding whether two given graphs are isomorphic, i.e. whether there is a one-to-one mapping (a permutation) from the vertices of one graph to the vertices of the second graph, such that the edge connections are respected. An isomorphic mapping of the vertices onto themselves is called an automorphism. The set of all automorphisms of a given graph is the automorphism group of the graph. The automorphism group contains information regarding the topological symmetry of the graph. In particular, the orbits of an automorphism group identify symmetrical vertices. The canonical labeling problem consists of finding a labeling of the vertices of a given graph that maximizes (or minimizes) its adjacency matrix, while preserving the graph connectivity. Two graphs having the same canonical representation are isomorphic. While most articles related to graph isomorphism have been published in the computer science literature, the computation of the orbits of automorphism groups using partitioning techniques has received most attention in chemistry.

The graph isomorphism problem belongs to the class of NP problems, and has been conjectured intractable, although probably not NP-complete. However, in the context of chemistry because molecules are a restricted class of graphs, we have proven the problems of graph isomorphism, automorphism partitioning, and canonical labeling to be polynomial-time. This is an important result for chemical information studies, and in particular, for chemical structure generation, since all structure generators are making use of an isomorphism or a canonical labeling routine. A key element that lead to this result is the construction of a one to one mapping between molecular graphs and simple graphs. A molecular graph is a multigraph without loop (an atom is not bonded to itself) colored by the elements of the periodic table. To remove the color of the molecular graph dummy vertices are attached to each atom. The number of dummy vertices attached to a given atom a is equal to Z(a) + Z0, where Z0 is a constant (Z0 = 4 is valid for most organic compounds) and Z(a) is the atomic number of atom a in the periodic table. Double bonds are removed by adding a dummy vertex between the corresponding atoms. Triple bonds are removed by adding two dummy vertices. The above transformation enables one to use for molecules all the polynomial-time isomorphism and canonical labeling algorithms develop in the context of graph theory. However, we could not find in the litterature polynomial-time algorithms for automorphism partioning, and decided to test Morgan's extended connectivity concept for planar graphs. This study was motivated by the fact that most molecules have planar representations. Examples include DNA, RNA, proteins, paraffins, olefins, polyaromatic hydrocarbons, fullerenes, and dendrimers. We proved Morgan's technique to be correct for non-cyclic molecular graphs, with a quadradic-time complexity. We proposed an optimized version of Morgan's algorithm, with a running time scaling linearly for non-cyclic graphs. We also generalized the extended connectivity concept in order to handle all planar molecular graphs. In this generalization, the extended connectivity is now computed with respect to the planar drawings of the graph. The above automorphism partitioning algorithms are the first to be rigorously proven polynomial-time for molecular graphs. The consequences of the polynomial-time scaling is that the molecular structures that can be processed by the proposed algorithms are several order of magnitudes larger than previously reported. Practical applications of this study are in (1) molecular topological symmetry perception for chemical information and quantum mechanics calculations, (2) computer-assisted structure elucidation , (3) NMR spectra simulation, and (4) database storage and retrieval including determination of maximum common substructure.

References

Faulon, J. L. , Automorphism Partitioning, and Canonical Labeling Can Be Solved in Polynomial- Time for Molecular Graphs, J. Chem. Inf. Comput. Sci., 1998, 38, 432-444. (link to journal)

Faulon J.L., Collins M., Carr R.D. The Signature Molecular Descriptor. 4. Canonizing Molecules Using Extended Valence Sequence, J. Chem. Inf. Comput. Sci., 44, 427-436, 2004. [PMID: 15032522]. (.pdf manuscript).

Polynomial-time algorithms for isomorphism, automorphism partitioning, and cannonical labeling of molecular graphsThe graph isomorphism problem consists in deciding whether two given graphs are isomorphic, i.e. whether there is a one-to-one mapping (a permutation) from the vertices of one graph to the vertices of the second graph, such that the edge connections are respected. An isomorphic mapping of the vertices onto themselves is called an automorphism. The set of all automorphisms of a given graph is the automorphism group of the graph. The automorphism group contains information regarding the topological symmetry of the graph. In particular, the orbits of an automorphism group identify symmetrical vertices. The canonical labeling problem consists of finding a labeling of the vertices of a given graph that maximizes (or minimizes) its adjacency matrix, while preserving the graph connectivity. Two graphs having the same canonical representation are isomorphic. While most articles related to graph isomorphism have been published in the computer science literature, the computation of the orbits of automorphism groups using partitioning techniques has received most attention in chemistry.

The graph isomorphism problem belongs to the class of NP problems, and has been conjectured intractable, although probably not NP-complete. However, in the context of chemistry because molecules are a restricted class of graphs, we have proven the problems of graph isomorphism, automorphism partitioning, and canonical labeling to be polynomial-time. This is an important result for chemical information studies, and in particular, for chemical structure generation, since all structure generators are making use of an isomorphism or a canonical labeling routine. A key element that lead to this result is the construction of a one to one mapping between molecular graphs and simple graphs. A molecular graph is a multigraph without loop (an atom is not bonded to itself) colored by the elements of the periodic table. To remove the color of the molecular graph dummy vertices are attached to each atom. The number of dummy vertices attached to a given atom a is equal to Z(

a) + Z0, where Z0 is a constant (Z0 = 4 is valid for most organic compounds) and Z(a) is the atomic number of atomain the periodic table. Double bonds are removed by adding a dummy vertex between the corresponding atoms. Triple bonds are removed by adding two dummy vertices. The above transformation enables one to use for molecules all the polynomial-time isomorphism and canonical labeling algorithms develop in the context of graph theory. However, we could not find in the litterature polynomial-time algorithms for automorphism partioning, and decided to test Morgan's extended connectivity concept for planar graphs. This study was motivated by the fact that most molecules have planar representations. Examples include DNA, RNA, proteins, paraffins, olefins, polyaromatic hydrocarbons, fullerenes, and dendrimers. We proved Morgan's technique to be correct for non-cyclic molecular graphs, with a quadradic-time complexity. We proposed an optimized version of Morgan's algorithm, with a running time scaling linearly for non-cyclic graphs. We also generalized the extended connectivity concept in order to handle all planar molecular graphs. In this generalization, the extended connectivity is now computed with respect to the planar drawings of the graph. The above automorphism partitioning algorithms are the first to be rigorously proven polynomial-time for molecular graphs. The consequences of the polynomial-time scaling is that the molecular structures that can be processed by the proposed algorithms are several order of magnitudes larger than previously reported. Practical applications of this study are in (1) molecular topological symmetry perception for chemical information and quantum mechanics calculations, (2) computer-assisted structure elucidation , (3) NMR spectra simulation, and (4) database storage and retrieval including determination of maximum common substructure.ReferencesFaulon, J. L., Automorphism Partitioning, and Canonical Labeling Can Be Solved in Polynomial- Time for Molecular Graphs, J. Chem. Inf. Comput. Sci., 1998, 38, 432-444. (link to journal)Faulon J.L.,Collins M., Carr R.D. The Signature Molecular Descriptor. 4. Canonizing Molecules Using Extended Valence Sequence, J. Chem. Inf. Comput. Sci., 44, 427-436, 2004. [PMID: 15032522]. (.pdf manuscript).JLF 1997 & 2004