Chemical structure generators are computational tools that construct chemical structural formulas within a collection of chemical and physical constraints. Applications of these tools are in structural elucidation, molecular design, chemical kinetics, and synthesis design. Due to the relevance of these problems in chemistry, generators have been a prolific field of research since the late 1960s. The problem of structure generation is equivalent to the combinatorial problem of enumerating and sampling graphs. However, known graph theoretical techniques cannot directly be applied because molecules are not simple graphs but multigraphs colored by the elements of the periodic table. Hence, specific algorithms have to be designed in the context of chemistry. We are currently investigating three problems related to molecular graph generation: isomorphism, counting and enumeration, and sampling.

  1. Faulon J.L., Visco D, Roe D. Enumerating Molecules. In: Reviews in Computational Chemistry Vol. 21, Lipkowitz K. Edt., Wiley-VCH, 2005. (link to

Molecular graph isomorphism. Because molecules are a particular type of graphs, isomorphism need to be redefined. The good news is that isomorphism, automorphism partitioning, and cannonical labeling can be solved in polynomial-time because molecules are bounded valence graphs. We have written the first polynomial time algorithm for automorphism partitioning of planar molecular graphs. More recently (2004) we developed a canonization algorithm that is not limited to planar or bounded valence graphs, this algorithm perform well compare to Nauty even for difficult graphs such as projective planes.

  1. Faulon J.L., Collins M., Carr R.D. The Signature Molecular Descriptor. 4. Canonizing Molecules Using Extended Valence Sequence, J. Chem. Inf. Comput. Sci., 44, 427-436, 2004. [PMID: 15032522]. (.pdf manuscript).
  2. Faulon, J. L. , Automorphism Partitioning, and Canonical Labeling Can Be Solved in Polynomial- Time for Molecular Graphs, J. Chem. Inf. Comput. Sci., 1998, 38, 432-444. (link to journal)

Counting and enumerating molecules. The time-complexity of enumerating molecules remains unknown. However, there are indications that the enumeration problem should be polynomial-time per output (polynomial-delay) although all published algorithms are not polynomial-delay. First, the general problem of enumerating simple graphs has a polynomial-time complexity per output. Secondly, it is known that specific types of molecules (for example, molecular trees) can be enumerated with polynomial-delay algorithms. We are studying the time-complexity of enumerating molecules, and its implications for structural elucidation and chemical kinetics.
  1. Faulon J.L., Brown W.M., Martin S Reverse engineering chemical structures from molecular descriptors: how many solutions? J. Comput Aided Mol Des. 2005 in press [PMID: 16267694] (.pdf manuscript).
  2. Faulon J. L., Visco D. Churchwell, C. J., The Signature Molecular Descriptor. 2. Enumerating Molcules from their Extended Valence Sequences, J. Chem. Inf. Comput. Sci., 43 (3), 721 -734, 2003. [PMID: 12767130]. (link to journal)
  3. Faulon, J.L., On Using the Graph-Equivalent Classes for the Structure Elucidation of Large Molecules, J. Chem. Inf. Comput. Sci., 1992, 32, 338-348. (.pdf)

Sampling molecules. Stochastic algorithms to sample chemical structures: random sampling, Monte Carlo sampling, and genetic algorithms have been developped. These algorithms have been applied to infer samples of structures reprentative of petroleum feedstock, and to model polymeric materials. A comparative study of the performances of unlabeled sampling techniques versus labeled sampling techniques demonstrated that isomorph-free algorithms are not necessary to search structures matching specific properties. While sampling unlabeled molecules necessitates the use of isomorph-free algorithms, sampling labeled molecules does not require checking for isomorphism. Although both problems can theoretically be solved polynomially, much faster algorithms can be designed in the labeled case.
  1. Faulon, J. L., Stochastic Generator of Chemical Structure. (4) Building Polymeric Systems with Specified Properties, J. Comput. Chem., 2001, 22, 580-590. (link to journal)
  2. Faulon, J. L., Stochastic Generator of Chemical Structure. (2) Using Simulated Annealing to Search the Space of Constitutional Isomers, J. Chem. Inf. Comput. Sci., 1996, 36, 731-740. (link to journal). NOTE: This algorithm has been inplemented into CDK library maintained by C. Steinbeck.
  3. Faulon, J. L., Stochastic Generator of Chemical Structure. (1) Application to the Structure Elucidation of Large Molecules, J. Chem. Inf. Comput. Sci., 1994, 34, 1204-1218. (.pdf)

Practical uses of sampling molecules to elucidate natural products.

Reaction network generation. Reaction network generators are computational tools developed for chemical kinetics applications in synthesis design, combustion and petroleum refining, and cellular signaling networks. Network generators either construct networks from a given set of reactants and elementary reaction steps (RGB generators), or build networks from a set of reactants and a set of products (RGR generators). Most of the formal network generators are making use of the Ugi et al. methodology, which in turn is based on graph theory. The caveat of Ugi et al. methodology is the computational scaling: the generated networks grow exponentially with the problem size. We have developed the first network generator where generation and network reduction are performed in parallel. More precisely, we propose three algorithms that perform network reduction during the generation process. These algorithms achieve polynomial scaling while maintaining good results for thermal cracking of hydrocarbons. The algorithms are being implemented in BioNetGen, a cell signaling network generator (cf. also Faeder, et al. (2005) Rule-based modeling of biochemical networks. Complexity 10:22-41).

  1. Faulon, J. L. ; Sault, A. G. Stochastic Generator of Chemical Structure (3) Reaction Network, J. Chem. Inf. Comput. Sci, 41, 894 -908, 2001. [PMID: 11500106]. (link to journal)