Motivated by applications in synthesis design, combustion, and petrochemical refining, reaction network generation has been a prolific field of research for the past 25 years. As reviewed in Ugi et al. paper, reaction network generation is based on three types of techniques. Empirical techniques, which strategies are based on data from reaction libraries, semiformal methods based on heuristic algorithms, where reactions are derived from a few mechanistic elementary reactions, and formal techniques, based on graph theory. Ugi et al. argue that formal techniques are general enough to be applicable to any type of reacting system in organic or inorganic chemistry, furthermore formal techniques are the only methods capable of generating new reaction mechanisms, and therefore elucidating unresolved chemical processes.
While formal techniques are robust their computational limits their applicability to real reacting systems. Indeed as it has been shown by many authors, the number of reactions and intermediate species that are created generally scale exponentially with the number of atoms of the reactants. Two views of approaching this problem have been proposed. One is to wait until sufficient computational power becomes available. However, the exponential scaling of formal techniques make unprobable that we will ever reach enough computing power to solve the problem. The other view, which is considered in the present paper, is to reduce the reaction mechanism. The question raised when reducing mechanism is how to choose a reaction subset such that it describes correctly the dynamic behavior of the studied reacting system. Reduction strategies in the area of combustion modeling have been reviewed by Frenklach, these are quite general and applicable to other systems. According to Frenklach there are five types of reduction strategies: 1) global reduction, 2) response modeling, 3) chemical lumping, 4) statistical lumping, and 5) detailed reduction. Global modeling techniques transform a complete reaction scheme into a small number of global reaction steps. Global modeling techniques comprise ad-hoc methods such as empirical fitting, reduction by approximations, and lumping. All global techniques are specific to a particular problem and cannot be generalized. Response modeling techniques consist of mapping model responses and model variables through functional relationships. Generally, models responses are species concentrations and model variables are the initial boundary conditions of the reacting mixture and the model parameters, such as rate coefficients, and transport properties. The solution mapping method provides a general procedure to solve model response. In this method, model responses are expressed as simple algebraic functions (usually polynomials) in terms of model variables. These algebraic functions are obtained by using either computer experiments or experimental data. As with global techniques, response modeling solutions are problem specific since they require data to build algebraic functions. Chemical lumping and statistical lumping were both developed for polymerization-type reactions. Chemical lumping models are used when a polymer grows by reaction between the polymer and monomer species. The lumping strategy is guided by similarity in chemical structure or chemical reactivity of the reacting species. The main assumption of chemical lumping is that the reactions describing the polymer growth are essentially the same and the associated thermochemical and rate parameters exhibit only a weak dependence on the polymer size. Statistical lumping is used when polymer-polymer interactions are of concern, such as in soot formation, silica powder growth, and metal oxide growth. Such processes are described by the Smoluchowki equation. The integration of the Smoluchowki equation encounters severe prohibitive demands on computer time and memory and statistical approximation of the equation have been proposed. Both chemical and statistical lumping methods are specific to polymerization reactions. Finally, the detailed reduction mechanism technique consists of identifying and removing noncontributing reactions. An effective reduction strategy is to compare the individual reaction rates with the rate of a chosen reference reaction. The reference reaction being for instance the rate limiting step or the fastest reaction. The detailed reaction reduction approach is a general technique and is a priori applicable to any reacting system.
The goal of this paper is to propose an original technique for reaction network generation that is computationally tractable and general enough to be applicable to real processes, such as combustion, petroleum refining, and retrosynthesis. To achieve such a goal, formal generation techniques must be used since with the above chemical processes not all reactions are known and there is a need to create new reaction mechanisms. However, as mentioned earlier formal techniques scales exponentially with the problem size and therefore reduction methods need to be applied. As discussed earlier, the only reduction technique applicable to general systems is the detailed reduction method. The caveat of the method however, is that the entire network must be known prior reduction. Therefore, network generation and reduction cannot be processed in sequence if our goal is to derive a computationally tractable technique. In the following section, we outline a new method where network generation and reduction are performed in parallel. More precisely we give four algorithms: deterministic-network-generator (DNG), random-sampling-network-generator (RSNG), concentration-sampling-network-generator (CSNG) and MC-sampling-network-generator (MCNG). While the first algorithm is deterministic and similar to techniques previously reported, the three sampling algorithms are stochastic in nature and appears to be the first stochastic methods ever published for reaction network generation. In the complexity section, we theoretically prove that our sampling generation-reduction algorithms no longer scales exponentially with the problem size but scales polynomially. Our generation-reduction method requires the knowledge of the rates of the reactions output during the generation process, this issue is detailed in a section on rate calculation. Finally, the last section illustrates the generation-reduction technique in the context of hydrocarbons thermal cracking. Methodology
The proposed network generator is based on the Dugundji-Ugi theory. According to this theory compounds and reaction are represented by bond electron (be-) and reaction (r-) matrices. In a given be-matrix representing a compound the ith row (and column) is assigned the ith atom of the compound. The entry bij (bji, i_j) of the ith row and jth column of a be-matrix is the bond order of the covalent bond between atoms i and j. The diagonal entry bij is the number of free valence electron for atom i. The reader may noticed that the adjacency and connectivity matrices used in chemical information differ from the be-matrices in their diagonal entries. The redistribution of valence electrons by which the starting materials of chemical reactions are converted into their products are represented by r-matrices. The fundamental equation of the Dugundji-Ugi model for any reaction b1 + b2 + .. bn -> e1 + e2 + .. em is B + R = E, where B and E are the be-matrices of reactants and products, and R is the reaction matrix. The addition of matrices proceeds entry by entry, that is, bij + rij = eij. Since there are no formal negative bond orders or negative numbers of valence electrons, the negative entries of R must be matched by positive entries of B of at least equal values. Within the above restrictions, Dugundji and Ugi proved that be- and r- matrices forms a free additive abelian group. There are two types of formal reaction generators, RGB and RGR. RGB generators solve the fundamental equation for a given B, while RGR generator solve the equation for a given R.
The generator proposed in this paper is of type RGB. Consequently, the generator builds all the products species that can be generated from a set of reactants, a set of constraints, and a given chemical process. The current version of our generator is limited to monomolecular and bimolecular reactions but could easily be extended to more complex reactions. Following Dugundji-Ugi methodology reactants are represented by be-matrices. The constraints are entered by the user from the list given in Table I. The chemical process is given into the form of elementary steps, which are stored in a library of r-matrices. Elementary steps are the electronic transitions that atoms of the reacting system can undergo. Each elementary step is composed of two configurations containing all the atoms participating to a reaction before and after the reaction has taken place. Examples of elementary steps and associated r-matrices are given in Figure 1. Fontain and Reitsman have computed the complete set of elementary steps that carbon atoms can take in organic reactions. This set is composed of 324 steps, however the set can be greatly reduced depending on the studied process. For instance, it well known that all hydrocarbon thermal cracking reactions can be generated from the five elementary steps listed in Figure 1. The algorithms of our generator are given next and are tested for thermal cracking of hydrocarbons in the Result section. 1. Deterministic algorithm
The input of our reaction network generator is a list of reactant species, a list of constraints, and a list of elementary steps. The output is a network composed of all possible species and reactions that can be created from the input. The algorithm starts by computing all the possible species (Ls1) that can be derived by applying the elementary steps (Les) to the initial species (Lso) while respecting the constraints. To prohibit duplication of species, Ls1 is composed of species that are not already present in Ls0. When applying elementary steps, one has to make the distinction between monomolecular and bimolecular reactions. For each monomolecular elementary step, Ls1 is composed of all the possible products that can be generated by applying the elementary step in all possible ways to the species of Lso. For each bimolecular reaction, Ls1 is composed of the products derived by applying the elementary step to all possible pairs of species in Lso. The list of reactions Lr1 is updated each time an elementary step applies to a species in Lso. Each reaction is represented by a n-tuples composed of the elementary step (from Les), the reacting species (from Ls0), and the product species (from Ls1 or Ls0). Monomolecular and bimolecular reaction procedures are illustrated in Figure 2. Once the list Ls1 has been computed the algorithm proceeds and compute recursively Ls2, Ls3,.. Lsi. Note that in order to compute Lsi (i > 1) one has to consider monomolecular reactions only from the set Lsi-1 since all monomolecular reactions from the sets Ls i-2, Ls i-3,... have already been computed in the previous steps. For the same reason, the products of bimolecular reactions in Lsi are generated for every possible pairs of species having at least on element in Lsi-1. To avoid redundancies of species in the list Ls0, Ls1,.. Lsi, at any step i > 1 a new species is added to Lsi only if the species is not in Lsi-1 U Lsi-2 U..U Ls0. The process halts at any step i, where the corresponding lists of species Lsi and reaction Lri are empty. Since bimolecular reactions can potentially create products of infinite molecular weight, in order to keep a finite network size one has to set a limit for the maximum species size, let n be this limit. Note that with the exception of polymerization reaction it is reasonable to assume that all species in a given network will be limited in size. In turn, note that if the species have a limited size then the network generation algorithm is guaranty to converge. Indeed, the maximum number of species, N that can be formed is bounded by the total number of molecular structures having a number of atoms raging from 1 to the specified value of n. This latter number is equal to the sum of the numbers of isomers containing 1,2,..,n atoms. Finally, note that the number of reactions is also finite and is bounded by N2 the maximum number of edges in the network (i.e., each species reacts with all the others).
The network generator algorithm given in Scheme I uses specific data structures to describe species and reactions. A molecular species s is represented by a molecular graph G(s). This graph in turn is represented by a be-matrix. An elementary step (es) is represented by two molecular graphs, Gr(es) the electronic configurations of the surrounding atoms before the reaction has taken place, and Gp(es) the electronic configuration after the reaction has taken place. From these molecular graph one constructs a r-matrix which is the difference between the two configurations (i.e. r-matrix corresponding to Gp(es)-Gr(es)). Finally, a reaction is an n-tuple composed of a elementary step (es), a list of reactant species (Lr(r)) and a list of product species (Lp(r)). Additionally, the algorithm maintain several list of species and reactions. The list of reactants (Lso), the list of species (Lsi) and reactions (Lri) created at the current step i, and the list of species generated at the previous step (Lsi-1). The operator U* add a new species to any list Lsi if and only if the species is not already present in Lsi. Since species are represented by molecular graph the operator U* performs a series of graph isomorphism checks between the species to be inserted and the species already in the list. The molecular graph isomorphism routines used in this study have been published elsewhere. The computational scaling is O(n2), where n is the number of atoms of the species. The following scheme gives the deterministic network generator routines. The function species-constraints is rather trivial and not detailed in scheme I, the function verifies the constraints listed in Table I, the function returns FALSE if any of the constraint is not verified and return TRUE otherwise. Scheme I: deterministic-network-generator(Lso,Les,Ls,Lr) Input: Lso, list of initial species Les, list of elementary steps Output: Ls, list of all species in network Lr, list of all reactions in network Local variable: Lsi, list of species created at step i Lri, list of reactions created at step i Ls = Lso, Lr = ø, Lsi = ø, i = 0 do Lsi-1 = Lsi, Lsi = ø, Lri = ø, i = i + 1 generate-species-reactions(Lsi-1,Ls,Les,i,Lsi,Lri) Ls = Ls U* Lsi Lr = Lr U* Lri until (Lsi = ø and Lri = ø) generate-species-reactions(Ls1,Ls2,Les,i,Lsi,Lri) Input: Ls1,Ls2, list of species Les, list of elementary steps i, step number Output: Lsi, list of species created at step i Lri, list of reactions created at step i For all species s in Ls1 do For all reactions es in Les with order(er) = 1 do LGp = ø generate-product(G(s),Gr(er),Gp(er),LGp) For all graphs Gp in LGp do compute Lp the list of connected components of Gp Lsi = Lsi U* Lp Lri = Lri U* (es,s,Lp) done done done For all species s1 in Ls1 and s2 in Ls2 do For all es in Les with order(er) = 2 do LGp = ø generate-product(G(s1) U G(s2),Gr(er),Gp(er),LGp) For all graphs Gp in LGp do compute Lp the list of connected components of Gp Lsi = Lsi U* Lp Lri = Lri U* (es,s1,s2,Lp) done done done generate-product(G(s),Gr(er),Gp(er),LGp) Input: G(s), molecular graph of species s Gr(er), molecular graph of the reactants of er Gp(er), molecular graph of the products of er Output: LGp, list of graphs obtained after applying er to s For all Gs' in Gs such as Gs' = Gr(er) do Gp = Gs - Gs' + Grp(er) if (species-constraints(Gp) = TRUE)then LGp = LGp U* Gp fi done 2.1 Sampling algorithms
Although the size of the network generated by the deterministic algorithm is finite due the limited size of the species, as shown by Broadbelt et al. the number species in the network grows exponentially with the with n, the number of atoms of the largest species. In fact, Table II demonstrates that it is virtually impossible to generated thermal cracking networks for species larger than pentane. As argued in the introduction there are several technique to reduce the network size. We propose in this section three network sampling algorithms based on the detailed reduction idea. These three algorithms are: random-sampling, concentration-sampling, and Monte-Carlo sampling. 2.a) Random-sampling algorithm. The algorithm is identical to the deterministic network generator with the exception that at each step i the list Lsi of species created is reduced to a size equal to a pre-defined Ms number. The reduction is done at random, that is, species are removed at random from Lsi until the size of Lsi is below Ms. The algorithm is given in Scheme II. The subroutine Generate-species-reactions is given in scheme I. Scheme II: random-sampling-network-generator(Lso,Les,Ls,Lr) Input: Lso, list of initial species Les, list of elementary steps Output: Ls, list of all species in network Lr, list of all reactions in network Local variable: Lsi, list of species created at step i Lri, list of reactions created at step i Ls = Lso, Lr = ø, Lsi = ø, i = 0 do Lsi-1 = Lsi, Lsi = ø, Lri = ø, i = i + 1 generate-species-reactions(Lsi-1,Ls,Les,i,Lsi,Lri) reduce-mechanism-random(Ls,Lr,Lsi,Lri) Ls = Ls U* Lsi Lr = Lr U* Lri until Lsi = ø and Li = ø reduce-mechanism-random(Ls,Lr,Lsi,Lri) Input: Ls, list of species Les, list of elementary steps Output: Lsi, list of species created at step i Lri, list of reactions created at step i Constant: Ms, maximum number of species allowed in Lsi While (|Lsi| > Ms) do select s in Lsi at random Lsi = Lsi - s For all reaction r in Lri do if s belongs to r Lri = Lri - r done done 2.b) Concentration-sampling algorithm. With the concentration-sampling algorithm, the reduction of the network is not performed at random but based on the species concentrations. The main assumption of this method is that if a species created at any given step i has low concentration values over the reaction time, this species will generate products with concentrations at most equal to that low value. Therefore, removing low concentration species from the list Lsi should have a neglectable effect on the final product distribution. Note that this assumption is valid only if the species is consumed during the reactions and do not act as a catalyst. Indeed if a species is a catalyst even with low concentration, this species could have a relatively large impact of the final product distribution. However, as it is shown in the result section, for the thermal cracking reactions species do not act as catalysts and the assumption appears to be valid. Using the above assumption, the algorithm works as follows. At each step n the deterministic routine Generate-species-reaction compute is run to compute the list of new species. At this point the network is composed of all species in Lso U ... U Lsi and associated reactions. Although the network may no be completed since the reaction rates are known (cf. Next section), the time evolution of the species concentrations can be computed by solving the system of differential equations associated with the network. In the present algorithm, we use the Monte-Carlo Gillespie (MC-Gillespie) technique to solve the system. More precisely, the MC-Gillespie technique monitors the number of particles for each species versus time. The initial number of particles of the reactants are computed from their initial concentrations (given by the user), and the initial number Mp of particule in the system. In the present paper, we are using the MC-Gillespie technique at constant volume V, hence the initial number of particle is calculated from the particle density, and the reaction volume (both user?s input). The MC-Gillespie technique is an exact method for numerical integration of the time evolution of any spatially homogeneous mixture of molecular species that interact through a specified set of coupled chemical reaction channels. The technique is based on a fundamental equation giving the probability at time t that the next reaction in V will occur in the differential time interval [t+t, t+t+dt] and will be an rm reaction:
P(t,m) dt = P1(t) . P2(m/t) (1)
where
P1(t) = a exp (-at) dt (2)
and
P2(m/t) = am/a (3)
In eqs 2 and 3, amdt is the probability, to first order in dt, that a reaction rm will occur in V in the next time interval dt and a = Smam. The rate constant km is related to am in the different ways depending if the reaction is monomolecular or bimolecular. For monomolecular reactions am [s] km, where [s] is the concentration of the reactant species s. For bimolecular reaction involving two species s1 and s2, am [s1][s2] km/V and for bimolecular reaction involving only one reactant species am = [s][s-1] km/V. In order to integrate eq. 1, Gillespie proposes the following scheme. First generate a random value t according to P1(t) and second generate a random integer m according to P2(m/t). In our implementation of the MC-Gillespie technique, the random value t is computed by simply drawing a random number r1 from an uniform distribution in the unit interval and taking: t = (1/a)ln(1/r1) (4)
In turn, the random integer m is generated by drawing another random number r2 in the unit interval and by taking m the integer that verifying: Sn=1m-1 an < r2 a _ Sn=1m an (5)
In his original paper, Gillespie has proven that expressions (4) and (5) are indeed correct to simulate stochastically the time evolution of homogenous reactions.
During the MC-Gillespie integration, the algorithm retains for each species present in the network the maximum number of particles reached over the time period the system is integrated. Species created at step i are then sorted by decreasing maximum number of particles, the first Ms elements of the sorted list are keep in Lsi while the other are eliminated. The algorithm is given in the following scheme, the routine generate-species-reaction is given in scheme I. The routine assign-initial-number-particle is not detailed here, but returns the initial number of particles for each reactants. This number is simply equal to the product of the maximum number of particles (Mp) by the initial concentration of the reactant. Both numbers are user?s input. Scheme III: concentration-sampling-network-generator(Lso,Les,Ls,Lr) Input: Lso, list of initial species Les, list of elementary steps Output: Ls, list of all species in network Lr, list of all reactions in network Constant: Mc, maximum number of Monte-Carlo steps Mp, maximum number of particles Local variable: Lsi, list of species created at step i Lri, list of reactions created at step i Ls = Lso, Lr = ø, Lsi = ø, n = 0 do Lsi-1 = Lsi, Lsi = ø, Lri = ø, i = i + 1 generate-species-reactions(Lsi-1,Ls,Les,n,Lsi,Lri) reduce-mechanism-concentration(Ls,Lr,Lsi,Lri) Ls = Ls U* Lsi Lr = Lr U* Lri until Lsi = ø and Lri = ø reduce-mechanism-concentration(Ls,Lr,Lsi,Lri) Input: Ls, list of species Les, list of elementary steps Output: Lsi, list of species created at step i Lri, list of reactions created at step i Constant: Mc, maximum number of Monte-Carlo steps Local variable: L[s], list of species concentration L[smax], list of species maximum concentration t, time nc, number of Monte-Carlo steps For all species s in Ls such as step(s) = 0 do if step(s) = 0 then [s] = assign-initial-number-particles(s,Mp) [smax] = [s] else [s] = 0, [smax] = 0 fi L[s] = L[s] U [s] L[smax] = L[smax] U [smax] done t 0, nc 0 While nc < Mc do t = MC-step(Ls,L[s],Lr,t) For all species s in Ls do if [s] > [smax] then [smax] = [s] fi done nc = nc + 1 done While (|Lsi| > Ms) do find s in Lsi having to the lowest [smax] value Lsi = Lsi - s For all reactions r in Lri do if s belongs to r then Lri = Lri - r fi done done MC-Gillespie-step(Ls,L[s],Lr,t) Input: Ls, list of molecular species L[s], list of species concentration Lr, list of reactions t, time Output: updated L[s] t, time after event occurs compute time dt of next event using eq. 4 select reaction r in Lr occurring at time t+dt using eq. 5 t = t + dt For all reactant species s in Lr(r) do [s] = [s] - 1 done For all product species s in Lp(r) do [s] = [s] + 1 done return t 2.c) MC-sampling- algorithm. The idea of the MC-sampling algorithm is to perform at the same time both the Monte-Carlo integration and the network generation. The advantage of this technique is that there are no assumptions regarding catalyst species. As in the previous case, one starts with an initial reactant concentration given in the form of numbers of particles. The initial total number of particles (Mp) is calculated from the particle density and reaction volume. As it will be demonstrated in the last section of the paper, this number must be large to provide accurate results. At each step, the set of new species is computed using the generate-species-reactions routine, but in the present case these species are generated by applying the elementary steps only for species having non-zero concentration (i.e., set Ls' in Scheme IV). The concentrations of the new species are set to zero and updated using the MC-Gillespie integration step. The process is iterated until the number of step exceed the pre-defined Mc value. In the following scheme, the routine generate-species-reactions is given in scheme I, and the routine MC-Gillespie-step can be found in scheme III.

Scheme IV: MC-sampling-network-generator(Lso,Les,Ls,Lr) Input: Lso, list of initial species Les, list of elementary steps Output: Ls, list of all species in network Lr, list of all reactions in network Constant: Mc, maximum number of Monte-Carlo steps Mp, maximum number of particles Local variable: L[s], list of species concentration Ls', list of species with non-zero concentration t, time Ls = Lso, L[s] = ø, Lr = ø For all species s in Ls do [s] = assign-initial-number-particles(s,Mp) L[s] = L[s] U [s] done t 0, i 0 While i < Mc do Lsi = ø, Lri = ø, Ls' = ø, i = i + 1 For all s in Ls do if [s] = 0 then Ls' = Ls'- s fi done generate-species-reactions(Ls',Ls',Les,i,Lsi,Lri) Ls = Ls U* Lsi Lr = Lr U* Lri For all s in Lsi do [s] = 0 done t = MC-step(Ls,L[s],Lr,t) done Computational complexity
In this section we derive the upper bound time-complexity of the above algorithms, and show that the three sampling algorithms can be run in polynomial time. Let n be an upper bound for the number of atoms for each species, let N be an upper bound for the maximum number of species generated by the algorithm (i.e., N _ |Ls|), let Ni, be the number of species in the list Lsi (i.e., Ni = |Lsi|), and let N0 be the number of reactant species (i.e., N0 = |Ls0|). The maximum number of atoms for each elementary step or r-matrices is r, and the number of elementary reaction steps is R. Note that for hydrocarbon cracking reactions r _ 3 and R = 5 (cf. Figure 1). As a general rule r and R are always bounded and will be taken as constant in our complexity calculations.
We first evaluate the time complexity of the procedure generate-product. The procedure performs a subgraph isomorphism between a given species and an elementary step. Since each elementary step contains at most r atoms, there are r! ways of mapping Gr(es) onto each atom of Gs. Gs comprises at most n atoms, therefore, the number of subgraph isomorphism performed is nr!. Let us assume that all tries pass the subgraph isomorphism and the constraint tests. In such an instance nr! graphs are added to the list LGp. Each graph added to LGp is tested for isomorphism with a complexity of O(n2). The constraints listed in Table I can be checked in O(n) steps, consequently the time complexity of the generate-product routine is nr! (n + n2) = O(n3) since r is bounded by a constant. The routine generate-species-reaction distinguishes monomolecular reactions from bimolecular reactions. For monomolecular reactions all species in Ls1 = Lsi-1 and all elementary steps Les are sent to generate-product. Generate-product may return nr! graphs, these graphs is turn may contain at most n connected components (i.e., |Lp| = n). Therefore, there is a total of |Lsi-1| |Les| nr! |Lsi| |Lp| = Ni-1R Ni n2r! isomorphisms performed between the graphs of Lp and Lsi. Isomorphism between species can be achieved in O(n2), isomorphism between reactions can be performed in a constant time (i.e., O(1)) since reactions are represented by n-tuple. The overall complexity for monomolecular reaction is therefore O(Ni-1 Ni R n4) = O(Ni-1 Ni n4). For bimolecular reactions the same operations are performed for all pairs of species between Lsi-1 and Ls. The time-complexity is O(N Ni-1 n4), which is the dominant term for generate-species-reactions since N _ Ni. Note that the number of species returned by generate-species-reactions is | Lsi-1| |Les| nr! |Lp| = Ni-1 R n2r for monomolecular reactions and | Lsi-1| |Ls| |Les| nr! |Lp| = Ni-1 N R n2r for bimolecular reactions. In both case mono- and bi-molecular reactions we have |Lri| = |Lsi|. Let us now evaluate the number of iterations in the deterministic-network-generator routine. Let k be the maximum valence allowed, usually in organic chemistry k _ 4. It will take at most kn steps to remove all the bonds in Lso and it will take at most kn steps to create the largest structure comprising n atoms. Therefore all accessible isomers comprising 1 to n atoms can be created in 2kn steps, and the maximum number of iterations of the main routine is bounded by 2kn = O(n). The deterministic-network-generator routine performs O(n) calls to generate-species-reactions, O(n) isomorphism checks between Lsi and Ls, and O(n) insertions of Lri in Lr. Therefore, the overall time complexity is O(n N Ni-1n4 + N Ni-1 n2 + N Ni-1) = O(n5 N Ni-1) _ O(n5 N2). This time complexity is not polynomial since N is bounded by the number of isomers comprising 1,..n atoms, which increases exponentially with n.
Let us now consider the sampling routines. The procedure reduce-mechanism-random performs Ms random selections in Lsi. Since none of the lists Lso U ... U Lsi-1 have a size greater than Ms, we have |Ls| _ i Ms, and |Lsi | = |Lsi-1| |Les| nr! |Lp| + |Lsi-1| |Ls| |Les| nr! |Lp| _ MsRn2r + Msi MsRn2r _ i Ms2Rn2r! + MsRn2r!. Since i is bounded by 2kn and k, Ms, R, and r are constants |Lsi| = O(n3). Consequently, Ms random selections in Lsi will be performed with a time complexity O(Msn3) = O(n3). Let us recall that the complexity of generate-reactions-species is O(|Ls| |Lsi-1| n3) which in the present case reduces to O(n Ms2n3) = O(n4). The overall complexity of the random-sampling-network generator algorithm is therefore 2n (n4 + |Ls| |Lsi| n2 +(Lr| |Lri|)) _ 2n (n4 + n Ms Ms (n2+1)) = O(n5).
The concentration-sampling-network-generator algorithm is similar to random-sampling-network-generator with the exception of the reduce-mechanism routine. There are three loop in this routine. Clearly the complexity of the first loop is |Ls|, the second loop calls Mc times the Monte-Carlo-step routine and search the maximum concentration over time of all species in Ls. The Monte-Carlo-step routine has a complexity of |Lr|, hence the overall complexity of reduce-mechanism-concentration is Mc (|Ls| + |Lsi| |Lr| + |Ls|| Lri |) = Mc (n Ms + 2n Msn3) = Mc O(n4), and the complexity of the concentration-sampling-network-generator algorithm is Mc O(n5).
The main difference between MC-sampling-network-generator and the other algorithms is that the number of particles remains the same, Mp. Hence, the number of species with non-zero concentration is at most Mp (i.e., |Ls?| _ Mp). The routine generate-species-reactions is called for the list Ls?, consequently, its complexity is O(|Ls?| |Ls?| n3) _ Mp2n3 = O(n3). Note also that |Lsi| the number of species returned as well as |Lri| the number of reaction are bounded by |Ls?| |Les| nr! |Lp| _ Mp2 R n2 r! _ O(n2). The number of species at the first iteration of MC-sampling-network-generator (i.e., the number of reactants) is at most Mp. The number of species at the second iteration is |Ls0 U* Ls1| _ Mp + O(n2) = O(n2). It can easily be verified that after Mc iterations the number of species |Ls| will remain bounded by Mp + Mc O(n2) = Mc O(n2). Since | Lri| = |Lsi|, we have |Lr| = Mc O(n2). Hence, the complexity of the isomorphism checks in MC-sampling-network-generator is |Ls| |Lsi| n2 = Mc O(n6), while the complexity of MC-step is |Lr| = Mc O(n2). Since the number of iteration is equal to Mc, the overall complexity of MC-sampling-network-generator is Mc2 O(n6) + Mc2 O(n2) = Mc2 O(n6).

Motivated by applications in synthesis design, combustion, and petrochemical refining, reaction network generation has been a prolific field of research for the past 25 years. As reviewed in Ugi et al. paper, reaction network generation is based on three types of techniques. Empirical techniques, which strategies are based on data from reaction libraries, semiformal methods based on heuristic algorithms, where reactions are derived from a few mechanistic elementary reactions, and formal techniques, based on graph theory. Ugi et al. argue that formal techniques are general enough to be applicable to any type of reacting system in organic or inorganic chemistry, furthermore formal techniques are the only methods capable of generating new reaction mechanisms, and therefore elucidating unresolved chemical processes.

While formal techniques are robust their computational limits their applicability to real reacting systems. Indeed as it has been shown by many authors, the number of reactions and intermediate species that are created generally scale exponentially with the number of atoms of the reactants. Two views of approaching this problem have been proposed. One is to wait until sufficient computational power becomes available. However, the exponential scaling of formal techniques make unprobable that we will ever reach enough computing power to solve the problem. The other view, which is considered in the present paper, is to reduce the reaction mechanism. The question raised when reducing mechanism is how to choose a reaction subset such that it describes correctly the dynamic behavior of the studied reacting system. Reduction strategies in the area of combustion modeling have been reviewed by Frenklach, these are quite general and applicable to other systems. According to Frenklach there are five types of reduction strategies: 1) global reduction, 2) response modeling, 3) chemical lumping, 4) statistical lumping, and 5) detailed reduction. Global modeling techniques transform a complete reaction scheme into a small number of global reaction steps. Global modeling techniques comprise ad-hoc methods such as empirical fitting, reduction by approximations, and lumping. All global techniques are specific to a particular problem and cannot be generalized. Response modeling techniques consist of mapping model responses and model variables through functional relationships. Generally, models responses are species concentrations and model variables are the initial boundary conditions of the reacting mixture and the model parameters, such as rate coefficients, and transport properties. The solution mapping method provides a general procedure to solve model response. In this method, model responses are expressed as simple algebraic functions (usually polynomials) in terms of model variables. These algebraic functions are obtained by using either computer experiments or experimental data. As with global techniques, response modeling solutions are problem specific since they require data to build algebraic functions. Chemical lumping and statistical lumping were both developed for polymerization-type reactions. Chemical lumping models are used when a polymer grows by reaction between the polymer and monomer species. The lumping strategy is guided by similarity in chemical structure or chemical reactivity of the reacting species. The main assumption of chemical lumping is that the reactions describing the polymer growth are essentially the same and the associated thermochemical and rate parameters exhibit only a weak dependence on the polymer size. Statistical lumping is used when polymer-polymer interactions are of concern, such as in soot formation, silica powder growth, and metal oxide growth. Such processes are described by the Smoluchowki equation. The integration of the Smoluchowki equation encounters severe prohibitive demands on computer time and memory and statistical approximation of the equation have been proposed. Both chemical and statistical lumping methods are specific to polymerization reactions. Finally, the detailed reduction mechanism technique consists of identifying and removing noncontributing reactions. An effective reduction strategy is to compare the individual reaction rates with the rate of a chosen reference reaction. The reference reaction being for instance the rate limiting step or the fastest reaction. The detailed reaction reduction approach is a general technique and is a priori applicable to any reacting system.

The goal of this paper is to propose an original technique for reaction network generation that is computationally tractable and general enough to be applicable to real processes, such as combustion, petroleum refining, and retrosynthesis. To achieve such a goal, formal generation techniques must be used since with the above chemical processes not all reactions are known and there is a need to create new reaction mechanisms. However, as mentioned earlier formal techniques scales exponentially with the problem size and therefore reduction methods need to be applied. As discussed earlier, the only reduction technique applicable to general systems is the detailed reduction method. The caveat of the method however, is that the entire network must be known prior reduction. Therefore, network generation and reduction cannot be processed in sequence if our goal is to derive a computationally tractable technique. In the following section, we outline a new method where network generation and reduction are performed in parallel. More precisely we give four algorithms: deterministic-network-generator (DNG), random-sampling-network-generator (RSNG), concentration-sampling-network-generator (CSNG) and MC-sampling-network-generator (MCNG). While the first algorithm is deterministic and similar to techniques previously reported, the three sampling algorithms are stochastic in nature and appears to be the first stochastic methods ever published for reaction network generation. In the complexity section, we theoretically prove that our sampling generation-reduction algorithms no longer scales exponentially with the problem size but scales polynomially. Our generation-reduction method requires the knowledge of the rates of the reactions output during the generation process, this issue is detailed in a section on rate calculation. Finally, the last section illustrates the generation-reduction technique in the context of hydrocarbons thermal cracking.

MethodologyThe proposed network generator is based on the Dugundji-Ugi theory. According to this theory compounds and reaction are represented by bond electron (be-) and reaction (r-) matrices. In a given be-matrix representing a compound the ith row (and column) is assigned the ith atom of the compound. The entry bij (bji, i_j) of the ith row and jth column of a be-matrix is the bond order of the covalent bond between atoms i and j. The diagonal entry bij is the number of free valence electron for atom i. The reader may noticed that the adjacency and connectivity matrices used in chemical information differ from the be-matrices in their diagonal entries. The redistribution of valence electrons by which the starting materials of chemical reactions are converted into their products are represented by r-matrices. The fundamental equation of the Dugundji-Ugi model for any reaction b1 + b2 + .. bn -> e1 + e2 + .. em is B + R = E, where B and E are the be-matrices of reactants and products, and R is the reaction matrix. The addition of matrices proceeds entry by entry, that is, bij + rij = eij. Since there are no formal negative bond orders or negative numbers of valence electrons, the negative entries of R must be matched by positive entries of B of at least equal values. Within the above restrictions, Dugundji and Ugi proved that be- and r- matrices forms a free additive abelian group. There are two types of formal reaction generators, RGB and RGR. RGB generators solve the fundamental equation for a given B, while RGR generator solve the equation for a given R.

The generator proposed in this paper is of type RGB. Consequently, the generator builds all the products species that can be generated from a set of reactants, a set of constraints, and a given chemical process. The current version of our generator is limited to monomolecular and bimolecular reactions but could easily be extended to more complex reactions. Following Dugundji-Ugi methodology reactants are represented by be-matrices. The constraints are entered by the user from the list given in Table I. The chemical process is given into the form of elementary steps, which are stored in a library of r-matrices. Elementary steps are the electronic transitions that atoms of the reacting system can undergo. Each elementary step is composed of two configurations containing all the atoms participating to a reaction before and after the reaction has taken place. Examples of elementary steps and associated r-matrices are given in Figure 1. Fontain and Reitsman have computed the complete set of elementary steps that carbon atoms can take in organic reactions. This set is composed of 324 steps, however the set can be greatly reduced depending on the studied process. For instance, it well known that all hydrocarbon thermal cracking reactions can be generated from the five elementary steps listed in Figure 1. The algorithms of our generator are given next and are tested for thermal cracking of hydrocarbons in the Result section.

1. Deterministic algorithmThe input of our reaction network generator is a list of reactant species, a list of constraints, and a list of elementary steps. The output is a network composed of all possible species and reactions that can be created from the input. The algorithm starts by computing all the possible species (Ls1) that can be derived by applying the elementary steps (Les) to the initial species (Lso) while respecting the constraints. To prohibit duplication of species, Ls1 is composed of species that are not already present in Ls0. When applying elementary steps, one has to make the distinction between monomolecular and bimolecular reactions. For each monomolecular elementary step, Ls1 is composed of all the possible products that can be generated by applying the elementary step in all possible ways to the species of Lso. For each bimolecular reaction, Ls1 is composed of the products derived by applying the elementary step to all possible pairs of species in Lso. The list of reactions Lr1 is updated each time an elementary step applies to a species in Lso. Each reaction is represented by a n-tuples composed of the elementary step (from Les), the reacting species (from Ls0), and the product species (from Ls1 or Ls0). Monomolecular and bimolecular reaction procedures are illustrated in Figure 2. Once the list Ls1 has been computed the algorithm proceeds and compute recursively Ls2, Ls3,.. Lsi. Note that in order to compute Lsi (i > 1) one has to consider monomolecular reactions only from the set Lsi-1 since all monomolecular reactions from the sets Ls i-2, Ls i-3,... have already been computed in the previous steps. For the same reason, the products of bimolecular reactions in Lsi are generated for every possible pairs of species having at least on element in Lsi-1. To avoid redundancies of species in the list Ls0, Ls1,.. Lsi, at any step i > 1 a new species is added to Lsi only if the species is not in Lsi-1 U Lsi-2 U..U Ls0. The process halts at any step i, where the corresponding lists of species Lsi and reaction Lri are empty. Since bimolecular reactions can potentially create products of infinite molecular weight, in order to keep a finite network size one has to set a limit for the maximum species size, let n be this limit. Note that with the exception of polymerization reaction it is reasonable to assume that all species in a given network will be limited in size. In turn, note that if the species have a limited size then the network generation algorithm is guaranty to converge. Indeed, the maximum number of species, N that can be formed is bounded by the total number of molecular structures having a number of atoms raging from 1 to the specified value of n. This latter number is equal to the sum of the numbers of isomers containing 1,2,..,n atoms. Finally, note that the number of reactions is also finite and is bounded by N2 the maximum number of edges in the network (i.e., each species reacts with all the others).

The network generator algorithm given in Scheme I uses specific data structures to describe species and reactions. A molecular species s is represented by a molecular graph G(s). This graph in turn is represented by a be-matrix. An elementary step (es) is represented by two molecular graphs, Gr(es) the electronic configurations of the surrounding atoms before the reaction has taken place, and Gp(es) the electronic configuration after the reaction has taken place. From these molecular graph one constructs a r-matrix which is the difference between the two configurations (i.e. r-matrix corresponding to Gp(es)-Gr(es)). Finally, a reaction is an n-tuple composed of a elementary step (es), a list of reactant species (Lr(r)) and a list of product species (Lp(r)). Additionally, the algorithm maintain several list of species and reactions. The list of reactants (Lso), the list of species (Lsi) and reactions (Lri) created at the current step i, and the list of species generated at the previous step (Lsi-1). The operator U* add a new species to any list Lsi if and only if the species is not already present in Lsi. Since species are represented by molecular graph the operator U* performs a series of graph isomorphism checks between the species to be inserted and the species already in the list. The molecular graph isomorphism routines used in this study have been published elsewhere. The computational scaling is O(n2), where n is the number of atoms of the species. The following scheme gives the deterministic network generator routines. The function species-constraints is rather trivial and not detailed in scheme I, the function verifies the constraints listed in Table I, the function returns FALSE if any of the constraint is not verified and return TRUE otherwise.

Scheme I:

deterministic-network-generator(Lso,Les,Ls,Lr)Input: Lso, list of initial species

Les, list of elementary steps

Output: Ls, list of all species in network

Lr, list of all reactions in network

Local variable: Lsi, list of species created at step i

Lri, list of reactions created at step i

Ls = Lso, Lr = ø, Lsi = ø, i = 0

do

Lsi-1 = Lsi, Lsi = ø, Lri = ø, i = i + 1

generate-species-reactions(Lsi-1,Ls,Les,i,Lsi,Lri)

Ls = Ls U* Lsi

Lr = Lr U* Lri

until (Lsi = ø and Lri = ø)

generate-species-reactions(Ls1,Ls2,Les,i,Lsi,Lri)Input: Ls1,Ls2, list of species

Les, list of elementary steps

i, step number

Output: Lsi, list of species created at step i

Lri, list of reactions created at step i

For all species s in Ls1 do

For all reactions es in Les with order(er) = 1 do

LGp = ø

generate-product(G(s),Gr(er),Gp(er),LGp)

For all graphs Gp in LGp do

compute Lp the list of connected components of Gp

Lsi = Lsi U* Lp

Lri = Lri U* (es,s,Lp)

done

done

done

For all species s1 in Ls1 and s2 in Ls2 do

For all es in Les with order(er) = 2 do

LGp = ø

generate-product(G(s1) U G(s2),Gr(er),Gp(er),LGp)

For all graphs Gp in LGp do

compute Lp the list of connected components of Gp

Lsi = Lsi U* Lp

Lri = Lri U* (es,s1,s2,Lp)

done

done

done

generate-product(G(s),Gr(er),Gp(er),LGp)Input: G(s), molecular graph of species s

Gr(er), molecular graph of the reactants of er

Gp(er), molecular graph of the products of er

Output: LGp, list of graphs obtained after applying er to s

For all Gs' in Gs such as Gs' = Gr(er) do

Gp = Gs - Gs' + Grp(er)

if (species-constraints(Gp) = TRUE)then LGp = LGp U* Gp fi

done

2.1 Sampling algorithmsAlthough the size of the network generated by the deterministic algorithm is finite due the limited size of the species, as shown by Broadbelt et al. the number species in the network grows exponentially with the with n, the number of atoms of the largest species. In fact, Table II demonstrates that it is virtually impossible to generated thermal cracking networks for species larger than pentane. As argued in the introduction there are several technique to reduce the network size. We propose in this section three network sampling algorithms based on the detailed reduction idea. These three algorithms are: random-sampling, concentration-sampling, and Monte-Carlo sampling.

. The algorithm is identical to the deterministic network generator with the exception that at each step i the list Lsi of species created is reduced to a size equal to a pre-defined Ms number. The reduction is done at random, that is, species are removed at random from Lsi until the size of Lsi is below Ms. The algorithm is given in Scheme II. The subroutine Generate-species-reactions is given in scheme I.2.a) Random-sampling algorithmScheme II:

random-sampling-network-generator(Lso,Les,Ls,Lr)Input: Lso, list of initial species

Les, list of elementary steps

Output: Ls, list of all species in network

Lr, list of all reactions in network

Local variable: Lsi, list of species created at step i

Lri, list of reactions created at step i

Ls = Lso, Lr = ø, Lsi = ø, i = 0

do

Lsi-1 = Lsi, Lsi = ø, Lri = ø, i = i + 1

generate-species-reactions(Lsi-1,Ls,Les,i,Lsi,Lri)

reduce-mechanism-random(Ls,Lr,Lsi,Lri)

Ls = Ls U* Lsi

Lr = Lr U* Lri

until Lsi = ø and Li = ø

reduce-mechanism-random(Ls,Lr,Lsi,Lri)Input: Ls, list of species

Les, list of elementary steps

Output: Lsi, list of species created at step i

Lri, list of reactions created at step i

Constant: Ms, maximum number of species allowed in Lsi

While (|Lsi| > Ms) do

select s in Lsi at random

Lsi = Lsi - s

For all reaction r in Lri do

if s belongs to r Lri = Lri - r

done

done

. With the concentration-sampling algorithm, the reduction of the network is not performed at random but based on the species concentrations. The main assumption of this method is that if a species created at any given step i has low concentration values over the reaction time, this species will generate products with concentrations at most equal to that low value. Therefore, removing low concentration species from the list Lsi should have a neglectable effect on the final product distribution. Note that this assumption is valid only if the species is consumed during the reactions and do not act as a catalyst. Indeed if a species is a catalyst even with low concentration, this species could have a relatively large impact of the final product distribution. However, as it is shown in the result section, for the thermal cracking reactions species do not act as catalysts and the assumption appears to be valid. Using the above assumption, the algorithm works as follows. At each step n the deterministic routine Generate-species-reaction compute is run to compute the list of new species. At this point the network is composed of all species in Lso U ... U Lsi and associated reactions. Although the network may no be completed since the reaction rates are known (cf. Next section), the time evolution of the species concentrations can be computed by solving the system of differential equations associated with the network. In the present algorithm, we use the Monte-Carlo Gillespie (MC-Gillespie) technique to solve the system. More precisely, the MC-Gillespie technique monitors the number of particles for each species versus time. The initial number of particles of the reactants are computed from their initial concentrations (given by the user), and the initial number Mp of particule in the system. In the present paper, we are using the MC-Gillespie technique at constant volume V, hence the initial number of particle is calculated from the particle density, and the reaction volume (both user?s input). The MC-Gillespie technique is an exact method for numerical integration of the time evolution of any spatially homogeneous mixture of molecular species that interact through a specified set of coupled chemical reaction channels. The technique is based on a fundamental equation giving the probability at time t that the next reaction in V will occur in the differential time interval [t+t, t+t+dt] and will be an rm reaction:2.b) Concentration-sampling algorithmP(t,m) dt = P1(t) . P2(m/t) (1)

where

P1(t) = a exp (-at) dt (2)

and

P2(m/t) = am/a (3)

In eqs 2 and 3, amdt is the probability, to first order in dt, that a reaction rm will occur in V in the next time interval dt and a = Smam. The rate constant km is related to am in the different ways depending if the reaction is monomolecular or bimolecular. For monomolecular reactions am [s] km, where [s] is the concentration of the reactant species s. For bimolecular reaction involving two species s1 and s2, am [s1][s2] km/V and for bimolecular reaction involving only one reactant species am = [s][s-1] km/V. In order to integrate eq. 1, Gillespie proposes the following scheme. First generate a random value t according to P1(t) and second generate a random integer m according to P2(m/t). In our implementation of the MC-Gillespie technique, the random value t is computed by simply drawing a random number r1 from an uniform distribution in the unit interval and taking:

t = (1/a)ln(1/r1) (4)

In turn, the random integer m is generated by drawing another random number r2 in the unit interval and by taking m the integer that verifying:

Sn=1m-1 an < r2 a _ Sn=1m an (5)

In his original paper, Gillespie has proven that expressions (4) and (5) are indeed correct to simulate stochastically the time evolution of homogenous reactions.

During the MC-Gillespie integration, the algorithm retains for each species present in the network the maximum number of particles reached over the time period the system is integrated. Species created at step i are then sorted by decreasing maximum number of particles, the first Ms elements of the sorted list are keep in Lsi while the other are eliminated. The algorithm is given in the following scheme, the routine generate-species-reaction is given in scheme I. The routine assign-initial-number-particle is not detailed here, but returns the initial number of particles for each reactants. This number is simply equal to the product of the maximum number of particles (Mp) by the initial concentration of the reactant. Both numbers are user?s input.

Scheme III:

concentration-sampling-network-generator(Lso,Les,Ls,Lr)Input: Lso, list of initial species

Les, list of elementary steps

Output: Ls, list of all species in network

Lr, list of all reactions in network

Constant: Mc, maximum number of Monte-Carlo steps

Mp, maximum number of particles

Local variable: Lsi, list of species created at step i

Lri, list of reactions created at step i

Ls = Lso, Lr = ø, Lsi = ø, n = 0

do

Lsi-1 = Lsi, Lsi = ø, Lri = ø, i = i + 1

generate-species-reactions(Lsi-1,Ls,Les,n,Lsi,Lri)

reduce-mechanism-concentration(Ls,Lr,Lsi,Lri)

Ls = Ls U* Lsi

Lr = Lr U* Lri

until Lsi = ø and Lri = ø

reduce-mechanism-concentration(Ls,Lr,Lsi,Lri)Input: Ls, list of species

Les, list of elementary steps

Output: Lsi, list of species created at step i

Lri, list of reactions created at step i

Constant: Mc, maximum number of Monte-Carlo steps

Local variable: L[s], list of species concentration

L[smax], list of species maximum

concentration

t, time

nc, number of Monte-Carlo steps

For all species s in Ls such as step(s) = 0 do

if step(s) = 0 then

[s] = assign-initial-number-particles(s,Mp)

[smax] = [s]

else [s] = 0, [smax] = 0

fi

L[s] = L[s] U [s]

L[smax] = L[smax] U [smax]

done

t 0, nc 0

While nc < Mc do

t = MC-step(Ls,L[s],Lr,t)

For all species s in Ls do

if [s] > [smax] then [smax] = [s] fi

done

nc = nc + 1

done

While (|Lsi| > Ms) do

find s in Lsi having to the lowest [smax] value

Lsi = Lsi - s

For all reactions r in Lri do

if s belongs to r then Lri = Lri - r fi

done

done

MC-Gillespie-step(Ls,L[s],Lr,t)Input: Ls, list of molecular species

L[s], list of species concentration

Lr, list of reactions

t, time

Output: updated L[s]

t, time after event occurs

compute time dt of next event using eq. 4

select reaction r in Lr occurring at time t+dt using eq. 5

t = t + dt

For all reactant species s in Lr(r) do [s] = [s] - 1 done

For all product species s in Lp(r) do [s] = [s] + 1 done

return t

. The idea of the MC-sampling algorithm is to perform at the same time both the Monte-Carlo integration and the network generation. The advantage of this technique is that there are no assumptions regarding catalyst species. As in the previous case, one starts with an initial reactant concentration given in the form of numbers of particles. The initial total number of particles (Mp) is calculated from the particle density and reaction volume. As it will be demonstrated in the last section of the paper, this number must be large to provide accurate results. At each step, the set of new species is computed using the generate-species-reactions routine, but in the present case these species are generated by applying the elementary steps only for species having non-zero concentration (i.e., set Ls' in Scheme IV). The concentrations of the new species are set to zero and updated using the MC-Gillespie integration step. The process is iterated until the number of step exceed the pre-defined Mc value. In the following scheme, the routine generate-species-reactions is given in scheme I, and the routine MC-Gillespie-step can be found in scheme III.2.c) MC-sampling- algorithmScheme IV:

MC-sampling-network-generator(Lso,Les,Ls,Lr)Input: Lso, list of initial species

Les, list of elementary steps

Output: Ls, list of all species in network

Lr, list of all reactions in network

Constant: Mc, maximum number of Monte-Carlo steps

Mp, maximum number of particles

Local variable: L[s], list of species concentration

Ls', list of species with non-zero

concentration

t, time

Ls = Lso, L[s] = ø, Lr = ø

For all species s in Ls do

[s] = assign-initial-number-particles(s,Mp)

L[s] = L[s] U [s]

done

t 0, i 0

While i < Mc do

Lsi = ø, Lri = ø, Ls' = ø, i = i + 1

For all s in Ls do

if [s] = 0 then Ls' = Ls'- s fi

done

generate-species-reactions(Ls',Ls',Les,i,Lsi,Lri)

Ls = Ls U* Lsi

Lr = Lr U* Lri

For all s in Lsi do [s] = 0 done

t = MC-step(Ls,L[s],Lr,t)

done

Computational complexityIn this section we derive the upper bound time-complexity of the above algorithms, and show that the three sampling algorithms can be run in polynomial time. Let n be an upper bound for the number of atoms for each species, let N be an upper bound for the maximum number of species generated by the algorithm (i.e., N _ |Ls|), let Ni, be the number of species in the list Lsi (i.e., Ni = |Lsi|), and let N0 be the number of reactant species (i.e., N0 = |Ls0|). The maximum number of atoms for each elementary step or r-matrices is r, and the number of elementary reaction steps is R. Note that for hydrocarbon cracking reactions r _ 3 and R = 5 (cf. Figure 1). As a general rule r and R are always bounded and will be taken as constant in our complexity calculations.

We first evaluate the time complexity of the procedure generate-product. The procedure performs a subgraph isomorphism between a given species and an elementary step. Since each elementary step contains at most r atoms, there are r! ways of mapping Gr(es) onto each atom of Gs. Gs comprises at most n atoms, therefore, the number of subgraph isomorphism performed is nr!. Let us assume that all tries pass the subgraph isomorphism and the constraint tests. In such an instance nr! graphs are added to the list LGp. Each graph added to LGp is tested for isomorphism with a complexity of O(n2). The constraints listed in Table I can be checked in O(n) steps, consequently the time complexity of the generate-product routine is nr! (n + n2) = O(n3) since r is bounded by a constant. The routine generate-species-reaction distinguishes monomolecular reactions from bimolecular reactions. For monomolecular reactions all species in Ls1 = Lsi-1 and all elementary steps Les are sent to generate-product. Generate-product may return nr! graphs, these graphs is turn may contain at most n connected components (i.e., |Lp| = n). Therefore, there is a total of |Lsi-1| |Les| nr! |Lsi| |Lp| = Ni-1R Ni n2r! isomorphisms performed between the graphs of Lp and Lsi. Isomorphism between species can be achieved in O(n2), isomorphism between reactions can be performed in a constant time (i.e., O(1)) since reactions are represented by n-tuple. The overall complexity for monomolecular reaction is therefore O(Ni-1 Ni R n4) = O(Ni-1 Ni n4). For bimolecular reactions the same operations are performed for all pairs of species between Lsi-1 and Ls. The time-complexity is O(N Ni-1 n4), which is the dominant term for generate-species-reactions since N _ Ni. Note that the number of species returned by generate-species-reactions is | Lsi-1| |Les| nr! |Lp| = Ni-1 R n2r for monomolecular reactions and | Lsi-1| |Ls| |Les| nr! |Lp| = Ni-1 N R n2r for bimolecular reactions. In both case mono- and bi-molecular reactions we have |Lri| = |Lsi|. Let us now evaluate the number of iterations in the deterministic-network-generator routine. Let k be the maximum valence allowed, usually in organic chemistry k _ 4. It will take at most kn steps to remove all the bonds in Lso and it will take at most kn steps to create the largest structure comprising n atoms. Therefore all accessible isomers comprising 1 to n atoms can be created in 2kn steps, and the maximum number of iterations of the main routine is bounded by 2kn = O(n). The deterministic-network-generator routine performs O(n) calls to generate-species-reactions, O(n) isomorphism checks between Lsi and Ls, and O(n) insertions of Lri in Lr. Therefore, the overall time complexity is O(n N Ni-1n4 + N Ni-1 n2 + N Ni-1) = O(n5 N Ni-1) _ O(n5 N2). This time complexity is not polynomial since N is bounded by the number of isomers comprising 1,..n atoms, which increases exponentially with n.

Let us now consider the sampling routines. The procedure reduce-mechanism-random performs Ms random selections in Lsi. Since none of the lists Lso U ... U Lsi-1 have a size greater than Ms, we have |Ls| _ i Ms, and |Lsi | = |Lsi-1| |Les| nr! |Lp| + |Lsi-1| |Ls| |Les| nr! |Lp| _ MsRn2r + Msi MsRn2r _ i Ms2Rn2r! + MsRn2r!. Since i is bounded by 2kn and k, Ms, R, and r are constants |Lsi| = O(n3). Consequently, Ms random selections in Lsi will be performed with a time complexity O(Msn3) = O(n3). Let us recall that the complexity of generate-reactions-species is O(|Ls| |Lsi-1| n3) which in the present case reduces to O(n Ms2n3) = O(n4). The overall complexity of the random-sampling-network generator algorithm is therefore 2n (n4 + |Ls| |Lsi| n2 +(Lr| |Lri|)) _ 2n (n4 + n Ms Ms (n2+1)) = O(n5).

The concentration-sampling-network-generator algorithm is similar to random-sampling-network-generator with the exception of the reduce-mechanism routine. There are three loop in this routine. Clearly the complexity of the first loop is |Ls|, the second loop calls Mc times the Monte-Carlo-step routine and search the maximum concentration over time of all species in Ls. The Monte-Carlo-step routine has a complexity of |Lr|, hence the overall complexity of reduce-mechanism-concentration is Mc (|Ls| + |Lsi| |Lr| + |Ls|| Lri |) = Mc (n Ms + 2n Msn3) = Mc O(n4), and the complexity of the concentration-sampling-network-generator algorithm is Mc O(n5).

The main difference between MC-sampling-network-generator and the other algorithms is that the number of particles remains the same, Mp. Hence, the number of species with non-zero concentration is at most Mp (i.e., |Ls?| _ Mp). The routine generate-species-reactions is called for the list Ls?, consequently, its complexity is O(|Ls?| |Ls?| n3) _ Mp2n3 = O(n3). Note also that |Lsi| the number of species returned as well as |Lri| the number of reaction are bounded by |Ls?| |Les| nr! |Lp| _ Mp2 R n2 r! _ O(n2). The number of species at the first iteration of MC-sampling-network-generator (i.e., the number of reactants) is at most Mp. The number of species at the second iteration is |Ls0 U* Ls1| _ Mp + O(n2) = O(n2). It can easily be verified that after Mc iterations the number of species |Ls| will remain bounded by Mp + Mc O(n2) = Mc O(n2). Since | Lri| = |Lsi|, we have |Lr| = Mc O(n2). Hence, the complexity of the isomorphism checks in MC-sampling-network-generator is |Ls| |Lsi| n2 = Mc O(n6), while the complexity of MC-step is |Lr| = Mc O(n2). Since the number of iteration is equal to Mc, the overall complexity of MC-sampling-network-generator is Mc2 O(n6) + Mc2 O(n2) = Mc2 O(n6).