Quantitative Structure-Activity Relationships (QSAR) are empirical relationships that use molecular descriptors to predict a specific biological activity or chemical property from the molecular structure. Typically, QSAR refers to a process in which the structures of a set of compounds are encoded as numerical descriptors and then fit against measured values of the biological activity or physical property. The result is a mathematical model that can be used to predict the activity or property value of new compounds.
The independent variables in the QSAR equation are given in terms of the molecular descriptors, or operators on the molecular graph that strive to characterize the molecular structure. Such descriptors are based on molecular orbital theory, molecular topology, and molecular properties (lipophilicity, electronic, thermodynamic, quantum-chemical).
The quality and predictive capacity of the QSAR equation depend on the size and diversity of the set of molecules used for training. The larger and more diverse the training set, the better equipped the QSAR equation is to predict the activity of a new compound.
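As an illustration of the forward problem, the following is a minimal sketch that assumes the QSAR equation is a plain linear model fitted by least squares. The function names, the two descriptors, and all numbers are hypothetical and chosen only so that the fit can be checked by hand; real QSAR models may use other regression or machine-learning methods.

```python
def fit_linear_qsar(descriptors, activities):
    """Least-squares fit of activity = b0 + sum(b_i * d_i) via normal equations."""
    # Prepend a constant 1.0 to each descriptor vector for the intercept b0.
    rows = [[1.0] + [float(d) for d in vec] for vec in descriptors]
    n = len(rows[0])
    # Augmented normal-equation system: (X^T X | X^T y).
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, activities))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coeffs, descriptor):
    """Apply the fitted QSAR equation to a new compound's descriptor vector."""
    return coeffs[0] + sum(c * d for c, d in zip(coeffs[1:], descriptor))


# Hypothetical training set: two descriptors per compound, with activities
# generated from activity = 1 + 2*d1 - 0.5*d2 so the result is easy to verify.
training_descriptors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
training_activities = [2.0, 4.5, 5.0, 7.5, 8.5]

coeffs = fit_linear_qsar(training_descriptors, training_activities)
new_activity = predict(coeffs, (6, 2))  # prediction for an unseen compound
```

The normal-equation solver is used here only to keep the sketch free of external dependencies; it will fail with a division by zero if the descriptors are linearly dependent.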


The QSAR problem uses independent variables, represented by molecular descriptors, to solve for the activity (dependent variable) of a new compound. We call this the forward QSAR problem. In contrast, the inverse-QSAR problem seeks values of the molecular descriptors that correspond to a desired activity or property value.
This problem is difficult for a number of reasons. First, one needs to solve the forward QSAR problem for a given activity. If this can be done, then the solutions will be given in terms of the molecular descriptors. The problem then lies in constructing a viable molecule from these descriptors. This is typically the limiting factor of most inverse-QSAR methods, since the descriptors are not reversible.
It is clear that the key to an effective solution methodology lies in the use of a molecular descriptor that facilitates the reconstruction of the solutions into actual compounds. Such a descriptor needs to be information rich, have good correlative abilities in QSAR applications, and most importantly, be computationally efficient. A computationally efficient descriptor should have a low degeneracy, meaning it should lead to a limited number of solutions when used with inverse-QSAR. We describe one such descriptor, named signature, that we believe meets the above criteria.
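To make the idea of degeneracy concrete, the sketch below enumerates descriptor vectors that a forward model maps onto a target activity. It assumes a hypothetical linear model whose descriptors are non-negative integer occurrence counts; the function name, coefficients, and bounds are illustrative only. It does not attempt the harder step of checking whether a candidate vector corresponds to a constructible molecule.

```python
from itertools import product


def inverse_qsar_candidates(model, target, max_count, tol=1e-9):
    """Enumerate integer descriptor vectors whose predicted activity hits target.

    model     -- [intercept, w1, w2, ...] of an assumed linear QSAR equation
    max_count -- upper bound on each descriptor's occurrence count
    """
    intercept, weights = model[0], model[1:]
    hits = []
    for counts in product(range(max_count + 1), repeat=len(weights)):
        predicted = intercept + sum(w * c for w, c in zip(weights, counts))
        if abs(predicted - target) <= tol:
            hits.append(counts)
    return hits


# Hypothetical model: activity = 1 + 2*d1 - 0.5*d2
model = [1.0, 2.0, -0.5]
candidates = inverse_qsar_candidates(model, target=5.0, max_count=4)
degeneracy = len(candidates)  # number of descriptor vectors sharing the target
```

Each candidate would still have to be reconstructed into an actual compound, which is why a descriptor that yields few candidates and supports reconstruction is preferable.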


Signature is a two-dimensional molecular descriptor based on the molecular graph of a molecule. The vertices of the graph are the set of atoms (or building blocks) in the molecule and the edges are the set of bonds that connect the vertices to one another. A complete list of the signature atom types can be found in the READ-ME file of the translator program.
[Figure: A sample molecular graph of 4H-Imidazol-4-ol rooted on vertex a with 3 levels of branching shown.]
In this manner, a molecule is characterized by a set of unique canonical subgraphs, called signatures, each rooted on a different vertex with a predefined level of branching that describes the local neighborhood up to a distance h away from the root, called the height.
The set of signatures hσ(x) and their occurrences in the molecular graph comprise the molecular descriptors for a molecule. These are expressed as strings of characters corresponding to the canonized subgraph, read in breadth-first order. Branch levels are indicated by a set of parentheses following the parent vertex. (See examples)
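The construction can be sketched in a few lines, under strong simplifying assumptions: atoms are labeled by element symbol only (not the atom types of the translator program), bond orders are ignored, canonization is reduced to sorting the branch strings, and a branch never steps straight back to its parent. The bracketed string format and the function names are hypothetical, so the output is not guaranteed to match the translator program.

```python
from collections import Counter


def atomic_signature(atoms, neighbors, root, height, parent=None):
    """Return a simplified height-h signature string rooted on one atom."""
    label = f"[{atoms[root]}]"
    if height == 0:
        return label
    # Expand each neighbor one level deeper, skipping the atom we came from,
    # and sort the branch strings so the result does not depend on atom order.
    branches = sorted(
        atomic_signature(atoms, neighbors, nb, height - 1, parent=root)
        for nb in neighbors[root]
        if nb != parent
    )
    return label + (f"({''.join(branches)})" if branches else "")


def molecular_signature(atoms, bonds, height):
    """Count the occurrences of each atomic signature in the molecular graph."""
    neighbors = {a: [] for a in atoms}
    for u, v in bonds:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return Counter(atomic_signature(atoms, neighbors, a, height) for a in atoms)


# Heavy-atom graph of propan-2-ol: C0 - C1(-O3) - C2 (hydrogens omitted).
atoms = {0: "C", 1: "C", 2: "C", 3: "O"}
bonds = [(0, 1), (1, 2), (1, 3)]

descriptor = molecular_signature(atoms, bonds, height=1)
# descriptor holds "[C]([C])" twice, "[C]([C][C][O])" once, "[O]([C])" once
```

The occurrence counts, one per distinct signature string, are what would serve as the independent variables of a QSAR equation.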
