Package org.openscience.cdk.tools
Class ErtlFunctionalGroupsFinderUtility
- java.lang.Object
-
- org.openscience.cdk.tools.ErtlFunctionalGroupsFinderUtility
-
public class ErtlFunctionalGroupsFinderUtility extends java.lang.Object
This class provides utility methods for using ErtlFunctionalGroupsFinder, a published CDK-based implementation of the Ertl algorithm for automated functional group detection. The methods of this class are essentially public static re-implementations of the routines used for testing and evaluating ErtlFunctionalGroupsFinder, as described in the publication.
-
-
Method Summary
Modifier and Type, Method, Description
static boolean
applyAromaticityDetection(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel)
Convenience method for applying the given aromaticity model to the given molecule.
static org.openscience.cdk.interfaces.IAtomContainer
applyFiltersAndPreprocessing(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel)
Checks whether the given molecule represented by an atom container should be filtered instead of being passed on to the ErtlFunctionalGroupsFinder.find() method and if not, applies necessary preprocessing steps.
static org.openscience.cdk.interfaces.IAtomContainer
applyFiltersAndPreprocessing(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel, boolean areSingleAtomsFiltered)
Checks whether the given molecule represented by an atom container should be filtered instead of being passed on to the ErtlFunctionalGroupsFinder.find() method and if not, applies necessary preprocessing steps.
static java.lang.String
createPseudoSmilesCode(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Gives the pseudo SMILES code for a given molecule / functional group.
static org.openscience.cdk.hash.MoleculeHashGenerator
getFunctionalGroupHashGenerator()
Constructs a CDK MoleculeHashGenerator that is configured to count frequencies of the functional groups returned by ErtlFunctionalGroupsFinder.
static boolean
isAtomOrBondCountZero(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Checks whether the atom count or bond count of the given molecule is zero.
static boolean
isValidArgumentForFindMethod(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Checks whether the given molecule represented by an atom container can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems if(!) strict input restrictions are turned on (turned off by default).
static boolean
isValidArgumentForFindMethod(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean areSingleAtomsFiltered)
Checks whether the given molecule represented by an atom container can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems if(!) strict input restrictions are turned on (turned off by default).
static void
neutralizeCharges(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Neutralizes charged atoms in the given atom container by zeroing the formal atomic charges and filling up free valences with implicit hydrogen atoms (according to the CDK atom types).
static void
neutralizeCharges(org.openscience.cdk.interfaces.IAtom anAtom, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule)
Neutralizes a charged atom in the given parent atom container by zeroing the formal atomic charge and filling up free valences with implicit hydrogen atoms (according to the CDK atom types).
static void
perceiveAtomTypesAndConfigureAtoms(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Convenience method to perceive atom types for all IAtoms in the IAtomContainer, using the CDK AtomContainerManipulator or rather the CDKAtomTypeMatcher.
static void
restoreOriginalEnvironmentalCarbons(java.util.List<org.openscience.cdk.interfaces.IAtomContainer> aListOfFunctionalGroups, org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aConvertExplicitHydrogens, boolean aFillEmptyValences, org.openscience.cdk.interfaces.IChemObjectBuilder aBuilder)
Replaces the environmental carbon or pseudo-atoms (new IAtom objects) inserted by the EFGF in an identified functional group with the carbon IAtom objects from the original molecule object.
static org.openscience.cdk.interfaces.IAtomContainer
selectBiggestUnconnectedComponent(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Returns the biggest unconnected component/structure of the given atom container, judging by the atom count.
static boolean
shouldBeFiltered(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Checks whether the given molecule represented by an atom container should NOT be passed on to the ErtlFunctionalGroupsFinder.find() method but instead be filtered if(!) strict input restrictions are turned on (turned off by default).
static boolean
shouldBeFiltered(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean areSingleAtomsFiltered)
Checks whether the given molecule represented by an atom container should NOT be passed on to the ErtlFunctionalGroupsFinder.find() method but instead be filtered if(!) strict input restrictions are turned on (turned off by default).
static boolean
shouldBePreprocessed(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Checks whether the given molecule represented by an atom container needs to be preprocessed before it is passed on to the ErtlFunctionalGroupsFinder.find() method because it is unconnected or contains charged atoms if(!) strict input restrictions are turned on (turned off by default).
-
-
-
Method Detail
-
getFunctionalGroupHashGenerator
public static org.openscience.cdk.hash.MoleculeHashGenerator getFunctionalGroupHashGenerator()
Constructs a CDK MoleculeHashGenerator that is configured to count frequencies of the functional groups returned by ErtlFunctionalGroupsFinder. It takes elements, bond order sum, and aromaticity of the atoms in an atom container into consideration. It does not consider things like isotopes, stereo-chemistry, orbitals, or charges.
- Returns:
- MoleculeHashGenerator object configured for Ertl functional groups
-
isAtomOrBondCountZero
public static boolean isAtomOrBondCountZero(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException
Checks whether the atom count or bond count of the given molecule is zero. The ErtlFunctionalGroupsFinder.find() method would still accept these molecules, but passing them on is not recommended because it does not make much sense.
- Parameters:
aMolecule
- the molecule to check
- Returns:
- true, if the atom or bond count of the molecule is zero
- Throws:
java.lang.NullPointerException
- if the given molecule is 'null'
-
shouldBeFiltered
public static boolean shouldBeFiltered(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException
Checks whether the given molecule represented by an atom container should NOT be passed on to the ErtlFunctionalGroupsFinder.find() method but instead be filtered if(!) strict input restrictions are turned on (turned off by default).
In detail, this function returns true if the given atom container contains metal, metalloid, or pseudo atoms or has an atom or bond count equal to zero.
If this method returns false, this does NOT mean the molecule can be passed on to find() without a problem. It still might need to be preprocessed first.
- Parameters:
aMolecule
- the atom container to check
- Returns:
- true if the given atom container should be discarded
- Throws:
java.lang.NullPointerException
- if parameter is 'null'
-
shouldBeFiltered
public static boolean shouldBeFiltered(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean areSingleAtomsFiltered) throws java.lang.NullPointerException
Checks whether the given molecule represented by an atom container should NOT be passed on to the ErtlFunctionalGroupsFinder.find() method but instead be filtered if(!) strict input restrictions are turned on (turned off by default).
In detail, this function returns true if the given atom container contains metal, metalloid, or pseudo atoms or has an atom or bond count equal to zero. If the second parameter is set to "false", single-atom molecules (bond count is 0) are accepted, i.e. not recommended for filtering, provided they fulfill the other requirements.
If this method returns false, this does NOT mean the molecule can be passed on to find() without a problem. It still might need to be preprocessed first.
- Parameters:
aMolecule
- the atom container to check
areSingleAtomsFiltered
- if false, the method returns false (do not filter) for molecules with bond count 0 but atom count 1
- Returns:
- true if the given atom container should be discarded
- Throws:
java.lang.NullPointerException
- if parameter is 'null'
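The filter rule described for the two shouldBeFiltered overloads can be sketched in plain Java as follows. This is an illustration only, not the utility's actual code: MoleculeFacts is a hypothetical stand-in for an IAtomContainer that merely carries the three facts the rule needs, and the treatment of containers with several atoms but no bonds when areSingleAtomsFiltered is false (still filtered here) is an assumption, since the description above only covers the single-atom case.

```java
final class FilterRuleSketch {

    /** Hypothetical, simplified stand-in for an atom container (not a CDK type). */
    static final class MoleculeFacts {
        final int atomCount;
        final int bondCount;
        final boolean hasMetalMetalloidOrPseudoAtom;

        MoleculeFacts(int atomCount, int bondCount, boolean hasMetalMetalloidOrPseudoAtom) {
            this.atomCount = atomCount;
            this.bondCount = bondCount;
            this.hasMetalMetalloidOrPseudoAtom = hasMetalMetalloidOrPseudoAtom;
        }
    }

    /** Returns true if the molecule should be discarded according to the documented rule. */
    static boolean shouldBeFiltered(MoleculeFacts molecule, boolean areSingleAtomsFiltered) {
        if (molecule.hasMetalMetalloidOrPseudoAtom || molecule.atomCount == 0) {
            return true;
        }
        if (molecule.bondCount == 0) {
            // Assumption: without the single-atom filter, only a lone atom is let through.
            return areSingleAtomsFiltered || molecule.atomCount != 1;
        }
        return false;
    }

    /** One-argument variant, modelled as "atom or bond count of zero means filter". */
    static boolean shouldBeFiltered(MoleculeFacts molecule) {
        return shouldBeFiltered(molecule, true);
    }

    public static void main(String[] args) {
        MoleculeFacts loneAtom = new MoleculeFacts(1, 0, false);
        System.out.println(shouldBeFiltered(loneAtom));        // true
        System.out.println(shouldBeFiltered(loneAtom, false)); // false
    }
}
```

As noted for the real methods, a result of false only means the molecule is not discarded; it may still need preprocessing.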
-
shouldBePreprocessed
public static boolean shouldBePreprocessed(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException
Checks whether the given molecule represented by an atom container needs to be preprocessed before it is passed on to the ErtlFunctionalGroupsFinder.find() method because it is unconnected or contains charged atoms if(!) strict input restrictions are turned on (turned off by default).
It is advised to check via shouldBeFiltered() whether the given molecule should be discarded anyway before calling this function.
- Parameters:
aMolecule
- the atom container to check
- Returns:
- true if the given molecule needs to be preprocessed
- Throws:
java.lang.NullPointerException
- if parameter is 'null'
-
isValidArgumentForFindMethod
public static boolean isValidArgumentForFindMethod(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException
Checks whether the given molecule represented by an atom container can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems if(!) strict input restrictions are turned on (turned off by default).
This method will return false if the molecule contains any metal, metalloid, pseudo, or charged atoms, contains multiple unconnected parts, or has an atom or bond count of zero.
- Parameters:
aMolecule
- the molecule to check
- Returns:
- true if the given molecule is a valid parameter for ErtlFunctionalGroupsFinder.find() method
- Throws:
java.lang.NullPointerException
- if parameter is 'null'
-
isValidArgumentForFindMethod
public static boolean isValidArgumentForFindMethod(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean areSingleAtomsFiltered) throws java.lang.NullPointerException
Checks whether the given molecule represented by an atom container can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems if(!) strict input restrictions are turned on (turned off by default).
This method will return false if the molecule contains any metal, metalloid, pseudo, or charged atoms, contains multiple unconnected parts, or has an atom or bond count of zero. If the second parameter is set to "false", single-atom molecules (bond count is 0) are accepted, i.e. considered valid, provided they fulfill the other requirements.
- Parameters:
aMolecule
- the molecule to check
areSingleAtomsFiltered
- if false, the method returns true (do not filter) for molecules with bond count 0 but atom count 1
- Returns:
- true if the given molecule is a valid parameter for ErtlFunctionalGroupsFinder.find() method
- Throws:
java.lang.NullPointerException
- if parameter is 'null'
-
selectBiggestUnconnectedComponent
public static org.openscience.cdk.interfaces.IAtomContainer selectBiggestUnconnectedComponent(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException
Returns the biggest unconnected component/structure of the given atom container, judging by the atom count. To pre-check whether the atom container consists of multiple unconnected components, use isStructureUnconnected(). All set properties of aMolecule will be set as properties of the returned atom container.
NOTE: The atom, bond etc. objects of the given atom container are re-used in the returned atom container, but the former remains unchanged.
Iterates through all unconnected components in the given atom container, so the method scales linearly, i.e. O(n), where n is the number of unconnected components.
- Parameters:
aMolecule
- the molecule whose biggest unconnected component should be found
- Returns:
- the biggest (judging by the atom count) unconnected component of the given atom container
- Throws:
java.lang.NullPointerException
- if aMolecule or its biggest component is null
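The idea of picking the biggest component by atom count can be illustrated without CDK on a plain adjacency-list graph. The sketch below is not the utility's implementation: the class name, the index-based graph representation, and the tie-breaking rule (the first component found wins) are assumptions made for illustration, and the sketch also performs the component partitioning itself, which the real method presumably delegates to CDK.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BiggestComponentSketch {

    /**
     * Returns the atom indices of the largest connected component.
     * adjacency.get(i) holds the indices of the atoms bonded to atom i.
     */
    static List<Integer> biggestComponent(List<List<Integer>> adjacency) {
        boolean[] visited = new boolean[adjacency.size()];
        List<Integer> biggest = new ArrayList<>();
        for (int start = 0; start < adjacency.size(); start++) {
            if (visited[start]) {
                continue;
            }
            // Breadth-first search collects one connected component.
            List<Integer> component = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            visited[start] = true;
            while (!queue.isEmpty()) {
                int atom = queue.poll();
                component.add(atom);
                for (int neighbour : adjacency.get(atom)) {
                    if (!visited[neighbour]) {
                        visited[neighbour] = true;
                        queue.add(neighbour);
                    }
                }
            }
            // Keep the component with the highest atom count seen so far.
            if (component.size() > biggest.size()) {
                biggest = component;
            }
        }
        return biggest;
    }

    public static void main(String[] args) {
        // Atoms 0-3 form one connected fragment, atom 4 is an isolated counter-ion.
        List<List<Integer>> adjacency = List.of(
                List.of(1), List.of(0, 2, 3), List.of(1), List.of(1), List.of());
        System.out.println(biggestComponent(adjacency)); // [0, 1, 2, 3]
    }
}
```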
-
neutralizeCharges
public static void neutralizeCharges(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException, org.openscience.cdk.exception.CDKException
Neutralizes charged atoms in the given atom container by zeroing the formal atomic charges and filling up free valences with implicit hydrogen atoms (according to the CDK atom types). This procedure allows a more general charge treatment than a pre-defined transformation list but may produce "wrong" structures, e.g. it turns a nitro NO2 group into a structure represented by the SMILES code "[H]O[N](=O)*" with an uncharged four-bonded nitrogen atom (other examples are "*[N](*)(*)*", "[C]#[N]*" or "*S(*)(*)*"). Thus, an improved charge neutralization scheme is desirable for future implementations.
NOTE: This method changes major properties and the composition of the given IAtomContainer object! If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
Iterates through all atoms in the given atom container, so the method scales linearly, i.e. O(n), where n is the number of atoms.
- Parameters:
aMolecule
- the molecule to be neutralized
- Throws:
java.lang.NullPointerException
- if aMolecule or one of its atoms is 'null'
org.openscience.cdk.exception.CDKException
- if no matching atom type can be determined for one atom or there is a problem with adding the implicit hydrogen atoms.
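Why this scheme can yield an uncharged four-bonded nitrogen for a nitro group can be seen in a small model. The sketch below is a simplification under stated assumptions and not the CDK-based implementation: SimpleAtom and the NEUTRAL_VALENCE table are hypothetical stand-ins for IAtom and the CDK atom types, and the implicit hydrogen count is simply the assumed neutral valence minus the bond order sum, floored at zero.

```java
import java.util.Map;

final class NeutralizationSketch {

    /** Hypothetical minimal atom model (not a CDK type). */
    static final class SimpleAtom {
        final String symbol;
        int formalCharge;
        final int bondOrderSum; // sum of bond orders to explicit neighbours
        int implicitHydrogens;

        SimpleAtom(String symbol, int formalCharge, int bondOrderSum) {
            this.symbol = symbol;
            this.formalCharge = formalCharge;
            this.bondOrderSum = bondOrderSum;
        }
    }

    // Assumed valences of the neutral atoms; a crude substitute for atom type perception.
    private static final Map<String, Integer> NEUTRAL_VALENCE =
            Map.of("C", 4, "N", 3, "O", 2, "S", 2);

    /** Zeroes the formal charge and fills free valences with implicit hydrogens. */
    static void neutralize(SimpleAtom atom) {
        if (atom.formalCharge == 0) {
            return;
        }
        Integer valence = NEUTRAL_VALENCE.get(atom.symbol);
        if (valence == null) {
            throw new IllegalArgumentException("No valence known for " + atom.symbol);
        }
        atom.formalCharge = 0;
        // Bonds are never removed, so an over-valent atom simply gets no hydrogens.
        atom.implicitHydrogens = Math.max(0, valence - atom.bondOrderSum);
    }

    public static void main(String[] args) {
        // Nitro group *[N+](=O)[O-]: N+ has bond order sum 4, O- has bond order sum 1.
        SimpleAtom nitrogen = new SimpleAtom("N", 1, 4);
        SimpleAtom oxygen = new SimpleAtom("O", -1, 1);
        neutralize(nitrogen);
        neutralize(oxygen);
        System.out.println("N implicit H: " + nitrogen.implicitHydrogens); // N implicit H: 0
        System.out.println("O implicit H: " + oxygen.implicitHydrogens);   // O implicit H: 1
    }
}
```

In this model the former N+ keeps its four bonds without a charge and the former O- gains one hydrogen, which corresponds to the "[H]O[N](=O)*" example given above.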
-
neutralizeCharges
public static void neutralizeCharges(org.openscience.cdk.interfaces.IAtom anAtom, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule) throws java.lang.NullPointerException, org.openscience.cdk.exception.CDKException
Neutralizes a charged atom in the given parent atom container by zeroing the formal atomic charge and filling up free valences with implicit hydrogen atoms (according to the CDK atom types).
NOTE: This method changes major properties and the composition of the given IAtom and IAtomContainer object! If you want to retain your objects unchanged for future calculations, use the IAtomContainer's clone() method.
- Parameters:
anAtom
- the atom to be neutralized
aParentMolecule
- the molecule the atom belongs to
- Throws:
java.lang.NullPointerException
- if anAtom or aParentMolecule is 'null'
org.openscience.cdk.exception.CDKException
- if the atom is not part of the molecule or no matching atom type can be determined for the atom or there is a problem with adding the implicit hydrogen atoms.
-
perceiveAtomTypesAndConfigureAtoms
public static void perceiveAtomTypesAndConfigureAtoms(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException, org.openscience.cdk.exception.CDKException
Convenience method to perceive atom types for all IAtoms in the IAtomContainer, using the CDK AtomContainerManipulator or rather the CDKAtomTypeMatcher. If the matcher finds a matching atom type, the IAtom will be configured to have the same properties as the IAtomType. If no matching atom type is found, no configuration is performed.
Calling this method is equivalent to calling AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(aMolecule). It has been given its own method here because it is a necessary step in the preprocessing for ErtlFunctionalGroupsFinder.
NOTE: This method changes major properties of the given IAtomContainer object! If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
- Parameters:
aMolecule
- the molecule to configure
- Throws:
java.lang.NullPointerException
- if aMolecule is 'null'
org.openscience.cdk.exception.CDKException
- when something went wrong with going through the AtomType options
- See Also:
AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(IAtomContainer), CDKAtomTypeMatcher.findMatchingAtomType(IAtomContainer, IAtom)
-
applyAromaticityDetection
public static boolean applyAromaticityDetection(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel) throws java.lang.NullPointerException, org.openscience.cdk.exception.CDKException
Convenience method for applying the given aromaticity model to the given molecule. Any existing aromaticity flags are removed - even if no aromatic bonds were found. This follows the idea of applying an aromaticity model to a molecule such that the result is the same irrespective of existing aromatic flags.
Calling this method is equivalent to calling Aromaticity.apply(aMolecule). It has been given its own method here because it is a necessary step in the preprocessing for ErtlFunctionalGroupsFinder.
NOTE: This method changes major properties and the composition of the given IAtomContainer object! If you want to retain your object unchanged for future calculations, use copy() in this class or the IAtomContainer's clone() method.
- Parameters:
aMolecule
- the molecule to apply the model to
anAromaticityModel
- the model to apply; Note that the choice of electron donation model and cycle finder algorithm has a heavy influence on the functional group detection of ErtlFunctionalGroupsFinder
- Returns:
- true if the molecule (or parts of it) is determined to be aromatic
- Throws:
java.lang.NullPointerException
- if a parameter is 'null'
org.openscience.cdk.exception.CDKException
- if a problem occurred with the cycle perception (see CDK docs)
- See Also:
Aromaticity.apply(IAtomContainer)
-
applyFiltersAndPreprocessing
public static org.openscience.cdk.interfaces.IAtomContainer applyFiltersAndPreprocessing(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel) throws java.lang.NullPointerException
Checks whether the given molecule represented by an atom container should be filtered instead of being passed on to the ErtlFunctionalGroupsFinder.find() method and if not, applies necessary preprocessing steps. In the second case, this method applies preprocessing to the given atom container that is always needed (setting atom types and applying an aromaticity model) and preprocessing steps that are only needed in specific cases (selecting the biggest unconnected component, neutralizing charges). Molecules processed by this method can be passed on to find() without problems (Caution: The return value of this method is 'null' if the molecule should be filtered!) if(!) strict input restrictions are turned on (turned off by default).
NOTE: This method changes major properties and the composition of the given IAtomContainer object! If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
NOTE2: The returned IAtomContainer object is the same as the one given as parameter!
- Parameters:
aMolecule
- the molecule to check and process
anAromaticityModel
- the aromaticity model to apply to the molecule in preprocessing; Note: The chosen ElectronDonation model can massively influence the extracted functional groups of a molecule when using ErtlFunctionalGroupsFinder!
- Returns:
- the preprocessed atom container or 'null' if the molecule should be discarded
- Throws:
java.lang.NullPointerException
- if a parameter is 'null'; Note: All other exceptions are caught and logged by this class' logger
-
applyFiltersAndPreprocessing
public static org.openscience.cdk.interfaces.IAtomContainer applyFiltersAndPreprocessing(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel, boolean areSingleAtomsFiltered) throws java.lang.NullPointerException
Checks whether the given molecule represented by an atom container should be filtered instead of being passed on to the ErtlFunctionalGroupsFinder.find() method and if not, applies necessary preprocessing steps. In the second case, this method applies preprocessing to the given atom container that is always needed (setting atom types and applying an aromaticity model) and preprocessing steps that are only needed in specific cases (selecting the biggest unconnected component, neutralizing charges). Molecules processed by this method can be passed on to find() without problems (Caution: The return value of this method is 'null' if the molecule should be filtered!) if(!) strict input restrictions are turned on (turned off by default).
NOTE: This method changes major properties and the composition of the given IAtomContainer object! If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
NOTE2: The returned IAtomContainer object is the same as the one given as parameter!
- Parameters:
aMolecule
- the molecule to check and process
anAromaticityModel
- the aromaticity model to apply to the molecule in preprocessing; Note: The chosen ElectronDonation model can massively influence the extracted functional groups of a molecule when using ErtlFunctionalGroupsFinder!
areSingleAtomsFiltered
- if false, molecules with bond count 0 but atom count 1 will be processed and the method will not return null for them
- Returns:
- the preprocessed atom container or 'null' if the molecule should be discarded
- Throws:
java.lang.NullPointerException
- if a parameter is 'null'; Note: All other exceptions are caught and logged by this class' logger
-
restoreOriginalEnvironmentalCarbons
public static void restoreOriginalEnvironmentalCarbons(java.util.List<org.openscience.cdk.interfaces.IAtomContainer> aListOfFunctionalGroups, org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aConvertExplicitHydrogens, boolean aFillEmptyValences, org.openscience.cdk.interfaces.IChemObjectBuilder aBuilder) throws java.lang.NullPointerException, java.lang.IllegalArgumentException
Replaces the environmental carbon or pseudo-atoms (new IAtom objects) inserted by the EFGF in an identified functional group with the carbon IAtom objects from the original molecule object.
Important note: This method only works if the atom container has not been cloned for the extraction of functional groups by ErtlFunctionalGroupsFinder. Use the method "List<IAtomContainer> find(IAtomContainer container, boolean clone)" with clone set to false for this purpose.
Also note that the result differs if the environment has been generalized by the EFGF or not. In the former case, only environmental carbon atoms replaced by R-atoms in the generalized FG are restored.
- Parameters:
aListOfFunctionalGroups
- functional groups of the molecule identified by EFGF
aMolecule
- original structure in which the groups were identified
aConvertExplicitHydrogens
- should explicit hydrogen atoms in the functional groups be converted to implicit hydrogens
aFillEmptyValences
- should empty valences on the restored environmental carbon atoms be filled with implicit hydrogen atoms
aBuilder
- a chem object builder instance
- Throws:
java.lang.NullPointerException
- if a parameter is null
java.lang.IllegalArgumentException
- if one of the functional groups does not originate from the given molecule or the molecule has been cloned for the extraction of functional groups
-
createPseudoSmilesCode
public static java.lang.String createPseudoSmilesCode(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException, org.openscience.cdk.exception.CDKException
Gives the pseudo SMILES code for a given molecule / functional group. In this notation, aromatic atoms are marked by asterisks (*) and pseudo atoms are indicated by 'R'.
The function generates the SMILES string of the given molecule using CDK's SmilesGenerator and then replaces lowercase c, n, o etc. by C*, N*, O* etc. and wildcards ('*') by 'R' in the resulting string. For that, the function iterates through all characters in the generated SMILES string.
Note: All pseudo atoms or atoms that are represented by a wildcard ('*') in the generated SMILES string (e.g. the element [Uup] is interpreted by the CDK SmilesGenerator as a wildcard) are turned into an 'R' atom.
- Parameters:
aMolecule
- the molecule whose pseudo SMILES code to generate
- Returns:
- the pseudo SMILES representation as a string
- Throws:
java.lang.NullPointerException
- if aMolecule is 'null'
org.openscience.cdk.exception.CDKException
- if the SMILES code of aMolecule cannot be generated
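The character-wise replacement described above can be sketched on an already generated SMILES string. This is an illustrative approximation, not the utility's code: the class and method names are hypothetical, only the single-letter aromatic symbols b, c, n, o, p and s are handled (two-letter aromatic symbols such as "se" are not), and inside square brackets only the symbol directly after '[' or an isotope number is treated as a possible aromatic atom, so that element symbols like [Co] stay untouched.

```java
final class PseudoSmilesSketch {

    // Assumption: only these single-letter aromatic symbols occur in the input.
    private static final String AROMATIC_SYMBOLS = "bcnops";

    /** Turns lowercase aromatic atoms into "X*" and wildcards into 'R'. */
    static String toPseudoSmiles(String smiles) {
        StringBuilder result = new StringBuilder();
        boolean inBracket = false;
        boolean atSymbolStart = false; // true until the element symbol in a bracket begins
        for (int i = 0; i < smiles.length(); i++) {
            char c = smiles.charAt(i);
            if (c == '[') {
                inBracket = true;
                atSymbolStart = true;
                result.append(c);
            } else if (c == ']') {
                inBracket = false;
                atSymbolStart = false;
                result.append(c);
            } else if (c == '*') {
                // Wildcard in the input: pseudo atom.
                result.append('R');
                atSymbolStart = false;
            } else if (inBracket && atSymbolStart && Character.isDigit(c)) {
                // Isotope number precedes the symbol; keep waiting for the symbol.
                result.append(c);
            } else if (AROMATIC_SYMBOLS.indexOf(c) >= 0 && (!inBracket || atSymbolStart)) {
                result.append(Character.toUpperCase(c)).append('*');
                atSymbolStart = false;
            } else {
                result.append(c);
                atSymbolStart = false;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPseudoSmiles("c1ccccc1")); // C*1C*C*C*C*C*1
        System.out.println(toPseudoSmiles("*C(=O)O"));  // RC(=O)O
    }
}
```

Tracking the bracket state is a design choice of this sketch: a plain global replacement of lowercase letters would also corrupt two-letter element symbols.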
-
-