Class ErtlFunctionalGroupsFinderUtility


  • public class ErtlFunctionalGroupsFinderUtility
    extends java.lang.Object
    This class provides utility methods for using ErtlFunctionalGroupsFinder, a published CDK-based implementation of the Ertl algorithm for automated functional group detection. The methods of this class are public static re-implementations of the routines used for testing and evaluating the ErtlFunctionalGroupsFinder, as described in the publication.
    • Method Summary

      Modifier and Type Method Description
      static boolean applyAromaticityDetection​(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel)
      Convenience method for applying the given aromaticity model to the given molecule.
      static org.openscience.cdk.interfaces.IAtomContainer applyFiltersAndPreprocessing​(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel)
      Checks whether the given molecule represented by an atom container should be filtered instead of being passed on to the ErtlFunctionalGroupsFinder.find() method and if not, applies necessary preprocessing steps.
      static org.openscience.cdk.interfaces.IAtomContainer applyFiltersAndPreprocessing​(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel, boolean areSingleAtomsFiltered)
      Checks whether the given molecule represented by an atom container should be filtered instead of being passed on to the ErtlFunctionalGroupsFinder.find() method and if not, applies necessary preprocessing steps.
      static java.lang.String createPseudoSmilesCode​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Gives the pseudo SMILES code for a given molecule / functional group.
      static org.openscience.cdk.hash.MoleculeHashGenerator getFunctionalGroupHashGenerator()
      Constructs a CDK MoleculeHashGenerator that is configured to count frequencies of the functional groups returned by ErtlFunctionalGroupsFinder.
      static boolean isAtomOrBondCountZero​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Checks whether the atom count or bond count of the given molecule is zero.
      static boolean isValidArgumentForFindMethod​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Checks whether the given molecule, represented by an atom container, can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems when strict input restrictions are turned on (they are turned off by default).
      static boolean isValidArgumentForFindMethod​(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean areSingleAtomsFiltered)
      Checks whether the given molecule, represented by an atom container, can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems when strict input restrictions are turned on (they are turned off by default).
      static void neutralizeCharges​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Neutralizes charged atoms in the given atom container by zeroing the formal atomic charges and filling up free valences with implicit hydrogen atoms (according to the CDK atom types).
      static void neutralizeCharges​(org.openscience.cdk.interfaces.IAtom anAtom, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule)
      Neutralizes a charged atom in the given parent atom container by zeroing the formal atomic charge and filling up free valences with implicit hydrogen atoms (according to the CDK atom types).
      static void perceiveAtomTypesAndConfigureAtoms​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Convenience method to perceive atom types for all IAtoms in the IAtomContainer, using the CDK AtomContainerManipulator or rather the CDKAtomTypeMatcher.
      static void restoreOriginalEnvironmentalCarbons​(java.util.List<org.openscience.cdk.interfaces.IAtomContainer> aListOfFunctionalGroups, org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aConvertExplicitHydrogens, boolean aFillEmptyValences, org.openscience.cdk.interfaces.IChemObjectBuilder aBuilder)
      Replaces the environmental carbon or pseudo-atoms (new IAtom objects) inserted by the EFGF in an identified functional group with the carbon IAtom objects from the original molecule object.
      static org.openscience.cdk.interfaces.IAtomContainer selectBiggestUnconnectedComponent​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Returns the biggest unconnected component/structure of the given atom container, judging by the atom count.
      static boolean shouldBeFiltered​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Checks whether the given molecule, represented by an atom container, should NOT be passed on to the ErtlFunctionalGroupsFinder.find() method but be filtered instead when strict input restrictions are turned on (they are turned off by default).
      static boolean shouldBeFiltered​(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean areSingleAtomsFiltered)
      Checks whether the given molecule, represented by an atom container, should NOT be passed on to the ErtlFunctionalGroupsFinder.find() method but be filtered instead when strict input restrictions are turned on (they are turned off by default).
      static boolean shouldBePreprocessed​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Checks whether the given molecule, represented by an atom container, needs to be preprocessed before it is passed on to the ErtlFunctionalGroupsFinder.find() method because it is unconnected or contains charged atoms. This applies when strict input restrictions are turned on (they are turned off by default).
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • getFunctionalGroupHashGenerator

        public static org.openscience.cdk.hash.MoleculeHashGenerator getFunctionalGroupHashGenerator()
        Constructs a CDK MoleculeHashGenerator that is configured to count frequencies of the functional groups returned by ErtlFunctionalGroupsFinder. It takes elements, bond order sum, and aromaticity of the atoms in an atom container into consideration. It does not consider things like isotopes, stereo-chemistry, orbitals, or charges.
        Returns:
        MoleculeHashGenerator object configured for Ertl functional groups
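        As a minimal sketch of the frequency counting mentioned above, the following plain-Java example tallies how often each hash code occurs. It assumes the hash codes are long values, such as those returned by CDK's MoleculeHashGenerator.generate(IAtomContainer); the class name, the method name, and the numbers are made up for illustration and are not part of this library.

```java
import java.util.HashMap;
import java.util.Map;

class GroupFrequencySketch {

    /** Counts how often each hash code occurs in the given array. */
    static Map<Long, Integer> countFrequencies(long[] groupHashes) {
        Map<Long, Integer> frequencies = new HashMap<>();
        for (long hash : groupHashes) {
            // insert 1 for a new hash code, otherwise add 1 to the stored count
            frequencies.merge(hash, 1, Integer::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // made-up hash codes standing in for functional group hashes
        long[] hashes = {11L, 42L, 11L, 7L, 11L};
        System.out.println(countFrequencies(hashes));
    }
}
```

        Two functional groups that share elements, bond order sums, and aromaticity would yield the same hash code and therefore fall into the same counter.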
      • isAtomOrBondCountZero

        public static boolean isAtomOrBondCountZero​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                             throws java.lang.NullPointerException
        Checks whether the atom count or bond count of the given molecule is zero. The ErtlFunctionalGroupsFinder.find() method would still accept these molecules, but passing them on is not recommended because it serves little purpose.
        Parameters:
        aMolecule - the molecule to check
        Returns:
        true, if the atom or bond count of the molecule is zero
        Throws:
        java.lang.NullPointerException - if the given molecule is 'null'
      • shouldBeFiltered

        public static boolean shouldBeFiltered​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                        throws java.lang.NullPointerException
        Checks whether the given molecule, represented by an atom container, should NOT be passed on to the ErtlFunctionalGroupsFinder.find() method but be filtered instead when strict input restrictions are turned on (they are turned off by default).
        In detail, this function returns true if the given atom container contains metal, metalloid, or pseudo atoms or has an atom or bond count equal to zero.
        If this method returns false, this does NOT mean the molecule can be passed on to find() without a problem. It still might need to be preprocessed first.
        Parameters:
        aMolecule - the atom container to check
        Returns:
        true if the given atom container should be discarded
        Throws:
        java.lang.NullPointerException - if parameter is 'null'
      • shouldBeFiltered

        public static boolean shouldBeFiltered​(org.openscience.cdk.interfaces.IAtomContainer aMolecule,
                                               boolean areSingleAtomsFiltered)
                                        throws java.lang.NullPointerException
        Checks whether the given molecule, represented by an atom container, should NOT be passed on to the ErtlFunctionalGroupsFinder.find() method but be filtered instead when strict input restrictions are turned on (they are turned off by default).
        In detail, this function returns true if the given atom container contains metal, metalloid, or pseudo atoms or has an atom or bond count equal to zero. If the second parameter is set to "false", single-atom molecules (bond count is 0) are accepted, i.e. they are not flagged for filtering if they fulfill the other requirements.
        If this method returns false, this does NOT mean the molecule can be passed on to find() without a problem. It still might need to be preprocessed first.
        Parameters:
        aMolecule - the atom container to check
        areSingleAtomsFiltered - if false, molecules with bond count 0 but atom count 1 will return false (do not filter)
        Returns:
        true if the given atom container should be discarded
        Throws:
        java.lang.NullPointerException - if parameter is 'null'
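        The decision rule described for shouldBeFiltered() can be sketched in plain Java as follows. This is only a reading of the documented contract, not the library's implementation: the molecule is reduced to three hypothetical inputs (a flag for metal, metalloid, or pseudo atoms, plus atom and bond counts), and treating molecules with several atoms but no bonds as filtered is an assumption.

```java
class FilterDecisionSketch {

    /**
     * Returns true if a molecule with the given characteristics should be discarded.
     * The parameters are simplified stand-ins for an IAtomContainer.
     */
    static boolean shouldBeFiltered(boolean hasMetalMetalloidOrPseudoAtom,
                                    int atomCount,
                                    int bondCount,
                                    boolean areSingleAtomsFiltered) {
        if (hasMetalMetalloidOrPseudoAtom || atomCount == 0) {
            return true;
        }
        if (bondCount == 0) {
            boolean isSingleAtom = atomCount == 1;
            // assumption: only true single-atom molecules are spared when the flag is false
            return areSingleAtomsFiltered || !isSingleAtom;
        }
        return false;
    }

    public static void main(String[] args) {
        // a single atom without bonds, once with and once without single-atom filtering
        System.out.println(shouldBeFiltered(false, 1, 0, true));
        System.out.println(shouldBeFiltered(false, 1, 0, false));
    }
}
```

        Per its description, the one-argument overload corresponds to calling this sketch with areSingleAtomsFiltered set to true.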
      • shouldBePreprocessed

        public static boolean shouldBePreprocessed​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                            throws java.lang.NullPointerException
        Checks whether the given molecule, represented by an atom container, needs to be preprocessed before it is passed on to the ErtlFunctionalGroupsFinder.find() method because it is unconnected or contains charged atoms. This applies when strict input restrictions are turned on (they are turned off by default).
        It is advised to check via shouldBeFiltered() whether the given molecule should be discarded anyway before calling this function.
        Parameters:
        aMolecule - the atom container to check
        Returns:
        true if the given molecule needs to be preprocessed
        Throws:
        java.lang.NullPointerException - if parameter is 'null'
      • isValidArgumentForFindMethod

        public static boolean isValidArgumentForFindMethod​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                                    throws java.lang.NullPointerException
        Checks whether the given molecule, represented by an atom container, can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems when strict input restrictions are turned on (they are turned off by default).
        This method will return false if the molecule contains any metal, metalloid, pseudo, or charged atoms, contains multiple unconnected parts, or has an atom or bond count of zero.
        Parameters:
        aMolecule - the molecule to check
        Returns:
        true if the given molecule is a valid parameter for ErtlFunctionalGroupsFinder.find() method
        Throws:
        java.lang.NullPointerException - if parameter is 'null'
      • isValidArgumentForFindMethod

        public static boolean isValidArgumentForFindMethod​(org.openscience.cdk.interfaces.IAtomContainer aMolecule,
                                                           boolean areSingleAtomsFiltered)
                                                    throws java.lang.NullPointerException
        Checks whether the given molecule, represented by an atom container, can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems when strict input restrictions are turned on (they are turned off by default).
        This method will return false if the molecule contains any metal, metalloid, pseudo, or charged atoms, contains multiple unconnected parts, or has an atom or bond count of zero. If the second parameter is set to "false", single-atom molecules (bond count is 0) are accepted, i.e. they are considered valid if they fulfill the other requirements.
        Parameters:
        aMolecule - the molecule to check
        areSingleAtomsFiltered - if false, molecules with bond count 0 but atom count 1 will return true (do not filter)
        Returns:
        true if the given molecule is a valid parameter for ErtlFunctionalGroupsFinder.find() method
        Throws:
        java.lang.NullPointerException - if parameter is 'null'
      • selectBiggestUnconnectedComponent

        public static org.openscience.cdk.interfaces.IAtomContainer selectBiggestUnconnectedComponent​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                                                                               throws java.lang.NullPointerException
        Returns the biggest unconnected component/structure of the given atom container, judging by the atom count. To pre-check whether the atom container consists of multiple unconnected components, use isStructureUnconnected(). All set properties of aMolecule will be set as properties of the returned atom container.
        NOTE: The atom, bond, etc. objects of the given atom container are re-used in the returned atom container, but the given atom container itself remains unchanged.
        Iterates through all unconnected components in the given atom container, so the method runs in O(n) time, where n is the number of unconnected components.
        Parameters:
        aMolecule - the molecule whose biggest unconnected component should be found
        Returns:
        the biggest (judging by the atom count) unconnected component of the given atom container
        Throws:
        java.lang.NullPointerException - if aMolecule or the biggest component is null
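        The linear scan over the unconnected components can be sketched as below. Components are modelled as plain lists of element symbols, which is a simplification; in CDK they could, for example, be obtained with ConnectivityChecker.partitionIntoMolecules(). Keeping the first component in case of a tie is an assumption, not documented behaviour.

```java
import java.util.Arrays;
import java.util.List;

class BiggestComponentSketch {

    /** Returns the component with the most atoms, or null if there are no components. */
    static List<String> selectBiggest(List<List<String>> components) {
        List<String> biggest = null;
        // one pass over all components: O(n) in the number of components
        for (List<String> component : components) {
            if (biggest == null || component.size() > biggest.size()) {
                biggest = component;
            }
        }
        return biggest;
    }

    public static void main(String[] args) {
        // hypothetical salt: a sodium ion, an organic fragment, and a chloride ion
        List<List<String>> components = Arrays.asList(
                Arrays.asList("Na"),
                Arrays.asList("C", "C", "O"),
                Arrays.asList("Cl"));
        System.out.println(selectBiggest(components)); // prints "[C, C, O]"
    }
}
```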
      • neutralizeCharges

        public static void neutralizeCharges​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                      throws java.lang.NullPointerException,
                                             org.openscience.cdk.exception.CDKException
        Neutralizes charged atoms in the given atom container by zeroing the formal atomic charges and filling up free valences with implicit hydrogen atoms (according to the CDK atom types). This procedure allows a more general charge treatment than a pre-defined transformation list but may produce "wrong" structures, e.g. it turns a nitro NO2 group into a structure represented by the SMILES code "[H]O[N](=O)*" with an uncharged four-bonded nitrogen atom (other examples are "*[N](*)(*)*", "[C]#[N]*" or "*S(*)(*)*"). Thus, an improved charge neutralization scheme is desirable for future implementations.
        NOTE: This method changes major properties and the composition of the given IAtomContainer object! If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
        Iterates through all atoms in the given atom container, so the method runs in O(n) time, where n is the number of atoms.
        Parameters:
        aMolecule - the molecule to be neutralized
        Throws:
        java.lang.NullPointerException - if aMolecule or one of its atoms is 'null'
        org.openscience.cdk.exception.CDKException - if no matching atom type can be determined for one atom or there is a problem with adding the implicit hydrogen atoms.
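        To illustrate why this scheme can produce structures such as "[H]O[N](=O)*", here is a strongly simplified plain-Java sketch. It does not use CDK atom types; the Atom class and the small valence table are hypothetical stand-ins, and bondOrderSum is assumed to count only bonds to explicit neighbour atoms.

```java
import java.util.HashMap;
import java.util.Map;

class NeutralizeSketch {

    /** Hypothetical, minimal atom model (not a CDK IAtom). */
    static final class Atom {
        final String symbol;
        int formalCharge;
        final int bondOrderSum; // sum of bond orders to explicit neighbours
        int implicitHydrogens;

        Atom(String symbol, int formalCharge, int bondOrderSum) {
            this.symbol = symbol;
            this.formalCharge = formalCharge;
            this.bondOrderSum = bondOrderSum;
        }
    }

    // assumed default valences of neutral atoms; CDK atom types are far more detailed
    static final Map<String, Integer> NEUTRAL_VALENCE = new HashMap<>();
    static {
        NEUTRAL_VALENCE.put("C", 4);
        NEUTRAL_VALENCE.put("N", 3);
        NEUTRAL_VALENCE.put("O", 2);
    }

    /** Zeroes the formal charge and fills free valences with implicit hydrogens. */
    static void neutralize(Atom atom) {
        if (atom.formalCharge == 0) {
            return; // nothing to do for uncharged atoms
        }
        Integer valence = NEUTRAL_VALENCE.get(atom.symbol);
        if (valence == null) {
            throw new IllegalArgumentException("No valence known for " + atom.symbol);
        }
        atom.formalCharge = 0;
        // an over-bonded atom keeps all its bonds and simply receives no hydrogens
        atom.implicitHydrogens = Math.max(0, valence - atom.bondOrderSum);
    }

    public static void main(String[] args) {
        Atom nitroN = new Atom("N", +1, 4); // N+ with one single, one double, one single bond
        Atom nitroO = new Atom("O", -1, 1); // O- with one single bond
        neutralize(nitroN);
        neutralize(nitroO);
        System.out.println(nitroN.implicitHydrogens); // prints "0"
        System.out.println(nitroO.implicitHydrogens); // prints "1"
    }
}
```

        In this sketch the nitrogen keeps a bond order sum of four without a charge while the oxygen gains a hydrogen, which mirrors the uncharged four-bonded nitrogen described above.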
      • neutralizeCharges

        public static void neutralizeCharges​(org.openscience.cdk.interfaces.IAtom anAtom,
                                             org.openscience.cdk.interfaces.IAtomContainer aParentMolecule)
                                      throws java.lang.NullPointerException,
                                             org.openscience.cdk.exception.CDKException
        Neutralizes a charged atom in the given parent atom container by zeroing the formal atomic charge and filling up free valences with implicit hydrogen atoms (according to the CDK atom types).
        NOTE: This method changes major properties and the composition of the given IAtom and IAtomContainer objects! If you want to retain your objects unchanged for future calculations, use the IAtomContainer's clone() method.
        Parameters:
        anAtom - the atom to be neutralized
        aParentMolecule - the molecule the atom belongs to
        Throws:
        java.lang.NullPointerException - if anAtom or aParentMolecule is 'null'
        org.openscience.cdk.exception.CDKException - if the atom is not part of the molecule or no matching atom type can be determined for the atom or there is a problem with adding the implicit hydrogen atoms.
      • perceiveAtomTypesAndConfigureAtoms

        public static void perceiveAtomTypesAndConfigureAtoms​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                                       throws java.lang.NullPointerException,
                                                              org.openscience.cdk.exception.CDKException
        Convenience method to perceive atom types for all IAtoms in the IAtomContainer, using the CDK AtomContainerManipulator or rather the CDKAtomTypeMatcher. If the matcher finds a matching atom type, the IAtom will be configured to have the same properties as the IAtomType. If no matching atom type is found, no configuration is performed.
        Calling this method is equivalent to calling AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(aMolecule). It has been given its own method here because it is a necessary step in the preprocessing for ErtlFunctionalGroupsFinder.
        NOTE: This method changes major properties of the given IAtomContainer object! If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
        Parameters:
        aMolecule - the molecule to configure
        Throws:
        java.lang.NullPointerException - if aMolecule is 'null'
        org.openscience.cdk.exception.CDKException - if something goes wrong while going through the AtomType options
        See Also:
        AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(IAtomContainer), CDKAtomTypeMatcher.findMatchingAtomType(IAtomContainer, IAtom)
      • applyAromaticityDetection

        public static boolean applyAromaticityDetection​(org.openscience.cdk.interfaces.IAtomContainer aMolecule,
                                                        org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel)
                                                 throws java.lang.NullPointerException,
                                                        org.openscience.cdk.exception.CDKException
        Convenience method for applying the given aromaticity model to the given molecule. Any existing aromaticity flags are removed - even if no aromatic bonds were found. This follows the idea of applying an aromaticity model to a molecule such that the result is the same irrespective of existing aromatic flags.
        Calling this method is equivalent to calling apply(aMolecule) on the given Aromaticity model. It has been given its own method here because it is a necessary step in the preprocessing for ErtlFunctionalGroupsFinder.
        NOTE: This method changes major properties and the composition of the given IAtomContainer object! If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
        Parameters:
        aMolecule - the molecule to apply the model to
        anAromaticityModel - the model to apply; Note that the choice of electron donation model and cycle finder algorithm has a heavy influence on the functional group detection of ErtlFunctionalGroupsFinder
        Returns:
        true if the molecule (or parts of it) is determined to be aromatic
        Throws:
        java.lang.NullPointerException - if a parameter is 'null'
        org.openscience.cdk.exception.CDKException - if a problem occurred with the cycle perception (see CDK docs)
        See Also:
        Aromaticity.apply(IAtomContainer)
      • applyFiltersAndPreprocessing

        public static org.openscience.cdk.interfaces.IAtomContainer applyFiltersAndPreprocessing​(org.openscience.cdk.interfaces.IAtomContainer aMolecule,
                                                                                                 org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel)
                                                                                          throws java.lang.NullPointerException
        Checks whether the given molecule, represented by an atom container, should be filtered instead of being passed on to the ErtlFunctionalGroupsFinder.find() method and, if not, applies the necessary preprocessing steps. In the latter case, this method applies the preprocessing that is always needed (setting atom types and applying an aromaticity model) as well as the steps that are only needed in specific cases (selecting the biggest unconnected component, neutralizing charges). Molecules processed by this method can be passed on to find() without problems, even when strict input restrictions are turned on (they are turned off by default). Caution: The return value of this method is 'null' if the molecule should be filtered!
        NOTE: This method changes major properties and the composition of the given IAtomContainer object! If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
        NOTE2: The returned IAtomContainer object is the same as the one given as parameter!
        Parameters:
        aMolecule - the molecule to check and process
        anAromaticityModel - the aromaticity model to apply to the molecule in preprocessing; Note: The chosen ElectronDonation model can massively influence the extracted functional groups of a molecule when using ErtlFunctionalGroupsFinder!
        Returns:
        the preprocessed atom container or 'null' if the molecule should be discarded
        Throws:
        java.lang.NullPointerException - if a parameter is 'null'; Note: All other exceptions are caught and logged by this class' logger
      • applyFiltersAndPreprocessing

        public static org.openscience.cdk.interfaces.IAtomContainer applyFiltersAndPreprocessing​(org.openscience.cdk.interfaces.IAtomContainer aMolecule,
                                                                                                 org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel,
                                                                                                 boolean areSingleAtomsFiltered)
                                                                                          throws java.lang.NullPointerException
        Checks whether the given molecule, represented by an atom container, should be filtered instead of being passed on to the ErtlFunctionalGroupsFinder.find() method and, if not, applies the necessary preprocessing steps. In the latter case, this method applies the preprocessing that is always needed (setting atom types and applying an aromaticity model) as well as the steps that are only needed in specific cases (selecting the biggest unconnected component, neutralizing charges). Molecules processed by this method can be passed on to find() without problems, even when strict input restrictions are turned on (they are turned off by default). Caution: The return value of this method is 'null' if the molecule should be filtered!
        NOTE: This method changes major properties and the composition of the given IAtomContainer object! If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
        NOTE2: The returned IAtomContainer object is the same as the one given as parameter!
        Parameters:
        aMolecule - the molecule to check and process
        anAromaticityModel - the aromaticity model to apply to the molecule in preprocessing; Note: The chosen ElectronDonation model can massively influence the extracted functional groups of a molecule when using ErtlFunctionalGroupsFinder!
        areSingleAtomsFiltered - if false, molecules with bond count 0 but atom count 1 will be processed and not return null
        Returns:
        the preprocessed atom container or 'null' if the molecule should be discarded
        Throws:
        java.lang.NullPointerException - if a parameter is 'null'; Note: All other exceptions are caught and logged by this class' logger
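        The split between conditional and unconditional preprocessing steps can be sketched as a small planning function. It only lists the names of the utility methods that would apply to a molecule with the given characteristics; the order of the steps is an assumption, and the boolean inputs are hypothetical stand-ins for the results of shouldBeFiltered() and the checks for unconnected structures and charged atoms.

```java
import java.util.ArrayList;
import java.util.List;

class PreprocessingPlanSketch {

    /**
     * Lists the preprocessing steps for a molecule, or returns null if it should be
     * filtered (mirroring the documented 'null' return value).
     */
    static List<String> plannedSteps(boolean shouldBeFiltered,
                                     boolean isUnconnected,
                                     boolean isCharged) {
        if (shouldBeFiltered) {
            return null;
        }
        List<String> steps = new ArrayList<>();
        // steps that are only needed in specific cases
        if (isUnconnected) {
            steps.add("selectBiggestUnconnectedComponent");
        }
        if (isCharged) {
            steps.add("neutralizeCharges");
        }
        // steps that are always needed
        steps.add("perceiveAtomTypesAndConfigureAtoms");
        steps.add("applyAromaticityDetection");
        return steps;
    }

    public static void main(String[] args) {
        // a connected but charged molecule
        System.out.println(plannedSteps(false, false, true));
    }
}
```

        Because of the 'null' return value, callers should check the result before passing it on to find().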
      • restoreOriginalEnvironmentalCarbons

        public static void restoreOriginalEnvironmentalCarbons​(java.util.List<org.openscience.cdk.interfaces.IAtomContainer> aListOfFunctionalGroups,
                                                               org.openscience.cdk.interfaces.IAtomContainer aMolecule,
                                                               boolean aConvertExplicitHydrogens,
                                                               boolean aFillEmptyValences,
                                                               org.openscience.cdk.interfaces.IChemObjectBuilder aBuilder)
                                                        throws java.lang.NullPointerException,
                                                               java.lang.IllegalArgumentException
        Replaces the environmental carbon or pseudo-atoms (new IAtom objects) inserted by the EFGF in an identified functional group with the carbon IAtom objects from the original molecule object.
        Important note: This method only works if the atom container has not been cloned for the extraction of functional groups by ErtlFunctionalGroupsFinder. Use the method "List<IAtomContainer> find(IAtomContainer container, boolean clone)" with clone set to false for this purpose.
        Also note that the result differs if the environment has been generalized by the EFGF or not. In the former case, only environmental carbon atoms replaced by R-atoms in the generalized FG are restored.
        Parameters:
        aListOfFunctionalGroups - functional groups of the molecule identified by EFGF
        aMolecule - original structure in which the groups were identified
        aConvertExplicitHydrogens - should explicit hydrogen atoms in the functional groups be converted to implicit hydrogens
        aFillEmptyValences - should empty valences on the restored environmental carbon atoms be filled with implicit hydrogen atoms
        aBuilder - a chem object builder instance
        Throws:
        java.lang.NullPointerException - if a parameter is null
        java.lang.IllegalArgumentException - if one of the functional groups does not originate from the given molecule or the molecule has been cloned for the extraction of functional groups
      • createPseudoSmilesCode

        public static java.lang.String createPseudoSmilesCode​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                                       throws java.lang.NullPointerException,
                                                              org.openscience.cdk.exception.CDKException
        Gives the pseudo SMILES code for a given molecule / functional group. In this notation, aromatic atoms are marked by asterisks (*) and pseudo atoms are indicated by 'R'.
        The function generates the SMILES string of the given molecule using CDK's SmilesGenerator and then replaces lowercase c, n, o etc. by C*, N*, O* etc. and wildcards ('*') by 'R' in the resulting string. For that, the function iterates through all characters in the generated SMILES string.
        Note: All pseudo atoms or atoms that are represented by a wildcard ('*') in the generated SMILES string (e.g. the element [Uup] is interpreted by the CDK SmilesGenerator as a wildcard) are turned into an 'R' atom.
        Parameters:
        aMolecule - the molecule whose pseudo SMILES code to generate
        Returns:
        the pseudo SMILES representation as a string
        Throws:
        java.lang.NullPointerException - if aMolecule is 'null'
        org.openscience.cdk.exception.CDKException - if the SMILES code of aMolecule cannot be generated
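        The character-wise replacement described above can be sketched on a plain SMILES string as follows. This is not the library's code: it assumes that only c, n, o, s, and p occur as aromatic symbols, and it tracks square brackets so that the second letter of a bracketed two-letter element (for example the 'o' in [Co]) is left alone.

```java
class PseudoSmilesSketch {

    /** Converts a SMILES string into the pseudo SMILES notation described above. */
    static String toPseudoSmiles(String smiles) {
        StringBuilder result = new StringBuilder();
        boolean inBracket = false;
        for (int i = 0; i < smiles.length(); i++) {
            char ch = smiles.charAt(i);
            if (ch == '[') {
                inBracket = true;
            } else if (ch == ']') {
                inBracket = false;
            }
            // inside brackets, a lowercase letter after an uppercase one belongs to an element symbol
            boolean isSecondLetter = inBracket && i > 0
                    && Character.isUpperCase(smiles.charAt(i - 1));
            if (ch == '*') {
                result.append('R'); // wildcard / pseudo atom
            } else if ("cnosp".indexOf(ch) >= 0 && !isSecondLetter) {
                result.append(Character.toUpperCase(ch)).append('*'); // aromatic atom
            } else {
                result.append(ch);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPseudoSmiles("*c1ccccc1")); // prints "RC*1C*C*C*C*C*1"
    }
}
```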