Class ErtlFunctionalGroupsFinder


  • public class ErtlFunctionalGroupsFinder
    extends java.lang.Object
    Finds and extracts a molecule's functional groups in a purely rule-based manner. This class implements Peter Ertl's algorithm for the automated detection and extraction of functional groups in organic molecules ([Ertl P. An algorithm to identify functional groups in organic molecules. J Cheminform. 2017; 9:36.]) and has been described in a scientific publication ([Fritsch, S., Neumann, S., Schaub, J. et al. ErtlFunctionalGroupsFinder: automated rule-based functional group detection with the Chemistry Development Kit (CDK). J Cheminform. 2019; 11:37.]).

    In brief, the algorithm iterates through all atoms in the input molecule and marks heteroatoms and specific carbon atoms (among others, those in non-aromatic double or triple bonds) as being part of a functional group. Connected groups of marked atoms are extracted as separate functional groups, together with their unmarked, "environmental" carbon atoms. These environments can be important, e.g. to differentiate an alcohol from a phenol, but are less important in other cases. To account for this, Ertl also devised a "generalization" scheme that treats the functional group environments according to their varying significance; most environmental atoms are replaced with pseudo ("R") atoms in this step. All these functionalities are available in ErtlFunctionalGroupsFinder. Additionally, it is possible to extract only the marked atoms, without any of their environments.
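
    The mark-then-group idea described above can be illustrated with a small, library-independent toy. The sketch below is not the implementation of this class: it assumes a molecule given as plain arrays, applies only two simplified marking rules (every heteroatom, and both ends of any double or triple bond, with no aromatic bonds in the input), and leaves out environmental atoms and generalization entirely. The class name MarkAndGroupSketch and its method are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Toy illustration of the mark-then-group idea; not the library's implementation. */
class MarkAndGroupSketch {

    /**
     * @param symbols element symbol per atom index
     * @param bonds   each entry is {atomIndexA, atomIndexB, bondOrder}; no aromatic bonds assumed
     * @return groups of connected marked atoms, each as a sorted list of atom indices
     */
    static List<List<Integer>> findMarkedGroups(String[] symbols, int[][] bonds) {
        int atomCount = symbols.length;
        boolean[] marked = new boolean[atomCount];
        List<List<Integer>> neighbours = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            neighbours.add(new ArrayList<>());
            // Simplified rule 1: every heteroatom (neither carbon nor hydrogen) is marked.
            marked[i] = !symbols[i].equals("C") && !symbols[i].equals("H");
        }
        for (int[] bond : bonds) {
            neighbours.get(bond[0]).add(bond[1]);
            neighbours.get(bond[1]).add(bond[0]);
            // Simplified rule 2: both ends of a double or triple bond are marked.
            if (bond[2] >= 2) {
                marked[bond[0]] = true;
                marked[bond[1]] = true;
            }
        }
        // Collect connected components of marked atoms via breadth-first search.
        boolean[] visited = new boolean[atomCount];
        List<List<Integer>> groups = new ArrayList<>();
        for (int start = 0; start < atomCount; start++) {
            if (!marked[start] || visited[start]) {
                continue;
            }
            List<Integer> group = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            visited[start] = true;
            while (!queue.isEmpty()) {
                int current = queue.poll();
                group.add(current);
                for (int neighbour : neighbours.get(current)) {
                    if (marked[neighbour] && !visited[neighbour]) {
                        visited[neighbour] = true;
                        queue.add(neighbour);
                    }
                }
            }
            Collections.sort(group);
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        // 4-hydroxybutan-2-one, CC(=O)CCO, heavy atoms only.
        String[] symbols = {"C", "C", "O", "C", "C", "O"};
        int[][] bonds = {{0, 1, 1}, {1, 2, 2}, {1, 3, 1}, {3, 4, 1}, {4, 5, 1}};
        System.out.println(findMarkedGroups(symbols, bonds)); // prints [[1, 2], [5]]
    }
}
```

    In this toy input the carbonyl carbon and its oxygen form one group and the hydroxy oxygen forms a second one; the carbons in between stay unmarked and would only appear as environment in the real algorithm.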

    To apply functional group detection to an input molecule, its atom types need to be set and aromaticity needs to be detected beforehand:
     //Prepare input
     SmilesParser tmpSmiPar = new SmilesParser(SilentChemObjectBuilder.getInstance());
     IAtomContainer tmpInputMol = tmpSmiPar.parseSmiles("C[C@@H]1CN(C[C@H](C)N1)C2=C(C(=C3C(=C2F)N(C=C(C3=O)C(=O)O)C4CC4)N)F"); //PubChem CID 5257
     AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(tmpInputMol);
     Aromaticity tmpAromaticity = new Aromaticity(ElectronDonation.cdk(), Cycles.cdkAromaticSet());
     tmpAromaticity.apply(tmpInputMol);
     //Identify functional groups
     ErtlFunctionalGroupsFinder tmpEFGF = new ErtlFunctionalGroupsFinder(); //default: generalization turned on
     List<IAtomContainer> tmpFunctionalGroupsList = tmpEFGF.find(tmpInputMol);
     
    To restrict functional group identification to standardised, organic structures, ErtlFunctionalGroupsFinder can be configured to reject molecules that contain metal, metalloid, or pseudo (R) atoms or formal charges. Structures consisting of more than one unconnected component (e.g. ion and counter-ion) are likewise rejected, but only if the strict input restrictions are turned on (they are turned off by default). The restrictions are enabled via a boolean parameter in a variant of the central find() method. To identify molecules that need to be filtered from the input set or preprocessed in this use case, convenience methods are available in this class. Please note that structural features like formal charges and the others mentioned above are not expected to cause exceptions when processed by this class, but they are not explicitly considered by the Ertl algorithm and hence not by this implementation either. They might therefore cause unexpected behaviour in functional group identification. For example, a charge is not listed as a reason to mark a carbon atom.

    Note: this implementation is not thread-safe. Each parallel thread should have its own instance of this class.
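
    One common way to follow this advice is to hand out one instance per thread through a ThreadLocal. The sketch below shows only that generic pattern with a caller-supplied factory and no dependency on this class; the name PerThreadInstanceSketch is hypothetical. With the finder on the classpath, the factory would presumably be the default constructor, ErtlFunctionalGroupsFinder::new, which is an assumption and not something this documentation prescribes.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Supplier;

/** Generic one-instance-per-thread pattern; the worker type is supplied by the caller. */
class PerThreadInstanceSketch {

    /**
     * Starts the given number of threads, lets each obtain its own worker from a
     * ThreadLocal, and returns how many distinct worker instances were handed out.
     */
    static <T> int countDistinctWorkers(Supplier<T> workerFactory, int threadCount) {
        ThreadLocal<T> workerPerThread = ThreadLocal.withInitial(workerFactory);
        // Identity-based set so that equals()/hashCode() overrides do not merge instances.
        Set<T> seenWorkers = Collections.synchronizedSet(
                Collections.newSetFromMap(new IdentityHashMap<T, Boolean>()));
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                // Repeated get() calls on the same thread return the same instance.
                T first = workerPerThread.get();
                T second = workerPerThread.get();
                if (first != second) {
                    throw new IllegalStateException("ThreadLocal returned two instances");
                }
                seenWorkers.add(first);
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers", e);
            }
        }
        return seenWorkers.size();
    }
}
```

    Calling countDistinctWorkers(Object::new, 4) yields four distinct instances, one per thread, which is the property needed for a class that keeps mutable state between calls.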
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  ErtlFunctionalGroupsFinder.Mode
      Defines the mode for generalizing functional group environments (default), keeping them whole, or only extracting marked atoms.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String CARBONYL_C_MARKER
      Property name for marking carbonyl carbon atoms via IAtom properties.
      static org.openscience.cdk.tools.ILoggingTool LOGGING_TOOL
      CDK logging tool instance for this class.
      static java.util.Set<java.lang.Integer> NONMETAL_ATOMIC_NUMBERS
      Set of atomic numbers of nonmetal elements, namely hydrogen, carbon, nitrogen, oxygen, phosphorus, sulfur, selenium, halogens (fluorine, chlorine, bromine, iodine), and noble gases (helium, neon, argon, krypton, xenon, radon).
    • Method Summary

      Modifier and Type Method Description
      static void applyPreprocessing​(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel)
      Applies the always necessary preprocessing for functional group detection.
      static boolean containsChargedAtom​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Iterates through all atoms in the given molecule and checks whether they are charged.
      static boolean containsMetalMetalloidOrPseudoAtom​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Iterates through all atoms in the given molecule and checks them for metal, metalloid, and pseudo ("R") atoms.
      java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Find all functional groups in a molecule.
      java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find​(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldInputBeCloned)
      Find all functional groups in a molecule.
      java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find​(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldInputBeCloned, boolean anAreInputRestrictionsApplied)
      Find all functional groups in a molecule.
      ErtlFunctionalGroupsFinder.Mode getEnvMode()
      Returns the current setting for the treatment of functional group environments after extraction.
      static java.util.Set<java.lang.Integer> getNonmetalAtomicNumbers()
      Returns the unmodifiable set containing the atomic numbers of the elements accepted by ErtlFunctionalGroupsFinder.find() when input restrictions are enabled (they are turned off by default).
      static boolean isCharged​(org.openscience.cdk.interfaces.IAtom anAtom)
      Checks whether a given atom is charged.
      static boolean isMetalMetalloidOrPseudoAtom​(org.openscience.cdk.interfaces.IAtom anAtom)
      Checks whether a given atom is a metal, metalloid, or pseudo atom judging by its atomic number.
      static boolean isStructureUnconnected​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Checks whether the given molecule consists of two or more unconnected structures, e.g. ion and counter-ion.
      static boolean isValidInputMoleculeWithRestrictionsTurnedOn​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
      Checks whether the given molecule, represented by an atom container, can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems even when the input restrictions are turned on (they are turned off by default).
      static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderFullEnvironmentMode()
      Constructs a new ErtlFunctionalGroupsFinder instance with generalization of returned functional groups turned OFF.
      static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderGeneralizingMode()
      Constructs a new ErtlFunctionalGroupsFinder instance with generalization of returned functional groups turned ON.
      static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderOnlyMarkedAtomsMode()
      Constructs a new ErtlFunctionalGroupsFinder instance that extracts only the marked atoms of the functional groups, no attached environmental atoms.
      void setEnvMode​(ErtlFunctionalGroupsFinder.Mode anEnvMode)
      Allows setting the treatment of functional group environments after extraction.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOGGING_TOOL

        public static final org.openscience.cdk.tools.ILoggingTool LOGGING_TOOL
        CDK logging tool instance for this class. Use ErtlFunctionalGroupsFinder.LOGGING_TOOL.setLevel(ILoggingTool.DEBUG); to activate debug messages.
      • CARBONYL_C_MARKER

        public static final java.lang.String CARBONYL_C_MARKER
        Property name for marking carbonyl carbon atoms via IAtom properties.
        See Also:
        Constant Field Values
      • NONMETAL_ATOMIC_NUMBERS

        public static final java.util.Set<java.lang.Integer> NONMETAL_ATOMIC_NUMBERS
        Set of atomic numbers of nonmetal elements, namely hydrogen, carbon, nitrogen, oxygen, phosphorus, sulfur, selenium, halogens (fluorine, chlorine, bromine, iodine), and noble gases (helium, neon, argon, krypton, xenon, radon). Only atoms of these elements are accepted in the input molecule when the strict input restrictions are activated (they are turned off by default).
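
        For orientation, the element list above translates into the following atomic numbers. This is an illustrative reconstruction in plain Java, not the source of this field; the class and method names are hypothetical. The null handling mirrors the documented return value of isMetalMetalloidOrPseudoAtom, and the assumption that anything outside the set (including an atomic number of 0) counts as metal, metalloid, or pseudo atom follows from the description of getNonmetalAtomicNumbers().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative reconstruction of the nonmetal set from the documented element list. */
class NonmetalSetSketch {

    static final Set<Integer> NONMETALS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            1, 6, 7, 8, 15, 16, 34,   // H, C, N, O, P, S, Se
            9, 17, 35, 53,            // F, Cl, Br, I
            2, 10, 18, 36, 54, 86))); // He, Ne, Ar, Kr, Xe, Rn

    /** A missing (null) atomic number is treated like a metal, metalloid, or pseudo atom. */
    static boolean isMetalMetalloidOrPseudo(Integer atomicNumber) {
        return atomicNumber == null || !NONMETALS.contains(atomicNumber);
    }
}
```

        With this set, carbon (6) passes the check, while silicon (14), sodium (11), and a missing atomic number are all reported as not allowed.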
    • Constructor Detail

      • ErtlFunctionalGroupsFinder

        public ErtlFunctionalGroupsFinder()
        Default constructor for ErtlFunctionalGroupsFinder with functional group generalization turned ON.
      • ErtlFunctionalGroupsFinder

        public ErtlFunctionalGroupsFinder​(ErtlFunctionalGroupsFinder.Mode anEnvMode)
        Constructor for ErtlFunctionalGroupsFinder that allows setting the treatment of environments in the identified functional groups. Default: environments will be generalized; no generalization: environments will be kept whole; only marked atoms: no environmental atoms at all will be attached to the extracted functional groups.
        Parameters:
        anEnvMode - mode for treating functional group environments (see ErtlFunctionalGroupsFinder.Mode).
    • Method Detail

      • newErtlFunctionalGroupsFinderGeneralizingMode

        public static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderGeneralizingMode()
        Constructs a new ErtlFunctionalGroupsFinder instance with generalization of returned functional groups turned ON.
        Returns:
        new ErtlFunctionalGroupsFinder instance that generalizes returned functional groups
      • newErtlFunctionalGroupsFinderFullEnvironmentMode

        public static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderFullEnvironmentMode()
        Constructs a new ErtlFunctionalGroupsFinder instance with generalization of returned functional groups turned OFF. The functional groups will retain their full environments.
        Returns:
        new ErtlFunctionalGroupsFinder instance that does NOT generalize returned functional groups
      • newErtlFunctionalGroupsFinderOnlyMarkedAtomsMode

        public static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderOnlyMarkedAtomsMode()
        Constructs a new ErtlFunctionalGroupsFinder instance that extracts only the marked atoms of the functional groups, no attached environmental atoms.
        Returns:
        new ErtlFunctionalGroupsFinder instance that extracts only marked atoms
      • setEnvMode

        public void setEnvMode​(ErtlFunctionalGroupsFinder.Mode anEnvMode)
        Allows setting the treatment of functional group environments after extraction. Default: environments will be generalized; no generalization: environments will be kept whole; only marked atoms: no environmental atoms at all will be attached to the extracted functional groups.
        Parameters:
        anEnvMode - mode for treating functional group environments (see ErtlFunctionalGroupsFinder.Mode).
      • getEnvMode

        public ErtlFunctionalGroupsFinder.Mode getEnvMode()
        Returns the current setting for the treatment of functional group environments after extraction.
        Returns:
        currently set environment mode
      • find

        public java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                                                           throws java.lang.CloneNotSupportedException
        Find all functional groups in a molecule. The input atom container instance is cloned before processing to leave the input container intact.

        Note: The strict input restrictions from previous versions (no charged atoms, metals, metalloids or unconnected components) do not apply anymore by default. They can be turned on again in another variant of this method below.

        Parameters:
        aMolecule - the molecule to identify functional groups in
        Returns:
        a list with all functional groups found in the molecule
        Throws:
        java.lang.CloneNotSupportedException - if cloning is not possible
      • find

        public java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find​(org.openscience.cdk.interfaces.IAtomContainer aMolecule,
                                                                                  boolean aShouldInputBeCloned)
                                                                           throws java.lang.CloneNotSupportedException
        Find all functional groups in a molecule.

        Note: The strict input restrictions from previous versions (no charged atoms, metals, metalloids or unconnected components) do not apply anymore by default. They can be turned on again in another variant of this method below.

        Parameters:
        aMolecule - the molecule to identify functional groups in
        aShouldInputBeCloned - use 'false' to reuse the input container's bonds and atoms in the extraction of the functional groups; this may speed up the extraction and lower the memory consumption for processing large amounts of data but corrupts the original input container; use 'true' to work with a clone and leave the input container intact
        Returns:
        a list with all functional groups found in the molecule
        Throws:
        java.lang.CloneNotSupportedException - if cloning is not possible
      • find

        public java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find​(org.openscience.cdk.interfaces.IAtomContainer aMolecule,
                                                                                  boolean aShouldInputBeCloned,
                                                                                  boolean anAreInputRestrictionsApplied)
                                                                           throws java.lang.CloneNotSupportedException,
                                                                                  java.lang.IllegalArgumentException
        Find all functional groups in a molecule.
        Parameters:
        aMolecule - the molecule to identify functional groups in
        aShouldInputBeCloned - use 'false' to reuse the input container's bonds and atoms in the extraction of the functional groups; this may speed up the extraction and lower the memory consumption for processing large amounts of data but corrupts the original input container; use 'true' to work with a clone and leave the input container intact
        anAreInputRestrictionsApplied - if true, the input must consist of one connected structure and may not contain charged atoms, metals or metalloids; an IllegalArgumentException will be thrown otherwise; see convenience methods in this class for detecting illegal input structures for this case
        Returns:
        a list with all functional groups found in the molecule
        Throws:
        java.lang.CloneNotSupportedException - if cloning is not possible
        java.lang.IllegalArgumentException - if input restrictions are applied and the given molecule does not fulfill them
      • applyPreprocessing

        public static void applyPreprocessing​(org.openscience.cdk.interfaces.IAtomContainer aMolecule,
                                              org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel)
                                       throws java.lang.NullPointerException,
                                              java.lang.IllegalArgumentException
        Applies the always necessary preprocessing for functional group detection. Atom types are set and aromaticity detected in the input molecule.
        NOTE: This changes properties and flags in the given atom container instance. If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
        Parameters:
        aMolecule - the molecule to process
        anAromaticityModel - the aromaticity model to apply to the molecule in preprocessing; note: the chosen ElectronDonation model can massively influence the functional groups extracted from a molecule by ErtlFunctionalGroupsFinder!
        Throws:
        java.lang.NullPointerException - if any parameter is null
        java.lang.IllegalArgumentException - if the input molecule causes any other type of exception while processing
      • getNonmetalAtomicNumbers

        public static java.util.Set<java.lang.Integer> getNonmetalAtomicNumbers()
        Returns the unmodifiable set containing the atomic numbers of the elements accepted by ErtlFunctionalGroupsFinder.find() when input restrictions are enabled (they are turned off by default). These nonmetal elements include hydrogen, carbon, nitrogen, oxygen, phosphorus, sulfur, selenium, halogens (fluorine, chlorine, bromine, iodine), and noble gases (helium, neon, argon, krypton, xenon, radon). All other atomic numbers represent metal, metalloid, or pseudo ('R') atoms.
        Convenience method analogous to using ErtlFunctionalGroupsFinder.NONMETAL_ATOMIC_NUMBERS directly.
        Returns:
        all valid atomic numbers for ErtlFunctionalGroupsFinder.find() if input restrictions are activated
      • isMetalMetalloidOrPseudoAtom

        public static boolean isMetalMetalloidOrPseudoAtom​(org.openscience.cdk.interfaces.IAtom anAtom)
                                                    throws java.lang.NullPointerException
        Checks whether a given atom is a metal, metalloid, or pseudo atom judging by its atomic number. Such atoms cannot be passed on to ErtlFunctionalGroupsFinder.find() when input restrictions are enabled (they are turned off by default).
        Parameters:
        anAtom - the atom to check
        Returns:
        true, if the atomic number is 'null' or not in the nonmetal atomic numbers set
        Throws:
        java.lang.NullPointerException - if the given atom is 'null'
      • containsMetalMetalloidOrPseudoAtom

        public static boolean containsMetalMetalloidOrPseudoAtom​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                                          throws java.lang.NullPointerException
        Iterates through all atoms in the given molecule and checks them for metal, metalloid, and pseudo ("R") atoms. If this method returns 'true', the molecule cannot be passed on to ErtlFunctionalGroupsFinder.find() when input restrictions are enabled (they are turned off by default). If you are using the strict input restrictions to only identify functional groups in standardised, organic structures, you should filter the molecules where this method returns true from your input set.
        This method scales linearly, O(n), where n is the number of atoms in the given molecule.
        Parameters:
        aMolecule - the molecule to check
        Returns:
        true, if the molecule contains one or more metal, metalloid, or pseudo ("R") atoms
        Throws:
        java.lang.NullPointerException - if the given molecule (or one of its atoms) is 'null'
      • isCharged

        public static boolean isCharged​(org.openscience.cdk.interfaces.IAtom anAtom)
                                 throws java.lang.NullPointerException
        Checks whether a given atom is charged. Such atoms cannot be passed on to ErtlFunctionalGroupsFinder.find() when input restrictions are enabled (they are turned off by default).
        Parameters:
        anAtom - the atom to check
        Returns:
        true, if the atom is charged
        Throws:
        java.lang.NullPointerException - if the given atom is 'null'
      • containsChargedAtom

        public static boolean containsChargedAtom​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                           throws java.lang.NullPointerException
        Iterates through all atoms in the given molecule and checks whether they are charged. If this method returns 'true', the molecule cannot be passed on to ErtlFunctionalGroupsFinder.find() when input restrictions are enabled (they are turned off by default). If you are using the strict input restrictions to only identify functional groups in standardised, organic structures, you can try to neutralise the charges in the molecules where this method returns true by standardisation routines.
        This method scales linearly, O(n), where n is the number of atoms in the given molecule.
        Parameters:
        aMolecule - the molecule to check
        Returns:
        true, if the molecule contains one or more charged atoms
        Throws:
        java.lang.NullPointerException - if the given molecule is 'null'
      • isStructureUnconnected

        public static boolean isStructureUnconnected​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                              throws java.lang.NullPointerException
        Checks whether the given molecule consists of two or more unconnected structures, e.g. ion and counter-ion. This would make it unfit to be passed to ErtlFunctionalGroupsFinder.find() when the input restrictions are turned on (they are turned off by default). If you are using the strict input restrictions to only identify functional groups in standardised, organic structures, you can try to select the biggest connected component in the input atom containers where this method returns true and only pass that to ErtlFunctionalGroupsFinder. Note: this is a convenience method that essentially returns the negation of ConnectivityChecker.isConnected(aMolecule).
        Parameters:
        aMolecule - the molecule to check
        Returns:
        true, if the molecule consists of two or more unconnected structures
        Throws:
        java.lang.NullPointerException - if the given molecule is 'null'
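
        The suggestion to keep only the biggest connected component can be sketched without any chemistry toolkit. The example below is a toy on a plain bond list, using union-find to group atom indices; it is not how this class or the CDK performs the check, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy connectivity helpers on a plain bond list; not based on CDK classes. */
class LargestComponentSketch {

    /** Union-find root lookup with path halving. */
    private static int findRoot(int[] parent, int atom) {
        while (parent[atom] != atom) {
            parent[atom] = parent[parent[atom]];
            atom = parent[atom];
        }
        return atom;
    }

    /** Groups atom indices 0..atomCount-1 into connected components; each bond is {a, b}. */
    static List<List<Integer>> components(int atomCount, int[][] bonds) {
        int[] parent = new int[atomCount];
        for (int i = 0; i < atomCount; i++) {
            parent[i] = i;
        }
        for (int[] bond : bonds) {
            int rootA = findRoot(parent, bond[0]);
            int rootB = findRoot(parent, bond[1]);
            parent[rootA] = rootB;
        }
        // Insertion order keeps components sorted by their lowest atom index.
        Map<Integer, List<Integer>> atomsByRoot = new LinkedHashMap<>();
        for (int i = 0; i < atomCount; i++) {
            atomsByRoot.computeIfAbsent(findRoot(parent, i), root -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(atomsByRoot.values());
    }

    /** True if the structure falls apart into two or more components. */
    static boolean isUnconnected(int atomCount, int[][] bonds) {
        return components(atomCount, bonds).size() > 1;
    }

    /** Returns the component with the most atoms (first one wins on ties), or an empty list. */
    static List<Integer> largestComponent(int atomCount, int[][] bonds) {
        List<Integer> largest = new ArrayList<>();
        for (List<Integer> component : components(atomCount, bonds)) {
            if (component.size() > largest.size()) {
                largest = component;
            }
        }
        return largest;
    }
}
```

        For sodium acetate written with heavy atoms only (atoms 0 to 3 for the acetate, atom 4 for sodium, bonds {0,1}, {1,2}, {1,3}), isUnconnected reports true and largestComponent returns the four acetate atoms.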
      • isValidInputMoleculeWithRestrictionsTurnedOn

        public static boolean isValidInputMoleculeWithRestrictionsTurnedOn​(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
                                                                    throws java.lang.NullPointerException,
                                                                           java.lang.IllegalArgumentException
        Checks whether the given molecule, represented by an atom container, can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems even when the input restrictions are turned on (they are turned off by default).
        This method will return false if the molecule contains any metal, metalloid, pseudo, or charged atoms or consists of multiple unconnected parts. Some of these issues (charges and multiple unconnected components) can be solved by respective standardisation routines.
        Parameters:
        aMolecule - the molecule to check
        Returns:
        true, if the given molecule is a valid parameter for the ErtlFunctionalGroupsFinder.find() method when the input restrictions are turned on (they are turned off by default)
        Throws:
        java.lang.NullPointerException - if parameter is 'null'
        java.lang.IllegalArgumentException - if the input molecule causes any other type of exception while processing