Package org.openscience.cdk.tools
Class ErtlFunctionalGroupsFinder
java.lang.Object
    org.openscience.cdk.tools.ErtlFunctionalGroupsFinder
public class ErtlFunctionalGroupsFinder extends java.lang.Object
Finds and extracts a molecule's functional groups in a purely rule-based manner. This class implements Peter Ertl's algorithm for the automated detection and extraction of functional groups in organic molecules ([Ertl P. An algorithm to identify functional groups in organic molecules. J Cheminform. 2017; 9:36.]) and has been described in a scientific publication ([Fritsch, S., Neumann, S., Schaub, J. et al. ErtlFunctionalGroupsFinder: automated rule-based functional group detection with the Chemistry Development Kit (CDK). J Cheminform. 2019; 11:37.]).
In brief, the algorithm iterates through all atoms in the input molecule and marks hetero atoms and specific carbon atoms (among others, those in non-aromatic double or triple bonds) as being part of a functional group. Connected groups of marked atoms are extracted as separate functional groups, together with their unmarked, "environmental" carbon atoms. These environments can be important, e.g. to differentiate an alcohol from a phenol, but are less important in other cases. To account for this, Ertl also devised a "generalization" scheme that generalizes the functional group environments in a way that accounts for their varying significance in different cases. In this scheme, most environmental atoms are exchanged with pseudo ("R") atoms. All these functionalities are available in ErtlFunctionalGroupsFinder. Additionally, the marked atoms alone, without any of their environmental atoms, can be extracted.
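The extraction step described above (connected groups of marked atoms, each with its unmarked neighbours) can be illustrated without CDK. The following is a minimal sketch, not the code of this class: the class name MarkedAtomGroupingSketch, the Group container, and the representation of a molecule as index pairs are all hypothetical, the marking itself is taken as given input, and hydrogens are assumed to be implicit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only; this is NOT the CDK implementation.
final class MarkedAtomGroupingSketch {

    /** Hypothetical container: indices of marked atoms and of their unmarked neighbours. */
    static final class Group {
        final TreeSet<Integer> markedAtoms = new TreeSet<>();
        final TreeSet<Integer> environmentAtoms = new TreeSet<>();
    }

    /** Collects each connected set of marked atoms together with its directly bonded unmarked atoms. */
    static List<Group> extractGroups(int[][] bonds, boolean[] marked) {
        int atomCount = marked.length;
        List<List<Integer>> adjacency = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            adjacency.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            adjacency.get(bond[0]).add(bond[1]);
            adjacency.get(bond[1]).add(bond[0]);
        }
        boolean[] visited = new boolean[atomCount];
        List<Group> groups = new ArrayList<>();
        for (int start = 0; start < atomCount; start++) {
            if (!marked[start] || visited[start]) {
                continue;
            }
            Group group = new Group();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            visited[start] = true;
            // Breadth-first search that only walks across marked atoms.
            while (!queue.isEmpty()) {
                int current = queue.poll();
                group.markedAtoms.add(current);
                for (int neighbour : adjacency.get(current)) {
                    if (!marked[neighbour]) {
                        group.environmentAtoms.add(neighbour);
                    } else if (!visited[neighbour]) {
                        visited[neighbour] = true;
                        queue.add(neighbour);
                    }
                }
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Glycine heavy atoms: N0-C1-C2(=O3)-O4; the hetero atoms and the carbonyl carbon are marked.
        int[][] bonds = {{0, 1}, {1, 2}, {2, 3}, {2, 4}};
        boolean[] marked = {true, false, true, true, true};
        for (Group group : extractGroups(bonds, marked)) {
            System.out.println("marked " + group.markedAtoms + ", environment " + group.environmentAtoms);
        }
    }
}
```

In this toy input the amine nitrogen and the carboxylic acid atoms end up in two separate groups that share the same environmental carbon (index 1). Generalization of environments is not covered by the sketch.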
To apply functional group detection to an input molecule, its atom types need to be set and aromaticity needs to be detected beforehand:

    //Prepare input
    SmilesParser tmpSmiPar = new SmilesParser(SilentChemObjectBuilder.getInstance());
    IAtomContainer tmpInputMol = tmpSmiPar.parseSmiles("C[C@@H]1CN(C[C@H](C)N1)C2=C(C(=C3C(=C2F)N(C=C(C3=O)C(=O)O)C4CC4)N)F"); //PubChem CID 5257
    AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(tmpInputMol);
    Aromaticity tmpAromaticity = new Aromaticity(ElectronDonation.cdk(), Cycles.cdkAromaticSet());
    tmpAromaticity.apply(tmpInputMol);
    //Identify functional groups
    ErtlFunctionalGroupsFinder tmpEFGF = new ErtlFunctionalGroupsFinder(); //default: generalization turned on
    List<IAtomContainer> tmpFunctionalGroupsList = tmpEFGF.find(tmpInputMol);

In order to only identify functional groups in standardised, organic structures, ErtlFunctionalGroupsFinder can be configured to only accept molecules that do *not* contain any metal, metalloid, or pseudo (R) atoms or formal charges. Structures consisting of more than one unconnected component (e.g. ion and counter-ion) are also not accepted if(!) the strict input restrictions are turned on (they are turned off by default). The restrictions can be turned on via a boolean parameter in a variant of the central find() method. To identify molecules that need to be filtered from the input set or preprocessed in this use case, convenience methods are available in this class. Please note that structural properties like formal charges and the others mentioned above are not expected to cause issues (exceptions) when processed by this class, but they are not explicitly considered by the Ertl algorithm and hence not by this implementation either. They might therefore cause unexpected behaviour in functional group identification. For example, a charge is not listed as a reason to mark a carbon atom.
Note: this implementation is not thread-safe. Each parallel thread should have its own instance of this class.
-
-
Nested Class Summary
static class ErtlFunctionalGroupsFinder.Mode
    Defines the mode for generalizing functional group environments (default), keeping them whole, or only extracting marked atoms.
-
Field Summary
static java.lang.String CARBONYL_C_MARKER
    Property name for marking carbonyl carbon atoms via IAtom properties.
static org.openscience.cdk.tools.ILoggingTool LOGGING_TOOL
    CDK logging tool instance for this class.
static java.util.Set<java.lang.Integer> NONMETAL_ATOMIC_NUMBERS
    Set of atomic numbers of nonmetal elements, namely hydrogen, carbon, nitrogen, oxygen, phosphorus, sulfur, selenium, halogens (fluorine, chlorine, bromine, iodine), and noble gases (helium, neon, argon, krypton, xenon, radon).
-
Constructor Summary
ErtlFunctionalGroupsFinder()
    Default constructor for ErtlFunctionalGroupsFinder with functional group generalization turned ON.
ErtlFunctionalGroupsFinder(ErtlFunctionalGroupsFinder.Mode anEnvMode)
    Constructor for ErtlFunctionalGroupsFinder that allows setting the treatment of environments in the identified functional groups.
-
Method Summary
static void applyPreprocessing(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel)
    Applies the always necessary preprocessing for functional group detection.
static boolean containsChargedAtom(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
    Iterates through all atoms in the given molecule and checks whether they are charged.
static boolean containsMetalMetalloidOrPseudoAtom(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
    Iterates through all atoms in the given molecule and checks them for metal, metalloid, and pseudo ("R") atoms.
java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
    Find all functional groups in a molecule.
java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldInputBeCloned)
    Find all functional groups in a molecule.
java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldInputBeCloned, boolean anAreInputRestrictionsApplied)
    Find all functional groups in a molecule.
ErtlFunctionalGroupsFinder.Mode getEnvMode()
    Returns the current setting for the treatment of functional group environments after extraction.
static java.util.Set<java.lang.Integer> getNonmetalAtomicNumbers()
    Returns the unmodifiable set containing the atomic numbers that can be passed on to ErtlFunctionalGroupsFinder.find() if(!) input restrictions are enabled (turned off by default).
static boolean isCharged(org.openscience.cdk.interfaces.IAtom anAtom)
    Checks whether a given atom is charged.
static boolean isMetalMetalloidOrPseudoAtom(org.openscience.cdk.interfaces.IAtom anAtom)
    Checks whether a given atom is a metal, metalloid, or pseudo atom judging by its atomic number.
static boolean isStructureUnconnected(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
    Checks whether the given molecule consists of two or more unconnected structures, e.g. ion and counter-ion.
static boolean isValidInputMoleculeWithRestrictionsTurnedOn(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
    Checks whether the given molecule represented by an atom container can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems even if(!) the input restrictions are turned on (turned off by default).
static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderFullEnvironmentMode()
    Constructs a new ErtlFunctionalGroupsFinder instance with generalization of returned functional groups turned OFF.
static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderGeneralizingMode()
    Constructs a new ErtlFunctionalGroupsFinder instance with generalization of returned functional groups turned ON.
static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderOnlyMarkedAtomsMode()
    Constructs a new ErtlFunctionalGroupsFinder instance that extracts only the marked atoms of the functional groups, no attached environmental atoms.
void setEnvMode(ErtlFunctionalGroupsFinder.Mode anEnvMode)
    Allows setting the treatment of functional group environments after extraction.
-
-
-
Field Detail
-
LOGGING_TOOL
public static final org.openscience.cdk.tools.ILoggingTool LOGGING_TOOL
CDK logging tool instance for this class. Use ErtlFunctionalGroupsFinder.LOGGING_TOOL.setLevel(ILoggingTool.DEBUG); to activate debug messages.
-
CARBONYL_C_MARKER
public static final java.lang.String CARBONYL_C_MARKER
Property name for marking carbonyl carbon atoms via IAtom properties.
See Also:
    Constant Field Values
-
NONMETAL_ATOMIC_NUMBERS
public static final java.util.Set<java.lang.Integer> NONMETAL_ATOMIC_NUMBERS
Set of atomic numbers of nonmetal elements, namely hydrogen, carbon, nitrogen, oxygen, phosphorus, sulfur, selenium, halogens (fluorine, chlorine, bromine, iodine), and noble gases (helium, neon, argon, krypton, xenon, radon). Atoms of these elements are exclusively accepted in the input molecule if(!) the strict input restrictions are activated (turned off by default).
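As a rough illustration of what such an element filter looks like, the sketch below rebuilds the listed elements as a plain Java set and tests atomic numbers against it. It is an independent copy for demonstration, not the constant of this class; the names NonmetalSetSketch, NONMETALS, and isAcceptedElement are hypothetical, and the use of 0 for a pseudo atom is an assumption of the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; this is NOT the CDK constant or check.
final class NonmetalSetSketch {

    /** Atomic numbers of the nonmetals listed in the text above. */
    static final Set<Integer> NONMETALS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            1,                          // hydrogen
            6, 7, 8,                    // carbon, nitrogen, oxygen
            15, 16, 34,                 // phosphorus, sulfur, selenium
            9, 17, 35, 53,              // fluorine, chlorine, bromine, iodine
            2, 10, 18, 36, 54, 86)));   // helium, neon, argon, krypton, xenon, radon

    /** Returns true if the atomic number is set and belongs to one of the listed nonmetals. */
    static boolean isAcceptedElement(Integer atomicNumber) {
        return atomicNumber != null && NONMETALS.contains(atomicNumber);
    }

    public static void main(String[] args) {
        System.out.println(isAcceptedElement(6));   // carbon
        System.out.println(isAcceptedElement(14));  // silicon, a metalloid
        System.out.println(isAcceptedElement(26));  // iron, a metal
        System.out.println(isAcceptedElement(0));   // assumed here to stand for a pseudo atom
    }
}
```

An unset (null) atomic number is rejected as well, which mirrors the return value documented for isMetalMetalloidOrPseudoAtom.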
-
-
Constructor Detail
-
ErtlFunctionalGroupsFinder
public ErtlFunctionalGroupsFinder()
Default constructor for ErtlFunctionalGroupsFinder with functional group generalization turned ON.
-
ErtlFunctionalGroupsFinder
public ErtlFunctionalGroupsFinder(ErtlFunctionalGroupsFinder.Mode anEnvMode)
Constructor for ErtlFunctionalGroupsFinder that allows setting the treatment of environments in the identified functional groups. Default: environments will be generalized; no generalization: environments will be kept as whole; only marked atoms: no environmental atoms whatsoever will be attached to the extracted functional groups.
Parameters:
    anEnvMode - mode for treating functional group environments (see ErtlFunctionalGroupsFinder.Mode).
-
-
Method Detail
-
newErtlFunctionalGroupsFinderGeneralizingMode
public static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderGeneralizingMode()
Constructs a new ErtlFunctionalGroupsFinder instance with generalization of returned functional groups turned ON.
Returns:
    new ErtlFunctionalGroupsFinder instance that generalizes returned functional groups
-
newErtlFunctionalGroupsFinderFullEnvironmentMode
public static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderFullEnvironmentMode()
Constructs a new ErtlFunctionalGroupsFinder instance with generalization of returned functional groups turned OFF. The functional groups will keep their full environments.
Returns:
    new ErtlFunctionalGroupsFinder instance that does NOT generalize returned functional groups
-
newErtlFunctionalGroupsFinderOnlyMarkedAtomsMode
public static ErtlFunctionalGroupsFinder newErtlFunctionalGroupsFinderOnlyMarkedAtomsMode()
Constructs a new ErtlFunctionalGroupsFinder instance that extracts only the marked atoms of the functional groups, no attached environmental atoms.
Returns:
    new ErtlFunctionalGroupsFinder instance that extracts only marked atoms
-
setEnvMode
public void setEnvMode(ErtlFunctionalGroupsFinder.Mode anEnvMode)
Allows setting the treatment of functional group environments after extraction. Default: environments will be generalized; no generalization: environments will be kept as whole; only marked atoms: no environmental atoms whatsoever will be attached to the extracted functional groups.
Parameters:
    anEnvMode - mode for treating functional group environments (see ErtlFunctionalGroupsFinder.Mode).
-
getEnvMode
public ErtlFunctionalGroupsFinder.Mode getEnvMode()
Returns the current setting for the treatment of functional group environments after extraction.
Returns:
    currently set environment mode
-
find
public java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.CloneNotSupportedException
Find all functional groups in a molecule. The input atom container instance is cloned before processing to leave the input container intact.
Note: The strict input restrictions from previous versions (no charged atoms, metals, metalloids or unconnected components) do not apply anymore by default. They can be turned on again in another variant of this method below.
Parameters:
    aMolecule - the molecule to identify functional groups in
Returns:
    a list with all functional groups found in the molecule
Throws:
    java.lang.CloneNotSupportedException - if cloning is not possible
-
find
public java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldInputBeCloned) throws java.lang.CloneNotSupportedException
Find all functional groups in a molecule.
Note: The strict input restrictions from previous versions (no charged atoms, metals, metalloids or unconnected components) do not apply anymore by default. They can be turned on again in another variant of this method below.
Parameters:
    aMolecule - the molecule to identify functional groups in
    aShouldInputBeCloned - use 'false' to reuse the input container's bonds and atoms in the extraction of the functional groups; this may speed up the extraction and lower the memory consumption for processing large amounts of data but corrupts the original input container; use 'true' to work with a clone and leave the input container intact
Returns:
    a list with all functional groups found in the molecule
Throws:
    java.lang.CloneNotSupportedException - if cloning is not possible
-
find
public java.util.List<org.openscience.cdk.interfaces.IAtomContainer> find(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldInputBeCloned, boolean anAreInputRestrictionsApplied) throws java.lang.CloneNotSupportedException, java.lang.IllegalArgumentException
Find all functional groups in a molecule.
Parameters:
    aMolecule - the molecule to identify functional groups in
    aShouldInputBeCloned - use 'false' to reuse the input container's bonds and atoms in the extraction of the functional groups; this may speed up the extraction and lower the memory consumption for processing large amounts of data but corrupts the original input container; use 'true' to work with a clone and leave the input container intact
    anAreInputRestrictionsApplied - if true, the input must consist of one connected structure and may not contain charged atoms, metals or metalloids; an IllegalArgumentException will be thrown otherwise; see convenience methods in this class for detecting illegal input structures for this case
Returns:
    a list with all functional groups found in the molecule
Throws:
    java.lang.CloneNotSupportedException - if cloning is not possible
    java.lang.IllegalArgumentException - if input restrictions are applied and the given molecule does not fulfill them
-
applyPreprocessing
public static void applyPreprocessing(org.openscience.cdk.interfaces.IAtomContainer aMolecule, org.openscience.cdk.aromaticity.Aromaticity anAromaticityModel) throws java.lang.NullPointerException, java.lang.IllegalArgumentException
Applies the always necessary preprocessing for functional group detection. Atom types are set and aromaticity detected in the input molecule.
NOTE: This changes properties and flags in the given atom container instance. If you want to retain your object unchanged for future calculations, use the IAtomContainer's clone() method.
Parameters:
    aMolecule - the molecule to process
    anAromaticityModel - the aromaticity model to apply to the molecule in preprocessing; Note: The chosen ElectronDonation model can massively influence the extracted functional groups of a molecule when using ErtlFunctionalGroupsFinder!
Throws:
    java.lang.NullPointerException - if any parameter is null
    java.lang.IllegalArgumentException - if the input molecule causes any other type of exception while processing
-
getNonmetalAtomicNumbers
public static java.util.Set<java.lang.Integer> getNonmetalAtomicNumbers()
Returns the unmodifiable set containing the atomic numbers that can be passed on to ErtlFunctionalGroupsFinder.find() if(!) input restrictions are enabled (turned off by default). These nonmetal elements include hydrogen, carbon, nitrogen, oxygen, phosphorus, sulfur, selenium, halogens (fluorine, chlorine, bromine, iodine), and noble gases (helium, neon, argon, krypton, xenon, radon). All other atomic numbers represent metal, metalloid, or pseudo ('R') atoms.
Convenience method analogous to using ErtlFunctionalGroupsFinder.NONMETAL_ATOMIC_NUMBERS directly.
Returns:
    all valid atomic numbers for ErtlFunctionalGroupsFinder.find() if input restrictions are activated
-
isMetalMetalloidOrPseudoAtom
public static boolean isMetalMetalloidOrPseudoAtom(org.openscience.cdk.interfaces.IAtom anAtom) throws java.lang.NullPointerException
Checks whether a given atom is a metal, metalloid, or pseudo atom judging by its atomic number. These atoms cannot be passed on to ErtlFunctionalGroupsFinder.find() if(!) input restrictions are enabled (turned off by default).
Parameters:
    anAtom - the atom to check
Returns:
    true, if the atomic number is not in the nonmetal atomic numbers set or 'null'
Throws:
    java.lang.NullPointerException - if the given atom is 'null'
-
containsMetalMetalloidOrPseudoAtom
public static boolean containsMetalMetalloidOrPseudoAtom(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException
Iterates through all atoms in the given molecule and checks them for metal, metalloid, and pseudo ("R") atoms. If this method returns 'true', the molecule cannot be passed on to ErtlFunctionalGroupsFinder.find() if(!) input restrictions are enabled (turned off by default). If you are using the strict input restrictions to only identify functional groups in standardised, organic structures, you should filter the molecules where this method returns true from your input set.
This method runs in O(n) time, where n is the number of atoms in the given molecule.
Parameters:
    aMolecule - the molecule to check
Returns:
    true, if the molecule contains one or more metal, metalloid, or pseudo ("R") atoms
Throws:
    java.lang.NullPointerException - if the given molecule (or one of its atoms) is 'null'
-
isCharged
public static boolean isCharged(org.openscience.cdk.interfaces.IAtom anAtom) throws java.lang.NullPointerException
Checks whether a given atom is charged. These atoms cannot be passed on to ErtlFunctionalGroupsFinder.find() if(!) input restrictions are enabled (turned off by default).
Parameters:
    anAtom - the atom to check
Returns:
    true, if the atom is charged
Throws:
    java.lang.NullPointerException - if the given atom is 'null'
-
containsChargedAtom
public static boolean containsChargedAtom(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException
Iterates through all atoms in the given molecule and checks whether they are charged. If this method returns 'true', the molecule cannot be passed on to ErtlFunctionalGroupsFinder.find() if(!) input restrictions are enabled (turned off by default). If you are using the strict input restrictions to only identify functional groups in standardised, organic structures, you can try to neutralise the charges in the molecules where this method returns true by standardisation routines.
This method runs in O(n) time, where n is the number of atoms in the given molecule.
Parameters:
    aMolecule - the molecule to check
Returns:
    true, if the molecule contains one or more charged atoms
Throws:
    java.lang.NullPointerException - if the given molecule is 'null'
-
isStructureUnconnected
public static boolean isStructureUnconnected(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException
Checks whether the given molecule consists of two or more unconnected structures, e.g. ion and counter-ion. This would make it unfit to be passed to ErtlFunctionalGroupsFinder.find() if(!) the input restrictions are turned on (turned off by default). If you are using the strict input restrictions to only identify functional groups in standardised, organic structures, you can try to select the biggest connected component in the input atom containers where this method returns true and only pass that to ErtlFunctionalGroupsFinder. Note: this is a convenience method basically applying ConnectivityChecker.isConnected(aMolecule) and returning 'true' when the molecule is not connected.
Parameters:
    aMolecule - the molecule to check
Returns:
    true, if the molecule consists of two or more unconnected structures
Throws:
    java.lang.NullPointerException - if the given molecule is 'null'
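The suggestion above, keeping only the biggest connected component, can be sketched on a plain graph representation. This is a simplified illustration that does not use CDK; the class name LargestComponentSketch, its method names, and the encoding of bonds as index pairs are hypothetical choices for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch only; this is NOT how CDK's ConnectivityChecker is implemented.
final class LargestComponentSketch {

    /** Returns the atom indices of every connected component, found by breadth-first search. */
    static List<List<Integer>> components(int atomCount, int[][] bonds) {
        List<List<Integer>> adjacency = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            adjacency.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            adjacency.get(bond[0]).add(bond[1]);
            adjacency.get(bond[1]).add(bond[0]);
        }
        boolean[] visited = new boolean[atomCount];
        List<List<Integer>> result = new ArrayList<>();
        for (int start = 0; start < atomCount; start++) {
            if (visited[start]) {
                continue;
            }
            List<Integer> component = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            visited[start] = true;
            while (!queue.isEmpty()) {
                int current = queue.poll();
                component.add(current);
                for (int neighbour : adjacency.get(current)) {
                    if (!visited[neighbour]) {
                        visited[neighbour] = true;
                        queue.add(neighbour);
                    }
                }
            }
            result.add(component);
        }
        return result;
    }

    /** True if the structure falls apart into two or more components. */
    static boolean isUnconnected(int atomCount, int[][] bonds) {
        return components(atomCount, bonds).size() > 1;
    }

    /** Returns the component with the most atoms (the first one found in case of a tie). */
    static List<Integer> largestComponent(int atomCount, int[][] bonds) {
        List<Integer> largest = new ArrayList<>();
        for (List<Integer> component : components(atomCount, bonds)) {
            if (component.size() > largest.size()) {
                largest = component;
            }
        }
        return largest;
    }

    public static void main(String[] args) {
        // Sodium acetate heavy atoms: C0-C1(=O2)-O3 plus an unbonded Na4 counter-ion.
        int[][] bonds = {{0, 1}, {1, 2}, {1, 3}};
        System.out.println(isUnconnected(5, bonds));
        System.out.println(largestComponent(5, bonds));
    }
}
```

Note that selecting the largest component only resolves the connectivity restriction; a remaining charge, as on the acetate fragment here, would still have to be handled separately.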
-
isValidInputMoleculeWithRestrictionsTurnedOn
public static boolean isValidInputMoleculeWithRestrictionsTurnedOn(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws java.lang.NullPointerException, java.lang.IllegalArgumentException
Checks whether the given molecule represented by an atom container can be passed on to the ErtlFunctionalGroupsFinder.find() method without problems even if(!) the input restrictions are turned on (turned off by default).
This method will return false if the molecule contains any metal, metalloid, pseudo, or charged atoms or consists of multiple unconnected parts. Some of these issues (charges and multiple unconnected components) can be solved by respective standardisation routines.
Parameters:
    aMolecule - the molecule to check
Returns:
    true if the given molecule is a valid parameter for ErtlFunctionalGroupsFinder.find() method if(!) the input restrictions are turned on (turned off by default)
Throws:
    java.lang.NullPointerException - if parameter is 'null'
    java.lang.IllegalArgumentException - if the input molecule causes any other type of exception while processing
-
-