Class SugarRemovalUtility
java.lang.Object
de.unijena.cheminf.deglycosylation.SugarRemovalUtility
The Sugar Removal Utility (SRU) implements a generalized algorithm for automated detection of circular and linear
sugars in molecular structures and their removal, as described in
"Schaub, J., Zielesny, A., Steinbeck, C., Sorokina, M. Too sweet: cheminformatics for deglycosylation in natural products. J Cheminform 12, 67 (2020). https://doi.org/10.1186/s13321-020-00467-y".
It offers various functions to detect and remove sugar moieties with different options.
Version: 1.3.2.1
Author: Jonas Schaub, Maria Sorokina
Nested Class Summary
static enum SugarRemovalUtility.PreservationModeOption
Enum with options for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar.
Field Summary
static final boolean ADD_PROPERTY_TO_SUGAR_CONTAINING_MOLECULES_DEFAULT
Default setting for whether to add a property to given atom containers to indicate that the structure contains (or contained before removal) sugar moieties (default: true).

static final String[] CIRCULAR_SUGARS_SMILES
Circular sugar structures represented as SMILES codes.

static final String CONTAINS_CIRCULAR_SUGAR_PROPERTY_KEY
Property key to indicate that the structure contains (or contained before removal) circular sugar moieties.

static final String CONTAINS_LINEAR_SUGAR_PROPERTY_KEY
Property key to indicate that the structure contains (or contained before removal) linear sugar moieties.

static final String CONTAINS_SUGAR_PROPERTY_KEY
Property key to indicate that the structure contains (or contained before removal) sugar moieties (of any kind).

static final boolean DETECT_CIRCULAR_SUGARS_ONLY_WITH_ENOUGH_EXOCYCLIC_OXYGEN_ATOMS_DEFAULT
Default setting for whether detected circular sugar candidates must have a sufficient number of attached, single-bonded exocyclic oxygen atoms in order to be detected as a sugar moiety (default: true).

static final boolean DETECT_CIRCULAR_SUGARS_ONLY_WITH_O_GLYCOSIDIC_BOND_DEFAULT
Default setting for whether only circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond should be detected and subsequently removed (default: false).

static final boolean DETECT_CIRCULAR_SUGARS_WITH_KETO_GROUPS_DEFAULT
Default setting for whether sugar-like rings that have keto groups should also be detected as circular sugars (default: false).

static final boolean DETECT_LINEAR_ACIDIC_SUGARS_DEFAULT
Default setting for whether to include the linear acidic sugar patterns in the linear sugar structures used for initial detection of linear sugars in a given molecule (default: false).

static final boolean DETECT_LINEAR_SUGARS_IN_RINGS_DEFAULT
Default setting for whether linear sugar structures that are part of a ring should be detected (default: false).

static final boolean DETECT_SPIRO_RINGS_AS_CIRCULAR_SUGARS_DEFAULT
Default setting for whether to include spiro rings in the initial set of detected rings considered for circular sugar detection (default: false).

static final org.openscience.cdk.smarts.SmartsPattern ESTER_SMARTS_PATTERN
Daylight SMARTS pattern for matching ester bonds between linear sugars.

static final org.openscience.cdk.smarts.SmartsPattern ETHER_SMARTS_PATTERN
Daylight SMARTS pattern for matching ether bonds between linear sugars.

static final double EXOCYCLIC_OXYGEN_ATOMS_TO_ATOMS_IN_RING_RATIO_THRESHOLD_DEFAULT
Default setting for the minimum ratio of attached exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure that must be reached for classification as a sugar moiety, if the number of exocyclic oxygen atoms is evaluated (default: 0.5, i.e. at least 3 connected exocyclic oxygen atoms for a six-membered ring, for example).

static final String INDEX_PROPERTY_KEY
Property key for index that is added to any IAtom object in a given IAtomContainer object for internal unique identification of the respective IAtom object.

static final String IS_SPIRO_ATOM_PROPERTY_KEY
Key for property that is added to IAtom objects that connect a spiro ring system for identification and preservation of these atoms in the removal process.

static final String[] LINEAR_ACIDIC_SUGARS_SMILES
Linear acidic sugar structures represented as SMILES codes.

static final int LINEAR_SUGAR_CANDIDATE_MAX_SIZE_DEFAULT
Default setting for the maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed, default: 7, inclusive).

static final int LINEAR_SUGAR_CANDIDATE_MIN_SIZE_DEFAULT
Default setting for the minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed, default: 4, inclusive).

static final String[] LINEAR_SUGARS_SMILES
Linear sugar structures represented as SMILES codes.

static final org.openscience.cdk.smarts.SmartsPattern PEROXIDE_SMARTS_PATTERN
Daylight SMARTS pattern for matching peroxide bonds between linear sugars.

static final SugarRemovalUtility.PreservationModeOption PRESERVATION_MODE_DEFAULT
Default setting for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar.

static final boolean REMOVE_ONLY_TERMINAL_SUGARS_DEFAULT
Default setting for whether only terminal sugar moieties should be removed, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures (default: true).
Constructor Summary
SugarRemovalUtility(org.openscience.cdk.interfaces.IChemObjectBuilder aBuilder)
Sole constructor of this class.
Method Summary
boolean addCircularSugarToPatternsList(String aSmilesCode)
Allows adding an additional sugar ring (represented as a SMILES string) to the list of circular sugar structures an input molecule is scanned for in circular sugar detection.

boolean addCircularSugarToPatternsList(org.openscience.cdk.interfaces.IAtomContainer aCircularSugar)
Allows adding an additional sugar ring to the list of circular sugar structures an input molecule is scanned for in circular sugar detection.

boolean addLinearSugarToPatternsList(String aSmilesCode)
Allows adding an additional linear sugar (represented as a SMILES string) to the list of linear sugar structures an input molecule is scanned for in linear sugar detection.

boolean addLinearSugarToPatternsList(org.openscience.cdk.interfaces.IAtomContainer aLinearSugar)
Allows adding an additional linear sugar to the list of linear sugar structures an input molecule is scanned for in linear sugar detection.

protected void addUniqueIndicesToAtoms(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Adds an index as property to all atom objects of the given atom container to identify them uniquely within the atom container and its clones.

protected boolean areAllExocyclicBondsSingle(org.openscience.cdk.interfaces.IAtomContainer aRingToTest, org.openscience.cdk.interfaces.IAtomContainer anOriginalMolecule, boolean anIgnoreKetoGroups)
Checks whether all exocyclic bonds connected to a given ring fragment of a parent atom container are of single order.

boolean
Specifies whether potential sugar cycles with keto groups are detected in circular sugar detection.

boolean
Specifies whether linear acidic sugar patterns are currently included in the linear sugar structures used for initial detection of linear sugars in a given molecule.

boolean
Specifies whether linear sugar structures that are part of a ring should be detected according to the current settings.

boolean
Specifies whether detected circular sugar candidates must have a sufficient number of attached exocyclic oxygen atoms in order to be detected as a sugar moiety.

boolean
Specifies whether only circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond should be detected and subsequently removed.

boolean
Specifies whether only terminal sugar moieties should be removed, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures.

boolean
Specifies whether a property is added to given atom containers that contain (or contained before removal) sugar moieties.

boolean
Specifies whether spiro rings are included in the initial set of detected rings considered for circular sugar detection.

protected boolean checkUniqueIndicesOfAtoms(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Checks whether all atoms in the given molecule have a unique (in the given molecule) index as property.

void
Clears all the circular sugar structures an input molecule is scanned for in circular sugar detection.

void
Clears all the linear sugar structures an input molecule is scanned for in linear sugar detection.

protected List<org.openscience.cdk.interfaces.IAtomContainer> combineOverlappingCandidates(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList)
Combines all overlapping (i.e.

protected List<org.openscience.cdk.interfaces.IAtomContainer> detectLinearSugarCandidatesByPatternMatching(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Initial detection of linear sugar candidates by substructure search for the linear sugar patterns in the given molecule.

protected List<org.openscience.cdk.interfaces.IAtomContainer> detectPotentialSugarCycles(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean anIncludeSpiroRings, boolean anIgnoreKetoGroups)
Detects and returns cycles of the given molecule that are isolated (spiro rings included or not according to the boolean parameter), isomorphic to the circular sugar patterns, and only have exocyclic single bonds (keto groups ignored or not according to the boolean parameter).

protected boolean doesRingHaveEnoughExocyclicOxygenAtoms(int aNumberOfAtomsInRing, int aNumberOfAttachedExocyclicOxygenAtoms)
Simple decision-making function for deciding whether a candidate sugar ring has enough attached, single-bonded exocyclic oxygen atoms according to the set threshold.

protected String generateSubstructureIdentifier(org.openscience.cdk.interfaces.IAtomContainer aSubstructure)
Creates an identifier string for substructures of a molecule, based on the unique indices of the included atoms.

generateSubstructureIdentifiers(List<org.openscience.cdk.interfaces.IAtomContainer> aSubstructureList)
Creates an identifier string for every substructure in the given list, based on the unique indices of the included atoms, respectively, and returns a set of the generated ids.

List<org.openscience.cdk.interfaces.IAtomContainer> getCircularSugarCandidates(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Extracts circular sugar moieties from the given molecule, according to the current settings for circular sugar detection.

Returns a list of (unique) SMILES strings representing the circular sugar structures an input molecule is scanned for in circular sugar detection.

protected int getExocyclicOxygenAtomCount(org.openscience.cdk.interfaces.IAtomContainer aRingToTest, org.openscience.cdk.interfaces.IAtomContainer anOriginalMolecule)
Returns the number of attached exocyclic oxygen atoms of a given ring in the original atom container.

double
Returns the currently set minimum ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated.

int
Returns the currently set maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed).

int
Returns the currently set minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed).

List<org.openscience.cdk.interfaces.IAtomContainer> getLinearSugarCandidates(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Extracts linear sugar moieties from the given molecule, according to the current settings for linear sugar detection.

getLinearSugarPatternsList
Returns a list of (unique) SMILES strings representing the linear sugar structures an input molecule is scanned for in linear sugar detection.

int getNumberOfCircularAndLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Detects circular and linear sugar moieties in the given molecule according to the current settings for circular and linear sugar detection and returns the number of detected moieties.

int getNumberOfCircularSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Detects circular sugar moieties in the given molecule according to the current settings for circular sugar detection and returns the number of detected moieties.

int getNumberOfLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Detects linear sugar moieties in the given molecule according to the current settings for linear sugar detection and returns the number of detected moieties.

Returns the current setting for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar.

int
Returns the current threshold of e.g.

boolean hasCircularOrLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Detects circular and linear sugar moieties in the given molecule, according to the current settings for sugar detection.

boolean hasCircularSugarInPatternsList(org.openscience.cdk.interfaces.IAtomContainer aCircularSugar)
Checks whether the given circular sugar is already present in the list of circular sugar structures an input molecule is scanned for in circular sugar detection.

boolean hasCircularSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Detects circular sugar moieties in the given molecule, according to the current settings for circular sugar detection.

protected boolean hasGlycosidicBond(org.openscience.cdk.interfaces.IAtomContainer aRingToTest, org.openscience.cdk.interfaces.IAtomContainer anOriginalMolecule)
Checks all exocyclic connections of the given ring to detect an O-glycosidic bond.

boolean hasLinearSugarInPatternsList(org.openscience.cdk.interfaces.IAtomContainer aLinearSugar)
Checks whether the given linear sugar is already present in the list of linear sugar structures an input molecule is scanned for in linear sugar detection.

boolean hasLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Detects linear sugar moieties in the given molecule, according to the current settings for linear sugar detection.

protected boolean isMoleculeEmptyAfterRemovalOfThisRing(org.openscience.cdk.interfaces.IAtomContainer aRing, org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Checks whether the given molecule would be empty after removal of the given ring.

boolean isQualifiedForGlycosidicBondExemption(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Tests whether the given molecule qualifies for the glycosidic bond exemption.

boolean isTerminal(org.openscience.cdk.interfaces.IAtomContainer aSubstructure, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule, List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList)
Checks whether the given substructure is terminal (i.e.

boolean isTooSmallToPreserve(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Checks whether the given molecule or structure is too small to be kept according to the current preservation mode and threshold setting.

static List<org.openscience.cdk.interfaces.IAtomContainer> partitionAndSortUnconnectedFragments(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Utility method that can be used to partition the unconnected structures in an atom container, e.g.

void postProcessAfterRemoval(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Clears away too small structures (according to the set preservation mode) from the given molecule.

protected void printAllMoleculesAsSmiles(List<org.openscience.cdk.interfaces.IAtomContainer> aMoleculeList)
Prints all molecules in the given list as unique SMILES representations to System.out.

List<org.openscience.cdk.interfaces.IAtomContainer> removeAndReturnCircularAndLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned)
Removes circular and linear sugar moieties from the given atom container and returns the resulting aglycon (at list index 0) and the removed sugar moieties.

List<org.openscience.cdk.interfaces.IAtomContainer> removeAndReturnCircularSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned)
Removes circular sugar moieties from the given atom container and returns the resulting aglycon (at list index 0) and the removed circular sugar moieties.

List<org.openscience.cdk.interfaces.IAtomContainer> removeAndReturnLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned)
Removes linear sugar moieties from the given atom container and returns the resulting aglycon (at list index 0) and the removed linear sugar moieties.

protected void removeAtomsOfCircularSugarsFromCandidates(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule)
Removes all atoms belonging to possible circular sugars, as returned by the method for initial circular sugar detection, from the given linear sugar candidates.

protected void removeCandidatesContainingCircularSugars(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule)
Deprecated.

boolean removeCircularAndLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Removes circular and linear sugar moieties from the given atom container.

org.openscience.cdk.interfaces.IAtomContainer removeCircularAndLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned)
Removes circular and linear sugar moieties from the given atom container.

boolean removeCircularSugarFromPatternsList(String aSmilesCode)
Allows removing a sugar ring pattern (represented as a SMILES string) from the list of circular sugar structures an input molecule is scanned for in circular sugar detection.

boolean removeCircularSugarFromPatternsList(org.openscience.cdk.interfaces.IAtomContainer aCircularSugar)
Allows removing a sugar ring from the list of circular sugar structures an input molecule is scanned for in circular sugar detection.

boolean removeCircularSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Removes circular sugar moieties from the given atom container.

org.openscience.cdk.interfaces.IAtomContainer removeCircularSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned)
Removes circular sugar moieties from the given atom container.

protected void removeCircularSugarsFromCandidates(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule)
Deprecated.

protected void removeCyclicAtomsFromSugarCandidates(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList, org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Removes all atoms that are part of a cycle from the given linear sugar candidates.

boolean removeLinearSugarFromPatternsList(String aSmilesCode)
Allows removing a linear sugar pattern (represented as a SMILES string) from the list of linear sugar structures an input molecule is scanned for in linear sugar detection.

boolean removeLinearSugarFromPatternsList(org.openscience.cdk.interfaces.IAtomContainer aLinearSugar)
Allows removing a linear sugar pattern from the list of linear sugar structures an input molecule is scanned for in linear sugar detection.

boolean removeLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Removes linear sugar moieties from the given atom container.

org.openscience.cdk.interfaces.IAtomContainer removeLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned)
Removes linear sugar moieties from the given atom container.

List<org.openscience.cdk.interfaces.IAtomContainer> removeSugarCandidates(org.openscience.cdk.interfaces.IAtomContainer aMolecule, List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList)
Removes the given sugar moieties (or substructures in general) from the given molecule and returns the removed moieties (not the aglycon!).

protected void removeSugarCandidatesWithCyclicAtoms(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList, org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Deprecated.

protected List<org.openscience.cdk.interfaces.IAtomContainer> removeTooSmallAndTooLargeCandidates(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList)
Discards all linear sugar candidates that are too small or too big according to the current settings.

void removeTooSmallDisconnectedStructures(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Removes all unconnected fragments that are too small to keep according to the current preservation mode and threshold setting.

void
Sets all settings to their default values (see public static constants or enquire via get/is methods).

static org.openscience.cdk.interfaces.IAtomContainer selectBiggestUnconnectedFragment(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Utility method that can be used to select the 'biggest' (i.e.

static org.openscience.cdk.interfaces.IAtomContainer selectHeaviestUnconnectedFragment(org.openscience.cdk.interfaces.IAtomContainer aMolecule)
Utility method that can be used to select the 'heaviest' (i.e.

void setAddPropertyToSugarContainingMoleculesSetting(boolean aBoolean)
Sets the option to add a respective property to given atom containers that contain (or contained before removal) sugar moieties.

void setDetectCircularSugarsOnlyWithEnoughExocyclicOxygenAtomsSetting(boolean aBoolean)
Sets the option to only detect (and subsequently remove) circular sugars that have a sufficient number of attached, exocyclic, single-bonded oxygen atoms.

void setDetectCircularSugarsOnlyWithOGlycosidicBondSetting(boolean aBoolean)
Sets the option to only detect (and subsequently remove) circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond.

void setDetectCircularSugarsWithKetoGroupsSetting(boolean aBoolean)
Sets the option to detect potential sugar cycles with keto groups as circular sugars in circular sugar detection.

void setDetectLinearAcidicSugarsSetting(boolean aBoolean)
Sets the option to include linear acidic sugar patterns in the linear sugar structures used for initial detection of linear sugars in a given molecule.

void setDetectLinearSugarsInRingsSetting(boolean aBoolean)
Sets the option to detect linear sugar structures that are part of a ring.

void setDetectSpiroRingsAsCircularSugarsSetting(boolean aBoolean)
Sets the option to include spiro rings in the initial set of detected rings considered for circular sugar detection.

void setExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting(double aDouble)
Sets the minimum ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated.

void setLinearSugarCandidateMaxSizeSetting(int aMaxSize)
Sets the maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed).

void setLinearSugarCandidateMinSizeSetting(int aMinSize)
Sets the minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed).

void
Sets the preservation mode for structures that get disconnected by sugar removal; the preservation mode threshold is set to the default value of the given enum constant.

void setPreservationModeThresholdSetting(int aThreshold)
Sets the preservation mode threshold, i.e.

void setRemoveOnlyTerminalSugarsSetting(boolean aBoolean)
Sets the option to remove only terminal sugar moieties, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures.

protected List<org.openscience.cdk.interfaces.IAtomContainer> splitEtherEsterAndPeroxideBonds(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList)
Splits all ether, ester, and peroxide bonds in the given linear sugar candidates and separates those that get disconnected in the process.

protected void splitOverlappingCandidatesPseudoRandomly(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList)
Deprecated.

protected void
All linear sugar patterns represented by atom containers in the respective list are sorted, parsed into actual pattern objects and stored in the internal list for initial linear sugar detection.
-
Field Details
-
CONTAINS_CIRCULAR_SUGAR_PROPERTY_KEY
Property key to indicate that the structure contains (or contained before removal) circular sugar moieties.
-
CONTAINS_LINEAR_SUGAR_PROPERTY_KEY
Property key to indicate that the structure contains (or contained before removal) linear sugar moieties.
-
CONTAINS_SUGAR_PROPERTY_KEY
Property key to indicate that the structure contains (or contained before removal) sugar moieties (of any kind).
-
INDEX_PROPERTY_KEY
Property key for index that is added to any IAtom object in a given IAtomContainer object for internal unique identification of the respective IAtom object. For internal use only.
-
IS_SPIRO_ATOM_PROPERTY_KEY
Key for property that is added to IAtom objects that connect a spiro ring system for identification and preservation of these atoms in the removal process. For internal use only.
-
LINEAR_SUGARS_SMILES
Linear sugar structures represented as SMILES codes. An input molecule is scanned for these substructures for the detection of linear sugars. This set consists of multiple aldoses, ketoses, and sugar alcohols with sizes between 3 and 7 carbons. Additional structures can be added to the set, or specific ones removed from it, at run-time using the respective methods.
-
LINEAR_ACIDIC_SUGARS_SMILES
Linear acidic sugar structures represented as SMILES codes. These can be optionally added to the linear sugar structures used for initial detection of linear sugars in an input molecule.
-
CIRCULAR_SUGARS_SMILES
Circular sugar structures represented as SMILES codes. The isolated rings of an input molecule are matched with these structures for the detection of circular sugars. The structures listed here only represent the circular part of sugar rings (i.e. one oxygen atom and multiple carbon atoms). Common exocyclic structures like hydroxy groups are not part of the patterns and therefore not part of the detected circular sugar moieties. By default, the set includes tetrahydrofuran, tetrahydropyran, and oxepane to match furanoses, pyranoses, and heptoses. It can be configured at run-time using the respective methods.
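As a rough illustration of what the default ring patterns encode, the following sketch checks only the atom composition of a ring: five to seven members, exactly one oxygen atom, carbon otherwise. It is a simplification under stated assumptions and not the utility's own matching code, which compares isolated rings against the SMILES patterns. The class and method names are hypothetical.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the SRU API. */
final class RingCompositionSketch {

    /**
     * Returns true if the ring has 5 to 7 members, exactly one oxygen atom,
     * and only carbon atoms otherwise. Bond orders are not considered here.
     */
    static boolean looksLikeSugarRing(List<String> ringAtomSymbols) {
        int size = ringAtomSymbols.size();
        if (size < 5 || size > 7) {
            return false; // neither tetrahydrofuran-, tetrahydropyran-, nor oxepane-sized
        }
        int oxygenCount = 0;
        for (String symbol : ringAtomSymbols) {
            if (symbol.equals("O")) {
                oxygenCount++;
            } else if (!symbol.equals("C")) {
                return false; // any other element disqualifies the ring
            }
        }
        return oxygenCount == 1;
    }

    public static void main(String[] args) {
        // Tetrahydropyran-like ring: five carbon atoms and one oxygen atom
        System.out.println(looksLikeSugarRing(List.of("C", "C", "C", "C", "C", "O"))); // prints true
        // Cyclohexane-like ring: no ring oxygen
        System.out.println(looksLikeSugarRing(List.of("C", "C", "C", "C", "C", "C"))); // prints false
    }
}
```

Exocyclic hydroxy groups are deliberately absent from this check, in line with the description above that they are not part of the patterns.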
DETECT_CIRCULAR_SUGARS_ONLY_WITH_O_GLYCOSIDIC_BOND_DEFAULT
public static final boolean DETECT_CIRCULAR_SUGARS_ONLY_WITH_O_GLYCOSIDIC_BOND_DEFAULT
Default setting for whether only circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond should be detected and subsequently removed (default: false).
-
REMOVE_ONLY_TERMINAL_SUGARS_DEFAULT
public static final boolean REMOVE_ONLY_TERMINAL_SUGARS_DEFAULT
Default setting for whether only terminal sugar moieties should be removed, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures (default: true).
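The definition of "terminal" given here can be sketched as a graph connectivity test: delete the atoms of the candidate moiety and check whether the rest is still one connected structure. The sketch below uses a plain adjacency list instead of CDK atom containers and ignores the preservation mode. All names in it are hypothetical and it is not the utility's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the SRU API. */
final class TerminalMoietySketch {

    /**
     * Returns true if the atoms left after deleting atomsToRemove still form
     * one connected structure. adjacency.get(i) lists the neighbours of atom i.
     */
    static boolean isTerminal(List<List<Integer>> adjacency, Set<Integer> atomsToRemove) {
        int start = -1;
        int remaining = 0;
        for (int atom = 0; atom < adjacency.size(); atom++) {
            if (!atomsToRemove.contains(atom)) {
                remaining++;
                if (start < 0) {
                    start = atom;
                }
            }
        }
        if (remaining == 0) {
            return true; // nothing is left, so nothing can be split
        }
        // Breadth-first search restricted to the remaining atoms.
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int current = queue.poll();
            for (int neighbour : adjacency.get(current)) {
                if (!atomsToRemove.contains(neighbour) && visited.add(neighbour)) {
                    queue.add(neighbour);
                }
            }
        }
        return visited.size() == remaining;
    }

    public static void main(String[] args) {
        // Linear chain of five atoms: 0-1-2-3-4
        List<List<Integer>> chain = List.of(
                List.of(1), List.of(0, 2), List.of(1, 3), List.of(2, 4), List.of(3));
        System.out.println(isTerminal(chain, Set.of(4))); // prints true, the end atom is terminal
        System.out.println(isTerminal(chain, Set.of(2))); // prints false, the chain falls apart
    }
}
```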
-
PRESERVATION_MODE_DEFAULT
Default setting for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar (default: preserve all structures that consist of 5 or more heavy atoms). The set option plays a major role in discriminating terminal and non-terminal sugar moieties. The threshold that the chosen characteristic must reach is set in a separate option, and each enum constant has its own default threshold. See the PreservationModeOption enum.
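A minimal sketch of the default behaviour described here, assuming that "heavy atom" means any non-hydrogen atom and that a fragment is kept when its heavy atom count reaches the threshold. Fragments are modelled as plain lists of element symbols. The class and method names are hypothetical and do not mirror the utility's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the SRU API. */
final class PreservationSketch {

    /** Mirrors the documented default of 5 heavy atoms. */
    static final int HEAVY_ATOM_THRESHOLD_DEFAULT = 5;

    /** Counts all atoms that are not hydrogen. */
    static int heavyAtomCount(List<String> atomSymbols) {
        int count = 0;
        for (String symbol : atomSymbols) {
            if (!symbol.equals("H")) {
                count++;
            }
        }
        return count;
    }

    /** A fragment is too small if it has fewer heavy atoms than the threshold. */
    static boolean isTooSmallToPreserve(List<String> atomSymbols, int threshold) {
        return heavyAtomCount(atomSymbols) < threshold;
    }

    /** Returns only those fragments that reach the threshold. */
    static List<List<String>> discardTooSmall(List<List<String>> fragments, int threshold) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> fragment : fragments) {
            if (!isTooSmallToPreserve(fragment, threshold)) {
                kept.add(fragment);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> methanol = List.of("C", "O", "H", "H", "H", "H"); // 2 heavy atoms
        List<String> pentane = List.of("C", "C", "C", "C", "C");       // 5 heavy atoms
        System.out.println(isTooSmallToPreserve(methanol, HEAVY_ATOM_THRESHOLD_DEFAULT)); // prints true
        System.out.println(isTooSmallToPreserve(pentane, HEAVY_ATOM_THRESHOLD_DEFAULT));  // prints false
    }
}
```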
DETECT_CIRCULAR_SUGARS_ONLY_WITH_ENOUGH_EXOCYCLIC_OXYGEN_ATOMS_DEFAULT
public static final boolean DETECT_CIRCULAR_SUGARS_ONLY_WITH_ENOUGH_EXOCYCLIC_OXYGEN_ATOMS_DEFAULT
Default setting for whether detected circular sugar candidates must have a sufficient number of attached, single-bonded exocyclic oxygen atoms in order to be detected as a sugar moiety (default: true). The 'sufficient number' is defined in another option / default setting.
-
EXOCYCLIC_OXYGEN_ATOMS_TO_ATOMS_IN_RING_RATIO_THRESHOLD_DEFAULT
public static final double EXOCYCLIC_OXYGEN_ATOMS_TO_ATOMS_IN_RING_RATIO_THRESHOLD_DEFAULT
Default setting for the minimum ratio of attached exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure that must be reached for classification as a sugar moiety, if the number of exocyclic oxygen atoms is evaluated (default: 0.5, i.e. at least 3 connected exocyclic oxygen atoms for a six-membered ring, for example).
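The arithmetic behind this threshold can be sketched as follows. The inclusive comparison is an assumption inferred from the wording "minimum ratio" and from the six-membered ring example. The class and method names are hypothetical and this is not the utility's own code.

```java
/** Hypothetical helper for illustration; not part of the SRU API. */
final class ExocyclicOxygenRatioSketch {

    /** Mirrors the documented default threshold of 0.5. */
    static final double DEFAULT_RATIO_THRESHOLD = 0.5;

    /**
     * Returns true if the ratio of exocyclic oxygen atoms to ring atoms
     * reaches the threshold (inclusive comparison is an assumption).
     */
    static boolean hasEnoughExocyclicOxygenAtoms(int atomsInRing, int exocyclicOxygenAtoms, double threshold) {
        if (atomsInRing <= 0) {
            return false; // no ring atoms, nothing to evaluate
        }
        double ratio = (double) exocyclicOxygenAtoms / atomsInRing;
        return ratio >= threshold;
    }

    public static void main(String[] args) {
        // Six-membered ring with three exocyclic oxygen atoms: 3 / 6 = 0.5
        System.out.println(hasEnoughExocyclicOxygenAtoms(6, 3, DEFAULT_RATIO_THRESHOLD)); // prints true
        // Six-membered ring with two exocyclic oxygen atoms: 2 / 6 is below 0.5
        System.out.println(hasEnoughExocyclicOxygenAtoms(6, 2, DEFAULT_RATIO_THRESHOLD)); // prints false
    }
}
```

Under the same assumption, a five-membered ring would need three exocyclic oxygen atoms, since 2 / 5 = 0.4 falls short of 0.5.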
-
DETECT_LINEAR_SUGARS_IN_RINGS_DEFAULT
public static final boolean DETECT_LINEAR_SUGARS_IN_RINGS_DEFAULT
Default setting for whether linear sugar structures that are part of a ring should be detected (default: false). This setting is important for e.g. macrocycles that contain sugars or pseudosugars.
-
ADD_PROPERTY_TO_SUGAR_CONTAINING_MOLECULES_DEFAULT
public static final boolean ADD_PROPERTY_TO_SUGAR_CONTAINING_MOLECULES_DEFAULT
Default setting for whether to add a property to given atom containers to indicate that the structure contains (or contained before removal) sugar moieties (default: true). See property keys in the public constants of this class.
-
LINEAR_SUGAR_CANDIDATE_MIN_SIZE_DEFAULT
public static final int LINEAR_SUGAR_CANDIDATE_MIN_SIZE_DEFAULT
Default setting for the minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed, default: 4, inclusive).
-
LINEAR_SUGAR_CANDIDATE_MAX_SIZE_DEFAULT
public static final int LINEAR_SUGAR_CANDIDATE_MAX_SIZE_DEFAULT
Default setting for the maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed, default: 7, inclusive).
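The two inclusive size limits can be illustrated with a small filter over carbon counts. This is a sketch only: candidates are reduced to their number of carbon atoms, and the class and method names are hypothetical rather than taken from the utility.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the SRU API. */
final class LinearCandidateSizeSketch {

    /** Mirror the documented defaults of 4 and 7 carbon atoms. */
    static final int MIN_SIZE_DEFAULT = 4;
    static final int MAX_SIZE_DEFAULT = 7;

    /** Both limits are inclusive, as stated in the field descriptions. */
    static boolean isWithinLimits(int carbonCount, int minSize, int maxSize) {
        return carbonCount >= minSize && carbonCount <= maxSize;
    }

    /** Keeps only the carbon counts that lie within the limits. */
    static List<Integer> keepWithinLimits(List<Integer> carbonCounts, int minSize, int maxSize) {
        List<Integer> kept = new ArrayList<>();
        for (int carbonCount : carbonCounts) {
            if (isWithinLimits(carbonCount, minSize, maxSize)) {
                kept.add(carbonCount);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(keepWithinLimits(List.of(3, 4, 7, 8), MIN_SIZE_DEFAULT, MAX_SIZE_DEFAULT)); // prints [4, 7]
    }
}
```

With these defaults, a three-carbon candidate is discarded even though the linear sugar patterns themselves range from 3 to 7 carbons.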
-
DETECT_LINEAR_ACIDIC_SUGARS_DEFAULT
public static final boolean DETECT_LINEAR_ACIDIC_SUGARS_DEFAULT
Default setting for whether to include the linear acidic sugar patterns in the linear sugar structures used for initial detection of linear sugars in a given molecule (default: false).
-
DETECT_SPIRO_RINGS_AS_CIRCULAR_SUGARS_DEFAULT
public static final boolean DETECT_SPIRO_RINGS_AS_CIRCULAR_SUGARS_DEFAULT
Default setting for whether to include spiro rings in the initial set of detected rings considered for circular sugar detection (default: false). If the option is turned on and a spiro sugar ring is removed, the atom connecting it to the other ring is preserved.
-
DETECT_CIRCULAR_SUGARS_WITH_KETO_GROUPS_DEFAULT
public static final boolean DETECT_CIRCULAR_SUGARS_WITH_KETO_GROUPS_DEFAULT
Default setting for whether sugar-like rings that have keto groups should also be detected as circular sugars (default: false). The general rule specified in the original algorithm description is that every potential sugar cycle with an exocyclic double or triple bond is excluded from circular sugar detection. If this option is turned on, an exemption to this rule is made for potential sugar cycles having keto groups. In that case, the double-bonded oxygen atoms also count towards the number of connected oxygen atoms, and the algorithm does not consider how many keto groups are attached to the cycle (it may be only one, or all connected oxygen atoms may be double-bonded). If this option is turned off (default), every sugar-like cycle with an exocyclic double or triple bond will be excluded from the detected circular sugars, as specified in the original algorithm description.
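The rule and its keto group exemption can be sketched with a simplified bond model. Each exocyclic bond is described only by its order and the element of the atom outside the ring, and a keto group is taken to be a double bond to oxygen. The class, its nested type, and the method signature are hypothetical and simpler than the utility's areAllExocyclicBondsSingle method, which works on atom containers.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the SRU API. */
final class ExocyclicBondSketch {

    /** Minimal description of one bond leaving the ring. */
    static final class ExocyclicBond {
        final int order;              // 1 = single, 2 = double, 3 = triple
        final String neighbourSymbol; // element of the atom outside the ring

        ExocyclicBond(int order, String neighbourSymbol) {
            this.order = order;
            this.neighbourSymbol = neighbourSymbol;
        }
    }

    /**
     * Returns true if every exocyclic bond is single. If ignoreKetoGroups is
     * set, double bonds to oxygen are tolerated as well.
     */
    static boolean areAllExocyclicBondsSingle(List<ExocyclicBond> bonds, boolean ignoreKetoGroups) {
        for (ExocyclicBond bond : bonds) {
            if (bond.order == 1) {
                continue;
            }
            boolean isKetoGroup = bond.order == 2 && bond.neighbourSymbol.equals("O");
            if (ignoreKetoGroups && isKetoGroup) {
                continue; // exemption for keto groups
            }
            return false; // any other double or triple bond excludes the ring
        }
        return true;
    }

    public static void main(String[] args) {
        List<ExocyclicBond> ringWithKetoGroup = List.of(
                new ExocyclicBond(1, "O"), new ExocyclicBond(2, "O"));
        System.out.println(areAllExocyclicBondsSingle(ringWithKetoGroup, false)); // prints false
        System.out.println(areAllExocyclicBondsSingle(ringWithKetoGroup, true));  // prints true
    }
}
```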
-
ESTER_SMARTS_PATTERN
public static final org.openscience.cdk.smarts.SmartsPattern ESTER_SMARTS_PATTERNDaylight SMARTS pattern for matching ester bonds between linear sugars. Defines an aliphatic carbon atom connected to a double-bonded oxygen atom and a single-bonded oxygen atom that must not be in a ring and is connected to another aliphatic carbon atom via a single bond. The oxygen atom must not be in a ring to avoid breaking circular sugars. -
ETHER_SMARTS_PATTERN
public static final org.openscience.cdk.smarts.SmartsPattern ETHER_SMARTS_PATTERNDaylight SMARTS pattern for matching ether bonds between linear sugars. Defines an aliphatic carbon atom connected via single bond to an oxygen atom that must not be in a ring and is in turn connected to another aliphatic carbon atom. The oxygen atom must not be in a ring to avoid breaking circular sugars. This pattern also matches ester bonds which is why esters must be detected and processed before ethers. -
PEROXIDE_SMARTS_PATTERN
public static final org.openscience.cdk.smarts.SmartsPattern PEROXIDE_SMARTS_PATTERNDaylight SMARTS pattern for matching peroxide bonds between linear sugars. Defines an aliphatic carbon atom connected via single bond to an oxygen atom that must not be in a ring and is connected to another oxygen atom of the same kind, followed by another aliphatic carbon atom. Even though it is highly unlikely for a peroxide bond to be in a ring, every ring should be preserved.
-
-
Constructor Details
-
SugarRemovalUtility
public SugarRemovalUtility(org.openscience.cdk.interfaces.IChemObjectBuilder aBuilder) throws NullPointerException Sole constructor of this class. All settings are set to their default values (see public static constants or enquire via get/is methods). To change these settings, use the respective 'setXY()' methods.- Parameters:
aBuilder
- IChemObjectBuilder for i.a. parsing SMILES strings into atom containers- Throws:
NullPointerException
-
-
Method Details
-
getLinearSugarPatternsList
Returns a list of (unique) SMILES strings representing the linear sugar structures an input molecule is scanned for in linear sugar detection. The returned list represents the current state of this list, i.e. externally added structures are included, externally removed structures not, and the linear acidic sugar structures are only included if the respective option is activated. The default structures can also be retrieved from the respective public constant of this class.
Note: If a structure cannot be parsed into a SMILES string, it is excluded from the list.- Returns:
- a list of SMILES codes
-
hasLinearSugarInPatternsList
public boolean hasLinearSugarInPatternsList(org.openscience.cdk.interfaces.IAtomContainer aLinearSugar) throws NullPointerException, IllegalArgumentException Checks whether the given linear sugar is already present in the list of linear sugar structures an input molecule is scanned for in linear sugar detection. It is checked whether it is isomorph to any linear sugar pattern already present in the list. Note that the return value 'false' does not guarantee its safe addition to the pattern list because it may not comply with other requirements detailed in the 'add'-method. Also note that the linear acidic sugar patterns are only included here if the respective option is turned on.- Parameters:
aLinearSugar
- the linear sugar pattern to check for- Returns:
- true if the linear sugar is already present in the linear sugar pattern list
- Throws:
NullPointerException
- if the given molecule is 'null'IllegalArgumentException
- if the given atom container is empty or its isomorphism with the already present structures could not be determined
-
getCircularSugarPatternsList
Returns a list of (unique) SMILES strings representing the circular sugar structures an input molecule is scanned for in circular sugar detection. The returned list represents the current state of this list, i.e. externally added structures are included, externally removed structures are not. The default structures can also be retrieved from the respective public constant of this class.
Note: If a structure cannot be parsed into a SMILES string, it is excluded from the list.- Returns:
- a list of SMILES codes
-
hasCircularSugarInPatternsList
public boolean hasCircularSugarInPatternsList(org.openscience.cdk.interfaces.IAtomContainer aCircularSugar) throws NullPointerException, IllegalArgumentException Checks whether the given circular sugar is already present in the list of circular sugar structures an input molecule is scanned for in circular sugar detection. It is checked whether it is isomorph to any circular sugar pattern already present in the list. Note that the return value 'false' does not guarantee its safe addition to the pattern list because it may not comply with other requirements detailed in the 'add'-method.- Parameters:
aCircularSugar
- the circular sugar pattern to check for- Returns:
- true if the circular sugar is already present in the circular sugar pattern list
- Throws:
NullPointerException
- if the given molecule is 'null'IllegalArgumentException
- if the given atom container is empty or its isomorphism with the already present structures could not be determined
-
areOnlyCircularSugarsWithOGlycosidicBondDetected
public boolean areOnlyCircularSugarsWithOGlycosidicBondDetected()Specifies whether only circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond should be detected and subsequently removed.- Returns:
- true if only circular sugar moieties connected via a glycosidic bond are removed according to the current settings
-
areOnlyTerminalSugarsRemoved
public boolean areOnlyTerminalSugarsRemoved()Specifies whether only terminal sugar moieties should be removed, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures.- Returns:
- true if only terminal sugar moieties are removed according to the current settings
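The notion of a terminal moiety used above can be illustrated with a plain graph check. The following is only a sketch under stated assumptions: the class `TerminalMoietySketch`, its method `isTerminal`, and the adjacency-list representation are hypothetical and not part of this class's API, and the sketch ignores the preservation mode, which in the actual utility also influences whether a disconnected fragment counts.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the library's implementation.
final class TerminalMoietySketch {

    /**
     * Returns true if deleting the atoms in 'moiety' leaves the remaining
     * atoms in at most one connected component. Atoms are the indices
     * 0..n-1 and adjacency.get(i) lists the neighbours of atom i.
     */
    static boolean isTerminal(List<List<Integer>> adjacency, Set<Integer> moiety) {
        int start = -1;
        int remaining = 0;
        for (int i = 0; i < adjacency.size(); i++) {
            if (!moiety.contains(i)) {
                remaining++;
                if (start < 0) {
                    start = i;
                }
            }
        }
        if (remaining == 0) {
            return true; // nothing is left that could be split
        }
        // Breadth-first search over the atoms that are not removed.
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(start);
        visited.add(start);
        while (!queue.isEmpty()) {
            int current = queue.poll();
            for (int neighbour : adjacency.get(current)) {
                if (!moiety.contains(neighbour) && visited.add(neighbour)) {
                    queue.add(neighbour);
                }
            }
        }
        return visited.size() == remaining;
    }

    public static void main(String[] args) {
        // Linear chain of five atoms: 0-1-2-3-4
        List<List<Integer>> chain = Arrays.asList(
                Arrays.asList(1), Arrays.asList(0, 2), Arrays.asList(1, 3),
                Arrays.asList(2, 4), Arrays.asList(3));
        // Removing the end {3, 4} keeps 0-1-2 connected.
        System.out.println(isTerminal(chain, new HashSet<>(Arrays.asList(3, 4))));
        // Removing the middle atom splits the chain into two parts.
        System.out.println(isTerminal(chain, new HashSet<>(Arrays.asList(2))));
    }
}
```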
-
getPreservationModeSetting
Returns the current setting for how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar. This can e.g. be judged by its heavy atom count or its molecular weight or it can be specified that all structures are to be preserved. If too small / too light structures are discarded, an additional threshold is specified in the preservation mode threshold setting that the structures have to reach in order to be preserved (i.e. to be judged 'big/heavy enough').- Returns:
- a PreservationModeOption enum object representing the current setting
-
getPreservationModeThresholdSetting
public int getPreservationModeThresholdSetting()Returns the current threshold of e.g. molecular weight or heavy atom count (depending on the currently set preservation mode) a substructure that gets disconnected from the molecule by the removal of a sugar moiety has to reach in order to be preserved and not discarded.- Returns:
- an integer specifying the currently set threshold (either specified in Da or number of heavy atoms)
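How the preservation mode and its threshold might combine into a keep-or-discard decision can be sketched as follows. This is a simplified model, not the library's code: the class `PreservationSketch` and its nested enum `Mode` are hypothetical stand-ins, the comparison is assumed to be inclusive because the documentation says a fragment "has to reach" the threshold, and the threshold value 5 in the demo is an arbitrary example.

```java
// Hypothetical model of the documented preservation modes.
final class PreservationSketch {

    // Stand-in for the preservation mode options described above.
    enum Mode { ALL, HEAVY_ATOM_COUNT, MOLECULAR_WEIGHT }

    /**
     * Decides whether a disconnected fragment is kept.
     * Assumption: reaching the threshold exactly is sufficient.
     */
    static boolean isPreserved(Mode mode, int threshold,
                               int heavyAtomCount, double molecularWeightDa) {
        switch (mode) {
            case ALL:
                return true; // every fragment is kept
            case HEAVY_ATOM_COUNT:
                return heavyAtomCount >= threshold;
            case MOLECULAR_WEIGHT:
                return molecularWeightDa >= threshold;
            default:
                throw new IllegalStateException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // A methyl-sized fragment: 1 heavy atom, roughly 15 Da.
        System.out.println(isPreserved(Mode.HEAVY_ATOM_COUNT, 5, 1, 15.0));
        System.out.println(isPreserved(Mode.ALL, 0, 1, 15.0));
    }
}
```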
-
areOnlyCircularSugarsWithEnoughExocyclicOxygenAtomsDetected
public boolean areOnlyCircularSugarsWithEnoughExocyclicOxygenAtomsDetected()Specifies whether detected circular sugar candidates must have a sufficient number of attached exocyclic oxygen atoms in order to be detected as a sugar moiety. If this option is set, the circular sugar candidates have to reach an additionally specified minimum ratio of said oxygen atoms to the number of atoms in the respective ring in order to be seen as a sugar ring and be subsequently removed. See exocyclic oxygen atoms to atoms in ring ratio threshold setting.- Returns:
- true, if the ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate sugar ring is evaluated at circular sugar detection according to the current settings
-
getExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting
public double getExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting()Returns the currently set minimum ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated.- Returns:
- the minimum ratio of attached oxygen atoms to the number of atoms in the sugar ring; A value of e.g. 0.5 means that a six-membered sugar ring needs at least 3 attached oxygen atoms to be classified as a circular sugar moiety
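The arithmetic behind the ratio threshold can be written out in a few lines. The class `RingRatioSketch` and its method are hypothetical and only mirror the description above; the actual utility counts the oxygen atoms on a molecular graph, which is not modelled here.

```java
// Hypothetical illustration of the documented ratio check.
final class RingRatioSketch {

    /**
     * Returns true if the ratio of attached exocyclic oxygen atoms to ring
     * atoms reaches the threshold. The ring oxygen counts as a ring atom.
     */
    static boolean hasEnoughExocyclicOxygens(int exocyclicOxygenCount,
                                             int ringAtomCount,
                                             double threshold) {
        if (ringAtomCount <= 0) {
            throw new IllegalArgumentException("Ring must contain at least one atom.");
        }
        double ratio = (double) exocyclicOxygenCount / ringAtomCount;
        return ratio >= threshold;
    }

    public static void main(String[] args) {
        // Six-membered ring, threshold 0.5: three attached oxygens suffice,
        // two do not.
        System.out.println(hasEnoughExocyclicOxygens(3, 6, 0.5));
        System.out.println(hasEnoughExocyclicOxygens(2, 6, 0.5));
    }
}
```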
-
areLinearSugarsInRingsDetected
public boolean areLinearSugarsInRingsDetected()Specifies whether linear sugar structures that are part of a ring should be detected according to the current settings. This setting is important for e.g. macrocycles that contain sugars or pseudosugars.
Note that potential circular sugar candidates (here always including spiro sugar rings also) are filtered from linear sugar candidates, even with this setting turned on.- Returns:
- true if linear sugars in rings are detected and removed with the current settings
-
arePropertiesAddedToSugarContainingMolecules
public boolean arePropertiesAddedToSugarContainingMolecules()Specifies whether a property is added to given atom containers that contain (or contained before removal) sugar moieties. See property keys in the public constants of this class.- Returns:
- true if properties are added to the given atom containers
-
getLinearSugarCandidateMinSizeSetting
public int getLinearSugarCandidateMinSizeSetting()Returns the currently set minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed).- Returns:
- the set minimum carbon atom count of detected linear sugars (inclusive)
-
getLinearSugarCandidateMaxSizeSetting
public int getLinearSugarCandidateMaxSizeSetting()Returns the currently set maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed).- Returns:
- the set maximum carbon atom count of detected linear sugars (inclusive)
-
areLinearAcidicSugarsDetected
public boolean areLinearAcidicSugarsDetected()Specifies whether linear acidic sugar patterns are currently included in the linear sugar structures used for initial detection of linear sugars in a given molecule.- Returns:
- true if acidic sugars are detected
-
areSpiroRingsDetectedAsCircularSugars
public boolean areSpiroRingsDetectedAsCircularSugars()Specifies whether spiro rings are included in the initial set of detected rings considered for circular sugar detection.
Note for linear sugar detection: Here, the spiro rings will always be filtered along with the potential circular sugar candidates.- Returns:
- true if spiro rings can be detected as circular sugars with the current settings
-
areCircularSugarsWithKetoGroupsDetected
public boolean areCircularSugarsWithKetoGroupsDetected()Specifies whether potential sugar cycles with keto groups are detected in circular sugar detection. The general rule specified in the original algorithm description is that every potential sugar cycle with an exocyclic double or triple bond is excluded from circular sugar detection. If this option is turned on, an exemption to this rule is made for potential sugar cycles having keto groups. Also, the double-bound oxygen atoms will then count for the number of connected oxygen atoms and the algorithm will not regard how many keto groups are attached to the cycle (might be only one, might be that all connected oxygen atoms are double-bound). If this option is turned off, every sugar-like cycle with an exocyclic double or triple bond will be excluded from the detected circular sugars, as it is specified in the original algorithm description.- Returns:
- true if potential sugar cycles having keto groups are detected in circular sugar detection
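The exclusion rule and its keto exemption can be sketched on a simplified description of a ring's exocyclic bonds. Everything in this block is an assumption made for illustration: the class `KetoRuleSketch`, the parallel-array representation (bond order and an "is oxygen" flag per exocyclic bond), and the reading of "keto group" as a double bond to oxygen.

```java
// Hypothetical illustration of the documented keto-group rule.
final class KetoRuleSketch {

    /**
     * Returns true if the candidate ring is excluded from detection.
     * bondOrders[i] is the order of the i-th exocyclic bond and
     * isOxygen[i] tells whether the attached atom is an oxygen.
     */
    static boolean isExcluded(int[] bondOrders, boolean[] isOxygen,
                              boolean detectKetoGroups) {
        for (int i = 0; i < bondOrders.length; i++) {
            if (bondOrders[i] >= 2) {
                boolean isKetoGroup = bondOrders[i] == 2 && isOxygen[i];
                if (!(detectKetoGroups && isKetoGroup)) {
                    return true; // exocyclic double or triple bond
                }
            }
        }
        return false;
    }

    /**
     * Counts the attached oxygen atoms. Double-bonded oxygens only count
     * when keto groups are detected.
     */
    static int countAttachedOxygens(int[] bondOrders, boolean[] isOxygen,
                                    boolean detectKetoGroups) {
        int count = 0;
        for (int i = 0; i < bondOrders.length; i++) {
            boolean single = bondOrders[i] == 1;
            boolean keto = bondOrders[i] == 2 && detectKetoGroups;
            if (isOxygen[i] && (single || keto)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Ring with two hydroxy groups and one keto group.
        int[] orders = {1, 1, 2};
        boolean[] oxygens = {true, true, true};
        System.out.println(isExcluded(orders, oxygens, false));
        System.out.println(isExcluded(orders, oxygens, true));
        System.out.println(countAttachedOxygens(orders, oxygens, true));
    }
}
```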
-
addCircularSugarToPatternsList
public boolean addCircularSugarToPatternsList(org.openscience.cdk.interfaces.IAtomContainer aCircularSugar) throws NullPointerException, IllegalArgumentException Allows to add an additional sugar ring to the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given structure must not be isomorph to the already present ones and it must contain exactly one isolated ring without any exocyclic moieties because only the isolated rings of an input structure are matched with the circular sugar patterns.- Parameters:
aCircularSugar
- an atom container representing only one isolated sugar ring- Returns:
- true if the addition was successful
- Throws:
NullPointerException
- if the given atom container is 'null'IllegalArgumentException
- if the given atom container is empty or does represent a molecule that contains no isolated ring, more than one isolated ring, consists of more structures than one isolated ring or is isomorph to a circular sugar structure already present
-
addCircularSugarToPatternsList
public boolean addCircularSugarToPatternsList(String aSmilesCode) throws NullPointerException, IllegalArgumentException Allows to add an additional sugar ring (represented as a SMILES string) to the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given structure must not be isomorph to the already present ones and it must contain exactly one isolated ring without any exocyclic moieties because only the isolated rings of an input structure are matched with the circular sugar patterns.- Parameters:
aSmilesCode
- a SMILES code representation of a molecule consisting of only one isolated sugar ring- Returns:
- true if the addition was successful
- Throws:
NullPointerException
- if the given string is 'null'IllegalArgumentException
- if the given SMILES string is empty or does represent a molecule that contains no isolated ring, more than one isolated ring, consists of more structures than one isolated ring, is isomorph to a circular sugar structure already present or if the given SMILES string cannot be parsed into a molecular structure
-
addLinearSugarToPatternsList
public boolean addLinearSugarToPatternsList(org.openscience.cdk.interfaces.IAtomContainer aLinearSugar) throws NullPointerException, IllegalArgumentException Allows to add an additional linear sugar to the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given structure must not be isomorph to the already present ones or the patterns for circular sugars.
Note: If the given structure contains cycles, the option to detect linear sugars in rings needs to be enabled to detect its matches entirely. Otherwise, all circular substructures of the 'linear sugars' will not be detected.
Additional note: If the given structure is isomorph to a default linear acidic sugar pattern, it may be added here when the option to detect these structures is turned off but will be removed from the pattern list if the option is turned on and off again after this addition.- Parameters:
aLinearSugar
- an atom container representing a molecular structure to search for at linear sugar detection- Returns:
- true if the addition was successful
- Throws:
NullPointerException
- if given atom container is 'null'IllegalArgumentException
- if the given atom container is empty or is isomorph to a linear sugar structure already present or a circular sugar pattern
-
addLinearSugarToPatternsList
public boolean addLinearSugarToPatternsList(String aSmilesCode) throws NullPointerException, IllegalArgumentException Allows to add an additional linear sugar (represented as SMILES string) to the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given structure must not be isomorph to the already present ones or the patterns for circular sugars.
Note: If the given structure contains cycles, the option to detect linear sugars in rings needs to be enabled to detect its matches entirely. Otherwise, all circular substructures of the 'linear sugars' will not be detected.
Additional note: If the given structure is isomorph to a default linear acidic sugar pattern, it may be added here when the option to detect these structures is turned off but will be removed from the pattern list if the option is turned on and off again after this addition.- Parameters:
aSmilesCode
- a SMILES code representation of a molecular structure to search for- Returns:
- true if the addition was successful
- Throws:
NullPointerException
- if given string is 'null'IllegalArgumentException
- if the given SMILES string is empty or does represent a molecule that is isomorph to a linear sugar structure already present or a circular sugar pattern or if it cannot be parsed into a molecular structure
-
removeCircularSugarFromPatternsList
public boolean removeCircularSugarFromPatternsList(String aSmilesCode) throws NullPointerException, IllegalArgumentException Allows to remove a sugar ring pattern (represented as SMILES string) from the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given character string must be a valid SMILES notation and be isomorph to one of the currently used structure patterns. Example usage: Pass the argument "C1CCOC1" (tetrahydrofuran) to stop detecting furanoses in the circular sugar detection algorithm.- Parameters:
aSmilesCode
- a SMILES code representation of a structure present in the circular sugar pattern list- Returns:
- true if the removal was successful
- Throws:
NullPointerException
- if the given string is 'null'IllegalArgumentException
- if the given SMILES string is empty or cannot be parsed into a molecule or the given structure cannot be found in the circular sugar pattern list
-
removeCircularSugarFromPatternsList
public boolean removeCircularSugarFromPatternsList(org.openscience.cdk.interfaces.IAtomContainer aCircularSugar) throws NullPointerException, IllegalArgumentException Allows to remove a sugar ring from the list of circular sugar structures an input molecule is scanned for in circular sugar detection. The given molecule must be isomorph to one of the currently used structure patterns. Example usage: Pass an atom container object representing the structure of tetrahydrofuran to stop detecting furanoses in the circular sugar detection algorithm.- Parameters:
aCircularSugar
- a molecule isomorph to a structure present in the circular sugar pattern list- Returns:
- true if the removal was successful
- Throws:
NullPointerException
- if the given atom container is 'null'IllegalArgumentException
- if the given atom container is empty or its structure is not isomorph to a circular sugar pattern structure in use
-
removeLinearSugarFromPatternsList
public boolean removeLinearSugarFromPatternsList(String aSmilesCode) throws NullPointerException, IllegalArgumentException Allows to remove a linear sugar pattern (represented as SMILES string) from the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given character string must be a valid SMILES notation and be isomorph to one of the currently used structure patterns. Example usage: Pass the argument "C(C(C=O)O)O" (aldotriose) to stop detecting such small aldoses in the linear sugar detection algorithm. Please note that adjusting the linear sugar candidate minimum and maximum sizes can be more straightforward than removing patterns here.
Note: If the linear acidic sugars are currently included in the linear sugar pattern structures, individual structures of this group can be removed here.- Parameters:
aSmilesCode
- a SMILES code representation of a structure present in the linear sugar pattern list- Returns:
- true if the removal was successful
- Throws:
NullPointerException
- if the given string is 'null'IllegalArgumentException
- if the given SMILES string is empty or cannot be parsed into a molecule or the given structure cannot be found in the linear sugar pattern list
-
removeLinearSugarFromPatternsList
public boolean removeLinearSugarFromPatternsList(org.openscience.cdk.interfaces.IAtomContainer aLinearSugar) throws NullPointerException, IllegalArgumentException Allows to remove a linear sugar pattern from the list of linear sugar structures an input molecule is scanned for in linear sugar detection. The given molecule must be isomorph to one of the currently used structure patterns. Example usage: Pass an atom container object representing the structure of aldotriose to stop detecting such small aldoses in the linear sugar detection algorithm. Please note that adjusting the linear sugar candidate minimum and maximum sizes can be more straightforward than removing patterns here.
Note: If the linear acidic sugars are currently included in the linear sugar pattern structures, individual structures of this group can be removed here.- Parameters:
aLinearSugar
- a molecule isomorph to a structure present in the linear sugar pattern list- Returns:
- true if the removal was successful
- Throws:
NullPointerException
- if the given atom container is 'null'IllegalArgumentException
- if the given atom container is empty or its structure is not isomorph to a linear sugar pattern structure in use
-
clearCircularSugarPatternsList
public void clearCircularSugarPatternsList()Clears all the circular sugar structures an input molecule is scanned for in circular sugar detection. -
clearLinearSugarPatternsList
public void clearLinearSugarPatternsList()Clears all the linear sugar structures an input molecule is scanned for in linear sugar detection. If the detection of linear acidic sugars is turned on, it is turned off in this method and these structures are also cleared from the linear sugar patterns. -
setDetectCircularSugarsOnlyWithOGlycosidicBondSetting
public void setDetectCircularSugarsOnlyWithOGlycosidicBondSetting(boolean aBoolean) Sets the option to only detect (and subsequently remove) circular sugar moieties that are attached to the parent structure or other sugar moieties via an O-glycosidic bond.- Parameters:
aBoolean
- true, if only circular sugar moieties connected via a glycosidic bond should be detected (and removed)
-
setRemoveOnlyTerminalSugarsSetting
public void setRemoveOnlyTerminalSugarsSetting(boolean aBoolean) Sets the option to remove only terminal sugar moieties, i.e. those that when removed do not cause a split of the remaining molecular structure into two or more disconnected substructures.- Parameters:
aBoolean
- true, if only terminal sugar moieties should be removed
-
setPreservationModeSetting
public void setPreservationModeSetting(SugarRemovalUtility.PreservationModeOption anOption) throws NullPointerException Sets the preservation mode for structures that get disconnected by sugar removal and the preservation mode threshold is set to the default value of the given enum constant. The preservation mode option specifies how to determine whether a substructure that gets disconnected from the molecule during the removal of a sugar moiety should be preserved or can get removed along with the sugar. This can e.g. be judged by its heavy atom count or its molecular weight or it can be specified that all structures are to be preserved. The available options can be selected from the PreservationModeOption enum. If too small / too light structures are discarded, an additional threshold is specified in the preservation mode threshold setting that the structures have to reach in order to be preserved (i.e. to be judged 'big/heavy enough'). This threshold is set to the default value of the given enum constant in this method.
Note that if the option "ALL" is combined with the removal of only terminal moieties, even the smallest attached structure will prevent the removal of a sugar. The most important consequence is that circular sugars with any hydroxy groups will not be removed because these are not considered as part of the sugar moiety.- Parameters:
anOption
- the selected preservation mode option- Throws:
NullPointerException
- if the given option is 'null'
-
setPreservationModeThresholdSetting
Sets the preservation mode threshold, i.e. the molecular weight or heavy atom count (depending on the currently set preservation mode) a substructure that gets disconnected from the molecule during the removal of a sugar moiety has to reach in order to be kept and not removed along with the sugar. If the preservation mode is set to "HEAVY_ATOM_COUNT", the threshold is interpreted as the needed minimum number of heavy atoms and if it is set to "MOLECULAR_WEIGHT", the threshold is interpreted as minimum molecular weight in Da.
Notes: A threshold of zero can be set here but it is recommended to choose the preservation mode "ALL" instead. On the other hand, if the preservation mode is set to "ALL", this threshold is automatically set to zero and this method will throw an exception if a non-zero value is given.- Parameters:
aThreshold
- the new threshold- Throws:
IllegalArgumentException
- if the preservation mode is currently set to preserve all structures or the threshold is negative
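The argument checks described in the notes above can be summarised in a small validation routine. This is a sketch, not the library's code: `ThresholdValidationSketch` and its enum are hypothetical, and the sketch follows the notes in accepting a zero threshold while the mode is "ALL", which the shorter Throws description does not spell out.

```java
// Hypothetical illustration of the documented threshold checks.
final class ThresholdValidationSketch {

    enum Mode { ALL, HEAVY_ATOM_COUNT, MOLECULAR_WEIGHT }

    /**
     * Throws IllegalArgumentException for a negative threshold, or for a
     * non-zero threshold while every structure is preserved.
     */
    static void validate(Mode currentMode, int threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("Threshold must not be negative.");
        }
        if (currentMode == Mode.ALL && threshold != 0) {
            throw new IllegalArgumentException(
                    "Mode ALL preserves everything; only zero is accepted.");
        }
    }

    public static void main(String[] args) {
        validate(Mode.HEAVY_ATOM_COUNT, 7); // accepted
        validate(Mode.ALL, 0);              // accepted under this sketch
        try {
            validate(Mode.ALL, 7);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```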
-
setDetectCircularSugarsOnlyWithEnoughExocyclicOxygenAtomsSetting
public void setDetectCircularSugarsOnlyWithEnoughExocyclicOxygenAtomsSetting(boolean aBoolean) Sets the option to only detect (and subsequently remove) circular sugars that have a sufficient number of attached, exocyclic, single-bonded oxygen atoms. If this option is set, the circular sugar candidates have to reach an additionally specified minimum ratio of said oxygen atoms to the number of atoms in the respective ring in order to be seen as a sugar ring and be subsequently removed. See exocyclic oxygen atoms to atoms in ring ratio threshold setting. If this option is re-activated, the previously set threshold is used again or the default value if no custom threshold has been set before.- Parameters:
aBoolean
- true, if the ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate sugar ring should be evaluated at circular sugar detection
-
setExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting
public void setExocyclicOxygenAtomsToAtomsInRingRatioThresholdSetting(double aDouble) throws IllegalArgumentException Sets the minimum ratio of attached, exocyclic, single-bonded oxygen atoms to the number of atoms in the candidate circular sugar structure to reach in order to be classified as a sugar moiety if the number of exocyclic oxygen atoms should be evaluated.
A ratio of e.g. 0.5 means that a six-membered candidate sugar ring needs to have at least 3 attached, exocyclic single-bonded oxygen atoms in order to be classified as a circular sugar.
A zero value can be given if the option to remove only sugar rings with a sufficient number of exocyclic oxygen atoms is activated, but it is recommended to turn this option off instead. In the other case, when the option is turned off, this method will throw an exception if a non-zero value is passed.
Note: The normally present oxygen atom within a sugar ring is included in the number of ring atoms. So setting the threshold to 1.0 implies that at least one of the carbon atoms in the ring has two attached oxygen atoms. In general, the threshold can be set to values higher than 1.0 but it does not make a lot of sense.- Parameters:
aDouble
- the new ratio threshold- Throws:
IllegalArgumentException
- if the given number is infinite, 'NaN' or smaller than 0 or if the ratio is not evaluated under the current settings and a non-zero value is passed
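The conditions under which a ratio threshold is rejected can be collected in one routine. The class `RatioThresholdValidationSketch` and the parameter `ratioIsEvaluated` (standing in for the state of the exocyclic-oxygen option) are hypothetical names chosen for this sketch.

```java
// Hypothetical illustration of the documented argument checks.
final class RatioThresholdValidationSketch {

    /**
     * Throws IllegalArgumentException if the ratio is infinite, NaN or
     * negative, or if it is non-zero while the ratio is not evaluated.
     */
    static void validate(double ratio, boolean ratioIsEvaluated) {
        if (Double.isNaN(ratio) || Double.isInfinite(ratio) || ratio < 0) {
            throw new IllegalArgumentException(
                    "Ratio must be a finite, non-negative number.");
        }
        if (!ratioIsEvaluated && ratio != 0) {
            throw new IllegalArgumentException(
                    "Ratio is not evaluated under the current settings.");
        }
    }

    public static void main(String[] args) {
        validate(0.5, true);  // accepted
        validate(1.5, true);  // accepted, although rarely meaningful
        validate(0.0, false); // accepted
        try {
            validate(0.5, false);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```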
-
setDetectLinearSugarsInRingsSetting
public void setDetectLinearSugarsInRingsSetting(boolean aBoolean) Sets the option to detect linear sugar structures that are part of a ring. This setting is important for e.g. macrocycles that contain sugars or pseudosugars.
Note that potential circular sugar candidates (here always including spiro sugar rings also) are filtered from linear sugar candidates, even with this setting turned on.- Parameters:
aBoolean
- true, if linear sugar structures that are part of a ring should be detected (and removed)
-
setAddPropertyToSugarContainingMoleculesSetting
public void setAddPropertyToSugarContainingMoleculesSetting(boolean aBoolean) Sets the option to add a respective property to given atom containers that contain (or contained before removal) sugar moieties. See property keys in the public constants of this class.- Parameters:
aBoolean
- true, if properties should be added to the given atom containers
-
setLinearSugarCandidateMinSizeSetting
Sets the minimum number of carbon atoms a linear sugar candidate must have in order to be detected as a sugar moiety (and subsequently be removed). This minimum is inclusive and does not affect the initial detection of linear sugars. Only at the end of the algorithm, linear sugar candidates that are too small are discarded.
Note: It is not tested whether the given minimum size is actually smaller than the set maximum size to allow a user-friendly adjustment of these parameters without having to adhere to a certain order of operations.- Parameters:
aMinSize
- the new minimum size (inclusive) of linear sugars detected, interpreted as carbon atom count- Throws:
IllegalArgumentException
- if the given size is smaller than one
-
setLinearSugarCandidateMaxSizeSetting
Sets the maximum number of carbon atoms a linear sugar candidate can have in order to be detected as a sugar moiety (and subsequently be removed). This maximum is inclusive and does not affect the initial detection of linear sugars. Only at the end of the algorithm, linear sugar candidates that are too big are discarded.
Note: It is not tested whether the given maximum size is actually greater than the set minimum size to allow a user-friendly adjustment of these parameters without having to adhere to a certain order of operations.- Parameters:
aMaxSize
- the new maximum size (inclusive) of linear sugars detected, interpreted as carbon atom count- Throws:
IllegalArgumentException
- if the given size is smaller than one
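The inclusive size window formed by the minimum and maximum settings can be sketched as a simple filter. The class `LinearSizeFilterSketch` is hypothetical. Because the two setters do not check each other's values, a minimum above the maximum is possible; in this sketch such a window simply rejects every candidate, which is an assumption about the consequence rather than documented behaviour.

```java
// Hypothetical illustration of the inclusive size window.
final class LinearSizeFilterSketch {

    /** Mirrors the documented check: sizes smaller than one are rejected. */
    static int checkSize(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("Size must be at least one.");
        }
        return size;
    }

    /** Returns true if the carbon count lies within [minSize, maxSize]. */
    static boolean isWithinSize(int carbonCount, int minSize, int maxSize) {
        return carbonCount >= checkSize(minSize)
                && carbonCount <= checkSize(maxSize);
    }

    public static void main(String[] args) {
        // Example window of 4 to 7 carbon atoms (arbitrary values).
        System.out.println(isWithinSize(4, 4, 7)); // lower bound included
        System.out.println(isWithinSize(8, 4, 7)); // too big
        System.out.println(isWithinSize(5, 7, 4)); // inverted window
    }
}
```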
-
setDetectLinearAcidicSugarsSetting
public void setDetectLinearAcidicSugarsSetting(boolean aBoolean) Sets the option to include linear acidic sugar patterns in the linear sugar structures used for initial detection of linear sugars in a given molecule. If the option is turned on, the linear acidic sugar patterns are added to the linear sugar patterns list and can be retrieved and configured in the same way as the 'normal' linear sugar patterns. If the option is turned off, they are all removed again from the linear sugar patterns list.- Parameters:
aBoolean
- true, if linear acidic sugar patterns should be included in the linear sugar structures used for initial detection of linear sugars
-
setDetectSpiroRingsAsCircularSugarsSetting
public void setDetectSpiroRingsAsCircularSugarsSetting(boolean aBoolean) Sets the option to include spiro rings in the initial set of detected rings considered for circular sugar detection. If the option is turned on, spiro atoms connecting two spiro rings will be protected if a spiro sugar ring is removed. In the opposite case, spiro rings will be filtered from the set of isolated cycles detected in the given molecule.
Note for linear sugar detection: Here, the spiro rings will always be filtered along with the potential circular sugar candidates.- Parameters:
aBoolean
- true, if spiro rings should be detectable as circular sugars
-
setDetectCircularSugarsWithKetoGroupsSetting
public void setDetectCircularSugarsWithKetoGroupsSetting(boolean aBoolean) Sets the option to detect potential sugar cycles with keto groups as circular sugars in circular sugar detection. The general rule specified in the original algorithm description is that every potential sugar cycle with an exocyclic double or triple bond is excluded from circular sugar detection. If this option is turned on, an exemption to this rule is made for potential sugar cycles having keto groups. Also, the double-bound oxygen atoms will then count for the number of connected oxygen atoms and the algorithm will not regard how many keto groups are attached to the cycle (might be only one, might be that all connected oxygen atoms are double-bound). If this option is turned off, every sugar-like cycle with an exocyclic double or triple bond will be excluded from the detected circular sugars, as it is specified in the original algorithm description.- Parameters:
aBoolean
- true, if circular sugars with keto groups should be detected
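The filter rule described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: a ring candidate is reduced to a list of its exocyclic bonds, and the names `KetoRuleSketch`, `ExocyclicBond`, and `passesMultipleBondFilter` are hypothetical and not part of the SRU, which operates on CDK atom containers.

```java
import java.util.List;

/** Toy model only (hypothetical names); not the SRU implementation. */
final class KetoRuleSketch {

    /** A bond from a ring atom to an atom outside the ring. */
    static final class ExocyclicBond {
        final String neighbourElement;
        final int order;

        ExocyclicBond(String neighbourElement, int order) {
            this.neighbourElement = neighbourElement;
            this.order = order;
        }
    }

    /**
     * Returns true if a ring candidate with the given exocyclic bonds passes the
     * multiple-bond filter. With the keto option off, any exocyclic double or triple
     * bond excludes the ring; with it on, double bonds to oxygen are tolerated.
     */
    static boolean passesMultipleBondFilter(List<ExocyclicBond> bonds, boolean detectKetoSugars) {
        for (ExocyclicBond bond : bonds) {
            if (bond.order <= 1) {
                continue; // single bonds never exclude a candidate
            }
            boolean isKetoGroup = bond.order == 2 && "O".equals(bond.neighbourElement);
            if (!detectKetoSugars || !isKetoGroup) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<ExocyclicBond> ketoRing = List.of(
                new ExocyclicBond("O", 1), new ExocyclicBond("O", 2));
        System.out.println(passesMultipleBondFilter(ketoRing, false)); // prints false
        System.out.println(passesMultipleBondFilter(ketoRing, true));  // prints true
    }
}
```

An exocyclic C=C bond is rejected in both modes, since the exception covers keto groups only.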
-
restoreDefaultSettings
public void restoreDefaultSettings() Sets all settings to their default values (see public static constants or enquire via get/is methods). This includes the pattern lists for linear and circular sugars. Calling this method is equivalent to using the constructor of this class.
-
hasLinearSugars
public boolean hasLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Detects linear sugar moieties in the given molecule, according to the current settings for linear sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will return true even if only non-terminal linear sugar moieties are detected.
If the respective option is set, a property will be added to the given atom container specifying whether it contains (linear) sugar moieties or not (in addition to the return value of this method).
- Parameters:
aMolecule
- the atom container to scan for the presence of linear sugar moieties
- Returns:
- true, if the given molecule contains linear sugar moieties
- Throws:
NullPointerException
- if the given atom container is 'null'
-
hasCircularSugars
public boolean hasCircularSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Detects circular sugar moieties in the given molecule, according to the current settings for circular sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will return true even if only non-terminal circular sugar moieties are detected.
If the respective option is set, a property will be added to the given atom container specifying whether it contains (circular) sugar moieties or not (in addition to the return value of this method).
- Parameters:
aMolecule
- the atom container to scan for the presence of circular sugar moieties
- Returns:
- true, if the given molecule contains circular sugar moieties
- Throws:
NullPointerException
- if the given atom container is 'null'
-
hasCircularOrLinearSugars
public boolean hasCircularOrLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Detects circular and linear sugar moieties in the given molecule, according to the current settings for sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will return true even if only non-terminal sugar moieties are detected.
If the respective option is set, a property will be added to the given atom container specifying whether it contains (circular/linear/any kind of) sugar moieties or not (in addition to the return value of this method).
- Parameters:
aMolecule
- the atom container to scan for the presence of sugar moieties
- Returns:
- true, if the given molecule contains sugar moieties of any kind (circular or linear)
- Throws:
NullPointerException
- if the given atom container is 'null'
-
isQualifiedForGlycosidicBondExemption
public boolean isQualifiedForGlycosidicBondExemption(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Tests whether the given molecule qualifies for the glycosidic bond exemption. This is true for molecules that practically are single-cycle circular sugars, meaning that the molecule is empty if the sugar ring is detected and removed according to the current settings. These molecules or sugar rings do not need to have a glycosidic bond in order to be detected as a sugar ring if the option to only detect those circular sugars that have one is activated. This exemption was introduced because these molecules do not contain any other structure to bind to via a glycosidic bond.
Note: It is checked whether the sugar ring really does not have a glycosidic bond.
- Parameters:
aMolecule
- the molecule to check
- Returns:
- true, if the given molecule qualifies for the exemption (it only has one sugar cycle, is empty after its removal, and does not have a glycosidic bond)
- Throws:
NullPointerException
- if the given atom container is 'null'
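As a rough illustration of the exemption described above, the decision can be written as two boolean conditions. This is a sketch with hypothetical names and inputs (`sugarRingCount`, `atomsLeftAfterRemoval`, and so on are assumed to have been computed elsewhere); it does not reproduce how the SRU derives these values from an atom container.

```java
/** Toy decision logic with hypothetical inputs; not the SRU implementation. */
final class GlycosidicExemptionSketch {

    /** Exemption: exactly one sugar ring, nothing left after its removal, no glycosidic bond. */
    static boolean qualifiesForExemption(int sugarRingCount, int atomsLeftAfterRemoval,
                                         boolean hasGlycosidicBond) {
        return sugarRingCount == 1 && atomsLeftAfterRemoval == 0 && !hasGlycosidicBond;
    }

    /**
     * A ring is accepted if the "only with glycosidic bond" option is off,
     * if it has a glycosidic bond, or if the molecule qualifies for the exemption.
     */
    static boolean isAcceptedAsSugar(boolean onlyWithGlycosidicBond, boolean hasGlycosidicBond,
                                     boolean qualifiesForExemption) {
        return !onlyWithGlycosidicBond || hasGlycosidicBond || qualifiesForExemption;
    }

    public static void main(String[] args) {
        boolean exempt = qualifiesForExemption(1, 0, false);
        System.out.println(isAcceptedAsSugar(true, false, exempt)); // prints true
    }
}
```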
-
getNumberOfCircularSugars
public int getNumberOfCircularSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Detects circular sugar moieties in the given molecule according to the current settings for circular sugar detection and returns the number of detected moieties. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, the return value of this method always includes both terminal and non-terminal moieties.
- Parameters:
aMolecule
- the atom container to scan for the presence of circular sugar moieties
- Returns:
- an integer representing the number of detected circular sugar moieties in the given molecule
- Throws:
NullPointerException
- if the given atom container is 'null'
-
getNumberOfLinearSugars
public int getNumberOfLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Detects linear sugar moieties in the given molecule according to the current settings for linear sugar detection and returns the number of detected moieties. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, the return value of this method always includes both terminal and non-terminal moieties.
- Parameters:
aMolecule
- the atom container to scan for the presence of linear sugar moieties
- Returns:
- an integer representing the number of detected linear sugar moieties in the given molecule
- Throws:
NullPointerException
- if the given atom container is 'null'
-
getNumberOfCircularAndLinearSugars
public int getNumberOfCircularAndLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Detects circular and linear sugar moieties in the given molecule according to the current settings for circular and linear sugar detection and returns the number of detected moieties. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, the return value of this method always includes both terminal and non-terminal moieties.
- Parameters:
aMolecule
- the atom container to scan for the presence of circular and linear sugar moieties
- Returns:
- an integer representing the number of detected circular and linear sugar moieties in the given molecule
- Throws:
NullPointerException
- if the given atom container is 'null'
-
removeCircularSugars
public org.openscience.cdk.interfaces.IAtomContainer removeCircularSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned) throws NullPointerException, CloneNotSupportedException, IllegalArgumentException Removes circular sugar moieties from the given atom container. Which substructures are removed depends on the settings for circular sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected circular sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
If all the circular sugar moieties are to be removed from the query molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
If the given molecule consists only of circular sugars, an empty atom container is returned.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved (if labelled by the respective property).
If the respective option is set, a property will be added to the given atom container specifying whether it contains (or contained before removal) circular sugar moieties or not.
- Parameters:
aMolecule
- the molecule to remove circular sugar moieties from
aShouldBeCloned
- true, if the sugar moieties should not be removed from the given atom container but a clone of it should be generated and the sugars be removed from that
- Returns:
- if the given atom container should NOT be cloned, this method returns the same given atom container after the sugar removal; the returned molecule may be unconnected if also non-terminal sugars are removed according to the settings and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar)
- Throws:
NullPointerException
- if the given atom container is 'null'
CloneNotSupportedException
- if the given atom container does not allow cloning (this function is needed in some steps of the algorithm)
IllegalArgumentException
- if only terminal sugars should be removed but the given atom container already contains multiple, unconnected structures which makes the determination of terminal and non-terminal structures impossible
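The pass-by-pass removal of terminal sugars described for this method can be sketched on a toy graph in which every node stands for one structural unit. The sketch makes simplifying assumptions: all names are hypothetical, a unit counts as terminal only if the rest of the graph stays connected without it, and the preservation mode and its threshold are ignored. It is not the SRU implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy graph model (hypothetical names); each node stands for one structural unit. */
final class TerminalRemovalSketch {

    /** Removes terminal sugar units pass by pass and returns the ids of the remaining units. */
    static Set<Integer> removeTerminalSugars(Map<Integer, Set<Integer>> adjacency, Set<Integer> sugars) {
        // Work on a copy so the caller's adjacency map is left untouched.
        Map<Integer, Set<Integer>> graph = new HashMap<>();
        adjacency.forEach((unit, neighbours) -> graph.put(unit, new HashSet<>(neighbours)));
        Set<Integer> candidates = new TreeSet<>(sugars);
        boolean removedInPass = true;
        // Start a new pass as long as the previous pass removed at least one sugar.
        while (removedInPass) {
            removedInPass = false;
            for (Iterator<Integer> it = candidates.iterator(); it.hasNext();) {
                int sugar = it.next();
                if (isTerminal(graph, sugar)) {
                    for (int neighbour : graph.remove(sugar)) {
                        graph.get(neighbour).remove(sugar);
                    }
                    it.remove();
                    removedInPass = true;
                }
            }
        }
        return new TreeSet<>(graph.keySet());
    }

    /** A unit is terminal here if the rest of the graph stays connected without it. */
    static boolean isTerminal(Map<Integer, Set<Integer>> graph, int unit) {
        Set<Integer> rest = new HashSet<>(graph.keySet());
        rest.remove(unit);
        if (rest.isEmpty()) {
            return true;
        }
        // Breadth-first search over the remaining units, skipping the tested one.
        Deque<Integer> queue = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        int start = rest.iterator().next();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            for (int next : graph.get(queue.poll())) {
                if (next != unit && seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen.size() == rest.size();
    }

    public static void main(String[] args) {
        // Chain 0 - 1 - 2, where unit 0 is the aglycon and units 1 and 2 are sugars.
        Map<Integer, Set<Integer>> chain = new HashMap<>();
        chain.put(0, Set.of(1));
        chain.put(1, Set.of(0, 2));
        chain.put(2, Set.of(1));
        System.out.println(removeTerminalSugars(chain, Set.of(1, 2))); // prints [0]
    }
}
```

In the chain example, unit 1 is not terminal in the first pass and is only removed in the second pass, after unit 2 is gone. This is why the iteration has to start anew until a whole pass removes nothing.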
-
removeCircularSugars
public boolean removeCircularSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException, CloneNotSupportedException, IllegalArgumentException Removes circular sugar moieties from the given atom container. Which substructures are removed depends on the settings for circular sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected circular sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
If all the circular sugar moieties are to be removed from the query molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures after deglycosylation, whereas in the former case, the processed structure always consists of one connected structure.
If the given molecule consists only of circular sugars, an empty atom container is left after processing.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved (if labelled by the respective property).
If the respective option is set, a property will be added to the given atom container specifying whether it contains (or contained before removal) circular sugar moieties or not.
- Parameters:
aMolecule
- the molecule to remove circular sugar moieties from
- Returns:
- true if sugar moieties were detected and removed
- Throws:
NullPointerException
- if the given atom container is 'null'
CloneNotSupportedException
- if the given atom container does not allow cloning (this function is needed in some steps of the algorithm)
IllegalArgumentException
- if only terminal sugars should be removed but the given atom container already contains multiple, unconnected structures which makes the determination of terminal and non-terminal structures impossible
-
removeAndReturnCircularSugars
public List<org.openscience.cdk.interfaces.IAtomContainer> removeAndReturnCircularSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned) throws NullPointerException, CloneNotSupportedException, IllegalArgumentException Removes circular sugar moieties from the given atom container and returns the resulting aglycon (at list index 0) and the removed circular sugar moieties. Which substructures are removed depends on the settings for circular sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected circular sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
If all the circular sugar moieties are to be removed from the query molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule (aglycon at list index 0) may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
If the given molecule consists only of circular sugars, an empty atom container is returned at list index 0.
The returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties while all valences on the aglycon at position 0 are saturated.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved (if labelled by the respective property).
If the respective option is set, a property will be added to the given atom container at list index 0 specifying whether it contains (or contained before removal) circular sugar moieties or not.
- Parameters:
aMolecule
- the molecule to remove circular sugar moieties from
aShouldBeCloned
- true, if the sugar moieties should not be removed from the given atom container but a clone of it should be generated and the sugars be removed from that; if true, the deglycosylated clone is returned at list index 0
- Returns:
- a list of atom container objects representing the deglycosylated molecule at list index 0 and the removed circular sugar moieties at the remaining list positions. If the given atom container should NOT be cloned, the same given atom container object after sugar removal is returned at list index 0; the returned aglycon may be unconnected if also non-terminal sugars are removed according to the settings and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar); the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties while all valences on the aglycon at position 0 are saturated
- Throws:
NullPointerException
- if the given atom container is 'null'
CloneNotSupportedException
- if the given atom container does not allow cloning (this function is needed in some steps of the algorithm)
IllegalArgumentException
- if only terminal sugars should be removed but the given atom container already contains multiple, unconnected structures which makes the determination of terminal and non-terminal structures impossible
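The list layout returned by this method (aglycon at index 0, removed sugar moieties at the remaining positions) can be unpacked with two small helpers. The helpers below are hypothetical and generic, so the sketch runs without CDK; with the real API the element type would be `IAtomContainer`.

```java
import java.util.List;

/** Hypothetical helpers for the "aglycon first, sugars after" list layout. */
final class RemovalResultSketch {

    /** The deglycosylated molecule sits at list index 0. */
    static <T> T aglyconOf(List<T> result) {
        return result.get(0);
    }

    /** All remaining positions hold the removed sugar moieties (possibly none). */
    static <T> List<T> removedSugarsOf(List<T> result) {
        return result.subList(1, result.size());
    }

    public static void main(String[] args) {
        // Strings stand in for atom containers in this sketch.
        List<String> result = List.of("aglycon", "sugar A", "sugar B");
        System.out.println(aglyconOf(result));       // prints aglycon
        System.out.println(removedSugarsOf(result)); // prints [sugar A, sugar B]
    }
}
```

Note that, according to the description above, the removed moieties carry invalid valences at their former attachment points, so they may need further processing before being used as standalone structures.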
-
removeLinearSugars
public org.openscience.cdk.interfaces.IAtomContainer removeLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned) throws NullPointerException, CloneNotSupportedException, IllegalArgumentException Removes linear sugar moieties from the given atom container. Which substructures are removed depends on the settings for linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected linear sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
If all the linear sugar moieties are to be removed from the query molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
If the given molecule consists only of linear sugars, an empty atom container is returned.
If the respective option is set, a property will be added to the given atom container specifying whether it contains (or contained before removal) linear sugar moieties or not.
- Parameters:
aMolecule
- the molecule to remove linear sugar moieties from
aShouldBeCloned
- true, if the sugar moieties should not be removed from the given atom container but a clone of it should be generated and the sugars be removed from that
- Returns:
- if the given atom container should NOT be cloned, this method returns the same given atom container after the sugar removal; the returned molecule may be unconnected if also non-terminal sugars are removed according to the settings and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar)
- Throws:
NullPointerException
- if the given atom container is 'null'
CloneNotSupportedException
- if the given atom container does not allow cloning (this function is needed in some steps of the algorithm)
IllegalArgumentException
- if only terminal sugars should be removed but the given atom container already contains multiple, unconnected structures which makes the determination of terminal and non-terminal structures impossible
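The effect of the aShouldBeCloned parameter can be illustrated with a toy stand-in in which a list of strings plays the role of the atom container. The class and method names are hypothetical; the sketch only shows the in-place versus clone contract described above, not the actual removal logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for the clone flag (hypothetical names); a list plays the atom container. */
final class CloneFlagSketch {

    /** Removes all "sugar" entries either from a copy or from the given list itself. */
    static List<String> removeSugars(List<String> molecule, boolean shouldBeCloned) {
        List<String> target = shouldBeCloned ? new ArrayList<>(molecule) : molecule;
        target.removeIf(unit -> unit.equals("sugar"));
        return target;
    }

    public static void main(String[] args) {
        List<String> original = new ArrayList<>(List.of("aglycon", "sugar"));
        List<String> copy = removeSugars(original, true);
        System.out.println(original); // prints [aglycon, sugar]
        System.out.println(copy);     // prints [aglycon]
        List<String> same = removeSugars(original, false);
        System.out.println(same == original); // prints true
    }
}
```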
-
removeLinearSugars
public boolean removeLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException, CloneNotSupportedException, IllegalArgumentException Removes linear sugar moieties from the given atom container. Which substructures are removed depends on the settings for linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected linear sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
If all the linear sugar moieties are to be removed from the query molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures after deglycosylation, whereas in the former case, the processed structure always consists of one connected structure.
If the given molecule consists only of linear sugars, an empty atom container is left after processing.
If the respective option is set, a property will be added to the given atom container specifying whether it contains (or contained before removal) linear sugar moieties or not.
- Parameters:
aMolecule
- the molecule to remove linear sugar moieties from
- Returns:
- true if sugar moieties were detected and removed
- Throws:
NullPointerException
- if the given atom container is 'null'
CloneNotSupportedException
- if the given atom container does not allow cloning (this function is needed in some steps of the algorithm)
IllegalArgumentException
- if only terminal sugars should be removed but the given atom container already contains multiple, unconnected structures which makes the determination of terminal and non-terminal structures impossible
-
removeAndReturnLinearSugars
public List<org.openscience.cdk.interfaces.IAtomContainer> removeAndReturnLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned) throws NullPointerException, CloneNotSupportedException, IllegalArgumentException Removes linear sugar moieties from the given atom container and returns the resulting aglycon (at list index 0) and the removed linear sugar moieties. Which substructures are removed depends on the settings for linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected linear sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the set preservation mode option and the set threshold and is cleared away.
If all the linear sugar moieties are to be removed from the query molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule (aglycon at list index 0) may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
If the given molecule consists only of linear sugars, an empty atom container is returned at list index 0.
The returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties while all valences on the aglycon at position 0 are saturated.
If the respective option is set, a property will be added to the given atom container at list index 0 specifying whether it contains (or contained before removal) linear sugar moieties or not.
- Parameters:
aMolecule
- the molecule to remove linear sugar moieties from
aShouldBeCloned
- true, if the sugar moieties should not be removed from the given atom container but a clone of it should be generated and the sugars be removed from that; if true, the deglycosylated clone is returned at list index 0
- Returns:
- a list of atom container objects representing the deglycosylated molecule at list index 0 and the removed linear sugar moieties at the remaining list positions. If the given atom container should NOT be cloned, the same given atom container object after sugar removal is returned at list index 0; the returned aglycon may be unconnected if also non-terminal sugars are removed according to the settings and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar); the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties while all valences on the aglycon at position 0 are saturated
- Throws:
NullPointerException
- if the given atom container is 'null'
CloneNotSupportedException
- if the given atom container does not allow cloning (this function is needed in some steps of the algorithm)
IllegalArgumentException
- if only terminal sugars should be removed but the given atom container already contains multiple, unconnected structures which makes the determination of terminal and non-terminal structures impossible
-
removeCircularAndLinearSugars
public org.openscience.cdk.interfaces.IAtomContainer removeCircularAndLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned) throws NullPointerException, CloneNotSupportedException, IllegalArgumentException Removes circular and linear sugar moieties from the given atom container. Which substructures are removed depends on the settings for circular and linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. Important note: To ensure the removal also of linear sugars that only become terminal after removing one or more terminal circular sugar and vice-versa, multiple iterations of circular and linear sugar detection and removal are done here. Therefore, this method might in special cases return another aglycon (the 'true' aglycon) than e.g. a subsequent call to the methods for separate circular and linear sugar removal.
If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the preservation mode option and the set threshold and is cleared away.
If all the circular and linear sugars are to be removed from the query molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
If the given molecule consists only of sugars, an empty atom container is returned.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved (if labelled by the respective property).
If the respective option is set, a property will be added to the given atom container specifying whether it contains (or contained before removal) circular/linear/any kind of sugar moieties or not.
- Parameters:
aMolecule
- the molecule to remove circular and linear sugar moieties from
aShouldBeCloned
- true, if the sugar moieties should not be removed from the given atom container but a clone of it should be generated and the sugars be removed from that
- Returns:
- if the given atom container should NOT be cloned, this method returns the same given atom container after the sugar removal; the returned molecule may be unconnected if also non-terminal sugars are removed according to the settings and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar)
- Throws:
NullPointerException
- if the given atom container is 'null'
CloneNotSupportedException
- if the given atom container does not allow cloning (this function is needed in some steps of the algorithm)
IllegalArgumentException
- if only terminal sugars should be removed but the given atom container already contains multiple, unconnected structures which makes the determination of terminal and non-terminal structures impossible
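The important note above, that combined removal can reach a different aglycon than separate circular and linear removal, can be reproduced on a toy chain model. The sketch assumes a molecule is a simple chain of units and that a unit is terminal when it sits at a chain end; `CombinedRemovalSketch`, `Unit`, and `stripTerminal` are hypothetical names, not SRU API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Toy chain model (hypothetical names); a unit is terminal if it sits at a chain end. */
final class CombinedRemovalSketch {

    enum Unit { AGLYCON, CIRCULAR_SUGAR, LINEAR_SUGAR }

    /** Strips units of the removable kinds from both chain ends until neither end matches. */
    static List<Unit> stripTerminal(List<Unit> chain, Set<Unit> removable) {
        Deque<Unit> work = new ArrayDeque<>(chain);
        boolean removed = true;
        while (removed && !work.isEmpty()) {
            removed = false;
            if (removable.contains(work.peekFirst())) {
                work.pollFirst();
                removed = true;
            }
            if (!work.isEmpty() && removable.contains(work.peekLast())) {
                work.pollLast();
                removed = true;
            }
        }
        return new ArrayList<>(work);
    }

    public static void main(String[] args) {
        List<Unit> chain = List.of(
                Unit.AGLYCON, Unit.CIRCULAR_SUGAR, Unit.LINEAR_SUGAR, Unit.CIRCULAR_SUGAR);

        // Separate calls: circular sugars first, then linear sugars.
        List<Unit> separate = stripTerminal(
                stripTerminal(chain, EnumSet.of(Unit.CIRCULAR_SUGAR)),
                EnumSet.of(Unit.LINEAR_SUGAR));

        // Combined removal: both kinds are considered in every iteration.
        List<Unit> combined = stripTerminal(
                chain, EnumSet.of(Unit.CIRCULAR_SUGAR, Unit.LINEAR_SUGAR));

        System.out.println(separate); // prints [AGLYCON, CIRCULAR_SUGAR]
        System.out.println(combined); // prints [AGLYCON]
    }
}
```

In the separate run, the inner circular sugar only becomes terminal after the linear sugar is removed, by which point circular removal has already finished. The combined run keeps alternating and therefore reaches the bare aglycon.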
-
removeCircularAndLinearSugars
public boolean removeCircularAndLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException, CloneNotSupportedException, IllegalArgumentException Removes circular and linear sugar moieties from the given atom container. Which substructures are removed depends on the settings for circular and linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected sugars are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. Important note: To ensure the removal also of linear sugars that only become terminal after removing one or more terminal circular sugar and vice-versa, multiple iterations of circular and linear sugar detection and removal are done here. Therefore, this method might in special cases return another aglycon (the 'true' aglycon) than e.g. a subsequent call to the methods for separate circular and linear sugar removal.
If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the preservation mode option and the set threshold and is cleared away.
If all the circular and linear sugars are to be removed from the query molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures after deglycosylation, whereas in the former case, the processed structure always consists of one connected structure.
If the given molecule consists only of sugars, an empty atom container is left after processing.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved (if labelled by the respective property).
If the respective option is set, a property will be added to the given atom container specifying whether it contains (or contained before removal) circular/linear/any kind of sugar moieties or not.
- Parameters:
aMolecule
- the molecule to remove circular and linear sugar moieties from
- Returns:
- true if sugar moieties were detected and removed
- Throws:
NullPointerException
- if the given atom container is 'null'
CloneNotSupportedException
- if the given atom container does not allow cloning (this function is needed in some steps of the algorithm)
IllegalArgumentException
- if only terminal sugars should be removed but the given atom container already contains multiple, unconnected structures which makes the determination of terminal and non-terminal structures impossible
-
removeAndReturnCircularAndLinearSugars
public List<org.openscience.cdk.interfaces.IAtomContainer> removeAndReturnCircularAndLinearSugars(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean aShouldBeCloned) throws NullPointerException, CloneNotSupportedException, IllegalArgumentException Removes circular and linear sugar moieties from the given atom container and returns the resulting aglycon (at list index 0) and the removed sugar moieties. Which substructures are removed depends on the settings for circular and linear sugar detection, the setting specifying whether only terminal sugar moieties should be removed and on the set preservation mode.
If only terminal sugar moieties are to be removed, the detected sugars are tested one by one for whether they are terminal and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. Important note: To ensure that linear sugars that only become terminal after the removal of one or more terminal circular sugars (and vice versa) are also removed, multiple iterations of circular and linear sugar detection and removal are done here. Therefore, this method might in special cases return a different aglycon (the 'true' aglycon) than e.g. subsequent calls to the methods for separate circular and linear sugar removal.
If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the preservation mode option and the set threshold and is cleared away.
If all the circular and linear sugars are to be removed from the query molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule (aglycon at list index 0) may consist of two or more disconnected structures when returned, whereas in the former case, the returned structure always consists of one connected structure.
If the given molecule consists only of sugars, an empty atom container is returned at list index 0.
The returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties while all valences on the aglycon at position 0 are saturated.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved (if labelled by the respective property).
If the respective option is set, a property will be added to the given atom container at list index 0 specifying whether it contains (or contained before removal) circular/linear/any kind of sugar moieties or not.- Parameters:
aMolecule
- the molecule to remove circular and linear sugar moieties fromaShouldBeCloned
- true, if the sugar moieties should not be removed from the given atom container but a clone of it should be generated and the sugars be removed from that; if true, the deglycosylated clone is returned at list index 0- Returns:
- a list of atom container objects representing the deglycosylated molecule at list index 0 and the removed sugar moieties at the remaining list positions. If the given atom container should NOT be cloned, the same given atom container object after sugar removal is returned at list index 0; the returned aglycon may be unconnected if also non-terminal sugars are removed according to the settings and it may be empty if the resulting structure after sugar removal was too small to preserve due to the set preservation mode and the associated threshold (i.e. the molecule basically was a sugar); the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties while all valences on the aglycon at position 0 are saturated
- Throws:
NullPointerException
- if the given atom container is 'null'CloneNotSupportedException
- if the given atom container does not allow cloning (this function is needed in some steps of the algorithm)IllegalArgumentException
- if only terminal sugars should be removed but the given atom container already contains multiple, unconnected structures which makes the determination of terminal and non-terminal structures impossible
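The repeat-until-nothing-changes scheme described for terminal-only removal can be illustrated on a toy graph. The following is a minimal, self-contained sketch that uses neither this class nor the CDK: the class name `TerminalRemovalLoopSketch`, the integer node ids and the adjacency-map representation are hypothetical stand-ins, and the clearing of too-small fragments is left out for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TerminalRemovalLoopSketch {

    /** Builds an unbranched chain 0-1-...-(n-1) as an adjacency map (toy molecule). */
    static Map<Integer, Set<Integer>> chain(int n) {
        Map<Integer, Set<Integer>> graph = new HashMap<>();
        for (int i = 0; i < n; i++) {
            graph.put(i, new HashSet<Integer>());
        }
        for (int i = 0; i + 1 < n; i++) {
            graph.get(i).add(i + 1);
            graph.get(i + 1).add(i);
        }
        return graph;
    }

    /** True if the nodes in 'kept' form at most one connected component. */
    static boolean isConnected(Map<Integer, Set<Integer>> graph, Set<Integer> kept) {
        if (kept.isEmpty()) {
            return true;
        }
        Deque<Integer> stack = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        Integer start = kept.iterator().next();
        stack.push(start);
        seen.add(start);
        while (!stack.isEmpty()) {
            for (Integer neighbour : graph.get(stack.pop())) {
                if (kept.contains(neighbour) && seen.add(neighbour)) {
                    stack.push(neighbour);
                }
            }
        }
        return seen.size() == kept.size();
    }

    /**
     * Removes candidates from 'molecule' (modified in place) as long as at least
     * one candidate was terminal in the previous pass; returns the removed ones.
     */
    static List<Set<Integer>> removeTerminalCandidates(Map<Integer, Set<Integer>> graph,
                                                       Set<Integer> molecule,
                                                       List<Set<Integer>> candidates) {
        List<Set<Integer>> remaining = new ArrayList<>(candidates);
        List<Set<Integer>> removed = new ArrayList<>();
        boolean removedSomething = true;
        while (removedSomething) {
            removedSomething = false;
            for (int i = 0; i < remaining.size(); i++) {
                Set<Integer> rest = new HashSet<>(molecule);
                rest.removeAll(remaining.get(i));
                if (isConnected(graph, rest)) {
                    molecule.removeAll(remaining.get(i));
                    removed.add(remaining.remove(i));
                    i--; // the list shrank, stay at the same position
                    removedSomething = true;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> graph = chain(4);
        Set<Integer> molecule = new HashSet<>(Arrays.asList(0, 1, 2, 3));
        List<Set<Integer>> candidates = new ArrayList<>();
        candidates.add(new HashSet<>(Arrays.asList(2)));
        candidates.add(new HashSet<>(Arrays.asList(3)));
        System.out.println("removed: " + removeTerminalCandidates(graph, molecule, candidates));
        System.out.println("remaining: " + molecule);
    }
}
```

In the chain 0-1-2-3 used in `main`, node 2 is not terminal in the first pass because its removal would cut off node 3. It only becomes terminal once node 3 has been removed, which is why a second pass is needed.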
-
getCircularSugarCandidates
public List<org.openscience.cdk.interfaces.IAtomContainer> getCircularSugarCandidates(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Extracts circular sugar moieties from the given molecule, according to the current settings for circular sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will always return terminal and non-terminal moieties.- Parameters:
aMolecule
- the molecule to extract circular sugar moieties from- Returns:
- a list of substructures in the given molecule that are regarded as circular sugar moieties
- Throws:
NullPointerException
- if the given molecule is 'null'
-
getLinearSugarCandidates
public List<org.openscience.cdk.interfaces.IAtomContainer> getLinearSugarCandidates(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Extracts linear sugar moieties from the given molecule, according to the current settings for linear sugar detection. It is not influenced by the setting specifying whether only terminal sugar moieties should be removed and not by the set preservation mode. Therefore, this method will always return terminal and non-terminal moieties.- Parameters:
aMolecule
- the molecule to extract linear sugar moieties from- Returns:
- a list of substructures in the given molecule that are regarded as linear sugar moieties
- Throws:
NullPointerException
- if the given molecule is 'null'
-
removeTooSmallDisconnectedStructures
public void removeTooSmallDisconnectedStructures(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Removes all unconnected fragments that are too small to keep according to the current preservation mode and threshold setting. If all structures are too small, the given atom container is empty after the call.
This does not guarantee that the resulting atom container consists of only one connected structure. There might be multiple unconnected structures that are big enough to be preserved.- Parameters:
aMolecule
- the molecule to clean up; it might be empty after this method call but not null- Throws:
NullPointerException
- if the given molecule is 'null'
-
isTooSmallToPreserve
public boolean isTooSmallToPreserve(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException, UnsupportedOperationException Checks whether the given molecule or structure is too small to be kept according to the current preservation mode and threshold setting.- Parameters:
aMolecule
- the molecule to check- Returns:
- true, if the given structure is too small to be preserved
- Throws:
NullPointerException
- if the given molecule is 'null'UnsupportedOperationException
- if an unknown PreservationModeOption enum constant is set
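The decision described here depends on the preservation mode and on a threshold. The following standalone sketch shows one way such a check could look. The enum `Mode`, its constant names, the explicit `threshold` parameter and the strict "less than" comparison are all assumptions made for illustration; they are not taken from this class.

```java
final class PreservationSketch {

    /** Hypothetical stand-in for the preservation mode options. */
    enum Mode { KEEP_ALL, BY_HEAVY_ATOM_COUNT, BY_MOLECULAR_WEIGHT }

    /**
     * Returns true if a fragment with the given heavy atom count and molecular
     * weight falls below the threshold of the chosen mode (assumed: strictly below).
     */
    static boolean isTooSmall(Mode mode, int heavyAtomCount, double molecularWeight, double threshold) {
        switch (mode) {
            case KEEP_ALL:
                return false; // nothing is ever discarded in this mode
            case BY_HEAVY_ATOM_COUNT:
                return heavyAtomCount < threshold;
            case BY_MOLECULAR_WEIGHT:
                return molecularWeight < threshold;
            default:
                // mirrors the documented behaviour for unknown enum constants
                throw new UnsupportedOperationException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(isTooSmall(Mode.BY_HEAVY_ATOM_COUNT, 4, 58.1, 5));
        System.out.println(isTooSmall(Mode.KEEP_ALL, 1, 16.0, 5));
    }
}
```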
-
isTerminal
public boolean isTerminal(org.openscience.cdk.interfaces.IAtomContainer aSubstructure, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule, List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList) throws NullPointerException, IllegalArgumentException, CloneNotSupportedException Checks whether the given substructure is terminal (i.e. it can be removed without producing multiple unconnected structures in the remaining molecule) in the given parent molecule. To do this, the substructure and the parent molecule are cloned, the substructure is removed in the parent molecule clone and finally it is checked whether the parent molecule clone still consists of only one connected structure. If that is the case, the substructure is terminal. If the preservation mode is not set to 'preserve all structures', too small resulting fragments are cleared from the parent clone in between. These structures that are too small must also not be part of any other substructure in the given candidate list to avoid removing parts of other sugar candidates.
Note: This method only detects moieties that are immediately terminal. It will not deem terminal a sugar moiety that only becomes terminal after the removal of another sugar moiety, for example.- Parameters:
aSubstructure
- the substructure to check for whether it is terminalaParentMolecule
- the molecule the substructure is a part ofaCandidateList
- a list containing the detected sugar candidates to check whether atoms of other candidates would be cleared away if the given substructure was removed (which has to be avoided)- Returns:
- true, if the substructure is terminal
- Throws:
NullPointerException
- if any parameter is 'null'IllegalArgumentException
- if the substructure is not part of the parent molecule or if the parent molecule is already unconnected (i.e. consists of multiple, unconnected substructures)CloneNotSupportedException
- if one of the atom containers cannot be cloned
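The terminal check (work on a copy, remove the substructure, clear too-small fragments unless they belong to another candidate, then test connectivity) can be sketched on a toy graph. This example is independent of this class and the CDK. The class name `TerminalCheckSketch`, the integer node ids, and the use of the node count with a `minSize` parameter as the size criterion are hypothetical simplifications.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TerminalCheckSketch {

    /** Builds an unbranched chain 0-1-...-(n-1) as an adjacency map (toy molecule). */
    static Map<Integer, Set<Integer>> chain(int n) {
        Map<Integer, Set<Integer>> graph = new HashMap<>();
        for (int i = 0; i < n; i++) {
            graph.put(i, new HashSet<Integer>());
        }
        for (int i = 0; i + 1 < n; i++) {
            graph.get(i).add(i + 1);
            graph.get(i + 1).add(i);
        }
        return graph;
    }

    /** Splits the nodes in 'kept' into connected components. */
    static List<Set<Integer>> components(Map<Integer, Set<Integer>> graph, Set<Integer> kept) {
        List<Set<Integer>> result = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        for (Integer start : kept) {
            if (!seen.add(start)) {
                continue;
            }
            Set<Integer> component = new HashSet<>();
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                Integer node = stack.pop();
                component.add(node);
                for (Integer neighbour : graph.get(node)) {
                    if (kept.contains(neighbour) && seen.add(neighbour)) {
                        stack.push(neighbour);
                    }
                }
            }
            result.add(component);
        }
        return result;
    }

    /** True if removing 'substructure' leaves at most one fragment worth keeping. */
    static boolean isTerminal(Map<Integer, Set<Integer>> graph, Set<Integer> molecule,
                              Set<Integer> substructure, List<Set<Integer>> candidates, int minSize) {
        Set<Integer> rest = new HashSet<>(molecule); // copy, the input stays untouched
        rest.removeAll(substructure);
        int keptFragments = 0;
        for (Set<Integer> fragment : components(graph, rest)) {
            boolean tooSmall = fragment.size() < minSize;
            boolean partOfOtherCandidate = false;
            for (Set<Integer> candidate : candidates) {
                if (!candidate.equals(substructure) && !Collections.disjoint(candidate, fragment)) {
                    partOfOtherCandidate = true;
                }
            }
            // a fragment is cleared only if it is too small AND no other candidate owns part of it
            if (!tooSmall || partOfOtherCandidate) {
                keptFragments++;
            }
        }
        return keptFragments <= 1;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> graph = chain(5);
        Set<Integer> molecule = graph.keySet();
        Set<Integer> sugar = Collections.singleton(3);
        System.out.println(isTerminal(graph, molecule, sugar, Collections.singletonList(sugar), 2));
    }
}
```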
-
removeSugarCandidates
public List<org.openscience.cdk.interfaces.IAtomContainer> removeSugarCandidates(org.openscience.cdk.interfaces.IAtomContainer aMolecule, List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList) throws NullPointerException, IllegalArgumentException Removes the given sugar moieties (or substructures in general) from the given molecule and returns the removed moieties (not the aglycon!). The removal algorithm is the same for linear and circular sugars. The only settings influencing the removal are the option specifying whether to remove only terminal sugar moieties and the set preservation mode (because it influences the determination of terminal vs. non-terminal).
If only terminal sugar moieties are to be removed, the sugar candidates are one-by-one tested for whether they are terminal or not and removed if they are. The iteration starts anew after iterating over all candidates and stops if no terminal sugar was removed in one whole iteration. If only terminal sugar moieties are removed from the molecule, any disconnected structure resulting from a removal step must be too small to keep according to the preservation mode option and the set threshold and is cleared away.
If all the sugars are to be removed from the query molecule (including non-terminal ones), those disconnected structures that are too small are only cleared once at the end of the routine.
In the latter case, the deglycosylated molecule may consist of two or more disconnected structures after this method call, whereas in the former case, the remaining structure always consists of one connected structure.
Spiro atoms connecting a removed circular sugar moiety to another cycle are preserved (if labelled by the respective property).
Note that the deglycosylated core is not returned as part of the given list in this method.- Parameters:
aMolecule
- the molecule to remove the sugar candidates fromaCandidateList
- the list of sugar moieties in the given molecule- Returns:
- a list of atom container objects representing the removed sugar moieties; the returned sugar moieties that were removed from the molecule have invalid valences at atoms formerly bonded to the molecule core or to other sugar moieties while all valences on the aglycon (not in the list!) are saturated
- Throws:
NullPointerException
- if any parameter is 'null'IllegalArgumentException
- if at least one atom in the candidate list is not actually part of the molecule or if it cannot be cloned to determine whether it is terminal (if only terminal moieties are removed according to the current settings)
-
postProcessAfterRemoval
public void postProcessAfterRemoval(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Clears away too small structures (according to the set preservation mode) from the given molecule. It may result in an empty atom container. Also, valid valences on the remaining molecule are generated by the addition of implicit hydrogen atoms to open valences.
Note: This method does not check whether a removed disconnected structure is part of a sugar candidate because in the case where only terminal structures are removed, this is checked elsewhere and in the case where all sugar candidates are removed, this method is not called in-between the removal steps.- Parameters:
aMolecule
- the molecule to post-process; might be empty after this method call- Throws:
NullPointerException
- if the given molecule is 'null'
-
selectBiggestUnconnectedFragment
public static org.openscience.cdk.interfaces.IAtomContainer selectBiggestUnconnectedFragment(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Utility method that can be used to select the 'biggest' (i.e. the one with the highest heavy atom count) structure from an atom container containing multiple unconnected structures, e.g. after the removal of both terminal and non-terminal sugar moieties.
The properties of the given atom container (IAtomContainer.getProperties()) are transferred to the returned atom container.
Note: This method does not clear away structures that are too small. It is independent of all settings.- Parameters:
aMolecule
- the molecule to select the biggest structure from out of multiple unconnected structures- Returns:
- the biggest structure
- Throws:
NullPointerException
- if the given atom container is 'null' or the CDK ConnectivityChecker is unable to determine the unconnected structures
-
selectHeaviestUnconnectedFragment
public static org.openscience.cdk.interfaces.IAtomContainer selectHeaviestUnconnectedFragment(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Utility method that can be used to select the 'heaviest' (i.e. the one with the highest molecular weight) structure from an atom container containing multiple unconnected structures, e.g. after the removal of both terminal and non-terminal sugar moieties.
The properties of the given atom container (IAtomContainer.getProperties()) are transferred to the returned atom container.
Note: This method does not clear away structures that are too small. It is independent of all settings.- Parameters:
aMolecule
- the molecule to select the heaviest structure from out of multiple unconnected structures- Returns:
- the heaviest structure
- Throws:
NullPointerException
- if the given atom container is 'null' or the CDK ConnectivityChecker is unable to determine the unconnected structures
-
partitionAndSortUnconnectedFragments
public static List<org.openscience.cdk.interfaces.IAtomContainer> partitionAndSortUnconnectedFragments(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Utility method that can be used to partition the unconnected structures in an atom container, e.g. after the removal of both terminal and non-terminal sugar moieties, into a list of separate atom container objects and sort this list in decreasing order with the following criteria with decreasing priority: atom count, molecular weight, bond count and sum of bond orders.
Note: This method does not clear away structures that are too small. It is independent of all settings.- Parameters:
aMolecule
- the molecule whose unconnected structures to separate and sort- Returns:
- list of sorted atom containers representing the unconnected structures of the given molecule
- Throws:
NullPointerException
- if the given atom container is 'null'
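The multi-criteria ordering described above (atom count, then molecular weight, then bond count, then sum of bond orders, all decreasing) maps naturally onto a chained `java.util.Comparator`. The sketch below is standalone and does not use the CDK; the `Fragment` value class and its fields are hypothetical placeholders for whatever is computed per unconnected structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class FragmentSortSketch {

    /** Hypothetical summary of one unconnected structure. */
    static final class Fragment {
        final String label;
        final int atomCount;
        final double molecularWeight;
        final int bondCount;
        final int bondOrderSum;

        Fragment(String label, int atomCount, double molecularWeight, int bondCount, int bondOrderSum) {
            this.label = label;
            this.atomCount = atomCount;
            this.molecularWeight = molecularWeight;
            this.bondCount = bondCount;
            this.bondOrderSum = bondOrderSum;
        }
    }

    /** Criteria in decreasing priority; reversed() turns the whole chain into decreasing order. */
    static final Comparator<Fragment> DECREASING =
            Comparator.comparingInt((Fragment f) -> f.atomCount)
                    .thenComparingDouble(f -> f.molecularWeight)
                    .thenComparingInt(f -> f.bondCount)
                    .thenComparingInt(f -> f.bondOrderSum)
                    .reversed();

    /** Returns a new, sorted list; the input list is left unchanged. */
    static List<Fragment> sortDecreasing(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(DECREASING);
        return sorted;
    }

    public static void main(String[] args) {
        List<Fragment> fragments = Arrays.asList(
                new Fragment("a", 5, 70.0, 4, 4),
                new Fragment("b", 6, 80.0, 5, 5),
                new Fragment("c", 5, 72.0, 4, 4));
        for (Fragment f : sortDecreasing(fragments)) {
            System.out.println(f.label);
        }
    }
}
```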
-
addUniqueIndicesToAtoms
protected void addUniqueIndicesToAtoms(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Adds an index as property to all atom objects of the given atom container to identify them uniquely within the atom container and its clones. This is required e.g. for the determination of terminal vs. non-terminal sugar moieties.- Parameters:
aMolecule
- the molecule that will be processed by the class- Throws:
NullPointerException
- if molecule is 'null'
-
generateSubstructureIdentifiers
protected Set<String> generateSubstructureIdentifiers(List<org.openscience.cdk.interfaces.IAtomContainer> aSubstructureList) throws NullPointerException, IllegalArgumentException Creates an identifier string for every substructure in the given list, based on the unique indices of its atoms, and returns a set of the generated ids. Only the atoms that are part of the respective substructure are encoded, not bond information etc. Used for a quick matching of substructures in the same molecule. The unique indices in every atom have to be set. Note: The returned set includes every id only once but duplicates are allowed in the input list.- Parameters:
aSubstructureList
- a list of substructures to create identifiers for- Returns:
- a set of the generated identifier strings
- Throws:
NullPointerException
- if the given list is 'null' (list elements may be null or empty, they will be skipped)IllegalArgumentException
- if the unique indices are not set in any non-null atom container of the list
-
generateSubstructureIdentifier
protected String generateSubstructureIdentifier(org.openscience.cdk.interfaces.IAtomContainer aSubstructure) throws NullPointerException, IllegalArgumentException Creates an identifier string for substructures of a molecule, based on the unique indices of the included atoms. It is only encoded which atoms are part of the substructure, no bond information etc. Used for a quick matching of substructures in the same molecule. The unique indices in every atom have to be set.- Parameters:
aSubstructure
- the substructure to create an identifier for- Returns:
- the identifier string
- Throws:
NullPointerException
- if the given substructure is 'null'IllegalArgumentException
- if the unique indices are not set
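An identifier that encodes only which atoms belong to a substructure can be built from the atoms' unique indices. The following standalone sketch shows one possible scheme and is not taken from this class: sorting the indices and joining them with ':' are assumptions, as are the class and method names. Substructures are represented simply as collections of integer indices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SubstructureIdSketch {

    /** Builds an order-independent id from the atom indices of one substructure. */
    static String identifierOf(Collection<Integer> atomIndices) {
        if (atomIndices == null) {
            throw new NullPointerException("substructure is null");
        }
        List<Integer> sorted = new ArrayList<>(atomIndices);
        if (sorted.contains(null)) {
            throw new IllegalArgumentException("an atom has no index set");
        }
        Collections.sort(sorted); // same atoms give the same id, whatever their order
        StringBuilder id = new StringBuilder();
        for (Integer index : sorted) {
            if (id.length() > 0) {
                id.append(':');
            }
            id.append(index);
        }
        return id.toString();
    }

    /** Collects the ids of all non-null, non-empty substructures; duplicates collapse in the set. */
    static Set<String> identifiersOf(List<? extends Collection<Integer>> substructures) {
        if (substructures == null) {
            throw new NullPointerException("list is null");
        }
        Set<String> ids = new HashSet<>();
        for (Collection<Integer> substructure : substructures) {
            if (substructure == null || substructure.isEmpty()) {
                continue; // skipped, as documented for null or empty list elements
            }
            ids.add(identifierOf(substructure));
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(identifierOf(Arrays.asList(7, 2, 5)));
    }
}
```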
-
checkUniqueIndicesOfAtoms
protected boolean checkUniqueIndicesOfAtoms(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException, IllegalArgumentException Checks whether all atoms in the given molecule have a unique (in the given molecule) index as property. It checks the uniqueness of the detected indices but not whether there are numbers missing (the ids of this class are created as numbers starting from zero and growing in integer steps).- Parameters:
aMolecule
- the molecule to check- Returns:
- true if every atom has an index property that is unique in the given molecule
- Throws:
NullPointerException
- if the given molecule is 'null'IllegalArgumentException
- if the given molecule is empty
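The uniqueness test described here checks for duplicates but not for gaps in the numbering. A minimal standalone sketch of that idea follows; the class `UniqueIndexCheckSketch` and the representation of a molecule as a list of per-atom index values (with `null` meaning "no index property set") are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniqueIndexCheckSketch {

    /** True if every atom carries an index and no index occurs twice; gaps are allowed. */
    static boolean allIndicesUnique(List<Integer> indexPerAtom) {
        if (indexPerAtom == null) {
            throw new NullPointerException("molecule is null");
        }
        if (indexPerAtom.isEmpty()) {
            throw new IllegalArgumentException("molecule is empty");
        }
        Set<Integer> seen = new HashSet<>();
        for (Integer index : indexPerAtom) {
            // Set.add returns false if the value was already present
            if (index == null || !seen.add(index)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allIndicesUnique(Arrays.asList(0, 2, 5)));
        System.out.println(allIndicesUnique(Arrays.asList(0, 1, 1)));
    }
}
```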
-
printAllMoleculesAsSmiles
protected void printAllMoleculesAsSmiles(List<org.openscience.cdk.interfaces.IAtomContainer> aMoleculeList) Prints all molecules in the given list as unique SMILES representations to System.out. Used for debugging and in test class.- Parameters:
aMoleculeList
- the list to print to console
-
detectPotentialSugarCycles
protected List<org.openscience.cdk.interfaces.IAtomContainer> detectPotentialSugarCycles(org.openscience.cdk.interfaces.IAtomContainer aMolecule, boolean anIncludeSpiroRings, boolean anIgnoreKetoGroups) throws NullPointerException Detects and returns cycles of the given molecule that are isolated (spiro rings included or not according to the boolean parameter), isomorphic to the circular sugar patterns, and only have exocyclic single bonds (keto groups ignored or not according to the boolean parameter). These cycles are the general candidates for circular sugars that are filtered according to the other settings in the following steps. Spiro atoms are marked by a property.- Parameters:
aMolecule
- the molecule to extract potential circular sugars fromanIncludeSpiroRings
- specification whether spiro rings should be included in the detected potential sugar cycles or filtered out; for circular sugar detection this should be set according to the current 'detect spiro rings as circular sugars' setting; for filtering circular sugar candidates or their atoms during linear sugar detection, this should be set to 'true'anIgnoreKetoGroups
- specification whether potential sugar cycles with keto groups should be included in the returned list; for circular sugar detection this should be set according to the current 'detect circular sugars with keto groups' setting; for filtering circular sugar candidates or their atoms during linear sugar detection, this should be set to 'true'- Returns:
- a list of the potential sugar cycles
- Throws:
NullPointerException
- if the given molecule is 'null'
-
areAllExocyclicBondsSingle
protected boolean areAllExocyclicBondsSingle(org.openscience.cdk.interfaces.IAtomContainer aRingToTest, org.openscience.cdk.interfaces.IAtomContainer anOriginalMolecule, boolean anIgnoreKetoGroups) throws NullPointerException, IllegalArgumentException Checks whether all exocyclic bonds connected to a given ring fragment of a parent atom container are of single order. If the option to allow potential sugar cycles having keto groups is activated, this method also returns true if a cycle having a keto group is processed.
The method iterates over all cyclic atoms and all of their bonds. The runtime therefore scales linearly with the number of cyclic atoms and their connected bonds. In principle, this method can also be used for non-cyclic substructures.
Note: It is not tested whether the original molecule is actually the parent of the ring to test.- Parameters:
aRingToTest
- the ring fragment to test; exocyclic bonds do not have to be included in the fragment but if it is a fused system of multiple rings, the internal interconnecting bonds of the different rings need to be included; all its atoms need to be exactly the same objects as in the second atom container parameteranOriginalMolecule
- the molecule that contains the ring under investigation; The exocyclic bonds will be queried from itanIgnoreKetoGroups
- true if this method should ignore keto groups, i.e. also return true if there are some attached to the cycle- Returns:
- true, if all exocyclic bonds connected to the ring are of single order
- Throws:
NullPointerException
- if one parameter is 'null'IllegalArgumentException
- if one parameter is empty
-
hasGlycosidicBond
protected boolean hasGlycosidicBond(org.openscience.cdk.interfaces.IAtomContainer aRingToTest, org.openscience.cdk.interfaces.IAtomContainer anOriginalMolecule) throws NullPointerException Checks all exocyclic connections of the given ring to detect an O-glycosidic bond. Checklist for glycosidic bond: Connected oxygen atom that is not in the ring, has two bonds that are both of single order and no bond partner is a hydrogen atom. This algorithm also classifies ester bonds as glycosidic bonds and any other bond type that meets the above criteria. Therefore, many 'non-classical, glycoside-like' connections are classified as O-glycosidic bonds.
Note: The 'ring' is not tested for whether it is circular or not. So theoretically, this method can also be used to detect glycosidic bonds of linear structures. BUT: The oxygen atom must not be part of the structure itself. Due to the processing of candidate linear sugar moieties this can make it difficult to use this method also for linear sugars.
Note: It is not tested whether the original molecule is actually the parent of the ring to test.- Parameters:
aRingToTest
- the candidate sugar ringanOriginalMolecule
- the molecule in which the ring is contained as a substructure to query the connected atoms from- Returns:
- true, if a glycosidic bond is detected
- Throws:
NullPointerException
- if one parameter is 'null'IllegalArgumentException
- if one parameter is empty
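The checklist given above (an oxygen atom outside the ring with exactly two bonds, both single, and no hydrogen as bond partner) can be expressed on a toy molecular graph. This standalone sketch does not use the CDK; the `Atom` and `Bond` classes are hypothetical, and hydrogen atoms are assumed to be explicit nodes here, which need not match how this class handles hydrogens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GlycosidicBondSketch {

    /** Hypothetical minimal atom: an element symbol plus its bonds. */
    static final class Atom {
        final String symbol;
        final List<Bond> bonds = new ArrayList<>();

        Atom(String symbol) {
            this.symbol = symbol;
        }
    }

    /** Hypothetical minimal bond with an integer bond order. */
    static final class Bond {
        final Atom first;
        final Atom second;
        final int order;

        Bond(Atom first, Atom second, int order) {
            this.first = first;
            this.second = second;
            this.order = order;
        }

        Atom other(Atom atom) {
            return atom == first ? second : first;
        }
    }

    static void connect(Atom a, Atom b, int order) {
        Bond bond = new Bond(a, b, order);
        a.bonds.add(bond);
        b.bonds.add(bond);
    }

    /** Applies the checklist to every exocyclic neighbour of the given ring atoms. */
    static boolean hasGlycosidicBond(Set<Atom> ring) {
        for (Atom ringAtom : ring) {
            for (Bond bond : ringAtom.bonds) {
                Atom neighbour = bond.other(ringAtom);
                if (ring.contains(neighbour) || !"O".equals(neighbour.symbol)) {
                    continue; // only oxygen atoms outside the ring qualify
                }
                if (neighbour.bonds.size() != 2) {
                    continue; // e.g. a keto oxygen has only one (double) bond
                }
                boolean allSingleAndNoHydrogen = true;
                for (Bond oxygenBond : neighbour.bonds) {
                    if (oxygenBond.order != 1 || "H".equals(oxygenBond.other(neighbour).symbol)) {
                        allSingleAndNoHydrogen = false;
                    }
                }
                if (allSingleAndNoHydrogen) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Atom c1 = new Atom("C");
        Atom c2 = new Atom("C");
        Atom c3 = new Atom("C");
        connect(c1, c2, 1);
        connect(c2, c3, 1);
        connect(c3, c1, 1);
        Atom oxygen = new Atom("O");
        connect(c1, oxygen, 1);
        connect(oxygen, new Atom("C"), 1);
        System.out.println(hasGlycosidicBond(new HashSet<>(Arrays.asList(c1, c2, c3))));
    }
}
```

As the description notes, such a check is deliberately broad: an ester oxygen linking the ring to an acyl carbon also passes it.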
-
isMoleculeEmptyAfterRemovalOfThisRing
protected boolean isMoleculeEmptyAfterRemovalOfThisRing(org.openscience.cdk.interfaces.IAtomContainer aRing, org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException, IllegalArgumentException, CloneNotSupportedException Checks whether the given molecule would be empty after removal of the given ring. Any remaining fragment will be cleared away if it is too small according to the set preservation mode option. The given parameters are not altered, clones of them are generated and processed. This method is intended to test for whether a molecule qualifies for the glycosidic bond exemption.- Parameters:
aRing
- the ring to test whether its removal would result in an empty moleculeaMolecule
- the parent molecule- Returns:
- true if the parent molecule is empty after removal of the given ring and subsequent removal of too small remaining fragments
- Throws:
NullPointerException
- if any parameter is 'null'IllegalArgumentException
- if the given ring is not actually part of the given parent moleculeCloneNotSupportedException
- if the ring or the molecule cannot be cloned
-
getExocyclicOxygenAtomCount
protected int getExocyclicOxygenAtomCount(org.openscience.cdk.interfaces.IAtomContainer aRingToTest, org.openscience.cdk.interfaces.IAtomContainer anOriginalMolecule) throws NullPointerException Returns the number of attached exocyclic oxygen atoms of a given ring in the original atom container. The method iterates over all cyclic atoms and all of their connected atoms. The runtime therefore scales linearly with the number of cyclic atoms and their connected atoms. The oxygen atoms are not tested for being attached by a single bond since in the algorithm, the determination whether a candidate sugar ring has only exocyclic single bonds precedes the calling of this method.
Note: The circularity of the given 'ring' is not tested, so this method could in theory also be used for linear structures. But this does not make much sense.
Note: This method does NOT check for hydroxy groups but for oxygen atoms. So e.g. the oxygen atom in a glycosidic bond is counted.
Note: It is not tested whether the original molecule is actually the parent of the ring to test.- Parameters:
aRingToTest
- the ring fragment to test; exocyclic bonds do not have to be included in the fragment but if it is a fused system of multiple rings, the internal interconnecting bonds of the different rings need to be included; all its atoms need to be exactly the same objects as in the second atom container parameter (they will be skipped otherwise)anOriginalMolecule
- the molecule that contains the ring under investigation; The exocyclic bonds will be queried from it- Returns:
- number of attached exocyclic oxygen atoms of the given ring
- Throws:
NullPointerException
- if a parameter is 'null'
-
doesRingHaveEnoughExocyclicOxygenAtoms
protected boolean doesRingHaveEnoughExocyclicOxygenAtoms(int aNumberOfAtomsInRing, int aNumberOfAttachedExocyclicOxygenAtoms) Simple decision-making function for deciding whether a candidate sugar ring has enough attached, single-bonded exocyclic oxygen atoms according to the set threshold. The given number of oxygen atoms is divided by the given number of atoms in the ring (should also contain the usually present oxygen atom in a sugar ring) and the resulting ratio is checked for being equal to or higher than the currently set threshold.
Note: Only the number of atoms in the ring is checked for not being 0. No further parameter tests are implemented. If the number is 0, false is returned. No exceptions are thrown.- Parameters:
aNumberOfAtomsInRing
- number of atoms in the possible sugar ring, including the cyclic oxygen atomaNumberOfAttachedExocyclicOxygenAtoms
- number of attached exocyclic oxygen atoms of the ring under investigation (if zero, false is returned)- Returns:
- true, if the calculated ratio is equal to or higher than the currently set threshold
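The ratio test described above amounts to one division and one comparison. The standalone sketch below passes the threshold in explicitly; the class name and the value 0.5 used in `main` are illustrative assumptions, not settings read from this class.

```java
final class ExocyclicOxygenRatioSketch {

    /**
     * Divides the number of exocyclic oxygen atoms by the ring size and compares
     * the ratio with the threshold; a ring size of 0 yields false instead of an exception.
     */
    static boolean hasEnoughExocyclicOxygens(int atomsInRing, int exocyclicOxygens, double threshold) {
        if (atomsInRing == 0) {
            return false;
        }
        double ratio = (double) exocyclicOxygens / atomsInRing;
        return ratio >= threshold;
    }

    public static void main(String[] args) {
        // six-membered ring (ring oxygen included) with three attached oxygen atoms
        System.out.println(hasEnoughExocyclicOxygens(6, 3, 0.5));
        // same ring with only two attached oxygen atoms
        System.out.println(hasEnoughExocyclicOxygens(6, 2, 0.5));
    }
}
```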
-
updateLinearSugarPatterns
protected void updateLinearSugarPatterns() All linear sugar patterns represented by atom containers in the respective list are sorted, parsed into actual pattern objects and stored in the internal list for initial linear sugar detection. To be called when a linear sugar pattern has been deleted from or added to the list. The pattern objects cannot be operated on directly because they cannot be sorted or represented in a human-readable format.
-
detectLinearSugarCandidatesByPatternMatching
protected List<org.openscience.cdk.interfaces.IAtomContainer> detectLinearSugarCandidatesByPatternMatching(org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException Initial detection of linear sugar candidates by substructure search for the linear sugar patterns in the given molecule. All 'unique' matches are returned as atom container objects. This means that the same substructure will not be included multiple times but the substructures may overlap.- Parameters:
aMolecule
- the molecule to search for linear sugar candidates- Returns:
- a list of possibly overlapping substructures from the given molecule matching the internal linear sugar patterns
- Throws:
NullPointerException
- if the given molecule is 'null'
-
combineOverlappingCandidates
protected List<org.openscience.cdk.interfaces.IAtomContainer> combineOverlappingCandidates(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList) throws NullPointerException Combines all overlapping (i.e. sharing the same atoms or bonds) structures in the given list into one atom container, respectively, to return distinct, non-overlapping substructures. Second step of linear sugar detection. Note: The returned substructures can grow very big. This is addressed in the third step. The parameter list is not altered and a completely new list is returned.- Parameters:
aCandidateList
- a list of possibly overlapping substructures from the same atom container object- Returns:
- a list of distinct, non-overlapping substructures after combining every formerly overlapping structure
- Throws:
NullPointerException
- if the given list or one of its elements is 'null'
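Combining overlapping candidates into distinct, non-overlapping groups can be sketched with plain sets of atom indices. The example below is standalone and simplified: it considers shared atoms only (not bonds), and the class and method names are hypothetical. Like the method described above, it leaves its input untouched and returns a new list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class CombineOverlapSketch {

    /** Merges all candidates that share at least one atom index into one group each. */
    static List<Set<Integer>> combineOverlapping(List<Set<Integer>> candidates) {
        List<Set<Integer>> merged = new ArrayList<>();
        for (Set<Integer> candidate : candidates) {
            Set<Integer> current = new TreeSet<>(candidate); // copy; throws NullPointerException on null
            Iterator<Set<Integer>> groups = merged.iterator();
            while (groups.hasNext()) {
                Set<Integer> group = groups.next();
                if (!Collections.disjoint(group, current)) {
                    // absorb the overlapping group; the groups in 'merged' are pairwise
                    // disjoint, so groups visited earlier cannot start to overlap now
                    current.addAll(group);
                    groups.remove();
                }
            }
            merged.add(current);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Set<Integer>> candidates = new ArrayList<>();
        candidates.add(new HashSet<>(Arrays.asList(1, 2)));
        candidates.add(new HashSet<>(Arrays.asList(3, 4)));
        candidates.add(new HashSet<>(Arrays.asList(2, 3)));
        System.out.println(combineOverlapping(candidates));
    }
}
```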
-
splitOverlappingCandidatesPseudoRandomly
@Deprecated protected void splitOverlappingCandidatesPseudoRandomly(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList) throws NullPointerException Deprecated. Alternative method to combining overlapping substructures after the initial detection: Splitting them pseudo-randomly. The method iterates over the given substructures and notes the indices of atoms already visited. If an already visited atom appears again in another substructure (-> overlap), it is removed from the respective candidate. In the end, all candidates that got disconnected by this are separated into distinct atom container objects. The results are distinct, non-overlapping, connected substructures. Note: The returned substructures can be very small, even single-atom candidates can result. Another problem is that this method is practically an unpredictable black box because the order of the substructures is not predictable. Note: here, the given list is altered, unlike in some other methods! Therefore, the list is not returned again.- Parameters:
aCandidateList
- a list of possibly overlapping substructures from the same atom container object- Throws:
NullPointerException
- if the given list is 'null'
-
splitEtherEsterAndPeroxideBonds
protected List<org.openscience.cdk.interfaces.IAtomContainer> splitEtherEsterAndPeroxideBonds(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList) throws NullPointerException Splits all ether, ester, and peroxide bonds in the given linear sugar candidates and separates those that get disconnected in the process. Third step of linear sugar detection. This step was introduced because the linear sugar candidates returned by the combination method can be very big and contain connected sugar chains that should be detected as separate candidates. The detection is done using SMARTS patterns that are public constants of this class. The parameter list is not altered and a completely new list returned.- Parameters:
aCandidateList
- a list of potential sugar substructures from the same atom container object- Returns:
- a new list of candidates where all ether, ester, and peroxide bonds have been split and disconnected candidates separated
- Throws:
NullPointerException
- if the given list is 'null'
-
removeAtomsOfCircularSugarsFromCandidates
protected void removeAtomsOfCircularSugarsFromCandidates(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule) throws NullPointerException Removes all atoms belonging to possible circular sugars, as returned by the method for initial circular sugar detection, from the given linear sugar candidates. Fourth step of linear sugar detection. The linear sugar patterns also match parts of circular sugars, so this step has to be done to ensure the separate treatment of circular and linear sugars. After the removal, disconnected candidates are separated into new candidates. Note: here, the given list is altered, unlike in some other methods! Therefore, the list is not returned again. Note also that it is not checked whether the given parent molecule is actually the parent of the given substructures.- Parameters:
aCandidateList
- a list of potential sugar substructures from the same atom container objectaParentMolecule
- the molecule that is currently scanned for linear sugars to detect its circular sugars- Throws:
NullPointerException
- if any parameter is 'null'
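The in-place contract of this step differs from the list-returning steps, which matters for callers that still need the original candidates. The following is a simplified sketch in plain Java, not the SRU implementation: candidates are sets of atom indices, `atomsToRemove` stands in for the atoms of the detected circular sugars, and dropping candidates that become empty is an assumption of this sketch. The subsequent separation of disconnected fragments is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class InPlaceRemovalSketch {
    /**
     * Removes the given atoms from every candidate and updates the list itself.
     * Nothing is returned because the caller's list is modified directly.
     * The list must be mutable (e.g. an ArrayList).
     */
    static void removeAtomsInPlace(List<Set<Integer>> candidates, Set<Integer> atomsToRemove) {
        Objects.requireNonNull(candidates, "candidate list is null");
        Objects.requireNonNull(atomsToRemove, "atom set is null");
        ListIterator<Set<Integer>> iterator = candidates.listIterator();
        while (iterator.hasNext()) {
            Set<Integer> reduced = new TreeSet<>(iterator.next());
            reduced.removeAll(atomsToRemove);
            if (reduced.isEmpty()) {
                iterator.remove(); // assumption of this sketch: empty candidates are discarded
            } else {
                iterator.set(reduced);
            }
        }
    }

    public static void main(String[] args) {
        List<Set<Integer>> candidates = new ArrayList<>(List.of(Set.of(1, 2, 3), Set.of(4, 5)));
        // Copy first if the unmodified candidates are still needed afterwards.
        List<Set<Integer>> original = new ArrayList<>(candidates);
        removeAtomsInPlace(candidates, Set.of(2, 4, 5));
        System.out.println(candidates);      // prints [[1, 3]]
        System.out.println(original.size()); // prints 2
    }
}
```

A caller that needs both versions copies the list before the call, as `main` does.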
-
removeCircularSugarsFromCandidates
@Deprecated protected void removeCircularSugarsFromCandidates(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule) throws NullPointerException
Deprecated.
Alternative to removing all atoms that belong to circular sugars from the linear sugar candidates: removes only complete, intact circular sugar rings from the candidates. The method detects potential circular sugars in the candidates and compares them to the potential sugar cycles in the parent molecule. If there is a match, the respective sugar ring is removed from the candidate. In the end, all candidates that got disconnected by this are separated into distinct atom container objects. This method was deprecated because it relies on the circular sugars being intact in the linear sugar candidates, which is not always the case and can lead to the removal of parts of circular sugars. Note: here, the given list is altered, unlike in some other methods! Therefore, the list is not returned again. Note also that it is not checked whether the given parent molecule is actually the parent of the given substructures.
- Parameters:
aCandidateList
- a list of linear sugar candidates from the same atom container object
aParentMolecule
- the molecule that is currently scanned for linear sugars to detect its circular sugars
- Throws:
NullPointerException
- if any parameter is 'null'
-
removeCandidatesContainingCircularSugars
@Deprecated protected void removeCandidatesContainingCircularSugars(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList, org.openscience.cdk.interfaces.IAtomContainer aParentMolecule) throws NullPointerException
Deprecated.
Alternative to removing all atoms that belong to circular sugars from the linear sugar candidates: completely rejects every candidate that contains a circular sugar. The method detects potential circular sugars in the candidates and compares them to the potential sugar cycles in the parent molecule. If there is a match, the respective candidate is filtered out. This method was deprecated because it relies on the circular sugars being intact in the linear sugar candidates, which is not always the case, and because connected linear sugar moieties would also be filtered out using this approach. Note: here, the given list is altered, unlike in some other methods! Therefore, the list is not returned again. Note also that it is not checked whether the given parent molecule is actually the parent of the given substructures.
- Parameters:
aCandidateList
- a list of linear sugar candidates from the same atom container object
aParentMolecule
- the molecule that is currently scanned for linear sugars to detect its circular sugars
- Throws:
NullPointerException
- if any parameter is 'null'
-
removeCyclicAtomsFromSugarCandidates
protected void removeCyclicAtomsFromSugarCandidates(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList, org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException
Removes all atoms that are part of a cycle from the given linear sugar candidates. Optional fifth step of linear sugar detection. The linear sugar patterns can also match in cycles that do not represent circular sugars but, e.g., pseudo-sugars or macrocycles. Whether linear sugars should be detected in such structures is optional. After the removal, disconnected candidates are separated into new candidates. Note: here, the given list is altered, unlike in some other methods! Therefore, the list is not returned again. Note also that it is not checked whether the given parent molecule is actually the parent of the given substructures.
- Parameters:
aCandidateList
- a list of potential sugar substructures from the same atom container object
aMolecule
- the molecule that is currently scanned for linear sugars to detect its cycles
- Throws:
NullPointerException
- if any parameter is 'null'
-
removeSugarCandidatesWithCyclicAtoms
@Deprecated protected void removeSugarCandidatesWithCyclicAtoms(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList, org.openscience.cdk.interfaces.IAtomContainer aMolecule) throws NullPointerException
Deprecated.
Alternative to removing all cyclic atoms from the linear sugar candidates: completely rejects every candidate that contains a cyclic atom. This method was deprecated because connected linear moieties are also discarded this way. Note: here, the given list is altered, unlike in some other methods! Therefore, the list is not returned again. Note also that it is not checked whether the given parent molecule is actually the parent of the given substructures.
- Parameters:
aCandidateList
- a list of linear sugar candidates from the same atom container object
aMolecule
- the molecule that is currently scanned for linear sugars to detect its circular sugars
- Throws:
NullPointerException
- if any parameter is 'null'
-
removeTooSmallAndTooLargeCandidates
protected List<org.openscience.cdk.interfaces.IAtomContainer> removeTooSmallAndTooLargeCandidates(List<org.openscience.cdk.interfaces.IAtomContainer> aCandidateList) throws NullPointerException
Discards all linear sugar candidates that are too small or too large according to the current settings. Final step of linear sugar detection. This step was introduced because the preceding steps may produce small 'fragments', e.g. the hydroxy group of a circular sugar that was removed from a linear sugar candidate. These should be filtered out. Also, a very large linear sugar that does not consist of multiple subunits linked by ether, ester, or peroxide bonds is considered too interesting to remove and should therefore also be filtered from the linear sugars detected for removal. The 'size' of a linear sugar candidate is its carbon atom count. The set minimum and maximum sizes are inclusive. The parameter list is not altered; a completely new list is returned.
- Parameters:
aCandidateList
- a list of potential sugar substructures from the same atom container object
- Returns:
- a new list of candidates where all too small and too large candidates have been filtered out
- Throws:
NullPointerException
- if the given list is 'null'
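The size rule (carbon atom count, inclusive bounds, new list returned) can be sketched in plain Java. This is an illustration under assumptions, not the SRU implementation: each candidate is represented as a list of element symbols instead of an atom container, and the bounds 4 and 7 used in `main` are illustrative values passed as hypothetical parameters, whereas the real method reads them from the current settings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class SizeFilterSketch {
    /**
     * Returns a new list containing only the candidates whose carbon count lies
     * within [minCarbons, maxCarbons], both bounds inclusive. The input list is not modified.
     */
    static List<List<String>> keepCandidatesInSizeRange(List<List<String>> candidates,
                                                        int minCarbons, int maxCarbons) {
        Objects.requireNonNull(candidates, "candidate list is null");
        List<List<String>> kept = new ArrayList<>(candidates.size());
        for (List<String> candidate : candidates) {
            // Only carbon atoms count towards the size; oxygens etc. are ignored.
            long carbonCount = candidate.stream().filter("C"::equals).count();
            if (carbonCount >= minCarbons && carbonCount <= maxCarbons) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> candidates = List.of(
                List.of("C", "O"),                           // 1 carbon: too small
                List.of("C", "C", "C", "C", "O", "O", "O"),  // 4 carbons: kept (lower bound inclusive)
                Collections.nCopies(7, "C"),                 // 7 carbons: kept (upper bound inclusive)
                Collections.nCopies(8, "C"));                // 8 carbons: too large
        System.out.println(keepCandidatesInSizeRange(candidates, 4, 7).size()); // prints 2
    }
}
```

Because the bounds are inclusive, candidates with exactly the minimum or maximum carbon count are retained.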
-