Class FragmentFingerprinter
java.lang.Object
de.unijena.cheminf.fragment.fingerprint.FragmentFingerprinter
- All Implemented Interfaces:
IFragmentFingerprinter,org.openscience.cdk.fingerprint.IFingerprinter
Class to generate fragment fingerprints. Bit and count fragment fingerprints can be generated.
Fragment fingerprints are key-based fingerprints.
Thus, the class requires predefined structures/fragments in
the form of unique SMILES to create the fingerprint. These structures must be passed when the
class is instantiated (in the constructor). The class implements the interface IFragmentFingerprinter,
which extends the CDK interface IFingerprinter. This allows the class to compute fingerprints in two ways.
The first way to calculate a bit or count fingerprint is to perform a substructure comparison with all
predefined fragments for a given IAtomContainer. The fingerprint created by the substructure search is based on
the CDK class SubstructureFingerprinter. The predefined fragment SMILES are interpreted as SMARTS patterns by the
SubstructureFingerprinter class. The second way to calculate fingerprints is by comparing
given fragments, which are in the form of unique SMILES, with the predefined fragments.
The second possibility is thus based on a pure comparison of strings. It is important to note that the two
different ways of creating fingerprints can produce different results.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
- int[] getBitArray(List<String> aListOfUniqueSmiles): Returns bit array for specified list.
- int[] getBitArray(Map<String, Integer> aUniqueSmilesToFrequencyMap): Returns bit array for the specified map.
- getBitDefinition(int aBit): Returns the bit definitions, i.e. which bit stands for which fragment SMILES.
- org.openscience.cdk.fingerprint.IBitFingerprint getBitFingerprint(List<String> aListOfUniqueSmiles): Method to generate the bit fingerprint.
- org.openscience.cdk.fingerprint.IBitFingerprint getBitFingerprint(org.openscience.cdk.interfaces.IAtomContainer container)
- int[] getCountArray(List<String> aListOfUniqueSmiles): Returns the count array for the specified list.
- int[] getCountArray(Map<String, Integer> aUniqueSmilesToFrequencyMap): Returns a count array, which is created based on the given parameter.
- org.openscience.cdk.fingerprint.ICountFingerprint getCountFingerprint(List<String> aUniqueSmilesList): Generates a count fingerprint for a molecule based on its fragments, represented by unique SMILES in the list given as parameter.
- CountFingerprint getCountFingerprint(Map<String, Integer> aUniqueSmilesToFrequencyMap): Generates a count fingerprint for a molecule based on its fragments, represented by unique SMILES strings in the key set and their frequencies in the value set of the given map.
- org.openscience.cdk.fingerprint.ICountFingerprint getCountFingerprint(org.openscience.cdk.interfaces.IAtomContainer container)
- BitSet getFingerprint(org.openscience.cdk.interfaces.IAtomContainer mol)
- Map<String,Integer> getRawFingerprint(org.openscience.cdk.interfaces.IAtomContainer container): UnsupportedOperationException.
- int getSize(): Since the FragmentFingerprinter is a key-based fingerprint, the size of the fingerprint is equal to the number of predefined fragments (unique SMILES) if the list of key fragments passed during initialization does not contain duplicates; otherwise the size of the fingerprint may be smaller than the number of fragments passed, since duplicates are removed.
-
Constructor Details
-
FragmentFingerprinter
public FragmentFingerprinter(List<String> aFragmentList) throws NullPointerException, IllegalArgumentException
Constructor. Initializes the fragment fingerprinter with a user-defined set of fragments in the form of unique SMILES. If the list passed during initialization contains duplicates, they are removed, so the number of key fragments actually present may be smaller than the number of fragments specified by the user. Duplicate fragment SMILES strings in the input list are thus ignored and are not part of the fingerprint multiple times.
- Parameters:
aFragmentList - the list in which the predefined fragments are stored.
- Throws:
NullPointerException - is thrown if the list aFragmentList is null.
IllegalArgumentException - is thrown if the list contains blank strings.
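The constructor behaviour described above (null check, blank check, duplicate removal) can be sketched with plain Java collections. This is a minimal illustration, not the library's implementation: the class name KeyIndexSketch and the field keyToPosition are hypothetical, and it is an assumption that bit positions follow the order of first occurrence in the input list and that null elements are rejected like blank ones.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in, NOT the library's FragmentFingerprinter.
class KeyIndexSketch {
    // Maps each distinct key fragment SMILES to its (assumed) bit position.
    final Map<String, Integer> keyToPosition = new LinkedHashMap<>();

    KeyIndexSketch(List<String> aFragmentList) {
        if (aFragmentList == null) {
            throw new NullPointerException("aFragmentList is null");
        }
        for (String tmpSmiles : aFragmentList) {
            // Assumption: null elements are treated like blank strings.
            if (tmpSmiles == null || tmpSmiles.trim().isEmpty()) {
                throw new IllegalArgumentException("blank key fragment");
            }
            // putIfAbsent keeps the first occurrence, so duplicates are ignored.
            keyToPosition.putIfAbsent(tmpSmiles, keyToPosition.size());
        }
    }

    // The size equals the number of distinct key fragments.
    int getSize() {
        return keyToPosition.size();
    }
}
```

With the input list "CCO", "c1ccccc1", "CCO", "C=O", this sketch ends up with three keys rather than four, which mirrors the size behaviour described for getSize().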
-
-
Method Details
-
getBitFingerprint
public org.openscience.cdk.fingerprint.IBitFingerprint getBitFingerprint(List<String> aListOfUniqueSmiles) throws NullPointerException, IllegalArgumentException
Method to generate the bit fingerprint. The given list of unique SMILES is compared with the predefined fragments. If there is a match, the position of the unique SMILES is determined from the map and set to true in the initialized BitSet. The method is intended to generate the fingerprint for one molecule but can in principle be applied to any fragment set, e.g. originating from a cluster of multiple molecules.
- Specified by:
getBitFingerprint in interface IFragmentFingerprinter
- Parameters:
aListOfUniqueSmiles - is a list that stores fragments in the form of unique SMILES. To be able to calculate the fingerprint for a molecule, the fragments should belong to one molecule.
- Returns:
the bit fingerprint as a CDK IBitFingerprint. The interface provides methods that return useful information about the calculated bit fingerprint, such as the number of positive bits in the fingerprint.
- Throws:
NullPointerException - is thrown if the list aListOfUniqueSmiles is null.
IllegalArgumentException - is thrown if the list aListOfUniqueSmiles contains blank/empty strings.
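The string-based matching described for this method can be sketched with java.util.BitSet. This is a rough illustration only: BitMatchSketch and toBitSet are hypothetical names, the key list is passed in directly instead of being held by a fingerprinter instance, and the position order (first occurrence in the key list) is an assumption.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in, NOT the library's code.
class BitMatchSketch {
    static BitSet toBitSet(List<String> aKeyFragments, List<String> aListOfUniqueSmiles) {
        // Position lookup for the distinct key fragments.
        Map<String, Integer> tmpPositions = new HashMap<>();
        for (String tmpKey : aKeyFragments) {
            tmpPositions.putIfAbsent(tmpKey, tmpPositions.size());
        }
        BitSet tmpBits = new BitSet(tmpPositions.size());
        for (String tmpSmiles : aListOfUniqueSmiles) {
            // Pure string comparison: only exact matches set a bit.
            Integer tmpPosition = tmpPositions.get(tmpSmiles);
            if (tmpPosition != null) {
                tmpBits.set(tmpPosition);
            }
        }
        return tmpBits;
    }

    public static void main(String[] args) {
        BitSet tmpBits = toBitSet(
                Arrays.asList("CCO", "c1ccccc1", "C=O"),
                Arrays.asList("C=O", "CCO", "CCN"));
        System.out.println(tmpBits); // prints {0, 2}
    }
}
```

The fragment "CCN" is not among the keys and therefore leaves no trace in the result, and the order of the input fragments does not matter.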
-
getCountFingerprint
public CountFingerprint getCountFingerprint(Map<String, Integer> aUniqueSmilesToFrequencyMap) throws NullPointerException, IllegalArgumentException
Generates a count fingerprint for a molecule based on its fragments, represented by unique SMILES strings in the key set and their frequencies in the value set of the given map. Given fragment SMILES codes that are not part of the set given at initialization of this class are ignored. The frequencies of those matching the predefined set are used to construct the fingerprint. The method is intended to generate the fingerprint for one molecule but can in principle be applied to any fragment set, e.g. originating from a cluster of multiple molecules.
- Specified by:
getCountFingerprint in interface IFragmentFingerprinter
- Parameters:
aUniqueSmilesToFrequencyMap - map that usually represents a molecule: the fragments of the molecule are given as unique SMILES in the key set and their frequencies in the value set. In principle, however, such a map can be applied to any set of fragments. To be able to calculate the fingerprint for a molecule, the fragments must belong to one molecule.
- Returns:
count fingerprint
- Throws:
NullPointerException - is thrown if the map aUniqueSmilesToFrequencyMap is null or contains keys or values that are null.
IllegalArgumentException - is thrown if the map aUniqueSmilesToFrequencyMap contains keys or values that are blank/empty.
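As a sketch of the map-based counting described above, the following stand-alone code copies the frequencies of matching fragments into an int array and ignores fragments that are not among the keys. CountFromMapSketch and toCountArray are hypothetical names, the position order is an assumption, and only the null checks and the blank-key check are modelled here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative stand-in, NOT the library's code.
class CountFromMapSketch {
    static int[] toCountArray(List<String> aKeyFragments, Map<String, Integer> aUniqueSmilesToFrequencyMap) {
        Objects.requireNonNull(aUniqueSmilesToFrequencyMap, "map is null");
        // Position lookup for the distinct key fragments.
        Map<String, Integer> tmpPositions = new LinkedHashMap<>();
        for (String tmpKey : aKeyFragments) {
            tmpPositions.putIfAbsent(tmpKey, tmpPositions.size());
        }
        int[] tmpCounts = new int[tmpPositions.size()];
        for (Map.Entry<String, Integer> tmpEntry : aUniqueSmilesToFrequencyMap.entrySet()) {
            String tmpSmiles = Objects.requireNonNull(tmpEntry.getKey(), "null key");
            Integer tmpFrequency = Objects.requireNonNull(tmpEntry.getValue(), "null value");
            if (tmpSmiles.trim().isEmpty()) {
                throw new IllegalArgumentException("blank key");
            }
            // Fragments that are not part of the key set are ignored.
            Integer tmpPosition = tmpPositions.get(tmpSmiles);
            if (tmpPosition != null) {
                tmpCounts[tmpPosition] = tmpFrequency;
            }
        }
        return tmpCounts;
    }
}
```

For the keys "CCO", "c1ccccc1", "C=O" and the map {CCO=2, CCN=5, C=O=1}, this sketch yields the counts 2, 0, 1.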
-
getCountFingerprint
public org.openscience.cdk.fingerprint.ICountFingerprint getCountFingerprint(List<String> aUniqueSmilesList) throws NullPointerException, IllegalArgumentException
Generates a count fingerprint for a molecule based on its fragments, represented by unique SMILES in the list given as parameter. Given fragment SMILES codes that are not part of the set given at initialization of this class are ignored. The frequencies of those matching the predefined set are used to construct the fingerprint. The frequency of an individual fragment depends on how often it occurs in the specified list. Duplicates are thus allowed in this list. The method is intended to generate the fingerprint for one molecule but can in principle be applied to any fragment set, e.g. originating from a cluster of multiple molecules.
- Specified by:
getCountFingerprint in interface IFragmentFingerprinter
- Parameters:
aUniqueSmilesList - is a list that stores fragments in the form of unique SMILES. If a fragment occurs more than once in the molecule, it is also present more than once in the list. To be able to calculate the fingerprint for a molecule, the fragments should belong to one molecule.
- Returns:
count fingerprint
- Throws:
NullPointerException - is thrown if the list aUniqueSmilesList is null.
IllegalArgumentException - is thrown if the list aUniqueSmilesList contains blank/empty strings.
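The difference from the map variant is that frequencies are derived from repeated list entries. A minimal sketch of that step, assuming the key order is the order of first occurrence (CountFromListSketch and toCountArray are hypothetical names, and input validation is omitted for brevity):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in, NOT the library's code.
class CountFromListSketch {
    static int[] toCountArray(List<String> aKeyFragments, List<String> aUniqueSmilesList) {
        // Position lookup for the distinct key fragments.
        Map<String, Integer> tmpPositions = new LinkedHashMap<>();
        for (String tmpKey : aKeyFragments) {
            tmpPositions.putIfAbsent(tmpKey, tmpPositions.size());
        }
        int[] tmpCounts = new int[tmpPositions.size()];
        for (String tmpSmiles : aUniqueSmilesList) {
            Integer tmpPosition = tmpPositions.get(tmpSmiles);
            if (tmpPosition != null) {
                // Each repeated occurrence in the list raises the count by one.
                tmpCounts[tmpPosition]++;
            }
        }
        return tmpCounts;
    }

    public static void main(String[] args) {
        int[] tmpCounts = toCountArray(
                Arrays.asList("CCO", "C=O"),
                Arrays.asList("CCO", "CCO", "CCN", "C=O"));
        System.out.println(Arrays.toString(tmpCounts)); // prints [2, 1]
    }
}
```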
-
getVersionDescription
- Specified by:
getVersionDescription in interface org.openscience.cdk.fingerprint.IFingerprinter
-
getFingerprint
public BitSet getFingerprint(org.openscience.cdk.interfaces.IAtomContainer mol) throws org.openscience.cdk.exception.CDKException
- Specified by:
getFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
- Throws:
org.openscience.cdk.exception.CDKException
-
getBitFingerprint
public org.openscience.cdk.fingerprint.IBitFingerprint getBitFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) throws org.openscience.cdk.exception.CDKException
- Specified by:
getBitFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
- Throws:
org.openscience.cdk.exception.CDKException
- See Also:
-
SubstructureFingerprinter
-
getCountFingerprint
public org.openscience.cdk.fingerprint.ICountFingerprint getCountFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) throws org.openscience.cdk.exception.CDKException
- Specified by:
getCountFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
- Throws:
org.openscience.cdk.exception.CDKException
- See Also:
-
SubstructureFingerprinter
-
getRawFingerprint
public Map<String,Integer> getRawFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) throws org.openscience.cdk.exception.CDKException
UnsupportedOperationException. This method is not supported.
- Specified by:
getRawFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
- Throws:
UnsupportedOperationException - method is not supported
org.openscience.cdk.exception.CDKException
-
getSize
public int getSize()
Since the FragmentFingerprinter is a key-based fingerprint, the size of the fingerprint is equal to the number of predefined fragments (unique SMILES) if the list of key fragments passed during initialization does not contain duplicates. Otherwise the size of the fingerprint may be smaller than the number of fragments passed, since duplicates are removed. This means that duplicate fragment SMILES strings are ignored during initialization and are not part of the fingerprint multiple times.
- Specified by:
getSize in interface org.openscience.cdk.fingerprint.IFingerprinter
- Returns:
int
-
getBitDefinition
Returns the bit definitions, i.e. which bit stands for which fragment SMILES. Note that the number of possible bit definitions may differ from the number of key fragments passed during initialization, since duplicates are removed.
- Parameters:
aBit - position in the fingerprint.
- Returns:
unique SMILES corresponding to the specified position.
- Throws:
IllegalArgumentException - is thrown if the given bit position is not present in the fingerprint.
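A position-to-fragment lookup of this kind can be sketched as follows. BitDefinitionSketch is a hypothetical stand-in, not the library's class, and it assumes that positions follow the order of first occurrence in the key list and that the result is returned as a String.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Illustrative stand-in, NOT the library's code.
class BitDefinitionSketch {
    // Distinct key fragments; the list index is the (assumed) bit position.
    private final List<String> keyFragments;

    BitDefinitionSketch(List<String> aFragmentList) {
        // LinkedHashSet drops duplicates while keeping first-occurrence order.
        this.keyFragments = new ArrayList<>(new LinkedHashSet<>(aFragmentList));
    }

    String getBitDefinition(int aBit) {
        if (aBit < 0 || aBit >= this.keyFragments.size()) {
            throw new IllegalArgumentException("bit position not present: " + aBit);
        }
        return this.keyFragments.get(aBit);
    }
}
```

With the input "CCO", "CCO", "C=O", position 1 maps to "C=O" and position 2 is rejected, because the duplicate "CCO" does not occupy a bit of its own.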
-
getBitArray
public int[] getBitArray(List<String> aListOfUniqueSmiles) throws NullPointerException, IllegalArgumentException
Returns bit array for specified list. The size of the array corresponds to the number of predefined (key) fragments passed during initialization. However, the size may differ if there are duplicates in the specified predefined fragments, as they will be ignored/removed. This method is only available for bit fingerprints based on unique SMILES comparisons.
- Parameters:
aListOfUniqueSmiles - is a list that stores molecule fragments or arbitrary fragments in the form of unique SMILES.
- Returns:
int[] bit array
- Throws:
NullPointerException - is thrown if the list aListOfUniqueSmiles is null.
IllegalArgumentException - is thrown if the list aListOfUniqueSmiles contains blank/empty strings.
-
getBitArray
public int[] getBitArray(Map<String, Integer> aUniqueSmilesToFrequencyMap) throws NullPointerException, IllegalArgumentException
Returns bit array for the specified map. The map represents a molecule based on its fragments, which are represented by unique SMILES in the key set and whose frequencies are mapped in the value set. However, the map can also contain arbitrary fragment sets. This is a convenience method; the given frequencies are not used. The method is only available for bit fingerprints based on unique SMILES comparisons.
- Parameters:
aUniqueSmilesToFrequencyMap - map that usually represents a molecule: the fragments of the molecule are given as unique SMILES in the key set and their frequencies in the value set. In principle, however, such a map can be applied to any set of fragments.
- Returns:
int[] bit array
- Throws:
NullPointerException - is thrown if the map aUniqueSmilesToFrequencyMap is null or contains keys or values that are null.
IllegalArgumentException - is thrown if the map aUniqueSmilesToFrequencyMap contains keys or values that are blank/empty.
- See Also:
-
getCountArray
public int[] getCountArray(Map<String, Integer> aUniqueSmilesToFrequencyMap) throws NullPointerException, IllegalArgumentException
Returns a count array, which is created based on the given parameter. The map represents a molecule based on its fragments, which are represented by unique SMILES in the key set and whose frequencies are mapped in the value set. However, the map can also contain arbitrary fragment sets. The size of the array corresponds to the number of predefined (key) fragments passed during initialization. However, the size may differ if there are duplicates in the specified predefined fragments, as they will be ignored/removed. This method is only available for count fingerprints based on unique SMILES comparisons.
- Parameters:
aUniqueSmilesToFrequencyMap - map that usually represents a molecule: the fragments of the molecule are given as unique SMILES in the key set and their frequencies in the value set. In principle, however, such a map can be applied to any set of fragments.
- Returns:
int[] count array
- Throws:
NullPointerException - is thrown if the map aUniqueSmilesToFrequencyMap is null or contains keys or values that are null.
IllegalArgumentException - is thrown if the map aUniqueSmilesToFrequencyMap contains keys or values that are blank/empty.
-
getCountArray
public int[] getCountArray(List<String> aListOfUniqueSmiles) throws NullPointerException, IllegalArgumentException
Returns the count array for the specified list. This method is only available for count fingerprints based on unique SMILES comparisons.
- Parameters:
aListOfUniqueSmiles - is a list that stores molecule fragments or arbitrary fragments in the form of unique SMILES.
- Returns:
int[] count array
- Throws:
NullPointerException - is thrown if the list aListOfUniqueSmiles is null.
IllegalArgumentException - is thrown if the list aListOfUniqueSmiles contains blank/empty strings.
- See Also:
-