Class FragmentFingerprinter
java.lang.Object
de.unijena.cheminf.fragment.fingerprint.FragmentFingerprinter
- All Implemented Interfaces:
IFragmentFingerprinter, org.openscience.cdk.fingerprint.IFingerprinter
Generates fragment fingerprints, either as bit fingerprints or as count fingerprints.
Fragment fingerprints are key-based fingerprints.
The class therefore requires predefined structures (fragments) in
the form of unique SMILES to create a fingerprint. These structures must be passed to the
constructor when the class is instantiated. The class implements the interface IFragmentFingerprinter,
which extends the CDK interface IFingerprinter, so fingerprints can be requested in two ways:
from fragment sets given as unique SMILES (lists or frequency maps) or from a CDK IAtomContainer.
For fragment sets, the fingerprint is calculated by comparing the given unique SMILES
with the predefined fragments.
-
Constructor Summary
| Constructor | Description |
|---|---|
| FragmentFingerprinter(List<String> aFragmentsForMasterVectorList) | Initialization of the fragment fingerprinter by using a user-defined set of fragments in the form of unique SMILES. |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| int | count(String aSmiles, CountFingerprint aCountFingerprint) | Method to return the count/occurrences/frequency of a given SMILES String in a given CountFingerprint instance. |
| int[] | getBitArray(List<String> aListOfUniqueSmiles) | Returns bit array for specified list. |
| int[] | getBitArray(Map<String, Integer> aUniqueSmilesToFrequencyMap) | Returns bit array for the specified map. |
| String | getBitDefinition(int aBitPosition) | Returns the bit definitions, i.e. which bit stands for which fragment SMILES. |
| org.openscience.cdk.fingerprint.IBitFingerprint | getBitFingerprint(List<String> aListOfUniqueSmiles) | Method to generate the bit fingerprint. |
| org.openscience.cdk.fingerprint.IBitFingerprint | getBitFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) | |
| BitSet | getBitSet(List<String> aListOfUniqueSmiles) | Method directly returning the BitSet of the fingerprint generated from the given list of SMILES. |
| BitSet | getBitSet(Map<String, Integer> aUniqueSmilesToFrequencyMap) | Method directly returning the BitSet of the fingerprint generated from the given frequency map. |
| int[] | getCountArray(List<String> aListOfUniqueSmiles) | Returns the count array for the specified list. |
| int[] | getCountArray(Map<String, Integer> aUniqueSmilesToFrequencyMap) | Returns a CountArray, which is created based on the given parameter. |
| org.openscience.cdk.fingerprint.ICountFingerprint | getCountFingerprint(List<String> aUniqueSmilesList) | Generates a count fingerprint for a molecule based on its fragments, represented by unique SMILES in the list given as parameters. |
| org.openscience.cdk.fingerprint.ICountFingerprint | getCountFingerprint(Map<String, Integer> aSmilesToFrequencyMap) | Generates count fingerprint for a molecule based on its fragments represented by unique SMILES strings in the key set and their frequencies in the value set of the given map. |
| org.openscience.cdk.fingerprint.ICountFingerprint | getCountFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) | |
| BitSet | getFingerprint(org.openscience.cdk.interfaces.IAtomContainer mol) | |
| protected void | getFloatFingerprint(Map<String, Integer> aFragmentsFrequenciesMap, float[] aPreInitFloatArray, boolean anUseBitArrayStatement) | Method to generate the float fingerprint of a given molecule's fragments (i.e. the given map). |
| void | getFragmentsComponentsFloatMatrix(Map<String, Integer>[] aFragmentsFrequenciesMapsArray, float[][] aFloatDataMatrix, boolean anUseBitArrayStatement) | Public method to get a float[][] matrix containing the fingerprints for the given maps array (fragment sets of distinct molecules) with respect to the pre-defined fragment fingerprint. |
| Map<String,Integer> | getRawFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) | UnsupportedOperationException. |
| int | getSize() | Since the FragmentFingerprinter is a key-based fingerprint, the size of the fingerprint is equal to the number of predefined fragments (unique SMILES) if the list of key fragments passed during initialization does not contain duplicates, otherwise the size of the fingerprint may be smaller than the number of fragments passed since duplicates are removed. |
-
Constructor Details
-
FragmentFingerprinter
public FragmentFingerprinter(List<String> aFragmentsForMasterVectorList) throws NullPointerException, IllegalArgumentException
Initializes the fragment fingerprinter with a user-defined set of fragments in the form of unique SMILES. Duplicates in the given list are removed, so each fragment SMILES string is part of the fingerprint only once. The actual number of key fragments may therefore be smaller than the number of fragments passed by the user.
- Parameters:
aFragmentsForMasterVectorList - list in which the predefined fragments are stored.
- Throws:
NullPointerException - is thrown if the list param (or any of its elements) is null.
IllegalArgumentException - is thrown if the list param contains blank strings or strings that cannot be parsed as SMARTS.
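The duplicate handling and the argument checks described above can be sketched with plain Java collections. This is only an illustration, not the code of FragmentFingerprinter: the names KeyFragmentIndexSketch and buildIndex are hypothetical, the rule that the first occurrence of a fragment fixes its position is an assumption, and the SMARTS parsing check is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the key fragment setup; not the library's code. */
final class KeyFragmentIndexSketch {

    /** Maps each distinct key fragment to a position and skips duplicates. */
    static Map<String, Integer> buildIndex(List<String> keyFragments) {
        if (keyFragments == null) {
            throw new NullPointerException("Key fragment list is null.");
        }
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String smiles : keyFragments) {
            if (smiles == null) {
                throw new NullPointerException("Key fragment list contains null.");
            }
            if (smiles.isBlank()) {
                throw new IllegalArgumentException("Key fragment list contains a blank string.");
            }
            // Assumption: the first occurrence fixes the position, later duplicates are ignored.
            index.putIfAbsent(smiles, index.size());
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = buildIndex(List.of("CCO", "c1ccccc1", "CCO", "C=O"));
        // Four strings were passed, but only three distinct key fragments remain.
        System.out.println(index.size());
        System.out.println(index);
    }
}
```

In this sketch, the size of the resulting map plays the role of the fingerprint size: it equals the number of distinct key fragments, not the length of the input list.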
-
-
Method Details
-
getBitFingerprint
public org.openscience.cdk.fingerprint.IBitFingerprint getBitFingerprint(List<String> aListOfUniqueSmiles) throws NullPointerException, IllegalArgumentException
Generates the bit fingerprint. The given list of unique SMILES is compared with the predefined fragments. For each match, the position of the unique SMILES is determined from the map and set to true in the initialized BitSet. The method is intended to generate the fingerprint for one molecule but can in principle be applied to any fragment set, e.g. one originating from a cluster of multiple molecules.
- Specified by:
getBitFingerprint in interface IFragmentFingerprinter
- Parameters:
aListOfUniqueSmiles - is a list that stores fragments in the form of unique SMILES. To be able to calculate the fingerprint for a molecule, the fragments should belong to one molecule.
- Returns:
a BitSetFingerprint. BitSetFingerprint is a CDK class that implements the CDK interface IBitFingerprint and therefore offers methods that return information about the calculated bit fingerprint, such as the number of positive bits.
- Throws:
NullPointerException - is thrown if the list param (or any of its elements) is null.
IllegalArgumentException - is thrown if the list param contains blank/empty strings.
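The matching step described for this method can be illustrated without CDK. The following sketch assumes that the key fragments are held in a map from unique SMILES to bit position. BitMatchSketch, toBitSet and the example key fragments are hypothetical, and a plain java.util.BitSet stands in for the returned IBitFingerprint.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of key-based bit matching; not the library's code. */
final class BitMatchSketch {

    /** Sets the bit of every key fragment that occurs in the given fragment list. */
    static BitSet toBitSet(Map<String, Integer> keyPositions, List<String> fragments) {
        BitSet bits = new BitSet(keyPositions.size());
        for (String smiles : fragments) {
            Integer position = keyPositions.get(smiles);
            if (position != null) {
                bits.set(position); // a repeated fragment sets the same bit again
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Assumed key fragments and positions, chosen for this example only.
        Map<String, Integer> keyPositions = Map.of("CCO", 0, "c1ccccc1", 1, "C=O", 2);
        BitSet bits = toBitSet(keyPositions, List.of("C=O", "CCO", "CCN", "CCO"));
        System.out.println(bits);               // {0, 2}
        System.out.println(bits.cardinality()); // 2
    }
}
```

Fragments without a matching key (CCN in the example) leave the bit set unchanged.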
-
getCountFingerprint
public org.openscience.cdk.fingerprint.ICountFingerprint getCountFingerprint(Map<String, Integer> aSmilesToFrequencyMap) throws NullPointerException, IllegalArgumentException
Generates a count fingerprint for a molecule based on its fragments, which are represented by unique SMILES strings in the key set and by their frequencies in the value set of the given map. Fragment SMILES codes that are not part of the set given at initialisation of this class are ignored. The frequencies of those that match the predefined set are used to construct the fingerprint. The method is intended to generate the fingerprint for one molecule but can in principle be applied to any fragment set, e.g. one originating from a cluster of multiple molecules.
- Specified by:
getCountFingerprint in interface IFragmentFingerprinter
- Parameters:
aSmilesToFrequencyMap - map that usually represents a molecule, with the unique SMILES of its fragments in the key set and their frequencies in the value set. In principle, however, such a map can hold any set of fragments. To be able to calculate the fingerprint for a molecule, the fragments should belong to one molecule.
- Returns:
count fingerprint
- Throws:
NullPointerException - is thrown if the map aSmilesToFrequencyMap is null or contains keys or values that are null.
IllegalArgumentException - is thrown if the map aSmilesToFrequencyMap contains keys or values that are blank/empty.
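As a rough illustration of how frequencies from the map end up in a count fingerprint, the sketch below copies the frequency of each matching fragment into an int array and skips fragments that are not key fragments. CountFromMapSketch and toCounts are hypothetical names, the position map is an assumption, and the int array only stands in for the ICountFingerprint the real method returns.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical illustration of key-based counting; not the library's code. */
final class CountFromMapSketch {

    /** Copies the frequency of each matching fragment to its key position. */
    static int[] toCounts(Map<String, Integer> keyPositions, Map<String, Integer> frequencies) {
        int[] counts = new int[keyPositions.size()];
        for (Map.Entry<String, Integer> entry : frequencies.entrySet()) {
            Integer position = keyPositions.get(entry.getKey());
            if (position != null) {
                counts[position] = entry.getValue();
            }
            // Fragments that are not key fragments are ignored.
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed key fragments and positions, chosen for this example only.
        Map<String, Integer> keyPositions = Map.of("CCO", 0, "c1ccccc1", 1, "C=O", 2);
        Map<String, Integer> molecule = Map.of("CCO", 3, "CCN", 5, "C=O", 1);
        System.out.println(Arrays.toString(toCounts(keyPositions, molecule))); // [3, 0, 1]
    }
}
```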
-
getCountFingerprint
public org.openscience.cdk.fingerprint.ICountFingerprint getCountFingerprint(List<String> aUniqueSmilesList) throws NullPointerException, IllegalArgumentException
Generates a count fingerprint for a molecule based on its fragments, represented by unique SMILES in the given list. Fragment SMILES codes that are not part of the set given at initialisation of this class are ignored. The frequencies of those that match the predefined set are used to construct the fingerprint. The frequency of an individual fragment is the number of times it occurs in the given list, so duplicates are allowed in this list. The method is intended to generate the fingerprint for one molecule but can in principle be applied to any fragment set, e.g. one originating from a cluster of multiple molecules.
- Specified by:
getCountFingerprint in interface IFragmentFingerprinter
- Parameters:
aUniqueSmilesList - is a list that stores fragments in the form of unique SMILES. If a fragment occurs more than once in the molecule, it is also present more than once in the list. To be able to calculate the fingerprint for a molecule, the fragments should belong to one molecule.
- Returns:
count fingerprint
- Throws:
NullPointerException - is thrown if the list param (or any of its elements) is null.
IllegalArgumentException - is thrown if the list param contains blank/empty strings.
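How list duplicates translate into frequencies can be shown with a small counting loop. This is a sketch under the assumption that counting amounts to tallying occurrences of key fragments. ListFrequencySketch and countOccurrences are hypothetical names and do not belong to the class.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of deriving frequencies from a list; not the library's code. */
final class ListFrequencySketch {

    /** Tallies how often each key fragment occurs in the given list. */
    static Map<String, Integer> countOccurrences(Set<String> keyFragments, List<String> fragments) {
        Map<String, Integer> frequencies = new HashMap<>();
        for (String smiles : fragments) {
            if (keyFragments.contains(smiles)) {
                // Start at 1 for the first occurrence, then add 1 per repeat.
                frequencies.merge(smiles, 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        Set<String> keyFragments = Set.of("CCO", "C=O");
        Map<String, Integer> frequencies =
                countOccurrences(keyFragments, List.of("CCO", "C=O", "CCO", "CCN", "CCO"));
        System.out.println(frequencies.get("CCO"));         // 3
        System.out.println(frequencies.get("C=O"));         // 1
        System.out.println(frequencies.containsKey("CCN")); // false
    }
}
```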
-
getVersionDescription
- Specified by:
getVersionDescription in interface org.openscience.cdk.fingerprint.IFingerprinter
-
getFingerprint
public BitSet getFingerprint(org.openscience.cdk.interfaces.IAtomContainer mol) throws org.openscience.cdk.exception.CDKException
- Specified by:
getFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
- Throws:
org.openscience.cdk.exception.CDKException
-
getBitFingerprint
public org.openscience.cdk.fingerprint.IBitFingerprint getBitFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) throws org.openscience.cdk.exception.CDKException
- Specified by:
getBitFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
- Throws:
org.openscience.cdk.exception.CDKException
- See Also:
-
SubstructureFingerprinter
-
getCountFingerprint
public org.openscience.cdk.fingerprint.ICountFingerprint getCountFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) throws org.openscience.cdk.exception.CDKException
- Specified by:
getCountFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
- Throws:
org.openscience.cdk.exception.CDKException
- See Also:
-
SubstructureFingerprinter
-
getRawFingerprint
public Map<String,Integer> getRawFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) throws org.openscience.cdk.exception.CDKException
This method is not supported and throws an UnsupportedOperationException.
- Specified by:
getRawFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
- Throws:
UnsupportedOperationException - method is not supported
org.openscience.cdk.exception.CDKException
-
getSize
public int getSize()
Since the FragmentFingerprinter is a key-based fingerprint, the size of the fingerprint equals the number of predefined fragments (unique SMILES), provided the list of key fragments passed during initialization contains no duplicates. Otherwise, the size may be smaller than the number of fragments passed, because duplicate fragment SMILES strings are ignored during initialization and are not part of the fingerprint multiple times.
- Specified by:
getSize in interface org.openscience.cdk.fingerprint.IFingerprinter
- Returns:
int
-
getBitDefinition
Returns the bit definition, i.e. which bit stands for which fragment SMILES. Note that the number of possible bit definitions may differ from the number of key fragments passed during initialization, since duplicates are removed.
- Parameters:
aBitPosition - position in the fingerprint.
- Returns:
unique SMILES corresponding to the specified position.
- Throws:
IllegalArgumentException - is thrown if the given bit position is not (or cannot be) present in the fingerprint.
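The relation between bit positions and key fragments can be sketched as an index lookup on the deduplicated key list. BitDefinitionSketch and its methods are hypothetical names. That positions follow the order of first occurrence is an assumption made for this example, not something the documentation states.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical illustration of bit definitions; not the library's code. */
final class BitDefinitionSketch {

    private final List<String> distinctKeys;

    BitDefinitionSketch(List<String> keyFragments) {
        // Assumption: duplicates are dropped and the first occurrence fixes the position.
        this.distinctKeys = new ArrayList<>(new LinkedHashSet<>(keyFragments));
    }

    /** Returns the key fragment that the given bit position stands for. */
    String bitDefinition(int position) {
        if (position < 0 || position >= distinctKeys.size()) {
            throw new IllegalArgumentException("No bit at position " + position);
        }
        return distinctKeys.get(position);
    }

    int size() {
        return distinctKeys.size();
    }

    public static void main(String[] args) {
        BitDefinitionSketch sketch = new BitDefinitionSketch(List.of("CCO", "C=O", "CCO"));
        System.out.println(sketch.size());           // 2, although three strings were passed
        System.out.println(sketch.bitDefinition(1)); // C=O
    }
}
```

With three input strings but only two distinct fragments, position 2 does not exist in this sketch, and a request for it fails with an IllegalArgumentException.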
-
getBitArray
public int[] getBitArray(List<String> aListOfUniqueSmiles) throws NullPointerException, IllegalArgumentException
Returns the bit array for the specified list. The size of the array corresponds to the number of predefined (key) fragments passed during initialization. It is smaller if the predefined fragments contained duplicates, since these are ignored/removed. This method is only available for bit fingerprints based on unique SMILES comparisons.
- Parameters:
aListOfUniqueSmiles - is a list that stores molecule fragments or arbitrary fragments in the form of unique SMILES.
- Returns:
int[] bit array
- Throws:
NullPointerException - is thrown if the list param (or any of its elements) is null.
IllegalArgumentException - is thrown if the list param contains blank/empty strings.
-
getBitArray
public int[] getBitArray(Map<String, Integer> aUniqueSmilesToFrequencyMap) throws NullPointerException, IllegalArgumentException
Returns the bit array for the specified map. The map represents a molecule based on its fragments, which are represented by unique SMILES in the key set and whose frequencies are stored in the value set. The map can also contain arbitrary fragment sets. This is a convenience method; the given frequencies are not used. It is only available for bit fingerprints based on unique SMILES comparisons.
- Parameters:
aUniqueSmilesToFrequencyMap - map that usually represents a molecule, with the unique SMILES of its fragments in the key set and their frequencies in the value set. In principle, however, such a map can hold any set of fragments.
- Returns:
int[] bit array
- Throws:
NullPointerException - is thrown if the map aUniqueSmilesToFrequencyMap is null or contains keys or values that are null.
IllegalArgumentException - is thrown if the map aUniqueSmilesToFrequencyMap contains keys or values that are blank/empty.
-
getBitSet
public BitSet getBitSet(List<String> aListOfUniqueSmiles) throws NullPointerException, IllegalArgumentException
Method directly returning the BitSet of the fingerprint generated from the given list of SMILES.
- Parameters:
aListOfUniqueSmiles - list storing the fragments or molecules from which a fingerprint (and BitSet) is to be generated
- Returns:
BitSet of the generated fingerprint
- Throws:
NullPointerException
IllegalArgumentException
-
getBitSet
public BitSet getBitSet(Map<String, Integer> aUniqueSmilesToFrequencyMap) throws NullPointerException, IllegalArgumentException
Method directly returning the BitSet of the fingerprint generated from the given frequency map.
- Parameters:
aUniqueSmilesToFrequencyMap - map with the SMILES to frequency representation from which a fingerprint (and BitSet) is to be generated
- Returns:
BitSet of the generated fingerprint
- Throws:
NullPointerException
IllegalArgumentException
-
getCountArray
public int[] getCountArray(Map<String, Integer> aUniqueSmilesToFrequencyMap) throws NullPointerException, IllegalArgumentException
Returns the count array for the specified map. The map represents a molecule based on its fragments, which are represented by unique SMILES in the key set and whose frequencies are stored in the value set. The map can also contain arbitrary fragment sets. The size of the array corresponds to the number of predefined (key) fragments passed during initialization. It is smaller if the predefined fragments contained duplicates, since these are ignored/removed. This method is only available for count fingerprints based on unique SMILES comparisons.
- Parameters:
aUniqueSmilesToFrequencyMap - map that usually represents a molecule, with the unique SMILES of its fragments in the key set and their frequencies in the value set. In principle, however, such a map can hold any set of fragments.
- Returns:
int[] count array
- Throws:
NullPointerException - is thrown if the map aUniqueSmilesToFrequencyMap is null or contains keys or values that are null.
IllegalArgumentException - is thrown if the map aUniqueSmilesToFrequencyMap contains keys or values that are blank/empty.
-
getCountArray
public int[] getCountArray(List<String> aListOfUniqueSmiles) throws NullPointerException, IllegalArgumentException
Returns the count array for the specified list. This method is only available for count fingerprints based on unique SMILES comparisons.
- Parameters:
aListOfUniqueSmiles - is a list that stores molecule fragments or arbitrary fragments in the form of unique SMILES.
- Returns:
int[] count array
- Throws:
NullPointerException - is thrown if the list param (or any of its elements) is null.
IllegalArgumentException - is thrown if the list param contains blank/empty strings.
-
count
public int count(String aSmiles, CountFingerprint aCountFingerprint) throws IllegalArgumentException
Returns the count (number of occurrences, frequency) of a given SMILES string in a given CountFingerprint instance. Important: the CountFingerprint instance must have been generated with the FragmentFingerprinter instance on which this method is called.
- Parameters:
aSmiles - SMILES string to get the count for
aCountFingerprint - count fingerprint in which to search for the SMILES string
- Returns:
integer count of the given SMILES string
- Throws:
IllegalArgumentException - if the SMILES is not present in the fingerprint
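One plausible reason for the requirement that the fingerprint and the fingerprinter belong together is that the lookup goes through the key fragment's position. The sketch below makes exactly that assumption. CountLookupSketch and its count method are hypothetical, and an int array stands in for the CountFingerprint.

```java
import java.util.Map;

/** Hypothetical illustration of a position-based count lookup; not the library's code. */
final class CountLookupSketch {

    /** Reads the count stored at the position assigned to the given SMILES. */
    static int count(String smiles, Map<String, Integer> keyPositions, int[] counts) {
        Integer position = keyPositions.get(smiles);
        if (position == null) {
            throw new IllegalArgumentException("SMILES is not a key fragment: " + smiles);
        }
        return counts[position];
    }

    public static void main(String[] args) {
        // Counts produced with the key order CCO, C=O.
        Map<String, Integer> keyPositions = Map.of("CCO", 0, "C=O", 1);
        int[] counts = {3, 1};
        System.out.println(count("CCO", keyPositions, counts)); // 3

        // A fingerprinter with a different key order reads the wrong slot.
        Map<String, Integer> otherPositions = Map.of("C=O", 0, "CCO", 1);
        System.out.println(count("CCO", otherPositions, counts)); // 1
    }
}
```

The second lookup returns a value without any error, which is why such a mismatch is easy to miss.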
-
getFloatFingerprint
protected void getFloatFingerprint(Map<String, Integer> aFragmentsFrequenciesMap, float[] aPreInitFloatArray, boolean anUseBitArrayStatement)
Generates the float fingerprint of a given molecule's fragments (i.e. the given map). A flag determines whether the bit or the count/frequency representation is used for generating the fingerprint.
- Parameters:
aFragmentsFrequenciesMap - contains SMILES-frequency pairs of the molecule's fragments
aPreInitFloatArray - is a pre-initialized float[] array (a matrix row)
anUseBitArrayStatement - is the setting whether bit or count/frequency representation is to be used
-
getFragmentsComponentsFloatMatrix
public void getFragmentsComponentsFloatMatrix(Map<String, Integer>[] aFragmentsFrequenciesMapsArray, float[][] aFloatDataMatrix, boolean anUseBitArrayStatement)
Fills a float[][] matrix with the fingerprints for the given array of maps (fragment sets of distinct molecules) with respect to the pre-defined fragment fingerprint. Each map in the given array represents ONE molecule's fragment frequencies (fragment-frequency pairs), so the whole array represents the entirety of fragments. A flag determines whether the bit representation or the count/frequency of each fragment is used. It is recommended to check the size of the pre-initialized float matrix beforehand: if the matrix is smaller than required, an exception is thrown and the matrix is not filled.
- Parameters:
aFragmentsFrequenciesMapsArray - contains maps of SMILES-frequency pairs of fragments
aFloatDataMatrix - is a pre-initialized float[][] matrix to write data into
anUseBitArrayStatement - is the setting whether bit or count/frequency representation should be used
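The matrix filling described above, one row per molecule and one column per key fragment, might look roughly like the sketch below. Several points are assumptions made for illustration: the names FloatMatrixSketch and fill, the use of a List instead of the Map array, the choice of IllegalArgumentException for a matrix that is too small (the documentation only says that an exception is thrown), and the reading of the flag as true for bits and false for counts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of filling a fingerprint matrix; not the library's code. */
final class FloatMatrixSketch {

    /** Writes one fingerprint row per molecule into the pre-initialized matrix. */
    static void fill(Map<String, Integer> keyPositions, List<Map<String, Integer>> molecules,
                     float[][] matrix, boolean useBits) {
        // Validate everything first so that a failure leaves the matrix untouched.
        if (matrix.length < molecules.size()) {
            throw new IllegalArgumentException("Matrix has too few rows.");
        }
        for (int row = 0; row < molecules.size(); row++) {
            if (matrix[row].length < keyPositions.size()) {
                throw new IllegalArgumentException("Matrix row " + row + " is too short.");
            }
        }
        for (int row = 0; row < molecules.size(); row++) {
            for (Map.Entry<String, Integer> entry : molecules.get(row).entrySet()) {
                Integer column = keyPositions.get(entry.getKey());
                if (column != null) {
                    // Bit mode stores 1 for presence, count mode stores the frequency.
                    matrix[row][column] = useBits ? 1.0f : entry.getValue().floatValue();
                }
            }
        }
    }

    public static void main(String[] args) {
        // Assumed key fragments and positions, chosen for this example only.
        Map<String, Integer> keyPositions = Map.of("CCO", 0, "c1ccccc1", 1, "C=O", 2);
        List<Map<String, Integer>> molecules = List.of(
                Map.of("CCO", 2, "C=O", 1),
                Map.of("c1ccccc1", 1, "CCN", 4));
        float[][] matrix = new float[2][3];
        fill(keyPositions, molecules, matrix, false);
        System.out.println(Arrays.deepToString(matrix)); // [[2.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    }
}
```

Sizing the matrix as number of molecules by fingerprint size (as returned by getSize()) before the call avoids the exception path.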
-