Class FragmentFingerprinter

java.lang.Object
de.unijena.cheminf.fragment.fingerprint.FragmentFingerprinter
All Implemented Interfaces:
IFragmentFingerprinter, org.openscience.cdk.fingerprint.IFingerprinter

public class FragmentFingerprinter extends Object implements IFragmentFingerprinter
Class to generate fragment fingerprints, either as bit or as count fingerprints. Fragment fingerprints are key-based fingerprints, so the class requires predefined structures/fragments in the form of unique SMILES to create the fingerprint. These structures must be passed to the constructor when the class is instantiated. The class implements the interface IFragmentFingerprinter, which extends the CDK interface IFingerprinter, so fingerprints can be computed in two ways. The first way to calculate a bit or count fingerprint is to perform a substructure search with all predefined fragments on a given IAtomContainer. The fingerprint created by the substructure search is based on the CDK class SubstructureFingerprinter, which interprets the predefined fragment SMILES as SMARTS patterns. The second way is to compare given fragments, which are in the form of unique SMILES, with the predefined fragments. The second way is thus based purely on string comparison. It is important to note that the two ways of creating fingerprints can produce different results.
  • Constructor Summary

    Constructors
    Constructor
    Description
    FragmentFingerprinter(List<String> aFragmentList)
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    int[]
    getBitArray(List<String> aListOfUniqueSmiles)
    Returns the bit array for the specified list.
    int[]
    getBitArray(Map<String,Integer> aUniqueSmilesToFrequencyMap)
    Returns bit array for the specified map.
    String
    getBitDefinition(int aBit)
    Returns the bit definitions, i.e. which bit stands for which fragment SMILES.
    org.openscience.cdk.fingerprint.IBitFingerprint
    getBitFingerprint(List<String> aListOfUniqueSmiles)
    Method to generate the bit fingerprint.
    org.openscience.cdk.fingerprint.IBitFingerprint
    getBitFingerprint(org.openscience.cdk.interfaces.IAtomContainer container)
    int[]
    getCountArray(List<String> aListOfUniqueSmiles)
    Returns the count array for the specified list.
    int[]
    getCountArray(Map<String,Integer> aUniqueSmilesToFrequencyMap)
    Returns a count array, which is created based on the given parameter.
    org.openscience.cdk.fingerprint.ICountFingerprint
    getCountFingerprint(List<String> aUniqueSmilesList)
    Generates a count fingerprint for a molecule based on its fragments, represented by unique SMILES in the list given as parameter.
    CountFingerprint
    getCountFingerprint(Map<String,Integer> aUniqueSmilesToFrequencyMap)
    Generates a count fingerprint for a molecule based on its fragments, represented by unique SMILES strings in the key set and their frequencies in the value set of the given map.
    org.openscience.cdk.fingerprint.ICountFingerprint
    getCountFingerprint(org.openscience.cdk.interfaces.IAtomContainer container)
    BitSet
    getFingerprint(org.openscience.cdk.interfaces.IAtomContainer mol)
    Map<String,Integer>
    getRawFingerprint(org.openscience.cdk.interfaces.IAtomContainer container)
    UnsupportedOperationException.
    int
    getSize()
    Since the FragmentFingerprinter is a key-based fingerprinter, the size of the fingerprint equals the number of predefined fragments (unique SMILES) if the list of key fragments passed during initialization contains no duplicates; otherwise, the size may be smaller than the number of fragments passed, since duplicates are removed.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • FragmentFingerprinter

      public FragmentFingerprinter(List<String> aFragmentList) throws NullPointerException, IllegalArgumentException
      Constructor. Initializes the fragment fingerprinter with a user-defined set of fragments in the form of unique SMILES. Duplicates in the given list are removed, so each fragment SMILES string is part of the fingerprint only once, and the actual number of key fragments may be smaller than the number of fragments passed.
      Parameters:
      aFragmentList - is the list in which the predefined fragments are stored.
      Throws:
      NullPointerException - is thrown if the list aFragmentList is null.
      IllegalArgumentException - is thrown if the list contains blank strings.
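      The duplicate handling described for the constructor can be sketched with plain JDK collections. This is a minimal illustration, not the library's implementation: the class KeyFragmentIndex and its method build are hypothetical names, and it is an assumption that the first occurrence of a fragment determines its position and that null elements are rejected like blank strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper, not part of the library: indexes key fragments by position. */
final class KeyFragmentIndex {

    /**
     * Maps each distinct key fragment to a fingerprint position.
     * Assumption: the first occurrence of a fragment fixes its position.
     */
    static Map<String, Integer> build(List<String> fragments) {
        Objects.requireNonNull(fragments, "fragment list is null");
        Map<String, Integer> positions = new LinkedHashMap<>();
        for (String smiles : fragments) {
            // Assumption: null elements are treated like blank strings
            if (smiles == null || smiles.isBlank()) {
                throw new IllegalArgumentException("fragment list contains a blank string");
            }
            // putIfAbsent leaves the position of an already known fragment untouched
            positions.putIfAbsent(smiles, positions.size());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Four fragments are passed, but "c1ccccc1" occurs twice
        Map<String, Integer> index = build(List.of("c1ccccc1", "CO", "c1ccccc1", "CC(=O)O"));
        System.out.println("size = " + index.size());
        System.out.println(index);
    }
}
```

      In this sketch, four fragments are passed but only three positions are assigned, which mirrors why the fingerprint size can be smaller than the length of the input list.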
  • Method Details

    • getBitFingerprint

      public org.openscience.cdk.fingerprint.IBitFingerprint getBitFingerprint(List<String> aListOfUniqueSmiles) throws NullPointerException, IllegalArgumentException
      Method to generate the bit fingerprint. The given list of unique SMILES is compared with the predefined fragments. For each match, the position of the unique SMILES is determined from the map and the bit at that position is set to true in the initialized BitSet. The method is intended to generate the fingerprint for one molecule but can in principle be applied to any fragment set, e.g. one originating from a cluster of multiple molecules.
      Specified by:
      getBitFingerprint in interface IFragmentFingerprinter
      Parameters:
      aListOfUniqueSmiles - is a list that stores fragments in the form of unique SMILES. To be able to calculate the fingerprint for a molecule, the fragments should belong to one molecule.
      Returns:
      Bit fingerprint. The returned object implements the CDK interface IBitFingerprint, which provides methods that return useful information about the calculated bit fingerprint, such as the number of positive bits in the fingerprint.
      Throws:
      NullPointerException - is thrown if the list aListOfUniqueSmiles is null.
      IllegalArgumentException - is thrown if the list aListOfUniqueSmiles contains blank/empty strings.
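      The string-comparison approach described above can be sketched with java.util.BitSet. This is a minimal illustration under assumptions, not the library's code: BitFingerprintSketch and toBitSet are hypothetical names, and the position map is built by hand here.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration, not the library's implementation. */
final class BitFingerprintSketch {

    /** Sets one bit per predefined fragment that occurs in the given fragment list. */
    static BitSet toBitSet(Map<String, Integer> keyPositions, List<String> fragments) {
        BitSet bits = new BitSet(keyPositions.size());
        for (String smiles : fragments) {
            // Pure string comparison: only an exact match of the unique SMILES counts
            Integer position = keyPositions.get(smiles);
            if (position != null) {
                bits.set(position); // setting a bit twice has no further effect
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<String, Integer> keyPositions = new LinkedHashMap<>();
        keyPositions.put("c1ccccc1", 0);
        keyPositions.put("CO", 1);
        keyPositions.put("CC(=O)O", 2);

        // "CO" occurs twice and "CCN" is not a predefined fragment
        BitSet bits = toBitSet(keyPositions, List.of("CO", "CO", "CCN"));
        System.out.println(bits + " cardinality=" + bits.cardinality());
    }
}
```

      Because only exact string matches set a bit, a fragment that is merely a substructure of a key fragment (or vice versa) has no effect here, which is one reason the two ways of creating fingerprints can differ.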
    • getCountFingerprint

      public CountFingerprint getCountFingerprint(Map<String,Integer> aUniqueSmilesToFrequencyMap) throws NullPointerException, IllegalArgumentException
      Generates a count fingerprint for a molecule based on its fragments, represented by unique SMILES strings in the key set and their frequencies in the value set of the given map. Given fragment SMILES codes that are not part of the set given at initialisation of this class are ignored. The frequencies of those matching the predefined set are used to construct the fingerprint. The method is intended to generate the fingerprint for one molecule but can in principle be applied to any fragment set, e.g. one originating from a cluster of multiple molecules.
      Specified by:
      getCountFingerprint in interface IFragmentFingerprinter
      Parameters:
      aUniqueSmilesToFrequencyMap - map that usually represents a molecule, with the unique SMILES of its fragments in the key set and their frequencies in the value set. In principle, however, such a map can be built for any set of fragments. To be able to calculate the fingerprint for a molecule, the fragments should belong to one molecule.
      Returns:
      count fingerprint
      Throws:
      NullPointerException - is thrown if the map aUniqueSmilesToFrequencyMap is null or contains keys or values that are null.
      IllegalArgumentException - is thrown if the map aUniqueSmilesToFrequencyMap contains keys or values that are blank/empty.
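      How a frequency map turns into counts per key fragment can be sketched as follows. This is a minimal illustration, not the library's implementation: CountFromMapSketch and countsFor are hypothetical names, and the result is shown as a plain int[] instead of a CDK count fingerprint object.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration, not the library's implementation. */
final class CountFromMapSketch {

    /** Returns one count per key fragment; fragments that are not keys are ignored. */
    static int[] countsFor(List<String> keyFragments, Map<String, Integer> frequencies) {
        int[] counts = new int[keyFragments.size()];
        for (int i = 0; i < counts.length; i++) {
            // A key fragment that is missing from the map contributes a count of 0
            counts[i] = frequencies.getOrDefault(keyFragments.get(i), 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keyFragments = List.of("c1ccccc1", "CO", "CC(=O)O");
        // "CCN" is not a key fragment, so its frequency never reaches the result
        Map<String, Integer> frequencies = Map.of("CO", 2, "CCN", 5, "c1ccccc1", 1);
        System.out.println(Arrays.toString(countsFor(keyFragments, frequencies)));
    }
}
```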
    • getCountFingerprint

      public org.openscience.cdk.fingerprint.ICountFingerprint getCountFingerprint(List<String> aUniqueSmilesList) throws NullPointerException, IllegalArgumentException
      Generates a count fingerprint for a molecule based on its fragments, represented by unique SMILES in the list given as parameter. Given fragment SMILES codes that are not part of the set given at initialisation of this class are ignored. The frequencies of those matching the predefined set are used to construct the fingerprint. The frequency of an individual fragment depends on how often it occurs in the specified list, so duplicates are allowed in this list. The method is intended to generate the fingerprint for one molecule but can in principle be applied to any fragment set, e.g. one originating from a cluster of multiple molecules.
      Specified by:
      getCountFingerprint in interface IFragmentFingerprinter
      Parameters:
      aUniqueSmilesList - is a list that stores fragments in the form of unique SMILES. If a fragment occurs more than once in the molecule, it is also present more than once in the list. To be able to calculate the fingerprint for a molecule, the fragments should belong to one molecule.
      Returns:
      count fingerprint
      Throws:
      NullPointerException - is thrown if the list aUniqueSmilesList is null.
      IllegalArgumentException - is thrown if the list aUniqueSmilesList contains blank/empty strings.
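      For the list variant, the frequency of a fragment follows from how often it occurs in the list. A minimal sketch of that tallying step, assuming the hypothetical names TallySketch and tally (this is not the library's code):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration, not the library's implementation. */
final class TallySketch {

    /** Counts how often each key fragment occurs in the list; other fragments are skipped. */
    static Map<String, Integer> tally(Set<String> keyFragments, List<String> fragments) {
        Map<String, Integer> frequencies = new LinkedHashMap<>();
        for (String smiles : fragments) {
            if (keyFragments.contains(smiles)) {
                // merge adds 1 to an existing count or starts a new count at 1
                frequencies.merge(smiles, 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        Set<String> keyFragments = Set.of("c1ccccc1", "CO", "CC(=O)O");
        // Duplicates in the list are allowed and raise the frequency of a fragment
        List<String> fragments = List.of("CO", "c1ccccc1", "CO", "CCN");
        System.out.println(tally(keyFragments, fragments));
    }
}
```

      The resulting map has the same shape as the frequency map accepted by the map-based overload, so in this sketch both inputs lead to the same counts.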
    • getVersionDescription

      public String getVersionDescription()
      Specified by:
      getVersionDescription in interface org.openscience.cdk.fingerprint.IFingerprinter
    • getFingerprint

      public BitSet getFingerprint(org.openscience.cdk.interfaces.IAtomContainer mol) throws org.openscience.cdk.exception.CDKException
      Specified by:
      getFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
      Throws:
      org.openscience.cdk.exception.CDKException
    • getBitFingerprint

      public org.openscience.cdk.fingerprint.IBitFingerprint getBitFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) throws org.openscience.cdk.exception.CDKException
      Specified by:
      getBitFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
      Throws:
      org.openscience.cdk.exception.CDKException
      See Also:
      • SubstructureFingerprinter
    • getCountFingerprint

      public org.openscience.cdk.fingerprint.ICountFingerprint getCountFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) throws org.openscience.cdk.exception.CDKException
      Specified by:
      getCountFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
      Throws:
      org.openscience.cdk.exception.CDKException
      See Also:
      • SubstructureFingerprinter
    • getRawFingerprint

      public Map<String,Integer> getRawFingerprint(org.openscience.cdk.interfaces.IAtomContainer container) throws org.openscience.cdk.exception.CDKException
      This method is not supported and throws an UnsupportedOperationException.
      Specified by:
      getRawFingerprint in interface org.openscience.cdk.fingerprint.IFingerprinter
      Throws:
      UnsupportedOperationException - method is not supported
      org.openscience.cdk.exception.CDKException
    • getSize

      public int getSize()
      Since the FragmentFingerprinter is a key-based fingerprinter, the size of the fingerprint equals the number of predefined fragments (unique SMILES) if the list of key fragments passed during initialization contains no duplicates. Otherwise, the size may be smaller than the number of fragments passed, since duplicate fragment SMILES strings are ignored during initialization and are not part of the fingerprint multiple times.
      Specified by:
      getSize in interface org.openscience.cdk.fingerprint.IFingerprinter
      Returns:
      size of the fingerprint as int
    • getBitDefinition

      public String getBitDefinition(int aBit) throws IllegalArgumentException
      Returns the bit definitions, i.e. which bit stands for which fragment SMILES. Note that the number of possible bit definitions may differ from the number of key fragments passed during initialization, since duplicates are removed.
      Parameters:
      aBit - position in the fingerprint.
      Returns:
      unique SMILES corresponding to the specified position.
      Throws:
      IllegalArgumentException - is thrown if the given bit position is not present in the fingerprint.
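      The lookup from bit position to fragment SMILES, including the effect of removed duplicates, can be sketched like this. It is a minimal illustration, not the library's implementation: BitDefinitionSketch, uniqueKeys and definition are hypothetical names, and it is an assumption that positions follow the order of first occurrence.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical illustration, not the library's implementation. */
final class BitDefinitionSketch {

    /** Removes duplicates while keeping the order of first occurrence (an assumption). */
    static List<String> uniqueKeys(List<String> fragments) {
        return new ArrayList<>(new LinkedHashSet<>(fragments));
    }

    /** Returns the fragment SMILES at the given position or rejects an invalid position. */
    static String definition(List<String> uniqueKeys, int bit) {
        if (bit < 0 || bit >= uniqueKeys.size()) {
            throw new IllegalArgumentException("bit position " + bit + " is not present");
        }
        return uniqueKeys.get(bit);
    }

    public static void main(String[] args) {
        // Four fragments are passed, but only three positions exist afterwards
        List<String> keys = uniqueKeys(List.of("CO", "CC", "CO", "CCN"));
        System.out.println(definition(keys, 2));
        try {
            definition(keys, 3);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```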
    • getBitArray

      public int[] getBitArray(List<String> aListOfUniqueSmiles) throws NullPointerException, IllegalArgumentException
      Returns the bit array for the specified list. The size of the array corresponds to the number of predefined (key) fragments passed during initialization. However, the size may be smaller if the predefined fragments contain duplicates, as these are ignored/removed. This method is only available for bit fingerprints based on unique SMILES comparisons.
      Parameters:
      aListOfUniqueSmiles - is a list that stores molecule fragments or arbitrary fragments in the form of unique SMILES.
      Returns:
      int[] bit array
      Throws:
      NullPointerException - is thrown if the list aListOfUniqueSmiles is null.
      IllegalArgumentException - is thrown if the list aListOfUniqueSmiles contains blank/empty strings.
    • getBitArray

      public int[] getBitArray(Map<String,Integer> aUniqueSmilesToFrequencyMap) throws NullPointerException, IllegalArgumentException
      Returns the bit array for the specified map. The map represents a molecule based on its fragments, which are represented by unique SMILES in the key set and whose frequencies are mapped in the value set. However, the map can also contain arbitrary fragment sets. This is a convenience method, and the given frequencies are not used. The method is only available for bit fingerprints based on unique SMILES comparisons.
      Parameters:
      aUniqueSmilesToFrequencyMap - map that usually represents a molecule, with the unique SMILES of its fragments in the key set and their frequencies in the value set. In principle, however, such a map can be built for any set of fragments.
      Returns:
      int[] bit array
      Throws:
      NullPointerException - is thrown if the map aUniqueSmilesToFrequencyMap is null or contains keys or values that are null.
      IllegalArgumentException - is thrown if the map aUniqueSmilesToFrequencyMap contains keys or values that are blank/empty.
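      That the frequencies play no role for the bit array can be sketched as follows. This is a minimal illustration, not the library's implementation: BitArraySketch and bitArray are hypothetical names, and it is an assumption that the mere presence of a key fragment in the map sets its bit, whatever value is stored for it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration, not the library's implementation. */
final class BitArraySketch {

    /** Returns 1 for each key fragment that is a key of the map, 0 otherwise. */
    static int[] bitArray(List<String> keyFragments, Map<String, Integer> frequencies) {
        int[] bits = new int[keyFragments.size()];
        for (int i = 0; i < bits.length; i++) {
            // Only presence is checked; the stored frequency is never read
            bits[i] = frequencies.containsKey(keyFragments.get(i)) ? 1 : 0;
        }
        return bits;
    }

    public static void main(String[] args) {
        List<String> keyFragments = List.of("c1ccccc1", "CO", "CC(=O)O");
        // A frequency of 7 and a frequency of 1 would both yield the same bit
        Map<String, Integer> frequencies = Map.of("CO", 7, "CCN", 1);
        System.out.println(Arrays.toString(bitArray(keyFragments, frequencies)));
    }
}
```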
    • getCountArray

      public int[] getCountArray(Map<String,Integer> aUniqueSmilesToFrequencyMap) throws NullPointerException, IllegalArgumentException
      Returns a count array, which is created based on the given map. The map represents a molecule based on its fragments, which are represented by unique SMILES in the key set and whose frequencies are mapped in the value set. However, the map can also contain arbitrary fragment sets. The size of the array corresponds to the number of predefined (key) fragments passed during initialization. However, the size may be smaller if the predefined fragments contain duplicates, as these are ignored/removed. This method is only available for count fingerprints based on unique SMILES comparisons.
      Parameters:
      aUniqueSmilesToFrequencyMap - map that usually represents a molecule, with the unique SMILES of its fragments in the key set and their frequencies in the value set. In principle, however, such a map can be built for any set of fragments.
      Returns:
      int[] count array
      Throws:
      NullPointerException - is thrown if the map aUniqueSmilesToFrequencyMap is null or contains keys or values that are null.
      IllegalArgumentException - is thrown if the map aUniqueSmilesToFrequencyMap contains keys or values that are blank/empty.
    • getCountArray

      public int[] getCountArray(List<String> aListOfUniqueSmiles) throws NullPointerException, IllegalArgumentException
      Returns the count array for the specified list. This method is only available for count fingerprints based on unique SMILES comparisons.
      Parameters:
      aListOfUniqueSmiles - is a list that stores molecule fragments or arbitrary fragments in the form of unique SMILES.
      Returns:
      int[] count array
      Throws:
      NullPointerException - is thrown if the list aListOfUniqueSmiles is null.
      IllegalArgumentException - is thrown if the list aListOfUniqueSmiles contains blank/empty strings.