Class DynamicSMILESFileReader

java.lang.Object
org.openscience.cdk.io.ChemObjectIO
org.openscience.cdk.io.iterator.DefaultIteratingChemObjectReader<org.openscience.cdk.interfaces.IAtomContainer>
de.unijena.cheminf.deglycosylation.tools.DynamicSMILESFileReader
All Implemented Interfaces:
Closeable, AutoCloseable, Iterator<org.openscience.cdk.interfaces.IAtomContainer>, org.openscience.cdk.io.IChemObjectIO, org.openscience.cdk.io.IChemObjectReader, org.openscience.cdk.io.iterator.IIteratingChemObjectReader<org.openscience.cdk.interfaces.IAtomContainer>

public class DynamicSMILESFileReader extends org.openscience.cdk.io.iterator.DefaultIteratingChemObjectReader<org.openscience.cdk.interfaces.IAtomContainer>
File reader for different kinds of files with a SMILES code column. The reader detects the structure of the file from its first few lines, assuming that the SMILES and ID/name columns are the first two columns, in either order. Unsuitable for reaction SMILES or CxSMILES.
Version:
1.0.0.0
Author:
Jonas Schaub, Samuel Behr
  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.openscience.cdk.io.IChemObjectReader

    org.openscience.cdk.io.IChemObjectReader.Mode
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final int
    MAXIMUM_LINE_NUMBER_TO_CHECK_IN_SMILES_FILES
    Maximum number of lines starting from the first one to check for valid SMILES strings in a SMILES file when trying to determine the SMILES code column and separator.
    static final Set<String>
    PARSABLE_SMILES_EXCEPTIONS
    Strings that can be parsed by CDK SmilesParser as SMILES codes but should be ignored when detecting the file structure, e.g. "ID" is a likely column name but could be parsed into Iodine-Deuterium as a SMILES code.
    static final Set<String>
    POSSIBLE_SMILES_FILE_SEPARATORS
    Possible SMILES file separators used to separate SMILES code from ID.

    Fields inherited from class org.openscience.cdk.io.iterator.DefaultIteratingChemObjectReader

    errorHandler, mode
  • Constructor Summary

    Constructors
    Constructor
    Description
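    DynamicSMILESFileReader(File aFile, DynamicSMILESFileFormat aFormat)
    Constructs a new DynamicSMILESFileReader that can read molecules from a given file.
    DynamicSMILESFileReader(InputStream in, DynamicSMILESFileFormat aFormat)
    Constructs a new DynamicSMILESFileReader that can read molecules from a given InputStream.
    DynamicSMILESFileReader(Reader in, DynamicSMILESFileFormat aFormat)
    Constructs a new DynamicSMILESFileReader that can read molecules from a given Reader.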
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
    static DynamicSMILESFileFormat
    detectFormat(File aFile)
    Checks the first few lines of the SMILES file for parsable SMILES codes and records the determined separator character and the SMILES code and ID column positions.
    static DynamicSMILESFileFormat
    detectFormat(List<String> lines)
    Checks the first few lines of a SMILES file for parsable SMILES codes and records the determined separator character and the SMILES code and ID column positions.
    org.openscience.cdk.io.formats.IResourceFormat
    getFormat()
    int
    getSkippedLinesCounter()
    Returns the number of lines that were skipped (empty or erroneous SMILES codes etc.) in the last file import of this instance (reset at each new import), headline not counted.
    boolean
    hasNext()
    org.openscience.cdk.interfaces.IAtomContainer
    next()
    org.openscience.cdk.interfaces.IAtomContainerSet
    readToSet()
    Reads the SMILES file according to the given format.
    void
    setReader(InputStream inputStream)
    void
    setReader(Reader reader)
    void
    setSMILESFileFormat(DynamicSMILESFileFormat aFormat)

    Methods inherited from class org.openscience.cdk.io.iterator.DefaultIteratingChemObjectReader

    accepts, handleError, handleError, handleError, handleError, remove, setErrorHandler, setReaderMode

    Methods inherited from class org.openscience.cdk.io.ChemObjectIO

    addChemObjectIOListener, addSetting, addSettings, fireIOSettingQuestion, getIOSettings, getListeners, getSetting, getSetting, getSettings, hasSetting, removeChemObjectIOListener

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.openscience.cdk.io.IChemObjectIO

    addChemObjectIOListener, addSetting, addSettings, getIOSettings, getListeners, getSetting, getSetting, getSettings, hasSetting, removeChemObjectIOListener

    Methods inherited from interface java.util.Iterator

    forEachRemaining
  • Field Details

    • POSSIBLE_SMILES_FILE_SEPARATORS

      public static final Set<String> POSSIBLE_SMILES_FILE_SEPARATORS
      Possible SMILES file separators used to separate SMILES code from ID. Ordered so that non-whitespace characters are tested first.
    • MAXIMUM_LINE_NUMBER_TO_CHECK_IN_SMILES_FILES

      public static final int MAXIMUM_LINE_NUMBER_TO_CHECK_IN_SMILES_FILES
      Maximum number of lines starting from the first one to check for valid SMILES strings in a SMILES file when trying to determine the SMILES code column and separator.
    • PARSABLE_SMILES_EXCEPTIONS

      public static final Set<String> PARSABLE_SMILES_EXCEPTIONS
      Strings that can be parsed by CDK SmilesParser as SMILES codes but should be ignored when detecting the file structure, e.g. "ID" is a likely column name but could be parsed into Iodine-Deuterium as a SMILES code.
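The exclusion idea can be illustrated with a minimal sketch. It is not the class's actual code: the names `HeaderTokenFilterSketch`, `EXCLUDED`, and `countsAsSmiles` are hypothetical, the set contains only the one entry the description names ("ID"), the match is assumed to be exact and case-sensitive, and the parser is passed in as a plain predicate so that the example stays independent of CDK.

```java
import java.util.Set;
import java.util.function.Predicate;

final class HeaderTokenFilterSketch {

    // Assumption: only "ID" is listed because it is the only entry the
    // documentation mentions; the real set may contain more strings.
    static final Set<String> EXCLUDED = Set.of("ID");

    /**
     * Returns true if the token should count as a SMILES code during
     * structure detection: non-empty, not an excluded column name, and
     * accepted by the supplied parser predicate.
     */
    static boolean countsAsSmiles(String token, Predicate<String> parser) {
        if (token == null) {
            return false;
        }
        String trimmed = token.trim();
        if (trimmed.isEmpty() || EXCLUDED.contains(trimmed)) {
            return false;
        }
        return parser.test(trimmed);
    }
}
```

Even with a parser that accepts every string, `countsAsSmiles("ID", s -> true)` is rejected by the exclusion check, so a header cell named "ID" would not be mistaken for a molecule.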
  • Constructor Details

    • DynamicSMILESFileReader

      public DynamicSMILESFileReader(InputStream in, DynamicSMILESFileFormat aFormat) throws org.openscience.cdk.exception.CDKException
      Constructs a new DynamicSMILESFileReader that can read molecules from a given InputStream.
      Parameters:
      in - the InputStream to read from
      Throws:
      org.openscience.cdk.exception.CDKException
    • DynamicSMILESFileReader

      public DynamicSMILESFileReader(File aFile, DynamicSMILESFileFormat aFormat) throws org.openscience.cdk.exception.CDKException, FileNotFoundException
      Constructs a new DynamicSMILESFileReader that can read molecules from a given file.
      Throws:
      org.openscience.cdk.exception.CDKException
      FileNotFoundException
    • DynamicSMILESFileReader

      public DynamicSMILESFileReader(Reader in, DynamicSMILESFileFormat aFormat) throws org.openscience.cdk.exception.CDKException
      Constructs a new DynamicSMILESFileReader that can read molecules from a given Reader.
      Parameters:
      in - the Reader to read from
      Throws:
      org.openscience.cdk.exception.CDKException
  • Method Details

    • getSkippedLinesCounter

      public int getSkippedLinesCounter()
      Returns the number of lines that were skipped (empty or erroneous SMILES codes etc.) in the last file import of this instance (reset at each new import), headline not counted.
      Returns:
      number of lines skipped in the last import
    • detectFormat

      public static DynamicSMILESFileFormat detectFormat(List<String> lines) throws IOException
      Checks the first few lines of a SMILES file for parsable SMILES codes and records the determined separator character and the SMILES code and ID column positions in the returned format. Expects one parsable SMILES code per line of the file and an optional second element, which is interpreted as the molecule's ID or name and is separated from the SMILES code by one of the separator tokens tab, semicolon, comma, or space. Unsuitable for reaction SMILES or CxSMILES.
      Parameters:
      lines - first few lines of a SMILES file
      Returns:
      determined format of the given lines
      Throws:
      IOException - if the lines do not adhere to the format expectations
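The detection described above might be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: the class `SmilesFormatDetectionSketch`, the result holder `Detected`, the separator order, the value of `MAX_LINES`, and the headline rule (the first non-empty line is treated as a headline if it yields no SMILES code) are all hypothetical. The SMILES check is passed in as a predicate; in practice it could wrap a real parser.

```java
import java.io.IOException;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class SmilesFormatDetectionSketch {

    /** Hypothetical result holder; the real DynamicSMILESFileFormat may differ. */
    static final class Detected {
        final String separator;
        final int smilesColumn;
        final int idColumn; // -1 if the line has no second column
        final boolean hasHeadline;

        Detected(String separator, int smilesColumn, int idColumn, boolean hasHeadline) {
            this.separator = separator;
            this.smilesColumn = smilesColumn;
            this.idColumn = idColumn;
            this.hasHeadline = hasHeadline;
        }
    }

    // Assumed order: non-whitespace separators are tried before whitespace ones.
    static final List<String> SEPARATORS = List.of(";", ",", "\t", " ");
    // Assumed value; the real constant's value is not stated in this document.
    static final int MAX_LINES = 10;
    // Assumed content; only "ID" is named in the documentation.
    static final Set<String> EXCLUDED = Set.of("ID");

    static Detected detect(List<String> lines, Predicate<String> isSmiles) throws IOException {
        int limit = Math.min(lines.size(), MAX_LINES);
        int firstNonEmpty = -1;
        for (int i = 0; i < limit; i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            if (firstNonEmpty < 0) {
                firstNonEmpty = i;
            }
            for (String separator : SEPARATORS) {
                String[] columns = line.split(Pattern.quote(separator), 3);
                // Only the first two columns are candidates for the SMILES code.
                for (int c = 0; c < Math.min(columns.length, 2); c++) {
                    String token = columns[c].trim();
                    if (!token.isEmpty() && !EXCLUDED.contains(token) && isSmiles.test(token)) {
                        int idColumn = columns.length > 1 ? 1 - c : -1;
                        return new Detected(separator, c, idColumn, i != firstNonEmpty);
                    }
                }
            }
        }
        throw new IOException("No parsable SMILES code found in the first " + limit + " lines.");
    }
}
```

Trying the non-whitespace separators first means that a line such as `ethanol;CCO` is split at the semicolon before a space is considered.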
    • detectFormat

      public static DynamicSMILESFileFormat detectFormat(File aFile) throws IOException
      Checks the first few lines of the SMILES file for parsable SMILES codes and records the determined separator character and the SMILES code and ID column positions in the returned format. Expects one parsable SMILES code per line of the file and an optional second element, which is interpreted as the molecule's ID or name and is separated from the SMILES code by one of the separator tokens tab, semicolon, comma, or space. Unsuitable for reaction SMILES or CxSMILES.
      Parameters:
      aFile - a SMILES file
      Returns:
      determined format of the given file
      Throws:
      IOException - if the file cannot be found or does not adhere to the format expectations
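Because only the first few lines are inspected, a file-based detection does not need to load the whole file. The following sketch shows one way to collect those lines. The helper `firstLines` and the class `FirstLinesSketch` are hypothetical, and UTF-8 encoding is an assumption; how the actual method reads the file is not stated here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class FirstLinesSketch {

    /** Reads at most maxLines lines from the start of the file (UTF-8 assumed). */
    static List<String> firstLines(Path file, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            // Stop as soon as enough lines are collected or the file ends.
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

The resulting list has the shape that the `List<String>` overload of `detectFormat` expects as input.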
    • readToSet

      public org.openscience.cdk.interfaces.IAtomContainerSet readToSet() throws IOException
      Reads the SMILES file according to the given format. Splits the lines at the given separator character, ignores the first line if the format defines that the file has a headline, and parses SMILES codes and IDs from the defined columns. Skipped lines (empty or containing erroneous SMILES codes) are counted, and the counter can be queried after import via getSkippedLinesCounter(). If a name/ID column is given in the file, it is read and saved as a property of the respective atom container under the name property key taken from the Importer class.
      Returns:
      atom container set parsed from the file
      Throws:
      IOException - if the given file cannot be found
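The reading loop described above can be sketched in a simplified, CDK-free form. Everything here is illustrative: `SmilesRowReaderSketch`, `Row`, and the parameter list are hypothetical, a plain `Row` object stands in for an atom container with a name property, and the SMILES check is again supplied as a predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class SmilesRowReaderSketch {

    /** Stand-in for an atom container with an optional name/ID property. */
    static final class Row {
        final String smiles;
        final String id; // null if the line has no ID column

        Row(String smiles, String id) {
            this.smiles = smiles;
            this.id = id;
        }
    }

    private int skippedLines;

    int getSkippedLinesCounter() {
        return skippedLines;
    }

    List<Row> read(List<String> lines, String separator, int smilesColumn, int idColumn,
                   boolean hasHeadline, Predicate<String> isSmiles) {
        skippedLines = 0; // reset at each new import
        List<Row> rows = new ArrayList<>();
        String separatorRegex = Pattern.quote(separator);
        // The headline, if present, is neither parsed nor counted as skipped.
        for (int i = hasHeadline ? 1 : 0; i < lines.size(); i++) {
            String[] columns = lines.get(i).trim().split(separatorRegex);
            String smiles = smilesColumn < columns.length ? columns[smilesColumn].trim() : "";
            if (smiles.isEmpty() || !isSmiles.test(smiles)) {
                skippedLines++;
                continue;
            }
            String id = (idColumn >= 0 && idColumn < columns.length) ? columns[idColumn].trim() : null;
            rows.add(new Row(smiles, id));
        }
        return rows;
    }
}
```

Resetting the counter at the start of `read` mirrors the documented behavior that the skipped-lines count always refers to the last import only.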
    • hasNext

      public boolean hasNext()
    • next

      public org.openscience.cdk.interfaces.IAtomContainer next()
    • setReader

      public void setReader(Reader reader) throws org.openscience.cdk.exception.CDKException
      Throws:
      org.openscience.cdk.exception.CDKException
    • setReader

      public void setReader(InputStream inputStream) throws org.openscience.cdk.exception.CDKException
      Throws:
      org.openscience.cdk.exception.CDKException
    • setSMILESFileFormat

      public void setSMILESFileFormat(DynamicSMILESFileFormat aFormat)
    • getFormat

      public org.openscience.cdk.io.formats.IResourceFormat getFormat()
    • close

      public void close() throws IOException
      Throws:
      IOException