Class DynamicSMILESFileReader
java.lang.Object
org.openscience.cdk.io.ChemObjectIO
org.openscience.cdk.io.iterator.DefaultIteratingChemObjectReader<org.openscience.cdk.interfaces.IAtomContainer>
de.unijena.cheminf.deglycosylation.tools.DynamicSMILESFileReader
- All Implemented Interfaces:
Closeable, AutoCloseable, Iterator<org.openscience.cdk.interfaces.IAtomContainer>, org.openscience.cdk.io.IChemObjectIO, org.openscience.cdk.io.IChemObjectReader, org.openscience.cdk.io.iterator.IIteratingChemObjectReader<org.openscience.cdk.interfaces.IAtomContainer>
public class DynamicSMILESFileReader
extends org.openscience.cdk.io.iterator.DefaultIteratingChemObjectReader<org.openscience.cdk.interfaces.IAtomContainer>
File reader for different kinds of files with a SMILES code column. The reader detects the structure of the file from its first few lines, assuming that the SMILES and ID/name columns are the first two columns, in either order. Unsuitable for reaction SMILES or CxSMILES.
- Version:
- 1.0.0.0
- Author:
- Jonas Schaub, Samuel Behr
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.openscience.cdk.io.IChemObjectReader
org.openscience.cdk.io.IChemObjectReader.Mode
-
Field Summary
Fields
Modifier and Type, Field, Description
static final int MAXIMUM_LINE_NUMBER_TO_CHECK_IN_SMILES_FILES
Maximum number of lines starting from the first one to check for valid SMILES strings in a SMILES file when trying to determine the SMILES code column and separator.
PARSABLE_SMILES_EXCEPTIONS
Strings that can be parsed by CDK SmilesParser as SMILES codes but should be ignored when detecting the file structure, e.g. "ID".
POSSIBLE_SMILES_FILE_SEPARATORS
Possible SMILES file separators used to separate SMILES code from ID.
Fields inherited from class org.openscience.cdk.io.iterator.DefaultIteratingChemObjectReader
errorHandler, mode
-
Constructor Summary
Constructors
Constructor, Description
DynamicSMILESFileReader(File aFile, DynamicSMILESFileFormat aFormat)
Constructs a new DynamicSMILESFileReader that can read molecules from a given file.
DynamicSMILESFileReader(InputStream in, DynamicSMILESFileFormat aFormat)
Constructs a new DynamicSMILESFileReader that can read molecules from a given InputStream.
DynamicSMILESFileReader(Reader in, DynamicSMILESFileFormat aFormat)
Constructs a new DynamicSMILESFileReader that can read molecules from a given Reader.
-
Method Summary
Modifier and Type, Method, Description
void close()
static DynamicSMILESFileFormat detectFormat(File aFile)
Checks the first few lines of the SMILES file for parsable SMILES codes and saves the determined separator character and the SMILES code and ID column positions.
static DynamicSMILESFileFormat detectFormat(List<String> lines)
Checks the first few lines of a SMILES file for parsable SMILES codes and saves the determined separator character and the SMILES code and ID column positions.
org.openscience.cdk.io.formats.IResourceFormat getFormat()
int getSkippedLinesCounter()
Returns the number of lines that were skipped (empty lines, erroneous SMILES codes, etc.) in the last file import of this instance (reset at each new import); the headline is not counted.
boolean hasNext()
org.openscience.cdk.interfaces.IAtomContainer next()
org.openscience.cdk.interfaces.IAtomContainerSet readToSet
Reads SMILES file according to the given format.
void setReader(InputStream inputStream)
void setReader
void setSMILESFileFormat
Methods inherited from class org.openscience.cdk.io.iterator.DefaultIteratingChemObjectReader
accepts, handleError, handleError, handleError, handleError, remove, setErrorHandler, setReaderMode
Methods inherited from class org.openscience.cdk.io.ChemObjectIO
addChemObjectIOListener, addSetting, addSettings, fireIOSettingQuestion, getIOSettings, getListeners, getSetting, getSetting, getSettings, hasSetting, removeChemObjectIOListener
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.openscience.cdk.io.IChemObjectIO
addChemObjectIOListener, addSetting, addSettings, getIOSettings, getListeners, getSetting, getSetting, getSettings, hasSetting, removeChemObjectIOListener
Methods inherited from interface java.util.Iterator
forEachRemaining
-
Field Details
-
POSSIBLE_SMILES_FILE_SEPARATORS
Possible SMILES file separators used to separate SMILES code from ID. Ordered so that non-whitespace characters are tested first. -
MAXIMUM_LINE_NUMBER_TO_CHECK_IN_SMILES_FILES
public static final int MAXIMUM_LINE_NUMBER_TO_CHECK_IN_SMILES_FILES
Maximum number of lines starting from the first one to check for valid SMILES strings in a SMILES file when trying to determine the SMILES code column and separator.
-
PARSABLE_SMILES_EXCEPTIONS
Strings that can be parsed by CDK SmilesParser as SMILES codes but should be ignored when detecting the file structure, e.g. "ID" is a likely column name but could be parsed into Iodine-Deuterium as a SMILES code.
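The effect of such an exception list can be illustrated with a minimal, stdlib-only sketch. It is not the library's implementation: the class name `HeaderTokenSketch`, the method `countsAsSmiles`, and the regular expression standing in for the CDK SmilesParser are all assumptions made for illustration, and only the entry "ID" is taken from the description above.

```java
import java.util.Set;
import java.util.regex.Pattern;

/** Illustrative sketch only; names and the regex stand-in are hypothetical. */
class HeaderTokenSketch {
    // Only "ID" is named in the field description; the real set may hold more entries.
    static final Set<String> PARSABLE_EXCEPTIONS = Set.of("ID");

    // Crude stand-in for a real SMILES parser (assumption): organic-subset atoms,
    // aromatic atoms, bracket atoms, bonds, branches, and ring closure digits.
    // 'D' is accepted on purpose to mimic the "ID" case described above.
    static final Pattern SMILES_LIKE = Pattern.compile(
            "(?:Cl|Br|[BCNOPSFID]|[bcnops]|\\[[^\\]\\s]+\\]|[-=#$:/\\\\().%0-9])+");

    /** A token counts as a SMILES code only if it parses and is not a known column name. */
    static boolean countsAsSmiles(String token) {
        return SMILES_LIKE.matcher(token).matches()
                && !PARSABLE_EXCEPTIONS.contains(token);
    }
}
```

With this stand-in, the token "ID" passes the syntactic check but is still rejected by `countsAsSmiles`, so a headline such as "SMILES;ID" would not be mistaken for a data line.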
-
-
Constructor Details
-
DynamicSMILESFileReader
public DynamicSMILESFileReader(InputStream in, DynamicSMILESFileFormat aFormat) throws org.openscience.cdk.exception.CDKException
Constructs a new DynamicSMILESFileReader that can read molecules from a given InputStream.
- Parameters:
in - the InputStream to read from
- Throws:
org.openscience.cdk.exception.CDKException
-
DynamicSMILESFileReader
public DynamicSMILESFileReader(File aFile, DynamicSMILESFileFormat aFormat) throws org.openscience.cdk.exception.CDKException, FileNotFoundException
Constructs a new DynamicSMILESFileReader that can read molecules from a given file.
- Throws:
org.openscience.cdk.exception.CDKException
FileNotFoundException
-
DynamicSMILESFileReader
public DynamicSMILESFileReader(Reader in, DynamicSMILESFileFormat aFormat) throws org.openscience.cdk.exception.CDKException
Constructs a new DynamicSMILESFileReader that can read molecules from a given Reader.
- Parameters:
in - the Reader to read from
- Throws:
org.openscience.cdk.exception.CDKException
-
-
Method Details
-
getSkippedLinesCounter
public int getSkippedLinesCounter()
Returns the number of lines that were skipped (empty lines, erroneous SMILES codes, etc.) in the last file import of this instance. The counter is reset at each new import; the headline is not counted.
- Returns:
number of lines skipped in the last import
-
detectFormat
Checks the first few lines of a SMILES file for parsable SMILES codes and saves the determined separator character and the SMILES code and ID column positions. Expects one parsable SMILES code per line and an optional second element, which is interpreted as the molecule's ID or name and is separated from the SMILES code by one of the separator tokens tab, semicolon, comma, or space. Unsuitable for reaction SMILES or CxSMILES.
- Parameters:
lines - first few lines of a SMILES file
- Returns:
determined format of the given lines
- Throws:
IOException - if the lines do not adhere to the format expectations
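The detection described above can be sketched roughly as follows, using only the Java standard library. This is a simplified illustration, not the library's code: the class `SmilesFormatDetectionSketch`, the holder `Detected`, the concrete separator order, the value 10 for the number of checked lines, and the regular expression that stands in for the CDK SmilesParser are all assumptions. The sketch also throws an unchecked exception where the documented method throws IOException.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Illustrative sketch only; all names and constants here are hypothetical. */
class SmilesFormatDetectionSketch {
    // Assumed order: non-whitespace separators first, as the field description states.
    static final char[] SEPARATORS = {';', ',', '\t', ' '};
    static final int MAX_LINES_TO_CHECK = 10; // hypothetical value
    static final Set<String> PARSABLE_EXCEPTIONS = Set.of("ID");

    // Crude stand-in for a real SMILES parser (assumption).
    static final Pattern SMILES_LIKE = Pattern.compile(
            "(?:Cl|Br|[BCNOPSFID]|[bcnops]|\\[[^\\]\\s]+\\]|[-=#$:/\\\\().%0-9])+");

    /** Hypothetical result holder; the documented return type is DynamicSMILESFileFormat. */
    static final class Detected {
        final char separator;
        final int smilesColumn;
        final int idColumn;        // -1 if the line has no second column
        final boolean hasHeadline; // true if an unparsable line preceded the first match

        Detected(char separator, int smilesColumn, int idColumn, boolean hasHeadline) {
            this.separator = separator;
            this.smilesColumn = smilesColumn;
            this.idColumn = idColumn;
            this.hasHeadline = hasHeadline;
        }
    }

    static boolean countsAsSmiles(String token) {
        return SMILES_LIKE.matcher(token).matches()
                && !PARSABLE_EXCEPTIONS.contains(token);
    }

    static Detected detect(List<String> lines) {
        boolean sawUnparsableLine = false;
        int limit = Math.min(lines.size(), MAX_LINES_TO_CHECK);
        for (int i = 0; i < limit; i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // empty lines carry no format information
            }
            for (char sep : SEPARATORS) {
                String[] cols = line.split(Pattern.quote(String.valueOf(sep)), 3);
                // Only the first two columns are candidates for the SMILES code.
                for (int c = 0; c < Math.min(cols.length, 2); c++) {
                    if (countsAsSmiles(cols[c].trim())) {
                        int idColumn = cols.length > 1 ? 1 - c : -1;
                        return new Detected(sep, c, idColumn, sawUnparsableLine);
                    }
                }
            }
            sawUnparsableLine = true; // no separator produced a parsable SMILES column
        }
        // The documented method reports this case with an IOException.
        throw new IllegalArgumentException("no parsable SMILES code found in the checked lines");
    }
}
```

Calling `SmilesFormatDetectionSketch.detect(List.of("SMILES;ID", "CCO;mol1"))` in this sketch skips the first line, then settles on the semicolon with the SMILES code in the first column. A real parser is considerably stricter than the regular expression used here, so IDs that happen to consist of element symbols and digits would need the full parser to be told apart from SMILES codes.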
-
detectFormat
Checks the first few lines of the SMILES file for parsable SMILES codes and saves the determined separator character and the SMILES code and ID column positions. Expects one parsable SMILES code per line and an optional second element, which is interpreted as the molecule's ID or name and is separated from the SMILES code by one of the separator tokens tab, semicolon, comma, or space. Unsuitable for reaction SMILES or CxSMILES.
- Parameters:
aFile - a SMILES file
- Returns:
determined format of the given file
- Throws:
IOException - if the file cannot be found or does not adhere to the format expectations
-
readToSet
Reads a SMILES file according to the given format. Splits each line at the given separator character, ignores the first line if the format defines that the file has a headline, and parses SMILES codes and IDs from the defined columns. Skipped lines (empty lines or lines containing erroneous SMILES codes) are counted, and the counter can be queried after the import via getSkippedLinesCounter(). If a name/ID column is given in the file, it is read and saved as a property of the respective atom container under the name property key taken from the Importer class.
- Returns:
atom container set parsed from the file
- Throws:
IOException - if the given file cannot be found
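The line handling described above (headline skipping, column splitting, and the skipped-lines counter) can be sketched with the standard library alone. This is an assumption-laden illustration rather than the actual method: the class `SmilesLineReaderSketch`, its `read` method and parameters, and the regular expression used in place of the CDK SmilesParser are hypothetical, and plain string pairs stand in for atom containers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative sketch only; names, parameters, and the regex stand-in are hypothetical. */
class SmilesLineReaderSketch {
    // Crude stand-in for a real SMILES parser (assumption).
    private static final Pattern SMILES_LIKE = Pattern.compile(
            "(?:Cl|Br|[BCNOPSFI]|[bcnops]|\\[[^\\]\\s]+\\]|[-=#$:/\\\\().%0-9])+");

    private int skippedLinesCounter;

    /** Returns {smiles, id} pairs; id is null when the line has no ID column. */
    List<String[]> read(List<String> lines, char separator, int smilesColumn, boolean hasHeadline) {
        skippedLinesCounter = 0; // reset at each new import
        List<String[]> result = new ArrayList<>();
        String separatorRegex = Pattern.quote(String.valueOf(separator));
        // Starting at index 1 means the headline is neither parsed nor counted as skipped.
        for (int i = hasHeadline ? 1 : 0; i < lines.size(); i++) {
            String[] cols = lines.get(i).trim().split(separatorRegex, 3);
            if (cols.length <= smilesColumn) {
                skippedLinesCounter++; // line has no SMILES column at all
                continue;
            }
            String smiles = cols[smilesColumn].trim();
            if (!SMILES_LIKE.matcher(smiles).matches()) {
                skippedLinesCounter++; // empty or erroneous SMILES code
                continue;
            }
            int idColumn = 1 - smilesColumn; // the ID sits in the other of the first two columns
            String id = cols.length > idColumn ? cols[idColumn].trim() : null;
            result.add(new String[] {smiles, id});
        }
        return result;
    }

    int getSkippedLinesCounter() {
        return skippedLinesCounter;
    }
}
```

Resetting the counter at the start of `read` mirrors the documented behaviour that the value always refers to the last import of the instance.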
-
hasNext
public boolean hasNext() -
next
public org.openscience.cdk.interfaces.IAtomContainer next() -
setReader
- Throws:
org.openscience.cdk.exception.CDKException
-
setReader
- Throws:
org.openscience.cdk.exception.CDKException
-
setSMILESFileFormat
-
getFormat
public org.openscience.cdk.io.formats.IResourceFormat getFormat() -
close
- Throws:
IOException
-