pydiodon.loadfile
- pydiodon.loadfile(filename, fmt=None, delimiter='\t', rownames=False, colnames=False, datasetname='values', dtype='float32')
A generic function for loading datasets as numpy arrays
- Parameters:
- filename : string
name of the file containing the data set to be loaded (compulsory)
- fmt : string
explicit format of the file (optional); if it is not given, the format is guessed from the suffix (see notes)
- delimiter : character
the delimiter between values in a row
- colnames : boolean or string
whether column names are present, and where to find them;
for an ascii file, a boolean: True if the column names are in the first row of the file, False otherwise;
for an hdf5 file, the name of the dataset holding the column names;
optional, default value is False.
- rownames : boolean or string
whether row names are present, and where to find them;
for an ascii file, a boolean: True if the row names are in the first column of the file, False otherwise;
for an hdf5 file, the name of the dataset holding the row names;
optional, default value is False.
- datasetname : string
for hdf5 files: name of the hdf5 dataset holding the values;
optional, default value is 'values'.
- Returns:
- A : a numpy array
the values of the data set
- rn : list of strings
row names (optional)
- cn : list of strings
column names (optional)
Notes
Recognized formats are: ascii, hdf5 and compressed ascii.
Delimiters in ascii format can be blanks, commas, semicolons or tabs.
Ascii data sets with tab delimiters are expected to have the suffix .txt or .tsv.
Ascii data sets with other delimiters are expected to have the suffix .csv.
When the filename is read, the function splits the name on the last dot and interprets the string after it as the suffix. It then calls load_ascii() if the suffix is .txt, .tsv, .gz or .bz2, calls load_hdf5() if the suffix is .h5 or .hdf5, and unzips the file before calling load_ascii() if the suffix is .zip.
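The suffix rule described in these notes can be sketched in a few lines of plain Python. This is only an illustration of the rule as stated, not pydiodon's actual code: `guess_loader` is a hypothetical helper, and returning `None` for suffixes the notes do not list (such as .csv) is an assumption made here.

```python
def guess_loader(filename):
    """Return the loader suggested by the suffix of `filename`.

    Hypothetical helper, not part of pydiodon: it only mirrors the
    suffix rule described in the notes.
    """
    # Split on the last dot; whatever follows it is taken as the suffix.
    _, dot, suffix = filename.rpartition(".")
    if not dot:
        return None  # no dot at all, so no suffix to interpret
    if suffix in ("txt", "tsv", "gz", "bz2"):
        return "load_ascii"
    if suffix in ("h5", "hdf5"):
        return "load_hdf5"
    if suffix == "zip":
        return "unzip, then load_ascii"
    # Assumption: suffixes not listed in the notes are left undecided.
    return None


for name in ("pca_template_nonames.txt",
             "pca_template_withnames.h5",
             "pca_template_nonames.txt.zip"):
    print(name, "->", guess_loader(name))
```

Because the split is on the last dot only, a name such as "pca_template_nonames.txt.zip" is seen as a zip file; the inner ".txt" plays no part in the decision.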
Examples
Here is a call for loading an ascii file with extension .txt, hence with tabs as delimiters, and with rownames and colnames. In such a case, the call must specify that there are colnames and rownames to be read:
>>> import pydiodon as dio
>>> filename = "pca_template_withnames.txt"
>>> A, colnames, rownames = dio.loadfile(filename, colnames=True, rownames=True)
>>> print(A)
>>> print(colnames)
>>> print(rownames)
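To make the expected file layout concrete, here is a minimal sketch using only the standard library. It parses a tab-delimited text whose first row holds column names and whose first column holds row names. The sample content, the empty top-left cell, and the helper `parse_named_table` are all assumptions for illustration; they are not taken from pydiodon or from its template files.

```python
import csv
import io

# Hypothetical content of a small tab-delimited file with names.
# Assumption: the top-left corner cell is empty.
text = "\tx1\tx2\tx3\nr1\t1.0\t2.0\t3.0\nr2\t4.0\t5.0\t6.0\n"


def parse_named_table(text, delimiter="\t"):
    """Split a delimited text into values, row names and column names."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    colnames = header[1:]                 # skip the corner cell
    rownames = [row[0] for row in body]   # first field of each data row
    values = [[float(v) for v in row[1:]] for row in body]
    return values, rownames, colnames


values, rownames, colnames = parse_named_table(text)
print(values)     # nested list of floats, one inner list per row
print(rownames)
print(colnames)
```

The sketch returns plain Python lists rather than a numpy array, to stay free of dependencies.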
If they are not specified (default values), an array without colnames and rownames will be loaded, as in
>>> import pydiodon as dio
>>> filename = "pca_template_nonames.txt"
>>> A = dio.loadfile(filename)
Here is a call for loading a .csv file, where the delimiter has to be specified:
>>> import pydiodon as dio
>>> filename = "pca_template_nonames.csv"
>>> A = dio.loadfile(filename, delimiter=";")
Here is how to load a zip file made from a .txt file:
>>> import pydiodon as dio
>>> filename = "pca_template_nonames.txt.zip"
>>> A = dio.loadfile(filename)
>>> print(A)
and one made from a .csv file with semicolons as delimiters:
>>> import pydiodon as dio
>>> filename = "pca_template_nonames.csv.zip"
>>> A = dio.loadfile(filename, delimiter=";")
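The "unzip, then read as ascii" step can be illustrated with the standard library alone. The sketch below builds a tiny zip archive in memory and reads its semicolon-delimited content back. It is not pydiodon's implementation: `read_zipped_table`, the member name "demo.csv", the single-member archive and the UTF-8 encoding are all assumptions.

```python
import csv
import io
import zipfile

# Build a small zip archive in memory holding one semicolon-delimited file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo.csv", "1.0;2.0\n3.0;4.0\n")


def read_zipped_table(fileobj, delimiter=";"):
    """Unzip the first member of an archive and parse it as delimited text."""
    with zipfile.ZipFile(fileobj) as archive:
        member = archive.namelist()[0]   # assumption: a single member
        text = archive.read(member).decode("utf-8")
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip empty rows and convert every field to float.
    return [[float(v) for v in row] for row in reader if row]


buffer.seek(0)
table = read_zipped_table(buffer)
print(table)
```

As in the calls above, the delimiter still has to be passed explicitly, since the archive suffix says nothing about how the values inside are separated.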
Here is an example for loading an hdf5 file with values, colnames and rownames as datasets:
>>> import pydiodon as dio
>>> filename = "pca_template_withnames.h5"
>>> A, colnames, rownames = dio.loadfile(filename, colnames='colnames', rownames='rownames')
version 21.03.23