pydiodon.loadfile
- pydiodon.loadfile(filename, fmt=None, delimiter='\t', rownames=False, colnames=False, datasetname='values', dtype='float32')
A generic function for loading datasets as numpy arrays
- Parameters
- filename : string
name of the file containing the data set to be loaded (compulsory)
- fmt : string
explicit format of the file (optional); if it is not given, the format is guessed from the suffix (see Notes)
- delimiter : character
the delimiter between values in a row; optional, default value is a tabulation ('\t')
- colnames : boolean or string
whether column names are present;
for an ascii file, a boolean: True if column names are in the first row of the file, and False otherwise;
for an hdf5 file, the name of the dataset holding the column names;
optional, default value is False
- rownames : boolean or string
whether row names are present;
for an ascii file, a boolean: True if row names are in the first column of the file, and False otherwise;
for an hdf5 file, the name of the dataset holding the row names;
optional, default value is False
- datasetname : string
for hdf5 files: name of the hdf5 dataset holding the values;
optional, default value is 'values'
- Returns
- A : a numpy array
the values of the data set
- rn : list of strings
row names (optional)
- cn : list of strings
column names (optional)
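To make the rownames/colnames convention for ascii files concrete, here is a minimal sketch of how a tab-delimited text with names in its first row and first column could be split into values, row names and column names. It does not call pydiodon and is not its implementation: the helper `split_named_table` and the sample text are hypothetical, and the sketch assumes that the top-left cell holds a placeholder label, which this page does not specify.

```python
import numpy as np

def split_named_table(text, delimiter="\t", dtype="float32"):
    """Hypothetical helper: split a delimited table into (values, rownames, colnames)."""
    # One list of cells per non-empty line
    rows = [line.split(delimiter) for line in text.splitlines() if line]
    colnames = rows[0][1:]                 # first row, without the corner cell (assumed layout)
    rownames = [row[0] for row in rows[1:]]  # first column, without the header row
    values = np.array(
        [[float(cell) for cell in row[1:]] for row in rows[1:]],
        dtype=dtype,
    )
    return values, rownames, colnames

# Illustrative content only: two rows (a, b) and two columns (x, y)
sample = "id\tx\ty\na\t1.0\t2.0\nb\t3.0\t4.0\n"
A, rn, cn = split_named_table(sample)
print(A.shape, rn, cn)
```

With both flags left at False, the whole file would instead be read as numeric values, so a file that carries names needs the matching flag set.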
Notes
Recognized formats are: ascii, hdf5 and compressed ascii.
Delimiters in ascii format can be blanks, commas, semicolons or tabulations.
Ascii data sets with tab delimiters are expected to have the suffix .txt or .tsv.
Ascii data sets with other delimiters are expected to have the suffix .csv.
When the filename is read, the function splits the name on the last dot and interprets the string after it as the suffix. It then calls load_ascii() if the suffix is .txt, .tsv, .gz or .bz2, calls load_hdf5() if the suffix is .h5 or .hdf5, and unzips the file before calling load_ascii() if the suffix is .zip.
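The suffix rule described in this note can be sketched as follows. This is an illustration under stated assumptions, not the code of pydiodon: `guess_loader` is a hypothetical helper that only returns the name of the loader the note associates with each suffix.

```python
def guess_loader(filename):
    """Hypothetical helper: name the loader suggested by the file suffix."""
    suffix = filename.rsplit(".", 1)[-1]   # text after the last dot
    if suffix in ("txt", "tsv", "gz", "bz2"):
        return "load_ascii"
    if suffix in ("h5", "hdf5"):
        return "load_hdf5"
    if suffix == "zip":
        return "unzip, then load_ascii"
    # Suffix not listed in the note (the note does not say how .csv is routed)
    return None

for name in ("pca_template_withnames.txt",
             "pca_template_withnames.h5",
             "pca_template_nonames.txt.zip"):
    print(name, "->", guess_loader(name))
```

Because only the text after the last dot is inspected, a name such as pca_template_nonames.txt.zip is treated as a zip archive, not as a .txt file.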
Examples
Here is a call for loading an ascii file with extension .txt, hence with tabs as delimiters, and with rownames and colnames. In such a case, the call must specify that there are colnames and rownames to be read:
>>> import pydiodon as dio
>>> filename = "pca_template_withnames.txt"
>>> A, colnames, rownames = dio.loadfile(filename, colnames=True, rownames=True)
>>> print(A)
>>> print(colnames)
>>> print(rownames)
If they are not specified (default values), an array without colnames and rownames is loaded, as in
>>> import pydiodon as dio
>>> filename = "pca_template_nonames.txt"
>>> A = dio.loadfile(filename)
Here is a call for loading a .csv file, where the delimiter has to be specified:
>>> import pydiodon as dio
>>> filename = "pca_template_nonames.csv"
>>> A = dio.loadfile(filename, delimiter=";")
Here is how to load a zip archive of a .txt file:
>>> import pydiodon as dio
>>> filename = "pca_template_nonames.txt.zip"
>>> A = dio.loadfile(filename)
>>> print(A)
Here is the same for a zipped .csv file with semicolons as delimiters:
>>> import pydiodon as dio
>>> filename = "pca_template_nonames.csv.zip"
>>> A = dio.loadfile(filename, delimiter=";")
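The zip examples assume that an archive already exists. As a hedged sketch, such an archive could be produced with the Python standard library as below; the file name demo_nonames.txt and its content are made up for illustration, and the layout of one text file per archive is an assumption, since this page does not say how multi-file archives are handled.

```python
import os
import tempfile
import zipfile

# Work in a temporary directory so nothing is left in the current folder
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "demo_nonames.txt")   # hypothetical file name
zip_path = txt_path + ".zip"                            # last suffix becomes "zip"

# A small tab-delimited table without row or column names
with open(txt_path, "w", newline="\n") as handle:
    handle.write("1.0\t2.0\n3.0\t4.0\n")

# Store the text file as the single member of the archive (assumed layout)
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.write(txt_path, arcname="demo_nonames.txt")

# Read it back to check what the archive contains
with zipfile.ZipFile(zip_path) as archive:
    members = archive.namelist()
    content = archive.read("demo_nonames.txt").decode("ascii")

print(members)
print(content)
```

The resulting path could then be passed as filename in a call like the ones shown in the zip examples.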
Here is an example for loading an hdf5 file with values, colnames and rownames as datasets:
>>> import pydiodon as dio
>>> filename = "pca_template_withnames.h5"
>>> A, colnames, rownames = dio.loadfile(filename, colnames='colnames', rownames='rownames')
version 21.03.23