pydiodon.loadfile

pydiodon.loadfile(filename, fmt=None, delimiter='\t', rownames=False, colnames=False, datasetname='values', dtype='float32')

A generic function for loading datasets as numpy arrays

Parameters

filename : string

name of the file containing the data set to be loaded (compulsory)

fmt : string

explicit format of the file (optional); if it is not given, the format is guessed from the suffix (see Notes)

delimiter : character

the delimiter between values in a row (optional, default value is a tab)

colnames : boolean or string

whether column names are to be read;
for an ascii file, a boolean: True if the column names are in the first row of the file, False otherwise;
for an hdf5 file, the name of the dataset holding the column names;
optional, default value is False.

rownames : boolean or string

whether row names are to be read;
for an ascii file, a boolean: True if the row names are in the first column of the file, False otherwise;
for an hdf5 file, the name of the dataset holding the row names;
optional, default value is False.

datasetname : string

for hdf5 files: name of the hdf5 dataset holding the values;
optional, default value is 'values'.

Returns

A : a numpy array; the values of the data set
rn : list of strings; row names (optional)
cn : list of strings; column names (optional)

Notes

Recognized formats are: ascii, hdf5 and compressed ascii.
Delimiters in ascii format can be blanks, commas, semicolons or tabs.
Ascii data sets with tab delimiters are expected to have the suffix .txt or .tsv.
Ascii data sets with other delimiters are expected to have the suffix .csv.

When the filename is read, the function splits the name on the last dot and interprets the string after it as the suffix. It then

calls load_ascii() if the suffix is .txt, .tsv, .gz or .bz2,
calls load_hdf5() if the suffix is .h5 or .hdf5,
unzips the file and then calls load_ascii() if the suffix is .zip.
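The suffix rule above can be sketched in a few lines. This is only an illustration written from the description in these notes, not pydiodon's own code: the helper name `guess_loader` and the two suffix sets are hypothetical, and a suffix that the list above does not mention (including .csv) is left unmapped and yields None.

```python
# Hypothetical helper written from the Notes; not part of pydiodon.
ASCII_SUFFIXES = {"txt", "tsv", "gz", "bz2"}
HDF5_SUFFIXES = {"h5", "hdf5"}


def guess_loader(filename):
    """Return the loader the Notes associate with the suffix, or None."""
    # Split on the last dot only, so "data.txt.zip" has the suffix "zip".
    _, dot, suffix = filename.rpartition(".")
    if not dot:
        return None  # no dot in the name, hence no suffix to interpret
    if suffix in ASCII_SUFFIXES:
        return "load_ascii"
    if suffix in HDF5_SUFFIXES:
        return "load_hdf5"
    if suffix == "zip":
        return "unzip, then load_ascii"
    return None  # suffix not listed in the Notes


for name in ("pca_template_withnames.txt",
             "pca_template_withnames.h5",
             "pca_template_nonames.txt.zip"):
    print(name, "->", guess_loader(name))
```

Because only the last dot counts, a name such as pca_template_nonames.txt.zip is treated as a zip file, and passing fmt explicitly is the way to bypass the guess.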

Examples

Here is a call for loading an ascii file with extension .txt, hence with tabs as delimiters, and with row names and column names. In such a case, the call must specify that there are colnames and rownames to be read:

>>> import pydiodon as dio
>>> filename = "pca_template_withnames.txt"
>>> A, colnames, rownames = dio.loadfile(filename, colnames=True, rownames=True)
>>> print(A)
>>> print(colnames)
>>> print(rownames)
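The layout expected of such a file (column names in the first row, row names in the first column) can be illustrated with the standard library alone. The contents below are made up and the parsing is done by hand rather than by pydiodon; the label "id" in the top-left cell is an assumption, since this page does not say what that cell should contain.

```python
import csv
import io

# Made-up tab-delimited contents: first row = column names,
# first column = row names. The "id" corner cell is an assumption.
text = "id\tx1\tx2\nr1\t1.0\t2.0\nr2\t3.0\t4.0\n"

rows = list(csv.reader(io.StringIO(text), delimiter="\t"))

colnames = rows[0][1:]                    # header row, minus the corner cell
rownames = [row[0] for row in rows[1:]]   # first cell of each data row
values = [[float(v) for v in row[1:]] for row in rows[1:]]

print(colnames)
print(rownames)
print(values)
```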

If colnames and rownames are not specified (default values), an array without column and row names is loaded, as in

>>> import pydiodon as dio
>>> filename = "pca_template_nonames.txt"
>>> A = dio.loadfile(filename)

Here is a call for a .csv file, where the delimiter has to be specified:

>>> import pydiodon as dio
>>> filename = "pca_template_nonames.csv"
>>> A = dio.loadfile(filename, delimiter=";")

Here is how to load data from a zipped .txt file:

>>> import pydiodon as dio
>>> filename = "pca_template_nonames.txt.zip"
>>> A = dio.loadfile(filename)
>>> print(A)

and from a zipped .csv file with semicolons as delimiters:

>>> import pydiodon as dio
>>> filename = "pca_template_nonames.csv.zip"
>>> A = dio.loadfile(filename, delimiter=";")
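To try the zipped case without the template files, a small archive can be built and read back with the standard library. This sketch does not call pydiodon; it only mimics the "unzip, then parse" sequence described in the Notes. The file name demo_nonames.csv.zip and its contents are made up, and the archive is assumed to hold a single member.

```python
import csv
import io
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_nonames.csv.zip")  # made-up file name

    # Write a two-row, semicolon-delimited table into the archive.
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("demo_nonames.csv", "1.0;2.0\n3.0;4.0\n")

    # Read it back: open the (single) member and parse it as text.
    with zipfile.ZipFile(path) as zf:
        member = zf.namelist()[0]
        with zf.open(member) as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
            values = [[float(v) for v in row]
                      for row in csv.reader(text, delimiter=";")]

print(values)
```

A file written this way should be a suitable input for the loadfile call above with delimiter=";", although that has not been checked against the library here.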

Here is an example for loading an hdf5 file with values, colnames and rownames as datasets:

>>> import pydiodon as dio
>>> filename = "pca_template_withnames.h5"
>>> A, colnames, rownames = dio.loadfile(filename, colnames='colnames', rownames='rownames')

version 21.03.23
