pydiodon.loadfile

pydiodon.loadfile(filename, fmt=None, delimiter='\t', rownames=False, colnames=False, datasetname='values', dtype='float32')

A generic function for loading datasets as numpy arrays

Parameters
filename : string

name of the file containing the data set to be loaded (compulsory)

fmt : string

explicit format of the file (optional); if it is not given the format will be guessed from the suffix (see notes)

delimiter : character

the delimiter between values in a row (optional, default value is the tab character)

colnames : boolean or string; whether column names are present
  • for an ascii file, it is a boolean: True if the column names are in the first row of the file, and False otherwise

  • for an hdf5 file, it gives the name of the dataset holding the column names

  • optional, default value is False

rownames : boolean or string; whether row names are present
  • for an ascii file, it is a boolean: True if the row names are in the first column of the file, and False otherwise

  • for an hdf5 file, it gives the name of the dataset holding the row names

  • optional, default value is False

datasetname : string
  • for hdf5 files: name of the hdf5 dataset holding the values

  • optional, default value is 'values'

Returns
A : a numpy array

the values of the data set

rn : list of strings

row names (optional)

cn : list of strings

column names (optional)

Notes

  • Recognized formats are: ascii, hdf5 and compressed ascii.

  • Delimiters in ascii format can be blanks, commas, semicolons or tabs.

  • Ascii data sets with tab delimiters are expected to have the suffix .txt or .tsv.

  • Ascii data sets with other delimiters are expected to have the suffix .csv.

When the filename is read, the function splits the name on the last dot and interprets the string after it as the suffix. It then

  • calls load_ascii() if the suffix is .txt, .tsv, .gz or .bz2,

  • calls load_hdf5() if the suffix is .h5 or .hdf5,

  • unzips the file and then calls load_ascii() if the suffix is .zip.
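
The suffix rules listed in these notes can be sketched as follows. This is only an illustration of the rules as stated here, not pydiodon's actual code: the function name guess_loader and the labels it returns are hypothetical, and suffixes not mentioned in the list are simply left unresolved.

```python
def guess_loader(filename):
    """Return a label for the loader suggested by the file suffix.

    Hypothetical helper for illustration only; it mirrors the rules
    listed in the notes, not pydiodon's implementation.
    """
    # Split on the last dot; the part after it is taken as the suffix.
    stem, dot, suffix = filename.rpartition(".")
    if not dot:
        return None  # no dot in the name, nothing to guess from
    if suffix in ("txt", "tsv", "gz", "bz2"):
        return "load_ascii"
    if suffix in ("h5", "hdf5"):
        return "load_hdf5"
    if suffix == "zip":
        return "unzip, then load_ascii"
    return None  # suffix not covered by the rules listed above


for name in ("data.txt", "data.tsv.gz", "data.h5", "data.txt.zip"):
    print(name, "->", guess_loader(name))
```

Because only the text after the last dot is inspected, a name such as data.txt.zip is treated as a zip file, and data.tsv.gz as a compressed ascii file.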

Examples

Here is a call for loading an ascii file with extension .txt, hence with tabs as delimiters, and with row names and column names. In such a case, the call must specify that there are colnames and rownames to be read:

>>> import pydiodon as dio
>>> filename = "pca_template_withnames.txt"
>>> A, colnames, rownames = dio.loadfile(filename, colnames=True, rownames=True)
>>> print(A)
>>> print(colnames)
>>> print(rownames)
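
To make the layout of such a file concrete, here is a small sketch that parses a tab-delimited table whose first row holds the column names and whose first column holds the row names. It uses only the Python standard library and returns nested lists rather than a numpy array. The helper parse_with_names, the sample content and the empty top-left cell are assumptions made for illustration; they are not taken from pydiodon or from its template files.

```python
import csv
import io

# Hypothetical content of a small tab-delimited file with names.
# The empty top-left cell is an assumption about the layout.
text = "\tc1\tc2\tc3\nr1\t1.0\t2.0\t3.0\nr2\t4.0\t5.0\t6.0\n"


def parse_with_names(handle, delimiter="\t"):
    """Split a delimited table into values, row names and column names."""
    rows = list(csv.reader(handle, delimiter=delimiter))
    colnames = rows[0][1:]                    # first row, minus the corner cell
    rownames = [row[0] for row in rows[1:]]   # first column, minus the header row
    values = [[float(x) for x in row[1:]] for row in rows[1:]]
    return values, rownames, colnames


values, rownames, colnames = parse_with_names(io.StringIO(text))
print(values)
print(rownames)
print(colnames)
```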

If they are not specified (default values), an array without colnames and rownames is loaded, as in

>>> import pydiodon as dio
>>> filename = "pca_template_nonames.txt"
>>> A = dio.loadfile(filename)

Here is a call for loading a .csv file, where the delimiter has to be specified:

>>> import pydiodon as dio
>>> filename = "pca_template_nonames.csv"
>>> A = dio.loadfile(filename, delimiter = ";")

Here is how to load a zipped .txt file:

>>> import pydiodon as dio
>>> filename = "pca_template_nonames.txt.zip"
>>> A = dio.loadfile(filename)
>>> print(A)

and a zipped .csv file with semicolons as delimiters:

>>> import pydiodon as dio
>>> filename = "pca_template_nonames.csv.zip"
>>> A = dio.loadfile(filename, delimiter=";")
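
The "unzip, then read as ascii" step for a zipped, semicolon-delimited file can be sketched with the standard library alone. This is an illustration under stated assumptions, not pydiodon's implementation: the helper read_zipped_table, the member name example.csv and the values are made up, the archive is built in memory, and only the first member of the archive is read.

```python
import csv
import io
import zipfile

# Build a small zip archive in memory holding one semicolon-delimited file.
# The member name and the values are made up for this illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.csv", "1.0;2.0\n3.0;4.0\n")


def read_zipped_table(fileobj, delimiter=";"):
    """Unzip the first member of an archive and parse it as numbers."""
    with zipfile.ZipFile(fileobj) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            # ZipFile.open yields bytes, so wrap it to get text lines.
            lines = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return [[float(x) for x in row]
                    for row in csv.reader(lines, delimiter=delimiter)]


buffer.seek(0)
table = read_zipped_table(buffer)
print(table)
```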

Here is an example for loading an hdf5 file with values, colnames and rownames as datasets:

>>> import pydiodon as dio
>>> filename = "pca_template_withnames.h5"
>>> A, colnames, rownames = dio.loadfile(filename, colnames='colnames', rownames='rownames')

version 21.03.23