dnarecords.helper

DNARecords helper utilities.

Module Contents

Classes

DNARecordsUtils

Utility class to provide common functionalities used in other modules.

class dnarecords.helper.DNARecordsUtils

Utility class to provide common functionalities used in other modules.

static spark_session() → pyspark.sql.SparkSession

Gets the current Spark session, or builds a new one if none exists.

Ensures sparktfrecord libraries are available in the session.

Returns

a Spark session with the sparktfrecord libraries available.

Return type

SparkSession

static init_hail() → ModuleType

Initializes Hail, ensuring sparktfrecord libraries are available in the session.

Returns

the hail module (with Hail initialized).

Return type

ModuleType

static dnarecords_tree(dnarecords_path) → Dict[str, str]

DNARecords directory structure.

Gets a dictionary with the full structure of a DNARecords dataset given a root path.

swrec -> <dnarecords_path>/data/swrec (sample-wise DNA tfrecords)
vwrec -> <dnarecords_path>/data/vwrec (variant-wise DNA tfrecords)
swpar -> <dnarecords_path>/data/swpar (sample-wise DNA parquet files)
vwpar -> <dnarecords_path>/data/vwpar (variant-wise DNA parquet files)
skeys -> <dnarecords_path>/meta/skeys (sample-wise key mapping)
vkeys -> <dnarecords_path>/meta/vkeys (variant-wise key mapping)
swpfs -> <dnarecords_path>/meta/swpfs (sample-wise parquet files index)
vwpfs -> <dnarecords_path>/meta/vwpfs (variant-wise parquet files index)
swrfs -> <dnarecords_path>/meta/swrfs (sample-wise tfrecords index)
vwrfs -> <dnarecords_path>/meta/vwrfs (variant-wise tfrecords index)
swpsc -> <dnarecords_path>/meta/swpsc (sample-wise parquet schema)
vwpsc -> <dnarecords_path>/meta/vwpsc (variant-wise parquet schema)
swrsc -> <dnarecords_path>/meta/swrsc (sample-wise tfrecord schema)
vwrsc -> <dnarecords_path>/meta/vwrsc (variant-wise tfrecord schema)

Returns

a dictionary with the structure of the DNARecords dataset.

Return type

Dict[str,str]
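
The mapping documented for dnarecords_tree can be illustrated with a small standalone sketch. This is not the library's implementation: sketch_dnarecords_tree and the two key tuples are hypothetical names introduced here, and stripping a trailing slash from the root path is an assumption about how the root might be normalized. It only reproduces the key-to-path layout listed above.

```python
from typing import Dict

# Hypothetical helper for illustration only; it mirrors the documented layout
# of dnarecords_tree but is NOT the code shipped in dnarecords.helper.
_DATA_KEYS = ("swrec", "vwrec", "swpar", "vwpar")
_META_KEYS = (
    "skeys", "vkeys",
    "swpfs", "vwpfs",
    "swrfs", "vwrfs",
    "swpsc", "vwpsc",
    "swrsc", "vwrsc",
)


def sketch_dnarecords_tree(dnarecords_path: str) -> Dict[str, str]:
    """Build the key -> path mapping described in the documentation."""
    # Assumption: a trailing slash on the root is dropped to avoid "//".
    root = dnarecords_path.rstrip("/")
    # Record and parquet payloads live under <root>/data.
    tree = {key: f"{root}/data/{key}" for key in _DATA_KEYS}
    # Key mappings, file indexes and schemas live under <root>/meta.
    tree.update({key: f"{root}/meta/{key}" for key in _META_KEYS})
    return tree


tree = sketch_dnarecords_tree("/tmp/dnarecords")
for key, path in tree.items():
    print(f"{key} -> {path}")
```

With the real library, the equivalent call would presumably be DNARecordsUtils.dnarecords_tree(dnarecords_path), whose returned dictionary can be indexed by the same keys, for example tree['swrec'] for the sample-wise tfrecords directory.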