dnarecords.macros
DNARecords available macros.
Module Contents
Classes
Pre-created methods to increase sparsity of your datasets |
- class dnarecords.macros.DosageSparsityMacro[source]
Pre-created methods to increase sparsity of your datasets
- static supercharge_dosage_sparsity(mt: MatrixTable, info_score_threshold: float = 0.8, p_value_hwe_threshold: float = 1e-10, af_threshold: float = 0.001, sparse_threshold: float = 0.1) MatrixTable [source]
Supercharges dosage sparsity based on info_score and variant_qc.
Assumes the input MatrixTable to have info_score and variant_qc row fields, and dosage entry field.
Note
It flips the dosage (2 - dosage) for those variants where alt allele is more frequent and includes a flag indicating whether the flip was done or not.
Example
import dnarecords as dr hl = dr.helper.DNARecordsUtils.init_hail() hl.utils.get_1kg('/tmp/1kg') mt = hl.read_matrix_table('/tmp/1kg/1kg.mt') mt = mt.annotate_rows(info_score=hl.agg.info_score(hl.pl_to_gp(mt.PL))) mt = hl.variant_qc(mt) mt = mt.annotate_entries(dosage=hl.pl_dosage(mt.PL)) mt = dr.macros.DosageSparsityMacro.supercharge_dosage_sparsity(mt) mt.select_cols().select_rows('dosage_flip') \ .select_entries('dosage','sparse_dosage').entries().show()
+---------------+------------+-------------+-----------+----------+---------------+ | locus | alleles | dosage_flip | s | dosage | sparse_dosage | +---------------+------------+-------------+-----------+----------+---------------+ | locus<GRCh37> | array<str> | bool | str | float64 | float64 | +---------------+------------+-------------+-----------+----------+---------------+ | 1:904165 | ["G","A"] | False | "HG00096" | 5,94e-02 | 0,00e+00 | | 1:904165 | ["G","A"] | False | "HG00099" | 3,97e-03 | 0,00e+00 | | 1:904165 | ["G","A"] | False | "HG00105" | 4,99e-03 | 0,00e+00 | | 1:904165 | ["G","A"] | False | "HG00118" | 7,88e-03 | 0,00e+00 | | 1:904165 | ["G","A"] | False | "HG00129" | 3,07e-02 | 0,00e+00 | | 1:904165 | ["G","A"] | False | "HG00148" | 7,36e-02 | 0,00e+00 | | 1:904165 | ["G","A"] | False | "HG00177" | 2,01e-01 | 2,01e-01 | | 1:904165 | ["G","A"] | False | "HG00182" | 3,83e-02 | 0,00e+00 | | 1:904165 | ["G","A"] | False | "HG00242" | 3,07e-02 | 0,00e+00 | | 1:904165 | ["G","A"] | False | "HG00254" | 1,26e-04 | 0,00e+00 | +---------------+------------+-------------+-----------+----------+---------------+
- Parameters
mt – a MatrixTable with info_score and variant_qc row fields, and dosage entry field.
info_score_threshold – rows with info_score.score below the threshold are filtered out.
p_value_hwe_threshold – rows with variant_qc.p_value_hwe below the threshold are filtered out.
af_threshold – rows with AF < af_threshold or AF > (1 - af_threshold) are filtered out.
sparse_threshold – those entries with dosage below the threshold are set dosage = 0.
- Returns
a MatrixTable with above transformations done.
- Return type
MatrixTable