kb_python.ref

Module Contents

kb_python.ref.logger
kb_python.ref.sort_gtf(gtf_path, out_path)

Sorts a GTF file by chromosome, start position, and line number.

Parameters:
  • gtf_path (str) – path to GTF file
  • out_path (str) – path to output sorted GTF file
Returns:

path to sorted GTF file

Return type:

str

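The sort order described above can be illustrated with a small, self-contained sketch. It is not kb_python's implementation: sort_gtf_lines and sort_gtf_file are hypothetical helpers, and keeping comment lines first, dropping blank lines, and comparing chromosome names as plain strings are assumptions of the sketch.

```python
def sort_gtf_lines(lines):
    """Sort GTF records by (chromosome, start position, original line number).

    Hypothetical helper. Assumptions: comment lines starting with '#' are kept
    first in their original order, blank lines are dropped, and chromosome
    names are compared as plain strings (so '10' sorts before '2').
    """
    comments = []
    records = []
    for line_number, line in enumerate(lines):
        line = line.rstrip('\n') + '\n'  # normalize the trailing newline
        if line.startswith('#'):
            comments.append(line)
        elif line.strip():
            fields = line.split('\t')
            chromosome, start = fields[0], int(fields[3])  # GTF columns 1 and 4
            records.append((chromosome, start, line_number, line))
    # Tuples compare by chromosome, then start, then the unique line number.
    records.sort()
    return comments + [record[3] for record in records]


def sort_gtf_file(gtf_path, out_path):
    """Hypothetical file-based wrapper mirroring the documented signature."""
    with open(gtf_path) as f:
        sorted_lines = sort_gtf_lines(f)
    with open(out_path, 'w') as f:
        f.writelines(sorted_lines)
    return out_path
```

Including the original line number in the sort key keeps records that share a chromosome and start position in file order, so gene, transcript, and exon lines with the same start stay in the order they were written.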
kb_python.ref.sort_fasta(fasta_path, out_path)

Sorts the entries of a FASTA file by their headers.

Parameters:
  • fasta_path (str) – path to FASTA file
  • out_path (str) – path to output sorted FASTA file
Returns:

path to sorted FASTA file

Return type:

str

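A minimal sketch of sorting FASTA entries by header, assuming entries are compared by their full header text and that each sequence may be written back on a single line. parse_fasta and sort_fasta_entries are hypothetical helpers, not part of kb_python.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # wrapped sequence lines are joined
    if header is not None:
        yield header, ''.join(chunks)


def sort_fasta_entries(lines):
    """Return FASTA text with entries sorted lexicographically by header.

    Hypothetical helper; assumes unwrapped (single-line) output is acceptable.
    """
    entries = sorted(parse_fasta(lines), key=lambda entry: entry[0])
    return ''.join(f'>{header}\n{sequence}\n' for header, sequence in entries)
```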
kb_python.ref.create_t2g_from_fasta(fasta_path, t2g_path)

Parses FASTA headers to create a transcript-to-gene mapping.

Parameters:
  • fasta_path (str) – path to FASTA file
  • t2g_path (str) – path to output transcript-to-gene mapping
Returns:

dictionary containing path to generated t2g mapping

Return type:

dict
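
The header format is not specified here. As an assumption, the sketch below expects Ensembl-style cDNA headers, where the transcript ID is the first word and the gene ID and gene symbol appear as gene: and gene_symbol: fields. t2g_row_from_header is a hypothetical helper, and kb_python's own parsing may differ.

```python
def t2g_row_from_header(header):
    """Return (transcript_id, gene_id, gene_name) from one FASTA header.

    Hypothetical helper. Assumes Ensembl-style headers such as
    '>ENST... cdna chromosome:... gene:ENSG... gene_symbol:NAME ...'
    and falls back to the gene ID when no gene_symbol field is present.
    """
    tokens = header.lstrip('>').split()
    transcript_id = tokens[0]
    attributes = {}
    for token in tokens[1:]:
        key, sep, value = token.partition(':')
        if sep and key not in attributes:  # keep the first occurrence of a key
            attributes[key] = value
    gene_id = attributes.get('gene', '')
    gene_name = attributes.get('gene_symbol', gene_id)
    return transcript_id, gene_id, gene_name
```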

kb_python.ref.create_t2g_from_gtf(gtf_path, t2g_path, intron=False)

Creates a transcript-to-gene mapping from a GTF file.

GTF entries whose feature is transcript are parsed for their transcript_id, gene_id, and gene_name attributes.

Parameters:
  • gtf_path (str) – path to GTF file
  • t2g_path (str) – path to output transcript-to-gene mapping
  • intron (bool, optional) – whether or not to include intron transcript ids (with the -I prefix), defaults to False
Returns:

dictionary containing path to generated t2g mapping

Return type:

dict
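
A sketch of the parsing described above, using only the standard library. It reads the ninth (attribute) column of each transcript line as key "value" pairs. t2g_rows_from_gtf is hypothetical, falling back to the gene ID when gene_name is absent is an assumption, and the intron option is left out.

```python
import re

# Matches key "value" pairs in the GTF attribute column.
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def t2g_rows_from_gtf(lines):
    """Yield (transcript_id, gene_id, gene_name) for each 'transcript' feature.

    Hypothetical helper; assumes gene_name defaults to gene_id when missing.
    """
    for line in lines:
        if line.startswith('#') or not line.strip():
            continue
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 9 or fields[2] != 'transcript':  # column 3 is the feature
            continue
        attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        gene_id = attributes['gene_id']
        yield (
            attributes['transcript_id'],
            gene_id,
            attributes.get('gene_name', gene_id),
        )
```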

kb_python.ref.create_t2c(fasta_path, t2c_path)

Creates a transcripts-to-capture list from a FASTA file.

Parameters:
  • fasta_path (str) – path to FASTA file
  • t2c_path (str) – path to output transcripts-to-capture list
Returns:

dictionary containing path to generated t2c list

Return type:

dict
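
The format of the transcripts-to-capture list is not described here. Assuming it is simply the sequence ID of every FASTA entry (the first word of each header), one per line, a minimal sketch looks like this; t2c_ids and write_t2c are hypothetical helpers.

```python
def t2c_ids(lines):
    """Return the first word of every non-empty FASTA header.

    Hypothetical helper; assumes the capture list is just these IDs.
    """
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith('>') and line[1:].strip()
    ]


def write_t2c(fasta_path, t2c_path):
    """Hypothetical file-based wrapper: one ID per line."""
    with open(fasta_path) as f:
        ids = t2c_ids(f)
    with open(t2c_path, 'w') as f:
        f.writelines(f'{transcript_id}\n' for transcript_id in ids)
    return t2c_path
```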

kb_python.ref.kallisto_index(fasta_path, index_path, k=31)

Builds a kallisto index from a FASTA file by running kallisto index.

Parameters:
  • fasta_path (str) – path to FASTA file
  • index_path (str) – path to output kallisto index
  • k (int, optional) – k-mer length, defaults to 31
Returns:

dictionary containing path to generated index

Return type:

dict
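
A sketch of what running kallisto index amounts to. The -i and -k options are kallisto's own flags for the index path and k-mer length; the helper names are hypothetical, and the sketch assumes a kallisto binary on PATH, whereas kb_python may invoke a binary of its own. Standard kallisto builds accept only odd k values up to 31, which the sketch checks up front.

```python
import shutil
import subprocess


def kallisto_index_command(fasta_path, index_path, k=31):
    """Build the argument list for `kallisto index` (hypothetical helper).

    Standard kallisto builds require an odd k no larger than 31.
    """
    if k % 2 == 0 or not 1 <= k <= 31:
        raise ValueError(f'k must be odd and between 1 and 31, got {k}')
    return ['kallisto', 'index', '-i', index_path, '-k', str(k), fasta_path]


def run_kallisto_index(fasta_path, index_path, k=31):
    """Run `kallisto index`, assuming a kallisto binary is on PATH."""
    if shutil.which('kallisto') is None:
        raise FileNotFoundError('kallisto was not found on PATH')
    subprocess.run(kallisto_index_command(fasta_path, index_path, k), check=True)
    return index_path
```

Separating command construction from execution keeps the argument list testable without a kallisto installation.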

kb_python.ref.download_reference(reference, files, temp_dir='tmp', overwrite=False)

Downloads one of the provided references from a static URL.

The configuration for provided references is in config.py.

Parameters:
  • reference (Reference) – a Reference object, as defined in config.py
  • files (dict) – dictionary with command-line options as keys and paths as values; used to determine whether all the paths required to download the given reference have been provided
  • temp_dir (str, optional) – path to temporary directory, defaults to tmp
  • overwrite (bool, optional) – overwrite an existing index file, defaults to False
Returns:

dictionary containing paths to generated file(s)

Return type:

dict
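
The files check described above can be sketched as a comparison between the options a reference requires and the options that were given paths. The Reference object and its contents are defined in config.py and are not shown here, so required_options, missing_options, check_reference_files, and the option names in the usage example are all hypothetical.

```python
def missing_options(required_options, files):
    """Return the required command-line options that lack a path in `files`.

    Hypothetical helper: an option counts as missing if it is absent from
    `files` or mapped to an empty value such as None.
    """
    return [option for option in required_options if not files.get(option)]


def check_reference_files(required_options, files):
    """Raise if any path needed to download the reference was not provided."""
    missing = missing_options(required_options, files)
    if missing:
        raise ValueError('missing required option(s): ' + ', '.join(missing))
```

For example, missing_options(['-i', '-g'], {'-i': 'index.idx'}) returns ['-g'].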

kb_python.ref.decompress_file(path, temp_dir='tmp')

Decompresses the file at the given path if it is a .gz file; otherwise, returns the original path.

Parameters:
  • path (str) – path to the file
  • temp_dir (str, optional) – path to temporary directory, defaults to tmp
Returns:

unaltered path if the file is not a .gz file, otherwise path to the uncompressed file

Return type:

str

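A minimal standard-library sketch of this behavior. It assumes the decompressed copy is written into temp_dir under the original file name minus .gz, and that the .gz check is by extension only; decompress_if_gzipped is a hypothetical name, not kb_python's code.

```python
import gzip
import os
import shutil


def decompress_if_gzipped(path, temp_dir='tmp'):
    """Return `path` unchanged unless it ends in .gz; otherwise decompress it.

    Hypothetical helper; assumes the output goes to temp_dir/<name without .gz>.
    """
    if not path.endswith('.gz'):
        return path
    os.makedirs(temp_dir, exist_ok=True)
    out_path = os.path.join(temp_dir, os.path.basename(path)[:-len('.gz')])
    # Stream the data so large genomes are not loaded into memory at once.
    with gzip.open(path, 'rb') as f_in, open(out_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
    return out_path
```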
kb_python.ref.ref(fasta_path, gtf_path, cdna_path, index_path, t2g_path, temp_dir='tmp', overwrite=False)

Generates the files needed to produce count matrices for single-cell RNA-seq.

Parameters:
  • fasta_path (str) – path to genomic FASTA file
  • gtf_path (str) – path to GTF file
  • cdna_path (str) – path to output the generated cDNA FASTA file
  • index_path (str) – path to output kallisto index
  • t2g_path (str) – path to output transcript-to-gene mapping
  • temp_dir (str, optional) – path to temporary directory, defaults to tmp
  • overwrite (bool, optional) – overwrite an existing index file, defaults to False
Returns:

dictionary containing paths to generated file(s)

Return type:

dict
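
Generating the cDNA FASTA file amounts to splicing each transcript's exons out of the genomic sequence. The sketch below shows that step for a single transcript under stated assumptions: exons are given as 1-based, inclusive (start, end) pairs as in GTF, and minus-strand transcripts are reverse-complemented. splice_transcript and reverse_complement are hypothetical helpers, not kb_python's implementation.

```python
COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')


def reverse_complement(sequence):
    """Complement each base, then reverse the sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def splice_transcript(chromosome_sequence, exons, strand):
    """Concatenate exon sequences into one cDNA sequence (hypothetical helper).

    `exons` are (start, end) pairs in GTF coordinates: 1-based and inclusive
    on both ends, so Python slicing uses [start - 1:end].
    """
    pieces = [chromosome_sequence[start - 1:end] for start, end in sorted(exons)]
    spliced = ''.join(pieces)
    return reverse_complement(spliced) if strand == '-' else spliced
```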

kb_python.ref.ref_lamanno(fasta_path, gtf_path, cdna_path, intron_path, index_path, t2g_path, cdna_t2c_path, intron_t2c_path, temp_dir='tmp', overwrite=False)

Generates the files needed to produce RNA velocity matrices for single-cell RNA-seq.

Parameters:
  • fasta_path (str) – path to genomic FASTA file
  • gtf_path (str) – path to GTF file
  • cdna_path (str) – path to output the generated cDNA FASTA file
  • intron_path (str) – path to output the generated intron FASTA file
  • index_path (str) – path to output kallisto index
  • t2g_path (str) – path to output transcript-to-gene mapping
  • cdna_t2c_path (str) – path to output the generated cDNA transcripts-to-capture file
  • intron_t2c_path (str) – path to output the generated intron transcripts-to-capture file
  • temp_dir (str, optional) – path to temporary directory, defaults to tmp
  • overwrite (bool, optional) – overwrite an existing index file, defaults to False
Returns:

dictionary containing paths to generated file(s)

Return type:

dict
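
The intron FASTA file requires intronic intervals for each transcript. A simplified sketch computes them as the gaps between a transcript's exons, in 1-based inclusive coordinates; intron_intervals is hypothetical, and the intron sequences kb_python actually generates may differ.

```python
def intron_intervals(exons):
    """Return the gaps between a transcript's exons as (start, end) pairs.

    Hypothetical helper. Coordinates are 1-based and inclusive, as in GTF.
    Adjacent or overlapping exons produce no intron between them.
    """
    ordered = sorted(exons)
    introns = []
    if not ordered:
        return introns
    covered_end = ordered[0][1]  # furthest position covered by any exon so far
    for start, end in ordered[1:]:
        if start > covered_end + 1:
            introns.append((covered_end + 1, start - 1))
        covered_end = max(covered_end, end)
    return introns
```

Tracking the running maximum end handles overlapping exons: with exons (1, 100), (50, 60), and (200, 300), the only intron is (101, 199).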