kb_python.utils

Module Contents

Classes

TqdmLoggingHandler

Custom logging handler so that logging does not affect progress bars.

Functions

update_filename(filename, code)

Update the provided path with the specified code.

make_directory(path)

Quietly make the specified directory (and any subdirectories).

remove_directory(path)

Quietly remove the specified directory (and any subdirectories).

run_executable(command, stdin=None, stdout=sp.PIPE, stderr=sp.PIPE, wait=True, stream=True, quiet=False, returncode=0, alias=True, record=True)

Execute a single shell command.

get_kallisto_version()

Get the provided Kallisto version.

get_bustools_version()

Get the provided Bustools version.

parse_technologies(lines)

Parse a list of strings into a list of supported technologies.

get_supported_technologies()

Runs ‘kallisto bus --list’ to fetch a list of supported technologies.

whitelist_provided(technology)

Determine whether or not the whitelist for a technology is provided.

move_file(source, destination)

Move a file from source to destination, overwriting the file if the destination exists.

copy_whitelist(technology, out_dir)

Copies provided whitelist for specified technology.

copy_map(technology, out_dir)

Copies provided feature-to-cell barcode mapping for the specified technology.

stream_file(url, path)

Creates a FIFO file to use for piping remote files into processes.

read_t2g(t2g_path)

Given a transcript-to-gene mapping path, read it into a dictionary.

import_tcc_matrix_as_anndata(matrix_path, barcodes_path, ec_path, txnames_path, threads=8)

Import a TCC matrix as an Anndata object.

import_matrix_as_anndata(matrix_path, barcodes_path, genes_path, t2g_path=None, name='gene')

Import a matrix as an Anndata object.

overlay_anndatas(adata_spliced, adata_unspliced)

‘Overlays’ anndata objects by taking the intersection of the obs and var of each anndata.

sum_anndatas(adata_spliced, adata_unspliced)

Sum the counts in two anndata objects by taking the intersection of both matrices and adding the values together.

Attributes

TECHNOLOGY_PARSER

VERSION_PARSER

open_as_text

decompress_gzip

compress_gzip

concatenate_files

download_file

get_temporary_filename

kb_python.utils.TECHNOLOGY_PARSER
kb_python.utils.VERSION_PARSER
kb_python.utils.open_as_text
kb_python.utils.decompress_gzip
kb_python.utils.compress_gzip
kb_python.utils.concatenate_files
kb_python.utils.download_file
kb_python.utils.get_temporary_filename
exception kb_python.utils.NotImplementedException

Bases: Exception

Common base class for all non-exit exceptions.

exception kb_python.utils.UnmetDependencyException

Bases: Exception

Common base class for all non-exit exceptions.

class kb_python.utils.TqdmLoggingHandler(level=logging.NOTSET)

Bases: logging.Handler

Custom logging handler so that logging does not affect progress bars.

emit(self, record)

Do whatever it takes to actually log the specified logging record.

This version is intended to be implemented by subclasses and so raises a NotImplementedError.
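
The emit description above is the generic text from logging.Handler. As an illustration of how a handler can log without disturbing progress bars, the following is a minimal sketch that routes each formatted record through tqdm.write, which prints a message without breaking active bars. The class name ProgressSafeHandler and its write parameter are hypothetical, and the fallback to print exists only so the sketch runs when tqdm is not installed; this is not necessarily how TqdmLoggingHandler is implemented.

```python
import logging

try:
    from tqdm import tqdm
    default_write = tqdm.write  # prints without corrupting active progress bars
except ImportError:
    default_write = print  # fallback so the sketch still runs without tqdm


class ProgressSafeHandler(logging.Handler):
    """Hypothetical handler that sends formatted records to a write function."""

    def __init__(self, level=logging.NOTSET, write=default_write):
        super().__init__(level)
        self.write = write

    def emit(self, record):
        try:
            self.write(self.format(record))
        except Exception:
            # Delegate to the standard logging error hook instead of crashing.
            self.handleError(record)


# Usage: attach the handler to a logger as with any other handler.
logger = logging.getLogger("progress_safe_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(ProgressSafeHandler())
logger.info("logging while a progress bar is running")
```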

kb_python.utils.update_filename(filename, code)

Update the provided path with the specified code.

For instance, if the path is ‘output.bus’ and code is s (for sort), this function returns output.s.bus.

Parameters
  • filename (str) – filename (NOT path)

  • code (str) – code to append to filename

Returns

path updated with provided code

Return type

str
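
The documented behavior (output.bus with code s becomes output.s.bus) can be sketched as follows. This is an illustration only: the function name update_filename_sketch is hypothetical, and splitting on the last extension with os.path.splitext is an assumption about how the code is inserted.

```python
import os


def update_filename_sketch(filename, code):
    """Insert `code` between the base name and the (last) extension."""
    name, extension = os.path.splitext(filename)
    return f"{name}.{code}{extension}"


updated = update_filename_sketch("output.bus", "s")
print(updated)
```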

kb_python.utils.make_directory(path)

Quietly make the specified directory (and any subdirectories).

This function is a wrapper around os.makedirs. It is used so that the appropriate mkdir command can be printed for dry runs.

Parameters

path (str) – path to directory to make

kb_python.utils.remove_directory(path)

Quietly remove the specified directory (and any subdirectories).

This function is a wrapper around shutil.rmtree. It is used so that the appropriate rm command can be printed for dry runs.

Parameters

path (str) – path to directory to remove
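
The two directory helpers above wrap os.makedirs and shutil.rmtree. A minimal sketch of such wrappers is shown here; the function names are hypothetical, and reading "quietly" as "do not raise when the directory already exists or is already gone" is an assumption, as is the omission of any dry-run printing.

```python
import os
import shutil
import tempfile


def make_directory_sketch(path):
    # exist_ok=True: no error if the directory is already there (assumed meaning of "quietly").
    os.makedirs(path, exist_ok=True)


def remove_directory_sketch(path):
    # ignore_errors=True: no error if the directory is missing (assumed meaning of "quietly").
    shutil.rmtree(path, ignore_errors=True)


# Usage: create a nested directory inside a scratch location, then remove it.
scratch = tempfile.mkdtemp()
nested = os.path.join(scratch, "out", "tmp")
make_directory_sketch(nested)
remove_directory_sketch(scratch)
```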

kb_python.utils.run_executable(command, stdin=None, stdout=sp.PIPE, stderr=sp.PIPE, wait=True, stream=True, quiet=False, returncode=0, alias=True, record=True)

Execute a single shell command.

Parameters
  • command (list) – a list representing a single shell command

  • stdin (stream, optional) – object to pass into the stdin argument for subprocess.Popen, defaults to None

  • stdout (stream, optional) – object to pass into the stdout argument for subprocess.Popen, defaults to subprocess.PIPE

  • stderr (stream, optional) – object to pass into the stderr argument for subprocess.Popen, defaults to subprocess.PIPE

  • wait (bool, optional) – whether to wait until the command has finished, defaults to True

  • stream (bool, optional) – whether to stream the output to the command line, defaults to True

  • quiet (bool, optional) – whether to not display anything to the command line and not check the return code, defaults to False

  • returncode (int, optional) – the return code expected if the command runs as intended, defaults to 0

  • alias (bool, optional) – whether to use the basename of the first element of command, defaults to True

  • record (bool, optional) – whether to record the call statistics, defaults to True

Returns

the spawned process

Return type

subprocess.Popen
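
To make the parameters above concrete, here is a reduced sketch of a wrapper around subprocess.Popen that takes the command as a list, optionally waits, and compares the exit status with the expected return code. The name run_command_sketch is hypothetical, raising subprocess.CalledProcessError on a mismatch is an assumption, and the stream, quiet, alias and record options are left out.

```python
import subprocess as sp
import sys


def run_command_sketch(command, stdin=None, stdout=sp.PIPE, stderr=sp.PIPE,
                       wait=True, returncode=0):
    """Spawn `command` (a list of arguments) and return the process object."""
    process = sp.Popen(command, stdin=stdin, stdout=stdout, stderr=stderr, text=True)
    if wait:
        out, err = process.communicate()  # blocks until the command finishes
        if process.returncode != returncode:
            raise sp.CalledProcessError(process.returncode, command,
                                        output=out, stderr=err)
    return process


# Usage: run the current Python interpreter as a portable stand-in for a real tool.
process = run_command_sketch([sys.executable, "-c", "print('ok')"])
print(process.returncode)
```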

kb_python.utils.get_kallisto_version()

Get the provided Kallisto version.

This function parses the help text by executing the included Kallisto binary.

Returns

tuple of major, minor, patch versions

Return type

tuple

kb_python.utils.get_bustools_version()

Get the provided Bustools version.

This function parses the help text by executing the included Bustools binary.

Returns

tuple of major, minor, patch versions

Return type

tuple
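
Both version helpers above return a (major, minor, patch) tuple parsed from help text. The sketch below shows one way such parsing can be done with a regular expression. The pattern, the function name parse_version_sketch, the conversion to integers and the sample string are all illustrative assumptions; the sample is not captured output from either binary, and the pattern is not necessarily the module's VERSION_PARSER.

```python
import re

# Hypothetical pattern: the first three dot-separated integers in the text.
VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version_sketch(help_text):
    """Return (major, minor, patch) as integers from a block of help text."""
    match = VERSION_PATTERN.search(help_text)
    if match is None:
        raise ValueError("no version number found in help text")
    return tuple(int(part) for part in match.groups())


# Made-up help text for demonstration purposes only.
version = parse_version_sketch("sometool 1.2.3\n\nUsage: sometool <CMD> [arguments]")
print(version)
```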

kb_python.utils.parse_technologies(lines)

Parse a list of strings into a list of supported technologies.

This function parses the technologies printed by running kallisto bus --list.

Parameters

lines (list) – the output of kallisto bus --list split into lines

Returns

list of technologies

Return type

list

kb_python.utils.get_supported_technologies()

Runs ‘kallisto bus --list’ to fetch a list of supported technologies.

Returns

list of technologies

Return type

list

kb_python.utils.whitelist_provided(technology)

Determine whether or not the whitelist for a technology is provided.

Parameters

technology (str) – the name of the technology

Returns

whether the whitelist is provided

Return type

bool

kb_python.utils.move_file(source, destination)

Move a file from source to destination, overwriting the file if the destination exists.

Parameters
  • source (str) – path to source file

  • destination (str) – path to destination

Returns

path to moved file

Return type

str
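
A move that overwrites an existing destination file can be sketched with shutil.move, as below. The function name move_file_sketch is hypothetical, and using shutil.move is an assumption about the approach rather than a statement about the module's code.

```python
import os
import shutil
import tempfile


def move_file_sketch(source, destination):
    """Move `source` to `destination`, replacing an existing destination file."""
    # shutil.move returns the final destination path.
    return shutil.move(source, destination)


# Usage: overwrite an existing file with a newer one.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "new.txt")
dst = os.path.join(workdir, "old.txt")
with open(src, "w") as handle:
    handle.write("new contents")
with open(dst, "w") as handle:
    handle.write("old contents")
moved = move_file_sketch(src, dst)
```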

kb_python.utils.copy_whitelist(technology, out_dir)

Copies provided whitelist for specified technology.

Parameters
  • technology (str) – the name of the technology

  • out_dir (str) – directory to put the whitelist

Returns

path to whitelist

Return type

str

kb_python.utils.copy_map(technology, out_dir)

Copies provided feature-to-cell barcode mapping for the specified technology.

Parameters
  • technology (str) – the name of the technology

  • out_dir (str) – directory to put the map

Returns

path to map

Return type

str

kb_python.utils.stream_file(url, path)

Creates a FIFO file to use for piping remote files into processes.

This function spawns a new thread to download the remote file into a FIFO file object. FIFO file objects are only supported on unix systems.

Parameters
  • url (str) – url to the file

  • path (str) – path to place FIFO file

Raises

UnsupportedOSException – if the OS is Windows

Returns

path to FIFO file

Return type

str
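
The thread-plus-FIFO pattern described above can be sketched without any network access by writing in-memory bytes instead of a downloaded file. Everything here is illustrative: stream_bytes_to_fifo is a hypothetical name, the payload stands in for the remote file, and raising OSError on platforms without os.mkfifo is an assumption (the documented function raises UnsupportedOSException). The sketch runs only on Unix-like systems.

```python
import os
import tempfile
import threading


def stream_bytes_to_fifo(data, path):
    """Create a FIFO at `path` and fill it with `data` from a background thread."""
    if not hasattr(os, "mkfifo"):
        raise OSError("FIFO files are not available on this platform")
    os.mkfifo(path)

    def writer():
        # Opening a FIFO for writing blocks until a reader opens the other end.
        with open(path, "wb") as fifo:
            fifo.write(data)

    threading.Thread(target=writer, daemon=True).start()
    return path


# Usage: a consumer reads from the FIFO as if it were a regular file.
with tempfile.TemporaryDirectory() as tmp:
    fifo_path = stream_bytes_to_fifo(b"ACGT\n", os.path.join(tmp, "reads.fifo"))
    with open(fifo_path, "rb") as reader:
        payload = reader.read()
```

Because the writer blocks until something opens the FIFO for reading, the data is never stored on disk in full; it flows to the consumer as it is produced.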

kb_python.utils.read_t2g(t2g_path)

Given a transcript-to-gene mapping path, read it into a dictionary. The first column is always assumed to be the transcript IDs.

Parameters

t2g_path (str) – path to t2g

Returns

dictionary containing transcript IDs as keys and all other columns as a tuple as values

Return type

dict
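
The returned structure (transcript ID as key, remaining columns as a tuple) can be illustrated with a short sketch. The name read_t2g_sketch is hypothetical, the tab delimiter is an assumption about the file format, and the IDs in the sample file are made up.

```python
import os
import tempfile


def read_t2g_sketch(t2g_path):
    """Map the first column of each line to a tuple of the remaining columns."""
    t2g = {}
    with open(t2g_path, "r") as handle:
        for line in handle:
            if line.isspace():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")  # assumed tab-delimited
            t2g[fields[0]] = tuple(fields[1:])
    return t2g


# Usage with a two-line, made-up mapping file.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "t2g.txt")
    with open(sample_path, "w") as handle:
        handle.write("tx1\tgene1\tNameA\ntx2\tgene2\tNameB\n")
    mapping = read_t2g_sketch(sample_path)
print(mapping["tx1"])
```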

kb_python.utils.import_tcc_matrix_as_anndata(matrix_path, barcodes_path, ec_path, txnames_path, threads=8)

Import a TCC matrix as an Anndata object.

Parameters
  • matrix_path (str) – path to the matrix ec file

  • barcodes_path (str) – path to the barcodes txt file

  • ec_path (str) – path to the ec txt file

  • txnames_path (str) – path to transcripts.txt generated by kallisto bus

  • threads (int, optional) – number of threads to use, defaults to 8

Returns

a new Anndata object

Return type

anndata.AnnData

kb_python.utils.import_matrix_as_anndata(matrix_path, barcodes_path, genes_path, t2g_path=None, name='gene')

Import a matrix as an Anndata object.

Parameters
  • matrix_path (str) – path to the matrix ec file

  • barcodes_path (str) – path to the barcodes txt file

  • genes_path (str) – path to the genes txt file

  • t2g_path (str, optional) – path to transcript-to-gene mapping; if provided, the third column of the mapping is appended to the anndata var. Defaults to None

  • name (str, optional) – name of the columns, defaults to “gene”

Returns

a new Anndata object

Return type

anndata.AnnData

kb_python.utils.overlay_anndatas(adata_spliced, adata_unspliced)

‘Overlays’ anndata objects by taking the intersection of the obs and var of each anndata.

Parameters
  • adata_spliced (anndata.AnnData) – an Anndata object

  • adata_unspliced (anndata.AnnData) – an Anndata object

Returns

a new Anndata object

Return type

anndata.AnnData

kb_python.utils.sum_anndatas(adata_spliced, adata_unspliced)

Sum the counts in two anndata objects by taking the intersection of both matrices and adding the values together.

Parameters
  • adata_spliced (anndata.AnnData) – an Anndata object

  • adata_unspliced (anndata.AnnData) – an Anndata object

Returns

a new Anndata object

Return type

anndata.AnnData
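
The idea of summing over the intersection can be shown with plain dictionaries standing in for the two matrices. This is a conceptual sketch only: it does not use anndata, the name sum_on_intersection is hypothetical, and the barcodes and gene names are made up. Each input maps a barcode (an obs name) to a dictionary of gene (var name) counts.

```python
def sum_on_intersection(counts_a, counts_b):
    """Add counts for barcodes and genes that appear in both inputs."""
    shared_barcodes = sorted(set(counts_a) & set(counts_b))
    genes_a = {gene for row in counts_a.values() for gene in row}
    genes_b = {gene for row in counts_b.values() for gene in row}
    shared_genes = sorted(genes_a & genes_b)
    return {
        barcode: {
            gene: counts_a[barcode].get(gene, 0) + counts_b[barcode].get(gene, 0)
            for gene in shared_genes
        }
        for barcode in shared_barcodes
    }


# Made-up example: only barcode "AAAC" and gene "g1" are present in both inputs.
spliced = {"AAAC": {"g1": 2, "g2": 1}, "AAAG": {"g1": 5}}
unspliced = {"AAAC": {"g1": 1, "g3": 4}, "AAAT": {"g1": 7}}
summed = sum_on_intersection(spliced, unspliced)
print(summed)
```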