kb_python.utils

Module Contents

Classes

TqdmLoggingHandler Custom logging handler so that logging does not affect progress bars.

Functions

update_filename(filename, code) Update the provided path with the specified code.
open_as_text(path, mode) Open a textfile or gzip file in text mode.
decompress_gzip(gzip_path, out_path) Decompress a gzip file to provided file path.
compress_gzip(file_path, out_path) Compress a file into gzip.
make_directory(path) Quietly make the specified directory (and any subdirectories).
remove_directory(path) Quietly remove the specified directory (and any subdirectories).
run_executable(command, stdin=None, stdout=sp.PIPE, stderr=sp.PIPE, wait=True, stream=True, quiet=False, returncode=0, alias=True) Execute a single shell command.
get_kallisto_version() Get the provided Kallisto version.
get_bustools_version() Get the provided Bustools version.
parse_technologies(lines) Parse a list of strings into a list of supported technologies.
get_supported_technologies() Runs ‘kallisto bus --list’ to fetch a list of supported technologies.
whitelist_provided(technology) Determine whether or not the whitelist for a technology is provided.
move_file(source, destination) Move a file from source to destination, overwriting the file if the destination exists.
copy_whitelist(technology, out_dir) Copies provided whitelist for specified technology.
copy_map(technology, out_dir) Copies provided feature-to-cell barcode mapping for the specified technology.
concatenate_files(*paths, out_path, temp_dir='tmp') Concatenates an arbitrary number of files into one TEXT file.
download_file(url, path) Download a remote file to the provided path while displaying a progress bar.
stream_file(url, path) Creates a FIFO file to use for piping remote files into processes.
get_temporary_filename(temp_dir=None) Create a temporary file in the provided temporary directory.
import_tcc_matrix_as_anndata(matrix_path, barcodes_path, ec_path, txnames_path, threads=8) Import a TCC matrix as an AnnData object.
import_matrix_as_anndata(matrix_path, barcodes_path, genes_path, t2g_path=None, name='gene') Import a matrix as an AnnData object.
overlay_anndatas(adata_spliced, adata_unspliced) ‘Overlays’ anndata objects by taking the intersection of the obs and var of each anndata.
sum_anndatas(adata_spliced, adata_unspliced) Sum the counts in two anndata objects by taking the intersection of both matrices and adding the values together.
kb_python.utils.logger
kb_python.utils.TECHNOLOGY_PARSER
kb_python.utils.VERSION_PARSER
exception kb_python.utils.NotImplementedException

Bases: Exception

Common base class for all non-exit exceptions.

exception kb_python.utils.UnmetDependencyException

Bases: Exception

Common base class for all non-exit exceptions.

class kb_python.utils.TqdmLoggingHandler(level=logging.NOTSET)

Bases: logging.Handler

Custom logging handler so that logging does not affect progress bars.

emit(self, record)

Do whatever it takes to actually log the specified logging record.

This version is intended to be implemented by subclasses and so raises a NotImplementedError.

kb_python.utils.update_filename(filename, code)

Update the provided path with the specified code.

For instance, if the path is ‘output.bus’ and code is s (for sort), this function returns output.s.bus.

Parameters:
  • filename (str) – filename (NOT path)
  • code (str) – code to append to filename
Returns:

path updated with provided code

Return type:

str
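The renaming rule described above can be sketched with os.path.splitext. This is an illustration only, not the library's code: the name update_filename_sketch is hypothetical, and the sketch assumes that only the final extension counts as the extension.

```python
import os


def update_filename_sketch(filename, code):
    """Insert `code` between the base name and the final extension."""
    # Hypothetical stand-in for illustration; not the kb_python implementation.
    name, extension = os.path.splitext(filename)
    return f"{name}.{code}{extension}"


print(update_filename_sketch("output.bus", "s"))  # output.s.bus
```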

kb_python.utils.open_as_text(path, mode)

Open a textfile or gzip file in text mode.

Parameters:
  • path (str) – path to textfile or gzip
  • mode (str) – mode to open the file, either w for write or r for read
Returns:

file object

Return type:

file object
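One way to get this behavior with the standard library is to switch between open and gzip.open. The sketch below is hypothetical (open_as_text_sketch is not a kb_python name) and assumes that gzip files are recognized by a .gz suffix, which the documentation above does not state.

```python
import gzip
import os
import tempfile


def open_as_text_sketch(path, mode):
    """Open a plain-text or gzip file in text mode ('r' or 'w')."""
    # Assumption: gzip files are identified by a '.gz' suffix.
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")  # 't' requests text mode
    return open(path, mode)


# Write and read back a small gzip file through the same helper.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "example.txt.gz")
with open_as_text_sketch(gz_path, "w") as f:
    f.write("ACGT\n")
with open_as_text_sketch(gz_path, "r") as f:
    first_line = f.readline()
```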

kb_python.utils.decompress_gzip(gzip_path, out_path)

Decompress a gzip file to provided file path.

Parameters:
  • gzip_path (str) – path to gzip file
  • out_path (str) – path to decompressed file
Returns:

path to decompressed file

Return type:

str

kb_python.utils.compress_gzip(file_path, out_path)

Compress a file into gzip.

Parameters:
  • file_path (str) – path to file
  • out_path (str) – path to compressed file
Returns:

path to compressed file

Return type:

str
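The two gzip helpers are inverses of each other. A minimal round trip, written with the standard gzip and shutil modules, might look like the following. The function names ending in _sketch are hypothetical and the code is not taken from kb_python.

```python
import gzip
import os
import shutil
import tempfile


def compress_gzip_sketch(file_path, out_path):
    """Compress `file_path` into a gzip file at `out_path`."""
    with open(file_path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def decompress_gzip_sketch(gzip_path, out_path):
    """Decompress `gzip_path` into a plain file at `out_path`."""
    with gzip.open(gzip_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


work_dir = tempfile.mkdtemp()
original = os.path.join(work_dir, "reads.txt")
with open(original, "w") as f:
    f.write("read1\nread2\n")

compressed = compress_gzip_sketch(original, original + ".gz")
restored = decompress_gzip_sketch(compressed, os.path.join(work_dir, "restored.txt"))
with open(restored) as f:
    restored_text = f.read()
```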

kb_python.utils.make_directory(path)

Quietly make the specified directory (and any subdirectories).

This function is a wrapper around os.makedirs. It is used so that the appropriate mkdir command can be printed for dry runs.

Parameters: path (str) – path to directory to make

kb_python.utils.remove_directory(path)

Quietly remove the specified directory (and any subdirectories).

This function is a wrapper around shutil.rmtree. It is used so that the appropriate rm command can be printed for dry runs.

Parameters: path (str) – path to directory to remove
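The wrapper pattern described for make_directory and remove_directory can be sketched as below. The dry_run parameter is a hypothetical addition for this illustration (the documented functions take only path), and the printed commands are assumptions about what a dry run might show.

```python
import os
import shutil
import tempfile


def make_directory_sketch(path, dry_run=False):
    """Create `path` and any missing parents, or print the command instead."""
    # `dry_run` is a hypothetical parameter used only in this illustration.
    if dry_run:
        print(f"mkdir -p {path}")
        return
    os.makedirs(path, exist_ok=True)  # no error if the directory exists


def remove_directory_sketch(path, dry_run=False):
    """Remove `path` and everything below it, or print the command instead."""
    if dry_run:
        print(f"rm -rf {path}")
        return
    shutil.rmtree(path, ignore_errors=True)  # no error if it is missing


base = tempfile.mkdtemp()
nested = os.path.join(base, "a", "b")
make_directory_sketch(nested)
created = os.path.isdir(nested)
remove_directory_sketch(os.path.join(base, "a"))
removed = not os.path.exists(nested)
```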
kb_python.utils.run_executable(command, stdin=None, stdout=sp.PIPE, stderr=sp.PIPE, wait=True, stream=True, quiet=False, returncode=0, alias=True)

Execute a single shell command.

Parameters:
  • command (list) – a list representing a single shell command
  • stdin (stream, optional) – object to pass into the stdin argument for subprocess.Popen, defaults to None
  • stdout (stream, optional) – object to pass into the stdout argument for subprocess.Popen, defaults to subprocess.PIPE
  • stderr (stream, optional) – object to pass into the stderr argument for subprocess.Popen, defaults to subprocess.PIPE
  • wait (bool, optional) – whether to wait until the command has finished, defaults to True
  • stream (bool, optional) – whether to stream the output to the command line, defaults to True
  • quiet (bool, optional) – whether to suppress all output to the command line and skip the return code check, defaults to False
  • returncode (int, optional) – the return code expected if the command runs as intended, defaults to 0
  • alias (bool, optional) – whether to use the basename of the first element of command, defaults to True
Returns:

the spawned process

Return type:

subprocess.Popen
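The interaction between wait, quiet and returncode can be illustrated with a reduced version built on subprocess.Popen. This is a simplified, hypothetical sketch: streaming and aliasing are left out, and raising CalledProcessError on an unexpected return code is an assumption, not documented behavior.

```python
import subprocess as sp
import sys


def run_executable_sketch(command, wait=True, quiet=False, returncode=0):
    """Spawn `command` (a list of arguments) and optionally check its exit code."""
    # Hypothetical, simplified stand-in: streaming and aliasing are omitted.
    process = sp.Popen(
        command, stdout=sp.PIPE, stderr=sp.PIPE, universal_newlines=True
    )
    if wait:
        out, err = process.communicate()  # blocks until the command exits
        # Assumption: a mismatching return code is reported as an exception.
        if not quiet and process.returncode != returncode:
            raise sp.CalledProcessError(process.returncode, command, out, err)
    return process


# Run the current Python interpreter as a portable example command.
proc = run_executable_sketch([sys.executable, "-c", "print('ok')"])
```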

kb_python.utils.get_kallisto_version()

Get the provided Kallisto version.

This function parses the help text by executing the included Kallisto binary.

Returns: tuple of major, minor, patch versions
Return type: tuple

kb_python.utils.get_bustools_version()

Get the provided Bustools version.

This function parses the help text by executing the included Bustools binary.

Returns: tuple of major, minor, patch versions
Return type: tuple
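Both version functions reduce to extracting a major.minor.patch triple from help text. A sketch of that parsing step follows. The regular expression and the sample help text are assumptions made for illustration; the actual VERSION_PARSER pattern and the real binary output may differ.

```python
import re

# Assumed pattern; the actual VERSION_PARSER in kb_python may differ.
VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version_sketch(help_text):
    """Return (major, minor, patch) from the first x.y.z found in `help_text`."""
    match = VERSION_PATTERN.search(help_text)
    if match is None:
        raise ValueError("no version string found")
    return tuple(int(part) for part in match.groups())


# Made-up help text for illustration only.
sample_help = "kallisto 0.46.1\n\nUsage: kallisto <CMD> [arguments] .."
version = parse_version_sketch(sample_help)
```

Returning integers rather than strings makes tuple comparison behave numerically, so (0, 46, 1) compares as newer than (0, 9, 0).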
kb_python.utils.parse_technologies(lines)

Parse a list of strings into a list of supported technologies.

This function parses the technologies printed by running kallisto bus --list.

Parameters: lines (list) – the output of kallisto bus --list split into lines
Returns: list of technologies
Return type: list
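As a rough illustration of this kind of parsing, the sketch below collects the first column of a two-column listing. The layout of the sample lines (a header row followed by a dashed rule) is an assumption made for this example, as is the function name; the real output of kallisto bus --list may be formatted differently.

```python
def parse_technologies_sketch(lines):
    """Collect the first column of every row below the dashed header rule."""
    technologies = []
    in_table = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("---"):
            in_table = True  # rows after the rule are technology entries
            continue
        if in_table and stripped:
            technologies.append(stripped.split()[0])
    return technologies


# Made-up listing that only mimics the assumed layout.
sample_lines = [
    "List of supported single-cell technologies",
    "",
    "short name       description",
    "----------       -----------",
    "10xv2            10x version 2 chemistry",
    "DropSeq          DropSeq",
]
parsed = parse_technologies_sketch(sample_lines)
```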
kb_python.utils.get_supported_technologies()

Runs ‘kallisto bus --list’ to fetch a list of supported technologies.

Returns: list of technologies
Return type: list

kb_python.utils.whitelist_provided(technology)

Determine whether or not the whitelist for a technology is provided.

Parameters: technology (str) – the name of the technology
Returns: whether the whitelist is provided
Return type: bool

kb_python.utils.move_file(source, destination)

Move a file from source to destination, overwriting the file if the destination exists.

Parameters:
  • source (str) – path to source file
  • destination (str) – path to destination
Returns:

path to moved file

Return type:

str
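The overwrite behavior can be sketched with shutil.move. Removing an existing destination first is a design choice of this hypothetical sketch, made so that the result does not depend on how the platform treats a rename onto an existing file; it is not necessarily how kb_python does it.

```python
import os
import shutil
import tempfile


def move_file_sketch(source, destination):
    """Move `source` to `destination`, replacing an existing destination file."""
    # Removing first keeps the behavior the same on platforms where a plain
    # rename onto an existing file would fail.
    if os.path.isfile(destination):
        os.remove(destination)
    shutil.move(source, destination)
    return destination


folder = tempfile.mkdtemp()
src = os.path.join(folder, "new.txt")
dst = os.path.join(folder, "old.txt")
with open(src, "w") as f:
    f.write("new")
with open(dst, "w") as f:
    f.write("old")

moved = move_file_sketch(src, dst)
with open(moved) as f:
    moved_text = f.read()
```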

kb_python.utils.copy_whitelist(technology, out_dir)

Copies provided whitelist for specified technology.

Parameters:
  • technology (str) – the name of the technology
  • out_dir (str) – directory to put the whitelist
Returns:

path to whitelist

Return type:

str

kb_python.utils.copy_map(technology, out_dir)

Copies provided feature-to-cell barcode mapping for the specified technology.

Parameters:
  • technology (str) – the name of the technology
  • out_dir (str) – directory to put the map
Returns:

path to map

Return type:

str

kb_python.utils.concatenate_files(*paths, out_path, temp_dir='tmp')

Concatenates an arbitrary number of files into one TEXT file.

Only supports text and gzip files.

Parameters:
  • paths (str) – an arbitrary number of paths to files
  • out_path (str) – path to place concatenated file
  • temp_dir (str, optional) – temporary directory, defaults to tmp
Returns:

path to concatenated file

Return type:

str
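A reduced version of the concatenation, handling plain-text and gzip inputs, could look like this. The sketch is hypothetical: it omits temp_dir, assumes gzip inputs carry a .gz suffix, and copies lines unchanged, so inputs are expected to end with a newline.

```python
import gzip
import os
import tempfile


def concatenate_files_sketch(*paths, out_path):
    """Write the text content of each input, in order, to `out_path`."""
    # Assumption: gzip inputs are identified by a '.gz' suffix.
    with open(out_path, "w") as out:
        for path in paths:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as f:
                for line in f:
                    out.write(line)
    return out_path


data_dir = tempfile.mkdtemp()
plain_path = os.path.join(data_dir, "a.txt")
gzip_path = os.path.join(data_dir, "b.txt.gz")
with open(plain_path, "w") as f:
    f.write("first\n")
with gzip.open(gzip_path, "wt") as f:
    f.write("second\n")

combined = concatenate_files_sketch(
    plain_path, gzip_path, out_path=os.path.join(data_dir, "combined.txt")
)
with open(combined) as f:
    combined_text = f.read()
```

Because out_path follows *paths in the signature, it has to be passed by keyword, as in the call above.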

kb_python.utils.download_file(url, path)

Download a remote file to the provided path while displaying a progress bar.

Parameters:
  • url (str) – remote url
  • path (str) – local path to download the file to
Returns:

path to downloaded file

Return type:

str

kb_python.utils.stream_file(url, path)

Creates a FIFO file to use for piping remote files into processes.

This function spawns a new thread to download the remote file into a FIFO file object. FIFO file objects are only supported on Unix systems.

Parameters:
  • url (str) – url to the file
  • path (str) – path to place FIFO file
Raises:

UnsupportedOSException – if the OS is Windows

Returns:

path to FIFO file

Return type:

str
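The FIFO-plus-thread mechanism can be demonstrated without a network connection by feeding in-memory chunks in place of a download. Everything in this sketch is hypothetical (the function name, the chunks argument, the OSError on unsupported platforms), and it runs only on Unix systems where os.mkfifo is available.

```python
import os
import tempfile
import threading


def stream_bytes_sketch(chunks, path):
    """Create a FIFO at `path` and feed `chunks` into it from a thread."""
    # Hypothetical stand-in: a real version would read the chunks from a URL.
    if not hasattr(os, "mkfifo"):
        raise OSError("FIFO files are not supported on this platform")
    os.mkfifo(path)

    def writer():
        # Opening a FIFO for writing blocks until a reader opens it.
        with open(path, "wb") as fifo:
            for chunk in chunks:
                fifo.write(chunk)

    threading.Thread(target=writer, daemon=True).start()
    return path


fifo_path = os.path.join(tempfile.mkdtemp(), "remote.fifo")
stream_bytes_sketch([b"ACGT\n", b"TTGA\n"], fifo_path)
# The consumer reads the FIFO like a regular file until the writer closes it.
with open(fifo_path, "rb") as reader:
    streamed = reader.read()
os.remove(fifo_path)
```

The writer runs in a separate thread because opening a FIFO blocks until the other end is opened, so the function can return the path immediately and let the consuming process start reading.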

kb_python.utils.get_temporary_filename(temp_dir=None)

Create a temporary file in the provided temporary directory.

The caller is responsible for deleting the file.

Parameters: temp_dir (str, optional) – path to temporary directory, defaults to None
Returns: temporary filename
Return type: str
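The contract above (the file is created, and the caller deletes it) matches what tempfile.mkstemp offers. The following sketch assumes that approach; the function name is hypothetical and the real implementation may use a different tempfile call.

```python
import os
import tempfile


def get_temporary_filename_sketch(temp_dir=None):
    """Create an empty temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=temp_dir)
    os.close(fd)  # only the name is needed; the file stays on disk
    return path


tmp_path = get_temporary_filename_sketch()
exists_after_creation = os.path.exists(tmp_path)
os.remove(tmp_path)  # the caller is responsible for cleanup
```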
kb_python.utils.import_tcc_matrix_as_anndata(matrix_path, barcodes_path, ec_path, txnames_path, threads=8)

Import a TCC matrix as an AnnData object.

Parameters:
  • matrix_path (str) – path to the matrix ec file
  • barcodes_path (str) – path to the barcodes txt file
  • ec_path (str) – path to the ec txt file
  • txnames_path (str) – path to transcripts.txt generated by kallisto bus
  • threads (int, optional) – number of threads to use, defaults to 8
Returns:

a new AnnData object

Return type:

anndata.AnnData

kb_python.utils.import_matrix_as_anndata(matrix_path, barcodes_path, genes_path, t2g_path=None, name='gene')

Import a matrix as an AnnData object.

Parameters:
  • matrix_path (str) – path to the matrix ec file
  • barcodes_path (str) – path to the barcodes txt file
  • genes_path (str) – path to the genes txt file
  • t2g_path (str, optional) – path to transcript-to-gene mapping. If this is provided, the third column of the mapping is appended to the anndata var, defaults to None
  • name (str, optional) – name of the columns, defaults to “gene”
Returns:

a new AnnData object

Return type:

anndata.AnnData

kb_python.utils.overlay_anndatas(adata_spliced, adata_unspliced)

‘Overlays’ anndata objects by taking the intersection of the obs and var of each anndata.

Parameters:
  • adata_spliced (anndata.AnnData) – an AnnData object
  • adata_unspliced (anndata.AnnData) – an AnnData object
Returns:

a new AnnData object

Return type:

anndata.AnnData

kb_python.utils.sum_anndatas(adata_spliced, adata_unspliced)

Sum the counts in two anndata objects by taking the intersection of both matrices and adding the values together.

Parameters:
  • adata_spliced (anndata.AnnData) – an AnnData object
  • adata_unspliced (anndata.AnnData) – an AnnData object
Returns:

a new AnnData object

Return type:

anndata.AnnData
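The intersect-then-add logic can be shown conceptually with plain dictionaries standing in for AnnData objects. This is not how kb_python or anndata represent data: the dictionary layout (obs for barcodes, var for genes, X for counts keyed by barcode and gene) and the function name are assumptions made purely to keep the example dependency-free.

```python
def sum_counts_sketch(a, b):
    """Keep barcodes and genes present in both inputs and add their counts."""
    # Plain-dict stand-in for AnnData: 'obs' lists barcodes, 'var' lists genes,
    # and 'X' maps (barcode, gene) to a count, with missing entries meaning 0.
    barcodes = sorted(set(a["obs"]) & set(b["obs"]))
    genes = sorted(set(a["var"]) & set(b["var"]))
    counts = {
        (barcode, gene): a["X"].get((barcode, gene), 0)
        + b["X"].get((barcode, gene), 0)
        for barcode in barcodes
        for gene in genes
    }
    return {"obs": barcodes, "var": genes, "X": counts}


spliced = {
    "obs": ["AAA", "CCC"],
    "var": ["g1", "g2"],
    "X": {("AAA", "g1"): 2, ("CCC", "g2"): 1},
}
unspliced = {
    "obs": ["AAA", "GGG"],
    "var": ["g1", "g3"],
    "X": {("AAA", "g1"): 3, ("GGG", "g3"): 4},
}
total = sum_counts_sketch(spliced, unspliced)
```

Only barcode AAA and gene g1 appear in both inputs, so the result contains a single entry holding the sum of the two counts.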