kb_python.utils

Module Contents

Functions

update_filename(→ str)

Update the provided path with the specified code.

make_directory(path)

Quietly make the specified directory (and any subdirectories).

remove_directory(path)

Quietly remove the specified directory (and any subdirectories).

run_executable(→ Union[Tuple[subprocess.Popen, str, ...)

Execute a single shell command.

get_kallisto_version(→ Optional[Tuple[int, int, int]])

Get the provided Kallisto version.

get_bustools_version(→ Optional[Tuple[int, int, int]])

Get the provided Bustools version.

parse_technologies(→ Set[str])

Parse a list of strings into a set of supported technologies.

get_supported_technologies(→ Set[str])

Runs 'kallisto bus --list' to fetch a list of supported technologies.

whitelist_provided(→ bool)

Determine whether or not the whitelist for a technology is provided.

move_file(→ str)

Move a file from source to destination, overwriting the file if the destination exists.

copy_whitelist(→ str)

Copies provided whitelist for specified technology.

create_10x_feature_barcode_map(→ str)

Create a feature-barcode map for the 10x Feature Barcoding technology.

stream_file(→ str)

Creates a FIFO file to use for piping remote files into processes.

read_t2g(→ Dict[str, Tuple[str, ...]])

Given a transcript-to-gene mapping path, read it into a dictionary.

collapse_anndata(→ anndata.AnnData)

Collapse the given Anndata by summing duplicate rows.

import_tcc_matrix_as_anndata(→ anndata.AnnData)

Import a TCC matrix as an Anndata object.

import_matrix_as_anndata(→ anndata.AnnData)

Import a matrix as an Anndata object.

overlay_anndatas(→ anndata.AnnData)

'Overlays' anndata objects by taking the intersection of the obs and var of each anndata.

sum_anndatas(→ anndata.AnnData)

Sum the counts in two anndata objects by taking the intersection of both matrices and adding the values together.

restore_cwd(→ Callable)

Function decorator to decorate functions that change the current working directory.

Attributes

TECHNOLOGY_PARSER

VERSION_PARSER

open_as_text

decompress_gzip

compress_gzip

concatenate_files

download_file

get_temporary_filename

kb_python.utils.TECHNOLOGY_PARSER
kb_python.utils.VERSION_PARSER
kb_python.utils.open_as_text
kb_python.utils.decompress_gzip
kb_python.utils.compress_gzip
kb_python.utils.concatenate_files
kb_python.utils.download_file
kb_python.utils.get_temporary_filename
kb_python.utils.update_filename(filename: str, code: str) → str

Update the provided path with the specified code.

For instance, if the path is ‘output.bus’ and code is s (for sort), this function returns output.s.bus.

Parameters
  • filename – Name of the file (not a full path)

  • code – code to append to filename

Returns

Path updated with provided code
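The behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `update_filename_sketch` is a hypothetical name, and it assumes the code is inserted just before the final extension, which matches the output.bus example.

```python
import os


def update_filename_sketch(filename: str, code: str) -> str:
    """Insert `code` between the base name and the final extension.

    Hypothetical stand-in for illustration only.
    """
    # os.path.splitext("output.bus") returns ("output", ".bus")
    name, extension = os.path.splitext(filename)
    return f"{name}.{code}{extension}"


print(update_filename_sketch("output.bus", "s"))  # output.s.bus
```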

kb_python.utils.make_directory(path: str)

Quietly make the specified directory (and any subdirectories).

This function is a wrapper around os.makedirs. It is used so that the appropriate mkdir command can be printed for dry runs.

Parameters

path – Path to directory to make

kb_python.utils.remove_directory(path: str)

Quietly remove the specified directory (and any subdirectories).

This function is a wrapper around shutil.rmtree. It is used so that the appropriate rm command can be printed for dry runs.

Parameters

path – Path to directory to remove

kb_python.utils.run_executable(command: List[str], stdin: Optional[int] = None, stdout: int = sp.PIPE, stderr: int = sp.PIPE, wait: bool = True, stream: bool = True, quiet: bool = False, returncode: int = 0, alias: bool = True, record: bool = True) → Union[Tuple[subprocess.Popen, str, str], subprocess.Popen]

Execute a single shell command.

Parameters
  • command – A list representing a single shell command

  • stdin – Object to pass into the stdin argument for subprocess.Popen, defaults to None

  • stdout – Object to pass into the stdout argument for subprocess.Popen, defaults to subprocess.PIPE

  • stderr – Object to pass into the stderr argument for subprocess.Popen, defaults to subprocess.PIPE

  • wait – Whether to wait until the command has finished, defaults to True

  • stream – Whether to stream the output to the command line, defaults to True

  • quiet – Whether to suppress all command-line output and skip the return code check, defaults to False

  • returncode – The return code expected if the command runs as intended, defaults to 0

  • alias – Whether to use the basename of the first element of command, defaults to True

  • record – Whether to record the call statistics, defaults to True

Returns

(the spawned process, list of strings printed to stdout, list of strings printed to stderr) if wait=True. Otherwise, the spawned process
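The wait=True case can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the library's code: `run_command_sketch` is a hypothetical helper that covers only the command and returncode parameters and omits streaming, aliasing, and call recording.

```python
import subprocess as sp
import sys
from typing import List, Tuple


def run_command_sketch(
    command: List[str], returncode: int = 0
) -> Tuple[sp.Popen, List[str], List[str]]:
    """Run `command`, wait for it, and return (process, stdout lines, stderr lines).

    Hypothetical simplification for illustration only.
    """
    process = sp.Popen(
        command, stdout=sp.PIPE, stderr=sp.PIPE, universal_newlines=True
    )
    out, err = process.communicate()  # blocks until the command finishes
    if process.returncode != returncode:
        raise sp.CalledProcessError(
            process.returncode, command, output=out, stderr=err
        )
    return process, out.splitlines(), err.splitlines()


# The command is a list of arguments, not a single shell string.
process, stdout_lines, stderr_lines = run_command_sketch(
    [sys.executable, "-c", "print('hello')"]
)
```

Passing the command as a list avoids shell quoting issues, which is presumably why the documented signature takes List[str].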

kb_python.utils.get_kallisto_version() → Optional[Tuple[int, int, int]]

Get the provided Kallisto version.

This function parses the help text by executing the included Kallisto binary.

Returns

Major, minor, patch versions
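Turning help text into a (major, minor, patch) tuple might look like the sketch below. The regular expression and the sample help text are assumptions made for illustration. The actual pattern lives in VERSION_PARSER and is not shown in this reference.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern; the library's VERSION_PARSER may differ.
VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version_sketch(text: str) -> Optional[Tuple[int, int, int]]:
    """Return the first major.minor.patch triple found in `text`, or None."""
    match = VERSION_PATTERN.search(text)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


# Made-up help text used only to exercise the parser.
version = parse_version_sketch("sometool 1.2.3\nUsage: sometool <CMD>")
```

Returning None when nothing matches mirrors the Optional return type in the signature.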

kb_python.utils.get_bustools_version() → Optional[Tuple[int, int, int]]

Get the provided Bustools version.

This function parses the help text by executing the included Bustools binary.

Returns

Major, minor, patch versions

kb_python.utils.parse_technologies(lines: List[str]) → Set[str]

Parse a list of strings into a set of supported technologies.

This function parses the technologies printed by running kallisto bus --list.

Parameters

lines – The output of kallisto bus --list split into lines

Returns

Set of technologies

kb_python.utils.get_supported_technologies() → Set[str]

Runs ‘kallisto bus --list’ to fetch a list of supported technologies.

Returns

Set of technologies

kb_python.utils.whitelist_provided(technology: str) → bool

Determine whether or not the whitelist for a technology is provided.

Parameters

technology – The name of the technology

Returns

Whether the whitelist is provided

kb_python.utils.move_file(source: str, destination: str) → str

Move a file from source to destination, overwriting the file if the destination exists.

Parameters
  • source – Path to source file

  • destination – Path to destination

Returns

Path to moved file
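A move with overwrite semantics can be sketched with the standard library as shown below. `move_file_sketch` is a hypothetical stand-in. It assumes the destination is a file path rather than a directory, which the description above does not spell out.

```python
import os
import shutil
import tempfile


def move_file_sketch(source: str, destination: str) -> str:
    """Move `source` to `destination`, replacing an existing destination file.

    Hypothetical stand-in for illustration only.
    """
    if os.path.isfile(destination):
        os.remove(destination)  # make the overwrite explicit on every platform
    shutil.move(source, destination)
    return destination


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.txt")
    dst = os.path.join(tmp, "b.txt")
    with open(src, "w") as f:
        f.write("new")
    with open(dst, "w") as f:
        f.write("old")
    moved = move_file_sketch(src, dst)
    with open(moved) as f:
        content = f.read()
    source_still_exists = os.path.exists(src)
```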

kb_python.utils.copy_whitelist(technology: str, out_dir: str) → str

Copies provided whitelist for specified technology.

Parameters
  • technology – The name of the technology

  • out_dir – Directory to put the whitelist

Returns

Path to whitelist

kb_python.utils.create_10x_feature_barcode_map(out_path: str) → str

Create a feature-barcode map for the 10x Feature Barcoding technology.

Parameters

out_path – Path to the output mapping file

Returns

Path to map

kb_python.utils.stream_file(url: str, path: str) → str

Creates a FIFO file to use for piping remote files into processes.

This function spawns a new thread to download the remote file into a FIFO file object. FIFO file objects are only supported on unix systems.

Parameters
  • url – Url to the file

  • path – Path to place FIFO file

Returns

Path to FIFO file

Raises

UnsupportedOSError – If the OS is Windows
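The FIFO-plus-thread pattern described above can be illustrated without a network connection. In this sketch, which runs only on unix systems, a background thread writes in-memory bytes into the FIFO instead of downloading from a URL. `stream_bytes_sketch` is a hypothetical name, and a plain OSError stands in for the library's UnsupportedOSError.

```python
import os
import tempfile
import threading


def stream_bytes_sketch(data: bytes, path: str) -> str:
    """Create a FIFO at `path` and feed `data` into it from a background thread.

    Hypothetical stand-in: the documented function downloads from a URL instead.
    """
    if not hasattr(os, "mkfifo"):
        raise OSError("FIFO files are not supported on this platform")
    os.mkfifo(path)

    def writer() -> None:
        # Opening a FIFO for writing blocks until a reader opens the other end.
        with open(path, "wb") as fifo:
            fifo.write(data)

    threading.Thread(target=writer, daemon=True).start()
    return path


with tempfile.TemporaryDirectory() as tmp:
    fifo_path = stream_bytes_sketch(b"ACGT\n", os.path.join(tmp, "reads.fifo"))
    # A consuming process would read the FIFO like an ordinary file.
    with open(fifo_path, "rb") as f:
        received = f.read()
```

Because the data flows through the FIFO as it is written, the consumer never needs the whole file on disk.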

kb_python.utils.read_t2g(t2g_path: str) → Dict[str, Tuple[str, ...]]

Given a transcript-to-gene mapping path, read it into a dictionary. The first column is always assumed to be the transcript IDs.

Parameters

t2g_path – Path to t2g

Returns

Dictionary containing transcript IDs as keys and all other columns as a tuple as values
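The returned structure can be illustrated with a small reader. This sketch assumes a tab-separated file, which this reference does not state. `read_t2g_sketch` and the sample IDs are hypothetical.

```python
import os
import tempfile
from typing import Dict, Tuple


def read_t2g_sketch(t2g_path: str) -> Dict[str, Tuple[str, ...]]:
    """Map the first column to a tuple of the remaining columns.

    Hypothetical stand-in; assumes tab-separated columns.
    """
    t2g: Dict[str, Tuple[str, ...]] = {}
    with open(t2g_path, "r") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            transcript, *rest = line.split("\t")
            t2g[transcript] = tuple(rest)
    return t2g


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "t2g.txt")
    with open(path, "w") as f:
        # Made-up transcript IDs, gene IDs, and gene names.
        f.write("tx1\tgeneA\tNameA\ntx2\tgeneB\tNameB\n")
    mapping = read_t2g_sketch(path)
```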

kb_python.utils.collapse_anndata(adata: anndata.AnnData, by: Optional[str] = None) → anndata.AnnData

Collapse the given Anndata by summing duplicate rows. The by argument specifies which column to use. If not provided, the index is used.

Note

This function also collapses any existing layers. Additionally, the returned AnnData will have the values used to collapse as the index.

Parameters
  • adata – The Anndata to collapse

  • by – The column to collapse by. If not provided, the index is used. When this column contains missing values (i.e. nan or None), these columns are removed.

Returns

A new collapsed Anndata object. All matrices are sparse, regardless of whether or not they were in the input Anndata.

kb_python.utils.import_tcc_matrix_as_anndata(matrix_path: str, barcodes_path: str, ec_path: str, txnames_path: str, threads: int = 8) → anndata.AnnData

Import a TCC matrix as an Anndata object.

Parameters
  • matrix_path – Path to the matrix ec file

  • barcodes_path – Path to the barcodes txt file

  • ec_path – Path to the ec txt file

  • txnames_path – Path to transcripts.txt generated by kallisto bus

  • threads – Number of threads to use, defaults to 8

Returns

A new Anndata object

kb_python.utils.import_matrix_as_anndata(matrix_path: str, barcodes_path: str, genes_path: str, t2g_path: Optional[str] = None, name: str = 'gene', by_name: bool = False) → anndata.AnnData

Import a matrix as an Anndata object.

Parameters
  • matrix_path – Path to the matrix ec file

  • barcodes_path – Path to the barcodes txt file

  • genes_path – Path to the genes txt file

  • t2g_path – Path to transcript-to-gene mapping. If this is provided, the third column of the mapping is appended to the anndata var, defaults to None

  • name – Name of the columns, defaults to “gene”

  • by_name – Aggregate counts by name instead of ID. t2g_path must be provided and contain names.

Returns

A new Anndata object

kb_python.utils.overlay_anndatas(adata_spliced: anndata.AnnData, adata_unspliced: anndata.AnnData) → anndata.AnnData

‘Overlays’ anndata objects by taking the intersection of the obs and var of each anndata.

Note

Matrices generated by kallisto | bustools always contain all genes, even if they have zero counts. Therefore, taking the intersection is not entirely necessary but is done as a sanity check.

Parameters
  • adata_spliced – An Anndata object

  • adata_unspliced – An Anndata object

Returns

A new Anndata object

kb_python.utils.sum_anndatas(adata_spliced: anndata.AnnData, adata_unspliced: anndata.AnnData) → anndata.AnnData

Sum the counts in two anndata objects by taking the intersection of both matrices and adding the values together.

Note

Matrices generated by kallisto | bustools always contain all genes, even if they have zero counts. Therefore, taking the intersection is not entirely necessary but is done as a sanity check.

Parameters
  • adata_spliced – An Anndata object

  • adata_unspliced – An Anndata object

Returns

A new Anndata object

kb_python.utils.restore_cwd(func: Callable) → Callable

Function decorator to decorate functions that change the current working directory. When such a function is decorated with this function, the current working directory is restored to its previous state when the function exits.
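A decorator with this behaviour can be sketched in a few lines. `restore_cwd_sketch` and `work_in` are hypothetical names used for illustration, and the library's own implementation may differ in detail.

```python
import functools
import os
import tempfile
from typing import Callable


def restore_cwd_sketch(func: Callable) -> Callable:
    """Restore the working directory after `func` returns or raises."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        old_cwd = os.getcwd()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on normal return and on exceptions alike.
            os.chdir(old_cwd)

    return wrapper


@restore_cwd_sketch
def work_in(directory: str) -> str:
    os.chdir(directory)
    return os.getcwd()


before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    inside = work_in(tmp)
after = os.getcwd()
```

The try/finally block is the key design choice: the original directory is restored even when the decorated function raises.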