kb_python.count

Module Contents

kb_python.count.logger
kb_python.count.kallisto_bus(fastqs, index_path, technology, out_dir, threads=8)

Runs kallisto bus.

Parameters:
  • fastqs (list) – list of FASTQ file paths
  • index_path (str) – path to kallisto index
  • technology (str) – single-cell technology used
  • out_dir (str) – path to output directory
  • threads (int, optional) – number of threads to use, defaults to 8
Returns:

dictionary containing paths to generated files

Return type:

dict
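
As a rough illustration of how these parameters relate to the underlying tool, the sketch below assembles a kallisto bus command line without running it. The helper name build_kallisto_bus_command and the example paths are hypothetical and not part of kb_python; the flags used (-i, -o, -x, -t) are those of the kallisto CLI, and the actual wrapper may build or execute the command differently.

```python
def build_kallisto_bus_command(fastqs, index_path, technology, out_dir, threads=8):
    """Assemble (but do not run) a kallisto bus command as an argument list.

    Hypothetical helper for illustration only; not part of kb_python.
    """
    if not fastqs:
        raise ValueError("at least one FASTQ path is required")
    command = [
        "kallisto", "bus",
        "-i", index_path,    # kallisto index
        "-o", out_dir,       # output directory
        "-x", technology,    # single-cell technology string
        "-t", str(threads),  # number of threads
    ]
    # FASTQ files are passed as positional arguments at the end.
    command.extend(fastqs)
    return command


command = build_kallisto_bus_command(
    ["R1.fastq.gz", "R2.fastq.gz"], "index.idx", "10xv2", "out"
)
print(" ".join(command))
```

Returning a list rather than a single string keeps each argument separate, which is the form subprocess.run expects when no shell is involved.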

kb_python.count.bustools_sort(bus_path, out_path, temp_dir='tmp', threads=8, memory='4G')

Runs bustools sort.

Parameters:
  • bus_path (str) – path to BUS file to sort
  • out_path (str) – path to output sorted BUS file
  • temp_dir (str, optional) – path to temporary directory, defaults to tmp
  • threads (int, optional) – number of threads to use, defaults to 8
  • memory (str, optional) – amount of memory to use, defaults to 4G
Returns:

dictionary containing path to sorted BUS file

Return type:

dict

kb_python.count.bustools_inspect(bus_path, out_path, whitelist_path, ecmap_path)

Runs bustools inspect.

Parameters:
  • bus_path (str) – path to BUS file to inspect
  • out_path (str) – path to output inspect JSON file
  • whitelist_path (str) – path to whitelist
  • ecmap_path (str) – path to ecmap file, as generated by kallisto bus
Returns:

dictionary containing path to generated inspect JSON file

Return type:

dict

kb_python.count.bustools_correct(bus_path, out_path, whitelist_path)

Runs bustools correct.

Parameters:
  • bus_path (str) – path to BUS file to correct
  • out_path (str) – path to output corrected BUS file
  • whitelist_path (str) – path to whitelist
Returns:

dictionary containing path to corrected BUS file

Return type:

dict
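
The sort and correct steps are typically chained: filter_with_bustools, for instance, expects a BUS file that has been sorted, corrected, and sorted again. The sketch below plans that sequence as three command lists without executing anything. The function name plan_sort_correct_sort and the intermediate file names are hypothetical; the flags follow the bustools CLI (-o, -T, -t, -m for sort and -o, -w for correct), and kb_python itself may name files or order options differently.

```python
import os


def plan_sort_correct_sort(bus_path, whitelist_path, out_dir,
                           temp_dir="tmp", threads=8, memory="4G"):
    """Return the three bustools commands for a sort -> correct -> sort pass.

    Hypothetical helper for illustration only; nothing is executed.
    """
    # Intermediate file names are made up for this sketch.
    sorted_path = os.path.join(out_dir, "output.s.bus")
    corrected_path = os.path.join(out_dir, "output.s.c.bus")
    final_path = os.path.join(out_dir, "output.s.c.s.bus")

    def sort_command(source, target):
        return [
            "bustools", "sort",
            "-o", target,
            "-T", temp_dir,
            "-t", str(threads),
            "-m", memory,
            source,
        ]

    return [
        sort_command(bus_path, sorted_path),
        ["bustools", "correct", "-o", corrected_path, "-w", whitelist_path, sorted_path],
        sort_command(corrected_path, final_path),
    ]


steps = plan_sort_correct_sort("out/output.bus", "whitelist.txt", "out")
for step in steps:
    print(" ".join(step))
```

Each step consumes the file written by the previous one, which mirrors how the dictionaries returned by the wrappers can be passed along from one call to the next.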

kb_python.count.bustools_count(bus_path, out_prefix, t2g_path, ecmap_path, txnames_path)

Runs bustools count.

Parameters:
  • bus_path (str) – path to BUS file to count
  • out_prefix (str) – prefix of the output files to generate
  • t2g_path (str) – path to transcript-to-gene mapping
  • ecmap_path (str) – path to ecmap file, as generated by kallisto bus
  • txnames_path (str) – path to transcript names file, as generated by kallisto bus
Returns:

dictionary containing paths to generated count files

Return type:

dict

kb_python.count.bustools_capture(bus_path, out_path, capture_path, ecmap_path, txnames_path, capture_type='transcripts')

Runs bustools capture.

Parameters:
  • bus_path (str) – path to BUS file to capture
  • out_path (str) – path to BUS file to generate
  • capture_path (str) – path to transcripts-to-capture list
  • ecmap_path (str) – path to ecmap file, as generated by kallisto bus
  • txnames_path (str) – path to transcript names file, as generated by kallisto bus
  • capture_type (str, optional) – the type of information in the capture list; one of transcripts, umis, barcode, defaults to transcripts
Returns:

dictionary containing path to generated BUS file

Return type:

dict
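
Because capture_type accepts only three values, a caller may want to validate it before invoking the wrapper. The sketch below shows one way to do that and to map the value onto a command-line flag. The names VALID_CAPTURE_TYPES and capture_type_flag are hypothetical, and the mapping to --transcripts, --umis and --barcode is an assumption based on the bustools capture CLI rather than on the kb_python source.

```python
# Allowed values, as listed in the parameter description above.
VALID_CAPTURE_TYPES = ("transcripts", "umis", "barcode")


def capture_type_flag(capture_type="transcripts"):
    """Validate capture_type and return the matching long option.

    Hypothetical helper; the flag mapping is an assumption about the
    bustools capture command line.
    """
    if capture_type not in VALID_CAPTURE_TYPES:
        raise ValueError(
            "capture_type must be one of {}, got {!r}".format(
                ", ".join(VALID_CAPTURE_TYPES), capture_type
            )
        )
    return "--" + capture_type


print(capture_type_flag())        # default capture type
print(capture_type_flag("umis"))
```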

kb_python.count.bustools_whitelist(bus_path, out_path)

Runs bustools whitelist.

Parameters:
  • bus_path (str) – path to BUS file to generate the whitelist from
  • out_path (str) – path to output whitelist
Returns:

dictionary containing path to generated whitelist

Return type:

dict

kb_python.count.filter_with_bustools(bus_path, ecmap_path, txnames_path, t2g_path, whitelist_path, filtered_bus_path, counts_prefix=None, temp_dir='tmp', threads=8, memory='4G', count=True, loom=False, h5ad=False)

Generate filtered count matrices with bustools.

Parameters:
  • bus_path (str) – path to BUS file that has been sorted, corrected, and sorted again
  • ecmap_path (str) – path to matrix ec file
  • txnames_path (str) – path to list of transcripts
  • t2g_path (str) – path to transcript-to-gene mapping
  • whitelist_path (str) – path to filter whitelist to generate
  • filtered_bus_path (str) – path to filtered BUS file to generate
  • counts_prefix (str, optional) – prefix of count matrix, defaults to None
  • temp_dir (str, optional) – path to temporary directory, defaults to tmp
  • threads (int, optional) – number of threads to use, defaults to 8
  • memory (str, optional) – amount of memory to use, defaults to 4G
  • count (bool, optional) – whether to run bustools count, defaults to True
  • loom (bool, optional) – whether to convert the final count matrix into a loom file, defaults to False
  • h5ad (bool, optional) – whether to convert the final count matrix into a h5ad file, defaults to False
Returns:

dictionary of generated files

Return type:

dict

kb_python.count.stream_fastqs(fastqs, temp_dir='tmp')

Given a list of FASTQs (local or remote paths), streams any remote files. Internally, this calls into utils.

Parameters:
  • fastqs (list) – list of (remote or local) FASTQ paths
  • temp_dir (str, optional) – temporary directory, defaults to tmp
Returns:

all remote paths substituted with a local path

Return type:

list
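
The substitution described above can be sketched as follows. This is a minimal illustration, not the kb_python implementation: is_remote, substitute_remote_paths, the set of URL schemes treated as remote, and the stream callback are all assumptions introduced here, with the callback standing in for whatever actually streams the file.

```python
from urllib.parse import urlparse

# Schemes treated as remote in this sketch (an assumption).
REMOTE_SCHEMES = {"http", "https", "ftp"}


def is_remote(path):
    """Return True if the path looks like a URL with a remote scheme."""
    return urlparse(path).scheme in REMOTE_SCHEMES


def substitute_remote_paths(fastqs, stream, temp_dir="tmp"):
    """Replace each remote path with the local path returned by stream.

    Local paths are passed through unchanged and the input order is kept.
    """
    return [
        stream(path, temp_dir) if is_remote(path) else path
        for path in fastqs
    ]


def fake_stream(url, temp_dir):
    # Stand-in for real streaming: only computes a local path.
    return temp_dir + "/" + url.rsplit("/", 1)[-1]


fastqs = ["https://example.com/reads/R1.fastq.gz", "data/R2.fastq.gz"]
local_fastqs = substitute_remote_paths(fastqs, fake_stream)
print(local_fastqs)
```

Passing the streaming step in as a callback keeps the path logic testable without any network access.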

kb_python.count.copy_or_create_whitelist(technology, bus_path, out_dir)

Copies a pre-packaged whitelist if it is provided. Otherwise, runs bustools whitelist to generate a whitelist.

Parameters:
  • technology (str) – single-cell technology used
  • bus_path (str) – path to BUS file to generate the whitelist from
  • out_dir (str) – path to output directory
Returns:

path to copied or generated whitelist

Return type:

str
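
The copy-or-generate decision can be illustrated with a small sketch. Everything named here is hypothetical: copy_or_create, the packaged mapping from technology to whitelist file, the whitelist.txt file name, and the generate callback that stands in for running bustools whitelist. The real function may locate pre-packaged whitelists and name its output differently.

```python
import os
import shutil
import tempfile


def copy_or_create(technology, packaged, out_dir, generate):
    """Copy a packaged whitelist if one exists, otherwise generate one.

    Hypothetical helper: packaged maps technology names to whitelist
    paths, and generate(target) writes a new whitelist to target.
    """
    os.makedirs(out_dir, exist_ok=True)
    target = os.path.join(out_dir, "whitelist.txt")
    source = packaged.get(technology)
    if source is not None:
        shutil.copyfile(source, target)
    else:
        generate(target)
    return target


def fake_generate(path):
    # Stand-in for running bustools whitelist.
    with open(path, "w") as handle:
        handle.write("GENERATED\n")


# Demonstration in a throwaway directory.
work_dir = tempfile.mkdtemp()
packaged_path = os.path.join(work_dir, "packaged.txt")
with open(packaged_path, "w") as handle:
    handle.write("PACKAGED\n")
packaged = {"10xv2": packaged_path}

copied = copy_or_create("10xv2", packaged, os.path.join(work_dir, "a"), fake_generate)
generated = copy_or_create("custom", packaged, os.path.join(work_dir, "b"), fake_generate)
print(copied, generated)
```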

kb_python.count.convert_matrix_to_loom(matrix_path, barcodes_path, genes_path, out_path)

Converts a matrix to loom.

Parameters:
  • matrix_path (str) – path to matrix mtx file
  • barcodes_path (str) – path to list of barcodes
  • genes_path (str) – path to list of genes
  • out_path (str) – path to output loom file
Returns:

path to loom file

Return type:

str

kb_python.count.convert_matrix_to_h5ad(matrix_path, barcodes_path, genes_path, out_path)

Converts a matrix to h5ad.

Parameters:
  • matrix_path (str) – path to matrix mtx file
  • barcodes_path (str) – path to list of barcodes
  • genes_path (str) – path to list of genes
  • out_path (str) – path to output h5ad file
Returns:

path to h5ad file

Return type:

str

kb_python.count.count(index_path, t2g_path, technology, out_dir, fastqs, whitelist_path=None, filter=None, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False)

Generates count matrices for single-cell RNA-seq.

Parameters:
  • index_path (str) – path to kallisto index
  • t2g_path (str) – path to transcript-to-gene mapping
  • technology (str) – single-cell technology used
  • out_dir (str) – path to output directory
  • fastqs (list) – list of FASTQ file paths
  • whitelist_path (str, optional) – path to whitelist, defaults to None
  • filter (str, optional) – filter to use to generate a filtered count matrix, defaults to None
  • temp_dir (str, optional) – path to temporary directory, defaults to tmp
  • threads (int, optional) – number of threads to use, defaults to 8
  • memory (str, optional) – amount of memory to use, defaults to 4G
  • overwrite (bool, optional) – overwrite existing output files, defaults to False
  • loom (bool, optional) – whether to convert the final count matrix into a loom file, defaults to False
  • h5ad (bool, optional) – whether to convert the final count matrix into a h5ad file, defaults to False
Returns:

dictionary of generated files

Return type:

dict
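
A call to this function might look like the sketch below, which follows the signature documented above. The file paths are placeholders, the technology string 10xv2 is an assumption about what the installed kallisto accepts, and run_count is a hypothetical wrapper that imports kb_python lazily so the keyword arguments can be inspected without the package installed.

```python
# Keyword arguments matching the documented signature; all paths are placeholders.
count_kwargs = dict(
    index_path="index.idx",
    t2g_path="t2g.txt",
    technology="10xv2",
    out_dir="out",
    fastqs=["R1.fastq.gz", "R2.fastq.gz"],
    threads=4,
    memory="4G",
    h5ad=True,  # also convert the final count matrix to h5ad
)


def run_count(kwargs):
    """Call kb_python.count.count with the given keyword arguments.

    Hypothetical wrapper; requires kb_python, kallisto and bustools
    to be installed, and real input files to exist.
    """
    from kb_python.count import count  # imported lazily on purpose

    return count(**kwargs)


print(sorted(count_kwargs))
```

Calling run_count(count_kwargs) would then return the dictionary of generated files described above.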

kb_python.count.count_velocity(index_path, t2g_path, cdna_t2c_path, intron_t2c_path, technology, out_dir, fastqs, whitelist_path=None, filter=None, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False, nucleus=False)

Generates RNA velocity matrices for single-cell RNA-seq.

Parameters:
  • index_path (str) – path to kallisto index
  • t2g_path (str) – path to transcript-to-gene mapping
  • cdna_t2c_path (str) – path to cDNA transcripts-to-capture file
  • intron_t2c_path (str) – path to intron transcripts-to-capture file
  • technology (str) – single-cell technology used
  • out_dir (str) – path to output directory
  • fastqs (list) – list of FASTQ file paths
  • whitelist_path (str, optional) – path to whitelist, defaults to None
  • filter (str, optional) – filter to use to generate a filtered count matrix, defaults to None
  • temp_dir (str, optional) – path to temporary directory, defaults to tmp
  • threads (int, optional) – number of threads to use, defaults to 8
  • memory (str, optional) – amount of memory to use, defaults to 4G
  • overwrite (bool, optional) – overwrite existing output files, defaults to False
  • loom (bool, optional) – whether to convert the final count matrix into a loom file, defaults to False
  • h5ad (bool, optional) – whether to convert the final count matrix into a h5ad file, defaults to False
  • nucleus (bool, optional) – whether this is a single-nucleus experiment; if True, the spliced and unspliced count matrices will be summed, defaults to False
Returns:

dictionary of generated files

Return type:

dict