`kb_python.count`

Module Contents

Functions

`kallisto_pseudo`(batch_path, index_path, out_dir, threads=8)	Runs kallisto pseudo.
`kallisto_bus`(fastqs, index_path, technology, out_dir, threads=8, n=False, k=False)	Runs kallisto bus.
`kallisto_bus_split`(fastqs, index_paths, technology, out_dir, temp_dir='tmp', threads=8, memory='4G')	Runs kallisto bus with split indices.
`bustools_mash`(out_dirs, out_dir)	Runs bustools mash. Additionally, combines the `run_info.json`s into one.
`bustools_merge`(bus_path, out_dir, ecmap_path, txnames_path)	Runs bustools merge.
`bustools_project`(bus_path, out_path, map_path, ecmap_path, txnames_path)	Runs bustools project.
`bustools_sort`(bus_path, out_path, temp_dir='tmp', threads=8, memory='4G', flags=False)	Runs bustools sort.
`bustools_inspect`(bus_path, out_path, whitelist_path, ecmap_path)	Runs bustools inspect.
`bustools_correct`(bus_path, out_path, whitelist_path)	Runs bustools correct.
`bustools_count`(bus_path, out_prefix, t2g_path, ecmap_path, txnames_path, tcc=False, mm=False)	Runs bustools count.
`bustools_capture`(bus_path, out_path, capture_path, ecmap_path, txnames_path, capture_type='transcripts')	Runs bustools capture.
`bustools_whitelist`(bus_path, out_path)	Runs bustools whitelist.
`write_smartseq_batch`(pairs_1, pairs_2, out_path)	Write a 3-column TSV specifying batch information for Smart-seq reads.
`matrix_to_cellranger`(matrix_path, barcodes_path, genes_path, t2g_path, out_dir)	Convert bustools count matrix to cellranger-format matrix.
`convert_matrix`(counts_dir, matrix_path, barcodes_path, genes_path=None, ec_path=None, t2g_path=None, txnames_path=None, name='gene', loom=False, h5ad=False, tcc=False, threads=8)	Convert a gene count or TCC matrix to loom or h5ad.
`convert_matrices`(counts_dir, matrix_paths, barcodes_paths, genes_paths=None, ec_paths=None, t2g_path=None, txnames_path=None, name='gene', loom=False, h5ad=False, nucleus=False, tcc=False, threads=8)	Convert a gene count or TCC matrix to loom or h5ad.
`filter_with_bustools`(bus_path, ecmap_path, txnames_path, t2g_path, whitelist_path, filtered_bus_path, counts_prefix=None, tcc=False, mm=False, kite=False, temp_dir='tmp', threads=8, memory='4G', count=True, loom=False, h5ad=False, cellranger=False)	Generate filtered count matrices with bustools.
`stream_fastqs`(fastqs, temp_dir='tmp')	Given a list of fastqs (that may be local or remote paths), stream any remote files.
`copy_or_create_whitelist`(technology, bus_path, out_dir)	Copies a pre-packaged whitelist if it is provided. Otherwise, runs bustools whitelist to generate a whitelist.
`count`(index_paths, t2g_path, technology, out_dir, fastqs, whitelist_path=None, tcc=False, mm=False, filter=None, kite=False, FB=False, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False, cellranger=False, inspect=True, report=False)	Generates count matrices for single-cell RNA seq.
`count_smartseq`(index_paths, t2g_path, technology, out_dir, fastqs, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False)	Generates gene or isoform count matrices from Smart-seq reads.
`count_velocity`(index_paths, t2g_path, cdna_t2c_path, intron_t2c_path, technology, out_dir, fastqs, whitelist_path=None, tcc=False, mm=False, filter=None, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False, cellranger=False, report=False, inspect=True, nucleus=False)	Generates RNA velocity matrices for single-cell RNA seq.

kb_python.count.logger

kb_python.count.INSPECT_PARSER

kb_python.count.kallisto_pseudo(batch_path, index_path, out_dir, threads=8)

Runs kallisto pseudo.

Parameters:
- batch_path (str) – path to text file containing batch definitions
- index_path (str) – path to kallisto index
- out_dir (str) – path to output directory
- threads (int, optional) – number of threads to use, defaults to 8
Returns:	dictionary containing output files
Return type:	dict

kb_python.count.kallisto_bus(fastqs, index_path, technology, out_dir, threads=8, n=False, k=False)

Runs kallisto bus.

Parameters:
- fastqs (list) – list of FASTQ file paths
- index_path (str) – path to kallisto index
- technology (str) – single-cell technology used
- out_dir (str) – path to output directory
- threads (int, optional) – number of threads to use, defaults to 8
- n (bool, optional) – include the read number in the flag column (used when splitting indices), defaults to False
- k (bool, optional) – perform alignment per k-mer (used when splitting indices), defaults to False
Returns:	dictionary containing paths to generated files
Return type:	dict
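
As a rough illustration of how the documented parameters could map onto a command line, the sketch below assembles (but does not run) a `kallisto bus` invocation. It assumes the wrapper shells out to the `kallisto` executable; the helper name `build_kallisto_bus_command` is hypothetical and not part of `kb_python`, and the `n` and `k` options are left out on purpose.

```python
import shlex


def build_kallisto_bus_command(fastqs, index_path, technology, out_dir, threads=8):
    """Assemble a `kallisto bus` command as a list of arguments (hypothetical helper)."""
    command = [
        "kallisto", "bus",
        "-i", index_path,    # kallisto index
        "-o", out_dir,       # output directory
        "-x", technology,    # single-cell technology string
        "-t", str(threads),  # number of threads
    ]
    # FASTQ paths are passed as positional arguments after the options.
    command += list(fastqs)
    return command


cmd = build_kallisto_bus_command(
    ["R1.fastq.gz", "R2.fastq.gz"], "index.idx", "10xv2", "out"
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps paths containing spaces intact if the list is later handed to `subprocess.run`.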

kb_python.count.kallisto_bus_split(fastqs, index_paths, technology, out_dir, temp_dir='tmp', threads=8, memory='4G')

Runs kallisto bus with split indices.

Parameters:
- fastqs (list) – list of FASTQ file paths or URLs
- index_paths (list) – paths to kallisto indices
- technology (str) – single-cell technology used
- out_dir (str) – path to output directory
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
- threads (int, optional) – number of threads to use, defaults to 8
- memory (str, optional) – amount of memory to use, defaults to 4G
Returns:	dictionary containing paths to generated files
Return type:	dict

kb_python.count.bustools_mash(out_dirs, out_dir)

Runs bustools mash. Additionally, combines the `run_info.json`s into one.

Parameters:
- out_dirs (list) – list of kallisto bus output directories. Note that the BUS files should be sorted by flag
- out_dir (str) – path to output directory
Returns:	dictionary containing paths to generated files
Return type:	dict

kb_python.count.bustools_merge(bus_path, out_dir, ecmap_path, txnames_path)

Runs bustools merge.

Parameters:
- bus_path (str) – path to BUS file to merge
- out_dir (str) – path to output directory, where the merged BUS file and ecmap will be written
- ecmap_path (str) – path to ecmap file, as generated by kallisto bus
- txnames_path (str) – path to transcript names file, as generated by kallisto bus
Returns:	dictionary containing path to generated BUS file and merged ecmap
Return type:	dict

kb_python.count.bustools_project(bus_path, out_path, map_path, ecmap_path, txnames_path)

Runs bustools project.

Parameters:
- bus_path (str) – path to BUS file to project
- out_path (str) – path to output BUS file
- map_path (str) – path to file containing source-to-destination mapping
- ecmap_path (str) – path to ecmap file, as generated by kallisto bus
- txnames_path (str) – path to transcript names file, as generated by kallisto bus
Returns:	dictionary containing path to generated BUS file
Return type:	dict

kb_python.count.bustools_sort(bus_path, out_path, temp_dir='tmp', threads=8, memory='4G', flags=False)

Runs bustools sort.

Parameters:
- bus_path (str) – path to BUS file to sort
- out_path (str) – path to output BUS file
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
- threads (int, optional) – number of threads to use, defaults to 8
- memory (str, optional) – amount of memory to use, defaults to 4G
- flags (bool, optional) – whether to supply the --flags argument to sort, defaults to False
Returns:	dictionary containing path to sorted BUS file
Return type:	dict
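
The following sketch shows one way the documented options could translate into `bustools sort` arguments, including the optional `--flags` switch. It only builds the argument list and does not execute anything; `build_bustools_sort_command` is a hypothetical helper, and the exact command that `kb_python` assembles may differ.

```python
def build_bustools_sort_command(
    bus_path, out_path, temp_dir="tmp", threads=8, memory="4G", flags=False
):
    """Assemble a `bustools sort` command as a list of arguments (hypothetical helper)."""
    command = [
        "bustools", "sort",
        "-o", out_path,      # sorted BUS file to write
        "-T", temp_dir,      # location for temporary files
        "-t", str(threads),  # number of threads
        "-m", memory,        # maximum memory, e.g. "4G"
    ]
    if flags:
        # Sort by the flag column instead of the default ordering.
        command.append("--flags")
    # The input BUS file comes last as a positional argument.
    command.append(bus_path)
    return command


print(build_bustools_sort_command("output.bus", "output.s.bus", flags=True))
```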

kb_python.count.bustools_inspect(bus_path, out_path, whitelist_path, ecmap_path)

Runs bustools inspect.

Parameters:
- bus_path (str) – path to BUS file to inspect
- out_path (str) – path to output inspect JSON file
- whitelist_path (str) – path to whitelist
- ecmap_path (str) – path to ecmap file, as generated by kallisto bus
Returns:	dictionary containing path to generated inspect JSON file
Return type:	dict

kb_python.count.bustools_correct(bus_path, out_path, whitelist_path)

Runs bustools correct.

Parameters:
- bus_path (str) – path to BUS file to correct
- out_path (str) – path to output corrected BUS file
- whitelist_path (str) – path to whitelist
Returns:	dictionary containing path to corrected BUS file
Return type:	dict

kb_python.count.bustools_count(bus_path, out_prefix, t2g_path, ecmap_path, txnames_path, tcc=False, mm=False)

Runs bustools count.

Parameters:
- bus_path (str) – path to BUS file to count
- out_prefix (str) – prefix of the output files to generate
- t2g_path (str) – path to transcript-to-gene mapping
- ecmap_path (str) – path to ecmap file, as generated by kallisto bus
- txnames_path (str) – path to transcript names file, as generated by kallisto bus
- tcc (bool, optional) – whether to generate a TCC matrix instead of a gene count matrix, defaults to False
- mm (bool, optional) – whether to include BUS records that pseudoalign to multiple genes, defaults to False
Returns:	dictionary containing paths to generated files
Return type:	dict

kb_python.count.bustools_capture(bus_path, out_path, capture_path, ecmap_path, txnames_path, capture_type='transcripts')

Runs bustools capture.

Parameters:
- bus_path (str) – path to BUS file to capture
- out_path (str) – path to BUS file to generate
- capture_path (str) – path to transcripts-to-capture list
- ecmap_path (str) – path to ecmap file, as generated by kallisto bus
- txnames_path (str) – path to transcript names file, as generated by kallisto bus
- capture_type (str, optional) – the type of information in the capture list. Can be one of transcripts, umis, barcode; defaults to transcripts
Returns:	dictionary containing path to generated BUS file
Return type:	dict

kb_python.count.bustools_whitelist(bus_path, out_path)

Runs bustools whitelist.

Parameters:
- bus_path (str) – path to BUS file to generate the whitelist from
- out_path (str) – path to output whitelist
Returns:	dictionary containing path to generated whitelist
Return type:	dict

kb_python.count.write_smartseq_batch(pairs_1, pairs_2, out_path)

Write a 3-column TSV specifying batch information for Smart-seq reads. This file is required to use kallisto pseudo on multiple samples (= cells).

Parameters:
- pairs_1 (list) – list of paths to FASTQs corresponding to pair 1
- pairs_2 (list) – list of paths to FASTQs corresponding to pair 2
- out_path (str) – path to batch file to output
Returns:	dictionary of written batch file
Return type:	dict
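
To make the 3-column layout concrete, here is a minimal sketch of writing such a batch file with the standard library. It assumes one row per cell in the order identifier, read 1 FASTQ, read 2 FASTQ. The helper name `write_batch_tsv`, the index-based `cell_<i>` identifiers, and the `"batch"` dictionary key are illustrative assumptions, not necessarily what `write_smartseq_batch` itself uses.

```python
import csv
import os
import tempfile


def write_batch_tsv(pairs_1, pairs_2, out_path):
    """Write one tab-separated row per cell: id, read 1 FASTQ, read 2 FASTQ."""
    if len(pairs_1) != len(pairs_2):
        raise ValueError("pairs_1 and pairs_2 must have the same length")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for i, (read_1, read_2) in enumerate(zip(pairs_1, pairs_2)):
            # Assumption: each cell is identified by its position in the input lists.
            writer.writerow([f"cell_{i}", read_1, read_2])
    return {"batch": out_path}


with tempfile.TemporaryDirectory() as tmp:
    result = write_batch_tsv(
        ["a_1.fastq.gz", "b_1.fastq.gz"],
        ["a_2.fastq.gz", "b_2.fastq.gz"],
        os.path.join(tmp, "batch.txt"),
    )
    with open(result["batch"]) as f:
        print(f.read(), end="")
```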

kb_python.count.matrix_to_cellranger(matrix_path, barcodes_path, genes_path, t2g_path, out_dir)

Convert bustools count matrix to cellranger-format matrix.

Parameters:
- matrix_path (str) – path to matrix
- barcodes_path (str) – path to barcodes.txt
- genes_path (str) – path to genes.txt
- t2g_path (str) – path to transcript-to-gene mapping
- out_dir (str) – path to output matrix
Returns:	dictionary of matrix files
Return type:	dict

kb_python.count.convert_matrix(counts_dir, matrix_path, barcodes_path, genes_path=None, ec_path=None, t2g_path=None, txnames_path=None, name='gene', loom=False, h5ad=False, tcc=False, threads=8)

Convert a gene count or TCC matrix to loom or h5ad.

Parameters:
- counts_dir (str) – path to counts directory
- matrix_path (str) – path to matrix
- barcodes_path (str) – path to barcodes.txt
- genes_path (str, optional) – path to genes.txt, defaults to None
- ec_path (str, optional) – path to ec.txt, defaults to None
- t2g_path (str, optional) – path to transcript-to-gene mapping. If this is provided, the third column of the mapping is appended to the anndata var, defaults to None
- txnames_path (str, optional) – path to transcripts.txt, defaults to None
- name (str, optional) – name of the columns, defaults to "gene"
- loom (bool, optional) – whether to generate loom file, defaults to False
- h5ad (bool, optional) – whether to generate h5ad file, defaults to False
- tcc (bool, optional) – whether the matrix is a TCC matrix, defaults to False
- threads (int, optional) – number of threads to use, defaults to 8
Returns:	dictionary of generated files
Return type:	dict

kb_python.count.convert_matrices(counts_dir, matrix_paths, barcodes_paths, genes_paths=None, ec_paths=None, t2g_path=None, txnames_path=None, name='gene', loom=False, h5ad=False, nucleus=False, tcc=False, threads=8)

Convert a gene count or TCC matrix to loom or h5ad.

Parameters:
- counts_dir (str) – path to counts directory
- matrix_paths (list) – list of paths to matrices
- barcodes_paths (list) – list of paths to barcodes.txt
- genes_paths (list, optional) – list of paths to genes.txt, defaults to None
- ec_paths (list, optional) – list of paths to ec.txt, defaults to None
- t2g_path (str, optional) – path to transcript-to-gene mapping. If this is provided, the third column of the mapping is appended to the anndata var, defaults to None
- txnames_path (str, optional) – path to transcripts.txt, defaults to None
- name (str, optional) – name of the columns, defaults to "gene"
- loom (bool, optional) – whether to generate loom file, defaults to False
- h5ad (bool, optional) – whether to generate h5ad file, defaults to False
- nucleus (bool, optional) – whether the matrices contain single nucleus counts, defaults to False
- tcc (bool, optional) – whether the matrix is a TCC matrix, defaults to False
- threads (int, optional) – number of threads to use, defaults to 8
Returns:	dictionary of generated files
Return type:	dict

kb_python.count.filter_with_bustools(bus_path, ecmap_path, txnames_path, t2g_path, whitelist_path, filtered_bus_path, counts_prefix=None, tcc=False, mm=False, kite=False, temp_dir='tmp', threads=8, memory='4G', count=True, loom=False, h5ad=False, cellranger=False)

Generate filtered count matrices with bustools.

Parameters:
- bus_path (str) – path to sorted, corrected, and re-sorted BUS file
- ecmap_path (str) – path to matrix ec file
- txnames_path (str) – path to list of transcripts
- t2g_path (str) – path to transcript-to-gene mapping
- whitelist_path (str) – path to filter whitelist to generate
- filtered_bus_path (str) – path to filtered BUS file to generate
- counts_prefix (str, optional) – prefix of count matrix, defaults to None
- tcc (bool, optional) – whether to generate a TCC matrix instead of a gene count matrix, defaults to False
- mm (bool, optional) – whether to include BUS records that pseudoalign to multiple genes, defaults to False
- kite (bool, optional) – whether this is a KITE workflow, defaults to False
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
- threads (int, optional) – number of threads to use, defaults to 8
- memory (str, optional) – amount of memory to use, defaults to 4G
- loom (bool, optional) – whether to convert the final count matrix into a loom file, defaults to False
- h5ad (bool, optional) – whether to convert the final count matrix into a h5ad file, defaults to False
- cellranger (bool, optional) – whether to convert the final count matrix into a cellranger-compatible matrix, defaults to False
Returns:	dictionary of generated files
Return type:	dict

kb_python.count.stream_fastqs(fastqs, temp_dir='tmp')

Given a list of fastqs (that may be local or remote paths), stream any remote files. Internally, calls utils.

Parameters:
- fastqs (list) – list of (remote or local) fastq paths
- temp_dir (str, optional) – temporary directory, defaults to tmp
Returns:	all remote paths substituted with a local path
Return type:	list
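
The substitution described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `is_remote`, `localize_fastqs`, the set of URL schemes, and the caller-supplied `fetch` callback (a stand-in for whatever actually streams the file) are all hypothetical.

```python
import os
from urllib.parse import urlparse

# Assumption: these URL schemes are the ones treated as remote.
REMOTE_SCHEMES = {"http", "https", "ftp"}


def is_remote(path):
    """Return True if the path looks like a URL with a remote scheme."""
    return urlparse(path).scheme in REMOTE_SCHEMES


def localize_fastqs(fastqs, fetch, temp_dir="tmp"):
    """Replace each remote path with the local path returned by `fetch`.

    `fetch(url, target)` is a caller-supplied stand-in that makes the remote
    file available at `target` and returns the local path to use.
    """
    local_paths = []
    for path in fastqs:
        if is_remote(path):
            target = os.path.join(temp_dir, os.path.basename(urlparse(path).path))
            local_paths.append(fetch(path, target))
        else:
            # Local paths are passed through unchanged.
            local_paths.append(path)
    return local_paths


# A fake fetch that only reports where the file would be placed.
print(localize_fastqs(
    ["local_R1.fastq.gz", "https://example.com/data/remote_R2.fastq.gz"],
    fetch=lambda url, target: target,
))
```

The order of the input list is preserved, which matters because read pairs are matched by position.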

kb_python.count.copy_or_create_whitelist(technology, bus_path, out_dir)

Copies a pre-packaged whitelist if it is provided. Otherwise, runs bustools whitelist to generate a whitelist.

Parameters:
- technology (str) – single-cell technology used
- bus_path (str) – path to BUS file to generate the whitelist from
- out_dir (str) – path to output directory
Returns:	path to copied or generated whitelist
Return type:	str
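
The copy-or-generate decision can be illustrated with a small sketch. The helper `copy_or_create`, the `prepackaged` mapping from technology name to whitelist file, the output file name, and the `generate` callback (a stand-in for running bustools whitelist) are assumptions made for this example only.

```python
import os
import shutil
import tempfile


def copy_or_create(technology, prepackaged, out_dir, generate):
    """Copy a pre-packaged whitelist if one exists, otherwise generate one.

    `prepackaged` maps technology names to whitelist file paths, and
    `generate(out_path)` writes a new whitelist to `out_path`.
    """
    # Hypothetical output file name chosen for this sketch.
    out_path = os.path.join(out_dir, f"{technology}_whitelist.txt")
    source = prepackaged.get(technology)
    if source is not None:
        shutil.copyfile(source, out_path)
    else:
        generate(out_path)
    return out_path


def fake_generate(out_path):
    # Stand-in for the whitelist generation step.
    with open(out_path, "w") as f:
        f.write("GGGT\n")


with tempfile.TemporaryDirectory() as tmp:
    packaged = os.path.join(tmp, "packaged.txt")
    with open(packaged, "w") as f:
        f.write("AAAC\n")
    copied = copy_or_create("10xv2", {"10xv2": packaged}, tmp, fake_generate)
    generated = copy_or_create("custom", {"10xv2": packaged}, tmp, fake_generate)
    print(os.path.basename(copied), os.path.basename(generated))
```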

kb_python.count.count(index_paths, t2g_path, technology, out_dir, fastqs, whitelist_path=None, tcc=False, mm=False, filter=None, kite=False, FB=False, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False, cellranger=False, inspect=True, report=False)

Generates count matrices for single-cell RNA seq.

Parameters:
- index_paths (list) – paths to kallisto indices
- t2g_path (str) – path to transcript-to-gene mapping
- technology (str) – single-cell technology used
- out_dir (str) – path to output directory
- fastqs (list) – list of FASTQ file paths
- whitelist_path (str, optional) – path to whitelist, defaults to None
- tcc (bool, optional) – whether to generate a TCC matrix instead of a gene count matrix, defaults to False
- mm (bool, optional) – whether to include BUS records that pseudoalign to multiple genes, defaults to False
- filter (str, optional) – filter to use to generate a filtered count matrix, defaults to None
- kite (bool, optional) – whether this is a KITE workflow, defaults to False
- FB (bool, optional) – whether 10x Genomics Feature Barcoding technology was used, defaults to False
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
- threads (int, optional) – number of threads to use, defaults to 8
- memory (str, optional) – amount of memory to use, defaults to 4G
- overwrite (bool, optional) – overwrite an existing index file, defaults to False
- loom (bool, optional) – whether to convert the final count matrix into a loom file, defaults to False
- h5ad (bool, optional) – whether to convert the final count matrix into a h5ad file, defaults to False
- cellranger (bool, optional) – whether to convert the final count matrix into a cellranger-compatible matrix, defaults to False
- inspect (bool, optional) – whether or not to inspect the output BUS file and generate the inspect.json, defaults to True
- report (bool, optional) – generate an HTML report, defaults to False
Returns:	dictionary containing paths to generated files
Return type:	dict
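
A call using the documented keyword arguments might look like the sketch below. All file names are placeholders, `10xv2` is just one example technology string, and `run_count` is a hypothetical wrapper. The import is deferred so that the snippet can be loaded without `kb_python` installed, but actually calling `run_count` requires `kb_python`, its bundled binaries, and real input files.

```python
# Placeholder inputs; none of these files ship with kb_python.
COUNT_KWARGS = dict(
    index_paths=["index.idx"],
    t2g_path="t2g.txt",
    technology="10xv2",
    out_dir="kb_out",
    fastqs=["sample_R1.fastq.gz", "sample_R2.fastq.gz"],
    threads=4,
    memory="4G",
    h5ad=True,  # also convert the final count matrix to h5ad
)


def run_count(kwargs=COUNT_KWARGS):
    """Call kb_python.count.count with the given keyword arguments."""
    # Imported lazily so that defining this sketch does not require kb_python.
    from kb_python.count import count
    return count(**kwargs)
```

Per the documentation above, the return value is a dictionary, so a caller would typically inspect its entries to locate the output files.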

kb_python.count.count_smartseq(index_paths, t2g_path, technology, out_dir, fastqs, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False)

Generates gene or isoform count matrices from Smart-seq reads.

kb_python.count.count_velocity(index_paths, t2g_path, cdna_t2c_path, intron_t2c_path, technology, out_dir, fastqs, whitelist_path=None, tcc=False, mm=False, filter=None, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False, cellranger=False, report=False, inspect=True, nucleus=False)

Generates RNA velocity matrices for single-cell RNA seq.

Parameters:
- index_paths (list) – paths to kallisto indices
- t2g_path (str) – path to transcript-to-gene mapping
- cdna_t2c_path (str) – path to cDNA transcripts-to-capture file
- intron_t2c_path (str) – path to intron transcripts-to-capture file
- technology (str) – single-cell technology used
- out_dir (str) – path to output directory
- fastqs (list) – list of FASTQ file paths
- whitelist_path (str, optional) – path to whitelist, defaults to None
- tcc (bool, optional) – whether to generate a TCC matrix instead of a gene count matrix, defaults to False
- mm (bool, optional) – whether to include BUS records that pseudoalign to multiple genes, defaults to False
- filter (str, optional) – filter to use to generate a filtered count matrix, defaults to None
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
- threads (int, optional) – number of threads to use, defaults to 8
- memory (str, optional) – amount of memory to use, defaults to 4G
- overwrite (bool, optional) – overwrite an existing index file, defaults to False
- loom (bool, optional) – whether to convert the final count matrix into a loom file, defaults to False
- h5ad (bool, optional) – whether to convert the final count matrix into a h5ad file, defaults to False
- cellranger (bool, optional) – whether to convert the final count matrix into a cellranger-compatible matrix, defaults to False
- report (bool, optional) – generate HTML reports, defaults to False
- inspect (bool, optional) – whether or not to inspect the output BUS file and generate the inspect.json, defaults to True
- nucleus (bool, optional) – whether this is a single-nucleus experiment. If True, the spliced and unspliced count matrices will be summed, defaults to False
Returns:	dictionary containing paths to generated files
Return type:	dict
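
To illustrate what summing the spliced and unspliced matrices means for `nucleus=True`, the sketch below adds two sparse count tables entry by entry. Representing each matrix as a dictionary keyed by `(barcode, gene)` is an assumption made to keep the example dependency-free; `sum_count_matrices` is a hypothetical helper, and the library works with matrix files rather than dictionaries.

```python
from collections import Counter


def sum_count_matrices(spliced, unspliced):
    """Add two sparse count tables keyed by (barcode, gene)."""
    total = Counter(spliced)
    # Counter.update adds counts for keys that are already present
    # and inserts keys that are missing.
    total.update(unspliced)
    return dict(total)


spliced = {("AAAC", "gene1"): 2, ("AAAG", "gene2"): 1}
unspliced = {("AAAC", "gene1"): 3, ("AAAT", "gene1"): 4}
combined = sum_count_matrices(spliced, unspliced)
print(combined[("AAAC", "gene1")])  # 2 spliced + 3 unspliced = 5
```

Entries present in only one of the two inputs carry over unchanged, so a barcode or gene seen only in the unspliced matrix still appears in the summed result.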
