kb_python.count
Module Contents
Functions
kallisto_pseudo(batch_path, index_path, out_dir, threads=8)
Runs kallisto pseudo.
kallisto_bus(fastqs, index_path, technology, out_dir, threads=8, n=False, k=False)
Runs kallisto bus.
kallisto_bus_split(fastqs, index_paths, technology, out_dir, temp_dir='tmp', threads=8, memory='4G')
Runs kallisto bus with split indices.
bustools_mash(out_dirs, out_dir)
Runs bustools mash. Additionally, combines the `run_info.json` files into one.
bustools_merge(bus_path, out_dir, ecmap_path, txnames_path)
Runs bustools merge.
bustools_project(bus_path, out_path, map_path, ecmap_path, txnames_path)
Runs bustools project.
bustools_sort(bus_path, out_path, temp_dir='tmp', threads=8, memory='4G', flags=False)
Runs bustools sort.
bustools_inspect(bus_path, out_path, whitelist_path, ecmap_path)
Runs bustools inspect.
bustools_correct(bus_path, out_path, whitelist_path)
Runs bustools correct.
bustools_count(bus_path, out_prefix, t2g_path, ecmap_path, txnames_path, tcc=False, mm=False)
Runs bustools count.
bustools_capture(bus_path, out_path, capture_path, ecmap_path, txnames_path, capture_type='transcripts')
Runs bustools capture.
bustools_whitelist(bus_path, out_path)
Runs bustools whitelist.
write_smartseq_batch(pairs_1, pairs_2, out_path)
Write a 3-column TSV specifying batch information for Smart-seq reads.
matrix_to_cellranger(matrix_path, barcodes_path, genes_path, t2g_path, out_dir)
Convert a bustools count matrix to a cellranger-format matrix.
convert_matrix(counts_dir, matrix_path, barcodes_path, genes_path=None, ec_path=None, t2g_path=None, txnames_path=None, name='gene', loom=False, h5ad=False, tcc=False, threads=8)
Convert a gene count or TCC matrix to loom or h5ad.
convert_matrices(counts_dir, matrix_paths, barcodes_paths, genes_paths=None, ec_paths=None, t2g_path=None, txnames_path=None, name='gene', loom=False, h5ad=False, nucleus=False, tcc=False, threads=8)
Convert multiple gene count or TCC matrices to loom or h5ad.
filter_with_bustools(bus_path, ecmap_path, txnames_path, t2g_path, whitelist_path, filtered_bus_path, counts_prefix=None, tcc=False, mm=False, kite=False, temp_dir='tmp', threads=8, memory='4G', count=True, loom=False, h5ad=False, cellranger=False)
Generate filtered count matrices with bustools.
stream_fastqs(fastqs, temp_dir='tmp')
Given a list of FASTQs (that may be local or remote paths), stream any remote files.
copy_or_create_whitelist(technology, bus_path, out_dir)
Copies a pre-packaged whitelist if it is provided. Otherwise, runs bustools whitelist to generate a whitelist.
count(index_paths, t2g_path, technology, out_dir, fastqs, whitelist_path=None, tcc=False, mm=False, filter=None, kite=False, FB=False, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False, cellranger=False, inspect=True, report=False)
Generates count matrices for single-cell RNA-seq.
count_smartseq(index_paths, t2g_path, technology, out_dir, fastqs, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False)
Generates gene or isoform count matrices from Smart-seq reads.
count_velocity(index_paths, t2g_path, cdna_t2c_path, intron_t2c_path, technology, out_dir, fastqs, whitelist_path=None, tcc=False, mm=False, filter=None, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False, cellranger=False, report=False, inspect=True, nucleus=False)
Generates RNA velocity matrices for single-cell RNA-seq.

kb_python.count.logger

kb_python.count.INSPECT_PARSER

kb_python.count.kallisto_pseudo(batch_path, index_path, out_dir, threads=8)
Runs kallisto pseudo.
Parameters:
- batch_path (str) – path to text file containing batch definitions
- index_path (str) – path to kallisto index
- out_dir (str) – path to output directory
- threads (int, optional) – number of threads to use, defaults to 8
Returns: dictionary containing output files
Return type: dict

kb_python.count.kallisto_bus(fastqs, index_path, technology, out_dir, threads=8, n=False, k=False)
Runs kallisto bus.
Parameters:
- fastqs (list) – list of FASTQ file paths
- index_path (str) – path to kallisto index
- technology (str) – single-cell technology used
- out_dir (str) – path to output directory
- threads (int, optional) – number of threads to use, defaults to 8
- n (bool, optional) – include the read number in the flag column (used when splitting indices), defaults to False
- k (bool, optional) – alignment is done per k-mer (used when splitting indices), defaults to False
Returns: dictionary containing paths to generated files
Return type: dict

kb_python.count.kallisto_bus_split(fastqs, index_paths, technology, out_dir, temp_dir='tmp', threads=8, memory='4G')
Runs kallisto bus with split indices.
Parameters:
- fastqs (list) – list of FASTQ file paths or URLs
- index_paths (list) – paths to kallisto indices
- technology (str) – single-cell technology used
- out_dir (str) – path to output directory
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
- threads (int, optional) – number of threads to use, defaults to 8
- memory (str, optional) – amount of memory to use, defaults to 4G
Returns: dictionary containing paths to generated files
Return type: dict

kb_python.count.bustools_mash(out_dirs, out_dir)
Runs bustools mash. Additionally, combines the `run_info.json` files into one.
Parameters:
- out_dirs (list) – list of kallisto bus output directories. Note that BUS files should be sorted by flag
- out_dir (str) – path to output directory
Returns: dictionary containing paths to generated files
Return type: dict

kb_python.count.bustools_merge(bus_path, out_dir, ecmap_path, txnames_path)
Runs bustools merge.
Parameters:
- bus_path (str) – path to BUS file to merge
- out_dir (str) – path to output directory, where the merged BUS file and ecmap will be written
- ecmap_path (str) – path to ecmap file, as generated by kallisto bus
- txnames_path (str) – path to transcript names file, as generated by kallisto bus
Returns: dictionary containing path to generated BUS file and merged ecmap
Return type: dict

kb_python.count.bustools_project(bus_path, out_path, map_path, ecmap_path, txnames_path)
Runs bustools project.
Parameters:
- bus_path (str) – path to BUS file to project
- out_path (str) – path to output BUS file
- map_path (str) – path to file containing source-to-destination mapping
- ecmap_path (str) – path to ecmap file, as generated by kallisto bus
- txnames_path (str) – path to transcript names file, as generated by kallisto bus
Returns: dictionary containing path to generated BUS file
Return type: dict

kb_python.count.bustools_sort(bus_path, out_path, temp_dir='tmp', threads=8, memory='4G', flags=False)
Runs bustools sort.
Parameters:
- bus_path (str) – path to BUS file to sort
- out_path (str) – path to output BUS file
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
- threads (int, optional) – number of threads to use, defaults to 8
- memory (str, optional) – amount of memory to use, defaults to 4G
- flags (bool, optional) – whether to supply the --flags argument to sort, defaults to False
Returns: dictionary containing path to sorted BUS file
Return type: dict
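To make the effect of sorting concrete, here is a minimal in-memory sketch. It assumes a simplified record type (`BusRecord` is hypothetical; real BUS files are binary) and a sort key of barcode, then UMI, then equivalence class. Collapsing identical records by summing their counts is also an assumption about bustools sort, and the `flags` option is not modelled.

```python
from typing import Dict, List, NamedTuple, Tuple


class BusRecord(NamedTuple):
    """Hypothetical in-memory stand-in for one BUS record."""
    barcode: str
    umi: str
    ec: int
    count: int


def sort_bus_records(records: List[BusRecord]) -> List[BusRecord]:
    """Sort by (barcode, UMI, ec) and merge duplicates by summing counts."""
    merged: Dict[Tuple[str, str, int], BusRecord] = {}
    for record in sorted(records, key=lambda r: (r.barcode, r.umi, r.ec)):
        key = (record.barcode, record.umi, record.ec)
        if key in merged:
            # Same barcode/UMI/ec seen before: accumulate its count.
            merged[key] = merged[key]._replace(count=merged[key].count + record.count)
        else:
            merged[key] = record
    # Dicts keep insertion order, so the values are already sorted.
    return list(merged.values())
```

Once records are grouped this way, later steps such as correction and counting can walk each barcode in a single pass.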

kb_python.count.bustools_inspect(bus_path, out_path, whitelist_path, ecmap_path)
Runs bustools inspect.
Parameters:
- bus_path (str) – path to BUS file to inspect
- out_path (str) – path to output inspect JSON file
- whitelist_path (str) – path to whitelist
- ecmap_path (str) – path to ecmap file, as generated by kallisto bus
Returns: dictionary containing path to generated inspect JSON file
Return type: dict

kb_python.count.bustools_correct(bus_path, out_path, whitelist_path)
Runs bustools correct.
Parameters:
- bus_path (str) – path to BUS file to correct
- out_path (str) – path to output corrected BUS file
- whitelist_path (str) – path to whitelist
Returns: dictionary containing path to corrected BUS file
Return type: dict
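Barcode correction maps observed barcodes onto the whitelist. The sketch below assumes a Hamming-distance-1 rule: a barcode one substitution away from exactly one whitelisted barcode is corrected to it, while ambiguous or more distant barcodes are dropped. Treat it as a simplified model of what bustools correct does rather than its implementation; both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def build_correction_map(whitelist: Set[str]) -> Dict[str, Optional[str]]:
    """Map each non-whitelisted 1-mismatch variant to its whitelisted barcode.

    A variant reachable from two different whitelisted barcodes is ambiguous
    and maps to None.
    """
    corrections: Dict[str, Optional[str]] = {}
    for barcode in whitelist:
        for i, original in enumerate(barcode):
            for base in "ACGT":
                if base == original:
                    continue
                variant = barcode[:i] + base + barcode[i + 1:]
                if variant in whitelist:
                    continue  # whitelisted barcodes are never rewritten
                if variant in corrections and corrections[variant] != barcode:
                    corrections[variant] = None  # ambiguous
                else:
                    corrections[variant] = barcode
    return corrections


def correct_barcodes(barcodes: List[str], whitelist: Iterable[str]) -> List[str]:
    """Keep whitelisted barcodes, correct 1-mismatch ones, drop the rest."""
    wl = set(whitelist)
    corrections = build_correction_map(wl)
    corrected: List[str] = []
    for barcode in barcodes:
        if barcode in wl:
            corrected.append(barcode)
        else:
            target = corrections.get(barcode)
            if target is not None:
                corrected.append(target)
    return corrected
```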

kb_python.count.bustools_count(bus_path, out_prefix, t2g_path, ecmap_path, txnames_path, tcc=False, mm=False)
Runs bustools count.
Parameters:
- bus_path (str) – path to BUS file to count
- out_prefix (str) – prefix of the output files to generate
- t2g_path (str) – path to transcript-to-gene mapping
- ecmap_path (str) – path to ecmap file, as generated by kallisto bus
- txnames_path (str) – path to transcript names file, as generated by kallisto bus
- tcc (bool, optional) – whether to generate a TCC matrix instead of a gene count matrix, defaults to False
- mm (bool, optional) – whether to include BUS records that pseudoalign to multiple genes, defaults to False
Returns: dictionary containing paths to generated count matrix files
Return type: dict
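The roles of gene counting and `mm` can be illustrated with a toy model. The sketch takes (barcode, UMI, ec) tuples, maps each equivalence class to its set of genes through the transcript-to-gene mapping, and counts each distinct UMI once. Splitting a multi-gene UMI evenly when `mm=True` is an assumption made for illustration, real bustools UMI collapsing is more involved, and `count_genes` is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, List, Set, Tuple


def count_genes(
    records: List[Tuple[str, str, int]],
    ec_to_transcripts: Dict[int, List[str]],
    t2g: Dict[str, str],
    mm: bool = False,
) -> Dict[Tuple[str, str], float]:
    """Count distinct UMIs per (barcode, gene)."""
    seen: Set[Tuple[str, str, FrozenSet[str]]] = set()
    counts: DefaultDict[Tuple[str, str], float] = defaultdict(float)
    for barcode, umi, ec in records:
        genes = frozenset(t2g[t] for t in ec_to_transcripts[ec])
        key = (barcode, umi, genes)
        if key in seen:
            continue  # this molecule was already counted
        seen.add(key)
        if len(genes) == 1:
            # Unambiguous: the whole UMI goes to the single gene.
            counts[(barcode, next(iter(genes)))] += 1.0
        elif mm:
            # Multi-gene UMI kept only when mm=True, split evenly (assumed).
            for gene in genes:
                counts[(barcode, gene)] += 1.0 / len(genes)
    return dict(counts)
```

With `mm=False` the multi-gene UMI simply disappears from the matrix, which is why enabling `mm` can only raise per-gene totals.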

kb_python.count.bustools_capture(bus_path, out_path, capture_path, ecmap_path, txnames_path, capture_type='transcripts')
Runs bustools capture.
Parameters:
- bus_path (str) – path to BUS file to capture
- out_path (str) – path to BUS file to generate
- capture_path (str) – path to capture list (transcripts, UMIs, or barcodes, matching capture_type)
- ecmap_path (str) – path to ecmap file, as generated by kallisto bus
- txnames_path (str) – path to transcript names file, as generated by kallisto bus
- capture_type (str, optional) – the type of information in the capture list. Can be one of transcripts, umis, barcode, defaults to transcripts
Returns: dictionary containing path to generated BUS file
Return type: dict
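For the default `capture_type='transcripts'`, capturing can be pictured as a filter on equivalence classes. This sketch keeps a record when its equivalence class contains at least one transcript from the capture list; that overlap rule is an assumption about bustools capture, and `capture_by_transcripts` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def capture_by_transcripts(
    records: List[Tuple[str, str, int]],
    ec_to_transcripts: Dict[int, List[str]],
    capture_list: Iterable[str],
) -> List[Tuple[str, str, int]]:
    """Keep (barcode, umi, ec) records whose ec overlaps the capture list."""
    capture = set(capture_list)
    return [
        record
        for record in records
        if any(t in capture for t in ec_to_transcripts[record[2]])
    ]
```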

kb_python.count.bustools_whitelist(bus_path, out_path)
Runs bustools whitelist.
Parameters:
- bus_path (str) – path to BUS file to generate the whitelist from
- out_path (str) – path to output whitelist
Returns: dictionary containing path to generated whitelist
Return type: dict
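As an intuition for what a data-driven whitelist is, the sketch below ranks barcodes by their number of distinct UMIs and keeps those at or above a fixed cutoff. This is a deliberately naive stand-in, not the algorithm bustools whitelist uses; the `min_umis` cutoff and the function name are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


def naive_whitelist(records: List[Tuple[str, str, int]], min_umis: int) -> List[str]:
    """Return barcodes with at least min_umis distinct UMIs, most UMIs first."""
    umis: DefaultDict[str, Set[str]] = defaultdict(set)
    for barcode, umi, _ec in records:
        umis[barcode].add(umi)
    # Sort by descending UMI count, breaking ties alphabetically.
    ranked = sorted(umis, key=lambda bc: (-len(umis[bc]), bc))
    return [bc for bc in ranked if len(umis[bc]) >= min_umis]
```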

kb_python.count.write_smartseq_batch(pairs_1, pairs_2, out_path)
Write a 3-column TSV specifying batch information for Smart-seq reads. This file is required to use kallisto pseudo on multiple samples (i.e. cells).
Parameters:
- pairs_1 (list) – list of paths to FASTQs corresponding to pair 1
- pairs_2 (list) – list of paths to FASTQs corresponding to pair 2
- out_path (str) – path to batch file to output
Returns: dictionary containing path to written batch file
Return type: dict
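The batch file layout can be sketched in a few lines. This assumes one tab-separated line per cell holding a cell identifier followed by the read-1 and read-2 FASTQ paths, with the zero-based pair index used as the identifier; the identifier scheme, the `batch` key in the returned dictionary, and the helper name `write_batch_file` are assumptions rather than kb_python's documented behaviour.

```python
import csv
import os
import tempfile
from typing import Dict, List


def write_batch_file(pairs_1: List[str], pairs_2: List[str], out_path: str) -> Dict[str, str]:
    """Write a 3-column TSV: cell ID, read-1 FASTQ, read-2 FASTQ."""
    if len(pairs_1) != len(pairs_2):
        raise ValueError("pairs_1 and pairs_2 must have the same length")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for i, (read_1, read_2) in enumerate(zip(pairs_1, pairs_2)):
            writer.writerow([i, read_1, read_2])
    return {"batch": out_path}


# Usage: write a two-cell batch file into a fresh temporary directory.
example_path = os.path.join(tempfile.mkdtemp(), "batch.txt")
result = write_batch_file(
    ["cell0_R1.fastq.gz", "cell1_R1.fastq.gz"],
    ["cell0_R2.fastq.gz", "cell1_R2.fastq.gz"],
    example_path,
)
```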

kb_python.count.matrix_to_cellranger(matrix_path, barcodes_path, genes_path, t2g_path, out_dir)
Convert a bustools count matrix to a cellranger-format matrix.
Parameters:
- matrix_path (str) – path to matrix
- barcodes_path (str) – path to barcodes.txt
- genes_path (str) – path to genes.txt
- t2g_path (str) – path to transcript-to-gene mapping
- out_dir (str) – path to output directory
Returns: dictionary of matrix files
Return type: dict
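Converting to the cellranger layout mainly means transposing the matrix and pairing gene IDs with gene names. The sketch below assumes bustools writes a cells x genes Matrix Market file while the cellranger layout is genes x cells, and that the third column of the transcript-to-gene mapping holds the gene name (as in a kb ref-style `t2g.txt`); both helpers are hypothetical.

```python
from typing import Dict, List


def transpose_mtx(lines: List[str]) -> List[str]:
    """Transpose a Matrix Market coordinate matrix given as text lines."""
    out: List[str] = []
    size_seen = False
    for line in lines:
        if line.startswith("%"):
            out.append(line)  # header and comment lines are kept verbatim
            continue
        parts = line.split()
        if not parts:
            continue
        if not size_seen:
            rows, cols, nnz = parts
            out.append(f"{cols} {rows} {nnz}")  # swap the dimensions
            size_seen = True
        else:
            i, j, value = parts
            out.append(f"{j} {i} {value}")  # swap each entry's row/column
    return out


def cellranger_genes_tsv(gene_ids: List[str], t2g_rows: List[List[str]]) -> List[str]:
    """Build 'gene_id<TAB>gene_name' lines, falling back to the ID if unnamed."""
    names: Dict[str, str] = {row[1]: row[2] for row in t2g_rows if len(row) > 2}
    return [f"{gene}\t{names.get(gene, gene)}" for gene in gene_ids]
```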

kb_python.count.convert_matrix(counts_dir, matrix_path, barcodes_path, genes_path=None, ec_path=None, t2g_path=None, txnames_path=None, name='gene', loom=False, h5ad=False, tcc=False, threads=8)
Convert a gene count or TCC matrix to loom or h5ad.
Parameters:
- counts_dir (str) – path to counts directory
- matrix_path (str) – path to matrix
- barcodes_path (str) – path to barcodes.txt
- genes_path (str, optional) – path to genes.txt, defaults to None
- ec_path (str, optional) – path to ec.txt, defaults to None
- t2g_path (str, optional) – path to transcript-to-gene mapping. If this is provided, the third column of the mapping is appended to the anndata var, defaults to None
- txnames_path (str, optional) – path to transcripts.txt, defaults to None
- name (str, optional) – name of the columns, defaults to “gene”
- loom (bool, optional) – whether to generate loom file, defaults to False
- h5ad (bool, optional) – whether to generate h5ad file, defaults to False
- tcc (bool, optional) – whether the matrix is a TCC matrix, defaults to False
- threads (int, optional) – number of threads to use, defaults to 8
Returns: dictionary of generated files
Return type: dict

kb_python.count.convert_matrices(counts_dir, matrix_paths, barcodes_paths, genes_paths=None, ec_paths=None, t2g_path=None, txnames_path=None, name='gene', loom=False, h5ad=False, nucleus=False, tcc=False, threads=8)
Convert multiple gene count or TCC matrices to loom or h5ad.
Parameters:
- counts_dir (str) – path to counts directory
- matrix_paths (list) – list of paths to matrices
- barcodes_paths (list) – list of paths to barcodes.txt
- genes_paths (list, optional) – list of paths to genes.txt, defaults to None
- ec_paths (list, optional) – list of paths to ec.txt, defaults to None
- t2g_path (str, optional) – path to transcript-to-gene mapping. If this is provided, the third column of the mapping is appended to the anndata var, defaults to None
- txnames_path (str, optional) – path to transcripts.txt, defaults to None
- name (str, optional) – name of the columns, defaults to “gene”
- loom (bool, optional) – whether to generate loom file, defaults to False
- h5ad (bool, optional) – whether to generate h5ad file, defaults to False
- nucleus (bool, optional) – whether the matrices contain single nucleus counts, defaults to False
- tcc (bool, optional) – whether the matrix is a TCC matrix, defaults to False
- threads (int, optional) – number of threads to use, defaults to 8
Returns: dictionary of generated files
Return type: dict

kb_python.count.filter_with_bustools(bus_path, ecmap_path, txnames_path, t2g_path, whitelist_path, filtered_bus_path, counts_prefix=None, tcc=False, mm=False, kite=False, temp_dir='tmp', threads=8, memory='4G', count=True, loom=False, h5ad=False, cellranger=False)
Generate filtered count matrices with bustools.
Parameters:
- bus_path (str) – path to BUS file that has been sorted, corrected, and sorted again
- ecmap_path (str) – path to matrix ec file
- txnames_path (str) – path to list of transcripts
- t2g_path (str) – path to transcript-to-gene mapping
- whitelist_path (str) – path to filter whitelist to generate
- filtered_bus_path (str) – path to filtered BUS file to generate
- counts_prefix (str, optional) – prefix of count matrix, defaults to None
- tcc (bool, optional) – whether to generate a TCC matrix instead of a gene count matrix, defaults to False
- mm (bool, optional) – whether to include BUS records that pseudoalign to multiple genes, defaults to False
- kite (bool, optional) – whether this is a KITE workflow, defaults to False
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
- threads (int, optional) – number of threads to use, defaults to 8
- memory (str, optional) – amount of memory to use, defaults to 4G
- loom (bool, optional) – whether to convert the final count matrix into a loom file, defaults to False
- h5ad (bool, optional) – whether to convert the final count matrix into a h5ad file, defaults to False
- cellranger (bool, optional) – whether to convert the final count matrix into a cellranger-compatible matrix, defaults to False
Returns: dictionary of generated files
Return type: dict

kb_python.count.stream_fastqs(fastqs, temp_dir='tmp')
Given a list of FASTQs (that may be local or remote paths), stream any remote files. Internally, calls utils.
Parameters:
- fastqs (list) – list of (remote or local) FASTQ paths
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
Returns: all remote paths substituted with a local path
Return type: list
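The path substitution part of this function can be sketched as follows. It treats `http`, `https`, and `ftp` URLs as remote (an assumption) and maps each one to a file of the same name inside `temp_dir`; the actual streaming is not reproduced, and `localize_fastq_paths` is a hypothetical name.

```python
import os
from typing import List
from urllib.parse import urlparse

# Assumption: which URL schemes count as "remote".
REMOTE_SCHEMES = {"http", "https", "ftp"}


def localize_fastq_paths(fastqs: List[str], temp_dir: str = "tmp") -> List[str]:
    """Replace each remote URL with a same-named path inside temp_dir."""
    local: List[str] = []
    for fastq in fastqs:
        parsed = urlparse(fastq)
        if parsed.scheme in REMOTE_SCHEMES:
            local.append(os.path.join(temp_dir, os.path.basename(parsed.path)))
        else:
            local.append(fastq)  # local paths pass through unchanged
    return local
```

Keeping the output list in the same order as the input matters, because FASTQ order encodes read pairing for most technologies.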

kb_python.count.copy_or_create_whitelist(technology, bus_path, out_dir)
Copies a pre-packaged whitelist if it is provided. Otherwise, runs bustools whitelist to generate a whitelist.
Parameters:
- technology (str) – single-cell technology used
- bus_path (str) – path to BUS file to generate the whitelist from
- out_dir (str) – path to output directory
Returns: path to copied or generated whitelist
Return type: str

kb_python.count.count(index_paths, t2g_path, technology, out_dir, fastqs, whitelist_path=None, tcc=False, mm=False, filter=None, kite=False, FB=False, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False, cellranger=False, inspect=True, report=False)
Generates count matrices for single-cell RNA-seq.
Parameters:
- index_paths (list) – paths to kallisto indices
- t2g_path (str) – path to transcript-to-gene mapping
- technology (str) – single-cell technology used
- out_dir (str) – path to output directory
- fastqs (list) – list of FASTQ file paths
- whitelist_path (str, optional) – path to whitelist, defaults to None
- tcc (bool, optional) – whether to generate a TCC matrix instead of a gene count matrix, defaults to False
- mm (bool, optional) – whether to include BUS records that pseudoalign to multiple genes, defaults to False
- filter (str, optional) – filter to use to generate a filtered count matrix, defaults to None
- kite (bool, optional) – whether this is a KITE workflow, defaults to False
- FB (bool, optional) – whether 10x Genomics Feature Barcoding technology was used, defaults to False
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
- threads (int, optional) – number of threads to use, defaults to 8
- memory (str, optional) – amount of memory to use, defaults to 4G
- overwrite (bool, optional) – overwrite an existing output BUS file, defaults to False
- loom (bool, optional) – whether to convert the final count matrix into a loom file, defaults to False
- h5ad (bool, optional) – whether to convert the final count matrix into a h5ad file, defaults to False
- cellranger (bool, optional) – whether to convert the final count matrix into a cellranger-compatible matrix, defaults to False
- inspect (bool, optional) – whether or not to inspect the output BUS file and generate the inspect.json, defaults to True
- report (bool, optional) – generate an HTML report, defaults to False
Returns: dictionary containing paths to generated files
Return type: dict

kb_python.count.count_smartseq(index_paths, t2g_path, technology, out_dir, fastqs, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False)
Generates gene or isoform count matrices from Smart-seq reads.

kb_python.count.count_velocity(index_paths, t2g_path, cdna_t2c_path, intron_t2c_path, technology, out_dir, fastqs, whitelist_path=None, tcc=False, mm=False, filter=None, temp_dir='tmp', threads=8, memory='4G', overwrite=False, loom=False, h5ad=False, cellranger=False, report=False, inspect=True, nucleus=False)
Generates RNA velocity matrices for single-cell RNA-seq.
Parameters:
- index_paths (list) – paths to kallisto indices
- t2g_path (str) – path to transcript-to-gene mapping
- cdna_t2c_path (str) – path to cDNA transcripts-to-capture file
- intron_t2c_path (str) – path to intron transcripts-to-capture file
- technology (str) – single-cell technology used
- out_dir (str) – path to output directory
- fastqs (list) – list of FASTQ file paths
- whitelist_path (str, optional) – path to whitelist, defaults to None
- tcc (bool, optional) – whether to generate a TCC matrix instead of a gene count matrix, defaults to False
- mm (bool, optional) – whether to include BUS records that pseudoalign to multiple genes, defaults to False
- filter (str, optional) – filter to use to generate a filtered count matrix, defaults to None
- temp_dir (str, optional) – path to temporary directory, defaults to tmp
- threads (int, optional) – number of threads to use, defaults to 8
- memory (str, optional) – amount of memory to use, defaults to 4G
- overwrite (bool, optional) – overwrite an existing output BUS file, defaults to False
- loom (bool, optional) – whether to convert the final count matrix into a loom file, defaults to False
- h5ad (bool, optional) – whether to convert the final count matrix into a h5ad file, defaults to False
- cellranger (bool, optional) – whether to convert the final count matrix into a cellranger-compatible matrix, defaults to False
- report (bool, optional) – generate HTML reports, defaults to False
- inspect (bool, optional) – whether or not to inspect the output BUS file and generate the inspect.json, defaults to True
- nucleus (bool, optional) – whether this is a single-nucleus experiment. If True, the spliced and unspliced count matrices will be summed, defaults to False
Returns: dictionary containing paths to generated files
Return type: dict
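The `nucleus` option sums the spliced and unspliced matrices. Using a dictionary keyed by (barcode, gene) as a hypothetical stand-in for a sparse matrix, the entry-wise sum looks like this; entries present in only one matrix are carried over unchanged.

```python
from typing import Dict, Tuple

# Hypothetical sparse-matrix stand-in: (barcode, gene) -> count.
Matrix = Dict[Tuple[str, str], float]


def sum_spliced_unspliced(spliced: Matrix, unspliced: Matrix) -> Matrix:
    """Add two sparse count matrices entry-wise."""
    total: Matrix = dict(spliced)  # copy so the inputs are left untouched
    for key, value in unspliced.items():
        total[key] = total.get(key, 0.0) + value
    return total
```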