circRNA identification¶
Usage: ucircfull circ_call [--help] [--version] --mode STR [--notstranded VAR] --input FQ [--bam BAM] --ref REF --anno GTF [--umi CLSTR] --splice MOTIF --outdir DIR --prefix PREFIX --thread INT [--threshold INT] [--minimap2 PATH] [--samtools PATH] [--debug]
circRNA identification.
Optional arguments:
-h, --help shows help message and exits
-v, --version prints version information and exits
-m, --mode STR circRNA calling mode. (RG, ucRG, cRG) [required]
-sn, --notstranded find splicing signal in both strand. [default: false]
-i, --input FQ stranded fastq file. [required]
--bam BAM mapped bam file as input.
-r, --ref REF CIRI-long reference directory. [required]
-a, --anno GTF reference annotation GTF file. [required]
-u, --umi CLSTR umi clust results file. [default: "-"]
--splice MOTIF splice motifs. [default: "AGGT,AGGC,ACAT,ACGT,AGAT"]
-o, --outdir DIR output directory. [default: "."]
-p, --prefix PREFIX output prefix. [default: "circFL"]
-t, --thread INT number of threads used. [default: 4]
--threshold INT minimum supporting reads for a circRNA transcript. [default: 2]
--minimap2 PATH path to minimap2. [default: "minimap2"]
--samtools PATH path to samtools. [default: "samtools"]
--debug enable debug output.
ucircfull circ_call -m ucRG -i ${sample}_strand.fastq -r $genome -a $gtfFile -u ${sample}_umi.clstr -o ./ucRG -p $sample -t $thread
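When the command above is run for many samples, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only flags shown in the help text; the function name `build_circ_call_cmd`, its parameters, and the sample values are hypothetical and chosen purely for illustration.

```python
import shlex

# Modes listed in the help text for -m/--mode.
VALID_MODES = {"RG", "ucRG", "cRG"}


def build_circ_call_cmd(sample, ref_dir, gtf, mode="ucRG", umi_clstr=None,
                        outdir=".", threads=4):
    """Return the argument list for one `ucircfull circ_call` run.

    Input file naming (<sample>_strand.fastq) mirrors the example command;
    it is a convention of that example, not a requirement of the tool.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["ucircfull", "circ_call", "-m", mode,
           "-i", f"{sample}_strand.fastq", "-r", ref_dir, "-a", gtf]
    if umi_clstr is not None:
        # -u is optional; when omitted the tool falls back to its default "-".
        cmd += ["-u", umi_clstr]
    cmd += ["-o", outdir, "-p", sample, "-t", str(threads)]
    return cmd


# Hypothetical sample name and paths.
cmd = build_circ_call_cmd("sampleA", "ref_dir", "genes.gtf",
                          umi_clstr="sampleA_umi.clstr",
                          outdir="./ucRG", threads=8)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to execute the call.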
Output¶
- $sample.circ.gtf: identification and quantification results of circRNAs
- $sample.fusion.txt: detected fusion circRNAs
Output files¶
$prefix.circ.gtf¶
$prefix.circ.gtf is a standard GTF file generated by ucircfull circ_call. It contains three feature types for each reported circRNA locus:
- BSJ: back-splice junction locus, grouping all isoforms from the same BSJ site
- transcript: one full-length circRNA isoform (exon composition + splice-site variants)
- exon: exon structure of the isoform
The file uses the standard 9-column GTF layout:
| Column | Description |
|---|---|
| 1 | Chromosome name |
| 2 | Always |
| 3 | |
| 4 | 1-based start coordinate |
| 5 | 1-based end coordinate |
| 6 | Always |
| 7 | Strand of the circRNA isoform |
| 8 | Always |
| 9 | Feature-specific attributes described below |
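The 9-column layout can be read with a few lines of Python. This is a minimal sketch, not part of ucircfull: the column names in `GTF_COLUMNS` follow the generic GTF convention, the example line is made up (including the `SRC` placeholder in column 2 and the `.` values in columns 6 and 8), and the attribute parser assumes `key "value";` pairs with no semicolons inside values.

```python
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]


def parse_attributes(text):
    """Split a GTF attribute string into a dict of key -> value strings."""
    attrs = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        # Values may or may not be quoted; strip quotes if present.
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gtf_line(line):
    """Parse one tab-separated GTF record into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])  # 1-based start
    record["end"] = int(record["end"])      # 1-based end
    record["attributes"] = parse_attributes(record["attributes"])
    return record


# Hypothetical record; every value here is invented for illustration.
example = ('chr1\tSRC\tBSJ\t100\t500\t.\t+\t.\t'
           'gene_id "chr1:100-500"; circ_type "exon"; bsj "7";')
rec = parse_gtf_line(example)
print(rec["feature"], rec["start"], rec["end"], rec["attributes"])
```

Comment lines starting with `#`, if present in a file, would need to be skipped before calling `parse_gtf_line`.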
Attributes written by ucircfull:
- BSJ records: gene_id, circ_type, host_gene_id, host_gene_name, bsj
- transcript records: gene_id, transcript_id, uniform_id, circ_type, host_gene_id, host_gene_name, bsj (per-isoform read count)
- exon records: gene_id, transcript_id, uniform_id, exon_number
Field meanings:
- gene_id: BSJ position ID in the format chr:start-end
- transcript_id: isoform ID in the format chr:start-end:strand|exon1_start-exon1_end,exon2_start-exon2_end,...
- bsj on BSJ records: total number of supporting reads summed across all isoforms at the BSJ locus
- bsj on transcript records: number of supporting reads for that isoform
- circ_type: circRNA classification (exon, intron, or intergenic_region)
- host_gene_id: best-matching host gene identifier from the annotation
- host_gene_name: gene symbol of the host gene
- uniform_id: standardized circRNA name following the proposed naming scheme; includes exon composition, splice-site variants (L/S), retained introns (RI), novel exons (NE), and an ordered .N suffix per distinct BSJ site (e.g. circAKT3(2,3).1, circMCU(2,L3).2)
- exon_number: exon order within the isoform, starting from 1
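Because transcript_id encodes the locus, strand, and exon structure in one string, it can be unpacked without reading the exon records. The sketch below assumes the format given above (chr:start-end:strand|exon1_start-exon1_end,...); the function name `parse_transcript_id` and the example ID are hypothetical.

```python
def parse_transcript_id(transcript_id):
    """Unpack 'chr:start-end:strand|s1-e1,s2-e2,...' into its parts."""
    locus, _, exon_part = transcript_id.partition("|")
    # rsplit from the right so chromosome names containing ':' still work.
    chrom, span, strand = locus.rsplit(":", 2)
    start, end = (int(x) for x in span.split("-"))
    exons = []
    for block in exon_part.split(","):
        if not block:
            continue
        exon_start, exon_end = (int(x) for x in block.split("-"))
        exons.append((exon_start, exon_end))
    return {
        "chrom": chrom,
        "start": start,
        "end": end,
        "strand": strand,
        "exons": exons,
        # gene_id uses the chr:start-end form of the same locus.
        "gene_id": f"{chrom}:{start}-{end}",
    }


# Hypothetical isoform ID with two exons.
info = parse_transcript_id("chr1:100-500:+|100-200,400-500")
print(info)
```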
Only circRNA isoforms with supporting reads at or above the --threshold value are output to this file.
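The "at or above" rule can be illustrated with a small filter. This is only a sketch of the comparison, assuming per-isoform read counts are already available as integers; the dictionary contents are invented, and the default of 2 mirrors the documented --threshold default.

```python
def filter_isoforms(isoform_counts, threshold=2):
    """Keep isoforms whose supporting read count is >= threshold."""
    return {tid: n for tid, n in isoform_counts.items() if n >= threshold}


# Hypothetical isoform IDs and read counts.
counts = {
    "chr1:100-500:+|100-200,400-500": 5,
    "chr1:100-500:+|100-500": 2,
    "chr1:100-500:+|100-150,400-500": 1,
}
kept = filter_isoforms(counts)
print(sorted(kept.values()))
```

An isoform with exactly 2 supporting reads is kept under the default, while one with a single read is dropped.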
$prefix.fusion.txt¶
$prefix.fusion.txt is a tab-separated table describing fusion circRNAs detected by ucircfull circ_call.
| Column | Description |
|---|---|
| | Fusion circRNA locus ID in the format |
| | Fusion isoform ID, including exon composition and strand for both loci |
| | Chromosome of the first fusion locus |
| | Start coordinate of the first fusion locus |
| | End coordinate of the first fusion locus |
| | Length of the first fusion locus |
| | Number of exons in the first fusion locus |
| | Comma-separated exon start coordinates for the first locus |
| | Comma-separated exon end coordinates for the first locus |
| | Left splice-site sequence for each exon in the first locus |
| | Right splice-site sequence for each exon in the first locus |
| | Strand of the first fusion locus |
| | Gene annotation overlapping the first locus, if available |
| | Chromosome of the second fusion locus |
| | Start coordinate of the second fusion locus |
| | End coordinate of the second fusion locus |
| | Length of the second fusion locus |
| | Number of exons in the second fusion locus |
| | Comma-separated exon start coordinates for the second locus |
| | Comma-separated exon end coordinates for the second locus |
| | Left splice-site sequence for each exon in the second locus |
| | Right splice-site sequence for each exon in the second locus |
| | Strand of the second fusion locus |
| | Gene annotation overlapping the second locus, if available |
| | Number of reads supporting the fusion isoform |
| | Comma-separated read IDs supporting the fusion isoform |
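Several fields in this table are comma-separated lists, so a common first step is to turn the exon start and end fields of one locus into exon intervals. The sketch below works on the field values themselves and makes no assumption about column names or positions in the file; the function name `pair_exons` and the example values are hypothetical, and tolerating a trailing comma is a defensive choice rather than documented behavior.

```python
def split_list(field):
    """Split a comma-separated field, ignoring empty items."""
    return [item for item in field.split(",") if item]


def pair_exons(starts_field, ends_field, exon_count=None):
    """Combine exon start and end fields into (start, end) tuples."""
    starts = [int(s) for s in split_list(starts_field)]
    ends = [int(e) for e in split_list(ends_field)]
    if len(starts) != len(ends):
        raise ValueError("exon start and end lists differ in length")
    if exon_count is not None and exon_count != len(starts):
        raise ValueError("exon count does not match the coordinate lists")
    return list(zip(starts, ends))


# Hypothetical values for one fusion locus.
exons = pair_exons("100,400", "200,500", exon_count=2)
read_ids = split_list("read_1,read_2,read_3")
print(exons, len(read_ids))
```

Checking the list lengths against the exon-count field is a cheap way to catch rows that were split on the wrong delimiter.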