circRNA identification

Usage: ucircfull circ_call [--help] [--version] --mode STR [--notstranded VAR] --input FQ [--bam BAM] --ref REF --anno GTF [--umi CLSTR] --splice MOTIF --outdir DIR --prefix PREFIX --thread INT [--threshold INT] [--minimap2 PATH] [--samtools PATH] [--debug]

circRNA identification.

Optional arguments:
  -h, --help            shows help message and exits
  -v, --version         prints version information and exits
  -m, --mode STR        circRNA calling mode. (RG, ucRG, cRG) [required]
  -sn, --notstranded    find splicing signal in both strand. [default: false]
  -i, --input FQ        stranded fastq file. [required]
  --bam BAM             mapped bam file as input.
  -r, --ref REF         CIRI-long reference directory. [required]
  -a, --anno GTF        reference annotation GTF file. [required]
  -u, --umi CLSTR       umi clust results file. [default: "-"]
  --splice MOTIF        splice motifs. [default: "AGGT,AGGC,ACAT,ACGT,AGAT"]
  -o, --outdir DIR      output directory. [default: "."]
  -p, --prefix PREFIX   output prefix. [default: "circFL"]
  -t, --thread INT      number of threads used. [default: 4]
  --threshold INT       minimum supporting reads for a circRNA transcript. [default: 2]
  --minimap2 PATH       path to minimap2. [default: "minimap2"]
  --samtools PATH       path to samtools. [default: "samtools"]
  --debug               enable debug output.

ucircfull circ_call -m ucRG -i ${sample}_strand.fastq -r $genome -a $gtfFile -u ${sample}_umi.clstr -o ./ucRG -p $sample -t $thread
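
When many samples are processed, the same command line can be assembled programmatically. The following is a minimal sketch, not part of ucircfull: `build_circ_call_cmd` is a hypothetical helper, the sample and file names are invented, and only flags listed in the help text above are used. It builds the argument list without running anything; passing the list to `subprocess.run` would execute it.

```python
import shlex

VALID_MODES = ("RG", "ucRG", "cRG")  # modes listed in the help text


def build_circ_call_cmd(sample, genome, gtf, mode="ucRG", umi_clstr=None,
                        outdir=".", threads=4, threshold=2):
    """Return the argv list for `ucircfull circ_call` (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    cmd = [
        "ucircfull", "circ_call",
        "-m", mode,
        "-i", f"{sample}_strand.fastq",  # file naming follows the example above
        "-r", genome,
        "-a", gtf,
    ]
    if umi_clstr is not None:
        # Only pass -u when a UMI clustering file is available.
        cmd += ["-u", umi_clstr]
    cmd += ["-o", outdir, "-p", sample,
            "-t", str(threads), "--threshold", str(threshold)]
    return cmd


# Invented example values; replace with real paths.
cmd = build_circ_call_cmd("S1", "ref_dir", "genes.gtf",
                          umi_clstr="S1_umi.clstr", outdir="./ucRG")
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems when paths contain spaces.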

Output

$sample.circ.gtf: identification and quantification results of circRNAs
$sample.fusion.txt: detected fusion circRNAs

Output files

`$prefix`.circ.gtf

$prefix.circ.gtf is a standard GTF file generated by ucircfull circ_call. It contains three feature types for each reported circRNA locus:

BSJ: back-splice junction locus, grouping all isoforms from the same BSJ site
transcript: one full-length circRNA isoform (exon composition + splice-site variants)
exon: exon structure of the isoform

The file uses the standard 9-column GTF layout:

Column	Description
1 `seqname`	Chromosome name
2 `source`	Always `circfull`
3 `feature`	`BSJ`, `transcript`, or `exon`
4 `start`	1-based start coordinate
5 `end`	1-based end coordinate
6 `score`	Always `.`
7 `strand`	Strand of the circRNA isoform
8 `frame`	Always `.`
9 `attribute`	Feature-specific attributes described below

Attributes written by ucircfull:

BSJ records: gene_id, circ_type, host_gene_id, host_gene_name, bsj
transcript records: gene_id, transcript_id, uniform_id, circ_type, host_gene_id, host_gene_name, bsj (per-isoform read count)
exon records: gene_id, transcript_id, uniform_id, exon_number

Field meanings:

gene_id: BSJ position ID in the format chr:start-end
transcript_id: isoform ID in the format chr:start-end:strand|exon1_start-exon1_end,exon2_start-exon2_end,...
bsj on BSJ records: total number of supporting reads summed across all isoforms at the BSJ locus
bsj on transcript records: number of supporting reads for that isoform
circ_type: circRNA classification (exon, intron, or intergenic_region)
host_gene_id: best-matching host gene identifier from the annotation
host_gene_name: gene symbol of the host gene
uniform_id: standardized circRNA name following the proposed naming scheme; includes exon composition, splice-site variants (L/S), retained introns (RI), novel exons (NE), and an ordered .N suffix per distinct BSJ site (e.g. circAKT3(2,3).1, circMCU(2,L3).2)
exon_number: exon order within the isoform, starting from 1
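
The `transcript_id` format described above can be split back into its parts. This is a small sketch rather than code from ucircfull: `parse_transcript_id` is a hypothetical helper and the example ID uses invented coordinates. It assumes the ID follows the `chr:start-end:strand|exon1_start-exon1_end,...` layout exactly.

```python
def parse_transcript_id(transcript_id):
    """Split a transcript_id into locus fields and a list of exon intervals."""
    locus, exon_part = transcript_id.split("|", 1)
    # Split from the right so chromosome names containing ':' still work.
    chrom_span, strand = locus.rsplit(":", 1)
    chrom, span = chrom_span.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    exons = [tuple(int(x) for x in exon.split("-"))
             for exon in exon_part.split(",")]
    return {"chrom": chrom, "start": start, "end": end,
            "strand": strand, "exons": exons}


# Invented example ID for illustration only.
parsed = parse_transcript_id("chr1:100-500:+|100-200,400-500")
print(parsed["exons"])
```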

Only circRNA isoforms with supporting reads at or above the --threshold value are output to this file.
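
To apply a stricter read cutoff downstream without rerunning the caller, the transcript records can be filtered on their `bsj` attribute. The sketch below is an illustration under stated assumptions: attributes are assumed to use the usual GTF `key "value";` syntax (unquoted values are also tolerated), `parse_gtf_line` and `isoforms_with_min_reads` are hypothetical helpers, and the example lines are invented.

```python
import re

# Matches `key "value";` and also `key value;` (quoting style is an assumption).
ATTR_RE = re.compile(r'(\w+) "?([^";]*)"?;')


def parse_gtf_line(line):
    """Parse one 9-column GTF line into a dict with an `attributes` sub-dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    return {
        "seqname": cols[0], "source": cols[1], "feature": cols[2],
        "start": int(cols[3]), "end": int(cols[4]), "strand": cols[6],
        "attributes": dict(ATTR_RE.findall(cols[8])),
    }


def isoforms_with_min_reads(lines, min_reads):
    """Yield transcript records whose bsj count is at least min_reads."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        rec = parse_gtf_line(line)
        if rec["feature"] != "transcript":
            continue
        # float() is used because the numeric format of bsj is not assumed.
        if float(rec["attributes"]["bsj"]) >= min_reads:
            yield rec


# Invented example records for illustration only.
example_lines = [
    'chr1\tcircfull\tBSJ\t100\t500\t.\t+\t.\tgene_id "chr1:100-500"; bsj "7";',
    'chr1\tcircfull\ttranscript\t100\t500\t.\t+\t.\tgene_id "chr1:100-500"; '
    'transcript_id "chr1:100-500:+|100-200,400-500"; bsj "5";',
    'chr1\tcircfull\ttranscript\t100\t500\t.\t+\t.\tgene_id "chr1:100-500"; '
    'transcript_id "chr1:100-500:+|100-500"; bsj "2";',
]
kept = list(isoforms_with_min_reads(example_lines, min_reads=3))
print([rec["attributes"]["transcript_id"] for rec in kept])
```

For a real file, pass an open file handle instead of `example_lines`.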

`$prefix`.fusion.txt

$prefix.fusion.txt is a tab-separated table describing fusion circRNAs detected by ucircfull circ_call.

Column	Description
`circID`	Fusion circRNA locus ID in the format `chr_first|start_first|end_first|chr_second|start_second|end_second`
`isoID`	Fusion isoform ID, including exon composition and strand for both loci
`chr_first`	Chromosome of the first fusion locus
`start_first`	Start coordinate of the first fusion locus
`end_first`	End coordinate of the first fusion locus
`len_first`	Length of the first fusion locus
`exonNum_first`	Number of exons in the first fusion locus
`exon_start_first`	Comma-separated exon start coordinates for the first locus
`exon_end_first`	Comma-separated exon end coordinates for the first locus
`exon_leftSeq_first`	Left splice-site sequence for each exon in the first locus
`exon_rightSeq_first`	Right splice-site sequence for each exon in the first locus
`strand_first`	Strand of the first fusion locus
`geneName_first`	Gene annotation overlapping the first locus, if available
`chr_second`	Chromosome of the second fusion locus
`start_second`	Start coordinate of the second fusion locus
`end_second`	End coordinate of the second fusion locus
`len_second`	Length of the second fusion locus
`exonNum_second`	Number of exons in the second fusion locus
`exon_start_second`	Comma-separated exon start coordinates for the second locus
`exon_end_second`	Comma-separated exon end coordinates for the second locus
`exon_leftSeq_second`	Left splice-site sequence for each exon in the second locus
`exon_rightSeq_second`	Right splice-site sequence for each exon in the second locus
`strand_second`	Strand of the second fusion locus
`geneName_second`	Gene annotation overlapping the second locus, if available
`readCount`	Number of reads supporting the fusion isoform
`readID`	Comma-separated read IDs supporting the fusion isoform
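
The table can be loaded with a standard tab-separated reader. The sketch below is a minimal illustration, not part of ucircfull: it assumes the file starts with a header row carrying the column names listed above and that `readCount` is an integer, `read_fusions` is a hypothetical helper, and the example row is invented.

```python
import csv
import io

# Column names as listed in the table above.
COLUMNS = ["circID", "isoID"]
for side in ("first", "second"):
    COLUMNS += [f"{name}_{side}" for name in (
        "chr", "start", "end", "len", "exonNum", "exon_start", "exon_end",
        "exon_leftSeq", "exon_rightSeq", "strand", "geneName")]
COLUMNS += ["readCount", "readID"]


def read_fusions(handle):
    """Yield one dict per fusion isoform (assumes a header row is present)."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for side in ("first", "second"):
            for key in (f"exon_start_{side}", f"exon_end_{side}"):
                # Comma-separated coordinates become lists of integers.
                row[key] = [int(x) for x in row[key].split(",") if x]
        row["readCount"] = int(row["readCount"])
        row["readID"] = row["readID"].split(",")
        yield row


# Invented example row; unspecified columns are left empty.
example = dict.fromkeys(COLUMNS, "")
example.update({
    "circID": "chr1|100|500|chr2|900|1300",
    "exon_start_first": "100,400", "exon_end_first": "200,500",
    "exon_start_second": "900", "exon_end_second": "1300",
    "readCount": "2", "readID": "read_a,read_b",
})
text = "\t".join(COLUMNS) + "\n" + "\t".join(example[c] for c in COLUMNS) + "\n"

rows = list(read_fusions(io.StringIO(text)))
print(rows[0]["circID"].split("|"))
```

For a real run, open `$prefix.fusion.txt` and pass the file handle to `read_fusions` in place of the `io.StringIO` object.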
