Rbims also includes functions to read and explore dbCAN annotations. The steps below show how to use them.

First, load the rbims package.
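
In an R session, this step could look like the sketch below. The requireNamespace() guard is an addition for illustration and not part of the original tutorial; it only avoids an error when rbims has not been installed yet.

```r
# Attach rbims if it is installed; otherwise report that it is missing.
rbims_available <- requireNamespace("rbims", quietly = TRUE)
if (rbims_available) {
  library(rbims)
} else {
  message("rbims is not installed; install it before running the examples.")
}
```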

read_dbcan3(): Parse dbCAN3 Outputs

The read_dbcan3 function is designed to parse and format raw output files from the dbCAN3 annotation tool.

Input Requirements

  • File Path: Provide the path to the directory containing your dbCAN files.

  • File Extension: The function specifically looks for files ending in *.overview.txt.
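
To check in advance which files in a directory match this naming pattern, base R's list.files() can be used. This is only a sketch: the directory and file names are made up for the example, and it does not claim to reproduce how read_dbcan3() searches for files internally.

```r
# Build a throwaway directory with hypothetical file names for the demo.
demo_dir <- file.path(tempdir(), "dbcan_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir,
                      c("binA.overview.txt", "binB.overview.txt", "notes.txt")))

# Keep only files whose names end in "overview.txt".
overview_files <- list.files(demo_dir, pattern = "overview\\.txt$",
                             full.names = TRUE)
basename(overview_files)
```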

Data Structure

The processed input contains 6 key columns:

  • Bin Name: Identifier of the bin/genome.

  • HMMER / Hotpep / DIAMOND: Genes identified by each specific algorithm.

  • SignalP: Indicates if a signal peptide was detected.

  • #ofTools: Total count of algorithms that identified the gene.
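
As a rough illustration of this layout, the sketch below builds a tiny table with the six columns and inspects it with base R. All bin names and values are invented, and the exact column headers (for example Bin_name or Signalp) are assumed here; check them against your own dbCAN files.

```r
# Hypothetical three-row table in the six-column layout described above.
overview_text <- paste(
  "Bin_name\tHMMER\tHotpep\tDIAMOND\tSignalp\t#ofTools",
  "binA\tGH13(10-300)\tGH13(5)\tGH13\tN\t3",
  "binA\t-\t-\tCE4\tY(1-23)\t1",
  "binB\tAA1(20-150)\t-\tAA1\tN\t2",
  sep = "\n"
)

# check.names = FALSE keeps the "#ofTools" header unchanged.
overview <- read.delim(text = overview_text, check.names = FALSE,
                       stringsAsFactors = FALSE)
str(overview)

# Count, per row, how many of the three tool columns hold a hit
# (in this toy data "-" marks "no hit").
tool_hits <- rowSums(overview[, c("HMMER", "Hotpep", "DIAMOND")] != "-")
all(tool_hits == overview[["#ofTools"]])
```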

If you want to follow this example, you can download the raw data here.

To obtain a wide output, set the argument profile = T.

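# profile = T returns a wide table with one column per bin;
# write = F does not export the result to a file.
dbcan_profile_T <- read_dbcan3(dbcan_path = "../test/results/03.dbcan",
                               profile = T,
                               write = F)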

Some extra information is recovered from the input files, such as:

  • Total number of genes.

  • Number of genes remaining after filtering.

  • Number of genes that have signal peptides and passed the filter.
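
Summary counts of this kind can be derived from a parsed table with a few lines of base R. The sketch below is not the package's implementation: the toy values are invented, and the filter (keeping genes found by at least two tools) is an assumed criterion chosen only for illustration.

```r
# Hypothetical per-gene table; n_tools plays the role of #ofTools.
genes <- data.frame(
  gene    = c("g1", "g2", "g3", "g4"),
  signalp = c("N", "Y(1-23)", "Y(1-27)", "N"),
  n_tools = c(3, 1, 2, 2),
  stringsAsFactors = FALSE
)

total_genes <- nrow(genes)

# Assumed filter: keep genes supported by at least two tools.
passed <- genes[genes$n_tools >= 2, ]

# Among the genes that passed, keep those with a predicted signal peptide.
passed_with_signal <- passed[passed$signalp != "N", ]

cat("Total genes:", total_genes, "\n")
cat("Genes remaining after filtering:", nrow(passed), "\n")
cat("Filtered genes with a signal peptide:", nrow(passed_with_signal), "\n")
```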

head(dbcan_profile_T)
Table 1. dbCAN Profile Overview (profile = T)
dbCAN_family domain_name 5mSIPHEX1_0 5mSIPHEX1_1 5mSIPHEX1_10 5mSIPHEX1_11 5mSIPHEX1_13 5mSIPHEX1_15 5mSIPHEX1_18 5mSIPHEX1_19 5mSIPHEX1_2 5mSIPHEX1_25 5mSIPHEX1_26 5mSIPHEX1_32 5mSIPHEX1_33 5mSIPHEX1_37 5mSIPHEX1_8 5mSIPHEX1_9 5mSIPHEX2_10 5mSIPHEX2_14 5mSIPHEX2_16 5mSIPHEX2_18 5mSIPHEX2_25 5mSIPHEX2_3 5mSIPHEX2_5 5mSIPHEX2_7 700mSIPHEX1_0 700mSIPHEX1_1 700mSIPHEX1_12 700mSIPHEX1_15 700mSIPHEX1_17 700mSIPHEX1_18 700mSIPHEX1_2 700mSIPHEX1_20 700mSIPHEX1_3 700mSIPHEX1_8 700mSIPHEX2_13 700mSIPHEX2_14 700mSIPHEX2_16 700mSIPHEX2_21 700mSIPHEX2_22 700mSIPHEX2_23 700mSIPHEX2_24 700mSIPHEX2_9
AA1 auxiliary activities [AAs] 4 0 9 1 1 4 0 0 0 16 0 4 0 0 4 1 4 9 0 25 16 0 0 1 1 0 1 1 4 1 0 1 9 0 0 4 1 1 1 0 4 0
AA4 auxiliary activities [AAs] 4 0 0 1 1 0 1 0 1 0 1 0 0 1 1 0 4 0 1 0 4 1 0 1 1 0 0 1 0 0 1 1 0 1 1 0 0 1 1 0 0 0
CBM48 carbohydrate-binding module [CBM] 9 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 9 0 0 0 4 0 0 0 0 16 1 0 1 1 1 0 0 0 0 1 1 0 1 1 0 9
CBM50 carbohydrate-binding module [CBM] 4 4 1 9 1 0 0 4 4 9 9 0 0 4 4 9 4 1 9 9 4 4 4 1 4 4 1 1 1 9 9 9 1 1 1 1 9 1 9 1 0 4
CE11 carbohydrate esterases [CEs] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
CE14 carbohydrate esterases [CEs] 9 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 4 0 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 1 0 1 0 1 0 1
CE4 carbohydrate esterases [CEs] 1 1 25 0 1 1 1 4 4 0 1 0 0 4 1 1 1 25 1 0 1 4 1 1 0 4 1 1 4 4 9 0 25 4 4 4 4 1 9 1 9 4
GH1 glycoside hydrolases [GHs] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1
GH102 glycoside hydrolases [GHs] 1 1 0 0 1 1 0 0 0 0 0 1 1 1 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
GH103 glycoside hydrolases [GHs] 9 1 4 4 16 1 1 1 1 4 0 1 1 16 16 4 9 4 0 4 16 16 1 16 16 1 0 1 0 4 4 4 4 4 4 0 4 1 4 0 1 1

Or obtain a long table by setting profile = F.

# profile = F returns a long table with one row per bin, family and signalp value.
dbcan_profile_F <- read_dbcan3(dbcan_path = "../test/results/03.dbcan",
                               profile = F,
                               write = F)
head(dbcan_profile_F)
Table 2. dbCAN Profile Overview (profile = F)
Bin_name     dbCAN_family  domain_name                        signalp  Abundance
5mSIPHEX1_0  AA1           auxiliary activities [AAs]         Y(1-27)  4
5mSIPHEX1_0  AA4           auxiliary activities [AAs]         N        4
5mSIPHEX1_0  CBM48         carbohydrate-binding module [CBM]  N        9
5mSIPHEX1_0  CBM50         carbohydrate-binding module [CBM]  N        4
5mSIPHEX1_0  CE11          carbohydrate esterases [CEs]       N        1
5mSIPHEX1_0  CE14          carbohydrate esterases [CEs]       N        6
5mSIPHEX1_0  CE14          carbohydrate esterases [CEs]       Y(1-23)  3
5mSIPHEX1_0  CE4           carbohydrate esterases [CEs]       N        1
5mSIPHEX1_0  GH1           glycoside hydrolases [GHs]         N        1
5mSIPHEX1_0  GH102         glycoside hydrolases [GHs]         N        1
5mSIPHEX1_0  GH103         glycoside hydrolases [GHs]         N        3
5mSIPHEX1_0  GH103         glycoside hydrolases [GHs]         Y(1-25)  3
5mSIPHEX1_0  GH103         glycoside hydrolases [GHs]         Y(1-26)  3
5mSIPHEX1_0  GH108         glycoside hydrolases [GHs]         N        1
5mSIPHEX1_0  GH13          glycoside hydrolases [GHs]         N        49
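
To see how the long and wide layouts relate, a long table can be collapsed into a family-by-bin matrix with base R's xtabs(). The sketch below uses six rows copied from Table 2; it is an illustration of the reshaping idea and makes no claim about how read_dbcan3() builds its wide output internally.

```r
# Six rows taken from Table 2 (single bin, three families).
long <- data.frame(
  Bin_name     = "5mSIPHEX1_0",
  dbCAN_family = c("AA1", "CE14", "CE14", "GH103", "GH103", "GH103"),
  signalp      = c("Y(1-27)", "N", "Y(1-23)", "N", "Y(1-25)", "Y(1-26)"),
  Abundance    = c(4, 6, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Sum Abundance over the signalp values: one row per family, one column per bin.
wide <- xtabs(Abundance ~ dbCAN_family + Bin_name, data = long)
wide_df <- as.data.frame.matrix(wide)
wide_df
```

For this bin, the summed values (AA1 = 4, CE14 = 9, GH103 = 9) agree with the corresponding column of Table 1.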

You can export this profile with:

write.table(dbcan_profile_T, "dbcan.tsv", quote = F, sep = "\t", row.names = F, col.names = T)

Or by setting write = T in the function read_dbcan3().
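
When reading an exported file back into R, note that several bin names in this example start with a digit (for instance 5mSIPHEX1_0). With default settings, read.delim() rewrites such column names to make them syntactically valid, so check.names = FALSE is useful. A minimal round-trip sketch, using a two-row excerpt of Table 1 and a temporary file path chosen for illustration:

```r
# Two-row excerpt of the wide profile; check.names = FALSE keeps the bin name as-is.
profile <- data.frame(
  dbCAN_family  = c("AA1", "CE14"),
  `5mSIPHEX1_0` = c(4, 9),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Write to a temporary file rather than the working directory.
out_file <- file.path(tempdir(), "dbcan.tsv")
write.table(profile, out_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)

# Read it back without letting R alter the column names.
reloaded <- read.delim(out_file, check.names = FALSE, stringsAsFactors = FALSE)
names(reloaded)
```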