Rbims also includes functions to read and explore dbCAN annotations. To learn how to do this:
First, load the rbims package.
library(rbims)
read_dbcan3(): Parse dbCAN3 Outputs
The read_dbcan3 function is designed to parse and format
raw output files from the dbCAN3 annotation tool.
File Path: Provide the path to the directory containing your dbCAN files.
File Extension: The function specifically looks for files matching
*.overview.txt.
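
As a rough illustration of the file-selection step described above, the following base R sketch lists the files in a directory whose names end in overview.txt. It does not reproduce the internals of read_dbcan3(); the helper name find_overview_files and the example file names are hypothetical.

```r
# Hypothetical helper (not part of rbims): list dbCAN overview files in a directory.
find_overview_files <- function(dbcan_path) {
  list.files(path = dbcan_path,
             pattern = "overview\\.txt$",  # regular expression: name ends in "overview.txt"
             full.names = TRUE)
}

# Demonstration on a temporary directory with made-up file names.
demo_dir <- file.path(tempdir(), "dbcan_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("binA.overview.txt", "binB.overview.txt", "notes.txt")))

overview_files <- find_overview_files(demo_dir)
basename(overview_files)
```

Checking the returned list before calling read_dbcan3() is a quick way to confirm that the path points at the directory you expect.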
The processed input contains 6 key columns:
Bin Name: Identifier of the bin/genome.
HMMER / Hotpep / DIAMOND: Genes identified by each specific algorithm.
SignalP: Indicates if a signal peptide was detected.
#ofTools: Total count of algorithms that identified the gene.
If you want to follow this example, you can download the raw data here.
To obtain a wide output, set the argument profile = TRUE.
dbcan_profile_T <- read_dbcan3(dbcan_path = "../test/results/03.dbcan",
                               profile = TRUE,
                               write = FALSE)

Some extra information is recovered from the input files, such as:
Total number of genes.
Remaining genes after filtering.
Number of genes that have signal peptides and passed the filter.
head(dbcan_profile_T)

| dbCAN_family | domain_name | 5mSIPHEX1_0 | 5mSIPHEX1_1 | 5mSIPHEX1_10 | 5mSIPHEX1_11 | 5mSIPHEX1_13 | 5mSIPHEX1_15 | 5mSIPHEX1_18 | 5mSIPHEX1_19 | 5mSIPHEX1_2 | 5mSIPHEX1_25 | 5mSIPHEX1_26 | 5mSIPHEX1_32 | 5mSIPHEX1_33 | 5mSIPHEX1_37 | 5mSIPHEX1_8 | 5mSIPHEX1_9 | 5mSIPHEX2_10 | 5mSIPHEX2_14 | 5mSIPHEX2_16 | 5mSIPHEX2_18 | 5mSIPHEX2_25 | 5mSIPHEX2_3 | 5mSIPHEX2_5 | 5mSIPHEX2_7 | 700mSIPHEX1_0 | 700mSIPHEX1_1 | 700mSIPHEX1_12 | 700mSIPHEX1_15 | 700mSIPHEX1_17 | 700mSIPHEX1_18 | 700mSIPHEX1_2 | 700mSIPHEX1_20 | 700mSIPHEX1_3 | 700mSIPHEX1_8 | 700mSIPHEX2_13 | 700mSIPHEX2_14 | 700mSIPHEX2_16 | 700mSIPHEX2_21 | 700mSIPHEX2_22 | 700mSIPHEX2_23 | 700mSIPHEX2_24 | 700mSIPHEX2_9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AA1 | auxiliary activities [AAs] | 4 | 0 | 9 | 1 | 1 | 4 | 0 | 0 | 0 | 16 | 0 | 4 | 0 | 0 | 4 | 1 | 4 | 9 | 0 | 25 | 16 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 4 | 1 | 0 | 1 | 9 | 0 | 0 | 4 | 1 | 1 | 1 | 0 | 4 | 0 |
| AA4 | auxiliary activities [AAs] | 4 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 4 | 0 | 1 | 0 | 4 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| CBM48 | carbohydrate-binding module [CBM] | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 9 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 16 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 9 |
| CBM50 | carbohydrate-binding module [CBM] | 4 | 4 | 1 | 9 | 1 | 0 | 0 | 4 | 4 | 9 | 9 | 0 | 0 | 4 | 4 | 9 | 4 | 1 | 9 | 9 | 4 | 4 | 4 | 1 | 4 | 4 | 1 | 1 | 1 | 9 | 9 | 9 | 1 | 1 | 1 | 1 | 9 | 1 | 9 | 1 | 0 | 4 |
| CE11 | carbohydrate esterases [CEs] | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| CE14 | carbohydrate esterases [CEs] | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 4 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| CE4 | carbohydrate esterases [CEs] | 1 | 1 | 25 | 0 | 1 | 1 | 1 | 4 | 4 | 0 | 1 | 0 | 0 | 4 | 1 | 1 | 1 | 25 | 1 | 0 | 1 | 4 | 1 | 1 | 0 | 4 | 1 | 1 | 4 | 4 | 9 | 0 | 25 | 4 | 4 | 4 | 4 | 1 | 9 | 1 | 9 | 4 |
| GH1 | glycoside hydrolases [GHs] | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| GH102 | glycoside hydrolases [GHs] | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GH103 | glycoside hydrolases [GHs] | 9 | 1 | 4 | 4 | 16 | 1 | 1 | 1 | 1 | 4 | 0 | 1 | 1 | 16 | 16 | 4 | 9 | 4 | 0 | 4 | 16 | 16 | 1 | 16 | 16 | 1 | 0 | 1 | 0 | 4 | 4 | 4 | 4 | 4 | 4 | 0 | 4 | 1 | 4 | 0 | 1 | 1 |

Or obtain a long table by setting profile = FALSE.
dbcan_profile_F <- read_dbcan3(dbcan_path = "../test/results/03.dbcan",
                               profile = FALSE,
                               write = FALSE)
head(dbcan_profile_F)

| Bin_name | dbCAN_family | domain_name | signalp | Abundance |
|---|---|---|---|---|
| 5mSIPHEX1_0 | AA1 | auxiliary activities [AAs] | Y(1-27) | 4 |
| 5mSIPHEX1_0 | AA4 | auxiliary activities [AAs] | N | 4 |
| 5mSIPHEX1_0 | CBM48 | carbohydrate-binding module [CBM] | N | 9 |
| 5mSIPHEX1_0 | CBM50 | carbohydrate-binding module [CBM] | N | 4 |
| 5mSIPHEX1_0 | CE11 | carbohydrate esterases [CEs] | N | 1 |
| 5mSIPHEX1_0 | CE14 | carbohydrate esterases [CEs] | N | 6 |
| 5mSIPHEX1_0 | CE14 | carbohydrate esterases [CEs] | Y(1-23) | 3 |
| 5mSIPHEX1_0 | CE4 | carbohydrate esterases [CEs] | N | 1 |
| 5mSIPHEX1_0 | GH1 | glycoside hydrolases [GHs] | N | 1 |
| 5mSIPHEX1_0 | GH102 | glycoside hydrolases [GHs] | N | 1 |
| 5mSIPHEX1_0 | GH103 | glycoside hydrolases [GHs] | N | 3 |
| 5mSIPHEX1_0 | GH103 | glycoside hydrolases [GHs] | Y(1-25) | 3 |
| 5mSIPHEX1_0 | GH103 | glycoside hydrolases [GHs] | Y(1-26) | 3 |
| 5mSIPHEX1_0 | GH108 | glycoside hydrolases [GHs] | N | 1 |
| 5mSIPHEX1_0 | GH13 | glycoside hydrolases [GHs] | N | 49 |
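
The relationship between the two layouts can be checked by hand. The sketch below copies a few rows for bin 5mSIPHEX1_0 from the long table above and sums Abundance per family with base R's xtabs(). For these rows the sums match the values shown for that bin in the wide table (for example, CE14: 6 + 3 = 9). This only illustrates the reshaping; it is an assumption, not a confirmed fact, that read_dbcan3() aggregates in the same way internally.

```r
# Rows copied from the long-format output shown above (bin 5mSIPHEX1_0 only).
long_tbl <- data.frame(
  Bin_name     = "5mSIPHEX1_0",
  dbCAN_family = c("AA1", "CE14", "CE14", "GH103", "GH103", "GH103"),
  signalp      = c("Y(1-27)", "N", "Y(1-23)", "N", "Y(1-25)", "Y(1-26)"),
  Abundance    = c(4, 6, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Sum Abundance over the signalp categories: one row per family, one column per bin.
wide_counts <- xtabs(Abundance ~ dbCAN_family + Bin_name, data = long_tbl)

# Convert the contingency table to a plain data frame in wide layout.
wide_tbl <- as.data.frame.matrix(wide_counts)
wide_tbl
```

The same approach extends to several bins: each distinct Bin_name becomes its own column.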

You can export this profile with:
write.table(dbcan_profile_T, "dbcan.tsv", quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)

Or by setting write = TRUE in the function read_dbcan3().
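
When reading an exported file back into R, note that many bin names in this example begin with a digit (for instance 5mSIPHEX1_0), and read.delim() by default rewrites such column names into syntactically valid ones (X5mSIPHEX1_0). A minimal sketch, using a toy data frame rather than real read_dbcan3() output, shows how check.names = FALSE preserves them:

```r
# Toy stand-in for a wide profile; values copied from the example table above.
toy_profile <- data.frame(
  dbCAN_family  = c("AA1", "AA4"),
  `5mSIPHEX1_0` = c(4, 4),
  `5mSIPHEX1_1` = c(0, 0),
  check.names = FALSE
)

# Write to a temporary file with the same write.table() settings as above.
out_file <- tempfile(fileext = ".tsv")
write.table(toy_profile, out_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)

# Default behaviour: column names are passed through make.names().
default_names <- names(read.delim(out_file))
default_names

# check.names = FALSE keeps the bin names exactly as written.
reread <- read.delim(out_file, check.names = FALSE)
names(reread)
```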