Rbims also includes functions to read and explore KEGG annotations. To learn how to do this:
Make sure the library is loaded (`library(rbims)`).
read_ko(): Parse KEGG Outputs
The read_ko function is designed to parse and format raw output files from the KofamKOALA/KofamScan annotation tool.

?read_ko

File Path: Provide the path to the directory containing your KEGG files (outputs from the KofamKOALA/KofamScan annotation). Even if there is only one file, the path must point to the directory, not to the file itself.
File Extension: The function specifically looks for files ending in *.txt.
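Before calling read_ko, it can help to confirm that the directory really contains `.txt` files. The following is a minimal sketch using base R only; the temporary directory and the file names are made up for illustration and are not part of rbims.

```r
# Hypothetical directory standing in for a KofamScan output folder
kofam_dir <- file.path(tempdir(), "kofam_example")
dir.create(kofam_dir, showWarnings = FALSE)
invisible(file.create(file.path(kofam_dir,
                                c("bin_a.txt", "bin_b.txt", "notes.log"))))

# List only the files whose names end in .txt
kofam_files <- list.files(kofam_dir, pattern = "\\.txt$", full.names = TRUE)
basename(kofam_files)
```

If the result is empty, the path probably points to the wrong directory or the files use a different extension.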
The processed input contains 6 key columns:
gene name: the unique identifier for the sequence that was annotated.
KO: KEGG ortholog identifier. The “ID number” for the functional group.
thrshld: the minimum score requirement set by the annotation tool to consider a match “real.”
score: the bit-score from the sequence alignment.
E-value: the number of hits one can “expect” to see by chance when searching a database of a particular size.
KO definition: description of the functional group.
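To make the relationship between thrshld and score concrete, here is a small sketch on an invented hits table. It assumes a hit is accepted when its score reaches the KO-specific threshold; the values are made up, and the sketch does not claim to reproduce what read_ko does internally.

```r
# Toy hits table; all values are invented for illustration only
hits <- data.frame(
  gene_name = c("gene_1", "gene_2", "gene_3"),
  KO        = c("K02056", "K00852", "K01619"),
  thrshld   = c(200, 150, 300),
  score     = c(250.5, 120.0, 310.0),
  stringsAsFactors = FALSE
)

# Assumption: a hit passes when its bit-score reaches the threshold
hits$passes <- hits$score >= hits$thrshld
hits[hits$passes, c("gene_name", "KO")]
```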
If you want to follow this example, you can download the files from here.
ko_bin_table <- read_ko(data_kofam = "../test/results/02.kofam")

The read_ko function will create a table that contains the abundance of each KO within each bin.

head(ko_bin_table)

| Scaffold_name | Bin_name | KO | Abundance |
|---|---|---|---|
| 5mSIPHEX1_0_scaffold_1104_c1_2 | 5mSIPHEX1_0 | K02056 | 1 |
| 5mSIPHEX1_0_scaffold_1104_c1_6 | 5mSIPHEX1_0 | K00852 | 2 |
| 5mSIPHEX1_0_scaffold_1104_c1_7 | 5mSIPHEX1_0 | K01619 | 1 |
| 5mSIPHEX1_0_scaffold_1104_c1_8 | 5mSIPHEX1_0 | K00128 | 1 |
| 5mSIPHEX1_0_scaffold_12_c2_100 | 5mSIPHEX1_0 | K07231 | 1 |
| 5mSIPHEX1_0_scaffold_12_c2_102 | 5mSIPHEX1_0 | K18911 | 1 |
| 5mSIPHEX1_0_scaffold_12_c2_103 | 5mSIPHEX1_0 | K25285 | 1 |
| 5mSIPHEX1_0_scaffold_12_c2_103 | 5mSIPHEX1_0 | K02013 | 3 |
| 5mSIPHEX1_0_scaffold_12_c2_104 | 5mSIPHEX1_0 | K25283 | 1 |
| 5mSIPHEX1_0_scaffold_12_c2_105 | 5mSIPHEX1_0 | K25284 | 1 |
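A long table like this one can be summarised into a KO-by-bin matrix with base R. The sketch below types in a few rows by hand (the Scaffold_name column is left out, and the single 5mSIPHEX1_1 row is an illustrative addition), so it runs without the example files; it is not the method rbims uses internally.

```r
# A few KO abundances typed in by hand for illustration
ko_long <- data.frame(
  Bin_name  = c(rep("5mSIPHEX1_0", 4), "5mSIPHEX1_1"),
  KO        = c("K02056", "K00852", "K01619", "K00128", "K00852"),
  Abundance = c(1, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Total abundance per KO and bin (rows = KO, columns = bin);
# combinations that never occur are filled with 0
ko_wide <- xtabs(Abundance ~ KO + Bin_name, data = ko_long)
ko_wide
```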
The function mapping_ko can now be used to map the KOs and their abundances to the rest of the features in the KEGG and rbims databases.

ko_bin_mapp <- mapping_ko(ko_bin_table)
head(ko_bin_mapp)

| Module | Module_description | Pathway | Pathway_description | Cycle | Pathway_cycle | Detail_cycle | Genes | Gene_description | Enzyme | KO | rbims_pathway | rbims_sub_pathway | 5mSIPHEX1_0 | 5mSIPHEX1_1 | 5mSIPHEX1_10 | 5mSIPHEX1_11 | 5mSIPHEX1_13 | 5mSIPHEX1_15 | 5mSIPHEX1_18 | 5mSIPHEX1_19 | 5mSIPHEX1_2 | 5mSIPHEX1_25 | 5mSIPHEX1_26 | 5mSIPHEX1_32 | 5mSIPHEX1_33 | 5mSIPHEX1_37 | 5mSIPHEX1_8 | 5mSIPHEX1_9 | 5mSIPHEX2_10 | 5mSIPHEX2_14 | 5mSIPHEX2_16 | 5mSIPHEX2_18 | 5mSIPHEX2_25 | 5mSIPHEX2_3 | 5mSIPHEX2_5 | 5mSIPHEX2_7 | 700mSIPHEX1_0 | 700mSIPHEX1_1 | 700mSIPHEX1_12 | 700mSIPHEX1_15 | 700mSIPHEX1_17 | 700mSIPHEX1_18 | 700mSIPHEX1_2 | 700mSIPHEX1_20 | 700mSIPHEX1_3 | 700mSIPHEX1_8 | 700mSIPHEX2_13 | 700mSIPHEX2_14 | 700mSIPHEX2_16 | 700mSIPHEX2_21 | 700mSIPHEX2_22 | 700mSIPHEX2_23 | 700mSIPHEX2_24 | 700mSIPHEX2_9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NA | NA | NA | NA | NA | NA | NA | ABC.SS.A | simple sugar transport system ATP-binding protein [EC:7.5.2.-] | NA | K02056 | NA | NA | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NA | NA | map00030 | Pentose phosphate pathway | NA | NA | NA | rbsK, RBKS | ribokinase [EC:2.7.1.15] | ec:2.7.1.15 | K00852 | NA | NA | 2 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 3 | 1 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| NA | NA | map01100 | Metabolic pathways | NA | NA | NA | rbsK, RBKS | ribokinase [EC:2.7.1.15] | ec:2.7.1.15 | K00852 | NA | NA | 2 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 3 | 1 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| NA | NA | map00030 | Pentose phosphate pathway | NA | NA | NA | deoC, DERA | deoxyribose-phosphate aldolase [EC:4.1.2.4] | ec:4.1.2.4 | K01619 | NA | NA | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| NA | NA | map01100 | Metabolic pathways | NA | NA | NA | deoC, DERA | deoxyribose-phosphate aldolase [EC:4.1.2.4] | ec:4.1.2.4 | K01619 | NA | NA | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| M00135 | GABA biosynthesis, eukaryotes, putrescine => GABA | map00010 | Glycolysis / Gluconeogenesis | Fermentation | Mixed acid: ethanol, acetate to acetylaldehyde | aldehyde dehydrogenase (NAD+) | ALDH | aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] | ec:1.2.1.3 | K00128 | NA | NA | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| M00913 | Pantothenate biosynthesis, 2-oxoisovalerate/spermine => pantothenate | map00010 | Glycolysis / Gluconeogenesis | Fermentation | Mixed acid: ethanol, acetate to acetylaldehyde | aldehyde dehydrogenase (NAD+) | ALDH | aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] | ec:1.2.1.3 | K00128 | NA | NA | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| M01047 | Juvenile hormone biosynthesis, insects, farnesyl-PP => juvenile hormone III | map00010 | Glycolysis / Gluconeogenesis | Fermentation | Mixed acid: ethanol, acetate to acetylaldehyde | aldehyde dehydrogenase (NAD+) | ALDH | aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] | ec:1.2.1.3 | K00128 | NA | NA | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| M00135 | GABA biosynthesis, eukaryotes, putrescine => GABA | map00053 | Ascorbate and aldarate metabolism | Fermentation | Mixed acid: ethanol, acetate to acetylaldehyde | aldehyde dehydrogenase (NAD+) | ALDH | aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] | ec:1.2.1.3 | K00128 | NA | NA | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| M00913 | Pantothenate biosynthesis, 2-oxoisovalerate/spermine => pantothenate | map00053 | Ascorbate and aldarate metabolism | Fermentation | Mixed acid: ethanol, acetate to acetylaldehyde | aldehyde dehydrogenase (NAD+) | ALDH | aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] | ec:1.2.1.3 | K00128 | NA | NA | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
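In this output, a KO that belongs to several pathways or modules appears on several rows (K00852 is listed under both map00030 and map01100), and each row repeats the same per-bin abundances. If you sum abundances per bin, you may want one row per KO first. The sketch below shows one way to do that on a toy table that copies a few values from the output above; the object names are illustrative only.

```r
# Toy version of the mapped table: K00852 appears once per pathway
mapped <- data.frame(
  Pathway       = c(NA, "map00030", "map01100"),
  KO            = c("K02056", "K00852", "K00852"),
  `5mSIPHEX1_0` = c(1, 2, 2),
  `5mSIPHEX1_1` = c(0, 1, 1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep one row per KO before summing, so K00852 is not counted twice
bin_cols <- c("5mSIPHEX1_0", "5mSIPHEX1_1")
per_ko <- mapped[!duplicated(mapped$KO), c("KO", bin_cols)]
colSums(per_ko[, bin_cols])
```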
You can export this profile like this:

write.table(ko_bin_mapp, "ko_bin_mapp.tsv",
            quote = FALSE,
            sep = "\t",
            row.names = FALSE,
            col.names = TRUE)

Alternatively, set write = T in the function read_ko().
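When reading an exported profile back into R, note that the bin columns start with a digit (for example 5mSIPHEX1_0), and `read.delim` renames such columns by default. The following sketch uses a toy table and a temporary file, both hypothetical, to show a round trip that keeps the original column names.

```r
# Toy table with a bin column whose name starts with a digit
profile <- data.frame(KO = c("K02056", "K00852"),
                      `5mSIPHEX1_0` = c(1, 2),
                      check.names = FALSE)

out_file <- tempfile(fileext = ".tsv")
write.table(profile, out_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)

# check.names = FALSE keeps "5mSIPHEX1_0" instead of "X5mSIPHEX1_0"
reloaded <- read.delim(out_file, check.names = FALSE)
colnames(reloaded)
```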