Rbims also includes functions to read and explore KEGG annotations. The steps below show how to use them.

Make sure the library is loaded
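
A minimal sketch of this step, assuming the package is installed under the name rbims. The availability check is added here for illustration and is not part of the original tutorial.

```r
# Check whether rbims is installed before attaching it.
rbims_available <- requireNamespace("rbims", quietly = TRUE)

if (rbims_available) {
  library(rbims)
} else {
  message("rbims is not installed; install it before running the examples.")
}
```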

read_ko(): Parse KEGG Outputs

The read_ko function is designed to parse and format raw output files from the KofamKOALA/KofamScan annotation tool.

?read_ko

Input Requirements

  • File Path: Provide the path to the directory containing your KEGG files (outputs from the KofamKOALA/KofamScan annotation). Even if there is only one file, the path must point to the directory, not to the file itself.

  • File Extension: The function specifically looks for files with the .txt extension.
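
Before calling read_ko, it can help to confirm that the directory actually holds .txt files. The sketch below uses only base R. The directory name and the placeholder files are hypothetical, and the pattern is an assumption about what counts as a matching file, not a description of how read_ko selects files internally.

```r
# Hypothetical directory; replace with the folder holding your KofamScan outputs.
kofam_dir <- file.path(tempdir(), "02.kofam_demo")
dir.create(kofam_dir, showWarnings = FALSE)

# Create two placeholder files so the listing below has something to find.
file.create(file.path(kofam_dir, c("bin_a.txt", "notes.csv")))

# List only files whose names end in .txt, as a quick pre-flight check.
txt_files <- list.files(kofam_dir, pattern = "\\.txt$", full.names = TRUE)
basename(txt_files)
```

If the listing comes back empty, the files probably need to be renamed or the path points to the wrong place.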

Data Structure

The input files contain six key columns:

  • gene name: the unique identifier for the sequence that was annotated.

  • KO: KEGG Orthology identifier, the “ID number” of the functional group.

  • thrshld: the minimum score the annotation tool requires to consider a match “real.”

  • score: the bit-score from the sequence alignment.

  • E-value: the number of hits one can “expect” to see by chance when searching a database of a particular size.

  • KO definition: description of the functional group.
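
To make the relationship between score and thrshld concrete, here is a small sketch on a toy table with the six columns described above. All values and the R-friendly column names (gene_name, E_value, KO_definition) are invented for illustration. Whether read_ko applies this kind of filter internally is not stated here, so treat it as a way to reason about the columns rather than as the package's behavior.

```r
# Toy hits table; all values are made up for illustration.
hits <- data.frame(
  gene_name     = c("bin1_scaffold_1_1", "bin1_scaffold_1_2", "bin1_scaffold_2_1"),
  KO            = c("K02056", "K00852", "K01619"),
  thrshld       = c(100.0, 250.5, 80.0),
  score         = c(150.2, 200.0, 95.3),
  E_value       = c(1e-40, 1e-55, 1e-20),
  KO_definition = c("hypothetical definition A",
                    "hypothetical definition B",
                    "hypothetical definition C"),
  stringsAsFactors = FALSE
)

# Keep only hits whose bit-score reaches the KO-specific threshold.
significant <- hits[hits$score >= hits$thrshld, ]
significant$KO
```

In this toy table the second hit is dropped because its score (200.0) is below its threshold (250.5).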

If you want to follow this example, you can download the files from here.

ko_bin_table <- read_ko(data_kofam = "../test/results/02.kofam")

The read_ko function will create a table that contains the abundance of each KO within each bin.

head(ko_bin_table)
Table 1. KEGG Profile Overview ( profile = T )
Scaffold_name                   Bin_name     KO      Abundance
5mSIPHEX1_0_scaffold_1104_c1_2  5mSIPHEX1_0  K02056  1
5mSIPHEX1_0_scaffold_1104_c1_6  5mSIPHEX1_0  K00852  2
5mSIPHEX1_0_scaffold_1104_c1_7  5mSIPHEX1_0  K01619  1
5mSIPHEX1_0_scaffold_1104_c1_8  5mSIPHEX1_0  K00128  1
5mSIPHEX1_0_scaffold_12_c2_100  5mSIPHEX1_0  K07231  1
5mSIPHEX1_0_scaffold_12_c2_102  5mSIPHEX1_0  K18911  1
5mSIPHEX1_0_scaffold_12_c2_103  5mSIPHEX1_0  K25285  1
5mSIPHEX1_0_scaffold_12_c2_103  5mSIPHEX1_0  K02013  3
5mSIPHEX1_0_scaffold_12_c2_104  5mSIPHEX1_0  K25283  1
5mSIPHEX1_0_scaffold_12_c2_105  5mSIPHEX1_0  K25284  1
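
The idea of "abundance of each KO within each bin" can be illustrated with a base R sketch that counts how many annotated genes in a bin share the same KO. The bin names and rows are invented, and this is only one plausible way to compute such counts, not necessarily what read_ko does internally.

```r
# Toy long-format annotations: one row per annotated gene (values invented).
annotations <- data.frame(
  Bin_name = c("bin_A", "bin_A", "bin_A", "bin_B"),
  KO       = c("K00852", "K00852", "K02056", "K00852"),
  stringsAsFactors = FALSE
)

# Count how many genes in each bin carry each KO.
abundance <- aggregate(
  list(Abundance = rep(1L, nrow(annotations))),
  by  = annotations[c("Bin_name", "KO")],
  FUN = sum
)
abundance
```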

Map to the KEGG database

The mapping_ko function can now be used to map each KO and its abundance to the remaining features of the KEGG and rbims databases.

ko_bin_mapp <- mapping_ko(ko_bin_table)
head(ko_bin_mapp)
Table 2. KEGG Profile Overview ( profile = T )
Module Module_description Pathway Pathway_description Cycle Pathway_cycle Detail_cycle Genes Gene_description Enzyme KO rbims_pathway rbims_sub_pathway 5mSIPHEX1_0 5mSIPHEX1_1 5mSIPHEX1_10 5mSIPHEX1_11 5mSIPHEX1_13 5mSIPHEX1_15 5mSIPHEX1_18 5mSIPHEX1_19 5mSIPHEX1_2 5mSIPHEX1_25 5mSIPHEX1_26 5mSIPHEX1_32 5mSIPHEX1_33 5mSIPHEX1_37 5mSIPHEX1_8 5mSIPHEX1_9 5mSIPHEX2_10 5mSIPHEX2_14 5mSIPHEX2_16 5mSIPHEX2_18 5mSIPHEX2_25 5mSIPHEX2_3 5mSIPHEX2_5 5mSIPHEX2_7 700mSIPHEX1_0 700mSIPHEX1_1 700mSIPHEX1_12 700mSIPHEX1_15 700mSIPHEX1_17 700mSIPHEX1_18 700mSIPHEX1_2 700mSIPHEX1_20 700mSIPHEX1_3 700mSIPHEX1_8 700mSIPHEX2_13 700mSIPHEX2_14 700mSIPHEX2_16 700mSIPHEX2_21 700mSIPHEX2_22 700mSIPHEX2_23 700mSIPHEX2_24 700mSIPHEX2_9
NA NA NA NA NA NA NA ABC.SS.A simple sugar transport system ATP-binding protein [EC:7.5.2.-] NA K02056 NA NA 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
NA NA map00030 Pentose phosphate pathway NA NA NA rbsK, RBKS ribokinase [EC:2.7.1.15] ec:2.7.1.15 K00852 NA NA 2 1 0 0 1 1 0 0 0 0 0 2 1 1 3 1 1 0 0 0 3 1 0 1 0 2 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1
NA NA map01100 Metabolic pathways NA NA NA rbsK, RBKS ribokinase [EC:2.7.1.15] ec:2.7.1.15 K00852 NA NA 2 1 0 0 1 1 0 0 0 0 0 2 1 1 3 1 1 0 0 0 3 1 0 1 0 2 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1
NA NA map00030 Pentose phosphate pathway NA NA NA deoC, DERA deoxyribose-phosphate aldolase [EC:4.1.2.4] ec:4.1.2.4 K01619 NA NA 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 1 0 1 1 1 1 0 1 0 0 0 0 1 1 1 0 0 0 1 0 1
NA NA map01100 Metabolic pathways NA NA NA deoC, DERA deoxyribose-phosphate aldolase [EC:4.1.2.4] ec:4.1.2.4 K01619 NA NA 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 1 0 1 1 1 1 0 1 0 0 0 0 1 1 1 0 0 0 1 0 1
M00135 GABA biosynthesis, eukaryotes, putrescine => GABA map00010 Glycolysis / Gluconeogenesis Fermentation Mixed acid: ethanol, acetate to acetylaldehyde aldehyde dehydrogenase (NAD+) ALDH aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] ec:1.2.1.3 K00128 NA NA 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1
M00913 Pantothenate biosynthesis, 2-oxoisovalerate/spermine => pantothenate map00010 Glycolysis / Gluconeogenesis Fermentation Mixed acid: ethanol, acetate to acetylaldehyde aldehyde dehydrogenase (NAD+) ALDH aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] ec:1.2.1.3 K00128 NA NA 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1
M01047 Juvenile hormone biosynthesis, insects, farnesyl-PP => juvenile hormone III map00010 Glycolysis / Gluconeogenesis Fermentation Mixed acid: ethanol, acetate to acetylaldehyde aldehyde dehydrogenase (NAD+) ALDH aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] ec:1.2.1.3 K00128 NA NA 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1
M00135 GABA biosynthesis, eukaryotes, putrescine => GABA map00053 Ascorbate and aldarate metabolism Fermentation Mixed acid: ethanol, acetate to acetylaldehyde aldehyde dehydrogenase (NAD+) ALDH aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] ec:1.2.1.3 K00128 NA NA 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1
M00913 Pantothenate biosynthesis, 2-oxoisovalerate/spermine => pantothenate map00053 Ascorbate and aldarate metabolism Fermentation Mixed acid: ethanol, acetate to acetylaldehyde aldehyde dehydrogenase (NAD+) ALDH aldehyde dehydrogenase (NAD+) [EC:1.2.1.3] ec:1.2.1.3 K00128 NA NA 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1
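
In the mapped table, each bin has its own column and a 0 marks a KO that was not found in that bin. The sketch below shows how a long table (one row per bin and KO) can be reshaped into that wide layout with base R. The data are invented, and the use of xtabs is an illustrative choice rather than the method mapping_ko uses.

```r
# Toy long-format profile (values invented for illustration).
long_profile <- data.frame(
  Bin_name  = c("bin_A", "bin_B", "bin_A"),
  KO        = c("K00852", "K00852", "K02056"),
  Abundance = c(2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are KOs, columns are bins, absent pairs become 0.
wide_matrix  <- xtabs(Abundance ~ KO + Bin_name, data = long_profile)
wide_profile <- as.data.frame.matrix(wide_matrix)
wide_profile
```

Note that the real mapped table can repeat a KO across several rows, because one KO may belong to more than one module or pathway.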

You can export this profile as follows:

write.table(ko_bin_mapp, "ko_bin_mapp.tsv",
            quote = FALSE,
            sep = "\t",
            row.names = FALSE,
            col.names = TRUE)

Alternatively, set write = T in the read_ko() function.
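
When reading an exported file back into R, keep in mind that bin names such as 5mSIPHEX1_0 start with a digit, so the default name checking would alter them. The sketch below uses a small invented table and a temporary file to show a round trip that preserves the column names.

```r
# Toy table whose column names start with a digit, like the bin names above.
demo <- data.frame(KO = c("K00852", "K02056"), stringsAsFactors = FALSE)
demo[["5mSIPHEX1_0"]] <- c(2L, 1L)

# Write to a temporary file with tab separators and no quoting or row names.
tsv_path <- tempfile(fileext = ".tsv")
write.table(demo, tsv_path, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)

# check.names = FALSE stops R from rewriting names such as "5mSIPHEX1_0".
reloaded <- read.delim(tsv_path, check.names = FALSE, stringsAsFactors = FALSE)
names(reloaded)
```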