First, load the rbims package.
The second thing to do would be to read the metadata file and the KofamKOALA/KofamScan output. The metadata file is a tab-separated file containing the name of your bins and any extra information you would like to use for visualization.
The metadata file contains external information of each bin, like sample site or taxonomy. The name of the bins in the first column is mandatory.
The metadata table can be read in various formats (csv, tsv, txt,
xlsx); you will need to use the corresponding function to read the type
of file you have. In this case, the example table of rbims is in excel
format; therefore, to read the metadata, we will use the function
read_excel
from the package readxl
. You can
download the metadata example file metadata
and try.
library(readxl)
metadata<-read_excel("metadata.xlsx")
head(metadata)
#> # A tibble: 6 × 10
#> Bin_name Sample_site Clades Domain Phylum Class Order Family Genus Species
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Bin_10 Sediment Clade_1 d__Bact… p__Pr… c__G… o__P… f__Sa… g__O… s__Ole…
#> 2 Bin_12 Sediment Clade_2 d__Bact… p__Pr… c__G… o__P… f__Ol… g__M… s__Mar…
#> 3 Bin_56 Sediment Clade_3 d__Bact… p__Pr… c__G… o__P… f__Sa… g__T… s__Tha…
#> 4 Bin_113 Water_column Clade_1 d__Bact… p__Pr… c__A… o__R… f__Rh… g__P… s__Par…
#> 5 Bin_1 Water_column Clade_2 d__Bact… p__Pr… c__G… o__E… f__Al… g__A… s__Alt…
#> 6 Bin_2 Water_column Clade_3 d__Bact… p__Pr… c__G… o__E… f__Al… g__A… s__Alt…
If you followed the create KEGG profile tutorial, you could go directly to a case example of exploring and specific pathway.
read_ko can read multiple text files obtained from KofamKOALA/KofamScan or KAAS, as long as they are all in the same path in your working directory. If you use both, and there are different hits for a KO in both searches, it will take the hit from KofamKOALA/KofamScan.
ko_bin_table<-read_ko(data_kofam ="C:/Users/bins")
Then map the KO to the rest of the features of the KEGG and rbims database.
ko_bin_mapp<-mapping_ko(ko_bin_table)
To explore the metabolism table, rbims has three functions to subset the table:
Let’s say that you are interested in the genes associated with the biofilm formation in Vibrio Cholerae.
Biofilm_Vibrio<-c("map05111")
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.3.2
Biofilm_Vibrio_subset<-ko_bin_mapp%>%
drop_na(Pathway) %>%
get_subset_pathway(Pathway, Biofilm_Vibrio)
head(Biofilm_Vibrio_subset)
#> # A tibble: 6 × 19
#> Module Module_description Pathway Pathway_description Cycle Pathway_cycle
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 NA NA map051… Biofilm formation … NA NA
#> 2 NA NA map051… Biofilm formation … NA NA
#> 3 NA NA map051… Biofilm formation … NA NA
#> 4 M00878 Phenylacetate degradat… map051… Biofilm formation … NA NA
#> 5 NA NA map051… Biofilm formation … Other Flagellar as…
#> 6 NA NA map051… Biofilm formation … NA NA
#> # ℹ 13 more variables: Detail_cycle <chr>, Genes <chr>, Gene_description <chr>,
#> # Enzyme <chr>, KO <chr>, rbims_pathway <chr>, rbims_sub_pathway <chr>,
#> # Bin_10 <int>, Bin_12 <int>, Bin_56 <int>, Bin_113 <int>, Bin_1 <int>,
#> # Bin_2 <int>
plot_bubble(tibble_ko = Biofilm_Vibrio_subset,
x_axis = Bin_name,
y_axis = KO,
analysis="KEGG",
data_experiment = metadata,
calc="Binary",
color_character = Order,
range_size = c(1,10))
Let’s say that you want to order by bin names.
order_taxa<-c("Bin_1", "Bin_2", "Bin_10", "Bin_113", "Bin_12", "Bin_56")
plot_bubble(tibble_ko = Biofilm_Vibrio_subset,
data_experiment = metadata,
x_axis = Bin_name,
y_axis = Genes,
analysis="KEGG",
calc="Binary",
order_bins=order_taxa,
color_character=Genus,
range_size = c(5,6))
Here, I will extract the information of some KO related to Carbon fixation metabolism.
Carbon_fixation<-c("K01007", "K00626", "K01902", "K01595", "K01903", "K00170", "K00169", "K00171", "K00172", "K00241")
library(tidyr)
Carbon_fixation_subset<-ko_bin_mapp%>%
drop_na(KO) %>%
get_subset_pathway(KO, Carbon_fixation)
head(Carbon_fixation_subset)
#> # A tibble: 6 × 19
#> Module Module_description Pathway Pathway_description Cycle Pathway_cycle
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 M00173 Reductive citrate cycl… map000… Glycolysis / Gluco… Carb… Reductive ci…
#> 2 M00173 Reductive citrate cycl… map000… Glycolysis / Gluco… Carb… Dicarboxylat…
#> 3 M00374 Dicarboxylate-hydroxyb… map000… Glycolysis / Gluco… Carb… Reductive ci…
#> 4 M00374 Dicarboxylate-hydroxyb… map000… Glycolysis / Gluco… Carb… Dicarboxylat…
#> 5 M00173 Reductive citrate cycl… map006… Pyruvate metabolism Carb… Reductive ci…
#> 6 M00173 Reductive citrate cycl… map006… Pyruvate metabolism Carb… Dicarboxylat…
#> # ℹ 13 more variables: Detail_cycle <chr>, Genes <chr>, Gene_description <chr>,
#> # Enzyme <chr>, KO <chr>, rbims_pathway <chr>, rbims_sub_pathway <chr>,
#> # Bin_10 <int>, Bin_12 <int>, Bin_56 <int>, Bin_113 <int>, Bin_1 <int>,
#> # Bin_2 <int>
We can visualize the data with a heatmap.
plot_heatmap(tibble_ko=Carbon_fixation_subset,
y_axis=Genes,
analysis = "KEGG",
calc="Binary")
In this example, we will use energy metabolism to explore the rest of the functions.
Other_energy<-c("Fermentation", "Carbon fixation", "Methane metabolism",
"Sulfur metabolism", "Nitrogen metabolism")
library(tidyr)
Energy_metabolisms<-ko_bin_mapp %>%
drop_na(Cycle) %>%
get_subset_pathway(Cycle, Other_energy)
head(Energy_metabolisms)
#> # A tibble: 6 × 19
#> Module Module_description Pathway Pathway_description Cycle Pathway_cycle
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 M00173 Reductive citrate cycl… map000… Glycolysis / Gluco… Carb… Reductive ci…
#> 2 M00173 Reductive citrate cycl… map000… Glycolysis / Gluco… Carb… Dicarboxylat…
#> 3 M00374 Dicarboxylate-hydroxyb… map000… Glycolysis / Gluco… Carb… Reductive ci…
#> 4 M00374 Dicarboxylate-hydroxyb… map000… Glycolysis / Gluco… Carb… Dicarboxylat…
#> 5 M00173 Reductive citrate cycl… map006… Pyruvate metabolism Carb… Reductive ci…
#> 6 M00173 Reductive citrate cycl… map006… Pyruvate metabolism Carb… Dicarboxylat…
#> # ℹ 13 more variables: Detail_cycle <chr>, Genes <chr>, Gene_description <chr>,
#> # Enzyme <chr>, KO <chr>, rbims_pathway <chr>, rbims_sub_pathway <chr>,
#> # Bin_10 <int>, Bin_12 <int>, Bin_56 <int>, Bin_113 <int>, Bin_1 <int>,
#> # Bin_2 <int>
plot_heatmap(tibble_ko=Energy_metabolisms,
y_axis=Pathway_cycle,
order_y = Cycle,
analysis = "KEGG",
calc="Percentage")
plot_heatmap(tibble_ko=Energy_metabolisms,
y_axis=Pathway_cycle,
data_experiment=metadata,
order_x = Sample_site,
analysis = "KEGG",
calc="Percentage")
plot_heatmap(tibble_ko=Energy_metabolisms,
y_axis=Pathway_cycle,
order_y = Cycle,
split_y = TRUE,
analysis = "KEGG",
calc="Percentage")
plot_heatmap(tibble_ko=Energy_metabolisms,
data_experiment = metadata,
y_axis=Pathway_cycle,
order_y = Cycle,
order_x = Clades,
split_y = TRUE,
analysis = "KEGG",
calc="Percentage")