PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is software that predicts functional abundances from marker gene sequences. It typically focuses on gene families like KEGG orthologs and Enzyme Classification numbers, but can predict any trait. While commonly used with 16S rRNA data, it also supports other marker genes.
The example data is from an article already published link and were sediment core samples that were collected from three zones of mangrove ecosystem: Basin, Fringe and Impaired. The samples were taken during two seasonal conditions, dry and flood seasons, at three sediment depths: surface, mid-layer, and deeper sediment layers, to assess the impact of degradation level or zone, seasonal variation, and sediment depth on microbial communities.
First, load the rbims package.
The function to use that information is read_picrust2
.
This function can parse the information of the KO, EC and pathways data
from picrust2. The KEGG analysis is just possible if KO table is
used.
KO_picrust2_profile <- read_picrust2("../inst/extdata/KO_metagenome_out/KO_pred_metagenome_unstrat.tsv", database = "KO", profile = T)
The database
argument will parse the database. In
this example, I will explore the KO output.
The output format is chosen with the profile
argument. When profile = T, a wide output is
obtained.
The write
argument saves the formatted table
generated in .tsv extension. When write = F gives you
the output but not saves the table in your current directory.
If you want to follow the example you can download the use rbims test file.
head(KO_picrust2_profile)
#> # A tibble: 6 × 54
#> KO zr2502_1_R1 zr2502_10_R1 zr2502_11_R1 zr2502_12_R1 zr2502_13_R1
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 K00001 82 407 0 243 0
#> 2 K00002 0 0 0 0 0
#> 3 K00003 636 1435 665 1033 373
#> 4 K00004 0 0 0 0 0
#> 5 K00005 0 0 0 39 0
#> 6 K00007 0 0 0 0 0
#> # ℹ 48 more variables: zr2502_14_R1 <dbl>, zr2502_15_R1 <dbl>,
#> # zr2502_17_R1 <dbl>, zr2502_18_R1 <dbl>, zr2502_19_R1 <dbl>,
#> # zr2502_2_R1 <dbl>, zr2502_20_R1 <dbl>, zr2502_21_R1 <dbl>,
#> # zr2502_22_R1 <dbl>, zr2502_23_R1 <dbl>, zr2502_24_R1 <dbl>,
#> # zr2502_25_R1 <dbl>, zr2502_26_R1 <dbl>, zr2502_27_R1 <dbl>,
#> # zr2502_28_R1 <dbl>, zr2502_29_R1 <dbl>, zr2502_3_R1 <dbl>,
#> # zr2502_30_R1 <dbl>, zr2502_31_R1 <dbl>, zr2502_32_R1 <dbl>, …
Or print a long table profile = F.
KO_picrust2_profile_long <- read_picrust2("../inst/extdata/KO_metagenome_out/KO_pred_metagenome_unstrat.tsv", database = "KO", profile = F)
head(KO_picrust2_profile_long)
#> # A tibble: 6 × 3
#> KO Bin_name Abundance
#> <chr> <chr> <dbl>
#> 1 K00001 zr2502_1_R1 82
#> 2 K00001 zr2502_10_R1 407
#> 3 K00001 zr2502_11_R1 0
#> 4 K00001 zr2502_12_R1 243
#> 5 K00001 zr2502_13_R1 0
#> 6 K00001 zr2502_14_R1 184
You can export this to a table like this:
write.table(KO_picrust2_profile_long, "KO_picrust2.tsv", quote = F, sep = "\t", row.names = F, col.names = T)
Or setting write write = T.
Other databases optiones are: “EC” and “pathway”:
EC_picrust2_profile <- read_picrust2("../inst/extdata/EC_metagenome_out/EC_pred_metagenome_unstrat.tsv", database = "EC", profile = T)
head(EC_picrust2_profile)
#> # A tibble: 6 × 54
#> EC zr2502_1_R1 zr2502_10_R1 zr2502_11_R1 zr2502_12_R1 zr2502_13_R1
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 EC:1.1.1.1 1067 2894 857 2182 518
#> 2 EC:1.1.1.100 2503 6743 1831 4215 1409
#> 3 EC:1.1.1.103 0 259 0 0 0
#> 4 EC:1.1.1.108 0 419 0 231 0
#> 5 EC:1.1.1.11 0 0 0 0 0
#> 6 EC:1.1.1.122 0 0 0 0 0
#> # ℹ 48 more variables: zr2502_14_R1 <dbl>, zr2502_15_R1 <dbl>,
#> # zr2502_17_R1 <dbl>, zr2502_18_R1 <dbl>, zr2502_19_R1 <dbl>,
#> # zr2502_2_R1 <dbl>, zr2502_20_R1 <dbl>, zr2502_21_R1 <dbl>,
#> # zr2502_22_R1 <dbl>, zr2502_23_R1 <dbl>, zr2502_24_R1 <dbl>,
#> # zr2502_25_R1 <dbl>, zr2502_26_R1 <dbl>, zr2502_27_R1 <dbl>,
#> # zr2502_28_R1 <dbl>, zr2502_29_R1 <dbl>, zr2502_3_R1 <dbl>,
#> # zr2502_30_R1 <dbl>, zr2502_31_R1 <dbl>, zr2502_32_R1 <dbl>, …
pathways_picrust2_profile <- read_picrust2("../inst/extdata/pathways_out/path_abun_unstrat.tsv", database = "pathway", profile = T)
head(pathways_picrust2_profile)
#> # A tibble: 6 × 54
#> pathway zr2502_1_R1 zr2502_10_R1 zr2502_11_R1 zr2502_12_R1 zr2502_13_R1
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1CMET2-PWY 470. 1258. 599. 1025. 284.
#> 2 3-HYDROXYPHEN… 0 142. 0 0 0
#> 3 ANAEROFRUCAT-… 514. 1415. 675. 1128. 368.
#> 4 ANAGLYCOLYSIS… 653. 1530. 729. 1148. 385.
#> 5 ARG+POLYAMINE… 0 682. 0 0 0
#> 6 ARGORNPROST-P… 0 421. 0 494. 0
#> # ℹ 48 more variables: zr2502_14_R1 <dbl>, zr2502_15_R1 <dbl>,
#> # zr2502_17_R1 <dbl>, zr2502_18_R1 <dbl>, zr2502_19_R1 <dbl>,
#> # zr2502_2_R1 <dbl>, zr2502_20_R1 <dbl>, zr2502_21_R1 <dbl>,
#> # zr2502_22_R1 <dbl>, zr2502_23_R1 <dbl>, zr2502_24_R1 <dbl>,
#> # zr2502_25_R1 <dbl>, zr2502_26_R1 <dbl>, zr2502_27_R1 <dbl>,
#> # zr2502_28_R1 <dbl>, zr2502_29_R1 <dbl>, zr2502_3_R1 <dbl>,
#> # zr2502_30_R1 <dbl>, zr2502_31_R1 <dbl>, zr2502_32_R1 <dbl>, …