PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is software that predicts functional abundances from marker gene sequences. It typically focuses on gene families like KEGG orthologs and Enzyme Classification numbers, but can predict any trait. While commonly used with 16S rRNA data, it also supports other marker genes.

The example data is from an article already published link and were sediment core samples that were collected from three zones of mangrove ecosystem: Basin, Fringe and Impaired. The samples were taken during two seasonal conditions, dry and flood seasons, at three sediment depths: surface, mid-layer, and deeper sediment layers, to assess the impact of degradation level or zone, seasonal variation, and sediment depth on microbial communities.

First, load the rbims package.

The function to use that information is read_picrust2. This function can parse the information of the KO, EC and pathways data from picrust2. The KEGG analysis is just possible if KO table is used.

KO_picrust2_profile <- read_picrust2("../inst/extdata/KO_metagenome_out/KO_pred_metagenome_unstrat.tsv", database = "KO", profile = T)
  • The database argument will parse the database. In this example, I will explore the KO output.

  • The output format is chosen with the profile argument. When profile = T, a wide output is obtained.

  • The write argument saves the formatted table generated in .tsv extension. When write = F gives you the output but not saves the table in your current directory.

If you want to follow the example you can download the use rbims test file.

head(KO_picrust2_profile)
#> # A tibble: 6 × 54
#>   KO     zr2502_1_R1 zr2502_10_R1 zr2502_11_R1 zr2502_12_R1 zr2502_13_R1
#>   <chr>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 K00001          82          407            0          243            0
#> 2 K00002           0            0            0            0            0
#> 3 K00003         636         1435          665         1033          373
#> 4 K00004           0            0            0            0            0
#> 5 K00005           0            0            0           39            0
#> 6 K00007           0            0            0            0            0
#> # ℹ 48 more variables: zr2502_14_R1 <dbl>, zr2502_15_R1 <dbl>,
#> #   zr2502_17_R1 <dbl>, zr2502_18_R1 <dbl>, zr2502_19_R1 <dbl>,
#> #   zr2502_2_R1 <dbl>, zr2502_20_R1 <dbl>, zr2502_21_R1 <dbl>,
#> #   zr2502_22_R1 <dbl>, zr2502_23_R1 <dbl>, zr2502_24_R1 <dbl>,
#> #   zr2502_25_R1 <dbl>, zr2502_26_R1 <dbl>, zr2502_27_R1 <dbl>,
#> #   zr2502_28_R1 <dbl>, zr2502_29_R1 <dbl>, zr2502_3_R1 <dbl>,
#> #   zr2502_30_R1 <dbl>, zr2502_31_R1 <dbl>, zr2502_32_R1 <dbl>, …

Or print a long table profile = F.

KO_picrust2_profile_long <- read_picrust2("../inst/extdata/KO_metagenome_out/KO_pred_metagenome_unstrat.tsv", database = "KO", profile = F)
head(KO_picrust2_profile_long)
#> # A tibble: 6 × 3
#>   KO     Bin_name     Abundance
#>   <chr>  <chr>            <dbl>
#> 1 K00001 zr2502_1_R1         82
#> 2 K00001 zr2502_10_R1       407
#> 3 K00001 zr2502_11_R1         0
#> 4 K00001 zr2502_12_R1       243
#> 5 K00001 zr2502_13_R1         0
#> 6 K00001 zr2502_14_R1       184

You can export this to a table like this:

write.table(KO_picrust2_profile_long, "KO_picrust2.tsv", quote = F, sep = "\t", row.names = F, col.names = T)

Or setting write write = T.

Other databases optiones are: “EC” and “pathway”:

EC_picrust2_profile <- read_picrust2("../inst/extdata/EC_metagenome_out/EC_pred_metagenome_unstrat.tsv", database = "EC", profile = T)
head(EC_picrust2_profile)
#> # A tibble: 6 × 54
#>   EC           zr2502_1_R1 zr2502_10_R1 zr2502_11_R1 zr2502_12_R1 zr2502_13_R1
#>   <chr>              <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 EC:1.1.1.1          1067         2894          857         2182          518
#> 2 EC:1.1.1.100        2503         6743         1831         4215         1409
#> 3 EC:1.1.1.103           0          259            0            0            0
#> 4 EC:1.1.1.108           0          419            0          231            0
#> 5 EC:1.1.1.11            0            0            0            0            0
#> 6 EC:1.1.1.122           0            0            0            0            0
#> # ℹ 48 more variables: zr2502_14_R1 <dbl>, zr2502_15_R1 <dbl>,
#> #   zr2502_17_R1 <dbl>, zr2502_18_R1 <dbl>, zr2502_19_R1 <dbl>,
#> #   zr2502_2_R1 <dbl>, zr2502_20_R1 <dbl>, zr2502_21_R1 <dbl>,
#> #   zr2502_22_R1 <dbl>, zr2502_23_R1 <dbl>, zr2502_24_R1 <dbl>,
#> #   zr2502_25_R1 <dbl>, zr2502_26_R1 <dbl>, zr2502_27_R1 <dbl>,
#> #   zr2502_28_R1 <dbl>, zr2502_29_R1 <dbl>, zr2502_3_R1 <dbl>,
#> #   zr2502_30_R1 <dbl>, zr2502_31_R1 <dbl>, zr2502_32_R1 <dbl>, …
pathways_picrust2_profile <- read_picrust2("../inst/extdata/pathways_out/path_abun_unstrat.tsv", database = "pathway", profile = T)
head(pathways_picrust2_profile)
#> # A tibble: 6 × 54
#>   pathway        zr2502_1_R1 zr2502_10_R1 zr2502_11_R1 zr2502_12_R1 zr2502_13_R1
#>   <chr>                <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 1CMET2-PWY            470.        1258.         599.        1025.         284.
#> 2 3-HYDROXYPHEN…          0          142.           0            0            0 
#> 3 ANAEROFRUCAT-…        514.        1415.         675.        1128.         368.
#> 4 ANAGLYCOLYSIS…        653.        1530.         729.        1148.         385.
#> 5 ARG+POLYAMINE…          0          682.           0            0            0 
#> 6 ARGORNPROST-P…          0          421.           0          494.           0 
#> # ℹ 48 more variables: zr2502_14_R1 <dbl>, zr2502_15_R1 <dbl>,
#> #   zr2502_17_R1 <dbl>, zr2502_18_R1 <dbl>, zr2502_19_R1 <dbl>,
#> #   zr2502_2_R1 <dbl>, zr2502_20_R1 <dbl>, zr2502_21_R1 <dbl>,
#> #   zr2502_22_R1 <dbl>, zr2502_23_R1 <dbl>, zr2502_24_R1 <dbl>,
#> #   zr2502_25_R1 <dbl>, zr2502_26_R1 <dbl>, zr2502_27_R1 <dbl>,
#> #   zr2502_28_R1 <dbl>, zr2502_29_R1 <dbl>, zr2502_3_R1 <dbl>,
#> #   zr2502_30_R1 <dbl>, zr2502_31_R1 <dbl>, zr2502_32_R1 <dbl>, …