A common way to annotate the genes of a new or reconstructed genome is to run InterProScan. Here are some functions to explore its output.

First, load the rbims package with library(rbims).

The function that reads this output is read_interpro. It can parse PFAM, INTERPRO, and KEGG IDs. The KEGG analysis is only possible if InterProScan was run with the -pa option. Two output formats are available: a wide profile or a long table.

  • The database argument selects which database's annotations to parse. In this example, I will explore the PFAM output.

  • The output format is chosen with the profile argument. When profile = T, a wide output is obtained.

  • The write argument saves the formatted table as a .tsv file. When write = F, the function returns the output but does not save the table in your current directory.

If you want to follow the example, you can download the rbims test file.

interpro_pfam_profile <- read_interpro(data_interpro = "../inst/extdata/Interpro_test.tsv", database = "Pfam", profile = T)
head(interpro_pfam_profile)
#> # A tibble: 6 × 8
#>   PFAM    domain_name                   Bin_10 Bin_12 Bin_56 Bin_113 Bin_1 Bin_2
#>   <chr>   <chr>                          <int>  <int>  <int>   <int> <int> <int>
#> 1 PF03595 Voltage-dependent anion chan…      1      1      1       0     0     0
#> 2 PF00440 Bacterial regulatory protein…      0      0      0       1     1     0
#> 3 PF13305 WHG domain                         0      0      0       1     1     0
#> 4 PF01131 DNA topoisomerase                  1      0      0       0     0     1
#> 5 PF08272 Topoisomerase I zinc-ribbon-…      1      0      0       0     0     1
#> 6 PF01751 Toprim domain                      1      0      0       0     0     1

Or print a long table by setting profile = F.

interpro_pfam_long <- read_interpro("../inst/extdata/Interpro_test.tsv", database = "Pfam", profile = F)
head(interpro_pfam_long)
#> # A tibble: 6 × 5
#>   Bin_name Scaffold_name      PFAM    domain_name                      Abundance
#>   <chr>    <chr>              <chr>   <chr>                                <int>
#> 1 Bin_10   scaffold_441_c1_24 PF03595 Voltage-dependent anion channel          1
#> 2 Bin_12   scaffold_69_c1_124 PF03595 Voltage-dependent anion channel          1
#> 3 Bin_56   scaffold_71_c1_69  PF03595 Voltage-dependent anion channel          1
#> 4 Bin_113  scaffold_145_c1_85 PF00440 Bacterial regulatory proteins, …         1
#> 5 Bin_113  scaffold_145_c1_85 PF13305 WHG domain                               1
#> 6 Bin_1    scaffold_146_c1_1  PF00440 Bacterial regulatory proteins, …         1
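To see how the long table relates to the wide profile, here is a minimal sketch in base R. It is not part of rbims: the toy data frame `long` is a hand-copied stand-in for the first rows printed above, and `xtabs` is used here only as one possible way to pivot, not as the method read_interpro uses internally.

```r
# Toy long table, copied by hand from the first rows of the output above
# (an illustrative stand-in, not the real interpro_pfam_long object)
long <- data.frame(
  Bin_name  = c("Bin_10", "Bin_12", "Bin_56", "Bin_113", "Bin_113", "Bin_1"),
  PFAM      = c("PF03595", "PF03595", "PF03595", "PF00440", "PF13305", "PF00440"),
  Abundance = c(1L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Sum Abundance for every PFAM x bin combination;
# combinations that never occur are filled with 0
wide <- xtabs(Abundance ~ PFAM + Bin_name, data = long)

# Convert the contingency table into a plain data frame:
# one row per PFAM id, one column per bin
wide_df <- as.data.frame.matrix(wide)
print(wide_df)
```

Each row of the result is a PFAM id and each column a bin, which is the same layout that profile = T returns (without the domain_name column in this toy version).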

You can export the long table to a file like this:

write.table(interpro_pfam_long, "Interpro.tsv", quote = F, sep = "\t", row.names = F, col.names = T)

Or by setting write = T when calling read_interpro.
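If you want to check that an exported table can be read back, a round trip in base R might look like the sketch below. The data frame `long` is again a small hand-made stand-in for interpro_pfam_long, and the file is written to a temporary directory (an assumption made here so the example does not touch your working directory).

```r
# Small stand-in for interpro_pfam_long, copied from the rows printed above
long <- data.frame(
  Bin_name      = c("Bin_10", "Bin_12"),
  Scaffold_name = c("scaffold_441_c1_24", "scaffold_69_c1_124"),
  PFAM          = c("PF03595", "PF03595"),
  domain_name   = c("Voltage-dependent anion channel",
                    "Voltage-dependent anion channel"),
  Abundance     = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with a header row and no row names
out_file <- file.path(tempdir(), "Interpro.tsv")
write.table(long, out_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)

# Read it back; read.delim assumes a tab separator and a header row
reloaded <- read.delim(out_file, stringsAsFactors = FALSE)
str(reloaded)
```

The reloaded table should have the same column names and number of rows as the object that was written.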