vignettes/06_Create_merops_profile.Rmd
The MEROPS database classifies peptidases (proteases) and their inhibitors using a hierarchical, structure-based system. Peptidases are grouped into Families based on significant sequence similarities, and related families are further grouped into Clans, indicating evolutionary relationships. This classification helps researchers understand enzyme function, structure, and evolution. The database provides sequence identifiers, structural data (if available), and literature references for deeper exploration.
Rbims also includes functions to read and explore MEROPS annotations.
To learn how to do this, first load the rbims package.
read_merops(): Parse MEROPS Outputs
The read_merops function parses and formats the raw output files from the MEROPS BLAST annotation tool.
File Path: Provide the path to the directory containing your MEROPS files.
File Extension: The function specifically looks for files ending in .txt.
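Before calling read_merops(), it can help to confirm which files in a directory match the expected extension. The sketch below uses base R's list.files() on a temporary directory with made-up file names. It only illustrates the selection rule described above and is not a copy of how rbims finds files internally.

```r
# Create a fresh temporary directory with hypothetical file names
demo_dir <- tempfile("merops_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("bin_A.txt", "bin_B.txt", "notes.log"))))

# Keep only the files whose names end in .txt
txt_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
basename(txt_files)
```

Files with any other extension (here, notes.log) are left out, so annotation outputs may need to be renamed before they are read.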
The processed input contains 6 key columns:
qseqid: Query Sequence ID.
sseqid: Subject Sequence ID.
stitle: Subject Title.
pident: Percentage of identical matches.
evalue: Expect value.
bitscore: Bit score.
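To inspect one of these files by hand, the six columns listed above can be read with base R. This is a minimal sketch that assumes the file is tab-separated, has no header row, and stores the columns in the order shown. The two hits are invented values used only for illustration.

```r
# Two made-up hits in tab-separated form (illustrative values only)
toy_lines <- c(
  "gene_1\tMER0000001\ttoy peptidase A\t85.5\t1e-50\t200",
  "gene_2\tMER0000002\ttoy peptidase B\t60.0\t2e-10\t75.3"
)
toy_file <- tempfile(fileext = ".txt")
writeLines(toy_lines, toy_file)

# Read the file and name the columns (assumed order, see the list above)
hits <- read.delim(
  toy_file,
  header = FALSE,
  col.names = c("qseqid", "sseqid", "stitle", "pident", "evalue", "bitscore"),
  stringsAsFactors = FALSE
)
str(hits)
```

If your files use a different column order or delimiter, adjust col.names and the reader accordingly.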
If you want to follow this example, you can download the raw data here.
To obtain a wide output, set the argument profile = TRUE.
merops_profile_T <- read_merops(merops_path = "../test/results/04.merops",
                                profile = TRUE,
                                write = FALSE)
head(merops_profile_T)

| MEROPS_family | domain_name | 5mSIPHEX1_0 | 5mSIPHEX1_1 | 5mSIPHEX1_10 | 5mSIPHEX1_11 | 5mSIPHEX1_13 | 5mSIPHEX1_15 | 5mSIPHEX1_18 | 5mSIPHEX1_19 | 5mSIPHEX1_2 | 5mSIPHEX1_25 | 5mSIPHEX1_26 | 5mSIPHEX1_32 | 5mSIPHEX1_33 | 5mSIPHEX1_37 | 5mSIPHEX1_8 | 5mSIPHEX1_9 | 5mSIPHEX2_10 | 5mSIPHEX2_14 | 5mSIPHEX2_16 | 5mSIPHEX2_18 | 5mSIPHEX2_25 | 5mSIPHEX2_3 | 5mSIPHEX2_5 | 5mSIPHEX2_7 | 700mSIPHEX1_0 | 700mSIPHEX1_1 | 700mSIPHEX1_12 | 700mSIPHEX1_15 | 700mSIPHEX1_17 | 700mSIPHEX1_18 | 700mSIPHEX1_2 | 700mSIPHEX1_20 | 700mSIPHEX1_3 | 700mSIPHEX1_8 | 700mSIPHEX2_13 | 700mSIPHEX2_14 | 700mSIPHEX2_16 | 700mSIPHEX2_21 | 700mSIPHEX2_22 | 700mSIPHEX2_23 | 700mSIPHEX2_24 | 700mSIPHEX2_9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MER0000510 | ClpA ATP-ase component of endopeptidase Clp | X20.001 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| MER0011344 | family C26 unassigned peptidases | C26.UPW | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MER0015597 | amidophosphoribosyltransferase precursor | C44.001 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MER0020519 | TldD peptidase | M103.001 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MER0024566 | amidophosphoribosyltransferase precursor | C44.001 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| MER0024582 | methionyl aminopeptidase 1 | M24.001 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MER0025778 | GMP synthase | C26.957 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MER0026061 | peptidase Clp | S14.001 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| MER0026064 | Lon-A peptidase | S16.001 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| MER0027987 | peptidase Clp | S14.001 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Or, to get a long table, set profile = FALSE.
merops_profile_F <- read_merops(merops_path = "../test/results/04.merops",
                                profile = FALSE,
                                write = FALSE)
head(merops_profile_F)

| Bin_name | MEROPS_family | domain_name | Abundance |
|---|---|---|---|
| 5mSIPHEX1_0 | MER0000510 | ClpA ATP-ase component of endopeptidase Clp | X20.001 | 1 |
| 5mSIPHEX1_0 | MER0011344 | family C26 unassigned peptidases | C26.UPW | 1 |
| 5mSIPHEX1_0 | MER0015597 | amidophosphoribosyltransferase precursor | C44.001 | 1 |
| 5mSIPHEX1_0 | MER0020519 | TldD peptidase | M103.001 | 1 |
| 5mSIPHEX1_0 | MER0024566 | amidophosphoribosyltransferase precursor | C44.001 | 1 |
| 5mSIPHEX1_0 | MER0024582 | methionyl aminopeptidase 1 | M24.001 | 1 |
| 5mSIPHEX1_0 | MER0025778 | GMP synthase | C26.957 | 1 |
| 5mSIPHEX1_0 | MER0026061 | peptidase Clp | S14.001 | 1 |
| 5mSIPHEX1_0 | MER0026064 | Lon-A peptidase | S16.001 | 1 |
| 5mSIPHEX1_0 | MER0027987 | peptidase Clp | S14.001 | 1 |
| 5mSIPHEX1_0 | MER0029893 | HslV component of HslUV peptidase | T01.006 | 1 |
| 5mSIPHEX1_0 | MER0036043 | glutamate synthase | C44.003 | 1 |
| 5mSIPHEX1_0 | MER0043258 | urease | M38.982 | 1 |
| 5mSIPHEX1_0 | MER0047854 | family M41 unassigned peptidases | M41.UPW | 1 |
| 5mSIPHEX1_0 | MER0048082 | methionyl aminopeptidase 1 | M24.001 | 1 |
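The long and wide layouts present the same kind of counts in two shapes: one row per bin and annotation in the long table, one column per bin in the wide table. The base R sketch below shows that relationship on a toy data frame. The `feature` column and all values are hypothetical, and this is not the code read_merops() runs internally.

```r
# Toy long table: one row per bin/feature pair (hypothetical values)
long <- data.frame(
  Bin_name  = c("bin_A", "bin_A", "bin_B"),
  feature   = c("fam_1", "fam_2", "fam_1"),
  Abundance = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Cross-tabulate: features become rows, bins become columns,
# and missing bin/feature pairs are filled with 0
wide <- as.data.frame.matrix(xtabs(Abundance ~ feature + Bin_name, data = long))
wide
```

The long form is usually easier to filter and plot, while the wide form is convenient for comparing bins side by side.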
You can export this profile with:
write.table(merops_profile_T, "merops.tsv", quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
Or set write = TRUE in the read_merops() function.
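When the exported file is read back into R, bin names that start with a digit (such as those in the tables above) are altered by the default column-name checking. The sketch below uses a toy profile with hypothetical column names and values to show a round trip that keeps the names unchanged by passing check.names = FALSE.

```r
# Toy wide profile whose bin columns start with a digit (hypothetical names)
profile_demo <- data.frame(
  feature    = c("fam_1", "fam_2"),
  `5m_bin_A` = c(1, 1),
  `5m_bin_B` = c(2, 0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Export with the same write.table() settings used above
out_file <- tempfile(fileext = ".tsv")
write.table(profile_demo, out_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)

# Read it back without letting R rewrite the column names
reloaded <- read.delim(out_file, check.names = FALSE, stringsAsFactors = FALSE)
identical(names(reloaded), names(profile_demo))
```

Without check.names = FALSE, read.delim() would prefix such names with "X", which can break later joins against the original bin names.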