Field Note 12: Dispatch - HTGAA Week 5 Homework

Part A (From Pranam)

I generated two binder peptides for the SOD1 (A4V mutant) protein, in addition to the literature binder peptide FLYRWLPSRRGG that was provided in the homework assignment.

Number  Binder        Pseudo Perplexity
0       WRVYVVAVALKE  21.914981
1       WRYYVAAAALGE  11.314945
2       FLYRWLPSRRGG  n/a
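The assignment does not spell out how the pseudo-perplexity column is computed. A common definition for masked protein language models is the exponential of the negative mean log-probability of each true residue when that position is masked, so lower values mean the model finds the peptide more plausible. The sketch below assumes that definition; the function name and the toy input are hypothetical and are not taken from the homework notebook.

```python
import math


def pseudo_perplexity(log_probs):
    """Return exp(-mean(log p)) over per-residue masked log-probabilities.

    log_probs: natural-log probabilities the model assigned to the true
    residue at each position (one value per residue).
    """
    if not log_probs:
        raise ValueError("need at least one residue log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


# Toy input (hypothetical, not model output): a 12-residue peptide where
# every true residue gets probability 0.1 when its position is masked.
toy_log_probs = [math.log(0.1)] * 12
print(round(pseudo_perplexity(toy_log_probs), 3))
```

Under this definition, binder 1 (11.31) would be scored as more plausible than binder 0 (21.91).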


I then extracted the ipTM scores from the log files generated by AlphaFold-Multimer for each seed of the three binder peptides complexed with the SOD1 (A4V mutant) protein, and plotted them as a bar chart using matplotlib.
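A minimal sketch of how the extraction and plotting step could look. It assumes ColabFold-style summary lines such as `rank_001_alphafold2_multimer_v3_model_5_seed_000 pLDDT=70.0 pTM=0.60 ipTM=0.314`; the exact log format, the function names, and the sample values are assumptions, so the pattern may need adjusting for other logs.

```python
import re

# Assumed shape of a ranked summary line (ColabFold-style); adjust as needed.
LINE_RE = re.compile(r"rank_\d+_.*model_(\d+)_seed_\d+.*?\bipTM=(\d+(?:\.\d+)?)")


def parse_iptm(log_text):
    """Return {model_number: ipTM} from the ranked summary lines of one log."""
    scores = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            scores[int(match.group(1))] = float(match.group(2))
    return scores


def plot_iptm(scores_by_binder, out_path="iptm.png"):
    """Grouped bar chart: one group per model, one bar per binder."""
    # Imported lazily so that parsing works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    binders = sorted(scores_by_binder)
    models = sorted({m for s in scores_by_binder.values() for m in s})
    width = 0.8 / max(len(binders), 1)
    fig, ax = plt.subplots()
    for i, binder in enumerate(binders):
        xs = [j + i * width for j in range(len(models))]
        ys = [scores_by_binder[binder].get(m, 0.0) for m in models]
        ax.bar(xs, ys, width=width, label=str(binder))
    centers = [j + width * (len(binders) - 1) / 2 for j in range(len(models))]
    ax.set_xticks(centers)
    ax.set_xticklabels([f"model {m}" for m in models])
    ax.set_ylabel("ipTM")
    ax.legend(title="Binder")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)


# Hypothetical two-line log excerpt, for illustration only.
sample_log = (
    "rank_001_alphafold2_multimer_v3_model_5_seed_000 pLDDT=70.0 pTM=0.60 ipTM=0.314\n"
    "rank_002_alphafold2_multimer_v3_model_4_seed_000 pLDDT=69.0 pTM=0.58 ipTM=0.301\n"
)
print(parse_iptm(sample_log))
```

Calling `plot_iptm({"Binder 1": parse_iptm(log1), ...})` with one parsed log per binder would then write the grouped chart to `iptm.png`.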

ipTM Plot

Bar Chart of ipTM Scores for Binder Peptides Complexed with SOD1 (A4V mutated) Protein

The bar chart compares the ipTM values for five models/seeds across three different Binders (complexes). Binder 2 has some of the highest ipTM values, peaking at 0.314 for model 5. Binder 1 also shows high ipTM values, with its highest at 0.301 for model 4. Binder 3 consistently has the lowest ipTM values across all models, with its highest value being 0.213 for model 2. Overall, Binder 2 appears to have the best performance based on ipTM values, while Binder 3 performs the worst among them.

Part B (Final Project: L-Protein Mutants)

I decided to go with Option 2: follow the provided pipeline to engineer the L-protein. I did not go with my Armored Phage proposal from last week’s homework assignment because, after going through it again, I believe it still lacks validation (“success metrics”) and thermostability-related tools to back it up. So for now I will focus on engineering the L-protein.

This heatmap was generated using this notebook

Predicted Effects of Mutations on Protein Sequence

A Heatmap Showing the Predicted Effects of Mutations on the L Protein Sequence

We need to identify mutations that could enhance the lysis activity of the L Protein. Specifically, each selected mutation should:

  1. Have a positive LLR score, indicating a favorable mutation.
  2. Not fall in a conserved region of the protein.
  3. Maintain the lysis activity of the L Protein.
  4. Not disrupt the coat and replication proteins of the MS2 phage.
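The first two criteria can be applied mechanically. The sketch below is one way to do that, assuming the LLR scores are available as a mapping from mutation code to score; the function name, the toy scores, and the toy conserved set are hypothetical. Criteria 3 and 4 need experimental or genome-level checks and are not covered here.

```python
import re

# Mutation codes look like "K50L": wild-type residue, position, new residue.
MUT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def select_candidates(llr_by_mutation, conserved_positions, top_n=5):
    """Keep substitutions with LLR > 0 outside conserved positions, best first."""
    kept = []
    for code, llr in llr_by_mutation.items():
        match = MUT_RE.match(code)
        if match is None:
            raise ValueError(f"unrecognized mutation code: {code!r}")
        position = int(match.group(2))  # "09" parses to 9
        if llr > 0 and position not in conserved_positions:
            kept.append((code, llr))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]


# Toy values for illustration only (not taken from the heatmap).
toy_llr = {"A10G": 1.2, "K20R": -0.4, "L30V": 0.7, "T05S": 2.0}
toy_conserved = {5}
print(select_candidates(toy_llr, toy_conserved))
```

In the toy example, `T05S` is dropped despite its high score because position 5 is marked conserved, and `K20R` is dropped because its LLR is negative.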

Protein BLAST (blastp) and Clustal Omega were used to identify the conserved regions of the L Protein. Here is a link to the Clustal Omega job that was used to identify them.
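For reference, fully conserved columns (the ones Clustal Omega marks with `*` in its consensus line) can also be recovered directly from the aligned sequences. This is a minimal sketch that treats a column as conserved only when every sequence has the same residue; the function name and the toy alignment are hypothetical and are not the actual L Protein alignment.

```python
def conserved_positions(aligned_seqs, ref_index=0):
    """Return 1-based reference positions where all aligned sequences agree.

    aligned_seqs: equal-length strings with '-' for gaps.
    ref_index: which sequence provides the position numbering.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    reference = aligned_seqs[ref_index]
    positions = []
    ref_pos = 0
    for col in range(length):
        if reference[col] == "-":
            continue  # gap in the reference: no reference residue here
        ref_pos += 1
        residues = {seq[col] for seq in aligned_seqs}
        if len(residues) == 1:  # a gap in any other sequence breaks conservation
            positions.append(ref_pos)
    return positions


# Toy alignment for illustration only.
toy_alignment = [
    "MKT-AYL",
    "MKTGAFL",
    "MRT-AYL",
]
print(conserved_positions(toy_alignment))  # [1, 3, 4, 6]
```

A looser criterion (for example, also accepting the `:` and `.` similarity classes from the consensus line) would mark more positions as conserved and leave fewer candidate sites.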

To better visualize the conserved regions of the L Protein, I imported the L Protein sequence into Benchling and annotated the conserved regions based on the Clustal Omega results. I also annotated the soluble domain (amino acids 1-40) and the transmembrane domain (amino acids 41-75).
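The same bookkeeping can be done in a few lines of code. The sketch below uses the domain boundaries stated above (1-40 soluble, 41-75 transmembrane) and groups individual conserved positions into start-end ranges, which is the form an annotation needs. The function names are hypothetical, and this is not how Benchling itself stores annotations.

```python
def l_protein_region(position):
    """Map a 1-based L Protein position to its annotated domain."""
    if 1 <= position <= 40:
        return "Soluble"
    if 41 <= position <= 75:
        return "Transmembrane"
    raise ValueError(f"position {position} is outside the 75-residue L Protein")


def to_ranges(positions):
    """Collapse positions into sorted, inclusive (start, end) ranges."""
    ranges = []
    for pos in sorted(set(positions)):
        if ranges and pos == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], pos)  # extend the current run
        else:
            ranges.append((pos, pos))  # start a new run
    return ranges


print(l_protein_region(50))             # Transmembrane
print(to_ranges([3, 1, 2, 7, 8, 12]))   # [(1, 3), (7, 8), (12, 12)]
```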

Annotated L Protein on Benchling

L Protein Sequence with Annotated Conserved Regions in Benchling (Gray Tracks are the conserved Regions)

Interesting observation: all the conserved regions appear to be concentrated in the soluble domain, which is responsible for interacting with the host’s chaperone DnaJ, while the transmembrane domain appears to be less conserved.

I selected the following mutations as potential candidates for enhancing the lysis activity of the L Protein:

Mutation Codename  Amino Acid Position  Amino Acid Change  LLR Score         Region         Overlaps with Coat/Rep protein?
K50L               50                   K->L               2.56146419048309  Transmembrane  Yes
Y39L               39                   Y->L               2.24177801609039  Soluble        Yes
S09Q               9                    S->Q               2.01432251930236  Soluble        Yes
K50I               50                   K->I               1.92879796028137  Transmembrane  Yes
Y27R               27                   Y->R               1.62805962562561  Soluble        No

