May 9

4/19/17 Independent Project Planning Jake Hanna

Goal/Rationale:

(I was gone on the 12th and couldn’t contribute to the research that day.) Noah and I will use a sequence alignment tool for single gene analysis to evaluate the overall usefulness of three genes we found to be present in Shrooms, Caterpillar, and Nubia, as well as the TMP (tape measure protein) gene. The three genes are Capsid Maturation Protease, Terminase large subunit, and Portal Protein. With this tool we will be able to evaluate how conserved these genes are within the clusters, quantify Andrea’s Gepard dot plot data, and get a clearer picture of how conserved the selected genes are relative to each other.

Tools/Procedure:

We will use the PhagesDB database to find appropriate phages and then run their sequences through the multiple sequence alignment tool Clustal Omega, to analyze how similar the same gene is across phages of different clusters as well as within the same cluster. A high percent similarity among genes from phages in the same cluster indicates a high degree of conservation. A high percent similarity between genes from phages of different clusters, however, would indicate that the gene is less effective for single gene analysis, because it would not distinguish the clusters from each other.
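As a rough sketch of the comparison described above, the Python below computes pairwise percent identity from sequences that have already been aligned (for example by Clustal Omega, with `-` as the gap character) and averages it separately over same-cluster and different-cluster pairs. It assumes identity is counted only over columns where neither sequence has a gap; Clustal Omega’s own percent-identity output may use a different denominator. The function names and toy sequences are hypothetical, not our data.

```python
from __future__ import annotations

from itertools import combinations
from statistics import mean


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length ('-' = gap).

    Assumption: only columns where neither sequence has a gap are counted.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def within_between(aligned: dict[str, str], cluster_of: dict[str, str]) -> tuple[float, float]:
    """Mean identity of same-cluster pairs and of different-cluster pairs.

    Needs at least one pair of each kind, otherwise statistics.mean raises.
    """
    within, between = [], []
    for p, q in combinations(sorted(aligned), 2):
        pid = percent_identity(aligned[p], aligned[q])
        if cluster_of[p] == cluster_of[q]:
            within.append(pid)
        else:
            between.append(pid)
    return mean(within), mean(between)


# Toy aligned sequences (hypothetical): two phages in cluster X, one in cluster Y.
aligned = {"A1": "MKV-LA", "A2": "MKVQLA", "B1": "MRI-LS"}
clusters = {"A1": "X", "A2": "X", "B1": "Y"}
print(within_between(aligned, clusters))  # (100.0, 40.0)
```

Under this reading, a gene that works well for single gene analysis should give a high first number and a clearly lower second number.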

 

Results:

AK cluster

Findings/Next Steps:

Although we have only gone through a small portion of what we need to analyze, these results already begin to show a high level of gene conservation within the AK cluster. More data will be needed to assess the true significance of our findings.

April 28

NW 03/22/17 & 03/27/17 & 03/29/17 URSA Poster Practice and Presentation

What We Did:

03/22/17: Presented posters to the class and picked two to be printed. My group’s poster was picked!

Our Two Final Posters:

Figure A

Figure B

03/27/17: Practiced presenting. Our class split into two groups, and each group practiced with one poster. My group worked on our poster (the green one), and we presented to Dr. Adair and Lathan. They critiqued us, and we polished our presentation skills.

03/29/17: URSA Scholar’s Day! We presented our posters in front of judges and professors. We also had the opportunity to listen to other students’ presentations and learn about other student research going on at Baylor.

We did it!

April 28

4/12/17: Collecting Data for Project

Goal: To collect the amino acid sequences for each of the genes within each of the clusters and to begin to compare them.

Methods: We began by searching PhagesDB for three different phages within each of the AL, AK, and AU clusters.

Once we chose the phages, we searched each one for the three chosen proteins: Capsid Maturation Protease, Terminase large subunit, and Portal Protein.

The amino acid sequences for each of these proteins within each cluster were then found and recorded using PhagesDB.
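Clustal Omega accepts sequences in FASTA format, so the recorded sequences need to be collected into one input per protein first. A minimal sketch, assuming the sequences have been copied into a Python dictionary keyed by labels of our choosing; the helper name, labels, and toy sequences are hypothetical.

```python
from __future__ import annotations


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {label: amino acid sequence} as FASTA text, wrapping at `width` characters."""
    lines = []
    for label, seq in records.items():
        seq = "".join(seq.split()).upper()  # drop spaces/newlines copied from web pages
        lines.append(f">{label}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical labels and short toy sequences, for illustration only.
portal = {"PhageA_portal": "MSLKE TVRAQ", "PhageB_portal": "msikeavraq"}
print(to_fasta(portal), end="")
```

The returned text can be saved as a `.fasta` file and pasted or uploaded into the Clustal Omega web form.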

We then started comparing the amino acid sequences with each other within the clusters using the multiple sequence alignment tool Clustal Omega.

Results: We completed the protein comparisons for the AK cluster.

April 28

4/10/17: Revising Our Question

Goal: Today’s goal was to revise our question to make it more specific and to yield more meaningful results.

Methods: Our group decided that only looking at tape measure proteins would not be enough information to determine if it was the best way to group phages into clusters.

After meeting with Stu’s group, who had a topic similar to ours, we decided that to make our results more significant we would need something to compare the TMP to. We then chose to include three more genes from three different phages within each of the clusters.

This would allow us to assess the similarity of all four genes within each cluster and across all three clusters, in order to determine which gene was the most conserved and therefore the most useful for sorting phages into clusters by single gene analysis.
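One way to turn “most useful for sorting into clusters” into a single number is to reward high identity within clusters and penalize high identity between clusters. The sketch below ranks genes by the difference between the two averages; that criterion is an assumption of this sketch rather than an established rule, and the gene names and numbers are placeholders, not results from this project.

```python
from __future__ import annotations


def rank_genes(scores: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Rank genes by (mean within-cluster % identity - mean between-cluster % identity).

    `scores` maps gene name -> (within, between). The difference as a ranking
    criterion is an assumption of this sketch.
    """
    gaps = {gene: within - between for gene, (within, between) in scores.items()}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Placeholder numbers only; they are not results from this project.
example = {"gene_x": (90.0, 40.0), "gene_y": (95.0, 80.0)}
print(rank_genes(example))  # [('gene_x', 50.0), ('gene_y', 15.0)]
```

In this made-up example gene_y is slightly more conserved within clusters, but gene_x separates the clusters better because its between-cluster identity is much lower.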

We used the rest of the day to research different tools and programs, and decided that the two most useful tools would be a Gepard dot plot and a multiple sequence alignment tool.
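A dot plot places one sequence along each axis and marks every position where the two sequences share a short identical word, so conserved regions show up as diagonal runs of dots. The sketch below is a simplified exact-match version of that idea with a hypothetical word length `k`; Gepard itself uses a faster indexed word-matching approach built for genome-scale sequences, so this only illustrates what the plot represents.

```python
from __future__ import annotations


def dot_plot(seq1: str, seq2: str, k: int = 3) -> list[tuple[int, int]]:
    """(i, j) pairs where the length-k word at seq1[i] equals the one at seq2[j]."""
    positions: dict[str, list[int]] = {}
    for j in range(len(seq2) - k + 1):
        positions.setdefault(seq2[j:j + k], []).append(j)
    dots = []
    for i in range(len(seq1) - k + 1):
        for j in positions.get(seq1[i:i + k], []):
            dots.append((i, j))
    return dots


def render(seq1: str, seq2: str, k: int = 3) -> str:
    """Text version of the plot: rows follow seq1, columns follow seq2."""
    dots = set(dot_plot(seq1, seq2, k))
    rows = []
    for i in range(len(seq1) - k + 1):
        rows.append("".join("*" if (i, j) in dots else "." for j in range(len(seq2) - k + 1)))
    return "\n".join(rows)


print(render("MKVLAT", "MKVLAS", k=2))
```

A long unbroken diagonal means the two sequences share a long identical stretch; breaks in the diagonal show where they differ.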

Conclusions/ Next Steps: Our next step is to collect the amino acid sequences of the three additional genes chosen for comparison.

 

April 28

4/5/17: Beginning Planning for Independent Project

Goal: Today’s goal was to develop a question to be used for our final project

Methods: I joined a group with Andrea Springman and Jake Hanna.

The first question our group came up with was: how can phages be separated into clusters by single gene analysis?

We began by formulating the hypothesis that the gene coding for the tape measure protein is the best gene for single gene analysis, because it would be the most conserved gene within clusters.

We decided to analyze the AU, AK, and AL clusters. We began by finding the tape measure protein amino acid sequences of three different phages within each of these clusters.

Conclusion/ Next steps: We ran out of time today, but our next step will be to compare the amino acid sequences of the tape measure proteins within the clusters and across the three clusters.

April 28

3/13-3/29: URSA Scholars week poster

3/13: Today our groups for the poster presentation were put together. My group consisted of me, Niru, Cori, and Alex. We put together the basic poster design and decided to finish it the next lab day.

3/15: Today our group finished the basic layout of the poster, deciding where our materials, methods, and results would go. We then split the poster into four parts and began working individually on each part.

3/20: Today I worked on putting together the methods for both the wet and in silico lab for the poster. I organized each of these into steps and found images of the phages for the methods section.

3/23: Our group got together to put all of our sections together and to reformat the poster so that everything would fit together.

3/27: Today groups practiced presenting their posters, both to help decide which poster would be presented and to get presentation practice.

3/29: Today was the day that we presented our posters for URSA Scholars Day to students and judges.

April 28

4/19 Emily Johnson Gathering the Data pt.2


Purpose: The goal for today was to use bioinformatics tools to compare our HNH endonuclease sequences.

Materials:

TM-Align

SWISS-MODEL

Methods:

We divided up the work today: Navya was in charge of using Phamerator to gather data about our gene, such as GC content and pham numbers; Alex looked up information about HNH endonucleases; and I used SWISS-MODEL to model the protein structures and TM-Align to compare two of them at a time.

Results:

Caterpillar vs. Nubia (Caterpillar in red, Nubia in blue): TM-score 0.60162
Caterpillar vs. Shrooms (Caterpillar in red, Shrooms in blue): TM-score 0.52932
Shrooms vs. Nubia (Shrooms in red, Nubia in blue): TM-score 0.93565
Conclusion:
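TM-score measures the similarity of two protein structures on a scale from 0 to 1. Unlike the RMSD scores we used previously, it is normalized by protein length and is less sensitive to a few badly aligned regions, which is why it is considered more reliable; scores above about 0.5 generally indicate the same fold. Interestingly, the TM-score of Shrooms and Nubia is the highest, even though I had predicted that Caterpillar and Nubia would score highest because that pair had the highest sequence similarity.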
But since a TM-score of 0.93565 seems more significant than a 27% similarity, I am more inclined to believe that Shrooms and Nubia are the more similar pair. This would be a great subject for further research in the future.
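For reference, the standard TM-score sums 1 / (1 + (d_i / d0)^2) over aligned residue pairs, where d_i is the distance between the i-th pair after superposition, divides by the length L of the reference protein, and uses d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. The sketch below evaluates that formula for one fixed superposition and, for contrast, RMSD over the same distances. It is not TM-Align: TM-Align also searches for the alignment and superposition that maximize the score. Flooring d0 at 0.5 Å for short proteins is an assumption taken from common implementations, and the function names and example distances are hypothetical.

```python
from __future__ import annotations

import math


def d0(target_length: int) -> float:
    """Distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8, in Angstroms.

    Assumption: floored at 0.5 A for short proteins.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1 / 3) - 1.8)


def tm_score(distances: list[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition (no optimization, unlike TM-Align).

    distances: distance (A) between each aligned residue pair after superposition.
    target_length: length L of the reference protein used for normalization.
    """
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


def rmsd(distances: list[float]) -> float:
    """Root-mean-square deviation over the same aligned pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical 100-residue example: 90 pairs 1 A apart, 10 pairs 20 A apart.
dists = [1.0] * 90 + [20.0] * 10
print(round(tm_score(dists, 100), 3), round(rmsd(dists), 2))
```

In this made-up case the ten badly placed residues push RMSD up to about 6.4 Å, while TM-score stays around 0.84, because each residue pair can contribute at most 1/L to the score.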
Next lab, our goal is to make our PowerPoint presentation so we are ready to present to the class on Wednesday, 4/26.
April 28

3/20 Working on our poster

Purpose: The purpose of lab today is to work on our poster for the URSA Scholar’s Week

Methods: Today we divided up the tasks we need to do for the poster. I am in charge of doing the introduction and materials. Navya is in charge of the wet lab and in silico methods. Josh will be explaining the results and conclusion and Thomas will be responsible for taking all of our materials and designing the poster for our finished product.

I was able to make a list of all the materials we needed today and work on part of the intro. I used an article or two to describe the importance of Arthrobacter phages and had to ask some other groups to help me remember exactly what we did in the first semester. I also had to research the uses of Arthrobacter.

Next lab, we will be presenting the poster, so I will finish my introduction tonight and send it to Thomas so he can put it together.

April 28

Emily Johnson 2/22 Genes 23-25

Data:

GeneMark coding potential: Gene 23 is the second dark line in the third reading frame, and Gene 24 is the dark line right after the small white gap following Gene 23, also in the third reading frame. Gene 25 begins at the dark line on the right side of the first reading frame.

Gene 23:

NCBI Hit Gene 23

PhagesDB Hit Gene 23

Gene 24:

NCBI Hit Gene 24

PhagesDB Hit Gene 24

Gene 25:

HHPred Hit Gene 25

CDD Hit Gene 25

NCBI Hit Gene 25

PhagesDB Hit Gene 25

Results:

Gene 23:

Start: 17661 bp
Stop: 18473 bp
Direction: FWD
Gap: 171 bp gap
SD Final Value (SD Score): -2.589 (2nd best score). The best score would create a large gap, which is uncharacteristic of bacteriophage genomes, with no potential gene to fill it.
Z-Value: 2.923
CP: The gene is covered
SCS: Agrees with Glimmer, agrees with GeneMark
NCBI BLAST: hypothetical protein LAROYE_24 [Arthrobacter phage Laroye], Q1 S1, E-value: 1e-112
CDD: No good hit
PhagesDB BLAST: Laroye_24, function unknown, 280, Q1 S1, E-value: 1e-112
HHPred: No good hit
LO: Yes
ST:
F:
FS:
Notes:

Gene 24:

Start: 18477 bp
Stop: 18854 bp
Direction: FWD
Gap: 3 bp gap
SD Final Value (SD Score): -3.779 (best score)
Z-Value: 2.852
CP: The gene is covered
SCS: Agrees with Glimmer, agrees with GeneMark
NCBI BLAST: hypothetical protein LAROYE_25 [Arthrobacter phage Laroye], E-value: 8e-62
CDD: No good hit
PhagesDB BLAST: Wheelbite_Draft_23, function unknown, 127, Q1 S1, E-value: 9e-51
HHPred: No good hit
LO: Yes
ST: Agrees with Starterator
F: NKF
FS:
Notes: This gene surprised me a bit because it starts 3 base pairs after the prior gene ends. That alone would not be unusual, but the two genes are in the same reading frame, so they almost look like one long gene except for the stop codon between them.

Gene 25:

Start: 18889 bp
Stop: 19848 bp
Direction: FWD
Gap: 34 bp gap
SD Final Value (SD Score): -2.293 (best score)
Z-Value: 3.068
CP: The gene is covered
SCS: Agrees with Glimmer, agrees with GeneMark
NCBI BLAST: capsid protein [Arthrobacter phage Laroye], Q1 S1, E-value: 0.0
CDD: Phage_cap_E pfam03864, E-value: 1.29e-3
PhagesDB BLAST: Salgado_26, capsid, 319, Q1 S1, E-value: 1e-174
HHPred: Phage-related protein; structural genomics, joint center for structural genomics, JCSG, protein structure, E-value: 4e-40
LO: Yes
ST: Agrees with Starterator
F: major capsid protein
FS: NCBI, PhagesDB, and HHPred all call it a major capsid protein
Notes:
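The gap and reading-frame observations for Genes 23 to 25 follow from simple coordinate arithmetic. A minimal sketch for forward-strand genes, assuming 1-based inclusive coordinates as reported above and numbering frames as ((start - 1) mod 3) + 1. With that assumed convention the results match the frames read off the GeneMark plot (Genes 23 and 24 in frame 3, Gene 25 in frame 1). The helper names are hypothetical.

```python
def gap_between(prev_stop: int, next_start: int) -> int:
    """bp between two forward-strand genes; a negative value means they overlap."""
    return next_start - prev_stop - 1


def forward_frame(start: int) -> int:
    """Frame 1-3 of a forward-strand gene, assuming frame = ((start - 1) mod 3) + 1."""
    return (start - 1) % 3 + 1


def length_bp(start: int, stop: int) -> int:
    """Gene length with 1-based inclusive coordinates (stop codon included)."""
    return stop - start + 1


# Coordinates from the Gene 23-25 results.
print(gap_between(18473, 18477))                               # 3 (Gene 23 -> Gene 24)
print(forward_frame(17661), forward_frame(18477))              # 3 3
print(forward_frame(18889))                                    # 1 (Gene 25)
print(length_bp(17661, 18473), length_bp(17661, 18473) % 3)    # 813 0
```

Two forward genes in the same frame separated by only a few bases, like Genes 23 and 24, would read as one open reading frame if the stop codon between them were absent, which is what the Gene 24 note describes.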

April 28

Emily Johnson 2/20 Genes 20-22

Data:

Gene 20:

NCBI Blast results for gene 20

PhagesDB Blast results gene 20

Gene 21:

NCBI Blast result gene 21

PhagesDB Blast result gene 21

Gene 22:

NCBI Blast result gene 22

CDD Hit Gene 22

PhagesDB Hit Gene 22

HHPred Hit Gene 22

Results:

Gene 20:

Start: 16017 bp
Stop: 16466 bp
Direction: FWD
Gap: 79 bp gap
SD Final Value (SD Score): -3.108 (best score)
Z-Value: 3.074
CP: The gene is covered
SCS: Agrees with Glimmer, agrees with GeneMark
NCBI BLAST: hypothetical protein LAROYE_20 [Arthrobacter phage Laroye], Q1 S1, E-value: 9e-28
CDD: No good hit
PhagesDB BLAST: Edmundo_Draft_19, function unknown, 149, Q1 S1, E-value: 2e-71
HHPred: No good hit
LO: Yes
ST: Agrees with Starterator
F:
FS:
Notes:

Gene 21:

Start: 16456 bp
Stop: 16863 bp
Direction: FWD
Gap: 11 bp overlap
SD Final Value (SD Score): -4.651 (2nd best score). The best score would create a 356 bp gap and make the gene only 42 bp long.
Z-Value: 1.936
CP: The gene is covered
SCS: with Glimmer, agrees with GeneMark
NCBI BLAST: hypothetical protein LAROYE_22 [Arthrobacter phage Laroye], Q11 S5, E-value: 8e-49
CDD: No good hit
PhagesDB BLAST: Salgado_22, function unknown, 122, Q11 S5, E-value: 2e-46
HHPred: No good hit
LO: No. The longest open reading frame caused a large overlap, poor Z and final scores, and poor E-values for the NCBI and PhagesDB hits.
ST: Agrees with Starterator
F: NKF
FS:
Notes: In both the NCBI and PhagesDB BLASTs, the query started at 11 and the subject at 5 (Q11 S5). I tried BLASTing this gene using the start codon before and the start codon after, but this only created larger discrepancies between the query and subject matches, as well as unlikely overlaps, gaps, and final/Z scores.
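The Q11 S5 alignment start can be turned into an approximate coordinate. If residue 11 of our protein lines up with residue 5 of the hit, our protein has 6 more N-terminal residues than the hit, so a start matching the hit’s N-terminus would lie about 6 codons (18 bp) downstream. A minimal sketch for a forward-strand gene; the helper names are hypothetical, and it assumes the unaligned N-terminal residues line up one to one. It only locates where such a start would be: it does not check whether a start codon actually exists there, or whether the hit’s own start is correct.

```python
def start_shift_bp(query_start: int, subject_start: int) -> int:
    """bp downstream (positive) or upstream (negative) the start would move
    so that the query's first residue lines up with the subject's first residue."""
    return (query_start - subject_start) * 3


def candidate_start(start: int, query_start: int, subject_start: int) -> int:
    """Implied start coordinate for a forward-strand gene (hypothetical helper)."""
    return start + start_shift_bp(query_start, subject_start)


print(start_shift_bp(11, 5))           # 18
print(candidate_start(16456, 11, 5))   # 16474
```

Because the shift is always a whole number of codons, the implied start stays in the same reading frame as the called gene.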

Gene 22:

Start: 16860 bp
Stop: 17489 bp
Direction: FWD
Gap: 4 bp overlap
SD Final Value (SD Score): -5.23 (3rd best score). The two best scores would each create a gap of at least 800 bp.
Z-Value: 1.695
CP: The gene is covered
SCS: with Glimmer, agrees with GeneMark
NCBI BLAST: No good hit
CDD: Rnase_HI_RT_non_LTR cd09276, E-value: 2.43e-09
PhagesDB BLAST: Salgado_23, RNAse, 209, Q1 S1, E-value: 1e-88
HHPred: Ribonuclease H1, E-value: 2.4e-18
LO: Yes
ST: Agrees with Starterator
F: NKF
FS: NCBI, PhagesDB, and HHPred call it an RNase (ribonuclease)
Notes: The function is predicted to be an RNase, but that is not an option on the function list.