April 28

4/26/17 Independent Project Preliminary Presentations Jake Hanna

Goal:

To present our findings to the class and receive constructive criticism for improvement.

Results/How it went:

We were somewhat unprepared for the presentation, but we learned what needs to be improved. Our introduction needs work, as it doesn’t adequately describe the “so what” of our presentation. The data needs to be presented in a more understandable format, so it would help to add another chart that summarizes all the data on the differences between phages compared from different clusters.

 

April 28

NW 03/15/17 & 03/20/17 Make URSA Posters

Goals: Form groups for making URSA posters and begin working on the poster.

Methods: Because our groups needed at least one representative from each of the genome teams (Shrooms, Nubia, and Caterpillar), I formed a group with Chrissy, Andrea and Stu. We began the planning of our poster on 3/15 and finished our poster on 3/20.

Our Poster:

Poster-vp0d5z

Next steps include presenting our poster to the class and picking two to print for URSA Scholars Day!

April 28

NW 03/01/17 & 03/13/17 Finish Annotation Summary and Review of Shrooms Genome

Goals: Finish the cover sheet (annotation summary) and assist group in finishing the review process.

Finished Annotation Summary:

Arthrophage Shrooms Annotation Summary

Michael Munson, Stu Mair, Natalie Widdows, Niharika Koka, Emily Johnson,

Andrea Springman, Niru Ancha, Daniel Zeter

Baylor University

 

Totals:

Gene Count: 98

Insertions: 1

Deletions: 3

Extensions: 11

Reductions: 1

No Known Function (NKF): 67

Identified Functions: 31

 

Debatable Calls:

Gene 28 – The gene was extended to 20600bp to cover all the coding potential. The extension agreed with Starterator, NCBI BLAST, and PhagesDB BLAST. Extending the gene also improves the Z value from 2.771 to 2.052. This call does, however, disagree with both Glimmer and GeneMark and sacrifices a better SD score. Glimmer and GeneMark called the start at 20624bp, and the SD score changed from -3.682 to -5.306.

 

Gene 34 – The gene was extended to 22385bp to improve the SD score from -5.632 to -4.759 and the Z value from 1.157 to 2.13. Extending the gene did, however, conflict with GeneMark, Glimmer, and Starterator, which called the start at 22397bp. While both PhagesDB BLAST and NCBI BLAST had results, neither extending the gene nor leaving it unchanged aligned the start codon with the compared genomes. NCBI: Q9, S3; PhagesDB: Q10, S4.

 

Gene 48 – The gene was extended to 27837bp to cover all the coding potential. This call does, however, conflict with both Glimmer and GeneMark, which called the gene at 28040bp. There were no hits for this gene on NCBI BLAST or PhagesDB, so making the call was difficult.

 

Conclusions: After finishing the annotation summary, I helped the rest of the group finish the peer review because our group was struggling to get through all of the genes. But we all finished, and we were able to submit all of our work and final DNA Master files to Dr. Adair for her review. All that’s left is for the genome to be submitted. Yay!

 

April 28

NW 2/27/17 Begin Review Process of Shrooms Genome

Goals: Delegate tasks within our group to begin the review process for the Shrooms genome that we have annotated.

Our tasks are:

  • Proof each of the annotations to make sure the calls were good and are free from error
  • Write a cover sheet explaining difficult calls and giving an overview of the genome
  • Make a final DNA Master file with annotations pasted into the notes section
  • Make a DNA Master file of all of the assigned functions

We discussed these as a group and delegated the tasks. It was decided that I would work on the cover sheet. During lab I began working on this, although I was not able to finish. I began to compile all the information about the number of changed starts, insertions/deletions, difficult calls, etc.

Future goals include finishing the annotation cover sheet.

April 28

NW 02/22/17 Annotation of Genes 44, 46, 48, and 50

Goal – Annotate genes 44, 46, 48, and 50

Tools – Phamerator, Starterator, DNA Master, GeneMark, Glimmer, NCBI BLAST, PhagesDB BLAST, HHpred

Results:

Gene 44 –

Start: 26673bp  Stop: 26855bp  FWD
GAP: 4bp Overlap
SD Score: -6.571 (2nd best score). The best score would have caused too big of a base pair overlap with the previous gene.
Z-Value: 1.08
CP: The gene is covered
SCS: Agrees with Glimmer, Disagrees with GeneMark. GeneMark did not call the gene.
NCBI BLAST: hypothetical protein LAROYE_46 [Arthrobacter phage Laroye]; Q30, S98; E-Value: 4e-10
CDD: No good hit
PhagesDB BLAST: Laroye_46, function unknown; Q30, S98; E-Value: 1e-12
HHPred: No good hit
LO: No. Longest ORF would cause an overlap of 87bp
ST: Agrees with Starterator
F: NKF
FS: NCBI, Phamerator, HHPred
Notes:

Coding Potential Gene 44

NCBI BLAST 44

Gene 46 –

Start: 27430bp  Stop: 27849bp  FWD
GAP: 299bp Gap
SD Score: -4.18 (Best score)
Z-Value: 2.576
CP: The gene is covered
SCS: Disagrees with Glimmer, Agrees with GeneMark
NCBI BLAST: No good hit
CDD: No good hit
PhagesDB BLAST: Waltz_Draft_46, function unknown; Q1, S1; E-Value: 3e-69
HHPred: No good hit
LO: Yes
ST: Agrees with Starterator
F: NKF
FS: NCBI, Phamerator, HHPred
Notes:

 

PhagesDB BLAST Gene 46

Gene 48 – 

Start: 27837bp  Stop: 28040bp  FWD
GAP: 13bp Overlap
SD Score: -4.537 (2nd best score). The best score does not cover all the coding potential.
Z-Value: 2.476
CP: The gene is covered
SCS: Disagrees with Glimmer, Disagrees with GeneMark. These calls left uncovered coding potential.
NCBI BLAST: hypothetical protein SALGADO_37 [Arthrobacter phage Salgado]; E-Value: 6e-46
CDD: No good hit
PhagesDB BLAST: Edmundo_Draft_50, function unknown; Q4, S1; E-Value: 6e-26
HHPred: No good hit
LO: No. Longest ORF would cause an overlap of 193bp
ST: Starterator was basing this call only on drafts and therefore was not very reliable. The gene was extended against Starterator to cover the coding potential.
F: NKF
FS: NCBI, Phamerator, HHPred
Notes: The start codon called by both GeneMark and Glimmer left uncovered coding potential. I chose to extend the gene to cover the coding potential.

PhagesDB BLAST Gene 48

Gene 50 – 

Start: 28231bp  Stop: 28398bp  FWD
GAP: 4bp Overlap
SD Score: -5.401 (2nd best score). The best score has an ORF length of 63 bp.
Z-Value: 1.624
CP: The gene is not covered. There is no start codon by which to extend the gene to cover the coding potential.
SCS: Agrees with Glimmer, Agrees with GeneMark
NCBI BLAST: No good hit
CDD: No good hit
PhagesDB BLAST: Waltz_Draft_48, function unknown; Q1, S1; E-Value: 7e-19
HHPred: No good hit
LO: Yes
ST: Agrees with Starterator
F: NKF
FS: NCBI, Phamerator, HHPred
Notes:

PhagesDB BLAST Gene 50 –

Coding Potential Gene 50 –
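As a sanity check on the coordinates recorded in this entry, here is a minimal Python sketch that computes each gene's length and the spacing between two calls. The coordinates are copied from the four records above. The function names are hypothetical, and the convention that a negative gap means an overlap is an assumption on my part, not something taken from DNA Master. The entry also does not state that gene 46 is the call immediately before gene 48, so that comparison is only illustrative.

```python
# Start/stop coordinates (bp) copied from the forward-strand calls in this entry.
genes = {
    44: (26673, 26855),
    46: (27430, 27849),
    48: (27837, 28040),
    50: (28231, 28398),
}


def orf_length(start, stop):
    """Inclusive length in bp of a forward-strand gene."""
    return stop - start + 1


def gap_to_previous(prev_stop, start):
    """Bases strictly between two genes (assumed convention).

    A negative value means the two genes overlap by that many bases.
    """
    return start - prev_stop - 1


for number, (start, stop) in genes.items():
    length = orf_length(start, stop)
    # A complete ORF, including its stop codon, should be a multiple of 3.
    print(f"Gene {number}: {length} bp, multiple of 3: {length % 3 == 0}")

# Stop of gene 46 compared against the start of gene 48.
print("Gene 46 -> 48 spacing:", gap_to_previous(genes[46][1], genes[48][0]))
```

Under that assumed convention, the gene 46 to gene 48 comparison gives -13, which is consistent with the 13bp overlap recorded for gene 48.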

 

Conclusions: Today Nihi and I finished annotating our portion of the Shrooms genome. The rest of our team has just about finished their portions too, so the next task is to review the annotations, write a cover sheet, and assemble a final DNA Master file for submission.

April 28

Independent Project: Clustering Methods – 4/5/17

Leading into today, the TMP sequences for clusters AK-AU were found and recorded in a multi-FASTA file. These were found by using an annotated DNA Master document pulled from PhagesDB and then extrapolating onto the other members of that cluster. Three phages were used in each cluster unless only two were available. During this lab meeting, we annotated the AV cluster phages for their TMP sequence. There was an existing cluster AV annotation (Jasmine), but it didn’t classify a TMP. There were several genes long enough to be a TMP, which we then ran through BLAST to find a function. We ended up settling on gene 29. It had an appropriate length (5412) and a weak hit (E-value = 0.013) to a different cluster’s TMP. With no other candidates lacking a function, this was selected as our TMP for this cluster. This gene was strictly conserved (E-value = 0.0) across all of the cluster’s phages that we are using.
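For reference, the multi-FASTA layout mentioned above is just a series of header lines (each starting with ">") followed by sequence lines. Below is a minimal Python sketch of writing and reading that layout. The record names and sequences are hypothetical placeholders, not our TMP data, and the helper names are my own.

```python
def to_multi_fasta(records, width=60):
    """Format (header, sequence) pairs as multi-FASTA text."""
    lines = []
    for header, sequence in records:
        lines.append(f">{header}")
        # Wrap each sequence to a fixed line width.
        for i in range(0, len(sequence), width):
            lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


def parse_multi_fasta(text):
    """Read multi-FASTA text back into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# HYPOTHETICAL placeholder records: names and residues are made up.
example = [
    ("clusterX_phage1_TMP", "MKTAYIAKQR" * 8),
    ("clusterX_phage2_TMP", "MKSAYLAKQR" * 8),
]
fasta_text = to_multi_fasta(example)
print(fasta_text)
```

Keeping one record per phage with the cluster in the header makes it easier to read the cluster boxes off a dotplot later.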

Once we collected all of the TMP sequences in a multi-FASTA file, we created a Gepard dotplot for the TMP sequences and compared it to a Gepard dotplot of the full genomes. The TMP plot showed similar cluster boxes, with slightly less strength. Lymara, an AR phage, was inserted at the top of each plot to verify that each method is capable of clustering an “unknown” phage. Both methods exhibited this capability, which suggests that it is possible to cluster by TMP.

TMP Clustering Example

Full Genome Clustering Example

 

April 28

Independent Project: Clustering Methods – 4/3/17

Today we developed ideas for the project based on an existing article that piqued our interest. We intend to analyze whether Arthrobacter phages can be clustered by their tape measure proteins, and to see whether this single-gene analysis implies the same evolutionary history as whole-genome alignment (which is reflected in the current cluster names).

April 28

4/24-26/17: Analyzing Data

Goal: Today’s goal is to analyze the data that was collected and determine its significance.

Methods: The first step of analyzing the data is to average the percent identities found for each gene within each of the three clusters, and then to average the percent identities found for each gene across all three clusters.

We then compared these averages against each other in order to determine which gene is the most conserved across the three phages within each of the clusters.
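The averaging step described above can be sketched in Python as follows. All of the numbers here are hypothetical placeholders chosen to make the arithmetic easy to follow; they are not our measured percent identities. The gene and cluster labels are also made up, and pooling all within-cluster values into one average is an assumption about how the step was done.

```python
from statistics import mean

# HYPOTHETICAL placeholder percent identities, not our real data.
percent_identities = {
    "TMP": {
        "within": {"cluster_1": [90.0, 80.0], "cluster_2": [70.0, 60.0]},
        "across": [20.0, 30.0],
    },
    "gene_X": {
        "within": {"cluster_1": [100.0, 90.0], "cluster_2": [80.0, 90.0]},
        "across": [10.0, 30.0],
    },
}


def summarize(table):
    """Average percent identity per gene, within clusters and across clusters."""
    summary = {}
    for gene, values in table.items():
        # Pool every within-cluster comparison for this gene.
        within_all = [p for cluster in values["within"].values() for p in cluster]
        summary[gene] = {
            "within": mean(within_all),
            "across": mean(values["across"]),
        }
    return summary


summary = summarize(percent_identities)
# The gene with the highest within-cluster average is the most conserved one.
most_conserved_within = max(summary, key=lambda gene: summary[gene]["within"])

for gene, averages in summary.items():
    print(gene, averages)
print("Most conserved within clusters:", most_conserved_within)
```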

Results:

The results show that the most conserved gene across all three phages within each of the clusters is not actually the TMP. The tape measure proteins showed a 24.66% similarity across the three clusters, while the other three proteins showed only around a 20% similarity across clusters, meaning that the other three genes showed a greater amount of similarity within the clusters.

4/26

Today we presented our projects to the class in order to get feedback on our presentations. After presenting our data to the class, we decided to focus more on the importance of these results and to add more to the presentation about why single-gene analysis is important.

April 28

4/19/17: Gepard Dotplot data

Goal: To create Gepard dotplots that show the similarity among clusters for each of the proteins

Methods: Using the Gepard program, we entered the amino acid sequences for each of the proteins across all three clusters.

The program was used to generate four dotplots.
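To illustrate what a dotplot represents, here is a naive Python sketch that marks every position where a short word from one sequence exactly matches a word in the other. This is not how Gepard itself computes its plots; it is only a simplified illustration, and the function name, word size, and example sequences are hypothetical.

```python
def dotplot(seq_a, seq_b, word=3):
    """Return the set of (i, j) positions where seq_a and seq_b share a word.

    Naive exact word matching, for illustration only.
    """
    # Index every word of seq_b by its starting position.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)

    hits = set()
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            hits.add((i, j))
    return hits


# HYPOTHETICAL short sequences, not real phage proteins.
same = dotplot("MKTAY", "MKTAY")
shifted = dotplot("MKTAY", "GGMKT")
print(sorted(same))
print(sorted(shifted))
```

Identical sequences give an unbroken diagonal of hits, and similar sequences give a broken or shifted diagonal. That is the pattern we read as a cluster box when many sequences are plotted together.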

Results:

April 28

4/17/17: Continuing to collect data for project

Goal: Today’s goal is to continue collecting the multiple sequence alignment results.

Methods: We used the amino acid sequences already collected from the chosen phages and proteins and ran them against each other in the multiple sequence alignment tool Clustal Omega.

We were able to collect the results for the rest of the alignments.
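For reference, a percent identity between two aligned sequences can be computed along these lines. This Python sketch uses one common definition (identical residues divided by the number of columns where neither sequence has a gap). Whether that matches the exact definition Clustal Omega uses is an assumption here, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns with no gap in either sequence.

    Assumed definition for illustration; alignment tools may count gaps differently.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# HYPOTHETICAL aligned fragments: 4 of the 5 gap-free columns match.
print(percent_identity("MKT-LV", "MRTALV"))
```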