April 28

NW 03/22/17 & 03/27/17 & 03/29/17 URSA Poster Practice and Presentation

What We Did:

03/22/17: Presented posters to the class and picked two to be printed. My group’s poster was picked!

Our Two Final Posters:

Figure A

Figure B

03/27/17: Practice presenting. Our class split into two groups, and each group practiced with one poster. My group worked on our poster (the green one) and presented to Dr. Adair and Lathan. They critiqued us and we polished our presenting skills.

03/29/17: URSA Scholars Day! We presented our posters in front of judges and professors. We also had the opportunity to listen to other students’ presentations and learn about other student research going on at Baylor.

We did it!

April 28

NW 03/15/17 & 03/20/17 Make URSA Posters

Goals: Form groups for making URSA posters and begin working on the poster.

Methods: Because our groups needed at least one representative from each of the genome teams (Shrooms, Nubia, and Caterpillar), I formed a group with Chrissy, Andrea and Stu. We began the planning of our poster on 3/15 and finished our poster on 3/20.

Our Poster:

Poster-vp0d5z

Next steps include presenting our poster to the class and picking two to print for URSA Scholars Day!

April 28

NW 03/01/17 & 03/13/17 Finish Annotation Summary and Review of Shrooms Genome

Goals: Finish the cover sheet (annotation summary) and assist group in finishing the review process.

Finished Annotation Summary:

Arthrophage Shrooms Annotation Summary

Michael Munson, Stu Mair, Natalie Widdows, Niharika Koka, Emily Johnson,

Andrea Springman, Niru Ancha, Daniel Zeter

Baylor University

 

Totals:

Gene Count: 98

Insertions: 1

Deletions: 3

Extensions: 11

Reductions: 1

No Known Function (NKF): 67

Identified Functions: 31

 

Debatable Calls:

Gene 28 – The gene was extended to 20600bp to cover all the coding potential. The extension agreed with Starterator, NCBI BLAST, and PhagesDB BLAST. Extending the gene also improves the Z value from 2.771 to 2.052. This call does, however, disagree with both Glimmer and GeneMark and sacrifices a better SD score. Glimmer and GeneMark called the start at 20624bp, and the SD score changed from -3.682 to -5.306.

 

Gene 34 – The gene was extended to 22385bp to improve the SD score from -5.632 to -4.759 and the Z value from 1.157 to 2.13. Extending the gene did, however, conflict with GeneMark, Glimmer, and Starterator which called the start at 22397bp. While both PhagesDB BLAST and NCBI BLAST had results, neither extending the gene nor leaving the gene aligned the start codon with the compared genomes. NCBI: Q9, S3; PhagesDB: Q10, S4.

 

Gene 48 – The gene was extended to 27837bp to cover all the coding potential. This call does, however, conflict with both Glimmer and GeneMark which called the gene at 28040bp. There were no hits for this gene on NCBI BLAST or PhagesDB so making the call was difficult.

 

Conclusions: After finishing the annotation summary, I helped the rest of the group finish the peer review because our group was struggling to get through all of the genes. But we all finished, and we were able to submit all of our work and final DNA Master files to Dr. Adair for her review. All that’s left is for the genome to be submitted. Yay!

 

April 28

NW 2/27/17 Begin Review Process of Shrooms Genome

Goals: Delegate tasks within our group to begin the review process for the Shrooms genome that we have annotated.

Our tasks are:

  • Proof each of the annotations to make sure the calls were good and are free from error
  • Write a cover sheet explaining difficult calls and giving an overview of the genome
  • Make a final DNA Master file with annotations pasted into the notes section
  • Make a DNA Master file of all of the assigned functions

We discussed these as a group and delegated the tasks. It was decided that I would work on the cover sheet. During lab I began working on this, although I was not able to finish. I began to compile all the information about the number of changed starts, insertions/deletions, difficult calls, etc.

Future goals include finishing the annotation cover sheet.

April 28

NW 02/22/17 Annotation of Genes 44, 46, 48, and 50

Goal – Annotate genes 44, 46, 48, and 50

Tools – Phamerator, Starterator, DNA Master, GeneMark, Glimmer, NCBI BLAST, PhagesDB BLAST, HHpred

Results:

Gene 44 –

Start: 26673bp, Stop: 26855bp, FWD
GAP: 4bp Overlap
SD Score: -6.571 (2nd best score). The best score would have caused too big of a base pair overlap with the previous gene.
Z-Value: 1.08
CP: The gene is covered
SCS: Agrees with Glimmer, disagrees with GeneMark. GeneMark did not call the gene.
NCBI BLAST: hypothetical protein LAROYE_46 [Arthrobacter phage Laroye]; Q30, S98; E-Value: 4e-10
CDD: No good hit
PhagesDB BLAST: Laroye_46, function unknown; Q30, S98; E-Value: 1e-12
HHPred: No good hit
LO: No. Longest ORF would cause an overlap of 87bp.
ST: Agrees with Starterator
F: NKF
FS: NCBI, Phamerator, HHPred
Notes:

Coding Potential Gene 44

NCBI BLAST 44

Gene 46 –

Start: 27430bp, Stop: 27849bp, FWD
GAP: 299bp Gap
SD Score: -4.18 (Best score)
Z-Value: 2.576
CP: The gene is covered
SCS: Disagrees with Glimmer, agrees with GeneMark
NCBI BLAST: No good hit
CDD: No good hit
PhagesDB BLAST: Waltz_Draft_46, function unknown; Q1, S1; E-Value: 3e-69
HHPred: No good hit
LO: Yes
ST: Agrees with Starterator
F: NKF
FS: NCBI, Phamerator, HHPred
Notes:

 

PhagesDB BLAST Gene 46

Gene 48 – 

Start: 27837bp, Stop: 28040bp, FWD
GAP: 13bp Overlap
SD Score: -4.537 (2nd best score). The best score does not cover all the coding potential.
Z-Value: 2.476
CP: The gene is covered
SCS: Disagrees with Glimmer, disagrees with GeneMark. These calls left uncovered coding potential.
NCBI BLAST: hypothetical protein SALGADO_37 [Arthrobacter phage Salgado]; E-Value: 6e-46
CDD: No good hit
PhagesDB BLAST: Edmundo_Draft_50, function unknown; Q4, S1; E-Value: 6e-26
HHPred: No good hit
LO: No. Longest ORF would cause an overlap of 193bp.
ST: Starterator was basing this call only on drafts and therefore was not very reliable. The gene was extended against Starterator to cover the coding potential.
F: NKF
FS: NCBI, Phamerator, HHPred
Notes: The start codon called by both GeneMark and Glimmer left uncovered coding potential. I chose to extend the gene to cover the coding potential.

PhagesDB BLAST Gene 48

Gene 50 – 

Start: 28231bp, Stop: 28398bp, FWD
GAP: 4bp Overlap
SD Score: -5.401 (2nd best score). The best score has an ORF length of 63 bp.
Z-Value: 1.624
CP: The gene is not covered. There is no start codon by which to extend the gene to cover the coding potential.
SCS: Agrees with Glimmer, agrees with GeneMark
NCBI BLAST: No good hit
CDD: No good hit
PhagesDB BLAST: Waltz_Draft_48, function unknown; Q1, S1; E-Value: 7e-19
HHPred: No good hit
LO: Yes
ST: Agrees with Starterator
F: NKF
FS: NCBI, Phamerator, HHPred
Notes:

PhagesDB BLAST Gene 50 –

Coding Potential Gene 50 –
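
As a quick sanity check on the coordinates recorded above, here is a minimal Python sketch. It is not part of the DNA Master workflow; the function names are made up for illustration, and it assumes the start and stop positions are inclusive, 1-based, and include the stop codon. It also assumes gene 46 sits immediately upstream of gene 48 on the forward strand, which the notes do not state outright.

```python
def orf_length(start: int, stop: int) -> int:
    """Length in bp of a forward-strand gene with inclusive coordinates."""
    return stop - start + 1


def overlap_bp(prev_stop: int, next_start: int) -> int:
    """Bases shared by two forward genes; a negative value is minus the gap size."""
    return prev_stop - next_start + 1


# Start/stop coordinates copied from the four annotations above.
genes = {
    44: (26673, 26855),
    46: (27430, 27849),
    48: (27837, 28040),
    50: (28231, 28398),
}

lengths = {gene: orf_length(start, stop) for gene, (start, stop) in genes.items()}
for gene, n in lengths.items():
    # A complete ORF (stop codon included) should be a whole number of codons.
    print(f"Gene {gene}: {n} bp, multiple of 3: {n % 3 == 0}")

# Assumption: gene 46 is the gene directly before gene 48.
print("Gene 46/48 overlap:", overlap_bp(genes[46][1], genes[48][0]), "bp")
```

Under those assumptions every ORF length comes out as a multiple of three, and the gene 46/48 overlap matches the 13bp overlap recorded for gene 48.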

 

Conclusions: Today Nihi and I finished annotating our portion of the Shrooms genome. The rest of our team has just about finished their portion too, so the next task is to review the annotations, write a cover sheet, and form a final DNA Master file for submission.

April 28

NW 04/24/17 Write Abstract

Goal: Write Abstract for Presentation

Abstract – 

Analysis of Alternative Clustering Methods for Arthrobacteriophages

The study of bacteriophages includes an investigation of their DNA. An important aspect of DNA analysis is clustering, the means by which discovered bacteriophages are sorted according to genome similarity. The current method for clustering is an analysis and comparison of a bacteriophage’s full genome. However, this presents several limitations, as it requires the completion of full-genome sequencing and the production of a high-titer DNA lysate. In the interest of easing the clustering process, additional clustering methods were analyzed, including sorting by G/C content and clustering by Tape Measure Proteins (TMPs). Through comparison to existing clusters, it was determined that the G/C contents of different phage clusters group too closely to be useful for clustering. However, through the use of a Gepard dotplot, it was determined that there is enough similarity among TMPs within clusters and enough diversity between clusters to support that clustering method. From this, we used cluster TMP alignments generated by Mega7 to find conserved regions within each cluster. From these, small PCR primers were created that were specific to each cluster’s TMP with minimal off-target similarity. These will allow future researchers to determine the clusters of bacteriophages with a minimal sample of DNA earlier in the process without going through whole-genome sequencing. In conclusion, TMP analysis is a highly effective alternative clustering method with the added benefit of PCR primers, which require less lab work and make it possible to sort phages with a low-titer DNA lysate.

Conclusions: With the abstract finished, all we have left to do is to make the presentation!

April 28

NW 04/19/17 Design Primers

Goal: Begin designing PCR primers

Tools Used: Mega7, DNA Master

Methods:

  • Download Mega7
  • Using Mega7, perform an alignment on the 3 TMP sequences within a cluster
  • Scan through the alignment to find regions that are identical across all 3 genomes and at least 20 bp long.
  • Search for each candidate sequence in the DNA Master file of a genome from that cluster to make sure it does not appear anywhere else in the genome. Repeat the search in genomes from the other 12 clusters to make sure the sequence is unique to the single cluster.
  • After the checking process is complete, make sure the forward and reverse primers produce a fragment whose length differs from the fragments of all the other primer pairs, so that a gel run after PCR shows a uniquely sized band that identifies the cluster.
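
The scanning and checking steps above were done by hand in Mega7 and DNA Master. The same logic can be sketched in Python as below; this is only an illustration under assumptions, with made-up function names and toy sequences rather than real TMP data. It treats the alignment as equal-length strings with "-" for gaps, and it counts primer sites on both strands.

```python
def conserved_runs(aligned, min_len=20):
    """Return (start, sequence) for gap-free stretches identical in every aligned row."""
    runs, start = [], None
    length = min(len(row) for row in aligned)
    for i in range(length + 1):
        same = (
            i < length
            and aligned[0][i] != "-"
            and all(row[i] == aligned[0][i] for row in aligned)
        )
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, aligned[0][start:i]))
            start = None
    return runs


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(genome, primer):
    """Count exact matches of the primer on either strand of the genome."""
    def count(haystack, needle):
        n, pos = 0, haystack.find(needle)
        while pos != -1:
            n += 1
            pos = haystack.find(needle, pos + 1)
        return n

    rc = reverse_complement(primer)
    total = count(genome, primer)
    if rc != primer:
        total += count(genome, rc)
    return total


# Toy data (hypothetical): three aligned rows sharing one 24 bp block.
block = "ATGGCTAGCTTGACCGATCGTAAC"
aligned = ["ACA" + block + "T-A", "ACG" + block + "CGA", "TCT" + block + "GGA"]
toy_genome = "CCATTG" + block + "TTGACA"

runs = conserved_runs(aligned)
print(runs)
print(count_sites(toy_genome, block))
```

A candidate passes the uniqueness check when it is found exactly once in its own cluster's genome and not at all in genomes from the other clusters.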

Primers:

Primer Chart-1r6t26w

Cluster | Forward Primer | (Rev. Comp.) Reverse Primer | Product Size (bp)
AK | CTGAACGCGGTATTCGGGTTCTTCTC | GCCGTCCTTTCTAACATCCACGAGA | 596
AL | AAGGACTACACGGGCCTGAC | ACTACTTCGACCCATGACTGG | 292
AM | TCAAACATCTCAAGTAAGTTCTC | GAACCTGAGTCGACTAACCATGGAACTTC | 783
AN | ACTTGGCTGCCATGTTCGGCGGCAC | TCGGCCGCATTTGCGGCACAAGTT | 1000
AO1 | CGCGTCGAGCGGGCGCACCGCTGGC | AGTAGCTTCTGCACGAGCTGCTTCGTCT | 204
AO2 | GCTGCGTCTGGCGCGTGGGGCGGC | TCACGAGCCGAGAATGTTACGCAA | 950
AP | TCGACAAGCAGCGTCAGGCCT | TCTACCTCTTCGCCTTCCGT | 371
AQ | TGGCTAACGAGTTTGAAGCC | GGGACTTTCTGAACTGCTGA | 632
AR | TATGACCAAGCCCTTGGGCG | TGTCGCGGGAACGGCGGCAGGTCCC | 431
AS | CAAGTCGGTGGCGGCGTGGATTCTG | AGAGGCCCCGGTTCCTGCGCCGC | 700
AT | AGCCGATCATCGAAATGGCGA | CCGTTTAGGACCCAGGCCGCGA | 844
AU | GCACTTCTGGGTCTTATTCC | GCATAGCGTCCCTACGACTGTGAAGGC | 906
AV | GAACGTACATTTGAAGCAATGGT | CTAGCATTCCAAATGGCAAGAGC | 527
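
Because cluster identification depends on every product running at a distinct position on the gel, a short Python check of the product sizes in the chart can be useful. This is just a sketch using the sizes listed above; how far apart two bands must be to tell them apart depends on the gel and is not something the chart specifies.

```python
# Product sizes (bp) copied from the primer chart above.
product_sizes = {
    "AK": 596, "AL": 292, "AM": 783, "AN": 1000, "AO1": 204, "AO2": 950,
    "AP": 371, "AQ": 632, "AR": 431, "AS": 700, "AT": 844, "AU": 906, "AV": 527,
}

# Every cluster needs its own fragment length.
all_unique = len(set(product_sizes.values())) == len(product_sizes)

# Sort by size and look at the spacing between neighbouring bands.
ordered = sorted(product_sizes.items(), key=lambda item: item[1])
spacings = [
    (larger[1] - smaller[1], smaller[0], larger[0])
    for smaller, larger in zip(ordered, ordered[1:])
]
closest = min(spacings)

print("All sizes unique:", all_unique)
print("Closest pair (difference in bp, cluster, cluster):", closest)
```

The closest pair is the one to watch most carefully when reading a gel.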

Conclusion: Finishing the design of our primers completes our project. The primers were time-consuming, especially doing all the checking in DNA Master and making sure the lengths were varied enough. Next steps include writing the abstract, making the PowerPoint presentation, and presenting.

April 28

NW 04/12/17 New Idea of PCR Primers

Goal: Begin brainstorming about how to use PCR primers to cluster.

Discussion: In the article referenced for the use of TMPs to cluster, the authors designed PCR primers for the TMPs. We discussed that we can do this with our phages, designing primers unique and specific for each of the 13 clusters. This way, we can cluster without whole-genome sequencing and without having a high-titer lysate. Additionally, this process could be performed in the wet lab to determine the cluster of all the phages that don’t make it to sequencing.

What we did: 

First, we collaborated with another group whose topic was similar to ours. We decided to diversify – we are going to focus more on TMPs and designing PCR primers.

Next, we continued researching how we can design primers. We learned the primers need to be approximately 20 bp long. We need both a forward primer and a reverse primer. The primers must be distinct enough that they share no more than 80% similarity with sequences in the other clusters.
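
One possible way to put a number on the 80% rule is to slide the primer along an off-target sequence and record the best ungapped percent identity. The Python sketch below does that; it is an assumption about how similarity could be measured (the notes do not define it), the function name and sequences are hypothetical, and it only checks the forward strand.

```python
def best_identity(primer, target):
    """Highest fraction of matching bases over all ungapped placements of the primer."""
    n = len(primer)
    best = 0.0
    for i in range(len(target) - n + 1):
        window = target[i:i + n]
        matches = sum(p == t for p, t in zip(primer, window))
        best = max(best, matches / n)
    return best


# Toy 20 bp primer and two toy targets (not real phage sequences).
primer = "ATGGCTAGCTTGACCGATCG"
on_target = "GG" + primer + "AA"               # contains an exact site
off_target = "GGAAGGCAAGCATGAGCGAACGAA"        # same region with 5 mismatches

on_score = best_identity(primer, on_target)
off_score = best_identity(primer, off_target)

print(f"On-target identity:  {on_score:.2f}")
print(f"Off-target identity: {off_score:.2f}")
print("Passes the 80% rule:", off_score <= 0.80)
```

A real check would also need to scan the reverse complement and every genome in the other clusters.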

Goals for Next Lab: 

Begin designing the PCR primers for all 13 clusters.

April 28

NW 04/10/17 Compare Clustering by G/C Content

Goal: Look at the G/C content in each of the 39 genomes of the 13 clusters and see if the G/C content is conserved and if it can be used to cluster.

Methods:

Compile GC content info on all 39 phages

Create graph in Excel to visualize the clusters

Results:

Phage Name | Cluster | G/C Content (%)
Joann | AK | 60.7
Nubia | AK | 60.7
Vulture | AK | 61.6
Laroye | AL | 64.8
Salgado | AL | 64.6
Shrooms | AL | 64.5
Circum | AM | 45.2
Heisenberger | AM | 45.1
Mudcat | AM | 45.1
Jessica | AN | 60.1
Sandman | AN | 60
Stratus | AN | 60
Brent | AO1 | 63.4
Jawnski | AO1 | 63.4
Nahla | AO1 | 63.5
Martha | AO2 | 61
Shade | AO2 | 61.1
Sonny | AO2 | 61.1
Tank | AP | 62.9
Wilde | AP | 62.9
Amigo | AQ | 52.9
Gorgeous | AQ | 53
Rings | AQ | 53
DrYang | AR | 62.6
PrincessTrina | AR | 61.6
Tophat | AR | 61.6
Abidatro | AS | 68.5
Galaxy | AS | 68.4
BeatusComedenti | AT | 63.4
KellEzio | AT | 63.3
Kitkat | AT | 63.4
CapnMurica | AU | 49.6
Caterpillar | AU | 51.1
Gordon | AU | 49.8
Gurgleferb | AV | 45.7
Jasmine | AV | 45.9
Nellie | AV | 45.8

Conclusions: 

G/C content, while highly conserved within each cluster, is very similar across several clusters and therefore cannot be used to cluster. Next time we will begin to look at the possibility of designing PCR primers for the TMPs.
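
The conclusion was drawn from the Excel graph, but the same comparison can be sketched in Python from the values listed in the results table. The snippet below is only an illustration (variable names are made up): it takes the lowest and highest G/C content in each cluster and lists the pairs of clusters whose ranges overlap, meaning a new phage with a G/C content in that range could not be assigned to a single cluster.

```python
from itertools import combinations

# G/C content (%) per cluster, copied from the results table.
gc_by_cluster = {
    "AK": [60.7, 60.7, 61.6],
    "AL": [64.8, 64.6, 64.5],
    "AM": [45.2, 45.1, 45.1],
    "AN": [60.1, 60, 60],
    "AO1": [63.4, 63.4, 63.5],
    "AO2": [61, 61.1, 61.1],
    "AP": [62.9, 62.9],
    "AQ": [52.9, 53, 53],
    "AR": [62.6, 61.6, 61.6],
    "AS": [68.5, 68.4],
    "AT": [63.4, 63.3, 63.4],
    "AU": [49.6, 51.1, 49.8],
    "AV": [45.7, 45.9, 45.8],
}

# Lowest and highest value observed in each cluster.
ranges = {cluster: (min(v), max(v)) for cluster, v in gc_by_cluster.items()}

# Two clusters are ambiguous if their [min, max] ranges share any value.
overlapping = [
    (a, b)
    for a, b in combinations(sorted(ranges), 2)
    if max(ranges[a][0], ranges[b][0]) <= min(ranges[a][1], ranges[b][1])
]

print(overlapping)
```

Several other clusters sit within a few tenths of a percent of each other without strictly overlapping, which is consistent with the conclusion above.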