GEP Annotation Report for CS-2

I. Introduction

This is an example of how to write the GEP Annotation Report. Submission of the GEP Annotation Report and the associated files is required to complete this course. The GEP Annotation Report template is posted on this website, and is followed here.

II. The Report

The GEP Annotation Report asks you to identify yourself as follows:

Student name: Eugene Hunter
Student email: genehunter@unm.edu
Faculty advisor: Dr. Paul Szauter
College/University: University of New Mexico

Project Details

 
Project name: dananassae_3Lcontrol_Jan2013_fosmid_2748I18
This is the name of the folder containing your fosmid sequence and other files.
Project species: Drosophila ananassae
Date of submission: April 1, 2013
Size of project in base pairs: 39,531
You can obtain this number by viewing your fosmid in the GEP UCSC Genome Browser and noting the end coordinate.
Number of genes in project: 3
This is the final number of genes you have found. GENSCAN models that are not valid do not count as genes. We give the number as three because this fosmid contains CS-2, Act79B, and CG7470.
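
As a cross-check on the project size reported above, the base count can also be taken directly from the fosmid's FASTA file. The following is a minimal Python sketch, not part of the GEP workflow. The function name fasta_length and the toy record are assumptions made for this illustration, and the toy sequence is not the real fosmid.

```python
def fasta_length(lines):
    """Count sequence characters in an iterable of FASTA lines, skipping headers."""
    total = 0
    for line in lines:
        line = line.strip()
        # Header lines start with ">" and blank lines carry no bases.
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total


# Toy record, made up for illustration only.
toy_fasta = [">toy_fosmid", "ACGTACGTAC", "GTACG"]
print(fasta_length(toy_fasta))  # 15
```

For a real file, the same function can be given an open file handle, for example fasta_length(open(path)), where path is the location of your own FASTA file.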

Does this report cover all genes and all isoforms or is it a partial report?
This example is a partial report; you should cover all of the genes on your fosmid.

If this is a partial report because different students are working on different regions of this sequence, please report the region of the project covered by this report:
from base 1 to base 8902
We have given the base count from base 1 (the left edge of the fosmid) to base 8902, the last base of the stop codon in CS-2.

Complete the following Gene Report Form for each gene in your project. Copy and paste the sections below to create as many copies as needed. Be sure to create enough Isoform Report Forms within your Gene Report Form for all isoforms.

Gene report form

Gene name (e.g. D. mojavensis eyeless): D. ananassae Chitin Synthetase 2
Gene symbol (e.g. dmoj_ey): dana_CS-2
Approximate location in project (from 5' end to 3' end):1 - 8902
Number of isoforms in D. melanogaster:2
Number of isoforms in this project:1

Complete the following table for all the isoforms in this project:
If you are annotating untranslated regions then all isoforms are unique (by definition)

Name of unique isoform based on coding sequence    List of isoforms with identical coding sequences
CS-2-RC                                            none

Isoform report form
Complete this report form for each unique isoform listed in the table above (copy and paste to create as many copies of this Isoform Report Form as needed):

Gene-isoform name (e.g. dmoj_ey-PA):    dana_CS-2-PC
Names of the isoforms with identical coding sequences as this isoform:    none

Is the 5' end of this isoform missing from the end of project:    yes
    If so, how many exons are missing from the 5' end:    1
Is the 3' end of this isoform missing from the end of the project:    no
    If so, how many exons are missing from the 3' end:    none

1. Gene Model Checker checklist

Enter the coordinates of your final gene model for this isoform into the Gene Model Checker and paste a screenshot of the checklist results below:

[Screenshot: CS-2 gene model]

2. View the gene model on the Genome Browser

Use the custom track feature from the Gene Model Checker to view your gene model on the Genome Browser (see page 10 of the Gene Model Checker user guide for instructions; you can find the guide under "Help" -> "Documentations" -> "Web Framework" on the GEP website at http://gep.wustl.edu).

First, save the GFF file from the Gene Model Checker.

Here is the GFF file from the Gene Model Checker:

CS-2 GFF
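
Before uploading, it can be useful to confirm that the coordinates in the saved GFF file match the gene model you entered. The following is a minimal Python sketch, assuming the file uses the standard nine-column, tab-delimited GFF layout (sequence, source, feature type, start, end, score, strand, phase, attributes). The function names and the toy lines are hypothetical and are not the actual CS-2 coordinates.

```python
def read_features(gff_lines, feature_type="CDS"):
    """Return sorted (start, end, strand) tuples for one feature type."""
    features = []
    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a standard nine-column record
        if cols[2] != feature_type:
            continue
        features.append((int(cols[3]), int(cols[4]), cols[6]))
    return sorted(features)


def feature_span(features):
    """Return the lowest start and highest end among the features."""
    return min(f[0] for f in features), max(f[1] for f in features)


def coding_length(features):
    """Sum feature lengths; GFF coordinates are 1-based and inclusive."""
    return sum(end - start + 1 for start, end, _ in features)


# Toy records, made up for illustration only.
toy_gff = [
    "toy_contig\ttoy\tmRNA\t100\t520\t.\t+\t.\tID=toy",
    "toy_contig\ttoy\tCDS\t100\t250\t.\t+\t0\tID=toy",
    "toy_contig\ttoy\tCDS\t400\t520\t.\t+\t2\tID=toy",
]
cds = read_features(toy_gff)
print(feature_span(cds))   # (100, 520)
print(coding_length(cds))  # 272
```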

Please see the Instructions for uploading your custom track to the GEP UCSC Genome Browser.

Go to the GEP Genome Browser Gateway and specify the fosmid. Click the button indicated by the red arrow below to add custom tracks.

[Screenshot: add custom tracks]

In the screen that appears, use the Choose File button indicated by the red arrow to upload your GFF file. Click the Submit button.

[Screenshot: add custom tracks]


Capture a screenshot of your gene model shown on the Genome Browser for your project; zoom in so that only this isoform is in the screenshot. Include the following evidence tracks in the screenshot if they are available.

1. A sequence alignment track (D. mel Protein or Other RefSeq)
2. At least one gene prediction track (e.g. Genscan)
3. At least one RNA-Seq track (e.g. RNA-Seq Alignment Summary)
4. A comparative genomics track (e.g. Conservation, D. mel. Net Alignment, 3-way, 5-way or 7-way multiz)

Paste the screenshot of your gene model as shown on the Genome Browser below:

[Screenshot: UCSC custom CS-2]

3. Alignment between the submitted model and the D. melanogaster ortholog

Show an alignment between the protein sequence for your gene model and the protein sequence from the putative D. melanogaster ortholog. You can use the protein alignment generated by the Gene Model Checker or you can generate a new alignment using BLAST 2 Sequences (bl2seq). Copy and paste the alignment below:

[Figure: CS-2 alignment]

4. Dot plot between the submitted model and the D. melanogaster ortholog

Paste a copy of the dot plot of your submitted model against the putative D. melanogaster ortholog (generated by the Gene Model Checker). Provide an explanation for any anomalies on the dot plot (e.g. large gaps, regions with no sequence similarity).

[Figure: dot plot for the CS-2 gene model]

Supplement

We need to explain why we have not annotated the CS-2-PD isoform that is found in D. melanogaster. This isoform uses Exon 3 in place of Exon 4. We show below that D. melanogaster has two splice acceptor sites three bases apart, so that Exon 3 and Exon 4 differ by a single amino acid, whereas D. ananassae lacks one of the splice acceptors. Alignment of the sequences shows that only Exon 4 is used in D. ananassae.

We want to decide whether Exon 3 or Exon 4 was lost in D. ananassae. We retrieve genomic sequences for D. melanogaster from FlyBase and use our alignments to select the relevant portion of the sequence of the D. ananassae fosmid.

We use the EMBOSS toolkit, selecting showorf as the relevant tool. The screenshots shown below were colorized in Photoshop to highlight the features under discussion.
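
The kind of view used here, genomic sequence shown alongside its translation in the three forward reading frames, can be approximated with a few lines of Python. The following is a minimal sketch of that idea under the standard genetic code, not EMBOSS's own implementation. The function names and the toy sequence are assumptions made for this illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate seq starting at offset frame (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        # Codons with ambiguous or unknown bases are shown as 'X'.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


def three_frames(seq):
    """Return translations for forward frames 1, 2 and 3."""
    return {frame + 1: translate(seq, frame) for frame in range(3)}


# Toy sequence, made up for illustration only.
print(three_frames("ATGGCCTAA"))  # {1: 'MA*', 2: 'WP', 3: 'GL'}
```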

[Figure: Dmel exons 2 & 3/4]

[Figure: Dana exons 2 & 4]

[Figure: Dmel CS-2 exon 2/3]
Genomic sequences for Exons 2 and 3/4 from D. melanogaster and D. ananassae are shown. The amino acid sequences for the coding sequences are highlighted in yellow. The nucleotide sequences that align between D. melanogaster Exons 2 and 3/4 and the genomic sequence from D. ananassae are highlighted in red. Splice acceptors at the 5' end of Exon 3/4 are highlighted in green. The most parsimonious interpretation is that D. ananassae has lost the first splice acceptor and retained the second. Therefore, D. melanogaster Exon 3 is not used in D. ananassae, only Exon 4. This means that D. ananassae lacks an isoform corresponding to CS-2-RD and has only a single isoform, corresponding to CS-2-RC in D. melanogaster.
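
The argument above turns on whether a sequence contains two candidate splice acceptors (AG dinucleotides) three bases apart or only one. The following is a minimal Python sketch of that check. The function name tandem_acceptors and both toy sequences are made up for this illustration and are not the actual CS-2 sequences.

```python
def tandem_acceptors(seq, spacing=3):
    """Return 0-based positions where 'AG' is followed by another 'AG'
    starting exactly `spacing` bases downstream."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - spacing - 1):
        first = seq[i:i + 2]
        second = seq[i + spacing:i + spacing + 2]
        if first == "AG" and second == "AG":
            hits.append(i)
    return hits


# Toy sequences, made up for illustration only.
two_acceptors = "TTTCAGCAGGTC"  # AG at positions 4 and 7
one_acceptor = "TTTCTGCAGGTC"   # first AG changed to TG, second retained

print(tandem_acceptors(two_acceptors))  # [4]
print(tandem_acceptors(one_acceptor))   # []
```

Because the two sites are three bases apart, choosing the upstream or the downstream acceptor changes the exon by exactly one codon without shifting the reading frame, which is consistent with the single amino acid difference described above.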