Verify Taxonomy And Loci

From Moorea Biocode
A sample verify taxonomy table.

To help verify that the correct sample has been sequenced and that the assigned taxonomy is accurate, Geneious can run a specialized batch BLAST search on the sequencing results. This can be run on any selection of contigs and alignments of contigs. If you have performed an alignment as above, use the alignment so that the search runs on the edited consensus sequence.

After selecting the desired sequencing results, click Biocode > Verify Taxonomy... in the toolbar. This brings up the standard BLAST options. "Fully annotate hit summaries" must be turned on, but the other options can be adjusted as needed. Click OK to begin the search. Because it relies on BLAST, the search can take quite a long time to run.

When the process is complete, a Verify Taxonomy Results document will be produced. This displays a table with one row per query, comparing each query with its top hit returned from BLAST.


Rows can be selected in the table in the usual ways: click or drag, and hold Shift, Ctrl or Command while clicking to change the selection. Click Go To Queries to jump to the contigs associated with the selected rows, and click Show Other Hits to see additional hits that were downloaded for the selected row. Show Other Hits is only enabled when exactly one row is selected. Double-clicking a row also shows its other hits.

Again, if the verify process shows that the sequencing was a failure, you can mark this in the LIMS by jumping to the contigs and using Mark as Fail in LIMS. As mentioned above, it's also a good idea to move the failed contigs to a new subfolder (e.g. named "fail") so they don't interfere.


Columns

  • Bin: Similar to the bin column that has been used for reads and contigs, this bin column summarizes several properties of the verification process by assigning each result a High, Medium or Low value (in the form of a smiley). The parameters used for binning are covered in the section below.
  • Query: The name of the query contig
  • Query Taxon: The taxonomy of the query, which was pulled from the FIMS earlier in the pipeline. The verify operation fills in higher taxonomy by searching NCBI taxonomy. If the taxon can't be found in the taxonomy database, this is noted and the result is placed in the Low bin.
  • Hit Taxon: The taxonomy of the top hit from BLAST. Levels in the taxonomies are marked as green or red depending on whether they match with the query.
  • Keywords: A user-defined list of keywords which are expected in the hit definition from BLAST. Again, these are highlighted red or green depending on whether they are found in the definition.
  • Hit Definition: The definition of the top hit returned from BLAST with matching keywords highlighted.
  • Hit Length: Length of the hit alignment from BLAST, highlighted according to binning parameters (red, orange or green)
  • Hit Identity: Identity of the hit alignment from BLAST, highlighted according to binning parameters (red, orange or green)
  • Assembly Bin: The bin that was assigned to the contig according to the previously mentioned binning parameters.
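
The green/red marking described for the Hit Taxon and Keywords columns can be illustrated with a small sketch. This is not the plugin's actual code: the function names, the case-insensitive name matching and the example lineage and hit definition below are all assumptions made up for illustration.

```python
def compare_lineages(query_lineage, hit_lineage):
    """Pair each level of the hit lineage with True ("green") if the same
    name appears in the query lineage, otherwise False ("red").
    Assumption: levels are matched by name, ignoring case."""
    query_names = {name.lower() for name in query_lineage}
    return [(name, name.lower() in query_names) for name in hit_lineage]


def check_keywords(keywords, hit_definition):
    """Map each expected keyword to True ("green") if it occurs in the
    hit definition, otherwise False ("red").
    Assumption: a plain case-insensitive substring test."""
    definition = hit_definition.lower()
    return {keyword: keyword.lower() in definition for keyword in keywords}


# Made-up example values, not taken from a real search.
query = ["Eukaryota", "Metazoa", "Arthropoda", "Malacostraca", "Decapoda"]
hit = ["Eukaryota", "Metazoa", "Arthropoda", "Malacostraca", "Amphipoda"]
definition = "Example sp. cytochrome oxidase subunit I (COI) gene, partial cds"

for level, matches in compare_lineages(query, hit):
    print(level, "green" if matches else "red")

for keyword, found in check_keywords(["COI", "16S"], definition).items():
    print(keyword, "green" if found else "red")
```

In this sketch only the last level of the hit lineage and the "16S" keyword would be marked red, which is the kind of row that deserves a closer look before the contig is accepted.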


You can sort by any of the columns as usual and rearrange/resize them.