Sequence similarity search
This topic explores online tools for sequence similarity search, mainly BLAST at NCBI and EBI.
Example output for EPO, accession X02157 (cDNA) searched with BLASTN using default parameters:
- NCBI: RID-3598A6B101R
- EBI: jobID ncbiblast-I20161121-075026-0519-99904616-es
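Searches like the example above can also be submitted programmatically. The sketch below only builds request URLs for NCBI's BLAST URL API with the standard library; it makes no network calls. The function names and the choice of the `nt` database are assumptions for illustration. Request IDs (RIDs) expire, so a fresh search returns a new one.

```python
from urllib.parse import urlencode

# Base address of NCBI's BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def submit_url(query, program="blastn", database="nt"):
    """Build the URL that submits a search (CMD=Put).

    `database="nt"` is an assumed default for this sketch.
    """
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    return f"{BLAST_URL}?{urlencode(params)}"


def results_url(rid, format_type="Text"):
    """Build the URL that fetches results for a request ID (CMD=Get)."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return f"{BLAST_URL}?{urlencode(params)}"


# Submission URL for the EPO cDNA accession used in the example above.
print(submit_url("X02157"))
```

The response to the submission request contains the RID, which is then passed to `results_url` to retrieve the report once the search has finished.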
- Take one of the overrepresented sequences from the SRR515298 sample, listed in the FastQC report, and try to find out where it might come from:
- Some example results:
- The default BLAST search at NCBI does not produce any hits for this sequence. Try to improve the sensitivity and figure out what it might be related to.
- Search a stretch of DNA from the human genome and try to find out where the exons are.
- Analyse a 9 kb genomic yeast sequence for coding genes (NCBI). Which feature is obscuring the results, and what parameter can be used to prevent this?
Max matches in a query range set to 2: 35JXM61Y015
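To picture why limiting the matches per query range helps: many strong hits piling up on one part of the query can crowd out weaker hits elsewhere. The sketch below is a simplified illustration of that general idea, not NCBI's implementation. The `Hit` type, the `cull_hits` function and all hit data are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    q_start: int  # start of the match on the query
    q_end: int    # end of the match on the query
    score: float


def cull_hits(hits, limit=2):
    """Drop a hit if `limit` or more already-kept, higher-scoring hits
    envelop its query range (a simplified culling rule)."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        enveloping = sum(
            1 for k in kept if k.q_start <= hit.q_start and hit.q_end <= k.q_end
        )
        if enveloping < limit:
            kept.append(hit)
    return kept


# Hypothetical hit list: four strong hits on one region, one weaker hit elsewhere.
hits = [
    Hit("region1_hitA", 100, 400, 900),
    Hit("region1_hitB", 100, 400, 880),
    Hit("region1_hitC", 120, 380, 850),
    Hit("region1_hitD", 150, 350, 800),
    Hit("region2_hit", 5000, 6200, 300),
]

for hit in cull_hits(hits, limit=2):
    print(hit.subject, hit.q_start, hit.q_end)
```

With a limit of 2, only the two best hits on the crowded region survive, so the weaker hit on the other region is no longer buried in the result list.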
- Moving on to protein searches: What potential paralogues of HOXA1 (AAB35423.2, local copy) can you identify in the human genome by using NCBI's BLASTP server?
- Try the same search on the EBI server. In the 'Result Summary', what does 'Query-anchored showing identities' produce? How does this compare to the 'Flat query-anchored showing identities'?
- Here is the rhodopsin protein sequence from the zebrafish: danio_rerio.aa. Do a BLASTP search and report the percentage identity to rhodopsin in human, mouse and orca.
- Here is the rhodopsin protein sequence from the dolphin: dolphin.aa. Do a BLASTP search and report the percentage identity to rhodopsin in human, mouse and orca.
- Which are the next closest homologues to TLR4 (NP_612564.1, local copy)? Try the tree view in NCBI!
- A 2009 paper by Aoife McLysaght's group identified three protein-coding genes that were found only in humans:
>sp|Q5K131|CLLU1_HUMAN Chronic lymphocytic leukemia up-regulated protein 1 OS=Homo sapiens GN=CLLU1 PE=2 SV=1
>sp|P86434|AAS1_HUMAN Putative uncharacterized protein ADORA2A-AS1 OS=Homo sapiens GN=ADORA2A-AS1 PE=5 SV=1
>sp|P0CZ25|D10OS_HUMAN Uncharacterized protein DNAH10OS OS=Homo sapiens GN=DNAH10OS PE=2 SV=1
Carry out BLAST searches to see how unique these sequences are.
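Before running the searches, it can be handy to pull the accessions and gene names out of the three headers above. The following is a minimal sketch that assumes exactly the header layout shown here (`>db|accession|entry description OS=... GN=... PE=... SV=...`); headers with other or additional fields would need a different pattern. The names `HEADER_RE` and `parse_header` are made up for this example.

```python
import re

# Pattern for the header layout used above (an assumption of this sketch):
# >db|accession|entry_name description OS=organism GN=gene PE=n SV=n
HEADER_RE = re.compile(
    r"^>(?P<db>\w+)\|(?P<accession>[^|]+)\|(?P<entry>\S+)\s+"
    r"(?P<description>.+?)\s+OS=(?P<organism>.+?)\s+GN=(?P<gene>\S+)"
    r"\s+PE=(?P<pe>\d+)\s+SV=(?P<sv>\d+)$"
)


def parse_header(line):
    """Split one FASTA header line into its named fields."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognised header layout: {line!r}")
    return match.groupdict()


headers = [
    ">sp|Q5K131|CLLU1_HUMAN Chronic lymphocytic leukemia up-regulated protein 1 OS=Homo sapiens GN=CLLU1 PE=2 SV=1",
    ">sp|P86434|AAS1_HUMAN Putative uncharacterized protein ADORA2A-AS1 OS=Homo sapiens GN=ADORA2A-AS1 PE=5 SV=1",
    ">sp|P0CZ25|D10OS_HUMAN Uncharacterized protein DNAH10OS OS=Homo sapiens GN=DNAH10OS PE=2 SV=1",
]

for header in headers:
    record = parse_header(header)
    # Accession, gene name and protein-existence level (PE) for each entry.
    print(record["accession"], record["gene"], record["pe"])
```

The accessions printed this way can be pasted directly into the query box of a BLAST server instead of the full sequence.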