get_proteomes.py
get_proteomes.py implements a preparatory step that is run before searching for horizontal gene transfer (HGT) candidates with ShadowCaster.
This script downloads, from the NCBI FTP site, the proteomes (FASTA files) of species that are phylogenetically related to the query species.
ShadowCaster needs these proteomes to construct a phylogenetic shadow used in its phylogenetic component.
Prerequisites
- NCBI EDirect (Entrez Direct) UNIX command-line utilities.
Before using the script, check that the commands esearch and xtract work correctly in a new shell window.
type esearch xtract
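The same check can be scripted before launching the download. The following is a minimal sketch, assuming only that esearch and xtract are expected on the PATH; the helper name missing_tools is hypothetical and not part of ShadowCaster.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # The two EDirect commands named in the prerequisites above.
    required = ["esearch", "xtract"]
    missing = missing_tools(required)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("EDirect commands found: " + ", ".join(required))
```

This only confirms that the executables can be located; it does not verify that they can reach NCBI.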
Usage
The usage and help documentation for get_proteomes.py can be displayed by running:
python get_proteomes.py -h
Example
An example of how to run get_proteomes.py on the test data:
cd ShadowCaster/scripts
python get_proteomes.py -n Rhodanobacter_denitrificans -sp 25
This produces the following output in a folder named after the species name provided:
- log.txt: names of the downloaded species and their FTP addresses.
- proteomes folder: proteomes (FASTA files) used to construct the shadow.
The results should be similar to those found in the proteomes-output folder of the test data repository, see here
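To get a quick overview of what was downloaded, the proteomes folder can be summarized by counting the sequences in each file. The following is a rough sketch, not part of ShadowCaster: it assumes the output sits in Rhodanobacter_denitrificans/proteomes relative to the working directory and that the files are uncompressed plain-text FASTA. The function names count_fasta_records and summarize_proteomes are hypothetical.

```python
from pathlib import Path


def count_fasta_records(fasta_path):
    """Count sequences in a FASTA file by counting header lines that start with '>'."""
    with open(fasta_path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_proteomes(proteomes_dir):
    """Return {file name: number of sequences} for every regular file in the folder."""
    folder = Path(proteomes_dir)
    return {
        path.name: count_fasta_records(path)
        for path in sorted(folder.iterdir())
        if path.is_file()
    }


if __name__ == "__main__":
    # Assumed location: <species name>/proteomes; adjust to the actual output path.
    out_dir = Path("Rhodanobacter_denitrificans") / "proteomes"
    if out_dir.is_dir():
        for name, n_seqs in summarize_proteomes(out_dir).items():
            print(f"{name}\t{n_seqs}")
    else:
        print(f"{out_dir} not found; run get_proteomes.py first or adjust the path")
```

A file that reports zero sequences may indicate a failed or incomplete download and can be cross-checked against the corresponding entry in log.txt.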