ShadowCaster detects horizontal gene transfer (HGT) events in prokaryotes. It implements an evolutionary model that calculates a Bayesian likelihood for each ‘alien gene’, i.e. a gene whose sequence composition is unusual relative to the host genome background.
ShadowCaster analysis workflow
- The user defines the query genome, in which HGT events will be detected, by providing two FASTA files (see Usage).
- ShadowCaster uses proteomes from species phylogenetically related to the query genome (one proteome FASTA file per species) to construct a phylogenetic shadow.
- The proteomes can be either:
  - provided by the user (a collection of FASTA files), or
  - retrieved automatically by ShadowCaster from the NCBI FTP site using script/get_proteomes.py (see get_proteomes.py).
- A prioritized list of potential ‘alien genes’ in the query genome is generated by analyzing compositional features, i.e. 4-mer composition and codon usage. An unsupervised one-class support vector machine performs the prioritization.
- Orthology relationships between the query genome and its phylogenetically related species are obtained with the third-party algorithm ORTHOMCL. This information is used to calculate the ‘probability of orthology’ between the query genome and each of the other genomes in the phylogenetic shadow.
- BLAST is used to calculate the identity between each alien gene and the genes of every genome in the phylogenetic shadow.
- A likelihood is calculated for each alien gene in the prioritized list. It expresses how likely it is that the gene's pattern of identity across the genomes of the phylogenetic shadow results from vertical inheritance.
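The compositional features used to prioritize alien genes can be illustrated with a small sketch. It assumes that each coding sequence is a plain string. For each gene, it computes overlapping 4-mer frequencies and in-frame codon usage, then concatenates them into one vector. The function names are hypothetical. ShadowCaster's actual feature extraction (window handling, normalization, reading frame) may differ.

```python
from collections import Counter
from itertools import product

ACGT = set("ACGT")
# Fixed orderings so every gene yields a vector with the same layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 features
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]     # 64 features


def _frequencies(words, vocabulary):
    """Relative frequency of each vocabulary word; words with non-ACGT bases are ignored."""
    counts = Counter(w for w in words if set(w) <= ACGT)
    total = sum(counts.values())
    return [counts[v] / total if total else 0.0 for v in vocabulary]


def composition_features(cds):
    """Hypothetical helper: 4-mer frequencies (overlapping windows) + codon usage (frame 1)."""
    cds = cds.upper()
    tetra = _frequencies((cds[i:i + 4] for i in range(len(cds) - 3)), TETRAMERS)
    codon = _frequencies((cds[i:i + 3] for i in range(0, len(cds) - 2, 3)), CODONS)
    return tetra + codon


features = composition_features("ATGAAAAAATAA")
print(len(features))  # 256 tetramer + 64 codon features
```

Vectors like these could then be passed to a one-class SVM such as scikit-learn's `sklearn.svm.OneClassSVM`. Genes with the lowest `decision_function` scores would then be the most compositionally atypical. This pairing is only an illustration and does not necessarily reflect how ShadowCaster configures its model.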
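One plausible way to turn ORTHOMCL output into a per-genome ‘probability of orthology’ is sketched below. It assumes the ortholog groups have already been parsed into sets of `(taxon, protein_id)` pairs. It then estimates the probability as the fraction of query proteins that share a group with at least one protein of the other genome. Both the input layout and this estimator are assumptions for illustration, not ShadowCaster's documented formula.

```python
from collections import defaultdict


def orthology_probabilities(groups, query, n_query_proteins):
    """Hypothetical estimator of a per-genome 'probability of orthology'.

    groups: iterable of ortholog groups, each a set of (taxon, protein_id) pairs.
    Returns {taxon: fraction of query proteins sharing a group with that taxon}.
    Taxa that never share a group with the query are omitted (probability 0).
    """
    shared = defaultdict(set)  # taxon -> query proteins with an ortholog in that taxon
    for members in groups:
        query_proteins = {prot for taxon, prot in members if taxon == query}
        if not query_proteins:
            continue  # group contains no query protein
        for taxon in {t for t, _ in members if t != query}:
            shared[taxon] |= query_proteins
    return {taxon: len(prots) / n_query_proteins for taxon, prots in shared.items()}


groups = [
    {("Q", "q1"), ("A", "a1"), ("B", "b1")},
    {("Q", "q2"), ("A", "a2")},
    {("A", "a3"), ("B", "b3")},
]
print(orthology_probabilities(groups, query="Q", n_query_proteins=4))
```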
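The per-genome identity values can be collected from BLAST+ tabular output (`-outfmt 6`, whose default third column is percent identity). The sketch below keeps the best identity for each pair of alien gene and shadow genome. The `genome_of` lookup from subject IDs to genomes is a hypothetical helper, and ShadowCaster may aggregate hits differently.

```python
import csv
from collections import defaultdict


def best_identity(lines, genome_of):
    """Map (query_gene, genome) -> highest percent identity among its BLAST hits.

    lines: rows of BLAST+ tabular output (-outfmt 6, default column order:
           qseqid sseqid pident length mismatch gapopen qstart qend
           sstart send evalue bitscore).
    genome_of: hypothetical dict mapping a subject sequence ID to its genome.
    """
    best = defaultdict(float)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, pident = row[0], row[1], float(row[2])
        key = (query, genome_of[subject])
        best[key] = max(best[key], pident)
    return dict(best)


hits = [
    "alien1\tA_p1\t85.50\t120\t17\t0\t1\t120\t5\t124\t1e-50\t200",
    "alien1\tA_p2\t40.00\t100\t60\t1\t1\t100\t1\t100\t1e-5\t50",
    "alien1\tB_p9\t62.10\t110\t40\t2\t3\t112\t2\t111\t1e-20\t90",
]
print(best_identity(hits, {"A_p1": "A", "A_p2": "A", "B_p9": "B"}))
```

Genomes with no hit for a gene are simply absent from the result and can be treated as having zero identity.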
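The following deliberately simplified Bernoulli model shows how per-genome identities and orthology probabilities could be combined into a likelihood. It is an assumption for illustration only. A homolog counts as "present" in a genome when its best identity reaches a hypothetical `threshold`. Under vertical inheritance, presence is expected with that genome's probability of orthology. ShadowCaster's own Bayesian model is based on the pattern of identity across genomes and is not reproduced here.

```python
import math


def vertical_log_likelihood(identities, p_orthology, threshold=30.0, eps=1e-6):
    """Illustrative log-likelihood that a gene's hit pattern arose vertically.

    identities: {genome: best percent identity}; missing genomes count as 0.
    p_orthology: {genome: probability of orthology with the query genome}.
    threshold: hypothetical identity cut-off for calling a homolog present.
    """
    total = 0.0
    for genome, p in p_orthology.items():
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        present = identities.get(genome, 0.0) >= threshold
        total += math.log(p if present else 1 - p)
    return total


p_orth = {"A": 0.9, "B": 0.9}
print(vertical_log_likelihood({"A": 80.0, "B": 10.0}, p_orth))
```

Under this toy model, lower log-likelihoods flag genes whose presence/absence pattern fits vertical inheritance poorly. For example, a gene that is missing from genomes where orthologs are usually found scores lower than one that is found in them.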
For a comprehensive guide on how to install ShadowCaster and its prerequisites, see Installation.
GNU General Public License Version 3