Des Higgins (Conway Institute, Dublin, Ireland)

We have been using a variety of multivariate analysis methods to visualise complex relationships in large multiple sequence alignments. We have used “Between Groups Analysis” as a way of doing a supervised PCA or Correspondence Analysis (CA) in order to find residues that are specific to sets of sequences with different properties (Wallace et al., BMC Bioinformatics, 2006). More recently we have used Co-Inertia Analysis to find relationships between residues in different groups of kinases and the binding of small-molecule inhibitors. We have also been using general CA- and PCA-type methods for analysing phylogenetic-type relationships in large data sets, such as sequences from Influenza H1N1. These methods can be made to scale very well, even to tens or hundreds of thousands of sequences. This is accomplished partly by avoiding full N×N distance matrices for N sequences and partly by using some recent developments in fast MDS methods.
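
As a rough illustration of how a PCA-type ordination of an alignment can avoid an N×N distance matrix, the sketch below one-hot encodes each alignment column and takes a thin SVD of the resulting N × (L·21) indicator matrix. This is a minimal sketch under our own assumptions, not the method used in the work described: the encoding, the 21-symbol alphabet, the function names `one_hot` and `pca_scores`, and the toy alignment are all hypothetical.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus the gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(alignment):
    """Encode N aligned sequences of length L as an N x (L * 21) 0/1 matrix."""
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    n, length = len(alignment), len(alignment[0])
    X = np.zeros((n, length * len(ALPHABET)))
    for row, seq in enumerate(alignment):
        for col, ch in enumerate(seq):
            X[row, col * len(ALPHABET) + index[ch]] = 1.0
    return X


def pca_scores(X, k=2):
    """Project rows of X onto the first k principal axes via a thin SVD.

    Works on the N x p data matrix directly; no pairwise
    sequence-to-sequence distance matrix is ever formed.
    """
    Xc = X - X.mean(axis=0)                      # centre each indicator column
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]                      # N x k coordinates


# Toy alignment (made up for illustration): two groups differing at
# positions 1 and 4, with one within-group variant each.
alignment = ["ACDA", "ACDA", "ACEA", "GCEG", "GCEG", "GCDG"]
scores = pca_scores(one_hot(alignment))
```

On this toy input the first axis separates the sequences starting with A from those starting with G, and identical sequences receive identical coordinates. The memory cost grows with N × (L·21) rather than N², which is the point of the illustration; for very large N a truncated or randomised SVD would be the natural replacement for the full thin SVD used here.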