Issue 7, 2016

Identifying relevant positions in proteins by Critical Variable Selection

Abstract

Evolution in its course has found a variety of solutions to the same optimisation problem. The advent of high-throughput genomic sequencing has made available extensive data from which, in principle, one can infer the underlying structure on which biological functions rely. In this paper, we present a new method aimed at the extraction of sites encoding structural and functional properties from a set of protein primary sequences, namely a multiple sequence alignment. The method, called critical variable selection, is based on the idea that subsets of relevant sites correspond to subsequences that occur with a particularly broad frequency distribution in the dataset. By applying this algorithm to in silico sequences, to the response regulator receiver and to the voltage sensor domain of ion channels, we show that this procedure not only recovers the information encoded in single-site statistics and pairwise correlations but also captures dependencies going beyond pairwise correlations. The method proposed here is complementary to statistical coupling analysis, in that the most relevant sites predicted by the two methods differ markedly. We find robust and consistent results for datasets as small as a few hundred sequences, revealing a hidden hierarchy of sites that is consistent with present knowledge of biologically relevant sites and evolutionary dynamics. This suggests that critical variable selection is capable of identifying a core of sites encoding functional and structural information in a multiple sequence alignment.
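The central idea, scoring a subset of sites by how broadly the occurrence frequencies of its subsequences are spread, can be illustrated with a small sketch. The abstract does not give the scoring formula, so the choice made here (the entropy of the distribution of occurrence counts) is an assumption, and the function names `subsequence_counts` and `frequency_breadth` and the toy alignment are hypothetical. It should be read as one possible reading of "broad frequency distribution", not as the authors' implementation.

```python
from collections import Counter
from math import log


def subsequence_counts(alignment, sites):
    """Count how often each subsequence (the alignment restricted to `sites`) occurs."""
    return Counter(tuple(seq[i] for i in sites) for seq in alignment)


def frequency_breadth(alignment, sites):
    """Entropy (in nats) of the occurrence-count distribution for a subset of sites.

    ASSUMPTION: this entropy is used here as a stand-in measure of how broad
    the frequency distribution is; the abstract does not specify the measure.
    """
    counts = subsequence_counts(alignment, sites)
    total = len(alignment)
    # m[k] = number of distinct subsequences observed exactly k times
    m = Counter(counts.values())
    # A randomly drawn sequence carries a subsequence seen k times
    # with probability k * m[k] / total.
    return -sum((k * mk / total) * log(k * mk / total) for k, mk in m.items())


# Hypothetical toy alignment: six aligned sequences of length four.
toy_alignment = ["ACDA", "ACDC", "ACEA", "GCEA", "GCFA", "ACDA"]

if __name__ == "__main__":
    for sites in ([0], [1], [2], [0, 2]):
        print(sites, round(frequency_breadth(toy_alignment, sites), 4))
```

In this toy example a fully conserved column yields a single subsequence and hence zero breadth, while a column whose residues occur with several different frequencies scores higher. A search over subsets of sites would be needed to turn this score into a selection procedure; that step is not sketched here.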

Article information

Article type: Paper
Submitted: 19 Jan 2016
Accepted: 02 Mar 2016
First published: 14 Mar 2016

Mol. BioSyst., 2016, 12, 2147-2158

S. Grigolon, S. Franz and M. Marsili, Mol. BioSyst., 2016, 12, 2147 DOI: 10.1039/C6MB00047A
