Issue 3, 2014

Improving the performance of protein kinase identification via high dimensional protein–protein interactions and substrate structure data

Abstract

As a crucial post-translational modification, protein phosphorylation regulates almost all basic cellular processes. Large-scale phospho-proteomics studies have recently discovered thousands of phosphorylation sites, but only about 20% of them are annotated with their catalytic kinases, which makes it a major challenge to correctly identify the protein kinases responsible for experimentally verified phosphorylation sites. Most existing identification tools build their predictive models from the local sequence alone and use protein–protein interaction (PPI) information only for subsequent filtering. This limited information is not sufficient to identify the protein kinases responsible for phosphorylated proteins. In this work, a novel computational approach that fully incorporates PPI and substrate structure information is proposed to improve the performance of human protein kinase identification. To handle the high dimensionality of the PPI and structure data, a two-step feature selection algorithm that incorporates a support vector machine (SVM) is designed to detect the information useful for discriminating the kinase corresponding to each phosphorylation site. Benchmark datasets for kinase identification are constructed using human protein phosphorylation data extracted from the latest Phospho.ELM database. With the selected PPI and structure features, the performance of kinase identification is significantly enhanced compared with that obtained using sequence information alone. To further verify our method, we compared it with the state-of-the-art tools NetworKIN and IGPS at two stringency levels, medium (>90.0%) and high (>99.0%) specificity. The results show that our method outperforms existing tools in identifying protein kinases. Further evaluation demonstrates that our method also performs better at the different hierarchical levels of kinase, subfamily, family and group.
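The comparison at fixed stringency levels can be illustrated with a minimal sketch of how sensitivity might be read off at a required specificity. This is not the authors' evaluation code: the function name `sensitivity_at_specificity`, the toy scores and labels, and the choice to treat the specificity bound as inclusive (`>=`) are all assumptions made for illustration.

```python
from typing import List, Tuple


def sensitivity_at_specificity(
    scores: List[float], labels: List[int], min_specificity: float
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the lowest score
    cutoff whose specificity is still at least `min_specificity`.

    A site is predicted positive when its score is >= the threshold.
    Hypothetical helper for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Start from a cutoff above every score: nothing is predicted positive,
    # so specificity is 1.0 and sensitivity is 0.0.
    best = (float("inf"), 0.0, 1.0)

    # Lowering the cutoff can only lower specificity and raise sensitivity,
    # so we walk down and stop at the first cutoff that violates the bound.
    for t in sorted(set(scores), reverse=True):
        true_negatives = sum(1 for s in neg if s < t)
        specificity = true_negatives / len(neg)
        if specificity < min_specificity:
            break
        true_positives = sum(1 for s in pos if s >= t)
        best = (t, true_positives / len(pos), specificity)
    return best


if __name__ == "__main__":
    # Toy data (assumed): 1 = true kinase-substrate pair, 0 = negative.
    toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
    toy_labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
    for level in (0.90, 0.99):
        print(level, sensitivity_at_specificity(toy_scores, toy_labels, level))
```

Fixing specificity first and then comparing sensitivity puts tools with differently scaled scores on a common footing, which is presumably why the comparison is reported at the medium and high levels.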

Article information

Article type: Paper
Submitted: 16 Oct 2013
Accepted: 24 Dec 2013
First published: 06 Jan 2014

Mol. BioSyst., 2014, 10, 694-702

X. Xu, A. Li, L. Zou, Y. Shen, W. Fan and M. Wang, Mol. BioSyst., 2014, 10, 694 DOI: 10.1039/C3MB70462A
