SELECTION OF A REPRESENTATIVE SET OF STRUCTURES FROM BROOKHAVEN PROTEIN DATA-BANK

BOBERG, J; SALAKOSKI, T; Vihinen, Mauno

(1992) In Proteins 14(2). p.265-276

Abstract: Reliable structural and statistical analyses of three-dimensional protein structures should be based on unbiased data. The Protein Data Bank is highly redundant, containing several entries for identical or very similar sequences. A technique was developed for clustering the known structures based on their sequences and contents of alpha- and beta-structures. First, sequences were aligned pairwise. A representative sample of sequences was then obtained by grouping similar sequences together and selecting a typical representative from each group. The similarity significance threshold needed in the clustering method was found by analyzing similarities of random sequences. Because three-dimensional structures for proteins of the same structural class are generally more conserved than their sequences, the proteins were also clustered according to their contents of secondary structural elements. The results of these clusterings indicate conservation of alpha- and beta-structures even when sequence similarity is relatively low. An unbiased sample of 103 high-resolution structures, representing a wide variety of proteins, was chosen based on the suggestions made by the clustering algorithm. The proteins were divided into structural classes according to their contents and ratios of secondary structural elements. Previous classifications have suffered from a subjective view of secondary structures, whereas here the classification was based on backbone geometry. The concise view led to reclassification of some structures. The representative set of structures facilitates unbiased analyses of relationships between protein sequence, function, and structure as well as of structural characteristics.
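The sequence-clustering steps summarized in the abstract (pairwise comparison, a significance threshold derived from random sequences, grouping, and choice of a typical representative) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `difflib`-based `similarity` score stands in for a real pairwise alignment score, the mean-plus-`k`-standard-deviations threshold and the single-linkage grouping are assumptions, and all function names are hypothetical.

```python
import random
import statistics
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not the paper's alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def random_threshold(seqs, n_pairs=200, k=3.0, seed=0):
    """Estimate a significance threshold from shuffled (random) sequences.

    Assumed rule: mean + k * standard deviation of the random-pair scores.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        a, b = rng.sample(seqs, 2)
        shuffled_a = "".join(rng.sample(a, len(a)))
        shuffled_b = "".join(rng.sample(b, len(b)))
        scores.append(similarity(shuffled_a, shuffled_b))
    return statistics.mean(scores) + k * statistics.pstdev(scores)


def cluster(seqs, threshold):
    """Single-linkage grouping: link every pair scoring above the threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def representative(seqs, members):
    """Pick the member with the highest total similarity to its own group."""
    return max(
        members,
        key=lambda i: sum(similarity(seqs[i], seqs[j]) for j in members),
    )


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = [
        "MKTAYIAKQRQISFVKSHFSRQ",
        "MKTAYIAKQRQISFVKSHFSRE",
        "GGGGGGGGGGWWWWWWWWWW",
    ]
    for group in cluster(toy, threshold=0.8):
        print(group, "->", representative(toy, group))
```

In practice the threshold passed to `cluster` would come from `random_threshold` run on the real sequence set. The record does not state how the authors chose among group members, so the "most similar to the rest of the group" rule here is only one plausible reading of "typical representative".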

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/3853443

author

BOBERG, J ; SALAKOSKI, T and Vihinen, Mauno

publishing date

1992

type

Contribution to journal

publication status

published

subject

Medical Genetics and Genomics (including Gene Therapy)

keywords

REPRESENTATIVE PDB STRUCTURES, SEQUENCE CLUSTERING, SIGNIFICANCE OF SEQUENCE SIMILARITY, CLASSIFICATION OF PROTEIN STRUCTURES, AMINO ACID COMPOSITION

in

Proteins

volume

14

issue

2

pages

265 - 276

publisher

John Wiley & Sons Inc.

external identifiers

wos:A1992JK50600011
scopus:0026661145
pmid:1409573

ISSN

0887-3585

DOI

10.1002/prot.340140212

language

English

LU publication?

no

id

9d35e948-9503-42ce-8a74-0f3ae6f23fca (old id 3853443)

date added to LUP

2016-04-01 15:55:17

date last changed

2025-04-04 15:50:00

@article{9d35e948-9503-42ce-8a74-0f3ae6f23fca,
  abstract     = {{Reliable structural and statistical analyses of three-dimensional protein structures should be based on unbiased data. The Protein Data Bank is highly redundant, containing several entries for identical or very similar sequences. A technique was developed for clustering the known structures based on their sequences and contents of alpha- and beta-structures. First, sequences were aligned pairwise. A representative sample of sequences was then obtained by grouping similar sequences together and selecting a typical representative from each group. The similarity significance threshold needed in the clustering method was found by analyzing similarities of random sequences. Because three-dimensional structures for proteins of the same structural class are generally more conserved than their sequences, the proteins were also clustered according to their contents of secondary structural elements. The results of these clusterings indicate conservation of alpha- and beta-structures even when sequence similarity is relatively low. An unbiased sample of 103 high-resolution structures, representing a wide variety of proteins, was chosen based on the suggestions made by the clustering algorithm. The proteins were divided into structural classes according to their contents and ratios of secondary structural elements. Previous classifications have suffered from a subjective view of secondary structures, whereas here the classification was based on backbone geometry. The concise view led to reclassification of some structures. The representative set of structures facilitates unbiased analyses of relationships between protein sequence, function, and structure as well as of structural characteristics.}},
  author       = {{BOBERG, J and SALAKOSKI, T and Vihinen, Mauno}},
  issn         = {{0887-3585}},
  keywords     = {{REPRESENTATIVE PDB STRUCTURES; SEQUENCE CLUSTERING; SIGNIFICANCE OF SEQUENCE SIMILARITY; CLASSIFICATION OF PROTEIN STRUCTURES; AMINO ACID COMPOSITION}},
  language     = {{eng}},
  number       = {{2}},
  pages        = {{265--276}},
  publisher    = {{John Wiley \& Sons Inc.}},
  journal      = {{Proteins}},
  title        = {{SELECTION OF A REPRESENTATIVE SET OF STRUCTURES FROM BROOKHAVEN PROTEIN DATA-BANK}},
  url          = {{http://dx.doi.org/10.1002/prot.340140212}},
  doi          = {{10.1002/prot.340140212}},
  volume       = {{14}},
  year         = {{1992}},
}
