I have been studying protein structure prediction algorithms. A lot of recent work uses something called the PSSM, the position-specific scoring matrix.
I think that what a PSSM does is build a 2-D matrix of all possible residue pairs in a protein, then score how likely it is that the two residues mutate in tandem. Co-evolution of two positions that are far from each other in the primary sequence is an indication that they are in contact. When one residue in a contact pair changes, that generally destabilizes the protein, and the other residue in the pair is under selective pressure to compensate. Knowing this gives you a good start on building the protein contact map.
Do I have that right?
If that is correct, then this technique of protein structure prediction depends on having many examples of proteins in a homologous family. You need nature to do a lot of sampling for you. And you need to dredge up massive numbers of homologs from genomics projects. I've read that creating the multiple sequence alignments for PSSM work is computationally intensive. If I understand the process correctly, I can see why.
My main question is: what can protein structure prediction models built using PSSM do, when there isn't a PSSM?
For example, the Top7 protein is a fully novel protein fold that does not have any homologs in nature. It was created in 2003 using RosettaDesign software. Rosetta's protein structure prediction algorithms predate PSSM as far as I know. Sixteen years later, there are exactly six variants of Top7, all of which have been made in the laboratory. That hardly sounds like enough data for a statistically valid PSSM, and in any case, the variants were not naturally selected.
If you don't have a PSSM, is it even possible to enter your sequence into a model that expects one?
Thanks for your input.
I think you are making PSSMs out to be much more sophisticated than they really are.
A PSSM is merely a scoring matrix: for each position in the sequence, it gives a score for each possible residue at that position.
There is no explicit pairing of interacting residues, though that does sound like an interesting approach…
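To make that concrete, here is a minimal sketch of how a PSSM could be computed from a multiple sequence alignment. It is only an illustration under simplifying assumptions: the function names (build_pssm, score_sequence), the toy alignment, the uniform background frequency, and the flat pseudocount are all made up for this example. Real tools such as PSI-BLAST use sequence weighting and empirically derived background frequencies.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict per alignment column, mapping residue -> log-odds score.

    Hypothetical helper for illustration only. Each column is scored
    independently of every other column, so no residue pairs are involved.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # Assumption: uniform background frequency for all 20 amino acids.
    background = 1.0 / len(AMINO_ACIDS)

    pssm = []
    for col in range(length):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column_scores[aa] = math.log2(freq / background)
        pssm.append(column_scores)
    return pssm


def score_sequence(pssm, sequence):
    """Sum the per-position scores of a sequence with the same length as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


# Toy alignment of four made-up homologs (not real data).
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKVLA"]
pssm = build_pssm(toy_alignment)
```

The result has one row of 20 scores per alignment column (L x 20), not one entry per pair of positions (L x L). A residue that is conserved in a column gets a positive score there, and a residue never seen in that column gets a negative one.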
You can learn more about PSSMs from many sources on bioinformatics including NCBI.