Please remember that the protein selected for this project will be used for the duration of the Fall 2021 semester. The goal of this project is to familiarize students with how bioinformatic tools can be used to create testable hypotheses about proteins when sequence information is available. Because students will use the same protein for all analyses throughout the semester, it is suggested that proteins be chosen based on general interest. This may be an interest in the protein’s role in a specific metabolic pathway or in its function as an enzyme.
Many databases exist from which protein sequences can be obtained and in which subsequent analyses can be performed:
- The National Center for Biotechnology Information (NCBI) maintains various databases that contain thousands of protein and gene sequences. These can be accessed at https://www.ncbi.nlm.nih.gov/
- The Swiss Institute of Bioinformatics (SIB) maintains a large Bioinformatics Resource Portal generally referred to as Expasy (https://www.expasy.org/). The Expasy portal allows users to search for other proteins of similar sequence (UniProt BLAST), identify potential cleavage sites for protease enzymes in other proteins (PeptideCutter), identify structural patterns in proteins based only on sequence (MARCOIL), or predict protein structure from sequence by homology modeling (SWISS-MODEL). Analytical methods do not have to be chosen from the Expasy portal, but it should serve as a useful starting point.
- The RCSB Protein Data Bank (PDB) contains 181,969 biological macromolecular structures determined at or near atomic resolution. The experimental methods commonly associated with these structures include X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy. One task for this assignment will be to use protein structure to make mechanistic inferences. As such, the structure of the protein chosen for the assignment should either be known (available in the PDB) or predicted using a homology-modeling tool such as SWISS-MODEL.
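As a rough illustration of the kind of sequence-only analysis that a tool such as PeptideCutter performs, the sketch below locates candidate trypsin cleavage sites using a commonly cited simplified rule: cleavage after lysine (K) or arginine (R) unless the next residue is proline (P). The function names and the example peptide are hypothetical, and this is not PeptideCutter's own implementation, which applies more detailed rules.

```python
def trypsin_cleavage_sites(sequence):
    """Return 1-based positions of residues after which trypsin may cleave.

    Simplified rule (an assumption for illustration): cleave C-terminal to
    K or R, except when the following residue is P.
    """
    seq = sequence.upper()
    sites = []
    for i, residue in enumerate(seq[:-1]):  # the last residue has no bond after it
        if residue in "KR" and seq[i + 1] != "P":
            sites.append(i + 1)
    return sites


def digest(sequence, sites):
    """Split a sequence into peptides at the given 1-based cleavage positions."""
    bounds = [0] + list(sites) + [len(sequence)]
    return [sequence[start:end] for start, end in zip(bounds, bounds[1:])]


example = "MKTAYRPAKQRLG"  # hypothetical peptide, not a real protein
sites = trypsin_cleavage_sites(example)
peptides = digest(example, sites)
print(sites)     # positions after which cleavage is predicted
print(peptides)  # resulting peptide fragments
```

For a real analysis, the sequence would be pasted into PeptideCutter itself; the sketch only shows how a hypothesis about protease accessibility can start from sequence alone.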
Protein Selection Guidelines
Students should feel free to choose proteins based on their general interests. As stated above, this may include an interest in a general metabolic pathway or in a particular enzymatic activity. With that said, please take note of the following guidelines for protein selection:
- Preference should be given to non-mammalian proteins
  - The presence of introns and exons can significantly complicate protein sequence analysis
- Preference should be given to proteins with structures deposited in the Protein Data Bank (www.rcsb.org). However, please note that this is not a requirement. SWISS-MODEL can be used to predict the structure of proteins that have not been structurally characterized.
- Membrane proteins are acceptable
- Please take note that structural information may not always be available for the full protein sequence. In some cases, smaller parts of the protein may be characterized, but not the full sequence. If such a protein is chosen for this assignment, students should make an effort to address potential functional roles for the entire protein sequence. Predicting structure and function for these uncharacterized protein segments may lead to interesting predictions of new cellular functions!
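The point about partially characterized proteins can be sketched as a small calculation: given the full sequence length and the residue ranges covered by experimental structures, list the segments that remain uncharacterized. The function name, the 450-residue length, and the covered ranges below are hypothetical values chosen for illustration; real coverage would come from the PDB entries for the chosen protein.

```python
def uncovered_segments(sequence_length, covered_ranges):
    """Return 1-based inclusive (start, end) ranges not covered by any structure."""
    gaps = []
    next_uncovered = 1  # first residue not yet known to be covered
    for start, end in sorted(covered_ranges):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= sequence_length:
        gaps.append((next_uncovered, sequence_length))
    return gaps


protein_length = 450                          # hypothetical full-length protein
structure_coverage = [(120, 300), (280, 410)]  # hypothetical residues seen in structures
gaps = uncovered_segments(protein_length, structure_coverage)
print(gaps)  # segments that would need predicted structure or function
```

Segments reported this way are natural candidates for structure prediction or comparison with related proteins.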
Week 4 Assignment Expectations
The assignment should be 2-4 paragraphs in length and will be graded loosely on adherence to the outline described here. Please recall that this assignment will serve as a first draft of the introduction to the larger assignment due at the end of the semester. Your assignment should generally be structured to address the following topics:
- Overview of the physiologic context in which the selected protein functions. This will serve as the first introduction for the reader wherein the context is introduced, followed by introduction of the selected protein and how it fits into this physiologic context.
- Please provide information regarding what is known about the structure and function for the selected protein (general background related to the selected protein).
- Please address generally how the protein will be analyzed. This will begin with a statement of your general scientific hypothesis. For example, if only the C-terminal portion of a protein has been structurally characterized, one might ask what function the N-terminal segment serves. Thus, bioinformatic techniques might then be applied to the N-terminal portion of the protein in order to predict structure and subsequent function along with comparison to related proteins.
- Please state what methods you propose to use to address your general question. Please remember that we are restricted to computational tools available online. We will not be conducting wet-lab experiments to support the assignment. Please be aware that the analytical methods chosen here are not binding. As with any research effort, the methods will change as you learn more and refine your scientific question to apply more relevant approaches. This assignment merely represents a starting point to build from.
All assignments should be submitted to the relevant D2L dropbox in either a PDF or Microsoft Word (.doc or .docx) format.
Please feel free to contact me with any questions that you may have regarding the assignment.