Amino acids can be linked together to form chains containing anything from two to many thousands of units. Short chains are known as peptides, while longer chains are called polypeptides, which include proteins. An amino acid sequence is simply the order of these units in a polypeptide chain. In the case of proteins, the sequence determines the molecule’s three-dimensional structure, which in turn is crucial to the protein’s function. The sequences of amino acids in the proteins found in a living organism are coded in that organism’s DNA.
Amino Acid Structure
Amino acids all share a general structure: a central carbon atom bonded to a hydrogen atom, an amino group (NH2), a carboxyl group (COOH), and what is called an R-group, or side chain. “R” stands for radical, which in this context simply means a part of a molecule. It is the composition of the side chain that distinguishes different amino acids from one another. In the simplest one, glycine, it consists of just a hydrogen atom, but in others, the side chain is more complex. For example, in tyrosine, it includes a ring structure, and in lysine, it consists of a chain of four carbon atoms, with hydrogen atoms attached, ending in a second amino group.
How Sequences Form
The amino group is basic and the carboxyl group is acidic; in solution at neutral pH, the amino group usually carries a positive charge and the carboxyl group a negative charge. The amino group of one amino acid can bond with the carboxyl group of another. The resulting link is known as a peptide bond, and forming it releases a molecule of water as a by-product. Chemical processes like this are known as condensation reactions: two molecules join together, and a small molecule, in this case water, is lost. The H from the NH2 group and the OH from the COOH group combine to form water (H2O). Because each unit in the chain has lost these atoms, the units that form peptides and proteins should, strictly speaking, be called amino acid residues, but they are usually just referred to as amino acids.
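The loss of water at each peptide bond can be checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the molar masses are rounded reference values for glycine, alanine, and water that do not appear in the text above.

```python
# Average molar masses in g/mol (rounded reference values; assumptions
# added for illustration, not taken from the article).
WATER = 18.015
FREE_AMINO_ACID_MASS = {
    "Gly": 75.07,  # glycine
    "Ala": 89.09,  # alanine
}


def linear_peptide_mass(residues):
    """Molar mass of a linear peptide built from the given amino acids.

    Each peptide bond releases one water molecule, and a linear chain
    of n units contains n - 1 peptide bonds.
    """
    if not residues:
        raise ValueError("a peptide needs at least one amino acid")
    total = sum(FREE_AMINO_ACID_MASS[name] for name in residues)
    peptide_bonds = len(residues) - 1
    return total - peptide_bonds * WATER


# Glycine + alanine -> one peptide bond, so one water molecule is lost.
print(linear_peptide_mass(["Gly", "Ala"]))
```

A single amino acid has no peptide bond, so its mass is returned unchanged; every additional unit subtracts one water.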
Sequence Descriptions
A chain of these units will typically have a free amino group at one end, and a free carboxyl group at the other. For consistency, sequences are described from left to right, with the amino end, known as the N-terminus, at the left, and the carboxyl end, or C-terminus, at the right. It is also possible, however, for the opposite ends of a polypeptide chain to form a peptide bond with each other, resulting in a cyclic molecule.
Proteins, and other polypeptides, can therefore be described by the sequence of amino acid units. For brevity, the names of the units are usually abbreviated to three letters or to just one letter. For example, in the three-letter system, arginine is Arg, leucine is Leu and proline is Pro. In the one-letter system, the letters for these units are R, L and P, respectively. Therefore, a particular amino acid sequence could be represented as Leu-Arg-Leu-Pro-Arg-Pro, or as L-R-L-P-R-P.
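The two notations can be converted mechanically. The following is a minimal sketch covering only the three amino acids named above; the function name and the choice of a hyphen as separator are assumptions made for illustration.

```python
# Three-letter to one-letter codes for the amino acids mentioned in the
# text. A full table would cover all 20 standard amino acids.
THREE_TO_ONE = {
    "Arg": "R",  # arginine
    "Leu": "L",  # leucine
    "Pro": "P",  # proline
}


def to_one_letter(sequence, separator="-"):
    """Convert a hyphen-separated three-letter sequence to one-letter codes.

    The sequence is read from the N-terminus (left) to the C-terminus
    (right), and the order of the units is preserved.
    """
    units = sequence.split("-")
    return separator.join(THREE_TO_ONE[unit] for unit in units)


print(to_one_letter("Leu-Arg-Leu-Pro-Arg-Pro"))      # L-R-L-P-R-P
print(to_one_letter("Leu-Arg-Leu-Pro-Arg-Pro", ""))  # LRLPRP
```

An abbreviation missing from the table raises a KeyError, which is deliberate here: silently skipping a unit would change the sequence.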
Protein Shape and Function
The sequence of units in a protein is known as its primary structure. Weaker bonds can also form, however, between different parts of a polypeptide chain, causing it to fold in various ways, and between adjacent polypeptide chains. Hydrogen bonds between atoms of the chain's backbone produce regular local patterns, such as helices and sheets, known as the secondary structure. Bonds between side chains fold the chain further into its tertiary structure, and bonds between separate polypeptide chains give the quaternary structure. Together, these determine the molecule's overall three-dimensional shape. Most of these bonds are weaker than peptide bonds, and factors such as heat, and various chemical agents, can break them, causing a protein to lose its shape while preserving the primary structure. This is known as denaturation.
Although hundreds of amino acids are known, only about 20 are commonly found in the proteins that make up living organisms. Nevertheless, these 20 can form an enormous number of different sequences, of varying lengths: each position in a chain can hold any of the 20, so the number of possible sequences multiplies by 20 with every unit added. Many proteins consist of more than one polypeptide chain, and can form huge molecules of enormous complexity.
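How quickly the number of possible sequences grows can be shown with a short calculation. With 20 kinds of unit available at every position, a linear chain of n units can be arranged in 20 to the power n ways. The sketch below assumes exactly 20 amino acids and that every arrangement is allowed; the function name is hypothetical.

```python
STANDARD_AMINO_ACIDS = 20  # assumption: the 20 commonly found in proteins


def count_sequences(length):
    """Number of distinct linear sequences of the given length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return STANDARD_AMINO_ACIDS ** length


print(count_sequences(2))              # 400 possible dipeptides
print(count_sequences(6))              # 64000000 possible six-unit chains
print(len(str(count_sequences(100))))  # 131 digits for a 100-unit chain
```

Only a tiny fraction of these sequences occurs in nature, but the calculation shows why 20 building blocks are more than enough to account for the variety of proteins.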
Proteins, Genes, and DNA
An organism’s DNA can be regarded as a set of instructions for putting together all the proteins it requires. The amino acid sequence necessary for each protein is encoded in the DNA in the form of groups of three nucleotides known as codons. Most codons represent a particular amino acid unit, while a few act as stop signals that mark the end of the chain. The processes of transcription and translation allow these units to be assembled into the correct sequences to form the necessary proteins whenever the cell requires them.
First, the DNA is transcribed to make a strand of messenger RNA, or mRNA. In cells that have a nucleus, the mRNA moves out of the nucleus and into the cytoplasm to a ribosome, where translation takes place. The mRNA acts as a template that sets the order in which amino acids are joined together. For each codon, a transfer RNA, or tRNA, carries the appropriate free amino acid from the cytoplasm to the ribosome, where it is joined to the growing chain. As the mRNA is translated, the units are joined to form the specific sequence for that protein.
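The two steps can be imitated in a few lines of code. This is a simplified sketch, not a model of the real cellular machinery: the DNA string is an invented example, the codon table contains only the five codons it needs (the full genetic code has 64), and transcription is approximated by copying the coding strand with uracil (U) in place of thymine (T). The function names are hypothetical.

```python
STOP = "Stop"

# Partial codon table (mRNA codon -> amino acid). Only the codons used
# in the example below are listed.
CODON_TABLE = {
    "AUG": "Met",  # methionine, also the usual start codon
    "CUG": "Leu",  # leucine
    "CGU": "Arg",  # arginine
    "CCG": "Pro",  # proline
    "UAA": STOP,   # stop codon: ends the chain
}


def transcribe(coding_dna):
    """Return the mRNA matching a DNA coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time and build the residue list."""
    residues = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        amino_acid = CODON_TABLE[codon]
        if amino_acid == STOP:
            break  # a stop codon adds no amino acid
        residues.append(amino_acid)
    return "-".join(residues)


dna = "ATGCTGCGTCTGCCGCGTCCGTAA"  # invented example sequence
mrna = transcribe(dna)
print(mrna)             # AUGCUGCGUCUGCCGCGUCCGUAA
print(translate(mrna))  # Met-Leu-Arg-Leu-Pro-Arg-Pro
```

The output is written from the N-terminus to the C-terminus, which is also the order in which the ribosome adds the units.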