Genome annotation labels regions of a genome with information about what they contain, such as where genes are located and what those genes do. It is a core step in genome projects, where the goal is not just to sequence a target organism's DNA but to understand how that DNA functions. Researchers can annotate genomes in their own labs and often share the results with other scientists to pool resources and information. Public online databases make this information available to anyone, and some also accept annotations submitted by members of the general public.
The first step in genome annotation is sequencing, in which researchers determine the order of nucleotides (the bases A, C, G, and T) in an organism's DNA. Sequencing a whole genome can take a long time, so scientists commonly start annotating before the full sequence is finished. Once a section of DNA has been sequenced, a researcher can begin annotating it: noting where genes appear to start and stop, and paying attention to distinctive sequences that hint at what a region does.
Computers can perform some genome annotation on their own. They search for known patterns, such as the start and stop codons (short nucleotide sequences) that mark the beginning and end of protein-coding genes. In automated annotation, the software attaches notes to different stretches of a DNA sequence to describe them. Researchers can also compare segments from different organisms, looking for differences that may reveal important information about a species as a whole.
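To make the pattern-search idea concrete, here is a minimal sketch of an open reading frame (ORF) scan. It assumes the standard genetic code (start codon ATG; stop codons TAA, TAG, TGA), looks only at the forward strand, and ignores introns, so it illustrates the principle rather than acting as a real gene finder. The function name `find_orfs` and its `min_codons` threshold are hypothetical choices for this example.

```python
# Minimal open reading frame (ORF) scan: a toy model of how software
# spots candidate protein-coding genes by pattern matching.
# Assumptions: standard genetic code, forward strand only, no introns.

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return sorted (start, end) pairs, 0-based and end-exclusive.

    Each ORF runs from an ATG to the first in-frame stop codon (stop
    included) and must span at least `min_codons` codons in total.
    In-frame ATGs nested inside an ORF are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three possible reading frames
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START_CODON:
                # Step one codon at a time until an in-frame stop appears.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # the i += 3 below resumes after the stop
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGCC")` returns `[(2, 14)]`, the stretch `ATGAAATTTTAG`. Real gene finders also scan the reverse-complement strand and use statistical models, because short ORFs turn up frequently by chance.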
Manual genome annotation involves a person reviewing the sequence and its supporting evidence directly, rather than relying on software alone. Many researchers use computers to display the sequence and tag it, entering their annotations into databases as they work. In some cases, a manual review is needed after automated annotation to confirm that the software got things right. This can be a painstaking process, and errors do occur, which is one reason researchers like to pool databases: if an annotation doesn't match others for the same section of DNA, people can investigate the discrepancy and correct the error.
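Cross-checking pooled annotations can be sketched as a comparison of two annotation sets for the same region. The `Feature` record and `find_disagreements` function below are hypothetical and deliberately simplified (a name plus 0-based, end-exclusive coordinates on one strand); real annotation formats such as GFF3 carry far more information. The sketch flags any feature that the other source either lacks entirely or places with different boundaries, so a curator knows where to look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical, simplified annotation record (0-based, end-exclusive)."""
    name: str
    start: int
    end: int


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def find_disagreements(
    ours: list[Feature], theirs: list[Feature]
) -> list[tuple[Feature, str]]:
    """Flag features in `ours` that the other annotation set does not confirm."""
    flagged = []
    for feat in ours:
        hits = [other for other in theirs if overlaps(feat, other)]
        if not hits:
            flagged.append((feat, "no matching feature"))
        elif not any(h.start == feat.start and h.end == feat.end for h in hits):
            flagged.append((feat, "boundaries differ"))
    return flagged
```

This pairwise scan is quadratic in the number of features, which is fine for a small region; for whole genomes, sorting the features or using an interval tree would be the usual choice.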
It is not always possible to determine what a gene does during the annotation process. Scientists can still flag genes and distinguish them from other components of the genome, such as non-coding DNA like repeats. This information supports further research as people develop theories about different segments of the genome, and once a gene's function is worked out, they can update its annotation to record it.
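Separating repeats from the rest of a sequence is often done by masking them. As a rough sketch, the function below soft-masks (lowercases) simple tandem repeats: short motifs of 1 to 6 bases repeated back to back. The function name and its default thresholds (`min_copies=5`, `max_unit=6`) are assumptions for illustration; dedicated tools such as RepeatMasker and Tandem Repeats Finder also handle interspersed and imperfect repeats that this regular expression misses.

```python
import re


def mask_tandem_repeats(seq: str, min_copies: int = 5, max_unit: int = 6) -> str:
    """Lowercase runs where a 1..max_unit base motif repeats >= min_copies times.

    Input is uppercased first, so lowercase letters in the output mark
    the masked (repeat) regions. Thresholds are illustrative only.
    """
    # Lazy {1,max_unit}? tries the shortest repeating unit first;
    # \1{min_copies-1,} then requires that unit to repeat back to back.
    pattern = re.compile(rf"([ACGT]{{1,{max_unit}}}?)\1{{{min_copies - 1},}}")
    return pattern.sub(lambda m: m.group(0).lower(), seq.upper())
```

For instance, a run of five `AT` copies between flanking `GC` bases comes back as `GCatatatatatGC`, while a run of only four copies is left in uppercase.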