About the VTM Plot Data

The plot data consists of about 18,000 datasheets of recorded vegetation and environmental data from plots located throughout California and nearby areas of Nevada and Oregon. Most plots were recorded in the 1930s and some as early as the 1920s. Much of the data was collected in timberland because the objective of the VTM project was to survey forest resources (Manual of Field Instructions for Vegetation Type Map of California, p. 1); however, plots were also located in non-timber forests and woodlands, shrublands, and grasslands.


The Brush and Ground Cover plots were 2 x 0.5 chains in size (132 x 33 ft or roughly 40 x 10 m). VTM field biologists divided the rectangular plot into 100 equal subplots and recorded the one herb or shrub species that was dominant in each subplot. They then calculated percent cover for each species by counting the number of subplots in which that species was dominant. They also recorded the average height of each species and the depth of the litter.
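
Because each of the 100 subplots represents 1% of the plot, the percent cover calculation described above reduces to a count per species. The following is a minimal sketch of that arithmetic, not the procedure used in the database; the function name, the use of None for a subplot with no recorded dominant, and the example values are assumptions made for illustration.

```python
from collections import Counter


def percent_cover(subplot_dominants):
    """Return percent cover per species from the dominant species of each subplot.

    Each of the 100 equal subplots contributes 1% to the species that
    dominated it, so percent cover is simply a count per species.
    """
    if len(subplot_dominants) != 100:
        raise ValueError("expected one entry for each of the 100 subplots")
    # None marks a subplot with no recorded dominant (an assumption of this sketch).
    return dict(Counter(s for s in subplot_dominants if s is not None))


# Hypothetical plot: 60 subplots dominated by one code, 25 by another, 15 with none.
example = ["Atr"] * 60 + ["Gr2"] * 25 + [None] * 15
cover = percent_cover(example)
print(cover)
```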

Trees were sampled in a 1-chain-wide strip whose center line coincided with one of the midline edges of the Brush and Ground Cover plot; thus, Tree Tally plots were 2 x 1 chains in size (132 x 66 ft or roughly 40 x 20 m). All trees with a DBH (diameter at breast height) of at least 4" were tallied by species into DBH size classes.
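
The plot dimensions quoted above follow from the standard surveyor's chain of 66 feet. As a small worked check of the rounded metric figures, the sketch below converts chains to feet and meters; the function names are hypothetical and chosen only for this example.

```python
FEET_PER_CHAIN = 66.0     # one surveyor's (Gunter's) chain
METERS_PER_FOOT = 0.3048  # exact by definition


def chains_to_feet(chains):
    """Convert a length in chains to feet."""
    return chains * FEET_PER_CHAIN


def chains_to_meters(chains):
    """Convert a length in chains to meters."""
    return chains_to_feet(chains) * METERS_PER_FOOT


# Brush and Ground Cover plot: 2 x 0.5 chains; Tree Tally plot: 2 x 1 chains.
for label, (length, width) in {"Brush and Ground Cover": (2, 0.5),
                               "Tree Tally": (2, 1)}.items():
    print(label,
          f"{chains_to_feet(length):.0f} x {chains_to_feet(width):.0f} ft,",
          f"{chains_to_meters(length):.1f} x {chains_to_meters(width):.1f} m")
```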

In addition to herb, shrub, and tree data, VTM field biologists also recorded: vegetation type, exposure, slope, year of last burn, soil depth and character, parent rock, site index (an index of soil productivity measured by estimating the height attainable by the average dominant trees at 300 years), additional ground cover species, and remarks about the plot. A complete description of the methods for the plot data can be found in the VTM field manual (Manual of Field Instructions for Vegetation Type Map of California), by A.E. Wieslander. VTM vegetation plots are referred to as "Vegetation Type Sample Plots" or just "Sample Plots" in VTM documents.


Each VTM plot map was divided into a grid, labeled alphabetically down the side and numerically across the top. The plots were numbered within each section of the grid, for example A13 for grid section A 1, plot 3. We assign each plot a unique identifier by prepending the quad number, for example 65A13 for quad 65, grid section A 1, plot 3.
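
The identifier scheme can be sketched as a simple concatenation. This is an illustration of the naming pattern only; the function and parameter names are hypothetical, and the reverse direction (splitting an identifier back into its parts) is not attempted because the text does not say how multi-digit column or plot numbers are delimited.

```python
def plot_id(quad, grid_row, grid_col, plot_number):
    """Build a plot identifier such as '65A13' from its parts.

    quad        -- quad number, e.g. 65
    grid_row    -- grid letter down the side of the plot map, e.g. 'A'
    grid_col    -- grid number across the top of the plot map, e.g. 1
    plot_number -- plot number within that grid section, e.g. 3
    """
    return f"{quad}{grid_row.upper()}{grid_col}{plot_number}"


example_id = plot_id(65, "A", 1, 3)
print(example_id)
```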


The plot datasheets were entered by the Allen-Diaz lab into an MS Access database developed by staff at the USDA Forest Service, Pacific Northwest Research Station, Forest Inventory and Analysis Program. Each section on the datasheets corresponds to a field in the database. The only field that does not occur on the datasheets is the Miscellaneous Notes field. This field contains additional information about the datasheet that does not fall into one of the standard sections, such as notes written in the margins, the condition of the datasheet, or other noteworthy comments.


The VTM plot collection is a large dataset that required over a thousand data entry hours to transform almost 18,000 handwritten datasheets from the 1930s into a format usable by the public. This task also involved many hours of quality control to find and correct data entry errors. We report error rates from quality assurance (QA) tests that should help users gauge the usefulness of the data for their own purposes.

To test the database for accuracy, we (the Allen-Diaz Lab) randomly selected two non-overlapping samples of 100 plots from the collection and compared what was entered to what was written on the datasheets. Each field in the database corresponds to a section on the datasheet. We documented the errors we found for each field so that we could calculate a percent error per field. A discrepancy was counted as an error if it resulted in an incorrect or missing value. If what was entered differed slightly from what was written but did not change the meaning, it was not counted as an error. For example, in the geographic location field, if "1/2 mile" was changed to "0.5 mile" it was not counted as an error. Results from the QA tests are shown in the table below.

                                Random Subsample
Field                           First 100   Second 100   Total (all 200)
Plot Number                     0.0%        1.0%         0.5%
Date                            0.0%        0.0%         0.0%
Taken By                        0.0%        1.0%         0.5%
Quadrangle                      0.0%        0.0%         0.0%
Geographic Location             0.0%        3.0%         1.5%
Township                        0.0%        0.0%         0.0%
Range                           1.0%        0.0%         0.5%
Section                         1.0%        0.0%         0.5%
Exposure                        0.0%        0.0%         0.0%
Slope                           5.0%        4.0%         4.5%
Elevation                       0.0%        1.0%         0.5%
Site Index                      0.0%        1.0%         0.5%
Penetrability                   0.0%        0.0%         0.0%
Year of Last Burn               0.0%        0.0%         0.0%
Special Fire Hazards            0.0%        2.0%         1.0%
Veg. Type                       0.6%        2.8%         1.7%
Ground Cover_Species Code       2.0%        2.7%         2.4%
Ground Cover_% Cover            0.9%        1.4%         1.1%
Ground Cover_Height (ft)        0.5%        0.5%         0.5%
Ground Cover_Litter Depth (in)  0.0%        1.0%         0.5%
Additional Ground Cover Spp.    2.9%        1.1%         1.9%
Tree_Species Code               1.6%        4.4%         2.7%
Tree_DBH                        0.8%        2.9%         1.9%
Tree_Total                      0.0%        0.0%         0.0%
Tree_Height (ft)                0.0%        2.2%         1.1%
Parent Rock                     11.0%       11.0%        11.0%
Soil Origin                     3.0%        1.0%         2.0%
Soil Depth                      1.0%        0.0%         0.5%
Soil Character                  2.0%        3.0%         2.5%
Excessive Erosion Evidence      1.0%        0.0%         0.5%
Remarks                         2.0%        1.0%         1.5%
Misc Notes                      1.0%        6.0%         3.5%

All errors in the parent rock field were errors of omission, i.e., the field was empty in the database even though a value was recorded on the datasheet. In the slope field, 55% of the errors were errors of omission. These occurred when the value on the datasheet was text rather than a number: the slope field was left blank and the text was entered in the Miscellaneous Notes field.
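
The per-field percent error is the number of incorrect or missing entries divided by the number of entries checked. The sketch below reproduces the slope figures from the QA results under the assumption that each sampled plot has exactly one slope entry, so that 5.0% and 4.0% correspond to 5 and 4 errors; the function name is hypothetical. Fields with several entries per plot (species codes, for example) presumably have denominators other than 100, which would explain their non-whole percentages.

```python
def percent_error(errors, entries_checked):
    """Percent of checked entries that were incorrect or missing, to one decimal."""
    if entries_checked <= 0:
        raise ValueError("entries_checked must be positive")
    return round(100.0 * errors / entries_checked, 1)


# Slope field, assuming one slope entry per sampled plot.
first_100 = percent_error(5, 100)
second_100 = percent_error(4, 100)
all_200 = percent_error(5 + 4, 200)
print(first_100, second_100, all_200)
```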


We converted the database from Access to MySQL, a freely available and powerful database system. This conversion was done to facilitate publishing the data through the various web tools and web browsing applications we have developed to analyze and display the data. MySQL is well supported by other software critical to the development of these tools (Apache web server software, PHP and Perl programming languages, etc.). Having the data in an SQL-compliant database also allows us to conduct more complex searches.

The database for the VTM project is maintained at the College of Natural Resources Statistics and Bioinformatics Consulting Service (http://nature.berkeley.edu/sbcs).


Although we have taken great pains to ensure the accuracy of the data, there are some aspects of the data of which we feel the user should be aware. Firstly, our objective in making the plot data available was simply to transcribe what was recorded on the VTM plot datasheets into a usable digital format. We corrected obvious spelling errors, but we did not try to interpret ambiguous or possibly incorrect data recorded on the datasheets by the VTM field recorders. We cannot make an estimate of the accuracy of the data recorded by the VTM personnel.

All species names in the plot database are consistent with the time period of the VTM data collection (mainly the 1930s), during which the first Jepson manual (Jepson 1925) was widely used in California. Since that time many taxonomic changes have occurred, along with the publication of the Munz manual (Munz and Keck 1968) and the second Jepson manual (Hickman 1993), among others. It is important to understand that not all revisions are a straightforward replacement of the old name with the current name. Taxonomic changes can also include lumping and splitting of taxa, changes in rank, and circumscription changes. We did not update any of the names in the plot collection to current names. We have provided a link in the plot data query tool that provides historical synonyms, but users should be aware of the complexities of certain taxonomic changes and use the synonyms judiciously.

There are some especially troublesome species codes that may challenge users of the plot data. Sometimes the same species code in the VTM species lists can stand for two different species. For example, one of the species lists shows A2 as the code for Agrostis sp. as well as Alnus rhombifolia. In cases such as these (ambiguous species codes), the web site is designed to show all possible species names that apply to the code. In another example, the code R was often used to denote the presence of rock, although R is also the code for Sequoia sempervirens. Many of these cases can be resolved by the user from the location of the entry on the datasheet. For example, if R is recorded in the tree tally section and has DBH values, then it almost certainly refers to the tree and not to rock.
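
One way to picture the handling of ambiguous codes is a lookup that returns every candidate name for a code, combined with a context check of the kind a user might apply to R. This is a minimal sketch under stated assumptions, not the web site's implementation: the dictionary contents are limited to the two examples above, and the function names and the "tree_tally" section label are hypothetical.

```python
# Candidate names per code; a code with more than one name is ambiguous.
SPECIES_BY_CODE = {
    "A2": ["Agrostis sp.", "Alnus rhombifolia"],
    "R": ["Sequoia sempervirens"],
}


def candidate_names(code):
    """Return all species names that may apply to a code (empty list if unknown)."""
    return SPECIES_BY_CODE.get(code, [])


def interpret_r(section, has_dbh):
    """Suggest a reading of the code R from where it appears on the datasheet."""
    if section == "tree_tally" and has_dbh:
        return "Sequoia sempervirens"
    # Elsewhere R may denote rock; the datasheet has to be checked by the user.
    return "rock or Sequoia sempervirens"


print(candidate_names("A2"))
print(interpret_r("tree_tally", has_dbh=True))
```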

Also, many species codes differ in slight but important ways that may have been overlooked by the data entry person or by the VTM field recorder. If one letter or symbol of a code was incorrectly entered or recorded, it could translate into a different species or an unidentified species code. For example, species codes for grasses were underlined. During data entry those codes were entered with a 2 following the code to denote the underline. If, for example, the data entry person did not notice that the code LP was underlined, it would translate to Pinus flexilis (LP) instead of Leptochloa sp. (LP2). Codes that depend on the case of the second letter can also be problematic. For instance, depending on the VTM recorder's handwriting it may be difficult to distinguish a lowercase l from the number 1, which again could lead to an incorrect translation from code to species name. If the data entry person could not read the VTM field recorder's handwriting, the entry was entered as "illegible".
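
The underline convention and its failure mode can be sketched as follows. The two code-to-species pairs come from the example above; the function names and the fallback string are assumptions made for this illustration.

```python
CODE_TO_SPECIES = {
    "LP": "Pinus flexilis",
    "LP2": "Leptochloa sp.",
}


def entered_code(written_code, underlined):
    """Apply the data entry convention: an underlined code gets a trailing 2."""
    return written_code + "2" if underlined else written_code


def species_for(written_code, underlined):
    """Translate a datasheet code to a species name, honoring the underline."""
    code = entered_code(written_code, underlined)
    return CODE_TO_SPECIES.get(code, "unidentified species code")


# Missing the underline on LP changes the species entirely.
print(species_for("LP", underlined=True))
print(species_for("LP", underlined=False))
```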

Many entries recorded in the Vegetation Type section of the plot datasheets are not the same as the mapped vegetation type on the corresponding Vegetation Type Map. Most of the time the vegetation type was recorded in words, like "Sagebrush", which is straightforward; but VTM field recorders often recorded the vegetation type instead as a string of species codes, like "AtrGr2", to denote the dominant species in the area. Atr is code for Artemisia tridentata and Gr2 is code for Grass, so in this case the database would show Artemisia tridentata and Grass as the entry for this plot. These species codes would have been later interpreted by VTM personnel to fall into one of the vegetation types and the plot would have been mapped accordingly. However, if a user queries the Vegetation Type field of the database for all the Sagebrush vegetation type plots, only the plots that actually have the word "Sagebrush" in the Type field would be included in the query results, even though the "Artemisia tridentata and Grass" plot may have been mapped as a Sagebrush type.
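
The query behavior described above can be reproduced with a toy table. This sketch uses Python's built-in sqlite3 module as a stand-in for the MySQL database; the table name, column names, and plot identifiers are hypothetical. The widened search at the end is only an illustration of matching on a species name and is not equivalent to the VTM vegetation type classification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plots (plot_id TEXT PRIMARY KEY, veg_type TEXT)")
conn.executemany(
    "INSERT INTO plots VALUES (?, ?)",
    [
        ("65A13", "Sagebrush"),                       # type recorded in words
        ("65A14", "Artemisia tridentata and Grass"),  # type recorded as codes AtrGr2
    ],
)

# Searching on the word alone misses the plot recorded as species codes.
word_only = [row[0] for row in conn.execute(
    "SELECT plot_id FROM plots WHERE veg_type LIKE ? ORDER BY plot_id",
    ("%Sagebrush%",),
)]

# Adding the dominant species name picks up the second plot as well.
widened = [row[0] for row in conn.execute(
    "SELECT plot_id FROM plots "
    "WHERE veg_type LIKE ? OR veg_type LIKE ? ORDER BY plot_id",
    ("%Sagebrush%", "%Artemisia tridentata%"),
)]

print(word_only, widened)
```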

The species code "W" or "w" is of special concern when it appears in the Vegetation Type field. The lowercase w always follows an uppercase letter, as in "Dw". D is the code for Douglas fir (1930s name: Pseudotsuga taxifolia)*, and W is the code for Quercus wislizenii. It appears that when used in the type field, W and w may sometimes refer to Abies concolor (species code W1). We think this may be true because in some cases the code is recorded in plots that 1) list W1 and not W in the tree tally section, and 2) lie outside the elevational range of Quercus wislizenii. In other cases it does appear to code for Quercus wislizenii. In either case, the database will translate W to Quercus wislizenii because we have found no documented source (no official VTM species list or other official VTM source) indicating that W can also refer to Abies concolor.

Our task was to provide what was written on the datasheets in a digital format. The type field as it exists may not be the best way to answer research questions about vegetation types, since the type field data on the datasheets is often in a raw species list format. The order of the species codes listed in this field was important in translating to a vegetation type (Manual of Field Instructions for Vegetation Type Map of California), and this translation was done by VTM personnel for the preparation of the Vegetation Type Map. Therefore, data from digitized Vegetation Type Maps (digitization is currently in progress) are the best source for classifying the vegetation of particular plots using the VTM method of classification.

*D is the species code for Quercus douglasii in certain quadrangles, as is documented in quadrangle-specific species lists. When translating codes into species names, the database defers first to the quadrangle-specific list if one exists, but not all quadrangles have their own species list. All other codes come from global species lists, i.e., official VTM species lists that are not particular to one specific quadrangle.
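
The order of precedence described in this note can be sketched as a two-level lookup: the quadrangle-specific list first, then the global list. This is an illustration only, not the database's code. The quadrangle number "999" is a made-up placeholder rather than a real quadrangle with its own list, and the function and variable names are hypothetical.

```python
# Global list: codes that are not particular to one quadrangle.
GLOBAL_CODES = {
    "D": "Pseudotsuga taxifolia",
    "W": "Quercus wislizenii",
    "W1": "Abies concolor",
}

# Quadrangle-specific lists; "999" is a placeholder quadrangle for this sketch.
QUAD_CODES = {
    "999": {"D": "Quercus douglasii"},
}


def translate(code, quad):
    """Translate a species code, deferring first to the quadrangle's own list."""
    quad_list = QUAD_CODES.get(quad, {})
    if code in quad_list:
        return quad_list[code]
    # Fall back to the global list; None means the code is unknown.
    return GLOBAL_CODES.get(code)


print(translate("D", "999"))  # quadrangle-specific meaning
print(translate("D", "65"))   # no quadrangle list in this sketch, so global meaning
```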

Works Cited

  1. Hickman, James C. (editor), 1993. The Jepson Manual: Higher Plants of California. Berkeley: University of California Press, 1400 pp.
  2. Jepson, Willis Linn, 1925. Manual of the Flowering Plants of California. Berkeley: University of California Press, 1238 pp.
  3. Munz, Philip A. and David D. Keck, 1968. A California Flora and Supplement. Berkeley: University of California Press, 1681 pp.