Why describe your data?
If you want your data to be useful to your future self, your colleagues, or other researchers, you must describe it. Beyond documenting what you planned, what you did, what happened, and what you think it means, describe the data itself in detail. The sooner you start, the more likely you are to remember everything!
Where should I start?
- Scope and structure. Do your best to describe the content, formats, and any internal relationships in your dataset.
- Glossaries and legends. Define any terms, codes, and variables that you use.
- Context. Provide details relevant to validation or replication (e.g., funding sources, timetables, collaborators, location, and environmental conditions).
- Methods. Describe techniques, software, and hardware used in data collection.
- Analysis. Describe steps you took in processing and analyzing your data.
- Attribution. Cite your data sources.
- Access. Provide details about confidentiality and about the access and use conditions for your data.
What about metadata?
- Data description is metadata. The formal, structured description associated with your data is the metadata that enables someone (even your future self!) to find it, access it, determine its value, and potentially reuse it.
- Meet minimum requirements. Depending on how you plan to share and/or archive your data, you may need to meet discipline standards or repository requirements for your metadata.
- Choose a format. Even a general narrative in a text file (e.g., readme.txt) stored with your datasets is better than nothing! To make your data description more interoperable and machine-readable, use standard element sets, input guidelines, and controlled vocabularies, encoded in a format such as CSV, XML, or RDF.
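As an illustration of pairing a standard element set with an encoding format, the following is a minimal sketch that writes a simple Dublin Core description as XML using only Python's standard library. It assumes the 15-element Dublin Core Metadata Element Set and its namespace URI. The `metadata` wrapper element, the `dublin_core_record` helper, and all field values are hypothetical. Repositories usually prescribe their own wrapper, schema, and required elements, so check their requirements before relying on a layout like this one.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements as <dc:...>

# The 15 elements of the Dublin Core Metadata Element Set
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def dublin_core_record(fields):
    """Build an XML string from a dict of DC element name -> value or list of values.

    Hypothetical helper: the <metadata> wrapper is an illustrative choice,
    not a schema required by any particular repository.
    """
    root = ET.Element("metadata")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]  # every element may repeat, e.g. several creators
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

# Illustrative values only
record = dublin_core_record({
    "title": "Stream temperature readings, 2023 field season",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": "2023-09-30",
    "format": "text/csv",
    "rights": "CC BY 4.0",
})
print(record)
```

The helper rejects names outside the element set. This is a small example of how a standard element set keeps a description consistent and machine-readable, which a free-form readme.txt cannot guarantee.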
Other materials associated with your primary dataset, including those listed below, should be digitized and included with your data. Your data description should at least reference them, if not fully describe them:
- Code books or lab books
- Data dictionaries
- Field notes
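A data dictionary can be as simple as a CSV file with one row per variable, defining its meaning, type, units, and any codes it uses. The following is a minimal sketch using Python's standard `csv` module. The column set (`variable`, `description`, `type`, `units`, `allowed_values`), the example variables, and the helper names are hypothetical and do not follow any particular discipline standard. It also includes a simple check that flags data columns the dictionary does not yet define.

```python
import csv
import io

# Hypothetical column set for the dictionary; adapt it to your discipline's standard
FIELDS = ["variable", "description", "type", "units", "allowed_values"]

# Illustrative entries: one row per variable in the dataset
DATA_DICTIONARY = [
    {"variable": "site_id", "description": "Sampling site code (see site legend)",
     "type": "string", "units": "", "allowed_values": "S01-S12"},
    {"variable": "temp_c", "description": "Water temperature",
     "type": "float", "units": "degrees Celsius", "allowed_values": "-5 to 40"},
    {"variable": "qc_flag", "description": "Quality-control code",
     "type": "string", "units": "", "allowed_values": "A=accepted; E=estimated; M=missing"},
]

def write_data_dictionary(rows, out):
    """Write the dictionary as CSV, header first, to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

def undocumented_columns(data_header, rows):
    """Return data columns that have no entry in the data dictionary."""
    documented = {row["variable"] for row in rows}
    return [col for col in data_header if col not in documented]

buffer = io.StringIO()
write_data_dictionary(DATA_DICTIONARY, buffer)
print(buffer.getvalue())
print(undocumented_columns(["site_id", "date", "temp_c", "qc_flag"], DATA_DICTIONARY))  # → ['date']
```

Running a check like `undocumented_columns` against each data file's header before archiving is one way to catch variables that were added during the project but never defined.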
Tools and Resources
- This data description checklist is designed to help you make your data reusable.
- Find metadata standards by discipline/subject area: http://rd-alliance.github.io/metadata-directory/
- Dublin Core is a domain-agnostic, well-known and widely used standard for simple, generic descriptions.
- Electronic Lab Notebooks are a useful tool for documenting and managing data throughout your project. There are many options (e.g., LabArchives, Evernote), each with unique features for various workflows.
- This readme.txt template provides detailed recommendations for how to describe and cite your data.
- Metadata requirements for Texas ScholarWorks, UT’s institutional repository.
- Schedule a consultation with our metadata specialists: contact Melanie Cofield at firstname.lastname@example.org.