Research Data Management (RDM)

Data Organization Conventions

Effective data organization is critical for maintaining research integrity, making data findable, and ensuring long-term usability. Here are three essential components to consider:

Naming Conventions
Using clear and consistent file naming conventions helps researchers and collaborators quickly locate, understand, and manage files. A good naming convention should be:

  • Descriptive: Include details like project name, date, version, and file content.
  • Consistent: Stick to the same pattern throughout the project. Avoid spaces and special characters, using underscores (_) or hyphens (-) instead.
  • Sortable: Use logical orderings, such as year-month-day (YYYYMMDD), to make files easier to find when sorted.
    • Using YYYYMMDD improves the readability and organization of datasets, especially in environments that require long-term data preservation or global collaboration. This format is also ISO 8601 compliant, which is widely accepted across systems and regions.

Example:
ProjectName_YYYYMMDD_Version_Description.ext
StudyXYZ_20240906_v01_DataCollection.csv
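As an illustration, the pattern above can be assembled and checked programmatically. This is a minimal sketch; the function name `build_filename` and the character rule are assumptions, not part of any standard tool:

```python
from datetime import date
import re

def build_filename(project, version, description, ext, when=None):
    """Build a name following ProjectName_YYYYMMDD_Version_Description.ext."""
    when = when or date.today()
    stamp = when.strftime("%Y%m%d")  # ISO 8601 basic date; sorts chronologically
    name = f"{project}_{stamp}_v{version:02d}_{description}.{ext}"
    # Reject spaces and special characters, allowing only letters, digits,
    # underscores, hyphens, and dots
    if not re.fullmatch(r"[A-Za-z0-9_\-.]+", name):
        raise ValueError(f"Filename contains unsafe characters: {name}")
    return name

print(build_filename("StudyXYZ", 1, "DataCollection", "csv", date(2024, 9, 6)))
# → StudyXYZ_20240906_v01_DataCollection.csv
```

Generating names from one helper like this, rather than typing them by hand, is one way to keep a team's files consistent for the life of a project.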


Stable File Formats
Choosing stable, widely used file formats is critical for long-term data preservation and interoperability. Proprietary formats may become obsolete or require specific software to access, so opting for non-proprietary, open formats is a safer choice for future-proofing your data.

  • Preferred formats include:
    • Text: .txt, .csv (for structured data)
    • Images: .tiff, .png
    • Documents: .pdf (for final versions), .xml
    • Data: .csv, .json

These formats are recognized for being well-documented, widely supported, and unlikely to become inaccessible due to software changes.
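To make this concrete, here is a small sketch of writing the same records to both CSV and JSON using only the Python standard library. The sample records and filenames are hypothetical:

```python
import csv
import json

# Hypothetical sample records; in practice these would come from your
# analysis environment
records = [
    {"sample_id": "S01", "ph": 7.2, "temp_c": 21.5},
    {"sample_id": "S02", "ph": 6.8, "temp_c": 22.1},
]

# CSV: plain-text, structured, readable by virtually any tool
with open("measurements.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)

# JSON: open and self-describing; handles nested structures well
with open("measurements.json", "w") as f:
    json.dump(records, f, indent=2)
```

Because both outputs are plain text, they remain readable even if the software that produced them disappears.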


Version Control
Maintaining version control is crucial for tracking changes in files over time, preventing data loss, and ensuring that collaborators work with the correct version of the file. Without clear versioning, you risk confusion over outdated or incorrect data.

Strategies for version control:

  • Manual versioning: Append file names with version numbers (e.g., _v01, _v02). This method is simple but requires diligence in naming.
  • Automated versioning tools: Use tools like Git or Subversion to automatically track changes, especially in collaborative projects.
  • Version control guidelines: Establish a clear process for updating and managing files so that everyone on the team follows the same rules.

By using version control, you ensure that all changes are documented, making it easier to revert to previous versions or identify where mistakes were introduced.
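For the manual-versioning strategy above, a small helper can take some of the diligence out of bumping version tags. This is an illustrative sketch; the function name `bump_version` is an assumption, not an existing tool:

```python
import re

def bump_version(filename):
    """Increment the _vNN version tag in a filename, e.g. _v01 -> _v02."""
    match = re.search(r"_v(\d+)", filename)
    if match is None:
        raise ValueError(f"No _vNN version tag found in {filename!r}")
    current = int(match.group(1))
    width = len(match.group(1))  # preserve zero padding (_v01, not _v1)
    return filename.replace(match.group(0), f"_v{current + 1:0{width}d}")

print(bump_version("StudyXYZ_20240906_v01_DataCollection.csv"))
# → StudyXYZ_20240906_v02_DataCollection.csv
```

For anything beyond a handful of files, automated tools such as Git are still the safer choice, since they record who changed what and why.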

Equitable Data Structuring

Biased data structuring or categorization occurs when data is organized or labeled in a way that reflects or reinforces societal biases. This can lead to skewed analyses and misrepresentation of certain groups. For example, categorizing racial or ethnic groups using outdated or overly broad terms can obscure important nuances and perpetuate stereotypes.

Ways to Ensure Equitable Practices:

  1. Use Inclusive Categories: Ensure data categories reflect the diversity of the population.
  2. Collaborate with Stakeholders: Engage affected communities in the structuring process.
  3. Review for Cultural Sensitivity: Ensure labels and classifications are respectful and relevant.

Metadata: Describing Your Data

What is Metadata? Metadata is information that describes your data. It answers important questions like who collected the data, what the data is about, when and where it was collected, and how it was gathered. Think of it as the details that help others understand your data clearly.

Why is Metadata Important? Without metadata, it can be difficult for others to find or understand your data. Good metadata ensures that your data:

  • Can be found by others.
  • Is accessible and easy to understand.
  • Can be used together with other datasets.
  • Is reusable, allowing others to build on your work.

Metadata supports the FAIR principles by making sure your data is Findable, Accessible, Interoperable, and Reusable. This is important for Open Science because it helps share knowledge with others, making science more open and collaborative.

How Metadata Helps:

  • It makes your data easier to understand and use.
  • It helps others find and access your data.
  • It ensures your data can be combined with other data for new research.

In short, metadata is essential for making sure your data is useful, clear, and available for future research.


A metadata schema is a standardized framework used to describe, organize, and manage data. It defines specific elements or fields (like title, author, date) and the rules for how these elements should be used. These schemas ensure that datasets are properly documented, making them easier to find, cite, and reuse across platforms and disciplines.

Researchers don’t always need to follow a formal metadata schema, but it can be useful for sharing and reusing data across platforms. For smaller or internal projects, being consistent with naming conventions (see above) and using clear documentation can be enough. It’s important to name files and variables in a way that’s logical and easy to understand, so others can still make sense of the data. Using either a formal schema or consistent practices helps ensure the data remains usable and organized.
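Even without a formal schema, a simple structured metadata record goes a long way. The sketch below writes one as JSON; the field names loosely echo common schema elements (title, creator, date) but are illustrative, not a formal schema:

```python
import json

# Hypothetical metadata record for an example dataset; every value here
# is made up for illustration
metadata = {
    "title": "Study XYZ water quality measurements",
    "creator": "Jane Doe",
    "date_collected": "2024-09-06",
    "description": "pH and temperature readings from river sampling sites.",
    "keywords": ["water quality", "pH", "temperature"],
    "license": "CC-BY-4.0",
}

# JSON is an open, plain-text format, so the record stays readable
# alongside the data it describes
with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

If the project later adopts a formal schema, a consistent record like this is straightforward to map onto the schema's fields.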

Several metadata schemas are in common use. It is important to note that different disciplines may rely on other schemas, based on their specific needs.

README

Don't forget the README file! README files are essential for ensuring that others can understand and effectively use your dataset. They provide key information about the context, structure, and usage of the data, including descriptions of variables, file formats, and any relevant methodologies. README files also clarify any necessary steps for reproducing results or interpreting the data correctly. This ensures that your research is accessible and reusable by others, improving transparency and replicability.

Key Elements

  • Contact information: How to reach the data authors.
  • Project title: The name of the dataset or project.
  • Description of the dataset: Overview of what the data contains, the context, and how it was collected.
  • File structure: Explanation of folder and file organization.
  • Variable descriptions: Definitions for all variables in the dataset.
  • File formats: Information about file types and their usage.
  • Usage instructions: Steps for using or interpreting the data.
  • Version control: Notes on versions and changes.

Other elements you might want to include:

  • Information about uncertainty.
  • References to publications that describe the dataset and/or its processing.

Plain-text (.txt) format is ideal for README files because it is universally readable across platforms, lightweight, and free of the compatibility issues that come with proprietary formats.
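The key elements above can be turned into a reusable skeleton. This is a minimal sketch that writes a plain-text README template; the section hints are placeholder text to be replaced with real project details:

```python
# Key README elements paired with illustrative placeholder hints
sections = [
    ("Contact information", "Name, affiliation, and email of the data authors."),
    ("Project title", "Name of the dataset or project."),
    ("Description of the dataset", "What the data contains and how it was collected."),
    ("File structure", "How folders and files are organized."),
    ("Variable descriptions", "Definition of every variable in the dataset."),
    ("File formats", "File types used and any software needed to open them."),
    ("Usage instructions", "Steps for using or interpreting the data."),
    ("Version control", "Versions, dates, and a summary of changes."),
]

# Write each section as an underlined heading followed by its hint
with open("README.txt", "w") as f:
    for title, hint in sections:
        f.write(f"{title}\n{'-' * len(title)}\n{hint}\n\n")
```

Creating the skeleton at the start of a project, rather than at deposit time, makes it far more likely the README actually gets filled in.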

Last Updated: Nov 8, 2024 7:44 AM