RESEARCH DATA MANAGEMENT (RDM)

Research Data Management refers to the organization, storage, preservation, and sharing of data collected and used in a research project. It encompasses the entire data lifecycle, from planning and collection to analysis, archiving, and reuse. Good data management practices not only protect data integrity and ensure compliance with institutional and funder policies, but also enhance the visibility, impact, and reproducibility of research outcomes.

As universities and research institutions worldwide adopt Open Science principles, RDM plays a critical role in supporting data sharing, facilitating interdisciplinary collaboration, and maximizing the long-term value of research investments. Whether working with experimental results, survey responses, digital texts, or sensor outputs, researchers must be equipped with the knowledge and tools to manage their data responsibly and strategically.


Research data is the foundation upon which scientific inquiry and discovery are built. It encompasses the recorded factual material commonly accepted in the scientific community as necessary to validate research findings. This includes a wide range of formats—such as numerical datasets, text documents, images, audio-visual files, survey results, laboratory notes, software code, and digital models—collected, observed, generated, or created in the course of conducting research.

The nature of research data varies significantly across disciplines. For example, a biologist may work with genetic sequences or microscope images, while a social scientist may collect interview transcripts or statistical survey results. Despite these differences, all research data must be handled with care to ensure it is accurate, reliable, and accessible.

Understanding what constitutes research data and how to manage it effectively is critical not only for ensuring the validity of research, but also for enabling data sharing, reproducibility, and long-term preservation. As global research practices shift towards Open Science, the importance of responsibly managing and curating research data continues to grow.

Managing research data is crucial for several reasons, both practical and ethical. Here’s why it matters:

a. Ensures Data Integrity and Quality

  • Maintains accuracy, consistency, and reliability of data.
  • Supports reproducibility of research findings.

b. Meets Funder and Institutional Requirements

  • Complies with data management plans (DMPs) required by research funders.
  • Aligns with institutional policies and research ethics guidelines.

c. Supports Legal and Ethical Compliance

  • Ensures responsible handling of sensitive or personal data (e.g., GDPR).
  • Protects intellectual property rights and participant confidentiality.

d. Enables Data Sharing and Reuse

  • Facilitates collaboration and verification by others.
  • Allows data to be reused in future research, saving time and resources.

e. Increases Research Visibility and Impact

  • Shared datasets with proper metadata and DOIs can be cited.
  • Enhances credibility and transparency of research.

f. Prevents Data Loss

  • Reduces risk of loss through secure storage, backups, and version control.

g. Improves Research Efficiency

  • Well-organized data is easier to locate, analyze, and interpret.
  • Supports better project planning and workflow management.

h. Supports Long-Term Preservation

  • Ensures that valuable data remain accessible and usable over time.
  • Contributes to the broader scientific record and institutional memory.

Preparing a data management plan before data are collected helps ensure that the data are in the correct format, well organized, and well annotated. This saves time in the long term because there is no need to re-organize or re-format the data, or to try to remember details about them later. It also increases research efficiency, since both the data collector and other researchers can understand and use well-annotated data in the future.

Data management plans are meant to be working documents, so they should be updated as the project moves forward and whenever it deviates substantially from the original project plan.

Data preservation and archiving are part of a data management strategy. By selecting an archive in advance, the data collector can format data as they are collected, which makes later submission to that archive simpler.

Saving data means keeping track of research materials so that you or others can access and utilize them in the future. Here are three things to think about before preserving your data.

i) Location: Whenever you can, make multiple copies of your data and store them on different types of media. Hard drives, cloud storage, and other options vary in reliability, but all of them will eventually fail or become obsolete.

ii) Time: Saving data takes time, but losing data wastes more. Back up your data regularly as part of your research routine, and have a strategy for how to save your data when your study is finished.

iii) Format: Data should be kept in a format that makes it possible to use it later. This may involve storing data in open or easily accessible file formats or just keeping your data on hand with the supporting documentation and other research resources.
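
The advice on location and regular backups can be illustrated with a short script. The following is a minimal sketch, not a recommended backup tool: it copies one file to several locations and compares SHA-256 checksums to confirm that each copy is intact. The function names, folder names, and sample file are hypothetical, and the two "backup" folders here are temporary directories standing in for genuinely separate media.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dirs: list) -> dict:
    """Copy source into each backup directory and check each copy's checksum."""
    expected = sha256_of(source)
    results = {}
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = Path(shutil.copy2(source, directory))
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results


# Demonstration with throwaway files (hypothetical names); real backups
# would target separate devices or services, not sibling folders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data_file = root / "survey_responses.csv"
    data_file.write_text("id,answer\n1,yes\n2,no\n", encoding="utf-8")
    report = backup_and_verify(
        data_file, [root / "external_drive", root / "cloud_sync"]
    )
    all_copies_match = all(report.values())
    print(all_copies_match)
```

Re-running the checksum comparison periodically, not only at copy time, is one way to notice a copy that has silently degraded.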

Research data repositories are trusted platforms used to store, preserve, and share research data. They support open access, ensure long-term availability, and help researchers meet funder or publisher requirements. Repositories can be discipline-specific, general-purpose, or institutional, and often provide tools like DOIs, metadata, and licensing options. Using a repository enhances research visibility, transparency, and collaboration, while promoting Open Science and the FAIR principles (Findable, Accessible, Interoperable, Reusable).

In the evolving landscape of scholarly communication and Open Science, sharing research data has become a key component of responsible and impactful research practice. Data sharing refers to the act of making research data available to other researchers, institutions, or the public, often through online repositories or as part of the publication process.

Sharing data not only enhances the transparency, reproducibility, and credibility of research, but also maximizes the value and reach of the data collected. It allows others to validate findings, reuse data for new research, and build on existing work, thereby accelerating scientific discovery and innovation.

Many research funders, publishers, and institutions now recognize the importance of data sharing and have adopted policies to encourage or require it. As a result, researchers are increasingly expected to make their data accessible in ways that are ethical, secure, and aligned with best practices.

Ultimately, sharing research data benefits not only the wider research community and society, but also the original data producers by increasing citations, fostering collaboration, and enhancing the overall impact of their work.

Sharing research data effectively requires thoughtful planning, ethical consideration, and adherence to standards that ensure data is discoverable, understandable, and reusable. Below are the key steps and best practices:

Step 1. Prepare Your Data for Sharing

  • Clean and organize your data: Remove duplicates, correct errors, and ensure consistency.
  • Anonymize sensitive data: Remove or de-identify personal or confidential information to protect privacy.
  • Document your data thoroughly: Include metadata, data dictionaries, codebooks, or README files explaining how the data was collected, processed, and structured.
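
The cleaning and de-identification steps above can be sketched in a few lines. This is an illustration under stated assumptions, not a complete anonymization procedure: the field names and sample records are invented, and replacing an ID with a salted hash is pseudonymization only. Remaining fields such as age may still allow re-identification and need separate review.

```python
import hashlib
import secrets


def pseudonymize(value: str, salt: bytes) -> str:
    """Replace an identifier with a salted SHA-256 pseudonym (12 hex chars)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()[:12]


def prepare_for_sharing(rows, id_field, drop_fields, salt):
    """Drop exact duplicates, remove direct identifiers, pseudonymize the ID."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        shared = {k: v for k, v in row.items() if k not in drop_fields}
        shared[id_field] = pseudonymize(str(row[id_field]), salt)
        cleaned.append(shared)
    return cleaned


# The salt must stay secret and must not be deposited with the data.
salt = secrets.token_bytes(16)

# Invented sample records; "name" is treated as a direct identifier.
raw = [
    {"participant": "P001", "name": "A. Example", "age": "34", "score": "7"},
    {"participant": "P001", "name": "A. Example", "age": "34", "score": "7"},
    {"participant": "P002", "name": "B. Example", "age": "41", "score": "5"},
]
shared_rows = prepare_for_sharing(raw, "participant", {"name"}, salt)
print(len(shared_rows))
```

Because the same salt maps the same participant to the same pseudonym, records for one person remain linkable within the shared dataset without exposing the original ID.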

Step 2. Choose the Right Repository

Select a suitable data repository that aligns with your discipline, institution, or funder requirements:

  • Disciplinary repositories (e.g., GenBank for genomics, ICPSR for social sciences)
  • Institutional repositories (e.g., your university’s research data platform)
  • General repositories (e.g., Zenodo, Figshare, Dryad)

Look for repositories that provide:

  • Persistent identifiers (like DOIs)
  • Long-term storage
  • Open access options
  • Licensing choices

Step 3. Apply a License for Reuse

  • Use clear licenses (e.g., Creative Commons or Open Data Commons) to specify how others can use your data.
  • Consider a CC BY (attribution required) or CC0 (public domain) license, depending on your goals and institutional policy.

Step 4. Include Metadata and Documentation

  • Use standard metadata schemas (e.g., Dublin Core, DataCite) to make your data more discoverable.
  • Provide comprehensive documentation to help others interpret and reuse your data correctly.
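
As an illustration of Step 4, the sketch below assembles a small metadata record and reports which required fields are still empty. The field list is an assumption modeled loosely on the mandatory properties of the DataCite schema (identifier, creator, title, publisher, publication year, resource type); the key names and sample values are hypothetical, and the repository you deposit with defines the actual requirements.

```python
import json

# Assumed field list, modeled on DataCite's mandatory properties.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Invented example record. The identifier is left empty on purpose,
# since a DOI is normally assigned by the repository at deposit.
record = {
    "identifier": "",
    "creators": ["Example, A."],
    "title": "Example survey responses",
    "publisher": "Example University Repository",
    "publication_year": 2024,
    "resource_type": "Dataset",
    "rights": "CC BY 4.0",
}

still_missing = missing_fields(record)
metadata_json = json.dumps(record, indent=2)
print(still_missing)
```

Running a check like this before deposit makes gaps visible early, while the people who can fill them are still on the project.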

Step 5. Link Data to Your Publications

  • Include a data availability statement in your journal articles or reports.
  • Cite your dataset as you would cite a publication, using its DOI and repository link.

Step 6. Comply with Legal, Ethical, and Funder Requirements

  • Ensure your data sharing practices align with:
    • Informed consent agreements
    • Ethics committee guidelines
    • Funder mandates (e.g., Horizon Europe, NIH, FRGS)
    • Data protection laws (e.g., GDPR, PDPA)

Step 7. Promote Your Shared Data

  • Share links via personal webpages, social media, ORCID, and institutional profiles.
  • Collaborate with librarians or research data managers to increase visibility.

Datasets used during the research process should be cited as you would cite an article: in the references, cited sources, or bibliography section of your work. The practice of citing research data has evolved as researchers and stakeholders have come to understand the value of recording, in the scholarly record, the link between a research output and the evidence that supports it.

Citing data gives credit to the researchers responsible and enables those who share data to assess their impact. It also supports the research infrastructure by linking data and published research, which increases access to the data, offers opportunities for data verification, and encourages the use of data as a scholarly output on par with written works.

Although citing data is now expected, academic and professional communities have often struggled to set standards for referencing data within their established citation formats. When referencing a dataset in a publication, follow the citation guidelines provided by the publisher. If the publisher does not specify a format for datasets, gather all the essential components (creator, year, title, version, and location) and model the reference on the format used for articles.

Example citations

Ministry for the Environment. (2016). Vulnerable catchments (Version 17) [Data set]. https://data.mfe.govt.nz/layer/53523-vulnerable-catchments/

Ministry of Education. (2015). Transient students [Data set]. https://catalogue.data.govt.nz/dataset/transient-students

Klette, R. (2014). [Data for computer vision spatial value statistics] [Unpublished raw data]. Auckland University of Technology.
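
The pattern followed by the first two examples above (creator, year, title, optional version, a "[Data set]" label, and a URL) can be expressed as a small helper. This is a sketch of that pattern only; the function name is hypothetical, and a publisher's own guidelines take precedence over it.

```python
def format_dataset_citation(author, year, title, url, version=None):
    """Build a dataset reference in the pattern used by the examples above."""
    version_part = f" (Version {version})" if version else ""
    # Strip a trailing period so initials like "Klette, R." are not doubled.
    author_part = author.rstrip(".")
    return f"{author_part}. ({year}). {title}{version_part} [Data set]. {url}"


citation = format_dataset_citation(
    author="Ministry for the Environment",
    year=2016,
    title="Vulnerable catchments",
    url="https://data.mfe.govt.nz/layer/53523-vulnerable-catchments/",
    version=17,
)
print(citation)
```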

Step 1: Plan Your Research Data

  • Identify data types (e.g., quantitative, qualitative, images, code)
  • Choose suitable file formats (e.g., CSV, TXT, JPEG, PDF)
  • Decide how you will collect data (surveys, experiments, observation)
  • Determine storage duration and access rights

Step 2: Create a Data Management Plan (DMP)

  • Outline:
    • What data will be collected?
    • How will data be stored and preserved?
    • How will data be shared or published?
    • Security and privacy considerations
  • Use university or funding agency DMP templates

Step 3: Collect and Record Data Carefully

  • Use standardized data collection methods
  • Keep raw/original data intact
  • Record metadata: date, location, equipment, context

Step 4: Organize and Secure Data

  • Use consistent, clear file naming conventions
  • Store data in organized folders with backups
  • Use institutional storage solutions (e.g., university cloud)
  • Control data access with permissions/passwords
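
One way to keep file names consistent is to generate them instead of typing them by hand. The sketch below assumes a hypothetical convention of project, description, date (YYYYMMDD), and a two-digit version, joined with underscores. The convention itself is an illustration, not an institutional standard.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project, description, when, version, extension):
    """Compose a file name as project_description_YYYYMMDD_vNN.ext."""
    ext = extension.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{when:%Y%m%d}_v{version:02d}.{ext}"


# Hypothetical project and file details.
file_name = build_file_name("Soil Survey", "pH readings", date(2024, 3, 5), 2, ".CSV")
print(file_name)  # soil-survey_ph-readings_20240305_v02.csv
```

Putting the date in YYYYMMDD form and zero-padding the version means that an alphabetical sort of the folder is also a chronological one.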

Step 5: Document Your Data and Metadata

  • Prepare documentation like:
    • README files describing data content and structure
    • Data dictionaries defining variables and codes
  • Use standard metadata formats if applicable (e.g., Dublin Core)
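
A data dictionary is easier to start when its skeleton is generated from the data file itself. The following sketch reads a CSV header and produces one entry per variable, with an example value, a count of missing values, and a placeholder description for the researcher to fill in. The column names and sample data are invented for illustration.

```python
import csv
import io


def data_dictionary_skeleton(csv_text: str) -> list:
    """Return one draft dictionary entry per column in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows if row[name] not in ("", None)]
        entries.append(
            {
                "variable": name,
                "example": values[0] if values else "",
                "missing": len(rows) - len(values),
                "description": "TODO: describe variable, units, and codes",
            }
        )
    return entries


# Invented sample data with one missing temperature reading.
sample = "site,temp_c,observer\nA,12.5,JS\nB,,JS\n"
dictionary = data_dictionary_skeleton(sample)
for entry in dictionary:
    print(entry["variable"], entry["example"], entry["missing"])
```

The generated entries are only a starting point; the descriptions, units, and code lists still have to be written by someone who knows the data.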

Step 6: Ensure Ethics and Privacy Compliance

  • Obtain ethics approval for human-related data
  • Anonymize sensitive information to protect identities
  • Follow privacy laws and institutional policies

Step 7: Share and Publish Your Data Properly

  • Select suitable data repositories (e.g., institutional repository, PutraRDRepo, Figshare)
  • Include metadata and supporting documents
  • Obtain DOIs for your datasets for citation
  • Specify data licenses (e.g., Creative Commons)

Step 8: Store and Manage Data Long-Term

  • Plan long-term storage according to policies or grant requirements
  • Update documentation and use durable file formats
  • Ensure project team access after project completion

Step 9: Review and Update Your Data Management Plan

  • Regularly review data management during the project
  • Update your DMP if data handling changes