Closed

Research Grant

Funding for Development of Data Corpora for AI

Funded by:

German Research Foundation (DFG)

Grant Amount

EUR 400,000 (research grant per project)

EUR 8,000,000 (total funding available)

Deadline

Jul 29, 2025

Expired

Est. Awards

20 Grants

Subjects

Computer Science, Social Sciences

Description

The DFG (German Research Foundation) is offering funding to support the development of data corpora for training artificial intelligence (AI). This initiative addresses the research community's needs, focusing on preparing and providing data as a foundation for AI advancement in research.

Objectives and Scope

The primary objective is to support the establishment of high-quality, extensive data corpora. These corpora should serve as a robust foundation for the development and application of AI methods in research. The aim is to enable research beyond individual projects and improve the provision of scientific information.

The scope of funding includes the compilation and extension of data corpora for AI. Projects may also focus on the conception of selection and quality criteria, and the implementation of quality assurance measures. Additionally, the reuse, adaptation, and application of data cleansing, aggregation, annotation, curation, and harmonisation procedures are within the scope.

Funded projects should ensure that the compiled or extended data corpus is accessible through existing scientific information infrastructures. This includes adhering to established principles and standards such as FAIR and CARE, and using the National Research Data Infrastructure (NFDI).

Proposal Requirements

Proposals must be submitted in English to facilitate international review. The structure must follow the guidelines in the Proposal Preparation Instructions – Project Proposals in the Area of Scientific Library Services and Information Systems (DFG form 12.01). Applicants should also consult the Guidelines and Supplementary Instructions (DFG form 12.14).

The project description should detail how the planned corpus will support AI development and application. It should also demonstrate the need for the proposed corpus, illustrating use cases that will be enabled through the data. A clear explanation of the corpus composition and justification for data selection is required.

Applicants must outline the criteria for assessing data quality, aligning with accepted standards and FAIR/CARE principles. The proposal should specify the desired data quality and format, detailing the methods to achieve the targeted standard. Long-term accessibility and curation plans for the corpus are essential.

Funding Details

Applicants can request funding for staff, direct project costs, and project-specific workshops. The maximum funding available is €400,000 per project. Funding for investments is not available. The total funding available under this call is up to €8 million.

Projects can run for a maximum of two years. The compiled data corpus must be published under a licence that allows free use for research purposes. Availability and access to the corpus should be as open as possible for scientific users.

Key interim results should be published after the first year. The data corpus must be made findable in disciplinary and cross-disciplinary directories. The corpus should be documented according to recognised quality standards. Applicants must confirm that no duplicate funding is involved.

Important Dates and Submission

The proposal submission deadline is 30 July 2025. Submission is exclusively via the elan portal. If you are a first-time applicant, registration in the elan portal is required by 23 July 2025.

An informal letter of intent is requested by 28 May 2025. All selected project owners will be required to attend a joint kick-off workshop in the first half of 2026.

A virtual information event will be held from 10:00 to 11:30 on 7 May 2025.

Eligibility and Licensing

Applicants are encouraged to ensure that both researchers working in the subject area and infrastructure specialists are involved in their projects. If a researcher is not named as an applicant, the involvement of researchers on an advisory basis is recommended.

The compiled data corpus is to be published under a licence that allows free use for research purposes. The chosen licence must be specified in the project description. Availability and access to the corpus should be as open as possible for scientific users. If open access cannot be granted, this must be explained in detail in the proposal. In general, access modalities for users must be clearly described.

Please confirm that the following actions will be taken: Key interim results will be published after the first year of the project. The data corpus will be made findable in both disciplinary and cross-disciplinary directories, registries and the like. The corpus will be documented according to recognised quality standards.
