Status: Open
Fund: EUR 8,000,000 (total funding available)
Duration: 2 years
Estimated Grants: 20
The DFG (German Research Foundation) is offering funding to support the development of data corpora for training artificial intelligence (AI). This initiative addresses the research community's needs, focusing on preparing and providing data as a foundation for AI advancement in research.
The primary objective is to support the establishment of high-quality, extensive data corpora. These corpora should serve as a robust foundation for the development and application of AI methods in research. The aim is to enable research beyond individual projects and improve the provision of scientific information.
The scope of funding includes the compilation and extension of data corpora for AI. Projects may also focus on the conception of selection and quality criteria, and the implementation of quality assurance measures. Additionally, the reuse, adaptation, and application of data cleansing, aggregation, annotation, curation, and harmonisation procedures are within the scope.
Funded projects should ensure that the compiled or extended data corpus is accessible through existing scientific information infrastructures. This includes adhering to established principles and standards such as FAIR and CARE, and using the National Research Data Infrastructure (NFDI).
Proposals must be submitted in English to facilitate international review. The structure must follow the guidelines in the Proposal Preparation Instructions – Project Proposals in the Area of Scientific Library Services and Information Systems (DFG form 12.01). Applicants should also consult the Guidelines and Supplementary Instructions (DFG form 12.14).
The project description should detail how the planned corpus will support AI development and application. It should also demonstrate the need for the proposed corpus, illustrating use cases that will be enabled through the data. A clear explanation of the corpus composition and justification for data selection is required.
Applicants must outline the criteria for assessing data quality, aligning with accepted standards and FAIR/CARE principles. The proposal should specify the desired data quality and format, detailing the methods to achieve the targeted standard. Long-term accessibility and curation plans for the corpus are essential.
Applicants can request funding for staff, direct project costs, and project-specific workshops. The maximum funding available is €400,000 per project. Funding for investments is not available. The total funding available under this call is up to €8 million.
Projects can run for a maximum of two years.
Applicants must confirm that no duplicate funding is involved.
The proposal submission deadline is 30 July 2025. Submission is exclusively via the elan portal. If you are a first-time applicant, registration in the elan portal is required by 23 July 2025.
An informal letter of intent is requested by 28 May 2025. All selected project owners will be required to attend a joint kick-off workshop in the first half of 2026.
A virtual information event will be held from 10:00 to 11:30 on 7 May 2025.
Applicants are encouraged to ensure that both researchers working in the subject area and infrastructure specialists are involved in their projects. If a researcher is not named as an applicant, the involvement of researchers on an advisory basis is recommended.
The compiled data corpus is to be published under a licence that allows free use for research purposes. The chosen licence must be specified in the project description. Availability and access to the corpus should be as open as possible for scientific users. If open access cannot be granted, this must be explained in detail in the proposal. In general, access modalities for users must be clearly described.
Applicants must also confirm that the following actions will be taken: key interim results will be published after the first year of the project; the data corpus will be made findable in both disciplinary and cross-disciplinary directories and registries; and the corpus will be documented according to recognised quality standards.
Natural Language Processing (NLP): Development of corpora for training language models, sentiment analysis, and machine translation.
Computer Vision: Creation of image and video datasets for object recognition, image segmentation, and video understanding.
Bioinformatics: Construction of datasets for genomics, proteomics, and drug discovery.
Social Sciences: Compilation of datasets for social network analysis, opinion mining, and behavioural studies.
Environmental Science: Development of datasets for climate modelling, pollution monitoring, and biodiversity analysis.