Full title: Use of NLP and Text Mining for health, nutrition, and food: Plan TL/BSC resources, components, corpora and use cases
Speakers: Antonio Miranda and Eulalia Farre
Affiliation: Barcelona Supercomputing Center, Spain
IberHeLT webpage: https://sites.google.com/campus.ul.pt...
IberHeLT proceedings: https://drive.google.com/file/d/130fI...
Presentation: https://docs.google.com/presentation/...
FONA corpus: https://doi.org/10.5281/zenodo.5500308
There is a pressing need for more efficient access to food- and nutrition-related information in health-related content through text mining and natural language processing technologies, not only for English but also for other languages such as Spanish. For instance, the recent COVID-19 pandemic caused noticeable changes in food consumption patterns, with potential effects on population health and wellbeing. Most previous food-related NLP applications targeted gastronomy (processing menus, ingredients, and recipes), with far less research on actual clinical and medical application scenarios and content types. The BSC Text Mining Unit, in the context of the Spanish National Plan for Advancement of Language Technologies (Plan TL), has characterized a set of key concepts and entity types relevant to health and clinical food language technology applications, such as food safety (food poisoning, contamination, allergies, food intolerance, food-drug interactions, food-borne diseases, and patient dietary records). This talk will summarize the resources, annotated data types, corpora, and components generated for medical data in Spanish, which could potentially be adapted to other languages and other application fields (veterinary medicine, agriculture, and environmental health). Particular emphasis will be placed on a novel annotated dataset for the extraction, recognition, and normalization of species mentions, one of the critical concept types for clinical food sciences, and on its use in a community evaluation shared task. We will also present the Spanish Food and Health corpus, a dataset annotated with species, diseases, and procedures, among other entity types, to foster the development of novel language technology applications.