An automatic information extraction system for scientific articles on COVID-19 — ScienceDaily

The global biohealth research community is working hard to generate knowledge about COVID-19 and SARS-CoV-2. In practice, this effort results in a massive and very rapid production of scientific publications, which makes it difficult to consult and analyze all the information. This is why experts and decision-making bodies must be equipped with information systems enabling them to acquire the knowledge they need.

This is precisely what the VIGICOVID project has explored. It was carried out by researchers from the HiTZ Center of the UPV/EHU, the NLP & IR group of the UNED and the Artificial Intelligence and Language Technologies Unit of Elhuyar, with funding from the Fondo Supera COVID-19 granted by the CRUE. In the study, coordinated by the UNED research group, they created a prototype that extracts information, through natural language questions and answers, from an updated set of scientific articles on COVID-19 and SARS-CoV-2 published by the global research community.

“The paradigm of information retrieval is changing thanks to artificial intelligence,” said Eneko Agirre, head of the HiTZ center at UPV/EHU. “Until now, when searching for information on the Internet, a question was entered and the answer had to be searched for in the documents displayed by the system. However, in accordance with the new paradigm, systems that provide the answer directly without any need to read the entire document are becoming more widespread.”

In this system, “the user does not request information using keywords, but directly asks a question,” explained Xabier Saralegi, a researcher at Elhuyar. The system searches for answers to this question in two stages: “First, it retrieves documents that may contain the answer to the question asked, using a technology that combines keywords with direct questions. This is why we explored neural architectures,” added Dr. Saralegi. Deep neural architectures trained on examples were used: “This means that search models and question-answer models are trained using deep machine learning.”

Once the candidate documents have been retrieved, they are processed again by a question-answering system in order to obtain precise answers: “We built the engine that answers the questions; when the engine receives a question and a document, it is able to detect whether or not the answer is in the document, and if so, it tells us exactly where it is,” explained Dr. Agirre.
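The two stages described by the researchers can be illustrated with a minimal sketch. It is not the VIGICOVID implementation: the prototype relies on trained deep neural models, whereas this example substitutes simple word-overlap scoring so that it runs with the Python standard library alone. All function names, the stopword list and the `min_overlap` threshold are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"what", "was", "the", "of", "is", "a", "an", "in", "how",
             "which", "does", "do", "are", "to", "and", "by", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_terms(question):
    """Question tokens with stopwords removed."""
    return [t for t in tokenize(question) if t not in STOPWORDS]


def retrieve(question, documents, top_k=2):
    """Stage 1: return indices of documents likely to contain the answer.

    Lexical stand-in for the neural retriever: documents are ranked by a
    TF-IDF-style overlap with the question terms.
    """
    doc_tokens = [tokenize(d) for d in documents]
    df = Counter()
    for toks in doc_tokens:
        df.update(set(toks))
    n = len(documents)
    q_terms = set(content_terms(question))
    scored = []
    for idx, toks in enumerate(doc_tokens):
        tf = Counter(toks)
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in q_terms if t in tf)
        if score > 0:
            scored.append((score, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


def read(question, document, min_overlap=2):
    """Stage 2: decide whether the document answers the question.

    Returns (start, end) character offsets of the best-matching sentence,
    or None when no sentence shares at least min_overlap question terms.
    """
    q_terms = set(content_terms(question))
    best = None
    for m in re.finditer(r"[^.!?]+[.!?]?", document):
        sentence = m.group()
        overlap = len(q_terms & set(tokenize(sentence)))
        if overlap >= min_overlap and (best is None or overlap > best[0]):
            lead = len(sentence) - len(sentence.lstrip())
            best = (overlap, m.start() + lead, m.end())
    return None if best is None else (best[1], best[2])


def answer(question, documents, top_k=2):
    """Run both stages and collect (doc index, start, end, text) tuples."""
    results = []
    for idx in retrieve(question, documents, top_k):
        span = read(question, documents[idx])
        if span is not None:
            start, end = span
            results.append((idx, start, end, documents[idx][start:end]))
    return results


if __name__ == "__main__":
    # Invented toy documents, not taken from the project's article collection.
    docs = [
        "Masks reduce droplet spread. The study measured aerosol transmission indoors.",
        "The vaccine trial enrolled adults. Fever was the most common side effect of the vaccine.",
        "Hospital capacity varied by region.",
    ]
    for hit in answer("What was the most common side effect of the vaccine?", docs):
        print(hit)
```

The split mirrors the description above: the first stage narrows a large collection to a few candidates, and the second stage either locates the answer inside a candidate or reports that it is absent by returning `None`.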

An easily marketable prototype

The researchers are satisfied with the results of their research: “Among the techniques and evaluations that we analyzed in our experiments, we retained those that give the prototype the best results,” said the Elhuyar researcher. A solid technological base has been established and several scientific articles on the subject have been published. “We have found another way to perform searches whenever information is urgently needed, which facilitates the process of using the information. At the research level, we have shown that the proposed technology works and that the system gives good results,” Agirre stressed.

“Our result is a prototype of a basic research project. It is not a commercial product,” Saralegi stressed. But such prototypes can be scaled up easily in a short time, which means they can be commercialized and made available to society. The researchers point out that artificial intelligence makes it possible to provide increasingly powerful tools for working with large document collections. “We are progressing very quickly in this field. Moreover, everything that is studied can easily reach the market,” concluded the UPV/EHU researcher.

Source of the story:

Material provided by University of the Basque Country. Note: Content may be edited for style and length.
