CN116910254A - Method, device, computer equipment and storage medium for generating research report - Google Patents


Info

Publication number
CN116910254A
Authority
CN
China
Prior art keywords
medical
literature
information
document
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310868212.0A
Other languages
Chinese (zh)
Inventor
周立运
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rubik's Cube Medical Technology Suzhou Co ltd
Original Assignee
Rubik's Cube Medical Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The application discloses a method, a device, computer equipment and a storage medium for generating a research report. The method comprises the following steps: acquiring a first medical literature set of a target medical field within a preset time period; extracting structured information from target content of the first medical literature set to obtain structured literature information of the first medical literature set; and generating a research report of the target medical field according to the structured literature information. Because the embodiment of the application extracts structured information from the target content of each document in the literature set of the target field for the preset time period and generates the research report of the target medical field automatically, rather than relying on manual reading and comprehension of the medical literature as in the prior art, it can reduce the workload of generating research reports in the medical field and improve the efficiency of generating them.

Description

Method, device, computer equipment and storage medium for generating research report
Technical Field
The present application relates to the field of report generation technologies, and in particular, to a method and apparatus for generating a research report, a computer device, and a storage medium.
Background
In the medical field, collecting and sorting out research reports is an extremely important activity, as it is able to provide medical institutions, doctors, scientists and patients with detailed information about new technologies, new products and new treatment methods, thereby helping to make better medical decisions and provide better medical services to patients.
Research reports in the medical field place high demands on professional expertise. Experts in the relevant medical field usually have to read a large volume of literature and manually perform topic screening, content comprehension, information extraction, and typesetting before the final report takes shape. This process consumes a great deal of expert time and is inefficient.
Disclosure of Invention
The application provides a method, a device, computer equipment and a storage medium for generating a research report, which can automatically generate the research report in the target medical field, reduce the workload of generating the research report in the medical field and improve the efficiency of generating the research report.
In one aspect, the present application provides a method for generating a research report, the method comprising:
acquiring a first medical literature set of a target medical field within a preset time period;
extracting structured information from the target content of the first medical literature set to obtain structured literature information of the first medical literature set;
and generating a research report of the target medical field according to the structured literature information.
In some embodiments of the present application, extracting structured information from the target content of the first medical literature set to obtain structured literature information of the first medical literature set includes:
taking each medical document in the first medical document set as a target medical document, and performing data preprocessing on target content in the target medical document to obtain preprocessed document information;
carrying out structured information extraction on the preprocessed literature information to obtain structured literature information of the target medical literature;
and after structured information extraction has been completed for each medical document in the first medical document set, obtaining the structured literature information of the first medical document set.
In other embodiments of the present application, the pre-processed document information is text data;
the step of extracting the structural information of the preprocessed literature information to obtain the structural literature information of the target medical literature comprises the following steps:
performing text labeling on the preprocessed literature information to identify medical named entities in the preprocessed literature information;
linking the medical named entities to medical entities in a preset knowledge base to obtain entity-linked medical literature information;
and generating the structured literature information of the target medical literature according to the entity-linked medical literature information.
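The labeling and linking steps above can be illustrated with a minimal sketch. The text does not specify the labeling model or the knowledge base at this point, so the sketch substitutes simple dictionary matching for the recognizer; the `KNOWLEDGE_BASE` contents, the `KB:` identifiers, and the function names are all hypothetical.

```python
import re

# Hypothetical knowledge base: entity id -> (canonical name, type, aliases).
KNOWLEDGE_BASE = {
    "KB:0001": ("non-small cell lung cancer", "disease", ["NSCLC"]),
    "KB:0002": ("osimertinib", "drug", ["Tagrisso"]),
}


def build_alias_index(kb):
    """Map every lower-cased surface form (name or alias) to its entity id."""
    index = {}
    for entity_id, (name, _etype, aliases) in kb.items():
        for surface in [name, *aliases]:
            index[surface.lower()] = entity_id
    return index


def label_and_link(text, kb=KNOWLEDGE_BASE):
    """Find known surface forms in the text and link each to a KB entity.

    Dictionary matching stands in for a trained recognizer (an assumption).
    """
    index = build_alias_index(kb)
    # Longest surface forms first, so longer names win over shorter ones.
    surfaces = sorted(index, key=len, reverse=True)
    pattern = "|".join(re.escape(s) for s in surfaces)
    mentions = []
    for match in re.finditer(rf"\b(?:{pattern})\b", text, flags=re.IGNORECASE):
        entity_id = index[match.group(0).lower()]
        name, etype, _aliases = kb[entity_id]
        mentions.append({
            "mention": match.group(0),
            "start": match.start(),
            "end": match.end(),
            "entity_id": entity_id,
            "canonical": name,
            "type": etype,
        })
    return mentions
```

Calling `label_and_link("Tagrisso improved survival in NSCLC patients.")` yields two mentions, each carrying its character span and the linked knowledge-base entry. A production system would likely replace the dictionary matcher with a trained sequence labeler while keeping the same output shape.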
In other embodiments of the present application, the generating the structured literature information of the target medical literature based on the literature information after the entity linking includes:
converting the entity-linked medical literature information into table form to obtain tabulated medical literature information;
and performing data post-processing on the tabulated medical literature information to obtain the structured literature information of the target medical literature.
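As a rough illustration of the tabulation and post-processing steps, the sketch below turns linked entity mentions into rows and then cleans them. The text does not define what the data post-processing consists of, so whitespace trimming, lower-casing, de-duplication, and sorting are assumptions, as are the column names and function names.

```python
import csv
import io

COLUMNS = ["doc_id", "type", "entity"]  # hypothetical table layout


def to_table(doc_id, linked_mentions):
    """Convert linked mentions of one document into table rows."""
    return [
        {"doc_id": doc_id, "type": m["type"], "entity": m["canonical"]}
        for m in linked_mentions
    ]


def post_process(rows):
    """Assumed post-processing: trim, lower-case entities, drop duplicates, sort."""
    seen = set()
    cleaned = []
    for row in rows:
        normalized = {col: str(row[col]).strip() for col in COLUMNS}
        normalized["entity"] = normalized["entity"].lower()
        key = tuple(normalized[col] for col in COLUMNS)
        if key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return sorted(cleaned, key=lambda r: (r["type"], r["entity"]))


def to_csv(rows):
    """Serialize the cleaned rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping tabulation and post-processing as separate functions mirrors the two steps in the text and lets either be swapped out independently.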
In other embodiments of the present application, the acquiring a first medical document set of the target medical field for a preset period of time includes:
acquiring an initial medical literature set of a target medical field in a preset time period;
screening the initial medical literature set according to a first preset literature attribute to obtain a second medical literature set;
classifying the second medical literature set to obtain a third medical literature set;
a first medical document set of the target medical field is extracted from the third medical document set.
In other embodiments of the present application, the classifying the second medical document set to obtain a third medical document set includes:
classifying the second medical literature set according to second preset literature attributes to obtain a fourth medical literature set;
respectively labeling and classifying the fourth medical document set according to a preset label to obtain a labeling and classifying result of a fifth medical document set;
and determining a third medical literature set in the fifth medical literature set according to the labeling classification result.
In other embodiments of the present application, the determining a third medical document set from the fifth medical document set according to the labeling classification result includes:
sending the labeling classification results of the fifth medical document set to the review user terminals of the corresponding categories, so that each review user terminal can confirm the accuracy of the labeling classification results;
obtaining a classification feedback result fed back by the review user terminal, and updating the labeling classification result according to the classification feedback result;
and taking the fifth medical literature set with the updated classification result as a third medical literature set.
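The review loop described above can be sketched as two small functions: one that routes labeled documents to the reviewer responsible for each category, and one that applies the feedback. The data shapes (a mapping from document id to label, and feedback that is either a corrected label or `None` for "confirmed") are assumptions made for illustration.

```python
def route_for_review(labeled_docs, reviewers_by_label):
    """Group labeled documents by the reviewer responsible for each label."""
    queues = {}
    for doc_id, label in labeled_docs.items():
        reviewer = reviewers_by_label.get(label, "unassigned")
        queues.setdefault(reviewer, []).append(doc_id)
    return queues


def apply_feedback(labeled_docs, feedback):
    """Return updated labels.

    feedback maps doc_id -> corrected label, or None when the reviewer
    confirms the existing label. Unknown document ids are ignored.
    """
    updated = dict(labeled_docs)  # leave the input untouched
    for doc_id, corrected in feedback.items():
        if doc_id in updated and corrected is not None:
            updated[doc_id] = corrected
    return updated
```

Returning a new mapping rather than mutating the input keeps the pre-review labels available for auditing, which may be useful when several reviewers respond at different times.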
On the other hand, the application also provides a device for generating the research report, which comprises:
the document acquisition module is used for acquiring a first medical document set of the target medical field in a preset time period;
the information extraction module is used for extracting structured information from the target content of the first medical literature set to obtain structured literature information of the first medical literature set;
and the report generation module is used for generating a research report of the target medical field according to the structured literature information.
In some embodiments of the present application, the information extraction module is specifically configured to:
taking each medical document in the first medical document set as a target medical document, and performing data preprocessing on target contents in the target medical document to obtain preprocessed document information;
carrying out structured information extraction on the preprocessed literature information to obtain structured literature information of the target medical literature;
and after structured information extraction has been completed for each medical document in the first medical document set, obtaining the structured literature information of the first medical document set.
In other embodiments of the present application, the pre-processed document information is text data;
the information extraction module is specifically configured to: perform text labeling on the preprocessed literature information to identify medical named entities in the preprocessed literature information; link the medical named entities to medical entities in a preset knowledge base to obtain entity-linked medical literature information; and generate the structured literature information of the target medical literature according to the entity-linked medical literature information.
In other embodiments of the present application, the information extraction module is specifically configured to: convert the entity-linked medical literature information into table form to obtain tabulated medical literature information; and perform data post-processing on the tabulated medical literature information to obtain the structured literature information of the target medical literature.
In other embodiments of the application, the document acquisition module is specifically configured to:
acquiring an initial medical literature set of a target medical field in a preset time period;
screening the initial medical literature set according to a first preset literature attribute to obtain a second medical literature set;
classifying the second medical literature set to obtain a third medical literature set;
a first medical document set of the target medical field is extracted from the third medical document set.
In other embodiments of the application, the document acquisition module is specifically configured to:
classifying the second medical literature set according to second preset literature attributes to obtain a fourth medical literature set;
respectively labeling and classifying the fourth medical document set according to a preset label to obtain a labeling and classifying result of a fifth medical document set;
and determining a third medical literature set in the fifth medical literature set according to the labeling classification result.
In other embodiments of the application, the document acquisition module is specifically configured to:
sending the labeling classification results of the fifth medical document set to the review user terminals of the corresponding categories, so that each review user terminal can confirm the accuracy of the labeling classification results;
obtaining a classification feedback result fed back by the review user terminal, and updating the labeling classification result according to the classification feedback result;
and taking the fifth medical literature set with the updated classification result as a third medical literature set.
In another aspect, the present application also provides a computer device, which includes a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for generating a study report as described in any implementation of the first aspect when executing the computer program.
In another aspect, the application also provides a computer readable storage medium having stored thereon a computer program that is loaded by a processor to perform the steps of the method for generating a research report as described in any of the embodiments of the first aspect.
Compared with the prior art, in the embodiment of the application, a first medical literature set of the target medical field within the preset time period is acquired; structured information is extracted from the target content of the first medical literature set to obtain structured literature information of the first medical literature set; and a research report of the target medical field is generated according to the structured literature information. Because the embodiment of the application extracts structured information from the target content of each document in the literature set of the target field for the preset time period and generates the research report of the target medical field automatically, rather than relying on manual reading and comprehension of the medical literature as in the prior art, it can reduce the workload of generating research reports in the medical field and improve the efficiency of generating them.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a scenario of a research report generation system provided by an embodiment of the present application;
FIG. 2 is a flow diagram of one embodiment of a method of generating a research report provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of the present application after labeling documents according to preset labels;
FIG. 4 is a schematic flow chart of an embodiment of the present application for extracting structured document information from target content of a first medical document set;
FIG. 5 is a schematic structural diagram of a device for generating a research report according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a computer device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a mobile phone according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present application.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
In the description that follows, embodiments of the application are described with reference to steps and symbolic representations of operations performed by one or more computers, unless otherwise indicated. These steps and operations, which are at times referred to as being computer-executed, include the manipulation by the computer's processing unit of electronic signals that represent data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the format of the data. However, although the principles of the present application are described in the foregoing context, this is not meant to be limiting, and one skilled in the art will recognize that the various steps and operations described below may also be implemented in hardware.
The term "module" or "unit" as used herein may be considered a software object executing on the computing system. The various components, modules, engines, and services described herein may be viewed as implementing objects on the computing system. The apparatus and methods described herein are preferably implemented in software, but may of course also be implemented in hardware, all within the scope of the application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
In the medical arts, collecting and organizing research reports is an extremely important activity because it can provide medical institutions, doctors, scientists, and patients with detailed information about new technologies, new products, and new treatment methods, thereby helping to make better medical decisions and provide better medical services to patients.
Currently, audiences have a wide variety of needs for medical research reports. One such audience has three specific needs: 1. collecting and organizing the important (influential) literature on a given topic; 2. understanding the gist of that literature; 3. presenting the essence of each article in a logically rigorous and concise way (often as charts). To meet these needs, experts in the relevant field generally have to read a large number of documents from many sources and perform topic screening, content comprehension, information extraction, and typesetting on them before a research report that meets the requirements takes shape. This process inevitably consumes a great deal of expert time and cost and is inefficient.
Therefore, in order to improve the efficiency of generating medical research reports and to produce monthly medical research reports that meet the needs of the audience, the embodiment of the application provides a method, a device, a computer device and a storage medium for generating the research report.
Referring to fig. 1, fig. 1 is a schematic view of a report generating system according to an embodiment of the present application, where the report generating system may include a computer device 100, and the computer device 100 is connected through a network, and a generating apparatus of a research report is integrated in the computer device 100. In the embodiment of the present application, the computer device 100 may be a terminal device or a server.
In the embodiment of the present application, in the case where the computer device 100 is a server, the server may be an independent server, or may be a server network or a server cluster formed by servers. For example, the server described in the embodiment of the present application includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud server formed by multiple servers, where the cloud server is composed of a large number of computers or network servers based on cloud computing. In embodiments of the present application, communication between the server and the client may be accomplished by any means of communication, including, but not limited to, mobile communication based on the 3rd Generation Partnership Project (3GPP), Long Term Evolution (LTE), or Worldwide Interoperability for Microwave Access (WiMAX), or computer network communication based on the TCP/IP protocol suite, the User Datagram Protocol (UDP), and the like.
It will be appreciated that when the computer device 100 used in the embodiments of the present application is a terminal device, the terminal device may be a device that includes both receiving hardware and transmitting hardware, i.e., a device having receiving and transmitting hardware capable of performing bi-directional communications over a bi-directional communication link. Such a terminal device may include: a cellular or other communication device having a single-line display or a multi-line display or a cellular or other communication device without a multi-line display. The computer device 100 may be a desktop terminal or a mobile terminal, and the computer device 100 may be one of a mobile phone, a tablet computer, a notebook computer, and the like.
The terminal device according to the embodiment of the present application may also be a device that provides voice and/or data connectivity to a user, a handheld device with a wireless connection function, or another processing device connected to a wireless modem. Examples include mobile telephones (or "cellular" telephones) and computers with mobile terminals, which may be portable, pocket-sized, hand-held, computer built-in, or vehicle-mounted mobile devices that exchange voice and/or data with a radio access network. Further examples include Personal Communication Service (PCS) telephones, cordless telephones, Session Initiation Protocol (SIP) phones, Wireless Local Loop (WLL) stations, and Personal Digital Assistants (PDAs).
It will be appreciated by those skilled in the art that the application environment shown in fig. 1 is merely one application scenario of the present application and does not limit it. Other application environments may include more or fewer computer devices than shown in fig. 1, or different network connections between computer devices. For example, only one computer device is shown in fig. 1, but the report generating system may also include one or more other computer devices, and/or one or more other computer devices connected to the computer device 100 through a network, which is not limited herein.
In addition, as shown in fig. 1, the report generating system may further include a memory 200 for storing data, such as acquired medical document data, and classification results, labeling results, etc. of medical documents.
It should be noted that, the schematic view of the scenario of the report generating system shown in fig. 1 is only an example, and the report generating system and scenario described in the embodiment of the present application are for more clearly describing the technical solution of the embodiment of the present application, and do not constitute a limitation on the technical solution provided by the embodiment of the present application, and those skilled in the art can know that, with the evolution of the report generating system and the appearance of a new service scenario, the technical solution provided by the embodiment of the present application is equally applicable to similar technical problems.
The scheme provided by the embodiment of the application relates to technologies such as artificial intelligence (AI), computer vision (CV), and machine learning (ML), and is described in detail through the following embodiments:
AI is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making.
AI technology is a comprehensive discipline, and relates to a wide range of technologies, both hardware and software. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
CV is the science of how to make a machine "see". More specifically, it refers to using cameras and computers in place of human eyes to recognize, track, and measure targets, and to further perform graphics processing so that the computer produces images better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include anti-disturbance generation, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
The following describes in detail specific embodiments.
In the present embodiment, description will be made from the viewpoint of a study report generating apparatus, which may be integrated in the computer device 100 in particular.
The application provides a method for generating a research report, which comprises the following steps: acquiring a first medical literature set of a target medical field within a preset time period; extracting structured information from target content of the first medical literature set to obtain structured literature information of the first medical literature set; and generating a research report of the target medical field according to the structured literature information.
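The three steps of the method can be pictured as a small pipeline. The sketch below is only a skeleton under stated assumptions: the `MedicalDocument` fields, the function names, the first-sentence "summary" used as a stand-in for structured extraction, and the plain-text report layout are all hypothetical and not taken from the application.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MedicalDocument:
    title: str
    field: str        # e.g. "lung cancer"
    published: date
    content: str      # the "target content", e.g. the abstract


def acquire_documents(corpus, field, start, end):
    """Step 1: keep documents of the target field published within [start, end]."""
    return [d for d in corpus if d.field == field and start <= d.published <= end]


def extract_structured_info(doc):
    """Step 2 (placeholder): reduce the target content to a flat record."""
    first_sentence = doc.content.split(".")[0].strip()
    return {
        "title": doc.title,
        "published": doc.published.isoformat(),
        "summary": first_sentence,
    }


def generate_report(field, records):
    """Step 3: render the structured records as a plain-text report."""
    lines = [f"Research report: {field} (documents: {len(records)})"]
    for r in records:
        lines.append(f"- {r['published']} | {r['title']} | {r['summary']}")
    return "\n".join(lines)
```

A caller would chain the three functions: acquire the documents for a field and date range, map `extract_structured_info` over them, and pass the resulting records to `generate_report`.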
Referring to fig. 2, a flowchart of an embodiment of a method for generating a research report according to an embodiment of the present application includes the following steps 201 to 203:
201. Acquire a first medical literature set of a target medical field within a preset time period.
The preset time period may be a week, half a month, a quarter, half a year, or the like within a given year. For example, the preset time period may be the year 2022, October 2022, or the first quarter of 2022. It can be understood that the preset time period may also be set flexibly according to the actual application scenario, for example the 10 days from 2022/10/1 to 2022/10/10, which is not specifically limited herein.
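One way to represent such preset time periods is as an inclusive pair of start and end dates. The helper names below are hypothetical; the sketch only shows how the month, quarter, and custom-range examples from the text could be expressed and checked.

```python
import calendar
from datetime import date


def month_period(year, month):
    """Inclusive (start, end) dates of a calendar month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def quarter_period(year, quarter):
    """Inclusive (start, end) dates of a quarter, with quarter in 1..4."""
    first_month = 3 * (quarter - 1) + 1
    start = date(year, first_month, 1)
    _, end = month_period(year, first_month + 2)
    return start, end


def in_period(published, period):
    """True if a publication date falls inside the inclusive period."""
    start, end = period
    return start <= published <= end


# A flexible custom range, matching the 2022/10/1 to 2022/10/10 example.
CUSTOM_PERIOD = (date(2022, 10, 1), date(2022, 10, 10))
```

Treating every period as the same `(start, end)` pair means the document filter does not need to know whether the period was configured as a month, a quarter, or an arbitrary range.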
In the embodiment of the present application, the target medical field may be preset, and the target medical field may be a medical field classified according to a disease classification, for example, lung cancer, breast cancer, gastroesophageal tumor, urinary tumor, leukemia, lymphoma, or the like, or a medical field classified according to a medical science department or medical academy, for example, a digestive department, anorectal department, dentistry, hematology, or the like, which is not particularly limited herein.
It should be noted that, in the above examples, the target medical field includes only one medical field, for example, the target medical field includes a medical field corresponding to lung cancer, and the first medical document set includes at least one lung cancer related document. It will be appreciated that in an actual application scenario, the target medical field may also include a plurality of medical fields, where medical documents of the plurality of medical fields are acquired to generate a comprehensive study report of the plurality of medical fields, which is not limited herein.
In some embodiments of the present application, the first medical document set may be the initial medical document set as directly acquired. In other embodiments, to improve the efficiency of subsequent report generation, the initial medical document set may first be screened and classified; that is, the first medical document set may also be a medical document set obtained by screening and classifying the acquired initial medical documents, which is not specifically limited herein.
Where the first medical document set is obtained by screening and classifying the acquired initial medical documents, acquiring the first medical document set of the target medical field in step 201 may further include: acquiring an initial medical literature set of the target medical field within the preset time period; screening the initial medical literature set according to a first preset literature attribute to obtain a second medical literature set; classifying the second medical literature set to obtain a third medical literature set; and extracting the first medical document set of the target medical field from the third medical document set.
In the embodiment of the application, the collection of the initial medical literature set may center on the target disease field and focus on the following important developments: 1) regulatory review information, such as for new drugs and breakthrough therapies; 2) important clinical results published in press releases; 3) literature such as important clinical trials, basic research, and reviews indexed in PubMed (an Internet biomedical literature retrieval system); 4) updates to important guidelines of relevant societies; 5) new clinical trials started in the current month. In this way the initial medical literature set stays close to the latest directions of medical research, so the acquired medical literature is more targeted.
In the embodiment of the application, generating a medical field research report that meets the needs of its audience requires collecting a large number of medical documents as the data source for the report. The embodiment of the application therefore proposes that medical documents of the relevant field may be acquired, periodically or aperiodically and in a compliant manner (such as platform authorization), from major literature platforms, major medical conference platforms, and the like, to obtain the initial medical document set. The medical documents in the initial medical document set may include the following attributes: journal name, journal impact factor, publication time, document title, document abstract, document authors, document body, and the like.
According to the embodiment of the application, after a certain number of medical documents have been collected, first-level document screening is performed on the initial medical document set using a first preset document attribute according to preset conditions. The first preset document attribute includes at least one document attribute. When it includes a plurality of document attributes, screening may be performed on those attributes in sequence, and the preset conditions may be the order in which the attributes are applied. For example, when the first preset document attribute includes "publication time" and "journal impact factor" (IF), documents published in the current month are first selected based on "publication time", and documents with an IF greater than 10 are then selected based on "journal impact factor"; the selected documents may be collected into a "document set A" (namely, the second medical document set).
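The sequential screening described above can be sketched as follows. This is only an illustrative sketch: the `Document` record and the `screen_documents` function are hypothetical names, and the default threshold of 10 is taken from the example in the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    title: str
    journal: str
    impact_factor: float
    published: date


def screen_documents(docs, start, end, min_if=10.0):
    """Apply the two screening attributes in sequence.

    First keep documents published within [start, end], then keep
    those whose journal impact factor is greater than min_if.
    """
    in_period = [d for d in docs if start <= d.published <= end]
    return [d for d in in_period if d.impact_factor > min_if]
```

The returned list plays the role of "document set A"; changing the order of the two filters gives the same result here, but the order matters once a filter is expensive or depends on an earlier one.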
In other embodiments of the present application, classifying the second medical document set into a third medical document set may include: classifying the second medical literature set according to the second preset literature attribute to obtain a fourth medical literature set; respectively labeling and classifying the fourth medical document set according to the preset label to obtain a labeling and classifying result of the fifth medical document set; and determining a third medical literature set in the fifth medical literature set according to the labeling classification result of the fifth medical literature set.
For the second medical document set (such as document set a) obtained by the first-level document screening, the documents may be further classified according to a second preset document attribute, so as to obtain a fourth medical document set, wherein the second preset document attribute is different from the first preset document attribute, and for example, the second preset document attribute may be a "journal name" and/or an "article title". In the embodiment of the application, the manner of classifying the second medical literature set according to the second preset literature attribute to obtain the fourth medical literature set may be to classify the literature by adopting a preset AI algorithm, wherein the classification is based on the attribute of journal name and/or article title, and the preset AI algorithm may be a deep learning algorithm of the existing text classification, such as Long Short-Term Memory (LSTM) or feedforward neural network.
For example, for the "document set a" obtained by the first-level document screening, document classification may be further performed, where the classification is based on a second preset attribute, such as "journal name" and/or "article title"; the classification category may be the medical field, and in particular, the literature study may be of the disease type, such as: lung cancer, breast cancer, gastroesophageal tumor, urinary tumor, leukemia, lymphoma, etc. As such, different medical fields will produce different "literature sets B" (i.e., a fourth medical literature set).
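To make the grouping into per-field "document sets B" concrete, the sketch below swaps in a plain keyword lookup for the deep learning text classifier named above. It is not the preset AI algorithm of the application; the keyword table `FIELD_KEYWORDS` and both function names are assumptions made only for illustration.

```python
# Hypothetical keyword table; a trained text classifier (for example an
# LSTM) would replace this lookup in the approach the text describes.
FIELD_KEYWORDS = {
    "lung cancer": ("lung", "nsclc"),
    "breast cancer": ("breast",),
    "lymphoma": ("lymphoma",),
}


def classify_field(journal, title):
    """Assign a medical field based on the journal name and article title."""
    text = f"{journal} {title}".lower()
    for field, keywords in FIELD_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return field
    return "unclassified"


def group_by_field(docs):
    """Group (journal, title) pairs into one document list per field."""
    groups = {}
    for journal, title in docs:
        groups.setdefault(classify_field(journal, title), []).append(title)
    return groups
```

Whatever classifier is used, the output shape is the same: one list of documents per medical field, which is then handed to the expert of that field for review.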
For the fourth medical document set (document set B) obtained by document classification, the embodiment of the application can check the classification result by the expert (corresponding checking user terminal) corresponding to the classification attribute to correct the accuracy of the document set B. Then, aiming at the documents in the document set B under each theme, the documents can be labeled and classified one by adopting a preset AI algorithm according to preset labels, and then the subdivision results can still be checked and adjusted by the expert corresponding to the theme.
For example, for the "document set B" obtained by document classification, the expert on the corresponding subject first reviews the classification result to correct the accuracy of the "document set B". Then, for the documents in the document set B under each topic, the documents can be labeled and classified one by one according to preset labels by adopting a preset AI algorithm, wherein the preset labels include, but are not limited to, at least one of the following: randomized controlled trials, non-randomized controlled trials, basic studies, reviews, meta-analyses, clinical guidelines, and the like. For example, referring to fig. 3, in a research report generated by the present application, documents are labeled with preset labels by the preset AI algorithm, so that the documents belonging to the same preset label (shown by their Chinese titles in fig. 3) are displayed grouped together.
In other embodiments of the present application, after the labeling classification result of the fifth medical document set is obtained, the subdivision classification result herein may still be subjected to review adjustment by the expert (corresponding review user terminal) of the corresponding classification.
Specifically, determining the third medical document set in the fifth medical document set according to the labeling classification result of the fifth medical document set may further include: respectively sending the labeling classification results of the fifth medical document set to the rechecking user terminals of the corresponding classifications, so that the rechecking user terminals can confirm the accuracy of the labeling classification results of the fifth medical document set; obtaining a classification feedback result fed back by the rechecking user terminal, and updating a labeling classification result of the fifth medical document set according to the classification feedback result fed back by the rechecking user terminal; and taking the fifth medical literature set with the updated classification result as a third medical literature set.
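The update step driven by reviewer feedback can be sketched as a simple merge. The function name `apply_review_feedback` and the use of document identifiers as dictionary keys are assumptions for illustration; the application does not specify a data format for the feedback.

```python
def apply_review_feedback(labels, feedback):
    """Return a copy of the labeling result with reviewer corrections applied.

    labels:   mapping of document id -> label assigned by the algorithm
    feedback: mapping of document id -> label corrected by the reviewer
    Documents the reviewer did not correct keep their original label.
    """
    updated = dict(labels)
    for doc_id, corrected_label in feedback.items():
        if doc_id in updated:
            updated[doc_id] = corrected_label
    return updated
```

Returning a new mapping rather than mutating the input keeps the algorithm's original output available for later comparison with the reviewed result.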
202. Structured information is extracted from the target content of the first medical literature set to obtain structured literature information of the first medical literature set.
The target content may be content with fixed document attributes in the medical document, and the target content may include content corresponding to a plurality of medical document attributes, for example, the target content may include one or more of content corresponding to journal names, journal influence factors, document publishing times, document titles, document summaries, document authors, document texts, and the like.
In the embodiment of the application, after the first medical literature set is acquired, structured information extraction may be performed on the target content in the medical literature set to obtain the structured literature information of the first medical literature set.
203. A research report of the target medical field is generated from the structured literature information of the first medical literature set.
When the preset time period is "week", the research report of the target medical field generated in step 203 is a research week report; when the preset time period is "month", the research report of the target medical field generated in step 203 is a research month report; when the preset time period is "quarter", the research report of the target medical field generated in step 203 is a research quarter report; when the preset time period is "year", the research report of the target medical field generated in step 203 is a research annual report, and so on.
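The correspondence between the period unit and the kind of report can be expressed as a small lookup. The names `REPORT_KINDS` and `report_title`, and the title wording, are hypothetical and serve only to illustrate the mapping described above.

```python
# Mapping from the unit of the preset time period to the kind of report.
REPORT_KINDS = {
    "week": "weekly",
    "month": "monthly",
    "quarter": "quarterly",
    "year": "annual",
}


def report_title(field, period_unit):
    """Build a report title for a medical field and a period unit."""
    kind = REPORT_KINDS.get(period_unit)
    if kind is None:
        # Flexibly set periods (for example a 10-day span) get a generic title.
        return f"{field} research report"
    return f"{field} {kind} research report"
```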
In the embodiment of the application, after the structured literature information of the first medical literature set is obtained, the structured literature information can be reviewed manually to confirm its accuracy.
After the accuracy has been confirmed by manual review, the research report of the target medical field may be generated from the structured literature information of the first medical literature set as a document in a preset text format (such as a Word document) according to a preset report template, for example a monthly research report. A specific implementation is as follows: (1) arranging the obtained structured literature information according to the table of contents; (2) displaying the Chinese and English titles of the original medical documents (translated by AI; titles in other languages may be provided according to publication requirements); (3) identifying whether each part of the content arranged in the report template contains preset key information and, if so, highlighting that key information; (4) typesetting related content and the links to its traceable source content close together (within a preset spacing) in the report template; (5) generating structured charts from the structured literature information to obtain the research report.
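Steps (1) to (4) of the report assembly can be sketched in plain text form as follows. This is a simplified sketch under stated assumptions: the record fields (`label`, `title_en`, `title_zh`, `summary`, `url`), the `**` highlight marker, and the plain-text output are all hypothetical, whereas the application describes a template-based document such as a Word file with charts.

```python
import re


def highlight(text, key_terms, marker="**"):
    """Wrap each occurrence of a preset key term in a highlight marker."""
    for term in key_terms:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        text = pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)
    return text


def build_report(records, key_terms):
    """Arrange records by label, show both titles, highlight key
    information, and keep each source link next to its content."""
    sections = {}
    for record in records:
        sections.setdefault(record["label"], []).append(record)
    lines = []
    for label in sorted(sections):
        lines.append(label)
        for record in sections[label]:
            lines.append(f"  {record['title_en']} / {record['title_zh']}")
            lines.append(f"  {highlight(record['summary'], key_terms)}")
            lines.append(f"  Source: {record['url']}")
    return "\n".join(lines)
```

Placing the source link on the line directly after the summary is one simple way to keep traceable content within a fixed distance of the text it supports.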
Similarly, the research report generated finally can be manually checked, that is, the generated research report file can be checked by the expert of the related subject, and the main content of the check can include a translation part and a preset part, which is not limited herein.
Compared with the prior art, in the embodiment of the application, the first medical literature set of the target medical field in the preset time period is acquired; extracting structural information from target contents of the first medical literature set to obtain structural literature information of the first medical literature set; and generating a research report of the target medical field according to the structured literature information. Because the embodiment of the application aims at the document set of the target field in the preset period, the structured information of the target content in each document is extracted, and the research report of the target medical field is automatically generated instead of manually reading and understanding the medical document to make the research report in the prior art, the embodiment of the application can reduce the workload of generating the research report of the medical field and improve the efficiency of generating the research report.
In the embodiment of the present application, there may be various ways of extracting structured information from the target content of the first medical document set, and as illustrated in fig. 4, in some embodiments of the present application, the step 202 of extracting structured information from the target content of the first medical document set to obtain structured document information may further include the following steps 401 to 403:
401. And respectively taking each medical document in the first medical document set as a target medical document, and preprocessing data of target contents in the target medical document to obtain preprocessed document information.
The data preprocessing may include removing junk data, cleaning the data, splitting the text into sentences, and the like for the target content in the target medical document.
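A minimal sketch of such preprocessing is shown below, assuming the junk data consists of markup remnants and irregular whitespace. The function name `preprocess` and the regular expressions are illustrative choices, not the preprocessing actually used by the application.

```python
import re


def preprocess(text):
    """Clean raw document text and split it into sentences."""
    # Remove markup remnants such as <p> or </sup>.
    text = re.sub(r"<[^>]+>", " ", text)
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [sentence for sentence in sentences if sentence]
```

This naive splitter also breaks after abbreviations such as "e.g."; a production system would likely use a dedicated sentence segmenter.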
402. And extracting the structural information of the preprocessed literature information to obtain the structural literature information of the target medical literature.
In other embodiments of the present application, the pre-processed document information may be text data. At this time, the structured information extraction is performed on the preprocessed literature information to obtain structured literature information of the target medical literature, and may further include: text labeling is carried out on the preprocessed literature information so as to identify medical naming entities in the preprocessed literature information; linking the medical named entity to a medical entity in a preset knowledge base to obtain medical literature information after entity linking; and generating structural literature information of the target medical literature according to the medical literature information after the entity is linked.
Named Entity (Named Entity) is a person's name, organization's name, place's name, and all other entities identified by name. The medical named entity is the entity identified by the medical name in the medical literature.
In the embodiment of the application, the text labeling of the preprocessed literature information to identify medical named entities may be performed with an entity recognition model, for example a trained named entity recognition (NER) model. Specifically, the text in the preprocessed literature information is labeled using the trained NER model, and the medical named entities in the preprocessed literature information are identified, such as PICOS (Participants: study subjects; Intervention: intervention measures; Comparison: control group; Outcome: study results; Study design: study design) and the following content: (1) drug information (drug name, dose, mode of administration, route of administration), (2) indications, (3) adverse reaction/event information, (4) SMQ of the adverse reaction, (5) report type/severity/causal relationship, (6) patient information (age, sex, age at the time of occurrence of the adverse reaction/event), (7) examination results, (8) diagnosis details, (9) ICSR number, reporting time of the report, and the like.
Further, after identifying the medical named entity in the preprocessed literature information, in the embodiment of the present application, the medical named entity may be linked to a medical entity in a preset knowledge base to obtain the medical literature information after the entity is linked, for example, the medical named entity identified by the NER model may be linked to a medical entity in the preset knowledge base, where the preset knowledge base includes but is not limited to: a medicine dictionary, an indication dictionary, etc. containing a plurality of standard fields are established in advance.
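Entity linking against pre-established dictionaries can be sketched as a normalized lookup. The dictionary contents and the identifier format (`DRUG:...`, `INDICATION:...`) below are illustrative assumptions; the application only states that the knowledge base contains drug and indication dictionaries with standard fields.

```python
# Illustrative dictionaries mapping surface forms to a standard entry.
DRUG_DICTIONARY = {
    "osimertinib": "DRUG:osimertinib",
    "tagrisso": "DRUG:osimertinib",
}
INDICATION_DICTIONARY = {
    "nsclc": "INDICATION:non-small cell lung cancer",
    "non-small cell lung cancer": "INDICATION:non-small cell lung cancer",
}


def link_entity(mention):
    """Link one recognized mention to a standard entry, or None if unknown."""
    key = mention.strip().lower()
    for dictionary in (DRUG_DICTIONARY, INDICATION_DICTIONARY):
        if key in dictionary:
            return dictionary[key]
    return None


def link_entities(mentions):
    """Link every mention produced by the recognition step."""
    return {mention: link_entity(mention) for mention in mentions}
```

Linking a brand name and a generic name to the same entry is what lets later steps merge rows that refer to the same drug.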
In other embodiments of the present application, the information after entity linking is converted into table form, which is convenient for the user to view and use. Specifically, generating the structured literature information of the target medical literature according to the literature information after entity linking may further include: converting the medical document information after entity linking into table form to obtain tabular medical document information; and carrying out data post-processing on the tabular medical document information to obtain the structured document information of the target medical document.
In the embodiment of the application, converting the entity-linked medical document information into table form to obtain the tabular medical document information may be accomplished using natural language processing (NLP) technology and machine learning algorithms, for example through an encoder-decoder model.
After the tabular medical document information is obtained, data post-processing may be performed on it to obtain the final structured document information of the target medical document. Specifically, the data post-processing may include one or more of de-duplication, screening, sorting, and the like, which further ensures the accuracy and completeness of the structured document information obtained for the target medical document.
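The post-processing steps named above (de-duplication, screening, sorting) can be sketched on rows represented as dictionaries. The column names `drug` and `indication` and the rule that rows with missing required fields are dropped are assumptions for illustration only.

```python
def postprocess(rows, required=("drug", "indication")):
    """De-duplicate, screen, and sort tabular rows.

    Rows missing any required field are dropped, exact duplicates are
    removed, and the remaining rows are sorted by the required fields.
    """
    seen = set()
    result = []
    for row in rows:
        if any(not row.get(field) for field in required):
            continue  # screening: incomplete row
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # de-duplication: identical row already kept
        seen.add(key)
        result.append(row)
    return sorted(result, key=lambda r: tuple(r[field] for field in required))
```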
403. After each medical document in the first medical document set completes the extraction of the structural information, the structural document information of the first medical document set is obtained.
In this embodiment, the target content in the medical document is preprocessed, so that more matched data can be extracted during the subsequent extraction of the structured information, the amount of useless data is reduced, and the structured data extraction efficiency of the document is improved.
In order to facilitate better implementation of the method for generating the research report provided by the embodiment of the application, the embodiment of the application also provides a device based on the method for generating the research report. Where the meaning of the terms is the same as in the method of generating the research report described above, specific implementation details may be referred to in the description of the method embodiments.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a study report generating apparatus according to an embodiment of the present application, where the study report generating apparatus 500 may include a document acquisition module 501, an information extraction module 502, and a report generating module 503, where:
a document acquisition module 501, configured to acquire a first medical document set of a target medical field within a preset time period;
the information extraction module 502 is configured to extract structured information from the target content of the first medical literature set, so as to obtain structured literature information of the first medical literature set;
And a report generating module 503, configured to generate a study report of the target medical field according to the structured document information.
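The cooperation of the three modules can be sketched as a small class that chains them. The class name `ResearchReportGenerator` and the idea of passing each module in as a callable are assumptions made for illustration; the application does not prescribe how modules 501 to 503 are implemented.

```python
class ResearchReportGenerator:
    """Chains document acquisition, information extraction, and report generation."""

    def __init__(self, acquire, extract, generate):
        self.acquire = acquire    # role of document acquisition module 501
        self.extract = extract    # role of information extraction module 502
        self.generate = generate  # role of report generation module 503

    def run(self, field, period):
        """Produce a report for one medical field and one time period."""
        documents = self.acquire(field, period)
        structured = [self.extract(document) for document in documents]
        return self.generate(structured)
```

Keeping the modules as interchangeable callables makes it straightforward to replace, for example, the extraction step without touching acquisition or report generation.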
In some embodiments of the present application, the information extraction module 502 is specifically configured to:
taking each medical document in the first medical document set as a target medical document, and performing data preprocessing on target contents in the target medical document to obtain preprocessed document information;
carrying out structured information extraction on the preprocessed literature information to obtain structured literature information of the target medical literature;
and after each medical document in the first medical document set finishes the extraction of the structural information, obtaining the structural document information of the first medical document set.
In other embodiments of the present application, the pre-processed document information is text data;
the information extraction module 502 is specifically configured to: text labeling is carried out on the preprocessed literature information so as to identify medical naming entities in the preprocessed literature information; linking the medical named entity to a medical entity in a preset knowledge base to obtain medical literature information after entity linking; and generating the structural literature information of the target medical literature according to the medical literature information after the entity is linked.
In other embodiments of the present application, the information extraction module 502 is specifically configured to: converting the medical document information after the entity link into a table form to obtain tabular medical document information; and carrying out data post-processing on the tabulated medical literature information to obtain the structural literature information of the target medical literature.
In other embodiments of the present application, the document acquisition module 501 is specifically configured to:
acquiring an initial medical literature set of a target medical field in a preset time period;
screening the initial medical literature set according to a first preset literature attribute to obtain a second medical literature set;
classifying the second medical literature set to obtain a third medical literature set;
a first medical document set of the target medical field is extracted from the third medical document set.
In other embodiments of the present application, the document acquisition module 501 is specifically configured to:
classifying the second medical literature set according to second preset literature attributes to obtain a fourth medical literature set;
respectively labeling and classifying the fourth medical document set according to a preset label to obtain a labeling and classifying result of a fifth medical document set;
And determining a third medical literature set in the fifth medical literature set according to the labeling classification result.
In other embodiments of the present application, the document acquisition module 501 is specifically configured to:
respectively sending the labeling classification results of the fifth medical document set to corresponding classified rechecking user terminals so that the rechecking user terminals can confirm the accuracy of the labeling classification results;
obtaining a classification feedback result fed back by the rechecking user terminal, and updating the labeling classification result according to the classification feedback result;
and taking the fifth medical literature set with the updated classification result as a third medical literature set.
In the embodiment of the present application, a first medical document set of a target medical field in a preset time period is acquired by a document acquisition module 501; the information extraction module 502 performs structural information extraction on the target content of the first medical literature set to obtain structural literature information of the first medical literature set; the report generation module 503 generates a study report of the target medical field based on the structured document information. Because the embodiment of the application aims at the document set of the target field in the preset period, the structured information of the target content in each document is extracted, and the research report of the target medical field is automatically generated instead of manually reading and understanding the medical document to make the research report in the prior art, the embodiment of the application can reduce the workload of generating the research report of the medical field and improve the efficiency of generating the research report.
The study report generating apparatus 500 in the embodiment of the present application is described above in terms of modular functional entities, and the study report generating apparatus in the embodiment of the present application is described below in terms of hardware processing, respectively.
The apparatuses shown in fig. 5 may each have a structure as shown in fig. 6, and when the study report generating apparatus 500 shown in fig. 5 has a structure as shown in fig. 6, the processor and the transceiver in fig. 6 can implement the same or similar functions as the processing module 602 and the input/output module 601 provided in the foregoing apparatus embodiment corresponding to the apparatus, and the memory in fig. 7 stores a computer program to be invoked when the processor executes the above study report generating method.
Since the computer device in the embodiment of the present application may be a terminal device or a server, fig. 7 provides a schematic diagram of an embodiment in which the computer device is a terminal device. As shown in fig. 7, for convenience of explanation only the portions related to the embodiment of the present application are shown; for specific technical details not disclosed here, please refer to the method portion of the embodiment of the present application. The terminal device may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sale (POS) terminal, a vehicle-mounted computer, and the like. The following takes a mobile phone as an example of the terminal device:
Fig. 7 is a block diagram showing a part of the structure of a mobile phone related to a terminal device provided by an embodiment of the present application. Referring to fig. 7, the mobile phone includes: radio Frequency (RF) circuitry 1010, memory 1020, input unit 1030, display unit 1040, sensor 1050, audio circuitry 1060, wireless fidelity (wireless fidelity, wiFi) module 1070, processor 1080, and power source 1090. It will be appreciated by those skilled in the art that the handset construction shown in fig. 7 is not limiting of the handset and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The following describes the components of the mobile phone in detail with reference to fig. 7:
The RF circuit 1010 may be used for receiving and transmitting signals during messaging or a call. In particular, after receiving downlink information from a base station, it passes the information to the processor 1080 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1010 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), and the like.
The memory 1020 may be used to store software programs and modules, and the processor 1080 performs the various functional applications and data processing of the handset by executing the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, memory 1020 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state memory device.
The input unit 1030 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the handset. In particular, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch panel 1031 or thereabout using any suitable object or accessory such as a finger, stylus, etc.), and drive the corresponding connection device according to a predetermined program. Alternatively, the touch panel 1031 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 1080 and can receive commands from the processor 1080 and execute them. Further, the touch panel 1031 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 1030 may include other input devices 1032 in addition to the touch panel 1031. In particular, other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a track ball, a mouse, a joystick, etc.
The display unit 1040 may be used to display information input by the user or information provided to the user, as well as the various menus of the mobile phone. The display unit 1040 may include a display panel 1041; optionally, the display panel 1041 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 1031 may cover the display panel 1041. When the touch panel 1031 detects a touch operation on or near it, the operation is passed to the processor 1080 to determine the type of touch event, and the processor 1080 then provides a corresponding visual output on the display panel 1041 according to the type of touch event. Although in fig. 7 the touch panel 1031 and the display panel 1041 are two independent components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 1031 and the display panel 1041 may be integrated to implement the input and output functions of the mobile phone.
The mobile phone may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor may adjust the brightness of the display panel 1041 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 1041 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally along three axes) and, when stationary, the magnitude and direction of gravity. It can be used for applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may also be configured in the mobile phone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described in detail herein.
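As an illustration of the landscape/portrait switching mentioned above, the following is a minimal sketch, not the handset's actual logic. It assumes a gravity reading in m/s^2 on device axes where x points to the right of the screen, y points up along the screen, and z points out of the screen; the function name `screen_orientation` and the 25-degree threshold are hypothetical.

```python
import math


def screen_orientation(ax: float, ay: float, az: float,
                       flat_threshold_deg: float = 25.0) -> str:
    """Classify orientation from a stationary accelerometer reading.

    Axes are an assumption for this sketch: x to the right of the screen,
    y up along the screen, z out of the screen.
    """
    g = math.sqrt(ax * ax + ay * ay + az * az)
    if g == 0:
        return "unknown"  # no usable gravity vector
    # Angle between the gravity vector and the screen plane, in degrees.
    tilt = math.degrees(math.asin(min(1.0, abs(az) / g)))
    if tilt > 90.0 - flat_threshold_deg:
        return "flat"  # lying on a table: keep the previous orientation
    # Gravity mostly along y means the phone is upright; along x, on its side.
    return "portrait" if abs(ay) >= abs(ax) else "landscape"
```

A real device would typically smooth the readings over time and apply hysteresis so that the display does not flip back and forth near the 45-degree boundary.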
The audio circuit 1060, a speaker 1061, and a microphone 1062 may provide an audio interface between the user and the mobile phone. The audio circuit 1060 may convert received audio data into an electrical signal and transmit it to the speaker 1061, which converts it into a sound signal for output. Conversely, the microphone 1062 converts collected sound signals into electrical signals, which the audio circuit 1060 receives and converts into audio data. The audio data is then output to the processor 1080 for processing and sent via the RF circuit 1010 to, for example, another mobile phone, or output to the memory 1020 for further processing.
Wi-Fi is a short-range wireless transmission technology. Through the Wi-Fi module 1070, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although fig. 7 shows the Wi-Fi module 1070, it is understood that the module is not an essential part of the mobile phone and may be omitted as required without changing the essence of the invention.
The processor 1080 is the control center of the mobile phone. It connects the various parts of the entire mobile phone through various interfaces and lines, and performs the functions of the mobile phone and processes data by running or executing software programs and/or modules stored in the memory 1020 and invoking data stored in the memory 1020, thereby monitoring the mobile phone as a whole. Optionally, the processor 1080 may include one or more processing units; optionally, the processor 1080 may integrate an application processor, which primarily handles the operating system, user interface, and applications, with a modem processor, which primarily handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 1080.
The mobile phone further includes a power source 1090 (e.g., a battery) for powering the various components. Optionally, the power source may be logically connected to the processor 1080 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In the embodiment of the present application, the processor 1080 included in the mobile phone is further configured to control execution of the above research report generation method flow performed by the research report generating apparatus.
When the computer device is a server, reference is made to fig. 8, which is a schematic diagram of a server structure provided in an embodiment of the present application. The server 1100 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPU) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing application programs 1142 or data 1144. The memory 1132 and the storage medium 1130 may provide transitory or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processor 1122 may be configured to communicate with the storage medium 1130 and to execute, on the server 1100, the series of instruction operations in the storage medium 1130.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input-output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like.
The steps performed by the server in the above embodiments may be based on the structure of the server 1100 shown in fig. 8. For example, the steps performed by the data processing apparatus 60 or the image processing apparatus 70 shown in fig. 8 in the above-described embodiment may be based on the server structure shown in fig. 8. For example, the central processor 1122 executes the above flow of the research report generation method performed by the research report generating apparatus by calling the instructions in the memory 1132.
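The flow that the processor executes (acquire a literature set, extract structured literature information, generate a report) can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the application: named entity recognition is replaced by a plain dictionary lookup, the knowledge base `KNOWLEDGE_BASE` and its identifiers are invented for illustration, and all function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical knowledge base: surface form -> (entity ID, entity type).
KNOWLEDGE_BASE = {
    "aspirin": ("KB:0001", "drug"),
    "acetylsalicylic acid": ("KB:0001", "drug"),
    "myocardial infarction": ("KB:0002", "disease"),
    "heart attack": ("KB:0002", "disease"),
}


@dataclass
class Document:
    doc_id: str
    text: str  # target content, e.g. title plus abstract


def preprocess(text):
    """Data preprocessing: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def recognize_entities(text):
    """Text labeling stand-in: dictionary lookup, longest forms first."""
    found = []
    for surface in sorted(KNOWLEDGE_BASE, key=len, reverse=True):
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            found.append(surface)
    return found


def link_entities(mentions):
    """Entity linking: map each mention to its knowledge-base entity."""
    return [(m, *KNOWLEDGE_BASE[m]) for m in mentions]


def to_table(doc_id, linked):
    """Tabulate and post-process: one row per (document, entity), deduplicated."""
    return sorted({(doc_id, eid, etype) for _, eid, etype in linked})


def extract_structured(docs):
    """Run extraction on each document and pool the rows for the whole set."""
    table = []
    for doc in docs:
        mentions = recognize_entities(preprocess(doc.text))
        table.extend(to_table(doc.doc_id, link_entities(mentions)))
    return table


def generate_report(field, table):
    """Summarize how many documents mention each linked entity."""
    counts = {}
    for _, eid, etype in table:
        counts[(eid, etype)] = counts.get((eid, etype), 0) + 1
    lines = [f"Research report: {field}"]
    for (eid, etype), n in sorted(counts.items()):
        lines.append(f"{eid} ({etype}): mentioned in {n} document(s)")
    return "\n".join(lines)
```

Linking synonyms such as "aspirin" and "acetylsalicylic acid" to one entity before tabulation is what lets the report count documents per concept rather than per spelling. A production system would presumably use a trained recognizer and a curated medical vocabulary in place of the lookup table.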
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and modules described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules is merely a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed between components may be an indirect coupling or communication connection via some interfaces, devices, or modules, and may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program is loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state disk (SSD)).
The technical solutions provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the embodiments, and the description of the embodiments is intended only to help in understanding the methods and core ideas of the embodiments of the present application. Meanwhile, those skilled in the art may, following the ideas of the embodiments of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the embodiments of the present application.

Claims (10)

1. A method of generating a research report, the method comprising:
acquiring a first medical literature set of a target medical field within a preset time period;
extracting structural information from the target content of the first medical literature set to obtain structural literature information of the first medical literature set;
and generating a research report of the target medical field according to the structured literature information.
2. The method for generating a research report of claim 1, wherein the extracting structural information from the target content of the first medical document set to obtain the structural document information of the first medical document set comprises:
taking each medical document in the first medical document set as a target medical document, and performing data preprocessing on target contents in the target medical document to obtain preprocessed document information;
carrying out structured information extraction on the preprocessed literature information to obtain structured literature information of the target medical literature;
and after each medical document in the first medical document set finishes the extraction of the structural information, obtaining the structural document information of the first medical document set.
3. The method for generating a research report of claim 2 wherein the pre-processed literature information is text data;
the step of extracting the structural information of the preprocessed literature information to obtain the structural literature information of the target medical literature comprises the following steps:
performing text labeling on the preprocessed literature information to identify medical named entities in the preprocessed literature information;
linking the medical named entity to a medical entity in a preset knowledge base to obtain medical literature information after entity linking;
and generating the structural literature information of the target medical literature according to the medical literature information after the entity is linked.
4. The method of generating a research report of claim 3, wherein the generating structured literature information of the target medical literature from the medical literature information after the entity linking comprises:
converting the medical document information after the entity link into a table form to obtain tabular medical document information;
and carrying out data post-processing on the tabulated medical literature information to obtain the structural literature information of the target medical literature.
5. The method of generating a research report of any of claims 1-4 wherein said obtaining a first set of medical documents for a target medical field over a preset period of time comprises:
acquiring an initial medical literature set of a target medical field in a preset time period;
screening the initial medical literature set according to a first preset literature attribute to obtain a second medical literature set;
classifying the second medical literature set to obtain a third medical literature set;
a first medical document set of the target medical field is extracted from the third medical document set.
6. The method of claim 5, wherein said classifying the second medical literature set to obtain a third medical literature set comprises:
classifying the second medical literature set according to second preset literature attributes to obtain a fourth medical literature set;
respectively labeling and classifying the fourth medical document set according to a preset label to obtain a labeling and classifying result of a fifth medical document set;
and determining a third medical literature set in the fifth medical literature set according to the labeling classification result.
7. The method of generating a research report of claim 6 wherein said determining a third set of medical documents in said fifth set of medical documents based on said labeling classification result comprises:
respectively sending the labeling classification results of the fifth medical document set to corresponding classified rechecking user terminals so that the rechecking user terminals can confirm the accuracy of the labeling classification results;
obtaining a classification feedback result fed back by the rechecking user terminal, and updating the labeling classification result according to the classification feedback result;
and taking the fifth medical literature set with the updated classification result as a third medical literature set.
8. A research report generating apparatus, the apparatus comprising:
the document acquisition module is used for acquiring a first medical document set of the target medical field in a preset time period;
the information extraction module is used for extracting structural information of the target content of the first medical literature set to obtain structural literature information of the first medical literature set;
and the report generation module is used for generating a research report of the target medical field according to the structured literature information.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of generating a research report according to any of claims 1 to 7 when the computer program is executed by the processor.
10. A computer-readable storage medium, having stored thereon a computer program, the computer program being loaded by a processor to perform the steps in the method of generating a research report as claimed in any of claims 1 to 7.
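The literature selection steps recited in claims 5 to 7 (screening, classification, labeling, reviewer feedback, and extraction of the target set) can be sketched as follows. This is an illustrative sketch only: the claims do not say which attributes or labels are used, so the publication type, journal tier, keyword-based labeling, and every name below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Record:
    doc_id: str
    doc_type: str      # assumed "first preset literature attribute"
    journal_tier: str  # assumed "second preset literature attribute"
    title: str
    label: str = ""


def screen(initial, allowed_types):
    """Initial set -> second set: keep records with an allowed type."""
    return [r for r in initial if r.doc_type in allowed_types]


def group_by_attribute(records):
    """Second set -> fourth set: group records by the second attribute."""
    groups = {}
    for r in records:
        groups.setdefault(r.journal_tier, []).append(r)
    return groups


def label_records(records, label_keywords):
    """Fourth set -> fifth set: assign the first preset label whose keyword matches."""
    for r in records:
        title = r.title.lower()
        r.label = next(
            (label for label, kws in label_keywords.items()
             if any(k in title for k in kws)),
            "other",
        )
    return records


def apply_review(records, feedback):
    """Fifth set -> third set: overwrite labels with reviewer corrections."""
    for r in records:
        r.label = feedback.get(r.doc_id, r.label)
    return records


def select_target(records, target_label):
    """Third set -> first set: keep records of the target medical field."""
    return [r for r in records if r.label == target_label]
```

In this sketch `feedback` is a mapping from document ID to corrected label, standing in for the classification feedback result returned by the review user terminal; records without feedback keep their automatic label.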

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310868212.0A CN116910254A (en) 2023-07-14 2023-07-14 Method, device, computer equipment and storage medium for generating research report

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310868212.0A CN116910254A (en) 2023-07-14 2023-07-14 Method, device, computer equipment and storage medium for generating research report

Publications (1)

Publication Number Publication Date
CN116910254A (en) 2023-10-20

Family

ID=88362360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310868212.0A Pending CN116910254A (en) 2023-07-14 2023-07-14 Method, device, computer equipment and storage medium for generating research report

Country Status (1)

Country Link
CN (1) CN116910254A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117316371A (en) * 2023-11-29 2023-12-29 杭州未名信科科技有限公司 Case report table generation method and device, electronic equipment and storage medium
CN117316371B (en) * 2023-11-29 2024-04-16 杭州未名信科科技有限公司 Case report table generation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109947858B (en) Data processing method and device
EP3493112B1 (en) Image processing method, computer device, and computer readable storage medium
CN111177180A (en) Data query method and device and electronic equipment
CN112307311A (en) Information searching method, device, equipment and storage medium
CN104951432A (en) Information processing method and device
CN110610181A (en) Medical image identification method and device, electronic equipment and storage medium
Depeursinge et al. Mobile medical visual information retrieval
CN116910254A (en) Method, device, computer equipment and storage medium for generating research report
CN108628513A (en) A kind of method and medical team Message Entry System of medical information typing
CN104281610B (en) The method and apparatus for filtering microblogging
US10185724B2 (en) Method for sorting media content and electronic device implementing same
CN114595124B (en) Time sequence abnormity detection model evaluation method, related device and storage medium
CN108846051A (en) Data processing method, device and computer readable storage medium
WO2021098488A1 (en) Atrial fibrillation signal classification method and device, and terminal and storage medium
CN107315811B (en) Clinical pharmacy information interaction control method and equipment
CN107807940B (en) Information recommendation method and device
CN110866114B (en) Object behavior identification method and device and terminal equipment
US20230281391A1 (en) Systems and methods for biomedical information extraction, analytic generation and visual representation thereof
CN115546516A (en) Personnel gathering method and device, computer equipment and storage medium
CN114973352A (en) Face recognition method, device, equipment and storage medium
CN113380353A (en) Patient recruitment method and device in clinical research project
CN117009554A (en) Method and device for generating annotation data, computer equipment and storage medium
CN115909186B (en) Image information identification method, device, computer equipment and storage medium
CN115412726B (en) Video authenticity detection method, device and storage medium
CN114722970B (en) Multimedia detection method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination