WO2022160454A1 - Method and apparatus for retrieving medical literature, electronic device, and storage medium - Google Patents

Method and apparatus for retrieving medical literature, electronic device, and storage medium

Info

Publication number
WO2022160454A1
Authority
WO
WIPO (PCT)
Prior art keywords
medical
entity
document
retrieval
extracted
Prior art date
Application number
PCT/CN2021/083825
Other languages
English (en)
Chinese (zh)
Inventor
马文佳
倪渊
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022160454A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • the present application relates to the field of natural language processing, and in particular, to a method, apparatus, electronic device and computer-readable storage medium for retrieving medical documents.
  • The inventors recognize that current retrieval methods are generally based on traditional keyword retrieval. For example, the two keywords "cancer" and "treatment" are entered in the search box of a medical literature retrieval system and compared with the full text or title of every medical document included in the system, in order to retrieve the medical literature related to those keywords.
  • Because keyword retrieval has to traverse every medical document in the system in a short time, it consumes considerable computing resources, and its retrieval accuracy still needs to be improved.
  • a search method for medical literature including:
  • receiving a medical document set, performing part-of-speech recognition on the medical document set, and obtaining a set of medical entities to be extracted;
  • sorting is performed on the medical document set to be sorted to obtain medical retrieval documents corresponding to the search terms.
  • a retrieval device for medical documents includes:
  • a part-of-speech recognition module is used to receive a medical document set, perform part-of-speech recognition on the medical document set, and obtain a medical entity to be extracted set;
  • a medical entity relationship generation module configured to extract a medical entity set to be optimized from the medical entity to be extracted set, optimize the medical entity set to be optimized to obtain a medical entity set, and use the medical entity set to generate a medical entity relationship;
  • the document storage module is used to perform information fusion on the medical entity set and the medical entity relationship to obtain a medical knowledge graph set, and to store the medical knowledge graph set and the medical document set, according to their correspondence, in a pre-built medical literature retrieval system;
  • the document retrieval module is used to receive the search terms input by the user, extract a search entity set from the search terms, retrieve a medical document set to be sorted from the medical literature retrieval system according to the search entity set, and sort the medical document set to be sorted with a pre-trained medical document sorting model to obtain the medical retrieval documents corresponding to the search terms.
  • An electronic device comprising:
  • a processor that executes the instructions stored in the memory to achieve the following steps:
  • receiving a medical document set, performing part-of-speech recognition on the medical document set, and obtaining a set of medical entities to be extracted;
  • sorting is performed on the medical document set to be sorted to obtain medical retrieval documents corresponding to the search terms.
  • a computer-readable storage medium comprising a storage data area and a storage program area, wherein the storage data area stores created data and the storage program area stores a computer program; when the computer program is executed by a processor, the following steps are implemented:
  • receiving a medical document set, performing part-of-speech recognition on the medical document set, and obtaining a set of medical entities to be extracted;
  • sorting is performed on the medical document set to be sorted to obtain medical retrieval documents corresponding to the search terms.
  • the present application can solve the problems that computing resources are consumed during the retrieval of medical documents and the retrieval accuracy is not high.
  • FIG. 1 is a schematic flowchart of a method for retrieving medical documents according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of S2 in the method for retrieving medical documents provided by an embodiment of the present application.
  • FIG. 3 is a schematic block diagram of a retrieval apparatus for medical documents provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the internal structure of an electronic device for implementing the method for retrieving medical documents according to an embodiment of the present application.
  • The embodiments of the present application provide a method for retrieving medical documents.
  • The execution subject of the method for retrieving medical documents includes, but is not limited to, at least one of a server, a terminal, and other electronic devices that can be configured to execute the method provided by the embodiments of the present application.
  • the retrieval method of the medical document can be executed by software or hardware installed in the terminal device or the server device, and the software can be a blockchain platform.
  • the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
  • the retrieval method of the medical document includes:
  • The medical document collection includes academic papers, journal and conference articles, etc. on topics such as disease descriptions and diagnosis processes, which are organized in advance by medical personnel or obtained from the network with a crawler program.
  • the medical document collection includes Medical document A: "Lymphoma originates from a malignant tumor of the lymphohematopoietic system. The main symptoms are painless lymphadenopathy, hepatosplenomegaly, and involvement of all tissues and organs in the body, accompanied by fever, night sweats, weight loss, and itching.”
  • Performing part-of-speech recognition on the medical document set to obtain the set of medical entities to be extracted includes: performing denoising, stop word removal, and word segmentation on the medical document set to obtain a document set for part-of-speech recognition; and using the pre-trained part-of-speech recognition model to perform part-of-speech recognition on that document set to obtain the set of medical entities to be extracted.
  • Because the medical document set may contain non-text data, such as hyperlinks and web page tags, which would affect the construction of medical entities, the medical document set must be denoised.
  • the denoising process may use a regular expression constructed based on a programming language to complete the removal of noises such as numbers, emoticons, and special symbols, such as URL, "@", "#", and the like.
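  • The regular-expression denoising described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the patterns and the function name `denoise` are assumptions chosen to cover the noise types the text lists (URLs, web page tags, "@", "#", numbers, emoticons).

```python
import re

# Hypothetical patterns; the source only names the kinds of noise to remove.
URL_RE = re.compile(r"https?://\S+|www\.\S+")              # hyperlinks
TAG_RE = re.compile(r"<[^>]+>")                            # web page tags
NOISE_RE = re.compile(r"[@#\d]|[\U0001F300-\U0001FAFF]")   # "@", "#", digits, a common emoji range
SPACE_RE = re.compile(r"\s+")


def denoise(text: str) -> str:
    """Strip URLs, tags, and symbol noise, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = NOISE_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


cleaned = denoise("<p>Lymphoma @doc #1 see http://example.com/x now</p>")
```

  • URLs are removed before the single-character noise so that a link is deleted as a whole rather than being broken up by the digit and symbol pattern.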
  • The embodiment of the present application uses the jieba ("stuttering") word segmentation method to segment the denoised medical documents, obtaining multiple medical word sets corresponding to the medical document set.
  • Stop words are words that carry no actual meaning and have no effect on the construction of medical entities. Stop words, which include commonly used pronouns, prepositions, etc., appear frequently; they add computational burden and can even reduce the accuracy of medical document retrieval, so stop word removal must be performed on the multiple medical word sets.
  • the stop word table filtering method can be used to remove stop words, and a pre-built stop word table is used to match each medical word in the multiple sets of medical words one by one. If successful, the medical word is judged to be a stop word, and the word is deleted.
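  • A minimal sketch of the stop word table filtering just described, assuming the documents have already been segmented into word lists. The table contents and the function name are hypothetical; a real table would be far larger.

```python
# Hypothetical pre-built stop word table.
STOP_WORDS = {"the", "of", "is", "a", "and", "in", "it"}


def remove_stop_words(word_sets):
    """Drop every word that matches an entry in the stop word table."""
    return [
        [word for word in words if word.lower() not in STOP_WORDS]
        for words in word_sets
    ]


filtered = remove_stop_words([
    ["Leukemia", "is", "a", "disease"],
    ["lesions", "in", "erythrocytes"],
])
```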
  • using the pre-trained part-of-speech recognition model to perform part-of-speech recognition on the part-of-speech to-be-recognized document set to obtain the medical entity to-be-extracted set including:
  • Step A: Build and train a part-of-speech recognition model, wherein the part-of-speech recognition model includes a feature conversion layer and a part-of-speech recognition layer.
  • Building and training the part-of-speech recognition model includes: receiving a training text set and the part-of-speech tag set corresponding to it, and performing replacement and masking operations on the training text set to obtain a semi-masked text set; constructing a part-of-speech recognition model and using it to calculate the part-of-speech prediction set of the semi-masked text set; and calculating the difference value between the part-of-speech prediction set and the part-of-speech tag set. When the difference value is greater than or equal to a preset threshold, the internal parameters of the part-of-speech recognition model are adjusted until the difference value is less than the preset threshold, at which point the trained part-of-speech recognition model is obtained.
  • the part-of-speech recognition model mainly includes a feature conversion layer and a part-of-speech recognition layer.
  • the feature conversion layer is composed of a BERT model (Bidirectional Encoder Representations from Transformers), and the part-of-speech recognition layer is composed of a CRF (Conditional Random Field) model.
  • The training text set is medical document data crawled from the network in advance, for example with a crawler, and then manually cleaned.
  • the part-of-speech tag set records the part-of-speech tags of each word in the training text set.
  • the training text set includes the training text a: "Organ tissue abnormally proliferates, showing malignant infiltration and growth, which occurs in epithelial tissues and is called cancer".
  • the part-of-speech tag set records that the part-of-speech of each word in the training text a is: "organ (noun) tissue (noun) abnormality (verb)".
  • For example, in the training text a, "Organ tissue abnormally proliferates, showing malignant infiltration growth, and the occurrence of epithelial tissue is called cancer", the word "proliferation" is selected. If "proliferation" is masked, the training text a becomes: "Organ tissue abnormality [mask], showing malignant infiltration growth, occurs in epithelial tissue is called cancer."
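  • The masking operation in this example can be sketched as below. The token list, the helper names, and the `ratio` and `seed` parameters are illustrative assumptions; the source does not state how words are selected for masking.

```python
import random

MASK = "[mask]"


def mask_word(tokens, target):
    """Replace every occurrence of `target` with the mask token."""
    return [MASK if tok == target else tok for tok in tokens]


def mask_random(tokens, ratio, seed=0):
    """Mask a random share of tokens; `ratio` and `seed` are hypothetical knobs."""
    rng = random.Random(seed)
    count = max(1, int(len(tokens) * ratio))
    chosen = set(rng.sample(range(len(tokens)), count))
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


tokens = ["organ", "tissue", "abnormal", "proliferation"]
masked = mask_word(tokens, "proliferation")
```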
  • calculating the part-of-speech prediction set of the semi-occluded text set using the part-of-speech recognition model includes: using the feature conversion layer to convert the semi-occluded text set into a semi-occluded vector set; using the part-of-speech recognition layer, perform part-of-speech recognition on the semi-masked vector set, and obtain the part-of-speech prediction set.
  • A 15-layer bidirectional encoding (encoder-decoder) BERT model is used to convert the semi-masked text set into a semi-masked vector set.
  • The bidirectional encoder can be any publicly disclosed feature extraction neural network.
  • the CRF model is used to calculate part-of-speech probability values of different parts of speech corresponding to each word, and select the part-of-speech corresponding to the largest part-of-speech probability value, so as to achieve the purpose of part-of-speech prediction.
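  • The "select the part of speech with the largest probability value" step can be illustrated as follows. This is only a simplified sketch: a full CRF also scores transitions between adjacent tags and is usually decoded with the Viterbi algorithm, which is omitted here. The probability values are made up for illustration.

```python
def pick_tags(prob_rows):
    """For each word, return the part of speech with the largest probability."""
    return [max(row, key=row.get) for row in prob_rows]


# One dict per word: hypothetical part-of-speech probabilities.
probabilities = [
    {"noun": 0.8, "verb": 0.1, "adjective": 0.1},   # "organ"
    {"noun": 0.7, "verb": 0.2, "adjective": 0.1},   # "tissue"
    {"noun": 0.2, "verb": 0.6, "adjective": 0.2},   # "proliferates"
]
predicted = pick_tags(probabilities)
```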
  • the Chebyshev algorithm may be used to calculate the difference value between the part-of-speech prediction set and the part-of-speech tag set, and when the difference value is less than the preset threshold, the trained part-of-speech recognition model is obtained.
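  • As a hedged illustration of the difference value, the sketch below computes the Chebyshev distance (the largest absolute coordinate difference) between two numeric vectors and compares it with the threshold. How the prediction set and the tag set are encoded as vectors is not stated in the source; the probability-style encoding used here is an assumption.

```python
def chebyshev(u, v):
    """Chebyshev distance: the maximum absolute difference over all coordinates."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return max(abs(a - b) for a, b in zip(u, v))


def needs_more_training(prediction, label, threshold):
    """True while the difference value is greater than or equal to the threshold."""
    return chebyshev(prediction, label) >= threshold


prediction = [0.9, 0.1, 0.0]   # hypothetical predicted tag probabilities
label = [1.0, 0.0, 0.0]        # hypothetical one-hot gold tag
difference = chebyshev(prediction, label)
```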
  • Step B: Receive the document set for part-of-speech recognition, use the feature conversion layer to convert it into a document feature set, and use the part-of-speech recognition layer to perform part-of-speech recognition on the document feature set to obtain the set of medical entities to be extracted.
  • In the embodiment of the present application, converting the document set into the document feature set and performing part-of-speech recognition on the feature set to obtain the set of medical entities to be extracted follow the same steps as the model training above, and are not repeated here.
  • the medical entity to be extracted set can be obtained. It can be seen that the medical entity to be extracted set is composed of several medical words with part-of-speech information.
  • The set of medical entities to be extracted includes entities such as persons, places, organizations, and times involved in the medical document collection. For example, for the medical document B in the medical document collection, "Leukemia is a malignant clonal disease of hematopoietic stem cells.", the medical entity set to be optimized includes "leukemia", "hematopoietic stem cells", "disease", etc.
  • Extracting the medical entity set to be optimized from the set of medical entities to be extracted includes:
  • the entity probability function $P(W_i)$ is:
  • where $W_1, W_2, \ldots, W_i$ are different medical words in the set of medical entities to be extracted,
  • $i$ is the serial number, and
  • $m$ is the number of items in the set of medical entities to be extracted.
  • the optimizing the medical entity set to be optimized to obtain the medical entity set includes: calculating the entity ranking value of the medical entity set to be optimized, and cleaning the medical entity set to be optimized by using the entity ranking value to obtain the medical entity set.
  • $P(s_i)$ is the entity matrix corresponding to the medical entity $s_i$ to be optimized in the medical entity set to be optimized
  • $T$ is the entity ranking value corresponding to the medical entity $s_i$ to be optimized
  • is calculated by using the PageRank algorithm
  • $I$ is a coordination matrix with a value of 1 corresponding to the entity matrix.
  • After the entity ranking value of each medical entity to be optimized has been calculated, the medical entities to be optimized whose entity ranking value is less than the preset threshold are removed, thereby obtaining the medical entity set.
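  • The ranking-and-threshold cleaning can be sketched with a plain PageRank power iteration. The exact ranking formula is not reproduced in this text, so the damping factor, the iteration count, the way the entity graph is built (here: hand-written links between entities), and all names below are assumptions for illustration only.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank by power iteration over an adjacency-list graph.

    `graph` maps each entity to the entities it links to; every linked
    entity must also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling entity: spread its rank evenly over all entities.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


def filter_entities(graph, threshold):
    """Keep only entities whose ranking value reaches the preset threshold."""
    rank = pagerank(graph)
    return [entity for entity in graph if rank[entity] >= threshold]


# Hypothetical entity graph; "etc" is linked from nowhere and ranks lowest.
entity_graph = {
    "leukemia": ["hematopoietic stem cells", "disease"],
    "hematopoietic stem cells": ["leukemia"],
    "disease": ["leukemia"],
    "etc": ["leukemia"],
}
kept = filter_entities(entity_graph, threshold=0.05)
```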
  • the BERT model is used to realize the entity relationship generation of the medical entity set.
  • Using the medical entity set to generate the medical entity relationship includes: inputting the medical entity set and the medical document set into the trained BERT model; using the BERT model to extract a text entity set to be proofread from the medical document set; proofreading the text entity set to be proofread against the medical entity set to obtain a proofreading entity set; and using the BERT model together with the proofreading entity set to extract the medical entity relationship.
  • the BERT model can perform feature transformation in S1, and can also be used to extract medical entity relationships.
  • the differences in the functions are mainly based on different training methods for training the BERT model.
  • The text entity set to be proofread is extracted from the medical document set by the BERT model and may contain deviations, so the medical entity set is used to correct it.
  • The BERT model is then applied with reference to the proofreading entity set to extract the medical entity relationship.
  • the medical document collection includes medical document A, medical document B, medical document C, etc.
  • medical document A has 6 medical knowledge graphs
  • medical document B has 9 medical knowledge graphs
  • medical document C has 18 medical knowledge graphs.
  • Each medical document in the medical document collection and the corresponding medical knowledge graph are stored in the medical document retrieval system, respectively.
  • The medical document retrieval system may use a MySQL database.
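  • A minimal sketch of storing documents together with their knowledge graphs follows. Python's built-in `sqlite3` is used here as a stand-in for MySQL so that the example is self-contained. The table layout, the representation of a knowledge graph as a head-relation-tail triple, and all names are assumptions, not the schema of the application.

```python
import sqlite3


def build_store(documents, triples):
    """Create an in-memory store linking knowledge-graph triples to documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id TEXT PRIMARY KEY, body TEXT)")
    conn.execute(
        "CREATE TABLE graphs ("
        "head TEXT, relation TEXT, tail TEXT, "
        "doc_id TEXT REFERENCES documents(doc_id))"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?)", documents.items())
    conn.executemany("INSERT INTO graphs VALUES (?, ?, ?, ?)", triples)
    conn.commit()
    return conn


def documents_for_entity(conn, entity):
    """Return the ids of documents whose graphs mention the entity."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM graphs "
        "WHERE head = ? OR tail = ? ORDER BY doc_id",
        (entity, entity),
    )
    return [row[0] for row in rows]


store = build_store(
    {"A": "Lymphoma originates from ...", "B": "Leukemia is a malignant ..."},
    [
        ("lymphoma", "symptom", "fever", "A"),
        ("leukemia", "is_a", "disease", "B"),
        ("leukemia", "affects", "hematopoietic stem cells", "B"),
    ],
)
```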
  • S6: Receive a search term input by a user, extract a search entity set from the search term, and retrieve a medical document set to be sorted from the medical document retrieval system according to the search entity set.
  • Extracting the search entity set from the search term includes: first removing the keywords of the search term, and then performing word segmentation on the search term after the keywords have been removed to obtain the search entity set. For example, after keyword removal and word segmentation, the search term "erythrocyte lesions in leukemia" yields search entities such as "leukemia", "erythrocytes", and "lesions".
  • the retrieval of the medical document set to be sorted from the medical document retrieval system according to the retrieval entity set includes: matching the medical knowledge graph corresponding to the retrieval entity from the medical knowledge graph set , using the matched medical knowledge graph to retrieve one or more corresponding medical documents from the medical document set to obtain the to-be-sorted medical document set.
  • For example, the search term input by the user above is "erythrocyte lesions in leukemia", from which the three search entities "leukemia", "erythrocytes", and "lesions" are obtained.
  • Matching these search entities against the medical knowledge graph set yields medical knowledge graphs such as "erythrocyte-lesion-leukemia" and "erythrocyte-lesion-decreased cell mass".
  • The medical documents related to these medical knowledge graphs are then matched, which gives the medical document set to be sorted.
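  • The matching step in this example can be sketched with an in-memory index. The index layout (a triple mapped to the ids of the documents it came from) and the rule that a graph matches when it shares at least one entity with the search are assumptions made for illustration.

```python
def retrieve_candidates(search_entities, graph_index):
    """Collect documents whose knowledge graphs share an entity with the search."""
    wanted = set(search_entities)
    hits = []
    for triple, doc_ids in graph_index.items():
        if wanted & set(triple):
            for doc_id in doc_ids:
                if doc_id not in hits:   # keep first-seen order, no duplicates
                    hits.append(doc_id)
    return hits


# Hypothetical index: knowledge-graph triple -> ids of the source documents.
graph_index = {
    ("erythrocyte", "lesion", "leukemia"): ["B", "C"],
    ("erythrocyte", "lesion", "decreased cell mass"): ["C"],
    ("lymphoma", "symptom", "fever"): ["A"],
}
candidates = retrieve_candidates(["leukemia", "erythrocyte", "lesion"], graph_index)
```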
  • The medical document sorting model is constructed using the BioBert model (a pre-trained biomedical language representation model for biomedical text mining, referred to as BioBert), where the BioBert model is a Bert model trained on training data sets from the biomedical field. These training data sets include the currently published NER data sets and the PubMed corpus.
  • Once the trained medical document sorting model is obtained, it is used to calculate a score for each medical document in the medical document set to be sorted. The scores are used to reorder the set, and a preset number of documents are selected from it; these are the medical retrieval documents corresponding to the search terms.
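  • The score, reorder, and select flow can be sketched as below. The scoring function is a deliberately simple stand-in (counting how many search entities a document mentions); it is not the BioBert-based sorting model, and all names are hypothetical.

```python
def overlap_score(document, search_entities):
    """Toy score: number of search entities that appear in the document text."""
    text = document.lower()
    return sum(1 for entity in search_entities if entity.lower() in text)


def rerank(documents, search_entities, top_k):
    """Sort documents by descending score and keep the preset number."""
    ranked = sorted(
        documents,
        key=lambda doc: overlap_score(doc, search_entities),
        reverse=True,
    )
    return ranked[:top_k]


docs = [
    "Lymphoma symptoms include fever.",
    "Leukemia causes erythrocyte lesions.",
    "Leukemia is a clonal disease.",
]
top = rerank(docs, ["leukemia", "erythrocyte", "lesions"], top_k=2)
```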
  • In the embodiments of the present application, a medical knowledge graph set is constructed by performing part-of-speech recognition, entity extraction, and information fusion on the received medical document set, and the medical document set and the medical knowledge graph set are then stored in the medical document retrieval system.
  • Compared with keyword retrieval, the embodiments of the present application can directly index the medical document set to be sorted through the correspondence between the medical knowledge graphs and the search terms, rather than comparing the search terms against every medical document.
  • This saves computing resources in the medical document retrieval process.
  • In addition, the medical document sorting model sorts the medical document set to be sorted to obtain the medical retrieval documents corresponding to the search terms. The model thus further analyzes the importance of each document in the set, so that the more important medical documents are pushed to users and the retrieval accuracy is improved. Therefore, the medical document retrieval method, device, and computer-readable storage medium proposed in the present application can solve the problems of high computing resource consumption and low retrieval accuracy during medical document retrieval.
  • FIG. 3 is a schematic block diagram of the retrieval apparatus for medical documents of the present application.
  • the retrieval apparatus 100 for medical documents described in the present application may be installed in an electronic device.
  • the medical document retrieval device may include a part-of-speech recognition module 101 , a medical entity relationship generation module 102 , a document storage module 103 and a document retrieval module 104 .
  • the modules described in the present invention can also be called units, which refer to a series of computer program segments that can be executed by the electronic device processor and can perform fixed functions, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the part-of-speech recognition module 101 is configured to receive a medical document set, perform part-of-speech recognition on the medical document set, and obtain a medical entity to be extracted set;
  • the medical entity relationship generation module 102 is configured to extract a medical entity set to be optimized from the set of medical entities to be extracted, optimize the medical entity set to be optimized to obtain a medical entity set, and use the medical entity set to generate a medical entity relationship;
  • the document storage module 103 is configured to perform information fusion on the medical entity set and the medical entity relationship to obtain a medical knowledge graph set, and to store the medical knowledge graph set and the medical document set, according to their correspondence, in a pre-built medical literature retrieval system;
  • the document retrieval module 104 is configured to receive a search term input by a user, extract a search entity set from the search term, retrieve a medical document set to be sorted from the medical literature retrieval system according to the search entity set, and sort the medical document set to be sorted with the pre-trained medical document sorting model to obtain the medical retrieval documents corresponding to the search terms.
  • Each module in the apparatus 100 for retrieving medical documents provided by the embodiments of the present application can use the same means as the above-mentioned retrieval method for medical documents, and the specific implementation steps will not be repeated here.
  • FIG. 4 is a schematic structural diagram of an electronic device implementing the medical document retrieval method of the present application.
  • the electronic device 1 may include a processor 10, a memory 11 and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as a retrieval program 12 for medical documents.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (for example: SD or DX memory, etc.), magnetic memory, magnetic disk, CD etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a mobile hard disk of the electronic device 1 .
  • In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a pluggable mobile hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash memory card (Flash Card) equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can be used not only to store application software installed in the electronic device 1 and various data, such as codes of the retrieval program 12 of medical documents, etc., but also to temporarily store data that has been output or will be output.
  • In some embodiments, the processor 10 may be composed of integrated circuits: for example, a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
  • The processor 10 is the control core (Control Unit) of the electronic device. It connects the components of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example, the retrieval program for medical documents) and calling the data stored in the memory 11.
  • The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 4 only shows an electronic device with components. Those skilled in the art can understand that the structure shown in FIG. 4 does not constitute a limitation on the electronic device 1, and may include fewer or more components than those shown in the drawings. components, or a combination of certain components, or a different arrangement of components.
  • The electronic device 1 may also include a power source (such as a battery) for powering the various components. Preferably, the power source is logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device.
  • the power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components.
  • the electronic device 1 may further include a variety of sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • The electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface or a Bluetooth interface), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • The electronic device 1 may further include a user interface, which may be a display (Display) or an input unit (e.g., a keyboard (Keyboard)). Optionally, the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the retrieval program 12 for medical documents stored in the memory 11 in the electronic device 1 is a combination of multiple instructions, and when running in the processor 10, it can realize:
  • receiving a medical document set, performing part-of-speech recognition on the medical document set, and obtaining a set of medical entities to be extracted;
  • sorting is performed on the medical document set to be sorted to obtain medical retrieval documents corresponding to the search terms.
  • the modules/units integrated in the electronic device 1 may be stored in a computer-readable storage medium.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory) .
  • The computer-usable storage medium may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required by at least one function, and the like, and the stored data area may store created data and the like.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be volatile or non-volatile.
  • The readable storage medium stores a computer program which, when executed by the processor of an electronic device, can implement:
  • receiving a medical document set, performing part-of-speech recognition on the medical document set, and obtaining a set of medical entities to be extracted;
  • sorting is performed on the medical document set to be sorted to obtain medical retrieval documents corresponding to the search terms.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain, essentially a decentralized database, is a chain of data blocks linked to one another using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
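
The description of a blockchain as data blocks linked by cryptographic methods can be illustrated with a minimal hash-chain sketch. This is not the patent's implementation: the `Block` fields, the JSON serialization, the choice of SHA-256, and the helper names `append_block` and `is_valid` are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One data block: a batch of transactions plus a link to its predecessor."""
    index: int
    prev_hash: str
    transactions: tuple  # hypothetical record format: a tuple of strings

    def digest(self) -> str:
        # Serialize deterministically, then hash with SHA-256 (an assumed choice).
        payload = json.dumps(
            {
                "index": self.index,
                "prev_hash": self.prev_hash,
                "transactions": list(self.transactions),
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Return a new chain with one more block that stores its predecessor's hash."""
    prev_hash = chain[-1].digest() if chain else "0" * 64
    return chain + [Block(len(chain), prev_hash, tuple(transactions))]


def is_valid(chain):
    """Check that every block's stored prev_hash matches its predecessor's hash."""
    return all(cur.prev_hash == prev.digest() for prev, cur in zip(chain, chain[1:]))
```

Because each block stores the hash of the previous one, changing the contents of an earlier block changes its hash and breaks the link held by its successor, which is the anti-counterfeiting property the text refers to.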

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Public Health (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

Disclosed are a medical document retrieval method, a medical document retrieval apparatus, an electronic device, and a storage medium. The method comprises: performing part-of-speech recognition on a medical document set to obtain a set of medical entities to be extracted (S1); extracting a medical knowledge graph set from the set of medical entities to be extracted (S2-S4); storing the medical knowledge graph set and the medical document set, according to their correspondence, in a pre-built medical document retrieval system (S5); receiving a search term entered by a user, extracting a search entity set from the search term, and searching the medical document retrieval system according to the search entity set to obtain a medical document set to be sorted (S6); and sorting the medical document set to be sorted according to a pre-trained medical document sorting model to obtain the retrieved medical documents corresponding to the search term. The method can address the problems of computing-resource consumption and low retrieval accuracy in medical document retrieval.
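
The flow in the abstract (extract entities, index documents by entity, retrieve candidates for a query, then sort them) can be sketched roughly as follows. This is a simplified illustration, not the claimed method: the fixed `ENTITY_VOCAB` lookup stands in for part-of-speech recognition and knowledge graph extraction, the shared-entity count stands in for the pre-trained sorting model, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical entity vocabulary. The source derives entities through
# part-of-speech recognition and knowledge graph extraction instead.
ENTITY_VOCAB = {"diabetes", "insulin", "hypertension", "metformin"}


def extract_entities(text):
    """Return the known entities found in text (stand-in for steps S1-S4)."""
    tokens = (tok.strip(".,;:()").lower() for tok in text.split())
    return {tok for tok in tokens if tok in ENTITY_VOCAB}


def build_index(documents):
    """Map each entity to the ids of documents mentioning it (stand-in for S5)."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for entity in extract_entities(text):
            index[entity].add(doc_id)
    return index


def retrieve(index, query):
    """Collect documents sharing an entity with the query (S6), then sort them."""
    scores = defaultdict(int)
    for entity in extract_entities(query):
        for doc_id in index.get(entity, ()):
            scores[doc_id] += 1
    # Stand-in for the pre-trained sorting model: most shared entities first,
    # with ties broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


docs = {
    "d1": "Metformin lowers glucose in diabetes.",
    "d2": "Insulin therapy for diabetes and hypertension.",
    "d3": "Hypertension guidelines.",
}
index = build_index(docs)
print(retrieve(index, "insulin for diabetes"))
```

Only documents that share at least one entity with the query are scored at all, which reflects the abstract's stated aim of reducing computing-resource consumption by narrowing the candidate set before sorting.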
PCT/CN2021/083825 2021-01-28 2021-03-30 Procédé et appareil de récupération de littérature médicale, dispositif électronique, et support de stockage WO2022160454A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110119598.6 2021-01-28
CN202110119598.6A CN112885478B (zh) 2021-01-28 2021-01-28 医疗文献的检索方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022160454A1 true WO2022160454A1 (fr) 2022-08-04

Family

ID=76053086

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083825 WO2022160454A1 (fr) 2021-01-28 2021-03-30 Procédé et appareil de récupération de littérature médicale, dispositif électronique, et support de stockage

Country Status (2)

Country Link
CN (1) CN112885478B (fr)
WO (1) WO2022160454A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117316371A (zh) * 2023-11-29 2023-12-29 杭州未名信科科技有限公司 病例报告表的生成方法、装置、电子设备和存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840684A (zh) * 2022-04-25 2022-08-02 平安普惠企业管理有限公司 基于医疗实体的图谱构建方法、装置、设备及存储介质
CN116110594B (zh) * 2022-12-02 2024-05-07 北京交通大学 基于关联文献的医学知识图谱的知识评价方法及系统
CN115658851B (zh) * 2022-12-27 2023-04-04 药融云数字科技(成都)有限公司 基于主题的医学文献检索方法、系统、存储介质及终端
CN116340468A (zh) * 2023-05-12 2023-06-27 华北理工大学 主题文献检索预测方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109509556A (zh) * 2018-11-09 2019-03-22 天津开心生活科技有限公司 知识图谱生成方法、装置、电子设备及计算机可读介质
WO2019071661A1 (fr) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Appareil électronique, procédé d'identification de nom d'entité de texte médical, système et support d'enregistrement
CN109885660A (zh) * 2019-02-22 2019-06-14 上海乐言信息科技有限公司 一种知识图谱赋能的基于信息检索的问答系统和方法
CN110032648A (zh) * 2019-03-19 2019-07-19 微医云(杭州)控股有限公司 一种基于医学领域实体的病历结构化解析方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106990973A (zh) * 2017-05-25 2017-07-28 海南大学 一种基于数据图谱、信息图谱和知识图谱架构的价值驱动的服务软件开发方法
CN111930962A (zh) * 2020-09-02 2020-11-13 平安国际智慧城市科技股份有限公司 文献数据价值评估方法、装置、电子设备及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019071661A1 (fr) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Appareil électronique, procédé d'identification de nom d'entité de texte médical, système et support d'enregistrement
CN109509556A (zh) * 2018-11-09 2019-03-22 天津开心生活科技有限公司 知识图谱生成方法、装置、电子设备及计算机可读介质
CN109885660A (zh) * 2019-02-22 2019-06-14 上海乐言信息科技有限公司 一种知识图谱赋能的基于信息检索的问答系统和方法
CN110032648A (zh) * 2019-03-19 2019-07-19 微医云(杭州)控股有限公司 一种基于医学领域实体的病历结构化解析方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117316371A (zh) * 2023-11-29 2023-12-29 杭州未名信科科技有限公司 病例报告表的生成方法、装置、电子设备和存储介质
CN117316371B (zh) * 2023-11-29 2024-04-16 杭州未名信科科技有限公司 病例报告表的生成方法、装置、电子设备和存储介质

Also Published As

Publication number Publication date
CN112885478A (zh) 2021-06-01
CN112885478B (zh) 2023-07-07

Similar Documents

Publication Publication Date Title
WO2022160454A1 (fr) Procédé et appareil de récupération de littérature médicale, dispositif électronique, et support de stockage
CN111414393B (zh) 一种基于医学知识图谱的语义相似病例检索方法及设备
CN109906449B (zh) 一种查找方法及装置
CN113707297B (zh) 医疗数据的处理方法、装置、设备及存储介质
WO2022121171A1 (fr) Procédé et appareil de mise en correspondance de textes similaires, ainsi que dispositif électronique et support de stockage informatique
CN112395395B (zh) 文本关键词提取方法、装置、设备及存储介质
WO2022222943A1 (fr) Procédé et appareil de recommandation de département, dispositif électronique et support de stockage
US11294938B2 (en) Generalized distributed framework for parallel search and retrieval of unstructured and structured patient data across zones with hierarchical ranking
CN111126065A (zh) 一种自然语言文本的信息提取方法及装置
CN111696635A (zh) 疾病名称标准化方法及装置
US20190354596A1 (en) Similarity matching systems and methods for record linkage
WO2022222942A1 (fr) Procédé et appareil pour générer un enregistrement de questions et de réponses, dispositif électronique et support de stockage
US20210183526A1 (en) Unsupervised taxonomy extraction from medical clinical trials
WO2022134355A1 (fr) Procédé et appareil de recherche basés sur une invite de mots-clé, dispositif électronique et support de stockage
CN110334343B (zh) 一种合同中个人隐私信息抽取的方法和系统
CN113345577A (zh) 诊疗辅助信息的生成方法、模型训练方法、装置、设备以及存储介质
CN112507230B (zh) 基于浏览器的网页推荐方法、装置、电子设备及存储介质
CN114330335B (zh) 关键词抽取方法、装置、设备及存储介质
CN115995281A (zh) 一种基于数据治理的专病数据库的数据检索方法及装置
CN112784589A (zh) 一种训练样本的生成方法、装置及电子设备
WO2022227171A1 (fr) Procédé et appareil d'extraction d'informations clés, dispositif électronique et support
WO2022121152A1 (fr) Procédé de dialogue intelligent, appareil, dispositif électronique et support de stockage
CN113343680B (zh) 一种基于多类型病历文本的结构化信息提取方法
CN114141384A (zh) 用于检索医学数据的方法、设备和介质
WO2022141860A1 (fr) Procédé et appareil de déduplication de texte, dispositif électronique et support de stockage lisible par ordinateur

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922053

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21922053

Country of ref document: EP

Kind code of ref document: A1