CN117350291A

CN117350291A - Electronic medical record named entity identification method, device, equipment and storage medium

Info

Publication number: CN117350291A
Application number: CN202311309766.3A
Authority: CN
Inventors: 张兆
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2023-10-10
Filing date: 2023-10-10
Publication date: 2024-01-05

Abstract

The present invention relates to the field of digital medical treatment, and in particular to a method, an apparatus, a device, and a storage medium for identifying named entities in an electronic medical record. Real-time text data of an electronic medical record is acquired and processed by contrastive learning in a pre-trained language representation model to obtain token semantic feature information; label semantization is performed on the real-time text data based on preset indication labels to obtain a plurality of pieces of label semantic feature information; a similarity-based relevance calculation between each piece of token semantic feature information and the plurality of pieces of label semantic feature information yields the label semantic feature information corresponding to each token; and named entity recognition is then performed on the real-time text data to extract named entities of preset entity types. The invention fully learns the token semantic features of the text, improves the generalization of the model after transfer, and integrates the semantic knowledge of the labels into the relevance calculation, thereby improving the efficiency and accuracy of named entity recognition.

Description

Electronic medical record named entity identification method, device, equipment and storage medium

Technical Field

The invention relates to the fields of artificial intelligence and digital medical treatment, in particular to a method, a device, equipment and a storage medium for identifying named entities in an electronic medical record.

Background

With the rapid development and application of hospital information systems, medical institutions have accumulated large-scale electronic medical record data. These data are important records generated during patients' hospital visits and treatment, and comprise various types of data such as medical record texts, medical charts and medical images, allowing medical staff to use them conveniently and rapidly. Named entity recognition on electronic medical records is an upstream task of medical information processing. Named entity recognition refers to identifying entities with a particular meaning in text and classifying them into predefined categories, such as diseases, treatments, symptoms and medicines.

Because medical scenes are complex and labeled corpora are limited, the prior art adopts transfer learning: a model is pre-trained on a source data domain (source domain) and then transferred to a target data domain (target domain) for fine-tuning. In practice, however, medical terminology and modes of expression vary widely between specialties and hospitals, and data privacy concerns prevent specialties and hospitals from sharing data. As a result, the generalization or transfer performance of current approaches in medical scenes is very limited, especially on unseen target domains that differ greatly from the source domain. How to effectively improve the generalization of a model after transfer, and thereby recognize named entities in electronic medical records accurately, has therefore become a technical problem urgently to be solved by those skilled in the art.

Disclosure of Invention

In view of the above, it is necessary to provide a method, a device, equipment and a storage medium for identifying named entities in an electronic medical record, so as to solve the problem that the prior art cannot effectively improve the generalization of a model after transfer and therefore cannot identify named entities in electronic medical records accurately.

A first aspect of an embodiment of the present application provides a method for identifying a named entity of an electronic medical record, where the method includes:

acquiring real-time text data of an electronic medical record, and processing the real-time text data by contrastive learning in a pre-trained language representation model to obtain token semantic feature information corresponding to the real-time text data;

performing label semantization on the real-time text data based on preset indication labels to obtain a plurality of pieces of label semantic feature information corresponding to the real-time text data;

performing a similarity-based relevance calculation between each piece of token semantic feature information and the plurality of pieces of label semantic feature information to obtain the label semantic feature information corresponding to each piece of token semantic feature information;

and performing named entity recognition on the real-time text data according to the label semantic feature information corresponding to the token semantic feature information, so as to extract named entities corresponding to preset entity types.

A second aspect of the embodiments of the present application provides an electronic medical record named entity recognition device, including:

an acquisition module, configured to acquire real-time text data of an electronic medical record and process the real-time text data by contrastive learning in a pre-trained language representation model to obtain token semantic feature information corresponding to the real-time text data;

a processing module, configured to perform label semantization on the real-time text data based on preset indication labels to obtain a plurality of pieces of label semantic feature information corresponding to the real-time text data;

a calculation module, configured to perform a similarity-based relevance calculation between each piece of token semantic feature information and the plurality of pieces of label semantic feature information to obtain the label semantic feature information corresponding to each piece of token semantic feature information;

and an extraction module, configured to perform named entity recognition on the real-time text data according to the label semantic feature information corresponding to the token semantic feature information, so as to extract named entities corresponding to preset entity types.

In a third aspect, an embodiment of the present invention provides a computer device including a processor, a memory, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the electronic medical record named entity recognition method according to the first aspect is implemented.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the electronic medical record named entity recognition method according to the first aspect.

In summary, the invention provides a method, a device, equipment and a storage medium for identifying named entities in an electronic medical record. Real-time text data of the electronic medical record is acquired and processed by contrastive learning in a pre-trained language representation model to obtain token semantic feature information; label semantization is performed on the real-time text data based on preset indication labels to obtain a plurality of pieces of label semantic feature information; a similarity-based relevance calculation between each piece of token semantic feature information and the label semantic feature information yields the label semantic feature information corresponding to each token; and named entity recognition is then performed on the real-time text data to extract named entities of preset entity types. The invention fully learns the token semantic features of the text and improves the generalization of the model after transfer. By integrating the semantic knowledge of the labels and computing the relevance between each token's semantic features and the label semantic features, it strengthens similarity-based recognition of named entities and greatly reduces the need for manual labeling. Obtaining the label semantic feature information corresponding to each token makes full use of the semantic information of the labels, helps the model perform named entity recognition, and improves its efficiency and accuracy.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of an application environment of a method for identifying named entities of an electronic medical record according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for identifying named entities of an electronic medical record according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an electronic medical record named entity recognition device according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and fully below with reference to the accompanying drawings. Evidently, the described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.

It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".

Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.

Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.

It should be understood that the sequence numbers of the steps in the following embodiments do not mean the order of execution, and the execution order of the processes should be determined by the functions and the internal logic, and should not be construed as limiting the implementation process of the embodiments of the present invention.

The embodiments of the present application may acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results.

In order to illustrate the technical scheme of the invention, the following description is made by specific examples.

The method for identifying named entities in an electronic medical record provided by the embodiment of the invention may be applied in the application environment shown in FIG. 1, in which a client communicates with a server. Clients include, but are not limited to, palmtop computers, desktop computers, notebook computers, ultra-mobile personal computers (UMPC), netbooks and personal digital assistants (PDA). The server may be implemented as an independent server or as a server cluster composed of multiple servers, and medical data such as personal health records, prescriptions and examination reports can be uploaded and downloaded through the server.

It should be noted that the method provided by the embodiments of the application is applied in the field of digital medical treatment and uses a medical platform to output electronic medical record data corresponding to various medical texts. An electronic medical record here refers to a digitized medical record that is stored, managed, transmitted and reproduced by electronic devices (computers, health cards, etc.) and replaces all the information of a handwritten paper record. Electronic medical records comprise many types of documents, such as item names, disease course records, postoperative course records, examination results, medical orders, operation records and admission records, and different document types have different chapter structures (for example, an admission record contains chapters such as chief complaint, history of present illness and family history). Reports can be acquired through the medical platform, which converts text examination reports into target text data and outputs it to the user.

In one possible implementation, the medical text may be an electronic healthcare record (EHR), i.e., an electronic personal health record: a series of electronic records of archival value, including medical records, electrocardiograms, medical images and the like.

In one possible implementation, the method can be applied to intelligent diagnosis and treatment and to remote consultation, and can also be used, together with synthesized target speech, for intelligent customer service in an internet hospital.

Information query is a channel through which users quickly obtain required information in many scenarios. For example, in the medical field, the medical record information a user needs can be retrieved from a large volume of electronic medical records using an artificial intelligence model, and the resulting medical text can be output by voice to provide the user with a medical record reference.

It should be noted that the above medical application scenarios are only illustrative and do not limit specific implementations. Referring to FIG. 2, which is a flowchart of a method for identifying named entities in an electronic medical record according to an embodiment of the present invention, the method may be applied to the server in FIG. 1, which is connected to the corresponding client. As shown in FIG. 2, the method may include the following steps.

S201: Acquire real-time text data of the electronic medical record, and process the real-time text data by contrastive learning in a pre-trained language representation model to obtain token semantic feature information corresponding to the real-time text data.

In step S201, the real-time text data of the electronic medical record may be obtained from an electronic medical record database server, or obtained by electronically scanning a paper medical record. The real-time text data is then processed by contrastive learning in the pre-trained language representation model to obtain the corresponding token semantic feature information.

Optionally, processing the real-time text data by contrastive learning in the pre-trained language representation model to obtain the token semantic feature information corresponding to the real-time text data includes:

building a pre-trained language representation model in advance, the model including a Gaussian embedding layer;

inputting the real-time text data into the Gaussian embedding layer of the pre-trained language representation model for contrastive processing, to obtain the distribution distance between the tokens in the real-time text data;

and determining the token semantic feature information corresponding to the real-time text data according to the distribution distance between the tokens in the real-time text data.
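The text does not say which "distribution distance" the Gaussian embedding layer uses. The sketch below assumes each token is embedded as a diagonal Gaussian (a mean vector and a variance vector) and uses the symmetrised Kullback-Leibler divergence between two such Gaussians as the distance. The class and function names are hypothetical, and the toy vectors stand in for what a projection head on top of BERT would produce.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GaussianToken:
    """A token embedded as a diagonal Gaussian (hypothetical structure)."""
    text: str
    mean: List[float]
    var: List[float]  # diagonal covariance; every entry must be > 0


def kl_diag_gaussian(p: GaussianToken, q: GaussianToken) -> float:
    """KL(p || q) between two diagonal Gaussians of the same dimension."""
    total = 0.0
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


def distribution_distance(p: GaussianToken, q: GaussianToken) -> float:
    """Symmetrised KL divergence, so that d(p, q) == d(q, p)."""
    return 0.5 * (kl_diag_gaussian(p, q) + kl_diag_gaussian(q, p))


def pairwise_distances(tokens: List[GaussianToken]) -> List[List[float]]:
    """Distribution distance between every pair of tokens in a sentence."""
    return [[distribution_distance(a, b) for b in tokens] for a in tokens]


# Toy 2-D Gaussians standing in for the output of the Gaussian embedding layer.
tokens = [
    GaussianToken("Barack", [1.0, 0.9], [0.1, 0.1]),
    GaussianToken("Obama", [1.1, 1.0], [0.1, 0.1]),
    GaussianToken("was", [-1.0, 0.2], [0.2, 0.2]),
]
dist = pairwise_distances(tokens)
```

With these toy values, "Barack" and "Obama" end up about 0.1 apart while "Barack" and "was" are more than 10 apart, which is the kind of geometry the contrastive step is meant to produce. KL divergence is only one reasonable choice here; it has the advantage of accounting for each token's variance as well as its mean.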

In this embodiment, current industry transfer-learning methods are not efficient enough: they are constrained by the data labels and data distribution of the source domain, learn semantic features and intermediate representations insufficiently, and generalize poorly across scenarios. A Gaussian embedding layer is therefore built on top of BERT to form the pre-trained language representation model, so that the model adapts better to tasks in this field. BERT is trained on a large-scale unlabeled corpus to obtain a text representation (Representation) rich in semantic information; this representation is then fine-tuned on a specific NLP task and finally applied to that task. Embedding, also called mapping, maps a sentence composed of words to token vectors. The real-time text data is input into the Gaussian embedding layer of the pre-trained language representation model, and contrastive learning is used to determine the distribution distance between the Gaussian embeddings of the tokens in the real-time text data: the model learns to reduce the distance between the token embeddings of similar entities and to increase the distance between the token embeddings of different entities. The token semantic feature information corresponding to the real-time text data is then determined according to the distribution distances between the tokens.
For example, if the input is "Barack Obama was born in 1961", where "Barack Obama" is a person-name entity, the tokens inside the entity ("Barack", "Obama") should be relatively close in embedding distance, whereas "Barack" and "Obama" should be farther from the tokens outside the entity ("was", "born", "in", "1961"). In practice, tokens within the same entity are given a smaller loss value during training and other token pairs a larger one, allowing the model to learn the relationship between tokens inside and outside an entity. For instance, the computed distance between "Barack" and "Obama" should be small, and the computed distance between "Obama" and "was" should be large.
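The text does not give the exact training loss. As a stand-in, the sketch below uses the classic margin-based pairwise contrastive loss: a pair of tokens from the same entity is penalised in proportion to its distance, and any other pair is penalised only if it is closer than a margin. The names `entity_ids` and `margin` are hypothetical, and ignoring pairs of two non-entity tokens is an assumption.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def pairwise_contrastive_loss(
    entity_ids: List[Optional[int]],
    dist: Dict[Tuple[int, int], float],
    margin: float = 1.0,
) -> float:
    """Margin-based pairwise contrastive loss over token pairs (i < j).

    entity_ids[i] is the id of the entity span token i belongs to,
    or None if the token lies outside every entity.
    dist[(i, j)] is the embedding distance between tokens i and j.
    """
    total, count = 0.0, 0
    for i, j in combinations(range(len(entity_ids)), 2):
        a, b = entity_ids[i], entity_ids[j]
        if a is None and b is None:
            continue  # assumption: pairs of two non-entity tokens are ignored
        d = dist[(i, j)]
        if a is not None and a == b:
            total += d  # same entity: loss grows with distance (pull together)
        else:
            total += max(0.0, margin - d)  # otherwise: penalise being closer than the margin
        count += 1
    return total / count if count else 0.0


tokens = ["Barack", "Obama", "was", "born", "in", "1961"]
entity_ids = [0, 0, None, None, None, None]
pairs = list(combinations(range(len(tokens)), 2))
# "good": entity tokens close together, far from the rest; "bad": the reverse.
good = {p: (0.1 if p == (0, 1) else 2.0) for p in pairs}
bad = {p: (2.0 if p == (0, 1) else 0.1) for p in pairs}
loss_good = pairwise_contrastive_loss(entity_ids, good)
loss_bad = pairwise_contrastive_loss(entity_ids, bad)
```

The embedding layout that keeps "Barack" and "Obama" together yields a much smaller loss than the reversed layout, matching the behaviour described for the example sentence. In a real model `dist` would come from the Gaussian embedding layer and the loss would be minimised by gradient descent; neither is shown here.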

It should be noted that the pre-trained language model may be GPT-3, ChatGLM, BERT or the like, which is not limited in this application.

For example, different patients have different affected body parts and treatment goals, so the way named entities are identified in their electronic medical records also differs at each stage of treatment. Therefore, in the technical solution of the invention, the named entity identification mode for the target electronic medical record can be determined according to the current medical device and the current progress of the rehabilitation data. In one possible implementation, the data is medical data, such as personal health records, prescriptions and examination reports.

By acquiring the real-time text data of the electronic medical record and processing it by contrastive learning in the pre-trained language representation model, the method and device fully learn the token semantic features of the text, enable the model to better capture the dependencies between labels, fully learn intermediate representations and semantic features, and improve the generalization of the model after transfer.

S202: Perform label semantization on the real-time text data based on preset indication labels to obtain a plurality of pieces of label semantic feature information corresponding to the real-time text data.

In step S202, note that conventional entity extraction does not use the semantic features of the labels. For example, when extracting the two entity types "PER" and "LOC", a conventional model only learns to predict class 0 (PER) or class 1 (LOC), without knowing what 0 and 1 actually mean. This application therefore integrates label knowledge: based on preset indication labels, label semantization is performed on the real-time text data to obtain a plurality of pieces of label semantic feature information. For example, "PER" stands for person, so B-PER is rendered as "begin person", I-PER as "inside person", and so on. Semantization thus converts a label such as B-PER into a natural-language description such as "begin person", which is then encoded by the BERT model to generate the label semantic feature information.

In the embodiment of the invention, there is no need to design small domain-specific models for different medical scenes: the advantages of semantic features are fully used, the label semantic feature information is fused in, the generalization of the model in the target domain is enhanced, and the efficiency of named entity recognition is further improved.
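The semantization step can be sketched as a mapping from BIO tags to natural-language descriptions. Only "PER" standing for "person" comes from the text; the other entity-type descriptions and the use of "outside" for the non-entity tag "O" are hypothetical choices for illustration.

```python
from typing import Dict, Iterable

# Only PER -> "person" comes from the text; the other descriptions are hypothetical.
TYPE_DESCRIPTIONS: Dict[str, str] = {
    "PER": "person",
    "LOC": "location",
    "DIS": "disease",
    "DRUG": "medicine",
}
PREFIX_DESCRIPTIONS: Dict[str, str] = {"B": "begin", "I": "inside"}


def semanticize_label(tag: str) -> str:
    """Turn a BIO tag such as 'B-PER' into a natural-language description."""
    if tag == "O":
        return "outside"  # assumption: the non-entity tag gets its own description
    prefix, _, entity_type = tag.partition("-")
    return f"{PREFIX_DESCRIPTIONS[prefix]} {TYPE_DESCRIPTIONS[entity_type]}"


def semanticize_labels(tags: Iterable[str]) -> Dict[str, str]:
    """Map every preset indication label to the text that will be encoded."""
    return {tag: semanticize_label(tag) for tag in tags}


descriptions = semanticize_labels(["B-PER", "I-PER", "B-DIS", "I-DIS", "O"])
# descriptions["B-PER"] == "begin person", descriptions["I-PER"] == "inside person"
```

Each description string would then be passed to a text encoder such as BERT to obtain the label semantic feature vector; that encoding step is not shown. Because the label is now ordinary text, a new entity type only needs a new description rather than a new output class.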

S203: Perform a similarity-based relevance calculation between each piece of token semantic feature information and the plurality of pieces of label semantic feature information to obtain the label semantic feature information corresponding to the token semantic feature information.

In step S203, once the token semantic feature information and the plurality of pieces of label semantic feature information corresponding to the real-time text data are obtained, the relevance between them is determined: a similarity-based relevance calculation is performed between each piece of token semantic feature information and the plurality of pieces of label semantic feature information to obtain the label semantic feature information corresponding to each token.

Optionally, performing the similarity-based relevance calculation between each piece of token semantic feature information and the plurality of pieces of label semantic feature information includes:

judging whether the similarity between each piece of token semantic feature information and the plurality of pieces of label semantic feature information is greater than a preset similarity threshold;

and if the similarity is greater than the preset similarity threshold, performing the relevance calculation between that token semantic feature information and the plurality of pieces of label semantic feature information.

In this embodiment, the similarity between each piece of token semantic feature information and the plurality of pieces of label semantic feature information is calculated and compared with a preset similarity threshold. If the similarity is greater than the threshold, the relevance calculation is performed; if it is not, the relevance calculation is not performed, and the steps of the electronic medical record named entity recognition method need to be executed again.

It should be noted that the specific value of the preset similarity threshold can be set according to the user's actual requirements, which is not limited in the embodiments of the present application.
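The threshold check can be sketched with cosine similarity. Two points are assumptions, since the text leaves them open: the similarity measure is cosine similarity, and the check is applied per label, so only labels above the threshold go on to the relevance calculation. The default threshold of 0.5 and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def labels_passing_threshold(
    token_vec: Sequence[float],
    label_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical default; the text leaves the value open
) -> Dict[str, float]:
    """Keep only the labels whose similarity to the token exceeds the threshold.

    An empty result means no relevance calculation is performed for this token.
    """
    sims = {name: cosine(token_vec, vec) for name, vec in label_vecs.items()}
    return {name: s for name, s in sims.items() if s > threshold}


label_vecs = {"begin person": [1.0, 0.0], "outside": [0.0, 1.0]}
candidates = labels_passing_threshold([0.9, 0.1], label_vecs)
```

Here only "begin person" (similarity about 0.99) survives, while "outside" (about 0.11) is filtered out. Raising `threshold` makes the filter stricter and, past about 0.994 in this toy case, leaves no candidates at all.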

Optionally, performing the relevance calculation between each piece of token semantic feature information and the plurality of pieces of label semantic feature information includes:

for each piece of token semantic feature information, fusing it with each of the plurality of pieces of label semantic feature information to obtain the different fused label semantic feature information corresponding to that token;

classifying and scoring, in turn, the different label semantic feature information corresponding to the token semantic feature information to obtain a target score set;

sorting the score values in the target score set in descending order, and selecting the label semantic feature information with the maximum score as the closest label semantic feature information corresponding to the token semantic feature information;

and repeating the relevance calculation for the other token semantic feature information until the label semantic feature information corresponding to all token semantic feature information has been determined.

In this embodiment, the relevance calculation proceeds as follows. First, one piece of token semantic feature information is fused with each of the plurality of pieces of label semantic feature information to obtain the different label semantic feature information corresponding to that token. These are then classified and scored in turn to obtain a target score set, the score values are sorted in descending order, and the label semantic feature information with the maximum score is selected as the one most similar to the token. The relevance calculation is repeated for the remaining tokens until the label semantic feature information corresponding to all tokens has been determined. For example, two encoders are constructed from the BERT model to encode the input tokens and the labels respectively; the semantic feature information of each token of the input text is then related to the semantic feature information of the labels to find the label closest to that token, and the labels of all tokens are obtained in the same way, completing named entity extraction. This extraction process is very general: different labels in different scenes require no redesign or retraining, because the two encoders can already encode the input token semantic feature information and the label semantic feature information well.

In this embodiment, the similarity-based relevance calculation between each piece of token semantic feature information and the plurality of pieces of label semantic feature information yields the label semantic feature information corresponding to each token, which helps the model perform named entity recognition well, makes full use of the semantic information of the labels, and improves the efficiency and accuracy of named entity recognition.
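The fuse, score, sort and select loop can be sketched as follows, assuming the two encoders have already produced a vector per token and a vector per label description. The fusion operator and the scorer are not specified in the text; here fusion is assumed to be an element-wise product and scoring a bias-free one-unit linear layer, which with all-ones weights reduces to a plain dot product. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def fuse(token_vec: Sequence[float], label_vec: Sequence[float]) -> List[float]:
    """Assumed fusion: element-wise product of token and label vectors."""
    return [t * l for t, l in zip(token_vec, label_vec)]


def score(fused: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Assumed scorer: a bias-free linear layer with one output unit."""
    if weights is None:
        weights = [1.0] * len(fused)  # all-ones weights reduce to a dot product
    return sum(w * f for w, f in zip(weights, fused))


def closest_label(
    token_vec: Sequence[float],
    label_vecs: Dict[str, Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[str, float]:
    """Score every label for one token, sort descending, return the top one."""
    score_set = [(name, score(fuse(token_vec, vec), weights)) for name, vec in label_vecs.items()]
    score_set.sort(key=lambda item: item[1], reverse=True)
    return score_set[0]


def assign_labels(
    token_vecs: Sequence[Sequence[float]],
    label_vecs: Dict[str, Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[str]:
    """Repeat the relevance calculation for every token in the sentence."""
    return [closest_label(t, label_vecs, weights)[0] for t in token_vecs]


# Toy encoder outputs: one vector per label description and per token.
label_vecs = {
    "begin person": [1.0, 0.0, 0.0],
    "inside person": [0.0, 1.0, 0.0],
    "outside": [0.0, 0.0, 1.0],
}
token_vecs = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.1], [0.1, 0.1, 0.9]]  # "Barack", "Obama", "was"
labels = assign_labels(token_vecs, label_vecs)
# labels == ["begin person", "inside person", "outside"]
```

Because labels enter only through `label_vecs`, adding a label for a new scene means encoding one more description; nothing in the scoring loop changes, which is the generality the paragraph above describes.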

S204: Perform named entity recognition on the real-time text data according to the label semantic feature information corresponding to the token semantic feature information, so as to extract named entities corresponding to preset entity types.

In step S204, after the label semantic feature information corresponding to the token semantic feature information has been obtained, named entity recognition is performed on the real-time text data according to it, so as to extract the named entities corresponding to the preset entity types.

Optionally, performing named entity recognition on the real-time text data to extract the named entities corresponding to the preset entity types includes:

performing named entity recognition on the real-time text data according to the label semantic feature information corresponding to all token semantic feature information, to obtain the entity attribute identifier corresponding to each token in the real-time text data;

judging whether the entity attribute identifier corresponding to each token in the real-time text data matches a preset entity type;

and if the entity attribute identifiers corresponding to the tokens match the preset entity type, extracting the named entity corresponding to that preset entity type.

In one embodiment, after the label semantic feature information corresponding to all token semantic feature information has been determined, a preset named entity recognition model performs named entity recognition on the real-time text data to obtain the entity attribute identifier of each token, which indicates whether the token belongs to a named entity. For the tokens that belong to named entities, it is judged whether their entity attribute identifiers match a preset entity type; if they match, the named entity corresponding to that preset entity type is extracted. If they do not match, the named entity corresponding to the preset entity type cannot be extracted, and the steps of the electronic medical record named entity recognition method need to be executed again.
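One way to read the matching step is BIO span decoding followed by a filter on entity type. The sketch assumes the entity attribute identifier of each token is a BIO tag such as "B-DIS" (the text does not fix the tag scheme), and treats an "I-" tag that does not continue an open span of the same type as the start of a new span. The function name, the `joiner` parameter and the example tags are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def extract_entities(
    tokens: Sequence[str],
    tags: Sequence[str],
    preset_types: Set[str],
    joiner: str = "",
) -> List[Tuple[str, str]]:
    """Decode BIO tags into (entity_type, text) spans of the preset types only."""
    entities: List[Tuple[str, str]] = []
    span_type, span_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type == span_type:
            span_tokens.append(token)  # continue the currently open span
            continue
        # Any other tag closes the open span; keep it only if its type matches.
        if span_type in preset_types:
            entities.append((span_type, joiner.join(span_tokens)))
        if prefix in ("B", "I"):
            # Assumption: a stray I- tag starts a new span, like B- would.
            span_type, span_tokens = entity_type, [token]
        else:
            span_type, span_tokens = None, []
    if span_type in preset_types:
        entities.append((span_type, joiner.join(span_tokens)))
    return entities


tokens = ["Patient", "has", "type", "2", "diabetes", "and", "takes", "metformin"]
tags = ["O", "O", "B-DIS", "I-DIS", "I-DIS", "O", "O", "B-DRUG"]
diseases = extract_entities(tokens, tags, {"DIS"}, joiner=" ")
# diseases == [("DIS", "type 2 diabetes")]
```

The default `joiner=""` suits character-level tokens of Chinese medical text; for word tokens, as in the example, a space is used. Spans whose type is not in `preset_types` are decoded but silently dropped.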

Optionally, before extracting the named entity corresponding to the preset entity type, the method includes:

pre-establishing a named entity index based on the plurality of pieces of label semantic feature information; and

inputting the token semantic feature information into the named entity index to directly extract the named entity corresponding to the preset entity type.

In this embodiment, all of the label semantic feature information for the target domain scene can be computed and stored before the named entity corresponding to the preset entity type is extracted. A named entity index can then be built over this information, for example using Redis for storage and index design. During later extraction, only the semantic feature information of the input tokens needs to be computed each time, so the named entities of the electronic medical record can be extracted quickly. This works because the set of label types to be extracted is fixed for a given scene when the relevance between token semantic feature information and label semantic feature information is computed. For example, when extracting from a B-mode ultrasound report, only the report time, the referring doctor, the hospital, the disease, and the body site are extracted. These labels can therefore be encoded by a BERT model in advance to obtain their label semantic feature information, which does not need to be recomputed each time a named entity is extracted.
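The pre-computed named entity index can be sketched as a cache keyed by scene. In this minimal sketch, an in-memory dictionary stands in for the Redis store mentioned above. The label encoder (a BERT model in the text) is passed in as an arbitrary callable, and relevance is scored by a dot product. `LabelIndex`, `build`, and `lookup` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class LabelIndex:
    """Hypothetical named entity index: label vectors are encoded once per scene.

    A plain dict stands in for an external store such as Redis in this sketch.
    """

    def __init__(self, encoder: Callable[[str], Vector]) -> None:
        self._encoder = encoder                       # e.g. a BERT label encoder
        self._store: Dict[str, Dict[str, Vector]] = {}

    def build(self, scene: str, labels: Sequence[str]) -> None:
        """Encode every label of a fixed-label scene once and cache the vectors."""
        self._store[scene] = {lab: self._encoder(lab) for lab in labels}

    def lookup(self, scene: str, token_vec: Vector) -> str:
        """Return the cached label with the largest dot product against token_vec."""
        labels = self._store[scene]                   # no label encoding happens here
        return max(labels,
                   key=lambda lab: sum(a * b for a, b in zip(token_vec, labels[lab])))
```

For a B-mode ultrasound scene, `build` would be called once with labels such as report time, referring doctor, hospital, disease, and body site. Each later request only supplies a token vector to `lookup`, which avoids re-encoding the labels.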

In this embodiment, annotation is difficult in complex medical scenarios. Named entity recognition is therefore performed on the real-time text data using the label semantic feature information corresponding to the token semantic feature information, so as to extract the named entities corresponding to the preset entity type. This makes full use of the token semantic feature information, fuses in the label semantic feature information, and improves the generalization of the model. In addition, the label semantic feature information is stored in a dedicated index, which greatly speeds up model inference.

Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic medical record named entity recognition device according to an embodiment of the invention. The terminal in this embodiment includes units for executing the steps in the embodiment corresponding to fig. 2. For details, refer to fig. 2 and the related description in the embodiment corresponding to fig. 2. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 3, the electronic medical record named entity recognition device 30 includes an acquisition module 31, a processing module 32, a calculation module 33, and an extraction module 34.

The acquisition module 31 is configured to acquire real-time text data of an electronic medical record and to process the real-time text data using contrastive learning in a pre-trained language characterization model, to obtain the token semantic feature information corresponding to the real-time text data;

the processing module 32 is configured to perform label semantic processing on the real-time text data based on preset indication labels, to obtain a plurality of pieces of label semantic feature information corresponding to the real-time text data;

the calculation module 33 is configured to perform a similarity-based relevance calculation between each piece of token semantic feature information and the plurality of pieces of label semantic feature information, to obtain the label semantic feature information corresponding to each piece of token semantic feature information; and

the extraction module 34 is configured to perform named entity recognition on the real-time text data according to the label semantic feature information corresponding to the token semantic feature information, so as to extract the named entity corresponding to the preset entity type.
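The division into modules 31 to 34 can be shown as a small pipeline whose stages are pluggable callables. This is a hypothetical sketch of the data flow between the modules only. The class name `EMRNerDevice` and the callable signatures are assumptions, and the models behind each stage are left unspecified.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


@dataclass
class EMRNerDevice:
    """Hypothetical wiring of modules 31-34; each field is a pluggable callable."""
    # module 31: text -> (tokens, token semantic feature vectors)
    acquire: Callable[[str], Tuple[List[str], List[Vector]]]
    # module 32: (text, indication labels) -> label semantic feature vectors
    process: Callable[[str, Sequence[str]], Dict[str, Vector]]
    # module 33: (token vector, label vectors) -> most relevant label
    calculate: Callable[[Vector, Dict[str, Vector]], str]
    # module 34: (tokens, per-token labels) -> extracted (type, text) entities
    extract: Callable[[List[str], List[str]], List[Tuple[str, str]]]

    def run(self, text: str, indication_labels: Sequence[str]) -> List[Tuple[str, str]]:
        tokens, token_feats = self.acquire(text)
        label_feats = self.process(text, indication_labels)
        ids = [self.calculate(t, label_feats) for t in token_feats]
        return self.extract(tokens, ids)
```

With this split, a module can be swapped on its own, for example replacing the calculation module's similarity measure, without touching the other three.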

Optionally, the above-mentioned acquisition module 31 is specifically configured to:

Optionally, the above-mentioned calculation module 33 is specifically configured to:

Optionally, the above-mentioned calculation module 33 is further configured to:

Optionally, the extraction module 34 is specifically configured to:

Optionally, the foregoing extraction module 34 is specifically configured to:

It should be noted that, because the content of information interaction and execution process between the above units is based on the same concept as the method embodiment of the present invention, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.

Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 4, the computer device of this embodiment includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with external terminals over a network connection. When the computer program is executed by the processor, the steps of any of the above embodiments of the electronic medical record named entity identification method are implemented.

The computer device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that fig. 4 is merely an example of a computer device and does not limit it. A computer device may include more or fewer components than shown, may combine certain components, or may include different components, for example a network interface, a display screen, an input device, and the like.

In an embodiment, a computer readable storage medium is provided. When the instructions in the computer readable storage medium are executed by a processor in a computer device, they enable the computer device to perform the steps of any embodiment of the electronic medical record named entity identification method disclosed in the present invention. These steps are not repeated herein. The computer readable storage medium may be non-volatile or volatile.

The processor may be a CPU. It may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor.

The memory includes a readable storage medium, an internal memory, etc., where the internal memory may be the memory of the computer device, the internal memory providing an environment for the execution of an operating system and computer-readable instructions in the readable storage medium. The readable storage medium may be a hard disk of a computer device, and in other embodiments may be an external storage device of the computer device, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. that are provided on the computer device. Further, the memory may also include both internal storage units and external storage devices of the computer device. The memory is used to store an operating system, application programs, boot loader (BootLoader), data, and other programs such as program codes of computer programs, and the like. The memory may also be used to temporarily store data that has been output or is to be output.

Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer readable storage medium. When executed, the program may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other media in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working process of the units and modules in the above device may refer to the corresponding process in the foregoing method embodiment, which is not described herein again. The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. An electronic medical record named entity identification method, characterized by comprising the following steps:

2. The electronic medical record named entity identification method according to claim 1, wherein processing the real-time text data using contrastive learning in a pre-trained language characterization model to obtain the token semantic feature information corresponding to the real-time text data comprises:

3. The electronic medical record named entity recognition method according to claim 1, wherein performing the relevance calculation on each piece of token semantic feature information and the plurality of pieces of label semantic feature information based on similarity comprises:

4. The electronic medical record named entity recognition method according to claim 3, wherein performing the relevance calculation on each piece of token semantic feature information and the plurality of pieces of label semantic feature information comprises:

5. The electronic medical record named entity identification method according to claim 1, wherein performing named entity recognition on the real-time text data to extract the named entity corresponding to the preset entity type comprises:

6. The method for identifying a named entity of an electronic medical record according to claim 5, wherein before extracting the named entity corresponding to the preset entity type, the method comprises:

7. An electronic medical record named entity recognition device, characterized by comprising:

8. The electronic medical record named entity recognition device of claim 7, wherein the processing module comprises:

a building unit, configured to pre-build a pre-trained language characterization model, the pre-trained language characterization model comprising a Gaussian embedding layer;

a comparison unit, configured to input the real-time text data into the Gaussian embedding layer of the pre-trained language characterization model for contrastive processing, to obtain the distribution distances between the tokens in the real-time text data; and

a determining unit, configured to determine the token semantic feature information corresponding to the real-time text data according to the distribution distances between the tokens in the real-time text data.

9. A computer device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, the processor implementing the electronic medical record named entity recognition method of any one of claims 1 to 6 when the computer program is executed by the processor.

10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the electronic medical record named entity recognition method of any one of claims 1 to 6.
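Claim 8 recites a Gaussian embedding layer and a distribution distance between tokens but does not state the distance formula. The sketch below makes three assumptions, all of them one common choice in Gaussian-embedding contrastive NER rather than the claimed method: each token's hidden vector is projected to a diagonal Gaussian whose variance is kept positive by ELU(x) + 1, the distribution distance is the symmetrized KL divergence, and training uses a contrastive loss that pulls same-label tokens together. All function names, weights, and the loss form are hypothetical.

```python
import math
from typing import List, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, diagonal variance)


def gaussian_embed(h: List[float], w_mu: List[List[float]],
                   w_var: List[List[float]]) -> Gaussian:
    """Hypothetical Gaussian embedding layer: linear mean, ELU(x) + 1 + eps variance."""
    def matvec(w: List[List[float]], x: List[float]) -> List[float]:
        return [sum(a * b for a, b in zip(row, x)) for row in w]
    mu = matvec(w_mu, h)
    var = [(x if x > 0 else math.exp(x) - 1.0) + 1.0 + 1e-6 for x in matvec(w_var, h)]
    return mu, var


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) between two diagonal Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
                     for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q))


def distribution_distance(p: Gaussian, q: Gaussian) -> float:
    """Symmetrized KL divergence, assumed here as the token-to-token distance."""
    return 0.5 * (kl_diag(p, q) + kl_diag(q, p))


def pairwise_distances(embs: List[Gaussian]) -> List[List[float]]:
    """Distance matrix between all tokens of a sequence."""
    return [[distribution_distance(a, b) for b in embs] for a in embs]


def contrastive_loss(dist: List[List[float]], labels: List[str]) -> float:
    """Mean over anchors with a positive of -log(sum_pos e^-d / sum_{k != i} e^-d)."""
    losses = []
    for i, li in enumerate(labels):
        others = [k for k in range(len(labels)) if k != i]
        pos = [k for k in others if labels[k] == li]
        if not pos:
            continue  # anchors with no same-label token contribute nothing
        num = sum(math.exp(-dist[i][k]) for k in pos)
        den = sum(math.exp(-dist[i][k]) for k in others)
        losses.append(-math.log(num / den))
    return sum(losses) / len(losses) if losses else 0.0
```

Under these assumptions, tokens of the same entity type end up with small distribution distances, so a token's semantic feature information can be read from where its Gaussian sits relative to the others.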