CN111353302A

CN111353302A - Medical word sense recognition method and device, computer equipment and storage medium

Info

Publication number: CN111353302A
Application number: CN202010141191.9A
Authority: CN
Inventors: 施维; 郭建福; 张旭
Original assignee: Ping An Medical and Healthcare Management Co Ltd
Current assignee: Shenzhen Ping An Medical Health Technology Service Co Ltd
Priority date: 2020-03-03
Filing date: 2020-03-03
Publication date: 2020-06-30

Abstract

The application belongs to the field of data processing and discloses a medical word sense recognition method and device, computer equipment and a readable storage medium. The method comprises: obtaining a sentence to be analyzed, and finding medical words relevant to the sentence in a preset medical word list; importing the sentence to be analyzed and the medical words into a BiLSTM model to obtain an original sentence vector and a medical word vector; performing pooling analysis on the original sentence vector and the medical word vector respectively to obtain an original feedforward vector and a medical feedforward vector; and feeding the original feedforward vector and the medical feedforward vector into a cosine similarity algorithm to obtain the cosine value between them, and taking the medical word corresponding to the largest cosine value as the medical word sense recognition result. The method addresses the technical problem that the correct concept cannot be hit among thousands of standard concepts, so that related medical words cannot be accurately located.

Description

Medical word sense recognition method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of data processing, and in particular, to a medical word sense recognition method, apparatus, computer device, and storage medium.

Background

With the development of information technology, deep learning models have recently made great progress in general-domain semantic similarity, for example in semantic recognition by robots, but have achieved no prominent results on the semantic similarity of Chinese clinical medical data.

Currently, a twin (Siamese) network model is used for clinical semantic recognition. A traditional twin network model simultaneously maps the sentence to be analyzed, and the medical words preliminarily determined from that sentence, into one analysis space, so that the sentence and its corresponding medical words are represented in that space, and clinical semantic recognition is realized by calculating similarity from their spelling differences. However, current twin neural network models encode the sentence to be analyzed indiscriminately and do not integrate important medical knowledge. The traditional twin model therefore cannot specifically understand clinical medical concepts and the relationships between them, and cannot hit the correct concept among thousands of standard concepts. In an environment where big data is widely used, accurately locating the relevant medical words by analyzing sentences entered by patients or doctors, and thereby realizing medical association, is a problem that urgently needs to be solved.

Disclosure of Invention

In view of the above, it is necessary to provide a method, an apparatus, a computer device and a storage medium for recognizing a medical word sense, so as to solve the technical problem that in the prior art, a correct concept cannot be hit from thousands of standard concepts to accurately locate a related medical word.

A medical word sense recognition method, the method comprising:

obtaining a sentence to be analyzed, and finding out medical words which are related to the sentence to be analyzed from a preset medical word list according to the sentence to be analyzed, wherein the number of the medical words is at least one;

importing the sentence to be analyzed and the medical words into a BiLSTM model to obtain an original sentence vector and a medical word vector;

performing pooling analysis on the original sentence vector and the medical word vector respectively to obtain an original feedforward vector and a medical feedforward vector;

and feeding the original feedforward vector and the medical feedforward vector into a cosine similarity algorithm to obtain a cosine value between the original feedforward vector and the medical feedforward vector, and taking the medical word corresponding to the largest cosine value as the medical word sense recognition result.

A medical word sense recognition apparatus, the apparatus comprising:

the sentence matching module is used for acquiring a sentence to be analyzed and searching out medical words which are related to the sentence to be analyzed from a preset medical word list according to the sentence to be analyzed, wherein the number of the medical words is at least one;

the coding processing module is used for importing the sentence to be analyzed and the medical words into a BiLSTM model to obtain an original sentence vector and a medical word vector;

the pooling analysis module is used for respectively performing pooling analysis on the original sentence vector and the medical word vector to obtain an original feedforward vector and a medical feedforward vector;

and the result identification module is used for feeding the original feedforward vector and the medical feedforward vector into a cosine similarity algorithm to obtain a cosine value between the original feedforward vector and the medical feedforward vector, and taking the medical word corresponding to the largest cosine value as the medical word sense recognition result.

A computer device comprising a memory and a processor, and a computer program stored in said memory and executable on said processor, said processor implementing the steps of the above-mentioned medical word sense recognition method when executing said computer program.

A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned medical word sense recognition method.

According to the medical word sense recognition method, device, computer equipment and storage medium, the input sentence to be analyzed is acquired, and the medical words related to it are found in the preset medical word list. The sentence to be analyzed and the medical words are then imported into a BiLSTM model to obtain an original sentence vector and a medical word vector, and the two are each subjected to pooling analysis; this retains their deeper semantic information, yields an original feedforward vector and a medical feedforward vector, and improves the accuracy of semantic analysis. Finally, the original feedforward vector and the medical feedforward vector are fed into a cosine similarity algorithm, and the cosine value calculated for each medical feedforward vector is taken as its similarity to the original feedforward vector. By repeating the calculation, the medical word with the highest similarity to the original feedforward vector is determined from the several medical feedforward vectors, so that the medical word closest to a sentence entered by a patient or doctor is determined from tens of millions of medical concepts. This enables technology sharing among medical alliances in a big data environment, so that the most accurate medical diagnosis can be made even in remote areas.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.

FIG. 1 is a schematic diagram of an application environment of a medical word sense recognition method;

FIG. 2 is a flow chart diagram of a medical word sense identification method;

FIG. 3 is a flow diagram of another embodiment of a medical word sense identification method;

FIG. 4 is a schematic flow chart of step 202 in FIG. 2;

FIG. 5 is a schematic flow chart of step 204 in FIG. 2;

FIG. 6 is a schematic flow chart of step 206 in FIG. 2;

FIG. 7 is a diagram showing feature extraction in embodiment 6;

FIG. 8 is a diagram showing semantic concatenation in embodiment 6;

FIG. 9 is a schematic diagram of a medical word sense recognition apparatus;

FIG. 10 is a diagram of a computer device in one embodiment.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The medical word sense recognition method provided by the embodiment of the invention can be applied to the application environment shown in fig. 1. The application environment may include a terminal 102, a network for providing a communication link medium between the terminal 102 and the server 104, and a server 104, wherein the network may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

A user may use the terminal 102 to interact with the server 104 over a network to receive or send messages, etc. The terminal 102 may have installed thereon various communication client applications, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.

The terminal 102 may be any of various electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop computer, a desktop computer, and the like.

The server 104 may be a server that provides various services, such as a background server that provides support for pages displayed on the terminal 102.

It should be noted that the medical word sense recognition method provided in the embodiments of the present application is generally executed by a server/terminal, and accordingly, the medical word sense recognition apparatus is generally disposed in the server/terminal device.

It should be understood that the number of terminals, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

The terminal 102 communicates with the server 104 through the network. The server 104 obtains from the terminal 102 the sentence to be analyzed that the user entered on the terminal 102, and finds the medical words related to the sentence in the preset medical word form. The sentence to be analyzed and the medical words are then imported into a BiLSTM model to obtain an original sentence vector and a medical word vector, and the two are each subjected to pooling analysis; this retains their deeper semantic information, yields an original feedforward vector and a medical feedforward vector, and improves the accuracy of semantic analysis. Finally, the original feedforward vector and the medical feedforward vector are fed into a cosine similarity algorithm, the cosine value calculated for each medical feedforward vector is taken as its similarity to the original feedforward vector, and the calculation is repeated to determine, from the several medical feedforward vectors, the medical word with the highest similarity to the original feedforward vector. The terminal 102 and the server 104 are connected through a network, which may be wired or wireless. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.

Embodiment 1, as shown in fig. 2, provides a medical word sense recognition method, which is described by taking the application of the method to the server side in fig. 1 as an example, and includes the following steps:

step 202, obtaining a sentence to be analyzed, and finding out medical words having correlation with the sentence to be analyzed from a preset medical word list according to the sentence to be analyzed, wherein the number of the medical words is at least one.

The execution subject of this embodiment may be a server or a computer device; the embodiment is described with the server as the execution subject performing the medical word sense recognition. The sentence to be analyzed may be a sentence entered on the terminal by a patient or a doctor, and the medical word form is a form prestored in the server. This embodiment uses the medical semantic categories of the SNOMED CT knowledge base: 11 kinds of medical concepts (diseases, surgical procedures, examinations, etc.) were selected, and their more than 14,000 medical words were compiled into the medical word form. SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) is a systematically organized set of medical terms that is convenient for computer processing and covers most aspects of clinical information, such as diseases, findings, procedures, microorganisms and drugs. Using this set of terms, clinical data can be indexed, stored, retrieved and aggregated consistently across different disciplines, specialties and care locations. It also helps to organize medical record contents and to reduce variation in how data are collected, coded and used in clinical care and scientific research.

Relevance means that semantic equivalence exists between the sentence to be analyzed and the medical word. For example, if the sentence to be analyzed consists of entered symptom descriptions such as "eye swelling, eye pain, photophobia, hard eyeball, weak eyesight", the corresponding pathological medical words may be "glaucoma, acute angle-closure glaucoma, chronic angle-closure glaucoma, primary open-angle glaucoma, filtering bleb separation" and the like.

Step 204, importing the sentence to be analyzed and the medical words into a BiLSTM model to obtain an original sentence vector and a medical word vector.

The sentence to be analyzed and the medical words are converted from character form into vector form so that the similarity between vectors can finally be calculated, and the medical word that best matches the sentence to be analyzed can be determined from the several medical words according to that similarity. The BiLSTM model is a neural network model for natural language processing, and it converts the sentence to be analyzed and the medical words into vectors as follows:

for example, the sentence to be analyzed is "eye swelling, eye pain, photophobia, hard eyeball, weak eyesight"; the first medical word is "glaucoma, acute angle-closure glaucoma, chronic angle-closure glaucoma, primary open-angle glaucoma, filtering bleb separation", the second medical word is "myopia", the third medical word is "keratitis", and so on.

The BiLSTM model splices the sentence to be analyzed and the several medical words to obtain an original sentence vector HL and several medical word vectors HR, namely the original sentence vector HL{eye swelling + eye pain + photophobia + hard eyeball + weak eyesight}, a first medical word vector HR1{glaucoma + acute angle-closure glaucoma + chronic angle-closure glaucoma + primary open-angle glaucoma + filtering bleb separation}, a second medical word vector HR2{myopia}, and so on.
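The encoding of step 204 can be illustrated with a minimal sketch of a bidirectional LSTM pass, written in plain Python. It is not the trained model of this application: the weights are random, the token embeddings are toy values, and the names `make_params`, `lstm_step`, `run_lstm` and `bilstm_encode` are hypothetical. The sketch only shows how a forward pass and a backward pass over the same token sequence are spliced into one vector per token.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_params(input_dim, hidden_dim, rng):
    # One weight matrix per gate; each row covers [input, previous hidden, bias].
    size = input_dim + hidden_dim + 1
    return {g: [[rng.uniform(-0.1, 0.1) for _ in range(size)]
                for _ in range(hidden_dim)]
            for g in ("f", "i", "o", "g")}

def lstm_step(params, x, h, c):
    z = x + h + [1.0]  # concatenate input, previous hidden state and bias term

    def gate(name, act):
        return [act(sum(w * v for w, v in zip(row, z))) for row in params[name]]

    f = gate("f", sigmoid)    # forget gate
    i = gate("i", sigmoid)    # input ("memory") gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def run_lstm(params, inputs, hidden_dim):
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    outputs = []
    for x in inputs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs

def bilstm_encode(inputs, fwd, bwd, hidden_dim):
    forward = run_lstm(fwd, inputs, hidden_dim)
    # Run the second LSTM over the reversed sequence, then restore token order.
    backward = run_lstm(bwd, inputs[::-1], hidden_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(0)
INPUT_DIM, HIDDEN_DIM = 4, 3
fwd_params = make_params(INPUT_DIM, HIDDEN_DIM, rng)
bwd_params = make_params(INPUT_DIM, HIDDEN_DIM, rng)
# Toy embeddings standing in for a five-token sentence to be analyzed.
sentence = [[rng.uniform(-1, 1) for _ in range(INPUT_DIM)] for _ in range(5)]
encoded = bilstm_encode(sentence, fwd_params, bwd_params, HIDDEN_DIM)
```

Each element of `encoded` has `2 * HIDDEN_DIM` components, the first half from the forward pass and the second half from the backward pass. In practice an established deep learning library would typically replace this hand-written loop.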

And step 206, performing pooling analysis on the original sentence vector and the medical word vector respectively to obtain an original feedforward vector and a medical feedforward vector.

The pooling analysis is the pooling proposed for convolutional neural networks. In this embodiment, the original sentence vector and the medical word vector are each processed with maximum pooling (max pooling) and mean pooling (avg pooling): the pooling operation combines a max pooling layer and an avg pooling layer as two parallel pooling layers, so that deeper semantic information of the original sentence vector and the medical word vector is retained.
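The parallel double pooling described above can be sketched as follows. This is an illustrative example under the assumption that both pooling layers operate over the token axis of a sequence of hidden vectors and that their outputs are concatenated; the function names and the toy values are hypothetical.

```python
def max_pool(vectors):
    # Largest value of each component across all tokens.
    return [max(column) for column in zip(*vectors)]

def mean_pool(vectors):
    # Average value of each component across all tokens.
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def dual_pool(vectors):
    # Concatenate the outputs of the two parallel pooling layers.
    return max_pool(vectors) + mean_pool(vectors)

# Three tokens, each encoded as a two-component vector (toy values).
hidden_states = [[1.0, -4.0], [7.0, 2.0], [-2.0, 5.0]]
pooled = dual_pool(hidden_states)  # [7.0, 5.0, 2.0, 1.0]
```

The result has a fixed length regardless of how many tokens the sentence contains, which is what allows sentences and medical words of different lengths to be compared later.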

Step 208, feeding the original feedforward vector and the medical feedforward vector into a cosine similarity algorithm to obtain a cosine value between the original feedforward vector and the medical feedforward vector, and taking the medical word corresponding to the maximum cosine value as the medical word sense recognition result.

As described in step 206, the pooling process can screen the correct medical feedforward vectors out of the medical word vectors. In this embodiment, these medical feedforward vectors are compared and judged, and the medical word that best matches the original feedforward vector is determined from among them. The patient type and the treatment means related to the patient can thus be determined from the medical words accurately located by analyzing the sentence entered by the patient or doctor.

Specifically, the medical word that best matches the original feedforward vector can be determined from the several medical feedforward vectors through a cosine similarity algorithm, which is a formula for judging the similarity of vectors. The server feeds the original feedforward vector and the medical feedforward vector into the cosine similarity algorithm (if there are several medical feedforward vectors, the calculation is performed once for each) and finally obtains the similarity between the original feedforward vector and the medical feedforward vector.
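The selection in step 208 can be sketched in a few lines. The vectors below are toy values, and the helper names `cosine` and `best_match` are hypothetical; the sketch only shows the repeated cosine calculation and the choice of the medical word with the largest cosine value.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def best_match(original, candidates):
    # One cosine calculation per medical feedforward vector; keep the largest.
    return max(candidates, key=lambda term: cosine(original, candidates[term]))

original_vec = [0.9, 0.1, 0.3]
candidate_vecs = {
    "glaucoma": [0.8, 0.2, 0.4],
    "myopia": [0.1, 0.9, 0.2],
    "keratitis": [0.3, 0.3, 0.9],
}
result = best_match(original_vec, candidate_vecs)  # "glaucoma"
```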

In the medical word sense recognition method, the input sentence to be analyzed is acquired, and the medical words related to it are found in a preset medical word list. The sentence to be analyzed and the medical words are then imported into a BiLSTM model to obtain an original sentence vector and a medical word vector, and the two are each subjected to pooling analysis; this retains their deeper semantic information, yields an original feedforward vector and a medical feedforward vector, and improves the accuracy of semantic analysis. Finally, the original feedforward vector and the medical feedforward vector are fed into a cosine similarity algorithm, and the cosine value calculated for each medical feedforward vector is taken as its similarity to the original feedforward vector. By repeating the calculation, the medical word with the highest similarity to the original feedforward vector is determined from the several medical feedforward vectors, so that the medical word closest to a sentence entered by a patient or doctor is determined from tens of millions of medical concepts. This enables technology sharing among medical alliances in a big data environment, so that the most accurate medical diagnosis can be made even in remote areas.

Embodiment 2, as shown in fig. 3, before step 202, further includes:

Step 302, a first training sample B1 is obtained from the case database and a second training sample B2 is obtained from the medical word form. The server may obtain the first training sample B1 through a platform and the second training sample B2 through the medical word form. Specifically, the platform may hold patient illness records kept by a hospital, a clinic, or another medical unit.

Step 304, a similarity label is established between the first training sample B1 and at least one second training sample B2, wherein each second training sample B2 is a medical word correlated with the text of the first training sample B1. That is, a similarity label is established from the first training sample B1 and the several second training samples B2 associated with its text. For example, for the first training sample B1 "glaucoma bleb separation", the second training samples B2 recorded in the label labelB1 are the disease "glaucoma filtering bleb separation", the surgery "filtering bleb separation", and the glaucoma types "angle-closure glaucoma" and "primary open-angle glaucoma"; the similarity label labelB1 is bound to the first training sample B1.

Step 306, the similarity label is input into the medical word form. Finally, the training process is accomplished by inputting a large number of samples B1 and their labels labelB1 into a computer device for storage.

The medical words in the medical word form are retrieved through the sentence to be analyzed as follows: the server identifies the characters of the sentence to be analyzed and determines, by comparison, the similarity label corresponding to the sentence; the similarity label records the medical words related to the sentence to be analyzed.

In the embodiment, the similarity labels between the first training sample B1 and the plurality of second training samples B2 are established through pre-training, and then based on mass data, the corresponding similarity labels may exist in the medical word form for all possibly occurring sentences to be analyzed, so that the efficiency of acquiring medical words having correlation with the sentences to be analyzed from the medical word form according to the sentences to be analyzed can be improved.

Embodiment 3, as shown in fig. 4, based on embodiment 2, step 202, includes:

step 402, identifying analysis words in the sentence to be analyzed.

Step 404, determining a corresponding similarity label of the analysis word in the medical word form.

Step 406, obtaining the medical words which are related to the characters of the sentence to be analyzed according to the similarity labels.

Based on the similarity label obtained in the embodiment 2, after the server acquires the sentence to be analyzed of the patient or the doctor from the terminal, the medical word corresponding to the sentence can be found from the medical word list. Specifically, the server side obtains a sentence to be analyzed, then identifies characters of the sentence to be analyzed, and determines a similarity label corresponding to the sentence to be analyzed in the medical word form through comparison, wherein the similarity label records medical words related to the sentence to be analyzed.

According to the embodiment, the medical words having a correlation with the statement to be analyzed can be directly obtained according to the similarity labels, the server side does not need to compare the statement to be analyzed with the medical words in the form one by one, and the efficiency of obtaining the medical words having the correlation with the statement to be analyzed from the medical word form according to the statement to be analyzed can be improved.
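The label-based lookup of steps 402 to 406 can be sketched with a dictionary that maps analysis words to the medical words recorded for them. This is an assumption about one possible storage layout, not the data structure of this application; `build_label_table`, `lookup` and the sample data are hypothetical.

```python
def build_label_table(samples):
    """Map each analysis word to the medical words recorded in its label."""
    table = {}
    for analysis_words, medical_words in samples:
        for word in analysis_words:
            table.setdefault(word, set()).update(medical_words)
    return table

def lookup(table, sentence_words):
    # Collect the medical words of every label hit by the sentence;
    # words without a label contribute nothing.
    related = set()
    for word in sentence_words:
        related |= table.get(word, set())
    return sorted(related)

# Hypothetical training pairs: (analysis words, related medical words).
training = [
    (["eye pain", "photophobia"], ["glaucoma", "acute angle-closure glaucoma"]),
    (["blurred distance vision"], ["myopia"]),
]
table = build_label_table(training)
candidates = lookup(table, ["eye pain", "weak eyesight"])
# candidates == ["acute angle-closure glaucoma", "glaucoma"]
```

Because the lookup touches only the labels hit by the sentence, the sentence does not have to be compared with every medical word in the form one by one.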

Embodiment 4, as shown in fig. 5, step 204, comprises:

Step 502, screening out redundant characters in the sentence to be analyzed according to the human body part vocabulary and the discomfort symptom vocabulary.

Step 504, filtering the redundant characters from the sentence to be analyzed through the BiLSTM model, and vectorizing the sentence to be analyzed after the redundant characters are filtered to obtain the original sentence vector.

The BiLSTM model has a forget gate, a memory gate, and an output gate. The forget gate filters redundant characters out of the sentence to be analyzed, such as "I" and "feel" in "I feel waist soreness and backache"; the memory gate selects the words in the sentence that need to be remembered, such as the symptom words "waist soreness and backache", and is typically used when training the BiLSTM model; and the output gate outputs the sentence to be analyzed that remains after the redundant characters have been filtered out. In this embodiment, the server controls these three gates of the BiLSTM model to train the model, to forget data so as to release memory, and to output data; details of training the BiLSTM model are not repeated in this example.

The server records a human body part vocabulary and a discomfort symptom vocabulary, determines the redundant characters in the sentence to be analyzed according to these two vocabularies, controls the forget gate of the BiLSTM model to filter out the redundant characters, and controls the output gate to output the original sentence vector.

In the embodiment, the sentence to be analyzed is denoised, and redundant characters in the sentence to be analyzed are filtered, so that descriptions related to diseases in the sentence to be analyzed are deeply mined, the accuracy of matching medical words is improved, the data volume of vectorization processing is reduced, and the efficiency of data processing is improved.
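The screening of step 502 can be illustrated with a simple vocabulary filter. The sketch below is a plain set lookup applied before encoding; it does not model the forget gate of the BiLSTM model, and the vocabularies, the tokenization and the name `filter_redundant` are hypothetical.

```python
# Hypothetical vocabularies recorded on the server.
BODY_PARTS = {"waist", "back", "eye"}
SYMPTOMS = {"soreness", "pain", "swelling"}

def filter_redundant(tokens):
    # Tokens found in either vocabulary are kept; all others are redundant.
    keep = BODY_PARTS | SYMPTOMS
    kept = [t for t in tokens if t in keep]
    dropped = [t for t in tokens if t not in keep]
    return kept, dropped

tokens = ["I", "feel", "my", "waist", "soreness", "and", "back", "pain"]
kept, dropped = filter_redundant(tokens)
# kept    == ["waist", "soreness", "back", "pain"]
# dropped == ["I", "feel", "my", "and"]
```

Only the tokens in `kept` would then be vectorized, which reduces the amount of data passed to the encoder.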

Embodiment 5, as shown in fig. 6, step 206, comprises:

step 602, feature extraction is respectively performed on the original sentence vector and the medical word vector.

And step 604, performing semantic splicing on the extracted features after dimensionality reduction to form an original feedforward vector and a medical feedforward vector.

After features are extracted from the original sentence vector and the medical word vector by convolution layers with different window sizes and their filter units, the results are fed into two different parallel pooling layers, an avg pooling layer and a max pooling layer, for dimensionality reduction. This fully combines the features dynamically extracted by the max pooling layer with the avg pooling layer's contribution to the average semantics of the short text, and effectively reduces the semantic information lost through dimensionality reduction. Finally, the necessary semantic splicing is performed in the concatenation layer to form the original feedforward vector and the medical feedforward vector. Both the dynamic extraction of the max pooling layer and the short-text average semantics of the avg pooling layer take into account the influence of the height of the convolution kernel's sliding window on the generated feature map. That is, the height of the convolution kernel is an important basis for the downsampling number M of the feature map: the higher the convolution kernel, the smaller the downsampling number, and conversely, the lower the convolution kernel, the larger the downsampling number. The downsampling number M is given by:

M = s/h (1)

where h represents the height of the convolution kernel, i.e. of the sliding window, and s represents the length of the sentence to be analyzed (kept within 30 characters). Compared with a plain maximum pooling strategy, this strategy can dynamically extract several important semantic combination features according to the characteristics of the multi-sliding-window convolution layer, while preserving the relative order of the features.

As shown in fig. 7, the text length of the input sentence to be analyzed is 6, and the dynamic pooling strategy is carried out with convolution kernels of h = 2 and h = 3; the shaded part in fig. 7 represents the more important features that are extracted. As shown in fig. 8, each medical word is judged as correct or wrong, and the deepened feedforward vectors are produced in the concatenation layer. Finally, the server obtains the required original feedforward vector and the required medical feedforward vector.
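The dynamic pooling strategy can be sketched for the example of fig. 7 as follows. Two assumptions are made that the text does not state: M in formula (1) is rounded down to an integer, and the dynamic pooling keeps the M largest feature values in their original order (a k-max pooling scheme). The feature map values and function names are hypothetical.

```python
def downsample_count(sentence_len, kernel_height):
    # Formula (1), M = s / h, rounded down (assumption) and at least 1.
    return max(1, sentence_len // kernel_height)

def k_max_pool(feature_map, k):
    # Keep the k largest values while preserving their relative order.
    k = min(k, len(feature_map))
    top = sorted(range(len(feature_map)),
                 key=lambda i: feature_map[i], reverse=True)[:k]
    return [feature_map[i] for i in sorted(top)]

s = 6  # text length of the sentence to be analyzed
# A window of height h sliding over s positions yields s - h + 1 features.
feature_map_h2 = [0.2, 0.9, 0.1, 0.7, 0.4]  # h = 2
feature_map_h3 = [0.5, 0.3, 0.8, 0.6]       # h = 3

pooled_h2 = k_max_pool(feature_map_h2, downsample_count(s, 2))  # M = 3
pooled_h3 = k_max_pool(feature_map_h3, downsample_count(s, 3))  # M = 2
# pooled_h2 == [0.9, 0.7, 0.4]
# pooled_h3 == [0.8, 0.6]
```

The lower kernel (h = 2) keeps three features and the higher kernel (h = 3) keeps two, matching the statement that a higher convolution kernel leads to fewer downsampled features.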

In this embodiment, the original sentence vector and the medical word vectors are each subjected to pooling analysis to obtain the original feedforward vector and the medical feedforward vectors, so that the correct medical feedforward vectors can be screened out of the several medical word vectors. These medical feedforward vectors are then compared and judged, and the medical word that best matches the original feedforward vector is determined from among them. The patient type and the treatment means related to the patient can thus be determined from the medical words accurately located by analyzing the sentence entered by the patient or doctor.

Example 6, step 208, comprises:

wherein fw (x1) is an original feedforward vector, fw (x2) is a medical feedforward vector, and Ew is the similarity between the sentence to be analyzed and the medical word. The cosine similarity algorithm is a formula for judging vector similarity, and computer equipment introduces an original feedforward vector and a medical feedforward vector into the cosine similarity algorithm (it can be understood that if a plurality of medical feedforward vectors exist, multiple calculations are carried out), and finally the similarity of the original feedforward vector and the medical feedforward vector is calculated.

Optionally, in a specific implementation, the original feedforward vector fw(x1) and the medical feedforward vector fw(x2) may be expressed in coordinates as (x1, y1) and (x2, y2), respectively, and the calculation then follows formula (3):

which yields the cosine value between the original feedforward vector and the medical feedforward vector.

Alternatively, if the original feedforward vector fw(x1) = (A1, A2, ..., An) and the medical feedforward vector fw(x2) = (B1, B2, ..., Bn), the cosine value Ew(x1, x2) is given by equation (4):

i.e. as in equation (5):

When L approaches 1, the medical word X2 is considered similar to the sentence X1 to be analyzed; otherwise, the two are considered dissimilar. Finally, the cosine similarity calculation is repeated for each medical feedforward vector, and the medical word with the highest similarity to the original feedforward vector is taken as the medical word sense recognition result.
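For reference, the standard cosine similarity expressions that are consistent with the symbols defined in this embodiment can be written as follows. This is a sketch that assumes the formulas referred to in the text are the usual two-dimensional and n-dimensional cosine forms; the angle symbol θ is introduced here only for illustration.

```latex
% General form, with f_w(x_1) the original feedforward vector
% and f_w(x_2) the medical feedforward vector
E_w(x_1, x_2) = \cos\theta
  = \frac{f_w(x_1) \cdot f_w(x_2)}{\lVert f_w(x_1) \rVert \, \lVert f_w(x_2) \rVert}

% Two-dimensional case, with coordinates (x_1, y_1) and (x_2, y_2)
\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2} \, \sqrt{x_2^2 + y_2^2}}

% n-dimensional case, with f_w(x_1) = (A_1, \dots, A_n) and f_w(x_2) = (B_1, \dots, B_n)
\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}
                  {\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
```

A value close to 1 indicates that the two vectors point in nearly the same direction, which is the sense in which the medical word is treated as similar to the sentence.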

In this embodiment, the cosine value between each medical feedforward vector and the original feedforward vector is calculated, and the medical word corresponding to the maximum cosine value is taken as the medical word sense recognition result for the sentence to be analyzed, which greatly simplifies the calculation.
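The selection of the medical word with the maximum cosine value can be sketched in Python as follows. The function names, the example words and the two-dimensional vectors are hypothetical; the sketch assumes that at least one candidate medical word is supplied, as the method requires.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_medical_word(
    original: List[float], candidates: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Score every candidate against the original vector and return the best one."""
    scores = {word: cosine(original, vec) for word, vec in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical feedforward vectors for one sentence and two candidate medical words.
original_vec = [1.0, 0.0]
candidate_vecs = {"gastritis": [1.0, 1.0], "migraine": [0.0, 1.0]}
word, score = best_medical_word(original_vec, candidate_vecs)
```

The calculation is repeated once per candidate, so the cost grows linearly with the number of medical words retrieved from the word list.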

It should be understood that although the steps in the flowcharts of fig. 2-6 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2-6 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 9, a medical word sense recognition apparatus is provided, which corresponds one to one to the medical word sense recognition method in the above embodiments. The medical word sense recognition apparatus includes:

a sentence matching module 902, configured to obtain a sentence to be analyzed and, according to the sentence to be analyzed, find at least one medical word correlated with the sentence to be analyzed in a preset medical word list;

a coding processing module 904, configured to import the sentence to be analyzed and the medical word into the Bilstm model to obtain an original sentence vector and a medical word vector;

a pooling analysis module 906, configured to perform pooling analysis on the original sentence vector and the medical word vector respectively to obtain an original feedforward vector and a medical feedforward vector;

and a result identification module 908, configured to feed the original feedforward vector and the medical feedforward vector into a cosine similarity algorithm to obtain the cosine value between the original feedforward vector and the medical feedforward vector, and to take the medical word corresponding to the maximum cosine value as the medical word sense recognition result.
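How the four modules hand data to one another can be sketched in Python as a chain of pluggable functions. Everything below is a hypothetical stand-in: the toy matcher, the bag-of-letters encoder (used in place of the Bilstm model), the identity pooling and the English example words are assumptions made only so that the sketch runs without a trained model.

```python
import math
from typing import Callable, List

Vector = List[float]


def recognize(
    sentence: str,
    word_list: List[str],
    match: Callable[[str, List[str]], List[str]],
    encode: Callable[[str], Vector],
    pool: Callable[[Vector], Vector],
    similarity: Callable[[Vector, Vector], float],
) -> str:
    """Chain the four stages: matching, encoding, pooling, result identification."""
    candidates = match(sentence, word_list)            # role of module 902
    sentence_vec = pool(encode(sentence))              # roles of modules 904 and 906
    scores = {w: similarity(sentence_vec, pool(encode(w))) for w in candidates}
    return max(scores, key=scores.get)                 # role of module 908


def toy_match(sentence: str, word_list: List[str]) -> List[str]:
    """Keep words sharing at least one non-space character with the sentence."""
    chars = set(sentence) - {" "}
    return [w for w in word_list if chars & set(w)]


def toy_encode(text: str) -> Vector:
    """Bag-of-letters counts; a stand-in for the Bilstm encoder."""
    return [float(text.count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]


def toy_pool(vec: Vector) -> Vector:
    """Identity pooling; a stand-in for the pooling analysis."""
    return vec


def toy_cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


result = recognize(
    "stomach ache", ["stomachache", "headache"],
    toy_match, toy_encode, toy_pool, toy_cosine,
)
```

Passing the stages in as functions mirrors the modular division of the apparatus: any stage can be replaced, for example by a trained encoder, without changing the others.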

Further, the medical word sense recognition apparatus further includes:

a sample selection module, configured to acquire a first training sample B1 from the case database and a second training sample B2 from the medical word list;

a label establishing module, configured to establish a similarity label between the first training sample B1 and at least one second training sample B2, wherein the second training sample B2 is a medical word correlated with the characters of the first training sample B1;

and a label input module, configured to enter the similarity labels into the medical word list.
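The construction of similarity labels can be sketched in Python as follows. The sketch assumes that "correlation with the characters" means sharing at least one character, which is an interpretation rather than something the application states; the function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_similarity_labels(
    case_texts: Iterable[str], medical_words: Iterable[str]
) -> Dict[str, Set[str]]:
    """Map each character seen in the case samples to the medical words containing it."""
    words = list(medical_words)               # second training samples B2
    labels: Dict[str, Set[str]] = defaultdict(set)
    for text in case_texts:                   # first training samples B1
        for ch in set(text):
            if ch.isspace():
                continue
            for word in words:
                if ch in word:                # assumed correlation: a shared character
                    labels[ch].add(word)
    return dict(labels)


# Hypothetical case text and medical word list.
labels = build_similarity_labels(["胃痛"], ["胃炎", "头痛", "骨折"])
```

The resulting table plays the role of the labels entered into the medical word list: each character points to the medical words it is associated with.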

Further, the sentence matching module 902 includes:

a character recognition submodule, configured to recognize the analysis characters in the sentence to be analyzed;

a label confirmation submodule, configured to determine the similarity labels corresponding to the analysis characters in the medical word list;

and an association obtaining submodule, configured to obtain, according to the similarity labels, the medical words correlated with the characters of the sentence to be analyzed.
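The lookup performed by these three submodules can be sketched in Python as follows, assuming the similarity labels are stored as a mapping from a character to the medical words associated with it. The storage format, the function name and the example data are hypothetical.

```python
from typing import Dict, List, Set


def candidate_words(sentence: str, labels: Dict[str, Set[str]]) -> List[str]:
    """Collect, without duplicates, the medical words labelled for each character."""
    found: List[str] = []
    for ch in sentence:                             # recognize the analysis characters
        for word in sorted(labels.get(ch, ())):     # look up their similarity labels
            if word not in found:
                found.append(word)                  # gather the correlated medical words
    return found


# Hypothetical similarity labels and input sentence.
example_labels = {"胃": {"胃炎", "胃溃疡"}, "痛": {"头痛"}}
candidates = candidate_words("我胃很痛", example_labels)
```

Characters without a label, such as "我" and "很" in the example, contribute nothing, so only medical words tied to labelled characters are passed on to the encoding stage.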

Further, the encoding processing module 904 includes:

a redundancy determining submodule, configured to determine the redundant characters in the sentence to be analyzed according to the human body part vocabulary and the discomfort symptom vocabulary;

and a vectorization processing submodule, configured to filter the redundant characters out of the sentence to be analyzed through the Bilstm model, and to vectorize the filtered sentence to obtain the original sentence vector.
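The idea of removing redundant characters can be sketched in Python as follows. The application performs the filtering through the Bilstm model; this sketch substitutes plain substring matching and assumes that any character not covered by a term from either vocabulary counts as redundant. The function name, the vocabularies and the example sentence are hypothetical.

```python
from typing import Iterable


def filter_redundant(
    sentence: str, body_parts: Iterable[str], symptoms: Iterable[str]
) -> str:
    """Keep only the characters covered by a vocabulary term found in the sentence."""
    keep = [False] * len(sentence)
    for term in list(body_parts) + list(symptoms):
        if not term:
            continue
        start = sentence.find(term)
        while start != -1:
            for i in range(start, start + len(term)):
                keep[i] = True                      # mark characters of the matched term
            start = sentence.find(term, start + 1)
    return "".join(ch for ch, kept in zip(sentence, keep) if kept)


# Hypothetical vocabularies and input sentence.
filtered = filter_redundant("我今天肚子有点痛", ["肚子", "头"], ["痛", "咳嗽"])
```

After filtering, only the body part and symptom expressions remain, and it is this shortened text that would be vectorized into the original sentence vector.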

Further, a pooling analysis module 906, comprising:

a feature extraction submodule, configured to extract features from the original sentence vector and the medical word vector respectively;

and a semantic splicing submodule, configured to reduce the dimensionality of the extracted features and splice them semantically to form the original feedforward vector and the medical feedforward vector.

Further, the result identification module 908 comprises:

a cosine calculation submodule, configured to calculate, according to the cosine similarity formula, the cosine value between the original feedforward vector and the medical feedforward vector, wherein fw(x1) is the original feedforward vector, fw(x2) is the medical feedforward vector, and Ew is the similarity between the original feedforward vector and the medical feedforward vector.

The medical word sense recognition apparatus acquires an input sentence to be analyzed and finds, in a preset medical word list, the medical words correlated with that sentence. The sentence to be analyzed and the medical words are then imported into a Bilstm model to obtain an original sentence vector and medical word vectors, and pooling analysis is performed on each of them; this retains their deeper semantic information, yields an original feedforward vector and medical feedforward vectors, and improves the accuracy of semantic analysis. Finally, the original feedforward vector and the medical feedforward vectors are fed into the cosine similarity algorithm, and the calculated cosine value is used as the similarity between each medical feedforward vector and the original feedforward vector. By repeating the cosine calculation, the medical word with the highest similarity to the original feedforward vector is determined from the plurality of medical feedforward vectors. In this way, the medical word closest to the sentence input by a patient or doctor is selected from tens of millions of medical concepts, which supports technology sharing among medical consortia in a big data environment and makes more accurate medical diagnosis available in remote areas.

In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing user order data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a medical word sense recognition method. The method acquires an input sentence to be analyzed and finds, in a preset medical word list, the medical words correlated with that sentence. The sentence to be analyzed and the medical words are then imported into a Bilstm model to obtain an original sentence vector and medical word vectors, and pooling analysis is performed on each of them; this retains their deeper semantic information, yields an original feedforward vector and medical feedforward vectors, and improves the accuracy of semantic analysis. Finally, the original feedforward vector and the medical feedforward vectors are fed into the cosine similarity algorithm, and the calculated cosine value is used as the similarity between each medical feedforward vector and the original feedforward vector. By repeating the cosine calculation, the medical word with the highest similarity to the original feedforward vector is determined from the plurality of medical feedforward vectors. In this way, the medical word closest to the sentence input by a patient or doctor is selected from tens of millions of medical concepts, which supports technology sharing among medical consortia in a big data environment and makes more accurate medical diagnosis available in remote areas.

As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of the part of the structure relevant to the disclosed solution and does not limit the computer devices to which the disclosed solution applies; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the medical word sense recognition method in the above embodiments, such as steps 202 to 208 shown in fig. 2, or implements the functions of the modules/units of the medical word sense recognition apparatus in the above embodiments, such as modules 902 to 908 shown in fig. 9. The method acquires an input sentence to be analyzed and finds, in a preset medical word list, the medical words correlated with that sentence. The sentence to be analyzed and the medical words are then imported into a Bilstm model to obtain an original sentence vector and medical word vectors, and pooling analysis is performed on each of them; this retains their deeper semantic information, yields an original feedforward vector and medical feedforward vectors, and improves the accuracy of semantic analysis. Finally, the original feedforward vector and the medical feedforward vectors are fed into the cosine similarity algorithm, and the calculated cosine value is used as the similarity between each medical feedforward vector and the original feedforward vector. By repeating the cosine calculation, the medical word with the highest similarity to the original feedforward vector is determined from the plurality of medical feedforward vectors. In this way, the medical word closest to the sentence input by a patient or doctor is selected from tens of millions of medical concepts, which supports technology sharing among medical consortia in a big data environment and makes more accurate medical diagnosis available in remote areas.

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered within the scope of this specification as long as it involves no contradiction.

The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art may make several changes, modifications and equivalent substitutions of some technical features without departing from the spirit and scope of the present invention, and such changes or substitutions do not cause the essence of the corresponding technical solution to depart from the spirit and scope of the technical solutions of the embodiments of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A medical word sense recognition method, the method comprising:

2. The method according to claim 1, before the finding out the medical word having a correlation with the sentence to be analyzed from a preset medical word list according to the sentence to be analyzed, further comprising:

acquiring a first training sample B1 from a case database, and acquiring a second training sample B2 from the medical word list;

establishing a similarity label between the first training sample B1 and at least one second training sample B2, wherein the second training sample B2 is a medical word having a correlation with the characters of the first training sample B1;

and entering the similarity label into the medical word list.

3. The method of claim 2, wherein the finding out the medical word having a correlation with the sentence to be analyzed from a preset medical word list according to the sentence to be analyzed comprises:

identifying the analysis characters in the sentence to be analyzed;

determining the similarity labels corresponding to the analysis characters in the medical word list;

and obtaining, according to the similarity labels, the medical words having a correlation with the characters of the sentence to be analyzed.

4. The method of claim 1, wherein the introducing the sentence to be analyzed and the medical word into a Bilstm model to obtain an original sentence vector and a medical word vector comprises:

confirming redundant characters in the sentence to be analyzed according to the human body part vocabulary and the discomfort symptom vocabulary;

and filtering the redundant characters from the sentences to be analyzed through the Bilstm model, and vectorizing the sentences to be analyzed after the redundant characters are filtered to obtain the original sentence vectors.

5. The method of claim 1, wherein the pooling analysis of the original sentence vector and the medical word vector, respectively, resulting in an original feedforward vector and a medical feedforward vector, comprises:

respectively extracting the features of the original sentence vector and the medical word vector;

and performing semantic splicing on the extracted features after dimensionality reduction to form the original feedforward vector and the medical feedforward vector.

6. The method of claim 1, wherein the introducing the original feedforward vector and the medical feedforward vector into a cosine similarity algorithm to obtain a cosine value between the original feedforward vector and the medical feedforward vector comprises:

according to the formula:

7. A medical word sense recognition apparatus, comprising:

8. The apparatus of claim 7, further comprising:

a sample acquisition module, configured to acquire a first training sample B1 from a case database and a second training sample B2 from the medical word list;

a label construction module, configured to establish a similarity label between the first training sample B1 and at least one second training sample B2, wherein the second training sample B2 is a medical word having a correlation with the characters of the first training sample B1;

and a label import module, configured to enter the similarity label into the medical word list.

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.