CN112989767B - Medical term labeling method, medical term mapping device and medical term mapping equipment

Info

Publication number: CN112989767B
Application number: CN202110430710.8A
Authority: CN
Inventors: 施晓明; 陈曦; 张子恒
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-04-21
Filing date: 2021-04-21
Publication date: 2021-09-03
Anticipated expiration: 2041-04-21
Also published as: CN112989767A

Abstract

The embodiment of the application discloses a medical term labeling method, a medical term mapping device and medical term mapping equipment, and belongs to the technical field of computers. The method comprises the following steps: obtaining a first symptom description statement and a reply statement to the first symptom description statement; generating annotation information for the first symptom description statement based on the medical terms contained in the reply statement; training a word annotation model according to the first symptom description statement and the annotation information; and calling the trained word annotation model to label a second symptom description statement, obtaining the medical terms corresponding to the second symptom description statement. Because the word annotation model is trained on the first symptom description statement and its reply statement, no manual annotation is needed and the model is trained in an unsupervised manner. The trained word annotation model then labels symptom description statements automatically, which saves manual labeling time and improves labeling efficiency.

Description

Medical term labeling method, medical term mapping device and medical term mapping equipment

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a medical term labeling method, a medical term mapping device and medical term mapping equipment.

Background

With the development of computer technology, the technology of semantic recognition is more and more widely applied. For example, in a medical scenario, a semantic recognition technology is adopted, so that a spoken symptom description sentence input by a user can be mapped into a medical word, and a doctor service can be provided for the user according to the medical word subsequently.

In the related art, manual labeling is used: each input sentence is labeled with medical words that have the same meaning as the sentence, and a network model is trained on the labeling results, so that the trained network model can determine the medical words corresponding to any input sentence. However, manual labeling takes a long time and is inefficient.

Disclosure of Invention

The embodiment of the application provides a medical term labeling method, a medical term mapping device and medical term mapping equipment, which can improve labeling efficiency.

In one aspect, a method for labeling medical terms is provided, the method comprising:

acquiring a first symptom description statement and a reply statement to the first symptom description statement;

generating annotation information of the first symptom description statement based on medical terms contained in the reply statement, wherein the annotation information comprises medical terms related to symptoms described by the first symptom description statement;

training a word annotation model according to the first symptom description statement and the annotation information, wherein the word annotation model is used for annotating a corresponding medical word for any symptom description statement;

and calling the trained word labeling model, and labeling the second symptom description sentence to obtain the medical word corresponding to the second symptom description sentence.

In one possible implementation, the method further includes:

segmenting the second symptom description sentence to obtain a plurality of sentence segments, wherein each sentence segment comprises at least one word;

respectively determining the sum of the weights of the words contained in each sentence fragment as the weight of each sentence fragment;

determining a sentence segment corresponding to the maximum weight from the plurality of sentence segments as a sample sentence segment in the second symptom describing sentence.
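
The fragment-selection steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: the text does not say how the sentence is segmented or where the word weights come from, so the punctuation-based split, the whitespace tokenization, and the `word_weights` table are all hypothetical.

```python
import re


def select_sample_fragment(sentence, word_weights, default_weight=0.0):
    """Return the sentence fragment whose summed word weight is largest."""
    # Assumption: fragments are delimited by common punctuation marks.
    fragments = [f.strip() for f in re.split(r"[,.;!?]", sentence) if f.strip()]
    if not fragments:
        raise ValueError("sentence contains no fragments")

    def fragment_weight(fragment):
        # Assumption: words are whitespace-separated; a real system would
        # use a proper word segmenter here.
        return sum(word_weights.get(w.lower(), default_weight)
                   for w in fragment.split())

    # The fragment with the maximum weight is taken as the sample fragment.
    return max(fragments, key=fragment_weight)


# Hypothetical word weights, for illustration only.
weights = {"headache": 2.0, "dizzy": 1.5, "morning": 0.2}
sentence = "Hello doctor, I have a headache and feel dizzy, especially in the morning"
best = select_sample_fragment(sentence, weights)
```

With these toy weights, the greeting and the time qualifier score low, so the fragment that actually describes the symptoms is selected.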

In another possible implementation manner, the training the word mapping model again according to the third symptom description sentence and the corresponding medical word includes:

and training the word mapping model again according to a sample sentence segment in the third symptom description sentence and the medical word corresponding to the third symptom description sentence, wherein the sample sentence segment is a segment used for describing symptoms in the third symptom description sentence.

In another possible implementation manner, the invoking the word annotation model, performing feature extraction on the first symptom description statement, and obtaining a statement feature of the first symptom description statement includes:

calling the word annotation model, and performing word segmentation processing on the first symptom description sentence to obtain at least one word;

coding the at least one word to obtain a word vector of the at least one word;

and fusing the word vectors of the at least one word to obtain the sentence characteristics of the first symptom description sentence.

In another possible implementation, the at least one term includes a plurality of terms; the fusing the word vectors of the at least one word to obtain the sentence characteristics of the first symptom description sentence includes:

acquiring the weight of each word according to the word vectors of the words;

and carrying out weighted fusion on the word vectors of the words according to the weights of the words to obtain the sentence characteristics of the first symptom description sentence.
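
The weighted fusion above might look like the following sketch. The text only states that each word's weight is obtained from the word vectors; scoring each vector against a `scoring_vector` and normalizing with a softmax is an assumption made here for illustration, and both names are hypothetical.

```python
import math


def fuse_word_vectors(word_vectors, scoring_vector):
    """Fuse word vectors into one sentence feature by weighted sum."""
    # Assumption: a word's score is its dot product with a scoring vector
    # (which would be learned in practice).
    scores = [sum(v * s for v, s in zip(vec, scoring_vector))
              for vec in word_vectors]
    # Softmax turns the scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the word vectors, dimension by dimension.
    dim = len(word_vectors[0])
    feature = [sum(w * vec[d] for w, vec in zip(weights, word_vectors))
               for d in range(dim)]
    return weights, feature


# Toy two-dimensional word vectors, for illustration only.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, sentence_feature = fuse_word_vectors(vectors, scoring_vector=[0.5, -0.5])
```

Words whose vectors score higher contribute more to the sentence feature, which is the point of weighting rather than simply averaging.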

In another possible implementation manner, the extracting keywords from the reply sentence to obtain the medical term in the reply sentence includes:

performing word segmentation processing on the reply sentence to obtain at least one word;

and querying the at least one term from a knowledge database to obtain medical terms contained in the at least one term, wherein the knowledge database comprises at least one medical term.
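
The knowledge-database lookup above can be sketched as a simple membership test. This sketch assumes the reply sentence has already been segmented into words and models the knowledge database as an in-memory set; the function name and the sample terms are hypothetical.

```python
def extract_medical_terms(reply_words, knowledge_terms):
    """Return the words of a reply that appear in the knowledge database."""
    seen = set()
    found = []
    for word in reply_words:
        # Keep each matching term once, in order of first appearance.
        if word in knowledge_terms and word not in seen:
            seen.add(word)
            found.append(word)
    return found


# Assumption: the knowledge database is a plain set of medical terms.
knowledge_terms = {"gastritis", "nausea", "fever"}
reply_words = ["you", "may", "have", "gastritis", "causing", "nausea", "and", "nausea"]
terms = extract_medical_terms(reply_words, knowledge_terms)
```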

In another possible implementation manner, the extracting keywords from the reply sentence to obtain a first medical term and a second medical term in the reply sentence includes:

and according to the at least one term, inquiring a first medical term and a second medical term contained in the at least one term from the knowledge database.

In another possible implementation, the term annotation model includes a plurality of reference medical terms, each reference medical term describing a symptom; the calling of the trained word labeling model to label the second symptom description sentence to obtain the medical word corresponding to the second symptom description sentence includes:

calling the trained word labeling model, labeling the second symptom description sentence, and obtaining corresponding probabilities of the plurality of reference medical words, wherein the probability corresponding to each reference medical word is used for indicating the possibility that the reference medical word is the medical word corresponding to the second symptom description sentence;

and determining the reference medical word corresponding to the maximum probability in the plurality of reference medical words as the medical word corresponding to the second symptom description sentence.
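
Selecting the reference medical word with the maximum probability amounts to an argmax over the model's outputs. A minimal sketch, with hypothetical names and toy probabilities:

```python
def pick_medical_term(reference_terms, probabilities):
    """Return the reference term whose probability is largest."""
    if not reference_terms or len(reference_terms) != len(probabilities):
        raise ValueError("need one probability per reference term")
    # Index of the maximum probability.
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return reference_terms[best_index]


# Toy model output, for illustration only.
label = pick_medical_term(["headache", "nausea", "fever"], [0.2, 0.7, 0.1])
```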

In another aspect, a medical word mapping method is provided, the method comprising:

training a word mapping model according to the first symptom description statement and the labeling information, wherein the word mapping model is used for labeling a corresponding medical word for any symptom description statement;

and calling the trained word mapping model, and mapping the target symptom description sentence to obtain the medical word corresponding to the target symptom description sentence.

In another aspect, a medical term tagging apparatus is provided, the apparatus comprising:

the acquisition module is used for acquiring a first symptom description statement and a reply statement of the first symptom description statement;

a generating module, configured to generate labeling information of the first symptom description statement based on medical terms included in the reply statement, where the labeling information includes medical terms associated with symptoms described by the first symptom description statement;

the training module is used for training a word labeling model according to the first symptom description statement and the labeling information, wherein the word labeling model is used for labeling a corresponding medical word for any symptom description statement;

and the labeling module is used for calling the trained word labeling model, labeling the second symptom description statement and obtaining the medical word corresponding to the second symptom description statement.

In one possible implementation, the apparatus further includes:

the training module is further configured to train a word mapping model according to the second symptom description sentence and the medical word corresponding to the second symptom description sentence, where the word mapping model is configured to map any symptom description sentence into the corresponding medical word;

and the mapping module is used for calling the trained word mapping model, mapping any target symptom description statement and obtaining the medical word corresponding to the target symptom description statement.

In another possible implementation manner, the training module is configured to train the term mapping model according to a sample term segment in the second symptom description term and a medical term corresponding to the second symptom description term, where the sample term segment is a segment used for describing a symptom in the second symptom description term;

the mapping module is configured to invoke the trained word mapping model, map a target statement segment in the target symptom description statement, and obtain a medical word corresponding to the target symptom description statement, where the target statement segment is a segment used for describing a symptom in the target symptom description statement.

In another possible implementation manner, the apparatus further includes:

the segmentation module is used for carrying out segmentation processing on the second symptom description sentence to obtain a plurality of sentence segments, and each sentence segment comprises at least one word;

a determining module, configured to determine a sum of weights of words included in each sentence fragment as a weight of each sentence fragment;

the determining module is further configured to determine a sentence segment corresponding to the maximum weight from the plurality of sentence segments as a sample sentence segment in the second symptom description sentence.

In another possible implementation manner, the obtaining module is further configured to obtain a third symptom description sentence and a corresponding medical term, where the medical term corresponding to the third symptom description sentence is obtained by manual labeling;

and the training module is also used for training the word mapping model again according to the third symptom description sentence and the corresponding medical word.

In another possible implementation manner, the training module is configured to train the term mapping model again according to a sample term segment in the third symptom describing statement and a medical term corresponding to the third symptom describing statement, where the sample term segment is a segment used for describing a symptom in the third symptom describing statement.

In another possible implementation, the term annotation model includes a plurality of reference medical terms, each reference medical term describing a symptom; the training module comprises:

a labeling unit, configured to invoke the term labeling model, label the first symptom describing statement, and obtain prediction probabilities corresponding to the plurality of reference medical terms, where the prediction probability corresponding to each reference medical term is used to indicate a possibility that the reference medical term is a medical term corresponding to the first symptom describing statement;

and the training unit is used for training the word labeling model according to the prediction probabilities corresponding to the plurality of reference medical words and the medical words in the labeling information.

In another possible implementation manner, the labeling unit is configured to invoke the word labeling model, perform feature extraction on the first symptom describing statement, and obtain a statement feature of the first symptom describing statement; performing feature transformation on the sentence features to obtain reference features, wherein the reference features comprise feature values of multiple dimensions, and each dimension corresponds to one reference medical word; and respectively determining the characteristic value of each dimension in the reference characteristic as the prediction probability corresponding to the reference medical term corresponding to each dimension.
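
The feature transformation described above can be illustrated as follows. The text does not specify the transformation; this sketch assumes a linear projection followed by a sigmoid, so that each output dimension is a value between 0 and 1 tied to one reference medical word. The parameter names and numbers are hypothetical.

```python
import math


def predict_probabilities(sentence_feature, weight_matrix, bias):
    """Map a sentence feature to one probability per reference medical word."""
    probabilities = []
    for row, b in zip(weight_matrix, bias):
        # One row of the projection per reference medical word.
        logit = sum(w * x for w, x in zip(row, sentence_feature)) + b
        # Sigmoid squashes the value into (0, 1); not hardened against
        # overflow for extremely negative logits.
        probabilities.append(1.0 / (1.0 + math.exp(-logit)))
    return probabilities


reference_terms = ["headache", "nausea", "fever"]
# Hypothetical parameters; in practice these would be learned in training.
weight_matrix = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]
probs = predict_probabilities([2.0, -1.0], weight_matrix, bias)
prediction = dict(zip(reference_terms, probs))
```

A per-dimension sigmoid, rather than a softmax across dimensions, lets several reference medical words receive a high probability for the same sentence, which fits annotation information that may contain more than one medical term.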

In another possible implementation manner, the tagging unit is configured to invoke the word tagging model, and perform word segmentation processing on the first symptom description sentence to obtain at least one word; coding the at least one word to obtain a word vector of the at least one word; and fusing the word vectors of the at least one word to obtain the sentence characteristics of the first symptom description sentence.

In another possible implementation, the at least one term includes a plurality of terms; the labeling unit is used for acquiring the weight of each word according to the word vectors of the plurality of words; and carrying out weighted fusion on the word vectors of the words according to the weights of the words to obtain the sentence characteristics of the first symptom description sentence.

In another possible implementation manner, the training unit is configured to determine a first target value as a true probability of a medical word in the annotation information; for any medical term in the labeling information, determining a prediction probability corresponding to a reference medical term which is the same as the medical term as the prediction probability corresponding to the medical term; determining a loss value of the word annotation model according to the prediction probability and the real probability corresponding to each medical word in the annotation information; and training the word labeling model according to the loss value.

In another possible implementation manner, the apparatus further includes:

a determining module, configured to determine a second target value as a true probability corresponding to a reference medical term that is not included in the annotation information in the plurality of reference medical terms;

the training unit is used for determining the loss value of the word labeling model according to the prediction probability and the real probability corresponding to each medical word in the labeling information and the prediction probability and the real probability corresponding to the reference medical word which is not included in the labeling information.
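
The loss computation above can be sketched as below. The first target value is assumed to be 1 and the second target value 0, and the loss is assumed to be binary cross-entropy averaged over the reference medical words; the text fixes neither the target values nor the loss function at this point, so treat all three as illustrative choices.

```python
import math


def annotation_loss(predicted, annotated_terms,
                    first_target=1.0, second_target=0.0, eps=1e-12):
    """Average binary cross-entropy over all reference medical words."""
    if not predicted:
        raise ValueError("predicted must contain at least one reference term")
    total = 0.0
    for term, p in predicted.items():
        # True probability: first target value if the term is in the
        # annotation information, second target value otherwise.
        y = first_target if term in annotated_terms else second_target
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(predicted)


annotated = {"headache"}
good = annotation_loss({"headache": 0.9, "nausea": 0.1}, annotated)
bad = annotation_loss({"headache": 0.1, "nausea": 0.9}, annotated)
```

Predictions that agree with the annotation information give a small loss, while predictions that contradict it give a large one, which is the signal used to update the model.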

In another possible implementation manner, the generating module includes:

the extraction unit is used for extracting keywords from the reply sentences to obtain medical terms in the reply sentences;

and the composition unit is used for composing the obtained medical words into the labeling information.

In another possible implementation manner, the extracting unit is configured to perform word segmentation processing on the reply sentence to obtain at least one word; and querying the at least one term from a knowledge database to obtain medical terms contained in the at least one term, wherein the knowledge database comprises at least one medical term.

In another possible implementation manner, the generating module includes:

the extraction unit is used for extracting keywords from the reply sentences to obtain first medical terms and second medical terms in the reply sentences, wherein the first medical terms are terms used for describing diseases, and the second medical terms are terms used for describing symptoms;

the query unit is used for querying at least one second medical term corresponding to the first medical term from a knowledge database according to the first medical term, wherein the knowledge database comprises a corresponding relation between the first medical term and the second medical term;

and the composition unit is used for composing the extracted second medical term and the inquired second medical term into the labeling information.
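
The disease-to-symptom expansion above might be sketched as follows. The knowledge database is modeled as a dictionary from disease terms to symptom terms; the function name and the sample entries are hypothetical toy data, not taken from any real knowledge base.

```python
def build_annotation(extracted_diseases, extracted_symptoms, disease_to_symptoms):
    """Combine extracted symptom terms with symptoms looked up per disease."""
    # Start from the symptom terms found directly in the reply (deduplicated).
    annotation = list(dict.fromkeys(extracted_symptoms))
    for disease in extracted_diseases:
        # Look up the symptom terms that correspond to each disease term.
        for symptom in disease_to_symptoms.get(disease, []):
            if symptom not in annotation:
                annotation.append(symptom)
    return annotation


# Toy correspondence between first (disease) and second (symptom) terms.
disease_to_symptoms = {"gastritis": ["stomach pain", "nausea", "bloating"]}
annotation = build_annotation(["gastritis"], ["nausea"], disease_to_symptoms)
```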

In another possible implementation manner, the extracting unit is configured to perform word segmentation processing on the reply sentence to obtain at least one word; and according to the at least one term, inquiring a first medical term and a second medical term contained in the at least one term from the knowledge database.

In another possible implementation, the term annotation model includes a plurality of reference medical terms, each reference medical term describing a symptom; the labeling module is configured to call the trained word labeling model, label the second symptom describing statement, and obtain probabilities corresponding to the plurality of reference medical words, where the probability corresponding to each reference medical word is used to indicate a possibility that the reference medical word is a medical word corresponding to the second symptom describing statement; and determining the reference medical word corresponding to the maximum probability in the plurality of reference medical words as the medical word corresponding to the second symptom description sentence.

In another aspect, a medical word mapping apparatus is provided, the apparatus comprising:

the training module is used for training a word mapping model according to the first symptom description statement and the labeling information, wherein the word mapping model is used for labeling a corresponding medical word for any symptom description statement;

and the mapping module is used for calling the trained word mapping model, mapping the target symptom description statement and obtaining the medical word corresponding to the target symptom description statement.

In another aspect, a computer device is provided, the computer device comprising a processor and a memory, the memory having stored therein at least one computer program, the at least one computer program being loaded and executed by the processor to implement the operations performed in the medical term annotation method according to the above aspect or to implement the operations performed in the medical term mapping method according to the above aspect.

In another aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement the operations performed in the medical term labeling method according to the above aspect or to implement the operations performed in the medical term mapping method according to the above aspect.

In yet another aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising computer program code, the computer program code being stored in a computer readable storage medium. The computer program code is read by a processor of a computer device from a computer readable storage medium, the computer program code being executed by the processor to cause the computer device to carry out the operations carried out in the medical term labeling method as described in the above aspect or to carry out the operations carried out in the medical term mapping method as described in the above aspect.

The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:

According to the method, the device, the computer equipment and the storage medium provided by the embodiment of the application, the word annotation model is trained using the first symptom description statement and the reply statement to the first symptom description statement. The symptom description statement does not need to be manually annotated during training, so the word annotation model is trained in an unsupervised manner, which reduces the dependence of the word annotation model on annotated data and improves its accuracy. Moreover, the trained word annotation model labels symptom description statements automatically, so manual labeling of the symptom description statements is not needed, manual labeling time is saved, and labeling efficiency is improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;

FIG. 2 is a flow chart of a method for tagging medical terms provided by an embodiment of the present application;

FIG. 3 is a flow chart of another method for labeling medical terms provided by an embodiment of the present application;

FIG. 4 is a flowchart of obtaining annotation information according to an embodiment of the present disclosure;

FIG. 5 is a flow chart for obtaining a prediction probability of a reference medical term provided by an embodiment of the present application;

FIG. 6 is a flowchart for acquiring medical terms and sample sentence fragments according to an embodiment of the present application;

FIG. 7 is a flow chart of a medical term mapping method provided by an embodiment of the present application;

FIG. 8 is a flow chart of another medical term mapping method provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a medical term tagging device provided by an embodiment of the application;

FIG. 10 is a schematic structural diagram of another medical term tagging device provided by an embodiment of the application;

FIG. 11 is a schematic structural diagram of a medical term mapping apparatus provided in an embodiment of the present application;

FIG. 12 is a schematic structural diagram of a terminal according to an embodiment of the present application;

FIG. 13 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.

The terms "first," "second," "third," and the like used herein may describe various concepts, but the concepts are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first symptom description statement can be referred to as a second symptom description statement, and similarly, a second symptom description statement can be referred to as a first symptom description statement, without departing from the scope of the present application.

As used herein, "at least one" includes one, two, or more than two; "a plurality" includes two or more than two; "each" refers to every one of the corresponding plurality; and "any" refers to any one of the plurality. For example, if the plurality of sentence fragments includes 3 sentence fragments, "each sentence fragment" refers to every one of the 3 sentence fragments, and "any sentence fragment" refers to any one of the 3 sentence fragments, which can be the first sentence fragment, the second sentence fragment, or the third sentence fragment.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises computer vision technology, voice processing technology, natural language processing technology, machine learning/deep learning and the like.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field involves natural language, i.e. the language that people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.

Machine Learning (ML) is a multi-disciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer simulates or realizes human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

According to the scheme provided by the embodiment of the application, the word annotation model can be trained according to the technologies of artificial intelligence, such as natural language processing, machine learning and the like, and automatic annotation of the symptom description sentences can be realized by utilizing the trained word annotation model.

The medical term labeling method or the medical term mapping method provided by the embodiment of the application can be used in computer equipment. Optionally, the computer device is a terminal or a server. Optionally, the server is an independent physical server, or a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like. Optionally, the terminal is a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto.

In one possible implementation, the computer program according to the embodiments of the present application may be deployed to be executed on one computer device or on multiple computer devices located at one site, or may be executed on multiple computer devices distributed at multiple sites and interconnected by a communication network, where the multiple computer devices distributed at the multiple sites and interconnected by the communication network can form a block chain system.

The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.

The blockchain underlying platform can comprise processing modules such as user management, basic services, smart contracts and operation monitoring. The user management module is responsible for managing the identity information of all blockchain participants, including generating and maintaining public and private keys (account management), key management, and maintaining the correspondence between a user's real identity and blockchain address (authority management); where authorized, it also supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and to record valid requests to storage after consensus on them is reached. For a new service request, the basic service first performs interface adaptation, parsing and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering and contract execution; developers can define contract logic in a programming language and publish it to the blockchain (contract registration), the contract is triggered by a key or another event and executed according to the logic of the contract terms, and the module also provides functions for upgrading and cancelling contracts. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract settings, cloud adaptation, and the visual output of real-time status during product operation, such as alarms, monitoring of network conditions, and monitoring of node device health.

The platform product service layer provides the basic capabilities and implementation framework of typical applications; based on these basic capabilities, developers can add the characteristics of their own business to complete a blockchain implementation of the business logic. The application service layer provides blockchain-based application services for business participants to use.

Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to fig. 1, the implementation environment includes a plurality of terminals 101 (only 3 terminals 101 are taken as an example in fig. 1) and a server 102. The terminal 101 and the server 102 are connected via a wireless or wired network. The terminal 101 has installed thereon a target application served by the server 102, through which the terminal 101 can implement functions such as data transmission, message interaction, and the like. Optionally, the target application is a target application in an operating system of the terminal 101, or a target application provided by a third party. For example, the target application is a visit guide application having a guide function, but the visit guide application may have other functions, such as a comment function, a content sharing function, a navigation function, and the like.

Each terminal 101 is configured to log in to the target application according to a user identifier and to send a symptom description statement input by a user to the server 102 through the target application. The server 102 is configured to receive the symptom description statements sent by the plurality of terminals 101 and to obtain a reply statement corresponding to each symptom description statement. The server 102 is further configured with a word annotation model: the word annotation model can be trained based on the obtained symptom description statements and the corresponding reply statements, and a symptom description statement sent by any terminal 101 can be annotated based on the trained word annotation model.

The method provided by the embodiment of the application can be used for various scenes.

For example, in a physician consultation scenario:

A diagnosis guide application is installed in the terminal, and a word annotation model and a word mapping model are configured in the server corresponding to the diagnosis guide application. Using the method provided by the embodiment of the application, the server calls the trained word annotation model to annotate a plurality of symptom description sentences, and trains the word mapping model according to the annotated symptom description sentences and the corresponding medical words. The terminal then logs in to the diagnosis guide application based on the user identifier and sends the input symptom description statement to the server. After receiving the symptom description statement, the server calls the trained word mapping model and determines the medical word corresponding to the symptom description statement. A doctor then determines the physical condition of the user according to the determined medical word in combination with other information about the user.

For example, in a visit guidance scenario:

A visit guide application is installed in the terminal, and a word annotation model and a word mapping model are configured in the server corresponding to the visit guide application. Using the method provided by the embodiment of the application, the server calls the trained word annotation model to annotate a plurality of symptom description sentences, and trains the word mapping model according to the annotated symptom description sentences and the corresponding medical words. The terminal then logs in to the visit guide application based on the user identifier and sends the input symptom description statement to the server. After receiving the symptom description statement, the server calls the trained word mapping model, determines the medical word corresponding to the symptom description statement, determines the department to which the disease having the symptom described by the medical word belongs, and returns the department, or a doctor belonging to the department, to the visit guide application. The terminal displays the department or doctor through the visit guide application, so that the user can seek treatment according to the displayed department or doctor.

Fig. 2 is a flowchart of a medical word labeling method provided by an embodiment of the present application, which is applied to a computer device, and as shown in fig. 2, the method includes the following steps.

201. The computer device obtains a first symptom descriptive statement and a reply statement to the first symptom descriptive statement.

Wherein the first symptom description statement is used for describing a symptom, and the reply statement is a statement used for replying to the first symptom description statement, and the medical word contained in the reply statement is associated with the symptom described by the first symptom description statement.

202. The computer device generates annotation information for the first symptom description statement based on the medical terms contained in the reply statement.

The annotation information includes medical terms associated with the symptoms described by the first symptom description sentence, and the medical terms in the annotation information are real medical terms corresponding to the first symptom description sentence.

203. The computer device trains the word annotation model according to the first symptom description sentence and the annotation information.

The word annotation model is used for annotating any symptom description statement with the corresponding medical words. In the embodiment of the application, the medical word in the annotation information is a real medical word corresponding to the first symptom description sentence, and the word annotation model is trained using the first symptom description sentence and the medical word in the annotation information, which improves the annotation capability and accuracy of the word annotation model.

204. The computer device calls the trained word labeling model to label the second symptom description sentence, obtaining the medical word corresponding to the second symptom description sentence.

Wherein the second symptom descriptive statement is used to describe a symptom, and the second symptom descriptive statement is similar to the first symptom descriptive statement. After the word annotation model is trained, the trained word annotation model is called to label any symptom description statement, so that the medical word corresponding to the symptom description statement is obtained.

According to the method provided by the embodiment of the application, the word annotation model is trained using the first symptom description statement and the reply statement to the first symptom description statement. The symptom description statement does not need to be manually annotated during training, so unsupervised training of the word annotation model is achieved, the dependence of the word annotation model on annotated data is reduced, and the accuracy of the word annotation model is improved. Moreover, the trained word annotation model can annotate symptom description sentences automatically, without manual annotation, which saves manual annotation time and improves annotation efficiency.

Fig. 3 is a flowchart of a medical word labeling method provided by an embodiment of the present application, which is applied to a computer device, and as shown in fig. 3, the method includes the following steps.

301. The computer device obtains a first symptom descriptive statement and a reply statement to the first symptom descriptive statement.

The first symptom description statement is used for describing a symptom. Optionally, the first symptom description statement is a colloquial description; for example, the first symptom description statement is "my belly pain". The reply statement to the first symptom description statement is a statement used for replying to it, and the medical word contained in the reply statement is associated with the symptom described by the first symptom description statement. Optionally, the reply statement is a doctor's reply to the first symptom description statement; for example, the reply statement is "abdominal pain may be caused by various reasons".

In one possible implementation, this step 301 includes: the computer device receives the first symptom describing statement and the reply statement to the first symptom describing statement sent by the other device.

In another possible implementation manner, the computer device is a server that provides a service for the target application, a database in the server includes a plurality of first symptom describing statements and a reply statement for each first symptom describing statement, and the computer device obtains the first symptom describing statement and the reply statement for the first symptom describing statement from the database.

For example, the target application is a disease diagnosis application, the disease diagnosis application is installed in each of the user terminal and the doctor terminal, the server is a server for providing a service for the disease diagnosis application, and when the user chats with the doctor based on the disease diagnosis application, the server can store a symptom description sentence input by the user and a reply sentence replied by the doctor in the database.

In one possible implementation, this step 301 includes: the computer device obtains a plurality of first symptom describing statements and a reply statement to each of the first symptom describing statements.

302. The computer device generates annotation information for the first symptom description statement based on the medical terms contained in the reply statement.

The annotation information includes a medical term associated with the symptom described by the first symptom description statement. Optionally, the annotation information includes a second medical term, a first medical term, or both. The first medical term is a term used to describe a disease, and the second medical term is a term used to describe a symptom. For example, the first symptom description sentence is "my belly pain", and the medical word included in the annotation information is "abdominal pain". Optionally, the annotation information includes at least one medical term associated with the symptom described by the first symptom description statement.

In the embodiment of the present application, the reply sentence to the first symptom describing sentence includes the medical word, and the medical word included in the reply sentence is associated with the first symptom describing sentence. Optionally, the medical word included in the reply sentence is a second medical word, and the symptom described by the second medical word is the same as the symptom described by the first symptom description sentence. For example, the medical word included in the reply sentence is "abdominal pain", and the first symptom description sentence is "my belly pain".

Optionally, the medical word included in the reply sentence is a first medical word, and the symptom described by the first symptom description sentence is a symptom of the disease described by the first medical word. For example, the medical word included in the reply sentence is "gastrointestinal disease" and the first symptom description sentence is "my belly pain"; that is, the symptom described by "my belly pain" is a symptom of the disease indicated by "gastrointestinal disease".

Optionally, the plurality of medical terms included in the reply sentence include a second medical term and a first medical term, and the second medical term and the first medical term are both associated with the symptom described by the first symptom description sentence. For example, the second medical word included in the reply sentence is "abdominal pain", the first medical word included in the reply sentence is "gastrointestinal disease", and the first symptom description sentence is "my belly pain", that is, the symptom described by the second medical word is the same as the symptom described by the first symptom description sentence, and the symptom described by the first symptom description sentence is the symptom of the disease indicated by the first medical word.

In one possible implementation, this step 302 includes: calling an information generation model to generate the annotation information of the first symptom description statement based on the medical words contained in the reply statement.

In the embodiment of the application, the information generation model is used for generating the annotation information of any symptom description statement. Optionally, the information generation model is a pseudo-label generator, and the annotation information is a pseudo label of the first symptom description statement. The word labeling model is subsequently trained based on the pseudo label.

In one possible implementation, this step 302 includes the following three ways.

The first way: extract keywords from the reply sentence to obtain the medical words in the reply sentence, and form the annotation information from the obtained medical words.

In the embodiment of the application, the reply sentence includes medical terms, the medical terms included in the reply sentence are extracted from the reply sentence in a keyword extraction mode, and the extracted medical terms form the labeling information of the first symptom description sentence.

In one possible implementation, the process of obtaining the medical word in the reply sentence includes: performing word segmentation on the reply sentence to obtain at least one word, and querying the knowledge database with the at least one word to determine the medical words contained in the at least one word.

The knowledge database includes at least one medical term. Optionally, the medical terms contained in the knowledge database include at least one of the second medical term and the first medical term. Word segmentation yields the at least one word contained in the reply sentence; each word is then compared with the medical terms contained in the knowledge database to find the words that are identical to a medical term in the knowledge database. This determines the medical words contained in the at least one word and ensures the accuracy of the determined medical words.

Optionally, the process of querying the medical term includes: for any term in the at least one term, querying a knowledge database according to the term, in response to querying a medical term that is the same as the term, determining that the term is a medical term, and in response to not querying the medical term that is the same as the term, determining that the term is not a medical term.
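The first way can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the knowledge database is modeled as a small hypothetical set of terms, and word segmentation is approximated with forward maximum matching over English tokens so that multi-word terms stay together (the source does not name a segmentation method). The names `KNOWLEDGE_DB`, `segment`, and `extract_medical_terms` are invented for this example.

```python
import re

# Hypothetical knowledge database: a flat set of medical terms (illustrative only).
KNOWLEDGE_DB = {"abdominal pain", "gastrointestinal disease", "rash", "fungal infection"}

def segment(sentence, vocabulary, max_words=3):
    """Split a sentence into words using forward maximum matching.

    Multi-word vocabulary entries are kept together; anything else
    falls back to a single token.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    words, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink down to one token.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in vocabulary:
                words.append(candidate)
                i += n
                break
    return words

def extract_medical_terms(reply, knowledge_db=KNOWLEDGE_DB):
    """Return the words of the reply that exactly match a knowledge-database entry."""
    terms = []
    for word in segment(reply, knowledge_db):
        # A word is a medical term only if the same term is found in the database.
        if word in knowledge_db and word not in terms:
            terms.append(word)
    return terms

# The extracted terms form the annotation information (pseudo label).
annotation = extract_medical_terms("Abdominal pain may be caused by various reasons")
print(annotation)
```

Exact matching keeps the pseudo label conservative: a word that is not found in the knowledge database is simply not treated as a medical term.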

The second way: extract keywords from the reply sentence to obtain a first medical term and a second medical term in the reply sentence, query the knowledge database according to the first medical term for at least one second medical term corresponding to the first medical term, and form the annotation information from the extracted second medical term and the queried second medical terms.

In an embodiment of the present application, the knowledge database includes a correspondence between first medical terms and second medical terms, which indicates the symptoms that a given disease has. Optionally, the knowledge database is a medical knowledge graph. For example, in the knowledge database, a first medical term is "fungal infection" and a second medical term corresponding to the first medical term is "red and swollen". The reply sentence includes a first medical term, which describes a disease, and a second medical term, which describes a symptom.

As shown in fig. 4, the reply sentence is "the rash may be caused by fungal infection". Keyword extraction on the reply sentence yields the first medical term "fungal infection" and the second medical term "rash". The knowledge database is then queried according to the first medical term to determine the second medical terms corresponding to "fungal infection", and all the obtained second medical terms constitute the annotation information of the first symptom description sentence corresponding to the reply sentence. In the knowledge database, the correspondence between a first medical term and a second medical term is stored as a triplet consisting of the first medical term, a word representing the relationship, and the second medical term. For example, the triplet "fungal infection, symptom, redness" indicates that one symptom of the first medical term "fungal infection" is the second medical term "redness".

The first medical term and the second medical term contained in the reply sentence are extracted by keyword extraction, and the second medical terms corresponding to the first medical term are queried from the knowledge database to enrich the second medical terms related to the symptom described by the first symptom description sentence. The extracted second medical term and the queried second medical terms then form the annotation information corresponding to the first symptom description sentence, which ensures the accuracy of the annotation information.
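The knowledge-database expansion used in the second way can be sketched as below. The triples are hypothetical sample data (only "fungal infection, symptom, redness" comes from the text; the other entries are invented for illustration), and the function names are assumptions of this example.

```python
# Hypothetical knowledge-graph triples:
# (first medical term, relation, second medical term).
TRIPLES = [
    ("fungal infection", "symptom", "redness"),
    ("fungal infection", "symptom", "itching"),            # invented sample entry
    ("gastrointestinal disease", "symptom", "abdominal pain"),  # invented sample entry
]

def symptoms_of(disease, triples=TRIPLES):
    """Query the second medical terms (symptoms) recorded for a disease."""
    return [tail for head, relation, tail in triples
            if head == disease and relation == "symptom"]

def build_annotation(first_terms, second_terms, triples=TRIPLES):
    """Combine extracted symptom terms with symptoms queried for each disease term."""
    annotation = list(second_terms)          # second medical terms found in the reply
    for disease in first_terms:              # first medical terms found in the reply
        for symptom in symptoms_of(disease, triples):
            if symptom not in annotation:    # avoid duplicate labels
                annotation.append(symptom)
    return annotation

# Reply: "the rash may be caused by fungal infection"
print(build_annotation(["fungal infection"], ["rash"]))
```

Calling `build_annotation` with an empty `second_terms` list gives the behaviour of the third way, in which only the queried second medical terms form the annotation information.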

In one possible implementation, the process of obtaining the medical word in the reply sentence includes: performing word segmentation on the reply sentence to obtain at least one word, and querying the knowledge database according to the at least one word to determine the first medical word and the second medical word contained in the at least one word.

Word segmentation yields the at least one word contained in the reply sentence; each word is then compared with the medical terms contained in the knowledge database to find the words that are identical to a medical term in the knowledge database. This determines the first medical word and the second medical word contained in the at least one word and ensures the accuracy of the determined medical words.

Optionally, the knowledge database includes a first sub-database and a second sub-database, where the first sub-database includes a plurality of first medical terms and the second sub-database includes a plurality of second medical terms. The process of querying the medical term includes: for any term in the at least one term, querying the first sub-database according to the term; in response to finding a first medical term that is the same as the term, determining that the term is a first medical term; and in response to finding no first medical term that is the same as the term, determining that the term is not a first medical term. Then, for any term in the at least one term that is not a first medical term, querying the second sub-database according to the term; in response to finding a second medical term that is the same as the term, determining that the term is a second medical term; and in response to finding no second medical term that is the same as the term, determining that the term is not a second medical term.

In this way, the first sub-database and the second sub-database are each queried according to the at least one term, so that the first medical term and the second medical term in the at least one term can be determined.

It should be noted that the embodiment of the present application is described only by taking, as an example, querying the first sub-database and then the second sub-database according to the at least one term. In another embodiment, the second sub-database may be queried first and then the first sub-database, or the first sub-database and the second sub-database may be queried simultaneously, to determine the first medical term and the second medical term contained in the at least one term.
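The two-stage lookup can be sketched as follows, assuming each sub-database is a simple set of terms. The sets and the function name `classify_words` are hypothetical. When the two sub-databases share no entries, the order in which they are queried does not change the result.

```python
# Hypothetical sub-databases (illustrative entries only).
FIRST_SUB_DB = {"fungal infection", "gastrointestinal disease"}  # disease terms
SECOND_SUB_DB = {"rash", "redness", "abdominal pain"}            # symptom terms

def classify_words(words, first_db=FIRST_SUB_DB, second_db=SECOND_SUB_DB):
    """Split segmented words into first (disease) and second (symptom) medical terms."""
    first_terms, second_terms = [], []
    for word in words:
        if word in first_db:
            # Found in the first sub-database: the word is a first medical term.
            first_terms.append(word)
        elif word in second_db:
            # Not a first medical term, but found in the second sub-database.
            second_terms.append(word)
        # Otherwise the word is not a medical term and is ignored.
    return first_terms, second_terms

words = ["the", "rash", "may", "be", "caused", "by", "fungal infection"]
print(classify_words(words))
```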

The third way: extract keywords from the reply sentence to obtain a first medical term in the reply sentence, query the knowledge database according to the first medical term for at least one second medical term corresponding to the first medical term, and form the annotation information from the queried second medical terms. The process is similar to the second way and is not described again here.

303. The computer device calls the word labeling model to label the first symptom description statement, obtaining the prediction probabilities corresponding to a plurality of reference medical words.

The word labeling model is used for labeling any symptom description statement with the corresponding medical terms. The word labeling model includes a plurality of reference medical terms, each of which describes a symptom. The prediction probability corresponding to each reference medical term indicates the possibility that the reference medical term is the medical term corresponding to the first symptom description statement: the larger the prediction probability, the higher that possibility, and the smaller the prediction probability, the lower that possibility.

The first symptom description sentence is labeled through the word labeling model to determine the prediction probabilities corresponding to the plurality of reference medical words, that is, the possibility that the first symptom description sentence is associated with each reference medical word is determined, so that the word labeling model is trained based on the obtained prediction probabilities.

In one possible implementation, this step 303 comprises the following steps 3031-3033.

3031. Call the word labeling model to perform feature extraction on the first symptom description sentence, obtaining the sentence features of the first symptom description sentence.

The sentence characteristic of the first symptom describing sentence is used to represent the characteristic included in the first symptom describing sentence, and optionally, the sentence characteristic is a sentence characteristic matrix or a sentence characteristic vector.

In one possible implementation, this step 3031 includes: calling the word labeling model to perform word segmentation on the first symptom description sentence to obtain at least one word, encoding the at least one word to obtain a word vector of the at least one word, and fusing the word vectors of the at least one word to obtain the sentence features of the first symptom description sentence.

Wherein the word vector for each word is used to represent the meaning of the corresponding word. The first symptom description sentence is divided into at least one word by calling the word labeling model, and the word vectors of the at least one word are fused to obtain the sentence characteristics of the first symptom description sentence, so that the sentence characteristics contain the word vectors of each word, and the accuracy of the sentence characteristics is guaranteed.

Optionally, the at least one word includes a plurality of words, and the process of obtaining the sentence features includes: obtaining the weight of each word according to the word vectors of the plurality of words, and performing weighted fusion on the word vectors of the plurality of words according to the weights to obtain the sentence features of the first symptom description sentence.

Wherein the weight of each word is used for indicating the association degree of the corresponding word and the symptom described by the first symptom description sentence, the higher the weight is, the higher the association degree of the corresponding word and the symptom described by the first symptom description sentence is, and the lower the weight is, the lower the association degree of the corresponding word and the symptom described by the first symptom description sentence is.

After the word vectors of the plurality of words contained in the first symptom description sentence are obtained, the weight of each word is obtained from these word vectors, and the word vectors are fused according to the weights, so that the sentence features of the first symptom description sentence incorporate the word vectors of all the words. This highlights the influence on the sentence features of words that are strongly associated with the symptom described by the first symptom description sentence, weakens the influence of words that are weakly associated with it, and thereby improves the accuracy of the sentence features.

Optionally, the process of obtaining the weights of the plurality of words includes any of the following three ways.

The first way: determine the sum of the word vectors of the plurality of words as a word vector sum, and determine the similarity between the word vector of each word and the word vector sum as the weight of that word.

The second way: perform weighted fusion on the word vectors of the plurality of words according to initial weights of the words to obtain a fusion vector, and determine the similarity between the word vector of each word and the fusion vector as the weight of that word.

The initial weight is set arbitrarily. For example, the initial weight of each word is 0.5, or the initial weights of the words are each a value greater than 0 and less than 1.

The third way: determine a target word according to the positions of the plurality of words in the first symptom description sentence, and determine the similarity between the word vector of each word and the word vector of the target word as the weight of that word. The target word is any one of the plurality of words; for example, the target word is the first word or the last word of the plurality of words.
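The first two ways of obtaining weights, followed by weighted fusion, can be sketched as below. The source does not name the similarity measure or the fusion operator, so this example assumes cosine similarity and a weighted sum; the word vectors are made-up two-dimensional values and all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def weighted_fusion(word_vectors, weights):
    """Weighted sum of word vectors, computed one dimension at a time."""
    return [sum(w * x for w, x in zip(weights, column))
            for column in zip(*word_vectors)]

def weights_by_sum_similarity(word_vectors):
    """First way: similarity between each word vector and the word vector sum."""
    total = [sum(column) for column in zip(*word_vectors)]
    return [cosine(v, total) for v in word_vectors]

def weights_by_fusion_similarity(word_vectors, initial_weights):
    """Second way: similarity between each word vector and an initial fusion vector."""
    fused = weighted_fusion(word_vectors, initial_weights)
    return [cosine(v, fused) for v in word_vectors]

# Made-up word vectors for a three-word sentence.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = weights_by_sum_similarity(word_vectors)
sentence_feature = weighted_fusion(word_vectors, weights)
print(weights, sentence_feature)
```

Under the cosine assumption, the second way with equal initial weights (for example 0.5 for every word) gives the same weights as the first way, because cosine similarity ignores the scale of the fusion vector; unequal initial weights give different results.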

3032. Perform feature transformation on the sentence features to obtain reference features.

The reference feature includes feature values of multiple dimensions. Optionally, the reference feature is a reference feature matrix or a reference feature vector. Each dimension in the reference feature corresponds to a reference medical term. Optionally, the plurality of reference medical terms are arranged in sequence in the word labeling model, and the dimensions of the reference feature correspond one-to-one, in order, to the plurality of reference medical terms. For example, the first dimension in the reference feature corresponds to the first reference medical term in the sequence, and the second dimension corresponds to the second reference medical term in the sequence.

3033. Determine the feature value of each dimension in the reference features as the prediction probability corresponding to the reference medical term corresponding to that dimension.

Because the sentence features represent the features included in the first symptom description sentence, the first symptom description sentence describes a symptom, and each reference medical word describes a symptom, transforming the sentence features determines the similarity between the symptom described by the first symptom description sentence and the symptom described by each reference medical word. This yields the feature values of the multiple dimensions of the reference features, where the feature value of each dimension is the possibility that the corresponding reference medical word is the medical word corresponding to the first symptom description sentence; that is, the prediction probabilities of the plurality of reference medical words are obtained.
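Steps 3032 and 3033 can be sketched as below. The source does not specify the form of the feature transformation, so this example assumes a linear map followed by an element-wise sigmoid, which keeps each dimension in the range (0, 1) and treats the reference medical terms independently. The reference terms, matrix, and bias are hypothetical values.

```python
import math

# Hypothetical ordered list of reference medical terms; dimension i of the
# reference feature corresponds to REFERENCE_TERMS[i].
REFERENCE_TERMS = ["abdominal pain", "rash", "redness"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_probabilities(sentence_feature, weight_matrix, bias):
    """Transform a sentence feature into one prediction probability per reference term.

    Assumed transformation: a linear map (one row per reference term)
    followed by a sigmoid on each dimension.
    """
    probabilities = {}
    for term, row, b in zip(REFERENCE_TERMS, weight_matrix, bias):
        score = sum(w * x for w, x in zip(row, sentence_feature)) + b
        probabilities[term] = sigmoid(score)
    return probabilities

# Made-up parameters; a trained model would learn these.
W = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
probs = predict_probabilities([1.0, -1.0], W, b)
print(probs)
```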

In a possible implementation, the word labeling model includes a coding sub-model, a weight obtaining sub-model, and a feature transformation sub-model, and step 303 includes: calling the word labeling model to perform word segmentation on the first symptom description sentence to obtain a plurality of words; calling the coding sub-model to encode the plurality of words to obtain word vectors of the plurality of words; calling the weight obtaining sub-model to obtain the weight of each word according to the word vectors of the plurality of words, and performing weighted fusion on the word vectors according to the weights to obtain the sentence features of the first symptom description sentence; and calling the feature transformation sub-model to perform feature transformation on the sentence features to obtain reference features, and determining the feature value of each dimension in the reference features as the prediction probability corresponding to the reference medical word corresponding to that dimension.

As shown in fig. 5, after the word tagging model performs word segmentation processing on the first symptom description sentence to obtain a plurality of words, the prediction probability of each reference medical word is obtained by calling a coding sub-model, a weight obtaining sub-model and a feature transformation sub-model in the word tagging model.
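How the three sub-models chain together can be sketched as below. Every parameter here is a hypothetical placeholder (a real model would learn the embeddings and the transformation), and the specific choices are assumptions of this example rather than details given in the text: a lookup table for the coding sub-model, cosine similarity to the vector sum for the weight obtaining sub-model, and a linear map with a sigmoid for the feature transformation sub-model.

```python
import math

# Hypothetical parameters; a trained model would learn these values.
EMBEDDINGS = {"my": [0.1, 0.0], "belly": [0.9, 0.2], "hurts": [0.8, 0.4]}
REFERENCE_TERMS = ["abdominal pain", "rash"]
TRANSFORM = [[1.5, 0.5], [-1.0, -0.5]]  # one row per reference medical term

def encode(words):
    """Coding sub-model: map each word to its word vector (zeros if unknown)."""
    return [EMBEDDINGS.get(word, [0.0, 0.0]) for word in words]

def get_weights(vectors):
    """Weight obtaining sub-model: cosine similarity to the sum of all word vectors."""
    total = [sum(column) for column in zip(*vectors)]

    def cos(u, v):
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

    return [cos(vector, total) for vector in vectors]

def fuse(vectors, weights):
    """Weighted fusion of the word vectors into a sentence feature."""
    return [sum(w * x for w, x in zip(weights, column)) for column in zip(*vectors)]

def transform(feature):
    """Feature transformation sub-model: linear map followed by a sigmoid."""
    scores = [sum(w * x for w, x in zip(row, feature)) for row in TRANSFORM]
    return [1.0 / (1.0 + math.exp(-score)) for score in scores]

def annotate(words):
    """Run the sub-models in order and return a probability per reference term."""
    vectors = encode(words)
    feature = fuse(vectors, get_weights(vectors))
    return dict(zip(REFERENCE_TERMS, transform(feature)))

probs = annotate(["my", "belly", "hurts"])
print(probs)
```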

304. The computer device trains the word labeling model according to the prediction probabilities corresponding to the plurality of reference medical words and the medical words in the annotation information.

Because the prediction probabilities corresponding to the plurality of reference medical terms are used for representing the possibility that the corresponding reference medical terms are medical terms corresponding to the first symptom description statement, and the medical terms in the annotation information are medical terms corresponding to the first symptom description statement, the accuracy of the term annotation model can be determined through the prediction probabilities of the plurality of reference medical terms and the medical terms in the annotation information, so that the term annotation model is trained to improve the accuracy of the term annotation model.

In the embodiment of the application, the word labeling model is trained with the first symptom description sentence and the reply sentence of the first symptom description sentence. The symptom description sentence does not need to be labeled manually during training, so unsupervised training of the word labeling model is realized, the dependence of the word labeling model on labeled data is reduced, and the accuracy of the word labeling model is improved.

In one possible implementation, step 304 includes: determining a first target value as the true probability of each medical term in the labeling information; for any medical term in the labeling information, determining the prediction probability corresponding to the reference medical term that is the same as the medical term as the prediction probability corresponding to the medical term; determining the loss value of the word labeling model according to the prediction probability and the true probability corresponding to each medical term in the labeling information; and training the word labeling model according to the loss value.

The first target value is an arbitrary value, for example, 1. The labeling information includes at least one medical term, and each medical term may be the same as one of the reference medical terms. For any medical term in the labeling information, the prediction probability corresponding to the reference medical term that is the same as the medical term is determined as the prediction probability corresponding to the medical term, and indicates the likelihood that the medical term is the medical term corresponding to the first symptom description sentence.

The true probability of each medical term in the labeling information indicates that the medical term is a medical term corresponding to the first symptom description sentence. After the true probability and the prediction probability corresponding to each medical term in the labeling information are obtained, the difference between the two can be determined for each medical term, the loss value of the word labeling model is determined from these differences, and the word labeling model is trained according to the loss value to improve its accuracy.

Optionally, the process of obtaining the loss value of the word labeling model includes: after the prediction probability corresponding to each medical term in the labeling information is obtained, determining a second target value as the true probability corresponding to each reference medical term, among the plurality of reference medical terms, that is not included in the labeling information; and determining the loss value of the word labeling model according to the prediction probability and the true probability corresponding to each medical term in the labeling information and the prediction probability and the true probability corresponding to each reference medical term not included in the labeling information.

The second target value is an arbitrary value, for example, 0. Among the plurality of reference medical terms, the symptom described by a reference medical term that is not included in the labeling information is unrelated to the symptom described by the first symptom description sentence. Once the true probability of each reference medical term not included in the labeling information is determined, the prediction probability and the true probability are known both for each medical term in the labeling information and for each reference medical term not included in the labeling information. The loss value of the word labeling model can then be determined according to the difference between the prediction probability and the true probability of each medical term and of each such reference medical term.
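As a minimal sketch of how the true probabilities could be assembled, assuming the example values 1 and 0 for the first and second target values, the following builds one true probability per reference medical term. The function name and the sample terms are hypothetical.

```python
def build_true_probabilities(reference_terms, labeled_terms,
                             first_target=1.0, second_target=0.0):
    """Return one true probability per reference medical term.

    Terms present in the labeling information get first_target;
    all other reference terms get second_target.
    """
    labeled = set(labeled_terms)
    return [first_target if term in labeled else second_target
            for term in reference_terms]

# Illustrative terms only.
reference_terms = ["headache", "fever", "cough", "nausea"]
true_probs = build_true_probabilities(reference_terms, ["fever", "cough"])
```

The resulting list has one dimension per reference medical term, in the same order as the prediction probabilities it is compared against.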

Optionally, after the prediction probability and the true probability are determined for each medical term in the labeling information and for each reference medical term not included in the labeling information, that is, for all of the plurality of reference medical terms, the true probabilities corresponding to the plurality of reference medical terms form a real feature. Optionally, the real feature is a real feature vector or a real feature matrix. The real feature comprises feature values of a plurality of dimensions, each dimension corresponds to one reference medical term, and the feature value of each dimension is the true probability of that reference medical term. The loss value of the word labeling model satisfies a relation [formula not preserved in this text] whose terms represent, respectively:

the loss value of the word labeling model;

the real feature;

the Sigmoid (logistic) function, which maps values into the range 0 to 1;

the sentence feature of the first symptom description sentence;

the feature transformation matrix of the word labeling model;

the transpose of a matrix;

the reference feature.
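Since the exact relation is not available here, the following sketch only illustrates one common way to compare true and predicted probabilities: a mean binary cross-entropy. Treating the loss as binary cross-entropy is an assumption, as are the function name and the clipping constant `eps`.

```python
import math

def binary_cross_entropy(true_probs, predicted_probs, eps=1e-12):
    """Mean binary cross-entropy between true and predicted probabilities.

    Assumed loss form; the source only states that the loss depends on the
    real feature and the Sigmoid of the transformed sentence feature.
    """
    total = 0.0
    for y, p in zip(true_probs, predicted_probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(true_probs)

good = binary_cross_entropy([1.0, 0.0], [0.9, 0.1])
bad = binary_cross_entropy([1.0, 0.0], [0.1, 0.9])
```

Predictions that agree with the true probabilities (`good`) give a smaller loss than predictions that contradict them (`bad`), which is the property training relies on.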

It should be noted that, in the embodiment of the present application, the word labeling model is trained according to the prediction probabilities corresponding to the plurality of reference medical terms and the medical terms in the labeling information. In another embodiment, steps 303 and 304 need not be executed, and the word labeling model can be trained in other manners according to the first symptom description sentence and the labeling information.

It should be noted that the embodiment of the present application describes training the word labeling model with only one first symptom description sentence and its reply sentence. In another embodiment, a plurality of first symptom description sentences and the reply sentence of each first symptom description sentence are obtained, and the word labeling model is trained according to the above steps 301 to 305.

It should be noted that the embodiment of the present application describes only one round of training of the word labeling model. In another embodiment, a plurality of first symptom description sentences and the reply sentence of each first symptom description sentence are obtained, the above steps 302 to 304 are repeated, and the word labeling model is trained iteratively. Training of the word labeling model stops in response to the number of iterations reaching a first threshold, or in response to the loss value obtained in the current iteration being not greater than a second threshold. The first threshold and the second threshold are both arbitrary values, for example, the first threshold is 10 or 15, and the second threshold is 0.4 or 0.3.
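The two stopping conditions can be sketched as a simple loop. The text presents them as alternatives; this illustration checks both in one loop. `train_one_round` is a hypothetical callable that stands in for one pass through the training steps and returns that round's loss value.

```python
def train_until_converged(train_one_round, max_rounds=10, loss_threshold=0.4):
    """Run training rounds until a stopping condition is met.

    Stops when the number of rounds reaches max_rounds (first threshold) or
    when a round's loss is not greater than loss_threshold (second threshold).
    """
    history = []
    for _ in range(max_rounds):
        loss = train_one_round()
        history.append(loss)
        if loss <= loss_threshold:
            break
    return history

# Stand-in for real training: a fixed sequence of decreasing loss values.
losses = iter([0.9, 0.6, 0.35, 0.2])
history = train_until_converged(lambda: next(losses))
```

Here training stops after the third round, because 0.35 is not greater than the second threshold of 0.4.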

It should be noted that the above process of training the word labeling model according to the prediction probabilities corresponding to the plurality of reference medical terms and the medical terms in the labeling information is described for the case in which the plurality of reference medical terms includes the medical terms in the labeling information. In another embodiment, after the labeling information of the first symptom description sentence is obtained, in response to any medical term in the labeling information not being included in the plurality of reference medical terms, that medical term is determined as a reference medical term and the plurality of reference medical terms included in the word labeling model is updated. After the update, the word labeling model is trained again according to the above steps 301 to 304.
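A minimal sketch of the update described above, assuming the reference medical terms are kept as an ordered list so that each term keeps its output dimension. The function name and sample terms are hypothetical.

```python
def update_reference_terms(reference_terms, labeled_terms):
    """Append labeled medical terms that are missing from the reference list.

    Returns the updated list and the newly added terms; existing terms keep
    their positions so their output dimensions stay stable.
    """
    updated = list(reference_terms)
    known = set(updated)
    added = []
    for term in labeled_terms:
        if term not in known:
            updated.append(term)
            known.add(term)
            added.append(term)
    return updated, added

updated, added = update_reference_terms(["headache", "fever"],
                                        ["fever", "dizziness"])
```

If `added` is non-empty, the model's output would need one more dimension per new term before retraining; how that is done is not specified in the text.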

305. The computer device calls the trained word labeling model to label the second symptom description sentence, obtaining the medical term corresponding to the second symptom description sentence.

The second symptom description sentence is used for describing symptoms. In the embodiment of the present application, the second symptom description sentence is a different sentence from the first symptom description sentence but is of the same kind. After the word labeling model is trained, the trained word labeling model can be called to label any symptom description sentence, obtaining the medical term corresponding to that symptom description sentence.

Through the trained word labeling model, automatic labeling of symptom description sentences can be achieved, manual labeling of the symptom description sentences is not needed, manual labeling time is saved, and labeling efficiency is improved.

In one possible implementation, step 305 includes: calling the trained word labeling model to label the second symptom description sentence, obtaining the probabilities corresponding to the plurality of reference medical terms; and determining the reference medical term with the maximum probability among the plurality of reference medical terms as the medical term corresponding to the second symptom description sentence.

In the embodiment of the present application, the word labeling model includes a plurality of reference medical terms, and each reference medical term is used for describing a symptom. The probability corresponding to each reference medical term indicates the likelihood that the reference medical term corresponds to the second symptom description sentence. After the probability corresponding to each reference medical term is determined, the reference medical term with the maximum probability is determined as the medical term corresponding to the second symptom description sentence, so as to ensure the accuracy of the determined medical term.
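Selecting the reference medical term with the maximum probability can be illustrated as follows. The function name, terms, and probability values are hypothetical.

```python
def select_medical_term(reference_terms, probabilities):
    """Return the reference medical term with the highest probability."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return reference_terms[best_index], probabilities[best_index]

# Illustrative values only.
term, prob = select_medical_term(["headache", "fever", "cough"],
                                 [0.12, 0.81, 0.33])
```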

The process of obtaining the probability corresponding to each reference medical term is similar to the step 303, and is not described herein again.

306. The computer device trains the word mapping model according to the sample sentence fragment in the second symptom description sentence and the medical term corresponding to the second symptom description sentence.

The sample sentence fragment is the fragment of the second symptom description sentence that describes the symptom, and the word mapping model is used for mapping any symptom description sentence to the corresponding medical term. Optionally, the word mapping model includes a coding layer similar to the coding sub-model in the word labeling model, an attention layer similar to the weight obtaining sub-model in the word labeling model, and an output layer similar to the feature transformation sub-model in the word labeling model. Optionally, the word mapping model is WordCNN (Word Convolutional Neural Network) or WordGRU (Word Gated Recurrent Unit).

In the embodiment of the application, after the word labeling model is trained, it is called to label the second symptom description sentence, and the labeling result is used as a training sample for the word mapping model. The word mapping model is trained with the sample sentence fragment in the second symptom description sentence and the medical term labeled for the second symptom description sentence. No manual labeling is needed while training the word mapping model, because the word labeling model labels the second symptom description sentence automatically. Unsupervised training of the word mapping model is thereby realized, the dependence of the word mapping model on manually labeled data is reduced, and the accuracy of the word mapping model is improved.
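The way automatic labels become training samples can be sketched as below. Both `label_sentence` (standing in for the trained word labeling model) and `extract_fragment` (standing in for the sample fragment selection) are hypothetical callables, and the toy implementations are placeholders rather than the method of this application.

```python
def build_mapping_samples(sentences, label_sentence, extract_fragment):
    """Pair each sentence's symptom fragment with its automatically assigned term."""
    return [(extract_fragment(s), label_sentence(s)) for s in sentences]

# Toy stand-in for the trained word labeling model.
def toy_label(sentence):
    return "fever" if "temperature" in sentence else "headache"

# Toy stand-in for sample fragment selection: keep the last two words.
def toy_fragment(sentence):
    return " ".join(sentence.split()[-2:])

samples = build_mapping_samples(
    ["my temperature is high", "my head really hurts"], toy_label, toy_fragment
)
```

Each `(fragment, term)` pair would then serve as one training example for the word mapping model, with no manual labeling involved.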

In one possible implementation, step 306 includes: calling the word mapping model to label the sample sentence fragment, obtaining the prediction probabilities corresponding to the plurality of reference medical terms; and training the word mapping model according to the prediction probabilities corresponding to the plurality of reference medical terms and the medical term corresponding to the second symptom description sentence.

In the embodiment of the present application, the word mapping model includes a plurality of reference medical terms, each of which is used for describing a symptom. Optionally, the reference medical terms included in the word mapping model are identical to those included in the word labeling model. That is, the two models include the same number of reference medical terms, and each reference medical term in the word mapping model is the same as one reference medical term in the word labeling model. Optionally, the reference medical terms included in the word mapping model are partly the same as those included in the word labeling model. That is, at least one reference medical term included in the word mapping model is not included in the word labeling model, or at least one reference medical term included in the word labeling model is not included in the word mapping model. Optionally, the reference medical terms included in the word mapping model are completely different from those included in the word labeling model. That is, no reference medical term in the word mapping model is included in the word labeling model, and no reference medical term in the word labeling model is included in the word mapping model.

The process of calling the word mapping model to obtain the prediction probability corresponding to each reference medical term, and of training the word mapping model according to the obtained probabilities and the medical term corresponding to the second symptom description sentence, is similar to steps 303 to 304 and is not repeated here.

It should be noted that the above process of training the word mapping model according to the prediction probabilities corresponding to the plurality of reference medical terms and the medical term corresponding to the second symptom description sentence is described for the case in which the plurality of reference medical terms includes the medical term corresponding to the second symptom description sentence. In another embodiment, after the medical term corresponding to the second symptom description sentence is obtained, in response to that medical term not being included in the plurality of reference medical terms in the word mapping model, the medical term is determined as a reference medical term and the plurality of reference medical terms included in the word mapping model is updated. After the update, the word mapping model is trained again according to the above steps 305 to 306.

In one possible implementation, the process of obtaining the sample sentence fragment in the second symptom description sentence includes: dividing the second symptom description sentence to obtain a plurality of sentence fragments; determining the sum of the weights of the words contained in each sentence fragment as the weight of that sentence fragment; and determining the sentence fragment with the maximum weight among the plurality of sentence fragments as the sample sentence fragment in the second symptom description sentence.

Each sentence segment comprises at least one word, the weight of each word is used for representing the association degree of the corresponding word and the symptom described by the second symptom description sentence, the higher the weight is, the higher the association degree of the corresponding word and the symptom described by the second symptom description sentence is, and the lower the weight is, the lower the association degree of the corresponding word and the symptom described by the second symptom description sentence is. The weight of each sentence fragment indicates the degree of association of the corresponding sentence fragment with the symptom described by the second symptom describing sentence.

After the second symptom describing sentence is divided into a plurality of sentence fragments, the sentence fragment corresponding to the maximum weight is the sentence fragment with the maximum degree of association with the symptom described by the second symptom describing sentence among the sentence fragments, that is, the sentence fragment corresponding to the maximum weight is the fragment for describing the symptom in the second symptom describing sentence, and then the sentence fragment corresponding to the maximum weight is determined as the sample sentence fragment in the second symptom describing sentence.

Optionally, the process of obtaining the plurality of sentence fragments includes: according to the positions of the plurality of words contained in the second symptom description sentence, sequentially taking each run of a target threshold number of consecutive words as one sentence fragment, obtaining the plurality of sentence fragments.

The target threshold is an arbitrary value, and for example, the target threshold is 3 or 4. In the resulting plurality of sentence fragments, each sentence fragment includes a target threshold number of words.

For example, the second symptom description sentence includes a plurality of words: word 1, word 2, word 3, word 4 and word 5, with a target threshold of 2, the 4 sentence segments obtained by segmentation are: words 1 and 2, words 2 and 3, words 3 and 4, words 4 and 5.
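The example above corresponds to a sliding window of fixed size over the words of the sentence. A minimal sketch, with a hypothetical function name:

```python
def split_into_fragments(words, target_threshold):
    """Slide a window of target_threshold consecutive words over the sentence."""
    if target_threshold < 1:
        raise ValueError("target_threshold must be at least 1")
    return [words[i:i + target_threshold]
            for i in range(len(words) - target_threshold + 1)]

fragments = split_into_fragments(
    ["word 1", "word 2", "word 3", "word 4", "word 5"], 2
)
```

With five words and a target threshold of 2, this yields the four fragments listed in the example.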

Optionally, the weights of the plurality of sentence fragments satisfy the following relationship (the symbol names used here are assumed, since the original symbols are not preserved in this text):

$$w_i = \sum_{j=i}^{i+k-1} a_j$$

where $w_i$ represents the weight corresponding to the $i$-th sentence fragment among the plurality of sentence fragments; $i$ is the sequence number of the sentence fragment and is a positive integer greater than or equal to 1; $j$ represents the sequence number, within the second symptom description sentence, of each word in the $i$-th sentence fragment; $k$ represents the number of words contained in the $i$-th sentence fragment, i.e. the target threshold; and $a_j$ represents the weight corresponding to the $j$-th word.

In addition, the process of obtaining the weights of the words included in each sentence segment is the same as the process of obtaining the weights of the words in step 303, and is not described herein again.
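Summing word weights per fragment and keeping the fragment with the largest sum can be sketched as follows. The function name, the sample words, and the weight values are illustrative assumptions; in the described method the word weights would come from the weight obtaining sub-model.

```python
def select_sample_fragment(words, word_weights, target_threshold):
    """Return the window of target_threshold words with the largest weight sum."""
    best_start, best_weight = 0, float("-inf")
    for start in range(len(words) - target_threshold + 1):
        # Fragment weight is the sum of the weights of the words it contains.
        weight = sum(word_weights[start:start + target_threshold])
        if weight > best_weight:
            best_start, best_weight = start, weight
    return words[best_start:best_start + target_threshold], best_weight

# Illustrative sentence and weights only.
words = ["my", "head", "hurts", "since", "yesterday"]
word_weights = [0.05, 0.40, 0.35, 0.10, 0.10]
fragment, weight = select_sample_fragment(words, word_weights, 2)
```

In this toy input the fragment "head hurts" carries the largest total weight and would be used as the sample sentence fragment.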

As shown in fig. 6, after the word labeling model segments the second symptom description sentence into a plurality of words, the prediction probability of each reference medical term is obtained by calling the coding sub-model, the weight obtaining sub-model, and the feature transformation sub-model in the word labeling model, and the medical term with the maximum prediction probability is selected from the plurality of reference medical terms. In addition, according to the positions of the plurality of words contained in the second symptom description sentence, each run of a target threshold number of consecutive words is sequentially taken as one sentence fragment to obtain a plurality of sentence fragments, the weight of each sentence fragment is determined from the weight of each word in the second symptom description sentence, and the sample sentence fragment with the maximum weight is selected from the plurality of sentence fragments.

It should be noted that the embodiment of the present application describes training the word mapping model with only one second symptom description sentence. In another embodiment, a plurality of second symptom description sentences are obtained, and the word mapping model is trained according to the above steps 305 and 306.

It should be noted that the embodiment of the present application describes only one round of training of the word mapping model. In another embodiment, a plurality of second symptom description sentences are obtained, the above steps 305 and 306 are repeated, and the word mapping model is trained iteratively. Training of the word mapping model stops in response to the number of iterations reaching a third threshold, or in response to the loss value obtained in the current iteration being not greater than a fourth threshold. The third threshold and the fourth threshold are both arbitrary values, for example, the third threshold is 10 or 15, and the fourth threshold is 0.4 or 0.3.

In one possible implementation, after step 306, the method further includes: acquiring a third symptom description sentence and the corresponding medical term, and training the word mapping model again according to the third symptom description sentence and the corresponding medical term.

In the embodiment of the present application, the third symptom description sentence is a different sentence from the first symptom description sentence and the second symptom description sentence but is of the same kind. The medical term corresponding to the third symptom description sentence is obtained by manual labeling.

After the word mapping model is trained according to the above steps 305-306, a third symptom description sentence and its manually labeled medical term are obtained, and the word mapping model is trained again according to the manual labeling result. This fine-tunes the word mapping model and thereby improves its accuracy.

Optionally, the process of training the word mapping model again includes: training the word mapping model according to the sample sentence fragment in the third symptom description sentence and the medical term corresponding to the third symptom description sentence.

Here, the sample sentence fragment is the fragment of the third symptom description sentence that describes the symptom. The process is similar to step 306 and is not repeated here.

307. The computer device calls the trained word mapping model to map the target sentence fragment in the target symptom description sentence, obtaining the medical term corresponding to the target symptom description sentence.

The target sentence fragment is the fragment of the target symptom description sentence that describes the symptom. After the word mapping model is trained, the trained word mapping model can be called to map the target sentence fragment in any symptom description sentence, obtaining the medical term corresponding to that symptom description sentence. Mapping symptom description sentences with a word mapping model trained in an unsupervised manner ensures the accuracy of the medical terms obtained by mapping.

In one possible implementation, step 307 includes: calling the trained word mapping model, determining the target sentence fragment in the target symptom description sentence, and mapping the target sentence fragment to obtain the medical term corresponding to the target symptom description sentence.

Optionally, the process of determining the target sentence fragment includes: dividing the target symptom description sentence to obtain a plurality of sentence fragments; determining the sum of the weights of the words contained in each sentence fragment as the weight of that sentence fragment; and determining the sentence fragment with the maximum weight among the plurality of sentence fragments as the target sentence fragment in the target symptom description sentence.

The process of obtaining the weights of the words included in each sentence segment is similar to the process of obtaining the weights of the words in step 303, and is not described herein again. Moreover, the process of determining the target sentence fragment is the same as the process of obtaining the sample sentence fragment in the second symptom describing sentence in step 306, and is not described herein again.

It should be noted that, in the embodiment of the present application, the word mapping model is trained according to the sample sentence fragment in the second symptom description sentence, and the target sentence fragment in the target symptom description sentence is mapped after the word mapping model is trained. In another embodiment, steps 306 and 307 need not be executed; the word mapping model can be trained in other manners according to the second symptom description sentence and the medical term corresponding to the second symptom description sentence, and the trained word mapping model is called to map any target symptom description sentence, obtaining the medical term corresponding to the target symptom description sentence.

In one possible implementation, the medical term corresponding to the target symptom description sentence is a second medical term, and after step 307, the method further includes: querying a knowledge database according to the second medical term corresponding to the target symptom description sentence to determine the first medical term corresponding to the second medical term; determining the department information matching the first medical term; and sending the department information to a terminal, which displays the department information.

The first medical term is a term used to describe a disease, and the second medical term is a term used to describe a symptom. The knowledge database includes the correspondence between first medical terms and second medical terms, and further includes the correspondence between first medical terms and department information. The department information indicates the department that diagnoses the disease described by the first medical term, and optionally includes information on the doctors belonging to that department.

In this embodiment, the computer device is a server, the terminal is a user terminal, and the target symptom description sentence in step 307 is sent by the terminal. After the computer device obtains the second medical term corresponding to the target symptom description sentence, it queries the knowledge database and determines the first medical term corresponding to the second medical term, that is, the disease that presents the symptom described by the second medical term. It then determines the department information corresponding to the first medical term, that is, the department that may be able to diagnose the disease described by the first medical term, and sends the department information to the terminal. The terminal displays the department information so that the user can visit the indicated department, thereby realizing a diagnosis guidance function.
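The knowledge database lookup can be illustrated with two small dictionaries. The structure, the function name, and every entry below are hypothetical placeholders chosen for the example, not content from the application or medical guidance.

```python
# Hypothetical toy knowledge database; entries are illustrative only.
SYMPTOM_TO_DISEASE = {"headache": "migraine", "cough": "bronchitis"}
DISEASE_TO_DEPARTMENT = {"migraine": "neurology",
                         "bronchitis": "respiratory medicine"}

def recommend_department(second_medical_term):
    """Map a symptom term to (disease term, department), or None if unknown."""
    disease = SYMPTOM_TO_DISEASE.get(second_medical_term)
    if disease is None:
        return None
    return disease, DISEASE_TO_DEPARTMENT.get(disease)

result = recommend_department("headache")
```

In the described system the server would send the department part of the result to the terminal for display.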

In one possible implementation, the medical term corresponding to the target symptom description sentence is a second medical term, and after step 307, the method further includes: querying a knowledge database according to the second medical term corresponding to the target symptom description sentence, and determining the first medical term corresponding to the second medical term.

With the method provided by the embodiment of the application, the first medical term corresponding to any symptom description sentence can be determined, and a doctor can then assess the user's physical condition according to the determined first medical term and other information about the user.

In the embodiment of the present application, the symptom description sentence is a colloquial description, that is, a sentence that describes a symptom in spoken language, and the medical term is a standard medical entity. With the word mapping model provided in the embodiment of the present application, the symptom description sentence can be mapped to the standard medical term, realizing accurate conversion from colloquial to standardized expression, that is, medical entity standardization (Medical Concept Normalization), so that subsequent processing can be performed based on the standardized medical term. For example, when the method provided by the embodiment of the application is applied to a medical pre-consultation system, colloquial or ambiguous symptom description sentences input by a user can be accurately mapped to the corresponding medical terms and then linked to a medical knowledge database, so that disease inference or department recommendation can be realized and the purpose of medical pre-consultation is achieved.

The trained word labeling model provides automatic labeling results for the word mapping model, and the word mapping model is then trained according to those automatic labeling results.

After the word mapping model is trained with the help of the trained word labeling model, symptom description sentences are mapped with the word mapping model, which ensures the accuracy of the medical terms obtained by mapping.

In the method for training the word mapping model provided by the application, the word mapping model is first pre-trained in an unsupervised manner using the trained word labeling model. After pre-training is finished, the pre-trained word mapping model is trained again based on manual labeling results, so that the word mapping model is fine-tuned and its accuracy is improved.

Table 1 shows the accuracy of the word mapping model, for different network models such as WordCNN and WordGRU, when it is trained with and without the pre-training method of the present application. As can be seen from Table 1, whichever network model is used as the word mapping model, the accuracy is higher with the pre-training method provided by the application, and the accuracy increases as the number of symptom description sentences used for training the word mapping model increases.

TABLE 1

Word mapping model	Pre-training	700 symptom description sentences	800 symptom description sentences	900 symptom description sentences
WordCNN	No	79.40	84.40	86.40
WordCNN	Yes	88.60	89.40	92.20
WordGRU	No	84.00	85.80	88.80
WordGRU	Yes	88.40	89.20	91.20

Fig. 7 is a flowchart of a medical word mapping method provided in an embodiment of the present application, which is applied to a computer device, and as shown in fig. 7, the method includes the following steps.

701. The computer device obtains a first symptom descriptive statement and a reply statement to the first symptom descriptive statement.

702. The computer device generates annotation information for the first symptom description statement based on the medical terms contained in the reply statement.

703. The computer device trains the word mapping model according to the first symptom description statement and the annotation information.

The term mapping model is used for labeling the corresponding medical terms for any symptom description statement.

704. The computer device calls the trained word mapping model to map the target symptom description sentence, obtaining the medical word corresponding to the target symptom description sentence.

It should be noted that steps 701 and 704 in the embodiment of the present application are the same as steps 201 and 204 described above, and are not described herein again.

According to the method provided by the embodiment of the application, the word mapping model is trained by adopting the first symptom description statement and the reply statement of the first symptom description statement, and in the training process, the symptom description statement does not need to be labeled manually, so that the unsupervised training of the word mapping model is realized, the dependence of the word mapping model on labeled data can be reduced, and the accuracy of the word mapping model is improved. Moreover, the mapping of the symptom description sentences can be realized through the trained word mapping model, medical words corresponding to the symptom description sentences are obtained, and the standardized mapping of the symptom description sentences is realized.

Fig. 8 is a flowchart of a medical word mapping method provided in an embodiment of the present application, which is applied to a computer device, and as shown in fig. 8, the method includes the following steps.

801. The computer device obtains a first symptom descriptive statement and a reply statement to the first symptom descriptive statement.

802. The computer device generates annotation information for the first symptom description statement based on the medical terms contained in the reply statement.

803. The computer device calls the word mapping model to label the first symptom description statement, obtaining the prediction probabilities corresponding to a plurality of reference medical words.

The term mapping model is used for labeling corresponding medical terms for any symptom description statement, and comprises a plurality of reference medical terms.

804. The computer device trains the word mapping model according to the prediction probabilities corresponding to the plurality of reference medical words and the medical words in the annotation information.

It should be noted that, in the embodiment of the present application, the word mapping model is trained based on the prediction probabilities corresponding to the plurality of reference medical words and the medical words in the annotation information. In another embodiment, steps 803 and 804 need not be executed, and the word mapping model can be trained in other manners according to the first symptom description statement and the annotation information.

805. The computer device calls the trained word mapping model to map the target symptom description sentence, obtaining the medical word corresponding to the target symptom description sentence.

It should be noted that, steps 801 and 805 in the embodiment of the present application are the same as steps 301 and 305 in the embodiment described above, and are not described herein again.

Fig. 9 is a schematic structural diagram of a medical word tagging device provided by an embodiment of the application, and as shown in fig. 9, the device includes:

an obtaining module 901, configured to obtain a first symptom describing statement and a reply statement to the first symptom describing statement;

a generating module 902, configured to generate tagging information of the first symptom describing statement based on the medical word included in the reply statement, where the tagging information includes the medical word associated with the symptom described by the first symptom describing statement;

a training module 903, configured to train a word tagging model according to the first symptom description statement and tagging information, where the word tagging model is used to tag a corresponding medical word for any symptom description statement;

and the labeling module 904 is configured to call the trained word labeling model, label the second symptom description sentence, and obtain a medical word corresponding to the second symptom description sentence.

As shown in fig. 10, in one possible implementation, the apparatus further includes:

the training module 903 is further configured to train a word mapping model according to the second symptom description sentence and the medical word corresponding to the second symptom description sentence, where the word mapping model is used to map any symptom description sentence into a corresponding medical word;

the mapping module 905 is configured to invoke the trained word mapping model, map any target symptom description statement, and obtain a medical word corresponding to the target symptom description statement.

In another possible implementation manner, the training module 903 is configured to train the word mapping model according to a sample sentence segment in the second symptom description sentence and the medical word corresponding to the second symptom description sentence, where the sample sentence segment is a segment used for describing a symptom in the second symptom description sentence;

the mapping module 905 is configured to invoke the trained word mapping model, and map a target sentence segment in the target symptom description sentence to obtain a medical word corresponding to the target symptom description sentence, where the target sentence segment is a segment used for describing a symptom in the target symptom description sentence.

In another possible implementation manner, the apparatus further includes:

a dividing module 906, configured to perform division processing on the second symptom describing sentence to obtain a plurality of sentence segments, where each sentence segment includes at least one word;

a determining module 907, configured to determine the sum of the weights of the terms included in each sentence fragment as the weight of each sentence fragment;

the determining module 907 is further configured to determine a sentence fragment corresponding to the largest weight from the plurality of sentence fragments as a sample sentence fragment in the second symptom description sentence.
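The segment selection performed by the dividing module 906 and the determining module 907 can be sketched as follows. This is a minimal illustration, not the implementation of the application: it assumes that segments are formed by a sliding window of a fixed number of consecutive words, and that a weight for each word is already available (how the weights are produced is outside this sketch). The function name `select_sample_segment` and the example weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_sample_segment(
    words: Sequence[str], weights: Sequence[float], size: int
) -> Tuple[List[str], float]:
    """Return the window of `size` consecutive words with the largest weight sum.

    Assumption: segments are overlapping sliding windows over the sentence.
    """
    if len(words) != len(weights):
        raise ValueError("words and weights must have the same length")
    if size <= 0:
        raise ValueError("size must be positive")

    best_start, best_weight = 0, float("-inf")
    # A sentence shorter than `size` yields a single segment: the whole sentence.
    n_windows = max(1, len(words) - size + 1)
    for start in range(n_windows):
        # Weight of a segment = sum of the weights of the words it contains.
        segment_weight = sum(weights[start:start + size])
        if segment_weight > best_weight:
            best_start, best_weight = start, segment_weight
    return list(words[best_start:best_start + size]), best_weight


# Hypothetical sentence and word weights, for illustration only.
WORDS = ["I", "have", "had", "a", "sore", "throat", "since", "yesterday"]
WEIGHTS = [0.02, 0.03, 0.03, 0.02, 0.40, 0.35, 0.05, 0.10]
SEGMENT, SEGMENT_WEIGHT = select_sample_segment(WORDS, WEIGHTS, size=2)
```

With these illustrative weights, the selected sample sentence segment is the pair of words that carries the symptom description.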

In another possible implementation manner, the obtaining module 901 is further configured to obtain a third symptom description sentence and a corresponding medical word, where the medical word corresponding to the third symptom description sentence is obtained by manual labeling;

the training module 903 is further configured to train the word mapping model again according to the third symptom description sentence and the corresponding medical word.

In another possible implementation manner, the training module 903 is configured to train the word mapping model again according to a sample sentence segment in the third symptom description sentence and the medical word corresponding to the third symptom description sentence, where the sample sentence segment is a segment used for describing a symptom in the third symptom description sentence.

In another possible implementation, the word labeling model includes a plurality of reference medical words, each reference medical word describing a symptom; the training module 903 includes:

the labeling unit 9031 is configured to invoke a word labeling model, label the first symptom description statement, and obtain prediction probabilities corresponding to a plurality of reference medical words, where the prediction probability corresponding to each reference medical word is used to indicate a possibility that the reference medical word is a medical word corresponding to the first symptom description statement;

and the training unit 9032 is configured to train the word tagging model according to the prediction probabilities corresponding to the plurality of reference medical words and the medical words in the tagging information.

In another possible implementation manner, the labeling unit 9031 is configured to invoke a word labeling model, perform feature extraction on the first symptom describing statement, and obtain a statement feature of the first symptom describing statement; performing feature transformation on the sentence features to obtain reference features, wherein the reference features comprise feature values of a plurality of dimensions, and each dimension corresponds to one reference medical word; and respectively determining the characteristic value of each dimension in the reference characteristics as the prediction probability corresponding to the reference medical term corresponding to each dimension.

In another possible implementation manner, the labeling unit 9031 is configured to invoke a word labeling model, and perform word segmentation processing on the first symptom description sentence to obtain at least one word; coding at least one word to obtain a word vector of the at least one word; and fusing the word vectors of at least one word to obtain the sentence characteristics of the first symptom description sentence.

In another possible implementation, the at least one word includes a plurality of words; the labeling unit 9031 is configured to obtain a weight of each word according to the word vectors of the plurality of words, and to perform weighted fusion on the word vectors of the plurality of words according to their weights, to obtain the sentence feature of the first symptom description sentence.
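The processing performed by the labeling unit 9031 (word vectors, per-word weights, weighted fusion into a sentence feature, and transformation into one value per reference medical word) can be sketched in plain Python as follows. Everything concrete here is an assumption made for illustration: the toy embedding table, the use of a softmax over dot-product scores to obtain word weights, the linear projection followed by a sigmoid, and the two reference words. The application does not specify these choices in this passage.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sentence_feature(vectors: List[Vector], scoring: Vector) -> Tuple[Vector, List[float]]:
    """Weight each word vector, then fuse the vectors by weighted sum."""
    weights = softmax([dot(v, scoring) for v in vectors])
    dim = len(vectors[0])
    fused = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return fused, weights


def predict(feature: Vector, projection: List[Vector], bias: Vector,
            reference_words: List[str]) -> Dict[str, float]:
    """Map the sentence feature to one probability per reference medical word."""
    return {
        word: sigmoid(dot(row, feature) + b)
        for word, row, b in zip(reference_words, projection, bias)
    }


# Hypothetical parameters; a trained model would learn these values.
EMBEDDINGS = {"sore": [1.0, 0.0], "throat": [0.8, 0.2],
              "since": [0.0, 0.1], "yesterday": [0.0, 0.2]}
SCORING = [2.0, 0.0]
REFERENCE_WORDS = ["pharyngalgia", "headache"]
PROJECTION = [[3.0, 0.0], [-3.0, 0.0]]
BIAS = [0.0, 0.0]

tokens = "sore throat since yesterday".split()  # stand-in for word segmentation
FEATURE, WORD_WEIGHTS = sentence_feature([EMBEDDINGS[t] for t in tokens], SCORING)
PROBS = predict(FEATURE, PROJECTION, BIAS, REFERENCE_WORDS)
```

Each dimension of the output corresponds to one reference medical word, so the dictionary returned by `predict` plays the role of the prediction probabilities described above.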

In another possible implementation manner, the training unit 9032 is configured to determine the first target value as a true probability of the medical word in the annotation information; for any medical term in the labeling information, determining the prediction probability corresponding to the reference medical term which is the same as the medical term as the prediction probability corresponding to the medical term; determining a loss value of a word annotation model according to the prediction probability and the real probability corresponding to each medical word in the annotation information; and training the word labeling model according to the loss value.

In another possible implementation manner, the apparatus further includes:

a determining module 907, configured to determine the second target value as a true probability corresponding to a reference medical term that is not included in the annotation information in the plurality of reference medical terms;

the training unit 9032 is configured to determine a loss value of the term labeling model according to the prediction probability and the true probability corresponding to each medical term in the labeling information, and the prediction probability and the true probability corresponding to a reference medical term that is not included in the labeling information.
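One way to realize the loss described above is a binary cross-entropy summed over all reference medical words. The sketch below assumes that the first target value is 1 (for medical words contained in the labeling information) and the second target value is 0 (for reference medical words not contained in it), and that cross-entropy is the loss form; these are illustrative assumptions, and the function name `annotation_loss` is hypothetical.

```python
import math
from typing import Dict, Set


def annotation_loss(
    predicted: Dict[str, float],
    labeled_words: Set[str],
    first_target: float = 1.0,   # assumed true probability for labeled words
    second_target: float = 0.0,  # assumed true probability for all other words
    eps: float = 1e-12,
) -> float:
    """Mean binary cross-entropy over all reference medical words."""
    total = 0.0
    for word, prob in predicted.items():
        target = first_target if word in labeled_words else second_target
        prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
        total += -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))
    return total / len(predicted)


# Hypothetical predictions for three reference medical words.
LABELED = {"pharyngalgia"}
GOOD = annotation_loss({"pharyngalgia": 0.9, "cough": 0.1, "headache": 0.1}, LABELED)
BAD = annotation_loss({"pharyngalgia": 0.1, "cough": 0.9, "headache": 0.9}, LABELED)
UNIFORM = annotation_loss({"pharyngalgia": 0.5, "cough": 0.5}, LABELED)
```

The loss is small when the labeled words receive high probabilities and the remaining reference words receive low ones, which is the direction in which training would push the word labeling model under these assumptions.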

In another possible implementation, the generating module 902 includes:

an extraction unit 9021, configured to perform keyword extraction on the reply sentence to obtain a medical term in the reply sentence;

and the forming unit 9022 is configured to form the obtained medical term into the labeling information.

In another possible implementation manner, the extraction unit 9021 is configured to perform word segmentation processing on the reply sentence to obtain at least one word, and to query the knowledge database for the at least one word to obtain the medical words contained in the at least one word, where the knowledge database includes at least one medical word.
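The keyword extraction described above can be sketched as a segmentation step followed by a lookup in the knowledge database. In this minimal illustration the knowledge database is modeled as an in-memory set of medical words, and word segmentation is approximated by extracting lowercase alphabetic tokens; a real system would use a proper word segmenter and a real knowledge base. The names `segment` and `extract_medical_words` are hypothetical.

```python
import re
from typing import List, Set


def segment(sentence: str) -> List[str]:
    """Stand-in for word segmentation: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def extract_medical_words(reply: str, knowledge_words: Set[str]) -> List[str]:
    """Keep the words of the reply that appear in the knowledge database."""
    found: List[str] = []
    for word in segment(reply):
        if word in knowledge_words and word not in found:
            found.append(word)  # preserve first-occurrence order, no duplicates
    return found


# Hypothetical knowledge database and reply sentence.
KNOWLEDGE_WORDS = {"pharyngitis", "cough", "fever"}
REPLY = "You may have pharyngitis; a cough and fever are common."
LABELING_INFO = extract_medical_words(REPLY, KNOWLEDGE_WORDS)
```

The resulting list serves as the labeling information for the corresponding symptom description sentence, without any manual labeling.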

In another possible implementation, the generating module 902 includes:

the extraction unit 9021 is configured to perform keyword extraction on the reply sentence to obtain a first medical term and a second medical term in the reply sentence, where the first medical term is a term used for describing a disease, and the second medical term is a term used for describing a symptom;

the query unit 9023 is configured to query, according to the first medical term, at least one second medical term corresponding to the first medical term from a knowledge database, where the knowledge database includes a correspondence between the first medical term and the second medical term;

and the forming unit 9022 is configured to form the extracted second medical term and the queried second medical term into the labeling information.

In another possible implementation manner, the extraction unit 9021 is configured to perform word segmentation processing on the reply sentence to obtain at least one word; and inquiring a first medical term and a second medical term contained in the at least one term from the knowledge database according to the at least one term.
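The variant in which disease words found in the reply are expanded into symptom words can be sketched as follows. The knowledge database is modeled here as a set of symptom words plus a dictionary from disease words to symptom words; both structures, the token-based segmentation, and the name `build_labeling_info` are assumptions made for illustration.

```python
import re
from typing import Dict, List, Set


def build_labeling_info(
    reply: str,
    symptom_words: Set[str],
    disease_to_symptoms: Dict[str, List[str]],
) -> List[str]:
    """Combine symptom words extracted from the reply with symptom words
    queried from the knowledge database for each disease word in the reply."""
    words = re.findall(r"[a-z]+", reply.lower())  # stand-in for word segmentation
    labeling_info: List[str] = []

    def add(word: str) -> None:
        if word not in labeling_info:
            labeling_info.append(word)

    # Second medical words (symptoms) that appear directly in the reply.
    for word in words:
        if word in symptom_words:
            add(word)
    # Symptoms queried from the first medical words (diseases) in the reply.
    for word in words:
        for symptom in disease_to_symptoms.get(word, []):
            add(symptom)
    return labeling_info


# Hypothetical knowledge database contents.
SYMPTOMS = {"cough", "fever", "headache"}
DISEASE_TO_SYMPTOMS = {"pharyngitis": ["cough", "fever"]}
INFO = build_labeling_info(
    "This sounds like pharyngitis, which often causes a cough.",
    SYMPTOMS,
    DISEASE_TO_SYMPTOMS,
)
```

Here the reply mentions only one symptom explicitly, and the second symptom is added through the disease-to-symptom correspondence.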

In another possible implementation, the word labeling model includes a plurality of reference medical words, each reference medical word describing a symptom; the labeling module 904 is configured to call the trained word labeling model to label the second symptom description sentence, obtaining probabilities corresponding to the plurality of reference medical words, where the probability corresponding to each reference medical word indicates the possibility that the reference medical word is the medical word corresponding to the second symptom description sentence; and to determine the reference medical word corresponding to the maximum probability among the plurality of reference medical words as the medical word corresponding to the second symptom description sentence.
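The final selection step amounts to taking the reference medical word with the largest probability. A minimal sketch, assuming the model output is available as a dictionary from reference medical words to probabilities (the name `label_sentence` and the example values are hypothetical):

```python
from typing import Dict


def label_sentence(probabilities: Dict[str, float]) -> str:
    """Return the reference medical word with the maximum probability."""
    if not probabilities:
        raise ValueError("at least one reference medical word is required")
    return max(probabilities, key=probabilities.get)


# Hypothetical model output for one symptom description sentence.
LABEL = label_sentence({"cough": 0.10, "pharyngalgia": 0.85, "fever": 0.30})
```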

It should be noted that: the medical word labeling apparatus provided in the above embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the medical term labeling device and the medical term labeling method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.

Fig. 11 is a schematic structural diagram of a medical word mapping apparatus provided in an embodiment of the present application, and as shown in fig. 11, the apparatus includes:

an obtaining module 1101, configured to obtain a first symptom description statement and a reply statement to the first symptom description statement;

a generating module 1102, configured to generate labeling information of the first symptom description statement based on medical terms included in the reply statement, where the labeling information includes medical terms associated with symptoms described by the first symptom description statement;

a training module 1103, configured to train a term mapping model according to the first symptom description statement and the labeling information, where the term mapping model is used to label a corresponding medical term for any symptom description statement;

and the mapping module 1104 is used for calling the trained word mapping model, mapping the target symptom description statement and obtaining the medical word corresponding to the target symptom description statement.

It should be noted that: the medical word mapping apparatus provided in the above embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the medical term mapping device and the medical term mapping method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.

The present application further provides a computer device, which includes a processor and a memory, where the memory stores at least one computer program, and the at least one computer program is loaded by the processor and executed to implement the operations performed in the medical word tagging method of the above embodiment or the operations performed in the medical word mapping method of the above embodiment.

Optionally, the computer device is provided as a terminal. Fig. 12 shows a block diagram of a terminal 1200 according to an exemplary embodiment of the present application. The terminal 1200 may be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1200 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.

The terminal 1200 includes: a processor 1201 and a memory 1202.

The processor 1201 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1201 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1201 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a Central Processing Unit (CPU), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1201 may be integrated with a GPU (Graphics Processing Unit), which is used for rendering and drawing the content to be displayed by the display screen. In some embodiments, the processor 1201 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 1202 may include one or more computer-readable storage media, which may be non-transitory. Memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1202 is used to store at least one computer program for execution by the processor 1201 to implement the operations performed in the medical term labeling method of the above embodiments or to implement the operations performed in the medical term mapping method of the above embodiments.

In some embodiments, the terminal 1200 may further optionally include: a peripheral interface 1203 and at least one peripheral. The processor 1201, memory 1202, and peripheral interface 1203 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1203 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1204, display 1205, camera assembly 1206, audio circuitry 1207, positioning assembly 1208, and power supply 1209.

The peripheral interface 1203 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1201 and the memory 1202. In some embodiments, the processor 1201, memory 1202, and peripheral interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202 and the peripheral device interface 1203 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The Radio Frequency circuit 1204 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1204 communicates with a communication network and other communication devices by electromagnetic signals. The radio frequency circuit 1204 converts an electric signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 1204 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1204 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1204 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 1205 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1205 is a touch display screen, the display screen 1205 also has the ability to acquire touch signals on or over its surface. The touch signal may be input to the processor 1201 as a control signal for processing. In this case, the display screen 1205 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1205, disposed on a front panel of the terminal 1200; in other embodiments, there may be at least two display screens 1205, respectively disposed on different surfaces of the terminal 1200 or in a folded design; in other embodiments, the display screen 1205 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 1200. The display screen 1205 may even be arranged in a non-rectangular irregular shape, that is, as a special-shaped screen. The display screen 1205 may be made using an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or other materials.

The camera assembly 1206 is used to capture images or video. Optionally, the camera assembly 1206 includes a front camera and a rear camera. The front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, the camera assembly 1206 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.

The audio circuitry 1207 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals into the processor 1201 for processing or inputting the electric signals into the radio frequency circuit 1204 to achieve voice communication. For stereo capture or noise reduction purposes, multiple microphones may be provided at different locations of terminal 1200. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1201 or the radio frequency circuit 1204 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 1207 may also include a headphone jack.

The positioning component 1208 is used to determine the current geographic location of the terminal 1200 to implement navigation or LBS (Location Based Service). The positioning component 1208 may be a positioning component based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, or the Galileo system of the European Union.

The power supply 1209 is used to provide power to the various components within the terminal 1200. The power supply 1209 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1209 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast charging technology.

In some embodiments, terminal 1200 also includes one or more sensors 1210. The one or more sensors 1210 include, but are not limited to: acceleration sensor 1211, gyro sensor 1212, pressure sensor 1213, fingerprint sensor 1214, optical sensor 1215, and proximity sensor 1216.

The acceleration sensor 1211 can detect magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 1200. For example, the acceleration sensor 1211 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1201 may control the display screen 1205 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1211. The acceleration sensor 1211 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 1212 may detect a body direction and a rotation angle of the terminal 1200, and the gyro sensor 1212 may collect a 3D motion of the user on the terminal 1200 in cooperation with the acceleration sensor 1211. The processor 1201 can implement the following functions according to the data collected by the gyro sensor 1212: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

Pressure sensors 1213 may be disposed on the side frames of terminal 1200 and/or underlying display 1205. When the pressure sensor 1213 is disposed on the side frame of the terminal 1200, the user's holding signal of the terminal 1200 can be detected, and the processor 1201 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1213. When the pressure sensor 1213 is disposed at a lower layer of the display screen 1205, the processor 1201 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1205. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 1214 is used for collecting a fingerprint of the user, and the processor 1201 identifies the user according to the fingerprint collected by the fingerprint sensor 1214, or the fingerprint sensor 1214 identifies the user according to the collected fingerprint. When the user identity is identified as a trusted identity, the processor 1201 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 1214 may be disposed on the front, back, or side of the terminal 1200. When a physical button or vendor Logo is provided on the terminal 1200, the fingerprint sensor 1214 may be integrated with the physical button or vendor Logo.

The optical sensor 1215 is used to collect the ambient light intensity. In one embodiment, the processor 1201 may control the display brightness of the display screen 1205 according to the ambient light intensity collected by the optical sensor 1215. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1205 is increased; when the ambient light intensity is low, the display brightness of the display screen 1205 is decreased. In another embodiment, the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 based on the ambient light intensity collected by the optical sensor 1215.

The proximity sensor 1216, also called a distance sensor, is disposed on the front panel of the terminal 1200. The proximity sensor 1216 is used to collect the distance between the user and the front surface of the terminal 1200. In one embodiment, when the proximity sensor 1216 detects that the distance between the user and the front surface of the terminal 1200 gradually decreases, the processor 1201 controls the display screen 1205 to switch from the screen-on state to the screen-off state; when the proximity sensor 1216 detects that the distance between the user and the front surface of the terminal 1200 gradually increases, the processor 1201 controls the display screen 1205 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the configuration shown in fig. 12 is not intended to be limiting of terminal 1200 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.

Optionally, the computer device is provided as a server. Fig. 13 is a schematic structural diagram of a server 1300 according to an embodiment of the present application. The server 1300 may vary considerably depending on its configuration or performance, and may include one or more processors (Central Processing Units, CPUs) 1301 and one or more memories 1302, where the memory 1302 stores at least one computer program, and the at least one computer program is loaded and executed by the processor 1301 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the server may also include other components for implementing the functions of the device, which are not described herein again.

The present application further provides a computer-readable storage medium, in which at least one computer program is stored, and the at least one computer program is loaded and executed by a processor to implement the operations performed in the medical word labeling method of the foregoing embodiments or the operations performed in the medical word mapping method of the foregoing embodiments.

Embodiments of the present application also provide a computer program product or a computer program comprising computer program code stored in a computer readable storage medium. The processor of the computer device reads the computer program code from the computer-readable storage medium, and the processor executes the computer program code, so that the computer device implements the operations performed in the medical word labeling method of the above embodiment or implements the operations performed in the medical word mapping method of the above embodiment.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is merely an optional embodiment of the present application and should not be construed as limiting the present application. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims

1. A method of medical word annotation, the method comprising:

calling the trained word labeling model, and labeling a second symptom description sentence to obtain a medical word corresponding to the second symptom description sentence;

determining a target threshold number of words as a sentence segment in sequence according to the positions of a plurality of words contained in the second symptom describing sentence to obtain a plurality of sentence segments;

respectively determining the sum of the weights of the words contained in each sentence segment as the weight of each sentence segment, wherein the weight is used for expressing the association degree of the corresponding sentence segment and the symptom described by the corresponding symptom description sentence;

determining a sentence segment corresponding to the maximum weight in the plurality of sentence segments as a sample sentence segment in the second symptom describing sentence, wherein the sample sentence segment is a segment for describing symptoms in the second symptom describing sentence;

training a word mapping model according to the sample sentence fragments and the medical words corresponding to the second symptom description sentences, wherein the word mapping model is used for mapping any symptom description sentence into the corresponding medical word;

calling the trained word mapping model, mapping target statement segments in the target symptom describing statement to obtain medical words corresponding to the target symptom describing statement, wherein the target statement segments are segments used for describing symptoms in the target symptom describing statement, and the target statement segments are statement segments with the maximum weight in the target symptom describing statement;

wherein generating labeling information of the first symptom description sentence based on the medical terms contained in the reply sentence comprises:

extracting keywords from the reply sentence to obtain a first medical term and a second medical term in the reply sentence, wherein the first medical term is a term used for describing a disease, and the second medical term is a term used for describing a symptom;

querying, according to the first medical term, at least one second medical term corresponding to the first medical term from a knowledge database, wherein the knowledge database comprises a correspondence between first medical terms and second medical terms;

and combining the extracted second medical term and the queried second medical term to form the labeling information.
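The segment-selection steps recited in claim 1 can be illustrated with a minimal sketch. It assumes that consecutive segments are taken with a sliding window that advances one word at a time, and that per-word weights are already available in a dictionary; the claim fixes neither point, and the names `select_sample_segment`, `word_weights` and `window` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def select_sample_segment(
    words: Sequence[str],
    word_weights: Dict[str, float],
    window: int,
) -> Tuple[List[str], float]:
    """Return the segment of `window` consecutive words with the largest summed weight.

    Assumption: segments overlap and start at every word position (stride 1).
    Words without an entry in `word_weights` contribute a weight of 0.
    """
    if window <= 0:
        raise ValueError("window must be positive")

    # A sentence shorter than the window is treated as a single segment.
    if len(words) <= window:
        segment = list(words)
        return segment, sum(word_weights.get(w, 0.0) for w in segment)

    best_segment: List[str] = []
    best_weight = float("-inf")
    for start in range(len(words) - window + 1):
        segment = list(words[start:start + window])
        # Segment weight = sum of the weights of the words it contains.
        weight = sum(word_weights.get(w, 0.0) for w in segment)
        if weight > best_weight:
            best_segment, best_weight = segment, weight
    return best_segment, best_weight


# Illustrative input only; the weights are made up for the example.
sentence = ["I", "have", "had", "a", "bad", "headache", "since", "monday"]
weights = {"bad": 0.5, "headache": 2.0, "since": 0.25}
segment, weight = select_sample_segment(sentence, weights, window=3)
```

When several segments tie for the maximum weight, this sketch keeps the earliest one; the claim does not say how ties are resolved.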

2. The method of claim 1, wherein after the training of the word mapping model according to the sample sentence fragments and the medical words corresponding to the second symptom description sentences, the method further comprises:

acquiring a third symptom description sentence and a corresponding medical word, wherein the medical word corresponding to the third symptom description sentence is obtained by manual labeling;

and training the word mapping model again according to the third symptom description sentence and the corresponding medical word.

3. The method of any one of claims 1-2, wherein the word labeling model comprises a plurality of reference medical words, each reference medical word describing a symptom; the training of the word labeling model according to the first symptom description statement and the labeling information comprises:

calling the word labeling model, labeling the first symptom description statement, and obtaining prediction probabilities corresponding to the plurality of reference medical words, wherein the prediction probability corresponding to each reference medical word is used for indicating the possibility that the reference medical word is the medical word corresponding to the first symptom description statement;

and training the word labeling model according to the prediction probabilities corresponding to the plurality of reference medical words and the medical words in the labeling information.

4. The method of claim 3, wherein the calling the word labeling model, labeling the first symptom description statement, and obtaining prediction probabilities corresponding to the plurality of reference medical words comprises:

calling the word labeling model, and performing feature extraction on the first symptom description statement to obtain statement features of the first symptom description statement;

performing feature transformation on the statement features to obtain a reference feature, wherein the reference feature comprises feature values of multiple dimensions, and each dimension corresponds to one reference medical word;

and respectively determining the feature value of each dimension in the reference feature as the prediction probability corresponding to the reference medical word corresponding to that dimension.
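As an illustration of the feature transformation in claim 4, the sketch below maps a sentence feature vector to one value per reference medical word. The use of a linear layer followed by a sigmoid is an assumption made for this example; the claim does not name a specific transformation. The function name and its parameters are hypothetical.

```python
import math
from typing import Dict, Sequence


def predict_term_probabilities(
    sentence_features: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    reference_terms: Sequence[str],
) -> Dict[str, float]:
    """Map sentence features to one probability per reference medical word.

    Assumption: the transformation is linear (one weight row and one bias per
    reference word) followed by a sigmoid, so each output lies in (0, 1).
    """
    if not (len(weights) == len(biases) == len(reference_terms)):
        raise ValueError("weights, biases and reference_terms must have equal length")

    probabilities: Dict[str, float] = {}
    for term, row, bias in zip(reference_terms, weights, biases):
        if len(row) != len(sentence_features):
            raise ValueError("each weight row must match the feature dimension")
        # One output dimension per reference medical word.
        logit = sum(w * x for w, x in zip(row, sentence_features)) + bias
        probabilities[term] = 1.0 / (1.0 + math.exp(-logit))
    return probabilities


# Illustrative values only.
probs = predict_term_probabilities(
    sentence_features=[1.0, 2.0],
    weights=[[0.0, 0.0], [2.0, 1.0]],
    biases=[0.0, 0.0],
    reference_terms=["fever", "headache"],
)
```

A sigmoid per dimension, rather than a softmax over all dimensions, lets several reference medical words receive a high probability for the same sentence, which matches labeling information that may contain more than one medical word.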

5. The method of claim 3, wherein the training the word labeling model according to the prediction probabilities corresponding to the plurality of reference medical words and the medical words in the labeling information comprises:

determining a first target numerical value as a true probability of the medical word in the labeling information;

for any medical word in the labeling information, determining the prediction probability corresponding to the reference medical word that is the same as the medical word as the prediction probability corresponding to the medical word;

determining a loss value of the word labeling model according to the prediction probability and the true probability corresponding to each medical word in the labeling information;

and training the word labeling model according to the loss value.

6. The method of claim 5, wherein before determining the loss value of the word labeling model according to the prediction probability and the true probability corresponding to each medical word in the labeling information, the method further comprises:

determining a second target numerical value as a true probability corresponding to each reference medical word, among the plurality of reference medical words, that is not included in the labeling information;

wherein determining the loss value of the word labeling model according to the prediction probability and the true probability corresponding to each medical word in the labeling information comprises:

and determining the loss value of the word labeling model according to the prediction probability and the true probability corresponding to each medical word in the labeling information and the prediction probability and the true probability corresponding to each reference medical word that is not contained in the labeling information.
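The loss computation of claims 5 and 6 can be sketched as follows. The claims do not name a loss function; binary cross-entropy is assumed here, with a first target value of 1 for words in the labeling information and a second target value of 0 for the remaining reference medical words. The function and parameter names are hypothetical.

```python
import math
from typing import Dict, Iterable


def labeling_loss(
    predicted: Dict[str, float],
    labeled_terms: Iterable[str],
    first_target: float = 1.0,
    second_target: float = 0.0,
    eps: float = 1e-12,
) -> float:
    """Mean binary cross-entropy over all reference medical words.

    Assumption: `predicted` holds one probability per reference medical word.
    Words found in `labeled_terms` get `first_target` as their true probability;
    all other reference words get `second_target`.
    """
    if not predicted:
        raise ValueError("predicted must contain at least one reference word")

    labeled = set(labeled_terms)
    total = 0.0
    for term, p in predicted.items():
        target = first_target if term in labeled else second_target
        # Clamp to avoid log(0) for saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return total / len(predicted)


# Illustrative values only.
uninformed = labeling_loss({"headache": 0.5, "fever": 0.5}, ["headache"])
confident = labeling_loss({"headache": 0.9, "fever": 0.1}, ["headache"])
```

Including the reference words that are absent from the labeling information, as in claim 6, penalizes a model that assigns a high probability to every word; with only the positive terms of claim 5, such a model would reach a low loss.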

7. The method of any one of claims 1-2, wherein generating labeling information of the first symptom description statement based on the medical words contained in the reply statement comprises:

extracting keywords from the reply sentences to obtain medical words in the reply sentences;

and forming the obtained medical words into the labeling information.
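A minimal sketch of building labeling information from a reply sentence is given below. It assumes keyword extraction by plain substring matching against fixed vocabularies, which is a simplification chosen for this example and not something the claims specify. The optional `knowledge_base` argument illustrates the disease-to-symptom lookup recited in claim 1. All names and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def build_labeling_info(
    reply: str,
    disease_terms: Sequence[str],
    symptom_terms: Sequence[str],
    knowledge_base: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Collect symptom words for a symptom description from its reply sentence.

    Assumption: a term counts as extracted when it occurs verbatim in `reply`.
    If `knowledge_base` is given, symptoms associated with each disease term
    found in the reply are appended as well, without duplicates.
    """
    # Symptom words mentioned directly in the reply.
    labels = [term for term in symptom_terms if term in reply]

    if knowledge_base:
        for disease in disease_terms:
            if disease not in reply:
                continue
            # Symptom words looked up through the disease mentioned in the reply.
            for symptom in knowledge_base.get(disease, []):
                if symptom not in labels:
                    labels.append(symptom)
    return labels


# Illustrative data only.
reply = "This sounds like influenza; the cough should ease within a week."
kb = {"influenza": ["fever", "cough", "fatigue"]}
direct_only = build_labeling_info(reply, ["influenza"], ["cough", "fever"])
expanded = build_labeling_info(reply, ["influenza"], ["cough", "fever"], kb)
```

Without the knowledge database only the symptoms named in the reply are kept; with it, symptoms implied by a diagnosed disease are added, which yields richer labeling information for the same reply.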

8. A medical word mapping method, the method comprising:

determining a target threshold number of words as a sentence segment in sequence according to the positions of a plurality of words contained in the first symptom describing sentence to obtain a plurality of sentence segments;

determining a sentence segment corresponding to the maximum weight in the plurality of sentence segments as a sample sentence segment in the first symptom describing sentence, wherein the sample sentence segment is a segment for describing symptoms in the first symptom describing sentence;

training a word mapping model according to the sample sentence fragments and the labeling information, wherein the word mapping model is used for mapping any symptom description sentence into a corresponding medical word;

9. A medical term tagging device, the device comprising:

the labeling module is used for calling the trained word labeling model, labeling the second symptom description statement and obtaining a medical word corresponding to the second symptom description statement;

the segmentation module is used for sequentially determining a target threshold number of words as a sentence segment according to the positions of the words contained in the second symptom describing sentence to obtain a plurality of sentence segments;

the determining module is used for respectively determining the sum of the weights of the words contained in each sentence segment as the weight of each sentence segment, and the weight is used for expressing the association degree of the corresponding sentence segment and the symptom described by the corresponding symptom description sentence;

the determining module is further configured to determine a sentence fragment corresponding to a maximum weight in the plurality of sentence fragments as a sample sentence fragment in the second symptom describing sentence, where the sample sentence fragment is a fragment in the second symptom describing sentence, and the sample sentence fragment is used for describing a symptom;

the training module is further used for training a word mapping model according to the sample sentence fragments and the medical words corresponding to the second symptom description sentences, and the word mapping model is used for mapping any symptom description sentence into the corresponding medical word;

the mapping module is used for calling the trained word mapping model, mapping target statement segments in the target symptom description statements to obtain medical words corresponding to the target symptom description statements, wherein the target statement segments are segments used for describing symptoms in the target symptom description statements, and the target statement segments are statement segments with the maximum weight in the target symptom description statements;

the generation module comprises:

10. A medical word mapping apparatus, characterized in that the apparatus comprises:

the segmentation module is used for sequentially determining a target threshold number of words as a sentence segment according to the positions of the words contained in the first symptom describing sentence to obtain a plurality of sentence segments;

the determining module is further configured to determine a sentence fragment corresponding to a maximum weight in the plurality of sentence fragments as a sample sentence fragment in the first symptom describing sentence, where the sample sentence fragment is a fragment in the first symptom describing sentence, and the sample sentence fragment is used for describing a symptom;

the training module is used for training a word mapping model according to the sample sentence fragments and the labeling information, and the word mapping model is used for mapping any symptom description sentence into a corresponding medical word;

the generation module comprises:

11. A computer device, characterized in that the computer device comprises a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to implement the operations performed in the medical word labeling method according to any one of claims 1 to 7; or to implement the operations performed in the medical word mapping method according to claim 8.

12. A computer-readable storage medium, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement the operations performed in the medical word labeling method according to any one of claims 1 to 7; or to implement the operations performed in the medical word mapping method according to claim 8.