CN115878755A - Text processing method, medical text processing method and device and electronic equipment

Info

Publication number
CN115878755A
CN115878755A CN202211440932.9A CN202211440932A CN115878755A CN 115878755 A CN115878755 A CN 115878755A CN 202211440932 A CN202211440932 A CN 202211440932A CN 115878755 A CN115878755 A CN 115878755A
Authority
CN
China
Prior art keywords
text
entity
attribute
candidate
concept
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211440932.9A
Other languages
Chinese (zh)
Inventor
姚富根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Uc Mobile China Co ltd
Original Assignee
Uc Mobile China Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Uc Mobile China Co ltd filed Critical Uc Mobile China Co ltd
Priority to CN202211440932.9A priority Critical patent/CN115878755A/en
Publication of CN115878755A publication Critical patent/CN115878755A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a text processing method. Text related to entity concepts in a specified field is extracted from a text to be analyzed and used as an entity text fragment, and candidate entity concepts that reach a preset similarity standard with the entity text fragment are selected from the entity concept set of the specified field as preliminarily screened entities. Then, to further improve the accuracy of entity normalization, a preset interactive model performs interactive analysis on the text to be analyzed, the entity text fragment and each candidate entity concept to obtain a consistency analysis result, and a target entity concept is selected from the candidate entity concepts according to the consistency analysis result. By first screening the candidate entity concepts that reach the preset similarity standard with the entity text fragment and then ranking them to select the target entity concept, the method improves the accuracy of entity normalization of the text to be analyzed.

Description

Text processing method, medical text processing method and device and electronic equipment
Technical Field
The present application relates to the field of text information processing, and in particular, to a text processing method, a medical text processing method, a text processing apparatus, a medical text processing apparatus, an electronic device, and a computer storage medium.
Background
Standardized text facilitates service management and improves service efficiency; combined with knowledge graphs and AI technology, it also makes it possible to develop intelligent service projects.
For example, in the medical service field, an electronic medical condition description (a kind of text) is digital information such as characters, symbols, charts, figures, numbers and images produced by medical staff about the course of a patient's disease and its treatment, and is the basis on which a doctor diagnoses and treats the disease. As the original record of the patient's entire diagnosis and treatment process, the electronic medical condition description records the disease course stated by the patient or an accompanying person after the patient is admitted to the medical institution, as well as the analysis, diagnosis and treatment of the condition, the estimated prognosis, and the opinions of doctors at all levels during ward rounds and consultations. However, existing electronic medical condition descriptions are natural texts with free-form wording. To associate such natural text with a medical knowledge graph, the text must undergo entity normalization, that is, be normalized to the corresponding standard concepts in a medical-field knowledge base, which builds a bridge between the natural text and the knowledge base.
Existing entity normalization schemes fall mainly into two types. In the first, recall relies mainly on an inverted index of the entity fragment and on atomic-word recall. Atomic-word recall depends on how the entity fragment is split, so if entity extraction is wrong, the correct result cannot be recalled; the final result also depends entirely on a ranking model and lacks interpretable verification. The second type adjusts embedding vector representations with an attention mechanism to perform entity matching, or completes the entity with a generative task. However, most such work discusses end-to-end entity normalization, so there is no good overall entity normalization scheme applicable to the medical field, and the accuracy of entity normalization is low.
Therefore, how to design an entity normalization scheme for a medical scenario and improve the accuracy of entity normalization has become an urgent problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiments of the present application provide a text processing method that addresses the problem of how to design an entity normalization scheme for a medical scenario and improve the accuracy of entity normalization.
The embodiment of the application provides a text processing method, which comprises the following steps:
obtaining a text to be analyzed, and extracting a text related to an entity concept in a specified field from the text to be analyzed as an entity text fragment;
selecting candidate entity concepts reaching a preset similarity standard with the entity text fragment from the entity concept set of the specified field;
respectively carrying out interactive analysis on the text to be analyzed, the entity text fragment and each candidate entity concept by using a preset interactive model to obtain a consistency analysis result;
comparing the obtained consistency analysis results of the candidate entity concepts, and selecting a target entity concept from the candidate entity concepts according to a predetermined standard.
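The four steps above can be read as a small pipeline. The following is a minimal sketch of that data flow only, not the application's implementation: the names `normalize_entity`, `extract_fragment`, `recall_candidates`, `consistency_score` and the `min_score` cut-off are hypothetical placeholders for the extraction model, the candidate screening, the interactive model and the predetermined standard.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class NormalizationResult:
    fragment: str            # entity text fragment extracted from the input
    candidates: List[str]    # candidate concepts that passed the similarity screen
    target: Optional[str]    # selected target concept, or None if none qualified


def normalize_entity(
    text: str,
    extract_fragment: Callable[[str], str],
    recall_candidates: Callable[[str], Sequence[str]],
    consistency_score: Callable[[str, str, str], float],
    min_score: float = 0.5,  # assumed stand-in for the "predetermined standard"
) -> NormalizationResult:
    # Step 1: extract the entity text fragment from the text to be analyzed.
    fragment = extract_fragment(text)
    # Step 2: screen candidate entity concepts by similarity to the fragment.
    candidates = list(recall_candidates(fragment))
    # Step 3: score each candidate against both the full text and the fragment.
    scored = [(consistency_score(text, fragment, c), c) for c in candidates]
    # Step 4: compare the scores and keep the best one if it meets the standard.
    best = max(scored, key=lambda pair: pair[0], default=None)
    target = best[1] if best is not None and best[0] >= min_score else None
    return NormalizationResult(fragment, candidates, target)
```

Passing the three stages in as callables keeps the sketch independent of any particular model; each stage can be replaced without touching the control flow.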
Optionally, the method further includes:
analyzing the text to be analyzed to obtain attribute information related to the specified field;
and combining the attribute information and the target entity concept to generate a specified field target text aiming at the text to be analyzed.
Optionally, the analyzing the text to be analyzed to obtain attribute information related to the specified field includes:
according to preset attribute categories relevant to the specified field, identifying attribute texts and attribute text fragments corresponding to the attribute categories from the texts to be analyzed;
normalizing each attribute text and each attribute text fragment to obtain an attribute value text corresponding to each attribute category;
and combining the attribute categories and the corresponding attribute value texts to form attribute information related to the specified field.
Optionally, the step of identifying the attribute text and the attribute text segment corresponding to each attribute category from the text to be analyzed according to the preset attribute category related to the specified field adopts an SPO entity attribute extraction algorithm based on a language representation model pre-trained in the specified field.
Optionally, the step of normalizing each attribute text segment to obtain an attribute value text corresponding to each attribute category is to apply a model attribute value processing policy, a rule attribute value processing policy, or a combination of both to normalize the attribute text and the attribute text segment for different attribute categories.
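As an illustration of what a rule attribute value processing policy might look like, the sketch below normalizes a degree attribute and a duration attribute with a lookup table and a regular expression. The rule table `DEGREE_RULES`, the attribute names `degree` and `duration`, and the function name are assumptions made for this example; the application does not specify the rules.

```python
import re
from typing import Dict

# Hypothetical rule table: colloquial degree wording -> normalized attribute value.
DEGREE_RULES = {
    "somewhat": "mild",
    "a little": "mild",
    "extremely": "severe",
}

# Matches durations such as "3 days", "3-4 days" or "2 weeks".
DURATION_PATTERN = re.compile(r"(\d+)(?:\s*-\s*(\d+))?\s*(days?|weeks?|months?)")


def normalize_attributes(text: str) -> Dict[str, str]:
    """Return normalized attribute values keyed by attribute category."""
    attrs: Dict[str, str] = {}
    lowered = text.lower()
    # Degree: the first rule whose wording appears as whole words wins.
    for wording, value in DEGREE_RULES.items():
        if re.search(r"\b" + re.escape(wording) + r"\b", lowered):
            attrs["degree"] = value
            break
    # Duration: keep the numeric range and the unit as written.
    match = DURATION_PATTERN.search(lowered)
    if match:
        low, high, unit = match.groups()
        span = f"{low}-{high}" if high else low
        attrs["duration"] = f"{span} {unit}"
    return attrs
```

A model attribute value processing policy would replace the lookup with a learned classifier for categories whose wording is too varied for rules; the two can be combined per attribute category, as the paragraph above allows.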
Optionally, the selecting, from the entity concept set in the specified field, a candidate entity concept that meets a predetermined similarity criterion with the entity text snippet includes:
respectively carrying out vector representation on the entity text fragment and the entity concepts in the entity concept set of the specified field;
calculating the similarity between the entity text fragment and the entity concept according to the obtained vector;
and selecting entity concepts meeting the preset similarity standard as candidate entity concepts.
Optionally, the using a preset interactive model to perform interactive analysis on the text to be analyzed, the entity text fragment, and each candidate entity concept respectively to obtain a consistency analysis result includes:
converting the text to be analyzed into corresponding text vector data to be analyzed, converting the entity text fragment into corresponding entity text fragment vector data, and converting each candidate entity concept into corresponding candidate entity concept vector data; the entity text fragment vector data comprises vector data corresponding to context data associated with the entity text fragment;
inputting the text vector data to be analyzed, the entity text fragment vector data and the candidate entity concept vector data into a preset interactive model to obtain a first similarity value between the context data associated with the entity text fragment and the plurality of candidate entity concepts, a second similarity value between the entity text fragment and the plurality of candidate entity concepts, and a global similarity value associated with the candidate entity concepts;
and obtaining a consistency analysis result of each candidate entity concept according to the first similarity value, the second similarity value and the global similarity value.
Optionally, the comparing the obtained consistency analysis results of the candidate entity concepts, and selecting a target entity concept from the candidate entity concepts according to a predetermined standard includes:
obtaining first scoring information corresponding to the first similarity value, second scoring information corresponding to the second similarity value and third scoring information corresponding to the global similarity value;
grading and combining the first grading information, the second grading information and the third grading information to obtain comprehensive grading information of each consistency analysis result;
comparing the comprehensive grading information of each consistency analysis result with a preset grading threshold value to obtain a target consistency analysis result meeting the preset grading threshold value;
and selecting a target entity concept from the candidate entity concepts according to the target consistency analysis result.
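The scoring and thresholding steps above can be sketched as follows. The application does not state how the three scores are combined, so the weighted sum, the weights in `WEIGHTS` and the default threshold are assumptions chosen only to make the example concrete.

```python
from typing import Dict, Optional, Tuple

# Assumed weights for the (context, fragment, global) similarity scores.
WEIGHTS: Tuple[float, float, float] = (0.3, 0.4, 0.3)

Scores = Tuple[float, float, float]


def combined_score(scores: Scores, weights: Scores = WEIGHTS) -> float:
    """Combine the first, second and global scores into one comprehensive score."""
    return sum(w * s for w, s in zip(weights, scores))


def select_target(candidate_scores: Dict[str, Scores],
                  threshold: float = 0.6) -> Optional[str]:
    """Return the best candidate whose comprehensive score meets the threshold."""
    comprehensive = {c: combined_score(s) for c, s in candidate_scores.items()}
    passing = {c: v for c, v in comprehensive.items() if v >= threshold}
    if not passing:
        return None  # no candidate meets the preset scoring threshold
    return max(passing, key=passing.get)
```

Returning `None` when nothing passes the threshold makes the "no acceptable concept" case explicit instead of silently returning a weak match.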
Optionally, the combining the attribute information with the target entity concept to generate a target text in a specified field for a text to be analyzed includes:
acquiring a combination relation template corresponding to the attribute information and the target entity concept;
determining a combination structure and a combination sequence of attribute information and a target entity concept according to the combination relation template;
and combining the attribute information and the target entity concept according to the combined structure and the combined sequence to generate a specified field target text aiming at the text to be analyzed.
Optionally, the obtaining of the combined relationship template corresponding to the attribute information and the target entity concept includes:
obtaining a plurality of candidate combination relation templates, wherein the candidate combination relation templates are obtained through a preset candidate combination relation template database, and each candidate combination relation template has a respective category identification;
acquiring attribute information and a combined category identifier of a target entity concept;
and matching the combined category identification with category identifications of a plurality of candidate combined relation templates so as to obtain the combined relation template of the attribute information and the target entity concept from the candidate combined relation templates.
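A minimal sketch of the template lookup and combination described above, assuming the candidate template database is a simple mapping from category identifier to a format string. The identifiers, the template strings and the helper names are hypothetical; the wording of the generated text depends entirely on the templates supplied.

```python
from typing import Dict, Optional

# Hypothetical candidate template database keyed by category identifier.
# Each template fixes the combination structure and order of its fields.
TEMPLATES: Dict[str, str] = {
    "symptom+degree+duration": "{degree} {entity} for {duration}",
    "symptom+duration": "{entity} for {duration}",
    "symptom": "{entity}",
}


def category_id(attributes: Dict[str, str]) -> str:
    """Build the combined category identifier from the attribute categories present."""
    parts = ["symptom"] + [k for k in ("degree", "duration") if k in attributes]
    return "+".join(parts)


def build_target_text(entity: str, attributes: Dict[str, str]) -> Optional[str]:
    """Match the combined category identifier to a template and fill it in."""
    template = TEMPLATES.get(category_id(attributes))
    if template is None:
        return None  # no candidate template carries this category identifier
    return template.format(entity=entity, **attributes)
```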
Optionally, the method further includes:
acquiring a combined structure and a combined sequence of the target texts in the specified field;
determining attribute information according to the specified field target text, the combined structure and the combined sequence of the specified field target text;
obtaining an original text for generating a target entity concept, and obtaining initial attribute information from the original text;
and verifying the attribute information and the initial attribute information, and if the verification result is not matched, combining the initial attribute information and the target entity concept to generate a specified field target text for the text to be analyzed.
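The verification step above can be sketched by parsing the generated text back through its template and comparing the recovered attributes with those taken from the original text. The template string, the function names and the use of a regular expression to invert the template are all assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Assumed combination structure and order of the specified field target text.
TEMPLATE = "{degree} {entity} for {duration}"


def parse_with_template(template: str, text: str) -> Optional[Dict[str, str]]:
    """Recover field values from text by turning each {name} into a named group."""
    pattern = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+?)", re.escape(template))
    match = re.fullmatch(pattern, text)
    return match.groupdict() if match else None


def verify_and_repair(target_text: str, entity: str,
                      initial_attrs: Dict[str, str],
                      template: str = TEMPLATE) -> str:
    """Keep target_text if its attributes match the initial ones, else rebuild it."""
    recovered = parse_with_template(template, target_text)
    if recovered is not None:
        recovered_attrs = {k: v for k, v in recovered.items() if k != "entity"}
        if recovered_attrs == initial_attrs:
            return target_text
    # Mismatch: recombine the initial attribute information with the target concept.
    return template.format(entity=entity, **initial_attrs)
```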
The embodiment of the present application further provides a medical text processing method, including:
obtaining a medical text to be analyzed, and extracting the medical text related to the medical entity concept in the specified field from the medical text to be analyzed as a medical entity text fragment;
selecting candidate medical entity concepts reaching a preset similarity standard with the medical entity text fragment from the medical entity concept set in the specified field;
respectively carrying out interactive analysis on the medical text to be analyzed, the medical entity text fragment and each candidate medical entity concept by using a preset interactive model to obtain a consistency analysis result;
comparing the obtained consistency analysis results of the candidate medical entity concepts, and selecting a target medical entity concept from the candidate medical entity concepts according to a predetermined standard.
An embodiment of the present application further provides a text processing apparatus, including:
the entity text fragment obtaining unit is used for obtaining a text to be analyzed and extracting a text related to an entity concept in a specified field from the text to be analyzed as an entity text fragment;
a candidate entity concept obtaining unit, configured to select, from the entity concept set in the specified field, a candidate entity concept that meets a predetermined similarity criterion with the entity text snippet;
a consistency analysis result obtaining unit, configured to perform interactive analysis on the text to be analyzed, the entity text fragment, and each candidate entity concept respectively using a preset interactive model, so as to obtain a consistency analysis result;
and the target entity concept obtaining unit is used for comparing the obtained consistency analysis results of the candidate entity concepts and selecting the target entity concept from the candidate entity concepts according to a preset standard.
An embodiment of the present application further provides a medical text processing apparatus, including:
the medical entity text fragment unit is used for obtaining a medical text to be analyzed and extracting a medical text related to a medical entity concept in a specified field from the medical text to be analyzed as a medical entity text fragment;
the candidate medical entity concept unit is used for selecting candidate medical entity concepts reaching a preset similarity standard with the medical entity text fragment from the medical entity concept set in the specified field;
the consistency analysis result unit is used for carrying out interactive analysis on the medical text to be analyzed, the medical entity text fragments and each candidate medical entity concept by using a preset interactive model to obtain a consistency analysis result;
and the target medical entity concept unit is used for comparing the obtained consistency analysis result of each candidate medical entity concept and selecting the target medical entity concept from the candidate medical entity concepts according to a preset standard.
An embodiment of the present application further provides an electronic device, where the electronic device includes: a processor; a memory for storing a computer program for execution by the processor to perform the method of any one of the above.
An embodiment of the present application further provides a computer storage medium, where a computer program is stored, and the computer program is executed by a processor to perform any one of the methods described above.
Compared with the prior art, the method has the following advantages:
the embodiment of the application provides a text processing method, which comprises the following steps:
obtaining a text to be analyzed, and extracting a text related to an entity concept in a specified field from the text to be analyzed as an entity text fragment; selecting candidate entity concepts reaching a preset similarity standard with the entity text fragment from the entity concept set of the specified field; respectively carrying out interactive analysis on the text to be analyzed, the entity text fragment and each candidate entity concept by using a preset interactive model to obtain a consistency analysis result; comparing the obtained consistency analysis results of the candidate entity concepts, and selecting a target entity concept from the candidate entity concepts according to a predetermined standard.
In the first embodiment of the application, text related to entity concepts in a specified field is extracted from the text to be analyzed and used as an entity text fragment, and candidate entity concepts that reach a preset similarity standard with the entity text fragment are selected from the entity concept set of the specified field as preliminarily screened entities. Then, to further improve the accuracy of entity normalization, a preset interactive model performs interactive analysis on the text to be analyzed, the entity text fragment and each candidate entity concept to obtain a consistency analysis result, and a target entity concept is selected from the candidate entity concepts according to the consistency analysis result. By first screening the candidate entity concepts that reach the preset similarity standard with the entity text fragment and then ranking them to select the target entity concept, the method improves the accuracy of entity normalization of the text to be analyzed.
Drawings
Fig. 1 is a schematic diagram of an application scenario provided in a first embodiment of the present application.
Fig. 2 is a flowchart of a text processing method according to a first embodiment of the present application.
Fig. 3 is a flowchart for obtaining attribute information related to a specific domain according to a first embodiment of the present application.
Fig. 4 is a schematic diagram of attribute texts and attribute text segments for identifying attribute categories according to the first embodiment of the present application.
Fig. 5 is a schematic diagram for forming attribute information related to a specific field according to a first embodiment of the present application.
Fig. 6 is a flowchart of a medical text processing method according to a second embodiment of the present application.
Fig. 7 is a schematic diagram of a text processing apparatus according to a third embodiment of the present application.
Fig. 8 is a schematic diagram of a medical text processing device according to a fourth embodiment of the present application.
Fig. 9 is a schematic view of an electronic device according to a fifth embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments of the present application. The embodiments can, however, be implemented in many forms other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit and scope of the embodiments; the embodiments of this application are therefore not limited to the specific embodiments disclosed below.
To help those skilled in the art better understand the solution of the present application, a specific application scenario of an embodiment is first described in detail, based on the text processing method provided in the present application. Fig. 1 is a schematic view of the application scenario provided in the first embodiment of the present application.
The application scenario provided in the first embodiment of the present application may be a medical scenario. Specifically, when a user (a doctor) is interviewing a patient, the doctor records an electronic medical condition description of the patient on the client used by the user. For example, a natural text (the text to be analyzed) in the electronic medical condition description is "the left side of the upper abdomen is somewhat painful, for 3-4 days". The client sends the text to be analyzed to the server (processor), and the server normalizes the text to be analyzed according to the method provided in this scenario. Specifically, text related to an entity concept of a specified field is extracted from the text to be analyzed as an entity text fragment. In this step, the specified field may be the same field as that of the text to be analyzed or a different one, or a specified category that is the same as or different from the category of the text to be analyzed. For the text to be analyzed in this example, the extracted entity text fragment is "upper left abdomen, pain". Then, candidate entity concepts that reach a preset similarity standard with the entity text fragment are selected from the entity concept set of the specified field. An entity concept set consists of a plurality of entity concepts, and each entity concept set has a corresponding category and category identifier, so that the text to be analyzed can be matched, by its own category identifier, with the entity concept set having the same category identifier. Alternatively, each entity concept set has a corresponding domain and domain identifier, so that the text to be analyzed can be matched, by its own domain identifier, with the entity concept set having the same domain identifier.
In this scenario, corresponding to the text to be analyzed in the foregoing example, the selected candidate entity concepts include epigastric distending pain, epigastric pain, left abdominal pain, epigastric pain, right abdominal pain, and the like. In this scenario, the five candidate entity concepts with the highest similarity are preferably selected.
After the candidate entity concepts are obtained, interactive analysis is carried out on the text to be analyzed, the entity text fragments and the candidate entity concepts respectively by using a preset interactive model to obtain a consistency analysis result, the obtained consistency analysis results of the candidate entity concepts are compared, and a target entity concept is selected from the candidate entity concepts according to a preset standard. The target entity concept is the accurate expression of the entity text segment obtained by entity normalization in the scene. Corresponding to the above example, the target entity concept of the present scenario is specifically "epigastric pain".
After the target entity concept is determined, the target entity concept is a standard concept or a more accurate concept of the specified field, so that the target entity concept is used as a search word for searching, and more accurate related information in the specified field can be obtained than by directly using natural text. Therefore, the preliminary technical effect expected by the technical scheme can be obtained in this step, namely, the precise entity concept is obtained, and text preparation is made for further work such as machine knowledge question answering and machine diagnosis.
Of course, natural language generally contains not only information that can be expressed as a standard term (a target entity concept) but also information related to the target entity concept, such as information indicating degree, duration, distance or feeling. Such information attaches to a target entity concept, and combining it with the standard concept of the specified field (that is, the target entity concept) yields a very clear description of the target information to be retrieved. These pieces of information are referred to as attribute information.
For example, "somewhat painful" and "3-4 days" in the above example are not part of the target entity concept itself, but they are highly relevant to the content actually being described. This attribute information therefore needs to be combined with the target entity concept to obtain a specified field target text containing both. Such a text better reflects what the natural text intends to express and so provides a better information retrieval basis for machine knowledge question answering and machine diagnosis.
Specifically, to further improve the accuracy of the result, the method further includes analyzing the text to be analyzed to obtain attribute information related to the specified field, where the attribute information is information associated with the entity text fragment in the text to be analyzed. For example, if the attribute text fragment is "somewhat painful, 3-4 days", the attribute information obtained correspondingly is "mild (painful), 3-4 days". Finally, the attribute information is combined with the target entity concept to generate a specified field target text for the text to be analyzed, namely "slight pain in upper abdomen for 3-4 days".
After the target text of the designated field is obtained by the method, the server can feed the target text of the designated field back to the client, so that the user can obtain processing results such as accurate entity recommendation, target text of the designated field and the like, and more accurate items such as machine knowledge question answering, machine medical diagnosis and the like can be developed on the basis of the processing results.
There are many application scenarios corresponding to the first embodiment of the present application, where the application scenarios are only schematic, and the application scenarios do not limit the scope to be protected by the first embodiment of the present application.
First embodiment
Corresponding to the above application scenario, a first embodiment of the present application provides a text processing method to improve accuracy of entity normalization. As shown in fig. 2, fig. 2 is a flowchart of a text processing method according to a first embodiment of the present application, where the method includes the following steps:
step S201, obtaining a text to be analyzed, and extracting a text related to an entity concept in a specified field from the text to be analyzed as an entity text fragment.
In this step, the text to be analyzed may be any type of text in any field. It is text obtained from natural language; here it is typically a description of a disease condition given in spoken or written language, usually by the patient, so its wording generally does not conform to the terminology conventions of the medical field. In the first embodiment of the present application, the text to be analyzed includes an electronic medical condition description text in natural-language word order, for example "the left side of the upper abdomen is somewhat painful, for 3-4 days".
After the text to be analyzed is obtained, the text related to the entity concept of the specified field can be extracted from it as an entity text fragment. Specifically, the domain and domain identifier of the text to be analyzed are obtained, and the text related to the entity concept of the specified field matching that domain identifier is determined accordingly. Alternatively, the category and category identifier of the text to be analyzed are obtained, and the text related to the entity concept of the specified field matching that category identifier is determined accordingly. An entity concept is information that describes an entity in terms of the objects a given field is concerned with. For example, if the text to be analyzed is "the left side of the upper abdomen is somewhat painful", as in the example above, the text related to the entity concept of the specified field extracted from it, "upper left abdomen, pain", is the entity text fragment.
In this step, the entity text fragment related to the specified field can be extracted from the text to be analyzed using any of various existing NER (Named Entity Recognition) models, which use knowledge of the specified field to extract entities that have a specific meaning or are strongly referential. However, entity text fragments (segments) extracted by an NER model often have problems such as boundary errors or incomplete information. For example, for the text "tumor marker is increased during chemotherapy", the extracted segment may be "marker is increased", which lacks the subject of the symptom. In addition, because the entity text fragment is natural-language content taken from the text to be analyzed, its wording may not conform to the terminology of the relevant field, which hinders standardized and correct processing of the acquired information. The subsequent steps are therefore needed.
Step S202, selecting candidate entity concepts reaching a preset similarity standard with the entity text fragment from the entity concept set of the specified field.
In combination with the above, in the text related to the entity concepts in the specified field, the number of the text is more than one, and the number of the entity concepts is more than one, that is, the entity concept set is composed of a plurality of entity concepts, and each entity concept set has a corresponding category and a category identifier, so that the text to be analyzed can be matched with the entity concept set having the same category identifier according to the category identifier of the text to be analyzed. Or each entity concept set has a corresponding domain and domain identification. So that the text to be analyzed can be matched with the entity concept set with the same domain identification according to the domain identification of the text to be analyzed. The entity concept set of the first embodiment of the present application includes an entity concept set in the medical field.
First, the entity text fragments of the text to be analyzed and the related entity concept set are obtained, and the entity text fragment and the entity concepts in the entity concept set of the specified field are each represented as vectors. Specifically, the text to be analyzed is converted into corresponding vector data, and first vector data corresponding to the entity text fragment is determined within that vector data; the first vector data includes associated vector data corresponding to the context data related to the entity text fragment. Each entity concept in the determined entity concept set is converted into corresponding second vector data. Then, the similarity between the entity text fragment and each entity concept is calculated from the obtained vectors. Specifically, the second vector data corresponding to each entity concept and the first vector data corresponding to the entity text fragment are input into a preset two-tower model to obtain a similarity value between them, and these similarity values are taken as the similarity values between the entity text fragment and the plurality of entity concepts in the entity concept set. Finally, the entity concepts meeting the predetermined similarity standard are selected as candidate entity concepts. Specifically, the entity concepts in the entity concept set are sorted by similarity value to obtain the candidate entity concepts that reach the predetermined similarity standard with the entity text fragment.
In the first embodiment of the present application, the predetermined similarity standard is that the top 5 entity concepts ranked by similarity value are taken as candidate entity concepts. Corresponding to the foregoing example, the 5 candidate entity concepts with the highest similarity values selected in this embodiment are epigastric distending pain, epigastric pain, left abdominal pain, epigastric pain, and right abdominal pain. Of course, under other predetermined similarity standards, a different number of top-ranked entity concepts may be taken as candidates.
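The candidate selection of step S202 can be sketched as a top-k ranking by cosine similarity. This is a minimal illustration only: the function names, the toy 2-dimensional vectors and the concept labels are hypothetical, and the vectors merely stand in for whatever the preset two-tower model would output.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_candidates(fragment_vec, concept_vecs, k=5):
    """Rank entity concepts by similarity to the fragment and keep the top k."""
    scored = [(name, cosine(fragment_vec, vec)) for name, vec in concept_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors standing in for the "second vector data" of each concept.
concepts = {
    "concept_a": [1.0, 0.0],
    "concept_b": [0.8, 0.6],
    "concept_c": [0.0, 1.0],
}
# Hypothetical "first vector data" of the entity text fragment.
ranking = top_k_candidates([1.0, 0.1], concepts, k=2)
```

With k=5 this corresponds to the "top 5" standard of the embodiment; other values of k correspond to the other predetermined similarity standards mentioned above.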
It should be noted that, when ranking candidate entity concepts, negative sample entities have the greatest influence on the final effect. For this reason, a circle-loss function is used in training, defined by the formula below, where d(e, e_n) (respectively d(e, e_p)) denotes the vector cosine similarity between the entity text fragment and a negative (respectively positive) sample entity, K is the positive sample cluster, i denotes the i-th positive sample entity, L is the negative sample cluster, j denotes the j-th negative sample entity, m is the boundary distance (margin) between positive and negative samples, γ is a scaling factor, exp is the exponential function, and log is the logarithmic function. The objective of the loss is to make the distance between each positive sample entity and each negative sample entity as large as possible. Compared with using a single group of positive and negative sample pairs at a time, this loss can take a large number of positive and negative sample pairs into the computation simultaneously, which improves the ranking effect of the preset two-tower model.
Figure BDA0003948239290000101
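The formula image itself is not reproduced in this text. Purely as an assumption, one form that is consistent with the symbol definitions given for this loss (similarities d, clusters K and L, margin m, scaling factor γ, exp and log) is the pairwise log-sum-exp form sketched below; the actual formula in the figure may differ, for example by including per-pair weighting factors.

```latex
\mathcal{L} \;=\; \log\Bigl[\, 1 + \sum_{i \in K} \sum_{j \in L}
  \exp\bigl( \gamma \, ( d(e, e_n^{j}) - d(e, e_p^{i}) + m ) \bigr) \Bigr]
```

In this sketch every positive sample entity is paired with every negative sample entity inside a single sum, which is how many positive-negative pairs can contribute to one loss value at the same time.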
Step S203, using a preset interactive model to perform interactive analysis on the text to be analyzed, the entity text segment and each candidate entity concept respectively to obtain a consistency analysis result.
After the candidate entity concepts are obtained, a preset interactive model is used to perform interactive analysis on the text to be analyzed, the entity text fragment and each candidate entity concept, yielding a consistency analysis result. Specifically, the text to be analyzed is converted into corresponding text vector data, the entity text fragment is converted into corresponding entity text fragment vector data, and each candidate entity concept is converted into corresponding candidate entity concept vector data; the entity text fragment vector data includes vector data corresponding to the context data associated with the entity text fragment. Continuing the foregoing example, the text to be analyzed is "upper abdomen left is painful", the entity text fragment includes "upper abdomen left, painful", the context data associated with the entity text fragment includes "upper abdomen left, painful", and the candidate entity concepts include "upper abdomen distending pain, upper abdominal pain, left abdominal pain, upper abdominal pain, right abdominal pain". The text vector data, the entity text fragment vector data and the candidate entity concept vector data are then input into the preset interactive model to obtain first similarity values between the context data associated with the entity text fragment and the candidate entity concepts, second similarity values between the entity text fragment and the candidate entity concepts, and global similarity values for the associated candidate entity concepts. A consistency analysis result for each candidate entity concept is obtained from the first similarity values, the second similarity values and the global similarity values.
Step S204, comparing the obtained consistency analysis results of each candidate entity concept, and selecting a target entity concept from the candidate entity concepts according to a predetermined standard.
After the consistency analysis results of the candidate entity concepts are obtained, their comprehensive scores are compared, and the candidate entity concept whose comprehensive score meets a preset condition is selected as the target entity concept. Specifically, first scoring information corresponding to the first similarity value, second scoring information corresponding to the second similarity value, and third scoring information corresponding to the global similarity value are obtained, and the three are combined to obtain comprehensive scoring information for the consistency analysis result of each candidate entity concept. The comprehensive scoring information of each candidate entity concept is compared with a preset scoring threshold to obtain a target consistency analysis result that meets the threshold, and the target entity concept is selected from the candidate entity concepts according to that result. This step further normalizes the plurality of candidate entity concepts so that the obtained target entity concept better satisfies the normalization requirement. In the foregoing example, the corresponding target entity concept is "epigastric, painful".
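The score combination and threshold comparison of step S204 can be sketched as follows. The text does not state how the three pieces of scoring information are combined or how ties among candidates above the threshold are resolved, so the weighted sum, the equal default weights and the "highest score wins" rule are all assumptions, as are the function names and sample values.

```python
def composite_score(first, second, global_sim, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three similarity-based scores (assumed: weighted sum)."""
    w1, w2, w3 = weights
    return w1 * first + w2 * second + w3 * global_sim

def select_target(candidates, threshold):
    """candidates maps a concept name to (first, second, global) similarity values.

    Returns the best-scoring candidate that meets the threshold, or None.
    """
    scores = {name: composite_score(*vals) for name, vals in candidates.items()}
    passing = {name: s for name, s in scores.items() if s >= threshold}
    if not passing:
        return None
    return max(passing, key=passing.get)

# Hypothetical similarity values for two candidate entity concepts.
candidates = {
    "candidate_1": (0.9, 0.8, 0.7),
    "candidate_2": (0.5, 0.4, 0.3),
}
target = select_target(candidates, threshold=0.6)
```

Returning None when no candidate meets the threshold is one possible design; the source does not say what happens in that case.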
In the first embodiment of the application, a text related to entity concepts in a specified field is extracted from the text to be analyzed as an entity text fragment, and candidate entity concepts reaching a predetermined similarity standard with the entity text fragment are selected from the entity concept set of the specified field as preliminarily screened entities. Then, to further improve the accuracy of entity normalization, a preset interactive model is used to perform interactive analysis on the text to be analyzed, the entity text fragment and each candidate entity concept to obtain a consistency analysis result, and a target entity concept is selected from the candidate entity concepts according to that result. The method thus first ranks the candidate entity concepts that reach the predetermined similarity standard with the entity text fragment and then selects the target entity concept from among them; combining the two improves the accuracy of entity normalization for the text to be analyzed.
Further, after the target entity concept is obtained, in order to make the resulting text conform more closely to text expressed in the canonical terms of the specified field, the method provided in the first embodiment of the present application further includes analyzing the text to be analyzed to obtain attribute information related to the specified field, and combining the attribute information with the target entity concept to generate a specified field target text for the text to be analyzed. The attribute information refers to information associated with the entity information in the text to be analyzed, or to attribute information related to the specified field that is associated with that entity information.
In the first embodiment of the present application, analyzing the text to be analyzed to obtain attribute information related to the specified field is shown in Fig. 3, which is a flowchart of obtaining attribute information related to the specified field according to the first embodiment, and includes the following steps:
Step S301, according to preset attribute categories relevant to the specified field, identifying attribute text fragments corresponding to each attribute category from the text to be analyzed.
The preset attribute categories related to the specified field can be obtained from a database. In each field, every entity has its own attributes, and each attribute corresponds to a category and a category identifier, namely the attribute category. In the first embodiment of the present application, the specified field is the medical field; for disease entities in the medical field, 15 attribute types (secondary categories) are determined according to their frequency of occurrence in medical diagnosis, as shown in Table 1:
Figure BDA0003948239290000121
Figure BDA0003948239290000131
TABLE 1
Then, the attribute texts and attribute text fragments corresponding to each attribute category are identified from the text to be analyzed. This is implemented with an SPO entity attribute extraction algorithm based on a language representation model pre-trained on the specified field. Specifically, the text to be analyzed is processed to extract each entity it contains and the entity identifier (including category feature and position information) corresponding to each entity; the entities include attribute entities. Next, the attribute type and attribute type ID associated with an attribute entity are obtained. To do so, the text to be analyzed is converted into corresponding vector data, which includes attribute vector data corresponding to the attribute entity, and this vector data is input into a preset classification model to obtain its category features, which include the category features of the attribute vector data. The category features of the vector data are matched against the category identifier of each attribute category to determine, from the preset attribute categories related to the specified field, the target attribute type and target attribute type ID corresponding to those category features. The target attribute type and target attribute type ID are taken as the attribute type and attribute type ID associated with the attribute entity.
Next, the attribute text and the attribute text fragment in the text to be analyzed are identified according to the entity identifier (ID, position identifier) and the attribute type ID. Specifically, the text to be analyzed is divided into single-character entities, one character per unit, and the association between each single-character entity ID and its single-character entity is determined. The attribute text is then identified from the text to be analyzed according to the entity identifier, the attribute type ID and the single-character entity IDs, and the attribute text fragment in the text to be analyzed is determined according to the association among the entity identifier, the attribute text and the single-character entities.
In the first embodiment of the present application, to facilitate understanding of the above steps of identifying the attribute texts and attribute text fragments corresponding to the respective attribute categories from the text to be analyzed, an example is described below with reference to Fig. 4. Fig. 4 is a schematic diagram of identifying the attribute texts and attribute text fragments of attribute categories according to the first embodiment of the present application.
Specifically, the text to be analyzed is "lying still, not very painful, very painful when standing, three or four days". The text to be analyzed is input into a BERT (language representation model) forward model, which includes at least an entity analysis layer, a fragment layer, a position layer and a hidden layer. This 4-layer BERT processes the text to be analyzed to extract each entity it contains and the entity identifier corresponding to each entity; for example, the extracted entities are "lying, going, not very much, painful, standing, right, painful, severe, three or four days", as well as "painful" in the attribute entity. Forward encoding is then performed: the text to be analyzed is converted into corresponding vector data, which includes attribute vector data corresponding to the attribute entity, and this vector data is input into a preset classification model to obtain its category features, which include the category features of the attribute vector data.
Next, the category features of the vector data are matched against the category identifier of each attribute category to determine, from the preset attribute categories related to the specified field, the target attribute type and target attribute type ID corresponding to those category features; these are taken as the attribute type and attribute type ID associated with the attribute entity. As shown in Fig. 4, the target attribute type and target attribute type ID (ID, position) are "action condition" for the first layer and "severity" for the second layer. It should be noted that every attribute category is input into the model in advance, and each attribute category has a corresponding category identifier and category feature. In this embodiment, the attribute categories include at least duration of onset, severity, action condition, and the like.
Finally, the text to be analyzed is divided into single-character entities, one character per unit, and the association between each single-character entity ID and its single-character entity is determined. As shown in Fig. 4, the single-character entity IDs corresponding to "lying" are "E1, E2", those corresponding to "not very" are "E6, E7, E8", and those corresponding to "three or four days" are "E20, E21, E22". The attribute text in the attribute entity is then determined according to the entity identifier, the attribute type ID and the single-character entity IDs, and the attribute text fragment in the text to be analyzed is determined according to the association among the entity identifier, the attribute text and the single-character entities. The attribute texts marked in the text to be analyzed are "lying down", "not very", "three or four days", and so on; the attribute text fragments associated with these attribute texts are "lying still", "not very painful", "three or four days", and so on. The above is a concrete example of the SPO entity attribute extraction algorithm, based on a language representation model pre-trained on the specified field, that is used in this step.
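The single-character indexing described above can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the sample string is an invented English stand-in rather than the text of Fig. 4, and the ID range is chosen by hand instead of being predicted by a model.

```python
def index_characters(text):
    """Assign IDs E1..En to the single characters of the text."""
    return {f"E{i}": ch for i, ch in enumerate(text, start=1)}

def span_text(char_index, start, end):
    """Rebuild the text covered by character IDs E<start>..E<end> (inclusive)."""
    return "".join(char_index[f"E{i}"] for i in range(start, end + 1))

# Hypothetical sample text; each character becomes one single-character entity.
text = "pain 3 days"
chars = index_characters(text)

# A hand-picked ID range standing in for a predicted attribute text span.
fragment = span_text(chars, 6, 11)
```

Keeping the ID-to-character association makes it possible to map a predicted range of IDs, such as "E20, E21, E22" in the example above, back to the exact characters of the text to be analyzed.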
Of course, in other examples of the present application, the attribute texts and attribute text fragments corresponding to each attribute category may also be identified from the text to be analyzed as follows. The text to be analyzed is processed to extract each entity it contains and the entity identifier corresponding to each entity. It should be noted that any phrase, short sentence or word in the text is an entity; that is, an attribute is also an entity in the text to be analyzed. Therefore, once the entities in the text to be analyzed and their entity identifiers have been matched against the category identifiers of the attribute categories, the attribute texts and attribute text fragments corresponding to each attribute category can be identified from the text to be analyzed.
Step S302, performing normalization processing on each attribute text and each attribute text fragment to obtain an attribute value text corresponding to each attribute category.
After the attribute texts and attribute text fragments corresponding to the attribute categories are identified from the text to be analyzed, they are normalized to obtain the attribute value text corresponding to each attribute category. In this step, depending on the attribute category to which the attribute text belongs, a model attribute value processing policy, a rule attribute value processing policy, or a combination of the two is adopted to normalize the attribute texts and the attribute text fragments.
Specifically, when the processing policy corresponding to the attribute category is a model attribute value processing policy, normalizing the attribute text and the attribute text fragment according to the determined model attribute value processing policy to obtain the attribute value text corresponding to each attribute category proceeds as follows. First, the attribute text fragment is converted into corresponding attribute text fragment vector data, and the attribute text vector data corresponding to the attribute text is determined within it; the attribute text fragment vector data includes vector data corresponding to the context data associated with the attribute text. In the first embodiment of the present application, the attribute text fragment vector data thus carries the vector data of the context associated with the attribute text. Then, each candidate attribute value text in a preset attribute value text set is converted into corresponding candidate attribute value text vector data, and the following are obtained: first similarity values between the attribute text fragment vector data and the candidate attribute value text vector data, second similarity values between the attribute text vector data and the candidate attribute value text vector data, and global similarity values for the associated candidate attribute value texts. A consistency analysis result for each candidate attribute value text is obtained from the first similarity values, the second similarity values and the global similarity values.
Finally, first scoring information corresponding to the first similarity value, second scoring information corresponding to the second similarity value and third scoring information corresponding to the global similarity value are obtained, and the three are combined to obtain comprehensive scoring information for the consistency analysis result of each candidate attribute value text. The comprehensive scoring information of each candidate attribute value text is compared with a preset scoring threshold to obtain a target consistency analysis result that meets the threshold, and the target attribute value text is selected from the candidate attribute value texts according to that result. The target attribute value text is taken as the attribute value text corresponding to the attribute category.
For ease of understanding, the step of normalizing the attribute text and the attribute text fragment according to the determined model attribute value processing policy to obtain the attribute value text corresponding to the attribute category is further explained below with reference to Fig. 4. Specifically, for the attribute category "severity", the input information is the attribute text fragment "TEXT (not very painful)", the attribute text "not very", and the candidate attribute value texts "mild, severe, moderate", and the like. This information is then encoded (encoder) to obtain first similarity values between the attribute text fragment vector data and the candidate attribute value text vector data, second similarity values between the attribute text vector data and the candidate attribute value text vector data, and global similarity values for the associated candidate attribute value texts, and a consistency analysis result for each candidate attribute value text is obtained from the first similarity values, the second similarity values and the global similarity values. First scoring information corresponding to the first similarity value, second scoring information corresponding to the second similarity value and third scoring information corresponding to the global similarity value are obtained, and the three are combined to obtain comprehensive scoring information for the consistency analysis result of each candidate attribute value text.
The comprehensive scoring information of the consistency analysis results corresponding to the candidate attribute value texts is then ranked, the target consistency analysis result with the highest comprehensive score is obtained, and the target attribute value text is selected from the candidate attribute value texts according to that result. Here the target attribute value text is "mild", which is taken as the attribute value text corresponding to the attribute category "severity". In this example, the attribute value contained in the attribute value text is therefore a descriptive value, namely "mild".
In the first embodiment of the present application, when the processing policy corresponding to the attribute category is a rule attribute value processing policy, normalizing the attribute text and the attribute text fragment according to the determined rule attribute value processing policy to obtain the attribute value text corresponding to each attribute category includes: matching the attribute text and the attribute text fragment against standard attribute texts and standard attribute text fragments, and obtaining from them the attribute value text that matches the attribute text and the attribute text fragment. Here the attribute value contained in the attribute value text is a numerical attribute value. In the first embodiment of the application, when the attribute value is numeric text, such as a duration or a frequency, the attribute fragment is normalized directly with a regular-expression rule (number + unit), although approximate numbers and colloquial expressions (such as "days", "days before", etc.) need to be taken into account. For other attribute values, methods such as dictionary matching and rule correction may be used as the rule attribute value processing policy. For example, when the attribute category is "duration of onset" and the attribute text and attribute text fragment are "three or four days", the number + unit rule yields the matched attribute value text "3 to 4 days".
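A number + unit rule of this kind can be sketched with a regular expression. This is a minimal illustration, not the rule set of the embodiment: the English number words, the unit list and the function names are hypothetical stand-ins, and approximate or colloquial expressions are not handled.

```python
import re

# Hypothetical number-word dictionary; a real rule set would be language specific.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
NUM = r"(\d+|" + "|".join(NUMBER_WORDS) + r")"
# number [or|to|- number] unit, e.g. "three or four days", "3-4 days", "2 weeks"
PATTERN = re.compile(
    rf"\b{NUM}(?:\s*(?:or|to|-)\s*{NUM})?\s*(day|week|month|year)s?",
    re.IGNORECASE,
)

def to_int(token):
    """Convert a digit string or a known number word to an integer."""
    return int(token) if token.isdigit() else NUMBER_WORDS[token.lower()]

def normalize_duration(text):
    """Return a normalized 'N unit' or 'N to M units' string, or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    low, high, unit = match.group(1), match.group(2), match.group(3).lower()
    if high is None:
        n = to_int(low)
        return f"{n} {unit}" + ("" if n == 1 else "s")
    return f"{to_int(low)} to {to_int(high)} {unit}s"

value = normalize_duration("three or four days")
```

Expressions that the pattern does not cover return None, which leaves room for the dictionary matching and rule correction mentioned above as fallbacks.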
Step S303, combining the attribute categories and the corresponding attribute value texts to form attribute information related to the specified field.
After the attribute categories and their corresponding attribute value texts are obtained, they are combined to form attribute information related to the specified field. Following the example above, the attribute types and their corresponding attribute value texts are combined according to a preset structure to form the attribute information, as shown in Fig. 5, which is a schematic diagram of forming attribute information related to the specified field according to the first embodiment of the present application.
After the attribute information related to the specified field is obtained, it is combined with the target entity concept to generate a specified field target text for the text to be analyzed. Specifically, a combination relationship template for the attribute information and the target entity concept is obtained first. To do so, a plurality of candidate combination relationship templates are obtained from a preset candidate combination relationship template database, each having its own category identifier; the combination category identifier of the attribute information and the target entity concept is obtained and matched against the category identifiers of the candidate combination relationship templates, and the combination relationship template for the attribute information and the target entity concept is thereby selected from the candidates. Then, the combination structure and combination sequence of the attribute information and the target entity concept are determined according to the combination relationship template; the combination structure and combination sequence may be preset, or may be set directly according to the attribute information and the target entity concept. Finally, the attribute information and the target entity concept are combined according to the combination structure and combination sequence to generate the specified field target text for the text to be analyzed.
Continuing the above example, the natural text (text to be analyzed) is "upper abdomen left side is somewhat painful for 3-4 days", the obtained target entity concept is "upper abdomen, pain", and the attribute information obtained for the entity text fragment "somewhat painful for 3-4 days" is "somewhat painful for 3-4 days". Finally, the attribute information is combined with the target entity concept to generate the specified field target text for the text to be analyzed, namely "epigastric pain for 3-4 days".
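The template-based combination can be sketched as a lookup keyed by the attribute categories present, followed by string formatting. The template table, its keys, the fallback behavior and the function name below are hypothetical; the source only states that a template is selected by matching category identifiers and that it fixes the combination structure and sequence.

```python
# Hypothetical template database: key = sorted attribute categories present.
TEMPLATES = {
    ("duration",): "{entity} for {duration}",
    ("duration", "severity"): "{severity} {entity} for {duration}",
}

def generate_target_text(entity, attribute_info):
    """Combine a target entity concept with attribute information via a template."""
    key = tuple(sorted(attribute_info))
    template = TEMPLATES.get(key)
    if template is None:
        # Assumed fallback: no matching template, return the entity concept alone.
        return entity
    return template.format(entity=entity, **attribute_info)

# Values taken from the example above: target concept plus a duration attribute.
text = generate_target_text("epigastric pain", {"duration": "3-4 days"})
```

Here the key plays the role of the combination category identifier, and the placeholder order inside each template string plays the role of the combination sequence.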
The first embodiment of the application is based on a deep learning model and uses an SPO extraction algorithm, so that, given an entity, the attribute types and attribute value texts of that entity can be extracted, which removes the dependence of existing models on two entities. The attribute values are also normalized, so that attribute type extraction, attribute value text extraction and attribute value normalization are addressed systematically.
Further, in the first embodiment of the present application, the specified field target text may also be verified and corrected using the attribute information. Specifically, the combination structure and combination sequence of the specified field target text are obtained, and the attribute information is determined from the specified field target text together with its combination structure and combination sequence. The original text from which the target entity concept was generated is obtained, initial attribute information is obtained from that original text, and the attribute information is verified against the initial attribute information. If the two do not match, the initial attribute information is combined with the target entity concept to generate the specified field target text for the text to be analyzed. Verifying and correcting the specified field target text through the attribute information in this way improves the accuracy of entity normalization for the text to be analyzed.
Second embodiment
Corresponding to the text processing method provided in the first embodiment of the present application, a second embodiment of the present application further provides a medical text processing method. Fig. 6 is a flowchart of the medical text processing method provided in the second embodiment of the present application; the method includes the following steps:
Step S601, obtaining a medical text to be analyzed, and extracting the medical text related to the medical entity concept in the specified field from the medical text to be analyzed as a medical entity text fragment.
Step S602, selecting candidate medical entity concepts reaching a preset similarity standard with the medical entity text fragment from the medical entity concept set in the specified field.
Step S603, using a preset interactive model to perform interactive analysis on the medical text to be analyzed, the medical entity text fragment and each candidate medical entity concept respectively, so as to obtain a consistency analysis result.
Step S604, comparing the obtained consistency analysis results of the candidate medical entity concepts, and selecting a target medical entity concept from the candidate medical entity concepts according to a predetermined standard.
Since the medical text processing method is similar to the text processing method provided in the first embodiment of the present application, its detailed steps can be understood by analogy with the description of the first embodiment and are not repeated here. The analogy amounts to replacing each term of the first embodiment with the corresponding term of the second embodiment. For example, the "text to be analyzed" is replaced with the "medical text to be analyzed", the "entity concept" is replaced with the "medical entity concept", and the "entity text fragment" is replaced with the "medical entity text fragment", etc.
Third embodiment
A third embodiment of the present application provides a text processing apparatus corresponding to the text processing method provided in the first embodiment of the present application. Since the apparatus embodiment is substantially similar to the first embodiment, it is described relatively briefly; for relevant details, reference may be made to the corresponding description of the first embodiment. The apparatus embodiments described below are merely illustrative.
Fig. 7 is a schematic view of a text processing apparatus according to a third embodiment of the present application. The text processing apparatus includes: an entity text fragment obtaining unit 701, configured to obtain a text to be analyzed, and extract a text related to an entity concept in a specified field from the text to be analyzed as an entity text fragment; a candidate entity concept obtaining unit 702, configured to select, from the entity concept set in the specified field, a candidate entity concept that meets a predetermined similarity criterion with the entity text fragment; a consistency analysis result obtaining unit 703, configured to perform interactive analysis on the text to be analyzed, the entity text fragment, and each candidate entity concept respectively by using a preset interactive model, so as to obtain a consistency analysis result; and a target entity concept obtaining unit 704, configured to compare the obtained consistency analysis results of the candidate entity concepts, and select a target entity concept from the candidate entity concepts according to a predetermined criterion.
The text processing apparatus further includes: an attribute information obtaining unit, configured to analyze the text to be analyzed to obtain attribute information related to the specified field; and a specified field target text generating unit, configured to combine the attribute information and the target entity concept to generate a specified field target text for the text to be analyzed.
The attribute information obtaining unit is specifically configured to identify, according to a preset attribute category related to the specified field, an attribute text and an attribute text fragment corresponding to each attribute category from the text to be analyzed; normalizing each attribute text and each attribute text fragment to obtain an attribute value text corresponding to each attribute category; and combining the attribute categories and the corresponding attribute value texts to form attribute information related to the specified field.
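The identify, normalize and combine sequence described above can be sketched as follows, using a purely rule-based processing strategy. The two attribute categories ("severity" and "duration"), the synonym table and the regular expression are hypothetical examples and are not taken from the application.

```python
import re
from typing import Dict

# Hypothetical rule table: surface wording -> normalized attribute value text.
SEVERITY_SYNONYMS = {
    "slight": "mild",
    "mild": "mild",
    "moderate": "moderate",
    "serious": "severe",
    "severe": "severe",
}

# Hypothetical rule for durations such as "3 days" or "2 weeks".
DURATION_PATTERN = re.compile(r"(\d+)\s*(day|week|month|year)s?")


def extract_attribute_info(text: str) -> Dict[str, str]:
    """Return {attribute category: normalized attribute value text}."""
    lowered = text.lower()
    info: Dict[str, str] = {}

    # Identify a severity word and map it to its normalized value.
    for word, value in SEVERITY_SYNONYMS.items():
        if re.search(rf"\b{word}\b", lowered):
            info["severity"] = value
            break

    # Identify a duration and normalize it to "<count> <singular unit>".
    match = DURATION_PATTERN.search(lowered)
    if match:
        info["duration"] = f"{match.group(1)} {match.group(2)}"

    return info


attributes = extract_attribute_info("Serious headache for 3 days")
```

A model-based strategy could replace either rule without changing the shape of the returned attribute information.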
The candidate entity concept obtaining unit 702 is specifically configured to perform vector characterization on the entity text fragment and on the entity concepts in the entity concept set of the specified field, respectively; calculate the similarity between the entity text fragment and each entity concept according to the obtained vectors; and select entity concepts meeting the preset similarity standard as candidate entity concepts.
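A minimal sketch of this vector-based candidate selection is given below. It assumes character-bigram count vectors and cosine similarity as stand-ins for the vector characterization and similarity calculation, which this passage does not specify; the threshold and the limit on the number of candidates are likewise assumed values.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy vector characterization: counts of character bigrams."""
    lowered = text.lower()
    return Counter(lowered[i:i + 2] for i in range(len(lowered) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_candidates(
    fragment: str,
    concepts: List[str],
    min_similarity: float = 0.3,   # assumed "preset similarity standard"
    top_k: int = 3,                # assumed cap on the candidate count
) -> List[str]:
    """Return concepts whose similarity to the fragment meets the standard."""
    fragment_vector = embed(fragment)
    scored = [(cosine(fragment_vector, embed(c)), c) for c in concepts]
    kept = sorted((pair for pair in scored if pair[0] >= min_similarity),
                  reverse=True)
    return [concept for _, concept in kept[:top_k]]


candidates = recall_candidates(
    "diabetes",
    ["diabetes mellitus", "diabetic nephropathy", "hypertension"],
)
```

In practice the concept vectors would be computed once and indexed, since the entity concept set is fixed while fragments vary.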
The consistency analysis result obtaining unit 703 is specifically configured to convert the text to be analyzed into corresponding text vector data to be analyzed, convert the entity text fragment into corresponding entity text fragment vector data, and convert each candidate entity concept into corresponding candidate entity concept vector data, wherein the entity text fragment vector data includes vector data corresponding to the context data associated with the entity text fragment; input the text vector data to be analyzed, the entity text fragment vector data and the candidate entity concept vector data into a preset interactive model to obtain a first similarity value between the context data associated with the entity text fragment and the plurality of candidate entity concepts, a second similarity value between the entity text fragment and the plurality of candidate entity concepts, and a global similarity value associated with the candidate entity concepts; and obtain a consistency analysis result of each candidate entity concept according to the first similarity value, the second similarity value and the global similarity value.
The target entity concept obtaining unit 704 is specifically configured to obtain first scoring information corresponding to the first similarity value, second scoring information corresponding to the second similarity value, and third scoring information corresponding to the global similarity value; combine the first scoring information, the second scoring information and the third scoring information to obtain comprehensive scoring information of each consistency analysis result; compare the comprehensive scoring information of each consistency analysis result with a preset scoring threshold value to obtain a target consistency analysis result meeting the preset scoring threshold value; and select a target entity concept from the candidate entity concepts according to the target consistency analysis result.
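The scoring and selection described above might be sketched as follows. The application does not state how the three pieces of scoring information are combined, so the weighted sum, the weights and the threshold below are assumptions, as are the field names of the result record.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ConsistencyResult:
    concept: str
    context_similarity: float    # first similarity value (context vs. concept)
    fragment_similarity: float   # second similarity value (fragment vs. concept)
    global_similarity: float     # global similarity value


def composite_score(
    result: ConsistencyResult,
    weights: Tuple[float, float, float] = (0.3, 0.5, 0.2),  # assumed weights
) -> float:
    """Combine the three scores into one comprehensive score (weighted sum)."""
    w_context, w_fragment, w_global = weights
    return (w_context * result.context_similarity
            + w_fragment * result.fragment_similarity
            + w_global * result.global_similarity)


def select_target(
    results: List[ConsistencyResult],
    threshold: float = 0.6,      # assumed preset scoring threshold
) -> Optional[str]:
    """Keep results meeting the threshold and return the best concept."""
    passing = [r for r in results if composite_score(r) >= threshold]
    if not passing:
        return None
    return max(passing, key=composite_score).concept


results = [
    ConsistencyResult("diabetes mellitus", 0.8, 0.9, 0.7),
    ConsistencyResult("diabetic nephropathy", 0.4, 0.5, 0.6),
]
target = select_target(results)
```

Returning None when no candidate meets the threshold leaves the caller free to treat the fragment as unnormalized rather than forcing a weak match.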
The specified field target text generation unit is specifically used for acquiring a combined relation template corresponding to the attribute information and the target entity concept; determining a combination structure and a combination sequence of attribute information and a target entity concept according to the combination relation template; and combining the attribute information and the target entity concept according to the combined structure and the combined sequence to generate a specified field target text aiming at the text to be analyzed. Wherein, the obtaining of the combination relationship template corresponding to the attribute information and the target entity concept comprises: obtaining a plurality of candidate combination relation templates, wherein the candidate combination relation templates are obtained through a preset candidate combination relation template database, and each candidate combination relation template has a respective category identification; acquiring attribute information and a combined category identifier of a target entity concept; and matching the combined category identification with category identifications of a plurality of candidate combined relation templates so as to obtain the combined relation template of the attribute information and the target entity concept from the candidate combined relation templates.
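A minimal sketch of the template lookup and combination is shown below. A dictionary stands in for the candidate combination relation template database; the form of the category identifier (entity type followed by sorted attribute categories) and the template strings are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical template database: category identifier -> combination template.
# Each template fixes both the combined structure and the combined sequence.
TEMPLATES: Dict[str, str] = {
    "disease": "{entity}",
    "disease|severity": "{severity} {entity}",
    "disease|duration|severity": "{severity} {entity} for {duration}",
}


def combined_category_id(entity_type: str, attribute_info: Dict[str, str]) -> str:
    """Build the combined category identifier for an entity and its attributes."""
    return "|".join([entity_type] + sorted(attribute_info))


def generate_target_text(
    entity: str,
    entity_type: str,
    attribute_info: Dict[str, str],
) -> Optional[str]:
    """Match the identifier against the templates and fill the matched one."""
    template = TEMPLATES.get(combined_category_id(entity_type, attribute_info))
    if template is None:
        return None  # no candidate template carries this category identifier
    return template.format(entity=entity, **attribute_info)


target_text = generate_target_text(
    "migraine", "disease", {"severity": "severe", "duration": "3 days"}
)
```

Keying templates by a sorted identifier makes the lookup independent of the order in which attributes were extracted.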
The text processing apparatus further includes a verification unit, configured to obtain the combined structure and the combined sequence of the specified field target text; determine attribute information according to the specified field target text and its combined structure and combined sequence; obtain the original text from which the target entity concept was generated, and obtain initial attribute information from the original text; and verify the attribute information against the initial attribute information, and if the two do not match, combine the initial attribute information and the target entity concept to generate a specified field target text for the text to be analyzed.
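One way to picture this verification is to read the attribute information back out of the target text using the same template that defines its combined structure and sequence, and then compare it with the initial attribute information. The sketch below assumes a format-string template and a regular expression derived from it; both are illustrative choices, not the method stated in the application.

```python
import re
from string import Formatter
from typing import Dict

TEMPLATE = "{severity} {entity} for {duration}"  # hypothetical template


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a format-string template into a regex with one group per slot."""
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        parts.append(re.escape(literal))
        if field is not None:
            parts.append(f"(?P<{field}>.+?)")
    return re.compile("^" + "".join(parts) + "$")


def verify_target_text(
    target_text: str,
    template: str,
    initial_attributes: Dict[str, str],
    entity: str,
) -> str:
    """Return the target text if it agrees with the initial attributes,
    otherwise regenerate it from the initial attributes."""
    match = template_to_regex(template).match(target_text)
    recovered = (
        {k: v for k, v in match.groupdict().items() if k != "entity"}
        if match else {}
    )
    if recovered == initial_attributes:
        return target_text
    return template.format(entity=entity, **initial_attributes)


initial = {"severity": "severe", "duration": "3 days"}
unchanged = verify_target_text("severe migraine for 3 days", TEMPLATE, initial, "migraine")
repaired = verify_target_text("mild migraine for 3 days", TEMPLATE, initial, "migraine")
```

The check guards against attribute values being altered or lost between extraction and generation of the target text.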
Fourth embodiment
A fourth embodiment of the present application provides a medical text processing apparatus corresponding to the medical text processing method provided in the second embodiment of the present application. Since the apparatus embodiment is substantially similar to the second embodiment, it is described relatively briefly; for relevant details, reference may be made to the corresponding description of the second embodiment. The apparatus embodiments described below are merely illustrative.
Please refer to fig. 8, which is a diagram illustrating a medical text processing apparatus according to a fourth embodiment of the present application. The medical text processing apparatus includes: a medical entity text fragment unit 801, configured to obtain a medical text to be analyzed, and extract a medical text related to a medical entity concept in a specified field from the medical text to be analyzed as a medical entity text fragment; a candidate medical entity concept unit 802, configured to select, from the set of medical entity concepts in the specified field, a candidate medical entity concept that meets a predetermined similarity standard with the medical entity text fragment; a consistency analysis result unit 803, configured to perform interactive analysis on the medical text to be analyzed, the medical entity text segment, and each candidate medical entity concept respectively using a preset interactive model, so as to obtain a consistency analysis result; a target medical entity concept unit 804, configured to compare the obtained consistency analysis result of each candidate medical entity concept, and select a target medical entity concept from the candidate medical entity concepts according to a predetermined standard.
Fifth embodiment
Corresponding to the method of the first embodiment of the present application, a fifth embodiment of the present application further provides an electronic device. Fig. 9 is a schematic view of the electronic device provided in the fifth embodiment of the present application. As shown in fig. 9, the electronic device includes: at least one processor 901, at least one communication interface 902, at least one memory 903 and at least one communication bus 904. Optionally, the communication interface 902 may be an interface of a communication module, such as an interface of a GSM module. The processor 901 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present application. The memory 903 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory. The memory 903 stores a program, and the processor 901 calls the program stored in the memory 903 to execute the method of the first embodiment of the present application.
Sixth embodiment
In correspondence with the method provided in the first embodiment and the method provided in the second embodiment of the present application, a sixth embodiment of the present application also provides a computer storage medium storing a computer program that is executed by a processor to perform the method provided in the first embodiment and the method provided in the second embodiment of the present application.
Although the present application has been described with reference to the preferred embodiments, these embodiments are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (16)

1. A method of text processing, comprising:
obtaining a text to be analyzed, and extracting a text related to an entity concept in a specified field from the text to be analyzed as an entity text fragment;
selecting candidate entity concepts reaching a preset similarity standard with the entity text fragment from the entity concept set of the specified field;
respectively carrying out interactive analysis on the text to be analyzed, the entity text fragment and each candidate entity concept by using a preset interactive model to obtain a consistency analysis result;
comparing the obtained consistency analysis results of the candidate entity concepts, and selecting a target entity concept from the candidate entity concepts according to a predetermined standard.
2. The text processing method according to claim 1, further comprising:
analyzing the text to be analyzed to obtain attribute information related to the specified field;
and combining the attribute information and the target entity concept to generate a specified field target text aiming at the text to be analyzed.
3. The text processing method according to claim 2, wherein the analyzing the text to be analyzed to obtain attribute information related to the specified field comprises:
according to preset attribute categories relevant to the specified field, identifying attribute texts and attribute text fragments corresponding to the attribute categories from the texts to be analyzed;
normalizing each attribute text and each attribute text fragment to obtain an attribute value text corresponding to each attribute category;
and combining the attribute categories and the corresponding attribute value texts to form attribute information related to the specified field.
4. The text processing method according to claim 3, wherein the step of identifying the attribute texts and attribute text segments corresponding to the respective attribute categories from the text to be analyzed according to preset attribute categories related to the specified domain employs an SPO entity attribute extraction algorithm based on a language characterization model pre-trained in the specified domain.
5. The text processing method according to claim 3, wherein the normalizing each attribute text and each attribute text fragment to obtain the attribute value text corresponding to each attribute category comprises: for different attribute categories, normalizing the attribute text and the attribute text fragments by using a model attribute value processing policy, a rule attribute value processing policy, or a combination thereof.
6. The method of claim 1, wherein the selecting, from the entity concept set of the specified field, candidate entity concepts reaching a preset similarity standard with the entity text fragment comprises:
respectively carrying out vector representation on the entity text fragment and the entity concepts in the entity concept set of the specified field;
calculating the similarity between the entity text fragment and the entity concept according to the obtained vector;
and selecting entity concepts meeting the preset similarity standard as candidate entity concepts.
7. The method according to claim 1, wherein the using a preset interactive model to perform interactive analysis on the text to be analyzed, the entity text segment and each candidate entity concept respectively to obtain a consistency analysis result comprises:
converting the text to be analyzed into corresponding text vector data to be analyzed, converting the entity text fragment into corresponding entity text fragment vector data, and converting each candidate entity concept into corresponding candidate entity concept vector data; the entity text fragment vector data comprises vector data corresponding to the context data associated with the entity text fragment;
inputting the text vector data to be analyzed, the entity text fragment vector data and the candidate entity concept vector data into a preset interactive model to obtain a first similarity value between the context data associated with the entity text fragment and the plurality of candidate entity concepts, a second similarity value between the entity text fragment and the plurality of candidate entity concepts, and a global similarity value associated with the candidate entity concepts;
and obtaining a consistency analysis result of each candidate entity concept according to the first similarity value, the second similarity value and the global similarity value.
8. The method of claim 7, wherein the comparing the consistency analysis results of the candidate entity concepts to select a target entity concept from the candidate entity concepts according to a predetermined criterion comprises:
obtaining first scoring information corresponding to the first similarity value, second scoring information corresponding to the second similarity value and third scoring information corresponding to the global similarity value;
grading and combining the first grading information, the second grading information and the third grading information to obtain comprehensive grading information of each consistency analysis result;
comparing the comprehensive grading information of each consistency analysis result with a preset grading threshold value to obtain a target consistency analysis result meeting the preset grading threshold value;
and selecting a target entity concept from the candidate entity concepts according to the target consistency analysis result.
9. The method according to claim 2, wherein the combining the attribute information with the target entity concept to generate a domain-specific target text for the text to be analyzed comprises:
acquiring a combination relation template corresponding to the attribute information and the target entity concept;
determining a combined structure and a combined sequence of the attribute information and the target entity concept according to the combined relation template;
and combining the attribute information and the target entity concept according to the combined structure and the combined sequence to generate a specified field target text aiming at the text to be analyzed.
10. The method according to claim 9, wherein the obtaining of the combined relationship template corresponding to the attribute information and the target entity concept comprises:
obtaining a plurality of candidate combination relation templates, wherein the candidate combination relation templates are obtained through a preset candidate combination relation template database, and each candidate combination relation template has a respective category identification;
acquiring attribute information and a combined category identifier of a target entity concept;
and matching the combined category identification with category identifications of a plurality of candidate combined relation templates so as to obtain the combined relation template of the attribute information and the target entity concept from the candidate combined relation templates.
11. The text processing method of claim 2, further comprising:
acquiring a combined structure and a combined sequence of the target texts in the specified field;
determining attribute information according to the specified field target text, the combined structure and the combined sequence of the specified field target text;
obtaining an original text for generating a target entity concept, and obtaining initial attribute information from the original text;
and verifying the attribute information and the initial attribute information, and if the verification result is not matched, combining the initial attribute information and the target entity concept to generate a specified field target text for the text to be analyzed.
12. A medical text processing method, comprising:
obtaining a medical text to be analyzed, and extracting the medical text related to the medical entity concept in the specified field from the medical text to be analyzed as a medical entity text fragment;
selecting candidate medical entity concepts reaching a preset similarity standard with the medical entity text fragment from the medical entity concept set in the specified field;
respectively carrying out interactive analysis on the medical text to be analyzed, the medical entity text fragment and each candidate medical entity concept by using a preset interactive model to obtain a consistency analysis result;
comparing the obtained consistency analysis results of the candidate medical entity concepts, and selecting a target medical entity concept from the candidate medical entity concepts according to a predetermined standard.
13. A text processing apparatus, comprising:
the entity text fragment obtaining unit is used for obtaining a text to be analyzed and extracting a text related to an entity concept in a specified field from the text to be analyzed as an entity text fragment;
a candidate entity concept obtaining unit, configured to select, from the entity concept set in the specified field, a candidate entity concept that meets a predetermined similarity criterion with the entity text fragment;
a consistency analysis result obtaining unit, configured to perform interactive analysis on the text to be analyzed, the entity text fragment, and each candidate entity concept respectively using a preset interactive model, so as to obtain a consistency analysis result;
and the target entity concept obtaining unit is used for comparing the obtained consistency analysis results of the candidate entity concepts and selecting the target entity concept from the candidate entity concepts according to a preset standard.
14. A medical text processing apparatus, comprising:
the medical entity text fragment unit is used for obtaining a medical text to be analyzed and extracting a medical text related to a medical entity concept in a specified field from the medical text to be analyzed as a medical entity text fragment;
the candidate medical entity concept unit is used for selecting candidate medical entity concepts reaching a preset similarity standard with the medical entity text fragment from the medical entity concept set in the specified field;
the consistency analysis result unit is used for carrying out interactive analysis on the medical text to be analyzed, the medical entity text fragments and each candidate medical entity concept by using a preset interactive model to obtain a consistency analysis result;
and the target medical entity concept unit is used for comparing the obtained consistency analysis result of each candidate medical entity concept and selecting the target medical entity concept from the candidate medical entity concepts according to a preset standard.
15. An electronic device, characterized in that the electronic device comprises: a processor; and a memory for storing a computer program, the computer program being executed by the processor to perform the method of any one of claims 1 to 12.
16. A computer storage medium, characterized in that the computer storage medium stores a computer program that is executed by a processor to perform the method of any one of claims 1 to 12.
CN202211440932.9A 2022-11-17 2022-11-17 Text processing method, medical text processing method and device and electronic equipment Pending CN115878755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211440932.9A CN115878755A (en) 2022-11-17 2022-11-17 Text processing method, medical text processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115878755A true CN115878755A (en) 2023-03-31

Family

ID=85760108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211440932.9A Pending CN115878755A (en) 2022-11-17 2022-11-17 Text processing method, medical text processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115878755A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination