CN115132372A - Term processing method, apparatus, electronic device, storage medium, and program product


Info

Publication number
CN115132372A
CN115132372A (application CN202210473830.0A)
Authority
CN
China
Prior art keywords
node
target
sample
text
term operation
Prior art date
Legal status
Pending
Application number
CN202210473830.0A
Other languages
Chinese (zh)
Inventor
张子恒
李文琪
郑冶枫
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210473830.0A
Publication of CN115132372A

Classifications

    • G16H 70/20 (G16H: healthcare informatics) - ICT specially adapted for the handling or processing of medical references relating to practices or guidelines
    • G06F 40/154 (G06F 40: handling natural language data) - Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets
    • G06F 40/30 (G06F 40: handling natural language data) - Semantic analysis
    • G16H 50/70 (G16H 50: ICT for medical diagnosis, simulation or data mining) - ICT specially adapted for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The application provides a term processing method, apparatus, electronic device, storage medium, and program product, applicable to medical technology in the field of artificial intelligence. The method includes: acquiring a text to be matched in a specific field and a plurality of first term operation trees, where each first term operation tree is constructed in advance based on a standard text of the specific field; constructing a second term operation tree corresponding to the text to be matched; determining a first node in the first term operation tree and a second node of the same type as the first node in the second term operation tree, where the first node is any node in the first term operation tree; determining the similarity between the word corresponding to the first node and the word corresponding to the second node; and, when the value of the similarity satisfies a value condition, determining the word corresponding to the first node as the standard word for the word corresponding to the second node. With this method, the standard word corresponding to the text to be matched can be determined accurately.

Description

Term processing method, apparatus, electronic device, storage medium, and program product
Technical Field
The present application relates to artificial intelligence technologies, and in particular, to a method, an apparatus, an electronic device, a storage medium, and a program product for processing terms.
Background
Artificial Intelligence (AI) is a comprehensive discipline within computer science that studies the design principles and implementation methods of intelligent machines so that machines can perceive, reason, and make decisions. It spans a wide range of fields, including natural language processing and machine learning/deep learning.
Term normalization is an important application of artificial intelligence in natural language processing. Taking the medical field as an example, most medical term normalization schemes in the related art apply machine learning or deep learning models that treat the task as an ordinary short-text matching task. They ignore the medical semantics of the task, so the determined standard words lack medical plausibility and the accuracy is low. Moreover, the related art offers no effective solution to the mismatching caused by disordered word order in the text to be matched.
Disclosure of Invention
The embodiment of the application provides a method and a device for processing a term, an electronic device, a computer-readable storage medium and a computer program product, which can accurately analyze the semantic meaning expressed by a text to be matched, so as to accurately determine a standard word corresponding to the text to be matched.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a method for processing terms, which comprises the following steps:
acquiring a text to be matched in a specific field and a plurality of first term operation trees; each first term operation tree is constructed in advance based on a standard text of the specific field, and each standard text corresponds to a standard word in a term standard table of the specific field;
constructing a second term operation tree corresponding to the text to be matched;
performing the following for each of the first term operation trees:
determining a first node in the first term operation tree and a second node of the same type as the first node in the second term operation tree;
determining the similarity of the word corresponding to the first node and the word corresponding to the second node; wherein the first node is any one node in the first term operation tree;
and under the condition that the value of the similarity meets a value condition, determining the word corresponding to the first node as a standard word of the word corresponding to the second node.
The embodiment of the application provides a term processing device, including:
the system comprises an acquisition module, a matching module and a matching module, wherein the acquisition module is used for acquiring a text to be matched in a specific field and a plurality of first term operation trees; each first term operation tree is constructed in advance based on a standard text of the specific field, and each standard text corresponds to a standard word in a term standard table of the specific field;
the construction module is used for constructing a second term operation tree corresponding to the text to be matched;
a first determining module for performing the following for each of the first term operation trees: determining a first node in the first term operation tree and a second node of the same type as the first node in the second term operation tree;
a second determining module, configured to determine similarity between a word corresponding to the first node and a word corresponding to the second node; wherein the first node is any one node in the first term operation tree;
and a third determining module, configured to determine, when the value of the similarity satisfies a value condition, a word corresponding to the first node as a standard word of a word corresponding to the second node.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the term processing method provided by the embodiment of the application when executing the executable instructions stored in the memory.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions, and the executable instructions are used for realizing the term processing method provided by the embodiment of the application when being executed by a processor.
Embodiments of the present application provide a computer program product comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device executes the processing method of the term described above in the embodiments of the present application.
The embodiment of the application has the following beneficial effects:
By constructing a term operation tree for the text to be matched, whose data structure fully captures the semantics the text expresses, the text to be matched can be accurately understood and analyzed, and the corresponding standard words can be determined on the basis of that full understanding. By computing the similarity between the word corresponding to a first node in the term operation tree of the standard text and the word corresponding to a second node of the same type in the term operation tree of the text to be matched, only words at nodes of the same type are compared each time, which effectively improves the efficiency of determining standard words. In addition, the term operation tree adapts to the disorder and randomness of text expression in different scenarios, which effectively improves the accuracy of the determined standard words.
Drawings
FIG. 1 is a schematic architecture diagram of a term processing system 100 provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a server 200 provided in an embodiment of the present application;
FIGS. 3A-3B are schematic flow diagrams of a term processing method provided in an embodiment of the present application;
FIG. 3C is a schematic flow diagram of a training method of a grid matching model provided in an embodiment of the present application;
FIGS. 3D-3G are schematic flow diagrams of a term processing method provided in an embodiment of the present application;
FIG. 4A is a schematic structural diagram of a grid matching model provided in an embodiment of the present application;
FIG. 4B is a schematic structural diagram of a second term operation tree provided in an embodiment of the present application;
FIG. 4C is a schematic structural diagram of an updated second term operation tree provided in an embodiment of the present application;
FIG. 4D is a schematic diagram of an initial spanning operation tree matrix provided in an embodiment of the present application;
FIG. 5A is a schematic diagram of a medical informatization application scenario provided in an embodiment of the present application;
FIG. 5B is a schematic diagram of a term processing method provided in an embodiment of the present application;
FIG. 5C is a schematic diagram of a second term operation tree provided in an embodiment of the present application;
FIG. 5D is a schematic diagram of an updated second term operation tree provided in an embodiment of the present application;
FIG. 5E is a schematic diagram of decoding a second term operation tree provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a construction method of a second grid relative position matrix provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a grid matching model provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a cross operation tree matrix provided in an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third" are used only to distinguish similar objects and do not denote a particular order or importance. Where permissible, "first/second/third" may be interchanged in a particular order or sequence, so that the embodiments of the present application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
It should be understood that, in the embodiments of the present application, data related to user information requires the user's informed consent or authorization when the embodiments are applied to specific products or technologies, and the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Term normalization: an indispensable task in medical statistics. In clinical practice, there are often hundreds of different ways to express the same diagnosis; the problem term normalization solves is finding the corresponding standard medical term expression for these various clinical expressions.
2) International Classification of Diseases (ICD): a system published by the World Health Organization (WHO) that classifies diseases according to certain of their characteristics, following fixed rules, and expresses them as codes. It is the basis for determining global health trends and statistics, and contains about 55,000 unique codes related to injuries, diseases, and causes of death, enabling health practitioners worldwide to exchange health information through a common language.
3) Short text matching task: a task that uses a Natural Language Processing (NLP) model to predict the semantic relevance of two short texts, generally by distance measurement in a vector space.
4) Operation tree: a data structure; a set with a hierarchical relationship composed of n (n ≥ 1) finite nodes. Each node in the operation tree has zero or more child nodes; the node without a parent node is called the root node; every node other than the root has exactly one parent node; and, apart from the root node, the child nodes can be divided into multiple disjoint subtrees.
5) Standard words: words stored in a standard table. They are standard expressions defined to describe certain things, providing a unified, normative expression for recurring things; they are issued in a specific form based on the combined results of science, technology, and practical experience, and serve as a commonly observed basis and criterion. For example, in a medical scenario, standard words may be the standard expressions of medical terms for diseases, injuries, medications, and so on, e.g., the uniform, normative expressions issued by the World Health Organization in the International Classification of Diseases.
6) Non-standard words: words not yet stored in the standard table, i.e., expressions that have not yet been unified. For example, in a medical scenario, a non-standard word may be a non-standardized expression of a medical term for a disease, injury, medication, and so on, e.g., a spoken expression used by a doctor or a patient.
Medical term standardization is an important technology in the medical informatization process and is also an important foundation for medical artificial intelligence. Medical term standardization aims to map/normalize non-standard text to be matched to standard text of a standard/specification in a medical standards body.
In the related art, medical term standardization is realized mostly based on traditional feature engineering and a way of manually constructing features. Currently, there are three main implementations of medical term standardization.
1) Parsing the input non-standard words and then performing fuzzy matching based on the parsing result. This approach makes the computational complexity of the normalization engine too high (O(N)), so it is difficult to satisfy actual business scenarios and highly concurrent scenarios.
2) Using a variety of literal features (e.g., word segmentation features, part-of-speech features, character features, context features, glossary features) to model the probability distribution from non-standard words to standard words. This method has poor extensibility and performs poorly on non-standard words whose character features cannot be extracted effectively.
3) Using the recall-then-rank approach of short text matching tasks: first, a round of recall based on the similarity (implemented as distance in a vector space) between dense semantic feature vectors of non-standard and standard words; second, adding synonyms through prior knowledge; and finally, training a discriminative model for fine-grained ranking to determine the standard words corresponding to the non-standard words. This method ignores the premise that the model must satisfy medical plausibility, and its modeling of non-standard and standard words is too simple; it can achieve a high recall rate, but it cannot accurately distinguish similar concepts, and so cannot accurately determine the standard words corresponding to the non-standard words.
In implementing the embodiments of the present application, the applicant found that most medical term normalization schemes in the related art apply machine learning or deep learning models that treat the medical term normalization task as an ordinary short-text matching task and ignore its medical semantics. As a result, the determined standard words lack medical plausibility and interpretability, and cannot be accepted by doctors or related practitioners.
In addition, the applicant found that disordered word order in the text to be matched easily causes mismatching. For example, a text to be matched such as "chronic peritoneal hemorrhage with tumor" should first undergo an appropriate, medically plausible decomposition into "chronic peritoneal hemorrhage with peritoneal tumor", and the normalization flow of the model algorithm should then proceed from that decomposed text. In the related art, however, normalization is performed directly on the original text to be matched, which easily yields inaccurate and implausible standard words. That is, the related art has no effective solution to the mismatching caused by disordered word order in the text to be matched.
The embodiment of the application provides a method and a device for processing a term, an electronic device, a storage medium and a program product, which can accurately analyze the semantic meaning expressed by a text to be matched, so as to accurately determine a standard word corresponding to the text to be matched. The following describes an exemplary application of the electronic device for performing term processing provided by the embodiments of the present application, and the electronic device for performing term processing provided by the embodiments of the present application may be implemented as various types of user terminals such as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), and may also be implemented as a server. In the following, an exemplary application will be explained when the electronic device is implemented as a server.
The term processing method provided by the embodiment of the application can be completed by a terminal or a server independently or cooperatively. Referring to fig. 1, fig. 1 is a schematic architecture diagram of a term processing system 100 provided in an embodiment of the present application, and includes a server 200 and a terminal 400. The terminal 400 is connected to the server 200 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of both.
As an example of applying the application to a medical archive intelligent database in a medical informatization scenario, the terminal 400 may provide a term calibration service. After the terminal 400 receives a medical text to be matched input by a user, it sends the text to the server 200 through the network 300. After receiving the medical text to be matched, the server 200 constructs a second term operation tree corresponding to it, acquires a plurality of first term operation trees, determines the similarity between the word corresponding to a first node in a first term operation tree and the word corresponding to a second node of the same type in the second term operation tree, and, when the value of the similarity satisfies a value condition, determines the word corresponding to the first node as the standard word for the word corresponding to the second node. The server 200 then stores the determined medical standard words in the medical archive intelligent database; when the same medical text to be matched is received later, the corresponding medical standard words can be fetched directly from the database. Finally, the server 200 returns the medical standard words to the terminal 400 through the network 300, and the terminal 400 can display the medical standard words corresponding to the medical text to be matched on its human-computer interaction interface.
In addition, the terminal 400 can also provide term query service, after the terminal 400 receives a medical text to be queried input by a user, the terminal 400 sends the medical text to be queried to the server 200 through the network 300, after the server 200 receives the medical text to be queried, a medical standard word corresponding to the medical text to be queried is firstly determined, then a query result related to the medical standard word is returned to the terminal 400, and the terminal 400 can display the medical standard word corresponding to the medical text to be queried and the query result related to the medical standard word on a human-computer interaction interface.
As an example of a unified medical-text data annotation interface in a medical informatization scenario, the terminal 400 can provide an annotation service. After the terminal 400 receives a medical text to be annotated input by a user, it sends the text to the server 200 through the network 300. After receiving the medical text to be annotated, the server 200 determines the medical standard word actually corresponding to it (i.e., the annotation result) and returns the annotation result to the terminal 400, which can display the annotation result corresponding to the medical text to be annotated on the human-computer interaction interface.
As an example of the real-time disease trend statistics applied to a medical informatization scene, the terminal 400 receives a plurality of medical texts to be matched, which are input by a developer, the terminal 400 sends the plurality of medical texts to be matched to the server 200 through the network 300, the server 200 determines medical standard words corresponding to each medical text to be matched after receiving the plurality of medical texts to be matched, and then the server 200 analyzes and counts a disease trend according to the plurality of determined medical standard words to obtain a real-time disease trend analysis result. Then, the server 200 sends the real-time disease trend analysis result to the terminal 400, and the terminal 400 displays the real-time disease trend analysis result on a human-computer interaction interface for a developer to query and analyze.
As an example of standardizing the professional terms in the educational informatization scenario, a developer inputs a professional text of a specific field to be matched (e.g., architecture) through a human-computer interaction interface of the operation terminal 400, the terminal 400 sends the professional text of the specific field to be matched to the server 200 through the network 300, the server 200 receives the professional text of the specific field to be matched, constructs a second term operation tree corresponding to the professional text of the specific field to be matched, and obtains a plurality of first term operation trees, determines similarity between a word corresponding to a first node in the first term operation tree and a word corresponding to a second node of the same type in the second term operation tree, and determines the word corresponding to the first node as a standard word of the word corresponding to the second node when the value of the similarity satisfies a value-taking condition. Then, the server 200 returns the professional standard words of the specific field to the terminal 400 through the network 300, and the terminal 400 may display the professional standard words of the specific field corresponding to the professional text of the specific field to be matched on the human-computer interaction interface.
As an example of standardizing search keywords in a network search scenario, a developer inputs a search key text to be matched through a human-computer interaction interface (such as a search engine client interface) of an operation terminal 400, the terminal 400 sends the search key text to be matched to the server 200 through the network 300, the server 200 receives the search key text to be matched, constructs a second term operation tree corresponding to the search key text to be matched, acquires a plurality of first term operation trees, determines similarity between a word corresponding to a first node in the first term operation tree and a word corresponding to a second node of the same type in the second term operation tree, and determines the word corresponding to the first node as a standard word of the word corresponding to the second node when the similarity satisfies a value-taking condition. Then, the server 200 returns the search key standard words to the terminal 400 through the network 300, and the terminal 400 may display the search key standard words corresponding to the search key text to be matched on the human-computer interaction interface.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
Next, referring to fig. 2, fig. 2 is a schematic structural diagram of a server 200 according to an embodiment of the present application, where the server 200 shown in fig. 2 includes: at least one processor 210, memory 230, at least one network interface 220. The various components in server 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among the components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in fig. 2.
The processor 210 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components; the general-purpose processor may be a microprocessor or any conventional processor.
The memory 230 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 230 optionally includes one or more storage devices physically located remotely from processor 210.
The memory 230 includes volatile or nonvolatile memory, and may include both. The nonvolatile memory may be a Read-Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 230 described in the embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 230 is capable of storing data, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below, to support various operations.
The operating system 231, which includes system programs for handling various basic system services and performing hardware related tasks, such as framework layers, core library layers, driver layers, etc., is used for implementing various basic services and for handling hardware based tasks.
A network communication module 232 for reaching other computing devices via one or more (wired or wireless) network interfaces 220; exemplary network interfaces 220 include Bluetooth, Wireless Fidelity (WiFi), and Universal Serial Bus (USB).
In some embodiments, the term processing device provided by the embodiments of the present application may be implemented in software. Fig. 2 shows the term processing device 233 stored in the memory 230, which may be software in the form of programs, plug-ins, and the like, and includes the following software modules: acquisition module 2331, construction module 2332, first determination module 2333, second determination module 2334, and third determination module 2335. These modules are logical, and can therefore be combined arbitrarily or further split according to the functions implemented.
The term processing method provided by the embodiments of the present application will be described below in conjunction with exemplary applications and implementations of electronic devices provided by the embodiments of the present application. It is to be understood that the methods described below may be performed by the terminal 400 or the server 200 described above individually or in cooperation.
Before describing the term processing method provided in the embodiments of the present application, the grid matching model (Lattice Transformer based Classifier) used for term processing in the embodiments of the present application is first described. Referring to fig. 4A, fig. 4A is a schematic structural diagram of a grid matching model provided in an embodiment of the present application.
As shown in fig. 4A, the grid matching model is a double-tower structure. Each tower comprises a coding network, a self-attention network (Self-Attention), a first residual network, a feed-forward neural network (FNN), and a second residual network, and forms a Transformer network; a classifier connected after the double-tower structure completes the grid matching model.
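The following is an illustrative sketch of such a double-tower structure in PyTorch-style Python; it is not part of the patent disclosure, and all module names, dimensions, and the mean-pooling step are assumptions made for illustration:

    import torch
    import torch.nn as nn

    class TowerEncoder(nn.Module):
        """One tower: coding network -> self-attention -> Add&Norm -> FFN -> Add&Norm."""
        def __init__(self, vocab_size, d_model=256, n_heads=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)    # coding network
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(d_model)                # first residual network (Add & Norm)
            self.ffn = nn.Sequential(                         # feed-forward neural network
                nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                nn.Linear(4 * d_model, d_model))
            self.norm2 = nn.LayerNorm(d_model)                # second residual network (Add & Norm)

        def forward(self, token_ids):
            x = self.embed(token_ids)
            attn_out, _ = self.attn(x, x, x)
            h = self.norm1(x + attn_out)
            return self.norm2(h + self.ffn(h)).mean(dim=1)    # pooled node representation

    class GridMatchingModel(nn.Module):
        def __init__(self, vocab_size, d_model=256):
            super().__init__()
            self.tower_std = TowerEncoder(vocab_size, d_model)  # tower for the standard text
            self.tower_qry = TowerEncoder(vocab_size, d_model)  # tower for the text to be matched
            self.classifier = nn.Linear(2 * d_model, 1)         # linear binary classifier

        def forward(self, ids_std, ids_qry):
            z = torch.cat([self.tower_std(ids_std), self.tower_qry(ids_qry)], dim=-1)
            return torch.sigmoid(self.classifier(z))            # predicted similarity in [0, 1]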
The training method of the grid matching model shown in fig. 4A will be explained below.
Referring to fig. 3C, fig. 3C is a schematic flowchart of a training method of a grid matching model according to an embodiment of the present application. The following describes steps 201 to 208 with reference to fig. 3C.
In step 201, a coding network is called to perform coding processing based on the sample words corresponding to the sample nodes, so as to obtain sample feature representations of the sample words corresponding to the sample nodes.
As an example, a first sample node and a second sample node are taken as sample nodes, respectively. The first sample node is any node in the first sample term operation tree, the second sample node is any node in the second sample term operation tree, and the types of the first sample node and the second sample node are the same.
As an example, the first sample term operation tree is pre-constructed based on each sample standard text corresponding to one sample standard word in the domain-specific term standard table. The second sample term operation tree is constructed based on the sample text to be matched.
For example, when the first sample node is taken as the sample node, the coding network is called to perform coding processing based on the sample word corresponding to the first sample node, obtaining the sample feature representation of that sample word. The coding processing may be embedding, which encodes the sample word into a low-dimensional feature representation.
In some embodiments, the sample standard text corresponding to the first sample term operation tree and the sample text to be matched corresponding to the second sample term operation tree form a positive sample pair or a negative sample pair. A positive sample pair indicates that the sample standard text is the standard text corresponding to the sample text to be matched. A negative sample pair indicates that the sample standard text is not the standard text corresponding to the sample text to be matched, but that, in the term standard table, its standard word belongs to the same level as the standard word of the standard text that does correspond to the sample text to be matched.
As an example, the sample standard text corresponding to the first sample term operation tree, and the sample text to be matched corresponding to the second sample term operation tree construct a positive sample pair or a negative sample pair.
When the sample standard text is the standard text corresponding to the sample text to be matched, the two form a positive sample pair. As an example, the training label corresponding to a positive sample pair is 1; that is, the true similarity between a word in the sample standard text and a word in the sample text to be matched is 1.
As an example, when the text to be matched with the sample is a medical text, a standard text corresponding to the text to be matched with the sample is determined through labeling of a professional doctor, and the determined standard text and the text to be matched with the sample form a positive sample pair.
When the sample standard text is not the standard text corresponding to the sample text to be matched, but in the term standard table its standard word and the standard word of the corresponding standard text belong to the same level, the sample standard text and the sample text to be matched form a negative sample pair. The same level may mean that the corresponding codes in the term standard table have the same number of code characters, e.g., 4.
As an example, the training label corresponding to the negative sample pair is 0, that is, the true similarity between the word in the sample standard text and the word in the sample text to be matched is 0.
For example, see table 1, where table 1 is a schematic diagram of a portion of the terminology standard table provided in the examples of the present application.
TABLE 1  Portion of a term standard table

    Code     Standard word
    B45.0    Pulmonary cryptococcosis
    B45.1    Cerebral cryptococcosis
    B45.2    Cutaneous cryptococcosis
    B45.3    Osseous cryptococcosis
Referring to table 1 above, when the sample text to be matched is "cryptococcosis in the lung", the corresponding standard text is "pulmonary cryptococcosis"; therefore, the sample text to be matched and "pulmonary cryptococcosis", coded B45.0, constitute a positive sample pair.
In the term standard table, the standard words belonging to the same level as "pulmonary cryptococcosis" (B45.0) include B45.1 "cerebral cryptococcosis", B45.2 "cutaneous cryptococcosis", B45.3 "osseous cryptococcosis", and so on. Therefore, the sample standard text corresponding to each standard word at the 4-character B45.x coding level forms a negative sample pair with the sample text to be matched.
Determining positive and negative sample pairs in this way, where negative pairs are constructed from sample standard texts whose standard words belong to the same level as the standard word of the text corresponding to the sample text to be matched, rather than from randomly selected sample standard texts, improves the quality of the negative pairs, making it easier to train a well-performing grid matching model from high-quality negatives. Training the model on both positive and negative sample pairs further improves the accuracy of the trained grid matching model.
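A minimal sketch of this pair-construction strategy, assuming ICD-style codes in which the first four characters (e.g., "B45.") identify the level; the table contents and function names are hypothetical:

    import random

    def build_sample_pairs(text_to_match, gold_code, standard_table, n_neg=4):
        """standard_table maps code -> standard text, e.g. {"B45.0": "pulmonary cryptococcosis", ...}."""
        positive = (standard_table[gold_code], text_to_match, 1.0)  # label 1: true similarity
        # Negatives: standard texts whose code shares the gold code's 4-character level prefix.
        siblings = [text for code, text in standard_table.items()
                    if code[:4] == gold_code[:4] and code != gold_code]
        negatives = [(text, text_to_match, 0.0)                     # label 0: true similarity
                     for text in random.sample(siblings, min(n_neg, len(siblings)))]
        return [positive] + negatives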
In step 202, a self-attention network is called to perform calculation processing based on the sample feature representation, and a sample self-attention weight corresponding to the sample feature representation is obtained.
As an example, after the sample feature representation is obtained, the self-attention network is called to perform calculation processing based on it, obtaining the sample self-attention weight corresponding to the sample feature representation. The self-attention network here may be a multi-head self-attention network.
In step 203, a first residual error network is called based on the sample self-attention weight to perform calculation processing, so as to obtain a first sample residual error network calculation result corresponding to the sample characteristic representation.
As an example, after the sample self-attention weight corresponding to the sample feature representation is obtained, the first residual error network is called to perform calculation processing based on the sample self-attention weight, and a first sample residual error network calculation result corresponding to the sample feature representation is obtained.
As an example, the first residual network comprises a summing module (Add) and a normalization module (Norm). When the first residual network is called to perform calculation processing based on the sample self-attention weight, the summing module first sums the sample self-attention weight and the sample feature representation, obtaining a first sample summation result; the normalization module then normalizes the first sample summation result (e.g., layer normalization), obtaining the first sample residual network calculation result corresponding to the sample feature representation.
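As a compact restatement of this Add & Norm step (a sketch, with LayerNorm standing in for the normalization module):

    import torch.nn as nn

    def add_and_norm(residual_in, sublayer_out, norm: nn.LayerNorm):
        """Sum the sub-layer output with its input, then apply layer normalization."""
        return norm(residual_in + sublayer_out)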
In step 204, based on the first sample residual error network calculation result corresponding to the sample characteristic representation, a feedforward neural network is called to perform calculation processing, so as to obtain a sample feedforward neural network calculation result corresponding to the sample characteristic representation.
As an example, after the first sample residual error network calculation result is obtained, the feedforward neural network is called to perform calculation processing based on the first sample residual error network calculation result, so as to obtain a sample feedforward neural network calculation result.
As an example, the feed-forward neural network includes a first fully connected layer and a second fully connected layer; the first fully connected layer includes an activation function (e.g., the Rectified Linear Unit, ReLU), and the second fully connected layer does not.
When the feed-forward neural network is called to perform calculation processing based on the first sample residual network calculation result, the first fully connected layer is called first: the first sample residual network calculation result is multiplied by a first weight parameter to obtain a first sample multiplication result, the first sample multiplication result is summed with a second weight parameter to obtain a second sample summation result, and the second sample summation result is taken as the first fully connected layer's sample calculation result.

Next, the second fully connected layer is called based on the first fully connected layer's sample calculation result: the maximum of the second sample summation result and a second threshold (namely 0) is determined, and this maximum is multiplied by a third weight parameter to obtain a second sample multiplication result; the second sample multiplication result is summed with a fourth weight parameter to obtain a third sample summation result (i.e., the second fully connected layer's sample calculation result), which is determined as the sample feed-forward neural network calculation result. The first and second weight parameters are parameters of the first fully connected layer; the third and fourth weight parameters are parameters of the second fully connected layer.
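Written out compactly, the two fully connected layers above compute FFN(x) = max(0, x W1 + b1) W2 + b2. A sketch, where W1/b1 stand for the first/second weight parameters and W2/b2 for the third/fourth:

    import torch

    def feed_forward(x, W1, b1, W2, b2):
        h = x @ W1 + b1              # first fully connected layer
        h = torch.clamp(h, min=0.0)  # maximum of the summation result and the second threshold (0)
        return h @ W2 + b2           # second fully connected layer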
In step 205, based on the sample feedforward neural network calculation result corresponding to the sample characteristic representation, a second residual error network is called for calculation processing, so as to obtain a second sample residual error network calculation result corresponding to the sample characteristic representation.
As an example, after the sample feedforward neural network calculation result is obtained, a second residual error network is called to perform calculation processing based on the sample feedforward neural network calculation result, so that a second sample residual error network calculation result corresponding to the sample characteristic representation is obtained.
As an example, the second residual network has the same structure as the first, also comprising a summing module and a normalization module. When the second residual network is called to perform calculation processing based on the sample feed-forward neural network calculation result, the summing module first sums the sample feed-forward neural network calculation result and the first sample residual network calculation result, obtaining a fourth sample summation result; the normalization module then normalizes the fourth sample summation result (e.g., layer normalization), obtaining the second sample residual network calculation result corresponding to the sample feature representation.
In step 206, the second sample residual error network calculation result corresponding to the sample feature representation is spliced, and a classifier is called based on the spliced result to perform classification processing, so as to obtain the prediction similarity between the sample word corresponding to the first sample node and the sample word corresponding to the second sample node.
As an example, after the second sample residual network calculation results corresponding to the sample feature representations of the first sample node and the second sample node are obtained through steps 201 to 205, the two results are concatenated, and the classifier is called based on the concatenated result to perform classification processing, obtaining the predicted similarity between the sample word corresponding to the first sample node and the sample word corresponding to the second sample node. The classifier may be a linear classifier performing binary classification.
In step 207, the predicted similarity and the corresponding true similarity are substituted into a loss function for calculation processing, so as to obtain a loss value.
As an example, after obtaining the prediction similarity between the sample word corresponding to the first sample node and the sample word corresponding to the second sample node, the prediction similarity and the true similarity are substituted into the loss function to be calculated, so as to obtain a loss value.
Under the condition that a sample standard text corresponding to the first sample term operation tree and a sample text to be matched corresponding to the second sample term operation tree form a positive sample pair, the real similarity of a sample word corresponding to the first sample node and a sample word corresponding to the second sample node is 1; under the condition that a sample standard text corresponding to the first sample term operation tree and a sample text to be matched corresponding to the second sample term operation tree form a negative sample pair, the real similarity of a sample word corresponding to the first sample node and a sample word corresponding to the second sample node is 0.
As an example, the loss function may be a binary cross-entropy loss function, calculated as follows:

    L = -(1/m) * Σ_{i=1}^{m} [ y'_i * log(y_i) + (1 - y'_i) * log(1 - y_i) ]   (Equation 1)

where m denotes the number of node pairs to be trained (a first sample node and a second sample node constitute a node pair); y'_i denotes the true similarity between the sample word corresponding to the first sample node and the sample word corresponding to the second sample node, taking the value 0 or 1: y'_i is 1 when the sample standard text corresponding to the first sample term operation tree and the sample text to be matched corresponding to the second sample term operation tree form a positive sample pair, and 0 when they form a negative sample pair; and y_i denotes the predicted similarity between the sample word corresponding to the first sample node and the sample word corresponding to the second sample node.
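A sketch of Equation 1 as code (the epsilon guard against log(0) is an added assumption):

    import math

    def binary_cross_entropy(y_true, y_pred, eps=1e-12):
        """Mean binary cross-entropy over m node pairs; y_true entries are 0 or 1."""
        m = len(y_true)
        return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                    for t, p in zip(y_true, y_pred)) / m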
In step 208, parameters of the classifier, parameters of the second residual network, parameters of the feed-forward neural network, parameters of the first residual network, parameters of the self-attention network, and parameters of the encoding network are updated in a back propagation process based on the loss values.
As an example, after the loss value is determined by the above equation 1, in the back propagation process of the lattice matching model, the parameter of the classifier, the parameter of the second residual network, the parameter of the feedforward neural network, the parameter of the first residual network, the parameter of the self-attention network, and the parameter of the encoding network are updated based on the loss value.
Steps 201 to 208 are repeated until the loss value converges to a minimum or the maximum number of training iterations is reached, at which point training ends and the trained grid matching model is obtained.
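Steps 201 to 208 amount to the standard supervised loop sketched below; the optimizer choice, learning rate, and data-loader format are assumptions made for illustration:

    import torch

    def train(model, pair_loader, epochs=10, lr=1e-4):
        opt = torch.optim.Adam(model.parameters(), lr=lr)  # updates all networks end to end (step 208)
        loss_fn = torch.nn.BCELoss()                       # Equation 1
        for _ in range(epochs):                            # or stop early once the loss converges
            for ids_std, ids_qry, label in pair_loader:    # positive/negative node pairs
                pred = model(ids_std, ids_qry).squeeze(-1) # steps 201 to 206
                loss = loss_fn(pred, label)                # step 207
                opt.zero_grad()
                loss.backward()                            # back propagation
                opt.step()
        return model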
Training the grid matching model in this way yields a model with good performance, and performing term processing on the basis of a well-performing grid matching model allows the standard words corresponding to the text to be matched to be determined accurately.
The term processing method provided by the embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 3A, fig. 3A is a schematic flow chart of a term processing method provided in an embodiment of the present application, and will be described with reference to the steps shown in fig. 3A.
In step 101, a text to be matched of a specific field and a plurality of first term operation trees are obtained.
As an example, the specific field may be a medical field, an educational field, an architectural field, and the like. Taking the medical field as an example, the obtained text to be matched in the medical field may be a spoken expression of medical terms of diseases, injuries, medicines and the like by doctors, patients and the like. Taking the building field as an example, the obtained text to be matched of the building field may be a spoken expression of some standard words.
As an example, each first term operation tree is pre-constructed based on a standard text of a specific field, and each standard text corresponds to a standard word in the term standard table of the specific field. Taking the medical field as an example, each standard text corresponds to one of the standard words in table 1 above.
In step 102, a second term operation tree corresponding to the text to be matched is constructed.
As an example, after obtaining the text to be matched, a second term operation tree corresponding to the text to be matched is constructed.
Referring to fig. 3B, fig. 3B is a schematic flow chart of a term processing method provided in the embodiment of the present application. Based on fig. 3A, step 102 shown in fig. 3B may be implemented by step 1021 and step 1022. The following will explain step 1021 and step 1022 shown in fig. 3B.
In step 1021, the text to be matched is split and encoded to obtain the components of the text to be matched.
As an example, the text to be matched comprises multiple semantic components; therefore, it first undergoes splitting processing (Decomposing) and encoding processing (Encoding) to obtain its components. The components include a modifier component (Adjective, A), a site component (Body, B), a root component (e.g., a disease root component, Disease, D), and a logical component (Connection, C).
As an example, the text to be matched first undergoes splitting processing to obtain its logical component C, i.e., the linking logical component. A table lookup may be used to determine the three types of linking logical components in the text to be matched, such as "accompanied by", "and", and "or": a correspondence table between words and components is built in advance; after the text to be matched is obtained, it is split into several words, and the component corresponding to each word is looked up in the correspondence table.
Next, the text to be matched is encoded to obtain its modifier component A, site component B, and root component D. These can be determined by table lookup, or a sequence labeling model can be called on the text to be matched to produce several component labels, each corresponding to one of the modifier, site, and root components.
It is worth mentioning that the three components may be nested or partially overlapped, and therefore, the encoding identification needs to be performed separately and in parallel.
As an example, for the text to be matched M = {m_1, m_2, m_3, ..., m_n}, where n is the length of the text M to be matched, the component extraction processing (i.e., the encoding processing) is performed in the following order:
First, taking the root component as the disease root component as an example, the disease root component D = {m_k, m_{k+1}, m_{k+2}, ..., m_{k+l}} is extracted from the text to be matched M, where l is the length of the disease root component, k is the position of the starting word of the disease root component in the text to be matched, 1 ≤ k ≤ n, and 0 ≤ l ≤ n - k; the set {m_k, m_{k+1}, m_{k+2}, ..., m_{k+l}} represents the disease root component.
Secondly, the site component B = {m_r, m_{r+1}, m_{r+2}, ..., m_{r+x}} is extracted from the text to be matched M, where x is the length of the site component, r is the position of the starting word of the site component in the text to be matched, 1 ≤ r ≤ n, and 0 ≤ x ≤ n - r; the set {m_r, m_{r+1}, m_{r+2}, ..., m_{r+x}} represents the site component.
Thirdly, the modification component A = {m_q, m_{q+1}, m_{q+2}, ..., m_{q+w}} is extracted from the text to be matched M, where w is the length of the modification component, q is the position of the starting word of the modification component in the text to be matched, 1 ≤ q ≤ n, and 0 ≤ w ≤ n - q; the set {m_q, m_{q+1}, m_{q+2}, ..., m_{q+w}} represents the modification component.
Finally, combining the obtained disease root component D, site component B, and modification component A yields l_D, l_B, and l_A together with k_D, k_B, and k_A, where l_D/l_B/l_A denote the lengths of the D/B/A components and k_D/k_B/k_A denote the positions of the starting words of the D/B/A components in the text to be matched.
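To make the splitting and encoding step concrete, the following Python sketch extracts the four component types with a simple table look-up; the word inventories and the whitespace tokenizer are illustrative placeholders, since the patent leaves the correspondence table and the sequence tagging model unspecified.

```python
# Minimal sketch of the splitting + encoding step (component extraction).
# The component inventories below are hypothetical; the patent itself uses
# a pre-built correspondence table and/or a sequence tagging model.

LOGIC_WORDS = {"with", "and", "or"}                        # logic components C
MODIFIERS   = {"diabetic", "progressive", "chronic"}       # modification components A
SITES       = {"peritoneal", "peritoneum"}                 # site components B
ROOTS       = {"hemorrhage", "bleeding", "mass", "tumor"}  # disease root components D

def extract_components(text: str) -> dict:
    """Split the text to be matched and assign each token a component label."""
    components = {"C": [], "A": [], "B": [], "D": []}
    for token in text.lower().split():
        if token in LOGIC_WORDS:
            components["C"].append(token)
        elif token in MODIFIERS:
            components["A"].append(token)
        elif token in SITES:
            components["B"].append(token)
        elif token in ROOTS:
            components["D"].append(token)
    return components

print(extract_components("diabetic progressive chronic hemorrhage with peritoneal mass"))
# {'C': ['with'], 'A': ['diabetic', 'progressive', 'chronic'],
#  'B': ['peritoneal'], 'D': ['hemorrhage', 'mass']}
```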
In step 1022, a second term operation tree corresponding to the text to be matched is constructed based on the components of the text to be matched.
As an example, after obtaining components of the text to be matched, based on various components of the text to be matched, a second term operation tree corresponding to the text to be matched is constructed.
By constructing the second term operation tree from the multiple components of the text to be matched in the above manner, the constructed tree can accurately and comprehensively express those components, and therefore accurately and comprehensively express the semantics of the text to be matched.
Referring to fig. 3B, step 1022 may also be implemented by steps 10221-10224, and steps 10221-10224 are explained below with reference to fig. 3B.
In step 10221, the logical components are determined as root nodes of the second term operation tree.
For example, referring to fig. 4B, fig. 4B is a schematic structural diagram of a second term operation tree provided in an embodiment of the present application.
And determining a logic component C in the text to be matched as a root node in the second term operation tree.
In step 10222, a root component is determined as an intermediate node of the second term operation tree.
As an example, the root component in the text to be matched is determined as an intermediate node in the second term operation tree, and referring to fig. 4B, taking the root component as the disease root component D as an example, the disease root components D1 and D2 are determined as intermediate nodes in the second term operation tree, and the intermediate nodes are child nodes of the root node.
In step 10223, the part component and the modification component are determined as leaf nodes of the second term operation tree.
As an example, referring to fig. 4B, the modified components a1, a2, and A3, and the partial component B1 in the text to be matched are determined as leaf nodes in the second term operation tree, the leaf nodes being child nodes of the intermediate nodes.
In step 10224, the root node, the middle node and the leaf node are connected according to the hierarchy to which they belong, so as to obtain a second term operation tree corresponding to the text to be matched.
As an example, after the root node, the intermediate node, and the leaf node are determined, connection is performed according to a hierarchy to which each node belongs, for example, the root node is connected to the intermediate node, and the leaf node is connected to the intermediate node, so as to obtain the second term operation tree.
Referring to fig. 4B, the root node C is connected to the intermediate node D1 and the intermediate node D2, the intermediate node D1 is connected to the leaf nodes a1, a2, and A3, respectively, and the intermediate node D2 is connected to the leaf node B1, thereby obtaining a second term operation tree.
As an example, in the case where the text to be matched is "diabetic progressive chronic hemorrhage with peritoneal mass": the logical component C is "with"; the disease root component D includes "hemorrhage" and "mass"; the modification component A corresponding to the disease root component D1 "hemorrhage" includes "diabetic", "progressive", and "chronic"; and the site component B corresponding to the disease root component D2 "mass" includes "peritoneal". Thus, according to steps 10221 to 10224 above, the second term operation tree shown on the right side of fig. 4B is constructed.
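The tree assembly of steps 10221 to 10224 can be sketched as follows; the Node class and the build_term_tree helper are hypothetical, and the grouping of leaves under their root components is assumed to come from the extraction step above.

```python
# Sketch of steps 10221-10224: assemble the second term operation tree
# from the extracted components.

class Node:
    def __init__(self, label: str, word: str):
        self.label = label          # "C", "D", "A", or "B"
        self.word = word
        self.children = []

def build_term_tree(logic: str, groups: list) -> Node:
    """logic: the logical component; groups: (root_word, leaves) pairs, where
    each leaf is a (label, word) tuple for a modification or site component."""
    root = Node("C", logic)                         # step 10221: logic component -> root node
    for root_word, leaves in groups:
        mid = Node("D", root_word)                  # step 10222: root component -> intermediate node
        for label, word in leaves:
            mid.children.append(Node(label, word))  # step 10223: modifier/site -> leaf node
        root.children.append(mid)                   # step 10224: connect by hierarchy
    return root

# "diabetic progressive chronic hemorrhage with peritoneal mass" (fig. 4B)
tree = build_term_tree("with", [
    ("hemorrhage", [("A", "diabetic"), ("A", "progressive"), ("A", "chronic")]),
    ("mass",       [("B", "peritoneal")]),
])
```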
The construction process of the first term operation tree is similar to that of the second term operation tree, and is not described herein again. It should be noted that, only the root node and the intermediate node may be included in the first term operation tree, and in a case that the logical component is not included in the standard text, the logical component corresponding to the root node in the corresponding first term operation tree is empty.
Each component of the text to be matched serves as a node in the second term operation tree, and the nodes are connected according to their hierarchy. As a result, the second term operation tree expresses not only each component of the text to be matched but also the relationships among the components, so it can completely and comprehensively express the semantics of the text to be matched and the associations among those semantics.
In some embodiments, in a case where the second term operation tree includes a first intermediate node and a second intermediate node, after the second term operation tree corresponding to the text to be matched is obtained, in a case where a plurality of first leaf nodes exist in the first intermediate node, the plurality of first leaf nodes are respectively connected to the second intermediate node, so as to obtain an updated second term operation tree; wherein the modification component or site component corresponding to the first leaf node is absent from the root component corresponding to the second intermediate node.
As an example, after the second term operation Tree is obtained, the second term operation Tree may be expanded (Tree-expanding) to obtain an updated second term operation Tree.
As an example, the updated second term operation tree is obtained by: if the second term operation tree comprises a first intermediate node and a second intermediate node, respectively connecting a plurality of first leaf nodes with the second intermediate node under the condition that the first intermediate node has a plurality of first leaf nodes to obtain an updated second term operation tree; wherein the modification component or site component corresponding to the first leaf node is absent from the root component corresponding to the second intermediate node.
Referring to fig. 4C, fig. 4C is a schematic structural diagram of an updated second term operation tree according to an embodiment of the present application.
The second term operation tree shown on the left side of fig. 4C is the same as that shown on the left side of fig. 4B. Its intermediate nodes are the disease root components D1 and D2; D1 has 3 leaf nodes, all corresponding to modification components, while D2 has no modification component. Therefore, the 3 leaf nodes can be transferred to D2, that is, each of the 3 leaf nodes is connected to node D2. As an example, only one or more of the 3 leaf nodes may be connected to node D2; the right side of fig. 4C shows the case where 2 of the leaf nodes, A1 and A2, are connected to node D2, resulting in the updated second term operation tree.
By expanding the second term operation tree, the updated tree can adapt to the varying writing conventions of doctors and related workers in real scenarios and to the frequent omissions of natural language, thereby meeting the business requirements of real scenarios to the greatest extent and further mining the comprehensive semantics of the text to be matched.
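Continuing the sketch above, the expansion step might look as follows; the type-based absence test is one plausible reading of the condition that the modification or site component is absent from the target root component.

```python
# Sketch of the expansion (Tree-expanding) step, reusing the hypothetical
# Node class and the tree built in the previous sketch.

def expand(src, dst, labels=("A", "B")):
    """Attach src's leaf nodes to dst when dst lacks that component type."""
    dst_labels = {leaf.label for leaf in dst.children}
    for leaf in src.children:
        if leaf.label in labels and leaf.label not in dst_labels:
            dst.children.append(leaf)   # the leaf is now shared by both nodes

d1, d2 = tree.children   # intermediate nodes D1 and D2 from the sketch above
expand(d1, d2)           # A1, A2, A3 also hang under D2, as in fig. 4C
```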
In some embodiments, after obtaining the second term operation tree, a Decoding process (Decoding) may be performed on the second term operation tree to obtain the text to be matched. It should be noted that the text to be matched obtained through the decoding process does not consider the order among the sibling components.
For example, decoding the second term operation tree shown on the right side of fig. 4B may yield any of "diabetic/progressive/chronic/hemorrhage/with/peritoneal/mass", "diabetic/chronic/progressive/hemorrhage/with/peritoneal/mass", or "progressive/diabetic/chronic/hemorrhage/with/peritoneal/mass". Because decoding the second term operation tree does not consider the order of same-level components, converting the text to be matched into the term operation tree structure effectively overcomes the disorder and randomness of text expression in different scenarios.
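A decoding pass over the same hypothetical Node structure can be sketched as a walk that emits each intermediate node's leaves before the node itself, with the logic component between sibling subtrees; sibling order is deliberately left unspecified.

```python
# Sketch of the decoding step, applied to the (unexpanded) tree built earlier.

def decode(root) -> str:
    parts = []
    for i, mid in enumerate(root.children):
        if i > 0 and root.word:                        # logic component between subtrees
            parts.append(root.word)
        parts.extend(leaf.word for leaf in mid.children)  # leaves before their root word
        parts.append(mid.word)
    return "/".join(parts)

print(decode(tree))
# diabetic/progressive/chronic/hemorrhage/with/peritoneal/mass
```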
In step 103, a first node in the first term operation tree and a second node of the same type as the first node in the second term operation tree are determined.
As an example, after the second term operation tree is obtained, for each first term operation tree, a first node in each first term operation tree and a second node of the same type as the first node in the second term operation tree are determined. Wherein the first node is any one node in the first term operation tree.
As an example, "the same type" here means that the components corresponding to the first node and the second node are of the same type; for example, if the component corresponding to the first node is of the root component type, then the component corresponding to the second node is also of the root component type.
In step 104, the similarity between the word corresponding to the first node and the word corresponding to the second node is determined.
As an example, after the first node and the second node are determined, the similarity of the word corresponding to the first node and the word corresponding to the second node is determined.
It should be noted that the word corresponding to the first node is a component corresponding to the first node. As shown on the right side of fig. 4B, the component corresponding to one intermediate node is the disease root component "hemorrhage", and the word corresponding to the intermediate node is "hemorrhage".
In some embodiments, the similarity of the word corresponding to the first node and the word corresponding to the second node may be determined based on the grid matching model trained above. The following description will be made with reference to the accompanying drawings.
Referring to fig. 3D, fig. 3D is a schematic flow chart of a term processing method provided in an embodiment of the present application, and in some embodiments, based on fig. 3A, step 104 shown in fig. 3D may be implemented by steps 1041 to 1046, which will be described below with reference to steps 1041 to 1046 shown in fig. 3D.
In step 1041, an encoding network is called to perform encoding processing on the word corresponding to the target node, obtaining a target feature representation of the word corresponding to the target node.
As an example, the first node and the second node are each taken as the target node, and the encoding network is called to encode the word corresponding to the target node, obtaining the target feature representation of that word. That is, the word corresponding to the target node is encoded by the encoding network into a low-dimensional feature representation.
In step 1042, a self-attention network is invoked to perform calculation processing based on the target feature representation, and a target self-attention weight corresponding to the target feature representation is obtained.
As an example, after obtaining the target feature representation of the word corresponding to the target node, the self-attention network is called to perform self-attention calculation processing based on the target feature representation, and a target self-attention weight corresponding to the target feature representation is obtained.
Referring to fig. 3E, fig. 3E is a schematic flowchart of a term processing method provided in an embodiment of the present application, and in some embodiments, based on fig. 3D, step 1042 shown in fig. 3E may be implemented by steps 10421 to 10422, which will be described below with reference to steps 10421 to 10422 shown in fig. 3E.
In step 10421, a self-attention network is called to perform calculation processing based on the target feature representation and the target grid relative position matrix corresponding to the target feature representation, so as to obtain a first self-attention weight corresponding to the target feature representation.
As an example, when determining the target self-attention weight corresponding to a target feature representation, the self-attention network is first called to perform self-attention calculation processing based on the target feature representation and the target grid relative position matrix corresponding to it, obtaining the first self-attention weight.
The target grid relative position matrix is determined based on the term operation tree corresponding to the target node. Under the condition that the target node is a first node, the target grid relative position matrix is determined based on the first term operation tree; in the case where the target node is the second node, the target grid relative position matrix is determined based on the second term operation tree.
In some embodiments, the target grid relative position matrix is constructed as follows: based on the hierarchy and path relationship of node i and node j in the target term operation tree, the path distance between node i and node j is determined, and this path distance is taken as the value of matrix element (i, j) of the target grid relative position matrix. When the target node is the first node, the target term operation tree is the first term operation tree; when the target node is the second node, the target term operation tree is the second term operation tree. Here 1 ≤ i ≤ N, 1 ≤ j ≤ N, and N is the number of nodes included in the target term operation tree.
As an example, when the target node is the first node, the target term operation tree is the first term operation tree, a path distance between the two nodes is determined based on the hierarchy of the node i and the node j in the first term operation tree and the path relationship, and the path distance is determined as a value of a matrix element (i, j) of the target grid relative position matrix. Wherein, i is more than or equal to 1 and less than or equal to N, j is more than or equal to 1 and less than or equal to N, and N is the number of nodes included in the first term operation tree.
And when the target node is a second node, determining the path distance between the two nodes based on the hierarchy of the node i and the node j in the second term operation tree and the path relation, and determining the path distance as the value of the matrix element (i, j) of the relative position matrix of the target grid. Wherein i is not less than 1 and not more than N, j is not less than 1 and not more than N, and N is the number of nodes included in the second term operation tree.
Because each matrix element of the target grid relative position matrix is the path distance between two nodes, and that path distance is determined from the hierarchy and path relationships between the nodes, the matrix accurately reflects the structure of the corresponding target term operation tree, so the information of the target term operation tree can be effectively utilized.
In some embodiments, determining the path distance between node i and node j based on their hierarchy and path relationship in the target term operation tree is implemented as follows: if node i and node j belong to the same hierarchy, a first threshold is determined as the path distance between them; if node i and node j do not belong to the same hierarchy and have no path relationship, the first threshold is likewise determined as the path distance between them; and if node i and node j do not belong to the same hierarchy but have a path relationship, the number of levels separating them is determined as the path distance between them.
As an example, in determining the path distance between the node i and the node j, if the node i and the node j belong to the same hierarchy, there is no path relationship between the node i and the node j, that is, the node i and the node j are unreachable, and thus, the first threshold is determined as the path distance between the node i and the node j. As an example, the first threshold may take Infinity (INF).
For example, referring to the second term operation tree shown on the left side of fig. 4B, both the node D1 and the node D2 belong to the hierarchy of intermediate nodes, and thus, there is no path relationship between the two nodes, and thus INF is determined as the path distance between the node D1 and the node D2.
If the node i and the node j do not belong to the same hierarchy and do not have a path relationship, the node i and the node j are not reachable, and therefore, the first threshold value is determined as the path distance between the node i and the node j.
For example, referring to the second term operation tree shown on the left side of fig. 4B, the node D1 and the node B1 belong to an intermediate node level and a leaf node level, respectively, but there is no path relationship between the node D1 and the node B1, and thus INF is determined as a path distance between the node D1 and the node B1.
If the node i and the node j do not belong to the same hierarchy and have a path relation, determining the level number of the hierarchy between the node i and the node j as the path distance between the node i and the node j.
For example, referring to the second term operation tree shown on the left side of fig. 4B, node C and node A1 belong to the root node level and the leaf node level, respectively, and the path C -> D1 -> A1 exists between them; therefore, the number of levels separating node C from node A1, namely 2, is determined as the path distance between node C and node A1.
It should be noted that, if the target grid relative position matrix is determined based on the expanded term operation tree, for the element (i, j) in the target grid relative position matrix, if the node i and the node j in the term operation tree before expansion do not have a path relationship, but the node i and the node j in the updated term operation tree after expansion do have a path relationship, then a third threshold is determined as the path distance between the node i and the node j, for example, the third threshold may be 0.5.
For example, referring to fig. 4C, in the second term operation tree before expansion shown on the left side of fig. 4C, since the node a2 and the node D2 do not have a path relationship therebetween and do not belong to the same hierarchy, the first threshold (e.g., INF) is set as the path distance between the node a2 and the node D2, and the updated second term operation tree after expansion has a path relationship between the node a2 and the node D2, the third threshold (e.g., 0.5) is set as the path distance between the node a2 and the node D2.
Because the path distance between nodes is determined from their hierarchy and path relationships, it accurately reflects the hierarchical and path relationship between every two nodes; accordingly, the target grid relative position matrix, whose matrix elements are these path distances, accurately reflects the structural information of the corresponding target term operation tree.
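The following self-contained sketch builds a grid relative position matrix under these rules; the node ids and parent map are illustrative, and the 0.5 third threshold for expansion-only paths described above is noted in a comment but not applied.

```python
# Sketch of the target grid relative position matrix: INF marks unreachable
# node pairs; reachable pairs get the number of levels between them. Edges
# added by expansion would receive the third threshold 0.5 instead of INF.

INF = float("inf")

def relative_position_matrix(nodes, parent):
    """nodes: node ids in a fixed order; parent: child -> parent tree edges."""
    def ancestor_depths(v):
        chain = {v: 0}
        d = 0
        while v in parent:
            v = parent[v]
            d += 1
            chain[v] = d
        return chain

    n = len(nodes)
    R = [[INF] * n for _ in range(n)]
    for i, u in enumerate(nodes):
        R[i][i] = 0
        chain = ancestor_depths(u)
        for j, v in enumerate(nodes):
            if v in chain:                    # v is u itself or an ancestor of u
                R[i][j] = R[j][i] = chain[v]  # path distance = levels apart
    return R

# Tree from fig. 4B: C -> D1, D2; D1 -> A1, A2, A3; D2 -> B1.
nodes = ["C", "D1", "D2", "A1", "A2", "A3", "B1"]
parent = {"D1": "C", "D2": "C", "A1": "D1", "A2": "D1", "A3": "D1", "B1": "D2"}
R = relative_position_matrix(nodes, parent)
print(R[nodes.index("C")][nodes.index("A1")])   # 2: path C -> D1 -> A1
print(R[nodes.index("D1")][nodes.index("D2")])  # inf: same level, no path
```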
Referring to fig. 3E, step 10421 may be implemented by steps 104211 to 104212, which are described below with reference to fig. 3E.
In step 104211, a multi-head self-attention network is called to perform self-attention calculation processing based on the target feature representation and the target grid relative position matrix corresponding to the target feature representation, and a self-attention weight corresponding to each head is obtained.
As an example, the self-attention network in the embodiment of the present application is a multi-head self-attention network. When calculating the target self-attention weight corresponding to the target feature representation, firstly, calling a multi-head self-attention network to perform self-attention calculation processing based on the target feature representation and a target grid relative position matrix corresponding to the target feature representation, and obtaining the self-attention weight corresponding to each head.
As an example, the self-attention weight for each head is calculated as follows:
A_{ij} = (E_i)^T · W_q · W_{k,E} · E_j + (E_i)^T · W_q · W_{k,R} · R_{i-j} + u^T · W_{k,E} · E_j + v^T · W_{k,R} · R_{i-j}   (Equation 2)
where (E_i)^T denotes the transposed vector of the target feature representation, W_q, W_{k,E}, W_{k,R}, u, and v each denote a learnable weight parameter, E_j denotes the feature representation of the word corresponding to node j, R_{i-j} denotes the value of matrix element (i, j) in the target grid relative position matrix, and node i and node j are nodes in the same term operation tree.
In step 104212, the self-attention weights corresponding to the heads are spliced (concatenated), the resulting splicing result is subjected to linear transformation processing, and the linear transformation result is determined as the first self-attention weight corresponding to the target feature representation.
As an example, after the self-attention weight corresponding to each head is obtained, the per-head weights are spliced to obtain a splicing result; the splicing result is then linearly transformed, and the obtained linear transformation result is determined as the first self-attention weight corresponding to the target feature representation.
As an example, the first self-attention weight is calculated as follows:
Att1(A, V) = softmax(A) · V   (Equation 3)
where Att1(A, V) denotes the first self-attention weight, A denotes the splicing of the per-head self-attention weights A_{ij}, softmax is the activation function used to transform the splicing result, and V is the weight parameter of the multi-head self-attention network, calculated as:
V = E · W_v   (Equation 4)
where E denotes the target feature representation and W_v denotes a learnable weight parameter.
By determining the first self-attention weight corresponding to the target feature representation in the above manner, the determined first self-attention weight can more accurately measure the importance degree of the target feature representation, and meanwhile, the target self-attention weight can be accurately determined based on the accurate first self-attention weight.
In step 10422, based on the first self-attention weight and the initial cross operation tree matrix, a self-attention network is called to perform calculation processing, and a target self-attention weight corresponding to the target feature representation is obtained.
As an example, after the first self-attention weight is determined, based on the first self-attention weight and the initial cross operation tree matrix, the self-attention network is called to perform self-attention calculation processing, so as to obtain a target self-attention weight corresponding to the target feature representation.
The cross operation tree matrix comprises a plurality of matrix elements in one-to-one correspondence with a plurality of node pairs. Each matrix element represents the similarity of the words corresponding to the two nodes of its node pair; the two nodes are of the same type, one from the first term operation tree and the other from the second term operation tree.
Referring to fig. 4D, fig. 4D is a schematic diagram of an initial cross operation tree matrix provided in an embodiment of the present application. The row coordinates of the initial cross operation tree matrix are the nodes of the first term operation tree, the column coordinates are the nodes of the second term operation tree, and each matrix element represents the similarity between the words corresponding to the two nodes. An element of 0.0 denotes an initialization value, updated as the model computes; NA denotes a value artificially fixed to 0.0 and never updated, which ensures that similarity is computed only between words corresponding to nodes of the same type across the two term operation trees. For example, similarity is computed between the C components of term operation tree X (i.e., the first term operation tree) and term operation tree Y (i.e., the second term operation tree), and between the corresponding D components, such as XD1/YD1, XD1/YD2, XD2/YD1, and XD2/YD2.
As an example, the target self-attention weight is calculated as follows:
Att(A, V) = softmax(Att1 + C · W_c) · V   (Equation 5)
where Att(A, V) denotes the target self-attention weight, softmax denotes the activation function, Att1 denotes the first self-attention weight, C denotes the initial cross operation tree matrix, W_c denotes the weight parameter corresponding to the initial cross operation tree matrix, and V denotes the weight parameter of the self-attention network.
The target self-attention weight corresponding to the target feature representation is determined from both the target grid relative position matrix and the initial cross operation tree matrix. The former reflects the hierarchy and connection relationships of the target term operation tree, while the latter reflects the spatial similarity between the two term operation trees; therefore, the target self-attention weight calculated in this way measures the importance of the target feature representation more accurately.
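The attention computation of equations 2 to 5 can be sketched in numpy as follows. Several details are assumptions not fixed by the patent: a single head is used, the relative position entries are embedded as vectors, the weight W_c is reduced to a scalar stand-in w_c, all projection matrices are random placeholders for learned parameters, and equation 5 is read as folding the cross operation tree matrix into the attention logits.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                        # number of nodes, feature dimension
E = rng.normal(size=(n, d))        # target feature representations (encoder output)
R = rng.normal(size=(n, n, d))     # embedded grid relative position entries R_{i-j}
Wq, WkE, WkR, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
u, v = rng.normal(size=d), rng.normal(size=d)
C = np.zeros((n, n))               # initial cross operation tree matrix (all 0.0)
w_c = 0.1                          # scalar stand-in for the matrix weight W_c

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

# Equation 2: relative-position self-attention score for one head.
A = np.empty((n, n))
for i in range(n):
    for j in range(n):
        A[i, j] = (E[i] @ Wq @ (WkE @ E[j])        # content-content term
                   + E[i] @ Wq @ (WkR @ R[i, j])   # content-position term
                   + u @ (WkE @ E[j])              # global content bias
                   + v @ (WkR @ R[i, j]))          # global position bias

V = E @ Wv                          # equation 4: value projection
att1 = softmax(A) @ V               # equation 3: first self-attention weight
att = softmax(A + w_c * C) @ V      # equation 5 (one reading): add cross-tree term
```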
In step 1043, a first residual error network is called based on the target self-attention weight to perform calculation processing, and a first target residual error network calculation result corresponding to the target feature representation is obtained.
As an example, after the target self-attention weight is obtained, a first residual error network is called to perform calculation processing based on the target self-attention weight, and a first target residual error network calculation result corresponding to the target feature representation is obtained.
Referring to fig. 3F, fig. 3F is a schematic flowchart of a term processing method provided in an embodiment of the present application, and in some embodiments, based on fig. 3D, step 1043 shown in fig. 3F may also be implemented by step 10431 and step 10432, which will be described below with reference to step 10431-step 10432 shown in fig. 3F.
In step 10431, based on the target self-attention weight and the target feature representation, a first residual error network is called to perform summation processing, so as to obtain a first target summation result.
As an example, when determining the first target residual error network calculation result, first, based on the target self-attention weight and the target feature representation, the first residual error network is invoked to perform summation processing, so as to obtain a first target summation result.
As an example, the first residual error network includes a summing module and a normalizing module, and therefore, here, the summing module sums the target self-attention weight and the target feature representation to obtain a first target summing result.
In step 10432, the first target summation result is normalized to obtain a first target residual error network calculation result corresponding to the target feature representation.
As an example, after obtaining the first target summation result, the normalization module performs normalization processing (e.g., layer normalization processing) on the first target summation result to obtain a first target residual network calculation result corresponding to the target feature representation.
By the method, the accuracy of the determined first target residual error network calculation result can be improved, and subsequent calculation can be performed based on the accurate first target residual error network calculation result.
In step 1044, based on the first target residual error network calculation result corresponding to the target feature representation, the feedforward neural network is invoked for calculation processing, so as to obtain a target feedforward neural network calculation result corresponding to the target feature representation.
As an example, after the first target residual network calculation result is obtained, the feedforward neural network is called to perform calculation processing based on that result, obtaining the target feedforward neural network calculation result corresponding to the target feature representation.
In some embodiments, the feedforward neural network includes a first fully-connected layer and a second fully-connected layer, and the target feedforward neural network calculation result corresponding to the target feature representation is obtained as follows: the first fully-connected layer is called to perform calculation processing based on the first target residual network calculation result, obtaining a first fully-connected layer target calculation result; the second fully-connected layer is then called to perform calculation processing based on that result, and the obtained second fully-connected layer target calculation result is determined as the target feedforward neural network calculation result corresponding to the target feature representation.
As an example, when determining the target feedforward neural network calculation result, the first fully-connected layer is first called to perform calculation processing based on the first target residual network calculation result: that result is multiplied by the first weight parameter to obtain a first target multiplication result, which is summed with the second weight parameter to obtain a third target summation result, taken as the first fully-connected layer target calculation result.
Secondly, the second fully-connected layer is called to perform calculation processing based on the first fully-connected layer target calculation result: the maximum of the third target summation result and a second threshold (namely 0) is determined and multiplied by the third weight parameter to obtain a second target multiplication result; this is summed with the fourth weight parameter to obtain a fourth target summation result (i.e., the second fully-connected layer target calculation result), which is determined as the target feedforward neural network calculation result. The first weight parameter and the second weight parameter are parameters of the first fully-connected layer; the third weight parameter and the fourth weight parameter are parameters of the second fully-connected layer.
As an example, the target feedforward neural network calculation result is calculated as follows:
FFN(x) = max(0, x · W1 + b1) · W2 + b2   (Equation 6)
where x denotes the first target residual network calculation result, W1 (i.e., the first weight parameter above) and b1 (i.e., the second weight parameter above) are parameters of the first fully-connected layer, and W2 (i.e., the third weight parameter above) and b2 (i.e., the fourth weight parameter above) are parameters of the second fully-connected layer.
By means of the method, the accuracy of the determined target feedforward neural network calculation result can be improved, and subsequent calculation can be conveniently carried out based on the accurate target feedforward neural network calculation result.
In step 1045, based on the target feedforward neural network calculation result corresponding to the target feature representation, a second residual error network is called for calculation processing to obtain a second target residual error network calculation result corresponding to the target feature representation.
As an example, after the target feedforward neural network calculation result is obtained, a second residual error network is called to perform calculation processing based on the target feedforward neural network calculation result, so that a second target residual error network calculation result corresponding to the target feature representation is obtained.
Referring to fig. 3G, fig. 3G is a schematic flowchart of a term processing method provided in an embodiment of the present application, and in some embodiments, based on fig. 3D, step 1045 shown in fig. 3G may also be implemented by step 10451 and step 10452, which will be described below with reference to step 10451-step 10452 shown in fig. 3G.
In step 10451, based on the target feedforward neural network calculation result and the first target residual error network calculation result, a second residual error network is called to perform summation processing, so as to obtain a second target summation result.
As an example, when determining the second target residual error network calculation result, first, based on the target feedforward neural network calculation result and the first target residual error network calculation result, the second residual error network is called to perform summation processing, so as to obtain a second target summation result.
Similar to the first residual error network, the second residual error network also includes a summation module and a normalization module, so that the summation module sums the calculation result of the target feedforward neural network and the calculation result of the first target residual error network to obtain a second target summation result.
In step 10452, the second target summation result is normalized to obtain a second target residual error network calculation result corresponding to the target feature representation.
As an example, after obtaining the second target summation result, the normalization module performs normalization processing (e.g., layer normalization processing) on the second target summation result to obtain a second target residual network calculation result corresponding to the target feature representation.
By the method, the accuracy of the determined second target residual error network calculation result can be improved, and subsequent calculation can be performed based on the accurate second target residual error network calculation result.
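The residual and feedforward stages can be sketched as follows; layer normalization is assumed for both normalization modules, and the inputs are random placeholders standing in for the target self-attention output and the target feature representation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 8
E = rng.normal(size=(n, d))      # target feature representations
att = rng.normal(size=(n, d))    # target self-attention output (previous step)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

W1, b1 = rng.normal(size=(d, 4 * d)) * 0.1, np.zeros(4 * d)
W2, b2 = rng.normal(size=(4 * d, d)) * 0.1, np.zeros(d)

res1 = layer_norm(att + E)                     # first residual network: sum, then normalize
ffn = np.maximum(0, res1 @ W1 + b1) @ W2 + b2  # equation 6: FFN(x) = max(0, xW1 + b1)W2 + b2
res2 = layer_norm(ffn + res1)                  # second residual network: sum, then normalize
```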
In step 1046, the second target residual network calculation results corresponding to the target feature representations are spliced, and a classifier is called based on the spliced result to perform classification processing, so as to obtain the similarity between the word corresponding to the first node and the word corresponding to the second node.
As an example, after steps 1041 to 1045 above yield the second target residual network calculation result for the target feature representation of the first node and that of the second node, the two results are spliced to obtain a splicing result, and a classifier is called to classify the splicing result, obtaining the similarity between the word corresponding to the first node and the word corresponding to the second node.
As an example, the classifier outputs only two values, 0 or 1. When the output is 1, the similarity between the word corresponding to the first node and the word corresponding to the second node is 1, that is, the two words match; when the output is 0, the similarity is 0, that is, the two words do not match.
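A minimal sketch of the splicing and classification step, with a hypothetical linear classifier thresholded at zero to produce the binary output:

```python
import numpy as np

# Sketch of step 1046: splice (concatenate) the two second target residual
# network results and apply a binary linear classifier. The weights here are
# random stand-ins for the trained classifier parameters.

rng = np.random.default_rng(2)
d = 8
h_first = rng.normal(size=d)    # second target residual result for the first node
h_second = rng.normal(size=d)   # second target residual result for the second node

w, b = rng.normal(size=2 * d), 0.0
logit = np.concatenate([h_first, h_second]) @ w + b
similarity = int(logit > 0)     # 1: the words match; 0: they do not
print(similarity)
```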
The similarity between the word corresponding to the first node and the word corresponding to the second node is determined by the trained grid matching model, which effectively utilizes the association relationships among the different components of the text to be matched. Compared with directly matching the raw text, this allows the semantics of the text to be matched to be analyzed and understood more effectively, so the standard words corresponding to the text to be matched can be determined more accurately and reasonably.
In some embodiments, the similarity between the word corresponding to the first node and the word corresponding to the second node may also be determined by cosine similarity and literal edit distance.
As an example, the word corresponding to the first node and the word corresponding to the second node are each encoded to obtain their word vectors, and the similarity between the two words is determined from the cosine distance between the two word vectors: the greater the cosine similarity between the two word vectors (i.e., the smaller the cosine distance), the higher the corresponding similarity.
As an example, the literal edit distance represents the minimum number of operations needed to convert one word into another, where the operations include insertion, replacement, and deletion. For example, if the word corresponding to the first node is "lung" and the word corresponding to the second node is "lungs", their literal edit distance is 1; that is, one insertion applied to "lung" yields "lungs". The similarity between the two words can therefore be measured by the literal edit distance: the smaller the edit distance, the higher the corresponding similarity.
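Both alternative measures are straightforward to sketch; the word vectors and the example word pair below are illustrative.

```python
import numpy as np

# Cosine similarity between word vectors, and the literal (Levenshtein) edit
# distance with insert/replace/delete operations.

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def edit_distance(s: str, t: str) -> int:
    """Minimum number of insert/replace/delete operations turning s into t."""
    dp = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        prev, dp[0] = dp[0], i
        for j, ct in enumerate(t, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,            # delete
                                     dp[j - 1] + 1,        # insert
                                     prev + (cs != ct))    # replace (or keep)
    return dp[-1]

va, vb = np.array([0.2, 0.9]), np.array([0.25, 0.85])
print(round(cosine_similarity(va, vb), 3))  # close to 1.0: highly similar
print(edit_distance("lung", "lungs"))       # 1: a single insertion
```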
In some embodiments, after obtaining the similarity between the word corresponding to the first node and the word corresponding to the second node, updating the matrix element at the first position in the initial cross operation tree matrix to be the similarity; the row coordinate of the first position is a first node, and the column coordinate of the first position is a second node.
As an example, after determining a similarity between a word corresponding to a first node and a word corresponding to a second node, updating a matrix element at a first position in the initial spanning operation tree matrix to the similarity, where a row coordinate of the first position is the first node and a column coordinate of the first position is the second node.
For example, referring to fig. 4D, if the first node is XD1 and the second node is YD1, if the similarity between the word corresponding to the determined XD1 and the word corresponding to YD1 is 0, the value of (XD1, YD1) in the initial cross operation tree matrix is updated to 0.
It should be noted that the updated cross operation tree matrix is used to calculate the similarity of the words corresponding to the next node pair; after that similarity is calculated, the matrix is updated again accordingly, and this process repeats until the matrix is updated for the last time based on the similarity of the last node pair, yielding the final cross operation tree matrix. The final cross operation tree matrix intuitively reflects the spatial similarity between the first term operation tree and the second term operation tree, so developers can directly judge which components of the two trees were successfully matched, giving the matching result better interpretability.
The matrix elements of the initial cross-operation tree matrix are updated based on the similarity of the words corresponding to the first node and the words corresponding to the second node, so that the spatial similarity between the first term operation tree and the second term operation tree can be visually determined based on the updated cross-operation tree matrix.
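The matrix update itself reduces to overwriting one element per computed node pair, as the following sketch with illustrative node ids shows:

```python
# Sketch of the cross operation tree matrix update: after each node-pair
# similarity is computed, the element at (first node, second node) is
# overwritten with it. Node ids follow fig. 4D and are illustrative.

rows = ["XC", "XD1", "XD2"]                 # nodes of the first term operation tree
cols = ["YC", "YD1", "YD2"]                 # nodes of the second term operation tree
cross = [[0.0] * len(cols) for _ in rows]   # initial cross operation tree matrix

def update_cross(first: str, second: str, similarity: float) -> None:
    cross[rows.index(first)][cols.index(second)] = similarity

update_cross("XD1", "YD1", 0.0)             # the worked example from fig. 4D
```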
In step 105, under the condition that the value of the similarity satisfies the value condition, determining the word corresponding to the first node as the standard word of the word corresponding to the second node.
As an example, after determining the similarity between the word corresponding to the first node and the word corresponding to the second node, if it is determined that the similarity satisfies the value-taking condition, for example, the value-taking condition may be that the similarity takes a value of 1, that is, if the similarity takes a value of 1, the word corresponding to the first node is determined as the standard word of the word corresponding to the second node.
In the embodiments of the present application, constructing the term operation tree corresponding to the text to be matched allows its data structure to fully extract the semantics expressed by the text, so the text to be matched can be accurately understood and analyzed, and the corresponding standard words can be determined on the basis of that full understanding. Because the similarity is calculated each time only between the words corresponding to same-type nodes of the two term operation trees, the efficiency of determining standard words is effectively improved. In addition, the term operation tree adapts to the disorder and randomness of text expression in different scenarios, so the accuracy of the determined standard words is also effectively improved.
In the following, an exemplary application of the embodiment of the present application in an actual term processing application scenario will be described by taking a specific field as a medical field and a text to be matched as a medical text as an example.
Medical term standardization is an important foundation for medical artificial intelligence and plays an important role in many scenarios. For example, in the health-record data middle platform scenario, medical term standardization can help hospitals greatly reduce the workload of medical record coders and quickly, inexpensively build a data middle platform for information storage and query. For another example, medical term standardization can also standardize and link the data of hospitals of different levels and in different regions, thereby helping to build an intelligent medical system with a wide application range.
By way of example, referring to fig. 5A, fig. 5A is a schematic view of a medical informatization application scenario provided by an embodiment of the present application. The clinical term standardization engine can be used for intelligent auxiliary underwriting, linking the data of all parties and providing a unified, labeled diagnosis-data interface. The display interface shown in fig. 5A shows that the subclasses of gastric cancer include pyloric sinus cancer and virus-related gastric cancer; the parent classes of gastric cancer include cancer and primary malignant tumor of the stomach; the morphological change of gastric cancer may be cancer; the site of occurrence of gastric cancer is the stomach; the code of gastric cancer is C16.9; and a series of other terms related to gastric cancer.
The construction process of the second term operation tree is explained below. Referring to fig. 5B, fig. 5B is a schematic diagram of a term processing method provided in an embodiment of the present application.
In step 501, a text to be matched is obtained.
As an example, in response to a to-be-matched text obtaining operation, a to-be-matched text is received. The text to be matched here is medical text.
In step 502, a parsing process is performed to obtain word segmentation markers and logic components.
As an example, after the text to be matched is obtained, it is parsed to identify word-cutting marks such as the pause mark, comma, and semicolon, and to identify logical components C such as "with", "and", and "or". For example, the logical component C may be identified by a table look-up method.
In step 503, an encoding process is performed to obtain A, B, D components included in the text to be matched.
As an example, a text to be matched is encoded, and a modification component a, a part component B and a disease root component D included in the text to be matched are obtained. For example, the three components may be identified by using a table lookup method, or a sequence tagging model may be called based on a text to be matched to perform tagging processing, so as to obtain a plurality of component tags corresponding to the text to be matched, where each component tag corresponds to one of a modification component, a part component, and a root component.
It is worth mentioning that the three components may be nested or partially overlapped, and therefore, the encoding identification needs to be performed separately and in parallel.
As an example, for the text to be matched M = {m_1, m_2, m_3, ..., m_n}, where n is the length of the text M to be matched, component extraction processing (i.e., encoding processing) is performed in the following order:
First, taking the root component as the disease root component as an example, the disease root component D = {m_k, m_{k+1}, m_{k+2}, ..., m_{k+l}} is extracted from the text to be matched M, where l is the length of the disease root component, k is the position of the starting word of the disease root component in the text to be matched, 1 ≤ k ≤ n, and 0 ≤ l ≤ n - k; the set {m_k, m_{k+1}, m_{k+2}, ..., m_{k+l}} represents the disease root component.
Secondly, the site component B = {m_r, m_{r+1}, m_{r+2}, ..., m_{r+x}} is extracted from the text to be matched M, where x is the length of the site component, r is the position of the starting word of the site component in the text to be matched, 1 ≤ r ≤ n, and 0 ≤ x ≤ n - r; the set {m_r, m_{r+1}, m_{r+2}, ..., m_{r+x}} represents the site component.
Thirdly, the modification component A = {m_q, m_{q+1}, m_{q+2}, ..., m_{q+w}} is extracted from the text to be matched M, where w is the length of the modification component, q is the position of the starting word of the modification component in the text to be matched, 1 ≤ q ≤ n, and 0 ≤ w ≤ n - q; the set {m_q, m_{q+1}, m_{q+2}, ..., m_{q+w}} represents the modification component.
Finally, combining the obtained disease root component D, site component B, and modification component A yields l_D, l_B, and l_A together with k_D, k_B, and k_A, where l_D/l_B/l_A denote the lengths of the D/B/A components and k_D/k_B/k_A denote the positions of the starting words of the D/B/A components in the text to be matched.
In step 504, a second term operation tree is constructed from the logical components, A, B, D components.
As an example, after identifying the logic component C, the modification component a, the part component B, and the disease root component D of the text to be matched, the second term operation tree is constructed according to these four components.
Referring to fig. 5C, fig. 5C is a schematic structural diagram of a second term operation tree provided in the embodiment of the present application.
As an example, as shown in the left side of fig. 5C, the logical component C is taken as a root node of the second term operation tree, the disease root components D1 and D2 are taken as middle nodes of the second term operation tree, the modification components a1, a2 and A3, and the part component B1 are taken as leaf nodes of the second term operation tree, and the nodes are connected according to the hierarchy to which the nodes belong, thereby obtaining the second term operation tree as shown in the left side of fig. 5C.
As an example, in the case where the text to be matched is "diabetic progressive chronic hemorrhage with peritoneal mass": the logical component C is "with"; the disease root component D includes "hemorrhage" and "mass"; the modification component A corresponding to the disease root component D1 "hemorrhage" includes "diabetic", "progressive", and "chronic"; and the site component B corresponding to the disease root component D2 "mass" includes "peritoneal". Thus, the second term operation tree is constructed as shown on the right side of fig. 5C.
In step 505, a second term operation tree is obtained.
As an example, after the construction of the second term operation tree from the various components is completed, the second term operation tree is obtained.
It should be noted that the procedure for constructing the corresponding first term operation tree based on the medical standard text is the same as that in steps 501 to 505, and is not repeated herein.
In some embodiments, after obtaining the second term operation tree, the second term operation tree may be further expanded to obtain an updated second term operation tree.
As an example, the updated second term operation tree is obtained by: if the second term operation tree comprises a first middle node and a second middle node, respectively connecting a plurality of first leaf nodes with the second middle node under the condition that the first middle node has a plurality of first leaf nodes to obtain an updated second term operation tree; wherein the modification component or site component corresponding to the first leaf node is not present in the root component corresponding to the second intermediate node.
Referring to fig. 5D, fig. 5D is a schematic structural diagram of an updated second term operation tree according to an embodiment of the present application.
The intermediate nodes of the second term operation tree shown on the left side of fig. 5D are the disease root components D1 and D2. D1 has 3 leaf nodes, all corresponding to modification components, while D2 has no modification component; therefore, the 3 leaf nodes can be transferred to D2, that is, each of the 3 leaf nodes is connected to node D2. As an example, only one or more of the 3 leaf nodes may be connected to node D2; the right side of fig. 5D shows the updated second term operation tree resulting from connecting 2 of the leaf nodes, A1 and A2, to node D2.
As shown in the second term operation tree on the right side of fig. 5C, the disease root component "hemorrhage" corresponds to the 3 modification components "diabetic", "progressive", and "chronic", while the other disease root component "mass" has no corresponding modification component. Therefore, one or more of the 3 modification components can be transferred to "mass"; for example, transferring only "chronic" yields "diabetic/progressive/chronic/hemorrhage/with/chronic/peritoneal/mass".
Because doctors and related workers follow different writing conventions in real scenarios, and natural language itself contains a large number of omissions, the updated second term operation tree obtained by expanding the second term operation tree is better suited to real business scenarios.
Through the second term operation tree construction process of steps 501 to 505, a medical term in text form (such as a diagnosis text) can be structured at a finer granularity into a strictly defined tree structure in which each node carries a distinct medical meaning; the flexibility and lateral extensibility of the tree structure also ensure that the business requirements of real scenarios are met to the greatest extent.
In step 506, the second term operation tree is decoded to obtain a text to be matched.
As an example, after obtaining the second term operation tree, the second term operation tree may be decoded to obtain the text to be matched.
Referring to fig. 5E, fig. 5E is a schematic diagram of decoding a second term operation tree according to an embodiment of the present application.
As an example, decoding the second term operation tree shown on the right side of fig. 5E may yield any of "diabetic/progressive/chronic/hemorrhage/with/peritoneal/mass", "diabetic/chronic/progressive/hemorrhage/with/peritoneal/mass", or "progressive/diabetic/chronic/hemorrhage/with/peritoneal/mass". Because decoding the second term operation tree does not consider the order of same-level components, converting the text to be matched into the term operation tree structure effectively overcomes the disorder and randomness of text expression in different scenarios.
Extensive prior work has shown the grid (lattice) to be a structure that effectively utilizes word information and avoids the error propagation of word segmentation. Therefore, in some embodiments, after the second term operation tree is obtained, a corresponding second grid relative position matrix can be constructed based on it. A method of constructing the second grid relative position matrix is described below.
Referring to fig. 6, fig. 6 is a schematic diagram of a construction method of a second grid relative position matrix provided in the embodiment of the present application.
The second term operation tree shown on the left side of fig. 6 is converted into the second grid relative position matrix structure on the right side of fig. 6, in which the hierarchical structure information of the second term operation tree is effectively preserved. For example, the value of cell C/D1 is 1, indicating that the distance from node C to node D1 is 1, and the value of cell D1/D2 is INF, indicating that no path exists from node D1 to node D2. Note that the diagonal cells of the second grid relative position matrix are all 0.
To adapt to the updated second term operation tree obtained through the expansion process, the black cells in the second grid relative position matrix correspond to paths that become available through expansion; the values of these cells are therefore adjusted from INF to 0.5. This completes the construction of the second grid relative position matrix.
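The following sketch assembles a grid relative position matrix under the rules just described (0 on the diagonal, the level count for ancestor/descendant pairs, INF where no path exists), reusing the hypothetical Node structure above. Which INF cells are lowered to 0.5 depends on the concrete expansion, so that adjustment is left to the caller.

```python
INF = float("inf")

def grid_matrix(tree: Node):
    """Pairwise path distances between all nodes of one term operation tree."""
    nodes, parent = [], {}
    def walk(n, p):
        nodes.append(n)
        parent[id(n)] = p
        for c in n.children:
            walk(c, n)
    walk(tree, None)

    def dist(a, b):
        if a is b:
            return 0.0
        for x, y in ((a, b), (b, a)):      # is x an ancestor of y?
            n, steps = y, 0
            while n is not None:
                if n is x:
                    return float(steps)    # number of levels between them
                n, steps = parent[id(n)], steps + 1
        return INF                         # same level or unrelated: no path

    return [[dist(a, b) for b in nodes] for a in nodes]
```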
A method for determining the similarity between the word corresponding to the first node and the word corresponding to the second node by using the grid matching model is described below. Referring to fig. 7, fig. 7 is a schematic structural diagram of a grid matching model provided in an embodiment of the present application.
As shown in fig. 7, the grid matching model consists of a two-tower Transformer model connected to a linear classifier. Each Transformer model comprises a coding network, a self-attention network, a first residual network, a feedforward neural network, and a second residual network.
For a word corresponding to a first node in the first term operation tree, the grid matching model first encodes the word using the coding network to obtain a first feature representation of the word corresponding to the first node.
After the first feature representation is obtained, the self-attention network is called to perform calculation based on the first feature representation and the first grid relative position matrix, obtaining a first self-attention weight.
Here, the self-attention network is a multi-head self-attention network: after self-attention calculation is performed for each head to obtain the self-attention weight corresponding to that head, the per-head self-attention weights are spliced, and a linear transformation is applied to the splicing result to obtain the first self-attention weight. The first self-attention weight may be calculated as described in steps 104211 to 104212 above.
After the first self-attention weight is obtained, the self-attention network is called to perform calculation based on the first self-attention weight and the initial cross operation tree matrix, obtaining a second self-attention weight (i.e., the target self-attention weight above) corresponding to the first feature representation. The second self-attention weight may be calculated as described in steps 10421 to 10422 above.
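One plausible reading of this two-stage attention calculation, sketched in numpy below: the grid relative position matrix biases the attention scores to produce the first self-attention weight, and the cross operation tree matrix then biases that weight to produce the target (second) self-attention weight. The exact combination operator is not specified at this level of detail, so simple additive biases are assumed, and both matrices are assumed to have already been mapped into score space.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def grid_attention_head(H, Wq, Wk, Wv, R, C):
    """One attention head. H: (n, d) node features. R (grid relative position
    matrix) and C (cross operation tree matrix) are assumed already mapped
    into score space (e.g. INF entries replaced by large negative masks or
    learned embeddings) before being added as biases."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    first = softmax(scores + R)     # first self-attention weight
    target = softmax(first + C)     # target (second) self-attention weight
    return target @ V

def multi_head(H, heads, Wo, R, C):
    """Splice the per-head results, then apply a linear transformation."""
    outs = [grid_attention_head(H, Wq, Wk, Wv, R, C) for Wq, Wk, Wv in heads]
    return np.concatenate(outs, axis=-1) @ Wo
```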
The cross operation tree matrix will be explained below. Referring to fig. 8, fig. 8 is a schematic structural diagram of a cross operation tree matrix provided in the embodiment of the present application.
As shown in fig. 8, the cross operation tree matrix is used to model the similarity of words corresponding to matching nodes between two term operation trees. A matrix element of 0.0 in the cross operation tree matrix is an initialization value that is updated as the model computes; a matrix element of NA indicates a value artificially fixed at 0.0 that is never updated, which ensures that similarity is computed only between words corresponding to nodes of the same type across the two term operation trees. For example, the similarity between the C components of term operation tree X (i.e., the first term operation tree above) and term operation tree Y (i.e., the second term operation tree above) is computed, as are the similarities between the corresponding D components, e.g., the similarities between XD1/YD1, XD1/YD2, XD2/YD1, and XD2/YD2, respectively.
It is worth noting that after the similarity between the word corresponding to the first node and the word corresponding to the second node is obtained, the matrix element at the first position in the initial cross operation tree matrix is updated to that similarity, where the row coordinate of the first position is the first node and the column coordinate is the second node.
For example, referring to fig. 8, if the first node is XD1, the second node is YD1, and the similarity between the word corresponding to XD1 and the word corresponding to YD1 is determined to be 0, then the value of (XD1, YD1) in the initial cross operation tree matrix is updated to 0.
Note that the updated cross operation tree matrix is used when computing the similarity of the words corresponding to the next node pair; once that similarity is computed, the matrix is updated again accordingly. This process repeats until the matrix is updated for the last time based on the similarity of the words corresponding to the last node pair, yielding the final cross operation tree matrix. The final cross operation tree matrix visually reflects the spatial similarity of the first and second term operation trees, so developers can directly judge from it which components of the two trees were successfully matched, giving the matching result better interpretability.
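A sketch of this iterative update, with hypothetical names: same-type node pairs are visited in turn, each computed similarity is written into the matrix, and the updated matrix is fed into the next similarity computation. The model.similarity call is a hypothetical API standing in for the grid matching model described here.

```python
import numpy as np

def match_trees(model, x_nodes, y_nodes):
    """Iteratively fill the cross operation tree matrix. NA cells (node pairs
    of different types) are fixed at 0.0 and never updated."""
    C = np.zeros((len(x_nodes), len(y_nodes)))   # 0.0 = initialization value
    for i, xn in enumerate(x_nodes):
        for j, yn in enumerate(y_nodes):
            if xn.kind != yn.kind:
                continue                          # NA cell: stays 0.0
            sim = model.similarity(xn.label, yn.label, C)  # hypothetical API
            C[i, j] = sim                         # row = first node, col = second node
    return C   # final matrix: shows which components matched successfully
```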
After the second self-attention weight is obtained, the first residual network is called to perform calculation based on the second self-attention weight and the first feature representation, obtaining a first residual network calculation result. This result may be determined as described in steps 10431 to 10432 above.
After the first residual network calculation result is obtained, the feedforward neural network is called to perform calculation based on it, obtaining a first feedforward neural network calculation result.
After the first feedforward neural network calculation result is obtained, the second residual network is called to perform calculation based on the first feedforward neural network calculation result and the first residual network calculation result, obtaining a second residual network calculation result corresponding to the first feature representation. This result may be determined as described in steps 10451 to 10452 above.
The same processing steps are performed for the word corresponding to the second node in the second term operation tree, obtaining a second residual network calculation result corresponding to the second feature representation, where the second node is of the same type as the first node.
After the second residual network calculation results corresponding to the first and second feature representations are obtained, the two results are spliced, and the linear classifier is called to perform classification based on the splicing result, obtaining the similarity between the word corresponding to the first node and the word corresponding to the second node.
As an example, the classifier outputs only two values, 0 or 1. An output of 1 indicates that the similarity between the word corresponding to the first node and the word corresponding to the second node is 1, i.e., the two words match; an output of 0 indicates that the similarity is 0, i.e., the two words do not match.
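Putting the tower together, the following PyTorch sketch mirrors the structure of fig. 7 (coding network, self-attention, first residual network, feedforward neural network with two fully-connected layers, second residual network, then splicing and a linear classifier). The pooling, head count, hidden size, and weight sharing between the two towers are assumptions, and the grid/cross-tree bias matrices are omitted for brevity.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """One Transformer tower of the grid matching model (simplified)."""
    def __init__(self, vocab_size: int, d: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)          # coding network
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.norm1 = nn.LayerNorm(d)                      # first residual network
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(),
                                 nn.Linear(4 * d, d))     # two fully-connected layers
        self.norm2 = nn.LayerNorm(d)                      # second residual network

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(ids)
        a, _ = self.attn(h, h, h)             # bias matrices omitted in this sketch
        h = self.norm1(h + a)                 # sum + normalize (first residual)
        return self.norm2(h + self.ffn(h)).mean(dim=1)   # second residual, then pool

class GridMatcher(nn.Module):
    def __init__(self, vocab_size: int, d: int = 64):
        super().__init__()
        self.tower = Tower(vocab_size, d)     # shared weights assumed
        self.cls = nn.Linear(2 * d, 2)        # linear classifier on spliced features

    def forward(self, ids_x: torch.Tensor, ids_y: torch.Tensor) -> torch.Tensor:
        spliced = torch.cat([self.tower(ids_x), self.tower(ids_y)], dim=-1)
        return self.cls(spliced)              # logits; argmax 1 = matched, 0 = not
```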
As an example, the grid matching model may be trained as described in steps 201 to 208 above.
It is worth noting that during training, the sample standard text corresponding to the first sample term operation tree and the sample text to be matched corresponding to the second sample term operation tree form either a positive sample pair or a negative sample pair. A positive sample pair indicates that the sample standard text is the standard text corresponding to the sample text to be matched; a negative sample pair indicates that it is not, and that in the term standard table the standard word corresponding to the sample standard text and the standard word corresponding to the standard text of the sample text to be matched belong to the same level.
As an example, when the sample standard text is the standard text corresponding to the sample text to be matched, the two form a positive sample pair. The training label corresponding to a positive sample pair is 1, i.e., the true similarity between the words in the sample standard text and the words in the sample text to be matched is 1.
As an example, when the sample text to be matched is a medical text, the corresponding standard text is determined through labeling by a professional doctor, and the determined standard text and the sample text to be matched form a positive sample pair.
When the sample standard text is not the standard text corresponding to the sample text to be matched, and in the term standard table the standard words corresponding to the sample standard text and to that standard text belong to the same level, the sample standard text and the sample text to be matched form a negative sample pair. "Same level" may mean that the codes in the term standard table have the same number of coding bits, e.g., 4 bits.
As an example, the training label corresponding to the negative sample pair is 0, that is, the true similarity between the word in the sample standard text and the word in the sample text to be matched is 0.
For example, see table 2, which shows part of the term standard table provided in the embodiments of the present application.
TABLE 2 Term standard table (partial)

Code     Standard word
B45      cryptococcosis
B45.0    pulmonary cryptococcosis
B45.1    cerebral cryptococcosis
B45.2    cutaneous cryptococcosis
B45.3    osseous cryptococcosis
Referring to table 2 above, when the sample text to be matched is "cryptococcosis in lung", the corresponding standard text is "pulmonary cryptococcosis"; the sample text to be matched and "pulmonary cryptococcosis", coded B45.0, therefore form a positive sample pair.
In the term standard table, the standard words belonging to the same level as B45.0 "pulmonary cryptococcosis" include B45.1 "cerebral cryptococcosis", B45.2 "cutaneous cryptococcosis", B45.3 "osseous cryptococcosis", and so on. The sample standard texts corresponding to the standard words at the 4-bit coded B45.x level are therefore each paired with the sample text to be matched to form negative sample pairs, while standard words at other levels, such as the B45 level or the B45.001 level, are not sampled.
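A sketch of this sampling rule, with a hypothetical encoding check: positives pair the labeled standard text with the text to be matched, and negatives are drawn only from standard words whose codes sit at the same 4-bit level (approximated here by equal code length plus a shared parent code, which is an assumption about the exact table encoding).

```python
def same_level(a: str, b: str) -> bool:
    # Assumption: "same level" = same number of coding bits and a common
    # parent code, e.g. B45.1 and B45.0, but not B45 or B45.001.
    return a != b and len(a) == len(b) and a.split(".")[0] == b.split(".")[0]

def build_pairs(term_table: dict, labeled: list):
    """term_table: code -> standard text; labeled: (text_to_match, gold_code)."""
    pairs = []
    for text, gold in labeled:
        pairs.append((term_table[gold], text, 1))      # positive pair, label 1
        for code, std in term_table.items():
            if same_level(code, gold):
                pairs.append((std, text, 0))           # negative pair, label 0
    return pairs

table = {"B45.0": "pulmonary cryptococcosis", "B45.1": "cerebral cryptococcosis",
         "B45.2": "cutaneous cryptococcosis", "B45.3": "osseous cryptococcosis"}
pairs = build_pairs(table, [("cryptococcosis in lung", "B45.0")])
# -> 1 positive with B45.0 and 3 negatives with B45.1, B45.2, B45.3
```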
To verify the effectiveness of the grid matching model trained in the embodiment of the present application, experiments were performed on a large amount of real medical diagnostic data.
First, medical experts were asked to manually label the medical diagnosis data, yielding 253 valid labeled items; the new and old diagnosis standardization engines were then evaluated on these data.
TABLE 3 Experimental results (provided as an image in the original publication)
It can be seen that the standardization performance of the grid matching model provided by the embodiment of the present application on real-scene medical diagnosis data is improved to a certain extent over the baseline model.
Therefore, the purpose-designed term operation tree of the embodiment of the present application makes better use of medical information, and the grid matching model effectively models the association between different components of the text to be matched; compared with matching directly on the original text to be matched, the embodiment of the present application can therefore analyze and understand the medical meaning expressed by the text to be matched more effectively. In addition, because the cross operation tree matrix is introduced, its weight values can be visualized to show which components of the two term operation trees were successfully matched, giving the model better interpretability.
Continuing with the exemplary structure of the term processing device 233 implemented as software modules, in some embodiments, as shown in fig. 2, the software modules of the term processing device 233 stored in the memory 230 may include: an obtaining module 2331, configured to obtain a text to be matched in a specific field and a plurality of first term operation trees, where each first term operation tree is constructed in advance based on a standard text of the specific field, and each standard text corresponds to a standard word in a term standard table of the specific field; a constructing module 2332, configured to construct a second term operation tree corresponding to the text to be matched; a first determining module 2333, configured to perform the following for each first term operation tree: determining a first node in the first term operation tree and a second node of the same type as the first node in the second term operation tree; a second determining module 2334, configured to determine the similarity between the word corresponding to the first node and the word corresponding to the second node, where the first node is any node in the first term operation tree; and a third determining module 2335, configured to determine, when the value of the similarity satisfies a value condition, the word corresponding to the first node as the standard word of the word corresponding to the second node.
In the above scheme, the constructing module 2332 is configured to perform splitting and encoding processing on a text to be matched to obtain components of the text to be matched; and constructing a second term operation tree corresponding to the text to be matched based on the components of the text to be matched.
In the above scheme, the components include a modification component, a part component, a root component, and a logic component; the constructing module 2332 is configured to determine the logic component as the root node of the second term operation tree; determine the root component as an intermediate node of the second term operation tree, where the intermediate node is a child node of the root node; determine the part component and the modification component as leaf nodes of the second term operation tree, where the leaf nodes are child nodes of the intermediate node; and connect the root node, the intermediate node, and the leaf nodes according to their hierarchy to obtain the second term operation tree corresponding to the text to be matched.
In the above scheme, the apparatus further comprises: the connection module is used for connecting the first leaf nodes with the second intermediate node respectively under the condition that the first intermediate node has a plurality of first leaf nodes, so as to obtain an updated second term operation tree; wherein the modification component or site component corresponding to the first leaf node is absent from the root component corresponding to the second intermediate node.
In the above solution, the second determining module 2334 is configured to respectively use the first node and the second node as target nodes and perform the following processing: calling a coding network to perform coding processing based on the word corresponding to the target node to obtain a target feature representation of the word corresponding to the target node; calling a self-attention network to perform calculation processing based on the target feature representation to obtain a target self-attention weight corresponding to the target feature representation; calling a first residual network to perform calculation processing based on the target self-attention weight to obtain a first target residual network calculation result corresponding to the target feature representation; calling a feedforward neural network to perform calculation processing based on the first target residual network calculation result to obtain a target feedforward neural network calculation result corresponding to the target feature representation; calling a second residual network to perform calculation processing based on the target feedforward neural network calculation result to obtain a second target residual network calculation result corresponding to the target feature representation; and splicing the second target residual network calculation results corresponding to the target feature representations, and calling a classifier to perform classification based on the splicing result to obtain the similarity between the word corresponding to the first node and the word corresponding to the second node.
In the above solution, the second determining module 2334 is configured to invoke a self-attention network to perform calculation processing based on the target feature representation and a target grid relative position matrix corresponding to the target feature representation, so as to obtain a first self-attention weight corresponding to the target feature representation; the target grid relative position matrix is determined based on a term operation tree corresponding to the target node; calling a self-attention network to perform calculation processing based on the first self-attention weight and the initial cross-operation tree matrix to obtain a target self-attention weight corresponding to the target feature representation; the cross operation tree matrix comprises a plurality of matrix elements which are in one-to-one correspondence with a plurality of node pairs, the matrix elements represent the similarity of words corresponding to two nodes in the node pairs, the types of the two nodes are the same, one node is from the first term operation tree, and the other node is from the second term operation tree.
In the above scheme, the self-attention network is a multi-head self-attention network; a second determining module 2334, configured to invoke a multi-head self-attention network to perform self-attention calculation processing based on the target feature representation and the target grid relative position matrix corresponding to the target feature representation, so as to obtain a self-attention weight corresponding to each head;
and performing splicing processing on the self-attention weight corresponding to each head, performing linear transformation processing on the obtained splicing processing result, and determining the linear transformation processing result as a first self-attention weight corresponding to the target feature representation.
In the above solution, the constructing module 2332 is further configured to construct the target grid relative position matrix by: determining the path distance between node i and node j based on the hierarchy and the path relation of node i and node j in the target term operation tree; and determining the path distance as the value of matrix element (i, j) of the target grid relative position matrix; where the target term operation tree is the first term operation tree when the target node is the first node, and the second term operation tree when the target node is the second node; 1 ≤ i ≤ N, 1 ≤ j ≤ N, and N is the number of nodes included in the target term operation tree.
In the foregoing solution, the constructing module 2332 is further configured to determine a first threshold as the path distance between node i and node j when they belong to the same hierarchy; determine the first threshold as the path distance when they do not belong to the same hierarchy and have no path relation; and determine the number of levels between node i and node j as the path distance when they do not belong to the same hierarchy and do have a path relation.
In the above solution, the second determining module 2334 is configured to invoke the first residual error network to perform summation processing based on the target self-attention weight and the target feature representation, so as to obtain a first target summation result; and carrying out normalization processing on the first target summation result to obtain a first target residual error network calculation result corresponding to the target characteristic representation.
In the above scheme, the feedforward neural network includes a first fully-connected layer and a second fully-connected layer; the second determining module 2334 is configured to call the first fully-connected layer to perform calculation processing based on the first target residual network calculation result to obtain a first fully-connected layer target calculation result; and call the second fully-connected layer to perform calculation processing based on the first fully-connected layer target calculation result, and determine the obtained second fully-connected layer target calculation result as the target feedforward neural network calculation result corresponding to the target feature representation.
In the above scheme, the second determining module 2334 is configured to invoke a second residual error network to perform summation processing based on the target feedforward neural network calculation result and the first target residual error network calculation result, so as to obtain a second target summation result; and carrying out normalization processing on the second target summation result to obtain a second target residual error network calculation result corresponding to the target characteristic representation.
In the above scheme, the apparatus further comprises: the updating module is used for updating the matrix elements at the first position in the initial cross operation tree matrix into similarity; the row coordinate of the first position is a first node, and the column coordinate of the first position is a second node.
In the above scheme, the apparatus further includes a parameter updating module, configured to: call a coding network to perform coding processing based on the sample word corresponding to a sample node to obtain a sample feature representation of the sample word corresponding to the sample node; call a self-attention network to perform calculation processing based on the sample feature representation to obtain a sample self-attention weight corresponding to the sample feature representation; call a first residual network to perform calculation processing based on the sample self-attention weight to obtain a first sample residual network calculation result corresponding to the sample feature representation; call a feedforward neural network to perform calculation processing based on the first sample residual network calculation result to obtain a sample feedforward neural network calculation result corresponding to the sample feature representation; call a second residual network to perform calculation processing based on the sample feedforward neural network calculation result to obtain a second sample residual network calculation result corresponding to the sample feature representation; splice the second sample residual network calculation results corresponding to the sample feature representations, and call a classifier to perform classification based on the splicing result to obtain the predicted similarity between the sample word corresponding to the first sample node and the sample word corresponding to the second sample node; substitute the predicted similarity and the corresponding true similarity into a loss function to obtain a loss value; and update the parameters of the classifier, the second residual network, the feedforward neural network, the first residual network, the self-attention network, and the coding network based on the loss value during back propagation. The first sample node is any node in the first sample term operation tree, the second sample node is any node in the second sample term operation tree, and the two nodes are of the same type.
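A minimal training-step sketch matching the parameter updating module above, assuming the GridMatcher sketched earlier (whose forward returns class logits) and assuming the loss function is cross-entropy over the binary match label (label: a tensor of 0s and 1s).

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, ids_x, ids_y, label):
    """One training step: predict similarity logits for a sample pair,
    compare with the true similarity (1 = positive pair, 0 = negative pair)
    via cross-entropy, and back-propagate through every sub-network."""
    model.train()
    optimizer.zero_grad()
    logits = model(ids_x, ids_y)     # forward through both towers + classifier
    loss = nn.functional.cross_entropy(logits, label)
    loss.backward()                  # gradients reach the classifier, residual
    optimizer.step()                 # networks, FFN, attention and coding network
    return loss.item()
```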
In the above scheme, the sample standard text corresponding to the first sample term operation tree and the sample text to be matched corresponding to the second sample term operation tree form a positive sample pair or a negative sample pair; a positive sample pair indicates that the sample standard text is the standard text corresponding to the sample text to be matched, and a negative sample pair indicates that it is not, with the standard word corresponding to the sample standard text and the standard word corresponding to the standard text of the sample text to be matched belonging to the same level in the term standard table.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the term processing method described above in the embodiments of the present application.
Embodiments of the present application provide a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the term processing method provided by embodiments of the present application.
In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM, or may be various devices including one or any combination of the above memories.
In some embodiments, the executable instructions may be in the form of a program, software module, script, or code written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, in the embodiment of the present application, a term operation tree is constructed for the text to be matched; because the term operation tree is a data structure that fully captures the semantics expressed by the text to be matched, the text to be matched can be accurately understood and analyzed, and the corresponding standard words can be accurately determined on that basis. Because the similarity is computed between the word corresponding to a first node in the term operation tree of the standard text and the word corresponding to a second node of the same type in the term operation tree of the text to be matched, only same-type node pairs are compared each time, which effectively improves the efficiency of determining standard words. In addition, the term operation tree accommodates the disorder and randomness of text expressions in different scenes, which effectively improves the accuracy of the determined standard words.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (19)

1. A method for term processing, the method comprising:
acquiring a text to be matched in a specific field and a plurality of first term operation trees; each first term operation tree is constructed in advance based on a standard text of the specific field, and each standard text corresponds to a standard word in a term standard table of the specific field;
constructing a second term operation tree corresponding to the text to be matched;
performing the following for each of the first term operation trees:
determining a first node in the first term operation tree and a second node of the same type as the first node in the second term operation tree;
determining similarity of the word corresponding to the first node and the word corresponding to the second node; wherein the first node is any one node in the first term operation tree;
and under the condition that the value of the similarity meets a value condition, determining the word corresponding to the first node as a standard word of the word corresponding to the second node.
2. The method of claim 1,
the constructing of the second term operation tree corresponding to the text to be matched includes:
splitting and coding the text to be matched to obtain components of the text to be matched;
and constructing a second term operation tree corresponding to the text to be matched based on the components of the text to be matched.
3. The method of claim 2,
the components comprise a modification component, a part component, a root component and a logic component; the constructing a second term operation tree corresponding to the text to be matched based on the components of the text to be matched comprises the following steps:
determining the logical component as a root node of the second term operation tree;
determining the root component as an intermediate node of the second term operation tree; wherein the intermediate node is a child node of the root node;
determining the site component and the modification component as leaf nodes of the second term operation tree; wherein the leaf nodes are child nodes of the intermediate node;
and connecting the root node, the intermediate node and the leaf node according to the belonged hierarchy to obtain a second term operation tree corresponding to the text to be matched.
4. The method of claim 3,
in a case that the second term operation tree includes a first intermediate node and a second intermediate node, after the obtaining of the second term operation tree corresponding to the text to be matched, the method further includes:
under the condition that a plurality of first leaf nodes exist in the first intermediate node, respectively connecting the first leaf nodes with the second intermediate node to obtain an updated second term operation tree;
wherein the modification component or the part component corresponding to the first leaf node is not possessed by the root component corresponding to the second intermediate node.
5. The method of claim 1,
the determining the similarity between the word corresponding to the first node and the word corresponding to the second node includes:
respectively taking the first node and the second node as target nodes, and executing the following processes:
calling a coding network to perform coding processing based on the word corresponding to the target node to obtain a target feature representation of the word corresponding to the target node;
calling a self-attention network to perform calculation processing based on the target feature representation to obtain a target self-attention weight corresponding to the target feature representation;
calling a first residual error network to perform calculation processing based on the target self-attention weight to obtain a first target residual error network calculation result corresponding to the target feature representation;
calling a feedforward neural network to perform calculation processing based on a first target residual error network calculation result corresponding to the target characteristic representation to obtain a target feedforward neural network calculation result corresponding to the target characteristic representation;
based on a target feedforward neural network calculation result corresponding to the target feature representation, calling a second residual error network to perform calculation processing to obtain a second target residual error network calculation result corresponding to the target feature representation;
and splicing the calculation results of the second target residual error network corresponding to the target feature representation, and calling a classifier to classify the calculation results based on the splicing results to obtain the similarity between the words corresponding to the first node and the words corresponding to the second node.
6. The method of claim 5,
the calling a self-attention network to perform calculation processing based on the target feature representation to obtain a target self-attention weight corresponding to the target feature representation includes:
calling the self-attention network to perform calculation processing based on the target feature representation and a target grid relative position matrix corresponding to the target feature representation to obtain a first self-attention weight corresponding to the target feature representation; the target grid relative position matrix is determined based on a term operation tree corresponding to the target node;
calling the self-attention network to perform calculation processing based on the first self-attention weight and the initial cross-operation tree matrix to obtain a target self-attention weight corresponding to the target feature representation;
wherein the cross operation tree matrix comprises a plurality of matrix elements corresponding to a plurality of node pairs in a one-to-one manner, the matrix elements represent the similarity of words corresponding to two nodes in the node pairs, the two nodes are of the same type, and one node is from the first term operation tree and the other node is from the second term operation tree.
7. The method of claim 6,
the self-attention network is a multi-head self-attention network;
the invoking the self-attention network to perform calculation processing based on the target feature representation and a target grid relative position matrix corresponding to the target feature representation to obtain a first self-attention weight corresponding to the target feature representation includes:
calling the multi-head self-attention network to perform self-attention calculation processing based on the target feature representation and a target grid relative position matrix corresponding to the target feature representation to obtain a self-attention weight corresponding to each head;
and performing splicing processing on the self-attention weight corresponding to each head, performing linear transformation processing on the obtained splicing processing result, and determining the linear transformation processing result as a first self-attention weight corresponding to the target feature representation.
8. The method of claim 6,
before the invoking the self-attention network for calculation processing based on the target feature representation and a target grid relative position matrix corresponding to the target feature representation, the method further includes:
constructing the target grid relative position matrix by:
determining a path distance between a node i and a node j based on the hierarchy of the node i and the node j in the target term operation tree and a path relation;
determining the path distance as a value of a matrix element (i, j) of the target grid relative position matrix; wherein, when the target node is the first node, the target term operation tree is the first term operation tree, and when the target node is the second node, the target term operation tree is the second term operation tree; i is not less than 1 and not more than N, j is not less than 1 and not more than N, and N is the number of nodes included in the target term operation tree.
9. The method of claim 8,
the determining a path distance of the node i and the node j based on the hierarchy of the node i and the node j in the target term operation tree and the path relationship comprises:
determining a first threshold as a path distance of the node i and the node j when the node i and the node j belong to the same hierarchy;
determining the first threshold as a path distance between the node i and the node j under the condition that the node i and the node j do not belong to the same hierarchy and have no path relation;
and under the condition that the node i and the node j do not belong to the same level and have a path relation, determining the level number of the node i and the node j as the path distance of the node i and the node j.
10. The method of claim 5,
the calling a first residual error network to perform calculation processing based on the target self-attention weight to obtain a first target residual error network calculation result corresponding to the target feature representation includes:
calling the first residual error network to carry out summation processing based on the target self-attention weight and the target feature representation to obtain a first target summation result;
and carrying out normalization processing on the first target summation result to obtain a first target residual error network calculation result corresponding to the target feature representation.
11. The method of claim 5,
the feedforward neural network comprises a first fully-connected layer and a second fully-connected layer;
the step of calling a feedforward neural network to perform calculation processing based on a first target residual error network calculation result corresponding to the target feature representation to obtain a target feedforward neural network calculation result corresponding to the target feature representation includes:
calling the first full-connection layer to perform calculation processing based on the first target residual error network calculation result to obtain a first full-connection layer target calculation result;
and calling the second full-connection layer for calculation processing based on the first full-connection layer target calculation result, and determining the obtained second full-connection layer target calculation result as a target feedforward neural network calculation result corresponding to the target feature representation.
12. The method of claim 5,
the step of calling a second residual error network to perform calculation processing based on the target feedforward neural network calculation result corresponding to the target characteristic representation to obtain a second target residual error network calculation result corresponding to the target characteristic representation includes:
calling the second residual error network to carry out summation processing based on the target feedforward neural network calculation result and the first target residual error network calculation result to obtain a second target summation result;
and carrying out normalization processing on the second target summation result to obtain a second target residual error network calculation result corresponding to the target feature representation.
13. The method of claim 5,
after the obtaining of the similarity between the word corresponding to the first node and the word corresponding to the second node, the method further includes:
updating the matrix element at the first position in the initial cross operation tree matrix into the similarity; the row coordinate of the first position is the first node, and the column coordinate of the first position is the second node.
14. The method of claim 1,
before the determining the similarity between the word corresponding to the first node and the word corresponding to the second node, the method further includes:
respectively taking the first sample node and the second sample node as sample nodes, and executing the following processing:
calling a coding network to perform coding processing based on the sample words corresponding to the sample nodes to obtain sample characteristic representations of the sample words corresponding to the sample nodes;
calling a self-attention network to perform calculation processing based on the sample feature representation to obtain a sample self-attention weight corresponding to the sample feature representation;
calling a first residual error network to perform calculation processing based on the sample self-attention weight to obtain a first sample residual error network calculation result corresponding to the sample characteristic representation;
calling a feedforward neural network to perform calculation processing based on a first sample residual error network calculation result corresponding to the sample characteristic representation to obtain a sample feedforward neural network calculation result corresponding to the sample characteristic representation;
based on the sample feedforward neural network calculation result corresponding to the sample characteristic representation, calling a second residual error network to perform calculation processing to obtain a second sample residual error network calculation result corresponding to the sample characteristic representation;
splicing the second sample residual error network calculation results corresponding to the sample characteristic representations, and calling a classifier to perform classification processing based on the splicing results to obtain the prediction similarity of the sample words corresponding to the first sample node and the sample words corresponding to the second sample node;
substituting the prediction similarity and the corresponding real similarity into a loss function for calculation processing to obtain a loss value;
updating parameters of the classifier, parameters of the second residual network, parameters of the feedforward neural network, parameters of the first residual network, parameters of the self-attention network, and parameters of the encoding network based on the loss values in a back propagation process;
the first sample node is any node in a first sample term operation tree, the second sample node is any node in a second sample term operation tree, and the types of the first sample node and the second sample node are the same.
15. The method of claim 14,
the sample standard text corresponding to the first sample term operation tree and the sample text to be matched corresponding to the second sample term operation tree form a positive sample pair or a negative sample pair;
the positive sample pair represents that the sample standard text is a standard text corresponding to the text to be matched with the sample; the negative sample pair is used for representing that the sample standard text is not the standard text corresponding to the sample text to be matched, and in the term standard table, the standard words corresponding to the sample standard text and the standard words corresponding to the standard text corresponding to the sample text to be matched belong to the same level.
16. A term processing apparatus, the apparatus comprising:
the system comprises an acquisition module, a matching module and a matching module, wherein the acquisition module is used for acquiring a text to be matched in a specific field and a plurality of first term operation trees; each first term operation tree is constructed in advance based on a standard text of the specific field, and each standard text corresponds to a standard word in a term standard table of the specific field;
the construction module is used for constructing a second term operation tree corresponding to the text to be matched;
a first determining module for performing the following for each of the first term operation trees: determining a first node in the first term operation tree and a second node of the same type as the first node in the second term operation tree;
a second determining module, configured to determine similarity between a word corresponding to the first node and a word corresponding to the second node; wherein the first node is any one node in the first term operation tree;
and a third determining module, configured to determine, when the value of the similarity satisfies a value condition, a word corresponding to the first node as a standard word of a word corresponding to the second node.
17. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the term processing method as claimed in any one of claims 1 to 15 when executing executable instructions stored in the memory.
18. A computer-readable storage medium having stored thereon executable instructions for implementing the term processing method as claimed in any one of claims 1 to 15 when executed by a processor.
19. A computer program product comprising a computer program or instructions, characterized in that the computer program or instructions, when executed by a processor, implement the term processing method as claimed in any one of claims 1 to 15.
CN202210473830.0A 2022-04-29 2022-04-29 Term processing method, apparatus, electronic device, storage medium, and program product Pending CN115132372A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210473830.0A CN115132372A (en) 2022-04-29 2022-04-29 Term processing method, apparatus, electronic device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210473830.0A CN115132372A (en) 2022-04-29 2022-04-29 Term processing method, apparatus, electronic device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN115132372A true CN115132372A (en) 2022-09-30

Family

ID=83377024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210473830.0A Pending CN115132372A (en) 2022-04-29 2022-04-29 Term processing method, apparatus, electronic device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN115132372A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117894482A (en) * 2024-03-14 2024-04-16 北方健康医疗大数据科技有限公司 Medical tumor coding method, system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
WO2023236977A1 (en) Data processing method and related device
CN111666477A (en) Data processing method and device, intelligent equipment and medium
CN111858940B (en) Multi-head attention-based legal case similarity calculation method and system
JP7432801B2 (en) Medical data element automated classification method and system based on depth map matching
CN113704460B (en) Text classification method and device, electronic equipment and storage medium
CN112765370B (en) Entity alignment method and device of knowledge graph, computer equipment and storage medium
CN113707339B (en) Method and system for concept alignment and content inter-translation among multi-source heterogeneous databases
WO2022227203A1 (en) Triage method, apparatus and device based on dialogue representation, and storage medium
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN116776872A (en) Medical data structured archiving system
CN114331122A (en) Key person risk level assessment method and related equipment
CN116821373A (en) Map-based prompt recommendation method, device, equipment and medium
CN115408551A (en) Medical image-text data mutual detection method, device, equipment and readable storage medium
CN117312989A (en) Context-aware column semantic recognition method and system based on GCN and RoBERTa
CN113657086B (en) Word processing method, device, equipment and storage medium
CN115132372A (en) Term processing method, apparatus, electronic device, storage medium, and program product
CN114330309A (en) Term processing method, device, equipment, storage medium and program product
CN112069825B (en) Entity relation joint extraction method for alert condition record data
CN116956869A (en) Text normalization method, device, electronic equipment and storage medium
CN112182253B (en) Data processing method, data processing equipment and computer readable storage medium
CN114328813A (en) Word standardization method, device, equipment and storage medium
CN111782781A (en) Semantic analysis method and device, computer equipment and storage medium
CN116069956B (en) Drug knowledge graph entity alignment method and device based on mixed attention mechanism
CN116312915B (en) Method and system for standardized association of drug terms in electronic medical records
CN114840560B (en) Unstructured data conversion and storage method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination