WO2021051763A1

WO2021051763A1 - Term matching method and apparatus, terminal, and computer readable storage medium

Info

Publication number: WO2021051763A1
Application number: PCT/CN2020/079603
Authority: WO
Inventors: 王利
Original assignee: 深圳中兴网信科技有限公司
Priority date: 2019-09-16
Filing date: 2020-03-17
Publication date: 2021-03-25
Also published as: CN112507107A

Abstract

Disclosed herein are a term matching method and apparatus, a terminal, and a computer readable storage medium. The term matching method comprises: according to multiple similarity calculation algorithms, respectively calculating multiple similarity values between a first term and a second term; and assigning a weight to each similarity value, respectively multiplying the multiple similarity values by the corresponding weights, and adding the multiplication results to obtain a weighted sum similarity for the multiple similarity values, the value of the weighted sum similarity being used to represent the degree to which the first term and the second term match.

Description

Term matching method, device, terminal and computer readable storage medium

This application claims priority to Chinese patent application No. 201910869178.2, filed with the Chinese Patent Office on September 16, 2019, the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of medical informatization, for example, to a term matching method, a term matching device, a terminal, and a computer-readable storage medium.

Background technique

Medical terms (hereinafter referred to as terms) are professional terms in the medical field, used to refer to various things, phenomena, characteristics, relationships, and processes in the medical field, such as diseases, drugs, surgical operations, inspections, etc. These terms are essential components of the clinical information system to express medical information.

Relevant medical terminology standards are lacking, and the standards system is not yet complete. The granularity and expression of the terms in the existing terminology standards differ greatly from those used in actual clinical scenarios, so the standards are difficult to apply directly to clinical information systems. Most medical information systems in medical institutions have therefore created their own private terminology dictionaries. Because there are many medical information system vendors, dictionaries of the same kind differ between systems even within one institution; for example, drug term dictionaries differ from system to system. As a result, term names and codes are highly heterogeneous across clinical information systems, the systems cannot interoperate, and medical data is difficult to share. Exchanging information between different medical information systems therefore requires mapping and matching the term dictionaries of those systems. This work is generally performed manually and has a relatively high error rate, which has become a bottleneck in the integration, analysis and reuse of medical data.

Summary of the invention

This application provides a term matching method, including: calculating multiple similarity values between a first term and a second term according to multiple similarity calculation algorithms; and assigning a weight to each similarity value, multiplying the multiple similarity values by their corresponding weights, and adding the products to obtain a weighted sum similarity of the multiple similarity values, where the value of the weighted sum similarity indicates the degree of matching between the first term and the second term.

The present application also provides a term matching device, which includes a memory, a processor, and a program stored in the memory and capable of running on the processor. When the program is executed by the processor, the term matching method as in the above technical solution is implemented.

This application also provides a terminal, including: the term matching device described in the above technical solution.

The present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed, the term matching method defined in the above technical solution is implemented.

Description of the drawings

Fig. 1 shows a schematic flowchart of a term matching method according to an embodiment of the present application;

Fig. 2 shows a schematic block diagram of a term matching device according to an embodiment of the present application;

Fig. 3 shows a schematic block diagram of a terminal according to an embodiment of the present application;

Fig. 4 shows a schematic block diagram of a computer-readable storage medium according to an embodiment of the present application.

Detailed description

The application will be described below with reference to the drawings and specific implementations.

In the following description, a number of implementation manners are set forth to facilitate understanding of this application. However, this application can also be implemented in ways other than those described here. Therefore, the scope of protection of this application is not limited by the specific embodiments disclosed below.

Example one

As shown in Figure 1, a term matching method provided by an embodiment of the present application includes:

Step 102: Calculate multiple similarity values between the first term and the second term according to multiple similarity calculation algorithms.

Step 104: Assign a weight to each similarity value, multiply the multiple similarity values by their corresponding weights, and add the products to obtain a weighted sum similarity of the multiple similarity values. The value of the weighted sum similarity indicates the degree of matching between the first term and the second term.

In this embodiment, considering the complexity of term composition, multiple similarity calculation methods are used to calculate the similarity of the two terms to be matched (the first term and the second term) from multiple dimensions, and the multiple similarities are combined by weighted summation; the weighted sum similarity expresses the degree of matching between the two terms. Each similarity calculation method produces one similarity value. The weighting balances the influence of each method on the final similarity, so that the combined value integrates the characteristics of the different methods and represents the degree of matching more accurately. This improves the accuracy of term matching, addresses the low efficiency and high error rate of manual operation, and helps promote medical information sharing.
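The weighted summation of step 104 can be sketched as follows. This is a minimal illustration only: the function name `weighted_sum_similarity`, the three similarity values, and the weights are hypothetical and do not come from the application.

```python
def weighted_sum_similarity(similarities, weights):
    """Multiply each similarity value by its weight and add the products."""
    if len(similarities) != len(weights):
        raise ValueError("each similarity value needs exactly one weight")
    return sum(s * w for s, w in zip(similarities, weights))


# Hypothetical similarity values from three algorithms, with hypothetical weights.
example = weighted_sum_similarity([0.8, 0.6, 0.5], [0.5, 0.3, 0.2])
print(round(example, 2))  # 0.4 + 0.18 + 0.1
```

When the weights sum to 1, as in this example, the result is a weighted average of the individual similarity values.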

According to the term matching method of the foregoing embodiment, in some application scenarios, step 102 and step 104 include: designating a term in the first terminology system as the first term, and taking any term in the second terminology system as the second term; calculating multiple similarity values between the first term and the second term according to multiple similarity calculation algorithms; assigning a weight to each similarity value, multiplying the multiple similarity values by their corresponding weights, and adding the products to obtain the weighted sum similarity of the multiple similarity values; and changing the second term multiple times and performing the calculation once for each change, thereby generating multiple weighted sum similarities, where the maximum of the multiple weighted sum similarities indicates the degree of matching between the designated term in the first terminology system and a second term in the second terminology system.

In this embodiment, a terminology system contains multiple terms, and each term consists of a string of characters. One term (the first term) is selected in the first terminology system, and the terms in the second terminology system (second terms) are traversed. Each time a term is selected from the second terminology system, its weighted sum similarity with the selected term of the first terminology system is calculated, so that multiple selections yield multiple weighted sum similarity values. The term in the second terminology system that corresponds to the largest of these values is the matching result. This improves the accuracy of term matching and makes establishing the term mapping more efficient; compared with manual operation, it is faster and has a lower error rate.
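The traversal described above can be sketched as follows. The scoring function is passed in as a parameter; `char_jaccard` is only a toy stand-in for the weighted sum similarity, and the names `best_match` and `second_system` and the sample terms are hypothetical.

```python
def char_jaccard(a, b):
    """Toy stand-in score: Jaccard coefficient of the two character sets.

    Assumes at least one of the two strings is non-empty.
    """
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


def best_match(first_term, second_system, score):
    """Score first_term against every term of second_system; keep the maximum."""
    return max(((score(first_term, term), term) for term in second_system),
               key=lambda pair: pair[0])


second_system = ["ureteral stone", "kidney stones", "gallstone"]
best_score, best_term = best_match("kidney stone", second_system, char_jaccard)
print(best_term, best_score)
```

In a fuller implementation, `score` would be the weighted sum of the cosine, Jaccard and Simhash similarity values rather than this character-level stand-in.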

According to the term matching method of the foregoing embodiment, in some application scenarios, step 102 and step 104 include: taking a term in the first terminology system as the first term, and taking a term in the second terminology system as the second term; calculating multiple similarity values between the first term and the second term according to multiple similarity calculation algorithms; assigning a weight to each similarity value, multiplying the multiple similarity values by their corresponding weights, and adding the products to obtain the weighted sum similarity of the multiple similarity values; performing the calculation multiple times while changing the first term and the second term, to generate multiple weighted sum similarities; and summing the multiple weighted sum similarities to generate a total matching degree value, where the total matching degree value indicates the degree of matching between the first terminology system and the second terminology system.

In this embodiment, a terminology system contains multiple terms, and each term consists of a string of characters. One term is extracted from each of the first terminology system and the second terminology system, the similarity values of the two terms are calculated in multiple ways, and the weighted sum similarity is then calculated. Repeating the extraction and calculation (that is, calculating the weighted sum similarity between pairs of terms from the two terminology systems) yields multiple weighted sum similarity values. These values are accumulated to obtain the total matching degree value, which expresses the degree of matching between the first terminology system and the second terminology system.
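The accumulation into a total matching degree value can be sketched as follows. The text does not state exactly which term pairs are accumulated; this sketch assumes every pair of one term from each system. The function name, the term labels and the lookup table of weighted sum similarities are hypothetical.

```python
from itertools import product


def total_matching_degree(system_a, system_b, pair_similarity):
    """Accumulate the weighted sum similarity of every (a, b) term pair.

    Assumption: all cross pairs are included in the sum.
    """
    return sum(pair_similarity(a, b) for a, b in product(system_a, system_b))


# Hypothetical, precomputed weighted sum similarities for a 2 x 2 example.
table = {("a1", "b1"): 0.6, ("a1", "b2"): 0.3,
         ("a2", "b1"): 0.2, ("a2", "b2"): 0.7}
total = total_matching_degree(["a1", "a2"], ["b1", "b2"],
                              lambda a, b: table[(a, b)])
print(round(total, 2))
```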

According to the term matching method of the foregoing embodiment, optionally, the calculation process further includes: in the step of assigning weights, performing the weighted summation of the multiple similarity values with multiple weight combinations, so that each weight combination generates one total matching degree value and the multiple weight combinations generate multiple total matching degree values; and recording the maximum of the multiple total matching degree values, which represents the matching result of the first terminology system and the second terminology system.

In this embodiment, when calculating the weighted sum similarity between two terms, multiple different weight combinations are applied to the multiple similarity values of the same pair of terms, giving multiple weighted sum similarities. For each weight combination, the weighted sum similarities of multiple pairs of terms are accumulated to obtain a total matching degree between the terminology systems, so that the different weight combinations yield multiple total matching degrees. The maximum of these total matching degrees indicates the matching result of the first terminology system and the second terminology system. Optionally, the weights in each combination sum to 1, in which case the weighted sum similarity is the weighted average of the similarities produced by the multiple similarity calculation methods.
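Trying several weight combinations and keeping the largest total can be sketched as follows. The similarity triples and the candidate weight combinations are hypothetical values chosen for illustration, and `best_weight_combination` is not a name used by the application.

```python
def best_weight_combination(pair_similarities, weight_combinations):
    """Return (largest total matching degree, weights that produced it).

    pair_similarities: one tuple of similarity values per term pair.
    weight_combinations: candidate weight tuples, one weight per similarity.
    """
    totals = []
    for weights in weight_combinations:
        total = sum(
            sum(s * w for s, w in zip(sims, weights))
            for sims in pair_similarities
        )
        totals.append((total, weights))
    return max(totals, key=lambda item: item[0])


# Hypothetical (cosine, Jaccard, Simhash) values for two term pairs.
pair_similarities = [(0.9, 0.6, 0.3), (0.8, 0.7, 0.4)]
# Hypothetical weight combinations, each summing to 1.
weight_combinations = [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5), (0.4, 0.4, 0.2)]
best_total, best_weights = best_weight_combination(pair_similarities,
                                                   weight_combinations)
print(best_weights, round(best_total, 2))
```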

According to the term matching method of the foregoing embodiment, optionally, calculating multiple similarity values between the first term and the second term according to multiple similarity calculation algorithms includes: separately calculating the cosine similarity value, the Jaccard similarity value, and the hash similarity value between the first term and the second term.

In this embodiment, the multiple similarity calculation algorithms include cosine similarity, Jaccard similarity, and hash similarity (Simhash similarity). Cosine similarity measures the similarity between two short texts in the word frequency dimension: each term is converted (encoded) into a word frequency vector, and the cosine similarity calculation algorithm is applied to the two vectors. Jaccard similarity is also known as the Jaccard coefficient. The Jaccard similarity calculation algorithm is used for document data with binary attributes; the Jaccard coefficient computed for two terms gives their degree of similarity. The Simhash similarity calculation algorithm encodes the terms and reduces their dimensionality, calculates the Hamming distance between the reduced terms, and derives the degree of similarity from the Hamming distance. The three algorithms calculate similarity in different ways and with different emphases, so considering all three similarity values together can improve the accuracy of term matching.

According to the term matching method of the foregoing embodiment, optionally, calculating the cosine similarity value between the first term and the second term includes: segmenting the first term and the second term based on a word segmentation dictionary, and removing stop words from the first term and the second term based on a stop word dictionary, to generate a first phrase list corresponding to the first term and a second phrase list corresponding to the second term; encoding the first phrase list and the second phrase list to obtain a first word frequency vector corresponding to the first phrase list and a second word frequency vector corresponding to the second phrase list; and calculating the cosine value between the first word frequency vector and the second word frequency vector, where the cosine value is the cosine similarity value of the two word frequency vectors. The larger the cosine value, the higher the similarity.

In this embodiment, each term is segmented and its stop words are removed, so that the term is broken down into a phrase list; the phrase list is encoded (for example, by one-hot (oneHot) encoding) to obtain the word frequency vector of the term. With the word frequency vectors as the input of the cosine similarity calculation algorithm, the cosine similarity between the two terms can be calculated.
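The cosine similarity of two phrase lists can be sketched as follows. The sketch assumes that word segmentation and stop word removal have already produced the phrase lists; the function name and the sample phrase lists are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(phrases_a, phrases_b):
    """Cosine of the angle between the word frequency vectors of two phrase lists."""
    freq_a, freq_b = Counter(phrases_a), Counter(phrases_b)
    # Shared vocabulary so that both vectors use the same component order.
    vocabulary = sorted(set(freq_a) | set(freq_b))
    vec_a = [freq_a[word] for word in vocabulary]
    vec_b = [freq_b[word] for word in vocabulary]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty phrase list has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical, already-segmented phrase lists.
cos_value = cosine_similarity(["chronic", "gastritis"], ["acute", "gastritis"])
print(round(cos_value, 3))
```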

According to the term matching method of the foregoing embodiment, optionally, calculating the Jaccard similarity value between the first term and the second term includes: segmenting the first term and the second term based on a word segmentation dictionary, and removing stop words from the first term and the second term based on a stop word dictionary, to generate a first phrase list corresponding to the first term and a second phrase list corresponding to the second term; encoding the first phrase list and the second phrase list to obtain a first word frequency vector corresponding to the first phrase list and a second word frequency vector corresponding to the second phrase list; and calculating the ratio of the intersection to the union of the first word frequency vector and the second word frequency vector to obtain the Jaccard similarity value, where the greater the Jaccard similarity value, the higher the similarity.

In this embodiment, each term is segmented and its stop words are removed, so that the term is broken down into a phrase list, and the phrase list is encoded to obtain the vector of the term. The degree of similarity between the terms can then be evaluated with the Jaccard similarity calculation algorithm.
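The Jaccard similarity of two phrase lists can be sketched as follows. The sketch assumes binary attributes, so each phrase list is treated as a set of distinct words and repeated words count once; the function name and the sample phrase lists are hypothetical.

```python
def jaccard_similarity(phrases_a, phrases_b):
    """Size of the intersection divided by the size of the union of the word sets."""
    set_a, set_b = set(phrases_a), set(phrases_b)
    union = set_a | set_b
    if not union:
        return 0.0  # both lists empty: no evidence of similarity
    return len(set_a & set_b) / len(union)


# Hypothetical, already-segmented phrase lists.
jaccard_value = jaccard_similarity(["chronic", "gastritis"],
                                   ["acute", "gastritis"])
print(round(jaccard_value, 3))
```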

According to the term matching method of the foregoing embodiment, optionally, calculating the hash similarity value between the first term and the second term includes: segmenting the first term and the second term based on a word segmentation dictionary, and removing stop words from the first term and the second term based on a stop word dictionary, to generate a first phrase list corresponding to the first term and a second phrase list corresponding to the second term; converting each word in the first phrase list and the second phrase list into a hash value number string, and multiplying the hash value number string by the weight of the word to obtain the sequence string of each word; adding the sequence strings of the multiple words in the first phrase list to obtain a first term sequence string corresponding to the first phrase list, and adding the sequence strings of the multiple words in the second phrase list to obtain a second term sequence string corresponding to the second phrase list; converting the first term sequence string and the second term sequence string into binary strings; calculating the Hamming distance between the binary string of the first term sequence string and the binary string of the second term sequence string; and determining the hash similarity value between the first term and the second term according to the Hamming distance, where the greater the hash similarity value, the higher the similarity. The hash similarity is calculated as S = 1/(h+1), where S is the hash similarity and h is the Hamming distance.

In this embodiment, the term is first broken down into a phrase list, each word in the phrase list is hashed (the hash value of the word is calculated), and each word is weighted according to its importance. The weighted hash number strings are accumulated to obtain the sequence value of the term. After the sequence value is reduced in dimensionality, the Hamming distance between the terms can be calculated, and the hash similarity value obtained from the formula S = 1/(h+1) indicates the degree of similarity between the terms.
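The hash similarity steps can be sketched as follows. Several choices here are assumptions rather than details from the application: the word hash is the first 64 bits of an MD5 digest, the fingerprint width is 64 bits, and the weight of a word is its number of occurrences in the phrase list. The function names are hypothetical.

```python
import hashlib
from collections import Counter

BITS = 64  # assumed fingerprint width


def word_hash(word):
    """Stable 64-bit hash of a word (assumption: first 8 bytes of MD5)."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(phrases):
    """Weighted per-bit sum (+weight for a 1 bit, -weight for a 0 bit), then binarize."""
    totals = [0] * BITS
    for word, weight in Counter(phrases).items():
        value = word_hash(word)
        for bit in range(BITS):
            totals[bit] += weight if (value >> bit) & 1 else -weight
    fingerprint = 0
    for bit, total in enumerate(totals):
        if total > 0:  # sums greater than 0 become 1, all others 0
            fingerprint |= 1 << bit
    return fingerprint


def hash_similarity(phrases_a, phrases_b):
    """S = 1 / (h + 1), where h is the Hamming distance of the two fingerprints."""
    h = bin(simhash(phrases_a) ^ simhash(phrases_b)).count("1")
    return 1 / (h + 1)


same = hash_similarity(["kidney", "calculus"], ["kidney", "calculus"])
other = hash_similarity(["kidney", "calculus"], ["ureter", "calculus"])
print(same, other)
```

Identical phrase lists give a Hamming distance of 0 and therefore a similarity of 1; any difference in the fingerprints lowers the value toward 0.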

Example two

According to the term matching method provided in the first embodiment, matching the diagnostic term systems of two hospitals, term system A and term system B, mainly includes the following processes:

Take the term a₁ "kidney and ureteral stones" from term system A, and take the term b₁ "kidney stones with ureteral stones" from term system B.

Preprocess term a₁ and term b₁ in the same way:

First perform word segmentation based on the word segmentation dictionary, then remove stop words based on the stop word dictionary, to obtain the two phrase lists a₁ "['kidney','and','ureter','calculus']" and b₁ "['kidney','calculus','accompanied','ureter','calculus']".

Perform oneHot encoding on the phrase lists a₁ and b₁ to obtain the word frequency vectors a₁ "[1,1,1,1,0]" and b₁ "[1,2,1,0,1]".
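The two word frequency vectors can be reproduced with the sketch below. It assumes the tokens are normalized to one spelling (lowercase "kidney", singular "calculus") and that the vocabulary order is kidney, calculus, ureter, and, accompanied; that order is not stated in the text and is chosen here because it yields the vectors given above. The function name is hypothetical.

```python
from collections import Counter


def word_frequency_vector(phrases, vocabulary):
    """Count how often each vocabulary word occurs in the phrase list."""
    counts = Counter(phrases)
    return [counts[word] for word in vocabulary]


a1 = ["kidney", "and", "ureter", "calculus"]
b1 = ["kidney", "calculus", "accompanied", "ureter", "calculus"]
# Assumed vocabulary order (see the note above).
vocabulary = ["kidney", "calculus", "ureter", "and", "accompanied"]

vec_a1 = word_frequency_vector(a1, vocabulary)
vec_b1 = word_frequency_vector(b1, vocabulary)
print(vec_a1, vec_b1)
```

The value 2 in the second vector reflects the two occurrences of "calculus", so the encoding records word counts rather than only presence or absence.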

Calculate the cosine similarity value, the Jaccard similarity value, and the Simhash similarity value of the word frequency vectors a₁ and b₁, respectively.

All three similarity calculation methods need to be executed, and the similarity values obtained by the three methods jointly participate in the weighted sum calculation.

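Cosine similarity value: calculate the cosine value between the word frequency vectors a₁ and b₁. The larger the value, the higher the similarity.

The cosine similarity value is calculated according to the following formula, where $a_{1,i}$ and $b_{1,i}$ are the i-th components of the two vectors:

$$\cos(a_1, b_1) = \frac{a_1 \cdot b_1}{\lVert a_1 \rVert \, \lVert b_1 \rVert} = \frac{\sum_i a_{1,i}\, b_{1,i}}{\sqrt{\sum_i a_{1,i}^2}\,\sqrt{\sum_i b_{1,i}^2}}$$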
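As a worked check, assuming the word frequency vectors [1,1,1,1,0] and [1,2,1,0,1] obtained in the encoding step, the standard cosine formula evaluates as follows. The arithmetic is derived here from those vectors and is not quoted from the original figures.

```latex
\cos(a_1, b_1)
  = \frac{1\cdot 1 + 1\cdot 2 + 1\cdot 1 + 1\cdot 0 + 0\cdot 1}
         {\sqrt{1^2+1^2+1^2+1^2+0^2}\;\sqrt{1^2+2^2+1^2+0^2+1^2}}
  = \frac{4}{2\sqrt{7}}
  \approx 0.756
```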

Jaccard similarity value: given two sets A and B, the Jaccard coefficient is defined as the ratio of the size of the intersection of A and B to the size of their union. The larger the Jaccard value, the higher the similarity. Here, set A corresponds to a₁ and set B corresponds to b₁.

The Jaccard similarity value is calculated according to the following formula:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
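As a worked check, assuming A and B are the sets of distinct words in the two phrase lists (with the tokens normalized to one spelling), the definition above evaluates as follows. The arithmetic is derived here from the phrase lists and is not quoted from the original figures.

```latex
A = \{\text{kidney}, \text{and}, \text{ureter}, \text{calculus}\}, \qquad
B = \{\text{kidney}, \text{calculus}, \text{accompanied}, \text{ureter}\}

J(A, B) = \frac{|A \cap B|}{|A \cup B|}
        = \frac{|\{\text{kidney}, \text{ureter}, \text{calculus}\}|}
               {|\{\text{kidney}, \text{and}, \text{ureter}, \text{calculus}, \text{accompanied}\}|}
        = \frac{3}{5} = 0.6
```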

Simhash similarity value: a hash algorithm turns each word into a hash value number string. For example, "kidney" is hashed to 100101, and "calculus" is hashed to 101011.

Multiply each number string by the number of occurrences of the word, which serves as its weight, and add all the number strings digit by digit, counting a digit of 0 as -1. For example, the weighted sums for the phrase lists a₁ and b₁ are {12, 27, -33, 5, -1, 7} and {23, -21, -6, 11, 8, 14}, respectively.

Convert the weighted sums into 01 strings: a digit greater than 0 becomes 1, and a digit less than or equal to 0 becomes 0. For example, the 01 strings corresponding to the phrase lists a₁ and b₁ are 110101 and 100111, respectively.

Calculate the Hamming distance h, that is, the number of positions at which the corresponding bits of the two strings differ. The Hamming distance between the terms a₁ and b₁ is 2.

The Simhash similarity value is calculated according to the following formula:

$$S = \frac{1}{h + 1} = \frac{1}{2 + 1} \approx 0.333$$
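The binarization, Hamming distance and similarity steps can be checked against the numbers in this example with the short sketch below. The weighted sums are taken from the text; the function names are hypothetical.

```python
def to_bits(weighted_sums):
    """Map each weighted sum to '1' if it is greater than 0, otherwise '0'."""
    return "".join("1" if value > 0 else "0" for value in weighted_sums)


def hamming_distance(bits_a, bits_b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(bits_a, bits_b))


sums_a1 = [12, 27, -33, 5, -1, 7]
sums_b1 = [23, -21, -6, 11, 8, 14]

bits_a1 = to_bits(sums_a1)
bits_b1 = to_bits(sums_b1)
h = hamming_distance(bits_a1, bits_b1)
similarity = 1 / (h + 1)  # S = 1/(h+1)
print(bits_a1, bits_b1, h, round(similarity, 3))
```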

Assign weights: use three weights, denoted here $w_1$, $w_2$ and $w_3$, for the cosine, Jaccard and Simhash similarity values, respectively, and calculate the weighted sum similarity $s^{11}$ of a₁ and b₁, where $S_{\cos}$, $S_{\mathrm{Jaccard}}$ and $S_{\mathrm{Simhash}}$ are the three similarity values calculated above:

$$s^{11} = w_1 S_{\cos} + w_2 S_{\mathrm{Jaccard}} + w_3 S_{\mathrm{Simhash}}$$

For term a₁ in term system A and any term $b_j$ in term system B, use the same process and the same weights to calculate the weighted sum similarity $s^{1j}$ of a₁ and $b_j$, and record the maximum value of $s^{1j}$. For example, if $s^{1j}$ = {0.456, 0.538, 0.324, 0.647, 0.489}, then the maximum value is 0.647.

For any term $a_i$ in term system A and any term $b_j$ in term system B, use the same process and the same weights to calculate the weighted sum similarity $s^{ij}$ of $a_i$ and $b_j$.

The weighted sum similarities are then accumulated to obtain the total matching degree $T^1$ of term system A and term system B under this set of weights.

Select multiple sets of weights and calculate multiple total matching degrees $T^k$ of term system A and term system B, for example, $T^k$ = {2.212, 1.876, 2.436, 1.943, 2.113, 2.085}.

Take the maximum value of the total matching degree between term system A and term system B as the result of term matching between the two term systems. The total matching degree value corresponding to the third set of weights is 2.436; that is, the value 2.436 obtained when k = 3 is the final matching result.

In the above steps, the Simhash similarity value is calculated by the Simhash similarity algorithm. The above steps do not disclose every calculation step of that algorithm; a Simhash similarity value calculated by conventional technical means can participate in the weighted sum similarity calculation proposed by this application to obtain the degree of matching of the terms. As the algorithm evolves, some of its steps may change, but its final result can still be applied to the term matching method proposed in this application.

Example three

As shown in FIG. 2, a term matching device 200 according to an embodiment of the present application includes: a memory 202, a processor 204, and a program stored in the memory 202 and capable of running on the processor 204. When the program is executed by the processor 204, the term matching method of any of the above embodiments is implemented. The term matching device 200 therefore has the effects of the term matching method of any of the above embodiments, which are not repeated here.

Example four

As shown in FIG. 3, a terminal 300 according to an embodiment of the present application includes the term matching device 200 described in the third embodiment. When running, the terminal 300 can implement: calculating multiple similarity values between the first term and the second term according to multiple similarity calculation algorithms; and assigning a weight to each similarity value, multiplying the multiple similarity values by their corresponding weights, and adding the products to obtain the weighted sum similarity of the multiple similarity values, where the value of the weighted sum similarity indicates the degree of matching between the first term and the second term. The terminal 300 therefore has the effects of the term matching method of any of the foregoing embodiments, which are not repeated here.

Example five

As shown in FIG. 4, according to an embodiment of the present application, a computer-readable storage medium 400 is also provided, on which a computer program 402 is stored. When the computer program 402 is executed, the term matching method defined in any of the above embodiments is implemented.

When the computer program 402 is executed, the following is implemented: calculating multiple similarity values between the first term and the second term according to multiple similarity calculation algorithms; and assigning a weight to each similarity value, multiplying the multiple similarity values by their corresponding weights, and adding the products to obtain a weighted sum similarity of the multiple similarity values, where the value of the weighted sum similarity indicates the degree of matching between the first term and the second term.

According to the computer program 402 of the above technical solution, optionally, the following is implemented: designating a term in the first terminology system as the first term, and taking any term in the second terminology system as the second term; calculating multiple similarity values between the first term and the second term according to multiple similarity calculation algorithms; assigning a weight to each similarity value, multiplying the multiple similarity values by their corresponding weights, and adding the products to obtain the weighted sum similarity of the multiple similarity values; and changing the second term multiple times and performing the calculation once for each change, to generate multiple weighted sum similarities, where the maximum of the multiple weighted sum similarities indicates the degree of matching between the designated term in the first terminology system and a second term in the second terminology system.

According to the computer program 402 of the above technical solution, optionally, the following is implemented: taking a term in the first terminology system as the first term, and taking a term in the second terminology system as the second term; calculating multiple similarity values between the first term and the second term according to multiple similarity calculation algorithms; assigning a weight to each similarity value, multiplying the multiple similarity values by their corresponding weights, and adding the products to obtain the weighted sum similarity of the multiple similarity values; performing the calculation multiple times while changing the first term and the second term, to generate multiple weighted sum similarities; and summing the multiple weighted sum similarities to generate a total matching degree value, where the total matching degree value indicates the degree of matching between the first terminology system and the second terminology system.

According to the computer program 402 of the above technical solution, optionally, the calculation process further includes: in the step of assigning weights, performing the weighted summation of the multiple similarity values with multiple weight combinations, so that each weight combination generates one corresponding total matching degree value and the multiple weight combinations generate multiple total matching degree values; and recording the maximum of the multiple total matching degree values, which represents the matching result of the first terminology system and the second terminology system.

According to the computer program 402 of the foregoing technical solution, optionally, calculating multiple similarity values between the first term and the second term according to multiple similarity calculation algorithms includes: separately calculating the cosine similarity value, the Jaccard similarity value, and the hash similarity value between the first term and the second term.

According to the computer program 402 of the above technical solution, optionally, calculating the cosine similarity value between the first term and the second term includes: segmenting the first term and the second term based on a word segmentation dictionary, and removing stop words from the first term and the second term based on a stop word dictionary, to generate a first phrase list corresponding to the first term and a second phrase list corresponding to the second term; encoding the first phrase list and the second phrase list to obtain a first word frequency vector corresponding to the first phrase list and a second word frequency vector corresponding to the second phrase list; and calculating the cosine value between the first word frequency vector and the second word frequency vector, where the cosine value is the cosine similarity value of the two word frequency vectors. The larger the cosine value, the higher the similarity.

According to the computer program 402 of the above technical solution, optionally, calculating the Jaccard similarity value between the first term and the second term includes: segmenting the first term and the second term based on a word segmentation dictionary, and removing stop words from the first term and the second term based on a stop word dictionary, to generate a first phrase list corresponding to the first term and a second phrase list corresponding to the second term; encoding the first phrase list and the second phrase list to obtain a first word frequency vector corresponding to the first phrase list and a second word frequency vector corresponding to the second phrase list; and calculating the ratio of the intersection to the union of the first word frequency vector and the second word frequency vector to obtain the Jaccard similarity value, where the greater the Jaccard similarity value, the higher the similarity.

According to the computer program 402 of the above technical solution, optionally, calculating the hash similarity value between the first term and the second term includes: segmenting the first term and the second term based on a word segmentation dictionary, and removing stop words from the first term and the second term based on a stop word dictionary, to generate a first phrase list corresponding to the first term and a second phrase list corresponding to the second term; converting each word in the first phrase list and the second phrase list into a hash value number string, and multiplying the hash value number string by the weight of the word to obtain the sequence string of each word; adding the sequence strings of the multiple words in the first phrase list to obtain a first term sequence string corresponding to the first phrase list, and adding the sequence strings of the multiple words in the second phrase list to obtain a second term sequence string corresponding to the second phrase list; converting the first term sequence string and the second term sequence string into binary strings; calculating the Hamming distance between the binary string of the first term sequence string and the binary string of the second term sequence string; and determining the hash similarity value between the first term and the second term according to the Hamming distance, where the greater the hash similarity value, the higher the similarity. The hash similarity is calculated as S = 1/(h+1), where S is the hash similarity and h is the Hamming distance.

Through the term matching method, device, terminal and computer-readable storage medium disclosed in the above embodiments, this application can automatically match terms between terminology systems (term dictionaries), replacing manual operations, reducing error rates, and helping to promote the integration, analysis and reuse of medical data.

The embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, Compact Disc-Read Only Memory (CD-ROM), optical storage, etc.) containing computer-usable program code.

This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. Each process and/or block in the flowchart and/or block diagram, and any combination of processes and/or blocks in the flowchart and/or block diagram, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable term matching device to produce a machine, so that the instructions executed by the processor of the computer or other programmable term matching device produce a device that realizes the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.

These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable term matching equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable term matching equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.

The word "include" does not exclude the presence of unlisted parts or steps. The word "a" or "an" preceding a component does not exclude the presence of multiple such components. The application can be implemented by means of hardware including different components and by means of a suitably programmed computer. Multiple of the listed devices may be embodied by the same hardware item.

Claims

A term matching method including:

Calculate multiple similarity values between the first term and the second term according to multiple similarity calculation algorithms;

A weight is assigned to each similarity value, the multiple similarity values are respectively multiplied by the corresponding weights, and the product results are added to obtain the weighted sum similarity of the multiple similarity values, wherein the value of the weighted sum similarity is used to indicate the degree of matching between the first term and the second term.
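As a non-limiting illustration, the weighted summation recited above might be sketched as follows. The function name and the example weights are hypothetical; the claim does not require the weights to sum to 1, although the example below happens to use such a combination.

```python
def weighted_sum_similarity(similarities, weights):
    """Multiply each similarity value by its weight and add the products."""
    if len(similarities) != len(weights):
        raise ValueError("each similarity value needs exactly one weight")
    return sum(s * w for s, w in zip(similarities, weights))


if __name__ == "__main__":
    # Hypothetical similarity values from three algorithms and their weights.
    similarities = [0.8, 0.5, 0.25]
    weights = [0.5, 0.25, 0.25]
    print(weighted_sum_similarity(similarities, weights))
```

When every similarity value lies in [0, 1] and the weights are non-negative and sum to 1, the result also lies in [0, 1], which makes results for different term pairs directly comparable.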
The method according to claim 1, further comprising:

Specify a term in the first terminology system as the first term, and take a term in the second terminology system as the second term.
The method according to claim 2, after said obtaining the weighted sum similarity of the multiple similarity values, further comprising:

The value of the second term is changed multiple times, and each time the value of the second term is changed, a weighted sum similarity calculation is performed to generate multiple weighted sum similarities, wherein the maximum value among the multiple weighted sum similarities is used to indicate the degree of matching between the first term and the second term in the second terminology system.
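As a non-limiting illustration, taking each term of the second terminology system in turn and keeping the maximum score might be sketched as follows. The function `best_match` and the scorer `toy_score` are hypothetical; `toy_score` is a simple word overlap measure that stands in for the weighted sum similarity only so that the sketch is self-contained.

```python
def best_match(first_term, second_system, score):
    """Score first_term against every term in second_system and return
    the best-scoring candidate together with its score.

    Returns (None, -inf) when second_system is empty.
    """
    best_term, best_score = None, float("-inf")
    for candidate in second_system:
        value = score(first_term, candidate)
        if value > best_score:
            best_term, best_score = candidate, value
    return best_term, best_score


def toy_score(a, b):
    """Hypothetical stand-in for the weighted sum similarity:
    overlap of the two word sets."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0


if __name__ == "__main__":
    second_system = ["diabetes type 2", "hypertension", "type 1 diabetes"]
    print(best_match("type 2 diabetes", second_system, toy_score))
```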
The method according to claim 2, after said obtaining the weighted sum similarity of the multiple similarity values, further comprising:

Change the value of the first term and the value of the second term multiple times, and perform a weighted sum similarity calculation each time the value of the first term and the value of the second term are changed, to generate multiple weighted sum similarities;

A summing operation is performed on the multiple weighted summation similarities to generate a total matching degree value, where the total matching degree value is used to represent the matching degree between the first terminology system and the second terminology system.
The method according to claim 4, wherein assigning a weight to each similarity value, respectively multiplying the multiple similarity values by the corresponding weights, and adding the product results to obtain the weighted sum similarity of the multiple similarity values includes:

Performing a weighted summation on the multiple similarity values through multiple weight combinations to obtain multiple weighted sum similarities corresponding to each weight combination;

Performing the summing operation on the multiple weighted sum similarities to generate the total matching degree value includes:

Corresponding to the multiple weight combinations, performing a summation operation on the multiple weighted summation similarities respectively to generate multiple total matching degree values;

The maximum value of the plurality of total matching degree values is recorded, wherein the maximum value of the plurality of total matching degree values is used to represent the matching result of the first terminology system and the second terminology system.
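As a non-limiting illustration, evaluating several weight combinations and recording the largest total matching degree value might be sketched as follows. The function names, the similarity values, and the candidate weight combinations are hypothetical. Each entry of `pair_similarities` is assumed to hold the similarity values already computed for one pair of a first term and a second term.

```python
def total_matching_degree(pair_similarities, weights):
    """Sum the weighted sum similarities over all term pairs."""
    return sum(
        sum(s * w for s, w in zip(similarities, weights))
        for similarities in pair_similarities
    )


def best_weight_combination(pair_similarities, weight_combinations):
    """Return (maximum total matching degree, weight combination that gives it)."""
    scored = (
        (total_matching_degree(pair_similarities, weights), weights)
        for weights in weight_combinations
    )
    return max(scored, key=lambda item: item[0])


if __name__ == "__main__":
    # Hypothetical (cosine, Jaccard, hash) similarity values for two term pairs.
    pair_similarities = [(1.0, 0.5, 0.0), (0.5, 0.5, 1.0)]
    weight_combinations = [
        (0.5, 0.25, 0.25),
        (0.25, 0.5, 0.25),
        (0.25, 0.25, 0.5),
    ]
    print(best_weight_combination(pair_similarities, weight_combinations))
```

Comparing totals across weight combinations is most meaningful when all combinations share the same scale, for example when each sums to 1. That constraint is an assumption of this sketch, not a requirement stated in the claims.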
The method according to any one of claims 1 to 5, wherein the calculating the multiple similarity values of the first term and the second term respectively according to multiple similarity calculation algorithms includes:

The cosine similarity value, the Jaccard similarity value and the hash similarity value of the first term and the second term are respectively calculated.
The method according to claim 6, wherein said calculating the cosine similarity value of said first term and said second term comprises:

The first term and the second term are segmented based on the word segmentation dictionary, and stop words are removed from the first term and the second term based on the stop word dictionary, to generate a first phrase list corresponding to the first term and a second phrase list corresponding to the second term;

Encoding the first phrase list and the second phrase list to obtain a first term frequency vector corresponding to the first phrase list and a second term frequency vector corresponding to the second phrase list;

Calculate the cosine value between the first term frequency vector and the second term frequency vector, where the cosine value is the cosine similarity value of the first term frequency vector and the second term frequency vector.
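As a non-limiting illustration, the cosine value between two term frequency vectors might be computed as sketched below. The inputs are assumed to be phrase lists that have already been segmented and stripped of stop words. Returning 0 for an empty phrase list is a choice made for this sketch, not something the source specifies.

```python
import math
from collections import Counter


def cosine_similarity(first_phrase_list, second_phrase_list):
    """Cosine of the angle between the two term frequency vectors."""
    a = Counter(first_phrase_list)
    b = Counter(second_phrase_list)
    # Words missing from b count as 0, so only shared words contribute.
    dot_product = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty phrase list matches nothing
    return dot_product / (norm_a * norm_b)


if __name__ == "__main__":
    first = ["acute", "myocardial", "infarction"]
    second = ["myocardial", "infarction"]
    print(cosine_similarity(first, second))
```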
The method according to claim 6, wherein the calculating the Jaccard similarity value of the first term and the second term comprises:

The first term and the second term are segmented based on the word segmentation dictionary, and stop words are removed from the first term and the second term based on the stop word dictionary, to generate a first phrase list corresponding to the first term and a second phrase list corresponding to the second term;

Encoding the first phrase list and the second phrase list to obtain a first term frequency vector corresponding to the first phrase list and a second term frequency vector corresponding to the second phrase list;

Calculate the ratio of the intersection to the union of the first term frequency vector and the second term frequency vector to obtain the Jaccard similarity value.
The method according to claim 6, wherein said calculating the hash similarity value of the first term and the second term comprises:

The first term and the second term are segmented based on the word segmentation dictionary, and stop words are removed from the first term and the second term based on the stop word dictionary, to generate a first phrase list corresponding to the first term and a second phrase list corresponding to the second term;

Converting each word in the first phrase list and the second phrase list into a hash value digital string, and the hash value digital string is multiplied by the weight of the word to obtain a sequence string of each word;

The sequence strings of multiple words in the first phrase list are added together to obtain the first term sequence string corresponding to the first phrase list, and the sequence strings of multiple words in the second phrase list are added together to obtain the second term sequence string corresponding to the second phrase list;

Respectively converting the first term sequence string and the second term sequence string into a binary string;

Calculating the Hamming distance between the binary string of the first term sequence string and the binary string of the second term sequence string;

Determine the hash similarity value between the first term and the second term according to the Hamming distance.
A term matching device, comprising: a memory, a processor, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the term matching method according to any one of claims 1 to 9.
A terminal including:

The term matching device according to claim 10.
A computer-readable storage medium storing a computer program, wherein when the computer program is executed, the term matching method according to any one of claims 1 to 9 is implemented.