CN117725923A - Text matching method, device, equipment and medium - Google Patents


Info

Publication number: CN117725923A
Authority: CN (China)
Prior art keywords: text, word, word segmentation, similarity, segmentation
Legal status: Pending (as listed by Google Patents; not a legal conclusion)
Application number: CN202310715607.7A
Other languages: Chinese (zh)
Inventor: 李军伟
Current assignee: Xiaohongshu Technology Co ltd
Original assignee: Xiaohongshu Technology Co ltd
Application filed by Xiaohongshu Technology Co ltd; priority to CN202310715607.7A; published as CN117725923A.

Landscapes: Information retrieval, DB structures and FS structures therefor (AREA)

Abstract

The embodiments of the present application disclose a text matching method, apparatus, device, and medium, applicable to the technical field of data processing. The method comprises: acquiring a first text and a second text to be matched, and determining the text similarity between them; acquiring a first text word segment set associated with the first text and a second text word segment set associated with the second text; determining the word weight of each first text word segment in the first text and the word weight of each second text word segment in the second text; determining the word segmentation similarity between the two word segment sets based on these word weights; and determining the text matching degree between the first text and the second text from the text similarity and the word segmentation similarity. The embodiments of the present application can improve the accuracy of text matching.

Description

Text matching method, device, equipment and medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a text matching method, apparatus, device, and medium.
Background
Text matching is a common task: search recall and ranking, search-based question answering, and similar search functions are all essentially text matching tasks, in which a piece of text is given as the query and the most relevant documents or answers are matched and returned to the user. Determining the degree of matching between two texts is therefore an important problem. Existing methods extract text features to compute a similarity value for two pieces of text and use that value to judge whether the two pieces of text describe similar content. Such methods are not sufficiently accurate, so improving the accuracy of text matching is an urgent problem.
Disclosure of Invention
The embodiment of the application provides a text matching method, device, equipment and medium, which can improve the accuracy of text matching.
In one aspect, an embodiment of the present application provides a text matching method, where the method includes:
acquiring a first text and a second text to be matched, and determining the text similarity between the first text and the second text;
acquiring a first text word segment set associated with the first text and a second text word segment set associated with the second text, where the first text word segment set comprises at least one first text word segment and the second text word segment set comprises at least one second text word segment;
determining the word weight of each first text word segment in the first text and the word weight of each second text word segment in the second text;
determining the word segmentation similarity between the first text word segment set and the second text word segment set based on the word weight of each first text word segment and the word weight of each second text word segment;
and determining the text matching degree between the first text and the second text through the text similarity and the word segmentation similarity.
In one aspect, an embodiment of the present application provides a text matching apparatus, including:
an acquisition module, configured to acquire a first text and a second text to be matched and determine the text similarity between the first text and the second text;
the acquisition module is further configured to acquire a first text word segment set associated with the first text and a second text word segment set associated with the second text, where the first text word segment set comprises at least one first text word segment and the second text word segment set comprises at least one second text word segment;
a processing module, configured to determine the word weight of each first text word segment in the first text and the word weight of each second text word segment in the second text;
the processing module is further configured to determine the word segmentation similarity between the first text word segment set and the second text word segment set based on the word weight of each first text word segment and the word weight of each second text word segment;
and the processing module is further configured to determine the text matching degree between the first text and the second text from the text similarity and the word segmentation similarity.
In one aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory is configured to store a computer program, and the computer program includes program instructions, and the processor is configured to invoke the program instructions to perform some or all of the steps in the above method.
In one aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program comprising program instructions for performing part or all of the steps of the above method when executed by a processor.
In one aspect, an embodiment of the present application provides a computer program product or computer program comprising computer instructions that, when executed by a processor, implement some or all of the steps of the above method.
In the embodiments of the present application, a first text and a second text to be matched are acquired, and the text similarity between them is determined; the text similarity is the overall similarity between the two texts, that is, their coarse-grained similarity. A first text word segment set associated with the first text and a second text word segment set associated with the second text are acquired, and the word weight of each first text word segment in the first text and of each second text word segment in the second text is determined. The word segmentation similarity between the two word segment sets is then determined from these word weights; the word segmentation similarity is the local similarity between the word segments of the first text and those of the second text, that is, their fine-grained similarity. Finally, the text matching degree between the first text and the second text is determined from the text similarity and the word segmentation similarity. Because the text matching degree combines coarse-grained and fine-grained similarity, its accuracy and reliability can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a text matching scenario provided in an embodiment of the present application;
Fig. 2 is a schematic flow chart of a text matching method according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of a text matching method according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a text matching process according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a text matching apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The text matching method provided in the embodiments of the present application is executed by an electronic device, which may be a server or a terminal. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, or a desktop computer.
A schematic diagram of a text matching scenario based on this text matching method is shown in Fig. 1. Fig. 1 shows a network architecture that may include a service server and a user terminal cluster; the user terminal cluster may include one or more user terminals, and the number of user terminals is not limited here. User terminals in the cluster may be communicatively connected to one another, and any user terminal in the cluster may be communicatively connected to the service server, so that each user terminal can exchange data with the service server over that connection. The manner of connection is not limited: it may be a direct or indirect wired connection, a direct or indirect wireless connection, or another manner. The electronic device in the embodiments of the present application may be the service server shown in Fig. 1 or any user terminal in the user terminal cluster shown in Fig. 1.
For example, in the embodiments of the present application, the server may acquire a first text and a second text to be matched and determine their matching degree using the text matching method provided by the present application. Specifically, on acquiring a first text 10 and a second text 11 to be matched, the server may: determine the text similarity 12 between the first text 10 and the second text 11, which is the coarse-grained similarity between them; acquire a first text word segment set 13 obtained by segmenting the first text 10 and a second text word segment set 14 obtained by segmenting the second text 11, where the first text word segment set 13 comprises at least one first text word segment (e.g., 13a, 13b, ..., 13n) and the second text word segment set 14 comprises at least one second text word segment (e.g., 14a, 14b, ..., 14m); determine a word weight 15 (e.g., 15a, 15b, ..., 15n) of each first text word segment in the first text 10 and a word weight 16 (e.g., 16a, 16b, ..., 16m) of each second text word segment in the second text 11; determine, based on the word weights 15 and 16, the word segmentation similarity 17 between the first text word segment set 13 and the second text word segment set 14, which is the fine-grained similarity between the first text 10 and the second text 11; and determine the text matching degree 18 between the first text 10 and the second text 11 from the text similarity 12 and the word segmentation similarity 17, i.e., from similarities at multiple granularities.
Optionally, in some embodiments, the electronic device may perform the text matching method according to actual business requirements to achieve accurate text matching; the technical solution can be applied to any text matching scenario. For example, in an information retrieval scenario, the first text may be a query text and the second text may be a retrieved piece of text information; the text information to return to the user is selected by determining the text matching degree between the first text and the second text. As another example, in an e-commerce recommendation scenario, the first text may be a product query text and the second text may be the title text of a candidate product; the products to recommend to the user are determined by the text matching degree between the first text and the second text.
Optionally, data involved in the present application, such as the first text and the second text, may be stored in a database or in a blockchain (for example, by a blockchain distributed system); this is not limited in the present application.
In the specific embodiments of the present application, where user-related data is acquired (for example, a first text or second text uploaded by a user), the user's permission or consent must be obtained. That is, when the embodiments of the present application are applied to a specific product or technology, the collection, use, and processing of user data must comply with the applicable national and regional laws and regulations. For example, prompt information may be presented on an interactive interface to tell the user which data will be collected, listing the type and content of the data, and the data is collected and processed only after a confirmation operation or an instruction permitting collection is received on that interface.
It can be understood that the above scenario is merely an example, and does not constitute a limitation on the application scenario of the technical solution provided in the embodiments of the present application, and the technical solution of the present application may also be applied to other scenarios. For example, as one of ordinary skill in the art can know, with the evolution of the system architecture and the appearance of new service scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
Based on the above description, the embodiments of the present application propose a text matching method, which may be performed by the above-mentioned electronic device. Referring to fig. 2, fig. 2 is a flow chart of a text matching method according to an embodiment of the present application.
As shown in fig. 2, the flow of the text matching method in the embodiment of the present application may include the following:
S101, acquiring a first text and a second text to be matched, and determining the text similarity between the first text and the second text.
The first text and the second text may be texts to be matched in any scenario. For example, the first text may be a user's social media post and the second text may be a product title. As another example, the first text may be a query entered by the user and the second text may be the text of a note. The type and source of the first and second texts are not limited here; the technical solution can determine the matching degree of any two texts.
To determine the text similarity between the first text and the second text, a text encoding vector of the first text and a text encoding vector of the second text are obtained. A first text processing model is called to process the text encoding vector of the first text to obtain a text feature vector of the first text, and to process the text encoding vector of the second text to obtain a text feature vector of the second text. The text similarity between the first text and the second text is then determined from the two text feature vectors. The first text processing model may be a Transformer (neural network) model. The text similarity is the overall similarity between the first text and the second text, i.e., their coarse-grained similarity.
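The description does not say which function turns the two text feature vectors into a similarity score. The sketch below assumes cosine similarity, a common choice for comparing encoder outputs, and takes the feature vectors as given instead of running a Transformer; the vector values are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def text_similarity(first_feature: Sequence[float], second_feature: Sequence[float]) -> float:
    """Coarse-grained similarity of two whole texts.

    The arguments stand for the text feature vectors the first text processing
    model would output; cosine similarity is an assumed choice of function.
    """
    return cosine_similarity(first_feature, second_feature)


# Hypothetical feature vectors standing in for model outputs.
first_vec = [0.2, 0.7, 0.1]
second_vec = [0.25, 0.6, 0.0]
print(round(text_similarity(first_vec, second_vec), 3))
```

Cosine ignores vector length, so two texts whose encodings point the same way score 1.0 regardless of magnitude.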
S102, acquiring a first text word segmentation set associated with the first text and a second text word segmentation set associated with the second text.
Feature processing (word segmentation) may be performed on the first text to obtain the first text word segment set associated with it, which comprises at least one first text word segment. Similarly, feature processing may be performed on the second text to obtain the second text word segment set associated with it, which comprises at least one second text word segment.
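The segmenter itself is not specified. As a minimal stand-in, assuming Latin-script input, the sketch below splits on word characters, drops stopwords and duplicates, and keeps first-occurrence order. For Chinese text a dictionary- or model-based segmenter would be needed instead; the function name `segment` is hypothetical.

```python
import re
from typing import Iterable, List

# Runs of letters/digits/underscore (\w also matches CJK characters).
_TOKEN_RE = re.compile(r"\w+")


def segment(text: str, stopwords: Iterable[str] = ()) -> List[str]:
    """Return the word segment set of `text` as an ordered, de-duplicated list.

    Regex stand-in for a real word segmenter; it does NOT split Chinese text
    into words (a run of CJK characters stays a single token).
    """
    stop = set(stopwords)
    seen = set()
    segments: List[str] = []
    for token in _TOKEN_RE.findall(text.lower()):
        if token in stop or token in seen:
            continue
        seen.add(token)
        segments.append(token)
    return segments


first_set = segment("Red running shoes for women, red", stopwords={"for"})
second_set = segment("Lightweight running shoes")
print(first_set)   # ['red', 'running', 'shoes', 'women']
print(second_set)  # ['lightweight', 'running', 'shoes']
```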
S103, determining the word weight of each first text word in the first text and the word weight of each second text word in the second text respectively.
The word weight characterizes how important a word segment is within its text. Word weights are determined in the same way for first and second text word segments; the first text word segments are used as the example. A first semantic association degree is determined between each first text word segment and its associated word segments, where the word segments associated with a given first text word segment are all the other first text word segments in the first text word segment set. A second semantic association degree is determined between each first text word segment and the first text as a whole. The word weight of each first text word segment in the first text is then determined from its first and second semantic association degrees.
To determine the first semantic association degree, a pre-trained semantic feature extraction model may be obtained; this may be a neural network model built from multi-head attention layers. The first text word segments are input into the model in sequence, and its multi-head attention component performs attention extraction over each pair of first text word segments and outputs multiple attention matrices for the first text word segment set. Each attention matrix indicates the initial semantic association degree between each first text word segment and the other first text word segments. The attention matrices are then merged into a merged attention matrix, which indicates the semantic association degree between each first text word segment and the other first text word segments. The number of attention matrices is not limited and may be set according to the actual situation.
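How one attention matrix per head can arise is sketched below using standard scaled dot-product attention, softmax(Q K^T / sqrt(d_k)). The patent does not give the model's internals, so the projection sizes, random weights, and helper names are hypothetical; the point is only that each head yields an n x n matrix whose rows sum to 1.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax of one row."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention(x: Matrix, w_q: Matrix, w_k: Matrix) -> Matrix:
    """One head: softmax(Q K^T / sqrt(d_k)).

    Entry (i, j) is read as the initial semantic association of word segment i
    with word segment j; each row sums to 1.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    d_k = len(w_q[0])
    scores = [[sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
              for q_row in q]
    return [softmax(r) for r in scores]


def multi_head_attention_matrices(x: Matrix, heads: List[Tuple[Matrix, Matrix]]) -> List[Matrix]:
    """One n x n attention matrix per (W_q, W_k) head projection pair."""
    return [head_attention(x, w_q, w_k) for w_q, w_k in heads]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical setup: 3 word segments with 4-dim encodings, 2 heads of size 2.
rng = random.Random(0)
segment_encodings = random_matrix(rng, 3, 4)
heads = [(random_matrix(rng, 4, 2), random_matrix(rng, 4, 2)) for _ in range(2)]
attention = multi_head_attention_matrices(segment_encodings, heads)
print(len(attention), "matrices of size", len(attention[0]), "x", len(attention[0][0]))
```

In a trained model the projection weights would come from pre-training rather than a random generator.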
The merged attention matrix thus represents the semantic association degree between any two first text word segments in the first text word segment set. For example, the entry in the first row and second column of the merged attention matrix represents the semantic association degree between the first and the second first text word segments in the set. The merging may average the attention matrices and take the averaged matrix as the merged attention matrix; alternatively, it may sum the attention matrices and take the summed matrix as the merged attention matrix. The first semantic association degree between each first text word segment and its associated word segments can then be determined from the merged attention matrix.
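The two merging options (element-wise mean or sum) are sketched below. The patent does not say how a row of the merged matrix is reduced to one first semantic association degree per word segment; this sketch assumes the mean of the row with the diagonal (self-association) excluded, which matches "associated word segments = all other segments". The matrix values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def merge_attention(matrices: List[Matrix], mode: str = "mean") -> Matrix:
    """Merge per-head attention matrices element-wise by averaging or summing."""
    if not matrices:
        raise ValueError("need at least one attention matrix")
    n = len(matrices[0])
    summed = [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]
    if mode == "sum":
        return summed
    if mode == "mean":
        return [[v / len(matrices) for v in row] for row in summed]
    raise ValueError(f"unknown merge mode: {mode!r}")


def first_association(merged: Matrix) -> List[float]:
    """First semantic association degree of each word segment.

    Assumption: the degree of segment i with respect to its associated segments
    (all other segments) is the mean of row i with the diagonal excluded.
    """
    n = len(merged)
    if n < 2:
        return [0.0] * n
    return [sum(merged[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


# Two hypothetical 3 x 3 head attention matrices (rows sum to 1).
head_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
head_b = [[0.4, 0.5, 0.1], [0.4, 0.3, 0.3], [0.3, 0.1, 0.6]]
merged = merge_attention([head_a, head_b])
print([round(v, 2) for v in merged[0]])  # [0.5, 0.4, 0.1]
print([round(v, 2) for v in first_association(merged)])  # [0.25, 0.3, 0.15]
```

Here `merged[0][1]` (0.4) is the first-row, second-column entry: the association between the first and second word segments.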
In addition, when the multi-head attention component determines the attention matrices, it first outputs multiple pieces of content characterization information (i.e., feature information) for each first text word segment; the number of pieces of content characterization information for one first text word segment equals the number of attention matrices. The semantic feature of each first text word segment may be determined from its content characterization information, for example by taking their mean (or sum). The second semantic association degree between each first text word segment and the first text may then be determined as follows: the semantic feature of the first text is determined from the semantic features of the first text word segments, for example as their mean (or sum), and the feature correlation between the semantic feature of each first text word segment and the semantic feature of the first text is taken as that word segment's second semantic association degree.
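The two pooling steps and the correlation step can be sketched as follows, assuming mean pooling in both places and cosine similarity as the "feature correlation" (the patent leaves that measure open). `head_outputs[h][i]` is a hypothetical layout for the content characterization information of segment i from head h.

```python
import math
from typing import List

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def second_association(head_outputs: List[List[Vector]]) -> List[float]:
    """Second semantic association degree of each word segment with its text.

    head_outputs[h][i] is the content characterization information of word
    segment i produced by attention head h.
    Assumptions: mean for both pooling steps, cosine as "feature correlation".
    """
    num_segments = len(head_outputs[0])
    # Semantic feature of each segment: mean over its per-head outputs.
    segment_features = [mean_vectors([head[i] for head in head_outputs])
                        for i in range(num_segments)]
    # Semantic feature of the whole text: mean over its segments.
    text_feature = mean_vectors(segment_features)
    return [cosine(f, text_feature) for f in segment_features]


# Hypothetical outputs of 2 heads for 3 word segments (2-dim each).
outputs = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # head 0
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # head 1
]
print([round(x, 3) for x in second_association(outputs)])  # [0.707, 0.707, 1.0]
```

The third segment points in the same direction as the pooled text feature, so it receives the highest degree.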
Accordingly, to determine the word weight of each first text word segment from its first and second semantic association degrees, the two degrees may be combined by weighted summation to obtain a target semantic association degree for each first text word segment, and the word weight is then determined from a mapping between target semantic association degree and word weight. Both the weighting coefficients and the mapping may be set from empirical values. The target semantic association degree represents how strongly a first text word segment is associated within the first text, i.e., how important it is to the first text: the higher the target semantic association degree, the better the word segment represents the core meaning of the first text, which is why the word weight can be derived from it.
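The weighted summation is stated in the text; the mapping from target degree to word weight is only said to be empirical. The sketch assumes a temperature-scaled softmax, which keeps the order of target degrees and makes the weights sum to 1. `alpha`, `beta`, and `temperature` are hypothetical settings, and the input values are illustrative.

```python
import math
from typing import List


def word_weights(first_assoc: List[float], second_assoc: List[float],
                 alpha: float = 0.5, beta: float = 0.5,
                 temperature: float = 1.0) -> List[float]:
    """Word weight of each word segment from its two association degrees.

    target_i = alpha * first_i + beta * second_i   (weighted summation)
    weight_i = softmax(target / temperature)_i     (assumed mapping)
    """
    if len(first_assoc) != len(second_assoc):
        raise ValueError("both association lists must cover the same segments")
    target = [alpha * a + beta * b for a, b in zip(first_assoc, second_assoc)]
    scaled = [t / temperature for t in target]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative association degrees for three word segments.
weights = word_weights([0.25, 0.3, 0.15], [0.707, 0.707, 1.0], temperature=0.1)
print([round(w, 3) for w in weights])
```

A lower temperature sharpens the contrast between important and unimportant segments; any monotonic mapping would preserve the ranking used in the next step.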
Similarly, the word weight of each second text word segment in the second text may be determined in the manner described above.
S104, determining word segmentation similarity between the first text word segmentation set and the second text word segmentation set based on the word weight of each first text word segmentation and the word weight of each second text word segmentation.
To determine the word segmentation similarity between the first text word segment set and the second text word segment set, the first text word segments may be sorted in descending order of word weight to obtain a sorted first text word segment set, and the second text word segments may likewise be sorted in descending order of word weight to obtain a sorted second text word segment set. The word segment feature vector of each first text word segment is determined from the sorted first text word segment set, the word segment feature vector of each second text word segment is determined from the sorted second text word segment set, and the word segmentation similarity between the two sets is determined from these feature vectors. Because the feature vectors are computed from the sorted sets, the feature vector of each word segment incorporates its word weight information. How the word segment feature vectors are determined is described in the following embodiment.
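The sorting step is simple enough to sketch directly. Python's `sorted` is stable even with `reverse=True`, so segments with equal weights keep their original order; the tie-breaking rule is an assumption, since the patent does not address ties. Segment names and weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def sort_by_weight(segments: Sequence[str], weights: Sequence[float]) -> Tuple[List[str], List[float]]:
    """Sort word segments in descending order of word weight.

    Python's sort is stable, so segments with equal weights keep their
    original relative order (also with reverse=True).
    """
    if len(segments) != len(weights):
        raise ValueError("one weight per word segment is required")
    order = sorted(range(len(segments)), key=lambda i: weights[i], reverse=True)
    return [segments[i] for i in order], [weights[i] for i in order]


segs, ws = sort_by_weight(["red", "running", "shoes", "women"], [0.1, 0.3, 0.4, 0.2])
print(segs)  # ['shoes', 'running', 'women', 'red']
```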
Specifically, to determine the word segmentation similarity from the word segment feature vectors, the similarity between each first text word segment and each second text word segment is first computed from their feature vectors. For any first text word segment, the maximum of its similarities to all second text word segments is taken as its target word segment similarity; for any second text word segment, the maximum of its similarities to all first text word segments is taken as its target word segment similarity. Taking each first text word segment and each second text word segment in turn yields the target word segment similarity of every word segment, and the word segmentation similarity between the first text word segment set and the second text word segment set is determined from the target word segment similarities of the first text word segments and of the second text word segments.
There are several options for this final step. The maximum target word segment similarity among the first text word segments and the maximum among the second text word segments may be used as the word segmentation similarity between the two sets. Alternatively, the average target word segment similarity of the first text word segments and the average target word segment similarity of the second text word segments may be used. Alternatively, the target word segment similarities of all first text word segments and of all second text word segments may be used directly as the word segmentation similarity between the two sets.
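The best-match procedure above can be sketched as a pairwise similarity matrix followed by a row-wise and a column-wise maximum. Two assumptions are made here: cosine is used as the pairwise similarity, and the two sides' maxima (or averages) are combined into one number by taking their mean, since the text only says both are "used". The third option simply keeps the lists `first_targets` and `second_targets` as they are. Vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def target_similarities(first_vecs: List[Vector], second_vecs: List[Vector]) -> Tuple[List[float], List[float]]:
    """Best-match (maximum) similarity of every word segment against the other set."""
    sim = [[cosine(a, b) for b in second_vecs] for a in first_vecs]
    first_targets = [max(row) for row in sim]          # max over each row
    second_targets = [max(col) for col in zip(*sim)]   # max over each column
    return first_targets, second_targets


def segment_similarity(first_targets: List[float], second_targets: List[float], mode: str = "mean") -> float:
    """Collapse both sides' target similarities into one score.

    mode="max" uses each side's maximum, mode="mean" each side's average;
    averaging the two sides into a single number is an assumption.
    """
    if mode == "max":
        a, b = max(first_targets), max(second_targets)
    elif mode == "mean":
        a = sum(first_targets) / len(first_targets)
        b = sum(second_targets) / len(second_targets)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return (a + b) / 2


# Hypothetical word segment feature vectors.
first_vecs = [[1.0, 0.0], [0.0, 1.0]]
second_vecs = [[1.0, 0.0], [1.0, 1.0]]
f_t, s_t = target_similarities(first_vecs, second_vecs)
print(round(segment_similarity(f_t, s_t, "mean"), 4))  # 0.8536
print(segment_similarity(f_t, s_t, "max"))             # 1.0
```

The "max" option is dominated by the single best pair, while "mean" rewards texts whose segments all find good counterparts.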
S105, determining the text matching degree between the first text and the second text through the text similarity and the word segmentation similarity.
The text similarity and the word segmentation similarity may be combined by weighted summation to obtain the text matching degree between the first text and the second text, with the weighting coefficients set from empirical values. The text matching degree can be applied in recommendation scenarios: for example, if the first text is a query text and the second texts are candidate recall texts, the second texts that match the first text can be identified by their text matching degree and used as the texts to recall.
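A minimal sketch of the weighted summation and of using it to rank candidate recall texts follows. The equal weights of 0.5 and the candidate ids and scores are hypothetical placeholders for the empirical values the text mentions.

```python
from typing import Dict, List, Optional, Tuple


def matching_degree(text_sim: float, seg_sim: float,
                    w_text: float = 0.5, w_seg: float = 0.5) -> float:
    """Weighted sum of coarse-grained and fine-grained similarity."""
    return w_text * text_sim + w_seg * seg_sim


def rank_candidates(scores: Dict[str, Tuple[float, float]],
                    w_text: float = 0.5, w_seg: float = 0.5,
                    top_k: Optional[int] = None) -> List[str]:
    """Order candidate recall texts by matching degree, highest first.

    scores maps a candidate id to (text similarity, word segmentation similarity).
    """
    ranked = sorted(scores,
                    key=lambda cid: matching_degree(*scores[cid], w_text, w_seg),
                    reverse=True)
    return ranked if top_k is None else ranked[:top_k]


candidates = {"note_a": (0.9, 0.4), "note_b": (0.7, 0.8), "note_c": (0.5, 0.5)}
print(rank_candidates(candidates))           # ['note_b', 'note_a', 'note_c']
print(rank_candidates(candidates, top_k=1))  # ['note_b']
```

With equal weights, `note_b` wins despite a lower overall similarity because its word segments match better; weighting only the text similarity would put `note_a` first.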
The text matching degree between the first text and the second text is thus determined from similarities at multiple granularities (coarse-grained text similarity and fine-grained word segmentation similarity). Beyond coarse-grained semantic relevance, the method also takes into account the similarity of the individual word segments, evaluating the match from several angles and improving the reliability and generalization of the text matching degree.
Referring to fig. 3, fig. 3 is a flowchart of a text matching method according to an embodiment of the present application, where the method may be performed by the above-mentioned electronic device. As shown in fig. 3, the flow of the text matching method in the embodiment of the present application may include the following:
S201, acquiring a first text and a second text to be matched, and determining the text similarity between the first text and the second text.
S202, acquiring a first text word segmentation set associated with a first text and a second text word segmentation set associated with a second text.
S203, determining the word weight of each first text word in the first text and the word weight of each second text word in the second text respectively.
S204, sorting the first text word segments in the first text word segmentation set in descending order of their word weights to obtain an ordered first text word segmentation set, and sorting the second text word segments in the second text word segmentation set in descending order of their word weights to obtain an ordered second text word segmentation set.
For the specific implementation of steps S201 to S203, reference may be made to the related description of the above embodiments, which is not repeated herein.
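Step S204 amounts to a stable descending sort of each word segmentation set by word weight. A minimal sketch, assuming the word segments and their word weights are held in two parallel lists; the function name `sort_by_weight` is hypothetical:

```python
def sort_by_weight(segments, weights):
    """Return word segments sorted by descending word weight."""
    # sorted() stays stable with reverse=True, so equal weights keep their input order.
    ranked = sorted(zip(segments, weights), key=lambda pair: pair[1], reverse=True)
    return [segment for segment, _ in ranked]


print(sort_by_weight(["red", "book", "lipstick"], [0.2, 0.1, 0.7]))
# ['lipstick', 'red', 'book']
```

The same function is applied separately to the first and the second text word segmentation sets.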
S205, determining word segmentation similarity between the first text word segmentation set and the second text word segmentation set based on the ordered first text word segmentation set and the ordered second text word segmentation set.
The word segmentation similarity between the first text word segmentation set and the second text word segmentation set may be determined as follows. The word segmentation encoding vector of each first text word segment in the ordered first text word segmentation set and of each second text word segment in the ordered second text word segmentation set is acquired. The word segmentation encoding vectors of the first text word segments are input into a second text processing model one after another in their sorted order, and the second text processing model performs feature processing on each encoding vector based on the position of the corresponding word segment in the ordered first text word segmentation set, obtaining the word segmentation feature vector of each first text word segment. The word segmentation encoding vectors of the second text word segments are input into the second text processing model in the same way, based on their positions in the ordered second text word segmentation set, to obtain the word segmentation feature vector of each second text word segment. The word segmentation similarity between the two sets is then determined from the word segmentation feature vectors of the first text word segments and those of the second text word segments. The second text processing model may be constructed based on the encoder of a Transformer model or on another neural network model.
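The text states only that the second text processing model uses each word segment's position in the weight-sorted sequence and may be built on a Transformer encoder. One standard way a Transformer encoder receives order information is by adding sinusoidal position encodings to the input vectors; the sketch below assumes that mechanism and shows only this position-injection step (the attention and feed-forward layers are omitted). The names `sinusoidal_position` and `add_positions` are hypothetical.

```python
import math


def sinusoidal_position(pos, dim):
    """Sinusoidal position encoding of the original Transformer for one position."""
    encoding = []
    for i in range(dim):
        # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k/dim).
        angle = pos / (10000 ** ((i - i % 2) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def add_positions(encodings):
    """Add the position encoding of each rank to the word encoding vector at that rank.

    encodings: word segmentation encoding vectors already sorted by descending word weight.
    """
    return [
        [x + p for x, p in zip(vec, sinusoidal_position(pos, len(vec)))]
        for pos, vec in enumerate(encodings)
    ]
```

Because the input is sorted by word weight, the position index here encodes importance rank rather than the original word order.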
Determining the word segmentation similarity between the two sets from the word segmentation feature vectors may proceed as follows. First, the word segmentation similarity between each first text word segment and each second text word segment is determined from their word segmentation feature vectors. Next, a target word segmentation similarity is determined for each first text word segment and for each second text word segment: the target word segmentation similarity of any first text word segment is the maximum of its word segmentation similarities with all second text word segments, and the target word segmentation similarity of any second text word segment is the maximum of its word segmentation similarities with all first text word segments. Finally, the target word segmentation similarities of all first text word segments and all second text word segments together serve as the word segmentation similarity between the first text word segmentation set and the second text word segmentation set.
That is, for each first text word segment, the most similar second text word segment is found in the second text word segmentation set, and for each second text word segment, the most similar first text word segment is found in the first text word segmentation set.
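Given a matrix of pairwise word segmentation similarities, the target similarities are simply the row-wise and column-wise maxima. A minimal sketch, assuming the similarities are already computed and stored as a list of rows; `target_similarities` is a hypothetical name:

```python
def target_similarities(sim):
    """Target word segmentation similarities from a similarity matrix.

    sim[i][j] is the word segmentation similarity between the i-th first text
    word segment and the j-th second text word segment.
    """
    for_first = [max(row) for row in sim]          # best match for each first-text segment
    for_second = [max(col) for col in zip(*sim)]   # best match for each second-text segment
    return for_first, for_second


print(target_similarities([[0.9, 0.1, 0.3], [0.2, 0.4, 0.8]]))
# ([0.9, 0.8], [0.9, 0.4, 0.8])
```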
The word segmentation similarity between two text word segments can be determined from the feature distance between their word segmentation feature vectors, for example through a mapping relationship between feature distance and word segmentation similarity. The smaller the feature distance between the word segmentation feature vectors of two text word segments, the greater their word segmentation similarity; the larger the feature distance, the smaller the similarity. Accordingly, determining the target word segmentation similarity of each first text word segment is equivalent to determining its target feature distance, which is the minimum of the feature distances between that first text word segment and each second text word segment. Likewise, determining the target word segmentation similarity of each second text word segment is equivalent to determining its target feature distance, which is the minimum of the feature distances between that second text word segment and each first text word segment.
Similarly, taking the target word segmentation similarities of all first text word segments and all second text word segments as the word segmentation similarity between the two sets is equivalent to taking their target feature distances as the feature distance between the first text word segmentation set and the second text word segmentation set.
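The distance view can be sketched as follows. The text does not name the distance metric or the distance-to-similarity mapping, so this sketch assumes Euclidean distance and the hypothetical decreasing mapping 1 / (1 + d); any strictly decreasing mapping preserves the property that the minimum distance corresponds to the maximum similarity. All function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean feature distance between two word segmentation feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_to_similarity(d):
    """Hypothetical decreasing mapping: smaller distance -> larger similarity."""
    return 1.0 / (1.0 + d)


def target_distances(first_vecs, second_vecs):
    """Minimum feature distance from each vector in first_vecs to any vector in second_vecs."""
    return [min(euclidean(u, v) for v in second_vecs) for u in first_vecs]


first = [[0.0, 0.0], [3.0, 4.0]]
second = [[0.0, 1.0], [3.0, 0.0]]
print(target_distances(first, second))   # [1.0, 4.0]
```

Calling `target_distances(second, first)` gives the target feature distances of the second text word segments.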
S206, determining the text matching degree between the first text and the second text through the text similarity and the word segmentation similarity.
The text matching degree between the first text and the second text may be the sum of the text similarity and the word segmentation similarity. The text similarity may be derived from the feature distance between the text feature vector of the first text and the text feature vector of the second text, for example through a mapping relationship between feature distance and text similarity: the smaller the feature distance between the text feature vectors of two texts, the greater their text similarity, and the larger the feature distance, the smaller the text similarity.
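A minimal sketch of the sum-based matching degree, assuming cosine similarity between text feature vectors as the text similarity (the training loss below also uses a cosine term, but the text does not fix the inference-time measure) and assuming the word segmentation similarity is aggregated by summing all target word segmentation similarities. Function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matching_degree(text_sim, first_targets, second_targets):
    """Text matching degree as text similarity plus all target word segmentation similarities."""
    return text_sim + sum(first_targets) + sum(second_targets)
```

For example, `matching_degree(0.8, [0.9, 0.8], [0.9, 0.4, 0.8])` combines one coarse-grained score with five fine-grained ones.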
Therefore, determining the text matching degree from the text similarity and the word segmentation similarity can equivalently be understood as determining it from the target feature distance of each first text word segment, the target feature distance of each second text word segment, and the feature distance between the first text and the second text. For example, the sum of these target feature distances and the feature distance between the two texts is used as the target feature distance between the first text and the second text, and the text matching degree is determined from this target feature distance: the smaller the distance, the higher the matching degree.
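In the distance formulation, the coarse and fine-grained distances are summed and then mapped to a matching degree. A minimal sketch; the final mapping 1 / (1 + d) is a hypothetical choice, since the text only requires that a smaller target feature distance yields a higher matching degree. Function names are hypothetical.

```python
def target_text_distance(text_distance, first_targets, second_targets):
    """Target feature distance between two texts: coarse distance plus all target word distances."""
    return text_distance + sum(first_targets) + sum(second_targets)


def matching_from_distance(d):
    """Hypothetical decreasing mapping from target feature distance to matching degree."""
    return 1.0 / (1.0 + d)


d = target_text_distance(0.5, [1.0, 4.0], [1.0, 3.0])   # d == 9.5
score = matching_from_distance(d)
```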
In some embodiments, the text matching degree may instead be determined as follows. A weighting coefficient is determined for each first text word segment according to the word weight of the second text word segment associated with its target word segmentation similarity, i.e., the second text word segment that has the maximum word segmentation similarity with that first text word segment. Likewise, a weighting coefficient is determined for each second text word segment according to the word weight of the first text word segment that has the maximum word segmentation similarity with it. The target word segmentation similarities of all first and second text word segments are then summed with these weighting coefficients to obtain a word segmentation matching degree, and the text matching degree between the first text and the second text is determined from the text similarity and the word segmentation matching degree.
The rationale is as follows. If the second text word segment most similar to a first text word segment has a larger word weight, that first text word segment can be regarded as more important, so its weighting coefficient can be higher; if that most similar second text word segment has a smaller word weight, the first text word segment can be regarded as less important, so its weighting coefficient can be lower. The same applies symmetrically to each second text word segment and the word weight of its most similar first text word segment. The weighting coefficients of the first and second text word segments can be determined according to a mapping relationship between word weight and weighting coefficient.
The text matching degree between the first text and the second text may then be obtained as a weighted sum of the text similarity and the word segmentation matching degree, where the weighting coefficients of this sum may be set according to empirical values.
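The weighted variant can be sketched as follows. Two assumptions are not stated in the text: the mapping from word weight to weighting coefficient is taken to be the word weight itself, normalized so that all coefficients sum to 1, and the final coefficients `alpha` and `beta` default to 0.5 as placeholders for the empirical values. Function names are hypothetical.

```python
def word_matching_degree(sim, w_first, w_second):
    """Weighted word segmentation matching degree.

    sim[i][j]: similarity between first-text segment i and second-text segment j.
    Each target similarity is weighted by the word weight of its best match on the
    other side; weights are normalized to sum to 1 (hypothetical mapping).
    """
    terms = []
    for row in sim:
        j = max(range(len(row)), key=row.__getitem__)    # best-matching second-text segment
        terms.append((w_second[j], row[j]))
    for j in range(len(w_second)):
        col = [row[j] for row in sim]
        i = max(range(len(col)), key=col.__getitem__)    # best-matching first-text segment
        terms.append((w_first[i], col[i]))
    total = sum(w for w, _ in terms)
    return sum(w * s for w, s in terms) / total if total else 0.0


def text_matching_degree(text_sim, word_match, alpha=0.5, beta=0.5):
    """Weighted sum of coarse and fine-grained scores; alpha/beta are placeholder empirical values."""
    return alpha * text_sim + beta * word_match
```

With `sim = [[0.9, 0.1], [0.2, 0.6]]`, first-text weights `[0.7, 0.3]` and second-text weights `[0.5, 0.5]`, the word segmentation matching degree is 1.56 / 2.0 = 0.78.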
Equivalently, in terms of feature distances, a weighting coefficient is determined for each first text word segment according to the word weight of the second text word segment associated with its target feature distance, i.e., the second text word segment with the smallest feature distance to that first text word segment; and a weighting coefficient is determined for each second text word segment according to the word weight of the first text word segment with the smallest feature distance to it. The target feature distances of all first and second text word segments are summed with these weighting coefficients to obtain a word segmentation feature distance. The target feature distance between the first text and the second text is then determined from the feature distance between the two texts and the word segmentation feature distance, and the text matching degree is determined from this target feature distance.
Further, as described above, the text similarity is determined by the first text processing model and the word segmentation similarity by the second text processing model, and the two models may be trained jointly. For example, a sample text pair carrying a text matching degree label is obtained; the sample text pair comprises two matched sample texts, namely a first sample text and a second sample text. A first initial text processing model is invoked to determine the sample text similarity between the first sample text and the second sample text. The word weight of each first sample text word segment in the first sample text word segmentation set and of each second sample text word segment in the second sample text word segmentation set are determined. A second initial text processing model is invoked to determine the sample word segmentation similarity between the two sample word segmentation sets based on these word weights. The sample text matching degree between the first sample text and the second sample text is determined from the sample text similarity and the sample word segmentation similarity, and a model loss value is determined from the sample text matching degree, for example from the sample text matching degree and the text matching degree label. The first and second initial text processing models are trained iteratively with the model loss value to obtain the first text processing model and the second text processing model.
The process and principle by which the first initial text processing model determines the sample text similarity, and by which the sample word segmentation similarity between the first and second sample text word segmentation sets is determined from the word weights of the sample text word segments, are the same as described above.
During training, the first and second text processing models may be updated either once per sample text pair or once per batch of sample text pairs. In the batch case, a sample text pair set is obtained that contains a plurality of sample text pairs, each comprising a first sample text and a second sample text that match each other. The first sample text and the second sample text in the same sample text pair form a positive sample pair, while a first sample text and a second sample text taken from different sample text pairs form a negative sample pair. In addition, when determining the model loss value, a first loss value may be determined from the sample text similarity, a second loss value from the sample word segmentation similarity, and the model loss value from the first and second loss values.
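Building the positive and negative sample pairs from one batch can be sketched as follows, assuming each sample text pair is a `(first_sample_text, second_sample_text)` tuple; `build_pairs` is a hypothetical name. A batch of n pairs yields n positive pairs and n(n - 1) negative pairs.

```python
from itertools import product


def build_pairs(batch):
    """batch: list of (first_sample_text, second_sample_text) matched pairs.

    Positive pairs come from the same sample text pair; negative pairs combine the
    first sample text of one pair with the second sample text of a different pair.
    """
    positives = [(a, b) for a, b in batch]
    negatives = [
        (batch[k][0], batch[l][1])
        for k, l in product(range(len(batch)), repeat=2)
        if k != l
    ]
    return positives, negatives


pos, neg = build_pairs([("A1", "B1"), ("A2", "B2"), ("A3", "B3")])
# 3 positive pairs, 6 negative pairs such as ("A1", "B2")
```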
The loss function $L_c$ for the first loss value is as follows:
where $D_{pos}$ is the set of positive sample pairs and $i$ indexes the positive sample pairs; $D_{neg}$ is the set of negative sample pairs and $k$, $l$ index the negative samples; $W$ denotes the model parameters, $L_2$ is the regularization loss function, and $\lambda$ is a hyperparameter weighting factor. $(A_i, B_i)$ denotes a first sample text and a second sample text from the same sample text pair, and $(A_k, B_l)$ denotes a first sample text and a second sample text from different sample text pairs; contrastive learning of the model is achieved through these positive and negative samples. $A_i$ is the sample text feature vector of the first sample text output by the first initial text processing model, and $B_i$ is the sample text feature vector of the second sample text output by the same model. $\cos(A_i, B_i)$ represents the feature distance between the first sample text and the second sample text, determined from their sample text feature vectors.
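The formula for $L_c$ is not reproduced in this text, so the sketch below is not the patent's loss. It assumes one standard contrastive form (an InfoNCE-style softmax over cosine scores) that uses the same ingredients: positive pairs on the diagonal, negative pairs off the diagonal, and a $\lambda \cdot L_2(W)$ term approximated as `lam` times the sum of squared parameters. The name `contrastive_loss` and its arguments are hypothetical.

```python
import math


def contrastive_loss(sim, lam=0.0, params=()):
    """Hypothetical InfoNCE-style L_c over one batch.

    sim[k][l] = cos(A_k, B_l). Diagonal entries are positive pairs (same sample
    text pair); off-diagonal entries are negative pairs (different pairs).
    lam * sum(w^2) stands in for the lambda * L_2(W) regularization term.
    """
    loss = 0.0
    for i, row in enumerate(sim):
        log_norm = math.log(sum(math.exp(s) for s in row))
        loss += log_norm - row[i]          # -log softmax probability of the positive pair
    return loss / len(sim) + lam * sum(w * w for w in params)
```

The loss falls as positive pairs score higher than the negative pairs in the same row, which is the intended contrastive behavior.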
The loss function $L_s$ for the second loss value is as follows:
where $R$ denotes a batch of sample text pairs; $C_{A,i}$ denotes the target feature distance of the $i$-th first sample text word segment in the first sample text word segmentation set of the first sample text, i.e., the minimum of the feature distances between the $i$-th first sample text word segment and each second sample text word segment; and $C_{B,j}$ denotes the target feature distance of the $j$-th second sample text word segment in the second sample text word segmentation set of the second sample text, i.e., the minimum of the feature distances between the $j$-th second sample text word segment and each first sample text word segment.
The loss function for the model loss value is:
$\mathrm{loss} = \omega_1 L_c + \omega_2 L_s$
where $\omega_1$ and $\omega_2$ are weighting factors. The first and second initial text processing models may be trained with this model loss value. The trained first text processing model then determines the text similarity, the trained second text processing model determines the word segmentation similarity, and the text matching degree is determined from the two. With this training scheme, model training only requires constructing matched sample text pairs, without annotating text matching degrees.
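The formula for $L_s$ is likewise not reproduced in this text. The sketch below assumes, from the variable definitions, that $L_s$ sums the target feature distances $C_{A,i}$ and $C_{B,j}$ of each sample text pair and averages over the batch $R$; the averaging is an assumption. It then combines the two losses as $\omega_1 L_c + \omega_2 L_s$. Function names are hypothetical.

```python
def word_loss(batch_distances):
    """Hypothetical L_s: batch mean of summed target feature distances.

    batch_distances: one matrix per sample text pair, where d[i][j] is the feature
    distance between the i-th first and the j-th second sample text word segment.
    """
    total = 0.0
    for d in batch_distances:
        c_a = [min(row) for row in d]           # C_{A,i}: nearest second-text segment
        c_b = [min(col) for col in zip(*d)]     # C_{B,j}: nearest first-text segment
        total += sum(c_a) + sum(c_b)
    return total / len(batch_distances)


def model_loss(l_c, l_s, w1=1.0, w2=1.0):
    """loss = w1 * L_c + w2 * L_s, with w1 and w2 as weighting factors."""
    return w1 * l_c + w2 * l_s
```

Minimizing this `word_loss` pulls every word segment toward its closest counterpart in the matched text.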
For example, suppose the sample text pair set includes sample text pairs 1, 2 and 3, where sample text pair 1 includes first sample text 1 and second sample text 1, sample text pair 2 includes first sample text 2 and second sample text 2, and sample text pair 3 includes first sample text 3 and second sample text 3. Positive and negative sample pairs are constructed from this set: the first and second sample texts in a positive sample pair come from the same sample text pair, and those in a negative sample pair come from different sample text pairs. The first initial text processing model determines the text feature vectors of first sample texts 1 to 3 and second sample texts 1 to 3; from these vectors, the feature distances of the positive sample pairs and of the negative sample pairs are determined, and the first loss value is determined from them. Taking sample text pair 1 as an example, the second initial text processing model determines the word segmentation feature vector of each first text word segment of first sample text 1 and of each second text word segment of second sample text 1, and the feature distance between every first text word segment and every second text word segment is determined from these vectors. For each first text word segment, the minimum of its feature distances to the second text word segments is selected as its target feature distance, and likewise for each second text word segment. The second loss value is determined from the target feature distances of the word segments of every first sample text and every second sample text, and the first and second initial text processing models are trained in the direction of reducing the first and second loss values to obtain the first text processing model and the second text processing model.
For example, fig. 4 is a schematic diagram of a text matching process provided in an embodiment of the present application. A first text and a second text to be matched, denoted D(A, B), are obtained, together with the text encoding vector of each text. For instance, the first text is segmented to obtain its first text word segments, the word segmentation encoding vector of each first text word segment is obtained, and these word segmentation encoding vectors are summed to obtain the text encoding vector of the first text. The text encoding vectors of the first text and the second text are input into the first text processing model to obtain their text feature vectors, and the text similarity (i.e., feature distance) is determined from these text feature vectors. The first text word segmentation set $A = [a_1, a_2, \ldots, a_n]$ of the first text and the second text word segmentation set $B = [b_1, b_2, \ldots, b_n]$ of the second text are acquired, along with the word weight of each word segment, and both sets are sorted in descending order of word weight to obtain the ordered first and second text word segmentation sets. The word segmentation encoding vectors of the word segments in the ordered first text word segmentation set are input sequentially into the second text processing model to obtain the word segmentation feature vector of each first text word segment, and the same is done for the ordered second text word segmentation set to obtain the word segmentation feature vector of each second text word segment. The word segmentation similarity (i.e., feature distance) between each first text word segment and each second text word segment is determined from their word segmentation feature vectors, and from it the target word segmentation similarity (the maximum word segmentation similarity, i.e., the minimum feature distance) of each first text word segment and of each second text word segment is determined. This yields the word segmentation similarity (i.e., feature distance) between the first and second text word segmentation sets, from which, together with the text similarity, the text matching degree between the first text and the second text is determined.
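The first step of fig. 4, forming a text encoding vector by summing the word segmentation encoding vectors, can be sketched as below. The three-dimensional embedding table `EMBED` and its words are hypothetical toy data standing in for real word segmentation encoding vectors, and the first text processing model that would consume the result is omitted.

```python
# Hypothetical toy embedding table standing in for real word segmentation encoding vectors.
EMBED = {
    "red": [1.0, 0.0, 0.0],
    "lipstick": [0.0, 1.0, 0.0],
    "shade": [0.0, 0.5, 0.5],
}


def text_encoding(words):
    """Text encoding vector = element-wise sum of the word segmentation encoding vectors."""
    return [sum(parts) for parts in zip(*(EMBED[w] for w in words))]


print(text_encoding(["red", "lipstick"]))   # [1.0, 1.0, 0.0]
```

Summation discards word order, which is one reason the fine-grained, position-aware second text processing model complements this coarse-grained representation.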
In the embodiment of the application, a first text and a second text to be matched can be obtained, and the text similarity between them is determined; the text similarity is the overall similarity between the first text and the second text, namely the coarse-grained similarity between the texts. A first text word segmentation set associated with the first text and a second text word segmentation set associated with the second text are acquired, and the word weight of each first text word segment and each second text word segment is determined. The first text word segments are sorted in descending order of word weight to obtain an ordered first text word segmentation set, and the second text word segments are sorted in the same way to obtain an ordered second text word segmentation set; the word segmentation similarity between the two sets is then determined based on the ordered sets. The word segmentation similarity is the local similarity between the word segments of the first text and those of the second text, namely the fine-grained similarity between the texts. Finally, the text matching degree between the first text and the second text is determined from the text similarity and the word segmentation similarity. Because the text matching degree is determined jointly from the coarse-grained and fine-grained similarities, its accuracy and reliability can be improved.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a text matching device provided in the present application. The text matching device shown in fig. 5 is used to perform the method of the embodiments shown in fig. 2 and fig. 3 of the present application. For convenience of explanation, only the parts relevant to the embodiments of the present application are shown; for specific technical details not disclosed here, refer to the embodiments shown in fig. 2 and fig. 3. The text matching apparatus 500 may include an obtaining module 501 and a processing module 502, where:
an obtaining module 501, configured to obtain a first text and a second text to be matched, and determine a text similarity between the first text and the second text;
the obtaining module 501 is further configured to obtain a first text word segmentation set associated with a first text and a second text word segmentation set associated with a second text; the first text word segmentation set comprises at least one first text word segmentation, and the second text word segmentation set comprises at least one second text word segmentation;
a processing module 502, configured to determine a word weight of each first text word in the first text and a word weight of each second text word in the second text;
the processing module 502 is further configured to determine a word segmentation similarity between the first text segmentation set and the second text segmentation set based on the word weight of each first text segmentation and the word weight of each second text segmentation;
The processing module 502 is further configured to determine a text matching degree between the first text and the second text through the text similarity and the word segmentation similarity.
The obtaining module 501, when used for determining the text similarity between the first text and the second text, is specifically configured to:
acquiring a text encoding vector of a first text and a text encoding vector of a second text;
calling a first text processing model to perform text processing on the text encoding vector of the first text to obtain a text feature vector of the first text, and calling the first text processing model to perform text processing on the text encoding vector of the second text to obtain a text feature vector of the second text;
and determining the text similarity between the first text and the second text according to the text feature vector of the first text and the text feature vector of the second text.
When determining the word weight, in the first text, of each first text word segment in the first text word segmentation set, the processing module 502 is specifically configured to perform the following:
determining, for each first text word segment, a first semantic association degree between that segment and each text word segment associated with it, and a second semantic association degree between that segment and the first text, where the text word segments associated with a given first text word segment are the other first text word segments in the first text word segmentation set; and
determining the word weight of each first text word segment in the first text according to its first semantic association degree and second semantic association degree.
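The text does not state how the two semantic association degrees are measured or combined into a word weight. The sketch below is one hypothetical rule, not the applicant's formula: it assumes each segment and the whole text already have embedding vectors, takes the mean cosine similarity to the other segments as the first semantic association degree and the cosine similarity to the text vector as the second, adds the two, and normalises the sums with a softmax. `word_weights` and the toy vectors are illustrative only.

```python
import math
from typing import Dict, Sequence


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word_weights(segment_vectors: Dict[str, Sequence[float]],
                 text_vector: Sequence[float]) -> Dict[str, float]:
    """Hypothetical word-weight rule for the segments of one text.

    score = mean similarity to the other segments (first semantic association degree)
          + similarity to the whole text (second semantic association degree);
    weights are the softmax of these scores, so they are positive and sum to 1.
    """
    if not segment_vectors:
        return {}
    scores = {}
    for seg, vec in segment_vectors.items():
        others = [v for s, v in segment_vectors.items() if s != seg]
        first_assoc = sum(_cosine(vec, o) for o in others) / len(others) if others else 0.0
        second_assoc = _cosine(vec, text_vector)
        scores[seg] = first_assoc + second_assoc
    top = max(scores.values())  # subtract the max for a numerically stable softmax
    exps = {seg: math.exp(s - top) for seg, s in scores.items()}
    total = sum(exps.values())
    return {seg: e / total for seg, e in exps.items()}


weights = word_weights({"red": [1.0, 0.0], "book": [0.0, 1.0]}, [1.0, 0.1])
print(weights)
```

Here "red" points almost the same way as the whole-text vector, so it receives the larger weight.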
When determining the word segmentation similarity between the first text word segmentation set and the second text word segmentation set based on the word weights of the first text word segments and the word weights of the second text word segments, the processing module 502 is specifically configured to perform the following:
sorting the first text word segments in the first text word segmentation set in descending order of word weight to obtain a sorted first text word segmentation set, and sorting the second text word segments in the second text word segmentation set in descending order of word weight to obtain a sorted second text word segmentation set;
acquiring the word segmentation encoding vector of each first text word segment in the sorted first text word segmentation set, and the word segmentation encoding vector of each second text word segment in the sorted second text word segmentation set;
sequentially inputting the word segmentation encoding vectors of the first text word segments in the sorted first text word segmentation set into a second text processing model, which processes each encoding vector based on the position of the corresponding first text word segment in the sorted first text word segmentation set to obtain a word segmentation feature vector of each first text word segment;
sequentially inputting the word segmentation encoding vectors of the second text word segments in the sorted second text word segmentation set into the second text processing model, which processes each encoding vector based on the position of the corresponding second text word segment in the sorted second text word segmentation set to obtain a word segmentation feature vector of each second text word segment; and
determining the word segmentation similarity between the first text word segmentation set and the second text word segmentation set according to the word segmentation feature vectors of the first text word segments and the word segmentation feature vectors of the second text word segments.
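The sorting step and the idea of position-dependent processing can be sketched as follows. The second text processing model itself is not specified, so this sketch substitutes a Transformer-style sinusoidal positional encoding, added to each segment's encoding vector according to its rank after sorting; a trained model would transform these vectors further. All names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sort_by_weight(weights: Dict[str, float]) -> List[str]:
    """Segments in descending order of word weight (ties keep their original order)."""
    return sorted(weights, key=lambda seg: weights[seg], reverse=True)


def positional_encoding(position: int, dim: int) -> List[float]:
    """Transformer-style sinusoidal encoding of a position (a stand-in, not from the source)."""
    return [
        math.sin(position / 10000 ** (2 * (i // 2) / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** (2 * (i // 2) / dim))
        for i in range(dim)
    ]


def position_aware_features(weights: Dict[str, float],
                            encoding_vectors: Dict[str, Sequence[float]]
                            ) -> List[Tuple[str, List[float]]]:
    """Sort segments by weight, then add each segment's rank encoding to its encoding vector.

    This only shows how the sorted position can enter the computation; it is not a
    trained second text processing model.
    """
    features = []
    for position, seg in enumerate(sort_by_weight(weights)):
        vec = encoding_vectors[seg]
        pe = positional_encoding(position, len(vec))
        features.append((seg, [v + p for v, p in zip(vec, pe)]))
    return features


print(sort_by_weight({"x": 0.2, "y": 0.5, "z": 0.3}))
```

Because the encoding depends on rank rather than on the original word order, the same segment receives a different feature vector when its weight, and hence its position in the sorted set, changes.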
When determining the word segmentation similarity between the first text word segmentation set and the second text word segmentation set according to the word segmentation feature vectors of the first text word segments and the word segmentation feature vectors of the second text word segments, the processing module 502 is specifically configured to perform the following:
determining the word segmentation similarity between every first text word segment and every second text word segment according to their word segmentation feature vectors;
determining, for each first text word segment, a target word segmentation similarity, namely the largest of its word segmentation similarities with the second text word segments, and determining, for each second text word segment, a target word segmentation similarity, namely the largest of its word segmentation similarities with the first text word segments; and
taking the target word segmentation similarities of all first text word segments and of all second text word segments as the word segmentation similarity between the first text word segmentation set and the second text word segmentation set.
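The pairwise comparison and the per-segment maximum can be sketched in plain Python. Cosine similarity between word segmentation feature vectors is an assumption (the text does not name the measure), and `target_similarities` is a hypothetical name. The function also returns, for every target similarity, the index of the segment that achieved it, because the weighting step described next needs that association.

```python
import math
from typing import List, Sequence, Tuple


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def target_similarities(first_features: List[Sequence[float]],
                        second_features: List[Sequence[float]]
                        ) -> Tuple[List[List[float]], List[Tuple[float, int]], List[Tuple[float, int]]]:
    """Pairwise segment similarities plus each segment's target (maximum) similarity.

    first_targets[i]  = (max similarity of first segment i, index j of the second segment achieving it)
    second_targets[j] = (max similarity of second segment j, index i of the first segment achieving it)
    Both lists of feature vectors must be non-empty.
    """
    sim = [[_cosine(a, b) for b in second_features] for a in first_features]
    first_targets = []
    for row in sim:
        j = max(range(len(row)), key=row.__getitem__)
        first_targets.append((row[j], j))
    second_targets = []
    for j in range(len(second_features)):
        column = [row[j] for row in sim]
        i = max(range(len(column)), key=column.__getitem__)
        second_targets.append((column[i], i))
    return sim, first_targets, second_targets


sim, first_t, second_t = target_similarities([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 1.0]])
print(first_t, second_t)
```

Each segment is matched independently, so two segments of one text may share the same best match in the other text; when several matches tie, this sketch keeps the first one.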
When determining the text matching degree between the first text and the second text according to the text similarity and the word segmentation similarity, the processing module 502 is specifically configured to perform the following:
determining a weighting coefficient for each first text word segment according to the word weight of the second text word segment associated with its target word segmentation similarity, and determining a weighting coefficient for each second text word segment according to the word weight of the first text word segment associated with its target word segmentation similarity; here, the second text word segment associated with the target word segmentation similarity of a given first text word segment is the second text word segment having the largest word segmentation similarity with that first text word segment, and the first text word segment associated with the target word segmentation similarity of a given second text word segment is the first text word segment having the largest word segmentation similarity with that second text word segment;
performing, according to the weighting coefficients of the first text word segments and of the second text word segments, a weighted summation of the target word segmentation similarities of the first text word segments and of the second text word segments to obtain a word segmentation matching degree; and
and determining the text matching degree between the first text and the second text through the text similarity and the word segmentation matching degree.
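A minimal sketch of the weighting and fusion steps, under two assumptions that the text leaves open: each weighting coefficient is the word weight of the best-matching segment, normalised so that all coefficients sum to 1, and the final text matching degree is a linear interpolation of the text similarity and the word segmentation matching degree with a hypothetical mixing weight `alpha`. Each target is passed as a pair (target similarity, index of the best-matching segment in the other set); the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def segment_matching_degree(first_targets: Sequence[Tuple[float, int]],
                            second_targets: Sequence[Tuple[float, int]],
                            first_weights: Sequence[float],
                            second_weights: Sequence[float]) -> float:
    """Weighted sum of target similarities.

    first_targets[i] = (target similarity of first segment i, index j of its best-matching
    second segment); its coefficient comes from second_weights[j]. second_targets is the
    mirror image. Coefficients are normalised to sum to 1 (an assumption).
    """
    terms: List[Tuple[float, float]] = [(s, second_weights[j]) for s, j in first_targets]
    terms += [(s, first_weights[i]) for s, i in second_targets]
    total = sum(w for _, w in terms)
    if total == 0:
        return 0.0
    return sum(s * w for s, w in terms) / total


def text_matching_degree(text_sim: float, segment_match: float, alpha: float = 0.5) -> float:
    """Hypothetical fusion rule: linear interpolation with mixing weight alpha."""
    return alpha * text_sim + (1.0 - alpha) * segment_match


seg = segment_matching_degree([(1.0, 0), (0.5, 1)], [(1.0, 0), (0.5, 1)], [0.6, 0.4], [0.6, 0.4])
print(text_matching_degree(0.6, seg))
```

Weighting by the matched segment's word weight means a strong match on an important word counts for more than an equally strong match on a minor one. In a trained system, `alpha` could itself be learned.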
The text matching degree is determined through a first text processing model and a second text processing model; the processing module 502 is further configured to perform the following:
obtaining a sample text pair; the sample text pair comprises a first sample text and a second sample text;
invoking a first initial text processing model to determine sample text similarity between the first sample text and the second sample text;
determining the word weight of each first sample text word segment in a first sample text word segmentation set associated with the first sample text, and the word weight of each second sample text word segment in a second sample text word segmentation set associated with the second sample text;
invoking a second initial text processing model to determine a sample word segmentation similarity between the first sample text word segmentation set and the second sample text word segmentation set based on the word weights of the first sample text word segments and the word weights of the second sample text word segments; and
determining a sample text matching degree between the first sample text and the second sample text according to the sample text similarity and the sample word segmentation similarity, determining a model loss value according to the sample text matching degree, and iteratively training the first initial text processing model and the second initial text processing model with the model loss value to obtain the first text processing model and the second text processing model.
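The text does not specify the loss function. Assuming each sample text pair carries a 0/1 label indicating whether the two texts match (a hypothetical labelling scheme), binary cross-entropy on the sample text matching degree is one common choice; the sketch below only computes that loss value, and `matching_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def matching_loss(matching_degrees: Sequence[float], labels: Sequence[int],
                  eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between sample text matching degrees and 0/1 match labels."""
    if not labels or len(matching_degrees) != len(labels):
        raise ValueError("need one label per sample text pair")
    total = 0.0
    for p, y in zip(matching_degrees, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


print(matching_loss([0.9, 0.1], [1, 0]))
```

In practice the loss would be back-propagated through both initial text processing models by a framework with automatic differentiation; that iterative update is not shown here.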
For specific implementations of the obtaining module 501 and the processing module 502, refer to the description of the foregoing embodiments; details are not repeated here. Likewise, the beneficial effects, which are the same as those of the method, are not described again.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device 600 includes at least one processor 601 and a memory 602. Optionally, the electronic device may further include a network interface. The processor 601, the memory 602, and the network interface are connected to one another; the network interface is configured to receive and send messages under the control of the processor 601; the memory 602 is configured to store a computer program comprising program instructions; and the processor 601 is configured to execute the program instructions stored in the memory 602, that is, to invoke them to perform the method described above.
The memory 602 may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as flash memory or a solid-state drive (SSD); or it may include a combination of these types of memory.
The processor 601 may be a central processing unit (CPU). In one embodiment, the processor 601 may instead be a graphics processing unit (GPU), or a combination of a CPU and a GPU. The processor 601 may invoke the device control application stored in the memory 602 to perform the text matching method described in the embodiments corresponding to fig. 2 and fig. 3, and may also implement the text matching device described in the embodiment corresponding to fig. 5; these details, and the beneficial effects of the same method, are not repeated here.
In a specific implementation, the device, processor, memory, and so on described in the embodiments of the present application may carry out the implementations described in the foregoing method embodiments as well as those described in the present embodiments; details are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. The computer program comprises program instructions which, when executed by a processor, cause the processor to perform some or all of the steps performed in the foregoing method embodiments. The computer storage medium may be volatile or non-volatile. The computer-readable storage medium may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created through the use of blockchain nodes, and the like.
An embodiment of the present application provides a computer program product, which may include a computer program that, when executed by a processor, implements some or all of the steps of the above method; details are not repeated here.
As used herein, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Those skilled in the art will appreciate that all or part of the above method embodiments may be implemented by a computer program stored in a computer storage medium, such as a computer-readable storage medium; when executed, the program may perform the steps of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random-access memory (RAM), or the like.
The above disclosure describes only some examples of the present application and is not intended to limit the scope of its claims. Those of ordinary skill in the art will understand that implementations of all or part of the procedures of the above embodiments, and equivalent changes made in accordance with the claims of the present application, still fall within the scope of the present application.

Claims (10)

1. A method of text matching, the method comprising:
acquiring a first text and a second text to be matched, and determining the text similarity between the first text and the second text;
acquiring a first text word segmentation set associated with the first text and a second text word segmentation set associated with the second text, wherein the first text word segmentation set comprises at least one first text word segment and the second text word segmentation set comprises at least one second text word segment;
respectively determining the word weight of each first text word segment in the first text and the word weight of each second text word segment in the second text;
determining a word segmentation similarity between the first text word segmentation set and the second text word segmentation set based on the word weights of the first text word segments and the word weights of the second text word segments; and
and determining the text matching degree between the first text and the second text through the text similarity and the word segmentation similarity.
2. The method of claim 1, wherein determining a text similarity between the first text and the second text comprises:
acquiring a text encoding vector of the first text and a text encoding vector of the second text;
calling a first text processing model to perform text processing on the text encoding vector of the first text to obtain a text feature vector of the first text, and calling the first text processing model to perform text processing on the text encoding vector of the second text to obtain a text feature vector of the second text;
and determining the text similarity between the first text and the second text according to the text feature vector of the first text and the text feature vector of the second text.
3. The method of claim 1, wherein determining the word weight, in the first text, of each first text word segment in the first text word segmentation set comprises:
determining a first semantic association degree between each first text word segment and each text word segment associated with it, and determining a second semantic association degree between each first text word segment and the first text, wherein the text word segments associated with any one first text word segment are the first text word segments in the first text word segmentation set other than that first text word segment;
and determining the word weight of each first text word in the first text according to the first semantic association degree and the second semantic association degree corresponding to each first text word.
4. The method of claim 1, wherein determining the word segmentation similarity between the first text word segmentation set and the second text word segmentation set based on the word weights of the first text word segments and the word weights of the second text word segments comprises:
sorting the first text word segments in the first text word segmentation set in descending order of word weight to obtain a sorted first text word segmentation set, and sorting the second text word segments in the second text word segmentation set in descending order of word weight to obtain a sorted second text word segmentation set;
acquiring word segmentation encoding vectors of the first text word segments in the sorted first text word segmentation set, and word segmentation encoding vectors of the second text word segments in the sorted second text word segmentation set;
sequentially inputting the word segmentation encoding vectors of the first text word segments in the sorted first text word segmentation set into a second text processing model, the second text processing model processing each word segmentation encoding vector based on the position of the corresponding first text word segment in the sorted first text word segmentation set to obtain a word segmentation feature vector of each first text word segment;
sequentially inputting the word segmentation encoding vectors of the second text word segments in the sorted second text word segmentation set into the second text processing model, the second text processing model processing each word segmentation encoding vector based on the position of the corresponding second text word segment in the sorted second text word segmentation set to obtain a word segmentation feature vector of each second text word segment;
and determining word segmentation similarity between the first text word segmentation set and the second text word segmentation set according to the word segmentation feature vectors of the first text word segments and the word segmentation feature vectors of the second text word segments.
5. The method of claim 4, wherein determining the word segmentation similarity between the first text word segmentation set and the second text word segmentation set according to the word segmentation feature vectors of the first text word segments and the word segmentation feature vectors of the second text word segments comprises:
determining word segmentation similarity between each first text word segment and each second text word segment according to the word segmentation feature vector of each first text word segment and the word segmentation feature vector of each second text word segment;
determining, for each first text word segment, a target word segmentation similarity from the word segmentation similarities between the first text word segments and the second text word segments, and determining, for each second text word segment, a target word segmentation similarity from the same word segmentation similarities, wherein the target word segmentation similarity of any one first text word segment is the largest of the word segmentation similarities between that first text word segment and the second text word segments, and the target word segmentation similarity of any one second text word segment is the largest of the word segmentation similarities between that second text word segment and the first text word segments; and
taking the target word segmentation similarities of the first text word segments and of the second text word segments as the word segmentation similarity between the first text word segmentation set and the second text word segmentation set.
6. The method of claim 5, wherein determining the text matching degree between the first text and the second text through the text similarity and the word segmentation similarity comprises:
determining a weighting coefficient for each first text word segment according to the word weight of the second text word segment associated with its target word segmentation similarity, and determining a weighting coefficient for each second text word segment according to the word weight of the first text word segment associated with its target word segmentation similarity, wherein the second text word segment associated with the target word segmentation similarity of any one first text word segment is the second text word segment having the largest word segmentation similarity with that first text word segment, and the first text word segment associated with the target word segmentation similarity of any one second text word segment is the first text word segment having the largest word segmentation similarity with that second text word segment;
performing, according to the weighting coefficients of the first text word segments and of the second text word segments, a weighted summation of the target word segmentation similarities of the first text word segments and of the second text word segments to obtain a word segmentation matching degree; and
and determining the text matching degree between the first text and the second text through the text similarity and the word segmentation matching degree.
7. The method of claim 1, wherein the text matching degree is determined by a first text processing model and a second text processing model, and the method further comprises:
obtaining a sample text pair; the sample text pair comprises a first sample text and a second sample text;
invoking a first initial text processing model to determine sample text similarity between the first sample text and the second sample text;
determining the word weight of each first sample text word segment in a first sample text word segmentation set associated with the first sample text, and the word weight of each second sample text word segment in a second sample text word segmentation set associated with the second sample text;
invoking a second initial text processing model to determine a sample word segmentation similarity between the first sample text word segmentation set and the second sample text word segmentation set based on the word weights of the first sample text word segments and the word weights of the second sample text word segments; and
and determining sample text matching degree between the first sample text and the second sample text according to the sample text similarity and the sample word segmentation similarity, determining a model loss value according to the sample text matching degree, and iteratively training the first initial text processing model and the second initial text processing model through the model loss value to obtain the first text processing model and the second text processing model.
8. A text matching device, the device comprising:
the acquisition module is used for acquiring a first text and a second text to be matched and determining the text similarity between the first text and the second text;
the acquisition module is further used for acquiring a first text word segmentation set associated with the first text and a second text word segmentation set associated with the second text; the first text word segmentation set comprises at least one first text word segmentation, and the second text word segmentation set comprises at least one second text word segmentation;
the processing module is configured to determine the word weight of each first text word segment in the first text and the word weight of each second text word segment in the second text;
the processing module is further used for determining word segmentation similarity between the first text word segmentation set and the second text word segmentation set based on the word weight of each first text word segmentation and the word weight of each second text word segmentation;
the processing module is further configured to determine a text matching degree between the first text and the second text according to the text similarity and the word segmentation similarity.
9. An electronic device, comprising a processor and a memory, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-7.
CN202310715607.7A 2023-06-15 2023-06-15 Text matching method, device, equipment and medium Pending CN117725923A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310715607.7A CN117725923A (en) 2023-06-15 2023-06-15 Text matching method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310715607.7A CN117725923A (en) 2023-06-15 2023-06-15 Text matching method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN117725923A true CN117725923A (en) 2024-03-19

Family

ID=90207514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310715607.7A Pending CN117725923A (en) 2023-06-15 2023-06-15 Text matching method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117725923A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination