CN114282513A - Text semantic similarity matching method and system, intelligent terminal and storage medium - Google Patents

Info

Publication number
CN114282513A
Authority
CN
China
Prior art keywords
sample
true
similarity
text semantic
semantic similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111620100.0A
Other languages
Chinese (zh)
Inventor
吴闯
马明珠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongcheng Network Technology Co Ltd
Original Assignee
Tongcheng Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongcheng Network Technology Co Ltd filed Critical Tongcheng Network Technology Co Ltd
Priority to CN202111620100.0A priority Critical patent/CN114282513A/en
Publication of CN114282513A publication Critical patent/CN114282513A/en
Pending legal-status Critical Current

Abstract

The application relates to natural language processing technology in the field of artificial intelligence, and in particular to a text semantic similarity matching method, system, intelligent terminal and storage medium. The method comprises: acquiring historical data as a training sample set, wherein the training sample set comprises true samples, positive samples and negative samples; calculating the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, and training a preset text semantic similarity matching model based on the calculation results; deploying the text semantic similarity matching model to an online platform; and matching a standard problem based on the text semantic similarity matching model and feeding the standard problem back to the online platform. The method and the device can address the low accuracy of customer service problem matching.

Description

Text semantic similarity matching method and system, intelligent terminal and storage medium
Technical Field
The application relates to natural language processing technology in the field of artificial intelligence, in particular to a text semantic similarity matching method, a text semantic similarity matching system, an intelligent terminal and a storage medium.
Background
With the rapid development of the computer internet, text similarity calculation is widely applied in many fields, especially in the current customer service problem-matching scenario. The process of this scenario is as follows: for a problem raised by a user, the customer service searches the database for similar problems by judging text similarity and feeds the retrieved problems back to the user. At present, the method for judging text similarity in this scenario mainly evaluates similarity based on word frequency: the occurrence frequency of each word in the two texts is counted, a text vector is constructed from those frequencies, and the similarity between the two texts is reflected by calculating the cosine similarity between the two text vectors.
In the process of implementing the present application, the inventors found that the above-mentioned technology has at least the following problem: in the current customer service problem-matching scenario, evaluating text similarity based on word frequency is divorced from semantic changes caused by the language context and ignores the user's language habits, so the judgment of text similarity is easily affected and the matching accuracy of customer service problems is low.
Disclosure of Invention
In order to solve the problem of low accuracy of customer service problem matching, the application provides a text semantic similarity matching method, a text semantic similarity matching system, an intelligent terminal and a storage medium.
In a first aspect, the present application provides a matching method for semantic similarity of texts, which adopts the following technical scheme:
a matching method of text semantic similarity comprises the following steps:
acquiring historical data as a training sample set, wherein the training sample set comprises a true sample, a positive sample and a negative sample;
calculating cosine similarity between the true sample and the positive sample and cosine similarity between the true sample and the negative sample, and training a preset text semantic similarity matching model based on a calculation result;
deploying the text semantic similarity matching model to an online platform;
matching a standard problem based on the text semantic similarity matching model and feeding back the standard problem to the online platform.
By adopting the technical scheme, historical data is obtained to serve as a training sample set, and the training sample set comprises a true sample, a positive sample and a negative sample; and training a text semantic similarity matching model based on the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, deploying the text semantic similarity matching model to an online platform after training, feeding the standard problem back to the online platform, and training the text semantic similarity matching model to improve the similarity between the problem actually input by the user and the standard problem fed back to the user, thereby improving the accuracy of customer service problem matching.
In a specific possible embodiment, the true sample comprises a problem actually input by the user online; the positive sample comprises the standard problem selected by the user and the standard problem configured by the customer service for the user's actual input; and the negative samples comprise the standard problems not selected by the user.
by adopting the technical scheme, enough training samples are constructed and divided carefully, so that the model can be conveniently trained, and the matching accuracy of customer service problems is improved.
In a specific possible embodiment, the calculating the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, and training a preset text semantic similarity matching model based on the calculation result includes:
respectively calculating cosine similarity between the true sample and the positive sample and cosine similarity between the true sample and the negative sample, wherein a cosine similarity calculation formula is as follows:
C0 = Cosine(T,P);
C1 = Cosine(T,N1);
…
Ck = Cosine(T,Nk);
wherein T represents a true sample, P represents a positive sample, N represents a negative sample, and k represents the number of negative samples;
and constraining the cosine similarity between the true sample and the positive sample to be more than or equal to the cosine similarity between the true sample and the negative sample, wherein a constraint formula is as follows:
C0 = Max(C0,C1,…,Ck)。
By adopting the technical scheme: because, when divorced from the semantic environment, the cosine similarity between the true sample and a negative sample is sometimes larger than that between the true sample and the positive sample, the cosine similarity between the true sample and the positive sample is required to remain greater than or equal to that between the true sample and each negative sample throughout the training of the text semantic similarity matching model.
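As a minimal sketch (with illustrative toy vectors, not data from the patent), the cosine similarity calculation and the max constraint above can be written as:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings for the true sample T, the positive sample P,
# and k = 2 negative samples N1, N2 (illustrative values only).
T = [0.9, 0.1, 0.3]
P = [0.8, 0.2, 0.4]
negatives = [[0.1, 0.9, 0.2], [0.2, 0.7, 0.5]]

C0 = cosine(T, P)                        # C0 = Cosine(T, P)
Cks = [cosine(T, n) for n in negatives]  # C1..Ck = Cosine(T, Nk)

# Constraint: C0 = Max(C0, C1, ..., Ck) — the positive pair must
# score at least as high as every negative pair.
C0 = max([C0] + Cks)
```

With these toy vectors the positive pair already dominates, so the constraint leaves C0 unchanged.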
In a specific possible implementation, applying the constraint formula to the cosine similarity calculation formula by selecting a Softmax function to obtain Softmax (C0):
Softmax(C0) = Max(Softmax(C0), Softmax(C1),…, Softmax(Ck));
defining the error between a real problem input by a user and the standard problem selected by the user as Loss, and constraining the cosine similarity between the true sample and the positive sample to be always more than or equal to the cosine similarity between the true sample and the negative sample in the calculation process of the Loss, wherein the calculation formula of the Loss is as follows:
Loss = - log(Softmax(C0))。
By adopting the technical scheme, the Loss directly expresses the requirement that the cosine similarity between the true sample and the positive sample be higher than that between the true sample and the negative samples. The smaller the Loss of the text semantic similarity matching model, the more accurate the model's prediction is considered to be; training drives the Loss to its minimum, which is the final aim of model training.
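The Loss = -log(Softmax(C0)) above is, in effect, a cross-entropy over the softmax of the similarity scores, with the positive pair as the target. A minimal sketch with illustrative scores:

```python
import math

def similarity_loss(c0, negative_scores):
    """Loss = -log(Softmax(C0)) over the scores [C0, C1, ..., Ck].

    Softmax(C0) approaches 1 as C0 pulls ahead of every negative
    score, so minimising the loss pushes Cosine(T, P) above
    Cosine(T, Nk).
    """
    scores = [c0] + list(negative_scores)
    exps = [math.exp(s) for s in scores]
    softmax_c0 = exps[0] / sum(exps)
    return -math.log(softmax_c0)

# The loss shrinks as the positive score separates from the negatives.
loss_close = similarity_loss(0.6, [0.5, 0.55])
loss_apart = similarity_loss(0.95, [0.1, 0.2])
```

`loss_apart` is smaller than `loss_close`, matching the intent that a well-separated positive pair yields a lower Loss.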
In a specific possible embodiment, the text semantic similarity matching model is trained with supervision based on labeling data, wherein the labeling data comprises the standard problem actually clicked by the user and the problem actually input by the user.
By adopting the technical scheme, supervised training of the text semantic similarity matching model on the labeling data gives the model the ability to predict and classify unknown data.
In a specific possible implementation, the annotation data is randomly drawn to serve as the negative sample of the true sample relative to the positive sample.
By adopting the technical scheme, the labeling data are randomly drawn to serve as negative samples of the true sample relative to the positive sample. Random drawing raises the similarity between the negative samples and the true sample; and because the similarity between the true sample and the positive sample is constrained to be always greater than or equal to that between the true sample and the negative samples, the similarity between the positive sample and the true sample is pushed still higher, which enhances the training effect of the text semantic similarity model.
In a specific possible implementation, the text semantic similarity matching model comprises a calculation module for the true sample and the positive sample and a calculation module for the true sample and the negative sample;
before the deploying the text semantic similarity matching model to the online platform, the method further comprises:
and cutting the text semantic similarity matching model and reserving the true sample and the positive sample calculation module.
By adopting the technical scheme, the text semantic similarity matching model is formed by combining the calculation module for the true sample and the positive sample with the calculation module for the true sample and the negative sample. By removing the latter from the model, the calculation module for the true sample and the positive sample can be called directly to match text similarity within the positive sample set, which effectively shortens the similarity matching time and retrieval time of the text and improves matching efficiency.
In a second aspect, the present application provides a matching system for semantic similarity of texts, which adopts the following technical solutions:
a matching system for text semantic similarity comprises:
the data acquisition module is used for acquiring historical data as a training sample set, and the training sample set comprises a true sample, a positive sample and a negative sample;
the model training module is used for calculating cosine similarity between the true sample and the positive sample and cosine similarity between the true sample and the negative sample, and training a preset text semantic similarity matching model based on a calculation result;
the model deployment module is used for deploying the text semantic similarity matching model to an online platform;
and the data feedback module is used for matching the standard problem based on the text semantic similarity matching model and feeding back the standard problem to the online platform.
By adopting the technical scheme, historical data is obtained to serve as a training sample set, and the training sample set comprises a true sample, a positive sample and a negative sample; and training a text semantic similarity matching model based on the cosine similarity between the true sample and the positive sample and the cosine similarity between the true sample and the negative sample, deploying the text semantic similarity matching model to an online platform after training, feeding the standard problem back to the online platform, and training the text semantic similarity matching model to improve the similarity between the problem actually input by the user and the standard problem fed back to the user, thereby improving the accuracy of customer service problem matching.
In a third aspect, the present application provides an intelligent terminal, which adopts the following technical scheme:
an intelligent terminal comprising a processor and a memory, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement a matching method for semantic similarity of texts according to any one of the first aspect.
By adopting the technical scheme, the processor in the intelligent terminal can realize the matching method of the text semantic similarity according to the related computer program stored in the memory, so that the accuracy of the text semantic similarity is improved, and the accuracy of customer service problem matching is improved.
In a fourth aspect, the present application provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement a method of matching semantic similarity of text as claimed in any one of the first aspect.
By adopting the technical scheme, the corresponding program can be stored, so that the accuracy of text semantic similarity is improved, and the accuracy of customer service problem matching is improved.
In summary, the present application includes at least one of the following beneficial technical effects:
1. acquiring historical data as a training sample set, wherein the training sample set comprises a true sample, a positive sample and a negative sample; training a text semantic similarity matching model based on cosine similarity between a true sample and a positive sample and cosine similarity between the true sample and a negative sample, deploying the text semantic similarity matching model to an online platform after training, feeding back standard problems to the online platform, and training the text semantic similarity matching model to improve similarity between problems actually input by a user and the standard problems fed back to the user so as to improve the matching accuracy of customer service problems;
2. randomly drawing the annotation data as negative samples of the true sample relative to the positive sample raises the similarity between the negative samples and the true sample, which strengthens model training;
3. the text semantic similarity matching model is formed by combining a true sample and positive sample calculation module and a true sample and negative sample calculation module, the true sample and negative sample calculation module in the text semantic similarity matching model is removed, so that the true sample and positive sample calculation module can be conveniently and directly called to match the text similarity in the positive sample set, the text similarity matching time and the retrieval time can be effectively shortened, and the matching efficiency is improved.
Drawings
Fig. 1 is a schematic flow chart of a matching method of text semantic similarity in an embodiment of the present application.
Fig. 2 is a block diagram of a matching system for semantic similarity of texts in this embodiment.
Fig. 3 is a schematic flow chart of a matching method of text semantic similarity in the embodiment of the present application.
Description of reference numerals: 100. a data acquisition module; 200. a model training module; 300. a model deployment module; 400. and a data feedback module.
Detailed Description
The present application is described in further detail below with reference to the attached drawings.
The embodiment of the application discloses a text semantic similarity matching method that can be applied to an intelligent terminal, with the intelligent terminal as the execution subject. In a customer service problem-matching scenario, according to the actual problem really input online by the user, the method extracts the text semantic features of the actual problem and searches a standard problem library for standard problems similar to it; it judges the most similar standard problems by text semantic similarity and feeds the retrieved standard problems with the highest similarity back to the user for selection. Text semantic similarity here refers to extracting high-dimensional semantic features of texts on the basis of the text's words and then calculating similarity, so as to measure how alike different texts are.
Referring to fig. 1, the matching method of text semantic similarity includes the following steps:
s101, acquiring historical data as a training sample set, wherein the training sample set comprises a true sample, a positive sample and a negative sample.
In implementation, historical data is first acquired as a training sample set comprising a true sample, a positive sample and a negative sample. The true sample is a problem actually input by the user on the online platform; the positive sample is a standard problem selected online by the user, or a standard problem configured by a customer service worker for the user's actual input; and the negative sample is a standard problem, among those fed back by the online platform, that the user did not select.
S102, calculating cosine similarity between the true sample and the positive sample and cosine similarity between the true sample and the negative sample, and training a preset text semantic similarity matching model based on the calculation result.
In implementation, high-dimensional semantic features are extracted from the true sample, the positive sample and the negative sample by a preset text semantic similarity matching model. Albert serves as the basic structure in the model for extracting the high-dimensional semantic features of the training sample set; Albert is a deep pre-training model for extracting text features, and compared with other common training models it uses parameter-reduction techniques to reduce memory consumption and thereby speed up model training.
In implementation, model training is performed with a positive-to-negative sample ratio of 1:4. After the semantic features of the true sample, the positive sample and the negative sample are extracted by the text semantic similarity matching model, the cosine similarity between the true sample and the positive sample and that between the true sample and the negative sample are calculated with the cosine similarity formula. Cosine similarity is a standard measure of similarity between texts: the closer it is to 1, the more similar the two texts are; the closer it is to 0, the more independent they are.
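The 1:4 sampling described above can be sketched as follows; the function name and the `(user_question, selected_standard_question)` tuple format of `labelled_pairs` are assumptions for illustration, not from the patent:

```python
import random

def build_training_batch(labelled_pairs, negatives_per_positive=4, seed=0):
    """Build (true, positive, negatives) triples at a 1:4 ratio.

    `labelled_pairs` is a hypothetical list of (user_question,
    selected_standard_question) tuples; for each pair, negatives are
    drawn at random from the other standard questions, giving the
    positive-to-negative ratio of 1:4 used during training.
    """
    rng = random.Random(seed)
    standard = [std for _, std in labelled_pairs]
    batch = []
    for user_q, positive in labelled_pairs:
        pool = [s for s in standard if s != positive]
        negatives = rng.sample(pool, min(negatives_per_positive, len(pool)))
        batch.append({"true": user_q, "positive": positive,
                      "negatives": negatives})
    return batch
```

Each resulting triple pairs one true sample with one positive and up to four randomly drawn negatives.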
Because the cosine similarity between the true sample and the negative sample is greater than the cosine similarity between the true sample and the positive sample sometimes when the text semantic similarity matching model is separated from the semantic environment, the cosine similarity between the true sample and the positive sample is always required to be greater than or equal to the cosine similarity between the true sample and the negative sample in the training process of the text semantic similarity matching model.
To express directly that the cosine similarity between the true sample and the positive sample should be higher than that between the true sample and the negative samples, the error between the real problem input by the user and the standard problem selected by the user is defined, in deep learning terms, as the Loss. The smaller the Loss of the text semantic similarity matching model, the more accurate its prediction is considered to be, and training drives the Loss to its minimum. During the calculation of the Loss, the cosine similarity between the true sample and the positive sample is constrained to remain greater than or equal to that between the true sample and the negative samples.
Specifically, the Loss calculation process includes first calculating a cosine similarity between the true sample (T) and the positive sample (P) and a cosine similarity between the true sample (T) and the negative sample (N), where k is the number of the negative samples (N), and a calculation formula of the cosine similarity is as follows:
C0 = Cosine(T,P);
C1 = Cosine(T,N1);
…
Ck = Cosine(T,Nk);
in order to ensure that the similarity between the true sample and the positive sample is greater than or equal to the similarity between the true sample and the negative sample, a calculation formula of the cosine similarity is constrained, wherein the constraint formula is as follows:
C0 = Max(C0,C1,…,Ck);
C0 obtained through the constraint is the largest value in the set {C0, C1, …, Ck}, where C0 in the constraint formula represents the cosine similarity between the true sample and the positive sample; this ensures that the cosine similarity between the true sample and the positive sample is always greater than or equal to that between the true sample and the negative samples. For example, if the maximum of C1, …, Ck is 0.8 and C0 is 0.6, the constraint formula reassigns C0 the maximum value 0.8, so that the cosine similarity between the true sample and the positive sample is always greater than or equal to that between the true sample and the negative samples.
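The worked example in the paragraph above can be checked directly in a few lines:

```python
# Numeric check of the constraint example above: the negative scores
# peak at 0.8 while the raw positive score C0 is 0.6, so the constraint
# C0 = Max(C0, C1, ..., Ck) reassigns C0 the maximum value 0.8.
C0 = 0.6
negative_scores = [0.8, 0.3, 0.5]
C0 = max([C0] + negative_scores)
assert C0 == 0.8
```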
Specifically, since cosine similarity values range from 0 to 1, a Softmax function, whose output likewise lies between 0 and 1, is selected. The constraint formula is applied to the cosine similarity calculation of C0 to obtain Softmax(C0), and the previously calculated C0 is replaced by Softmax(C0) as the cosine similarity between the true sample (T) and the positive sample (P). The calculation formula of Softmax(C0) is as follows:
Softmax(C0) = Max(Softmax(C0), Softmax(C1),…, Softmax(Ck));
Since the value of Softmax is at most 1, Softmax(C0) is required to approach 1 as closely as possible, so that the cosine similarity between the true sample (T) and the positive sample (P) is the maximum value and is always greater than or equal to the cosine similarity between the true sample (T) and the negative samples (N).
Since the final purpose of model training is to minimize Loss, and the cosine similarity, namely Softmax (C0), between the true sample (T) and the positive sample (P) is the maximum value, the calculation formula of Loss is as follows:
Loss = - log(Softmax(C0));
In one embodiment, the process of training the text semantic similarity matching model based on the training sample set is supervised training, which relies on annotation data comprising the standard problems actually clicked by the user and the problems actually input by the user. For example, after the user inputs a problem such as how to refund an airline ticket on the intelligent terminal, the terminal feeds back the five standard problems with the highest similarity for the user to choose from; if the user selects one of them, the selected standard problem and the problem actually input by the user are labelled together as a group of annotation data. Supervised training of the text semantic similarity matching model on the annotation data gives the model the ability to predict and classify unknown data.
In one embodiment, in order to improve the training effect of the text semantic similarity model and increase the accuracy of customer service problem matching, the labeling data can be randomly extracted as a negative sample of the true sample relative to the positive sample, and the similarity between the negative sample and the true sample is improved by randomly extracting the labeling data as the negative sample.
S103, deploying the text semantic similarity matching model to an online platform.
Specifically, the text semantic similarity matching model includes a calculation module for the true sample and the positive sample and a calculation module for the true sample and the negative sample: the former is the set of calculated cosine similarity values between the true sample and the positive sample, and the latter the set between the true sample and the negative sample. In implementation, the calculation module for the true sample and the negative sample is removed from the model and the calculation module for the true sample and the positive sample is retained, so that the latter can be called directly to match text similarity within the positive sample set, effectively shortening the similarity matching time and retrieval time of the text and improving matching efficiency.
And S104, matching the standard problem based on the text semantic similarity matching model and feeding back the standard problem to the online platform.
Specifically, after the user inputs a problem on the online platform, the text semantic similarity matching model first extracts semantic features from the problem input by the user; the semantic features of the positive samples are then extracted from the positive sample set and matched against the semantic features of the user's problem; finally, the five standard problems with the highest similarity are matched and fed back to the user for selection.
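A minimal sketch of this online matching step, assuming precomputed embeddings for the standard problems (the dictionary format, vectors and function name are illustrative, not from the patent):

```python
import math

def top_k_matches(query_vec, standard_embeddings, k=5):
    """Rank standard problems by cosine similarity to the query embedding.

    `standard_embeddings` maps each standard problem to a hypothetical
    precomputed vector; the k highest-similarity problems are returned
    for the user to choose from, mirroring the top-five feedback above.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    ranked = sorted(standard_embeddings.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [problem for problem, _ in ranked[:k]]
```

In practice the query vector would come from the model's feature extractor; here any embedding of the user's input can be passed in.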
FIG. 3 shows a flow diagram of the text semantic similarity matching method. The text semantic similarity model first obtains a true sample, a positive sample and a negative sample as a training sample set. The model then uses Albert as the basic structure to extract the high-dimensional semantic features of the true sample, the positive sample and the negative sample; the cosine similarity between the true sample and the positive sample and that between the true sample and the negative sample are calculated respectively, and the former is constrained to be greater than or equal to the latter so as to perform supervised training of the text semantic similarity matching model. After training, the values calculated among the true sample, the positive sample and the negative sample are back-propagated and stored in the model. When the model is applied to the online platform, only the calculation module for the true sample and the positive sample is retained and deployed. After a user inputs a problem on the online platform, text semantic features are extracted from the input, the most similar positive-sample semantic features are automatically searched in the positive sample set, and the five most similar positive samples, namely five standard problems, are obtained and fed back to the user for selection, completing the text semantic similarity matching process.
The embodiment of the application also discloses a matching system of the text semantic similarity. Referring to fig. 2, the matching system for semantic similarity of texts comprises:
a data obtaining module 100, configured to obtain historical data as a training sample set, where the training sample set includes a true sample, a positive sample, and a negative sample; the true sample is the problem of real input of the online platform by the user, the positive sample is a standard problem selected on the online by the user or a standard problem configured by a customer service worker aiming at the real input of the user, and the negative sample is a standard problem which is not selected by the user in the standard problem fed back by the online platform.
The model training module 200 is used for calculating cosine similarity between the true sample and the positive sample and cosine similarity between the true sample and the negative sample, and training a preset text semantic similarity matching model based on a calculation result; calculating cosine similarity between the true sample and the positive sample and cosine similarity between the true sample and the negative sample, and training a preset text semantic similarity matching model based on the calculation result comprises the following steps:
respectively calculating cosine similarity between the true sample and the positive sample and cosine similarity between the true sample and the negative sample, wherein the cosine similarity calculation formula is as follows:
C0 = Cosine(T,P);
C1 = Cosine(T,N1);
…
Ck = Cosine(T,Nk);
wherein T represents a true sample, P represents a positive sample, N represents a negative sample, and k represents the number of negative samples;
and constraining the cosine similarity between the true sample and the positive sample to be more than or equal to the cosine similarity between the true sample and the negative sample, wherein the constraint formula is as follows:
C0 = Max(C0,C1,…,Ck)。
A Softmax function is selected and the constraint formula is applied to the cosine similarity calculation, giving Softmax(C0):
Softmax(C0) = Max(Softmax(C0), Softmax(C1),…, Softmax(Ck));
The error between the question actually entered by the user and the standard question selected by the user is defined as Loss; during the calculation of Loss, the cosine similarity between the true sample and the positive sample is constrained to always be greater than or equal to the cosine similarity between the true sample and each negative sample. Loss is calculated as:
Loss = -log(Softmax(C0)).
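As an illustrative sketch only (not part of the disclosure), the training objective above — cosine similarities C0 through Ck fed through a Softmax, followed by a negative log-likelihood loss — can be written in plain Python. The toy vectors stand in for the sentence embeddings that the model would produce:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similarity_loss(t, p, negatives):
    """Loss = -log(Softmax(C0)), where C0 = Cosine(T, P) and
    C1..Ck = Cosine(T, Ni) for each negative sample Ni."""
    sims = [cosine(t, p)] + [cosine(t, n) for n in negatives]
    exps = [math.exp(s) for s in sims]
    softmax_c0 = exps[0] / sum(exps)
    return -math.log(softmax_c0)
```

Minimizing this loss pushes Softmax(C0) toward the maximum over all the Softmax terms, which is exactly the constraint that the true sample be at least as similar to the positive sample as to any negative sample.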
The model deployment module 300 is configured to deploy the text semantic similarity matching model to an online platform;
and the data feedback module 400 is configured to match a standard question based on the text semantic similarity matching model and feed the standard question back to the online platform.
Optionally, the model deployment module 300 includes:
and the model pruning module is configured to prune the text semantic similarity matching model, retaining only the true-sample/positive-sample calculation module.
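A minimal sketch of what the pruned model does at serving time, under the assumption of a hypothetical `embed` function standing in for the model's sentence encoder (here a toy character-frequency embedding, for illustration only): since only the true-sample/positive-sample branch is retained, matching reduces to scoring the user's question against each candidate standard question.

```python
import math

def embed(text):
    """Toy stand-in for the trained sentence encoder: a fixed-size
    character-frequency vector (hypothetical, for illustration only)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_standard_question(user_question, standard_questions):
    """Inference path of the pruned model: score the user question
    against every standard question and return the best match."""
    q = embed(user_question)
    return max(standard_questions, key=lambda s: cosine(q, embed(s)))
```

The negative-sample branch exists only to shape the loss during training; dropping it before deployment leaves a smaller model whose forward pass computes a single embedding plus one cosine score per candidate.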
Optionally, the text semantic similarity matching system further includes:
and the supervised training module is configured to perform supervised training on the text semantic similarity matching model based on labeled data. The labeled data comprise the standard questions actually clicked by users and the questions actually entered by users.
Optionally, the supervised training module includes:
and the data enhancement submodule is configured to randomly sample from the labeled data to obtain negative samples for the true sample, relative to its positive sample.
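One way the data enhancement described above might look, sketched under assumptions not stated in the disclosure (the helper name and the sample count `k` are hypothetical): for each labeled (user question, clicked standard question) pair, standard questions drawn at random from the other pairs serve as negatives.

```python
import random

def build_training_triples(labeled, k=3, seed=0):
    """labeled: list of (user_question, clicked_standard_question) pairs.
    For each pair, randomly sample up to k standard questions from the
    other pairs to act as negatives (hypothetical data-enhancement sketch)."""
    rng = random.Random(seed)
    pool = [std for _, std in labeled]
    triples = []
    for user_q, pos in labeled:
        # Exclude the positive itself so a negative never equals the target.
        candidates = [s for s in pool if s != pos]
        negs = rng.sample(candidates, min(k, len(candidates)))
        triples.append((user_q, pos, negs))
    return triples
```

Each resulting (true sample, positive sample, negatives) triple is exactly the input shape the cosine/Softmax loss above expects.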
An embodiment of the application further discloses an intelligent terminal comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the above text semantic similarity matching method.
An embodiment of the application also discloses a computer-readable storage medium storing a program which, when loaded and executed by a processor, implements the steps in the flow of the above text semantic similarity matching method.
The computer-readable storage medium includes, for example, various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the division into the functional modules described above is illustrated; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above.
The above embodiments are intended only to describe the technical solutions of the present application in detail and to aid understanding of the method and core idea of the present application; they should not be construed as limiting the present application. Those skilled in the art will appreciate that various modifications and substitutions can be made without departing from the scope of the present disclosure.

Claims (10)

1. A text semantic similarity matching method, characterized by comprising the following steps:
acquiring historical data as a training sample set, wherein the training sample set comprises a true sample, a positive sample, and a negative sample;
calculating the cosine similarity between the true sample and the positive sample and between the true sample and the negative sample, and training a preset text semantic similarity matching model based on the calculation results;
deploying the text semantic similarity matching model to an online platform;
and matching a standard question based on the text semantic similarity matching model and feeding the standard question back to the online platform.
2. The text semantic similarity matching method according to claim 1, characterized in that: the true sample comprises a question actually entered online by the user; the positive sample comprises the standard question selected by the user and the standard question configured by customer service for the user's actual input; the negative sample comprises the standard questions not selected by the user.
3. The text semantic similarity matching method according to claim 2, characterized in that: the calculating of the cosine similarity between the true sample and the positive sample and between the true sample and the negative sample, and the training of a preset text semantic similarity matching model based on the calculation results, comprises:
respectively calculating the cosine similarity between the true sample and the positive sample and between the true sample and each negative sample, wherein the cosine similarities are calculated as:
C0 = Cosine(T,P);
C1 = Cosine(T,N1);
…
Ck = Cosine(T,Nk);
wherein T denotes the true sample, P the positive sample, N1 to Nk the negative samples, and k the number of negative samples;
and constraining the cosine similarity between the true sample and the positive sample to be greater than or equal to the cosine similarity between the true sample and each negative sample, wherein the constraint formula is:
C0 = Max(C0,C1,…,Ck).
4. The text semantic similarity matching method according to claim 3, characterized in that: a Softmax function is selected, and the constraint formula is applied to the cosine similarity calculation formula to obtain Softmax(C0):
Softmax(C0) = Max(Softmax(C0), Softmax(C1),…, Softmax(Ck));
the error between the question actually entered by the user and the standard question selected by the user is defined as Loss, and during the calculation of Loss the cosine similarity between the true sample and the positive sample is constrained to always be greater than or equal to the cosine similarity between the true sample and each negative sample, Loss being calculated as:
Loss = -log(Softmax(C0)).
5. The text semantic similarity matching method according to claim 3, characterized in that: supervised training is performed on the text semantic similarity matching model based on labeled data, wherein the labeled data comprise the standard question actually clicked by the user and the question actually entered by the user.
6. The text semantic similarity matching method according to claim 5, characterized in that: labeled data are randomly extracted as negative samples of the true sample, relative to the positive sample.
7. The text semantic similarity matching method according to claim 6, characterized in that:
the text semantic similarity matching model comprises a true-sample/positive-sample calculation module and a true-sample/negative-sample calculation module;
before deploying the text semantic similarity matching model to the online platform, the method further comprises:
pruning the text semantic similarity matching model and retaining the true-sample/positive-sample calculation module.
8. A text semantic similarity matching system, characterized by comprising:
a data acquisition module (100), configured to acquire historical data as a training sample set, the training sample set comprising true samples, positive samples, and negative samples;
a model training module (200), configured to calculate the cosine similarity between the true sample and the positive sample and between the true sample and the negative sample, and to train a preset text semantic similarity matching model based on the calculation results;
a model deployment module (300), configured to deploy the text semantic similarity matching model to an online platform;
and a data feedback module (400), configured to match a standard question based on the text semantic similarity matching model and feed the standard question back to the online platform.
9. An intelligent terminal, characterized in that the intelligent terminal comprises a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the text semantic similarity matching method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the text semantic similarity matching method according to any one of claims 1 to 7.
CN202111620100.0A 2021-12-27 2021-12-27 Text semantic similarity matching method and system, intelligent terminal and storage medium Pending CN114282513A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111620100.0A CN114282513A (en) 2021-12-27 2021-12-27 Text semantic similarity matching method and system, intelligent terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111620100.0A CN114282513A (en) 2021-12-27 2021-12-27 Text semantic similarity matching method and system, intelligent terminal and storage medium

Publications (1)

Publication Number Publication Date
CN114282513A true CN114282513A (en) 2022-04-05

Family

ID=80876687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111620100.0A Pending CN114282513A (en) 2021-12-27 2021-12-27 Text semantic similarity matching method and system, intelligent terminal and storage medium

Country Status (1)

Country Link
CN (1) CN114282513A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115329063A (en) * 2022-10-18 2022-11-11 江西电信信息产业有限公司 User intention identification method and system
CN116127948A (en) * 2023-02-10 2023-05-16 北京百度网讯科技有限公司 Recommendation method and device for text data to be annotated and electronic equipment


Similar Documents

Publication Publication Date Title
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN108073568B (en) Keyword extraction method and device
CN107480143B (en) Method and system for segmenting conversation topics based on context correlation
CN111222305B (en) Information structuring method and device
CN107729468B (en) answer extraction method and system based on deep learning
CN108280114B (en) Deep learning-based user literature reading interest analysis method
US20190228320A1 (en) Method, system and terminal for normalizing entities in a knowledge base, and computer readable storage medium
CN111078837B (en) Intelligent question-answering information processing method, electronic equipment and computer readable storage medium
CN106503192A (en) Name entity recognition method and device based on artificial intelligence
CN111444320A (en) Text retrieval method and device, computer equipment and storage medium
CN110795913B (en) Text encoding method, device, storage medium and terminal
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
CN114282513A (en) Text semantic similarity matching method and system, intelligent terminal and storage medium
CN110781204A (en) Identification information determination method, device, equipment and storage medium of target object
CN115544303A (en) Method, apparatus, device and medium for determining label of video
CN110795942B (en) Keyword determination method and device based on semantic recognition and storage medium
CN115526171A (en) Intention identification method, device, equipment and computer readable storage medium
KR20120047622A (en) System and method for managing digital contents
CN114491079A (en) Knowledge graph construction and query method, device, equipment and medium
CN112434533B (en) Entity disambiguation method, entity disambiguation device, electronic device, and computer-readable storage medium
CN112307048A (en) Semantic matching model training method, matching device, equipment and storage medium
CN117435685A (en) Document retrieval method, document retrieval device, computer equipment, storage medium and product
CN110489740A (en) Semantic analytic method and Related product
CN115906835A (en) Chinese question text representation learning method based on clustering and contrast learning
CN112487154B (en) Intelligent search method based on natural language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination