CN114818729A - Method, device and medium for training a semantic recognition model and searching for sentences


Info

Publication number
CN114818729A
Authority
CN
China
Prior art keywords
training
target
sentence
semantic
original
Prior art date
Legal status
Pending
Application number
CN202210469400.1A
Other languages
Chinese (zh)
Inventor
韩佳
杜新凯
吕超
谷姗姗
张晗
史辉
孙垚锋
Current Assignee
Sunshine Insurance Group Co Ltd
Original Assignee
Sunshine Insurance Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Sunshine Insurance Group Co Ltd filed Critical Sunshine Insurance Group Co Ltd
Priority to CN202210469400.1A
Publication of CN114818729A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

Embodiments of the present application provide a method, a device, and a medium for training a semantic recognition model and searching for sentences. The method comprises: acquiring original sample data comprising a plurality of original training sentences; obtaining target sample data according to the original sample data, wherein the target sample data comprises a plurality of target training sentence sets, each containing an original training sentence and a first constructed training sentence that has the same semantics as the original training sentence but a different construction; and training a semantic recognition model to be trained at least according to the target sample data to obtain a target semantic recognition model. Through some embodiments of the application, the resulting target semantic recognition model can more accurately recognize the semantics of a sentence to be matched.

Description

Method, device and medium for training a semantic recognition model and searching for sentences
Technical Field
Embodiments of the present application relate to the field of semantic recognition, and in particular to a method, a device, and a medium for training a semantic recognition model and searching for sentences.
Background
Semantic recognition is an important branch of natural language processing; it is widely used and is a cornerstone of many downstream tasks. In the related art, to increase the accuracy of semantic recognition, training sentences are usually added through data augmentation. However, the added sentences are highly similar to the original samples, so their actual semantics are neglected and the accuracy of semantic recognition drops.
How to improve semantic recognition accuracy has therefore become a problem to be solved.
Disclosure of Invention
The embodiments of the present application provide a method, an apparatus, and a medium for training a semantic recognition model and searching for sentences. The training method provided by some embodiments enables the resulting target semantic recognition model to find, by semantics, the target sentence that matches a sentence to be matched.
In a first aspect, an embodiment of the present application provides a method for training a semantic recognition model, the training method comprising: acquiring original sample data, wherein the original sample data comprises a plurality of original training sentences; obtaining target sample data according to the original sample data, wherein the target sample data comprises a plurality of target training sentence sets, and each target training sentence set comprises an original training sentence and a first constructed training sentence which has the same semantics as the original training sentence but a different construction; and training a semantic recognition model to be trained at least according to the target sample data to obtain a target semantic recognition model.
Thus, the model is trained with first constructed training sentences that share the semantics of the original training sentences but differ in construction, so that during training the semantic recognition model repeatedly recognizes the semantics of the first constructed training sentences, and the resulting target semantic recognition model can recognize the correct semantics of a sentence.
With reference to the first aspect, in one embodiment of the present application, the first constructed training sentence is a double negative training sentence, the double negative training sentence being obtained by adding double negative words to the original training sentence.
Thus, by adding double negative words to the original training sentence, the model learns to recover the correct semantics even under the interference of double negation, which improves the accuracy of the target semantic recognition model.
With reference to the first aspect, in one embodiment of the present application, each target training sentence set further includes a second constructed training sentence whose semantics are opposite to those of the original training sentence, and a third constructed training sentence which has the same semantics as the second constructed training sentence but a different construction, the third constructed training sentence being obtained by adding double negative words to the second constructed training sentence.
Thus, by generating the second and third constructed training sentences, the target semantic recognition model learns to recognize sentences with negative meanings, which improves its accuracy during semantic recognition.
With reference to the first aspect, in one implementation of the present application, training the semantic recognition model to be trained at least according to the target sample data to obtain the target semantic recognition model includes: inputting the target sample data into the semantic recognition model to be trained; obtaining, through the semantic recognition model, semantic prediction results corresponding to the target training sentence sets; obtaining a target loss value according to a target loss function and the semantic prediction results; adjusting parameters of the semantic recognition model to be trained according to the target loss value; and repeating the above steps until the target loss value meets a preset requirement, at which point training terminates and the target semantic recognition model is obtained.
Thus, computing the loss of the semantic prediction results during training allows the target semantic recognition model to output vectors that most accurately represent the semantics of the text.
With reference to the first aspect, in one embodiment of the present application, the target loss function is related at least to a similarity loss sub-function. Obtaining the semantic prediction results corresponding to the target training sentence set through the semantic recognition model includes: obtaining an original-sentence semantic prediction result for the original training sentence and first, second, and third constructed-sentence semantic prediction results for the first, second, and third constructed training sentences. Obtaining the target loss value according to the target loss function and the semantic prediction results includes: calculating, through the similarity loss sub-function, a total marginal loss value among the original-sentence semantic prediction result and the first, second, and third constructed-sentence semantic prediction results, and taking the total marginal loss value as the target loss value.
Thus, obtaining the target loss value through the similarity loss sub-function allows the target loss value to decrease continuously during training, yielding the target semantic recognition model.
With reference to the first aspect, in one embodiment of the present application, the similarity loss sub-function includes a first loss function and a second loss function. Calculating the total marginal loss value through the similarity loss sub-function includes: calculating, through the first loss function, similarity differences between the original-sentence semantic prediction result and the first, second, and third constructed-sentence semantic prediction results; and obtaining the total marginal loss value based on the similarity differences and the second loss function.
Thus, through the first and second loss functions, the loss between sentences with the same semantics is gradually reduced during training, and accurate semantic recognition results are obtained.
With reference to the first aspect, in one embodiment of the present application, the target loss function is further related to a contrastive loss sub-function. Before obtaining the target loss value according to the target loss function and the semantic prediction results, the method further comprises: acquiring original negative sample data, wherein the original negative sample data comprises a plurality of negative training sentences. Obtaining the target loss value then includes: calculating, through the contrastive loss sub-function, a contrastive loss value between the semantic prediction results of the original negative sample data and those of the original training sentences; and computing a weighted sum of the contrastive loss value and the total marginal loss value to obtain the target loss value.
Thus, positive and negative samples are distinguished through the contrastive loss sub-function, while positive, soft negative, double negative positive, and double negative soft negative samples are distinguished through the similarity loss sub-function, giving the target semantic recognition model the ability to recognize semantic differences between similar texts and improving the accuracy of downstream tasks.
In a second aspect, an embodiment of the present application provides an apparatus for training a semantic recognition model, the training apparatus comprising: a data acquisition module configured to acquire original sample data, wherein the original sample data comprises a plurality of original training sentences; a data generation module configured to obtain target sample data according to the original sample data, wherein the target sample data comprises a plurality of target training sentence sets, and each target training sentence set comprises an original training sentence and a first constructed training sentence which has the same semantics as the original training sentence but a different construction; and a model training module configured to train a semantic recognition model to be trained at least according to the target sample data to obtain a target semantic recognition model.
With reference to the second aspect, in one embodiment of the present application, the first constructed training sentence is a double negative training sentence, the double negative training sentence being obtained by adding double negative words to the original training sentence.
With reference to the second aspect, in one embodiment of the present application, each target training sentence set further includes a second constructed training sentence whose semantics are opposite to those of the original training sentence, and a third constructed training sentence which has the same semantics as the second constructed training sentence but a different construction, the third constructed training sentence being obtained by adding double negative words to the second constructed training sentence.
With reference to the second aspect, in one embodiment of the present application, the model training module is configured to: input the target sample data into the semantic recognition model to be trained; obtain, through the semantic recognition model, semantic prediction results corresponding to the target training sentence sets; obtain a target loss value according to a target loss function and the semantic prediction results; adjust parameters of the semantic recognition model to be trained according to the target loss value; and repeat the above steps until the target loss value meets a preset requirement, at which point training terminates and the target semantic recognition model is obtained.
With reference to the second aspect, in one embodiment of the present application, the target loss function is related at least to a similarity loss sub-function, and the model training module is configured to: obtain an original-sentence semantic prediction result for the original training sentence and first, second, and third constructed-sentence semantic prediction results for the first, second, and third constructed training sentences; calculate, through the similarity loss sub-function, a total marginal loss value among the original-sentence semantic prediction result and the three constructed-sentence semantic prediction results; and take the total marginal loss value as the target loss value.
With reference to the second aspect, in one embodiment of the present application, the similarity loss sub-function includes a first loss function and a second loss function, and the model training module is configured to: calculate, through the first loss function, similarity differences between the original-sentence semantic prediction result and the first, second, and third constructed-sentence semantic prediction results; and obtain the total marginal loss value based on the similarity differences and the second loss function.
With reference to the second aspect, in one embodiment of the present application, the target loss function is further related to a contrastive loss sub-function, and the model training module is configured to: acquire original negative sample data comprising a plurality of negative training sentences; calculate, through the contrastive loss sub-function, a contrastive loss value between the semantic prediction results of the original negative sample data and those of the original training sentences; and compute a weighted sum of the contrastive loss value and the total marginal loss value to obtain the target loss value.
In a third aspect, an embodiment of the present application provides a method for searching for a sentence, the method comprising: obtaining a sentence to be matched; inputting the sentence to be matched into a target semantic recognition model obtained by any embodiment of the first aspect, and obtaining the corresponding semantic vector through the target semantic recognition model; and searching for the target sentence matching the sentence to be matched according to the semantic vector.
In a fourth aspect, an embodiment of the present application provides an electronic device comprising a processor, a memory, and a bus. The processor is connected to the memory via the bus, and the memory stores computer-readable instructions which, when executed by the processor, implement the method of any embodiment of the first aspect.
Drawings
FIG. 1 is a schematic diagram of a semantic recognition scenario according to an embodiment of the present application;
FIG. 2 is a first flowchart of a semantic recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the composition of target sample data according to an embodiment of the present application;
FIG. 4 is a second flowchart of the semantic recognition method according to an embodiment of the present application;
FIG. 5 is a block diagram of an apparatus for semantic recognition according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the composition of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
The embodiments of the present application may be applied to the field of sentence matching; for example, some embodiments acquire a target sentence that matches a sentence to be matched by recognizing semantics. To address the problem, noted in the background, that matching sentences cannot be found reliably by semantics, some embodiments of the present application construct more reasonable training sample data (for example, by generating training sentences with the same semantics as, but different constructions from, the original sample data) and train the semantic recognition model to be trained on this newly constructed data, so that the model learns semantic information by comparing the constructed training sentences. For example, in some embodiments of the present application, a method of training a semantic recognition model includes: constructing double negative training sentences, and then training the semantic recognition model to be trained on them to obtain the target semantic recognition model.
The method steps in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
FIG. 1 shows an application scenario of some embodiments of the present application, which includes a user 110, a client 120, and a server 130. Specifically, the user 110 enters a sentence to be matched (e.g., "What insurance does company A offer?") into the client 120. After obtaining the sentence to be matched, the client 120 sends it to the server 130; the server 130 inputs it into the target semantic recognition model to obtain a semantic vector, and then searches the database for the target sentence corresponding to the sentence to be matched according to that semantic vector.
In the prior art, to increase the accuracy of semantic recognition, training sentences are usually added through data augmentation; but because the added sentences are highly similar to the original samples, their actual semantics are neglected and recognition accuracy drops. In the embodiments of the present application, multiple training sentences whose semantics are the same as, or opposite to, those of the original training sentences are constructed (yielding the target sample data) to enrich the training data, so that the resulting model has genuine semantic recognition ability. For example, a double negative training sentence is constructed from an original training sentence (i.e., a sentence with the same semantics but a different construction), so that embodiments of the application can accurately recognize the correct semantics of the sentence to be matched when searching for the target sentence, improving the accuracy of retrieval.
It should be understood that the server 130 in FIG. 1 deploys a target semantic recognition model trained on target sample data. The following describes, by way of example, the training process that produces the target semantic recognition model deployed on the server in FIG. 1.
As shown in fig. 2, some embodiments of the present application provide a method for training a semantic recognition model, the method including:
S210: acquire original sample data.
It should be noted that the original sample data includes a plurality of original training sentences, for example, "Compared with other car insurance, I bought company A's car insurance" and "Recommend several cost-effective car insurance products".
In one embodiment of the present application, initial sample data is obtained from text collected from system logs; this text may consist of questions or other interrogative sentences. Because the initially collected data contains considerable noise, such as meaningless special characters, spaces, and mojibake, regular expressions are used to find and remove the noise, yielding the original sample data.
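A minimal sketch of this cleaning step, assuming simple regular expressions (the patent does not disclose the exact patterns it uses):

```python
import re

def clean_log_text(text: str) -> str:
    """Strip noise from raw system-log text before it is used as sample data."""
    # Keep word characters (including CJK) and common punctuation; drop the rest.
    # These patterns are illustrative assumptions, not the patent's actual regexes.
    text = re.sub(r"[^\w，。？！,.?!]", " ", text)
    text = re.sub(r"\s+", " ", text)  # collapse the whitespace left by removal
    return text.strip()

print(clean_log_text("  比其他车险，我买了某公司的车险###  "))
```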
In another embodiment of the present application, S210 further includes: obtaining original negative sample data, wherein the original negative sample data includes a plurality of negative training sentences.
That is, the original negative sample data is interference data unrelated to the original sample data. For example, if one original sample sentence is "Compared with other car insurance, I bought company A's car insurance", the corresponding original negative samples are "How long is the waiting period of this product?", "Recommend several cost-effective car insurance products", and so on. It can be understood that, in the prior art, the original sample data and the original negative sample data are used directly to train the model; the resulting model can also retrieve target sentences matching a sentence to be matched, but its semantic recognition is inaccurate and it cannot distinguish texts that share characters but differ in semantics.
S220: obtain target sample data according to the original sample data.
It should be noted that the target sample data comprises a plurality of target training sentence sets. Each set corresponds to one original training sentence and includes that original training sentence together with a first constructed training sentence which has the same semantics as the original training sentence but a different construction.
That is, at least one first constructed training sentence is generated for each original training sentence in the original sample data; the first constructed training sentence shares the semantics of the original training sentence but differs in construction, and the original training sentence and its corresponding first constructed training sentence form a target training sentence set. The reason for deriving target sample data from the original sample data is that, in semantic recognition, two differently constructed sentences may carry the same semantics; adding such training data increases the accuracy of semantic recognition.
It should also be noted that each target training sentence set in the target sample data further includes a positive example training sentence, which is generated from the original training sentence and has the same semantics and the same construction as the original training sentence.
For example: if the original training sentence obtained in S210 is "Compared with other car insurance, I bought company A's car insurance", a positive example training sentence generated from it is "Compared with other car insurance, I did buy company A's car insurance" or "Compared with other car insurance, I have bought company A's car insurance".
In one embodiment of the present application, the first constructed training sentence is a double negative training sentence, wherein the double negative training sentence is obtained by adding a double negative word to the original training sentence.
That is, double negative words are added to the original training sentence to obtain a first constructed training sentence with the same semantics as the original. The first constructed training sentence differs in construction from the original because of the added double negative words, and its semantics remain the same as the original because a double negative expresses an affirmative.
For example, if the original training sentence is "Compared with other car insurance, I bought company A's car insurance", the first constructed training sentence with double negative words added is "Compared with other car insurance, I could not help but buy company A's car insurance".
It should be noted that a double negative training sentence is a sentence containing double negative words, for example constructions such as "cannot ... not" or "not impossible". It should be understood that the embodiments of the present application do not limit the form of the double negative.
It can be understood that the embodiments of the present application take into account the expression of Chinese double negative sentences: in terms of tone, a double-negative affirmative sentence carries a heavier tone than a plain affirmative sentence, and its semantics are more emphatic. Double negation has a definite expressive effect, but its use demands attention to the accuracy of the whole sentence's meaning; because a double negative generally comprises two negators, a machine may confuse them with a single negation during semantic recognition. The embodiments of the present application therefore provide a model training method based on double negative sentence patterns.
Thus, by adding double negative words to the original training sentence, the model learns to recover the correct semantics even under the interference of double negation, which improves the accuracy of the target semantic recognition model.
In one embodiment of the present application, each target training sentence set further includes a second constructed training sentence whose semantics are opposite to those of the original training sentence, and a third constructed training sentence which has the same semantics as the second constructed training sentence but a different construction, the third constructed training sentence being obtained by adding double negative words to the second constructed training sentence.
That is, a second constructed training sentence whose semantics are opposite to the original is constructed from the original training sentence. For example, if the original training sentence is "Compared with other car insurance, I bought company A's car insurance", the second constructed training sentence is "Compared with other car insurance, I will not buy company A's car insurance". Double negative words are then added to the second constructed training sentence to obtain a third constructed training sentence, for example "Compared with other car insurance, I must not buy company A's car insurance"; that is, the semantics of the third constructed training sentence are the same as those of the second.
Thus, by generating the second and third constructed training sentences, the target semantic recognition model learns to recognize sentences with negative meanings, which improves its accuracy during semantic recognition.
In summary, as shown in FIG. 3, to improve the semantic recognition accuracy of the target semantic recognition model, the embodiments of the present application extend each original training sentence in the original sample data with five types of sentences (yielding the target sample data) and use these five types of sentences for training. The target sample data comprises one target training sentence set per original training sentence.
For example, taking one target training sentence set: it involves original negative sample data (the negative samples), a positive example training sentence (the positive sample), a first constructed training sentence (the double negative positive sample), a second constructed training sentence (the soft negative sample), and a third constructed training sentence (the double negative soft negative sample).
Note that a soft negative sample is an explicit negation formed by adding a negator, not a semantic antonym; because it negates by adding a negator, its surface form remains highly similar to the original sample data. Similarly, the double negative sentences are constructed by adding double negative words.
After the five types of sentences are designed, each sentence is first parsed with the spaCy toolkit to obtain its syntax tree and part-of-speech tags and to mark word stems; based on this information, the sentence is then converted into negative and double negative sentences that are syntactically correct and semantically clear.
For example, the soft negative sample is textually highly similar to the original sample data but differs significantly in semantics; the negation of the original sample data is taken as the soft negative sample and handled through the loss function.
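To make the five sample types concrete, here is a toy illustration in Python; the English renderings are assumptions standing in for the patent's Chinese examples, and a real implementation would place the negators using the spaCy parse described above:

```python
original = "Compared with other car insurance, I bought company A's car insurance."

target_training_set = {
    # positive sample: same semantics, same construction as the original
    "positive": "Compared with other car insurance, I did buy company A's car insurance.",
    # double negative positive sample: double negative words added, reads as affirmative
    "double_negative_positive": ("Compared with other car insurance, "
                                 "I could not help but buy company A's car insurance."),
    # soft negative sample: explicit negation, surface form close to the original
    "soft_negative": "Compared with other car insurance, I did not buy company A's car insurance.",
    # double negative soft negative sample: double negative phrasing of the soft negative
    "double_negative_soft_negative": ("Compared with other car insurance, "
                                      "there is no way I would buy company A's car insurance."),
}

# Unrelated corpus sentences serve as the negative samples.
negative_samples = ["How long is the waiting period of this product?"]
```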
S230: train the semantic recognition model to be trained at least according to the target sample data to obtain the target semantic recognition model.
Specifically, the training process is as follows:
S1: input the target sample data into the semantic recognition model to be trained, and obtain semantic prediction results corresponding to the training sentences in at least one target training sentence set included in the target sample data.
That is, as shown in FIG. 4, the embodiments of the present application draw on the PromptBERT model. To obtain the semantic prediction results, the original sample data and the negative samples are input into PromptBERT-1 410, while the positive samples, double negative positive samples, soft negative samples, and double negative soft negative samples are input into PromptBERT-2 420 (it can be understood that PromptBERT-1 and PromptBERT-2 have the same model structure and share parameters). This yields the semantic prediction result H1 for the original sample data, H2 for the negative samples, H3 for the positive samples, H4 for the double negative positive samples, H5 for the soft negative samples, and H6 for the double negative soft negative samples.
Specifically, when the above sentences are input into a pre-trained language model (e.g., a BERT or RoBERTa model), a prompt is appended to each sentence. The prompt template can be characterized as: "[X]" means [MASK]. Here X denotes the original sample data, negative sample, positive sample, double negative positive sample, soft negative sample, or double negative soft negative sample, and [MASK] masks the semantics of the sentence. The semantic recognition model to be trained fills in [MASK] to output a semantic prediction result. The hidden state H_[MASK] of the special mask token "[MASK]" is taken as the output vector of the semantic recognition model to be trained, and during training a fully connected layer with a tanh activation function is applied to H_[MASK] to obtain the semantic prediction result, characterized by formula (1):

H = tanh(MLP(H_[MASK]))    (1)

where H denotes the semantic prediction result, tanh the activation function, MLP the fully connected layer, and H_[MASK] the hidden state of the mask token "[MASK]".
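A sketch of this prompt-based encoding, assuming a HuggingFace BERT checkpoint; the checkpoint name and exact template wording are assumptions:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
encoder = AutoModel.from_pretrained("bert-base-chinese")
mlp = torch.nn.Linear(encoder.config.hidden_size, encoder.config.hidden_size)

def embed(sentence: str) -> torch.Tensor:
    # Prompt template from the description: "[X]" means [MASK].
    prompt = f'"{sentence}" means {tokenizer.mask_token}.'
    inputs = tokenizer(prompt, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state                       # (1, seq, d)
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    h_mask = hidden[0, mask_pos]                                       # hidden state of [MASK]
    return torch.tanh(mlp(h_mask))                                     # H = tanh(MLP(H_[MASK])), formula (1)
```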
S2: obtain a target loss value according to the target loss function and the semantic prediction results.
It should be noted that we first need to obtain an original-sentence semantic prediction result for the original training sentence, and first, second, and third constructed-sentence semantic prediction results for the first, second, and third constructed training sentences respectively.
In one embodiment of the present application, the target loss function is related to a similarity loss sub-function. The target loss value is calculated through the similarity loss sub-function as follows:
S201: calculate, through the similarity loss sub-function, the total marginal loss value among the original-sentence semantic prediction result and the first, second, and third constructed-sentence semantic prediction results.
That is, after the semantic prediction results of each sentence are obtained in S1, the similarities between them are computed for the target loss function. The loss computation has two parts: a contrastive loss sub-function distinguishes positive samples from negative samples, and a similarity loss sub-function distinguishes positive samples, double negative positive samples, soft negative samples, and double negative soft negative samples. In some practical applications only the latter distinction is needed, so in this embodiment the target loss value is calculated through the similarity loss sub-function.
Specifically, the similarity loss sub-function includes a first loss function and a second loss function. The process of calculating the total marginal loss value is as follows:
First, similarity differences between the original-sentence semantic prediction result and the first, second, and third constructed-sentence semantic prediction results are calculated through the first loss function.
Here the original-sentence semantic prediction result is denoted H1; the prediction for the negative sample is H2; for the positive sample, H3; for the double negative positive sample (i.e., the first constructed-sentence prediction), H4; for the soft negative sample (i.e., the second constructed-sentence prediction), H5; and for the double negative soft negative sample (i.e., the third constructed-sentence prediction), H6.
Specifically, the cosine similarities cos(H1, H3) and cos(H1, H5) are computed, and their difference gives the first similarity difference Δ1, characterized by expression (2):

Δ1 = cos(H1, H5) - cos(H1, H3)    (2)

Likewise, the cosine similarities cos(H1, H4) and cos(H1, H6) are computed, and their difference gives the second similarity difference Δ2, characterized by expression (3):

Δ2 = cos(H1, H4) - cos(H1, H6)    (3)

Subtracting cos(H1, H4) from cos(H1, H3) gives the third similarity difference Δ3, characterized by expression (4):

Δ3 = cos(H1, H3) - cos(H1, H4)    (4)
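These three differences can be computed directly from the prediction vectors; a sketch (variable names follow the text above):

```python
import torch.nn.functional as F

def similarity_differences(H1, H3, H4, H5, H6):
    # H1: original; H3: positive; H4: double negative positive;
    # H5: soft negative; H6: double negative soft negative
    cos = lambda a, b: F.cosine_similarity(a, b, dim=-1)
    delta1 = cos(H1, H5) - cos(H1, H3)   # expression (2)
    delta2 = cos(H1, H4) - cos(H1, H6)   # expression (3)
    delta3 = cos(H1, H3) - cos(H1, H4)   # expression (4)
    return delta1, delta2, delta3
```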
Then, the total marginal loss value is obtained based on the similarity differences and the second loss function.
That is, the second loss function defines a bidirectional margin loss (BML); the embodiments of the present application model the semantic similarity differences using the BML and the three similarity differences above.
Specifically, the goal of the BML loss is to restrict each similarity difference to the interval [-β, -α], where α1 and β1 denote the upper and lower margins of the semantic similarity between the positive sample and the soft negative sample; α2 and β2 denote the upper and lower margins of the semantic similarity between the double negative positive sample and the double negative soft negative sample; and α3 and β3 denote the upper and lower margins of the similarity between the positive sample and the double negative positive sample.
The marginal loss values are then computed from the bidirectional margin loss, as shown in expressions (5), (6), and (7):

L_BML_1 = ReLU(Δ1 + α1) + ReLU(-Δ1 - β1)    (5)
L_BML_2 = ReLU(Δ2 + α2) + ReLU(-Δ2 - β2)    (6)
L_BML_3 = ReLU(Δ3 + α3) + ReLU(-Δ3 - β3)    (7)

where L_BML_1, L_BML_2, and L_BML_3 denote the marginal loss values corresponding to the first, second, and third similarity differences respectively, and ReLU denotes the activation function.

The total marginal loss value is obtained by adding L_BML_1, L_BML_2, and L_BML_3, as shown in formula (8):

L_BML = L_BML_1 + L_BML_2 + L_BML_3    (8)

where L_BML denotes the total marginal loss value.
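A sketch of the bidirectional margin loss; the margin values are hyperparameter assumptions, since the patent does not give concrete α and β:

```python
import torch.nn.functional as F

def bml(delta, alpha, beta):
    # ReLU(delta + alpha) + ReLU(-delta - beta) is zero exactly when delta
    # lies in the interval [-beta, -alpha], matching expressions (5)-(7).
    return F.relu(delta + alpha) + F.relu(-delta - beta)

def total_bml(delta1, delta2, delta3, margins=((0.1, 0.3),) * 3):
    (a1, b1), (a2, b2), (a3, b3) = margins  # assumed margin hyperparameters
    return (bml(delta1, a1, b1) + bml(delta2, a2, b2)
            + bml(delta3, a3, b3))          # formula (8)
```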
Thus, through the first and second loss functions, the loss between sentences with the same semantics is gradually reduced during training, and accurate semantic recognition results are obtained.
S202: weight the total marginal loss value to obtain the target loss value, or take the total marginal loss value directly as the target loss value.
That is, after obtaining the total marginal loss value in S201, the total marginal loss value may be directly used as the target loss value, or the target loss value may be obtained by multiplying the total marginal loss value by a coefficient.
Therefore, the target loss value is obtained through the similarity loss sub-function, the target loss value can be continuously reduced in the training process, and the target semantic recognition model is obtained.
In another embodiment of the present application, the target loss function is further related to a contrastive loss sub-function. The process of obtaining the target loss value through the similarity loss sub-function together with the contrastive loss sub-function is as follows.
First, the contrastive loss value between the semantic prediction results of the original negative sample data and those of the original training sentences is calculated through the contrastive loss sub-function.
That is, the embodiments of the present application distinguish positive samples from negative samples through the contrastive loss sub-function, calculated as shown in formula (9):
L_InfoNCE = -log( exp(cos(H1_i, H3_i)/τ) / Σ_{j=1}^{N} exp(cos(H1_i, H_j)/τ) )    (9)

where L_InfoNCE denotes the contrastive loss value, cos denotes the cosine similarity, τ denotes the temperature factor, and N denotes the number of positive and negative samples. (Formula (9) appears only as an image in the source; the form given here is the standard InfoNCE loss consistent with the surrounding description.)
Then, the contrastive loss value and the total marginal loss value are combined in a weighted sum to obtain the target loss value.
That is, in this embodiment the target loss value is determined jointly by the contrastive loss value and the total marginal loss value. Specifically, the two are weighted and summed; the weight multiplying the total marginal loss value reflects the importance of the soft negative, double negative positive, and double negative soft negative samples, and at the same time compensates for the difference in scale between the two losses. The target loss value is calculated as shown in formula (10):

L_all = L_InfoNCE + γ · L_BML    (10)

where L_all denotes the target loss value, L_InfoNCE the contrastive loss value, L_BML the total marginal loss value, and γ the weight.
It is understood that γ = 0 means the total marginal loss term is absent, and a larger γ means the total marginal loss contributes more to the overall loss. γ takes values in [0, 1].
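A sketch combining the two sub-functions; the in-batch InfoNCE form is assumed from the description (the patent's formula (9) is not legible in this text), and γ is a hyperparameter:

```python
import torch
import torch.nn.functional as F

def info_nce(H1, H3, H2, tau=0.05):
    # H1, H3, H2: (N, d) batches of original, positive, and negative embeddings;
    # each original is contrasted against its own positive and all negatives.
    candidates = torch.cat([H3, H2], dim=0)                            # (2N, d)
    sims = F.cosine_similarity(H1.unsqueeze(1), candidates.unsqueeze(0),
                               dim=-1) / tau                           # (N, 2N)
    labels = torch.arange(H1.size(0))        # positive for row i sits at column i
    return F.cross_entropy(sims, labels)     # standard InfoNCE, assumed formula (9)

def target_loss(l_infonce, l_bml, gamma=0.1):
    return l_infonce + gamma * l_bml         # formula (10); gamma in [0, 1]
```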
Thus, positive and negative samples are distinguished through the contrastive loss sub-function, while positive, soft negative, double negative positive, and double negative soft negative samples are distinguished through the similarity loss sub-function, giving the target semantic recognition model the ability to recognize semantic differences between similar texts and improving the accuracy of downstream tasks.
S3: adjust the parameters of the semantic recognition model to be trained according to the target loss value.
That is, a target loss value is obtained in each training iteration; if it has not yet reached its minimum or fallen below the threshold, the parameters of the semantic recognition model to be trained are adjusted, and the next iteration continues with the updated model.
S4: repeat the above steps until the target loss value meets the preset requirement, then terminate training and obtain the target semantic recognition model.
That is, in one embodiment of the present application, training terminates and the target semantic recognition model is obtained when the target loss value reaches a minimum; in another embodiment, when it reaches a preset threshold; and in yet another embodiment, after a preset number of iterations (e.g., 2000).
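A sketch of the overall training loop under these termination rules, reusing the helpers above; make_batch and the threshold value are hypothetical:

```python
import torch

optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(mlp.parameters()), lr=2e-5)

for step in range(2000):                                  # preset iteration cap
    H1, H2, H3, H4, H5, H6 = make_batch()                 # hypothetical batch loader
    d1, d2, d3 = similarity_differences(H1, H3, H4, H5, H6)
    loss = target_loss(info_nce(H1, H3, H2),
                       total_bml(d1, d2, d3).mean())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if loss.item() < 1e-3:                                # assumed preset threshold
        break
```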
Thus, computing the loss of the semantic prediction results during training allows the target semantic recognition model to output vectors that most accurately represent the semantics of the text.
In this way, the embodiments of the present application fully account for the characteristics of double negative sentences in Chinese expression, and combine a new paradigm (pre-trained language model, prompt, and semantic prediction) with a contrastive learning loss function to obtain the target semantic recognition model. The trained target semantic recognition model is then saved and used to generate accurate semantic vectors.
The above has described the method for training a semantic recognition model in the embodiments of the present application; the following describes a method for searching for sentences by applying the target semantic recognition model.
It is understood that the target semantic recognition model in the embodiment of the present application can be applied to an online retrieval system.
In one embodiment of the present application, a method for searching for a sentence comprises the following steps:
Step one: obtain the sentence to be matched.
That is, the user enters a question (i.e., a sentence to be matched) in the client's online retrieval system; the system sends the question to the server, which receives it in real time.
Step two: input the sentence to be matched into the target semantic recognition model, and obtain the semantic vector corresponding to the sentence to be matched through the target semantic recognition model.
It should be understood that before serving queries, the server processes the set of candidate target sentences in batches, obtains the semantic vector corresponding to each, and stores the vectors in the database, so that the semantic vectors of the target sentence set are represented offline. The parameter file of the target semantic recognition model is then loaded into memory and the model is initialized.
After the question is obtained, it is input into the target semantic recognition model, and the semantic vector corresponding to the question is computed using the model.
Step three: search for the target sentence matching the sentence to be matched according to the semantic vector.
That is, the vectors of the target sentence set are retrieved, the similarity between the semantic vector of the question and each vector in the set is calculated, and the target sentence is then selected according to the similarities.
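A sketch of this retrieval step, reusing the embed function sketched earlier; target_sentences is a hypothetical candidate set:

```python
import torch
import torch.nn.functional as F

target_sentences = ["...", "..."]  # candidate answer sentences (placeholders)
# Offline: embed every candidate once and keep the matrix in the database.
with torch.no_grad():
    target_vecs = torch.stack([embed(s) for s in target_sentences])    # (M, d)

def find_target(query: str, k: int = 1):
    with torch.no_grad():
        q = embed(query)                                               # query semantic vector
    sims = F.cosine_similarity(q.unsqueeze(0), target_vecs, dim=-1)    # (M,)
    top = sims.topk(min(k, sims.numel()))
    return [(target_sentences[i], sims[i].item()) for i in top.indices]
```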
Having described specific embodiments of the sentence searching method of the present application, an apparatus for training a semantic recognition model is described next.
As shown in fig. 5, an apparatus 500 for training a semantic recognition model includes: a data acquisition module 510, a data generation module 520, and a model training module 530.
A data acquisition module 510, configured to acquire original sample data, wherein the original sample data comprises a plurality of original training sentences.
A data generation module 520, configured to obtain target sample data according to the original sample data, wherein the target sample data comprises a plurality of target training sentence sets, and each target training sentence set comprises an original training sentence and a first constructed training sentence which has the same semantics as the original training sentence but a different construction.
A model training module 530, configured to train the semantic recognition model to be trained at least according to the target sample data to obtain the target semantic recognition model.
In one embodiment of the present application, the first constructed training sentence is a double negative training sentence, the double negative training sentence being obtained by adding double negative words to the original training sentence.
In one embodiment of the present application, each target training sentence set further includes a second constructed training sentence whose semantics are opposite to those of the original training sentence, and a third constructed training sentence which has the same semantics as the second constructed training sentence but a different construction, the third constructed training sentence being obtained by adding double negative words to the second constructed training sentence.
In one embodiment of the present application, the model training module 530 is configured to: input the target sample data into the semantic recognition model to be trained; obtain, through the semantic recognition model, semantic prediction results corresponding to the target training sentence sets; obtain a target loss value according to a target loss function and the semantic prediction results; adjust parameters of the semantic recognition model to be trained according to the target loss value; and repeat the above steps until the target loss value meets a preset requirement, at which point training terminates and the target semantic recognition model is obtained.
In one embodiment of the present application, the target loss function is related at least to a similarity loss sub-function, and the model training module 530 is configured to: obtain an original-sentence semantic prediction result for the original training sentence and first, second, and third constructed-sentence semantic prediction results for the first, second, and third constructed training sentences; calculate, through the similarity loss sub-function, a total marginal loss value among these prediction results; and take the total marginal loss value as the target loss value.
In one embodiment of the present application, the similarity loss sub-function includes a first loss function and a second loss function, and the model training module 530 is configured to: calculate, through the first loss function, similarity differences between the original-sentence semantic prediction result and the first, second, and third constructed-sentence semantic prediction results; and obtain the total marginal loss value based on the similarity differences and the second loss function.
In one embodiment of the present application, the target loss function is further related to a contrastive loss sub-function, and the model training module 530 is configured to: acquire original negative sample data comprising a plurality of negative training sentences; calculate, through the contrastive loss sub-function, a contrastive loss value between the semantic prediction results of the original negative sample data and those of the original training sentences; and compute a weighted sum of the contrastive loss value and the total marginal loss value to obtain the target loss value.
In the embodiment of the present application, the module shown in fig. 5 can implement each process in the method embodiments of fig. 1 to 4. The operations and/or functions of the respective modules in fig. 5 are respectively for implementing the corresponding flows in the method embodiments in fig. 1 to 4. Reference may be made specifically to the description of the above method embodiments, and a detailed description is appropriately omitted herein to avoid redundancy.
As shown in FIG. 6, an embodiment of the present application provides an electronic device 600 comprising a processor 610, a memory 620, and a bus 630. The processor is connected to the memory via the bus; the memory stores computer-readable instructions which, when executed by the processor, implement the method of any of the embodiments above. Reference may be made to the descriptions of the method embodiments above; detailed description is omitted here to avoid repetition.
The bus is used for direct-connection communication among these components. The processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or executed by such a processor. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory stores computer-readable instructions which, when executed by the processor, perform the methods described in the embodiments above.
It will be appreciated that the configuration shown in fig. 6 is merely illustrative and may include more or fewer components than shown in fig. 6 or have a different configuration than shown in fig. 6. The components shown in fig. 6 may be implemented in hardware, software, or a combination thereof.
Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a server, the method in any of the above-mentioned all embodiments is implemented, which may specifically refer to the description in the above-mentioned method embodiments, and in order to avoid repetition, detailed description is appropriately omitted here.
The above description covers only preferred embodiments of the present application and is not intended to limit the present application; those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall fall within the protection scope of the present application. It should be noted that like reference numerals and letters denote like items in the figures; accordingly, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
The above description is directed only to specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution that can readily be conceived by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of training a semantic recognition model, the method comprising:
acquiring original sample data, wherein the original sample data comprises a plurality of original training sentences;
obtaining target sample data according to the original sample data, wherein the target sample data comprises a plurality of target training sentence sets, and each target training sentence set comprises an original training sentence and a first constructed training sentence which has the same semantics as the original training sentence but a different construction;
and training a semantic recognition model to be trained at least according to the target sample data to obtain a target semantic recognition model.
2. The method of claim 1, wherein the first constructed training sentence is a double-negative training sentence obtained by adding double-negative words to the original training sentence.
3. The method according to claim 1 or 2, wherein
each target training sentence set further comprises a second constructed training sentence whose semantics are opposite to those of the original training sentence, and a third constructed training sentence which has the same semantics as the second constructed training sentence but a different construction, wherein the third constructed training sentence is obtained by adding double-negative words to the second constructed training sentence.
4. The method of claim 3, wherein
training the semantic recognition model to be trained at least according to the target sample data to obtain the target semantic recognition model comprises:
inputting the target sample data into a semantic recognition model to be trained;
obtaining a semantic prediction result corresponding to the target training sentence set through the semantic recognition model;
obtaining a target loss value according to a target loss function and the semantic prediction result;
adjusting parameters in the semantic recognition model to be trained according to the target loss value;
and repeating the above steps until the target loss value meets a preset requirement, and then terminating the training to obtain the target semantic recognition model.
5. The method of claim 4, wherein the target loss function is related to at least a similarity loss sub-function;
wherein obtaining the semantic prediction result corresponding to the target training sentence set through the semantic recognition model comprises:
obtaining an original sentence semantic prediction result for the original training sentence, a first constructed sentence semantic prediction result for the first constructed training sentence, a second constructed sentence semantic prediction result for the second constructed training sentence, and a third constructed sentence semantic prediction result for the third constructed training sentence;
and wherein obtaining the target loss value according to the target loss function and the semantic prediction result comprises:
calculating, through the similarity loss sub-function, a total marginal loss value among the original sentence semantic prediction result, the first constructed sentence semantic prediction result, the second constructed sentence semantic prediction result, and the third constructed sentence semantic prediction result;
and taking the total marginal loss value as the target loss value.
6. The method of claim 5, wherein the similarity loss sub-function comprises a first loss function and a second loss function;
wherein calculating, through the similarity loss sub-function, the total marginal loss value among the original sentence semantic prediction result and the first, second, and third constructed sentence semantic prediction results comprises:
calculating, through the first loss function, similarity differences between the original sentence semantic prediction result and the first, second, and third constructed sentence semantic prediction results;
and obtaining the total marginal loss value based on the similarity differences and the second loss function.
7. The method of claim 6, wherein the target loss function is further associated with a contrast loss sub-function;
wherein before obtaining the target loss value according to the target loss function and the semantic prediction result, the method further comprises:
acquiring original negative example sample data, wherein the original negative example sample data comprises a plurality of negative example training sentences;
and wherein obtaining the target loss value according to the target loss function and the semantic prediction result comprises:
calculating, through the contrast loss sub-function, a contrast loss value between the semantic prediction result of the original negative example sample data and the semantic prediction result of the original training sentences;
and performing a weighted summation of the contrast loss value and the total marginal loss value to obtain the target loss value.
8. A method of searching for a sentence, the method comprising:
obtaining a sentence to be matched;
inputting the sentence to be matched into a target semantic recognition model obtained by the method according to any one of claims 1 to 7, and obtaining, through the target semantic recognition model, a semantic vector corresponding to the sentence to be matched;
and searching for a target sentence matching the sentence to be matched according to the semantic vector.
9. An apparatus for training a semantic recognition model, the apparatus comprising:
a data acquisition module configured to acquire original sample data, wherein the original sample data comprises a plurality of original training sentences;
a data generation module configured to obtain target sample data according to the original sample data, wherein the target sample data comprises a plurality of target training sentence sets, and each target training sentence set comprises an original training sentence and a first constructed training sentence which has the same semantics as the original training sentence but a different construction;
and a model training module configured to train a semantic recognition model to be trained at least according to the target sample data to obtain a target semantic recognition model.
10. An electronic device, comprising: a processor, a memory, and a bus;
the processor is connected to the memory through the bus, and the memory stores computer-readable instructions which, when executed by the processor, implement the method according to any one of claims 1 to 8.
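Purely as non-authoritative illustrations of claims 4 and 8, two short sketches follow. The first shows one way the iterative training recited in claim 4 might look, assuming a PyTorch-style model, optimizer, and data loader; the stopping threshold and epoch cap are assumptions:

```python
import torch

def train(model, loader, optimizer, loss_fn, threshold=0.01, max_epochs=100):
    for _ in range(max_epochs):
        for batch in loader:
            preds = model(batch)        # semantic prediction results for a batch of sets
            loss = loss_fn(preds)       # target loss value from the target loss function
            optimizer.zero_grad()
            loss.backward()             # adjust parameters according to the loss
            optimizer.step()
        if loss.item() < threshold:     # preset requirement met: terminate training
            break
    return model                        # the target semantic recognition model
```

The second shows the search step of claim 8 as a cosine-similarity nearest-neighbour lookup over candidate sentences pre-encoded by the trained model; the similarity choice and all names are likewise assumptions:

```python
import torch
import torch.nn.functional as F

def search_target_sentence(query_vec, corpus_vecs, corpus_sentences):
    # query_vec: (D,) semantic vector of the sentence to be matched;
    # corpus_vecs: (N, D) semantic vectors of the candidate sentences.
    sims = F.cosine_similarity(query_vec.unsqueeze(0), corpus_vecs, dim=-1)  # (N,)
    best = int(torch.argmax(sims))
    return corpus_sentences[best], float(sims[best])
```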
Application: CN202210469400.1A · Priority Date: 2022-04-28 · Filing Date: 2022-04-28 · Title: Method, device and medium for training semantic recognition model and searching sentence · Status: Pending · Publication: CN114818729A (en)

Priority Applications (1)

Application Number: CN202210469400.1A · Priority Date: 2022-04-28 · Filing Date: 2022-04-28 · Title: Method, device and medium for training semantic recognition model and searching sentence


Publications (1)

Publication Number: CN114818729A · Publication Date: 2022-07-29

Family ID: 82510216

Family Applications (1)

Application Number: CN202210469400.1A · Status: Pending · Publication: CN114818729A (en) · Title: Method, device and medium for training semantic recognition model and searching sentence

Country Status (1)

Country: CN · Publication: CN114818729A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115048924A (en) * 2022-08-15 2022-09-13 Soochow University Negative sentence identification method based on negative prefix and suffix information
CN116150380A (en) * 2023-04-18 2023-05-23 Zhejiang Lab Text matching method, device, storage medium and equipment
CN117009532A (en) * 2023-09-21 2023-11-07 Tencent Technology (Shenzhen) Co., Ltd. Semantic type recognition method and device, computer readable medium and electronic equipment
CN117009532B (en) * 2023-09-21 2023-12-19 Tencent Technology (Shenzhen) Co., Ltd. Semantic type recognition method and device, computer readable medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN112528672B (en) Aspect-level emotion analysis method and device based on graph convolution neural network
CN108829822B (en) Media content recommendation method and device, storage medium and electronic device
CN108846077B (en) Semantic matching method, device, medium and electronic equipment for question and answer text
CN111695352A (en) Grading method and device based on semantic analysis, terminal equipment and storage medium
CN114818729A (en) Method, device and medium for training semantic recognition model and searching sentence
CN111539197A (en) Text matching method and device, computer system and readable storage medium
CN111694940A (en) User report generation method and terminal equipment
CN109840255B (en) Reply text generation method, device, equipment and storage medium
CN111739520B (en) Speech recognition model training method, speech recognition method and device
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN112256822A (en) Text search method and device, computer equipment and storage medium
CN112650842A (en) Human-computer interaction based customer service robot intention recognition method and related equipment
CN113204618A (en) Information identification method, device and equipment based on semantic enhancement and storage medium
US20220300708A1 (en) Method and device for presenting prompt information and storage medium
CN110275953B (en) Personality classification method and apparatus
CN110852071A (en) Knowledge point detection method, device, equipment and readable storage medium
CN113220996B (en) Scientific and technological service recommendation method, device, equipment and storage medium based on knowledge graph
CN113705207A (en) Grammar error recognition method and device
CN112527967A (en) Text matching method, device, terminal and storage medium
CN112632956A (en) Text matching method, device, terminal and storage medium
CN112199500A (en) Emotional tendency identification method and device for comments and electronic equipment
CN111898363A (en) Method and device for compressing long and difficult sentences of text, computer equipment and storage medium
CN113722477B (en) Internet citizen emotion recognition method and system based on multitask learning and electronic equipment
CN111401069A (en) Intention recognition method and intention recognition device for conversation text and terminal
Afanasieva et al. Application of Neural Networks to Identify of Fake News.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination