CN114723008A - Language representation model training method, device, equipment, medium and user response method - Google Patents

Language representation model training method, device, equipment, medium and user response method

Info

Publication number
CN114723008A
CN114723008A
Authority
CN
China
Prior art keywords
vector
model
characterization
sentence
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210347404.2A
Other languages
Chinese (zh)
Inventor
侯盼盼
黄明星
王福钋
张航飞
徐华韫
曹富康
沈鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Absolute Health Ltd
Original Assignee
Beijing Absolute Health Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Absolute Health Ltd filed Critical Beijing Absolute Health Ltd
Priority to CN202210347404.2A priority Critical patent/CN114723008A/en
Publication of CN114723008A publication Critical patent/CN114723008A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The embodiment of the application provides a training method, device, equipment, medium and user response method for a language characterization model. In the training method, a sample set is obtained, wherein each sample comprises a first statement, a second statement and a similar label; the first statement is input into a first base model to obtain a first characterization vector; the second statement is input into a second base model to obtain a second characterization vector; a distance between the first characterization vector and the second characterization vector is calculated to obtain a distance vector; the first characterization vector, the second characterization vector and the distance vector are spliced, and the spliced vector is input into a classification layer to obtain a prediction result; a difference value between the prediction result and the similar label is calculated, and parameters of the first base model and the second base model are adjusted according to the difference value, so as to train a language characterization model that can generate vectors with stronger characterization capability and reflect the semantics of a sentence more accurately.

Description

Language representation model training method, device, equipment, medium and user response method
[ technical field ]
The embodiment of the application relates to the technical field of computers, in particular to a method, a device, equipment, a medium and a user response method for training a language representation model.
[ background of the invention ]
Sentence vector representation learning occupies an important position in the Natural Language Processing (NLP) field, and the success of many NLP applications that must reply accurately, such as Frequently Asked Questions (FAQ) answering, depends on training high-quality sentence representation vectors. For example, for tasks such as semantic text matching (Semantic Text Similarity) and dense text retrieval (Dense Text Retrieval), the model needs to determine the matching score by calculating the similarity of the encoded embeddings (Embedding vectors) of two sentences in the representation space, so as to measure the semantic relevance of the two sentences. The quality of the sentence representation vector therefore directly determines the matching accuracy and efficiency of the task.
However, the sentence vectors derived from an encoder such as the BERT model (without fine-tuning, by averaging all word vectors) are of low quality, sometimes not even comparable to GloVe results, so they can hardly reflect the semantic similarity of two sentences or serve the downstream semantic matching task. Although such an encoder can achieve good performance on many NLP tasks after supervised fine-tuning (Fine-tune), the supervised corpus required for fine-tuning is expensive.
Therefore, a training method is needed that fine-tunes the model using only a small amount of unlabeled text from the downstream task, while enabling the fine-tuned model to generate vectors with stronger characterization capability, so that the model is better suited to the downstream task.
[ summary of the invention ]
The embodiment of the application provides a training method, device, equipment, medium and user response method for a language representation model, so that a language representation model can be trained that generates vectors with stronger characterization capability and reflects the semantics of sentences more accurately.
In a first aspect, an embodiment of the present application provides a method for training a language representation model, including: obtaining a sample set, wherein each sample comprises a first statement, a second statement and a similar label; inputting the first statement into a first base model to obtain a first characterization vector; inputting the second statement into a second base model to obtain a second characterization vector, wherein the second base model is obtained by copying the first base model; calculating a distance between the first characterization vector and the second characterization vector to obtain a distance vector; splicing the first characterization vector, the second characterization vector and the distance vector, inputting the spliced vector into a classification layer, and obtaining a prediction result; and calculating a difference value between the prediction result and the similar label, and adjusting parameters of the first base model and the second base model according to the difference value.
With this training method for the language representation model, the obtained model, by adopting a contrastive learning training scheme, can effectively resolve the interference of high-frequency words with sentence semantic representation: after training, the sentence representations generated by the model are no longer dominated by high-frequency words, and removing the top high-frequency words causes no obvious change in performance. This is because the contrastive learning objective of "distinguishing a sample from others" can naturally identify and suppress such high-frequency features, thereby preventing sentences with large semantic differences from being represented too closely (i.e., the collapse phenomenon). In the training method, changing the means of judging sample similarity enhances the fault tolerance of the model to the data, so that the finally learned vectors reflect purer and more accurate "semantics".
In one possible implementation manner, at least two sentences are obtained, data enhancement is performed on each sentence, and an enhanced data set corresponding to each sentence is obtained, wherein each enhanced data set comprises at least two enhanced sentences with the same semantics; two enhanced sentences in the same enhanced data set are taken as positive samples, and two enhanced sentences from different enhanced data sets are taken as negative samples; a plurality of positive samples and a plurality of negative samples are collected to construct the sample set.
In a second aspect, an embodiment of the present application provides a user response method, including: acquiring all sentences in a knowledge base, and converting each sentence into a characterization vector by using a language characterization model; receiving a user query statement, and converting the user query statement into a query characterization vector by using the language characterization model; calculating the similarity between all the characterization vectors and the query characterization vector to find the most similar characterization vector; and taking the sentence corresponding to the most similar characterization vector as a response sentence to the user query statement; wherein the language characterization model is obtained by executing the training method of any one of claims 1 to 2.
In one possible implementation manner, the method further includes: segmenting the user query statement to obtain a plurality of words; inputting the words into the language characterization model to obtain a plurality of word characterization vectors; calculating the similarity between each word characterization vector and the query characterization vector to find the most similar word characterization vector; and taking the word corresponding to the most similar word characterization vector as the user intention.
In a third aspect, an embodiment of the present application provides a training apparatus for a language representation model, where the training apparatus is disposed in a terminal device and includes: an acquisition module, configured to acquire a sample set, wherein each sample comprises a first statement, a second statement and a similar label; a first base model, configured to convert the first statement into a first characterization vector; a second base model, configured to convert the second statement into a second characterization vector, wherein the second base model is obtained by copying the first base model; a calculation module, configured to calculate a distance between the first characterization vector and the second characterization vector to obtain a distance vector; a classification layer, configured to splice the first characterization vector, the second characterization vector and the distance vector and generate a prediction result from the spliced vector; and an adjustment module, configured to calculate a difference value between the prediction result and the similar label and adjust the parameters of the first base model and the second base model according to the difference value.
In one possible implementation manner, the training apparatus further includes: an enhancement module, configured to acquire at least two sentences, perform data enhancement on each sentence, and acquire an enhanced data set corresponding to each sentence, wherein each enhanced data set comprises at least two enhanced sentences with the same semantics; a sample generation module, configured to take two enhanced sentences in the same enhanced data set as positive samples and two enhanced sentences from different enhanced data sets as negative samples; and a sample set construction module, configured to collect a plurality of positive samples and a plurality of negative samples and construct the sample set.
In a fourth aspect, an embodiment of the present application provides a user response apparatus, disposed in a terminal device, including: a first conversion module, configured to acquire all sentences in a knowledge base and convert each sentence into a characterization vector by using a language characterization model; a second conversion module, configured to receive a user query statement and convert the user query statement into a query characterization vector by using the language characterization model; a search module, configured to calculate the similarity between all the characterization vectors and the query characterization vector to find the most similar characterization vector; and a response module, configured to take the sentence corresponding to the most similar characterization vector as a response sentence to the user query statement; wherein the language characterization model is obtained by executing the training method of any one of claims 1 to 2.
In one possible implementation manner, the user response apparatus further includes: a word segmentation module, configured to segment the user query statement into a plurality of words; a third conversion module, configured to input the words into the language characterization model to obtain a plurality of word characterization vectors; a second search module, configured to calculate the similarity between each word characterization vector and the query characterization vector to find the most similar word characterization vector; and a user intention determining module, configured to take the word corresponding to the most similar word characterization vector as the user intention.
In a fifth aspect, an embodiment of the present application provides a terminal device, including: at least one processor; and at least one memory communicatively coupled to the processor, wherein: the memory stores program instructions executable by the processor, the processor calling the program instructions to be able to perform the method provided by the first aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions, which cause the computer to execute the method provided in the first aspect.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of this specification, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a flow chart of a method for training a language characterization model according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for generating a sample set according to an embodiment of the present application;
FIG. 3 is a flowchart of a user response method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a training apparatus for a language characterization model according to an embodiment of the present application;
fig. 5 is a schematic diagram of a user response device according to another embodiment of the present application.
[ detailed description of the embodiments ]
For better understanding of the technical solutions in the present specification, the following detailed description of the embodiments of the present application is provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only a few embodiments of the present specification, and not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step are within the scope of the present specification.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the specification. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The Transformer is a model proposed by Google in 2017. It is a seq2seq (sequence-to-sequence) model that makes extensive use of the self-attention mechanism, and it mainly consists of two parts: an encoder and a decoder.
The BERT model, an encoder-based model published by Google in 2018, pre-trains deep bidirectional representations by jointly conditioning on left and right context in all layers.
The BERT model integrates the advantages of many natural language processing models and performs well on many natural language processing tasks. In the related art, the model input vector of the BERT model is the sum of a word vector (Token Embedding), a position vector (Position Embedding) and a sentence vector (Segment Embedding). The word vector is the vectorized representation of a word, the position vector represents the position of the word in the text, and the sentence vector is the vectorized representation of the sentence the word belongs to.
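As a minimal illustration only (not the actual BERT implementation; the dimensions and token ids below are assumptions made for the example), the three embeddings are summed element-wise to form the model input vector:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the real BERT configuration may differ.
vocab_size, max_len, num_segments, hidden = 30522, 512, 2, 768

token_emb = nn.Embedding(vocab_size, hidden)      # word vector (Token Embedding)
position_emb = nn.Embedding(max_len, hidden)      # position vector (Position Embedding)
segment_emb = nn.Embedding(num_segments, hidden)  # sentence vector (Segment Embedding)

token_ids = torch.tensor([[101, 2769, 3221, 102]])           # one tokenized sentence (ids assumed)
position_ids = torch.arange(token_ids.size(1)).unsqueeze(0)  # positions 0, 1, 2, ...
segment_ids = torch.zeros_like(token_ids)                    # all tokens belong to sentence A

# The model input vector is the element-wise sum of the three embedding vectors.
input_vectors = token_emb(token_ids) + position_emb(position_ids) + segment_emb(segment_ids)
print(input_vectors.shape)  # torch.Size([1, 4, 768])
```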
Pre-training (pre-train): a process of training a neural network model on a large data set so that it learns the common features of the data. The purpose of pre-training is to provide good initial model parameters for subsequent training of the neural network model on a specific data set. Pre-training in the embodiments of the present application refers to the process of training a neural network with unlabeled training text to obtain a base model such as BERT.
Fine-tuning (fine-tune): a process of further training a pre-trained neural network model on a specific data set. In general, the amount of data used in the fine-tuning stage is smaller than that used in the pre-training stage, and the fine-tuning stage adopts supervised learning, that is, the training samples in the fine-tuning data set carry label information. The training, i.e., fine-tuning, stage in the embodiments of the present application refers to the process of training a base model, such as a BERT model, with training text containing classification labels.
To distinguish the various models, the model used for fine-tuning is called the base model, and the model obtained after the base model has been fine-tuned is called the language representation model.
In the prior art, the sentence vectors derived by the encoder can hardly reflect the semantic similarity of two sentences. Taking BERT as an example, the applicant further analyzed the characteristics of the sentence vectors derived by BERT during research and confirmed the following two points:
BERT tends to encode all sentences into a small region of the space, which results in most sentence pairs having high similarity scores, even sentence pairs that are semantically completely unrelated. This is referred to as the "collapse" (Collapse) phenomenon of the BERT sentence representation.
The collapse of the BERT sentence vector representation is related to the high-frequency words in the sentence. Specifically, when sentence vectors are calculated by averaging the word vectors, the word vectors of high-frequency words dominate the sentence vector, making it difficult for the sentence vector to embody the original semantics. When several top high-frequency words are removed while calculating the sentence vector, the collapse phenomenon can be alleviated to a certain extent.
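A small sketch of this observation, assuming per-token encoder outputs and a list of corpus high-frequency tokens are already available (both are placeholders introduced only for the example):

```python
import torch

def sentence_vector(token_vectors, tokens, high_freq_tokens=(), remove_high_freq=False):
    """Mean-pool token vectors into a sentence vector.

    token_vectors: (seq_len, hidden) tensor of encoder output vectors.
    tokens: the token strings aligned with token_vectors.
    high_freq_tokens: corpus-level high-frequency tokens (assumed to be known).
    """
    if remove_high_freq:
        keep = [i for i, tok in enumerate(tokens) if tok not in high_freq_tokens]
        if keep:  # avoid an empty average if every token is high-frequency
            token_vectors = token_vectors[keep]  # drop high-frequency tokens before averaging
    return token_vectors.mean(dim=0)             # averaged sentence vector
```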
Based on the above problems, embodiments of the present application provide a method for training a language representation model, so as to train a language representation model, where the language representation model can generate a vector with stronger representation capability, and reflect more accurate semantics of a sentence.
Fig. 1 is a flowchart of a method for training a language representation model according to an embodiment of the present application, and as shown in fig. 1, the method for training the language representation model may include:
step 101, a sample set is obtained, wherein each sample comprises a first statement, a second statement and a similar label.
A base model and a sample set for training the model are acquired, where the sample set includes a plurality of samples and each sample includes training data and label information. The training data are two sentences, namely the first sentence and the second sentence, and the label information indicates whether the two sentences in the training data are semantically similar. A pair of similar sentences are two sentences with the same semantics but different expression structures or wording. For example, sample 1 may be expressed as — training data: "recommend an insurance product to me" / "I want to buy insurance"; label information: similar.
The base model is the model used for training in the embodiment of the present application and may be, for example, BERT or SimBERT. SimBERT is a BERT-based model whose pre-training is supervised: a Seq2Seq part is built on the task of generating a similar sentence from a given sentence, and a retrieval task is trained at the same time by using the [CLS] vector as the representation of the input sentence, so that the model has both sentence generation capability and similar-sentence retrieval capability.
Step 102, inputting the first statement into a first base model to obtain a first characterization vector.
In order to give the model stronger representation capability for queries in the target domain, the idea of contrastive learning is introduced into the training process. Contrastive learning is one of the widely used self-supervised tasks at present, and its core idea is: humans distinguish objects by "contrast", so similar things should be close in the encoded representation space, and different things should be as far apart as possible.
Therefore, the training process of the present application uses two identical models to generate vectors separately for contrastive learning.
The base model is copied once to obtain two identical base models, one of which is referred to as a first base model and the other is referred to as a second base model in the present application.
One sentence in the sample is input into the first base model, and the characterization vector obtained by the first base model encoding that sentence is acquired.
Step 103, inputting the second statement into a second basic model to obtain a second characterization vector, wherein the second basic model is obtained by copying the first basic model.
The other sentence in the same sample is input into the second base model, and the characterization vector obtained by the second base model encoding that sentence is acquired.
And 104, calculating the distance between the first characterization vector and the second characterization vector to obtain a distance vector.
And 105, splicing the first characterization vector, the second characterization vector and the distance vector, inputting the spliced vectors into a classification layer, and obtaining a prediction result.
After the characterization vectors of the two sentences are obtained — for example, the two sentences sentence 1 and sentence 2 are encoded into embeddings by the first base model and the second base model respectively to obtain the vectors u and v — the distance between u and v is calculated, for example |u-v|, where |u-v| refers to the vector formed by taking the absolute value of each element of u-v. The distance between u and v is referred to as the distance vector in this application.
The three vectors u, v and |u-v| are then spliced as features, the spliced features are input into a classifier, and the classifier predicts from the input features whether the first sentence and the second sentence are semantically similar. The classifier may be a fully connected layer or the like, which is not limited in this application.
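For illustration, a minimal PyTorch-style sketch of steps 102 to 105 is given below. The Hugging Face transformers library, the model name "bert-base-chinese", the mean-pooling choice and the two-class classifier are assumptions made for the example, not requirements of this application:

```python
import copy
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SiameseCharacterizationModel(nn.Module):
    """Encode two sentences, compute the distance vector |u - v|,
    splice [u; v; |u - v|] and classify the pair as similar / not similar."""

    def __init__(self, model_name="bert-base-chinese", hidden=768):
        super().__init__()
        self.encoder_a = AutoModel.from_pretrained(model_name)  # first base model
        self.encoder_b = copy.deepcopy(self.encoder_a)          # second base model, copied from the first
        self.classifier = nn.Linear(3 * hidden, 2)               # classification layer over the spliced vector

    @staticmethod
    def _pool(outputs, attention_mask):
        # Mean pooling over valid tokens; one common way to obtain a sentence vector.
        mask = attention_mask.unsqueeze(-1).float()
        return (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

    def forward(self, batch_a, batch_b):
        u = self._pool(self.encoder_a(**batch_a), batch_a["attention_mask"])  # first characterization vector
        v = self._pool(self.encoder_b(**batch_b), batch_b["attention_mask"])  # second characterization vector
        dist = torch.abs(u - v)                      # distance vector |u - v|
        features = torch.cat([u, v, dist], dim=-1)   # spliced vector [u; v; |u - v|]
        return self.classifier(features)             # prediction result (logits)

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = SiameseCharacterizationModel()
batch_a = tokenizer(["帮我看下保单"], return_tensors="pt", padding=True)
batch_b = tokenizer(["我想退保"], return_tensors="pt", padding=True)
logits = model(batch_a, batch_b)  # shape (1, 2): scores for "not similar" / "similar"
```

In this sketch the second base model is obtained by copying the first base model with copy.deepcopy, mirroring step 103.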
In the prior art, assuming that the sentence vectors obtained after the two sentences pass through the encoder are u and v respectively, and since the cosine value cos(u, v) = <u, v> / (‖u‖ × ‖v‖) is generally used for ranking in the retrieval stage, the loss function is designed based on cos(u, v): the design principle is to make cos(u, v) of a positive sample pair as large as possible and cos(u, v) of a negative sample pair as small as possible.
This application observes that, for the similarity task, humans generally regard only strict similarity as similar, but the training data are usually not that precise. On the one hand, the labels themselves may be noisy; on the other hand, some sample pairs may be marked as positive by annotators because their topics (rather than their semantics) are the same. That is, annotation data are generally less strict than what humans actually require, and if the ranking metric is learned directly from the annotation results, it will instead pick up unintended bias.
Therefore, the present application does not adopt the prior-art scheme but uses u, v and |u-v| simultaneously. From an interpretability perspective, the classifier in the present application can be expressed as: s = <u, w1> + <v, w2> + <|u-v|, w3>, where w1, w2 and w3 denote parameter vectors in the classifier.
For the first two scoring terms <u, w1> + <v, w2>: a large value does not necessarily mean that u and v are close, and likewise a small value does not necessarily mean that u and v are far apart; these terms act more like a "topic classification" model that identifies whether the topics of u and v are consistent.
For the third term, since |u-v| = 0 holds exactly when u = v, this term is able to measure the degree of closeness of the two vectors, and it can therefore represent true "semantic similarity".
Taken together, the judgment made on the spliced vectors can be regarded as including both a score for whether the topics of the two sentences are consistent and a score for whether the semantics of the two sentences are similar. Separating topic from semantics in this way enhances the fault tolerance of the model to the data, so that the finally learned vectors can reflect relatively pure and accurate "semantics".
And 106, calculating a difference value between the prediction result and the similar label, and adjusting parameters of the first basic model and the second basic model according to the difference value.
The true label, i.e., the label information of the two sentences, is acquired; the difference value between the prediction result and the true label is calculated with a loss function such as the cross-entropy loss function, and the parameters of the first base model and the second base model are adjusted according to the difference value, thereby completing one training iteration of the model.
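Continuing the sketch above (and assuming cross-entropy as the loss function and Adam as the optimizer, neither of which is fixed by this application), one training iteration can be written as:

```python
import torch
import torch.nn as nn

# model, batch_a and batch_b come from the earlier sketch; labels holds the
# similar labels of the mini-batch (1 = similar, 0 = not similar).
criterion = nn.CrossEntropyLoss()                          # cross-entropy as the loss function
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)  # optimizer choice is an assumption

labels = torch.tensor([1])                 # similar label of the example pair above
logits = model(batch_a, batch_b)           # prediction result from the classification layer
loss = criterion(logits, labels)           # difference between prediction result and similar label
loss.backward()                            # gradients flow into both base models
optimizer.step()                           # adjust parameters of the first and second base model
optimizer.zero_grad()
```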
The above training process is performed for each sample in the sample set, and the trained first base model or second base model can then be used as the language representation model. The model obtained after training has strong language representation capability and a strong ability to recognize similar sentence pairs. Although cos(u, v) is not directly involved in training, cos(u, v) can still be used for retrieval at prediction time, and the performance in actual tests is quite good.
With this training method for the language representation model, the obtained model represents sentences more strongly and, in particular, discriminates the semantics of similar sentence pairs better. It can effectively resolve the interference of high-frequency words with sentence semantic representation: the sentence representations generated by the trained model are no longer dominated by high-frequency words, and removing the top high-frequency words causes no obvious change in performance, because the contrastive learning objective of "distinguishing a sample from others" can naturally identify and suppress such high-frequency features, thereby preventing sentences with large semantic differences from being represented too closely (i.e., the collapse phenomenon). In the training method, changing the means of judging sample similarity enhances the fault tolerance of the model to the data, so that the finally learned vectors reflect purer and more accurate "semantics".
Fig. 2 is a flowchart of a method for generating a sample set according to another embodiment of the present application. As shown in Fig. 2, on the basis of the embodiment shown in Fig. 1, the method further includes:
acquiring at least two sentences, performing data enhancement on each sentence, and acquiring an enhanced data set corresponding to each sentence, wherein each enhanced data set comprises at least two enhanced sentences with the same semantics;
taking two enhanced sentences in the same enhanced data set as positive samples, and two enhanced sentences from different enhanced data sets as negative samples;
a plurality of positive samples and a plurality of negative samples are collected, and the sample set is constructed.
In order to give the model a higher degree of discrimination for user queries, the idea of contrastive learning (Contrastive Learning) is used to achieve the above objective. To implement contrastive learning in this embodiment, before the sample set is obtained, the method further includes:
a plurality of sentences are acquired, and the sentences are collected by self, such as two sentences of 'helping me see a policy', and 'i want to quit the policy'.
Each sentence is taken as a sample, and several enhanced sentences of that sample are obtained by applying different data enhancement methods to the same sample. The data enhancement methods include back-translation, vocabulary replacement, random noise injection, and so on. For example, after data enhancement is performed on the sentence "help me check my policy", "check policy" and "how to check the policy" may be obtained; after data enhancement is performed on the sentence "I want to surrender my policy", "how to surrender the policy" and "I don't want this insurance any more" may be obtained.
A series of enhanced sentences derived from the same sentence are "similar to themselves": they are semantically similar but differ in wording or expression structure, so two enhanced texts of the same sentence can be taken as a positive sample. Meanwhile, enhanced texts of different sentences within the same batch are taken as negative samples.
A plurality of positive samples and a plurality of negative samples are collected to construct the sample set, and the sample set is used as the supervision signal to regularize the representation space of the base model.
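A small sketch of this sample-set construction is given below; the data enhancement itself (back-translation, vocabulary replacement, noise injection) is assumed to have been performed already, and the pairing strategy and the number of negatives per sentence are illustrative assumptions:

```python
import itertools
import random

def build_sample_set(enhanced_data_sets, negatives_per_sentence=2, seed=0):
    """enhanced_data_sets: a list of lists; each inner list is one enhanced data set,
    i.e. enhanced sentences that share the semantics of one original sentence.

    Returns (first sentence, second sentence, similar label) triples:
    label 1 -> two enhanced sentences from the same enhanced data set (positive sample),
    label 0 -> enhanced sentences from different enhanced data sets (negative sample).
    """
    rng = random.Random(seed)
    samples = []

    # Positive samples: any two enhanced sentences within the same enhanced data set.
    for group in enhanced_data_sets:
        for s1, s2 in itertools.combinations(group, 2):
            samples.append((s1, s2, 1))

    # Negative samples: pair each sentence with sentences from other enhanced data sets.
    for i, group in enumerate(enhanced_data_sets):
        others = [s for j, g in enumerate(enhanced_data_sets) if j != i for s in g]
        for s in group:
            for neg in rng.sample(others, min(negatives_per_sentence, len(others))):
                samples.append((s, neg, 0))

    rng.shuffle(samples)
    return samples

sample_set = build_sample_set([
    ["查询保单", "如何查询保单"],        # enhanced sentences of "帮我看下保单"
    ["怎么退保", "不想要这个保险了"],    # enhanced sentences of "我想退保"
])
```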
In this application, the model is trained with a contrastive learning method. Since contrastive learning is a self-supervised machine learning training method, the data themselves are used directly as supervision information to learn feature representations of the sample data, without manually labeled category information. This avoids dependence on supervised corpora: only a small amount of unlabeled text from the downstream task needs to be collected for fine-tuning, which can solve the collapse problem of BERT sentence vectors while making the BERT sentence vector representation more suitable for the downstream task.
Fig. 3 is a flowchart of a user response method according to still another embodiment of the present application, and as shown in fig. 3, the user response method may include:
step 301, all sentences in the knowledge base are obtained, and each sentence is converted into a characterization vector by using the language characterization model.
Knowledge refers to sentences pre-stored in the knowledge base, such as questions frequently asked by users, standard question sentences, and the like.
Each sentence in the knowledge base is obtained and converted into a characterization vector by using a language characterization model, and the language characterization model is obtained by adopting the language characterization model training method.
To speed up retrieval, all knowledge in the knowledge base can be converted into embedding form in advance and stored in an underlying database such as MySQL or Hive, with the content cached through Redis to facilitate subsequent reuse.
Step 302, receiving a user query statement, and converting the user query statement into a query characterization vector by using the language characterization model.
The query statement input by the user is received through a POST or GET request from the presentation layer, and the obtained query statement is converted into a query characterization vector by using the language characterization model.
Step 303, calculating the similarity between all the token vectors and the query token vector to find the most similar token vector.
The similarity calculation can use cosine similarity, and the process uses vector-matrix multiplication: the characterization vectors in the knowledge base are stacked into a matrix, and the query characterization vector is multiplied by this matrix to find the most similar sentence pair. The vector-matrix multiplication greatly accelerates the search.
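As an illustration of the vector-matrix multiplication described above (a NumPy sketch under the assumption that the knowledge-base characterization vectors have already been computed by the language characterization model):

```python
import numpy as np

def find_most_similar(query_vector, knowledge_matrix, knowledge_sentences):
    """query_vector: (hidden,) query characterization vector.
    knowledge_matrix: (num_sentences, hidden) matrix formed by stacking the
    characterization vectors of all knowledge-base sentences row by row.
    """
    # L2-normalize both sides so that one matrix-vector product yields cosine similarities.
    q = query_vector / np.linalg.norm(query_vector)
    m = knowledge_matrix / np.linalg.norm(knowledge_matrix, axis=1, keepdims=True)
    similarities = m @ q                       # one multiplication scores every knowledge-base sentence
    best = int(np.argmax(similarities))
    return knowledge_sentences[best], float(similarities[best])
```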
For scenarios where the knowledge base is very large, for example millions of entries, a coarse recall of the most similar sentences can be performed first and then a fine ranking, selecting the sentences that best fit the business scenario through means such as a Faiss or Vearch vector search library.
Step 304, taking the sentence corresponding to the most similar characterization vector as a response sentence of the user query sentence.
The similarities between the query characterization vector and all characterization vectors in the knowledge base are obtained through vector-matrix multiplication; the characterization vector with the highest similarity is selected, the sentence corresponding to that vector is obtained, and this sentence is used for the response.
The user response method of this embodiment can be applied to a question-answering customer service robot: the standard question most similar to the user query is found through similarity matching, and the answer corresponding to that standard question is returned. It can also be applied to lead-mining scenarios: target intention sentences are used as seed sentences, vector-encoded and stored as knowledge base content; in the customer service scenario, similarity is calculated between the questions frequently asked by users and all seed sentences, and queries with high confidence are selected and returned. Such a query can be regarded as a potential value lead and used for subsequent value conversion operations, thereby discovering potential value leads and empowering the business.
Preferably, the method further comprises:
segmenting the user query sentence to obtain a plurality of words;
inputting the words into the language characterization model to obtain a plurality of word characterization vectors;
calculating the similarity between each word characterization vector and the query characterization vector to find the most similar word characterization vector;
and taking the word corresponding to the most similar word characterization vector as the user intention.
The query can also be characterized as an embedding vector; the query sentence is then segmented into several terms, all terms are converted into embedding representations by the language characterization model, cosine similarity is calculated between the representation of each segmented term and the query characterization vector, and the term with the highest confidence is taken as the most important intention of the query.
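A sketch of this word-level intention step, assuming a Chinese word segmenter such as jieba and a hypothetical encode helper that wraps the language characterization model (both are assumptions for the example):

```python
import jieba
import numpy as np

def main_intent(query, encode):
    """encode: hypothetical helper mapping a string to its characterization vector
    (e.g. a wrapper around the trained language characterization model)."""
    query_vector = encode(query)                          # query characterization vector
    words = jieba.lcut(query)                             # segment the query into words
    word_vectors = np.stack([encode(w) for w in words])   # word characterization vectors

    q = query_vector / np.linalg.norm(query_vector)
    w = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
    scores = w @ q                                        # cosine similarity of each word to the whole query
    return words[int(np.argmax(scores))]                  # the word closest to the query is the main intention
```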
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 4 is a schematic structural diagram of a training apparatus for a language representation model according to an embodiment of the present invention, where the training apparatus is disposed in a terminal device, and as shown in fig. 4, the training apparatus may include: an acquisition module 41, a first base model 42, a second base model 43, a calculation module 44, a classification layer 45 and an adjustment module 46;
the acquiring module 41 is configured to acquire a sample set, where each sample includes a first statement, a second statement, and a similar tag;
a first base model 42 for converting the first sentence into a first token vector;
a second base model 43, configured to convert the second statement into a second token vector, wherein the second base model is obtained by copying the first base model;
a calculating module 44, configured to calculate a distance between the first token vector and the second token vector to obtain a distance vector;
the classification layer 45 is configured to splice the first characterization vector, the second characterization vector, and the distance vector, and generate a prediction result from a vector obtained by splicing;
an adjusting module 46, configured to calculate a difference between the prediction result and the similar label, and adjust parameters of the first base model and the second base model according to the difference.
Preferably, the training apparatus further comprises:
an enhancement module, configured to acquire at least two sentences, perform data enhancement on each sentence, and acquire an enhanced data set corresponding to each sentence, wherein each enhanced data set comprises at least two enhanced sentences with the same semantics;
the sample generation module is used for taking two enhancement statements in the same enhancement data set as positive samples and taking two enhancement statements in different enhancement data sets as negative samples;
and the sample set construction module is used for collecting a plurality of positive samples and a plurality of negative samples and constructing the sample set.
The training device of the language characterization model provided in the embodiment shown in fig. 4 can be used to execute the technical solution of the method embodiment shown in fig. 1 in this specification, and the implementation principle and the technical effect thereof can be further referred to the related description in the method embodiment.
Fig. 5 is a schematic structural diagram of a user response apparatus according to an embodiment of the present invention, where the user response apparatus is disposed in a terminal device, and as shown in fig. 5, the user response apparatus may include: a first conversion module 51, a second conversion module 52, a lookup module 53 and a response module 54,
the first conversion module 51 is configured to obtain all sentences in the knowledge base, and convert each sentence into a characterization vector by using a language characterization model;
a second conversion module 52, configured to receive a user query statement, and convert the user query statement into a query characterization vector by using a language characterization model;
a searching module 53, configured to calculate similarities between all the token vectors and the query token vector, so as to search for a most similar token vector;
a response module 54, configured to use a sentence corresponding to the most similar characterization vector as a response sentence of the user query sentence; wherein the language characterization model comprises: executing any one of the training methods of claims 1 to 2 to obtain the language representation model.
Preferably, the user response apparatus further comprises:
the word segmentation module is used for segmenting words of the user query sentence to obtain a plurality of words;
the third conversion module is used for inputting the words into the language representation model to obtain a plurality of word representation vectors;
the second searching module is used for calculating the similarity between each word token vector and the query token vector so as to query the most similar word token vector;
and the user intention determining module is used for taking the word corresponding to the most similar word token vector as the user intention.
The embodiment of the application provides a terminal device, which may include at least one processor; and at least one memory communicatively coupled to the processor, wherein: the memory stores program instructions executable by the processor, which when invoked by the processor is capable of performing the method embodiments described above.
The terminal device may be an intelligent electronic device such as a smart phone, a tablet computer, or a notebook computer, and the form of the terminal device is not limited in this embodiment.
Embodiments of the present application provide a computer-readable storage medium, which stores computer instructions, and the computer instructions cause the computer to execute the method embodiments shown in the present specification.
The computer-readable storage medium described above may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM) or flash memory, an optical fiber, a portable compact disc read only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present description may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the description of embodiments of the invention, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present specification, "a plurality" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present description in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present description.
The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It should be noted that the terminal referred to in the embodiments of the present application may include, but is not limited to, a Personal Computer (PC), a Personal Digital Assistant (PDA), a wireless handheld device, a tablet computer (tablet computer), a mobile phone, an MP3 player, an MP4 player, and the like.
In the several embodiments provided in this specification, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present description may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods described in the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (10)

1. A method for training a language characterization model, comprising:
obtaining a sample set, wherein each sample comprises a first statement, a second statement and a similar label;
inputting the first statement into a first basic model to obtain a first characterization vector;
inputting the second statement into a second basic model to obtain a second characterization vector, wherein the second basic model is obtained by copying the first basic model;
calculating a distance between the first token vector and the second token vector to obtain a distance vector;
splicing the first characterization vector, the second characterization vector and the distance vector, inputting the spliced vectors into a classification layer, and obtaining a prediction result;
and calculating a difference value between the prediction result and the similar label, and adjusting parameters of the first basic model and the second basic model according to the difference value.
2. The method of claim 1, further comprising:
acquiring at least two sentences, performing data enhancement on each sentence, and acquiring an enhanced data set corresponding to each sentence, wherein each enhanced data set at least comprises two enhanced sentences with the same semantic meaning;
taking two enhancement statements in the same enhancement data set as positive samples, and taking two enhancement statements in different enhancement data sets as negative samples;
a plurality of positive samples and a plurality of negative samples are collected, and the sample set is constructed.
3. A user response method, comprising:
acquiring all sentences in a knowledge base, and converting each sentence into a characterization vector by using a language characterization model;
receiving a user query statement, and converting the user query statement into a query characterization vector by using a language characterization model;
calculating the similarity of all the characterization vectors and the query characterization vector to search the most similar characterization vector;
taking the sentence corresponding to the most similar characterization vector as a response sentence of the user query sentence;
wherein the language characterization model is obtained by executing the training method of any one of claims 1 to 2.
4. The method of claim 3, further comprising:
segmenting the user query sentence to obtain a plurality of words;
inputting the words into the language representation model to obtain a plurality of word representation vectors;
calculating the similarity of each word characterization vector and the query characterization vector to query the most similar word characterization vector;
and taking the word corresponding to the most similar word token vector as the user intention.
5. A training apparatus for a language characterization model, disposed in a terminal device, the training apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a sample set, and each sample comprises a first statement, a second statement and similar labels;
a first base model for converting the first statement into a first token vector;
a second base model for converting the second statement into a second token vector, wherein the second base model is obtained by copying the first base model;
a calculation module for calculating a distance between the first token vector and the second token vector to obtain a distance vector;
the classification layer is used for splicing the first characterization vector, the second characterization vector and the distance vector, and generating a prediction result through the spliced vectors;
and the adjusting module is used for calculating a difference value between the prediction result and the similar label and adjusting the parameters of the first basic model and the second basic model according to the difference value.
6. The apparatus of claim 5, further comprising:
the system comprises an enhancement module, a semantic enhancement module and a semantic enhancement module, wherein the enhancement module is used for acquiring at least two sentences, performing data enhancement on each sentence and acquiring an enhancement data set corresponding to each sentence, and the enhancement data set at least comprises two enhancement sentences with the same semantic meaning;
the sample generation module is used for taking two enhancement statements in the same enhancement data set as positive samples and taking two enhancement statements in different enhancement data sets as negative samples;
and the sample set construction module is used for collecting a plurality of positive samples and a plurality of negative samples and constructing the sample set.
7. A user response apparatus provided in a terminal device, the user response apparatus comprising:
the first conversion module is used for acquiring all sentences in the knowledge base and converting each sentence into a characterization vector by using a language characterization model;
the second conversion module is used for receiving a user query statement and converting the user query statement into a query representation vector by using the language representation model;
the searching module is used for calculating the similarity of all the characterization vectors and the query characterization vector so as to search the most similar characterization vector;
a response module, configured to use a sentence corresponding to the most similar characterization vector as a response sentence of the user query sentence; wherein the language characterization model is obtained by executing the training method of any one of claims 1 to 2.
8. The apparatus of claim 7, further comprising:
the word segmentation module is used for segmenting words of the user query sentence to obtain a plurality of words;
the third conversion module is used for inputting the words into the language representation model to obtain a plurality of word representation vectors;
the second searching module is used for calculating the similarity between each word token vector and the query token vector so as to query the most similar word token vector;
and the user intention determining module is used for taking the word corresponding to the most similar word token vector as the user intention.
9. A terminal device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 4.
10. A computer readable storage medium storing computer instructions, the computer instructions causing the computer to perform the method of any of claims 1 to 4.
CN202210347404.2A 2022-04-01 2022-04-01 Language representation model training method, device, equipment, medium and user response method Pending CN114723008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210347404.2A CN114723008A (en) 2022-04-01 2022-04-01 Language representation model training method, device, equipment, medium and user response method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210347404.2A CN114723008A (en) 2022-04-01 2022-04-01 Language representation model training method, device, equipment, medium and user response method

Publications (1)

Publication Number Publication Date
CN114723008A true CN114723008A (en) 2022-07-08

Family

ID=82241981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210347404.2A Pending CN114723008A (en) 2022-04-01 2022-04-01 Language representation model training method, device, equipment, medium and user response method

Country Status (1)

Country Link
CN (1) CN114723008A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100102 201 / F, block C, 2 lizezhong 2nd Road, Chaoyang District, Beijing

Applicant after: Beijing Shuidi Technology Group Co.,Ltd.

Address before: 100102 201 / F, block C, 2 lizezhong 2nd Road, Chaoyang District, Beijing

Applicant before: Beijing Health Home Technology Co.,Ltd.
