CN111539222A - Training method and device for semantic similarity task model, electronic equipment and storage medium - Google Patents
Training method and device for semantic similarity task model, electronic equipment and storage medium
- Publication number
- CN111539222A (application CN202010431949.2A)
- Authority
- CN
- China
- Prior art keywords
- semantic similarity
- training data
- positive
- task model
- difference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F40/30: Handling natural language data; Semantic analysis (G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing)
- G06F40/20: Handling natural language data; Natural language analysis (G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing)
Abstract
The application discloses a training method and apparatus for a semantic similarity task model, an electronic device, and a storage medium, and relates to the technical field of artificial intelligence. The specific implementation scheme is as follows: when the semantic similarity task model is trained with each piece of training data, the score difference of positive and negative examples in the training data predicted by the semantic similarity task model is obtained; an adaptive difference loss function of the semantic similarity task model is constructed based on the score difference of positive and negative examples in the training data obtained by a pre-trained semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model; and the semantic similarity task model is trained based on the adaptive difference loss function. This training method enables the semantic similarity task model to learn the signals of the semantic similarity scoring model more easily, and thus to effectively learn the information of the semantic similarity scoring model.
Description
Technical Field
The application relates to computer technology, in particular to artificial intelligence technology, and specifically to a training method and apparatus for a semantic similarity task model, an electronic device, and a storage medium.
Background
In the semantic matching task in the Natural Language Processing (NLP) field, publicly available text-matching annotation data are currently very scarce, so in the prior art the training set is usually expanded by a data distillation technique. Data distillation is a common method of knowledge transfer: a model with a complex structure and better performance is used as a teacher model, and the teacher model predicts on an unlabeled data set to obtain a data set labeled by the teacher model. A student model with a relatively simple structure is then trained on the data set labeled by the teacher model, thereby distilling the teacher model's knowledge into the student model.
In the semantic matching task, the teacher model scores the similarity of two text segments between 0 and 1, and a threshold is set to screen positive examples (similar texts) and negative examples (dissimilar texts), so that weakly supervised training data can be constructed from a large amount of unlabeled corpora. When training the student model, different training paradigms focus on learning different objectives. For example, under the pointwise training paradigm, the student can directly learn the teacher model's positive/negative classification, i.e., simple binary classification. For another example, under the pairwise training paradigm, by fitting the difference (margin) between positive and negative example scores, the student learns that the teacher model's positive example score is greater than its negative example score.
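By way of illustration only, the thresholding step above might be sketched as follows; the `teacher_score` function and the threshold values are hypothetical placeholders rather than part of this application:

```python
from itertools import combinations

def teacher_score(text_a: str, text_b: str) -> float:
    """Hypothetical teacher stand-in: word-overlap Jaccard similarity in [0, 1].

    A real teacher model would be a trained neural similarity scorer.
    """
    a, b = set(text_a.split()), set(text_b.split())
    return len(a & b) / max(len(a | b), 1)

def build_weak_pairs(corpora, pos_threshold=0.8, neg_threshold=0.2):
    """Screen positive (similar) and negative (dissimilar) text pairs."""
    positives, negatives = [], []
    for text_a, text_b in combinations(corpora, 2):
        score = teacher_score(text_a, text_b)
        if score >= pos_threshold:
            positives.append((text_a, text_b, score))
        elif score <= neg_threshold:
            negatives.append((text_a, text_b, score))
    return positives, negatives
```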
However, currently advanced teacher models are usually trained under the pointwise paradigm, and their scores are extreme: positive example scores are often close to 1, negative example scores are often close to 0, and the margin is therefore large. Under the pairwise training paradigm it is thus difficult for the student model to directly fit the teacher model's margin, which hinders the transfer of the teacher model's signal. Consequently, the student model in the existing semantic similarity task cannot effectively learn the information of the teacher model.
Disclosure of Invention
In order to solve the technical problem, the application provides a training method and device of a semantic similarity task model, an electronic device and a storage medium.
According to a first aspect, a training method of a semantic similarity task model is provided, wherein the training method comprises the following steps:
when a semantic similarity task model is trained by adopting each piece of training data, acquiring the score difference of positive and negative examples in the training data predicted by the semantic similarity task model;
constructing an adaptive difference loss function of the semantic similarity task model based on the score difference of positive and negative examples in the training data obtained by a pre-trained semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model;
and training the semantic similarity task model based on the self-adaptive difference loss function.
According to a second aspect, there is provided a training apparatus for a semantic similarity task model, comprising:
the acquisition module is used for acquiring the score difference of positive and negative examples in the training data predicted by the semantic similarity task model when the semantic similarity task model is trained by adopting each piece of training data;
the building module is used for building an adaptive difference loss function of the semantic similarity task model based on the score difference of positive and negative examples in the training data obtained by a pre-trained semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model;
and the training module is used for training the semantic similarity task model based on the self-adaptive difference loss function.
According to a third aspect, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
According to the technology of the present application, the prior-art problem that student models cannot effectively learn the information of the teacher model in the semantic similarity task is solved: the semantic similarity task model can learn the signals of the semantic similarity scoring model more easily, knowledge transfer from the semantic similarity scoring model to the semantic similarity task model is better realized, and the semantic similarity task model can thus effectively learn the information of the semantic similarity scoring model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic illustration according to a third embodiment of the present application;
FIG. 4 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing a training method of a semantic similarity task model according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
FIG. 1 is a schematic diagram according to a first embodiment of the present application; as shown in fig. 1, this embodiment provides a training method for a semantic similarity task model, which specifically includes the following steps:
S101, when the semantic similarity task model is trained by adopting each piece of training data, acquiring the score difference of positive and negative examples in the training data predicted by the semantic similarity task model;
s102, constructing an adaptive difference loss function of the semantic similarity task model based on the score difference of positive and negative examples in training data obtained by a pre-trained semantic similarity scoring model and the score difference of positive and negative examples in training data predicted by the semantic similarity task model;
s103, training the semantic similarity task model based on the adaptive difference loss function.
The execution body of the training method of the semantic similarity task model in this embodiment is a training apparatus for the semantic similarity task model. The apparatus may be a physical electronic device, such as a computer, or may be implemented as integrated software that runs on a computer when used, so as to train the semantic similarity task model.
The training data of this embodiment is used to train the semantic similarity task model. Each piece of training data includes a positive example pair and a negative example pair, and the two pairs share a common corpus. For example, one piece of training data may include corpus A, corpus B, and corpus C, where corpus A and corpus B form the positive example, that is, two texts with similar semantics, and corpus A and corpus C form the negative example, that is, two texts with dissimilar semantics.
In this embodiment, the training data is used to train the semantic similarity task model, and, as described in step S102, the adaptive difference loss function of the semantic similarity task model is constructed based on the score difference of positive and negative examples in the training data obtained by the pre-trained semantic similarity scoring model. It follows that the semantic similarity task model is intended to learn the semantic similarity scoring model's scoring of the positive and negative examples in the training data. Therefore, the semantic similarity scoring model in this embodiment serves as the teacher model, and the semantic similarity task model serves as the student model. In practical applications, the student model learns from the teacher model: the student model has a relatively simple network structure, while the teacher model is more complex. For example, a teacher model may have a network structure of several hundred layers, while a student model may have only several tens of layers. Although the teacher model is the more capable model, it may be too heavy for a concrete task, making execution speed and efficiency unsatisfactory. A student model is therefore trained to acquire the teacher model's capability for the concrete task; because the student model has a simple structure, its execution speed and efficiency are more satisfactory.
When the semantic similarity task model is trained by adopting each piece of training data, a corresponding adaptive difference loss function is constructed for each piece of training data during training, and the semantic similarity task model is then trained based on that adaptive difference loss function. The loss function is called an adaptive difference loss function because it has a certain adaptivity: it is not simply equal to the difference between the score difference of positive and negative examples in the training data obtained by the pre-trained semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model. In addition, in this embodiment, the purpose of constructing the adaptive difference loss function is to enable the semantic similarity task model to learn the capability of the semantic similarity scoring model more easily, for example, to learn the semantic similarity scoring model's scoring of positive and negative examples in the training data, so as to better perform knowledge transfer.
In addition, it should be noted that the technical solution of this embodiment is implemented against the background of the data distillation technique. Specifically, the semantic similarity scoring model, i.e., the teacher model, scores a large number of training corpora; positive examples and negative examples are constructed based on the scoring results; a positive example and a negative example sharing a common corpus form one piece of training data; and a training data set comprising a plurality of pieces of training data is generated, where each piece of training data includes a positive example pair and a negative example pair. Then, using each piece of training data, together with the semantic similarity scoring model's scores for the positive and negative examples in that piece of training data, the semantic similarity task model, i.e., the student model, is trained according to the technical solution of this embodiment.
In this embodiment, when the semantic similarity task model is trained using each piece of training data in the training data set, each piece of training data is input into the semantic similarity task model, and the semantic similarity task model predicts the similarity score of the positive example and the similarity score of the negative example in that training data. The score difference of the positive and negative examples in the training data is then calculated from these two similarity scores. Similarly, obtaining the score difference of positive and negative examples in the training data based on the semantic similarity scoring model corresponds to: first acquiring the semantic similarity scoring model's similarity scores for the positive example and the negative example in the training data, and then calculating the score difference of the positive and negative examples in the training data from those two similarity scores.
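As a minimal illustration of this computation, assuming each piece of training data is a (corpus A, positive text, negative text) triple and `model` is a hypothetical callable that maps a text pair to a similarity score (not a real API):

```python
def score_difference(model, corpus_a, positive_text, negative_text):
    """Positive/negative score difference for one piece of training data.

    Calling this with the student model yields s(θ); calling it with the
    teacher (scoring) model yields t.
    """
    pos_score = model(corpus_a, positive_text)  # similarity score of the positive example
    neg_score = model(corpus_a, negative_text)  # similarity score of the negative example
    return pos_score - neg_score
```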
Alternatively, in this embodiment, based on the above principle, the semantic similarity task model and the semantic similarity scoring model may directly output the score difference corresponding to the positive and negative examples of the training data.
Then, based on the training data, according to the positive and negative score difference obtained by the semantic similarity scoring model and the positive and negative score difference predicted by the semantic similarity task model, an adaptive difference loss function of the semantic similarity task model is constructed, and finally, based on the adaptive difference loss function, the semantic similarity task model is trained, so that the semantic similarity task model can learn the signals of the semantic similarity scoring model more easily.
The training method of the semantic similarity task model in this embodiment trains under the pairwise training paradigm, so that the semantic similarity task model learns to score the similarity of a pair of corpora accurately, rather than merely recognizing whether the corpora are similar or dissimilar. The semantic similarity task model trained in this embodiment can learn the relative relationship between positive and negative examples, is more suitable for matching and ranking, and has a wider application range. For example, the trained semantic similarity task model can be applied to scenarios involving semantic similarity processing, such as an intelligent customer service system: when a user's question is received, its similarity to each question in a pre-configured question-answer information base is analyzed, the question with the maximum similarity is obtained, and the corresponding reply is retrieved based on that question and fed back to the user.
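As an illustration of such a question-matching scenario, a trained task model could be used along the following lines; `student_model` and `faq_questions` are hypothetical names, not part of this application:

```python
def best_match(student_model, user_question, faq_questions):
    """Return the configured question most similar to the user's question."""
    # Score the user's question against every configured question and keep
    # the one with the maximum similarity; the reply mapped to that question
    # would then be fed back to the user.
    return max(faq_questions, key=lambda q: student_model(user_question, q))
```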
According to the training method of the semantic similarity task model above, when the semantic similarity task model is trained by adopting each piece of training data, the score difference of positive and negative examples in the training data predicted by the semantic similarity task model is obtained; an adaptive difference loss function of the semantic similarity task model is constructed based on the score difference of positive and negative examples in the training data obtained by the pre-trained semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model; and the semantic similarity task model is trained based on the adaptive difference loss function. In this way, the semantic similarity task model can learn the signals of the semantic similarity scoring model more easily, knowledge transfer from the semantic similarity scoring model to the semantic similarity task model is better realized, and the semantic similarity task model can effectively learn the information of the semantic similarity scoring model.
FIG. 2 is a schematic diagram according to a second embodiment of the present application; the training method of the semantic similarity task model of this embodiment further describes the technical solution of the present application in more detail on the basis of the technical solution of the embodiment shown in fig. 1. As shown in fig. 2, the training method of the semantic similarity task model of this embodiment may specifically include the following steps:
s201, acquiring a piece of training data from a training data set;
referring to the embodiment shown in fig. 1, the training data set is generated after scoring based on the semantic similarity scoring model. The semantic similarity task model serves as a student model, and the semantic similarity scoring model serves as a teacher model.
S202, inputting the training data into a semantic similarity task model, and acquiring positive case scores and negative case scores in the training data output by the semantic similarity task model;
S203, acquiring the score difference of positive and negative examples in the training data based on the positive example scores and the negative example scores output by the semantic similarity task model;
s204, obtaining the score difference of positive and negative examples in the training data obtained based on a pre-trained semantic similarity scoring model;
Since the training data set of this embodiment is generated after scoring by the semantic similarity scoring model, the semantic similarity scoring model's score for the positive example and its score for the negative example in each piece of training data can be obtained. The difference between the positive example score and the negative example score is then taken as the score difference of positive and negative examples in the training data obtained based on the semantic similarity scoring model.
S205, obtaining the adaptive positive and negative example score difference of the training data based on the score difference of positive and negative examples in the training data obtained by the semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model;
for example, the adaptive positive and negative example fractional value difference of the training data may be obtained by the following formula:
t_dynamic = t * sigmoid(α * (s(θ) - t))    (1)
where t_dynamic is the adaptive positive and negative example score difference of the training data, t is the score difference of positive and negative examples in the training data obtained based on the semantic similarity scoring model, s(θ) is the score difference of positive and negative examples in the training data predicted by the semantic similarity task model, α is a scaling factor, and sigmoid() is the activation function. The scaling factor α can be selected empirically; its value must be greater than 1, and values between 10 and 20 are empirically optimal.
Further, t can be expressed as t = pos_t - neg_t, where pos_t is the semantic similarity scoring model's score for the positive example in the training data, and neg_t is the semantic similarity scoring model's score for the negative example in the training data.
Further, s(θ) can be expressed as s(θ) = pos(θ) - neg(θ), where pos(θ) is the score of the positive example in the training data predicted by the semantic similarity task model, and neg(θ) is the score of the negative example in the training data predicted by the semantic similarity task model.
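As an illustrative sketch only, formula (1) could be written in code as follows; the choice alpha = 15 is an assumption within the 10-20 range suggested above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_margin(t, s_theta, alpha=15.0):
    """Formula (1): t_dynamic = t * sigmoid(alpha * (s(θ) - t)).

    alpha must be greater than 1; values of 10-20 are described as
    empirically optimal, so alpha = 15 is an assumed middle value.
    """
    return t * sigmoid(alpha * (s_theta - t))
```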
S206, constructing an adaptive difference loss function based on the adaptive positive and negative example score difference of the training data and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model;
for example, the adaptive difference loss function may be constructed by the following formula:
L(θ) = -log(sigmoid(σ * (s(θ) - t_dynamic)))    (2)
where L(θ) is the constructed adaptive difference loss function, t_dynamic is the adaptive positive and negative example score difference of the training data, s(θ) is the score difference of positive and negative examples in the training data predicted by the semantic similarity task model, σ is an adjustment parameter, and sigmoid() is the activation function. The adjustment parameter σ may also be chosen empirically, for example a value of around 5.
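Continuing the sketch above (and reusing its `np`, `sigmoid`, and `adaptive_margin` definitions), formula (2) might be implemented as follows; sigma = 5 follows the example value just mentioned:

```python
def adaptive_difference_loss(t, s_theta, alpha=15.0, sigma=5.0):
    """Formula (2): L(θ) = -log(sigmoid(sigma * (s(θ) - t_dynamic)))."""
    t_dynamic = adaptive_margin(t, s_theta, alpha)  # formula (1), sketched above
    return -np.log(sigmoid(sigma * (s_theta - t_dynamic)))
```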
Steps S205-S206 are a specific implementation manner of step S102 in the embodiment shown in fig. 1.
S207, judging whether the adaptive difference loss function has converged; if not, proceeding to step S208; if yes, proceeding to step S209;
S208, adjusting the parameters of the semantic similarity task model so that the adaptive difference loss function tends to converge, then returning to step S201 to continue acquiring training data for training;
S209, judging whether the adaptive difference loss function has remained converged over a preset number of consecutive training rounds, or whether the number of training rounds has reached a preset threshold; if so, finishing the training, determining the parameters of the semantic similarity task model, and thereby determining the semantic similarity task model; otherwise, returning to step S201 to continue acquiring training data for training.
The preset number of consecutive rounds may be selected according to practical experience, such as 50 consecutive rounds, 80 consecutive rounds, or another number. The preset threshold on the number of training rounds may also be set according to actual requirements, such as hundreds of thousands or millions, and is not limited here.
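One possible arrangement of steps S207 to S209 as a training loop is sketched below; `dataset`, `train_step`, and `has_converged` are illustrative names, and the patience and round-threshold defaults follow the examples above:

```python
def train(dataset, train_step, has_converged, patience=50, max_rounds=1_000_000):
    """Outer loop with the convergence checks of steps S207-S209 (a sketch)."""
    consecutive = 0
    for rounds, sample in enumerate(dataset, start=1):
        loss = train_step(sample)        # S208: forward pass + parameter update
        if has_converged(loss):          # S207: convergence check
            consecutive += 1
            if consecutive >= patience:  # S209: converged for N consecutive rounds
                break
        else:
            consecutive = 0              # streak resets on a non-converged round
        if rounds >= max_rounds:         # S209: preset round threshold reached
            break
```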
In this design, the difference between s(θ) and t is first computed via formula (1) and passed through the sigmoid activation function to obtain a weight between 0 and 1. If s(θ) < t, the weight is less than 0.5; otherwise, the weight is greater than 0.5. The scaling factor α biases the weight toward 0 or 1, avoiding, as far as possible, a distribution concentrated around 0.5. Multiplying t by this weight yields the adaptive difference t_dynamic. Finally, the cross-entropy loss of formula (2) pushes s(θ) to be as large as possible relative to t_dynamic.
It follows that in the early stage of training the student model, i.e., the semantic similarity task model, the model generally performs poorly and often cannot fit t; at this time t_dynamic also tends toward 0, and the student model only needs to learn that the positive example score is greater than the negative example score. In the later stage of training, as the model's performance improves, the difference between t and s(θ) gradually shrinks and t_dynamic grows, so the learning difficulty increases, the model distinguishes positive and negative examples better, and the model's effect improves further.
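Plugging illustrative numbers into the `adaptive_margin` sketch above shows this behavior (with the assumed α = 15):

```python
# Early training: the student margin is far below the teacher margin,
# so the adaptive target collapses toward 0 (an easy objective).
print(adaptive_margin(t=0.95, s_theta=0.05))  # about 1e-06

# Late training: the student margin approaches the teacher margin,
# so the adaptive target grows and the objective becomes harder.
print(adaptive_margin(t=0.95, s_theta=0.90))  # about 0.30
```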
By adopting the above technical solution, the training method of the semantic similarity task model, through data distillation based on the adaptive difference loss function, enables the student model, i.e., the semantic similarity task model, to dynamically adjust the teacher signal to be learned from the semantic similarity scoring model, i.e., the teacher model: the learning difficulty is reduced in the early stage and gradually increased in the later stage according to how well the model has learned, so that the teacher model's signal is better adapted to and knowledge transfer is performed better. On this basis, the technical solution of this embodiment enables the semantic similarity task model to learn the signals of the semantic similarity scoring model more easily and better realizes knowledge transfer from the semantic similarity scoring model to the semantic similarity task model, so that the semantic similarity task model can effectively learn the information of the semantic similarity scoring model.
FIG. 3 is a schematic illustration according to a third embodiment of the present application; as shown in fig. 3, the present embodiment provides a training apparatus 300 for a semantic similarity task model, which includes:
an obtaining module 301, configured to obtain a score difference between positive and negative examples in training data predicted by a semantic similarity task model when the semantic similarity task model is trained using each piece of training data;
the building module 302 is configured to build an adaptive difference loss function of the semantic similarity task model based on a score difference of positive and negative examples in training data obtained by a pre-trained semantic similarity scoring model and a score difference of positive and negative examples in training data predicted by the semantic similarity task model;
and the training module 303 is configured to train the semantic similarity task model based on the adaptive difference loss function.
The implementation principle and technical effect of the training of the semantic similarity task model implemented by the training device 300 of the semantic similarity task model in this embodiment are the same as the implementation of the related method embodiment, and reference may be made to the description of the related method embodiment in detail, which is not described herein again.
FIG. 4 is a schematic illustration according to a fourth embodiment of the present application; as shown in fig. 4, the training apparatus 300 for semantic similarity task model according to this embodiment further describes the technical solution of this application in more detail based on the technical solution of the embodiment shown in fig. 3.
As shown in fig. 4, in the training apparatus 300 for semantic similarity task model of this embodiment, the building module 302 includes:
an obtaining unit 3021, configured to obtain adaptive positive and negative example score differences of training data based on the positive and negative example score differences of the training data obtained by the semantic similarity scoring model and the positive and negative example score differences of the training data predicted by the semantic similarity task model;
the constructing unit 3022 is configured to construct an adaptive difference loss function based on the adaptive positive and negative score difference of the training data and the positive and negative score difference of the training data predicted by the semantic similarity task model.
Further optionally, an obtaining unit 3021, configured to:
based on the positive and negative example score difference in the training data obtained by the semantic similarity scoring model and the positive and negative example score difference in the training data predicted by the semantic similarity task model, the self-adaptive positive and negative example score difference of the training data is obtained by the following formula:
t_dynamic=t*sigmoid(α*(s(θ)-t))
where t_dynamic is the adaptive positive and negative example score difference of the training data, t is the score difference of positive and negative examples in the training data obtained based on the semantic similarity scoring model, s(θ) is the score difference of positive and negative examples in the training data predicted by the semantic similarity task model, α is a scaling factor, and sigmoid() is an activation function.
Further optionally, a building unit 3022 configured to:
based on the adaptive positive and negative example score difference of the training data and the score difference of the positive and negative examples in the training data predicted by the semantic similarity task model, an adaptive difference loss function is constructed by the following formula:
L(θ)=-log(sigmoid(σ*(s(θ)-t_dynamic)))
where L(θ) is the constructed adaptive difference loss function, t_dynamic is the adaptive positive and negative example score difference of the training data, s(θ) is the score difference of positive and negative examples in the training data predicted by the semantic similarity task model, σ is an adjustment parameter, and sigmoid() is an activation function.
Further optionally, in the training apparatus 300 for a semantic similarity task model according to this embodiment, the training module 303 is configured to:
judging whether the adaptive difference loss function is converged;
if not, adjusting parameters of the semantic similarity task model to enable the adaptive difference loss function to tend to converge.
The implementation principle and technical effect of the training of the semantic similarity task model implemented by the training device 300 of the semantic similarity task model in this embodiment are the same as the implementation of the related method embodiment, and reference may be made to the description of the related method embodiment in detail, which is not described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device implementing the training method of the semantic similarity task model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 501, a memory 502, and interfaces for connecting the various components, including high-speed and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
The memory 502, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the relevant modules shown in fig. 3 and 4) corresponding to the training method of the semantic similarity task model in the embodiments of the present application. The processor 501 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 502, namely, implementing the training method of the semantic similarity task model in the above method embodiments.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of an electronic device that implements a training method of the semantic similarity task model, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 502 optionally includes memory located remotely from processor 501, and these remote memories may be connected over a network to an electronic device that implements the training method of the semantic similarity task model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the training method of the semantic similarity task model may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of an electronic apparatus implementing the training method of the semantic similarity task model, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiment of the application, when the semantic similarity task model is trained by adopting each piece of training data, the score difference of positive and negative examples in the training data predicted by the semantic similarity task model is obtained; an adaptive difference loss function of the semantic similarity task model is constructed based on the score difference of positive and negative examples in the training data obtained by the pre-trained semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model; and the semantic similarity task model is trained based on the adaptive difference loss function. In this way, the semantic similarity task model can learn the signals of the semantic similarity scoring model more easily, knowledge transfer from the semantic similarity scoring model to the semantic similarity task model is better realized, and the semantic similarity task model can effectively learn the information of the semantic similarity scoring model.
According to the technical solution of the embodiment of the application, data distillation based on the adaptive difference loss function enables the student model, i.e., the semantic similarity task model, to dynamically adjust the teacher signal to be learned from the semantic similarity scoring model, i.e., the teacher model: the learning difficulty is reduced in the early stage and gradually increased in the later stage according to how well the model has learned, so that the teacher model's signal is better adapted to and knowledge transfer is performed better. On this basis, the technical solution of the embodiment of the application enables the semantic similarity task model to learn the signals of the semantic similarity scoring model more easily and better realizes knowledge transfer from the semantic similarity scoring model to the semantic similarity task model, so that the semantic similarity task model can effectively learn the information of the semantic similarity scoring model.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (12)
1. A training method of a semantic similarity task model comprises the following steps:
when a semantic similarity task model is trained by adopting each piece of training data, acquiring the score difference of positive and negative examples in the training data predicted by the semantic similarity task model;
constructing an adaptive difference loss function of the semantic similarity task model based on the score difference of positive and negative examples in the training data obtained by a pre-trained semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model;
and training the semantic similarity task model based on the self-adaptive difference loss function.
2. The method according to claim 1, wherein constructing an adaptive difference loss function of the semantic similarity task model based on the score difference of positive and negative examples in the training data obtained by a pre-trained semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model comprises:
acquiring a self-adaptive positive and negative example score difference of the training data based on the positive and negative example score difference of the training data obtained by the semantic similarity scoring model and the positive and negative example score difference of the training data predicted by the semantic similarity task model;
and constructing the self-adaptive difference loss function based on the self-adaptive positive and negative example score difference of the training data and the positive and negative example score difference of the training data predicted by the semantic similarity task model.
3. The method of claim 1, wherein obtaining adaptive positive and negative example score differences of the training data based on the positive and negative example score differences of the training data obtained by the semantic similarity scoring model and the positive and negative example score differences of the training data predicted by the semantic similarity task model comprises:
based on the positive and negative example score difference in the training data obtained by the semantic similarity scoring model and the positive and negative example score difference in the training data predicted by the semantic similarity task model, obtaining the self-adaptive positive and negative example score difference of the training data by the following formula:
t_dynamic=t*sigmoid(α*(s(θ)-t))
where t_dynamic is the adaptive positive and negative example score difference of the training data, t is the score difference of positive and negative examples in the training data obtained based on the semantic similarity scoring model, s(θ) is the score difference of positive and negative examples in the training data predicted by the semantic similarity task model, α is a scaling factor, and sigmoid() is an activation function.
4. The method according to any one of claims 1-3, wherein constructing the adaptive difference loss function based on the adaptive positive and negative example score differences of the training data and the positive and negative example score differences in the training data predicted by the semantic similarity task model comprises:
based on the adaptive positive and negative example score difference of the training data and the score difference of the positive and negative examples in the training data predicted by the semantic similarity task model, constructing the adaptive difference loss function by the following formula:
L(θ)=-log(sigmoid(σ*(s(θ)-t_dynamic)))
where L(θ) is the constructed adaptive difference loss function, t_dynamic is the adaptive positive and negative example score difference of the training data, s(θ) is the score difference of positive and negative examples in the training data predicted by the semantic similarity task model, σ is an adjustment parameter, and sigmoid() is an activation function.
5. The method of claim 4, wherein training the semantic similarity task model based on the adaptive difference loss function comprises:
judging whether the self-adaptive difference loss function is converged;
and if not, adjusting parameters of the semantic similarity task model to enable the self-adaptive difference loss function to tend to converge.
6. A training device of a semantic similarity task model comprises the following components:
the acquisition module is used for acquiring the score difference of positive and negative examples in the training data predicted by the semantic similarity task model when the semantic similarity task model is trained by adopting each piece of training data;
the building module is used for building an adaptive difference loss function of the semantic similarity task model based on the score difference of positive and negative examples in the training data obtained by a pre-trained semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model;
and the training module is used for training the semantic similarity task model based on the self-adaptive difference loss function.
7. The apparatus of claim 6, wherein the building block comprises:
the acquisition unit is used for acquiring the adaptive positive and negative example score difference of the training data based on the score difference of positive and negative examples in the training data obtained by the semantic similarity scoring model and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model;
and the construction unit is used for constructing the adaptive difference loss function based on the adaptive positive and negative example score difference of the training data and the score difference of positive and negative examples in the training data predicted by the semantic similarity task model.
8. The apparatus of claim 6, wherein the obtaining unit is configured to:
based on the positive and negative example score difference in the training data obtained by the semantic similarity scoring model and the positive and negative example score difference in the training data predicted by the semantic similarity task model, obtaining the self-adaptive positive and negative example score difference of the training data by the following formula:
t_dynamic=t*sigmoid(α*(s(θ)-t))
where t_dynamic is the adaptive positive and negative example score difference of the training data, t is the score difference of positive and negative examples in the training data obtained based on the semantic similarity scoring model, s(θ) is the score difference of positive and negative examples in the training data predicted by the semantic similarity task model, α is a scaling factor, and sigmoid() is an activation function.
9. The apparatus according to any one of claims 6-8, wherein the construction unit is configured to:
based on the adaptive positive and negative example score difference of the training data and the score difference of the positive and negative examples in the training data predicted by the semantic similarity task model, constructing the adaptive difference loss function by the following formula:
L(θ)=-log(sigmoid(σ*(s(θ)-t_dynamic)))
where L(θ) is the constructed adaptive difference loss function, t_dynamic is the adaptive positive and negative example score difference of the training data, s(θ) is the score difference of positive and negative examples in the training data predicted by the semantic similarity task model, σ is an adjustment parameter, and sigmoid() is an activation function.
10. The apparatus of claim 9, wherein the training module is to:
judging whether the self-adaptive difference loss function is converged;
and if not, adjusting parameters of the semantic similarity task model to enable the self-adaptive difference loss function to tend to converge.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010431949.2A CN111539222B (en) | 2020-05-20 | 2020-05-20 | Training method, device, equipment and storage medium of semantic similarity task model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010431949.2A CN111539222B (en) | 2020-05-20 | 2020-05-20 | Training method, device, equipment and storage medium of semantic similarity task model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111539222A true CN111539222A (en) | 2020-08-14 |
CN111539222B CN111539222B (en) | 2023-05-23 |
Family
ID=71979497
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010431949.2A Active CN111539222B (en) | 2020-05-20 | 2020-05-20 | Training method, device, equipment and storage medium of semantic similarity task model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111539222B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8626524B1 (en) * | 2005-12-29 | 2014-01-07 | Quest Diagnostics Investments Inc. | System and method for transferring data with electronic messages |
CN107203782A (en) * | 2017-05-23 | 2017-09-26 | 哈尔滨工业大学 | Communication interference signals recognition methods under Larger Dynamic signal to noise ratio based on convolutional neural networks |
US20180373979A1 (en) * | 2017-06-22 | 2018-12-27 | Adobe Systems Incorporated | Image captioning utilizing semantic text modeling and adversarial learning |
US20200134442A1 (en) * | 2018-10-29 | 2020-04-30 | Microsoft Technology Licensing, Llc | Task detection in communications using domain adaptation |
CN110111335A (en) * | 2019-05-08 | 2019-08-09 | 南昌航空大学 | A kind of the urban transportation Scene Semantics dividing method and system of adaptive confrontation study |
CN110322446A (en) * | 2019-07-01 | 2019-10-11 | 华中科技大学 | A kind of domain adaptive semantic dividing method based on similarity space alignment |
CN110674714A (en) * | 2019-09-13 | 2020-01-10 | 东南大学 | Human face and human face key point joint detection method based on transfer learning |
CN110969006A (en) * | 2019-12-02 | 2020-04-07 | 支付宝(杭州)信息技术有限公司 | Training method and system of text sequencing model |
Non-Patent Citations (3)
Title |
---|
LEI JIANG; WENGANG ZHOU; HOUQIANG LI: "Knowledge Distillation with Category-Aware Attention and Discriminant Logit Losses" * |
ZHANG GENBAO; LI HAO; RAN YAN; LI QIUJIN: "A Transfer Learning Model for Bearing Fault Diagnosis" *
WANG DUNZE: "Noise-Robust Speech Recognition Based on CNN-TDNN and Transfer Learning" *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112328710A (en) * | 2020-11-26 | 2021-02-05 | 北京百度网讯科技有限公司 | Entity information processing method, entity information processing device, electronic equipment and storage medium |
CN112328710B (en) * | 2020-11-26 | 2024-06-11 | 北京百度网讯科技有限公司 | Entity information processing method, device, electronic equipment and storage medium |
CN112767307A (en) * | 2020-12-28 | 2021-05-07 | 上海联影智能医疗科技有限公司 | Image processing method, image processing device, computer equipment and storage medium |
CN113051368A (en) * | 2021-03-24 | 2021-06-29 | 北京百度网讯科技有限公司 | Double-tower model training method, double-tower model searching device and electronic equipment |
CN113051368B (en) * | 2021-03-24 | 2023-09-22 | 北京百度网讯科技有限公司 | Double-tower model training method, retrieval device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111539222B (en) | 2023-05-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111428008B (en) | Method, apparatus, device and storage medium for training a model | |
CN111598216B (en) | Method, device and equipment for generating student network model and storage medium | |
CN110727806B (en) | Text processing method and device based on natural language and knowledge graph | |
CN111125335B (en) | Question and answer processing method and device, electronic equipment and storage medium | |
CN111079442B (en) | Vectorization representation method and device of document and computer equipment | |
CN110795569B (en) | Method, device and equipment for generating vector representation of knowledge graph | |
CN111539223A (en) | Language model training method and device, electronic equipment and readable storage medium | |
US20220019736A1 (en) | Method and apparatus for training natural language processing model, device and storage medium | |
CN111507104B (en) | Method and device for establishing label labeling model, electronic equipment and readable storage medium | |
CN111737995A (en) | Method, device, equipment and medium for training language model based on multiple word vectors | |
CN111783981A (en) | Model training method and device, electronic equipment and readable storage medium | |
US20210374343A1 (en) | Method and apparatus for obtaining word vectors based on language model, device and storage medium | |
CN111737994A (en) | Method, device and equipment for obtaining word vector based on language model and storage medium | |
CN111539222A (en) | Training method and device for semantic similarity task model, electronic equipment and storage medium | |
CN110543558B (en) | Question matching method, device, equipment and medium | |
CN111079945B (en) | End-to-end model training method and device | |
CN111709252B (en) | Model improvement method and device based on pre-trained semantic model | |
EP3549031B1 (en) | Language data prediction with neural networks and online learning | |
CN111950291A (en) | Semantic representation model generation method and device, electronic equipment and storage medium | |
CN111507111B (en) | Pre-training method and device of semantic representation model, electronic equipment and storage medium | |
CN111667056A (en) | Method and apparatus for searching model structure | |
CN110717340A (en) | Recommendation method and device, electronic equipment and storage medium | |
CN111681647A (en) | Method, apparatus, device and storage medium for recognizing word slot | |
CN111241838A (en) | Text entity semantic relation processing method, device and equipment | |
CN114492788A (en) | Method and device for training deep learning model, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||