CN115357684A - Method and device for determining loss parameters of dialogue generation model - Google Patents

Method and device for determining loss parameters of dialogue generation model

Info

Publication number
CN115357684A
CN115357684A
Authority
CN
China
Prior art keywords
loss parameter
dialogue
model
generation model
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210955316.0A
Other languages
Chinese (zh)
Inventor
彭旋
陈自岩
高鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Tone Communication Technology Co., Ltd.
Original Assignee
Global Tone Communication Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Global Tone Communication Technology Co., Ltd.
Priority to CN202210955316.0A priority Critical patent/CN115357684A/en
Publication of CN115357684A publication Critical patent/CN115357684A/en
Legal status: Pending (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a method and a device for determining loss parameters of a dialogue generation model. The method comprises the following steps: training a dialogue generation model on labeled samples of dialogue data to obtain an overall loss parameter of the dialogue generation model; performing virtual adversarial training on the labeled samples to obtain a virtual adversarial loss parameter; and obtaining a final loss parameter of the dialogue generation model according to the sum of the overall loss parameter and the virtual adversarial loss parameter. The method and the device can enhance the generalization ability and accuracy of the dialogue generation model.

Description

Method and device for determining loss parameters of dialogue generation model
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a method and an apparatus for determining a loss parameter of a dialog generation model.
Background
Dialogue generation is a subtask of natural language processing and information extraction, and plays an important role in fields such as intelligent question answering, multi-turn dialogue, human-computer interaction, and intelligent customer service.
At present, dialogue generation methods are mainly based on generative models, which apply natural language processing algorithms and use an encoder-decoder structure to produce replies. A generative model is closer to the process of human conversation and can automatically learn from existing dialogue text how to generate text, so it offers high flexibility.
However, generative models often suffer from weak robustness and weak generalization. Conventional adversarial training (such as FGM, PGD, etc.) can be added to enhance robustness, but it simultaneously harms the generalization of the model, resulting in poor generalization.
Disclosure of Invention
The embodiments of the application aim to provide a method and a device for determining a loss parameter of a dialogue generation model, so as to solve the problem that the accuracy of dialogue generation is not high enough. The specific technical scheme is as follows:
in a first aspect, a method for determining a loss parameter of a dialogue generation model is provided, the method comprising:
training a dialogue generation model on labeled samples of dialogue data to obtain an overall loss parameter of the dialogue generation model;
performing virtual adversarial training on the labeled samples to obtain a virtual adversarial loss parameter;
and obtaining a final loss parameter of the dialogue generation model according to the sum of the overall loss parameter and the virtual adversarial loss parameter.
Optionally, training the dialogue generation model on the labeled samples of the dialogue data to obtain the overall loss parameter of the dialogue generation model comprises:
performing part-of-speech tagging and syntactic tagging on the dialogue data to obtain labeled samples, wherein the dialogue data comprises a first sentence and a second sentence belonging to different dialogue objects;
inputting the first sentence into the dialogue generation model to obtain a prediction result output by the dialogue generation model, wherein the prediction result comprises the content, part of speech, and syntax of a predicted second sentence;
comparing the prediction result with the second sentence to obtain a first loss parameter of the dialogue generation model for the dialogue generation task, a second loss parameter for the part-of-speech prediction task, and a third loss parameter for the syntax prediction task;
and obtaining the overall loss parameter according to the first loss parameter, the second loss parameter, and the third loss parameter.
Optionally, performing the part-of-speech tagging and syntactic tagging on the dialogue data comprises:
segmenting the dialogue data into words through a word segmentation tool;
performing part-of-speech tagging on the segmented words using a part-of-speech recognition scheme;
and performing syntactic tagging on the segmented words using a syntactic recognition scheme, wherein the syntactic tags indicate sentence components or sentence relations of the segmented words in the dialogue data.
Optionally, obtaining the first loss parameter of the dialogue generation model for the dialogue generation task, the second loss parameter for the part-of-speech prediction task, and the third loss parameter for the syntax prediction task comprises:
obtaining the first loss parameter for the dialogue generation task through a decoder hidden-state layer of the dialogue generation model, wherein the dialogue generation model adopts an encoder-decoder structure;
obtaining the second loss parameter for the part-of-speech prediction task through a decoder shared layer of the dialogue generation model;
and obtaining the third loss parameter for the syntax prediction task through the decoder shared layer of the dialogue generation model.
Optionally, performing the virtual adversarial training on the labeled samples to obtain the virtual adversarial loss parameter comprises:
taking the first sentence of a labeled sample as an unperturbed input;
obtaining a perturbed input by adding a perturbation to the unperturbed input;
obtaining a KL divergence between the perturbed output of the dialogue generation model for the perturbed input and the unperturbed output for the unperturbed input;
and minimizing the KL divergence by updating the weights of the dialogue generation model to obtain the virtual adversarial loss parameter.
Optionally, after obtaining the final loss parameter of the dialogue generation model, the method further comprises:
performing semantic retrieval on a first half-sentence of dialogue data through a question-answer library;
and when a second half-sentence corresponding to the first half-sentence cannot be retrieved, predicting the second half-sentence from the first half-sentence using the dialogue generation model with the final loss parameter.
Optionally, the dialogue generation model is a seq2seq model.
In a second aspect, an apparatus for determining a loss parameter of a dialogue generation model is provided, the apparatus comprising:
a training module, configured to train a dialogue generation model on labeled samples of dialogue data to obtain an overall loss parameter of the dialogue generation model;
a virtual adversarial training module, configured to perform virtual adversarial training on the labeled samples to obtain a virtual adversarial loss parameter;
and an updating module, configured to obtain a final loss parameter of the dialogue generation model according to the sum of the overall loss parameter and the virtual adversarial loss parameter.
In a third aspect, an electronic device is provided, which comprises a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement any of the above method steps for determining a loss parameter of the dialogue generation model when executing the program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above method steps for determining a loss parameter of a dialogue generation model.
The embodiment of the application has the following beneficial effects:
the embodiment of the application provides a method for determining loss parameters of a dialogue generating model, and the method adopts virtual confrontation training without using label information, so that excessive dependence on manually marked samples is reduced, adopts a semi-supervised learning mode, obtains combined loss of an overall loss parameter and a virtual confrontation loss parameter based on the virtual confrontation training vat, and enhances generalization capability and accuracy of the dialogue generating model under the condition of not sacrificing the robustness of the model.
Of course, it is not necessary for any product or method of the present application to achieve all of the above advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; it is obvious that other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
Fig. 1 is a flowchart of a method for determining a loss parameter of a dialogue generation model according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for obtaining an overall loss parameter according to an embodiment of the present application;
Fig. 3 is a block diagram of a framework for determining loss parameters of a dialogue generation model according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an apparatus for determining a loss parameter of a dialogue generation model according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making creative efforts shall fall within the protection scope of the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for convenience of description and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
The method for determining the loss parameters of the dialog generation model in the embodiment of the application can be executed by a server and used for enhancing the generalization capability and the accuracy of the dialog generation model.
The following describes in detail a method for determining a loss parameter of a dialog generation model provided in an embodiment of the present application with reference to a specific embodiment, and as shown in fig. 1, the specific steps are as follows:
step 101: and training the dialogue generating model through the labeled sample of the dialogue data to obtain the overall loss parameter of the dialogue generating model.
In the embodiment of the application, the server performs part-of-speech tagging and syntactic tagging on the dialogue data to obtain labeled samples, and trains the dialogue generation model on the labeled samples to obtain the overall loss parameter of the dialogue generation model.
Step 102: Perform virtual adversarial training on the labeled samples to obtain a virtual adversarial loss parameter.
In the embodiment of the present application, virtual adversarial training is a data augmentation technique that does not require prior domain knowledge. Virtual adversarial training does not use label information; only the model outputs are used to generate perturbations, and the perturbations are generated such that the output for the perturbed input differs from the model output for the original input.
The server performs virtual adversarial training on the labeled samples to obtain the virtual adversarial loss parameter.
Step 103: Obtain the final loss parameter of the dialogue generation model according to the sum of the overall loss parameter and the virtual adversarial loss parameter.
In the embodiment of the application, the server calculates the sum of the overall loss parameter and the virtual adversarial loss parameter to obtain the final loss parameter of the dialogue generation model. Thus, the final loss parameter of the dialogue generation model combines the overall loss parameter of the model and the virtual adversarial loss parameter from the virtual adversarial training.
The final loss parameter is calculated with the following formula:
total_loss = Model_loss + vat_loss
wherein total_loss is the final loss parameter, Model_loss is the overall loss parameter, and vat_loss is the virtual adversarial loss parameter.
With this method, because virtual adversarial training does not use label information, excessive dependence on manually labeled samples is reduced; a semi-supervised learning mode is adopted, the joint loss of the overall loss parameter and the virtual adversarial loss parameter is obtained based on virtual adversarial training (VAT), and the generalization ability and accuracy of the dialogue generation model are enhanced without sacrificing the robustness of the model.
As an alternative embodiment, as shown in fig. 2, training the dialogue generation model on the labeled samples of the dialogue data to obtain the overall loss parameter of the dialogue generation model comprises:
step 201: and performing part-of-speech tagging and syntax tagging on the dialogue data to obtain a tagging sample.
The dialogue data comprises a first sentence and a second sentence belonging to different dialogue objects.
In the embodiment of the present application, the dialogue data comprises a first sentence and a second sentence belonging to different dialogue objects, and the first sentence and the second sentence can be obtained by semantic segmentation. After segmenting the dialogue data into words, the server performs part-of-speech tagging and syntactic tagging to obtain labeled samples.
Optionally, the part-of-speech tagging and syntactic tagging of the dialogue data comprises: segmenting the dialogue data into words through a word segmentation tool; performing part-of-speech tagging on the segmented words using a part-of-speech recognition scheme; and performing syntactic tagging on the segmented words using a syntactic recognition scheme, wherein the syntactic tags indicate sentence components or sentence relations of the segmented words in the dialogue data.
The server segments the dialogue data into words with a word segmentation tool and then performs part-of-speech tagging using a part-of-speech recognition scheme, wherein each tag indicates the part of speech of a word, such as noun, pronoun, or conjunction. The part-of-speech tags are listed in the following comparison table.
r: pronoun
n: noun
wp: punctuation
u: auxiliary word
c: conjunction
v: verb
p: preposition
d: adverb
m: numeral
a: adjective
The server performs syntactic tagging on the segmented words using a syntactic recognition scheme, wherein the syntactic tags indicate the sentence components or sentence relations of the words in the dialogue data, such as a fronted object, a subject-verb relation, or a verb-object relation. The syntactic tags are listed in the following comparison table.
SBV: subject-verb relation
VOB: verb-object relation
IOB: indirect-object relation
FOB: fronted-object relation
DBL: double (pivotal) construction
ATT: attribute (modifier-head) relation
ADV: adverbial structure
The word segmentation tool, the part-of-speech recognition scheme, and the syntactic recognition scheme are not limited in this application; illustratively, word segmentation, part-of-speech tagging, and syntactic tagging can be performed with natural language processing tools such as LTP, Stanza, or jieba.
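As a concrete illustration of this annotation step, the following sketch uses Stanza, one of the tools named above, to segment, part-of-speech tag, and dependency-parse a sentence. The tool choice is an assumption for illustration only: the disclosure does not prescribe a tool, and Stanza emits Universal POS and Universal Dependencies labels rather than the LTP-style tags in the tables above, so a tag mapping would be needed in practice.

```python
# Illustrative only: word segmentation, part-of-speech tagging, and
# dependency parsing of dialogue data with Stanza (tool choice assumed;
# the disclosure allows LTP, Stanza, jieba, or other NLP tools).
import stanza

stanza.download("zh")  # fetch the Chinese models once
nlp = stanza.Pipeline(lang="zh", processors="tokenize,pos,lemma,depparse")

doc = nlp("你今天过得怎么样？")  # a first sentence from the dialogue data
for sent in doc.sentences:
    for word in sent.words:
        # word.upos is the part-of-speech tag; word.deprel is the syntactic
        # relation of the word to its head -- together with the text these
        # form the labeled sample.
        print(word.text, word.upos, word.head, word.deprel)
```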
Step 202: Input the first sentence into the dialogue generation model to obtain a prediction result output by the dialogue generation model.
The prediction result comprises the content, part of speech, and syntax of the predicted second sentence.
In the embodiment of the present application, the server constructs a dialogue generation model based on an encoder-decoder structure; illustratively, the dialogue generation model is a seq2seq model. The server inputs the first sentence into the dialogue generation model to obtain a prediction result, which comprises the content, part of speech, and syntax of the predicted second sentence.
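For orientation, a minimal encoder-decoder skeleton could look as follows. This is a sketch under assumptions: the disclosure states only that a seq2seq (encoder-decoder) model is used, so the GRU layers, sizes, and names here are illustrative.

```python
# Minimal seq2seq (encoder-decoder) skeleton -- an illustrative assumption;
# the disclosure specifies only that an encoder-decoder seq2seq model is used.
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.encoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.decoder = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.embed(src_ids))       # encode the first sentence
        dec_hidden, _ = self.decoder(self.embed(tgt_ids), h)
        # dec_hidden feeds the prediction heads (see the multi-task sketch below)
        return dec_hidden
```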
Step 203: Compare the prediction result with the second sentence to obtain a first loss parameter of the dialogue generation model for the dialogue generation task, a second loss parameter for the part-of-speech prediction task, and a third loss parameter for the syntax prediction task.
In the embodiment of the application, the server compares the prediction result with the second sentence. According to the comparison result, a first loss parameter model_loss1 for the dialogue generation task is obtained through the decoder hidden-state layer of the dialogue generation model, a second loss parameter model_loss2 for the part-of-speech prediction task is obtained through the decoder shared layer, and a third loss parameter model_loss3 for the syntax prediction task is obtained through the decoder shared layer.
Step 204: Obtain the overall loss parameter according to the first loss parameter, the second loss parameter, and the third loss parameter.
In the embodiment of the application, the server obtains three preset hyperparameters and computes the overall loss parameter from each loss parameter and its corresponding hyperparameter:
Model_loss=a*model_loss1+b*model_loss2+c*model_loss3
Wherein Model _ loss is the overall loss parameter, and a, b, and c are the over parameters.
In this application, the server adds the part-of-speech prediction task and the syntax prediction task to the model and obtains the overall loss parameter under a multi-task constraint. This strengthens the grammatical constraints on the generated dialogue sentences and, compared with a plain generative model, alleviates problems such as incomplete output sentences, consecutively repeated words, and incoherent sentences.
In addition, compared with prior-art approaches that reply with rule templates, the method and the device do not require a large number of templates, since the dialogue generation model replies automatically; and compared with prior-art retrieval models, which cannot capture the true meaning of the semantics, the method and the device strengthen the grammatical constraints on the generated sentences, effectively capture the true semantics, and improve reply accuracy.
As an optional implementation, performing the virtual adversarial training on the labeled samples to obtain the virtual adversarial loss parameter comprises: taking the first sentence of a labeled sample as an unperturbed input; obtaining a perturbed input by adding a perturbation to the unperturbed input; obtaining a KL divergence between the perturbed output of the dialogue generation model for the perturbed input and the unperturbed output for the unperturbed input; and minimizing the KL divergence by updating the weights of the dialogue generation model to obtain the virtual adversarial loss parameter.
1) Take the first sentence of a labeled sample as the unperturbed input. Starting from the unperturbed input X, transform X by adding a small perturbation r; the transformed perturbed input is T(X) = X + r.
2) The model output for the perturbed input T(X) should differ from the output for the unperturbed input, and the KL divergence between the two outputs should be maximal while the L2 norm of r is kept small. Among all perturbations r, let r_v-adv denote the perturbation in the adversarial direction.
Δ_KL(r, x^(n), θ) ≡ KL[ p(y | x^(n), θ) || p(y | x^(n) + r, θ) ]    Formula (1)
wherein Δ_KL(r, x^(n), θ) is the KL divergence between the unperturbed and perturbed outputs, p(y | x^(n), θ) is the unperturbed output, p(y | x^(n) + r, θ) is the perturbed output, x^(n) denotes an input (or sample point), θ denotes the model parameters, and r is a perturbation of the input.
Formula (1) uses the KL divergence between the output distributions for x^(n) and x^(n) + r to express the difference between the two distributions.
r_v-adv ≡ argmax_r { Δ_KL(r, x^(n), θ) : ||r||_2 ≤ ε }    Formula (2)
wherein r_v-adv is the adversarial perturbation. Formula (2) finds, among all r whose L2 norm is smaller than a certain value ε, the r that maximizes Formula (1), that is, the perturbation direction that maximizes the difference between the two distributions; equivalently, it finds the direction in which the local distribution of the trained model around the input x^(n) is least smooth.
3) After the adversarial perturbation and the perturbed input are found, the weights of the dialogue generation model are updated so that the KL divergence is minimized. This makes the dialogue generation model robust to different perturbations. The following loss is minimized by gradient descent, yielding the virtual adversarial loss parameter.
LDS(x^(n), θ) ≡ −Δ_KL(r_v-adv, x^(n), θ)    Formula (3)
Formula (3) defines the local distribution smoothness (LDS) of x^(n) as the negative of the KL divergence between the output distribution for the input x^(n) perturbed in the maximally adversarial direction and the output distribution for the unperturbed input x^(n).
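As a rough illustration of steps 1) to 3), the virtual adversarial loss can be sketched in PyTorch following the standard VAT recipe (power iteration to approximate r_v-adv). The function below is an assumption for illustration: `model` is taken to map input embeddings to output logits, and the hyperparameters eps and xi are not specified in the disclosure.

```python
# Illustrative VAT sketch following the standard recipe (an assumption;
# not necessarily the patent's exact procedure). `model` maps input
# embeddings to vocabulary logits; `emb` is the embedded first sentence.
import torch
import torch.nn.functional as F

def vat_loss(model, emb, eps=1.0, xi=1e-6, n_power=1):
    with torch.no_grad():
        p = F.softmax(model(emb), dim=-1)          # unperturbed output p(y|x, θ)

    # Approximate the adversarial perturbation r_v-adv of Formula (2) by
    # power iteration, starting from a random direction d.
    d = torch.randn_like(emb)
    for _ in range(n_power):
        d = xi * F.normalize(d, dim=-1)
        d.requires_grad_(True)
        log_p_hat = F.log_softmax(model(emb + d), dim=-1)
        delta_kl = F.kl_div(log_p_hat, p, reduction="batchmean")  # Δ_KL(r, x, θ)
        d = torch.autograd.grad(delta_kl, d)[0].detach()

    r_vadv = eps * F.normalize(d, dim=-1)          # perturbation with small L2 norm

    # vat_loss = Δ_KL(r_v-adv, x, θ); minimizing it by gradient descent
    # maximizes the local distribution smoothness of Formula (3).
    log_p_hat = F.log_softmax(model(emb + r_vadv), dim=-1)
    return F.kl_div(log_p_hat, p, reduction="batchmean")
```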
As an optional implementation, after obtaining the final loss parameter of the dialogue generation model, the method further comprises: performing semantic retrieval on the first half-sentence of the dialogue data through a question-answer library; and when a second half-sentence corresponding to the first half-sentence cannot be retrieved, predicting the second half-sentence from the first half-sentence using the dialogue generation model with the final loss parameter.
In a multilingual pre-training project, for the first half-sentence of the dialogue data, the server first performs semantic retrieval with the question-answer library and selects a suitable reply from it; if no corresponding reply can be found in the question-answer library, the dialogue generation model is called to predict the second half-sentence.
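A minimal sketch of this retrieve-then-generate fallback follows; the function names, the similarity measure, and the threshold are assumptions, since the disclosure does not fix a retrieval method:

```python
# Sketch of the retrieve-then-generate fallback (names, similarity measure,
# and threshold are assumptions; the disclosure does not specify them).
def reply(first_half, qa_library, generate_model, sim, threshold=0.8):
    """qa_library: list of (question, answer) pairs.
    sim(a, b): semantic similarity score in [0, 1].
    generate_model(text): dialogue generation model trained with the final loss."""
    best_q, best_a = max(qa_library, key=lambda qa: sim(first_half, qa[0]))
    if sim(first_half, best_q) >= threshold:
        return best_a                     # suitable reply found in the QA library
    return generate_model(first_half)     # otherwise predict the second half-sentence
```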
A conventional seq2seq generative dialogue algorithm achieves a ROUGE-L value of about 33.5%; with the dialogue generation method based on improved semi-supervised learning described herein, the ROUGE-L value is about 34.7%, and the fluctuation of the loss value during training is reduced to within a smaller range.
FIG. 3 is a schematic diagram of the framework for determining the loss parameters of the dialogue generation model. As shown, the final loss parameters are obtained by supplementing the dialogue data with part-of-speech and syntactic tags, applying the multi-task constraint, performing virtual adversarial training, and finally solving for the joint loss.
Based on the same technical concept, an embodiment of the present application further provides an apparatus for determining a loss parameter of a dialogue generation model. As shown in fig. 4, the apparatus comprises:
a training module 401, configured to train a dialogue generation model on labeled samples of dialogue data to obtain an overall loss parameter of the dialogue generation model;
a virtual adversarial training module 402, configured to perform virtual adversarial training on the labeled samples to obtain a virtual adversarial loss parameter;
and an updating module 403, configured to obtain a final loss parameter of the dialogue generation model according to the sum of the overall loss parameter and the virtual adversarial loss parameter.
Optionally, the training module 401 comprises:
the system comprises a labeling unit, a processing unit and a processing unit, wherein the labeling unit is used for performing part-of-speech labeling and syntax labeling on dialogue data to obtain a labeling sample, and the dialogue data comprises a first sentence and a second sentence which belong to different dialogue objects;
the input and output unit is used for inputting the first sentence into the dialogue generating model to obtain a prediction result output by the dialogue generating model, wherein the prediction result comprises the content, the part of speech and the syntax of the predicted second sentence;
the comparison unit is used for comparing the prediction result with the second statement to obtain a first loss parameter of the dialogue generation model relative to the dialogue generation task, a second loss parameter of the part of speech prediction task and a third loss parameter of the syntax prediction task;
and the obtaining unit is used for obtaining the overall loss parameter according to the first loss parameter, the second loss parameter and the third loss parameter.
Optionally, the labeling unit is configured to:
segment the dialogue data into words through a word segmentation tool;
perform part-of-speech tagging on the segmented words using a part-of-speech recognition scheme;
and perform syntactic tagging on the segmented words using a syntactic recognition scheme, wherein the syntactic tags indicate sentence components or sentence relations of the segmented words in the dialogue data.
Optionally, the comparison unit is configured to:
obtain the first loss parameter for the dialogue generation task through a decoder hidden-state layer of the dialogue generation model, wherein the dialogue generation model adopts an encoder-decoder structure;
obtain the second loss parameter for the part-of-speech prediction task through a decoder shared layer of the dialogue generation model;
and obtain the third loss parameter for the syntax prediction task through the decoder shared layer of the dialogue generation model.
Optionally, the virtual adversarial training module 402 is configured to:
take the first sentence of a labeled sample as an unperturbed input;
obtain a perturbed input by adding a perturbation to the unperturbed input;
obtain a KL divergence between the perturbed output of the dialogue generation model for the perturbed input and the unperturbed output for the unperturbed input;
and minimize the KL divergence by updating the weights of the dialogue generation model to obtain the virtual adversarial loss parameter.
Optionally, the apparatus is further configured to:
perform semantic retrieval on the first half-sentence of the dialogue data through a question-answer library;
and when a second half-sentence corresponding to the first half-sentence cannot be retrieved, predict the second half-sentence from the first half-sentence using the dialogue generation model with the final loss parameter.
Optionally, the dialogue generation model is a seq2seq model.
According to another aspect of the embodiments of the present application, an electronic device is provided. As shown in fig. 5, it comprises a memory 503, a processor 501, a communication interface 502, and a communication bus 504, wherein the memory 503 stores a computer program executable on the processor 501, the memory 503 and the processor 501 communicate through the communication interface 502 and the communication bus 504, and the processor 501 executes the computer program to implement the steps of the above method.
The memory, the processor, and the communication interface in the electronic device communicate with one another through the communication bus. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and the like.
The memory may include a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
There is also provided, in accordance with yet another aspect of an embodiment of the present application, a computer-readable medium having non-volatile program code executable by a processor.
Optionally, in an embodiment of the present application, a computer readable medium is configured to store program code for the processor to execute the above method.
Optionally, for a specific example in this embodiment, reference may be made to the example described in the foregoing embodiment, and this embodiment is not described herein again.
When the embodiments of the present application are specifically implemented, reference may be made to the above embodiments, and corresponding technical effects are achieved.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present application may essentially, or in the part contributing to the prior art, be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk. It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for determining a loss parameter of a dialogue generation model, the method comprising:
training a dialogue generation model on labeled samples of dialogue data to obtain an overall loss parameter of the dialogue generation model;
performing virtual adversarial training on the labeled samples to obtain a virtual adversarial loss parameter;
and obtaining a final loss parameter of the dialogue generation model according to the sum of the overall loss parameter and the virtual adversarial loss parameter.
2. The method of claim 1, wherein training the dialogue generation model on the labeled samples of the dialogue data to obtain the overall loss parameter of the dialogue generation model comprises:
performing part-of-speech tagging and syntactic tagging on the dialogue data to obtain labeled samples, wherein the dialogue data comprises a first sentence and a second sentence belonging to different dialogue objects;
inputting the first sentence into the dialogue generation model to obtain a prediction result output by the dialogue generation model, wherein the prediction result comprises the content, part of speech, and syntax of a predicted second sentence;
comparing the prediction result with the second sentence to obtain a first loss parameter of the dialogue generation model for the dialogue generation task, a second loss parameter for the part-of-speech prediction task, and a third loss parameter for the syntax prediction task;
and obtaining the overall loss parameter according to the first loss parameter, the second loss parameter, and the third loss parameter.
3. The method of claim 2, wherein the part-of-speech tagging and syntactic tagging of the dialogue data comprises:
segmenting the dialogue data into words through a word segmentation tool;
performing part-of-speech tagging on the segmented words using a part-of-speech recognition scheme;
and performing syntactic tagging on the segmented words using a syntactic recognition scheme, wherein the syntactic tags indicate sentence components or sentence relations of the segmented words in the dialogue data.
4. The method of claim 2, wherein obtaining the first loss parameter of the dialogue generation model for the dialogue generation task, the second loss parameter for the part-of-speech prediction task, and the third loss parameter for the syntax prediction task comprises:
obtaining the first loss parameter for the dialogue generation task through a decoder hidden-state layer of the dialogue generation model, wherein the dialogue generation model adopts an encoder-decoder structure;
obtaining the second loss parameter for the part-of-speech prediction task through a decoder shared layer of the dialogue generation model;
and obtaining the third loss parameter for the syntax prediction task through the decoder shared layer of the dialogue generation model.
5. The method of claim 1, wherein performing the virtual adversarial training on the labeled samples to obtain the virtual adversarial loss parameter comprises:
taking the first sentence of a labeled sample as an unperturbed input;
obtaining a perturbed input by adding a perturbation to the unperturbed input;
obtaining a KL divergence between the perturbed output of the dialogue generation model for the perturbed input and the unperturbed output for the unperturbed input;
and minimizing the KL divergence by updating the weights of the dialogue generation model to obtain the virtual adversarial loss parameter.
6. The method of claim 1, wherein after obtaining the final loss parameter of the dialogue generation model, the method further comprises:
performing semantic retrieval on a first half-sentence of dialogue data through a question-answer library;
and when a second half-sentence corresponding to the first half-sentence cannot be retrieved, predicting the second half-sentence from the first half-sentence using the dialogue generation model with the final loss parameter.
7. The method of claim 1, wherein the dialogue generation model is a seq2seq model.
8. An apparatus for determining a loss parameter of a dialogue generation model, the apparatus comprising:
a training module, configured to train a dialogue generation model on labeled samples of dialogue data to obtain an overall loss parameter of the dialogue generation model;
a virtual adversarial training module, configured to perform virtual adversarial training on the labeled samples to obtain a virtual adversarial loss parameter;
and an updating module, configured to obtain a final loss parameter of the dialogue generation model according to the sum of the overall loss parameter and the virtual adversarial loss parameter.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the method steps of any one of claims 1 to 7 when executing the program stored in the memory.
10. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1 to 7.
CN202210955316.0A 2022-08-10 2022-08-10 Method and device for determining loss parameters of dialogue generation model Pending CN115357684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210955316.0A CN115357684A (en) 2022-08-10 2022-08-10 Method and device for determining loss parameters of dialogue generation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210955316.0A CN115357684A (en) 2022-08-10 2022-08-10 Method and device for determining loss parameters of dialogue generation model

Publications (1)

Publication Number Publication Date
CN115357684A true CN115357684A (en) 2022-11-18

Family

ID=84001213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210955316.0A Pending CN115357684A (en) 2022-08-10 2022-08-10 Method and device for determining loss parameters of dialogue generation model

Country Status (1)

Country Link
CN (1) CN115357684A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579350A (en) * 2023-07-14 2023-08-11 腾讯科技(深圳)有限公司 Robustness analysis method and device for dialogue understanding model and computer equipment
CN116579350B (en) * 2023-07-14 2024-01-30 腾讯科技(深圳)有限公司 Robustness analysis method and device for dialogue understanding model and computer equipment

Similar Documents

Publication Publication Date Title
US11436487B2 (en) Joint embedding of corpus pairs for domain mapping
US10657189B2 (en) Joint embedding of corpus pairs for domain mapping
CN112084334B (en) Label classification method and device for corpus, computer equipment and storage medium
CN111738016A (en) Multi-intention recognition method and related equipment
Plepi et al. Context transformer with stacked pointer networks for conversational question answering over knowledge graphs
Singh et al. A decision tree based word sense disambiguation system in Manipuri language
CN115357719B (en) Power audit text classification method and device based on improved BERT model
CN112686022A (en) Method and device for detecting illegal corpus, computer equipment and storage medium
CN114528845A (en) Abnormal log analysis method and device and electronic equipment
US10642919B2 (en) Joint embedding of corpus pairs for domain mapping
CN115357684A (en) Method and device for determining loss parameters of dialogue generation model
Sonbol et al. Learning software requirements syntax: An unsupervised approach to recognize templates
CN112036186A (en) Corpus labeling method and device, computer storage medium and electronic equipment
CN111723583B (en) Statement processing method, device, equipment and storage medium based on intention role
CN113705207A (en) Grammar error recognition method and device
Xue et al. Intent-enhanced attentive Bert capsule network for zero-shot intention detection
WO2023088278A1 (en) Method and apparatus for verifying authenticity of expression, and device and medium
Sawant et al. An Enhanced BERTopic Framework and Algorithm for Improving Topic Coherence and Diversity
Arici et al. A bert-based scoring system for workplace safety courses in italian
CN116483314A (en) Automatic intelligent activity diagram generation method
CN113627197B (en) Text intention recognition method, device, equipment and storage medium
CN114676699A (en) Entity emotion analysis method and device, computer equipment and storage medium
CN115470790A (en) Method and device for identifying named entities in file
CN115238077A (en) Text analysis method, device and equipment based on artificial intelligence and storage medium
CN115129859A (en) Intention recognition method, intention recognition device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination