CN110197279A - Transformation model training method, device, equipment and storage medium - Google Patents
- Publication number: CN110197279A
- Application number: CN201910498146.6A
- Authority
- CN
- China
- Prior art keywords
- training
- transformation model
- dialogue
- loss
- goal task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F40/00—Handling natural language data > G06F40/20—Natural language analysis
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F40/00—Handling natural language data > G06F40/20—Natural language analysis > G06F40/279—Recognition of textual entities > G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F40/00—Handling natural language data > G06F40/30—Semantic analysis
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00—Computing arrangements based on biological models > G06N3/02—Neural networks > G06N3/08—Learning methods > G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
An embodiment of the present invention provides a transformation model training method, apparatus, device and storage medium. The transformation model training method includes: obtaining a pre-training sample including dialogue data; generating an input feature and a pre-training target using the dialogue data; and training an initial transformation model using the input feature, the pre-training target and a pre-training loss, to obtain a pre-trained transformation model. In the embodiment of the present invention, training the transformation model on dialogue data can improve the prediction accuracy of the transformation model in semantic representation. Moreover, the initial transformation model is first trained on dialogue data to obtain the pre-trained transformation model, and the pre-trained transformation model is then used to train the transformation model required by a concrete application scene, which can improve the convergence rate of transformation model training.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a transformation model training method, apparatus, device and storage medium.
Background technique
Many machine learning tasks need supervised data, and a small amount of supervised data cannot satisfy the training demands of current large-scale deep learning models. Moreover, manually annotated supervised data may contain noise, for example classification standards that vary because of personal factors.
With a transformation (transformer) model as the network structure, an unsupervised task can be trained using data without manual annotation. However, at present the training process of the transformer model converges slowly and is time-consuming, and the prediction accuracy of the model remains to be improved.
Summary of the invention
Embodiments of the present invention provide a transformation model training method, apparatus, device and storage medium, to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a transformation model training method, comprising:
obtaining a pre-training sample including dialogue data;
generating an input feature and a pre-training target using the dialogue data;
training an initial transformation model using the input feature, the pre-training target and a pre-training loss, to obtain a pre-trained transformation model.
In one embodiment, the method further comprises:
training the pre-trained transformation model using a goal task training sample and a goal task loss, to obtain a transformation model for the goal task.
In one embodiment, generating the input feature and the pre-training target using the dialogue data comprises:
segmenting a dialogue in the dialogue data using a word segment segmentation algorithm, to obtain multiple word segments;
obtaining position embedding information and dialogue embedding information of each word segment;
selecting part of the content from the multiple word segments as the pre-training target;
masking the content selected from the multiple word segments, to obtain word embedding information;
taking the word embedding information, the position embedding information and the dialogue embedding information as the input feature.
In one embodiment, training the initial transformation model using the input feature, the pre-training target and the pre-training loss, to obtain the pre-trained transformation model, comprises:
training the initial transformation model using the input feature, the pre-training target, a dialogue response loss and a masked language model loss, and adjusting initial parameters of the initial transformation model;
obtaining the pre-trained transformation model when the dialogue response loss and the masked language model loss no longer decrease.
In one embodiment, training the pre-trained transformation model using the goal task training sample and the goal task loss, to obtain the transformation model for the goal task, comprises:
training the pre-trained transformation model using the goal task training sample and the goal task loss, and adjusting pre-training parameters of the pre-trained transformation model;
obtaining the transformation model for the goal task when the goal task loss no longer decreases.
In a second aspect, an embodiment of the present invention provides a transformation model training apparatus, comprising:
an obtaining module, configured to obtain a pre-training sample including dialogue data;
a generation module, configured to generate an input feature and a pre-training target using the dialogue data;
a first training module, configured to train an initial transformation model using the input feature, the pre-training target and a pre-training loss, to obtain a pre-trained transformation model.
In one embodiment, the apparatus further comprises:
a second training module, configured to train the pre-trained transformation model using a goal task training sample and a goal task loss, to obtain a transformation model for the goal task.
In one embodiment, the generation module comprises:
a segmentation submodule, configured to segment a dialogue in the dialogue data using a word segment segmentation algorithm, to obtain multiple word segments;
an acquisition submodule, configured to obtain position embedding information and dialogue embedding information of each word segment;
a selection submodule, configured to select part of the content from the multiple word segments as the pre-training target;
a masking submodule, configured to mask the content selected from the multiple word segments, to obtain word embedding information;
an input submodule, configured to take the word embedding information, the position embedding information and the dialogue embedding information as the input feature.
In one embodiment, the first training module is further configured to train the initial transformation model using the input feature, the pre-training target, a dialogue response loss and a masked language model loss, and to adjust initial parameters of the initial transformation model; and to obtain the pre-trained transformation model when the dialogue response loss and the masked language model loss no longer decrease.
In one embodiment, the second training module is further configured to train the pre-trained transformation model using a goal task training sample and a goal task loss, and to adjust the pre-training parameters of the pre-trained transformation model; and to obtain the transformation model for the goal task when the goal task loss no longer decreases.
In a third aspect, an embodiment of the present invention provides a transformation model training device. The functions of the device may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In a possible design, the device includes a processor and a memory. The memory is configured to store a program that supports the device in executing the above transformation model training method, and the processor is configured to execute the program stored in the memory. The device may further include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions used by a transformation model training device, including a program for executing the above transformation model training method.
One of the above technical solutions has the following advantages or beneficial effects: training the transformation model on dialogue data can improve the prediction accuracy of the transformation model in semantic representation, especially for colloquial expressions. Moreover, training the initial transformation model on dialogue data yields a pre-trained transformation model of an intermediate state; when this pre-trained transformation model is subsequently used to train the transformation model required by a concrete application scene, the convergence rate of transformation model training can be improved.
The above summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments and features described above, further aspects, embodiments and features of the present invention will be readily apparent by reference to the drawings and the following detailed description.
Detailed description of the invention
In the drawings, unless otherwise specified, identical reference numerals denote identical or similar components or elements throughout the several figures. The figures are not necessarily drawn to scale. It should be understood that these figures depict only some embodiments disclosed according to the present invention and should not be regarded as limiting the scope of the present invention.
Fig. 1 shows a flowchart of a transformation model training method according to an embodiment of the present invention.
Fig. 2 shows a flowchart of a transformation model training method according to an embodiment of the present invention.
Fig. 3 shows a flowchart of a transformation model training method according to an embodiment of the present invention.
Fig. 4 shows a flowchart of a transformation model training method according to an embodiment of the present invention.
Fig. 5 shows an example diagram of semantic similarity in a transformation model training method according to an embodiment of the present invention.
Fig. 6 shows a schematic diagram of an application example of a transformation model training method according to an embodiment of the present invention.
Fig. 7 shows a structural block diagram of a transformation model training apparatus according to an embodiment of the present invention.
Fig. 8 shows a structural block diagram of a transformation model training apparatus according to an embodiment of the present invention.
Fig. 9 shows a structural block diagram of a transformation model training device according to an embodiment of the present invention.
Specific embodiment
Hereinafter, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature rather than restrictive.
Fig. 1 shows a flowchart of a transformation model training method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S11: obtaining a pre-training sample including dialogue data.
Step S12: generating an input feature and a pre-training target using the dialogue data.
Step S13: training an initial transformation model using the input feature, the pre-training target and a pre-training loss, to obtain a pre-trained transformation model.
In one example, the transformation (transformer) model may include an encoder architecture. The encoder may include a self-attention layer and a feed-forward neural network. The self-attention layer can, while attending to the current word, also obtain the semantics of the current word in its context.
In this embodiment, the training corpus used, i.e. the pre-training sample, may include multi-source data knowledge, including dialogue data, encyclopedia data, news data and the like. The encyclopedia data may include encyclopedia articles obtained from various encyclopedia webpages. The news data may include current-affairs news obtained from various news webpages. The dialogue data may include conversational data obtained from various forum webpages. The corpus can be crawled from webpages by means such as web crawlers.
In the embodiment of the present invention, training the transformation model on dialogue data can improve the prediction accuracy of the transformation model in semantic representation, especially for colloquial expressions. Moreover, training the initial transformation model on dialogue data yields a pre-trained transformation model of an intermediate state; when this pre-trained transformation model is subsequently used to train the transformation model required by a concrete application scene, the convergence rate of transformation model training can be improved.
In one embodiment, as shown in Fig. 2, the method further includes:
Step S14: training the pre-trained transformation model using a goal task training sample and a goal task loss, to obtain a transformation model for the goal task.
In one example, the transformation model can be adapted to several scenes, so there can be a plurality of goal tasks, for example a Chinese emotion recognition task, a Chinese part-of-speech tagging task, an XNLI (natural language inference) task, and the like.
In one embodiment, as shown in Fig. 3, step S12 includes:
Step S21: segmenting a dialogue in the dialogue data using a word segment segmentation algorithm, to obtain multiple word segments.
The word segment segmentation algorithm (sentence-piece) is a subword segmentation algorithm. A sentence can be segmented into multiple word segments. Taking Chinese as an example, the granularity of a word segment lies between character granularity and word granularity.
Step S22: obtaining position embedding information and dialogue embedding information of each word segment.
After a dialogue is segmented into multiple word segments, an overall identifier can be set at the beginning of the sentence to indicate the sentence-start position, and segment break identifiers can be set in the middle and at the end of the dialogue to indicate a conversational role switch and the end of the dialogue. The sentence-start identifier, the word segments and the segment break identifiers are ordered according to the statement order of the dialogue to obtain sequence numbers. These sequence numbers can represent the position embedding information (Position Embedding) of each word segment.
In addition, the dialogue embedding information (Dialogue Embedding) of each word segment may include a role identifier corresponding to the dialogue role to which the word segment belongs.
Step S23: selecting part of the content from the multiple word segments as the pre-training target.
Step S24: masking the content selected from the multiple word segments, to obtain word embedding information.
The content selected as the pre-training target can be an expression with concrete meaning. Masking the selected word segments replaces the content that originally belonged to these word segments with mask identifiers. Finally, the obtained word embedding information (Token Embedding) includes the sentence-start identifier, the mask identifiers, the word segments, the segment break identifiers, and so on.
Step S25: taking the word embedding information, the position embedding information and the dialogue embedding information as the input feature.
The dialogue embedding information obtained from the dialogue data may include conversational roles. For example, a dialogue pair includes "How is the weather tomorrow?" and "Sunny". The conversational role of "How is the weather tomorrow?" is the inquiry role, and the conversational role of "Sunny" is the reply role.
If the pre-training sample is non-dialogue data such as encyclopedia data or news data, no role identifiers need to be set for the word segments obtained by segmentation. Therefore, the input feature of such non-dialogue data may not include dialogue embedding information, and may include only word embedding information and position embedding information. The methods for generating the word embedding information, the position embedding information and the pre-training target of non-dialogue data are similar to those for dialogue data, and are not repeated here.
During model training, the selected pre-training sample set may include several pieces of dialogue data and non-dialogue data. Without manual annotation, the word embedding information, position embedding information, dialogue embedding information, pre-training target and so on can be extracted automatically from the pre-training samples of the pre-training sample set. Therefore, the pre-training of the transformation model in the embodiment of the present invention is an unsupervised task.
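The feature extraction of steps S21 to S25 can be sketched as follows. This is a minimal illustration under assumptions, not the patent's implementation: the [cls]/[sep]/[mask] markers and "I"/"R" role identifiers follow the Fig. 6 convention, while the masking ratio and the random selection of target segments are assumptions added for concreteness.

```python
import random

CLS, SEP, MASK = "[cls]", "[sep]", "[mask]"

def build_input_feature(query_segments, response_segments, mask_ratio=0.3, seed=0):
    """Build word/position/dialogue embedding inputs for one dialogue pair.

    Token layout: [cls] query... [sep] response... [sep]
    Position ids are the sequence numbers 0..n-1; dialogue ids mark the
    inquiry role "I" and the reply role "R". A subset of word segments is
    selected as the pre-training target and replaced by [mask].
    """
    tokens = [CLS] + query_segments + [SEP] + response_segments + [SEP]
    roles = ["I"] * (len(query_segments) + 2) + ["R"] * (len(response_segments) + 1)
    positions = list(range(len(tokens)))

    # Only real word segments may be selected as the pre-training target,
    # never the [cls]/[sep] identifiers.
    candidates = [i for i, t in enumerate(tokens) if t not in (CLS, SEP)]
    rng = random.Random(seed)
    n_mask = max(1, int(len(candidates) * mask_ratio))
    target = {i: tokens[i] for i in rng.sample(candidates, n_mask)}

    word_emb = [MASK if i in target else t for i, t in enumerate(tokens)]
    return word_emb, positions, roles, target
```

The returned `target` dictionary maps masked positions to the original word segments, which is exactly what the losses below compare the model's output against.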
In one embodiment, as shown in Fig. 3, step S13 includes:
Step S31: training the initial transformation model using the input feature, the pre-training target, a dialogue response loss and a masked language model loss, and adjusting initial parameters of the initial transformation model.
Step S32: obtaining the pre-trained transformation model when the dialogue response loss and the masked language model loss no longer decrease.
During model training, when the input feature includes word embedding information, position embedding information and dialogue embedding information, a corresponding output feature can be obtained after the input feature is fed into the model. The dialogue response loss can be obtained by calculating the difference between the output feature and the pre-training target.
When the input feature includes only word embedding information and position embedding information, a corresponding output feature can likewise be obtained after the input feature is fed into the model. The masked language model loss can be obtained by calculating the difference between the output feature and the pre-training target. The dialogue response loss and the masked language model loss may be calculated with reference to cross entropy.
An example of the cross-entropy formula is:
H(p, q) = -Σx q(x)·log p(x)
where H(p, q) denotes the loss, p denotes the probability distribution of the output feature of the model, q denotes the probability distribution of the pre-training target, and x denotes an input word segment.
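As a concrete illustration of this loss (a sketch using the same symbols, with q taken as a one-hot target distribution over the vocabulary of word segments; the epsilon term is an assumption added for numerical safety):

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum over x of q(x) * log p(x).

    p: probability distribution of the model's output feature
    q: probability distribution of the pre-training target
       (one-hot for a single masked word segment)
    """
    return -sum(qx * math.log(px + eps) for px, qx in zip(p, q))
```

For a vocabulary of three word segments where the model assigns probability 0.7 to the true segment, the loss is -log 0.7; a confident correct prediction drives the loss toward 0.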
If the loss is large, the initial parameters of the initial transformation model can be adjusted. There are various parameter adjustment methods, for example the adaptive moment estimation (Adam) optimization algorithm. Adam is a first-order optimization method that can replace the stochastic gradient descent process; it can iteratively update neural network weights based on the training data.
After the parameters have been adjusted a certain number of times, if the loss obtained from the output feature no longer decreases when the input feature of the pre-training sample is again fed into the current transformation model, the parameters of the current transformation model can be taken as the pre-training parameters of the pre-trained transformation model.
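The Adam update mentioned above maintains first- and second-moment estimates of the gradient with bias correction. A minimal, framework-free sketch (the hyperparameter defaults follow the common Adam convention and are assumptions, not values from the patent):

```python
import math

def adam_step(params, grads, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Apply one Adam update to a flat list of parameters.

    state stores the first moment m, the second moment v and the step
    count t; pass an empty dict on the first call.
    """
    if not state:
        state.update(m=[0.0] * len(params), v=[0.0] * len(params), t=0)
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g      # 1st moment
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g  # 2nd moment
        m_hat = state["m"][i] / (1 - b1 ** t)                  # bias correction
        v_hat = state["v"][i] / (1 - b2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params
```

For example, repeatedly applying this step to minimize f(x) = x² (gradient 2x) drives the parameter toward 0.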
In one embodiment, as shown in Fig. 4, step S14 includes:
Step S41: training the pre-trained transformation model using a goal task training sample and a goal task loss, and adjusting the pre-training parameters of the pre-trained transformation model.
Step S42: obtaining the transformation model for the goal task when the goal task loss no longer decreases.
Different goal tasks may use different training samples and losses. For example, a training sample of the Chinese emotion recognition task may include "The workmanship is very beautiful, and my wife loves it", and the loss can be 0.1. For another example, a training sample of the Chinese part-of-speech tagging task may include "Suddenly I saw the short, thin young teacher wearing thick myopia glasses rush onto the stage to give a speech", and the loss can be 0.2. For another example, a training sample of the XNLI task may include "A month has passed since the election, and so-and-so still keeps a high profile", and the loss can be 0.2.
After the input feature and the training target are generated from the goal task training sample, a corresponding output feature can be obtained by feeding the input feature into the pre-trained transformation model. The goal task loss can be obtained by calculating the difference between the output feature and the training target of the goal task. If the loss is large, the pre-training parameters of the pre-trained transformation model can be adjusted, and training then continues. When the goal task loss no longer decreases, the parameters of the current model are taken as the parameters of the transformation model for the goal task.
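The stopping criterion of steps S41 and S42 ("the loss no longer decreases") can be sketched as a generic training loop. The `patience` and `min_delta` tolerances are assumptions added to make "no longer decreases" operational; they are not specified by the patent.

```python
def train_until_loss_stops_decreasing(step_fn, max_steps=1000,
                                      patience=3, min_delta=1e-4):
    """Run training steps until the loss no longer decreases.

    step_fn performs one parameter-adjustment step (e.g. fine-tuning the
    pre-trained transformation model on the goal task) and returns the
    current goal task loss. Training stops after `patience` consecutive
    steps without an improvement of at least `min_delta`.
    """
    best = float("inf")
    steps_without_improvement = 0
    steps_run = 0
    for _ in range(max_steps):
        loss = step_fn()
        steps_run += 1
        if loss < best - min_delta:
            best = loss
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
            if steps_without_improvement >= patience:
                break
    return best, steps_run
```

The same loop shape applies to the pre-training stage of steps S31 and S32, with step_fn returning the combined dialogue response and masked language model loss instead.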
In one application example, the word segment segmentation algorithm (sentence-piece) can generate word segments with a granularity between Chinese character granularity and word granularity. Therefore, a trade-off can be made between the model vocabulary size and the expressive ability of the model. For example, if the word "自然语言" (natural language) is split at character granularity, it is 4 individual characters; if it is split at word granularity, it is one whole word. Extracting the high-frequency segments with sentence-piece yields the two word segments "自然" (nature) and "语言" (language), with a sequence length of 2. To solve the problem of the doubled parallel traffic brought by a large vocabulary, a normalized exponential function (sampled_softmax) can also be introduced during training to accelerate convergence.
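A toy illustration of this granularity: real sentence-piece learns its segment vocabulary from corpus frequency statistics, but given such a vocabulary, greedy longest-match segmentation shows how "自然语言" becomes two word segments rather than four characters or one word. The greedy matching below is a simplification for illustration; it is not the algorithm sentence-piece actually uses.

```python
def segment(text, vocab):
    """Greedy longest-match segmentation into word segments.

    vocab holds the high-frequency segments (here given by hand; the
    real algorithm extracts them from corpus statistics). Characters
    not covered by any segment fall back to character granularity.
    """
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):      # try the longest candidate first
            piece = text[i:j]
            if piece in vocab or j == i + 1:   # fall back to a single character
                out.append(piece)
                i = j
                break
    return out
```

With the vocabulary {"自然", "语言"}, the sentence "自然语言" segments into ["自然", "语言"], i.e. sequence length 2 as described above.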
The training corpus used may include multi-source data knowledge, for example encyclopedia articles, current-affairs news, forum dialogues and so on. Learning from dialogue data is an important channel for semantic representation: queries (Query) with the same reply are often similar. As shown in Fig. 5, the queries "Where were you born?" and "Where is your hometown?" both have the reply "Beijing". Therefore, although these two sentences share few words, they have strong dialogue similarity, i.e. the two are semantically similar. In contrast, the answer to the query "In which year were you born?" is "1990", which is a time rather than a place. Therefore, although "In which year were you born?" and "Where were you born?" have high literal similarity (their edit distance is 1), their actual semantics are far apart, i.e. they are semantically dissimilar.
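The literal similarity referred to above can be measured with the Levenshtein edit distance. A short sketch (the Chinese query strings are the Fig. 5 examples, written out as an assumption about the original wording) confirms that the two queries differ by a single character even though their meanings diverge:

```python
def edit_distance(a, b):
    """Levenshtein distance with a rolling one-row DP table."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i          # prev holds the diagonal cell
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]
```

"你是哪年出生的" (in which year were you born) and "你是哪里出生的" (where were you born) differ only in 年 vs 里, giving distance 1, which is the figure the description cites.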
To portray sentence-level semantic information more realistically, a DLM (Dialogue Language Model) can be used to model the Query-Response dialogue structure. As shown in Fig. 6, a dialogue pair (Dialogue Pair) is used as the input of the model, and the role of each utterance is identified using the dialogue embedding information (Dialogue Embedding). The dialogue response loss (Dialogue Response Loss) and the masked language model loss (Mask LM loss) are used to learn the implicit relationships of the dialogue and improve the semantic representation ability of the model. In one example, the implicit relationships of a dialogue include that, in a continuous dialogue, the sentences said by the same speaker usually have similarity.
Referring to Fig. 6, the dialogue pair includes "你几岁了?" (How old are you?) and "19岁" (19 years old). Segmentation yields the word segments "你", "几", "岁", "了", "?", "19", "岁". The overall identifier [cls] is set at the very front of the dialogue pair to indicate the sentence-start position. Segment break identifiers [sep] are set in the middle and at the end of the dialogue to indicate the conversational role switch and the end of the dialogue. Ordered according to the statement order of the dialogue pair, the sequence numbers of the sentence-start identifier, the word segments and the segment break identifiers are 0 to 9, as shown in Fig. 6. These sequence numbers can represent the position embedding information (Position Embedding) of each word segment.
In addition, the dialogue embedding information (Dialogue Embedding) of each word segment may include a role identifier corresponding to the dialogue role to which the word segment belongs. For example, in Fig. 6 the role identifier of the inquiry statement in the dialogue pair is "I", and the role identifier of the reply statement is "R". Specifically, the role identifier of "你", "几", "岁", "了" and "?" is "I", and the role identifier of "19" and "岁" is "R".
For example, "几", "岁" and "19" are selected as the pre-training target. The content selected as the pre-training target can be an expression with concrete meaning. The selected word segments are masked: the content that originally belonged to these word segments is replaced with the mask identifier [mask]. Finally, the obtained word embedding information (Token Embedding) includes "[cls]", "你", "[mask]", "[mask]", "了", "?", "[sep]", "[mask]", "岁", "[sep]". For the position embedding information and dialogue embedding information corresponding to the sentence-start identifier, mask identifiers, segment break identifiers and word segments after masking, refer to Fig. 6.
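This Fig. 6 example can be reproduced programmatically. Everything below follows the token, position and role sequences just described; masking by the position indices {2, 3, 7} corresponds to selecting "几", "岁" (in the query) and "19" as the pre-training target, so the response's "岁" at position 8 is left unmasked.

```python
CLS, SEP, MASK = "[cls]", "[sep]", "[mask]"

def encode_dialogue_pair(query, response, mask_positions):
    """Produce the three Fig. 6 input sequences for one dialogue pair:
    masked token (word embedding) inputs, position ids, and dialogue-role
    ids ("I" for the inquiry statement, "R" for the reply statement)."""
    tokens = [CLS] + query + [SEP] + response + [SEP]
    positions = list(range(len(tokens)))
    roles = ["I"] * (len(query) + 2) + ["R"] * (len(response) + 1)
    word_emb = [MASK if i in mask_positions else t
                for i, t in enumerate(tokens)]
    return word_emb, positions, roles

word_emb, positions, roles = encode_dialogue_pair(
    ["你", "几", "岁", "了", "?"], ["19", "岁"], mask_positions={2, 3, 7})
```

word_emb comes out as [cls] 你 [mask] [mask] 了 ? [sep] [mask] 岁 [sep], with position ids 0 to 9, matching the sequences described above.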
Adding dialogue data on the basis of relatively written-style corpora such as encyclopedia corpora and current-affairs news can enhance the semantic representation ability for spoken language, expand the application range of the transformation model, and obtain a better semantic representation effect.
Fig. 7 shows a structural block diagram of a transformation model training apparatus according to an embodiment of the present invention. As shown in Fig. 7, the apparatus may include:
an obtaining module 71, configured to obtain a pre-training sample including dialogue data;
a generation module 72, configured to generate an input feature and a pre-training target using the dialogue data;
a first training module 73, configured to train an initial transformation model using the input feature, the pre-training target and a pre-training loss, to obtain a pre-trained transformation model.
In one embodiment, as shown in Fig. 8, the apparatus further comprises:
a second training module 74, configured to train the pre-trained transformation model using a goal task training sample and a goal task loss, to obtain a transformation model for the goal task.
In one embodiment, as shown in Fig. 8, the generation module 72 includes:
a segmentation submodule 721, configured to segment a dialogue in the dialogue data using a word segment segmentation algorithm, to obtain multiple word segments;
an acquisition submodule 722, configured to obtain position embedding information and dialogue embedding information of each word segment;
a selection submodule 723, configured to select part of the content from the multiple word segments as the pre-training target;
a masking submodule 724, configured to mask the content selected from the multiple word segments, to obtain word embedding information;
an input submodule 725, configured to take the word embedding information, the position embedding information and the dialogue embedding information as the input feature.
In one embodiment, the first training module 73 is further configured to train the initial transformation model using the input feature, the pre-training target, a dialogue response loss and a masked language model loss, and to adjust initial parameters of the initial transformation model; and to obtain the pre-trained transformation model when the dialogue response loss and the masked language model loss no longer decrease.
In one embodiment, the second training module 74 is further configured to train the pre-trained transformation model using a goal task training sample and a goal task loss, and to adjust the pre-training parameters of the pre-trained transformation model; and to obtain the transformation model for the goal task when the goal task loss no longer decreases.
For the functions of the modules in the apparatuses of the embodiments of the present invention, refer to the corresponding descriptions in the above method; they are not repeated here.
Fig. 9 shows a structural block diagram of a transformation model training device according to an embodiment of the present invention. As shown in Fig. 9, the device includes a memory 910 and a processor 920, and a computer program runnable on the processor 920 is stored in the memory 910. When the processor 920 executes the computer program, the transformation model training method of the above embodiments is implemented. The number of memories 910 and processors 920 may each be one or more.
The device further includes:
a communication interface 930, configured to communicate with external devices for data interaction.
The memory 910 may include a high-speed RAM memory, and may also include a non-volatile memory, for example at least one magnetic disk memory.
If the memory 910, the processor 920, and the communication interface 930 are implemented independently, they may be connected to each other through a bus and communicate with one another. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 9, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 910, the processor 920, and the communication interface 930 are integrated on one chip, they may communicate with one another through internal interfaces.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the methods in the above embodiments.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine the features of the different embodiments or examples described in this specification.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance, or as implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means two or more, unless otherwise clearly and specifically defined.
Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. Moreover, the scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in flowcharts or otherwise described herein may be considered, for example, an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transmit the program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) with one or more wirings, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The above integrated module may be implemented in the form of hardware, or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the art can easily conceive of various changes or replacements within the technical scope disclosed by the present invention, and these shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (12)
1. A transformation model training method, characterized by comprising:
obtaining a pre-training sample including dialogue data;
generating an input feature and a pre-training target using the dialogue data;
training an initial transformation model using the input feature, the pre-training target, and a pre-training loss to obtain a pre-training transformation model.
2. The method according to claim 1, characterized by further comprising:
training the pre-training transformation model using a goal task training sample and a goal task loss to obtain a transformation model of the goal task.
3. The method according to claim 1, characterized in that generating the input feature and the pre-training target using the dialogue data comprises:
segmenting the dialogues in the dialogue data using a word segment segmentation algorithm to obtain a plurality of word segments;
obtaining position embedding information and dialogue embedding information of each of the word segments;
selecting part of the content from the plurality of word segments as the pre-training target;
masking the content selected from the plurality of word segments to obtain word embedding information;
using the word embedding information, the position embedding information, and the dialogue embedding information as the input feature.
4. The method according to claim 3, characterized in that training the initial transformation model using the input feature, the pre-training target, and the pre-training loss to obtain the pre-training transformation model comprises:
training the initial transformation model using the input feature, the pre-training target, a dialogue reply loss, and a masked language model loss, and adjusting initial parameters of the initial transformation model;
obtaining the pre-training transformation model when neither the dialogue reply loss nor the masked language model loss decreases any further.
5. The method according to claim 2, characterized in that training the pre-training transformation model using the goal task training sample and the goal task loss to obtain the transformation model of the goal task comprises:
training the pre-training transformation model using the goal task training sample and the goal task loss, and adjusting pre-training parameters of the pre-training transformation model;
obtaining the transformation model of the goal task when the goal task loss no longer decreases.
6. A transformation model training device, characterized by comprising:
an obtaining module, configured to obtain a pre-training sample including dialogue data;
a generation module, configured to generate an input feature and a pre-training target using the dialogue data;
a first training module, configured to train an initial transformation model using the input feature, the pre-training target, and a pre-training loss to obtain a pre-training transformation model.
7. The device according to claim 6, characterized by further comprising:
a second training module, configured to train the pre-training transformation model using a goal task training sample and a goal task loss to obtain a transformation model of the goal task.
8. The device according to claim 6, characterized in that the generation module comprises:
a segmentation submodule, configured to segment the dialogues in the dialogue data using a word segment segmentation algorithm to obtain a plurality of word segments;
an acquisition submodule, configured to obtain position embedding information and dialogue embedding information of each of the word segments;
a selection submodule, configured to select part of the content from the plurality of word segments as the pre-training target;
a masking submodule, configured to mask the content selected from the plurality of word segments to obtain word embedding information;
an input submodule, configured to use the word embedding information, the position embedding information, and the dialogue embedding information as the input feature.
9. The device according to claim 8, characterized in that the first training module is further configured to train the initial transformation model using the input feature, the pre-training target, a dialogue reply loss, and a masked language model loss, and to adjust initial parameters of the initial transformation model; and to obtain the pre-training transformation model when neither the dialogue reply loss nor the masked language model loss decreases any further.
10. The device according to claim 7, characterized in that the second training module is further configured to train the pre-training transformation model using a goal task training sample and a goal task loss, and to adjust pre-training parameters of the pre-training transformation model; and to obtain the transformation model of the goal task when the goal task loss no longer decreases.
11. A transformation model training equipment, characterized by comprising:
one or more processors;
a storage device, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 5.
12. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910498146.6A CN110197279B (en) | 2019-06-10 | 2019-06-10 | Transformation model training method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110197279A true CN110197279A (en) | 2019-09-03 |
CN110197279B CN110197279B (en) | 2021-01-29 |
Family
ID=67754344
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910498146.6A Active CN110197279B (en) | 2019-06-10 | 2019-06-10 | Transformation model training method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110197279B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111209740A (en) * | 2019-12-31 | 2020-05-29 | 中移(杭州)信息技术有限公司 | Text model training method, text error correction method, electronic device and storage medium |
CN111209383A (en) * | 2020-01-06 | 2020-05-29 | 广州小鹏汽车科技有限公司 | Method and device for processing multi-turn dialogue, vehicle, and storage medium |
CN111709248A (en) * | 2020-05-28 | 2020-09-25 | 北京百度网讯科技有限公司 | Training method and device of text generation model and electronic equipment |
CN112508093A (en) * | 2020-12-03 | 2021-03-16 | 北京百度网讯科技有限公司 | Self-training method and device, electronic equipment and readable storage medium |
CN112883180A (en) * | 2021-02-24 | 2021-06-01 | 挂号网(杭州)科技有限公司 | Model training method and device, electronic equipment and storage medium |
EP3835996A1 (en) * | 2019-12-12 | 2021-06-16 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method, apparatus, electronic device and storage medium for processing a semantic representation model |
CN113111665A (en) * | 2021-04-16 | 2021-07-13 | 清华大学 | Personalized dialogue rewriting method and device |
CN113378583A (en) * | 2021-07-15 | 2021-09-10 | 北京小米移动软件有限公司 | Dialogue reply method and device, dialogue model training method and device, and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016081289A1 (en) * | 2014-11-18 | 2016-05-26 | Merck Sharp & Dohme Corp. | Process for producing recombinant trypsin |
CN107944027A (en) * | 2017-12-12 | 2018-04-20 | 苏州思必驰信息科技有限公司 | Create the method and system of semantic key index |
CN108509411A (en) * | 2017-10-10 | 2018-09-07 | 腾讯科技(深圳)有限公司 | Semantic analysis and device |
CN108536735A (en) * | 2018-03-05 | 2018-09-14 | 中国科学院自动化研究所 | Multi-modal lexical representation method and system based on multichannel self-encoding encoder |
CN108829685A (en) * | 2018-05-07 | 2018-11-16 | 内蒙古工业大学 | A kind of illiteracy Chinese inter-translation method based on single language training |
RU2678716C1 (en) * | 2017-12-11 | 2019-01-31 | Общество с ограниченной ответственностью "Аби Продакшн" | Use of autoencoders for learning text classifiers in natural language |
CN109346084A (en) * | 2018-09-19 | 2019-02-15 | 湖北工业大学 | Method for distinguishing speek person based on depth storehouse autoencoder network |
CN109829299A (en) * | 2018-11-29 | 2019-05-31 | 电子科技大学 | A kind of unknown attack recognition methods based on depth self-encoding encoder |
Non-Patent Citations (1)
Title |
---|
Zhou Yongzhang et al., "Big Data Mining and Machine Learning in Geosciences" (《地球科学大数据挖掘与机器学习》), 30 September 2018 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110197279A (en) | Transformation model training method, device, equipment and storage medium | |
CN105244020B (en) | Prosodic hierarchy model training method, text-to-speech method and text-to-speech device | |
CN110377716A (en) | Exchange method, device and the computer readable storage medium of dialogue | |
CN107818164A (en) | A kind of intelligent answer method and its system | |
CN110334354A (en) | A kind of Chinese Relation abstracting method | |
CN107992597A (en) | A kind of text structure method towards electric network fault case | |
Yu et al. | Sequential labeling using deep-structured conditional random fields | |
CN106571139B (en) | Phonetic search result processing method and device based on artificial intelligence | |
CN108829678A (en) | Name entity recognition method in a kind of Chinese international education field | |
CN107195295A (en) | Audio recognition method and device based on Chinese and English mixing dictionary | |
CN107301860A (en) | Audio recognition method and device based on Chinese and English mixing dictionary | |
CN107577662A (en) | Towards the semantic understanding system and method for Chinese text | |
CN103810999A (en) | Linguistic model training method and system based on distributed neural networks | |
CN107391614A (en) | A kind of Chinese question and answer matching process based on WMD | |
CN110263325A (en) | Chinese automatic word-cut | |
CN106844341A (en) | News in brief extracting method and device based on artificial intelligence | |
CN107977363A (en) | Title generation method, device and electronic equipment | |
CN111489746B (en) | Power grid dispatching voice recognition language model construction method based on BERT | |
CN110852040B (en) | Punctuation prediction model training method and text punctuation determination method | |
CN107247751A (en) | Content recommendation method based on LDA topic models | |
CN111191445A (en) | Advertisement text classification method and device | |
CN111742322A (en) | System and method for domain and language independent definition extraction using deep neural networks | |
CN110399472A (en) | Reminding method, device, computer equipment and storage medium are putd question in interview | |
CN113239666A (en) | Text similarity calculation method and system | |
CN108846125A (en) | Talk with generation method, device, terminal and computer readable storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |