CN110197279A - Transformation model training method, device, equipment and storage medium - Google Patents
- Publication number: CN110197279A
- Application number: CN201910498146.6A
- Authority
- CN
- China
- Prior art keywords
- training
- transformation model
- dialogue
- loss
- goal task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F40/00—Handling natural language data > G06F40/20—Natural language analysis
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F40/00—Handling natural language data > G06F40/20—Natural language analysis > G06F40/279—Recognition of textual entities > G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F40/00—Handling natural language data > G06F40/30—Semantic analysis
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00—Computing arrangements based on biological models > G06N3/02—Neural networks > G06N3/08—Learning methods > G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
An embodiment of the present invention provides a transformation model training method, apparatus, device and storage medium. The transformation model training method includes: obtaining a pre-training sample including dialogue data; generating an input feature and a pre-training target using the dialogue data; and training an initial transformation model using the input feature, the pre-training target and a pre-training loss, to obtain a pre-trained transformation model. In the embodiment of the present invention, training the transformation model on dialogue data can improve the prediction accuracy of the transformation model in semantic representation. Moreover, the initial transformation model is first trained on dialogue data to obtain the pre-trained transformation model, and the pre-trained transformation model is then used to train the transformation model required by a concrete application scene, which can improve the convergence rate of transformation model training.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a transformation model training method, apparatus, device and storage medium.
Background technique
Many machine learning tasks need supervised data, and a small amount of supervised data cannot satisfy the training demands of current large-scale deep learning models. Moreover, manually annotated supervised data may contain noise, for example classification standards that vary because of personal factors.
With a transformation (transformer) model as the network structure, an unsupervised task can be trained using data without manual annotation. However, at present the training process of the transformer model converges slowly and is time-consuming, and the prediction accuracy of the model remains to be improved.
Summary of the invention
Embodiments of the present invention provide a transformation model training method, apparatus, device and storage medium, to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a transformation model training method, comprising:
obtaining a pre-training sample including dialogue data;
generating an input feature and a pre-training target using the dialogue data;
training an initial transformation model using the input feature, the pre-training target and a pre-training loss, to obtain a pre-trained transformation model.
In one embodiment, the method further comprises:
training the pre-trained transformation model using a goal task training sample and a goal task loss, to obtain a transformation model for the goal task.
In one embodiment, generating the input feature and the pre-training target using the dialogue data comprises:
segmenting a dialogue in the dialogue data using a word segment segmentation algorithm, to obtain multiple word segments;
obtaining position embedding information and dialogue embedding information of each word segment;
selecting part of the content from the multiple word segments as the pre-training target;
masking the content selected from the multiple word segments, to obtain word embedding information;
taking the word embedding information, the position embedding information and the dialogue embedding information as the input feature.
In one embodiment, training the initial transformation model using the input feature, the pre-training target and the pre-training loss, to obtain the pre-trained transformation model, comprises:
training the initial transformation model using the input feature, the pre-training target, a dialogue response loss and a masked language model loss, and adjusting initial parameters of the initial transformation model;
obtaining the pre-trained transformation model when the dialogue response loss and the masked language model loss no longer decrease.
In one embodiment, training the pre-trained transformation model using the goal task training sample and the goal task loss, to obtain the transformation model for the goal task, comprises:
training the pre-trained transformation model using the goal task training sample and the goal task loss, and adjusting pre-training parameters of the pre-trained transformation model;
obtaining the transformation model for the goal task when the goal task loss no longer decreases.
In a second aspect, an embodiment of the present invention provides a transformation model training apparatus, comprising:
an obtaining module, configured to obtain a pre-training sample including dialogue data;
a generation module, configured to generate an input feature and a pre-training target using the dialogue data;
a first training module, configured to train an initial transformation model using the input feature, the pre-training target and a pre-training loss, to obtain a pre-trained transformation model.
In one embodiment, the apparatus further comprises:
a second training module, configured to train the pre-trained transformation model using a goal task training sample and a goal task loss, to obtain a transformation model for the goal task.
In one embodiment, the generation module comprises:
a segmentation submodule, configured to segment a dialogue in the dialogue data using a word segment segmentation algorithm, to obtain multiple word segments;
an acquisition submodule, configured to obtain position embedding information and dialogue embedding information of each word segment;
a selection submodule, configured to select part of the content from the multiple word segments as the pre-training target;
a masking submodule, configured to mask the content selected from the multiple word segments, to obtain word embedding information;
an input submodule, configured to take the word embedding information, the position embedding information and the dialogue embedding information as the input feature.
In one embodiment, the first training module is further configured to train the initial transformation model using the input feature, the pre-training target, a dialogue response loss and a masked language model loss, and to adjust initial parameters of the initial transformation model; and to obtain the pre-trained transformation model when the dialogue response loss and the masked language model loss no longer decrease.
In one embodiment, the second training module is further configured to train the pre-trained transformation model using a goal task training sample and a goal task loss, and to adjust the pre-training parameters of the pre-trained transformation model; and to obtain the transformation model for the goal task when the goal task loss no longer decreases.
In a third aspect, an embodiment of the present invention provides a transformation model training device. The functions of the device may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In a possible design, the device includes a processor and a memory. The memory is configured to store a program that supports the device in executing the above transformation model training method, and the processor is configured to execute the program stored in the memory. The device may further include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions used by a transformation model training device, including a program for executing the above transformation model training method.
One of the above technical solutions has the following advantages or beneficial effects: training the transformation model on dialogue data can improve the prediction accuracy of the transformation model in semantic representation, especially for colloquial expressions. Moreover, training the initial transformation model on dialogue data yields a pre-trained transformation model of an intermediate state; when this pre-trained transformation model is subsequently used to train the transformation model required by a concrete application scene, the convergence rate of transformation model training can be improved.
The above summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments and features described above, further aspects, embodiments and features of the present invention will be readily apparent by reference to the drawings and the following detailed description.
Detailed description of the invention
In the drawings, unless otherwise specified, identical reference numerals denote identical or similar components or elements throughout the several figures. The figures are not necessarily drawn to scale. It should be understood that these figures depict only some embodiments disclosed according to the present invention and should not be regarded as limiting the scope of the present invention.
Fig. 1 shows a flowchart of a transformation model training method according to an embodiment of the present invention.
Fig. 2 shows a flowchart of a transformation model training method according to an embodiment of the present invention.
Fig. 3 shows a flowchart of a transformation model training method according to an embodiment of the present invention.
Fig. 4 shows a flowchart of a transformation model training method according to an embodiment of the present invention.
Fig. 5 shows an example diagram of semantic similarity in a transformation model training method according to an embodiment of the present invention.
Fig. 6 shows a schematic diagram of an application example of a transformation model training method according to an embodiment of the present invention.
Fig. 7 shows a structural block diagram of a transformation model training apparatus according to an embodiment of the present invention.
Fig. 8 shows a structural block diagram of a transformation model training apparatus according to an embodiment of the present invention.
Fig. 9 shows a structural block diagram of a transformation model training device according to an embodiment of the present invention.
Specific embodiment
Hereinafter, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature rather than restrictive.
Fig. 1 shows a flowchart of a transformation model training method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S11: obtaining a pre-training sample including dialogue data.
Step S12: generating an input feature and a pre-training target using the dialogue data.
Step S13: training an initial transformation model using the input feature, the pre-training target and a pre-training loss, to obtain a pre-trained transformation model.
In one example, the transformation (transformer) model may include an encoder architecture. The encoder may include a self-attention layer and a feed-forward neural network. The self-attention layer can, while attending to the current word, also obtain the semantics of the current word in its context.
In this embodiment, the training corpus used, i.e. the pre-training sample, may include multi-source data knowledge, including dialogue data, encyclopedia data, news data and the like. The encyclopedia data may include encyclopedia articles obtained from various encyclopedia webpages. The news data may include current-affairs news obtained from various news webpages. The dialogue data may include conversational data obtained from various forum webpages. The corpus can be crawled from webpages by means such as web crawlers.
In the embodiment of the present invention, training the transformation model on dialogue data can improve the prediction accuracy of the transformation model in semantic representation, especially for colloquial expressions. Moreover, training the initial transformation model on dialogue data yields a pre-trained transformation model of an intermediate state; when this pre-trained transformation model is subsequently used to train the transformation model required by a concrete application scene, the convergence rate of transformation model training can be improved.
In one embodiment, as shown in Fig. 2, the method further includes:
Step S14: training the pre-trained transformation model using a goal task training sample and a goal task loss, to obtain a transformation model for the goal task.
In one example, the transformation model can be adapted to several scenes, so there can be a plurality of goal tasks, for example a Chinese emotion recognition task, a Chinese part-of-speech tagging task, an XNLI (natural language inference) task, and the like.
In one embodiment, as shown in Fig. 3, step S12 includes:
Step S21: segmenting a dialogue in the dialogue data using a word segment segmentation algorithm, to obtain multiple word segments.
The word segment segmentation algorithm (sentence-piece) is a subword segmentation algorithm. A sentence can be segmented into multiple word segments. Taking Chinese as an example, the granularity of a word segment lies between character granularity and word granularity.
Step S22: obtaining position embedding information and dialogue embedding information of each word segment.
After a dialogue is segmented into multiple word segments, an overall identifier can be set at the beginning of the sentence to indicate the sentence-start position, and segment break identifiers can be set in the middle and at the end of the dialogue to indicate a conversational role switch and the end of the dialogue. The sentence-start identifier, the word segments and the segment break identifiers are ordered according to the statement order of the dialogue to obtain sequence numbers. These sequence numbers can represent the position embedding information (Position Embedding) of each word segment.
In addition, the dialogue embedding information (Dialogue Embedding) of each word segment may include a role identifier corresponding to the dialogue role to which the word segment belongs.
Step S23: selecting part of the content from the multiple word segments as the pre-training target.
Step S24: masking the content selected from the multiple word segments, to obtain word embedding information.
The content selected as the pre-training target can be an expression with concrete meaning. Masking the selected word segments replaces the content that originally belonged to these word segments with mask identifiers. Finally, the obtained word embedding information (Token Embedding) includes the sentence-start identifier, the mask identifiers, the word segments, the segment break identifiers, and so on.
Step S25: taking the word embedding information, the position embedding information and the dialogue embedding information as the input feature.
The dialogue embedding information obtained from the dialogue data may include conversational roles. For example, a dialogue pair includes "How is the weather tomorrow?" and "Sunny". The conversational role of "How is the weather tomorrow?" is the inquiry role, and the conversational role of "Sunny" is the reply role.
If the pre-training sample is non-dialogue data such as encyclopedia data or news data, no role identifiers need to be set for the word segments obtained by segmentation. Therefore, the input feature of such non-dialogue data may not include dialogue embedding information, and may include only word embedding information and position embedding information. The methods for generating the word embedding information, the position embedding information and the pre-training target of non-dialogue data are similar to those for dialogue data, and are not repeated here.
During model training, the selected pre-training sample set may include several pieces of dialogue data and non-dialogue data. Without manual annotation, the word embedding information, position embedding information, dialogue embedding information, pre-training target and so on can be extracted automatically from the pre-training samples of the pre-training sample set. Therefore, the pre-training of the transformation model in the embodiment of the present invention is an unsupervised task.
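The feature extraction of steps S21 to S25 can be sketched as follows. This is a minimal illustration under assumptions, not the patent's implementation: the [cls]/[sep]/[mask] markers and "I"/"R" role identifiers follow the Fig. 6 convention, while the masking ratio and the random selection of target segments are assumptions added for concreteness.

```python
import random

CLS, SEP, MASK = "[cls]", "[sep]", "[mask]"

def build_input_feature(query_segments, response_segments, mask_ratio=0.3, seed=0):
    """Build word/position/dialogue embedding inputs for one dialogue pair.

    Token layout: [cls] query... [sep] response... [sep]
    Position ids are the sequence numbers 0..n-1; dialogue ids mark the
    inquiry role "I" and the reply role "R". A subset of word segments is
    selected as the pre-training target and replaced by [mask].
    """
    tokens = [CLS] + query_segments + [SEP] + response_segments + [SEP]
    roles = ["I"] * (len(query_segments) + 2) + ["R"] * (len(response_segments) + 1)
    positions = list(range(len(tokens)))

    # Only real word segments may be selected as the pre-training target,
    # never the [cls]/[sep] identifiers.
    candidates = [i for i, t in enumerate(tokens) if t not in (CLS, SEP)]
    rng = random.Random(seed)
    n_mask = max(1, int(len(candidates) * mask_ratio))
    target = {i: tokens[i] for i in rng.sample(candidates, n_mask)}

    word_emb = [MASK if i in target else t for i, t in enumerate(tokens)]
    return word_emb, positions, roles, target
```

The returned `target` dictionary maps masked positions to the original word segments, which is exactly what the losses below compare the model's output against.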
In one embodiment, as shown in Fig. 3, step S13 includes:
Step S31: training the initial transformation model using the input feature, the pre-training target, a dialogue response loss and a masked language model loss, and adjusting initial parameters of the initial transformation model.
Step S32: obtaining the pre-trained transformation model when the dialogue response loss and the masked language model loss no longer decrease.
During model training, when the input feature includes word embedding information, position embedding information and dialogue embedding information, a corresponding output feature can be obtained after the input feature is fed into the model. The dialogue response loss can be obtained by calculating the difference between the output feature and the pre-training target.
When the input feature includes only word embedding information and position embedding information, a corresponding output feature can likewise be obtained after the input feature is fed into the model. The masked language model loss can be obtained by calculating the difference between the output feature and the pre-training target. The dialogue response loss and the masked language model loss may be calculated with reference to cross entropy.
An example of the cross-entropy formula is:
H(p, q) = -Σx q(x)·log p(x)
where H(p, q) denotes the loss, p denotes the probability distribution of the output feature of the model, q denotes the probability distribution of the pre-training target, and x denotes an input word segment.
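As a concrete illustration of this loss (a sketch using the same symbols, with q taken as a one-hot target distribution over the vocabulary of word segments; the epsilon term is an assumption added for numerical safety):

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum over x of q(x) * log p(x).

    p: probability distribution of the model's output feature
    q: probability distribution of the pre-training target
       (one-hot for a single masked word segment)
    """
    return -sum(qx * math.log(px + eps) for px, qx in zip(p, q))
```

For a vocabulary of three word segments where the model assigns probability 0.7 to the true segment, the loss is -log 0.7; a confident correct prediction drives the loss toward 0.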
If the loss is large, the initial parameters of the initial transformation model can be adjusted. There are various parameter adjustment methods, for example the adaptive moment estimation (Adam) optimization algorithm. Adam is a first-order optimization method that can replace the stochastic gradient descent process; it can iteratively update neural network weights based on the training data.
After the parameters have been adjusted a certain number of times, if the loss obtained from the output feature no longer decreases when the input feature of the pre-training sample is again fed into the current transformation model, the parameters of the current transformation model can be taken as the pre-training parameters of the pre-trained transformation model.
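The Adam update mentioned above maintains first- and second-moment estimates of the gradient with bias correction. A minimal, framework-free sketch (the hyperparameter defaults follow the common Adam convention and are assumptions, not values from the patent):

```python
import math

def adam_step(params, grads, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Apply one Adam update to a flat list of parameters.

    state stores the first moment m, the second moment v and the step
    count t; pass an empty dict on the first call.
    """
    if not state:
        state.update(m=[0.0] * len(params), v=[0.0] * len(params), t=0)
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g      # 1st moment
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g  # 2nd moment
        m_hat = state["m"][i] / (1 - b1 ** t)                  # bias correction
        v_hat = state["v"][i] / (1 - b2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params
```

For example, repeatedly applying this step to minimize f(x) = x² (gradient 2x) drives the parameter toward 0.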
In one embodiment, as shown in Fig. 4, step S14 includes:
Step S41: training the pre-trained transformation model using a goal task training sample and a goal task loss, and adjusting the pre-training parameters of the pre-trained transformation model.
Step S42: obtaining the transformation model for the goal task when the goal task loss no longer decreases.
Different goal tasks may use different training samples and losses. For example, a training sample of the Chinese emotion recognition task may include "The workmanship is very beautiful, and my wife loves it", and the loss can be 0.1. For another example, a training sample of the Chinese part-of-speech tagging task may include "Suddenly I saw the short, thin young teacher wearing thick myopia glasses rush onto the stage to give a speech", and the loss can be 0.2. For another example, a training sample of the XNLI task may include "A month has passed since the election, and so-and-so still keeps a high profile", and the loss can be 0.2.
After the input feature and the training target are generated from the goal task training sample, a corresponding output feature can be obtained by feeding the input feature into the pre-trained transformation model. The goal task loss can be obtained by calculating the difference between the output feature and the training target of the goal task. If the loss is large, the pre-training parameters of the pre-trained transformation model can be adjusted, and training then continues. When the goal task loss no longer decreases, the parameters of the current model are taken as the parameters of the transformation model for the goal task.
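The stopping criterion of steps S41 and S42 ("the loss no longer decreases") can be sketched as a generic training loop. The `patience` and `min_delta` tolerances are assumptions added to make "no longer decreases" operational; they are not specified by the patent.

```python
def train_until_loss_stops_decreasing(step_fn, max_steps=1000,
                                      patience=3, min_delta=1e-4):
    """Run training steps until the loss no longer decreases.

    step_fn performs one parameter-adjustment step (e.g. fine-tuning the
    pre-trained transformation model on the goal task) and returns the
    current goal task loss. Training stops after `patience` consecutive
    steps without an improvement of at least `min_delta`.
    """
    best = float("inf")
    steps_without_improvement = 0
    steps_run = 0
    for _ in range(max_steps):
        loss = step_fn()
        steps_run += 1
        if loss < best - min_delta:
            best = loss
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
            if steps_without_improvement >= patience:
                break
    return best, steps_run
```

The same loop shape applies to the pre-training stage of steps S31 and S32, with step_fn returning the combined dialogue response and masked language model loss instead.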
In one application example, the word segment segmentation algorithm (sentence-piece) can generate word segments with a granularity between Chinese character granularity and word granularity. Therefore, a trade-off can be made between the model vocabulary size and the expressive ability of the model. For example, if the word "自然语言" (natural language) is split at character granularity, it is 4 individual characters; if it is split at word granularity, it is one whole word. Extracting the high-frequency segments with sentence-piece yields the two word segments "自然" (nature) and "语言" (language), with a sequence length of 2. To solve the problem of the doubled parallel traffic brought by a large vocabulary, a normalized exponential function (sampled_softmax) can also be introduced during training to accelerate convergence.
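A toy illustration of this granularity: real sentence-piece learns its segment vocabulary from corpus frequency statistics, but given such a vocabulary, greedy longest-match segmentation shows how "自然语言" becomes two word segments rather than four characters or one word. The greedy matching below is a simplification for illustration; it is not the algorithm sentence-piece actually uses.

```python
def segment(text, vocab):
    """Greedy longest-match segmentation into word segments.

    vocab holds the high-frequency segments (here given by hand; the
    real algorithm extracts them from corpus statistics). Characters
    not covered by any segment fall back to character granularity.
    """
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):      # try the longest candidate first
            piece = text[i:j]
            if piece in vocab or j == i + 1:   # fall back to a single character
                out.append(piece)
                i = j
                break
    return out
```

With the vocabulary {"自然", "语言"}, the sentence "自然语言" segments into ["自然", "语言"], i.e. sequence length 2 as described above.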
The training corpus used may include multi-source data knowledge, for example encyclopedia articles, current-affairs news, forum dialogues and so on. Learning from dialogue data is an important channel for semantic representation: queries (Query) with the same reply are often similar. As shown in Fig. 5, the queries "Where were you born?" and "Where is your hometown?" both have the reply "Beijing". Therefore, although these two sentences share few words, they have strong dialogue similarity, i.e. the two are semantically similar. In contrast, the answer to the query "In which year were you born?" is "1990", which is a time rather than a place. Therefore, although "In which year were you born?" and "Where were you born?" have high literal similarity (their edit distance is 1), their actual semantics are far apart, i.e. they are semantically dissimilar.
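The literal similarity referred to above can be measured with the Levenshtein edit distance. A short sketch (the Chinese query strings are the Fig. 5 examples, written out as an assumption about the original wording) confirms that the two queries differ by a single character even though their meanings diverge:

```python
def edit_distance(a, b):
    """Levenshtein distance with a rolling one-row DP table."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i          # prev holds the diagonal cell
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]
```

"你是哪年出生的" (in which year were you born) and "你是哪里出生的" (where were you born) differ only in 年 vs 里, giving distance 1, which is the figure the description cites.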
To portray sentence-level semantic information more realistically, a DLM (Dialogue Language Model) can be used to model the Query-Response dialogue structure. As shown in Fig. 6, a dialogue pair (Dialogue Pair) is used as the input of the model, and the role of each utterance is identified using the dialogue embedding information (Dialogue Embedding). The dialogue response loss (Dialogue Response Loss) and the masked language model loss (Mask LM loss) are used to learn the implicit relationships of the dialogue and improve the semantic representation ability of the model. In one example, the implicit relationships of a dialogue include that, in a continuous dialogue, the sentences said by the same speaker usually have similarity.
Referring to Fig. 6, the dialogue pair includes "你几岁了?" (How old are you?) and "19岁" (19 years old). Segmentation yields the word segments "你", "几", "岁", "了", "?", "19", "岁". The overall identifier [cls] is set at the very front of the dialogue pair to indicate the sentence-start position. Segment break identifiers [sep] are set in the middle and at the end of the dialogue to indicate the conversational role switch and the end of the dialogue. Ordered according to the statement order of the dialogue pair, the sequence numbers of the sentence-start identifier, the word segments and the segment break identifiers are 0 to 9, as shown in Fig. 6. These sequence numbers can represent the position embedding information (Position Embedding) of each word segment.
In addition, the dialogue embedding information (Dialogue Embedding) of each word segment may include a role identifier corresponding to the dialogue role to which the word segment belongs. For example, in Fig. 6 the role identifier of the inquiry statement in the dialogue pair is "I", and the role identifier of the reply statement is "R". Specifically, the role identifier of "你", "几", "岁", "了" and "?" is "I", and the role identifier of "19" and "岁" is "R".
For example, "几", "岁" and "19" are selected as the pre-training target. The content selected as the pre-training target can be an expression with concrete meaning. The selected word segments are masked: the content that originally belonged to these word segments is replaced with the mask identifier [mask]. Finally, the obtained word embedding information (Token Embedding) includes "[cls]", "你", "[mask]", "[mask]", "了", "?", "[sep]", "[mask]", "岁", "[sep]". For the position embedding information and dialogue embedding information corresponding to the sentence-start identifier, mask identifiers, segment break identifiers and word segments after masking, refer to Fig. 6.
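This Fig. 6 example can be reproduced programmatically. Everything below follows the token, position and role sequences just described; masking by the position indices {2, 3, 7} corresponds to selecting "几", "岁" (in the query) and "19" as the pre-training target, so the response's "岁" at position 8 is left unmasked.

```python
CLS, SEP, MASK = "[cls]", "[sep]", "[mask]"

def encode_dialogue_pair(query, response, mask_positions):
    """Produce the three Fig. 6 input sequences for one dialogue pair:
    masked token (word embedding) inputs, position ids, and dialogue-role
    ids ("I" for the inquiry statement, "R" for the reply statement)."""
    tokens = [CLS] + query + [SEP] + response + [SEP]
    positions = list(range(len(tokens)))
    roles = ["I"] * (len(query) + 2) + ["R"] * (len(response) + 1)
    word_emb = [MASK if i in mask_positions else t
                for i, t in enumerate(tokens)]
    return word_emb, positions, roles

word_emb, positions, roles = encode_dialogue_pair(
    ["你", "几", "岁", "了", "?"], ["19", "岁"], mask_positions={2, 3, 7})
```

word_emb comes out as [cls] 你 [mask] [mask] 了 ? [sep] [mask] 岁 [sep], with position ids 0 to 9, matching the sequences described above.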
Adding dialogue data on the basis of relatively written-style corpora such as encyclopedia corpora and current-affairs news can enhance the semantic representation ability for spoken language, expand the application range of the transformation model, and obtain a better semantic representation effect.
Fig. 7 shows a structural block diagram of a transformation model training apparatus according to an embodiment of the present invention. As shown in Fig. 7, the apparatus may include:
an obtaining module 71, configured to obtain a pre-training sample including dialogue data;
a generation module 72, configured to generate an input feature and a pre-training target using the dialogue data;
a first training module 73, configured to train an initial transformation model using the input feature, the pre-training target and a pre-training loss, to obtain a pre-trained transformation model.
In one embodiment, as shown in Fig. 8, the apparatus further comprises:
a second training module 74, configured to train the pre-trained transformation model using a goal task training sample and a goal task loss, to obtain a transformation model for the goal task.
In one embodiment, as shown in Fig. 8, the generation module 72 includes:
a segmentation submodule 721, configured to segment a dialogue in the dialogue data using a word segment segmentation algorithm, to obtain multiple word segments;
an acquisition submodule 722, configured to obtain position embedding information and dialogue embedding information of each word segment;
a selection submodule 723, configured to select part of the content from the multiple word segments as the pre-training target;
a masking submodule 724, configured to mask the content selected from the multiple word segments, to obtain word embedding information;
an input submodule 725, configured to take the word embedding information, the position embedding information and the dialogue embedding information as the input feature.
In one embodiment, the first training module 73 is further configured to train the initial transformation model using the input feature, the pre-training target, a dialogue response loss and a masked language model loss, and to adjust initial parameters of the initial transformation model; and to obtain the pre-trained transformation model when the dialogue response loss and the masked language model loss no longer decrease.
In one embodiment, the second training module 74 is further configured to train the pre-trained transformation model using a goal task training sample and a goal task loss, and to adjust the pre-training parameters of the pre-trained transformation model; and to obtain the transformation model for the goal task when the goal task loss no longer decreases.
For the functions of the modules in the apparatuses of the embodiments of the present invention, refer to the corresponding descriptions in the above method; they are not repeated here.
Fig. 9 shows a structural block diagram of a transformation model training device according to an embodiment of the present invention. As shown in Fig. 9, the device includes a memory 910 and a processor 920, and a computer program runnable on the processor 920 is stored in the memory 910. When the processor 920 executes the computer program, the transformation model training method of the above embodiments is implemented. The number of memories 910 and processors 920 may each be one or more.
The device further includes:
a communication interface 930, configured to communicate with external devices for data interaction.
The memory 910 may include a high-speed RAM memory, and may also include a non-volatile memory, for example at least one magnetic disk memory.
If the memory 910, the processor 920, and the communication interface 930 are implemented independently, they may be connected to each other through a bus and communicate with one another. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 9, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 910, the processor 920, and the communication interface 930 are integrated on one chip, they may communicate with one another through internal interfaces.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the methods in the above embodiments.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine the features of the different embodiments or examples described in this specification.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance, or as implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means two or more, unless otherwise clearly and specifically defined.
Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. Moreover, the scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in flowcharts or otherwise described herein may be considered, for example, an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transmit the program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) with one or more wirings, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The above integrated module may be implemented in the form of hardware, or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the art can easily conceive of various changes or replacements within the technical scope disclosed by the present invention, and these shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (12)
1. A transformation model training method, characterized by comprising:
obtaining a pre-training sample including dialogue data;
generating an input feature and a pre-training target using the dialogue data;
training an initial transformation model using the input feature, the pre-training target, and a pre-training loss to obtain a pre-training transformation model.
2. The method according to claim 1, characterized by further comprising:
training the pre-training transformation model using a goal task training sample and a goal task loss to obtain a transformation model of the goal task.
3. The method according to claim 1, characterized in that generating the input feature and the pre-training target using the dialogue data comprises:
segmenting the dialogues in the dialogue data using a word segment segmentation algorithm to obtain a plurality of word segments;
obtaining position embedding information and dialogue embedding information of each of the word segments;
selecting part of the content from the plurality of word segments as the pre-training target;
masking the content selected from the plurality of word segments to obtain word embedding information;
using the word embedding information, the position embedding information, and the dialogue embedding information as the input feature.
4. The method according to claim 3, characterized in that training the initial transformation model using the input feature, the pre-training target, and the pre-training loss to obtain the pre-training transformation model comprises:
training the initial transformation model using the input feature, the pre-training target, a dialogue reply loss, and a masked language model loss, and adjusting initial parameters of the initial transformation model;
obtaining the pre-training transformation model when neither the dialogue reply loss nor the masked language model loss decreases any further.
5. The method according to claim 2, characterized in that training the pre-training transformation model using the goal task training sample and the goal task loss to obtain the transformation model of the goal task comprises:
training the pre-training transformation model using the goal task training sample and the goal task loss, and adjusting pre-training parameters of the pre-training transformation model;
obtaining the transformation model of the goal task when the goal task loss no longer decreases.
6. A transformation model training device, characterized by comprising:
an obtaining module, configured to obtain a pre-training sample including dialogue data;
a generation module, configured to generate an input feature and a pre-training target using the dialogue data;
a first training module, configured to train an initial transformation model using the input feature, the pre-training target, and a pre-training loss to obtain a pre-training transformation model.
7. The device according to claim 6, characterized by further comprising:
a second training module, configured to train the pre-training transformation model using a goal task training sample and a goal task loss to obtain a transformation model of the goal task.
8. The device according to claim 6, characterized in that the generation module comprises:
a segmentation submodule, configured to segment the dialogues in the dialogue data using a word segment segmentation algorithm to obtain a plurality of word segments;
an acquisition submodule, configured to obtain position embedding information and dialogue embedding information of each of the word segments;
a selection submodule, configured to select part of the content from the plurality of word segments as the pre-training target;
a masking submodule, configured to mask the content selected from the plurality of word segments to obtain word embedding information;
an input submodule, configured to use the word embedding information, the position embedding information, and the dialogue embedding information as the input feature.
9. The device according to claim 8, characterized in that the first training module is further configured to train the initial transformation model using the input feature, the pre-training target, a dialogue reply loss, and a masked language model loss, and to adjust initial parameters of the initial transformation model; and to obtain the pre-training transformation model when neither the dialogue reply loss nor the masked language model loss decreases any further.
10. The device according to claim 7, characterized in that the second training module is further configured to train the pre-training transformation model using a goal task training sample and a goal task loss, and to adjust pre-training parameters of the pre-training transformation model; and to obtain the transformation model of the goal task when the goal task loss no longer decreases.
11. A transformation model training equipment, characterized by comprising:
one or more processors;
a storage device, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 5.
12. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910498146.6A CN110197279B (en) | 2019-06-10 | 2019-06-10 | Transformation model training method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110197279A true CN110197279A (en) | 2019-09-03 |
CN110197279B CN110197279B (en) | 2021-01-29 |
Family
ID=67754344
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910498146.6A Active CN110197279B (en) | 2019-06-10 | 2019-06-10 | Transformation model training method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110197279B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111209740A (en) * | 2019-12-31 | 2020-05-29 | 中移(杭州)信息技术有限公司 | Text model training method, text error correction method, electronic device and storage medium |
CN111209383A (en) * | 2020-01-06 | 2020-05-29 | 广州小鹏汽车科技有限公司 | Method and device for processing multi-turn dialogue, vehicle, and storage medium |
CN111709248A (en) * | 2020-05-28 | 2020-09-25 | 北京百度网讯科技有限公司 | Training method and device of text generation model and electronic equipment |
CN112508093A (en) * | 2020-12-03 | 2021-03-16 | 北京百度网讯科技有限公司 | Self-training method and device, electronic equipment and readable storage medium |
CN112883180A (en) * | 2021-02-24 | 2021-06-01 | 挂号网(杭州)科技有限公司 | Model training method and device, electronic equipment and storage medium |
EP3835996A1 (en) * | 2019-12-12 | 2021-06-16 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method, apparatus, electronic device and storage medium for processing a semantic representation model |
CN113111665A (en) * | 2021-04-16 | 2021-07-13 | 清华大学 | Personalized dialogue rewriting method and device |
CN113378583A (en) * | 2021-07-15 | 2021-09-10 | 北京小米移动软件有限公司 | Dialogue reply method and device, dialogue model training method and device, and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016081289A1 (en) * | 2014-11-18 | 2016-05-26 | Merck Sharp & Dohme Corp. | Process for producing recombinant trypsin |
CN107944027A (en) * | 2017-12-12 | 2018-04-20 | 苏州思必驰信息科技有限公司 | Create the method and system of semantic key index |
CN108509411A (en) * | 2017-10-10 | 2018-09-07 | 腾讯科技(深圳)有限公司 | Semantic analysis and device |
CN108536735A (en) * | 2018-03-05 | 2018-09-14 | 中国科学院自动化研究所 | Multi-modal lexical representation method and system based on multichannel self-encoding encoder |
CN108829685A (en) * | 2018-05-07 | 2018-11-16 | 内蒙古工业大学 | A kind of illiteracy Chinese inter-translation method based on single language training |
RU2678716C1 (en) * | 2017-12-11 | 2019-01-31 | Общество с ограниченной ответственностью "Аби Продакшн" | Use of autoencoders for learning text classifiers in natural language |
CN109346084A (en) * | 2018-09-19 | 2019-02-15 | 湖北工业大学 | Method for distinguishing speek person based on depth storehouse autoencoder network |
CN109829299A (en) * | 2018-11-29 | 2019-05-31 | 电子科技大学 | A kind of unknown attack recognition methods based on depth self-encoding encoder |
Non-Patent Citations (1)
Title |
---|
Zhou Yongzhang et al., "Big Data Mining and Machine Learning in Geosciences" (《地球科学大数据挖掘与机器学习》), 30 September 2018 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110197279A (en) | Transformation model training method, device, equipment and storage medium | |
CN105244020B (en) | Prosodic hierarchy model training method, text-to-speech method and text-to-speech device | |
CN110377716A (en) | Exchange method, device and the computer readable storage medium of dialogue | |
CN107818164A (en) | A kind of intelligent answer method and its system | |
CN110334354A (en) | A kind of Chinese Relation abstracting method | |
CN107992597A (en) | A kind of text structure method towards electric network fault case | |
Yu et al. | Sequential labeling using deep-structured conditional random fields | |
CN106571139B (en) | Phonetic search result processing method and device based on artificial intelligence | |
CN108829678A (en) | Name entity recognition method in a kind of Chinese international education field | |
CN107195295A (en) | Audio recognition method and device based on Chinese and English mixing dictionary | |
CN107301860A (en) | Audio recognition method and device based on Chinese and English mixing dictionary | |
CN107577662A (en) | Towards the semantic understanding system and method for Chinese text | |
CN103810999A (en) | Linguistic model training method and system based on distributed neural networks | |
CN107391614A (en) | A kind of Chinese question and answer matching process based on WMD | |
CN110263325A (en) | Chinese automatic word-cut | |
CN106844341A (en) | News in brief extracting method and device based on artificial intelligence | |
CN107977363A (en) | Title generation method, device and electronic equipment | |
CN111489746B (en) | Power grid dispatching voice recognition language model construction method based on BERT | |
CN110852040B (en) | Punctuation prediction model training method and text punctuation determination method | |
CN107247751A (en) | Content recommendation method based on LDA topic models | |
CN111191445A (en) | Advertisement text classification method and device | |
CN111742322A (en) | System and method for domain and language independent definition extraction using deep neural networks | |
CN110399472A (en) | Reminding method, device, computer equipment and storage medium are putd question in interview | |
CN113239666A (en) | Text similarity calculation method and system | |
CN108846125A (en) | Talk with generation method, device, terminal and computer readable storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |