CN114547492A - Training method for generating model, method, device, equipment and medium for generating file - Google Patents
Training method for generating model, method, device, equipment and medium for generating file
- Publication number
- CN114547492A CN114547492A CN202210152882.8A CN202210152882A CN114547492A CN 114547492 A CN114547492 A CN 114547492A CN 202210152882 A CN202210152882 A CN 202210152882A CN 114547492 A CN114547492 A CN 114547492A
- Authority
- CN
- China
- Prior art keywords
- decoding unit
- time step
- prediction result
- model
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The disclosure provides a training method for a generative model and a copy generation method, apparatus, device and medium, and relates to the field of artificial intelligence, in particular to the field of text generation. The specific implementation scheme includes the following steps: determining, by a first decoding unit, a first prediction result by combining the real target of a training sample with the encoded information output by the encoding unit; determining a reference input value of a second decoding unit from the real target and the first prediction result; and performing prediction on the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted through back-propagation based on the final prediction result. The method trains the generative model in a two-stage decoding manner and can ensure the diversity of the copy generated by the generative model.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the field of text generation, and more particularly to a training method for a generative model, a copy generation method and apparatus, an electronic device, a storage medium, and a computer program product.
Background
In internet marketing, a landing page is the webpage to which a potential user is redirected after clicking an advertisement or searching with a search engine. The landing page contains copy designed to appeal to internet users, which can highlight the selling points of the page, improve users' search efficiency, help users reach the various page components directly, and prompt users to convert.
Disclosure of Invention
The present disclosure provides a training method for a generative model, a copy generation method, an apparatus, an electronic device, a storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a training method of a generative model, the generative model including one encoding unit and two decoding units, the method including:
determining, by a first decoding unit, a first prediction result by combining a real target of a training sample with the encoded information output by the encoding unit;
determining a reference input value of the second decoding unit from the real target and the first prediction result;
and performing prediction on the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted through back-propagation based on the final prediction result.
According to an aspect of the present disclosure, there is provided a copy generation method, including:
transmitting a text to be processed to the encoding unit of a generative model to obtain encoded information, wherein the generative model is obtained by training according to any training method for a generative model in the present disclosure;
and taking the encoded information as the input of the second decoding unit of the generative model, and determining the generated copy according to the output of the second decoding unit.
According to an aspect of the present disclosure, there is provided a training apparatus for a generative model, the generative model including one encoding unit and two decoding units, the apparatus including:
a first-stage decoding module, configured to determine, through the first decoding unit, a first prediction result by combining a real target of a training sample with the encoded information output by the encoding unit;
a sampling module, configured to determine a reference input value of the second decoding unit from the real target and the first prediction result;
and a second-stage decoding module, configured to perform prediction on the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted through back-propagation based on the final prediction result.
According to an aspect of the present disclosure, there is provided a copy generation apparatus, including:
an encoding module, configured to transmit a text to be processed to the encoding unit of a generative model to obtain encoded information, wherein the generative model is obtained by training according to any training method for a generative model in the present disclosure;
and a generation module, configured to take the encoded information as the input of the second decoding unit of the generative model and determine the generated copy according to the output of the second decoding unit.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method for a generative model or the copy generation method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the training method for a generative model or the copy generation method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product including a computer program which, when executed by a processor, implements the training method for a generative model or the copy generation method of any embodiment of the present disclosure.
According to the technology of the present disclosure, the generative model is trained in a two-stage decoding manner, so that the diversity of the copy generated by the trained generative model can be ensured.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow chart diagram of a training method for generating a model according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram illustrating a further training method for generating a model according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating a further training method for generating a model according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram of a copy generation method according to an embodiment of the disclosure;
FIG. 5 is a schematic structural diagram of a training apparatus for generating a model according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a copy generation apparatus provided in accordance with an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing a training method of generative models according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The generative model in the embodiments of the present disclosure is mainly used for generating landing-page copy that is attractive to internet users, for example generating titles for landing pages. The generative model of the present disclosure is built on the Transformer framework and includes one encoding unit and two decoding units, where the two decoding units have the same structure and share parameters; model training is then performed in a two-stage decoding manner, and the specific process is as follows.
Fig. 1 is a schematic flowchart of a training method for a generative model according to an embodiment of the present disclosure, which is applicable to training a generative model in a two-stage decoding manner. The method can be executed by a training apparatus for the generative model, which is implemented by software and/or hardware and integrated on an electronic device.
Specifically, referring to fig. 1, the training method for the generative model is as follows:
s101, determining a first prediction result through a first decoding unit by combining a real target of a training sample and the coding information output by a coding unit.
In the embodiment of the present disclosure, when the generative model is trained, training sample data is first input into the encoding unit of the generative model to obtain the corresponding encoded information, which is mainly used as the intermediate input of the first decoding unit. The real target of a training sample refers to the labeled ground truth of the sample, and during model training the real target is used directly as the initial input of the first decoding unit.
In an alternative embodiment, the first decoding unit obtains the first prediction result by predicting over a plurality of time steps. In a specific implementation, the real target corresponding to the sub-prediction result generated by the first decoding unit at any time step is used as the initial input of the first decoding unit at the next time step. For example, if the sub-prediction result generated by the model at the first time step is "yesterday" while the real target for that time step is "today", then "today" is used as the initial input of the first decoding unit at the next time step. In this way a prediction error can occur only at its own time step and does not accumulate into the following steps; that is, using the real target as the initial input of the first decoding unit ensures that errors do not accumulate. It should be noted that a conventional generative model mainly uses the seq2seq framework, in which the output of the previous step is used as the input of the next step during training, so that in the early stage of training, if an extremely poor result occurs at an early step, all following steps are affected by it and the final generated result is completely disordered.
Further, the encoded information is taken as the intermediate input of the first decoding unit at each time step. In this way the model obtains a predicted probability distribution at each time step. Therefore, the sub-prediction result of each time step can be determined according to the probability distribution obtained by the first decoding unit at that time step, and the set of the sub-prediction results generated at all time steps is taken as the first prediction result.
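To make the data flow concrete, the following is a minimal sketch of this first-stage (teacher-forced) decoding pass, assuming a PyTorch-style Transformer decoder; the module names (embed, out_proj), the BOS convention and the greedy choice of sub-predictions are illustrative assumptions rather than details from the disclosure.

```python
import torch
import torch.nn.functional as F

def first_stage_decode(decoder, embed, out_proj, memory, input_ids):
    """One teacher-forced pass of a decoding unit.

    memory:    encoder output, shape (src_len, batch, d_model) -- the "intermediate input"
    input_ids: token ids fed to the decoder, shape (tgt_len, batch) -- for the first
               decoding unit this is the real target (the "initial input")
    Returns the per-time-step probability distributions and the sub-prediction ids.
    """
    tgt_len = input_ids.size(0)
    # Shift right so that time step t conditions on the token of time step t-1.
    bos = torch.zeros_like(input_ids[:1])                  # assumed BOS id = 0
    decoder_in = torch.cat([bos, input_ids[:-1]], dim=0)

    causal_mask = torch.triu(
        torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)

    hidden = decoder(embed(decoder_in), memory, tgt_mask=causal_mask)
    logits = out_proj(hidden)                              # (tgt_len, batch, vocab)
    probs = F.softmax(logits, dim=-1)                      # distribution at every time step
    sub_predictions = probs.argmax(dim=-1)                 # greedy sub-prediction per step
    return probs, sub_predictions
```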
And S102, determining a reference input value of the second decoding unit from the real target and the first prediction result.
Optionally, in the early stage of model training, the real target is used as the reference input value of the second decoding unit; in the later stage of model training, the first prediction result generated in step S101 is used as the reference input value of the second decoding unit. The benefit of this is that in the early stage of training the second decoding unit uses the real target as the reference value almost exclusively, so the model acquires a basic generation capability faster and better, while as training progresses the first prediction result output by the first decoding unit is increasingly used as the reference value, which prevents the model from prematurely correcting its output toward the real target, avoids the over-correction problem, and preserves diversity. A generative model differs from a classification model: classification is a strictly one-hot problem, whereas the target of a generation task does not need to be aligned word for word, and synonyms or near-synonyms are acceptable and even desirable. If only the real target (i.e., the ground truth) were used as the input of the second decoding unit, the model would be continuously corrected toward the real target and diversity would be suppressed prematurely.
S103, performing prediction on the reference input value through the second decoding unit to obtain a final prediction result, and adjusting the model parameters through back-propagation based on the final prediction result.
After the reference input value is determined in step S102, the reference input value is input to the second decoding unit for prediction to obtain the final prediction result, and the model parameters are then adjusted based on the final prediction result, mainly by updating the network parameters of the generative model through gradient back-propagation. It should be noted that the gradient flows back from the second decoding unit to the encoding unit without passing through the first decoding unit.
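The following is a sketch of one complete two-stage training step under these assumptions, with the gradient stopped at the first decoding pass; first_stage_decode is the sketch above, choose_reference_inputs is a hypothetical helper sketched later in this description, and the loss details are illustrative.

```python
import torch
import torch.nn.functional as F

def training_step(encoder, decoder, embed, out_proj, source_ids, target_ids,
                  epoch, beta=5.0):
    """One two-stage training step; the two decoding units share structure and
    parameters, so the same decoder module is reused for both passes."""
    memory = encoder(embed(source_ids))                    # encoding unit output

    # Stage 1: teacher-forced pass; no_grad ensures the gradient later
    # bypasses the first decoding unit entirely.
    with torch.no_grad():
        _, first_prediction = first_stage_decode(
            decoder, embed, out_proj, memory, target_ids)

    # Mix the real target and the first prediction into the reference input (S102);
    # choose_reference_inputs is a hypothetical helper sketched further below.
    reference_ids = choose_reference_inputs(target_ids, first_prediction, epoch, beta)

    # Stage 2: predict from the reference input; only this pass is back-propagated,
    # so gradients flow from the second decoding unit back to the encoding unit.
    probs, _ = first_stage_decode(decoder, embed, out_proj, memory, reference_ids)
    loss = F.nll_loss(probs.clamp_min(1e-9).log().flatten(0, 1), target_ids.flatten())
    loss.backward()                                        # gradient back-propagation
    return loss
```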
In the embodiment of the present disclosure, the generative model is trained in a two-stage decoding manner, and the real target is used as the input of the first decoding unit, so that prediction errors are prevented from accumulating; depending on the training stage, either the real target or the first prediction result is used as the input of the second decoding unit, so that the over-correction problem is avoided and the diversity of the copy generated by the trained generative model is further ensured.
Fig. 2 is a schematic flowchart of a further training method for generating a model according to an embodiment of the present disclosure, which is based on the foregoing embodiment, and the embodiment of the present disclosure refines the process of determining the reference input value of the second decoding unit, and referring to fig. 2, the training method for generating a model specifically includes the following steps:
s201, determining a first prediction result by combining a real target of a training sample and the coding information output by the coding unit through a first decoding unit.
S202, calculating, according to the number of training rounds and a preset hyper-parameter, the respective probabilities that the real target and the first prediction result are used as the reference input value.
And S203, determining a reference input value of the second decoding unit according to the probability calculation result.
In the embodiment of the present disclosure, the probability that the real target is used as the reference input value of the second decoding unit is calculated as p = β / (β + exp(epoch / β)), where β is a preset hyper-parameter and epoch is the number of training rounds. The probability that the first prediction result is used as the reference input value is 1 - p. As can be seen from this formula, the probability p becomes smaller and smaller as the number of training rounds epoch increases; that is, the probability 1 - p becomes larger and larger, and the output of the first decoding unit is increasingly used as the reference input of the second decoding unit. The benefit of this is that in the early stage of training the second decoding unit uses the real target as the reference value almost exclusively, so the model acquires a basic generation capability faster and better, while as training progresses the first prediction result output by the first decoding unit is increasingly used as the reference value, which prevents the model from prematurely correcting its output toward the real target, avoids the over-correction problem, and preserves diversity.
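A minimal sketch of this scheduled choice of reference input is given below; choose_reference_inputs is the hypothetical helper assumed in the training-step sketch above, and choosing the mix token by token (rather than for the whole sequence) is an assumption, since the disclosure only specifies the probability itself.

```python
import math
import torch

def teacher_forcing_probability(epoch: int, beta: float) -> float:
    # p = beta / (beta + exp(epoch / beta)); decays toward 0 as epoch grows.
    return beta / (beta + math.exp(epoch / beta))

def choose_reference_inputs(target_ids, first_prediction, epoch, beta=5.0):
    """Pick each reference token from the real target with probability p,
    otherwise from the first decoding unit's prediction (probability 1 - p)."""
    p = teacher_forcing_probability(epoch, beta)
    use_target = torch.rand_like(target_ids, dtype=torch.float) < p
    return torch.where(use_target, target_ids, first_prediction)
```

For example, with β = 5 the probability of feeding the real target is about 0.8 at epoch 1 and falls below 0.1 by epoch 20, so the second decoding unit shifts gradually from the ground truth to the first-stage predictions.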
S204, performing prediction on the reference input value through the second decoding unit to obtain a final prediction result, and adjusting the model parameters through back-propagation based on the final prediction result.
In the embodiment of the present disclosure, in the early stage of model training the second decoding unit uses the real target as the reference value almost exclusively, so the model acquires a basic generation capability faster and better; as training progresses, the first prediction result output by the first decoding unit is increasingly used as the reference value, which prevents the model from prematurely correcting its output toward the real target, avoids the over-correction problem, and preserves diversity.
Fig. 3 is a schematic flowchart of a further training method for a generative model according to an embodiment of the present disclosure. On the basis of the foregoing embodiments, this embodiment refines the process of determining the sub-prediction result of each time step according to the probability distribution obtained by the first decoding unit at each time step. Referring to fig. 3, the method further includes the following steps:
and S301, regarding any time step, taking the probability distribution of the prediction result obtained by the first decoding unit in the time step with the highest probability as the sub-prediction result of the time step.
And S302, selecting one of the N prediction results before probability sequencing as a sub-prediction result of any time step according to the probability distribution of the prediction results obtained by the first decoding unit at the time step.
In the embodiment of the present disclosure, the output of the first decoding unit is converted into a probability distribution by softmax at each time step. Taking the prediction result with the maximum probability as the sub-prediction result of that time step ensures the accuracy of model prediction; alternatively, selecting one of the higher-probability prediction results as the sub-prediction result of that time step ensures the diversity of model predictions while changing the prediction accuracy only slightly, as sketched below.
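A small sketch of these two selection strategies (greedy argmax versus sampling from the top-N candidates) follows; the value of N and the renormalization of the truncated distribution are assumptions, as the disclosure does not fix them.

```python
import torch

def pick_sub_prediction(step_probs: torch.Tensor, top_n: int = 0) -> torch.Tensor:
    """step_probs: (batch, vocab) probability distribution for one time step.
    top_n <= 1 -> greedy: token with the maximum probability (accuracy-oriented).
    top_n >= 2 -> sample one of the N most probable tokens (diversity-oriented)."""
    if top_n <= 1:
        return step_probs.argmax(dim=-1)
    top_probs, top_ids = step_probs.topk(top_n, dim=-1)
    top_probs = top_probs / top_probs.sum(dim=-1, keepdim=True)  # renormalize (assumption)
    choice = torch.multinomial(top_probs, num_samples=1)
    return top_ids.gather(-1, choice).squeeze(-1)
```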
Further, for any time step, the first decoding unit calculates the probability distribution of its prediction result at that time step based on a preset noise parameter g and a temperature parameter t. Optionally, the probability distribution of the first decoding unit at that time step is p_i = exp((z_i + g_i) / t) / Σ_j exp((z_j + g_j) / t), where z_i is the input of the normal softmax, t is the hyper-parameter temperature, g = -log(-log(u)), and u obeys Uniform(0, 1). It should be noted that some noise is added before the probability distribution is calculated, so that the directly output softmax probabilities retain a certain randomness. Meanwhile, the smoothness of the softmax can be controlled through the temperature t: the higher the temperature, the smoother the generated probability distribution; the lower the temperature, the sharper the distribution and the closer it is to one-hot. In practical training, the temperature can be lowered slowly so that the distribution gradually approaches the real discrete distribution.
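A sketch of this noisy (Gumbel-softmax style) distribution under the formula above is given below; applying it to the per-step logits of the first decoding unit, and the exact noise clamping, are assumptions.

```python
import torch
import torch.nn.functional as F

def noisy_softmax(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Gumbel-perturbed softmax: p_i = exp((z_i + g_i)/t) / sum_j exp((z_j + g_j)/t),
    where z_i are the normal softmax inputs, g = -log(-log(u)) and u ~ Uniform(0, 1)."""
    u = torch.rand_like(logits).clamp_min(1e-9)    # avoid log(0); clamping is an assumption
    g = -torch.log(-torch.log(u))                  # Gumbel noise
    return F.softmax((logits + g) / temperature, dim=-1)
```

A high temperature flattens the resulting distribution, while annealing it toward a small value during training sharpens the distribution toward one-hot, matching the behaviour described above.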
Fig. 4 is a schematic flowchart of a copy generation method according to an embodiment of the present disclosure, which is applicable to generating copy with a trained generative model. The method can be executed by a copy generation apparatus, which is implemented by software and/or hardware and integrated on an electronic device.
Referring to fig. 4, the copy generation method is specifically as follows:
s401, transmitting the text to be processed to a coding unit for generating a model to obtain coding information; the generative model is obtained after training according to any one of the training methods of the generative model disclosed by the disclosure.
S402, the coding information is used as the input of a second decoding unit for generating the model, and the generated file is determined according to the output of the second decoding unit.
A generative model can be trained through the above embodiments; for the specific training process, reference is made to the above embodiments, and it is not repeated here. On this basis, the generative model can be used directly for prediction on the text to be processed; optionally, the text to be processed is the text content of a landing page, and a title is generated from that text content. In a specific implementation, the text to be processed is first passed through the encoding unit of the generative model to obtain encoded information, the encoded information is taken as the input of the second decoding unit of the generative model, and the generated copy is determined according to the output of the second decoding unit. It should be noted that in the prediction stage with the trained generative model, only the second decoding unit needs to be used; that is, two-stage decoding is not needed when the model is used.
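The following is a sketch of this single-decoder inference pass, assuming greedy autoregressive generation and the same hypothetical modules as in the training sketches; the BOS/EOS ids and the maximum length are illustrative.

```python
import torch

@torch.no_grad()
def generate_copy(encoder, decoder, embed, out_proj, source_ids,
                  bos_id=0, eos_id=1, max_len=32):
    """Inference uses only the encoding unit and the second decoding unit."""
    memory = encoder(embed(source_ids))
    generated = torch.full((1, source_ids.size(1)), bos_id, dtype=torch.long)
    for _ in range(max_len):
        hidden = decoder(embed(generated), memory)
        next_id = out_proj(hidden[-1]).argmax(dim=-1)      # greedy choice per batch element
        generated = torch.cat([generated, next_id.unsqueeze(0)], dim=0)
        if (next_id == eos_id).all():
            break
    return generated[1:]        # drop the BOS row; the remaining ids are the generated copy
```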
In the embodiment of the present disclosure, the purpose of generating diversified copy from the text to be processed can be achieved through the trained generative model.
Fig. 5 is a schematic structural diagram of a training apparatus for a generative model according to an embodiment of the present disclosure, where the generative model includes one encoding unit and two decoding units; this embodiment is applicable to training the generative model in a two-stage decoding manner. As shown in fig. 5, the apparatus specifically includes:
a first-stage decoding module 501, configured to determine, through the first decoding unit, a first prediction result by combining a real target of a training sample with the encoded information output by the encoding unit;
a sampling module 502, configured to determine a reference input value of the second decoding unit from the real target and the first prediction result;
and a second-stage decoding module 503, configured to perform prediction on the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted through back-propagation based on the final prediction result.
On the basis of the foregoing embodiment, optionally, the one-stage decoding module includes:
the initial input determining submodule is used for taking a real target corresponding to a sub-prediction result generated by the first decoding unit at any time step as the initial input of the first decoding unit at the next time step;
an intermediate input determination submodule for inputting the encoded information as an intermediate input of the first decoding unit at each time step;
and the first-stage decoding submodule is used for determining the sub-prediction result of each time step according to the probability distribution of the prediction result obtained by the first decoding unit at each time step, and taking the set of the sub-prediction results generated at each time step as the first prediction result.
On the basis of the foregoing embodiment, optionally, the one-stage decoding sub-module is further configured to:
for any time step, taking the prediction result with the maximum probability in the probability distribution obtained by the first decoding unit at the time step as the sub-prediction result of the time step; or,
for any time step, selecting one of the top-N prediction results ranked by probability as the sub-prediction result of the time step, according to the probability distribution obtained by the first decoding unit at the time step.
On the basis of the above embodiment, optionally, the method further includes:
and the probability distribution calculating module is used for calculating the probability distribution of the prediction result of the first decoding unit at any time step based on the preset noise parameter and the temperature parameter.
On the basis of the foregoing embodiment, optionally, the sampling module includes:
the probability calculation submodule is used for respectively calculating the probability of the real target and the first prediction result as reference input values according to the number of training rounds and the preset hyper-parameter;
and the determining submodule is used for determining a reference input value of the second decoding unit according to the probability calculation result.
The training device for generating the model, provided by the embodiment of the disclosure, can execute the training method for generating the model, provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method. Reference may be made to the description of any method embodiment of the disclosure for a matter not explicitly described in this embodiment.
Fig. 6 is a schematic structural diagram of a copy generation apparatus according to an embodiment of the present disclosure, where the generative model includes one encoding unit and two decoding units; this embodiment is applicable to generating copy with a trained generative model. As shown in fig. 6, the apparatus specifically includes:
the encoding module 601, configured to transmit the text to be processed to the encoding unit of the generative model to obtain encoded information, wherein the generative model is obtained after training according to any one of claims 1 to 5;
the generation module 602, configured to take the encoded information as the input of the second decoding unit of the generative model and determine the generated copy according to the output of the second decoding unit.
The copy generation apparatus provided by the embodiment of the present disclosure can execute the copy generation method provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects of the executed method. For matters not explicitly described in this embodiment, reference may be made to the description of any method embodiment of the present disclosure.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the related user all accord with the regulations of related laws and regulations, and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (15)
1. A training method of a generative model comprising an encoding unit and two decoding units, the method comprising:
determining, by a first decoding unit, a first prediction result by combining a real target of a training sample with the encoded information output by the encoding unit;
determining a reference input value of a second decoding unit from the real target and the first prediction result;
and performing prediction on the reference input value through the second decoding unit to obtain a final prediction result, so that model parameters are adjusted through back-propagation based on the final prediction result.
2. The method of claim 1, wherein determining, by the first decoding unit, the first prediction result by combining the real target of the training sample and the coding information output by the coding unit comprises:
taking a real target corresponding to a sub-prediction result generated by the first decoding unit at any time step as initial input of the first decoding unit at the next time step;
taking the encoded information as an intermediate input of the first decoding unit at each time step;
and determining the sub-prediction result of each time step according to the probability distribution of the prediction result obtained by the first decoding unit at each time step, and taking the set of the sub-prediction results generated at each time step as the first prediction result.
3. The method of claim 2, wherein the determining the sub-predictors for each time step based on a probability distribution of the predictors obtained by the first decoding unit at each time step comprises:
for any time step, taking the prediction result with the maximum probability in the probability distribution obtained by the first decoding unit at the time step as the sub-prediction result of the time step; or,
for any time step, selecting one of the top-N prediction results ranked by probability as the sub-prediction result of the time step, according to the probability distribution obtained by the first decoding unit at the time step.
4. The method of claim 2, further comprising:
for any time step, the first decoding unit calculates the probability distribution of the prediction result of the first decoding unit at the time step based on preset noise parameters and temperature parameters.
5. The method of claim 1, wherein said determining a reference input value for a second decoding unit from said real target and said first prediction result comprises:
respectively calculating the probability of the real target and the first prediction result as the reference input value according to the number of training rounds and a preset hyper-parameter;
and determining a reference input value of the second decoding unit according to the probability calculation result.
6. A copy generation method, comprising:
transmitting a text to be processed to the encoding unit of a generative model to obtain encoded information, wherein the generative model is obtained after training according to the method of any one of claims 1 to 5;
and taking the encoded information as the input of the second decoding unit of the generative model, and determining the generated copy according to the output of the second decoding unit.
7. A training apparatus for generating a model, the generating model including an encoding unit and two decoding units, the apparatus comprising:
the first-stage decoding module is used for determining, through a first decoding unit, a first prediction result by combining a real target of a training sample with the encoded information output by the encoding unit;
a sampling module for determining a reference input value of a second decoding unit from the real target and the first prediction result;
and the second-stage decoding module is used for performing prediction on the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted through back-propagation based on the final prediction result.
8. The apparatus of claim 7, wherein the one-stage decoding module comprises:
the initial input determining submodule is used for taking a real target corresponding to a sub-prediction result generated by the first decoding unit at any time step as the initial input of the first decoding unit at the next time step;
an intermediate input determination submodule for taking the encoded information as an intermediate input of the first decoding unit at each time step;
and the first-stage decoding submodule is used for determining the sub-prediction result of each time step according to the probability distribution of the prediction result obtained by the first decoding unit at each time step, and taking the set of the sub-prediction results generated at each time step as the first prediction result.
9. The apparatus of claim 8, wherein the one-stage decoding sub-module is further to:
for any time step, taking the prediction result with the maximum probability in the probability distribution obtained by the first decoding unit at the time step as the sub-prediction result of the time step; or,
for any time step, selecting one of the top-N prediction results ranked by probability as the sub-prediction result of the time step, according to the probability distribution obtained by the first decoding unit at the time step.
10. The apparatus of claim 8, further comprising:
and the probability distribution calculating module is used for calculating the probability distribution of the prediction result of the first decoding unit at any time step based on preset noise parameters and temperature parameters.
11. The apparatus of claim 7, wherein the sampling module comprises:
the probability calculation sub-module is used for respectively calculating the probabilities of the real target and the first prediction result as the reference input value according to the number of training rounds and a preset hyper-parameter;
and the determining submodule is used for determining the reference input value of the second decoding unit according to the probability calculation result.
12. A copy generation apparatus, comprising:
an encoding module, configured to transmit a text to be processed to the encoding unit of a generative model to obtain encoded information, wherein the generative model is obtained after training according to the method of any one of claims 1 to 5;
and a generation module, configured to take the encoded information as the input of the second decoding unit of the generative model and determine the generated copy according to the output of the second decoding unit.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5 or 6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any of claims 1-5 or 6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5 or 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210152882.8A CN114547492A (en) | 2022-02-18 | 2022-02-18 | Training method for generating model, method, device, equipment and medium for generating file |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210152882.8A CN114547492A (en) | 2022-02-18 | 2022-02-18 | Training method for generating model, method, device, equipment and medium for generating file |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114547492A true CN114547492A (en) | 2022-05-27 |
Family
ID=81675963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210152882.8A Pending CN114547492A (en) | 2022-02-18 | 2022-02-18 | Training method for generating model, method, device, equipment and medium for generating file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114547492A (en) |
-
2022
- 2022-02-18 CN CN202210152882.8A patent/CN114547492A/en active Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115188014A (en) * | 2022-06-22 | 2022-10-14 | 北京百度网讯科技有限公司 | Landing page processing method, model training method and device and electronic equipment |
CN115188014B (en) * | 2022-06-22 | 2023-11-14 | 北京百度网讯科技有限公司 | Floor page processing method, model training method, device and electronic equipment |
CN115512391A (en) * | 2022-09-29 | 2022-12-23 | 珠海视熙科技有限公司 | Target detection model training method, device and equipment for data adaptive resampling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112466288B (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN113239705A (en) | Pre-training method and device of semantic representation model, electronic equipment and storage medium | |
CN113656581B (en) | Text classification and model training method, device, equipment and storage medium | |
CN114547492A (en) | Training method for generating model, method, device, equipment and medium for generating file | |
US20210303608A1 (en) | Keyword generating method, apparatus, device and storage medium | |
CN112541124A (en) | Method, apparatus, device, medium and program product for generating a multitask model | |
CN110309275A (en) | A kind of method and apparatus that dialogue generates | |
CN113590858A (en) | Target object generation method and device, electronic equipment and storage medium | |
CN113239157B (en) | Method, device, equipment and storage medium for training conversation model | |
CN114937478B (en) | Method for training a model, method and apparatus for generating molecules | |
CN112925900A (en) | Search information processing method, device, equipment and storage medium | |
CN115631261A (en) | Training method of image generation model, image generation method and device | |
CN113869042A (en) | Text title generation method and device, electronic equipment and storage medium | |
CN114428907A (en) | Information searching method and device, electronic equipment and storage medium | |
CN113468857A (en) | Method and device for training style conversion model, electronic equipment and storage medium | |
CN113205189A (en) | Prediction model training method, prediction method and prediction device | |
CN113656689B (en) | Model generation method and network information pushing method | |
CN113642654B (en) | Image feature fusion method and device, electronic equipment and storage medium | |
CN115203564A (en) | Information flow recommendation method and device and computer program product | |
CN114037060A (en) | Pre-training model generation method and device, electronic equipment and storage medium | |
CN114254028A (en) | Event attribute extraction method and device, electronic equipment and storage medium | |
CN113361621A (en) | Method and apparatus for training a model | |
CN113807397A (en) | Training method, device, equipment and storage medium of semantic representation model | |
CN113361712B (en) | Training method of feature determination model, semantic analysis method, semantic analysis device and electronic equipment | |
CN117591948B (en) | Comment generation model training method and device, and information generation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||