CN114648110A - Model training method and device, electronic equipment and computer storage medium - Google Patents

Model training method and device, electronic equipment and computer storage medium

Info

Publication number
CN114648110A
Authority
CN
China
Prior art keywords: training, decoder, data, model, encoder
Prior art date
2020-12-18
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011505035.2A
Other languages
Chinese (zh)
Inventor
桂敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-18
Filing date
2020-12-18
Publication date
2022-06-21
Application filed by Alibaba Group Holding Ltd
Priority to CN202011505035.2A
Publication of CN114648110A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a model training method and apparatus, an electronic device and a computer storage medium. The model training method includes: acquiring pre-training sample data, wherein the pre-training sample data comprises multi-modal data; pre-training an encoder in a neural network model by using the pre-training sample data to obtain a pre-trained encoder; acquiring a feature representation output after the pre-trained encoder processes the pre-training sample data, and a pre-training reference sample corresponding to the feature representation; and pre-training a decoder in the neural network model by using the feature representation and the pre-training reference sample. The encoder and the decoder are trained in stages, which improves the training effect and the training efficiency of the model.

Description

Model training method and device, electronic equipment and computer storage medium
Technical Field
Embodiments of the present application relate to the technical field of artificial intelligence, and in particular, to a model training method and apparatus, an electronic device and a computer storage medium.
Background
With the development of science and technology, information dissemination on the Internet increasingly favors more intuitive modalities such as images and videos, and multi-modal data is being applied to more and more areas of information dissemination and storage. Data can be divided by carrier type into text, images, video, speech and the like, and multi-modal data is data that spans multiple carrier types. Although multi-modal data is more intuitive, in some scenarios much of the information still has to be written out as text by the user on the basis of the multi-modal data. For example, in an e-commerce scenario, the selling points of a commodity, frequently asked questions and the like need to be filled in manually according to the images, videos or text description of the commodity; for another example, in a live-streaming scenario, pages such as live topics and keyword descriptions need to be filled in manually by the user according to the content of images or video. This can consume a large amount of labor time and cost. Taking commodity selling points as an example, the images, text description and the like of a commodity usually reveal its selling points, which can only be obtained by intelligently identifying and extracting from those images and texts. If a neural network model is used to fill them in automatically, manual annotation is needed to train the model; because the selling points are extracted from multi-modal data such as the commodity's images and text description, the images and the text descriptions must each be annotated, and the cost of manual annotation is too high. Therefore, for neural network models that process multi-modal data, the model training efficiency is low and the model training effect is poor.
Disclosure of Invention
In view of the above, embodiments of the present application provide a model training method, apparatus, electronic device and computer storage medium to solve some or all of the above problems.
According to a first aspect of the embodiments of the present application, there is provided a model training method, including: acquiring pre-training sample data, wherein the pre-training sample data comprises multi-modal data; pre-training an encoder in a neural network model by using the pre-training sample data to obtain a pre-trained encoder; acquiring a feature representation output after the pre-trained encoder processes the pre-training sample data, and a pre-training reference sample corresponding to the feature representation; and pre-training a decoder in the neural network model by using the feature representation and the pre-training reference sample.
According to a second aspect of the embodiments of the present application, there is provided a model training apparatus, including: a sample module, configured to acquire pre-training sample data, the pre-training sample data comprising multi-modal data; an encoder module, configured to pre-train an encoder in a neural network model by using the pre-training sample data to obtain a pre-trained encoder; a feature representation module, configured to acquire a feature representation output after the pre-trained encoder processes the pre-training sample data, and a pre-training reference sample corresponding to the feature representation; and a decoder module, configured to pre-train a decoder in the neural network model by using the feature representation and the pre-training reference sample.
According to a third aspect of the embodiments of the present application, there is provided an electronic device, including a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the model training method of the first aspect.
According to a fourth aspect of embodiments of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the model training method as in the first aspect.
According to the model training method and apparatus, the electronic device and the computer storage medium provided by the embodiments of the present application, pre-training sample data comprising multi-modal data is acquired; an encoder in a neural network model is pre-trained by using the pre-training sample data to obtain a pre-trained encoder; a feature representation output after the pre-trained encoder processes the pre-training sample data, and a pre-training reference sample corresponding to the feature representation, are acquired; and a decoder in the neural network model is pre-trained by using the feature representation and the pre-training reference sample. The encoder is pre-trained first, and the decoder is then trained by using the feature representation output by the pre-trained encoder. Training the encoder and the decoder in stages makes model training easier to converge and improves the model training effect; because no manual annotation is needed, labor cost is reduced and the training efficiency of the model is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present application; those skilled in the art can obtain other drawings from these drawings.
Fig. 1 is a schematic view of a scenario of a model training method according to an embodiment of the present application;
Fig. 2 is a flowchart of a model training method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of an encoder according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the operation of a decoder according to an embodiment of the present application;
Fig. 5 is a schematic view of an application scenario of a neural network model according to an embodiment of the present application;
Fig. 6 is a block diagram of a model training apparatus according to the second embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to the third embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the protection scope of the embodiments of the present application.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Example one
For ease of understanding, an application scenario of the model training method is described first. As shown in fig. 1, fig. 1 is a schematic view of a scenario of the model training method provided in the first embodiment of the present application. The scenario shown in fig. 1 includes a model training apparatus 101, which may be an apparatus for executing the model training method provided in the first embodiment of the present application.
The model training apparatus 101 may be a terminal device such as a notebook computer or a desktop computer, or may be a server or the like. As shown in fig. 1, the model training apparatus 101 may acquire pre-training sample data comprising multi-modal data, pre-train an encoder in a neural network model by using the pre-training sample data, and then pre-train a decoder in the neural network model by using the feature representation output by the pre-trained encoder and the corresponding pre-training reference sample. Pre-training the encoder and the decoder in stages yields a pre-trained neural network model.
The model training method provided in the first embodiment of the present application is described in detail below with reference to the scenario shown in fig. 1. It should be noted that fig. 1 is only one application scenario of the method and does not mean that the method must be applied to the scenario shown in fig. 1. Referring to fig. 2, fig. 2 is a flowchart of a model training method provided in the first embodiment of the present application; the method includes the following steps:
Step 201: pre-training sample data is obtained.
The pre-training sample data comprises multi-modal data. In this application, multi-modal data is data that comprises multiple carrier types; for example, the multi-modal data may comprise at least two types of data among text, images, video and speech.
Step 202: the encoder in the neural network model is pre-trained by using the pre-training sample data to obtain a pre-trained encoder.
The neural network model includes an encoder and a decoder, and the encoder may be a denoising autoencoder; in step 202, the encoder is pre-trained first. The encoder is configured to perform feature extraction on input data and output a feature representation; in this application, the data output by the encoder is defined as a feature representation. It should be noted that there may be multiple pieces of pre-training sample data. The pre-training sample data is input into the encoder, a loss function value is calculated from the output feature representation, and the parameters of the encoder are adjusted according to the loss function value until the loss function value is less than or equal to a preset value.
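For illustration only, and not as part of the claimed method, the following is a minimal PyTorch-style sketch of such an encoder pre-training loop; the MultimodalEncoder architecture, the MSE reconstruction objective and the preset loss value are all assumptions rather than details specified by this application.

```python
import torch
import torch.nn as nn

# A hypothetical multimodal encoder: text and image features are projected
# into a shared space, concatenated and fused by a Transformer.
class MultimodalEncoder(nn.Module):
    def __init__(self, text_dim=300, image_dim=512, hidden_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # Reconstruction head used only during denoising pre-training.
        self.recon = nn.Linear(hidden_dim, text_dim)

    def forward(self, text_feats, image_feats):
        tokens = torch.cat([self.text_proj(text_feats),
                            self.image_proj(image_feats)], dim=1)
        return self.fusion(tokens)  # the "feature representation"

def pretrain_encoder(encoder, batches, preset_loss=0.05, lr=1e-4):
    """Adjust the encoder's parameters until the loss function value is
    less than or equal to the preset value, as described in step 202."""
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for noisy_text, image, clean_text in batches:
        features = encoder(noisy_text, image)
        text_len = clean_text.size(1)
        # Denoising objective: reconstruct the clean text features from the
        # feature representation of the noisy (e.g. masked) input; the text
        # tokens occupy the first text_len positions of the fused sequence.
        loss = criterion(encoder.recon(features[:, :text_len]), clean_text)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() <= preset_loss:
            break
    return encoder
```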
Optionally, in an embodiment of the present application, pre-training the encoder in the neural network model by using the pre-training sample data to obtain a pre-trained encoder includes: adding noise to the data of at least one modality in the pre-training sample data to obtain pre-training sample data containing noise; and inputting the pre-training sample data containing noise into the encoder and pre-training the encoder. Adding noise to the pre-training sample data may mean masking part of the data. Further, adding noise to the data of at least one modality in the pre-training sample data to obtain pre-training sample data containing noise includes: masking data of at least one modality in the pre-training sample data to obtain the pre-training sample data containing noise.
As shown in fig. 3, fig. 3 is a schematic diagram of an encoder according to an embodiment of the present application. In fig. 3, the pre-training sample data includes text data and image data; the text data is divided into 6 parts (i.e., 6 phrases) denoted X1-X6, and the image data is divided into 4 parts denoted Y1-Y4. Noise may be added to either type of data or to both. In fig. 3, noise is added to the text data as an example: a Phrase-based Masked Language Model (PMLM) uses a phrase structure tree to extract the phrase structure of the pre-training sample data and masks it with the phrase as the minimum granularity. X2 and X3 are masked, and then X1, X4, X5, X6 and Y1-Y4 are input into the encoder to train it. After part of the data is masked, the semantics of the pre-training sample data become incomplete; the missing text data can be predicted from the image data, which improves the learning ability of the encoder.
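As a simple illustration of phrase-granularity masking (the [MASK] token and the choice of masked phrases follow the fig. 3 example; in practice the phrase boundaries would come from the phrase structure tree):

```python
MASK = "[MASK]"

def mask_phrases(phrases, masked_indices):
    # The phrase is the minimum granularity: a masked phrase is replaced in
    # its entirety rather than word by word.
    return [MASK if i in masked_indices else p for i, p in enumerate(phrases)]

# Text divided into 6 phrases X1-X6 as in fig. 3; X2 and X3 are masked.
text_phrases = ["X1", "X2", "X3", "X4", "X5", "X6"]
noisy_text = mask_phrases(text_phrases, {1, 2})
# noisy_text == ['X1', '[MASK]', '[MASK]', 'X4', 'X5', 'X6']
# noisy_text together with the image parts Y1-Y4 is then input into the
# encoder, which learns to predict the masked phrases from the image data.
```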
Step 203: the feature representation output after the pre-trained encoder processes the pre-training sample data, and the pre-training reference sample corresponding to the feature representation, are obtained.
It should be noted that the decoder is configured to parse the feature representation output by the encoder to obtain the output of the decoder, and the output of the decoder may include text content. The pre-training reference sample is the expected output of the decoder; the pre-training reference sample corresponding to a feature representation is what the decoder is expected to output after that feature representation is input into the decoder.
Step 204: the decoder in the neural network model is pre-trained by using the feature representation and the pre-training reference sample.
It should be noted that the pre-training sample data used to pre-train the encoder and the decoder may be the same or different. Optionally, in an embodiment of the present application, pre-training the decoder in the neural network model by using the feature representation and the pre-training reference sample includes: adding noise to the pre-training reference sample to obtain a pre-training reference sample containing noise; and inputting the pre-training reference sample containing noise and the corresponding feature representation into the decoder to pre-train the decoder. Adding noise to the pre-training reference sample improves the learning ability of the decoder and enhances its effect. Further, inputting the pre-training reference sample containing noise and the corresponding feature representation into the decoder and pre-training the decoder includes: inputting the feature representation into the decoder to obtain the corresponding decoder output, comparing the decoder output with the pre-training reference sample, and adjusting the parameters of the decoder according to the comparison result while keeping the parameters of the pre-trained encoder fixed. Because the pre-training of the encoder is already complete, its parameters are fixed and only the parameters of the decoder are adjusted, which accelerates model convergence and keeps the pre-training and fine-tuning stages consistent.
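A sketch of this staged decoder pre-training, under the same caveats as above (the decoder's call signature and the cross-entropy objective are assumptions, not details specified by this application), might look as follows:

```python
import torch
import torch.nn as nn

def pretrain_decoder(encoder, decoder, samples, lr=1e-4):
    """Pre-train the decoder on (feature representation, reference) pairs
    while keeping the pre-trained encoder's parameters fixed."""
    for p in encoder.parameters():
        p.requires_grad = False          # encoder parameters stay fixed
    encoder.eval()
    optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for text, image, noisy_reference, reference_ids in samples:
        with torch.no_grad():
            features = encoder(text, image)   # feature representation
        # Assumed decoder interface: noisy reference tokens plus the feature
        # representation in, vocabulary logits out.
        logits = decoder(noisy_reference, features)
        # Compare the decoder output with the clean pre-training reference
        # sample and adjust only the decoder's parameters.
        loss = criterion(logits.flatten(0, 1), reference_ids.flatten())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return decoder
```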
As shown in fig. 4, fig. 4 is a schematic diagram of the operation of a decoder according to an embodiment of the present application. In fig. 4, pre-training sample data is input into the encoder, which outputs a feature representation divided into 6 parts denoted Z1-Z6; correspondingly, the reference data is divided into 6 phrases denoted C1-C6. C2 and C3 may be masked; Z1-Z6 are input into the decoder, which then predicts C2 and C3, and the output of the decoder is used to adjust its parameters.
It should be noted that, optionally, noise may be added to the pre-training reference sample by using a Masked Region Classification Model (MRC) with linguistic cues. Two example implementations of adding noise to the pre-training reference sample are described below, followed by a short sketch of both.
Optionally, in a first implementation, adding noise to the pre-training reference sample to obtain a pre-training reference sample containing noise includes: dividing the pre-training reference sample into at least two phrases and shuffling the at least two phrases to obtain the pre-training reference sample containing noise.
Optionally, in a second implementation, adding noise to the pre-training reference sample to obtain a pre-training reference sample containing noise includes: performing phrase deletion or phrase masking on the pre-training reference sample to obtain the pre-training reference sample containing noise.
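Both implementations can be sketched in a few lines (the phrase boundaries, the noise probability and the mask token are illustrative choices, not values specified by this application):

```python
import random

def shuffle_phrases(phrases, seed=None):
    # First implementation: divide the reference sample into phrases and
    # shuffle them.
    rng = random.Random(seed)
    noisy = list(phrases)
    rng.shuffle(noisy)
    return noisy

def delete_or_mask_phrases(phrases, noise_prob=0.3, mask_token=None, seed=None):
    # Second implementation: delete phrases (mask_token=None) or replace
    # them with a mask token.
    rng = random.Random(seed)
    noisy = []
    for phrase in phrases:
        if rng.random() < noise_prob:
            if mask_token is not None:
                noisy.append(mask_token)  # phrase masking
            # otherwise the phrase is deleted entirely
        else:
            noisy.append(phrase)
    return noisy

reference = ["C1", "C2", "C3", "C4", "C5", "C6"]  # as in fig. 4
print(shuffle_phrases(reference, seed=0))
print(delete_or_mask_phrases(reference, mask_token="[MASK]", seed=0))
```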
Optionally, after steps 201 to 204, the decoder may be further trained according to different decoding tasks so as to implement those tasks. For example, after the decoder in the neural network model is pre-trained by using the feature representation and the pre-training reference sample, the method further includes: acquiring model training sample data and inputting the model training sample data into the pre-trained encoder to obtain a feature representation; and inputting the feature representation and a model reference sample corresponding to the decoding task into the pre-trained decoder and training the pre-trained decoder. After the decoder is further trained for a specific decoding task, it can solve different problems; for example, commodity selling points can be generated automatically, search results can be pushed, and question-and-answer information can be pushed. Three specific examples are given below, followed by a fine-tuning sketch that fits all three:
Optionally, in a first example, the neural network model is used to generate commodity selling points, the model training sample data includes multi-modal data of a commodity, and the model reference sample includes selling point data of the commodity. Inputting the feature representation and the model reference sample corresponding to the decoding task into the pre-trained decoder and training the pre-trained decoder includes: inputting the feature representation and the selling point data of the corresponding commodity into the pre-trained decoder and training the decoder. The multi-modal data of the commodity is input into the encoder to obtain the feature representation of the corresponding commodity, and the feature representation of the commodity and the selling point data of the corresponding commodity are input into the decoder to train the decoder. After the decoder is trained, the trained neural network model is obtained; once the multi-modal data of a commodity is input into the neural network model, the selling point data of the commodity can be generated automatically.
Optionally, in a second example, the neural network model is used for intelligent search, the model training sample data includes multi-modal data for searching, and the model reference sample includes a search result. Inputting the feature representation and the model reference sample corresponding to the decoding task into the pre-trained decoder and training the pre-trained decoder includes: inputting the feature representation and the corresponding search result into the pre-trained decoder and training the decoder. The multi-modal data for searching is input into the encoder to obtain the corresponding feature representation, and the feature representation and the corresponding search result are input into the decoder to train the decoder. After the decoder is trained, the trained neural network model is obtained; once multi-modal data for searching is input into the neural network model, a search result can be obtained automatically.
Optionally, in a third example, the neural network model is used for intelligent question answering, the model training sample data includes multi-modal data for questioning, and the model reference sample includes question-and-answer data. Inputting the feature representation and the model reference sample corresponding to the decoding task into the pre-trained decoder and training the pre-trained decoder includes: inputting the feature representation and the corresponding question-and-answer data into the pre-trained decoder and training the decoder. The multi-modal data for questioning is input into the encoder to obtain the corresponding feature representation, and the feature representation and the corresponding question-and-answer data are input into the decoder to train the decoder. After the decoder is trained, the trained neural network model is obtained; once multi-modal data for questioning is input into the neural network model, question-and-answer data can be obtained, that is, relevant questions and answers are pushed automatically according to the multi-modal data input by the user.
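In all three examples the fine-tuning loop has the same shape; only the task-specific reference changes. A sketch follows, again assuming the hypothetical encoder/decoder interfaces used above; whether the encoder is also updated during fine-tuning is not specified by the application, so it is kept fixed here for consistency with the pre-training stage:

```python
import torch
import torch.nn as nn

def finetune_decoder(encoder, decoder, task_samples, lr=5e-5):
    # task_samples yields (text, image, reference_in, reference_ids), where
    # the reference is selling-point data, a search result or
    # question-and-answer data, depending on the decoding task.
    for p in encoder.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for text, image, reference_in, reference_ids in task_samples:
        with torch.no_grad():
            features = encoder(text, image)   # feature representation
        logits = decoder(reference_in, features)
        loss = criterion(logits.flatten(0, 1), reference_ids.flatten())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return decoder
```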
Based on the above three examples, after the pre-training of the encoder and the decoder is completed, the decoder may be further trained according to the decoding task, and after that training is completed, different decoding tasks can be performed by using the trained neural network model. A specific application scenario is described here. As shown in fig. 5, the application scenario includes a terminal device 501, a cloud 502 and a user 503. It should be noted that the terminal device 501 may access a network and connect to the cloud 502 through the network. In the present application, the network includes a Local Area Network (LAN), a Wide Area Network (WAN) and mobile communication networks, such as the World Wide Web (WWW), Long Term Evolution (LTE) networks, 2G (2nd-generation), 3G (3rd-generation) and 5G (5th-generation) mobile networks, etc. Of course, this is merely an example and does not limit the present application. The cloud 502 may include the model training apparatus 101 shown in fig. 1, which may be a server, a relay device, a Device-to-Device (D2D) device, and so on.
The user 503 inputs multi-modal data on the terminal device 501, and the terminal device 501 transmits the multi-modal data to the cloud 502. The cloud 502 processes the multi-modal data by using the trained neural network model: specifically, the encoder performs feature extraction on the multi-modal data to obtain the corresponding feature representation, and the decoder then parses the feature representation to obtain the corresponding model output. The cloud 502 returns the model output to the terminal device 501, and the user can view it on the terminal device 501. For example, a user inputs pictures and text of a commodity on the terminal device 501, and the terminal device 501 interacts with the cloud 502 to show the selling points of the commodity to the user; for another example, a user inputs multi-modal data for searching on the terminal device 501, and the terminal device 501 interacts with the cloud 502 to display a search result to the user; for another example, a user inputs multi-modal data for questioning on the terminal device 501, and the terminal device 501 interacts with the cloud 502 to display question-and-answer data to the user, that is, to push questions and answers automatically. Of course, the above is merely illustrative.
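On the cloud side, serving a request then reduces to one encoder pass and one decoder pass; the generate() method below is an assumed autoregressive decoding interface, not one defined by this application:

```python
import torch

def serve_request(encoder, decoder, text, image):
    # The encoder extracts the feature representation from the user's
    # multi-modal input; the decoder parses it into the model output
    # (selling points, a search result, or question-and-answer data).
    with torch.no_grad():
        features = encoder(text, image)
        return decoder.generate(features)  # assumed decoding interface
```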
According to the model training method provided in this embodiment of the present application, pre-training sample data comprising multi-modal data is acquired; an encoder in a neural network model is pre-trained by using the pre-training sample data to obtain a pre-trained encoder; a feature representation output after the pre-trained encoder processes the pre-training sample data, and a pre-training reference sample corresponding to the feature representation, are acquired; and a decoder in the neural network model is pre-trained by using the feature representation and the pre-training reference sample. The encoder is pre-trained first, and the decoder is then trained by using the feature representation output by the pre-trained encoder. Training the encoder and the decoder in stages makes model training easier to converge and improves the model training effect; because no manual annotation is needed, labor cost is reduced and the training efficiency of the model is improved.
Example two
Based on the method described in the first embodiment, a second embodiment of the present application provides a model training apparatus for performing that method. Referring to fig. 6, the model training apparatus 60 includes:
a sample module 601, configured to obtain pre-training sample data, where the pre-training sample data includes multi-modal data;
an encoder module 602, configured to pre-train an encoder in a neural network model by using the pre-training sample data to obtain a pre-trained encoder;
a feature representation module 603, configured to acquire a feature representation output after the pre-trained encoder processes the pre-training sample data, and a pre-training reference sample corresponding to the feature representation;
a decoder module 604 for pre-training a decoder in the neural network model using the feature representation and the pre-training reference samples.
Optionally, in an embodiment of the present application, the encoder module 602 is configured to add noise to the data of at least one modality in the pre-training sample data to obtain pre-training sample data containing noise, input the pre-training sample data containing noise into the encoder, and pre-train the encoder to obtain the pre-trained encoder.
Optionally, in an embodiment of the present application, the encoder module 602 is configured to mask data of at least one modality in the pre-training sample data to obtain the pre-training sample data containing noise.
Optionally, in an embodiment of the present application, the decoder module 604 is configured to add noise to the pre-training reference sample to obtain a pre-training reference sample containing noise, and input the pre-training reference sample containing noise and the corresponding feature representation into the decoder to pre-train the decoder.
Optionally, in an embodiment of the present application, the decoder module 604 is configured to input the feature representation into the decoder to obtain the corresponding decoder output, compare the decoder output with the pre-training reference sample, and adjust the parameters of the decoder according to the comparison result with the parameters of the pre-trained encoder fixed.
Optionally, in an embodiment of the present application, the decoder module 604 is configured to divide the pre-training reference sample into at least two phrases, shuffle the at least two phrases, and obtain the pre-training reference sample containing noise.
Optionally, in an embodiment of the present application, the decoder module 604 is configured to perform phrase deletion or phrase masking on the pre-training reference samples and obtain the pre-training reference samples containing noise.
Optionally, in an embodiment of the present application, as shown in fig. 6, the model training apparatus 60 further includes a training module 605, configured to acquire model training sample data, input the model training sample data into the pre-trained encoder to obtain a feature representation, and input the feature representation and the model reference sample corresponding to the decoding task into the pre-trained decoder to train the pre-trained decoder.
Optionally, in an embodiment of the present application, the model training sample data comprises multi-modal data of a commodity, and the model reference sample comprises selling point data of the commodity; the training module 605 is configured to input the feature representation and the selling point data of the corresponding commodity into the pre-trained decoder and train the decoder.
Optionally, in an embodiment of the present application, the model training sample data comprises multi-modal data for searching, and the model reference sample comprises a search result; the training module 605 is configured to input the feature representation and the corresponding search result into the pre-trained decoder and train the decoder.
Optionally, in an embodiment of the present application, the model training sample data comprises multi-modal data for questioning, and the model reference sample comprises question-and-answer data; the training module 605 is configured to input the feature representation and the corresponding question-and-answer data into the pre-trained decoder and train the decoder.
According to the model training apparatus provided in this embodiment of the present application, pre-training sample data comprising multi-modal data is acquired; an encoder in a neural network model is pre-trained by using the pre-training sample data to obtain a pre-trained encoder; a feature representation output after the pre-trained encoder processes the pre-training sample data, and a pre-training reference sample corresponding to the feature representation, are acquired; and a decoder in the neural network model is pre-trained by using the feature representation and the pre-training reference sample. The encoder is pre-trained first, and the decoder is then trained by using the feature representation output by the pre-trained encoder. Training the encoder and the decoder in stages makes model training easier to converge and improves the model training effect; because no manual annotation is needed, labor cost is reduced and the training efficiency of the model is improved.
Example three
Based on the method described in the first embodiment, a third embodiment of the present application provides an electronic device for executing the method described in the first embodiment. Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device provided in the third embodiment of the present application; the specific embodiments of the present application do not limit the specific implementation of the electronic device.
As shown in fig. 7, the electronic device may include: a processor (processor)702, a Communications Interface 704, a memory 706, and a communication bus 708.
Wherein:
the processor 702, communication interface 704, and memory 706 communicate with each other via a communication bus 708.
A communication interface 704 for communicating with other electronic devices, such as a terminal device or a server.
The processor 702 is configured to execute the program 710, and may specifically execute the relevant steps in the foregoing method embodiments.
In particular, the program 710 may include program code that includes computer operating instructions.
The processor 702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The electronic device includes one or more processors, which may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 706 stores a program 710. The memory 706 may comprise high-speed RAM memory, and may also include non-volatile memory, such as disk memory.
The program 710 may be specifically configured to cause the processor 702 to execute any one of the methods of the first embodiment.
For the specific implementation of each step in the program 710, reference may be made to the corresponding steps and the descriptions of the corresponding units in the above model training method embodiment, which are not repeated here. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments, which are likewise not repeated here.
The electronic device provided in this embodiment of the present application acquires pre-training sample data comprising multi-modal data; pre-trains an encoder in a neural network model by using the pre-training sample data to obtain a pre-trained encoder; acquires a feature representation output after the pre-trained encoder processes the pre-training sample data, and a pre-training reference sample corresponding to the feature representation; and pre-trains a decoder in the neural network model by using the feature representation and the pre-training reference sample. The encoder is pre-trained first, and the decoder is then trained by using the feature representation output by the pre-trained encoder. Training the encoder and the decoder in stages makes model training easier to converge and improves the model training effect; because no manual annotation is needed, labor cost is reduced and the training efficiency of the model is improved.
Example four
Based on the method described in the first embodiment, a fourth embodiment of the present application provides a computer storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method described in the first embodiment.
The computer storage medium provided in this embodiment of the present application stores a program by which pre-training sample data comprising multi-modal data is acquired; an encoder in a neural network model is pre-trained by using the pre-training sample data to obtain a pre-trained encoder; a feature representation output after the pre-trained encoder processes the pre-training sample data, and a pre-training reference sample corresponding to the feature representation, are acquired; and a decoder in the neural network model is pre-trained by using the feature representation and the pre-training reference sample. The encoder is pre-trained first, and the decoder is then trained by using the feature representation output by the pre-trained encoder. Training the encoder and the decoder in stages makes model training easier to converge and improves the model training effect; because no manual annotation is needed, labor cost is reduced and the training efficiency of the model is improved.
It should be noted that, according to implementation requirements, each component/step described in the embodiments of the present application may be divided into more components/steps, and two or more components/steps or partial operations of components/steps may be combined into new components/steps to achieve the purpose of the embodiments of the present application.
The above-described methods according to the embodiments of the present application may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network and stored in a local recording medium, so that the methods described herein can be processed by software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It can be understood that a computer, a processor, a microprocessor controller or programmable hardware includes a storage component (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor or hardware, the model training method described herein is implemented. Further, when a general-purpose computer accesses code for implementing the model training method illustrated herein, the execution of the code transforms the general-purpose computer into a special-purpose computer for performing the model training method illustrated herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims (14)

1. A method of model training, comprising:
acquiring pre-training sample data, wherein the pre-training sample data comprises multi-modal data;
pre-training an encoder in a neural network model by using the pre-training sample data to obtain a pre-trained encoder;
acquiring a feature representation output after the pre-training completed encoder processes the pre-training sample data, and a pre-training reference sample corresponding to the feature representation;
pre-training a decoder in the neural network model using the feature representation and the pre-training reference samples.
2. The method of claim 1, wherein said pre-training an encoder in a neural network model with the pre-training sample data to obtain a pre-trained encoder comprises:
processing the data of at least one mode in the pre-training sample data by adding noise to obtain pre-training sample data containing noise;
and inputting the pre-training sample data containing the noise into the encoder, and pre-training the encoder to obtain the encoder with the pre-training completed.
3. The method according to claim 2, wherein the processing of adding noise to the data of at least one modality in the pre-training sample data to obtain pre-training sample data containing noise comprises:
and covering data of at least one mode in the pre-training sample data to obtain the pre-training sample data containing the noise.
4. The method of claim 1, wherein said pre-training a decoder in the neural network model using the feature representations and the pre-training reference samples comprises:
adding noise to the pre-training reference sample to obtain a pre-training reference sample containing noise;
and inputting the pre-training reference sample containing the noise and the corresponding feature representation into the decoder to pre-train the decoder.
5. The method of claim 4, wherein said inputting the pre-training reference samples containing noise and the corresponding feature representations into the decoder, pre-training the decoder, comprises:
and inputting the feature representation into the decoder to obtain the output of the corresponding decoder, comparing the output of the decoder with the pre-training reference sample, and adjusting the parameters of the decoder on the premise of fixing the parameters of the coder which is pre-trained according to the comparison result.
6. The method of claim 4, wherein the subjecting the pre-training reference sample to noise addition to obtain a pre-training reference sample containing noise comprises:
and dividing the pre-training reference sample into at least two phrases, disordering the at least two phrases, and obtaining the pre-training reference sample containing the noise.
7. The method of claim 4, wherein the subjecting the pre-training reference sample to noise addition to obtain a pre-training reference sample containing noise comprises:
and carrying out phrase deletion or phrase covering on the pre-training reference sample, and obtaining the pre-training reference sample containing the noise.
8. The method of claim 1, wherein after the pre-training a decoder in the neural network model using the feature representations and the pre-training reference samples, further comprising:
obtaining model training sample data, inputting the model training sample data into the pre-trained encoder to obtain characteristic representation; and inputting the model reference sample corresponding to the feature representation and the decoding task into the pre-trained decoder, and training the pre-trained decoder.
9. The method of claim 8, wherein the model training sample data comprises multi-modal data of a commodity, and the model reference sample comprises selling point data of the commodity;
inputting the model reference sample corresponding to the feature representation and the decoding task into the pre-trained decoder, and training the pre-trained decoder, wherein the training comprises:
and inputting the feature representation and the corresponding selling point data of the commodity into the pre-trained decoder, and training the decoder.
10. The method of claim 8, wherein the model training sample data comprises multi-modal data for searching, and the model reference sample comprises search results;
inputting the model reference sample corresponding to the feature representation and the decoding task into the pre-trained decoder, and training the pre-trained decoder, wherein the training comprises:
and inputting the feature representation and the corresponding search result into the pre-trained decoder, and training the decoder.
11. The method of claim 8, wherein the model training sample data comprises multi-modal data for questioning, and the model reference sample comprises question and answer data;
inputting the model reference sample corresponding to the feature representation and the decoding task into the pre-trained decoder, and training the pre-trained decoder, wherein the training comprises:
and inputting the feature representation and the corresponding question and answer data into the pre-trained decoder, and training the decoder.
12. A model training apparatus comprising:
the device comprises a sample module, a pre-training module and a pre-training module, wherein the sample module is used for acquiring pre-training sample data which comprises multi-modal data;
the encoder module is used for pre-training an encoder in the neural network model by using the pre-training sample data to obtain a pre-trained encoder;
the characteristic representation module is used for acquiring a characteristic representation output after the pre-training completed encoder processes the pre-training sample data and a pre-training reference sample corresponding to the characteristic representation;
a decoder module for pre-training a decoder in the neural network model using the feature representation and the pre-training reference sample.
13. An electronic device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the model training method according to any one of claims 1-11.
14. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the model training method as claimed in any one of claims 1 to 11.
CN202011505035.2A 2020-12-18 2020-12-18 Model training method and device, electronic equipment and computer storage medium Pending CN114648110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011505035.2A CN114648110A (en) 2020-12-18 2020-12-18 Model training method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011505035.2A CN114648110A (en) 2020-12-18 2020-12-18 Model training method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN114648110A true CN114648110A (en) 2022-06-21

Family

ID=81990142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011505035.2A Pending CN114648110A (en) 2020-12-18 2020-12-18 Model training method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN114648110A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116700684A (en) * 2022-09-30 2023-09-05 荣耀终端有限公司 Code generation method and terminal
CN116700684B (en) * 2022-09-30 2024-04-12 荣耀终端有限公司 Code generation method and terminal


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination