CN115358304A - Training method of label generation model, label generation method and related equipment - Google Patents

Training method of label generation model, label generation method and related equipment

Info

Publication number
CN115358304A
CN115358304A (application number CN202210957721.6A)
Authority
CN
China
Prior art keywords
training
training samples
model
label
generation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210957721.6A
Other languages
Chinese (zh)
Inventor
杜正印
侯林凯
马航航
袁泽寰
卢靓妮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202210957721.6A priority Critical patent/CN115358304A/en
Publication of CN115358304A publication Critical patent/CN115358304A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

The disclosure provides a training method for a label generation model, a label generation method, and related devices. The training method for the label generation model comprises the following steps: obtaining a plurality of training samples, each training sample including a generation time; constructing a loss function according to the generation times corresponding to the plurality of training samples; and training an initial model according to the training samples in combination with the loss function to obtain the label generation model.

Description

Training method of label generation model, label generation method and related equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a training method for a label generation model, a label generation method, and a related device.
Background
Generating a corresponding Tag (Tag) for the content may enable a user to determine a rough classification of the content based on the Tag, thereby facilitating the user's selection of the content. In the related art, one implementation of generating tags for content is to utilize a machine learning model to generate tags based on content.
However, the inventors of the present disclosure found that the labels generated by the machine learning models employed in the related art are of only mediocre quality.
Disclosure of Invention
The present disclosure provides a training method for a label generation model, a method for generating a label, and related devices, so as to solve or partially solve the above problems.
In a first aspect of the present disclosure, a training method for a label generation model is provided, including:
obtaining a plurality of training samples, the training samples including a generation time;
constructing a loss function according to a plurality of generation times corresponding to the training samples; and
training an initial model according to the training samples in combination with the loss function to obtain the label generation model.
In a second aspect of the present disclosure, a method for generating a tag is provided, including:
acquiring target data; and
inputting the target data into a label generation model trained using the method of the first aspect, to obtain a label for the target data.
In a third aspect of the present disclosure, a training apparatus for a label generation model is provided, including:
an acquisition module configured to obtain a plurality of training samples, each training sample including a generation time;
a construction module configured to construct a loss function according to the generation times corresponding to the plurality of training samples; and
a training module configured to train an initial model according to the training samples in combination with the loss function to obtain the label generation model.
In a fourth aspect of the present disclosure, an apparatus for generating a tag is provided, including:
an acquisition module configured to acquire target data; and
a generation module configured to input the target data into a label generation model trained using the method of the first aspect, to obtain a label for the target data.
In a fifth aspect of the disclosure, a computer device is provided, comprising one or more processors, memory; and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, the programs comprising instructions for performing the method according to the first or second aspect.
A sixth aspect of the disclosure provides a non-transitory computer readable storage medium containing a computer program which, when executed by one or more processors, causes the processors to perform the method of the first or second aspect.
In a seventh aspect of the present disclosure, there is provided a computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the method of the first or second aspect.
According to the training method for the label generation model, the method for generating a label, and the related devices provided by the present disclosure, sample-granularity loss weighting is performed during model training, and historical data is fully utilized while accounting for updates in the data distribution, thereby improving the accuracy of sample label recognition.
Drawings
To more clearly illustrate the technical solutions of the present disclosure or the related art, the drawings required in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present disclosure, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 illustrates a schematic diagram of an exemplary system provided by an embodiment of the present disclosure.
Fig. 2A illustrates a flow diagram of an exemplary method provided by an embodiment of the present disclosure.
Fig. 2B shows a flow diagram of an exemplary method according to an embodiment of the present disclosure.
FIG. 3 shows a schematic diagram of an exemplary model training flow, in accordance with an embodiment of the present disclosure.
Fig. 4 shows a flow diagram of another exemplary method provided by an embodiment of the present disclosure.
Fig. 5 shows a hardware structure diagram of an exemplary computer device provided by the embodiment of the present disclosure.
Fig. 6 illustrates a schematic diagram of an exemplary apparatus provided by an embodiment of the present disclosure.
Fig. 7 illustrates a schematic diagram of another exemplary apparatus provided by embodiments of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present disclosure should have a general meaning as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the disclosure is not intended to indicate any order, quantity, or importance, but rather to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word comprises the element or item listed after the word and its equivalent, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
With the continuous development of internet technology, content production platforms keep emerging and upgrading. As content on the various content production platforms appears in endless variety, users can find it overwhelming to choose among it. To help users more easily understand the classification of content when selecting it, one approach is to generate a corresponding tag (Tag) based on the content. In this way, by reading the tag, a user can preliminarily determine whether the content is what they want to view.
In the related art, one kind of content production platform is a multimedia data production platform. The multimedia data may be, for example, video, audio, and so on. However, the shooting method, shooting style, shooting content, and the like of videos can change considerably over time, so the mathematical distribution of the video content also varies considerably. In a video content label recognition scenario, once the label generation model has been trained, it is difficult to incorporate this time-varying information into the model, which greatly affects recognition accuracy.
In an industrial scenario, the model's performance on the current data distribution is the information of greatest interest, while its performance on historical data distributions is relatively unimportant. Thus, the problem can be abstracted as optimizing the current test performance under a time-series training data stream.
In the related art, one solution is to periodically update the model iteratively with new data, thereby adapting the model to the new data distribution. The method helps the model to adapt to new data distribution quickly by retraining the model on newly marked training data regularly, and ensures the effect of the model.
However, this kind of method requires retraining the model, so the amount of data needed from the new data distribution is often large and the cost is high. In addition, such a method updates the model only with new data; it cannot properly use the historical data and forgets information about the historical data distribution.
Another approach is to pre-train the model with historical data and then fine-tune with new data.
As mentioned above, if the periodically trained models all start from scratch, they naturally fit the current distribution better, but the amount of data needed is often larger. Therefore, another approach is to use the model from the previous cycle as a pre-trained model and then fine-tune it with the newly added data.
However, this method requires a relatively complicated procedure, is sensitive to parameters, and depends heavily on experience and parameter tuning. Moreover, since the historical data reaches the new model only through the pre-trained weights, there is still room for improvement in its effect.
In view of this, it is desirable to find a simple and effective method for helping a model to efficiently utilize new data and historical data in a scene where data distribution changes over time, so as to improve the performance of the model in the current data distribution.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 provided by an embodiment of the present disclosure.
As shown in fig. 1, the system 100 may include at least one terminal device (e.g., terminal devices 102, 104), a server 106, and a database server 108. A medium, such as network 110, may be included between terminal devices 102 and 104 and server 106 and database server 108 to provide a communication link. Network 110 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
User 112 may use terminal devices 102 and 104 to interact with server 106 via network 110 to receive or send messages and the like. Various Applications (APP) may be installed on the terminal devices 102 and 104, such as a model training application, a tag generation application, a video application, a social application, a payment application, a web browser, an instant messenger, and so on.
Here, the terminal devices 102 and 104 may be hardware or software. When the terminal devices 102 and 104 are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, MP3 players, laptop portable computers (Laptop), desktop computers (PC), and the like. When the terminal devices 102 and 104 are software, they can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
When the terminal devices 102 and 104 are hardware, a video capture device may also be installed thereon. The video acquisition device can be various devices capable of realizing the function of acquiring video, such as a camera, a sensor and the like. The user 112 may capture video using video capture devices on the end devices 102 and 104.
The server 106 may be a server that provides various services, such as a backend server that provides support for various applications displayed on the terminal devices 102 and 104. The background server may train the initial model using the samples in the sample set sent by the terminal devices 102 and 104, and may send the training result (e.g., the generated label generation model) to the terminal devices 102 and 104. In this way, the user may apply the generated Tag generation model to generate a Tag (Tag).
Database server 108 may also be a database server that provides various services. For example, a database server may have a sample set stored therein. The sample set contains a large number of samples. Wherein the samples may include multimedia data, e.g., video, audio, etc. In this way, user 112 may also select training samples from the sample set stored by database server 108 via terminal devices 102 and 104. It is to be appreciated that database server 108 may not be provided in system 100, as server 106 may perform the related functions of database server 108.
Here, the server 106 and the database server 108 may be hardware or software. When they are hardware, they can be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as multiple pieces or modules (e.g., to provide distributed services) or as a single piece or module. And is not particularly limited herein.
It should be noted that the method for generating a label or the training method for a label generation model provided in the embodiments of the present application are generally performed by the server 106. Accordingly, the means for generating labels or the training means for the label generation model are also typically located in the server 106.
It should be understood that the number of terminal devices, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as desired for implementation.
The embodiment of the disclosure provides a training method for a label generation model, which can solve or partially solve the above problems.
Fig. 2A illustrates a flow diagram of an exemplary method 200 provided by an embodiment of the present disclosure. The method 200 may be implemented by the server 106 of fig. 1. As shown in fig. 2A, the method 200 may further include the following steps.
At step 202, a plurality of training samples may be obtained. Wherein the training sample comprises a generation time.
FIG. 3 illustrates a schematic diagram of an exemplary model training flow 300, according to an embodiment of the present disclosure.
As shown in FIG. 3, a training sample set S may be prepared before training the model, and each training sample in the set S may be denoted s_i. The training sample set S may include all sample data prior to the current time node, e.g., all video data prior to the current time node. It is to be appreciated that, to improve model training efficiency, the number of training samples may be reduced appropriately, for example by selecting as training samples only the sample data generated within a predetermined period (e.g., three months, half a year, one year, etc.) before the current time node.
In some embodiments, as shown in fig. 3, the training sample 302 may include a generation time 3022 of the training sample 302. The generation time 3022 is used to characterize the time at which the training sample 302 was generated. For example, the generation time 3022 may be a shooting time of a video shot by the user 112 with the terminal apparatus 102 or 104 or an upload time of uploading the video to the server 106. In this way, when the model is trained by using the training sample 302 with the generation time 3022, the trained model can reflect the distribution characteristics of the sample based on time, so that the performance of the model on data distribution can be improved.
In some embodiments, the training samples 302 are labeled with labels 3024. The label 3024 may reflect the general classification of the training sample 302. The label used for labeling can be a variety of labels that are preset, such as food exploration shops, life VLOG, and the like. In some embodiments, the tag 3024 may be labeled by the user 112 with the terminal device 102 or 104.
At step 204, an initial model 304 may be built.
In some embodiments, the initial model 304 may be a machine learning model. Since the label generation model 306 finally trained from the initial model 304 needs to generate labels for input data, the initial model 304 may be a classification model, such as a neural network model, a decision tree, a support vector machine, a Bayesian classifier, or the like.
Since the model needs to be optimized based on the loss function when training the model, the loss function 308 may also be constructed in step 206.
In order that the trained label generation model 306 may reflect the time-based distribution characteristics of the samples, in some embodiments, the loss function may be constructed according to the plurality of generation times corresponding to the training samples.
As an alternative embodiment, a plurality of weights corresponding to the plurality of training samples may be determined according to a plurality of generation times corresponding to the plurality of training samples, and then the loss function may be constructed according to the plurality of weights. Therefore, the weight in the loss function is set by using the generation time corresponding to the training sample, so that when the model is optimized based on the loss function, the model obtained by training can reflect the distribution characteristics of the sample based on time, and the performance of the model is improved.
In some embodiments, the loss function takes the form:

L = Σ_i Weight_i · loss_i

As can be seen, the loss function L comprises a plurality of sub-functions loss_i corresponding to the plurality of training samples s_i, and is obtained by multiplying each weight Weight_i by its corresponding sub-function loss_i and summing (Σ) the products.
Here, Weight_i may be a function F of the generation time of the training sample:

Weight_i = F(t_i)

where t_i is the generation time of training sample s_i. When constructing the function F, it is only necessary that it reflect a dependence on the generation time of the training sample; any suitable function may be chosen in actual implementation.
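As an illustrative sketch (not part of the original disclosure), the sample-granularity weighted loss L = Σ_i Weight_i · loss_i can be computed as follows, assuming the per-sample losses and weights have already been evaluated:

```python
def weighted_loss(per_sample_losses, weights):
    """Sample-granularity weighted loss: L = sum_i Weight_i * loss_i."""
    assert len(per_sample_losses) == len(weights)
    return sum(w * l for w, l in zip(weights, per_sample_losses))

# Example: two samples, the more recent one given the larger weight.
total = weighted_loss([0.8, 0.2], [0.5, 1.0])  # 0.5*0.8 + 1.0*0.2 = 0.6
```

In practice the per-sample losses would come from the classification model's sub-functions loss_i, with the weights produced by the time-based function F described below.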
In general, the closer the generation time of a training sample is to the current time (e.g., the test time), the more significant its impact on the model. Therefore, as an alternative embodiment, the relationship between the plurality of weights and the generation times of the plurality of training samples may be a linear relationship.
Optionally, the linear relationship takes the form:

F(t_i) = t_i / t_test

where t_i is the generation time of training sample s_i and t_test is the average of the generation times of the test samples in the test set. As an alternative embodiment, for ease of calculation, these times may be the timestamps carried by the samples. A timestamp is generally the total number of seconds elapsed since 00:00:00 on 1 January 1970, Greenwich Mean Time (08:00:00 on 1 January 1970, Beijing time).
It can be understood that, depending on the choice of test samples in the test set, the weight corresponding to a single training sample may differ. However, when t_i is less than t_test, the further a training sample is from t_test, the smaller its weight and, correspondingly, the smaller its contribution to the model. Therefore, setting the weights in the loss function using this linear relationship enables the trained model to reflect the time-based distribution characteristics of the samples. Accordingly, the selection of test samples may be adjusted during training to ensure model performance. To make as many training samples as possible satisfy the criterion that the farther a sample is from the current time, the smaller its contribution to the model, sample data generated closer to the current time may be selected as the test samples.
In some embodiments, the weights may be related to an elimination period of the training samples. The linear relationship can then be expressed as:

F(t_i) = t_i / t_test, if t_i > T_expel; F(t_i) = 0, if t_i ≤ T_expel

where the elimination period of the training samples is denoted T_expel: when t_i is less than or equal to T_expel, F(t_i) = 0.
It can be seen that the elimination period T_expel is used to discard relatively old data so that it no longer contributes to the loss function. In this way, when the model is optimized, the optimization is completed using data closer to the current time, which better expresses the contribution of new data to the model, so that the finally trained model better matches the characteristics of recent data.
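A minimal sketch of the linear weighting with an elimination period might look as follows (the exact linear form F(t_i) = t_i / t_test is an assumption based on the description above; timestamps are seconds since the Unix epoch):

```python
def linear_weight(t_i, t_test, t_expel=0.0):
    """Linear time-based weight with an elimination period.

    t_i: generation timestamp of training sample s_i (assumed form)
    t_test: average generation timestamp of the test set
    t_expel: samples at or before this timestamp are eliminated (weight 0)
    """
    if t_i <= t_expel:
        return 0.0  # eliminated: no longer contributes to the loss
    return t_i / t_test  # older samples (smaller t_i) get smaller weights

# A sample generated at the average test time gets weight 1.0;
# an older one gets proportionally less; an eliminated one gets 0.
w_new = linear_weight(1_000_000.0, 1_000_000.0)                      # 1.0
w_old = linear_weight(500_000.0, 1_000_000.0)                        # 0.5
w_gone = linear_weight(400_000.0, 1_000_000.0, t_expel=450_000.0)    # 0.0
```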
In some scenarios, data closer to the current time may contribute significantly more to the model. To embody this characteristic, as an alternative embodiment, the relationship between the plurality of weights and the generation times of the plurality of training samples may be a variable-rate decay relationship.
Optionally, the variable-rate decay relationship takes the form:

F(t_i) = (t_i / t_test)^2

where t_i is the generation time of training sample s_i and t_test is the average of the generation times of the test samples in the test set. As an alternative embodiment, for ease of calculation, these times may be the timestamps carried by the samples. A timestamp is generally the total number of seconds elapsed since 00:00:00 on 1 January 1970, Greenwich Mean Time (08:00:00 on 1 January 1970, Beijing time).
It can be understood that, depending on the choice of test samples in the test set, the weight corresponding to a single training sample may differ. However, when t_i is less than t_test, the further a training sample is from t_test, the more significantly its weight and, correspondingly, its contribution to the model are reduced. Therefore, setting the weights in the loss function using this variable-rate decay relationship enables the trained model to reflect the variable-rate-decaying, time-based distribution characteristics of the samples. Accordingly, the selection of test samples may be adjusted during training to ensure model performance. To make as many training samples as possible satisfy the criterion that the farther a sample is from the current time, the smaller its contribution to the model, sample data generated closer to the current time may be selected as the test samples.
It should be noted that, besides the variable-rate decay relationship given by the foregoing formula, there are in fact many functions that can embody such a relationship, for example an exponential function. Accordingly, other types of functions that embody a variable-rate decay relationship also fall within the scope of the present disclosure.
In some embodiments, the weights may be related to an elimination period of the training samples. The variable-rate decay relationship can then be expressed as:

F(t_i) = (t_i / t_test)^2, if t_i > T_expel; F(t_i) = 0, if t_i ≤ T_expel

where the elimination period of the training samples is denoted T_expel: when t_i is less than or equal to T_expel, F(t_i) = 0.
It can be seen that the elimination period T_expel is used to discard relatively old data so that it no longer contributes to the loss function. In this way, when the model is optimized, the optimization is completed using data closer to the current time, which better expresses the contribution of new data to the model, so that the finally trained model better matches the characteristics of recent data.
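As a sketch of one such variable-rate decay weighting, here is an exponential form (mentioned above as one possible choice; the decay rate is an assumed illustrative parameter, not a value from the disclosure):

```python
import math

def decay_weight(t_i, t_test, rate=1e-6, t_expel=0.0):
    """Variable-rate decay weight with an elimination period.

    The weight equals 1.0 when t_i == t_test and decays exponentially
    as the sample's generation time recedes from the test time.
    """
    if t_i <= t_expel:
        return 0.0  # eliminated: no longer contributes to the loss
    return math.exp(-rate * (t_test - t_i))
```

Unlike the gentler linear scheme, the weight here falls off at an accelerating rate as the gap t_test − t_i grows, so old samples are de-emphasized much more aggressively.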
In some embodiments, the weights may also be determined using the effect of the historical model on the test set.
Fig. 2B illustrates a flow diagram of an exemplary method 206 in accordance with an embodiment of the disclosure. As shown in fig. 2B, the step of determining a plurality of weights corresponding to the plurality of training samples according to the generation times of the plurality of training samples in step 206 may further include the following steps.
At step 2062, a first label generation model and a second label generation model corresponding to the first time node and the second time node are determined.
Suppose that time nodes t_{j-1} and t_j correspond to models M_{j-1} and M_j, respectively. Here, model M_{j-1} is the model obtained at time node t_{j-1} by training on data produced before t_{j-1} as training samples, and model M_j is the model obtained at time node t_j by training on data produced before t_j as training samples.
It should be noted that, when selecting a time node, two time nodes may be arbitrarily selected without selecting an adjacent time node, and the corresponding model needs to be the model corresponding to the selected time node.
At step 2064, a first accuracy rate of the first tag generative model and a second accuracy rate of the second tag generative model are determined.
For example, by testing with the test samples of the test set, models M_{j-1} and M_j may be determined to have accuracies P_{j-1} and P_j, respectively.
The accuracy may be calculated as follows: each test sample is input into the model, which predicts a label; if the predicted label matches the true label of the test sample, the correct count is incremented by one, otherwise the error count is incremented by one. This is repeated until all test samples in the test set have been tested, yielding the total correct and error counts; the accuracy is then the total correct count divided by the total number of test samples in the test set.
The selection mode of the test sample in the test set is not limited. In some embodiments, to enable the model to better reflect the characteristics of recent data, data closer to the current time node may be selected as the test sample.
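The accuracy computation described above can be sketched as follows (the stand-in predictor and sample names are purely illustrative):

```python
def accuracy(predict, test_set):
    """Fraction of test samples whose predicted label matches the true label.

    predict: callable mapping a sample's features to a predicted label
    test_set: iterable of (features, true_label) pairs
    """
    correct = 0
    total = 0
    for features, true_label in test_set:
        total += 1
        if predict(features) == true_label:
            correct += 1
    return correct / total

# Example with a trivial stand-in "model" that always predicts "food":
samples = [("clip1", "food"), ("clip2", "vlog"), ("clip3", "food")]
acc = accuracy(lambda features: "food", samples)  # 2 of 3 correct
```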
In step 2066, fitting is performed according to a preset function according to the first accuracy and the second accuracy, and the first time node and the second time node, so as to obtain a target function.
The preset function may be any function. In some embodiments, the preset function may be a linear function or a variable-rate decay function, i.e. any of the embodiments of the function F(t_i) described above. The preset function is then fitted through the two points (t_{j-1}, P_{j-1}) and (t_j, P_j), thereby obtaining the objective function.
In step 2068, a plurality of weights corresponding to the plurality of training samples are determined according to the objective function and by combining the generation time of the training samples.
Thus, after the objective function has been fitted, substituting the generation time t_i of each training sample into the objective function yields the weight corresponding to that training sample.
In some embodiments, more time nodes may be selected; pairs of time nodes adjacent in chronological order are then each fitted as above, yielding a plurality of objective functions. With the selected time nodes as segment endpoints, the final objective function is expressed as a piecewise function, which better reflects the time-dependent characteristics of the data distribution.
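A hedged sketch of fitting a linear objective function through two (time node, accuracy) points as described above (function name and the concrete numbers are illustrative assumptions):

```python
def fit_linear_objective(t_prev, p_prev, t_curr, p_curr):
    """Fit F(t) = a*t + b through (t_prev, p_prev) and (t_curr, p_curr)."""
    a = (p_curr - p_prev) / (t_curr - t_prev)
    b = p_prev - a * t_prev
    return lambda t: a * t + b

# Fit through (t_{j-1}, P_{j-1}) = (100, 0.70) and (t_j, P_j) = (200, 0.90),
# then read off the weight for a training sample generated at t_i = 150.
objective = fit_linear_objective(100.0, 0.70, 200.0, 0.90)
weight = objective(150.0)  # 0.80
```

Extending this to more time nodes, one objective function per adjacent pair, gives the piecewise function described above.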
In step 208, an initial model may be trained according to the training samples in combination with the loss function, to obtain the label generation model.
As an alternative embodiment, a gradient descent method may be used to optimize the model using the aforementioned loss function L, so as to obtain the label generation model 306.
It is understood that there are many ways to optimize the model, and besides the gradient descent method, a Momentum method (Momentum), an RMSprop (Root Mean Square prop) method, an Adam method, an AdamW method, and the like may be used.
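For illustration, a bare-bones gradient-descent update using the time-weighted loss might look like this (a sketch over assumed toy data, not the disclosed implementation):

```python
def gradient_descent_step(params, grads, lr=0.1):
    """One gradient-descent update: p <- p - lr * dL/dp."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy example: minimize the weighted squared error of a single parameter w
# over samples (x_i, y_i) with time-based weights weight_i.
xs, ys, weights = [1.0, 2.0], [2.0, 4.0], [0.5, 1.0]
w = 0.0
for _ in range(200):
    # dL/dw for L = sum_i weight_i * (w*x_i - y_i)^2
    grad = sum(wt * 2 * (w * x - y) * x for wt, x, y in zip(weights, xs, ys))
    (w,) = gradient_descent_step([w], [grad], lr=0.05)
# w converges to 2.0, the slope that fits both samples exactly
```

In a real system the scalar parameter would be replaced by the classification model's parameters, and Momentum, RMSprop, Adam, or AdamW would substitute for the plain update rule.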
As can be seen from the foregoing embodiments, the training method for the label generation model provided by the embodiments of the present disclosure performs sample-granularity loss weighting during model training and makes full use of historical data while accounting for updates in the data distribution, thereby improving the accuracy of sample label recognition.
It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above describes some embodiments of the disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments of the present disclosure further provide a label generation method that produces labels with higher accuracy.
Fig. 4 illustrates a flow diagram of an exemplary method 400 provided by an embodiment of the present disclosure. The method 400 may be implemented by the server 106 of fig. 1. As shown in fig. 4, the method 400 may include the following steps.
At step 402, target data, such as any multimedia data (video, audio, etc.), may be obtained.
In step 404, the target data may be input into a tag generation model to obtain a tag of the target data.
Wherein the label generation model is trained according to training samples containing a generation time and a loss function constructed according to the generation times.
In the label generation method provided above, the label generation model used applies per-sample loss weighting during training and makes full use of historical data while accounting for shifts in the data distribution, so the accuracy of label identification is improved.
In some embodiments, the label generation model may be trained using any of the foregoing embodiments of the training method 200, alone or in combination, and may have the technical effects of the corresponding embodiments, which are not repeated here.
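Purely for illustration (the feature vector, model parameters, label vocabulary, and threshold below are all assumed, not taken from this disclosure), step 404 can be sketched as a thresholded multi-label scoring pass:

```python
import numpy as np

def generate_labels(features, w, b, vocabulary, threshold=0.5):
    """Sketch of step 404: feed target-data features to a trained label
    generation model and keep the labels whose score clears a threshold.
    `vocabulary` maps output positions to label names (illustrative)."""
    scores = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # per-label sigmoid
    return [name for name, s in zip(vocabulary, scores) if s >= threshold]

# toy multi-label head: 2 features -> 3 candidate labels
w = np.array([[2.0, -1.0, 0.5],
              [0.5,  2.0, -2.0]])
b = np.array([-1.0, -1.0, 0.0])
labels = generate_labels(np.array([1.0, 1.0]), w, b,
                         ["sports", "music", "travel"])
print(labels)
```

For real multimedia target data (video, audio, etc.), `features` would be produced by a feature extractor upstream of the label head; that extractor is outside the scope of this sketch.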
The embodiment of the present disclosure further provides a computer device for implementing the method 200, 300, or 400. Fig. 5 shows a hardware structure diagram of an exemplary computer device 500 provided by the embodiments of the present disclosure. Computer device 500 may be used to implement terminal devices 102 and 104 of fig. 1, and may also be used to implement server 106 of fig. 1. In some scenarios, the computer device 500 may also be used to implement the database server 108 of fig. 1.
As shown in fig. 5, the computer device 500 may include: a processor 502, a memory 504, a network interface 506, a peripheral interface 508, and a bus 510. The processor 502, memory 504, network interface 506, and peripheral interface 508 are communicatively coupled to each other within the computer device 500 via the bus 510.
The processor 502 may be a central processing unit (CPU), a graphics processor, a neural network processor (NPU), a microcontroller unit (MCU), a programmable logic device, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or one or more integrated circuits. The processor 502 may be used to perform functions related to the techniques described in this disclosure. In some embodiments, the processor 502 may also include multiple processors integrated into a single logic component. For example, as shown in fig. 5, the processor 502 may include a plurality of processors 502a, 502b, and 502c.
The memory 504 may be configured to store data (e.g., instructions, computer code, etc.). As shown in fig. 5, the data stored by the memory 504 may include program instructions (e.g., for implementing the methods 200, 300, or 400 of embodiments of the present disclosure) as well as data to be processed (e.g., the memory may store configuration files for other modules, etc.). The processor 502 may access the program instructions and data stored in the memory 504 and execute the program instructions to operate on the data to be processed. The memory 504 may include volatile or non-volatile storage devices. In some embodiments, the memory 504 may include random access memory (RAM), read-only memory (ROM), optical disks, magnetic disks, hard disks, solid state disks (SSDs), flash memory, memory sticks, and the like.
The network interface 506 may be configured to provide the computer device 500 with communications with other external devices via a network. The network may be any wired or wireless network capable of transmitting and receiving data. For example, the network may be a wired network, a local wireless network (e.g., Bluetooth, Wi-Fi, Near Field Communication (NFC), etc.), a cellular network, the Internet, or a combination of the above. It is to be understood that the type of network is not limited to the specific examples described above.
Peripheral interface 508 may be configured to connect computer device 500 with one or more peripheral devices for the input and output of information. For example, the peripheral devices may include input devices such as keyboards, mice, touch pads, touch screens, microphones, various sensors, and output devices such as displays, speakers, vibrators, indicator lights, and the like.
The bus 510 may be configured to transfer information between various components of the computer device 500 (e.g., the processor 502, the memory 504, the network interface 506, and the peripheral interface 508), such as an internal bus (e.g., a processor-memory bus), an external bus (a USB port, a PCI-E bus), and so forth.
It should be noted that although the architecture of the computer device 500 described above only shows the processor 502, the memory 504, the network interface 506, the peripheral interface 508 and the bus 510, in a specific implementation, the architecture of the computer device 500 may also include other components necessary for normal operation. Moreover, those skilled in the art will appreciate that the architecture of the computer device 500 described above may also include only the components necessary to implement the embodiments of the present disclosure, and need not include all of the components shown in the figures.
The embodiments of the present disclosure further provide a training apparatus for a label generation model. Fig. 6 illustrates a schematic diagram of an exemplary apparatus 600 provided by an embodiment of the present disclosure. As shown in fig. 6, the apparatus 600 may be used to implement the method 200 or 300 and may include the following modules.
An acquisition module 602 configured to obtain a plurality of training samples, the training samples including a generation time;
a build module 604 configured to construct a loss function according to a plurality of generation times corresponding to the plurality of training samples; and
a training module 606 configured to train an initial model according to the plurality of training samples in combination with the loss function to obtain the label generation model.
In some embodiments, the build module 604 is configured to: determine a plurality of weights corresponding to the plurality of training samples according to a plurality of generation times corresponding to the plurality of training samples; and construct the loss function according to the plurality of weights.
In some embodiments, the loss function includes a plurality of sub-functions corresponding to the plurality of training samples, and the loss function is obtained by multiplying each of the plurality of weights by its corresponding sub-function.
In some embodiments, the relationship between the plurality of weights and the generation time of the plurality of training samples is a linear relationship or a variable rate decay relationship.
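As an illustrative sketch only (the decay rate, elimination period, and time unit below are assumed values, not prescribed by this disclosure), the two weighting relationships may be expressed as:

```python
import numpy as np

def linear_weights(ages, max_age):
    """Linear relationship: the weight decreases linearly with sample age
    (current time minus generation time) and is floored at 0, so samples
    older than max_age (an elimination period) contribute nothing."""
    return np.clip(1.0 - ages / max_age, 0.0, 1.0)

def decay_weights(ages, half_life):
    """Variable-rate decay relationship: the weight decays exponentially,
    so the decline per unit time itself changes with age and the weight
    never quite reaches 0."""
    return 0.5 ** (ages / half_life)

ages = np.array([0.0, 30.0, 60.0, 120.0])   # days since generation (assumed unit)
print(linear_weights(ages, max_age=90.0))   # the 120-day-old sample gets weight 0
print(decay_weights(ages, half_life=30.0))  # halves every 30 days
```

Either scheme realizes the idea above: older samples still contribute to training, but with smaller loss weights than recent samples.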
In some embodiments, the build module 604 is configured to: determine a first label generation model and a second label generation model corresponding to a first time node and a second time node; determine a first accuracy rate of the first label generation model and a second accuracy rate of the second label generation model; fit a preset function to the first accuracy rate and the second accuracy rate together with the first time node and the second time node to obtain an objective function; and determine a plurality of weights corresponding to the plurality of training samples according to the objective function in combination with the generation times of the training samples.
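The fitting step can be sketched as follows, assuming purely for illustration that the preset function has the exponential form acc(age) = A · exp(-k · age); with two (time node, accuracy) observations the two parameters can be solved in closed form, and the resulting objective function then yields per-sample weights:

```python
import numpy as np

def fit_objective(ages, accs):
    """Fit the preset function acc(age) = A * exp(-k * age) to two
    (age, accuracy) observations of the first and second label
    generation models; returns decay rate k and scale A.
    (The exponential form is an assumed choice of 'preset function'.)"""
    (d1, d2), (a1, a2) = ages, accs
    k = np.log(a1 / a2) / (d2 - d1)   # closed-form solution from two points
    A = a1 * np.exp(k * d1)
    return k, A

def sample_weights(gen_ages, k):
    """Objective function applied to each training sample's age,
    normalized so a brand-new sample gets weight 1."""
    return np.exp(-k * gen_ages)

# assumed example: the first model is 10 days old with 0.90 accuracy,
# the second is 40 days old with 0.80 accuracy
k, A = fit_objective((10.0, 40.0), (0.90, 0.80))
print(round(k, 4))
print(np.round(sample_weights(np.array([0.0, 10.0, 40.0]), k), 3))
```

In practice the preset function, the number of time nodes, and the evaluation protocol are design choices; with more than two nodes, a least-squares fit would replace the closed-form solution.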
In some embodiments, the weights are related to an elimination period of the training samples.
In some embodiments, the training samples are multimedia data labeled with labels.
For convenience of description, the above apparatus is described as divided into various modules by function, each described separately. Of course, the functionality of the various modules may be implemented in the same one or more software and/or hardware implementations of the present disclosure.
The apparatus of the foregoing embodiment is used to implement the corresponding method 200 or 300 in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
The embodiments of the present disclosure further provide a label generation apparatus. Fig. 7 illustrates a schematic diagram of an exemplary apparatus 700 provided by an embodiment of the present disclosure. As shown in fig. 7, the apparatus 700 may be used to implement the method 400 and may include the following modules.
An acquisition module 702 configured to acquire target data; and
a generation module 704 configured to input the target data into a label generation model to obtain a label of the target data;
wherein the label generation model is trained according to training samples containing a generation time and a loss function constructed according to the generation times.
In some embodiments, the label generation model is trained by:
obtaining a plurality of training samples, the training samples including a generation time;
constructing a loss function according to a plurality of generation times corresponding to the training samples; and
and training an initial model according to the plurality of training samples in combination with the loss function to obtain the label generation model.
In some embodiments, constructing the loss function according to a plurality of generation times corresponding to the plurality of training samples includes:
determining a plurality of weights corresponding to the plurality of training samples according to a plurality of generation times corresponding to the plurality of training samples; and
and constructing the loss function according to the plurality of weights.
In some embodiments, the loss function includes a plurality of sub-functions corresponding to the plurality of training samples, and the loss function is obtained by multiplying each of the plurality of weights by its corresponding sub-function.
In some embodiments, the relationship between the plurality of weights and the generation time of the plurality of training samples is a linear relationship or a variable rate decay relationship.
In some embodiments, determining a plurality of weights corresponding to the plurality of training samples according to the generation times of the plurality of training samples includes:
determining a first label generation model and a second label generation model corresponding to the first time node and the second time node;
determining a first accuracy rate of the first label generation model and a second accuracy rate of the second label generation model;
fitting a preset function to the first accuracy rate and the second accuracy rate together with the first time node and the second time node to obtain an objective function; and
determining a plurality of weights corresponding to the plurality of training samples according to the objective function in combination with the generation times of the training samples.
In some embodiments, the weights are related to a period of elimination of the training samples.
In some embodiments, the training samples are multimedia data labeled with a label.
For convenience of description, the above apparatus is described as divided into various modules by function, each described separately. Of course, the functionality of the various modules may be implemented in the same one or more software and/or hardware implementations of the present disclosure.
The apparatus of the foregoing embodiment is used to implement the method 400 corresponding to any one of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the embodiment methods described above, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method 200, 300, or 400 according to any of the above embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the above embodiment are used to enable the computer to execute the method 200, 300, or 400 according to any embodiment, and have the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, the present disclosure also provides a computer program product comprising a computer program corresponding to any of the embodiment methods 200, 300, or 400 described above. In some embodiments, the computer program is executable by one or more processors to cause the processors to perform the methods 200, 300, or 400. Corresponding to the execution subject of each step in the embodiments of the method 200, 300, or 400, the processor executing a given step may belong to the corresponding execution subject.
The computer program product of the foregoing embodiment is used to enable a processor to execute the method 200, 300, or 400 according to any of the foregoing embodiments, and has the advantages of corresponding method embodiments, which are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the spirit of the present disclosure, technical features of the above embodiments or of different embodiments may be combined, steps may be implemented in any order, and many other variations of the different aspects of the embodiments of the present disclosure exist as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the disclosure. Further, devices may be shown in block diagram form in order to avoid obscuring embodiments of the disclosure, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that the embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made without departing from the spirit or scope of the embodiments of the present disclosure are intended to be included within the scope of the disclosure.

Claims (13)

1. A training method of a label generation model comprises the following steps:
obtaining a plurality of training samples, the training samples including a generation time;
constructing a loss function according to a plurality of generation times corresponding to the training samples; and
training an initial model according to the plurality of training samples in combination with the loss function to obtain the label generation model.
2. The method of claim 1, wherein constructing a loss function based on a plurality of generation times corresponding to the plurality of training samples comprises:
determining a plurality of weights corresponding to the plurality of training samples according to a plurality of generation times corresponding to the plurality of training samples; and
constructing the loss function according to the plurality of weights.
3. The method of claim 2, wherein the loss function comprises a plurality of sub-functions corresponding to the plurality of training samples, and the loss function is obtained by multiplying each of the plurality of weights by its corresponding sub-function.
4. The method of claim 2, wherein the relationship between the plurality of weights and the generation times of the plurality of training samples is a linear relationship or a variable rate of decay relationship.
5. The method of claim 2, wherein determining a plurality of weights corresponding to the plurality of training samples according to the generation times of the plurality of training samples comprises:
determining a first label generation model and a second label generation model corresponding to the first time node and the second time node;
determining a first accuracy rate of the first label generation model and a second accuracy rate of the second label generation model;
fitting a preset function to the first accuracy rate and the second accuracy rate together with the first time node and the second time node to obtain an objective function; and
determining a plurality of weights corresponding to the plurality of training samples according to the objective function in combination with the generation times of the training samples.
6. The method of any of claims 2-5, wherein the weights are related to an elimination period of the training samples.
7. The method of claim 1, wherein the training samples are multimedia data labeled with a label.
8. A method of generating a tag, comprising:
acquiring target data; and
inputting the target data into a label generation model obtained by training according to the method of any one of claims 1-7 to obtain a label of the target data.
9. A training apparatus for a label generation model, comprising:
an acquisition module configured to obtain a plurality of training samples, the training samples including a generation time;
a build module configured to construct a loss function according to a plurality of generation times corresponding to the plurality of training samples; and
a training module configured to train an initial model according to the plurality of training samples in combination with the loss function to obtain the label generation model.
10. An apparatus for generating a label, comprising:
an acquisition module configured to acquire target data; and
a generation module configured to input the target data into a label generation model trained according to the method of any one of claims 1-7, to obtain the label of the target data.
11. A computer device comprising one or more processors, memory; and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, the programs comprising instructions for performing the method of any of claims 1-7 or the method of claim 8.
12. A non-transitory computer-readable storage medium containing a computer program which, when executed by one or more processors, causes the processors to perform the method of any one of claims 1-7 or the method of claim 8.
13. A computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-7 or the method of claim 8.
CN202210957721.6A 2022-08-10 2022-08-10 Training method of label generation model, label generation method and related equipment Pending CN115358304A (en)

Publications (1)

Publication Number Publication Date
CN115358304A true CN115358304A (en) 2022-11-18

Family

ID=84033394



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination