CN117726917A - Model training method, device, electronic equipment and computer storage medium - Google Patents

Info

Publication number
CN117726917A
CN117726917A
Authority
CN
China
Prior art keywords
source
model
training
training data
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311036665.3A
Other languages
Chinese (zh)
Inventor
张威 (Zhang Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaohongshu Technology Co ltd
Original Assignee
Xiaohongshu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaohongshu Technology Co ltd
Priority to CN202311036665.3A
Publication of CN117726917A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application disclose a model training method, a model training apparatus, an electronic device, and a computer storage medium. In the embodiments, a data set of a model to be trained is acquired, the data set comprising training data from at least two sources; feature extraction is performed on the training data of each source to obtain sample features corresponding to the training data of each source; a loss value of the model to be trained is determined for each source according to the sample features corresponding to that source's training data; and the model to be trained is trained according to the loss value of each source to obtain a target model once training is complete. The embodiments of the present application can improve the training effect of the model to be trained.

Description

Model training method, device, electronic equipment and computer storage medium
Technical Field
The present application relates to the technical field of neural network models, and in particular to a model training method, a model training apparatus, an electronic device, and a computer storage medium.
Background
With the development of science and technology, neural network models are applied ever more widely. Before a neural network model can be used, it needs to be trained on data. Training of a neural network model may be supervised or unsupervised. For supervised training, the data needs to be labeled.
At present, data may come from different sources, and the identifiers corresponding to data from different sources may be the same. In that case, data with the same identifier must be clustered and merged before labeling and training, which introduces noise and degrades the training effect of the neural network model.
Disclosure of Invention
The embodiments of the present application provide a model training method, a model training apparatus, an electronic device, and a computer storage medium, which can address the technical problem of poor training effect of a neural network model.
The embodiment of the application provides a model training method, which comprises the following steps:
acquiring a data set of a model to be trained, wherein the data set comprises training data of at least two sources;
extracting features of the training data of each source to obtain sample features corresponding to the training data of each source;
determining a loss value of the model to be trained for each source according to sample characteristics corresponding to the training data of each source;
and training the model to be trained according to the loss value of each source to obtain a target model after the training of the model to be trained is completed.
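The four claimed steps can be illustrated with a minimal, self-contained sketch. The toy one-parameter model, the squared-error loss, the learning rate, and all names below are assumptions made for illustration only, not the concrete implementation of the embodiments:

```python
# Illustrative sketch of steps S101-S104 on a toy one-parameter model.
# The model, the squared-error loss, and the learning rate are assumptions
# made for illustration only.

def train_per_source(dataset, w=0.0, lr=0.05, epochs=50):
    """dataset maps each source to its training data: {source: [(x, y), ...]}."""
    for _ in range(epochs):
        # S102/S103: compute one loss gradient per source from that
        # source's samples only -- no cross-source cluster merging.
        per_source_grad = {}
        for source, samples in dataset.items():
            grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
            per_source_grad[source] = grad
        # S104: update the model according to the loss value of every source.
        for grad in per_source_grad.values():
            w -= lr * grad
    return w

# Toy data from two sources that both follow y = 2x.
dataset = {
    "camera_1": [(1.0, 2.0), (2.0, 4.0)],
    "camera_2": [(3.0, 6.0)],
}
w = train_per_source(dataset)  # converges toward 2.0
```

Because each gradient is averaged within its own source, no identifier clustering across sources is needed, which is the point the embodiments emphasize.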
Accordingly, an embodiment of the present application provides a model training apparatus, including:
the acquisition module is used for acquiring a data set of a model to be trained, wherein the data set comprises training data of at least two sources;
the extraction module is used for extracting the characteristics of the training data of each source to obtain sample characteristics corresponding to the training data of each source;
the determining module is used for determining a loss value of the model to be trained for each source according to the sample characteristics corresponding to the training data of each source;
and the training module is used for training the model to be trained according to the loss value of each source to obtain a target model after the training of the model to be trained is completed.
In addition, the embodiment of the application also provides electronic equipment, which comprises a processor and a memory, wherein the memory stores a computer program, and the processor is used for running the computer program in the memory to realize the model training method provided by the embodiment of the application.
In addition, the embodiment of the application further provides a computer readable storage medium, and the computer readable storage medium stores a computer program, and the computer program is suitable for being loaded by a processor to execute any model training method provided by the embodiment of the application.
In addition, the embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program is executed by a processor to realize any model training method provided by the embodiment of the application.
In the embodiments of the present application, a data set of a model to be trained is acquired, the data set comprising training data from at least two sources; feature extraction is performed on the training data of each source to obtain corresponding sample features; a loss value of the model to be trained is determined for each source according to those sample features; and the model to be trained is trained according to the loss value of each source to obtain a target model. Because a loss value is calculated separately for each source and the model is trained from these per-source loss values, there is no need to cluster and merge the training data of different sources. As a result, no noise is introduced, the workload of clustering and merging is reduced, and both the training efficiency and the training effect of the model to be trained are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a model training method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of loss values for each source provided by embodiments of the present application;
FIG. 3 is a schematic structural diagram of a model training device according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The embodiment of the application provides a model training method, a model training device, electronic equipment and a computer storage medium. The model training device can be integrated in electronic equipment, and the electronic equipment can be a server, a terminal and other equipment.
The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, and basic cloud computing services such as big data and artificial intelligence platforms.
The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
In addition, "plurality" in the embodiments of the present application means two or more. "first" and "second" and the like in the embodiments of the present application are used for distinguishing descriptions and are not to be construed as implying relative importance.
Detailed descriptions follow. The order in which the embodiments are described below is not intended as a limitation on any preferred order of the embodiments.
In this embodiment, the model training method of the present application will be described from the perspective of the model training apparatus. For convenience, the apparatus is described below as integrated in a terminal; that is, the terminal serves as the execution subject.
Referring to fig. 1, fig. 1 is a flow chart of a model training method according to an embodiment of the present application. The model training method may include:
s101, acquiring a data set of a model to be trained, wherein the data set comprises training data of at least two sources.
The model to be trained (a neural network) is a network formed by connecting neurons, where a neuron is a computing unit. The type of the model to be trained may be selected according to the actual situation; for example, it may be a convolutional neural network (CNN) or a recurrent neural network (RNN), which is not limited here.
The function of the model to be trained may be set according to actual situations, for example, the model to be trained may be used for classification or text extraction, which is not limited herein.
The data set including training data from at least two sources can be understood as follows: the data set includes training data carrying source tags, the source tags differ across the training data, and the terminal can determine the source of each item of training data by identifying its source tag.
Since the model to be trained is trained on one batch of training data at a time, the data set of the model to be trained may include at least one batch of training data, and each batch may include training data from at least two sources.
A source may refer to the place and/or time at which the training data was generated. For example, if first training data is collected by a first capture device and second training data by a second capture device, the source of the first training data is the first capture device, the source of the second training data is the second capture device, and the two sources are different. As another example, if first training data is collected by the first capture device at time t1 and second training data by the same device at time t2, the source of the first training data is time t1 and the source of the second is time t2; when the interval between t1 and t2 satisfies a preset interval condition, the two sources are different.
Optionally, training data from different sources may have the same or different formats. For example, the training data of every source may be RGB images; or the sources may include a first source whose training data are RGB images and a second source whose training data are grayscale images.
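As a concrete illustration of how source-tagged training data might be represented and separated per source, a data set could be a flat list of tagged records; the field names below are assumptions made for illustration:

```python
from collections import defaultdict

def group_by_source(records):
    """Group a flat list of {'source': tag, 'sample': data} records
    by their source tag, as the terminal does when identifying sources."""
    groups = defaultdict(list)
    for record in records:
        groups[record["source"]].append(record["sample"])
    return dict(groups)

records = [
    {"source": "camera_1", "sample": "img_a"},
    {"source": "camera_2", "sample": "img_b"},
    {"source": "camera_1", "sample": "img_c"},
]
groups = group_by_source(records)
# groups == {"camera_1": ["img_a", "img_c"], "camera_2": ["img_b"]}
```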
Optionally, when the terminal obtains a training instruction for the model to be trained, the terminal may acquire the data set of the model to be trained from local storage space or from a server, or the terminal may obtain the data set by receiving it from another device.
The manner of acquiring the data set of the model to be trained may be selected according to practical situations, and the embodiments of the present application are not limited herein.
When the amounts of training data from different sources differ greatly, the training effect of the model to be trained is reduced. Thus, in some embodiments, the process of acquiring the data set of the model to be trained may be:
acquiring an initial data set, wherein the initial data set comprises training data of at least two sources;
determining the amount of training data for each source;
screening out, from the initial data set, sources whose amount of training data satisfies a preset number condition, to obtain candidate sources;
performing incremental processing on training data corresponding to the candidate sources to obtain adjusted training data of the candidate sources;
and determining the data set of the model to be trained according to the adjusted training data of the candidate sources and the training data of the other sources.
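One possible reading of the screening step compares each source's sample count with the largest source's count; the ratio threshold below is an assumed form of the "preset number condition", not one fixed by the text:

```python
def find_candidate_sources(groups, ratio=0.5):
    """Return sources whose sample count falls below `ratio` of the
    largest source's count -- one assumed preset number condition."""
    largest = max(len(samples) for samples in groups.values())
    return [s for s, samples in groups.items() if len(samples) < ratio * largest]

# Counts mirror the example in the text: 1,000 / 10 / 800 samples.
groups = {"s1": [0] * 1000, "s2": [0] * 10, "s3": [0] * 800}
candidates = find_candidate_sources(groups)  # ["s2"]
```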
The sources whose amounts satisfy the preset number condition may be sources whose amount of training data differs greatly from that of the other sources in the initial data set. For example, the initial data set includes training data from a first source, a second source, and a third source, with 1,000, 10, and 800 items respectively; the amount of the second source differs greatly from the others, so the second source is a candidate source.
In the embodiments of the present application, an initial data set including training data from at least two sources is acquired; the amount of training data from each source is determined; sources whose amounts satisfy the preset number condition are screened out to obtain candidate sources; the training data of the candidate sources are incrementally processed to obtain adjusted training data; and the data set of the model to be trained is determined from the adjusted training data of the candidate sources together with the training data of the other sources. In this way, the amounts of training data from different sources in the data set do not differ greatly, which improves the training effect of a model trained on data from different sources. The training effect can be understood, according to the actual situation, as at least one of the model's convergence speed, recall, accuracy, and AP value, which is not limited here.
Where incremental processing may refer to processing to increase the amount of training data of the candidate source. The manner of incremental processing may be selected according to practical situations, and embodiments of the present application are not limited herein.
For example, the process of performing incremental processing on the training data corresponding to the candidate source to obtain the adjusted training data of the candidate source may be:
acquiring a preset data augmentation strategy;
and performing incremental processing on training data corresponding to the candidate sources according to a preset data augmentation strategy to obtain adjusted training data of the candidate sources.
The preset data augmentation strategy may be selected according to the actual situation. For example, when the training data are images, the strategy may be cropping, flipping, shrinking, or enlarging; when the training data are text, the strategy may be word replacement or word-order permutation. This is not limited here.
In the embodiments of the present application, incremental processing is performed on the training data of a candidate source according to the preset data augmentation strategy to obtain adjusted training data, so that the amount of training data of the candidate source increases and no longer differs greatly from the amounts of training data of the other sources in the initial data set.
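For text data, the word-replacement strategy mentioned above could be sketched as follows; the synonym table, target count, and function names are illustrative assumptions:

```python
import random

def augment_text(samples, synonyms, target_count, seed=0):
    """Grow `samples` to `target_count` items by replacing words with
    synonyms -- one assumed preset data augmentation strategy."""
    rng = random.Random(seed)
    augmented = list(samples)
    while len(augmented) < target_count:
        base = rng.choice(samples)
        # Replace each word with a random synonym when one exists.
        words = [rng.choice(synonyms.get(w, [w])) for w in base.split()]
        augmented.append(" ".join(words))
    return augmented

synonyms = {"good": ["great", "fine"], "movie": ["film"]}
out = augment_text(["good movie"], synonyms, target_count=3)
# `out` keeps the original sample and adds two augmented variants.
```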
For another example, the incremental processing is performed on the training data corresponding to the candidate source, and the process of obtaining the adjusted training data of the candidate source may also be:
screening sources meeting preset fusion conditions from the data set to obtain sources to be fused;
and carrying out fusion processing on training data corresponding to the candidate sources and training data of the sources to be fused to obtain adjusted training data of the candidate sources.
The source satisfying the preset fusion condition may be determined according to at least one of the number of training data of the source and the type of the source (the type of the source refers to the generation time or the generation place).
For example, the source satisfying the preset fusion condition may be a source whose amount of training data differs from that of the candidate source by an amount satisfying a preset difference condition; the qualifying difference may be the smallest or second-smallest difference, which is not limited here.
For example, the initial data set includes training data from a first source (1,000 items), a second source (10 items), and a third source (800 items), and the second source is the candidate source. The difference between the amount of the third source and that of the second source is the smallest, so the third source is the source to be fused.
Alternatively, when the source is a generation time, the source satisfying the preset fusion condition may be a source whose time difference from the candidate source satisfies a preset time condition. The qualifying time difference may be the smallest or second-smallest, which is not limited here.
For example, the initial training set includes training data at time t1, training data at time t2, and training data at time t3 (the training data at time t1, the training data at time t2, and the training data at time t3 may be data generated by the same device or data generated by different devices), where time t3 is a candidate source, and when time t2 is closest to time t3, time t2 is taken as a source to be fused.
Alternatively, when the source is a generation place, the source satisfying the preset fusion condition may be a source whose distance from the candidate source satisfies a preset distance condition. The qualifying distance may be the largest or second-largest, which is not limited here.
For example, the initial training set includes training data generated by the photographing device of the place d1 at the time t1, training data generated by the photographing device of the place d2 at the time t1, and training data generated by the photographing device of the place d3 at the time t1, where the photographing device of the place d3 is a candidate source, and at this time, the distance between the photographing device of the place d1 and the photographing device of the place d3 is the farthest, and the photographing device of the place d1 is the source to be fused.
After obtaining the source to be fused, the terminal merges the training data of the source to be fused with the training data of the candidate source to obtain the adjusted training data of the candidate source.
The terminal may use the training data of the source to be fused as training data of the candidate source to increase the number of training data corresponding to the candidate source, or may use the training data of the candidate source as training data of the source to be fused to increase the number of training data corresponding to the candidate source.
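The minimum-count-difference selection and the subsequent merging might be sketched as follows (selecting by time or place difference would only change the key function); all names and the container layout are assumptions:

```python
def pick_source_to_fuse(groups, candidate):
    """Pick the source whose sample count is closest to the candidate's
    (the minimum-difference reading of the preset fusion condition)."""
    others = [s for s in groups if s != candidate]
    return min(others, key=lambda s: abs(len(groups[s]) - len(groups[candidate])))

def fuse(groups, candidate, to_fuse):
    """Treat the training data of the source to be fused as training data
    of the candidate source as well, increasing the candidate's count."""
    merged = dict(groups)
    merged[candidate] = groups[candidate] + groups[to_fuse]
    return merged

groups = {"s1": [0] * 1000, "s2": [0] * 10, "s3": [0] * 800}
to_fuse = pick_source_to_fuse(groups, "s2")  # "s3": |800 - 10| is smallest
fused = fuse(groups, "s2", to_fuse)          # "s2" now has 810 samples
```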
Although a preset data augmentation strategy can increase the amount of training data of a candidate source, it does not increase the variety of labels in that data. Therefore, to further improve the training efficiency of the model to be trained, in the embodiments of the present application sources satisfying the preset fusion condition are screened out of the data set to obtain sources to be fused, and the training data of the candidate source is fused with that of the source to be fused to obtain the adjusted training data. This increases both the amount of training data of the candidate source and the labels it covers, improving the training effect of a model trained on that data.
S102, extracting features of training data of each source to obtain sample features corresponding to the training data of each source.
The terminal can extract the characteristics of the training data of each source through the model to be trained, and sample characteristics corresponding to the training data of each source are obtained.
For example, as shown in fig. 2, at least two sources include source 1, source 2, and source n, and training data of source 1, source 2, source n are input into a model to be trained to perform feature extraction, so as to obtain sample features corresponding to the training data of source 1, source 2, source n.
S103, determining a loss value of the model to be trained for each source according to sample characteristics corresponding to training data of each source.
For example, as shown in fig. 2, at least two sources include source 1, source 2, and source n, training data of source 1, source 2, source n is input into a model to be trained to perform feature extraction, sample features corresponding to the training data of source 1, source 2, source n are obtained, a loss value corresponding to source 1 is calculated according to the sample features corresponding to the training data of source 1, a loss value corresponding to source 2 is calculated according to the sample features corresponding to the training data of source 2, and a loss value corresponding to source n is calculated according to the sample features corresponding to the training data of source n.
Through the model to be trained, the terminal can predict a class label for the training data of each source from the corresponding sample features, and then calculate the loss value of the model for each source from the predicted labels and the true class labels of that source's training data.
Optionally, the manner of calculating the loss value of the model to be trained for each source according to the sample characteristics corresponding to the training data of each source may be selected according to practical situations, for example, the loss value of the model to be trained for each source may be calculated according to the sample characteristics corresponding to the training data of each source through a square loss function or a cross entropy loss function, which is not limited herein in the embodiments of the present application.
Alternatively, the manner in which the loss value for each source is calculated may or may not be the same. For example, the at least two sources may include a first source and a second source, the loss value of the first source and the loss value of the second source each calculated by a cross entropy loss function. For another example, the at least two sources may include a first source whose loss value is calculated by a square loss function and a second source whose loss value is calculated by a cross entropy loss function.
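A per-source cross-entropy computation (one of the loss functions the text permits) might look like this; the class-probability inputs are assumed to come from the model's predictions, and all names are illustrative:

```python
import math

def cross_entropy(class_probs, true_label):
    """Negative log-likelihood of the true class."""
    return -math.log(class_probs[true_label])

def per_source_losses(predictions):
    """predictions: {source: [(class_probs, true_label), ...]};
    returns one averaged loss value per source, as in S103."""
    return {
        source: sum(cross_entropy(p, y) for p, y in samples) / len(samples)
        for source, samples in predictions.items()
    }

predictions = {
    "source_1": [([0.9, 0.1], 0), ([0.2, 0.8], 1)],
    "source_2": [([0.5, 0.5], 0)],
}
losses = per_source_losses(predictions)
# losses["source_2"] == -log(0.5), roughly 0.693
```

A different loss function could be substituted per source, matching the text's note that the per-source loss functions need not be the same.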
It should be noted that if the model to be trained includes a normalization layer (for example, Softmax), the training data of each source may correspond to its own normalization layer.
S104, training the model to be trained according to the loss value of each source to obtain a target model after training the model to be trained.
After obtaining the loss value of each source, the terminal can judge whether the loss values satisfy a preset loss condition. If they do not, the terminal updates the model parameters of the model to be trained according to the loss value of each source to obtain an updated model, takes the updated model as the model to be trained, and returns to the step of extracting features from the training data of each source to obtain the corresponding sample features. If the loss values satisfy the preset loss condition, the model to be trained is taken as the target model.
The terminal may determine that the preset loss condition is satisfied when the loss value of at least one source satisfies it, when the loss values of all sources in the data set satisfy it, or when the target loss value obtained by summing the loss values of all sources in the data set satisfies it.
Alternatively, after obtaining the loss value of each source, the terminal may judge whether the number of training iterations satisfies a preset count condition. If it does not, the terminal updates the model parameters of the model to be trained according to the loss value of each source to obtain an updated model, takes the updated model as the model to be trained, and returns to the feature extraction step. If the preset count is reached, the model to be trained is taken as the target model.
The function of the target model may be set according to actual situations, for example, the target model may be used for classification, segmentation, text extraction, and the like, which is not limited herein.
For example, when the target model is used for image classification, after the target model is obtained the method may further include: acquiring an image to be recognized, and classifying the image through the target model to obtain its category.
In some embodiments, according to the loss value of each source, training the model to be trained, and the process of obtaining the target model after training the model to be trained may be:
performing fusion processing on the loss values of the sources to obtain a target loss value;
and training the model to be trained according to the target loss value, to obtain the target model after training of the model to be trained is complete.
If the target loss value does not satisfy the preset loss condition, the terminal updates the model parameters of the model to be trained according to the target loss value to obtain an updated model, takes the updated model as the model to be trained, and returns to the feature extraction step; if the target loss value satisfies the preset loss condition, the model to be trained is taken as the target model.
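The fusion of per-source loss values into a single target loss value could be a plain sum; the optional per-source weighting below is an assumption beyond the text:

```python
def target_loss(source_losses, weights=None):
    """Fuse per-source loss values into one target loss value.
    Defaults to an unweighted sum; per-source weights are an assumption."""
    if weights is None:
        weights = {source: 1.0 for source in source_losses}
    return sum(weights[s] * loss for s, loss in source_losses.items())

total = target_loss({"source_1": 0.5, "source_2": 0.25})  # 0.75
```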
In other embodiments, the training of the model to be trained according to the loss value of each source may also be performed by:
acquiring an updating sequence corresponding to each source;
and training the model to be trained according to the loss value of each source, in the update sequence, to obtain the target model after training of the model to be trained is complete.
If the number of training iterations does not satisfy the preset count condition, or the loss value of each source does not satisfy the preset loss condition, the terminal obtains the update sequence corresponding to the sources and screens a target source out of the sources according to that sequence. It then updates the model parameters of the model to be trained according to the loss value of the target source to obtain a candidate model, takes the candidate model as the model to be trained, and returns to the step of screening a target source according to the update sequence until every source has been selected; the final candidate model is taken as the updated model.
For example, the sources include a first source and a second source, with the first source preceding the second in the update sequence. The first source is first taken as the target source, and the model parameters of the model to be trained are updated according to its loss value to obtain the candidate model after the first update. The second source is then taken as the target source, and the parameters of that candidate model are updated according to its loss value to obtain the candidate model after the second update. At this point every source has been selected, and the candidate model after the second update is taken as the updated model.
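The sequential per-source update described above can be sketched as a loop over the update sequence; the single scalar parameter, precomputed gradients, and learning rate are illustrative assumptions:

```python
def sequential_update(w, source_grads, update_order, lr=0.1):
    """Update the model parameter once per source, following the
    update sequence, as in the two-source example above."""
    for source in update_order:
        w -= lr * source_grads[source]  # update from this source's loss
    return w

w = sequential_update(1.0, {"first": 2.0, "second": -1.0}, ["first", "second"])
# 1.0 - 0.1*2.0 + 0.1*1.0 == 0.9
```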
The manner of obtaining the update sequence corresponding to each source may be selected according to the actual situation, which is not limited in this embodiment of the present application.
For example, the process of obtaining the update sequence corresponding to each source may be:
acquiring a preset random rule;
and generating an updating sequence corresponding to each source according to a preset random rule.
In the embodiment of the application, a preset random rule is acquired, and the update sequence corresponding to each source is then generated according to the preset random rule, so that the update sequence differs each time the model to be trained is trained, further improving the training effect of the model to be trained.
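A simple way to realize such a preset random rule is a uniform shuffle; the sketch below assumes that rule (and an optional seed for reproducibility), which the description does not prescribe.

```python
import random

def generate_update_sequence(sources, seed=None):
    """Generate an update sequence from the sources under an assumed shuffle rule."""
    sequence = list(sources)
    random.Random(seed).shuffle(sequence)  # apply the preset random rule
    return sequence

sources = ["first_source", "second_source", "third_source"]
sequence = generate_update_sequence(sources, seed=42)
```

With no seed, successive training runs draw different permutations, matching the intent that the update sequence is not the same every time.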
For another example, the process of obtaining the update sequence corresponding to each source may also be:
acquiring a history updating sequence corresponding to each source;
and determining the updating sequence corresponding to each source according to the historical updating sequence.
The historical update sequence corresponding to each source may be the update sequence used for each source in the previous training of the model to be trained. The terminal may directly take the historical update sequence of each source as its update sequence, or it may advance the historical update sequence corresponding to each source by one position, or move it back by one position, to obtain the update sequence corresponding to each source.
For example, suppose the sources include a first source, a second source and a third source, and the historical update sequence is the first source, then the second source, then the third source. If the historical update sequence is advanced by one position, the update sequence becomes the second source, then the third source, then the first source; if the historical update sequence is moved back by one position, the update sequence becomes the third source, then the first source, then the second source.
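The one-position shift in this example is a circular rotation, which can be sketched with `collections.deque`; the function name and interface are illustrative.

```python
from collections import deque

def shift_sequence(history, forward=True):
    """Derive the update sequence by shifting the historical sequence one position."""
    shifted = deque(history)
    shifted.rotate(-1 if forward else 1)  # forward: the head source moves to the tail
    return list(shifted)

history = ["first_source", "second_source", "third_source"]
advanced = shift_sequence(history)                    # advanced by one position
moved_back = shift_sequence(history, forward=False)   # moved back by one position
```

This reproduces the example above: advancing yields second, third, first; moving back yields third, first, second.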
For another example, when the sources have an arrangement order in the data set, that order may be used as the update sequence, or the order in which the training data of each source is input into the model to be trained may be used as the update sequence.
In other embodiments, the terminal may determine whether the data set includes training data of at least two sources. When it does not, the model to be trained may be trained according to a training method in the related art; when it does, the model to be trained is trained according to the method provided in the embodiments of the present application.
From the above, in the embodiments of the present application, a data set of a model to be trained is acquired, where the data set includes training data of at least two sources; feature extraction is performed on the training data of each source to obtain sample features corresponding to the training data of each source; a loss value of the model to be trained is determined for each source according to those sample features; and the model to be trained is trained according to the loss value of each source to obtain a target model after training is completed. Because a loss value is calculated for each source from that source's own training data, and the model is trained according to the loss value of each source, training data of different sources need not be clustered and combined. No noise is therefore introduced, the workload of clustering and combination is reduced, and both the training efficiency and the training effect of the model to be trained are improved.
In order to facilitate better implementation of the model training method provided by the embodiments of the present application, an apparatus based on the model training method is also provided. The terms used below have the same meanings as in the model training method described above; for specific implementation details, refer to the description in the method embodiments.
For example, as shown in fig. 3, the model training apparatus may include:
an acquisition module 301, configured to acquire a data set of a model to be trained, where the data set includes training data of at least two sources.
The extracting module 302 is configured to perform feature extraction on the training data of each source, so as to obtain sample features corresponding to the training data of each source.
The determining module 303 is configured to determine a loss value of the model to be trained for each source according to the sample feature corresponding to the training data of each source.
The training module 304 is configured to train the model to be trained according to the loss value of each source, so as to obtain a target model after training the model to be trained.
Optionally, the training module 304 is specifically configured to perform:
acquiring an updating sequence corresponding to each source;
and training the model to be trained according to the loss value of each source, in the update sequence, to obtain a target model after the training of the model to be trained is completed.
Optionally, the training module 304 is specifically configured to perform:
acquiring a preset random rule;
and generating an updating sequence corresponding to each source according to a preset random rule.
Optionally, the training module 304 is specifically configured to perform:
acquiring a history updating sequence corresponding to each source;
and determining the updating sequence corresponding to each source according to the historical updating sequence.
Optionally, the obtaining module 301 is specifically configured to perform:
acquiring an initial data set, wherein the initial data set comprises training data of at least two sources;
determining the amount of training data for each source;
screening out, from the initial data set, sources whose number of training data meets a preset number condition to obtain candidate sources;
performing incremental processing on training data corresponding to the candidate sources to obtain adjusted training data of the candidate sources;
and determining a data set of the model to be trained according to the adjusted training data of the candidate source and the training data of the source.
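The data-set construction steps above can be sketched as follows. The counting, the threshold standing in for the "preset number condition", and duplication-based incremental processing are all illustrative assumptions, not the prescribed implementation.

```python
def build_dataset(initial_dataset, min_count=4):
    """Screen under-represented candidate sources and increment their training data."""
    counts = {src: len(samples) for src, samples in initial_dataset.items()}
    candidates = [src for src, n in counts.items() if n < min_count]  # candidate sources
    dataset = {src: list(samples) for src, samples in initial_dataset.items()}
    for src in candidates:
        samples = dataset[src]
        while len(samples) < min_count:                 # incremental processing by duplication
            samples = samples + samples[: min_count - len(samples)]
        dataset[src] = samples                          # adjusted training data of the candidate
    return dataset

initial = {"first_source": [1, 2, 3, 4, 5], "second_source": [6, 7]}
balanced = build_dataset(initial)
```

The resulting data set keeps the plentiful source intact and brings the candidate source up to the assumed count, after which the model is trained on the adjusted data.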
Optionally, the obtaining module 301 is specifically configured to perform:
screening sources meeting preset fusion conditions from the data set to obtain sources to be fused;
and carrying out fusion processing on training data corresponding to the candidate sources and training data of the sources to be fused to obtain adjusted training data of the candidate sources.
Optionally, the obtaining module 301 is specifically configured to perform:
acquiring a preset data augmentation strategy;
and performing incremental processing on training data corresponding to the candidate sources according to a preset data augmentation strategy to obtain adjusted training data of the candidate sources.
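One possible "preset data augmentation strategy" for incrementing a candidate source's training data is sketched below; the jitter-based strategy, its magnitude, and the target count are assumptions chosen for illustration.

```python
import random

def augment_source(samples, target_count, jitter=0.01, seed=0):
    """Increment numeric training data by appending slightly perturbed copies."""
    rng = random.Random(seed)
    augmented = list(samples)
    while len(augmented) < target_count:
        base = rng.choice(samples)
        augmented.append(base + rng.uniform(-jitter, jitter))  # jittered copy of a sample
    return augmented

adjusted = augment_source([1.0, 2.0], target_count=5)
```

The original samples are preserved and new ones are synthesized near them, giving the adjusted training data of the candidate source without borrowing data from other sources.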
In specific implementation, each of the above modules may be implemented as an independent entity, or combined arbitrarily and implemented as one entity or several entities. For the specific implementation and the corresponding beneficial effects of each module, refer to the foregoing method embodiments; details are not repeated here.
An embodiment of the present application further provides an electronic device, which may be a server or a terminal. Fig. 4 shows a schematic structural diagram of the electronic device provided in the embodiments of the present application. Specifically:
The electronic device may include a processor 401 with one or more processing cores, a memory 402 of one or more computer-readable storage media, a power supply 403, an input unit 404, and other components. Those skilled in the art will appreciate that the structure shown in Fig. 4 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device and processes data by running or executing computer programs and/or modules stored in the memory 402, and calling data stored in the memory 402. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, etc., and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store computer programs and modules, and the processor 401 performs various functional applications and data processing by running the computer programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, a computer program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the data storage area may store data created according to the use of the electronic device, etc. In addition, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further includes a power supply 403 for supplying power to the various components. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that charging, discharging, and power-consumption management functions are performed by the power management system. The power supply 403 may also include one or more of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 404, which input unit 404 may be used for receiving input digital or character information and generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described herein. Specifically, in this embodiment, the processor 401 of the electronic device loads executable files corresponding to the processes of one or more computer programs into the memory 402 according to the following instructions, and runs the computer programs stored in the memory 402 to implement various functions, for example:
acquiring a data set of a model to be trained, wherein the data set comprises training data of at least two sources;
extracting features of the training data of each source to obtain sample features corresponding to the training data of each source;
determining a loss value of the model to be trained for each source according to sample characteristics corresponding to training data of each source;
and training the model to be trained according to the loss value of each source to obtain a target model after the training of the model to be trained is completed.
The specific embodiments and the corresponding beneficial effects of the above operations can be referred to the detailed description of the model training method, and are not described herein.
It will be appreciated by those of ordinary skill in the art that all or part of the steps of the various methods of the above embodiments may be performed by a computer program, or by related hardware under the control of a computer program. The computer program may be stored in a computer-readable storage medium, and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a computer program that is capable of being loaded by a processor to perform the steps of any of the model training methods provided by embodiments of the present application. For example, the computer program may perform the steps of:
acquiring a data set of a model to be trained, wherein the data set comprises training data of at least two sources;
extracting features of the training data of each source to obtain sample features corresponding to the training data of each source;
determining a loss value of the model to be trained for each source according to sample characteristics corresponding to training data of each source;
and training the model to be trained according to the loss value of each source to obtain a target model after the training of the model to be trained is completed.
The specific embodiments and the corresponding beneficial effects of each of the above operations can be found in the foregoing embodiments, and are not described herein again.
The computer-readable storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Because the computer program stored in the computer readable storage medium may execute the steps in any model training method provided in the embodiments of the present application, the beneficial effects that any model training method provided in the embodiments of the present application may be achieved, which are detailed in the previous embodiments and are not described herein.
According to one aspect of the present application, a computer program product or computer program is provided, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the model training method described above.
The model training method, apparatus, electronic device and computer storage medium provided in the embodiments of the present application have been described in detail above, and specific examples have been applied to illustrate the principles and implementations of the present application. The above description of the embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope in light of the ideas of the present application. In view of the above, the contents of this description should not be construed as limiting the present application.

Claims (10)

1. A method of model training, comprising:
acquiring a data set of a model to be trained, wherein the data set comprises training data of at least two sources;
extracting features of the training data of each source to obtain sample features corresponding to the training data of each source;
determining a loss value of the model to be trained for each source according to sample characteristics corresponding to the training data of each source;
and training the model to be trained according to the loss value of each source to obtain a target model after the training of the model to be trained is completed.
2. The model training method according to claim 1, wherein training the model to be trained according to the loss value of each source to obtain a target model after training the model to be trained, comprises:
acquiring an updating sequence corresponding to each source;
and training the model to be trained according to the loss value of each source, in the updating sequence, to obtain a target model after the training of the model to be trained is completed.
3. The model training method of claim 2, wherein said obtaining an update sequence for each of said sources comprises:
acquiring a preset random rule;
and generating an updating sequence corresponding to each source according to the preset random rule.
4. The model training method of claim 2, wherein said obtaining an update sequence for each of said sources comprises:
acquiring a history updating sequence corresponding to each source;
and determining the updating sequence corresponding to each source according to the historical updating sequence.
5. The model training method of claim 1, wherein the acquiring the data set of the model to be trained comprises:
acquiring an initial data set, wherein the initial data set comprises training data of at least two sources;
determining the amount of training data for each of said sources;
screening out sources corresponding to the number meeting the preset number condition from the data set to obtain candidate sources;
performing incremental processing on training data corresponding to the candidate sources to obtain adjusted training data of the candidate sources;
and determining the data set of the model to be trained according to the adjusted training data of the candidate source and the training data of the source.
6. The method for training a model according to claim 5, wherein the performing incremental processing on the training data corresponding to the candidate source to obtain the adjusted training data of the candidate source includes:
screening out sources meeting preset fusion conditions from the data set to obtain sources to be fused;
and carrying out fusion processing on the training data corresponding to the candidate source and the training data of the source to be fused to obtain the adjusted training data of the candidate source.
7. The method for training a model according to claim 5, wherein the performing incremental processing on the training data corresponding to the candidate source to obtain the adjusted training data of the candidate source includes:
acquiring a preset data augmentation strategy;
and performing incremental processing on training data corresponding to the candidate sources according to the preset data augmentation strategy to obtain adjusted training data of the candidate sources.
8. A model training device, comprising:
the acquisition module is used for acquiring a data set of a model to be trained, wherein the data set comprises training data of at least two sources;
the extraction module is used for extracting the characteristics of the training data of each source to obtain sample characteristics corresponding to the training data of each source;
the determining module is used for determining a loss value of the model to be trained for each source according to sample characteristics corresponding to the training data of each source;
and the training module is used for training the model to be trained according to the loss value of each source to obtain a target model after the training of the model to be trained is completed.
9. An electronic device comprising a processor and a memory, the memory storing a computer program, the processor being configured to execute the computer program in the memory to perform the model training method of any of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded by a processor for performing the model training method of any of claims 1 to 7.
CN202311036665.3A 2023-08-16 2023-08-16 Model training method, device, electronic equipment and computer storage medium Pending CN117726917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311036665.3A CN117726917A (en) 2023-08-16 2023-08-16 Model training method, device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311036665.3A CN117726917A (en) 2023-08-16 2023-08-16 Model training method, device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN117726917A true CN117726917A (en) 2024-03-19

Family

ID=90207569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311036665.3A Pending CN117726917A (en) 2023-08-16 2023-08-16 Model training method, device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN117726917A (en)

Similar Documents

Publication Publication Date Title
CN110196908A (en) Data classification method, device, computer installation and storage medium
US11853352B2 (en) Method and apparatus for establishing image set for image recognition, network device, and storage medium
CN111741330A (en) Video content evaluation method and device, storage medium and computer equipment
CN111382190B (en) Object recommendation method and device based on intelligence and storage medium
CN111291618B (en) Labeling method, labeling device, server and storage medium
WO2022028147A1 (en) Image classification model training method and apparatus, computer device, and storage medium
CN116049397A (en) Sensitive information discovery and automatic classification method based on multi-mode fusion
CN113837307A (en) Data similarity calculation method and device, readable medium and electronic equipment
CN116383521B (en) Subject word mining method and device, computer equipment and storage medium
CN112269875A (en) Text classification method and device, electronic equipment and storage medium
CN116975622A (en) Training method and device of target detection model, and target detection method and device
CN116978028A (en) Video processing method, device, electronic equipment and storage medium
CN113824989B (en) Video processing method, device and computer readable storage medium
CN116978087A (en) Model updating method, device, equipment, storage medium and program product
CN113110804B (en) Duplicate picture deleting method, device, equipment and storage medium
CN117726917A (en) Model training method, device, electronic equipment and computer storage medium
CN116415624A (en) Model training method and device, and content recommendation method and device
CN115168609A (en) Text matching method and device, computer equipment and storage medium
CN111538859B (en) Method and device for dynamically updating video tag and electronic equipment
CN111091198B (en) Data processing method and device
CN114627343A (en) Deep learning model training method, image processing method, device and equipment
CN115712719A (en) Data processing method, data processing device, computer readable storage medium and computer equipment
CN116992031B (en) Data processing method, device, electronic equipment, storage medium and program product
CN117725959A (en) Data updating method, device, electronic equipment and computer storage medium
CN117726893A (en) Sample enhancement method, device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination