
Network model training method, device, equipment and storage medium

Info

Publication number: CN117454161A
Authority: CN (China)
Prior art keywords: sample data, multiplexing, enhancement, data, enhanced
Legal status: Pending
Application number: CN202210796594.6A
Other languages: Chinese (zh)
Inventors: 陈小军, 赵江枫
Current Assignee: Shenzhen University
Original Assignee: Shenzhen University
Priority date: 2022-07-06
Filing date: 2022-07-06
Publication date: 2024-01-26

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a network model training method, device, equipment and storage medium. The method comprises the following steps: acquiring at least two sample data groups, each comprising original sample data and enhanced sample data corresponding to the original sample data; performing feature enhancement on the at least two sample data groups to obtain multiplexed sample data; inputting the multiplexed sample data into a preset service model to obtain the multiplexed-sample latent features generated by the service model during training on the multiplexed sample data; performing feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features; and iteratively updating the service model, according to the enhanced latent features and the service class to which the service model belongs, based on the loss function corresponding to the service class. The embodiment of the invention solves the problem of sparse model parameters and improves the training efficiency of the network model.

Description

Network model training method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for training a network model.
Background
As neural networks grow in scale, conventional network models exhibit parameter sparsity during training. The prior art generally addresses this problem with techniques such as neural architecture search and model pruning. However, these techniques consume a great deal of time, so the training efficiency of the network model is low.
Disclosure of Invention
The invention provides a network model training method, device, equipment and storage medium that solve the problem of sparse model parameters and improve the training efficiency of the network model.
According to an aspect of the present invention, there is provided a network model training method, the method comprising:
acquiring at least two sample data groups, where each sample data group comprises original sample data and enhanced sample data corresponding to the original sample data;
performing feature enhancement on the at least two sample data groups to obtain multiplexed sample data;
inputting the multiplexed sample data into a preset service model to obtain multiplexed-sample latent features generated by the service model during training on the multiplexed sample data;
performing feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features;
and iteratively updating the service model, according to the enhanced latent features and the service class to which the service model belongs, based on a loss function corresponding to the service class.
According to another aspect of the present invention, there is provided a network model training apparatus, the apparatus comprising:
a sample data group acquisition module, configured to acquire at least two sample data groups, where each sample data group comprises original sample data and enhanced sample data corresponding to the original sample data;
a multiplexed sample data determining module, configured to perform feature enhancement on the at least two sample data groups to obtain multiplexed sample data;
a sample latent feature generation module, configured to input the multiplexed sample data into a preset service model to obtain multiplexed-sample latent features generated by the service model during training on the multiplexed sample data;
an enhanced latent feature generation module, configured to perform feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features;
and an iterative updating module, configured to iteratively update the service model, according to the enhanced latent features and the service class to which the service model belongs, based on a loss function corresponding to the service class.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the network model training method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions which, when executed, cause a processor to implement the network model training method according to any one of the embodiments of the present invention.
The embodiment of the application acquires at least two sample data groups, each comprising original sample data and enhanced sample data corresponding to the original sample data; performs feature enhancement on the at least two sample data groups to obtain multiplexed sample data; inputs the multiplexed sample data into a preset service model to obtain the multiplexed-sample latent features generated by the service model during training on the multiplexed sample data; performs feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features; and iteratively updates the service model, according to the enhanced latent features and the service class to which the service model belongs, based on the loss function corresponding to the service class. By multiplexing the sample data groups into multiplexed sample data and decomposing the latent features obtained from model training, the scheme greatly alleviates the parameter sparsity problem of the neural network and reduces the impact of insufficient sample data on the network model. Moreover, because the multiple sample data groups are processed in parallel and the resulting multiplexed sample data is fed into the network model, no structural modification of the network model is required, which improves model training efficiency.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a network model training method according to a first embodiment of the present invention;
FIG. 2A is a flowchart of a network model training method according to a second embodiment of the present invention;
fig. 2B is a schematic diagram of a network model training method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a network model training device according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device implementing a network model training method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a network model training method according to an embodiment of the present invention. The method may be performed by a network model training device, which may be implemented in hardware and/or software and configured in an electronic device. As shown in fig. 1, the method includes:
s110, acquiring at least two sample data sets; the sample data set comprises original sample data and enhanced sample data corresponding to the original sample data.
The enhanced sample data may be obtained by applying a data enhancement strategy to the original sample data. Different data enhancement strategies may be adopted depending on the data form of the original sample data; for example, if the original sample data is in picture form, the corresponding data enhancement strategy may be a geometric transformation of the picture.
Alternatively, one piece of original sample data may correspond to at least one piece of enhanced sample data, depending on the preset data enhancement policies. For example, if three different data enhancement policies are preset, three different pieces of enhanced sample data may be obtained from one piece of original sample data based on the three policies.
In an alternative embodiment, the enhanced sample data corresponding to the original sample data is determined as follows: according to at least one preset data enhancement mode, data enhancement is performed on the original sample data to obtain at least one piece of enhanced sample data corresponding to the original sample data.
A data enhancement mode may be a data enhancement strategy preset by a relevant technician for original sample data of a given data form.
For example, suppose a relevant technician presets three data enhancement modes according to actual requirements: data enhancement mode A, data enhancement mode B, and data enhancement mode C. Applying the three modes to the original sample data separately yields enhanced sample data A, enhanced sample data B, and enhanced sample data C, obtained by enhancing the original sample data with mode A, mode B, and mode C respectively.
In a specific embodiment, suppose a relevant technician presets three data enhancement modes (mode A, mode B, and mode C) and three sample data groups need to be acquired: a first group corresponding to original sample data a, a second group corresponding to original sample data b, and a third group corresponding to original sample data c. Data enhancement is performed on original sample data a, b, and c with each of the three preset modes, yielding:
First sample data group: original sample data a, enhanced sample data a1, a2, and a3, obtained by enhancing a with modes A, B, and C respectively.
Second sample data group: original sample data b, enhanced sample data b1, b2, and b3, obtained by enhancing b with modes A, B, and C respectively.
Third sample data group: original sample data c, enhanced sample data c1, c2, and c3, obtained by enhancing c with modes A, B, and C respectively.
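For illustration, a minimal PyTorch sketch of this grouping step, assuming image-form sample data and three illustrative torchvision transforms standing in for enhancement modes A, B, and C (the concrete transforms, tensor shapes, and function names are assumptions, not specified by the disclosure):

```python
import torch
from torchvision import transforms

# Three illustrative data enhancement modes (A, B, C); the concrete transforms are assumed.
enhancement_modes = [
    transforms.RandomHorizontalFlip(p=1.0),  # mode A
    transforms.RandomRotation(degrees=15),   # mode B
    transforms.ColorJitter(brightness=0.4),  # mode C
]

def build_sample_group(original: torch.Tensor) -> list:
    """Return one sample data group: [original, enhanced_1, enhanced_2, enhanced_3]."""
    return [original] + [mode(original) for mode in enhancement_modes]

# Original samples a, b, c -> three sample data groups, as in the example above.
originals = [torch.rand(3, 32, 32) for _ in range(3)]
sample_groups = [build_sample_group(x) for x in originals]
```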
S120, performing feature enhancement on the at least two sample data groups to obtain multiplexed sample data.
Feature processing is performed on each of the at least two sample data groups. Taking any one of the groups as an example: the group's sample data is input into a preset first feature multiplexing network to obtain the multiplexed enhancement feature data corresponding to that group. The feature multiplexing network may be preset by a relevant technician; for example, if the original sample data is in picture form, the feature multiplexing network may be a CNN (Convolutional Neural Network). The multiplexed enhancement feature data may be the result of integrating the linearly independent features corresponding to each piece of sample data in the group.
The multiplexed enhancement feature data corresponding to each of the at least two sample data groups is determined; the determination can proceed over multiple paths in parallel, which improves efficiency. The multiplexed enhancement feature data corresponding to each sample data group is then input into a preset second feature multiplexing network for feature enhancement, and the second feature multiplexing network weights and averages the feature data generated from each piece of multiplexed enhancement feature data during feature enhancement to obtain the multiplexed sample data. The second feature multiplexing network may be the same as or different from the first feature multiplexing network, and may be preset according to actual requirements.
S130, inputting the multiplexed sample data into a preset service model to obtain the multiplexed-sample latent features generated during training of the service model on the multiplexed sample data.
The service model can be a neural network model to be trained, which is preset based on different service requirements. For example, the service model may be a network model with classification requirements, a network model for object detection, or a network model for text analysis, which is not limited in this embodiment.
The multiplexed sample data may be input into the preset service model; the service model is trained on the multiplexed sample data, and the multiplexed-sample latent features corresponding to the multiplexed sample data, generated during this training, are obtained.
When an existing network model is trained on sample data, a training step size is usually preset and a large amount of sample data is trained in batches; when the training step size is 1, one piece of sample data is input into the network model at a time, i.e., the network model is trained on the samples of the training set one by one. Since this scheme inputs one piece of preprocessed multiplexed sample data into the preset service model at a time, its training step size can be regarded as 1.
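In code terms, this amounts to feeding the service model a batch of one; a sketch under the same assumptions as above (the model body and the tensor shape are placeholders):

```python
import torch
import torch.nn as nn

service_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # stand-in body
multiplexed_sample = torch.rand(3, 32, 32)  # output of the multiplexing stages (shape assumed)

# One multiplexed sample per optimization step: an effective training step size of 1.
multiplexed_latent = service_model(multiplexed_sample.unsqueeze(0))  # batch dimension of 1
```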
S140, performing feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features.
The multiplexed-sample latent features may be input into a preset first feature decomposition network to obtain the multiplexed enhanced latent features corresponding to each sample data group. The multiplexed enhanced latent features are then input into a preset second feature decomposition network to obtain the enhanced latent features corresponding to each of them. The first and second feature decomposition networks may be the same or different, and may be preset by a relevant technician according to actual requirements; both are used to perform feature decomposition.
S150, iteratively updating the service model, according to the enhanced latent features and the service class to which the service model belongs, based on the loss function corresponding to the service class.
The service class to which the service model belongs may relate to the function the service model implements. For example, the service class of a service model used for classification may be a classification task.
The loss function corresponding to the service class may be preset by a related technician before training the service model.
In an alternative embodiment, if the service class is a classification task, the loss function corresponding to the service class is a summation of the contrast loss, the classification loss, and the consistency loss.
If the service class is a classification task, the loss function corresponding to the service class may be obtained by directly adding the contrast loss, the classification loss, and the consistency loss, or by a weighted sum of them. The loss functions used for the contrast, classification, and consistency terms may be preset by a relevant technician; for example, the classification loss may be a cross-entropy loss function, which this embodiment does not limit.
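A sketch of such a combined objective, assuming a weighted sum with an InfoNCE-style contrast term and an MSE consistency term; the weights, temperature, and the exact forms of the contrast and consistency losses are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, labels, z_anchor, z_positive,
                  w_contrast=1.0, w_class=1.0, w_consist=1.0, temperature=0.1):
    """Weighted sum of contrast, classification, and consistency losses (forms assumed)."""
    # Classification loss: cross entropy, as suggested in the text.
    classification = F.cross_entropy(logits, labels)
    # Contrast loss: InfoNCE-style term over normalized latent features (assumed form).
    za = F.normalize(z_anchor, dim=1)
    zp = F.normalize(z_positive, dim=1)
    sim = za @ zp.t() / temperature
    contrast = F.cross_entropy(sim, torch.arange(sim.size(0), device=sim.device))
    # Consistency loss: penalize divergence between latents of the same sample (assumed MSE).
    consistency = F.mse_loss(z_anchor, z_positive)
    return w_contrast * contrast + w_class * classification + w_consist * consistency
```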
The enhanced latent features may be sent to different task heads according to the service class to which the service model belongs, yielding the prediction data output by the task heads; back-propagation is then performed on the service model based on the prediction data and the preset loss function, realizing the iterative updating of the service model. A task head outputs the desired prediction data, for example a classification category or a classification probability value for a classification model.
The embodiment of the application acquires at least two sample data groups, each comprising original sample data and enhanced sample data corresponding to the original sample data; performs feature enhancement on the at least two sample data groups to obtain multiplexed sample data; inputs the multiplexed sample data into a preset service model to obtain the multiplexed-sample latent features generated by the service model during training on the multiplexed sample data; performs feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features; and iteratively updates the service model, according to the enhanced latent features and the service class to which the service model belongs, based on the loss function corresponding to the service class. By multiplexing the sample data groups into multiplexed sample data and decomposing the latent features obtained from model training, the scheme greatly alleviates the parameter sparsity problem of the neural network and reduces the impact of insufficient sample data on the network model. Moreover, because the multiple sample data groups are processed in parallel and the resulting multiplexed sample data is fed into the network model, no structural modification of the network model is required, which improves model training efficiency.
Example two
Fig. 2A is a flowchart of a network model training method according to a second embodiment of the present invention, which optimizes and improves upon the above technical solutions.
Further, the step of performing feature enhancement on the at least two sample data groups to obtain multiplexed sample data is refined into: performing feature enhancement on the at least two sample data groups separately to obtain the multiplexed enhancement feature data corresponding to each sample data group; and performing feature enhancement on each piece of multiplexed enhancement feature data to obtain the multiplexed sample data. This perfects the manner in which the multiplexed sample data is determined.
Further, the step of performing feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features is refined into: performing feature decomposition on the multiplexed-sample latent features to obtain multiplexed enhanced latent features; and performing feature decomposition on the multiplexed enhanced latent features to obtain the enhanced latent features. This improves the manner in which the enhanced latent features are determined.
As shown in fig. 2A, the method comprises the following specific steps:
s210, acquiring at least two sample data sets; the sample data set comprises original sample data and enhanced sample data corresponding to the original sample data.
S220, performing feature enhancement on the at least two sample data groups separately to obtain the multiplexed enhancement feature data corresponding to each sample data group.
When performing feature enhancement on the sample data in each sample data group, position information encoding may first be applied to the data of the group in order to facilitate the subsequent feature mapping of each piece of sample data.
In an alternative embodiment, performing feature enhancement on the at least two sample data groups to obtain the multiplexed enhancement feature data corresponding to each sample data group includes: performing position information encoding on the at least two sample data groups separately to obtain the encoded sample data corresponding to each sample data group; and performing feature enhancement on each piece of encoded sample data to obtain the multiplexed enhancement feature data.
The position information of each piece of sample data in a sample data group may be encoded by, for example, one-hot encoding.
Take any one of the at least two sample data groups as an example. The position information of each piece of sample data within the group is encoded; the sample data in the group includes the original sample data and the enhanced sample data. After encoding, an encoded sample data group is obtained, which comprises the encoded sample data, i.e., the original sample data and the enhanced sample data after position information encoding.
The encoded sample data group is input into the preset first feature multiplexing network, which maps each piece of encoded sample data to feature data. For example, if the encoded sample data in the encoded sample data group is {x1, x2, …, xn}, the mapped feature data is {f(x1), f(x2), …, f(xn)}, where f denotes the mapping performed by the first feature multiplexing network, and the mapped feature data are linearly independent. The mapped feature data are then superposed to obtain the multiplexed enhancement feature data corresponding to the sample data group. The superposition may be direct addition, weighted addition, or averaging after a weighted sum, and may be set according to actual requirements, which this embodiment does not limit. The first feature multiplexing network may be set according to actual requirements; for example, if the sample data group is in picture form, the corresponding first feature multiplexing network may be a CNN network.
For example, suppose the encoded sample data input into the first feature multiplexing network is 1×n-dimensional encoded original sample data a, 1×n-dimensional encoded enhanced sample data b, and 1×n-dimensional encoded enhanced sample data c. After the first feature multiplexing network processes the input, feature vector a corresponding to original sample data a, feature vector b corresponding to enhanced sample data b, and feature vector c corresponding to enhanced sample data c are obtained, and feature vectors a, b, and c are linearly independent. Feature vectors a, b, and c are superposed, and the superposed result is taken as the multiplexed enhancement feature data corresponding to the sample data group.
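A minimal sketch of this first multiplexing stage, assuming flattened samples, one-hot position codes concatenated to each sample, a small MLP as the first feature multiplexing network, and direct addition as the superposition; all names and layer sizes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstFeatureMultiplexer(nn.Module):
    """Maps each position-encoded sample to a feature vector, then superposes them."""

    def __init__(self, sample_dim: int, group_size: int, feature_dim: int = 128):
        super().__init__()
        self.group_size = group_size
        # Input = flattened sample concatenated with its one-hot position code.
        self.mapper = nn.Sequential(
            nn.Linear(sample_dim + group_size, feature_dim), nn.ReLU(),
            nn.Linear(feature_dim, feature_dim),
        )

    def forward(self, group: torch.Tensor) -> torch.Tensor:
        # group: (group_size, sample_dim) -- the original sample and its enhanced versions.
        positions = F.one_hot(torch.arange(self.group_size, device=group.device),
                              self.group_size).float()
        encoded = torch.cat([group, positions], dim=1)  # position information encoding
        mapped = self.mapper(encoded)                   # one feature vector per sample
        return mapped.sum(dim=0)                        # superposition by direct addition

# Example: a group of 4 samples (original + 3 enhanced), each flattened to 3*32*32 values.
mux1 = FirstFeatureMultiplexer(sample_dim=3 * 32 * 32, group_size=4)
group = torch.rand(4, 3 * 32 * 32)
mux_enhancement_feature = mux1(group)  # the group's multiplexed enhancement feature data
```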
S230, performing feature enhancement on each piece of multiplexed enhancement feature data to obtain the multiplexed sample data.
Illustratively, after the multiplexed enhancement feature data corresponding to the at least two sample data groups is obtained, feature enhancement is performed on each piece of multiplexed enhancement feature data to obtain the multiplexed sample data.
When performing feature enhancement on the multiplexed enhancement feature data, position information encoding may first be applied to it in order to facilitate the subsequent feature mapping.
In an alternative embodiment, performing feature enhancement on each piece of multiplexed enhancement feature data to obtain the multiplexed sample data includes: performing position information encoding on each piece of multiplexed enhancement feature data to obtain the encoded feature data corresponding to each sample data group; and performing feature enhancement on each piece of encoded feature data to obtain the multiplexed sample data.
The method of encoding the position information of each piece of multiplexed enhancement feature data may be, for example, one-hot encoding.
Illustratively, position information encoding is performed on each piece of multiplexed enhancement feature data to obtain the encoded feature data corresponding to it. The encoded feature data is input into the second feature multiplexing network in the form of tuples to obtain the feature data mapped from each piece of encoded feature data. Based on a given weight distribution, the mapped feature data is summed with weights and then averaged to obtain the final multiplexed sample data. The second feature multiplexing network may be the same as or different from the first feature multiplexing network, and may be set by a relevant technician according to actual requirements.
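A sketch of this second multiplexing stage under the same assumptions, with a learnable weight vector (initialized uniform) standing in for the unspecified weight distribution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondFeatureMultiplexer(nn.Module):
    """Position-encodes each group's feature, maps it, then weight-sums and averages."""

    def __init__(self, feature_dim: int, num_groups: int):
        super().__init__()
        self.num_groups = num_groups
        self.mapper = nn.Linear(feature_dim + num_groups, feature_dim)
        # Weight distribution over groups; uniform initialization assumed.
        self.weights = nn.Parameter(torch.full((num_groups,), 1.0 / num_groups))

    def forward(self, group_features: torch.Tensor) -> torch.Tensor:
        # group_features: (num_groups, feature_dim), one entry per sample data group.
        positions = F.one_hot(torch.arange(self.num_groups, device=group_features.device),
                              self.num_groups).float()
        encoded = torch.cat([group_features, positions], dim=1)
        mapped = self.mapper(encoded)
        # Weighted summation followed by averaging -> the final multiplexed sample data.
        return (self.weights.unsqueeze(1) * mapped).sum(dim=0) / self.num_groups

mux2 = SecondFeatureMultiplexer(feature_dim=128, num_groups=3)
multiplexed_sample = mux2(torch.rand(3, 128))
```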
S240, inputting the multiplexed sample data into the preset service model to obtain the multiplexed-sample latent features generated during training of the service model on the multiplexed sample data.
S250, performing feature decomposition on the multiplexed-sample latent features to obtain multiplexed enhanced latent features.
Illustratively, the multiplexed-sample latent features are input into the first feature decomposition network, which performs feature mapping on them to obtain the multiplexed enhanced latent features corresponding to each sample data group.
S260, performing feature decomposition on the multiplexed enhanced latent features to obtain the enhanced latent features.
The multiplexed enhanced latent features are input into the second feature decomposition network, which performs feature mapping on the multiplexed enhanced latent features corresponding to each sample data group to obtain the enhanced latent features corresponding to each sample position in that group.
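A sketch of the two decomposition stages, mirroring the multiplexers above; the decomposition networks are assumed to be simple linear maps that expand one latent vector into per-group and then per-sample latents:

```python
import torch
import torch.nn as nn

class FeatureDecomposer(nn.Module):
    """Expands one latent vector into `num_outputs` decomposed latent vectors."""

    def __init__(self, latent_dim: int, num_outputs: int):
        super().__init__()
        self.num_outputs = num_outputs
        self.expander = nn.Linear(latent_dim, num_outputs * latent_dim)

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        # latent: (latent_dim,) -> (num_outputs, latent_dim)
        return self.expander(latent).view(self.num_outputs, -1)

latent_dim, num_groups, group_size = 128, 3, 4
decomp1 = FeatureDecomposer(latent_dim, num_groups)   # latent -> per-group latents
decomp2 = FeatureDecomposer(latent_dim, group_size)   # per-group -> per-sample latents

mux_latent = torch.rand(latent_dim)                   # multiplexed-sample latent feature
group_latents = decomp1(mux_latent)                   # multiplexed enhanced latent features
enhanced = torch.stack([decomp2(g) for g in group_latents])  # shape (3, 4, 128)
```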
S270, iteratively updating the service model, according to the enhanced latent features and the service class to which the service model belongs, based on the loss function corresponding to the service class.
In the scheme of this embodiment, feature enhancement is performed on the at least two sample data groups separately to obtain the multiplexed enhancement feature data corresponding to each sample data group; feature enhancement is performed on each piece of multiplexed enhancement feature data to obtain the multiplexed sample data; feature decomposition is performed on the multiplexed-sample latent features to obtain the multiplexed enhanced latent features; and feature decomposition is performed on the multiplexed enhanced latent features to obtain the enhanced latent features. This achieves accurate determination of the multiplexed sample data and the enhanced latent features, solves the parameter sparsity problem in large-scale neural network training, and reduces the impact of insufficient sample data on the network model. No structural modification of the network model is required, and model training efficiency is improved.
In one embodiment, a schematic diagram of a network model training method is shown in fig. 2B.
A plurality of sample data groups is collected and formed from the original data set and used as multiple parallel input paths. Each sample data group comprises original sample data and the enhancement data obtained by enhancing it under different data enhancement strategies; in this embodiment, three data enhancement strategies are used. For each input path, enhancement multiplexing is performed on the group in that path: the sample data group is position-information encoded and then feature-enhanced, yielding the multiplexed enhancement feature data corresponding to the group. Sample multiplexing is then performed on the multiplexed enhancement feature data output by all paths to obtain the multiplexed sample data; specifically, the multiplexed enhancement feature data of each path is position-encoded and then feature-enhanced. The resulting multiplexed sample data is input into the main body of the service model for model training, and the multiplexed-sample latent features output by the service model are obtained. These latent features are decomposed to obtain the multiplexed enhanced latent features corresponding to each input path, forming multiple output paths. For each output path, the multiplexed enhanced latent features are further decomposed by feature-enhancement decomposition to obtain the enhanced latent features corresponding to each sample data group. The enhanced latent features are input into the task heads of the relevant tasks to obtain model predictions, and the service model is updated by back-propagation based on the model predictions and the preset loss function; a sketch of one such training step is given below. The task heads may be preset by a relevant technician based on actual requirements, e.g., a classification task or a probability-value determination task.
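Putting the stages together, a sketch of one end-to-end training step under all of the assumptions above, reusing the classes from the earlier sketches; the service-model body, task head, and single cross-entropy loss are placeholders for the disclosure's unspecified components:

```python
import torch
import torch.nn as nn

# Dimensions assumed for illustration.
latent_dim, num_groups, group_size, num_classes = 128, 3, 4, 10
stage1 = FirstFeatureMultiplexer(sample_dim=3 * 32 * 32, group_size=group_size)
stage2 = SecondFeatureMultiplexer(feature_dim=latent_dim, num_groups=num_groups)
body = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.ReLU())  # service-model body
decomp1 = FeatureDecomposer(latent_dim, num_groups)
decomp2 = FeatureDecomposer(latent_dim, group_size)
task_head = nn.Linear(latent_dim, num_classes)  # classification task head (assumed)

modules = (stage1, stage2, body, decomp1, decomp2, task_head)
optimizer = torch.optim.Adam([p for m in modules for p in m.parameters()], lr=1e-3)

groups = torch.rand(num_groups, group_size, 3 * 32 * 32)   # 3 groups of 4 samples each
labels = torch.randint(0, num_classes, (num_groups,))      # per-group labels (assumed)

group_feats = torch.stack([stage1(g) for g in groups])     # enhancement multiplexing per path
mux_sample = stage2(group_feats)                           # sample multiplexing
mux_latent = body(mux_sample)                              # service-model forward pass
enhanced = torch.stack([decomp2(g) for g in decomp1(mux_latent)])  # two-stage decomposition
logits = task_head(enhanced.mean(dim=1))                   # task head on per-group latents

loss = nn.functional.cross_entropy(logits, labels)         # stand-in for the combined loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```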
Example III
Fig. 3 is a schematic structural diagram of a network model training device according to a third embodiment of the present invention. The network model training device provided by the embodiment of the invention is suitable for training large-scale neural network models and may be implemented in hardware and/or software. As shown in fig. 3, the device specifically comprises: a sample data group acquisition module 301, a multiplexed sample data determining module 302, a sample latent feature generation module 303, an enhanced latent feature generation module 304, and an iterative updating module 305. Wherein,
a sample data group acquisition module 301, configured to acquire at least two sample data groups, where each sample data group comprises original sample data and enhanced sample data corresponding to the original sample data;
a multiplexed sample data determining module 302, configured to perform feature enhancement on the at least two sample data groups to obtain multiplexed sample data;
a sample latent feature generation module 303, configured to input the multiplexed sample data into a preset service model to obtain multiplexed-sample latent features generated by the service model during training on the multiplexed sample data;
an enhanced latent feature generation module 304, configured to perform feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features;
and an iterative updating module 305, configured to iteratively update the service model, according to the enhanced latent features and the service class to which the service model belongs, based on a loss function corresponding to the service class.
The embodiment of the application acquires at least two sample data groups, each comprising original sample data and enhanced sample data corresponding to the original sample data; performs feature enhancement on the at least two sample data groups to obtain multiplexed sample data; inputs the multiplexed sample data into a preset service model to obtain the multiplexed-sample latent features generated by the service model during training on the multiplexed sample data; performs feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features; and iteratively updates the service model, according to the enhanced latent features and the service class to which the service model belongs, based on the loss function corresponding to the service class. By multiplexing the sample data groups into multiplexed sample data and decomposing the latent features obtained from model training, the scheme greatly alleviates the parameter sparsity problem of the neural network and reduces the impact of insufficient sample data on the network model. Moreover, because the multiple sample data groups are processed in parallel and the resulting multiplexed sample data is fed into the network model, no structural modification of the network model is required, which improves model training efficiency.
Optionally, the multiplexed sample data determining module 302 includes:
a multiplexed enhancement feature data determining unit, configured to perform feature enhancement on the at least two sample data groups separately to obtain the multiplexed enhancement feature data corresponding to each sample data group;
and a multiplexed sample data determining unit, configured to perform feature enhancement on each piece of multiplexed enhancement feature data to obtain the multiplexed sample data.
Optionally, the multiplexed enhancement feature data determining unit includes:
an encoded sample data determining subunit, configured to perform position information encoding on the at least two sample data groups separately to obtain the encoded sample data corresponding to each sample data group;
and a multiplexed enhancement feature data determining subunit, configured to perform feature enhancement on each piece of encoded sample data to obtain the multiplexed enhancement feature data.
Optionally, the multiplexed sample data determining unit includes:
an encoded feature data determining subunit, configured to perform position information encoding on each piece of multiplexed enhancement feature data to obtain the encoded feature data corresponding to each sample data group;
and a multiplexed sample data determining subunit, configured to perform feature enhancement on each piece of encoded feature data to obtain the multiplexed sample data.
Optionally, the enhanced sample data corresponding to the original sample data is determined as follows:
data enhancement is performed on the original sample data according to at least one preset data enhancement mode to obtain at least one piece of enhanced sample data corresponding to the original sample data.
Optionally, the enhanced latent feature generation module 304 includes:
a multiplexed enhanced latent feature determining unit, configured to perform feature decomposition on the multiplexed-sample latent features to obtain multiplexed enhanced latent features;
and an enhanced latent feature generation unit, configured to perform feature decomposition on the multiplexed enhanced latent features to obtain the enhanced latent features.
Optionally, if the service class is a classification task, the loss function corresponding to the service class is a summation of the contrast loss, the classification loss, and the consistency loss.
The network model training device provided by the embodiment of the present invention can execute the network model training method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example IV
Fig. 4 shows a schematic diagram of an electronic device 40 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 4, the electronic device 40 includes at least one processor 41 and a memory communicatively connected to the at least one processor 41, such as a read-only memory (ROM) 42 and a random access memory (RAM) 43, in which a computer program executable by the at least one processor is stored. The processor 41 may perform various suitable actions and processes according to the computer program stored in the ROM 42 or loaded from the storage unit 48 into the RAM 43. The RAM 43 may also store various programs and data required for the operation of the electronic device 40. The processor 41, the ROM 42, and the RAM 43 are connected to each other via a bus 44; an input/output (I/O) interface 45 is also connected to the bus 44.
Various components in electronic device 40 are connected to I/O interface 45, including: an input unit 46 such as a keyboard, a mouse, etc.; an output unit 47 such as various types of displays, speakers, and the like; a storage unit 48 such as a magnetic disk, an optical disk, or the like; and a communication unit 49 such as a network card, modem, wireless communication transceiver, etc. The communication unit 49 allows the electronic device 40 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 41 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the processor 41 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, or microcontroller. The processor 41 performs the various methods and processes described above, such as the network model training method.
In some embodiments, the network model training method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 48. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 40 via the ROM 42 and/or the communication unit 49. When the computer program is loaded into RAM 43 and executed by processor 41, one or more steps of the network model training method described above may be performed. Alternatively, in other embodiments, processor 41 may be configured to perform the network model training method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special purpose or general purpose and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network; their relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service expansibility of traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for training a network model, comprising:
acquiring at least two sample data groups, where each sample data group comprises original sample data and enhanced sample data corresponding to the original sample data;
performing feature enhancement on the at least two sample data groups to obtain multiplexed sample data;
inputting the multiplexed sample data into a preset service model to obtain multiplexed-sample latent features generated by the service model during training on the multiplexed sample data;
performing feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features;
and iteratively updating the service model, according to the enhanced latent features and the service class to which the service model belongs, based on a loss function corresponding to the service class.
2. The method of claim 1, wherein performing feature enhancement on the at least two sample data groups to obtain multiplexed sample data comprises:
performing feature enhancement on the at least two sample data groups separately to obtain the multiplexed enhancement feature data corresponding to each sample data group;
and performing feature enhancement on each piece of multiplexed enhancement feature data to obtain the multiplexed sample data.
3. The method according to claim 2, wherein performing feature enhancement on the at least two sample data groups separately to obtain the multiplexed enhancement feature data corresponding to each sample data group comprises:
performing position information encoding on the at least two sample data groups separately to obtain the encoded sample data corresponding to each sample data group;
and performing feature enhancement on each piece of encoded sample data to obtain the multiplexed enhancement feature data.
4. The method according to claim 2, wherein performing feature enhancement on each piece of multiplexed enhancement feature data to obtain the multiplexed sample data comprises:
performing position information encoding on each piece of multiplexed enhancement feature data to obtain the encoded feature data corresponding to each sample data group;
and performing feature enhancement on each piece of encoded feature data to obtain the multiplexed sample data.
5. The method of claim 1, wherein the enhanced sample data corresponding to the original sample data is determined as follows:
data enhancement is performed on the original sample data according to at least one preset data enhancement mode to obtain at least one piece of enhanced sample data corresponding to the original sample data.
6. The method of any one of claims 1-5, wherein performing feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features comprises:
performing feature decomposition on the multiplexed-sample latent features to obtain multiplexed enhanced latent features;
and performing feature decomposition on the multiplexed enhanced latent features to obtain the enhanced latent features.
7. The method according to any one of claims 1-5, wherein, if the service class is a classification task, the loss function corresponding to the service class is a summation of the contrast loss, the classification loss, and the consistency loss.
8. A network model training apparatus, comprising:
a sample data group acquisition module, configured to acquire at least two sample data groups, where each sample data group comprises original sample data and enhanced sample data corresponding to the original sample data;
a multiplexed sample data determining module, configured to perform feature enhancement on the at least two sample data groups to obtain multiplexed sample data;
a sample latent feature generation module, configured to input the multiplexed sample data into a preset service model to obtain multiplexed-sample latent features generated by the service model during training on the multiplexed sample data;
an enhanced latent feature generation module, configured to perform feature decomposition on the multiplexed-sample latent features to obtain enhanced latent features;
and an iterative updating module, configured to iteratively update the service model, according to the enhanced latent features and the service class to which the service model belongs, based on a loss function corresponding to the service class.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the network model training method of any of claims 1-7.
10. A computer readable storage medium storing computer instructions which, when executed, cause a processor to implement the network model training method of any one of claims 1-7.
Application CN202210796594.6A, filed 2022-07-06 (priority date 2022-07-06) — Network model training method, device, equipment and storage medium. Status: Pending. Publication: CN117454161A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210796594.6A 2022-07-06 2022-07-06 Network model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117454161A 2024-01-26

Family ID: 89589726

Legal Events

Date Code Title Description
PB01 Publication