WO2023083240A1 - Method, apparatus, electronic device and storage medium for training an intelligent model - Google Patents

Method, apparatus, electronic device and storage medium for training an intelligent model

Info

Publication number: WO2023083240A1
Application number: PCT/CN2022/131038
Authority: WIPO (PCT)
Prior art keywords: training, sample, model, neural network, network
Other languages: English (en), French (fr)
Inventors: 孟让, 浦世亮, 陈伟杰, 杨世才, 谢迪
Original Assignee: 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Application filed by: 杭州海康威视数字技术股份有限公司
Publication: WO2023083240A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Definitions

  • the present application relates to the field of computers, and in particular to a method, apparatus, electronic device and machine-readable storage medium for training an intelligent model.
  • the training samples include labeled training samples and unlabeled training samples; the labeled training samples include sample data and a label, while the unlabeled training samples include only sample data.
  • for example, when the training samples are samples for training a license plate recognition model, a labeled training sample includes a license plate image and a label indicating the license plate position in the image, while an unlabeled training sample includes only the license plate image.
  • multiple training samples can be used to train the neural network. When the multiple training samples are labeled training samples, training the neural network with them yields an intelligent model with superior performance.
  • however, when the multiple training samples include both labeled training samples and unlabeled training samples, how to train an equally superior intelligent model is an urgent problem to be solved.
  • embodiments of the present application provide a method, device, electronic device, and machine-readable storage medium for training an intelligent model.
  • the present application provides a method for training an intelligent model, the method comprising: obtaining a first network set and a second network set from a first neural network, where the first network set includes the first neural network and m-1 second neural networks, m is an integer greater than 1, the second network set includes m second neural networks, the m-1 second neural networks and the m second neural networks are all different sub-networks in the first neural network, and the network scale of each neural network in the first network set is greater than the network scale of each neural network in the second network set; training each neural network in the first network set based on a first sample set and a second sample set to obtain a first model set, where the first sample set includes a plurality of first training samples, each first training sample includes first training data and a label indicating a target in the first training data, the second sample set includes a plurality of second training samples, each second training sample includes second training data, and the first model set includes a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks; and training each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set, where the second model set includes m second intelligent models obtained by training the m second neural networks.
  • the training each neural network in the first network set based on the first sample set and the second sample set to obtain the first model set includes: for any neural network to be trained in the first network set, training, based on the first sample set, the parameters of the convolutional layer and the parameters of the fully connected layer included in the neural network to be trained, where the value of the parameter of the convolutional layer after training is a first parameter value and the value of the parameter of the fully connected layer after training is a second parameter value; fixing the value of the parameter of the convolutional layer to the first parameter value and training, based on the first sample set and the second sample set, the parameters of the fully connected layer included in the neural network to be trained, where the value of the parameter of the fully connected layer after training is changed from the second parameter value to a third parameter value; and fixing the value of the parameter of the fully connected layer to the third parameter value and training, based on the first sample set and the second sample set, the parameters of the convolutional layer of the neural network to be trained, to obtain the intelligent model corresponding to the neural network to be trained.
  • the training the parameters of the fully connected layer included in the neural network to be trained based on the first sample set and the second sample set includes: inputting each first training sample in the first sample set to the neural network to be trained and obtaining a first output result corresponding to each first training sample output by the neural network to be trained, and inputting each second training sample in the second sample set to the neural network to be trained and obtaining a first output result corresponding to each second training sample; obtaining, based on the first output result corresponding to each first training sample and the label included in each first training sample, a first loss function value through a first loss function, and obtaining, based on the first output result corresponding to each second training sample, a second loss function value through a distance loss function, where the first loss function is a loss function other than the distance loss function; and when the fully connected layer loss function value is not a minimum value, adjusting the value of the parameter of the fully connected layer based on the fully connected layer loss function value, where the fully connected layer loss function value includes the first loss function value and the second loss function value.
  • the training the parameters of the convolutional layer of the neural network to be trained based on the first sample set and the second sample set includes: inputting each first training sample in the first sample set to the neural network to be trained and obtaining a second output result corresponding to each first training sample output by the neural network to be trained, and inputting each second training sample in the second sample set to the neural network to be trained and obtaining a second output result corresponding to each second training sample; obtaining, based on the second output result corresponding to each first training sample and the label included in the first training sample, a third loss function value through the first loss function, and obtaining, based on the second output result corresponding to each second training sample, a fourth loss function value through the distance loss function; and when the convolutional layer loss function value is not a minimum value, adjusting the value of the parameter of the convolutional layer based on the convolutional layer loss function value, where the convolutional layer loss function value includes the third loss function value and the fourth loss function value.
  • the training of each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set includes: inputting each second training sample in the second sample set to each intelligent model in the first model set and obtaining a plurality of third output results corresponding to each second training sample output by the intelligent models; obtaining, based on the plurality of third output results corresponding to each second training sample, the label corresponding to each second training sample; and training each neural network in the second network set based on the first sample set, each second training sample and the label corresponding to each second training sample, to obtain the second model set.
  • the method further includes: for each of n first devices, determining, based on resource information of the first device, a third model set corresponding to the first device, where the third model set corresponding to the first device includes at least one second smart model, the resources of the first device meet the resources required by each second smart model in the at least one second smart model, and n is an integer greater than 1; and selecting, based on the first intelligent model, a second intelligent model for the first device from the third model set corresponding to the first device.
  • the selecting a second intelligent model for the first device from the third model set corresponding to the first device based on the first intelligent model includes: obtaining, based on the second sample set, difference information between the first smart model and each second smart model in the third model set corresponding to the first device; and selecting, based on the difference information between the first smart model and each second smart model, a second smart model from the at least one second smart model.
  • the acquiring, based on the second sample set, the difference information between the first smart model and each second smart model in the third model set corresponding to the first device includes: inputting each second training sample in the second sample set to the first intelligent model and obtaining a fourth output result corresponding to each second training sample output by the first intelligent model; inputting each second training sample in the second sample set to a target intelligent model and obtaining a fifth output result corresponding to each second training sample output by the target intelligent model, where the target intelligent model is a second intelligent model in the third model set corresponding to the first device; and obtaining, based on the fourth output result and the fifth output result corresponding to each second training sample, the difference information between the first intelligent model and the target intelligent model.
  • the method is applied to a central computing-edge device system architecture; the system architecture includes a central device and multiple edge smart devices, each edge smart device is deployed in a different area, and the computing capability and storage capacity of the central device are respectively greater than the computing capability and storage capacity of each edge smart device; the method is executed by the central device, and the second training samples in the second sample set are the data collected by the edge smart devices.
  • the present application provides an apparatus for training an intelligent model, the apparatus comprising: an acquisition module, configured to acquire a first network set and a second network set from a first neural network, where the first network set includes the first neural network and m-1 second neural networks, m is an integer greater than 1, the second network set includes m second neural networks, the m-1 second neural networks and the m second neural networks are different sub-networks in the first neural network, and the network scale of each neural network in the first network set is greater than the network scale of each neural network in the second network set;
  • a first training module, configured to train each neural network in the first network set based on the first sample set and the second sample set to obtain a first model set, where the first sample set includes a plurality of first training samples, each first training sample includes first training data and a label indicating the target in the first training data, the second sample set includes a plurality of second training samples, each second training sample includes second training data, and the first model set includes a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks;
  • a second training module, configured to train each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set, where the second model set includes m second intelligent models obtained by training the m second neural networks.
  • the first training module is configured to: for any neural network to be trained in the first network set, train, based on the first sample set, the parameters of the convolutional layer and the parameters of the fully connected layer included in the neural network to be trained, where the value of the parameter of the convolutional layer after training is the first parameter value and the value of the parameter of the fully connected layer after training is the second parameter value; fix the value of the parameter of the convolutional layer to the first parameter value and train, based on the first sample set and the second sample set, the parameters of the fully connected layer included in the neural network to be trained, where the value of the parameter of the fully connected layer after training is changed from the second parameter value to the third parameter value; and fix the value of the parameter of the fully connected layer to the third parameter value and train, based on the first sample set and the second sample set, the parameters of the convolutional layer of the neural network to be trained, to obtain the intelligent model corresponding to the neural network to be trained.
  • the first training module is configured to: input each first training sample in the first sample set to the neural network to be trained and obtain the first output result corresponding to each first training sample output by the neural network to be trained, and input each second training sample in the second sample set to the neural network to be trained and obtain the first output result corresponding to each second training sample; obtain, based on the first output result corresponding to each first training sample and the label included in each first training sample, a first loss function value through the first loss function, and obtain, based on the first output result corresponding to each second training sample, a second loss function value through the distance loss function, where the first loss function is a loss function other than the distance loss function; and when the fully connected layer loss function value is not the minimum value, adjust the value of the parameter of the fully connected layer based on the fully connected layer loss function value, where the fully connected layer loss function value includes the first loss function value and the second loss function value.
  • the first training module is configured to: input each first training sample in the first sample set to the neural network to be trained and obtain the second output result corresponding to each first training sample output by the neural network to be trained, and input each second training sample in the second sample set to the neural network to be trained and obtain the second output result corresponding to each second training sample; obtain, based on the second output result corresponding to each first training sample and the label included in the first training sample, a third loss function value through the first loss function, and obtain, based on the second output result corresponding to each second training sample, a fourth loss function value through the distance loss function; and when the convolutional layer loss function value is not the minimum value, adjust the parameter values of the convolutional layer based on the convolutional layer loss function value, where the convolutional layer loss function value includes the third loss function value and the fourth loss function value.
  • the second training module is configured to: input each second training sample in the second sample set to each intelligent model in the first model set and obtain a plurality of third output results corresponding to each second training sample output by the intelligent models; obtain, based on the plurality of third output results corresponding to each second training sample, the label corresponding to each second training sample; and train each neural network in the second network set based on the first sample set, each second training sample and the label corresponding to each second training sample, to obtain the second model set.
  • the apparatus further includes: a determining module, configured to, for each of n first devices, determine, based on resource information of the first device, a third model set corresponding to the first device, where the third model set corresponding to the first device includes at least one second smart model, the resources of the first device meet the resources required by each second smart model in the at least one second smart model, and n is an integer greater than 1; and a selection module, configured to select, based on the first smart model, a second smart model for the first device from the third model set corresponding to the first device.
  • the selection module is configured to: obtain, based on the second sample set, difference information between the first intelligent model and each second intelligent model in the third model set corresponding to the first device; and select, based on the difference information between the first intelligent model and each second intelligent model, a second intelligent model from the at least one second intelligent model.
  • the selection module is configured to: input each second training sample in the second sample set to the first intelligent model and obtain the fourth output result corresponding to each second training sample output by the first intelligent model; input each second training sample in the second sample set to the target intelligent model and obtain the fifth output result corresponding to each second training sample output by the target intelligent model, where the target intelligent model is a second intelligent model in the third model set corresponding to the first device; and obtain, based on the fourth output result and the fifth output result corresponding to each second training sample, the difference information between the first intelligent model and the target intelligent model.
  • the present application provides an electronic device, which includes a processor and a memory; the memory is used to store machine-executable instructions, and the processor is used to read and execute the machine-executable instructions stored in the memory to implement the above-mentioned method for training an intelligent model.
  • the present application provides a machine-readable storage medium, where machine-executable instructions are stored in the machine-readable storage medium, and when the machine-executable instructions are executed by a processor, the above-mentioned method for training an intelligent model is implemented.
  • the first network set includes the first neural network and m-1 second neural networks
  • the second network set includes m second neural networks.
  • the first training samples included in the first sample set are labeled training samples
  • the second training samples included in the second sample set are unlabeled training samples. Since the scale of each neural network in the first network set is larger than the scale of each neural network in the second network set, even though the second training samples in the second sample set are unlabeled, the first sample set and the second sample set can still be used to train each neural network in the first network set, obtaining an intelligent model corresponding to each of these neural networks, that is, the first model set.
  • the first model set, the first sample set and the second sample set are then used to train each neural network in the second network set, so that even when the training samples include both labeled and unlabeled training samples, intelligent models with superior performance are still trained.
  • FIG. 1 is a schematic diagram of a neural network structure provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another neural network structure provided by the embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • FIG. 4 is a flow chart of a method for training an intelligent model provided by an embodiment of the present application.
  • FIG. 5 is a flow chart of another method for training an intelligent model provided by an embodiment of the present application.
  • FIG. 6 is a flow chart of another method for training an intelligent model provided by an embodiment of the present application.
  • FIG. 7 is a flow chart of another method for training an intelligent model provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a device for training an intelligent model provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the first neural network is a large-scale neural network, and the first neural network may be called a super network.
  • the first neural network includes a large number of sub-networks.
  • the first neural network includes a large number of neurons. By dividing the neurons included in the first neural network according to different depths, different widths, and different convolution kernel sizes, the first neural network can be divided into a large number of sub-networks.
  • the neurons in the first neural network are divided according to different depths, and the first neural network is divided into a plurality of different sub-networks.
  • the neurons in the super network are divided according to different widths, and the first neural network is divided into a plurality of different sub-networks.
  • Each sub-network in the first neural network is a neural network, and training samples are used to train the neural network to obtain an intelligent model with required functions. For example, in order to be able to recognize license plates, training samples can be used to train a neural network to obtain a license plate recognition model with a license plate recognition function.
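  • For illustration only, the following is a minimal sketch (not from the patent; it assumes a PyTorch-style toolchain, and the class and parameter names are hypothetical) of how a super network might expose depth- and width-sliced sub-networks:

```python
import torch
import torch.nn as nn

class SuperNet(nn.Module):
    """Toy super network: sub-networks are carved out by truncating depth
    (how many conv blocks run) and width (how many channels stay active)."""
    def __init__(self, max_depth=4, max_width=64, num_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Conv2d(3 if i == 0 else max_width, max_width, 3, padding=1)
            for i in range(max_depth)
        ])
        self.head = nn.Linear(max_width, num_classes)

    def forward(self, x, depth=None, width=None):
        depth = depth or len(self.blocks)
        width = width or self.blocks[0].out_channels
        for block in self.blocks[:depth]:
            x = torch.relu(block(x))
            if width < x.shape[1]:
                # width slicing: deactivate channels beyond `width`
                x = torch.cat([x[:, :width],
                               torch.zeros_like(x[:, width:])], dim=1)
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.head(x)

net = SuperNet()
imgs = torch.randn(2, 3, 32, 32)
full = net(imgs)                     # the first neural network (largest scale)
sub = net(imgs, depth=2, width=32)   # one smaller sub-network
```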
  • the neural network includes a convolutional layer and a fully connected layer, and the convolutional layer is used to extract features from training samples.
  • the fully connected layer processes the objects in the training samples based on the extracted features. For example, based on the extracted features, the fully connected layer classifies the objects in the training sample, etc.
  • the convolutional layer includes a feature extractor, and the fully connected layer includes multiple classifiers.
  • the feature extractor belongs to the convolutional layer of the neural network and is used to extract features from training samples.
  • the classifier classifies the objects in the training sample based on the features extracted by the feature extractor.
  • the labeled training samples include training data and a label
  • the label is used to indicate a target in the training data.
  • the training data in the training sample is a picture including a license plate
  • the label in the training sample is used to indicate the position of the license plate in the picture
  • the training sample is a labeled training sample.
  • Unlabeled training samples include only the training data.
  • the training sample includes a picture
  • the picture includes a license plate image
  • the training sample is an unlabeled training sample.
  • the above intelligent model takes the license plate recognition model as an example, but the intelligent model can also be an intelligent model for semantic segmentation, an intelligent model for target detection, an intelligent model for key point detection, and/or an intelligent model for video understanding, etc. These intelligent models can be obtained by training neural networks with training samples.
  • the classifier of the fully connected layer classifies the training data in the training samples.
  • the classifier classifies each image in the training sample, and when an image is classified as a license plate image in the training sample, the position of the license plate image is used as the license plate position.
  • the classifier classifies the semantics of each data content in the training sample, and classifies data content with different semantics, so as to implement semantic segmentation.
  • a first sample set and a second sample set are provided; the first sample set includes a plurality of first training samples, each first training sample includes first training data and a label, and the label is used to indicate the target included in the first training data; the second sample set includes a plurality of second training samples, and each second training sample includes second training data.
  • each first training sample in the first sample set is a labeled training sample
  • each second training sample in the second sample set is an unlabeled training sample.
  • the first neural network and each sub-network included in the first neural network are trained to obtain an intelligent model corresponding to the first neural network and an intelligent model corresponding to each sub-network. Since the scales of the sub-networks of the first neural network differ, the scale of the intelligent model corresponding to the trained first neural network differs from the scale of the intelligent model corresponding to each sub-network, so that intelligent models of different scales can be deployed on devices with different hardware capabilities.
  • the first training samples in the first sample set and the second training samples in the second sample set may be training samples in different environments.
  • each first training sample in the first sample set may be a sample obtained by photographing various license plates during the day
  • each second training sample in the second sample set may be a sample obtained by photographing various license plates at night.
  • the license plate recognition model obtained through the training of the first sample set and the second sample set can perform license plate recognition during the daytime, and can also perform license plate recognition at night.
  • an embodiment of the present application provides a network architecture, the network architecture includes a central device and multiple edge intelligent devices, and each edge intelligent device communicates with the central device.
  • each edge smart device and the central device are connected to a communication network, and each edge smart device establishes a network connection with the central device over the communication network, so as to realize communication between each edge smart device and the central device.
  • there are other ways to realize the communication between each edge smart device and the central device, which will not be described in detail here.
  • the computing power and storage capacity of each edge smart device are much smaller than the computing power and storage capacity of the central device, and the computing power and storage capacity of each edge smart device may be different.
  • the central device includes a first sample set, and each first training sample in the first sample set includes first training data and a label, where the label is used to indicate a target included in the first training data.
  • the labels included in each first training sample may be obtained by manually marking each first training sample.
  • Each edge smart device is deployed in a different area; each edge smart device collects data in the area where it is located, uses the collected data as second training samples, and sends the second training samples to the central device. Since the number of edge smart devices is often large and each edge smart device collects a large number of second training samples, the central device obtains a very large number of second training samples. Manually marking all of the second training samples would be too costly, so the central device directly uses the data collected by each edge smart device as second training samples and forms the second training samples into the second sample set.
  • the central device trains the super network based on the first sample set and the second sample set, turning the super network into multiple intelligent models of different scales, where each intelligent model requires a device with different computing and storage capabilities.
  • the central device deploys different smart models to different edge smart devices.
  • each edge smart device is deployed in different scenarios. Since each edge smart device is affected by its environment, the scene corresponding to the second training sample collected by each edge smart device is different from the scene corresponding to each first training sample in the central device.
  • the central device is a central server.
  • the central device is a cloud computing server or a cloud computing server cluster.
  • Edge smart devices are smart cameras, computers or single-chip microcomputers, etc.
  • the network architecture is a central computing-edge device system architecture, such as a cloud-centric computing-edge device system architecture; that is, the central device is a server or server cluster with large-scale computing and storage capabilities (for example, a server cluster consisting of 200 graphics cards, 100 CPUs, and 100 TB of storage hard drives), and the multiple edge smart devices are edge devices with small-scale computing and storage capabilities, such as servers with a small cluster size, computers with few or no graphics cards, single-chip microcomputers, cameras with a certain ability to carry smart models, or smart glasses with smaller computing power.
  • the network architecture shown in FIG. 3 is applied to an intelligent wildlife protection monitoring system.
  • the system consists of a cloud computing center (as the central device) and multiple edge smart devices.
  • the cloud computing center is a server cluster with strong hardware computing power and storage capacity, and can utilize pre-selected and collected labeled wild animal data (the first training samples).
  • the cloud computing center can use the labeled first training samples to train the super network to obtain a super network model.
  • Edge smart devices, including smart cameras installed in different wildlife protection areas, are used to collect and process image data to identify and monitor wild animals that may appear.
  • a smart camera is a device capable of running an intelligent model. Smart cameras come in different models, so their computing and storage capabilities differ, but those capabilities are very limited compared with the cloud computing center, and a smart camera may not be able to directly run the super network model trained by the cloud computing center.
  • the amount of data collected by the smart cameras is huge, and manually marking the wild animal targets in the images is expensive; moreover, different smart cameras are affected by their environments, so the data they collect come from scenes different from those of the first training samples in the cloud computing center.
  • the super network is trained through the training method provided by this application, obtaining an intelligent model that meets the computing power, storage capacity and scene conditions of each smart camera.
  • each camera can then be equipped with an intelligent model that meets its conditions (its own computing power, storage capacity and scene conditions) and process the collected images to perform intelligent monitoring tasks for wild animals.
  • the training method of the present application trains only one super network model; a customized smart model can be obtained by collecting the computing power and storage capacity information of different smart cameras together with the collected unlabeled data (that is, the second training samples). This saves the cost of separately training different models for different smart cameras and the cost of manually labeling the different scene data collected by different smart cameras.
  • the intelligent wildlife protection and monitoring system described above can also be replaced by other tasks.
  • for example, monitoring tasks in dangerous environments, such as personnel/object intrusion warning systems in substation areas of ultra-high-voltage power grids and identification systems for operations violating safety regulations on large-scale production lines; it can also be an intelligent traffic system composed of an intelligent traffic command center, different road monitoring equipment, and different road electronic camera equipment.
  • the embodiment of the present application provides a method for training an intelligent model.
  • the method is applied to the network architecture shown in FIG. 3 , and the execution subject of the method is the central device in the network architecture.
  • the method is used to train the first neural network to obtain the first intelligent model corresponding to the first neural network and the second intelligent models corresponding to multiple sub-networks in the first neural network, and includes:
  • Step 301 Obtain a first network set and a second network set from the first neural network; the first network set includes the first neural network and m-1 second neural networks, m is an integer greater than 1, the second network set includes m second neural networks, the m-1 second neural networks and the m second neural networks are different sub-networks in the first neural network, and the network size of each neural network in the first network set is greater than the network size of each neural network in the second network set.
  • Step 302 Based on the first sample set and the second sample set, train each neural network in the first network set to obtain a first model set; the first sample set includes a plurality of first training samples, each first training sample includes first training data and a label indicating the target in the first training data; the second sample set includes a plurality of second training samples, and each second training sample includes second training data; the first model set includes a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks.
  • the central device receives the data collected by each edge smart device, and uses the data collected by each edge smart device as a second training sample.
  • the data collected by each edge smart device is pictures and the like.
  • Step 303 Based on the first sample set, the second sample set and the first model set, train each neural network in the second network set to obtain a second model set; the second model set includes the m second intelligent models obtained by training the m second neural networks.
  • the first neural network is divided into a first network set and a second network set; the first network set includes the first neural network and m-1 second neural networks, and the second network set includes m second neural networks.
  • the first training samples included in the first sample set are labeled training samples, and the second training samples included in the second sample set are unlabeled training samples. Since the scale of each neural network in the first network set is larger than the scale of each neural network in the second network set, even though the second training samples in the second sample set are unlabeled, the first sample set and the second sample set can still be used to train each neural network in the first network set, obtaining an intelligent model corresponding to each of these neural networks, that is, the first model set.
  • the first model set, the first sample set and the second sample set are then used to train each neural network in the second network set, so that even when the training samples include both labeled and unlabeled training samples, intelligent models with superior performance are still trained.
  • the unlabeled training samples are all collected by different edge smart devices, and each edge smart device is deployed in a different area, so each edge smart device is in a different scene.
  • These unlabeled training samples constitute the second sample set, and the embodiment of the present application trains intelligent models of different scales based on the first sample set and the second sample set, so that each intelligent model can meet different scene conditions. Since intelligent models of different scales have different requirements on the computing power and storage capacity of edge intelligent devices, intelligent models of different scales can be deployed on different edge intelligent devices.
  • the training data in each second training sample is classified through the first model set including m intelligent models to obtain the m processing results corresponding to each second training sample; based on the m processing results corresponding to each second training sample, the label corresponding to each second training sample is obtained, so that each neural network in the second network set can be trained based on each first training sample, each second training sample and the label corresponding to each second training sample, and each intelligent model in the second model set thus has superior performance.
  • the first network set and the second network set are obtained from the first neural network
  • the first network set includes the first neural network and m-1 second neural networks
  • the second network set includes m second neural networks
  • the m-1 second neural networks and the m second neural networks are different sub-networks of the first neural network
  • m is an integer greater than 1
  • the network size of each neural network in the first network set is greater than the network size of each neural network in the second network set.
  • In step 301, 2m neural networks are randomly selected from the super network; the 2m neural networks include the first neural network and 2m-1 sub-networks, and the 2m-1 sub-networks are 2m-1 second neural networks.
  • the first neural network has the largest scale among the 2m neural networks.
  • the first neural network and m-1 of the second neural networks form the first network set, and the remaining m second neural networks form the second network set.
  • the number of sub-networks included in the first neural network is greater than or equal to 2m-1, so after performing the above steps 301-303, another 2m-1 second neural networks may be randomly selected from the first neural network, and the above operations 301-303 are performed again on the first neural network and these 2m-1 second neural networks, until each sub-network in the first neural network has been trained to obtain its corresponding intelligent model.
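  • This sampling-and-training loop can be sketched as follows (a hedged illustration; `train_first_set` and `train_second_set` are hypothetical stand-ins for steps 302 and 303, not names from the patent):

```python
import random

def train_supernet(supernet, subnetworks, m, train_first_set, train_second_set):
    """Repeatedly sample 2m-1 untrained sub-networks and run steps 301-303
    until every sub-network has a corresponding intelligent model."""
    untrained = set(subnetworks)
    models = {}
    while untrained:
        picked = random.sample(list(untrained), min(2 * m - 1, len(untrained)))
        first_set = [supernet] + picked[:m - 1]   # supernet has the largest scale
        second_set = picked[m - 1:]
        first_models = train_first_set(first_set)                    # step 302
        second_models = train_second_set(second_set, first_models)   # step 303
        models.update(first_models)
        models.update(second_models)
        untrained -= set(picked)
    return models
```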
  • For step 302, referring to FIG. 5, step 302 may be implemented through the following steps 3021 to 3024.
  • the operations of 3021 to 3024 are respectively:
  • Step 3021 Based on the first sample set, train the parameters of the convolutional layer and the parameters of the fully connected layer included in the neural network to be trained.
  • the neural network to be trained is any neural network in the first network set; after training, the value of the parameter of the convolutional layer is the first parameter value, and the value of the parameter of the fully connected layer is the second parameter value.
  • training the parameters of the convolutional layer essentially means training the parameters of the feature extractor included in the convolutional layer, and training the parameters of the fully connected layer essentially means training the parameters of each classifier included in the fully connected layer.
  • Step 3021 may be implemented according to the following operations 11 to 14.
  • the operations of 11 to 14 are respectively:
  • the sixth output result corresponding to the first training sample is used to indicate the target included in the first training data in the first training sample.
  • the neural network to be trained includes a convolutional layer and a fully connected layer.
  • the convolutional layer in the neural network to be trained extracts features from the first training data in the first training sample and inputs the features to the fully connected layer in the neural network to be trained; based on the features, the fully connected layer classifies the targets included in the first training data in the first training sample and outputs the processing result; the output processing result is the sixth output result corresponding to the first training sample.
  • each classifier included in the fully connected layer classifies, based on this feature, the target included in the first training data in the first training sample, so the sixth output result corresponding to the first training sample output by the neural network to be trained includes the processing results output by each classifier.
  • the feature extractor of the convolutional layer extracts features from the first training data in the first training sample and inputs the features to each classifier in the neural network to be trained; based on the features, each classifier classifies the targets included in the first training data and outputs a processing result.
  • the sixth output result corresponding to the first training sample includes the output result of each classifier.
  • the first loss function is presented in the source as a formula image (omitted here) and yields the fifth loss function value s₁, where: F_d is the feature extractor of the convolutional layer; C_di is the i-th classifier of the fully connected layer; (x_s, y_s) is the first sample set; (x, y) is a first training sample, with x the first training data and y the label; and K represents the total number of categories.
  • the first loss function may be a cross-entropy loss function or the like.
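  • Since the formula itself appears only as an image in the source, one plausible reconstruction from the variable definitions above, assuming the standard cross-entropy form the text suggests, is:

$$ s_1 = -\,\mathbb{E}_{(x,y)\sim(x_s,y_s)} \sum_{k=1}^{K} \mathbb{1}[k=y]\,\log C_{di}\big(F_d(x)\big)_k $$

where $C_{di}(F_d(x))_k$ denotes the probability that the i-th classifier assigns to category k.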
  • Adjusting the value of the parameter of the convolutional layer includes adjusting the value of the parameter of the feature extractor of the neural network to be trained. Adjusting the value of the parameter of the fully connected layer includes adjusting the value of the parameter of each classifier of the neural network to be trained.
  • since sub-networks share neurons within the super network, the feature extractor of the neural network to be trained may also be a feature extractor in other neural networks; therefore, when the parameters of the feature extractor of the neural network to be trained are adjusted, the parameters of the feature extractor in those other neural networks are adjusted accordingly.
  • likewise, the classifier of the neural network to be trained may also be a classifier in other neural networks, so when the parameters of the classifier in the neural network to be trained are adjusted, the parameters of the classifier in those other neural networks are adjusted accordingly.
  • Step 3022 Fix the value of the parameter of the convolutional layer of the neural network to be trained as the first parameter value, and train the parameters of the fully connected layer included in the neural network to be trained based on the first sample set and the second sample set; after training, the value of the parameter of the fully connected layer is changed from the second parameter value to the third parameter value.
  • fixing the value of the parameter of the convolutional layer of the neural network to be trained as the first parameter value includes fixing the value of the parameter of the feature extractor of the neural network to be trained as the first parameter value. Based on the first sample set and the second sample set, the parameters of each classifier included in the neural network to be trained are trained, and the value of the parameter of each classifier after training is changed from the second parameter value to the third parameter value.
  • the value of the third parameter may or may not be the same for each classifier.
  • In step 3022, the parameters of the fully connected layer included in the neural network to be trained are trained according to the following operations 21 to 26.
  • the operations of 21 to 26 are respectively:
  • each first training sample in the first sample set is input to the neural network to be trained, so that the neural network to be trained classifies the targets included in the first training data in each first training sample, and the first output result corresponding to each first training sample is obtained.
  • the first output result corresponding to the first training sample is used to indicate the target included in the first training data in the first training sample.
  • the convolutional layer (feature extractor) in the neural network to be trained extracts features from the first training data in the first training sample and inputs the features to the fully connected layer (each classifier in the fully connected layer); based on the features, the fully connected layer (each classifier) classifies the targets included in the first training data and outputs a processing result; the first output result corresponding to the first training sample includes the output result of each classifier.
  • the first output result corresponding to the second training sample is used to indicate the target included in the second training data in the second training sample.
  • the convolutional layer (feature extractor) in the neural network to be trained extracts features from the second training data in the second training sample and inputs the features to the fully connected layer (each classifier in the fully connected layer); based on the features, the fully connected layer (each classifier) classifies the targets included in the second training data and outputs a processing result; the first output result corresponding to the second training sample includes the output result of each classifier.
  • s₁ is still used to represent the first loss function value.
  • the second loss function value is obtained through the following distance loss function, and the first loss function is different from the distance loss function.
  • the distance loss function is presented in the source as a formula image (omitted here) and yields the second loss function value s₂, where {x_t} is the second sample set and x is a second training sample.
  • the fully connected layer loss function value includes the first loss function value and the second loss function value; for example, the fully connected layer loss function value can be equal to s₁ − s₂.
  • when the loss function value of the fully connected layer is the minimum value, the operation ends.
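  • The distance loss formula is likewise only an image in the source. One consistent reading, assuming an entropy-style distance over the unlabeled set (a common choice in semi-supervised minimax-entropy training, not confirmed by the patent text), is:

$$ s_2 = -\,\mathbb{E}_{x\sim\{x_t\}} \sum_{k=1}^{K} p_k(x)\log p_k(x), \qquad p(x) = C_{di}\big(F_d(x)\big) $$

Under this reading, minimizing the fully connected layer loss s₁ − s₂ pushes the classifiers to maximize s₂ on unlabeled data, while minimizing the convolutional layer loss s₁ + s₂ in step 3023 below pushes the feature extractor to minimize it, giving an adversarial min-max game between the two training stages.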
  • the value of the parameter of the convolutional layer (feature extractor) of the neural network to be trained is referred to as the first parameter value, and the value of the parameter of the fully connected layer (the multiple classifiers in the fully connected layer) of the neural network to be trained is referred to as the third parameter value.
  • Step 3023 Fix the value of the parameter of the fully connected layer (the multiple classifiers in the fully connected layer) of the neural network to be trained to the third parameter value, and, based on the first sample set and the second sample set, train the parameters of the convolutional layer (feature extractor) of the neural network to be trained, to obtain the intelligent model corresponding to the neural network to be trained.
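  • A minimal PyTorch-style sketch of this three-stage schedule (steps 3021 to 3023) follows. It assumes the model exposes its convolutional layer as `model.conv` and its fully connected layer as `model.fc`, and `first_loss`/`distance_loss` are placeholders for the first loss function and the distance loss function; none of these names come from the patent:

```python
import torch

def set_trainable(module, flag):
    # Fixing a layer's parameter values = freezing its gradients.
    for p in module.parameters():
        p.requires_grad_(flag)

def three_stage_training(model, labeled_loader, unlabeled_loader,
                         first_loss, distance_loss, epochs=1, lr=1e-3):
    # Stage 3021: train conv + fc on the labeled first sample set only.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in labeled_loader:
            opt.zero_grad()
            first_loss(model(x), y).backward()
            opt.step()

    # Stage 3022: fix conv (first parameter value), train fc on both
    # sample sets, minimizing s1 - s2.
    set_trainable(model.conv, False)
    opt = torch.optim.SGD(model.fc.parameters(), lr=lr)
    for (x, y), xu in zip(labeled_loader, unlabeled_loader):
        opt.zero_grad()
        (first_loss(model(x), y) - distance_loss(model(xu))).backward()
        opt.step()

    # Stage 3023: fix fc (third parameter value), train conv on both
    # sample sets, minimizing s1 + s2.
    set_trainable(model.conv, True)
    set_trainable(model.fc, False)
    opt = torch.optim.SGD(model.conv.parameters(), lr=lr)
    for (x, y), xu in zip(labeled_loader, unlabeled_loader):
        opt.zero_grad()
        (first_loss(model(x), y) + distance_loss(model(xu))).backward()
        opt.step()
    return model
```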
  • In step 3023, the parameters of the convolutional layer (feature extractor) included in the neural network to be trained are trained according to the following operations 31 to 36.
  • the operations of 31 to 36 are respectively:
  • the second output result corresponding to the first training sample is used to indicate the target included in the first training data in the first training sample.
  • the convolutional layer (feature extractor) in the neural network to be trained extracts features from the first training data in the first training sample and inputs the features to the fully connected layer (each classifier in the fully connected layer); based on the features, the fully connected layer (each classifier) identifies the targets included in the first training data and outputs a recognition result; the second output result corresponding to the first training sample includes the output result of the fully connected layer (each classifier).
  • the second output result corresponding to the second training sample is used to indicate the target included in the second training data in the second training sample.
  • the convolutional layer (feature extractor) in the neural network to be trained extracts features from the second training data in the second training sample and inputs the features to the fully connected layer (each classifier in the fully connected layer); based on the features, the fully connected layer (each classifier) identifies the targets included in the second training data and outputs a recognition result; the second output result corresponding to the second training sample includes the output result of each classifier.
  • s₁ is still used to denote the third loss function value, and s₂ is still used to denote the fourth loss function value.
  • when the loss function value of the convolutional layer is not the minimum value, the parameter values of the convolutional layer (feature extractor) of the neural network to be trained are adjusted based on the convolutional layer loss function value, and operation 31 is executed again; the convolutional layer loss function value includes the third loss function value and the fourth loss function value.
  • for example, the convolutional layer loss function value can be equal to s₁ + s₂.
  • the m intelligent models include a first intelligent model obtained by training the first neural network, and m-1 second intelligent models obtained by respectively training m-1 second neural networks.
  • step 303 is implemented according to the following operations of 3031 to 3033.
  • the operations of 3031 to 3033 are respectively:
  • Step 3031 Input each second training sample in the second sample set to each smart model in the first model set, and obtain a plurality of third output results corresponding to the second training samples output by each smart model.
  • the first model set includes m intelligent models, and for each second training sample in the second sample set, the second training sample is respectively input into each of the m intelligent models.
  • Each intelligent model processes the target in the second training data included in the second training sample, and outputs a processing result. Acquire the processing results output by the m intelligent models, a total of m processing results, and use the m processing results as the m third output results corresponding to the second training sample. For each second training sample in the second sample set, perform the above operation to obtain m third output results corresponding to each second training sample in the second sample set.
  • Step 3032 Based on the multiple third output results corresponding to each second training sample, obtain the label corresponding to each second training sample.
  • for the m third output results corresponding to a second training sample, the average value of the m third output results is calculated, or the maximum or minimum value is selected from the m third output results.
  • the average value, maximum value or minimum value is then used as the label corresponding to the second training sample.
  • the label corresponding to each second training sample is obtained according to the above operation.
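  • A minimal sketch of this pseudo-labeling step (operations 3031 and 3032), assuming the m intelligent models are callable PyTorch modules that output score vectors:

```python
import torch

@torch.no_grad()
def pseudo_label(models, unlabeled_batch, reduce="mean"):
    """Run each of the m first-set models on a batch of second training
    samples and reduce the m third output results into one label per sample."""
    outputs = torch.stack([model(unlabeled_batch) for model in models])  # (m, B, K)
    if reduce == "mean":
        scores = outputs.mean(dim=0)
    elif reduce == "max":
        scores = outputs.max(dim=0).values
    else:
        scores = outputs.min(dim=0).values
    # The reduced scores serve as the label for each second training sample;
    # a hard class label could instead be taken as scores.argmax(dim=1).
    return scores
```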
  • Step 3033 Based on the first sample set, each second training sample and the label corresponding to each second training sample, train each neural network in the second network set to obtain a second model set.
  • the second neural network can be trained according to the following operations 41-44.
  • the operations of 41-44 are respectively:
  • input each training sample into the second neural network, so that the second neural network processes the targets included in the training data in each training sample, and obtain the seventh output result corresponding to each training sample; the training samples include every first training sample in the first sample set and every second training sample in the second sample set.
  • the seventh output result corresponding to the training sample is used to indicate the target included in the training data in the training sample.
  • the convolutional layer (feature extractor) in the second neural network extracts features from the training data in the training sample and inputs the features to the fully connected layer (each classifier in the fully connected layer) in the second neural network; based on the features, the fully connected layer (each classifier) processes the targets included in the training data and outputs a processing result; the seventh output result corresponding to the training sample includes the output result of the fully connected layer.
  • training can be performed according to the above operations 41-44 to obtain a second intelligent model corresponding to each second neural network, and a total of m second intelligent models can be obtained.
  • the first intelligent model corresponding to the first neural network and the second intelligent model corresponding to each sub-network in the first neural network are trained through the above steps 301-303.
  • the scale of each second intelligent model is different.
  • the second intelligent models are adapted to devices with different hardware capabilities. Assume there are n devices with different hardware performance, where n is an integer greater than 1; since there are many second intelligent models, each device can select one second intelligent model from among them.
  • Referring to FIG. 7, an embodiment of the present application provides the following method for selecting an intelligent model, including:
  • Step 601: Determine n third model sets based on the resource information of n devices, the n devices corresponding one-to-one to the n third model sets.
  • The n devices include a first device. The third model set corresponding to the first device includes at least one second intelligent model, and the resources of the first device satisfy the resources required by each second intelligent model in the at least one second intelligent model.
  • Optionally, the resource information of the first device includes the free memory size of the first device and/or the number of CPU processing cores, etc., and is used to indicate the resources on the device.
  • In step 601, based on the resource information of the first device and the scale of each trained second intelligent model, at least one second intelligent model that can run on the first device is determined (for example, the resources of the first device need to satisfy the resources required by each determined second intelligent model), and a model set including the at least one second intelligent model is determined as the third model set corresponding to the first device.
  • The above operations are performed on each of the other n-1 devices in the same way to obtain the third model set corresponding to each of the n-1 devices.
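  • Step 601 can be sketched as a simple resource filter. The dictionary fields `memory` and `cpu_cores` are illustrative; the text above only says resource information may include free memory size and/or the number of CPU cores.

```python
# Sketch of step 601: for each device, keep the second intelligent models
# whose resource needs fit within the device's resources.
def third_model_set(device_info, model_infos):
    return [m for m in model_infos
            if m["memory"] <= device_info["memory"]
            and m["cpu_cores"] <= device_info["cpu_cores"]]

# One third model set per device, in one-to-one correspondence.
def third_model_sets(devices, model_infos):
    return {d["id"]: third_model_set(d, model_infos) for d in devices}
```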
  • Step 602: Based on the first intelligent model, select a second intelligent model for the first device from the third model set corresponding to the first device.
  • In step 602, each second training sample in the second sample set is input to the first intelligent model to acquire the fourth output result corresponding to each second training sample output by the first intelligent model. Each second training sample in the second sample set is also input to a target intelligent model to acquire the fifth output result corresponding to each second training sample output by the target intelligent model, where the target intelligent model is one second intelligent model in the third model set corresponding to the first device. Based on the fourth output result and the fifth output result corresponding to each second training sample, the difference information between the first intelligent model and the target intelligent model is acquired. In the same way, the difference information between the first intelligent model and every second intelligent model in the third model set corresponding to the first device is acquired, and based on this difference information, one second intelligent model is selected from the at least one second intelligent model.
  • Optionally, the difference between the fourth output result and the fifth output result of each second training sample is calculated, that is, the difference corresponding to each second training sample; based on the differences corresponding to the second training samples, an average difference is calculated and used as the difference information between the first intelligent model and the target intelligent model.
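  • A minimal sketch of step 602 follows, assuming PyTorch. Choosing the candidate with the smallest average difference is an assumption; the text only says the selection is based on the difference information.

```python
# Sketch of step 602: score each candidate second intelligent model by the
# average difference between its outputs (fifth output results) and the first
# intelligent model's outputs (fourth output results) on the second sample set.
import torch

def difference_info(first_model, target_model, unlabeled_loader):
    diffs = []
    with torch.no_grad():
        for x in unlabeled_loader:
            fourth = first_model(x)          # fourth output result
            fifth = target_model(x)          # fifth output result
            diffs.append((fourth - fifth).abs().mean())
    return float(torch.stack(diffs).mean())  # average difference

def select_model(first_model, candidates, unlabeled_loader):
    # Pick the candidate closest to the first intelligent model (assumed rule).
    return min(candidates,
               key=lambda m: difference_info(first_model, m, unlabeled_loader))
```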
  • Referring to FIG. 8, an embodiment of the present application provides an apparatus 700 for training an intelligent model. The apparatus 700 includes:
  • An acquisition module 701 configured to acquire a first network set and a second network set from a first neural network, where the first network set includes the first neural network and m-1 second neural networks, m is an integer greater than 1, the second network set includes m second neural networks, the m-1 second neural networks and the m second neural networks are different sub-networks of the first neural network, and the network size of each neural network in the first network set is larger than the network size of each neural network in the second network set;
  • A first training module 702 configured to train each neural network in the first network set based on a first sample set and a second sample set to obtain a first model set, where the first sample set includes a plurality of first training samples, each first training sample includes first training data and a label indicating a target in the first training data, the second sample set includes a plurality of second training samples, each second training sample includes second training data, and the first model set includes a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks;
  • A second training module 703 configured to train each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set, where the second model set includes m second intelligent models obtained by training the m second neural networks.
  • Optionally, the first training module 702 is configured to: for any neural network to be trained in the first network set, train, based on the first sample set, the parameters of the convolutional layer and the parameters of the fully connected layer included in the neural network to be trained, where the value of the parameters of the convolutional layer after training is a first parameter value and the value of the parameters of the fully connected layer after training is a second parameter value; fix the value of the parameters of the convolutional layer to the first parameter value and, based on the first sample set and the second sample set, train the parameters of the fully connected layer included in the neural network to be trained, the value of the parameters of the fully connected layer changing from the second parameter value to a third parameter value after training; and fix the value of the parameters of the fully connected layer to the third parameter value and, based on the first sample set and the second sample set, train the parameters of the convolutional layer of the neural network to be trained to obtain the intelligent model corresponding to the neural network to be trained.
  • Optionally, the first training module 702 is configured to: input each first training sample in the first sample set to the neural network to be trained and acquire the first output result corresponding to each first training sample output by the neural network to be trained, and input each second training sample in the second sample set to the neural network to be trained and acquire the first output result corresponding to each second training sample; based on the first output result corresponding to each first training sample and the label included in each first training sample, acquire a first loss function value through a first loss function, and based on the first output result corresponding to each second training sample, acquire a second loss function value through a distance loss function, where the first loss function is a loss function other than the distance loss function; and when a fully-connected-layer loss function value is not the minimum, adjust the values of the parameters of the fully connected layer based on the fully-connected-layer loss function value, where the fully-connected-layer loss function value includes the first loss function value and the second loss function value.
  • Optionally, the first training module 702 is configured to: input each first training sample in the first sample set to the neural network to be trained and acquire the second output result corresponding to each first training sample output by the neural network to be trained, and input each second training sample in the second sample set to the neural network to be trained and acquire the second output result corresponding to each second training sample; based on the second output result corresponding to each first training sample and the label included in that first training sample, acquire a third loss function value through the first loss function, and based on the second output result corresponding to each second training sample, acquire a fourth loss function value through the distance loss function; and when a convolutional-layer loss function value is not the minimum, adjust the values of the parameters of the convolutional layer based on the convolutional-layer loss function value, where the convolutional-layer loss function value includes the third loss function value and the fourth loss function value.
  • Optionally, the second training module 703 is configured to: input each second training sample in the second sample set to each intelligent model in the first model set and acquire the plurality of third output results corresponding to each second training sample output by the intelligent models; based on the plurality of third output results corresponding to each second training sample, respectively acquire the label corresponding to each second training sample; and based on the first sample set, each second training sample and the label corresponding to each second training sample, train each neural network in the second network set to obtain the second model set.
  • Optionally, the apparatus 700 further includes: a determining module configured to, for each first device among n first devices, determine, based on the resource information of the first device, the third model set corresponding to the first device, where the third model set corresponding to the first device includes at least one second intelligent model, the resources of the first device satisfy the resources required by each second intelligent model in the at least one second intelligent model, and n is an integer greater than 1; and a selection module configured to select, based on the first intelligent model, a second intelligent model for the first device from the third model set corresponding to the first device.
  • Optionally, the selection module is configured to: acquire, based on the second sample set, difference information between the first intelligent model and each second intelligent model in the third model set corresponding to the first device; and select one second intelligent model from the at least one second intelligent model based on the difference information between the first intelligent model and each second intelligent model.
  • Optionally, the selection module is configured to: input each second training sample in the second sample set to the first intelligent model and acquire the fourth output result corresponding to each second training sample output by the first intelligent model; input each second training sample in the second sample set to a target intelligent model and acquire the fifth output result corresponding to each second training sample output by the target intelligent model, the target intelligent model being one second intelligent model in the third model set corresponding to the first device; and based on the fourth output result and the fifth output result corresponding to each second training sample, acquire the difference information between the first intelligent model and the target intelligent model.
  • In this embodiment of the present application, since the scale of every neural network in the first network set is larger than that of every neural network in the second network set, the first sample set and the second sample set can be used to train the neural networks in the first network set, obtaining an intelligent model corresponding to each neural network in the first network set, that is, the first model set. The first model set, the first sample set and the second sample set are then used to train each neural network in the second network set, whose network sizes are smaller, so that an intelligent model corresponding to each neural network in the second network set can be trained. In this way, even when the training samples include both labeled and unlabeled samples, intelligent models with superior performance are trained.
  • FIG. 9 shows a structural block diagram of an electronic device 800 provided by an exemplary embodiment of the present application.
  • The electronic device 800 may be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer or a desktop computer.
  • The electronic device 800 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
  • Generally, the electronic device 800 includes a processor 801 and a memory 802.
  • the processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • The processor 801 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array).
  • The processor 801 may also include a main processor and a coprocessor. The main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state.
  • In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen.
  • In some embodiments, the processor 801 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
  • Memory 802 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 802 may also include high-speed random access memory, and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • In some embodiments, the non-transitory computer-readable storage medium in the memory 802 is used to store at least one instruction, and the at least one instruction is to be executed by the processor 801 to implement the method for training an intelligent model provided by the method embodiments of the present application.
  • the electronic device 800 may optionally further include: a peripheral device interface 803 and at least one peripheral device.
  • the processor 801, the memory 802, and the peripheral device interface 803 may be connected through buses or signal lines.
  • Each peripheral device can be connected to the peripheral device interface 803 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 804 , a display screen 805 , a camera component 806 , an audio circuit 807 , a positioning component 808 and a power supply 809 .
  • the peripheral device interface 803 may be used to connect at least one peripheral device related to I/O (Input/Output, input/output) to the processor 801 and the memory 802 .
  • In some embodiments, the processor 801, the memory 802 and the peripheral device interface 803 are integrated on the same circuit board; in some other embodiments, any one or two of the processor 801, the memory 802 and the peripheral device interface 803 may be implemented on a separate circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 804 is used to receive and transmit RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 804 communicates with the communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 804 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec, a subscriber identity module card, and the like.
  • the radio frequency circuit 804 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: World Wide Web, Metropolitan Area Network, Intranet, various generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area network and/or WiFi (Wireless Fidelity, Wireless Fidelity) network.
  • In some embodiments, the radio frequency circuit 804 may also include circuits related to NFC (Near Field Communication), which is not limited in the present application.
  • the display screen 805 is used to display a UI (User Interface, user interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • When the display screen 805 is a touch display screen, the display screen 805 also has the ability to collect touch signals on or above its surface.
  • the touch signal can be input to the processor 801 as a control signal for processing.
  • the display screen 805 can also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • In some embodiments, the display screen 805 may be a flexible display screen disposed on a curved surface or a folding surface of the electronic device 800. The display screen 805 may even be set as a non-rectangular irregular figure, namely a special-shaped screen.
  • the display screen 805 can be made of LCD (Liquid Crystal Display, liquid crystal display), OLED (Organic Light-Emitting Diode, organic light-emitting diode) and other materials.
  • the camera assembly 806 is used to capture images or videos.
  • the camera component 806 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so as to realize the background blur function by fusing the main camera and the depth-of-field camera, panoramic and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions.
  • In some embodiments, the camera assembly 806 may also include a flash.
  • The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
  • Audio circuitry 807 may include a microphone and speakers.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals and input them to the processor 801 for processing, or input them to the radio frequency circuit 804 to realize voice communication.
  • the microphone can also be an array microphone or an omnidirectional collection microphone.
  • the speaker is used to convert the electrical signal from the processor 801 or the radio frequency circuit 804 into sound waves.
  • the loudspeaker can be a conventional membrane loudspeaker or a piezoelectric ceramic loudspeaker.
  • the audio circuit 807 may also include a headphone jack.
  • the positioning component 808 is used to locate the current geographic location of the electronic device 800, so as to realize navigation or LBS (Location Based Service, location-based service).
  • The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of the European Union.
  • the power supply 809 is used to supply power to various components in the electronic device 800 .
  • the power source 809 can be alternating current, direct current, disposable batteries or rechargeable batteries.
  • the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery.
  • A wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil.
  • the rechargeable battery can also be used to support fast charging technology.
  • the electronic device 800 further includes one or more sensors 810 .
  • the one or more sensors 810 include, but are not limited to: an acceleration sensor 811 , a gyroscope sensor 812 , a pressure sensor 813 , a fingerprint sensor 814 , an optical sensor 815 and a proximity sensor 816 .
  • the acceleration sensor 811 can detect the acceleration on the three coordinate axes of the coordinate system established by the electronic device 800 .
  • the acceleration sensor 811 can be used to detect the components of the acceleration of gravity on the three coordinate axes.
  • the processor 801 may control the display screen 805 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811 .
  • The acceleration sensor 811 can also be used to collect motion data of a game or of the user.
  • the gyro sensor 812 can detect the body direction and rotation angle of the electronic device 800 , and the gyro sensor 812 can cooperate with the acceleration sensor 811 to collect 3D actions of the user on the electronic device 800 .
  • Based on the data collected by the gyroscope sensor 812, the processor 801 can realize the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • The pressure sensor 813 may be disposed on a side frame of the electronic device 800 and/or a lower layer of the display screen 805.
  • When the pressure sensor 813 is disposed on the side frame of the electronic device 800, it can detect the user's grip signal on the electronic device 800, and the processor 801 performs left/right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 813.
  • When the pressure sensor 813 is disposed on the lower layer of the display screen 805, the processor 801 controls the operable controls on the UI according to the user's pressure operations on the display screen 805.
  • The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • The fingerprint sensor 814 is used to collect the user's fingerprint, and the processor 801 identifies the user according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the user according to the collected fingerprint.
  • When the user's identity is recognized as a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings.
  • The fingerprint sensor 814 may be disposed on the front, back or side of the electronic device 800. When the electronic device 800 is provided with a physical button or a manufacturer logo, the fingerprint sensor 814 may be integrated with the physical button or the manufacturer logo.
  • the optical sensor 815 is used to collect ambient light intensity.
  • the processor 801 may control the display brightness of the display screen 805 according to the ambient light intensity collected by the optical sensor 815 . Specifically, when the ambient light intensity is high, the display brightness of the display screen 805 is increased; when the ambient light intensity is low, the display brightness of the display screen 805 is decreased.
  • the processor 801 may also dynamically adjust shooting parameters of the camera assembly 806 according to the ambient light intensity collected by the optical sensor 815 .
  • the proximity sensor 816 also called a distance sensor, is usually arranged on the front panel of the electronic device 800 .
  • the proximity sensor 816 is used to collect the distance between the user and the front of the electronic device 800 .
  • When the proximity sensor 816 detects that the distance between the user and the front of the electronic device 800 gradually decreases, the processor 801 controls the display screen 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance between the user and the front of the electronic device 800 gradually increases, the processor 801 controls the display screen 805 to switch from the screen-off state to the screen-on state.
  • Those skilled in the art can understand that the structure shown in FIG. 9 does not constitute a limitation on the electronic device 800, which may include more or fewer components than shown, combine some components, or adopt a different arrangement of components.
  • An embodiment of the present application also provides a machine-readable storage medium storing machine-executable instructions. When the machine-executable instructions are executed by a processor, the method for training an intelligent model described above is implemented.
  • For example, the machine-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.


Abstract

The present application provides a method, an apparatus, an electronic device and a storage medium for training an intelligent model. The method includes: acquiring, from a first neural network, a first network set including the first neural network and m-1 second neural networks and a second network set including m second neural networks, the m-1 second neural networks and the m second neural networks being different sub-networks of the first neural network; training each neural network in the first network set based on a first sample set and a second sample set to obtain a first model set, which includes a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks; and training each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set, which includes m second intelligent models obtained by training the m second neural networks.

Description

Method, apparatus, electronic device and storage medium for training an intelligent model

Technical Field
The present application relates to the field of computers, and in particular to a method, an apparatus, an electronic device and a machine-readable storage medium for training an intelligent model.
Background
Training samples fall into two categories: labeled training samples and unlabeled training samples. A labeled training sample includes sample data and a label, while an unlabeled training sample includes only sample data. For example, assuming the training samples are samples for training on license plates, a labeled training sample includes a license plate image and a label indicating the license plate position in the license plate image, while an unlabeled training sample includes only the license plate image.
At present, a neural network can be trained using multiple training samples that include labeled training samples; training the neural network with the labeled training samples yields an intelligent model with superior performance. However, when the multiple training samples include both labeled training samples and unlabeled training samples, how to train an intelligent model of the same quality is a problem that urgently needs to be solved.
Summary
To solve the problems in the related art, embodiments of the present application provide a method, an apparatus, an electronic device and a machine-readable storage medium for training an intelligent model.
In one aspect, the present application provides a method for training an intelligent model, the method including: acquiring a first network set and a second network set from a first neural network, the first network set including the first neural network and m-1 second neural networks, m being an integer greater than 1, the second network set including m second neural networks, the m-1 second neural networks and the m second neural networks being different sub-networks of the first neural network, and the network scale of each neural network in the first network set being greater than the network scale of each neural network in the second network set; training each neural network in the first network set based on a first sample set and a second sample set to obtain a first model set, the first sample set including a plurality of first training samples, each first training sample including first training data and a label, the label indicating a target in the first training data, the second sample set including a plurality of second training samples, each second training sample including second training data, and the first model set including a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks; and training each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set, the second model set including m second intelligent models obtained by training the m second neural networks.
Optionally, training each neural network in the first network set based on the first sample set and the second sample set to obtain the first model set includes: for any neural network to be trained in the first network set, training, based on the first sample set, the parameters of the convolutional layer and the parameters of the fully connected layer included in the neural network to be trained, the value of the parameters of the convolutional layer after training being a first parameter value and the value of the parameters of the fully connected layer after training being a second parameter value; fixing the value of the parameters of the convolutional layer to the first parameter value and training, based on the first sample set and the second sample set, the parameters of the fully connected layer included in the neural network to be trained, the value of the parameters of the fully connected layer changing from the second parameter value to a third parameter value after training; and fixing the value of the parameters of the fully connected layer to the third parameter value and training, based on the first sample set and the second sample set, the parameters of the convolutional layer of the neural network to be trained to obtain the intelligent model corresponding to the neural network to be trained.
Optionally, training the parameters of the fully connected layer included in the neural network to be trained based on the first sample set and the second sample set includes: inputting each first training sample in the first sample set to the neural network to be trained to acquire the first output result corresponding to each first training sample output by the neural network to be trained, and inputting each second training sample in the second sample set to the neural network to be trained to acquire the first output result corresponding to each second training sample; acquiring a first loss function value through a first loss function based on the first output result corresponding to each first training sample and the label included in each first training sample, and acquiring a second loss function value through a distance loss function based on the first output result corresponding to each second training sample, the first loss function being a loss function other than the distance loss function; and when a fully-connected-layer loss function value is not the minimum, adjusting the value of the parameters of the fully connected layer based on the fully-connected-layer loss function value, the fully-connected-layer loss function value including the first loss function value and the second loss function value.
Optionally, training the parameters of the convolutional layer of the neural network to be trained based on the first sample set and the second sample set includes: inputting each first training sample in the first sample set to the neural network to be trained to acquire the second output result corresponding to each first training sample output by the neural network to be trained, and inputting each second training sample in the second sample set to the neural network to be trained to acquire the second output result corresponding to each second training sample; acquiring a third loss function value through the first loss function based on the second output result corresponding to each first training sample and the label included in that first training sample, and acquiring a fourth loss function value through the distance loss function based on the second output result corresponding to each second training sample; and when a convolutional-layer loss function value is not the minimum, adjusting the value of the parameters of the convolutional layer based on the convolutional-layer loss function value, the convolutional-layer loss function value including the third loss function value and the fourth loss function value.
Optionally, training each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain the second model set includes: inputting each second training sample in the second sample set to each intelligent model in the first model set to acquire a plurality of third output results corresponding to each second training sample output by the intelligent models; acquiring the label corresponding to each second training sample based on the plurality of third output results corresponding to that second training sample; and training each neural network in the second network set based on the first sample set, each second training sample and the label corresponding to each second training sample to obtain the second model set.
Optionally, the method further includes: for each first device among n first devices, determining, based on the resource information of the first device, a third model set corresponding to the first device, the third model set corresponding to the first device including at least one second intelligent model, the resources of the first device satisfying the resources required by each second intelligent model in the at least one second intelligent model, and n being an integer greater than 1; and selecting, based on the first intelligent model, a second intelligent model for the first device from the third model set corresponding to the first device.
Optionally, selecting, based on the first intelligent model, a second intelligent model for the first device from the third model set corresponding to the first device includes: acquiring, based on the second sample set, difference information between the first intelligent model and each second intelligent model in the third model set corresponding to the first device; and selecting one second intelligent model from the at least one second intelligent model based on the difference information between the first intelligent model and each second intelligent model.
Optionally, acquiring, based on the second sample set, the difference information between the first intelligent model and each second intelligent model in the third model set corresponding to the first device includes: inputting each second training sample in the second sample set to the first intelligent model to acquire the fourth output result corresponding to each second training sample output by the first intelligent model; inputting each second training sample in the second sample set to a target intelligent model to acquire the fifth output result corresponding to each second training sample output by the target intelligent model, the target intelligent model being one second intelligent model in the third model set corresponding to the first device; and acquiring the difference information between the first intelligent model and the target intelligent model based on the fourth output result and the fifth output result corresponding to each second training sample.
Optionally, the method is applied to a central-computing/edge-device system architecture that includes a central device and a plurality of edge intelligent devices, each edge intelligent device being deployed in a different area, and the computing capability and storage capability of the central device being greater than the computing capability and storage capability of each edge intelligent device. The method is executed by the central device, and the second training samples in the second sample set are data collected by each edge intelligent device and received by the central device.
In another aspect, the present application provides an apparatus for training an intelligent model, the apparatus including: an acquisition module configured to acquire a first network set and a second network set from a first neural network, the first network set including the first neural network and m-1 second neural networks, m being an integer greater than 1, the second network set including m second neural networks, the m-1 second neural networks and the m second neural networks being different sub-networks of the first neural network, and the network scale of each neural network in the first network set being greater than the network scale of each neural network in the second network set; a first training module configured to train each neural network in the first network set based on a first sample set and a second sample set to obtain a first model set, the first sample set including a plurality of first training samples, each first training sample including first training data and a label, the label indicating a target in the first training data, the second sample set including a plurality of second training samples, each second training sample including second training data, and the first model set including a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks; and a second training module configured to train each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set, the second model set including m second intelligent models obtained by training the m second neural networks.
Optionally, the first training module is configured to: for any neural network to be trained in the first network set, train, based on the first sample set, the parameters of the convolutional layer and the parameters of the fully connected layer included in the neural network to be trained, the value of the parameters of the convolutional layer after training being a first parameter value and the value of the parameters of the fully connected layer after training being a second parameter value; fix the value of the parameters of the convolutional layer to the first parameter value and train, based on the first sample set and the second sample set, the parameters of the fully connected layer included in the neural network to be trained, the value of the parameters of the fully connected layer changing from the second parameter value to a third parameter value after training; and fix the value of the parameters of the fully connected layer to the third parameter value and train, based on the first sample set and the second sample set, the parameters of the convolutional layer of the neural network to be trained to obtain the intelligent model corresponding to the neural network to be trained.
Optionally, the first training module is configured to: input each first training sample in the first sample set to the neural network to be trained to acquire the first output result corresponding to each first training sample output by the neural network to be trained, and input each second training sample in the second sample set to the neural network to be trained to acquire the first output result corresponding to each second training sample; acquire a first loss function value through a first loss function based on the first output result corresponding to each first training sample and the label included in each first training sample, and acquire a second loss function value through a distance loss function based on the first output result corresponding to each second training sample, the first loss function being a loss function other than the distance loss function; and when a fully-connected-layer loss function value is not the minimum, adjust the value of the parameters of the fully connected layer based on the fully-connected-layer loss function value, the fully-connected-layer loss function value including the first loss function value and the second loss function value.
Optionally, the first training module is configured to: input each first training sample in the first sample set to the neural network to be trained to acquire the second output result corresponding to each first training sample output by the neural network to be trained, and input each second training sample in the second sample set to the neural network to be trained to acquire the second output result corresponding to each second training sample; acquire a third loss function value through the first loss function based on the second output result corresponding to each first training sample and the label included in that first training sample, and acquire a fourth loss function value through the distance loss function based on the second output result corresponding to each second training sample; and when a convolutional-layer loss function value is not the minimum, adjust the value of the parameters of the convolutional layer based on the convolutional-layer loss function value, the convolutional-layer loss function value including the third loss function value and the fourth loss function value.
Optionally, the second training module is configured to: input each second training sample in the second sample set to each intelligent model in the first model set to acquire a plurality of third output results corresponding to each second training sample output by the intelligent models; acquire the label corresponding to each second training sample based on the plurality of third output results corresponding to that second training sample; and train each neural network in the second network set based on the first sample set, each second training sample and the label corresponding to each second training sample to obtain the second model set.
Optionally, the apparatus further includes: a determining module configured to, for each first device among n first devices, determine, based on the resource information of the first device, a third model set corresponding to the first device, the third model set corresponding to the first device including at least one second intelligent model, the resources of the first device satisfying the resources required by each second intelligent model in the at least one second intelligent model, and n being an integer greater than 1; and a selection module configured to select, based on the first intelligent model, a second intelligent model for the first device from the third model set corresponding to the first device.
Optionally, the selection module is configured to: acquire, based on the second sample set, difference information between the first intelligent model and each second intelligent model in the third model set corresponding to the first device; and select one second intelligent model from the at least one second intelligent model based on the difference information between the first intelligent model and each second intelligent model.
Optionally, the selection module is configured to: input each second training sample in the second sample set to the first intelligent model to acquire the fourth output result corresponding to each second training sample output by the first intelligent model; input each second training sample in the second sample set to a target intelligent model to acquire the fifth output result corresponding to each second training sample output by the target intelligent model, the target intelligent model being one second intelligent model in the third model set corresponding to the first device; and acquire the difference information between the first intelligent model and the target intelligent model based on the fourth output result and the fifth output result corresponding to each second training sample.
In yet another aspect, the present application provides an electronic device including a processor and a memory; the memory is configured to store machine-executable instructions, and the processor is configured to read and execute the machine-executable instructions stored in the memory to implement the method for training an intelligent model described above.
In still another aspect, the present application provides a machine-readable storage medium storing machine-executable instructions that, when executed by a processor, implement the method for training an intelligent model described above.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects:
The first neural network is divided into a first network set and a second network set, the first network set including the first neural network and m-1 second neural networks, and the second network set including m second neural networks. The first training samples in the first sample set are labeled training samples, and the second training samples in the second sample set are unlabeled training samples. Since the scale of every neural network in the first network set is greater than that of every neural network in the second network set, even though the second training samples in the second sample set are unlabeled, the first sample set and the second sample set can be used to train each neural network in the first network set, obtaining an intelligent model corresponding to each neural network in the first network set, that is, the first model set. The first model set, the first sample set and the second sample set are then used to train each neural network in the second network set, whose network scale is smaller, so that an intelligent model corresponding to each neural network in the second network set can be trained. In this way, even when the training samples include both labeled and unlabeled samples, intelligent models with superior performance are trained.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.
FIG. 1 is a schematic diagram of a neural network structure provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of another neural network structure provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a network architecture provided by an embodiment of the present application;
FIG. 4 is a flowchart of a method for training an intelligent model provided by an embodiment of the present application;
FIG. 5 is a flowchart of another method for training an intelligent model provided by an embodiment of the present application;
FIG. 6 is a flowchart of another method for training an intelligent model provided by an embodiment of the present application;
FIG. 7 is a flowchart of another method for training an intelligent model provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an apparatus for training an intelligent model provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
The above drawings show explicit embodiments of the present application, which are described in more detail below. These drawings and textual descriptions are not intended to limit the scope of the concept of the present application in any way, but to explain the concept of the present application to those skilled in the art with reference to specific embodiments.
Detailed Description
Exemplary embodiments are described in detail here, examples of which are shown in the accompanying drawings. Where the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
The first neural network is a large-scale neural network and may be called a super network. The first neural network includes a large number of sub-networks.
The first neural network includes a large number of neurons. By partitioning the neurons of the first neural network according to different depths, different widths and different convolution kernel sizes, a large number of sub-networks can be carved out of the first neural network.
Referring to FIG. 1, the neurons of the first neural network are partitioned by depth, dividing the first neural network into a plurality of different sub-networks. Alternatively, referring to FIG. 2, the neurons of the super network are partitioned by width, dividing the first neural network into a plurality of different sub-networks.
Each sub-network of the first neural network is itself a neural network; training a neural network with training samples yields an intelligent model with the desired function. For example, in order to recognize license plates, a neural network can be trained with training samples to obtain a license plate recognition model with a license plate recognition function.
Any such neural network includes a convolutional layer and a fully connected layer. The convolutional layer extracts features from training samples, and the fully connected layer processes the target in a training sample based on the extracted features, for example classifying the target in the training sample.
Optionally, the convolutional layer includes a feature extractor and the fully connected layer includes a plurality of classifiers. The feature extractor belongs to the convolutional layer of the neural network and extracts features from training samples. Each classifier classifies the target in the training sample based on the features extracted by the feature extractor.
Training samples fall into two categories: labeled and unlabeled. A labeled training sample includes training data and a label indicating the target in the training data. For example, for a training sample used to train a license plate recognition model, the training data is a picture including a license plate and the label indicates the position of the license plate in the picture; this is a labeled training sample. An unlabeled training sample includes training data only, for example a picture that includes a license plate image but carries no label.
The license plate recognition model is used above as an example, but the intelligent model may also be an intelligent model for semantic segmentation, an intelligent model for object detection, an intelligent model for keypoint detection and/or an intelligent model for video understanding, among others. All of these intelligent models can be obtained by training neural networks with training samples.
Whatever the intelligent model, the classifiers of the fully connected layer classify the training data in the training samples. In the license plate recognition model above, the classifiers classify each image in a training sample, and when an image in the training sample is classified as a license plate image, its position is taken as the license plate position. As another example, for an intelligent model for semantic segmentation, the classifiers classify the semantics of the data content in the training sample, separating data content with different semantics to achieve semantic segmentation.
Embodiments of the present application provide a first sample set and a second sample set. The first sample set includes a plurality of first training samples, each including first training data and a label indicating the target included in the first training data; the second sample set includes a plurality of second training samples, each including second training data. Every first training sample in the first sample set is a labeled training sample, and every second training sample in the second sample set is an unlabeled training sample.
Based on the first sample set and the second sample set, the first neural network and the sub-networks it includes are trained to obtain the intelligent model corresponding to the first neural network and the intelligent models corresponding to the sub-networks. Since the sub-networks of the first neural network differ in scale, the trained intelligent model corresponding to the first neural network and the intelligent models corresponding to the sub-networks differ in scale accordingly, so intelligent models of different scales can be deployed on devices with different hardware performance.
The first training samples in the first sample set and the second training samples in the second sample set may come from different environments. Taking the license plate recognition model above as an example, each first training sample may be a sample captured from various license plates during the day, while each second training sample may be a sample captured from various license plates at night. A license plate recognition model trained on the first sample set and the second sample set can then recognize license plates both during the day and at night.
Referring to FIG. 3, an embodiment of the present application provides a network architecture that includes a central device and a plurality of edge intelligent devices, each edge intelligent device communicating with the central device.
Optionally, each edge intelligent device and the central device access a communication network, and each edge intelligent device establishes a network connection with the central device in that communication network so as to communicate with the central device. Of course, there are other ways for each edge intelligent device to communicate with the central device, which are not detailed here.
The computing capability and storage capability of each edge intelligent device are far smaller than those of the central device, and the computing and storage capabilities of the edge intelligent devices may differ from one another.
The central device holds the first sample set; each first training sample in the first sample set includes first training data and a label indicating the target included in the first training data.
Optionally, the label included in each first training sample may be obtained by manually annotating each first training sample.
Each edge intelligent device is deployed in a different area, collects data in its area, takes the collected data as second training samples, and sends the second training samples to the central device. Since the number of edge intelligent devices is often large and each collects a large number of second training samples, the central device receives a large number of second training samples. Continuing to annotate the second training samples manually would be too costly, so the central device directly takes the data collected by each edge intelligent device as second training samples and forms the second sample set from them. The central device then trains a super network based on the first sample set and the second sample set, turning the super network into multiple intelligent models of different scales, each requiring different computing and storage capabilities of the device that runs it. The central device installs different intelligent models on different edge intelligent devices.
Each edge intelligent device is deployed in a different scene. Since each edge intelligent device is affected by its environment, the scenes of the second training samples it collects differ from the scenes of the first training samples held by the central device.
Optionally, the central device is a central server, for example a cloud computing server or a cloud computing server cluster, and the edge intelligent devices are smart cameras, computers, microcontrollers, or the like.
Optionally, the network architecture is a central-computing/edge-device system architecture, for example a cloud-center-computing/edge-device system architecture: the central device is a server or server cluster with large-scale computing and storage capability (for example, a server cluster composed of 200 graphics cards, 100 CPUs, 100 TB of storage disks and other equipment), and the edge intelligent devices are edge devices with smaller computing and storage capability, such as servers smaller than the server cluster (e.g., computers without graphics cards or with few graphics cards; microcontrollers), cameras with some capability to run intelligent models, or smart glasses with even less computing power.
The network architecture shown in FIG. 3 can be applied to an intelligent wildlife protection and monitoring system. The system consists of a cloud computing center (the central device) and multiple edge intelligent devices.
The cloud computing center is a server cluster whose hardware has strong computing and storage capability and can use pre-collected labeled wildlife data (first training samples). The cloud computing center can train a super network with the labeled first training samples to obtain a super network model. The edge intelligent devices include smart cameras installed in different wildlife protection areas for collecting and processing image data to recognize and monitor wildlife that may appear. Smart cameras are devices capable of running intelligent models; they come in different models, so their computing and storage capabilities vary, but compared with the cloud computing center these capabilities are very limited, and the cameras may be unable to directly run the super network model trained by the cloud computing center to complete the task.
The amount of data collected by the smart cameras is huge, and manually annotating wildlife targets in the images would be expensive; moreover, different smart cameras are affected by their environments, so the data they collect and process come from scenes different from those of the first training samples in the cloud computing center.
Different smart cameras need to send the cloud computing center their own computing and storage condition information, together with a series of collected unlabeled images (the second training samples). The cloud computing center then trains the super network with the training method provided by the present application based on these unlabeled images and its own first training samples, producing intelligent models that match each smart camera's computing capability, storage capability and scene conditions. In this way, each camera can carry an intelligent model that meets its conditions (its own computing capability, storage capability and scene conditions) and then process the collected images to perform intelligent wildlife monitoring tasks.
In the above process, without the training method of the present application, different models might have to be trained for different smart cameras to meet their respective computing-capability requirements and data scenes, and adding a new camera to the system would require retraining, making the system less sustainable and flexible. The training method of the present application trains only one super network model; by collecting the computing-capability and storage-capability information of different smart cameras and the unlabeled data they collect (the second training samples), customized intelligent models can be obtained. This removes the cost of separately training different models for different smart cameras, as well as the cost of manually annotating the data collected by different smart cameras in different scenes.
The intelligent wildlife protection and monitoring system described above can also be replaced by other tasks, for example monitoring tasks in hazardous environments, such as a person/object intrusion warning system in the area of an ultra-high-voltage substation or a system for recognizing operations that violate safety regulations on a large-scale production line; or a smart transportation system composed of the road monitoring devices of a smart-city public service system, a smart traffic command center and electronic cameras on different roads.
Referring to FIG. 4, an embodiment of the present application provides a method for training an intelligent model. The method is applied to the network architecture shown in FIG. 3 and executed by the central device in that architecture. The method trains a first neural network to obtain a first intelligent model corresponding to the first neural network and second intelligent models corresponding to a plurality of sub-networks of the first neural network, and includes:
Step 301: Acquire a first network set and a second network set from the first neural network, the first network set including the first neural network and m-1 second neural networks, m being an integer greater than 1, the second network set including m second neural networks, the m-1 second neural networks and the m second neural networks being different sub-networks of the first neural network, and the network scale of each neural network in the first network set being greater than that of each neural network in the second network set.
Step 302: Train each neural network in the first network set based on a first sample set and a second sample set to obtain a first model set. The first sample set includes a plurality of first training samples, each including first training data and a label indicating the target in the first training data; the second sample set includes a plurality of second training samples, each including second training data; and the first model set includes a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks.
Optionally, the central device receives the data collected by each edge intelligent device and takes the data collected by each edge intelligent device as second training samples; for example, the data collected by each edge intelligent device are pictures.
Step 303: Train each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set, the second model set including m second intelligent models obtained by training the m second neural networks.
In this embodiment of the present application, the first neural network is divided into a first network set and a second network set, the first network set including the first neural network and m-1 second neural networks and the second network set including m second neural networks. The first training samples in the first sample set are labeled training samples, and the second training samples in the second sample set are unlabeled training samples. Since the scale of every neural network in the first network set is greater than that of every neural network in the second network set, even though the second training samples are unlabeled, the first sample set and the second sample set can be used to train each neural network in the first network set, obtaining the intelligent model corresponding to each of them, that is, the first model set. The first model set, the first sample set and the second sample set are then used to train each neural network in the second network set, whose network scale is smaller, so that the intelligent model corresponding to each neural network in the second network set can be trained. In this way, even when the training samples include both labeled and unlabeled samples, intelligent models with superior performance are trained.
The unlabeled training samples are collected by different edge intelligent devices, each deployed in a different area and therefore in a different scene. These unlabeled training samples form the second sample set, and this embodiment of the present application trains intelligent models of different scales based on the first sample set and the second sample set, so that each intelligent model can satisfy different scene conditions. Since intelligent models of different scales place different computing and storage requirements on edge intelligent devices, intelligent models of different scales can be deployed on different edge intelligent devices.
After the first model set is trained, for each second training sample in the second sample set, the m intelligent models in the first model set classify the training data in the second training sample, yielding m processing results corresponding to each second training sample; the label corresponding to each second training sample is acquired based on its m processing results. Based on each first training sample and the label corresponding to each second training sample, each neural network in the second network set can then be trained, so that each intelligent model in the second model set has superior performance.
The steps of the method 300 shown in FIG. 4 are described below.
For step 301: a first network set and a second network set are acquired from the first neural network; the first network set includes the first neural network and m-1 second neural networks, the second network set includes m second neural networks, the m-1 second neural networks and the m second neural networks are different sub-networks of the first neural network, m is an integer greater than 1, and the network scale of each neural network in the first network set is greater than that of each neural network in the second network set.
In step 301, 2m neural networks are randomly selected in the super network; the 2m neural networks include the first neural network and 2m-1 sub-networks, i.e., 2m-1 second neural networks, the first neural network having the largest scale among the 2m neural networks. The m largest neural networks are selected from the 2m neural networks; these include the first neural network and the m-1 largest second neural networks, which together form the first network set, while the remaining unselected m second neural networks form the second network set.
Since the number of sub-networks included in the first neural network is greater than or equal to 2m-1, after steps 301-303 are executed, another 2m-1 second neural networks can be randomly selected from the first neural network, and the operations of 301-303 can be performed again with the first neural network and these 2m-1 second neural networks, until every sub-network of the first neural network has been trained and the intelligent model corresponding to every sub-network has been obtained.
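The selection in step 301 can be sketched as follows in Python (an assumption); `sample_subnetwork` (drawing one random sub-network from the super network) and `scale` (returning a network's size) are illustrative placeholders.

```python
# Sketch of step 301: draw 2m networks (the super network plus 2m-1 random
# sub-networks), put the m largest into the first network set and the
# remaining m second neural networks into the second network set.
def build_network_sets(supernet, m, sample_subnetwork, scale):
    subnets = [sample_subnetwork(supernet) for _ in range(2 * m - 1)]
    candidates = [supernet] + subnets        # 2m networks, supernet is largest
    candidates.sort(key=scale, reverse=True)
    first_set = candidates[:m]               # supernet + m-1 largest subnets
    second_set = candidates[m:]              # remaining m second networks
    return first_set, second_set
```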
For step 302, referring to FIG. 5, step 302 is implemented through the following operations 3021 to 3023:
Step 3021: Based on the first sample set, train the parameters of the convolutional layer and the parameters of the fully connected layer included in a neural network to be trained, the neural network to be trained being any neural network in the first network set; the value of the parameters of the convolutional layer after training is a first parameter value, and the value of the parameters of the fully connected layer after training is a second parameter value.
Training the parameters of the convolutional layer essentially means training the parameters of the feature extractor included in the convolutional layer, and training the parameters of the fully connected layer essentially means training the parameters of each classifier included in the fully connected layer.
Step 3021 is implemented through the following operations 11 to 14:
11: Input each first training sample in the first sample set to the neural network to be trained, so that the neural network to be trained classifies the target included in the first training data of each first training sample, obtaining the sixth output result corresponding to each first training sample.
For any first training sample, the sixth output result corresponding to the first training sample indicates the target included in the first training data of the first training sample.
The neural network to be trained includes a convolutional layer and a fully connected layer. After the first training sample is input to the neural network to be trained, the convolutional layer extracts a feature from the first training data of the first training sample and feeds the feature to the fully connected layer; the fully connected layer classifies the target included in the first training data based on the feature and outputs a processing result, which is the sixth output result corresponding to the first training sample.
Each classifier included in the fully connected layer classifies the target included in the first training data based on the feature, so the sixth output result corresponding to the first training sample output by the neural network to be trained includes the processing result output by each classifier.
For example, the feature extractor of the convolutional layer extracts a feature from the first training data of the first training sample and feeds it to each classifier of the neural network to be trained; each classifier classifies the target included in the first training data based on the feature and outputs a processing result. The sixth output result corresponding to the first training sample includes the output result of each classifier.
12: Based on the sixth output result corresponding to each first training sample, acquire a fifth loss function value through the following first loss function.
The first loss function is of the form:

$$s_1=\sum_{i=1}^{z}L\left(F_d,C_{di}\right),\qquad L\left(F_d,C_{di}\right)=\mathbb{E}_{(x,y)\sim(x_s,y_s)}\left[-\sum_{k=1}^{K}1_{[k=y]}\log C_{di}\left(F_d(x)\right)\right]$$

In the first loss function, s_1 is the fifth loss function value, F_d is the feature extractor of the convolutional layer, C_di is the i-th classifier of the fully connected layer, (x_s, y_s) is the first sample set, (x, y) is a first training sample with x the first training data and y the label in the first training sample, L(F_d, C_di) is the cross-entropy loss function of the feature extractor F_d and the i-th classifier C_di, K is the total number of classes, y denotes the label, k indexes the sixth output result corresponding to the first training sample, 1_[k=y] takes the value 1 when k=y, C(F(x)) is the classifier function, and z is the number of classifiers included in the fully connected layer.
Optionally, the first loss function may be a cross-entropy loss function or the like.
13: When the fifth loss function value is not the minimum of the first loss function, adjust the values of the parameters of the convolutional layer and the values of the parameters of the fully connected layer of the neural network to be trained based on the fifth loss function value, and return to operation 11.
Adjusting the values of the parameters of the convolutional layer includes adjusting the values of the parameters of the feature extractor of the neural network to be trained, and adjusting the values of the parameters of the fully connected layer includes adjusting the values of the parameters of each classifier of the neural network to be trained.
It should be noted that since the neural network to be trained is a network within the first neural network, its feature extractor may also be the feature extractor of other neural networks; therefore, when the parameters of the feature extractor of the neural network to be trained are adjusted, the parameters of that feature extractor in the other neural networks are adjusted along with them. Likewise, a classifier of the neural network to be trained may also be a classifier of other neural networks, so when the parameters of a classifier of the neural network to be trained are adjusted, the parameters of that classifier in the other neural networks are adjusted along with them.
14: When the fifth loss function value is the minimum of the first loss function, end the operation. For ease of description, the value of the parameters of the convolutional layer of the neural network to be trained at this point is called the first parameter value, and the value of the parameters of the fully connected layer is called the second parameter value.
Step 3022: Fix the value of the parameters of the convolutional layer of the neural network to be trained to the first parameter value, and train, based on the first sample set and the second sample set, the parameters of the fully connected layer included in the neural network to be trained; the value of the parameters of the fully connected layer changes from the second parameter value to a third parameter value after training.
In step 3022, fixing the value of the parameters of the convolutional layer to the first parameter value includes fixing the value of the parameters of the feature extractor to the first parameter value. Based on the first sample set and the second sample set, the parameters of each classifier included in the neural network to be trained are trained, and the value of the parameters of each classifier changes from the second parameter value to a third parameter value after training.
The third parameter values of the classifiers may all be the same, or may not all be the same.
In step 3022, the parameters of the fully connected layer included in the neural network to be trained are trained through the following operations 21 to 26:
21: Input each first training sample in the first sample set to the neural network to be trained, so that the neural network to be trained classifies the target included in the first training data of each first training sample, obtaining the first output result corresponding to each first training sample.
For any first training sample, the first output result corresponding to the first training sample indicates the target included in the first training data of the first training sample.
After the first training sample is input to the neural network to be trained, the convolutional layer (feature extractor) extracts a feature from the first training data of the first training sample and feeds it to the fully connected layer (each classifier in the fully connected layer); the fully connected layer (each classifier) classifies the target included in the first training data based on the feature and outputs a processing result. The first output result corresponding to the first training sample includes the output result of each classifier.
22: Input each second training sample in the second sample set to the neural network to be trained, so that the neural network to be trained processes the target included in the second training data of each second training sample, obtaining the first output result corresponding to each second training sample.
For any second training sample, the first output result corresponding to the second training sample indicates the target included in the second training data of the second training sample.
After the second training sample is input to the neural network to be trained, the convolutional layer (feature extractor) extracts a feature from the second training data of the second training sample and feeds it to the fully connected layer (each classifier in the fully connected layer); the fully connected layer (each classifier) classifies the target included in the second training data based on the feature and outputs a processing result. The first output result corresponding to the second training sample includes the output result of each classifier.
23: Based on the first output result corresponding to each first training sample and the label included in each first training sample, acquire a first loss function value according to the first loss function above.
In operation 23, s_1 still denotes the first loss function value.
24: Based on the first output result corresponding to each second training sample, acquire a second loss function value through the following distance loss function; the first loss function and the distance loss function are different.
The distance loss function, which measures the distance between the outputs of the classifiers of the fully connected layer on the second training samples, is of the form:

$$s_2=\mathbb{E}_{x\sim\{x_t\}}\left[\sum_{i=1}^{z}\sum_{j=i+1}^{z}\left\|C_{di}\left(F_d(x)\right)-C_{dj}\left(F_d(x)\right)\right\|_1\right]$$

In the distance loss function, s_2 is the second loss function value, {x_t} is the second sample set, and x is a second training sample.
25: When the fully-connected-layer loss function value is not the minimum, adjust the values of the parameters of the fully connected layer (the classifiers included in the fully connected layer) of the neural network to be trained based on the fully-connected-layer loss function value, and return to operation 21. The fully-connected-layer loss function value includes the first loss function value and the second loss function value.
For example, the fully-connected-layer loss function value may be equal to s_1 - s_2, that is, the fully connected layer is trained to solve

$$\min_{C_{d1},\dots,C_{dz}}\left(s_1-s_2\right)$$

26: When the fully-connected-layer loss function value is the minimum, end the operation. For ease of description, the value of the parameters of the convolutional layer (feature extractor) of the neural network to be trained is called the first parameter value, and the value of the parameters of the fully connected layer (the classifiers in the fully connected layer) is called the third parameter value.
Step 3023: Fix the value of the parameters of the fully connected layer (the classifiers in the fully connected layer) of the neural network to be trained to the third parameter value, and train, based on the first sample set and the second sample set, the parameters of the convolutional layer (feature extractor) of the neural network to be trained to obtain the intelligent model corresponding to the neural network to be trained.
In step 3023, the parameters of the convolutional layer (feature extractor) included in the neural network to be trained are trained through the following operations 31 to 36:
31: Input each first training sample in the first sample set to the neural network to be trained, so that the neural network to be trained processes the target included in the first training data of each first training sample, obtaining the second output result corresponding to each first training sample.
For any first training sample, the second output result corresponding to the first training sample indicates the target included in the first training data of the first training sample.
After the first training sample is input to the neural network to be trained, the convolutional layer (feature extractor) extracts a feature from the first training data of the first training sample and feeds it to the fully connected layer (each classifier in the fully connected layer); the fully connected layer (each classifier in the fully connected layer) recognizes, based on the feature, the target included in the first training data and outputs a recognition result. The second output result corresponding to the first training sample includes the output result of the fully connected layer (each classifier in the fully connected layer).
32: Input each second training sample in the second sample set to the neural network to be trained, so that the neural network to be trained processes the target included in the second training data of each second training sample, obtaining the second output result corresponding to each second training sample.
For any second training sample, the second output result corresponding to the second training sample indicates the target included in the second training data of the second training sample.
After the second training sample is input to the neural network to be trained, the convolutional layer (feature extractor) extracts a feature from the second training data of the second training sample and feeds it to the fully connected layer (each classifier in the fully connected layer); the fully connected layer (each classifier in the fully connected layer) recognizes, based on the feature, the target included in the second training data and outputs a recognition result. The second output result corresponding to the second training sample includes the output result of each classifier.
33: Based on the second output result corresponding to each first training sample and the label included in each first training sample, acquire a third loss function value according to the first loss function above.
In operation 33, s_1 still denotes the third loss function value.
34: Based on the second output result corresponding to each second training sample, acquire a fourth loss function value through the distance loss function above.
In operation 34, s_2 still denotes the fourth loss function value.
35: When the convolutional-layer loss function value is not the minimum, adjust the values of the parameters of the convolutional layer (feature extractor) of the neural network to be trained based on the convolutional-layer loss function value, and return to operation 31. The convolutional-layer loss function value includes the third loss function value and the fourth loss function value.
For example, the convolutional-layer loss function value may be equal to s_1 + s_2, that is, the feature extractor is trained to solve

$$\min_{F_d}\left(s_1+s_2\right)$$

36: When the convolutional-layer loss function value is the minimum, end the operation and take the neural network to be trained at this point as the intelligent model.
For each neural network in the first network set (the first neural network and the m-1 second neural networks), the flow of steps 3021-3023 above is executed, yielding m intelligent models in total: the first intelligent model obtained by training the first neural network, and the m-1 second intelligent models obtained by respectively training the m-1 second neural networks.
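A minimal sketch of the three training phases of steps 3021-3023 follows, assuming PyTorch (the application names no framework). Each phase is shown as one optimization step rather than a full loop to the loss minimum, and the pairwise-L1 form of the distance loss reconstructed above is used; `extractor` stands for the feature extractor F_d and `classifiers` for the z classifiers of the fully connected layer.

```python
import torch
import torch.nn.functional as F

def discrepancy(outputs):
    # Pairwise L1 distance between classifier outputs: one reconstruction of
    # the distance loss s2 (the exact form is not spelled out in the text).
    s2 = 0.0
    for i in range(len(outputs)):
        for j in range(i + 1, len(outputs)):
            s2 = s2 + (outputs[i].softmax(dim=1)
                       - outputs[j].softmax(dim=1)).abs().mean()
    return s2

def train_step(extractor, classifiers, opt_f, opt_c, x_l, y_l, x_u):
    # Phase 3021: minimize s1 on labeled data over F_d and all classifiers.
    opt_f.zero_grad(); opt_c.zero_grad()
    feat = extractor(x_l)
    s1 = sum(F.cross_entropy(c(feat), y_l) for c in classifiers)
    s1.backward(); opt_f.step(); opt_c.step()

    # Phase 3022: freeze F_d (features computed without gradients) and train
    # the classifiers to minimize s1 - s2.
    opt_c.zero_grad()
    with torch.no_grad():
        feat_l, feat_u = extractor(x_l), extractor(x_u)
    s1 = sum(F.cross_entropy(c(feat_l), y_l) for c in classifiers)
    s2 = discrepancy([c(feat_u) for c in classifiers])
    (s1 - s2).backward(); opt_c.step()

    # Phase 3023: keep the classifiers fixed (only the extractor's optimizer
    # steps) and train F_d to minimize s1 + s2.
    opt_f.zero_grad()
    feat_l, feat_u = extractor(x_l), extractor(x_u)
    s1 = sum(F.cross_entropy(c(feat_l), y_l) for c in classifiers)
    s2 = discrepancy([c(feat_u) for c in classifiers])
    (s1 + s2).backward(); opt_f.step()
```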
For step 303, referring to FIG. 6, step 303 is implemented through the following operations 3031 to 3033:
Step 3031: Input each second training sample in the second sample set to each intelligent model in the first model set, and acquire the plurality of third output results corresponding to each second training sample output by the intelligent models.
The first model set includes m intelligent models. For each second training sample in the second sample set, the second training sample is input into each of the m intelligent models; each intelligent model processes the target in the second training data included in the second training sample and outputs a processing result. The m processing results output by the m intelligent models are taken as the m third output results corresponding to the second training sample. Performing this operation for every second training sample in the second sample set yields the m third output results corresponding to each second training sample.
Step 3032: Based on the plurality of third output results corresponding to each second training sample, acquire the label corresponding to each second training sample.
For any second training sample, compute the average of its m third output results, or select the maximum or minimum value among them, and use that average, maximum or minimum as the label corresponding to the second training sample. The label corresponding to each second training sample in the second sample set is acquired in this way.
Step 3033: Based on the first sample set, each second training sample and the label corresponding to each second training sample, train each neural network in the second network set to obtain the second model set.
For any second neural network in the second network set, the second neural network can be trained through the following operations 41-44:
41: Input each training sample into the second neural network, so that the second neural network processes the target included in the training data of each training sample, obtaining the seventh output result corresponding to each training sample; the training samples include every first training sample in the first sample set and every second training sample in the second sample set.
For each training sample, the seventh output result corresponding to the training sample indicates the target included in the training data of the training sample.
After the training sample is input to the second neural network, the convolutional layer (feature extractor) of the second neural network extracts a feature from the training data of the training sample and feeds it to the fully connected layer (each classifier in the fully connected layer); the fully connected layer (each classifier in the fully connected layer) processes the target included in the training data based on the feature and outputs a processing result. The seventh output result corresponding to the training sample includes the output result of the fully connected layer.
42: Based on the seventh output result corresponding to each training sample and the label corresponding to each training sample, acquire a loss function value according to the first loss function above.
43: When this loss function value is not the minimum of the first loss function, adjust the values of the parameters of the second neural network based on the loss function value, and return to operation 41.
44: When this loss function value is the minimum of the first loss function, end the operation and take the second neural network at this point as a second intelligent model.
Each second neural network in the second network set can be trained through the above operations 41-44 to obtain the second intelligent model corresponding to it, yielding m second intelligent models in total.
Through the flow of steps 301-303 above, the first intelligent model corresponding to the first neural network and the second intelligent model corresponding to each sub-network of the first neural network are trained. The second intelligent models differ in scale, and second intelligent models of different scales suit devices with different hardware performance. Suppose there are n devices with different hardware performance, n being an integer greater than 1; since there are many second intelligent models, one second intelligent model can be selected for each device from among them. Referring to FIG. 7, an embodiment of the present application provides the following method for selecting an intelligent model, including:
Step 601: Determine n third model sets based on the resource information of the n devices, the n devices corresponding one-to-one to the n third model sets.
The n devices include a first device; the third model set corresponding to the first device includes at least one second intelligent model, and the resources of the first device satisfy the resources required by each second intelligent model in the at least one second intelligent model.
Optionally, the resource information of the first device includes the free memory size of the first device and/or the number of CPU processing cores, etc., and is used to indicate the resources on the device.
In step 601, based on the resource information of the first device and the scale of each trained second intelligent model, at least one second intelligent model that can run on the first device is determined (for example, the resources of the first device need to satisfy the resources required by each determined second intelligent model), and a model set including the at least one second intelligent model is determined as the third model set corresponding to the first device. The same operations are performed on the other n-1 devices to obtain the third model set corresponding to each of the n-1 devices.
Step 602: Based on the first intelligent model, select a second intelligent model for the first device from the third model set corresponding to the first device.
In step 602, each second training sample in the second sample set is input to the first intelligent model to acquire the fourth output result corresponding to each second training sample output by the first intelligent model, and each second training sample is input to a target intelligent model to acquire the fifth output result corresponding to each second training sample output by the target intelligent model, the target intelligent model being one second intelligent model in the third model set corresponding to the first device. Based on the fourth output result and the fifth output result of each second training sample, the difference information between the first intelligent model and the target intelligent model is acquired. In the same way, the difference information between the first intelligent model and every second intelligent model in the third model set corresponding to the first device is acquired, and based on this difference information, one second intelligent model is selected from the at least one second intelligent model.
Optionally, the difference between the fourth output result and the fifth output result of each second training sample is calculated, that is, the difference corresponding to each second training sample; based on the differences corresponding to the second training samples, an average difference is calculated and used as the difference information between the first intelligent model and the target intelligent model.
Referring to FIG. 8, an embodiment of the present application provides an apparatus 700 for training an intelligent model. The apparatus 700 includes:
An acquisition module 701 configured to acquire a first network set and a second network set from a first neural network, the first network set including the first neural network and m-1 second neural networks, m being an integer greater than 1, the second network set including m second neural networks, the m-1 second neural networks and the m second neural networks being different sub-networks of the first neural network, and the network scale of each neural network in the first network set being greater than the network scale of each neural network in the second network set;
A first training module 702 configured to train each neural network in the first network set based on a first sample set and a second sample set to obtain a first model set, the first sample set including a plurality of first training samples, each first training sample including first training data and a label, the label indicating a target in the first training data, the second sample set including a plurality of second training samples, each second training sample including second training data, and the first model set including a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks;
A second training module 703 configured to train each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set, the second model set including m second intelligent models obtained by training the m second neural networks.
Optionally, the first training module 702 is configured to: for any neural network to be trained in the first network set, train, based on the first sample set, the parameters of the convolutional layer and the parameters of the fully connected layer included in the neural network to be trained, the value of the parameters of the convolutional layer after training being a first parameter value and the value of the parameters of the fully connected layer after training being a second parameter value; fix the value of the parameters of the convolutional layer to the first parameter value and train, based on the first sample set and the second sample set, the parameters of the fully connected layer included in the neural network to be trained, the value of the parameters of the fully connected layer changing from the second parameter value to a third parameter value after training; and fix the value of the parameters of the fully connected layer to the third parameter value and train, based on the first sample set and the second sample set, the parameters of the convolutional layer of the neural network to be trained to obtain the intelligent model corresponding to the neural network to be trained.
Optionally, the first training module 702 is configured to: input each first training sample in the first sample set to the neural network to be trained to acquire the first output result corresponding to each first training sample output by the neural network to be trained, and input each second training sample in the second sample set to the neural network to be trained to acquire the first output result corresponding to each second training sample; acquire a first loss function value through a first loss function based on the first output result corresponding to each first training sample and the label included in each first training sample, and acquire a second loss function value through a distance loss function based on the first output result corresponding to each second training sample, the first loss function being a loss function other than the distance loss function; and when the fully-connected-layer loss function value is not the minimum, adjust the value of the parameters of the fully connected layer based on the fully-connected-layer loss function value, the fully-connected-layer loss function value including the first loss function value and the second loss function value.
Optionally, the first training module 702 is configured to: input each first training sample in the first sample set to the neural network to be trained to acquire the second output result corresponding to each first training sample output by the neural network to be trained, and input each second training sample in the second sample set to the neural network to be trained to acquire the second output result corresponding to each second training sample; acquire a third loss function value through the first loss function based on the second output result corresponding to each first training sample and the label included in that first training sample, and acquire a fourth loss function value through the distance loss function based on the second output result corresponding to each second training sample; and when the convolutional-layer loss function value is not the minimum, adjust the value of the parameters of the convolutional layer based on the convolutional-layer loss function value, the convolutional-layer loss function value including the third loss function value and the fourth loss function value.
Optionally, the second training module 703 is configured to: input each second training sample in the second sample set to each intelligent model in the first model set to acquire the plurality of third output results corresponding to each second training sample output by the intelligent models; acquire the label corresponding to each second training sample based on the plurality of third output results corresponding to that second training sample; and train each neural network in the second network set based on the first sample set, each second training sample and the label corresponding to each second training sample to obtain the second model set.
Optionally, the apparatus 700 further includes: a determining module configured to, for each first device among n first devices, determine, based on the resource information of the first device, the third model set corresponding to the first device, the third model set corresponding to the first device including at least one second intelligent model, the resources of the first device satisfying the resources required by each second intelligent model in the at least one second intelligent model, and n being an integer greater than 1; and a selection module configured to select, based on the first intelligent model, a second intelligent model for the first device from the third model set corresponding to the first device.
Optionally, the selection module is configured to: acquire, based on the second sample set, difference information between the first intelligent model and each second intelligent model in the third model set corresponding to the first device; and select one second intelligent model from the at least one second intelligent model based on the difference information between the first intelligent model and each second intelligent model.
Optionally, the selection module is configured to: input each second training sample in the second sample set to the first intelligent model to acquire the fourth output result corresponding to each second training sample output by the first intelligent model; input each second training sample in the second sample set to a target intelligent model to acquire the fifth output result corresponding to each second training sample output by the target intelligent model, the target intelligent model being one second intelligent model in the third model set corresponding to the first device; and acquire the difference information between the first intelligent model and the target intelligent model based on the fourth output result and the fifth output result corresponding to each second training sample.
In this embodiment of the present application, since the scale of every neural network in the first network set is greater than that of every neural network in the second network set, the first sample set and the second sample set can be used to train each neural network in the first network set, obtaining the intelligent model corresponding to each neural network in the first network set, that is, the first model set. The first model set, the first sample set and the second sample set are then used to train each neural network in the second network set, whose network scale is smaller, so that the intelligent model corresponding to each neural network in the second network set can be trained. In this way, even when the training samples include both labeled and unlabeled samples, intelligent models with superior performance are trained.
FIG. 9 shows a structural block diagram of an electronic device 800 provided by an exemplary embodiment of the present application. The electronic device 800 may be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer or a desktop computer. The electronic device 800 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
Generally, the electronic device 800 includes a processor 801 and a memory 802.
The processor 801 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 801 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor; the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 801 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 802 may include one or more computer-readable storage media, which may be non-transitory. The memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 802 is used to store at least one instruction, which is to be executed by the processor 801 to implement the method for training an intelligent model provided by the method embodiments of the present application.
In some embodiments, the electronic device 800 may optionally further include a peripheral device interface 803 and at least one peripheral device. The processor 801, the memory 802 and the peripheral device interface 803 may be connected through buses or signal lines. Each peripheral device may be connected to the peripheral device interface 803 through a bus, a signal line or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 804, a display screen 805, a camera assembly 806, an audio circuit 807, a positioning component 808 and a power supply 809.
The peripheral device interface 803 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 801 and the memory 802. In some embodiments, the processor 801, the memory 802 and the peripheral device interface 803 are integrated on the same circuit board; in some other embodiments, any one or two of the processor 801, the memory 802 and the peripheral device interface 803 may be implemented on a separate circuit board, which is not limited in this embodiment.
The radio frequency circuit 804 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 804 communicates with communication networks and other communication devices through electromagnetic signals, converting electrical signals into electromagnetic signals for transmission or converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec, a subscriber identity module card, and the like. The radio frequency circuit 804 can communicate with other terminals through at least one wireless communication protocol, including but not limited to the World Wide Web, metropolitan area networks, intranets, various generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may also include circuits related to NFC (Near Field Communication), which is not limited in the present application.
The display screen 805 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 805 is a touch display screen, it also has the ability to collect touch signals on or above its surface; such a touch signal may be input to the processor 801 as a control signal for processing, and the display screen 805 may then also provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 805 disposed on the front panel of the electronic device 800; in other embodiments, there may be at least two display screens 805, disposed on different surfaces of the electronic device 800 or in a folded design; in still other embodiments, the display screen 805 may be a flexible display screen disposed on a curved surface or a folding surface of the electronic device 800. The display screen 805 may even be set as a non-rectangular irregular figure, namely a special-shaped screen, and may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so as to realize the background blur function by fusing the main camera and the depth-of-field camera, panoramic and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 806 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
The audio circuit 807 may include a microphone and a speaker. The microphone collects sound waves of the user and the environment and converts the sound waves into electrical signals that are input to the processor 801 for processing or to the radio frequency circuit 804 for voice communication. For stereo collection or noise reduction, there may be multiple microphones disposed at different parts of the electronic device 800; the microphone may also be an array microphone or an omnidirectional collection microphone. The speaker converts electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves; it may be a conventional membrane speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 807 may also include a headphone jack.
The positioning component 808 is used to locate the current geographic position of the electronic device 800 to implement navigation or LBS (Location Based Service). The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of the European Union.
The power supply 809 is used to supply power to the components of the electronic device 800. The power supply 809 may be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 809 includes a rechargeable battery, the battery may be a wired rechargeable battery charged through a wired line or a wireless rechargeable battery charged through a wireless coil; the rechargeable battery may also support fast charging technology.
In some embodiments, the electronic device 800 further includes one or more sensors 810, including but not limited to an acceleration sensor 811, a gyroscope sensor 812, a pressure sensor 813, a fingerprint sensor 814, an optical sensor 815 and a proximity sensor 816.
The acceleration sensor 811 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the electronic device 800; for example, it can detect the components of gravitational acceleration on the three coordinate axes. The processor 801 can control the display screen 805 to display the user interface in a landscape or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 can also be used to collect motion data of a game or of the user.
The gyroscope sensor 812 can detect the body direction and rotation angle of the electronic device 800 and can cooperate with the acceleration sensor 811 to collect the user's 3D actions on the electronic device 800. Based on the data collected by the gyroscope sensor 812, the processor 801 can realize functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
The pressure sensor 813 may be disposed on a side frame of the electronic device 800 and/or a lower layer of the display screen 805. When disposed on the side frame, the pressure sensor 813 can detect the user's grip signal on the electronic device 800, and the processor 801 performs left/right hand recognition or shortcut operations according to the grip signal; when disposed on the lower layer of the display screen 805, the processor 801 controls the operable controls on the UI according to the user's pressure operations on the display screen 805. The operable controls include at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 814 is used to collect the user's fingerprint; the processor 801 identifies the user according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the user according to the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations such as unlocking the screen, viewing encrypted information, downloading software, making payments and changing settings. The fingerprint sensor 814 may be disposed on the front, back or side of the electronic device 800; when the electronic device 800 has a physical button or a manufacturer logo, the fingerprint sensor 814 may be integrated with the physical button or the manufacturer logo.
The optical sensor 815 is used to collect ambient light intensity. In one embodiment, the processor 801 can control the display brightness of the display screen 805 according to the ambient light intensity collected by the optical sensor 815: when the ambient light intensity is high, the display brightness of the display screen 805 is increased, and when it is low, the display brightness is decreased. In another embodiment, the processor 801 can also dynamically adjust the shooting parameters of the camera assembly 806 according to the ambient light intensity collected by the optical sensor 815.
The proximity sensor 816, also called a distance sensor, is usually disposed on the front panel of the electronic device 800 and is used to collect the distance between the user and the front of the electronic device 800. In one embodiment, when the proximity sensor 816 detects that this distance gradually decreases, the processor 801 controls the display screen 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that this distance gradually increases, the processor 801 controls the display screen 805 to switch from the screen-off state to the screen-on state.
Those skilled in the art can understand that the structure shown in FIG. 9 does not constitute a limitation on the electronic device 800, which may include more or fewer components than shown, combine some components, or adopt a different arrangement of components.
An embodiment of the present application also provides a machine-readable storage medium storing machine-executable instructions that, when executed by a processor, implement the method for training an intelligent model described above. For example, the machine-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those skilled in the art will readily conceive of other embodiments of the present application after considering the specification and practicing the application disclosed here. The present application is intended to cover any variations, uses or adaptations of the present application that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed by the present application. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present application are indicated by the following claims.
It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims (12)

  1. A method for training an intelligent model, characterized in that the method comprises:
    acquiring a first network set and a second network set from a first neural network, the first network set comprising the first neural network and m-1 second neural networks, m being an integer greater than 1, the second network set comprising m second neural networks, the m-1 second neural networks and the m second neural networks being different sub-networks of the first neural network, and the network scale of each neural network in the first network set being greater than the network scale of each neural network in the second network set;
    training each neural network in the first network set based on a first sample set and a second sample set to obtain a first model set, the first sample set comprising a plurality of first training samples, each first training sample comprising first training data and a label, the label indicating a target in the first training data, the second sample set comprising a plurality of second training samples, each second training sample comprising second training data, and the first model set comprising a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks;
    training each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set, the second model set comprising m second intelligent models obtained by training the m second neural networks.
  2. The method according to claim 1, characterized in that training each neural network in the first network set based on the first sample set and the second sample set to obtain the first model set comprises:
    for any neural network to be trained in the first network set, training, based on the first sample set, the parameters of the convolutional layer and the parameters of the fully connected layer included in the neural network to be trained, the value of the parameters of the convolutional layer after training being a first parameter value and the value of the parameters of the fully connected layer after training being a second parameter value;
    fixing the value of the parameters of the convolutional layer to the first parameter value, and training, based on the first sample set and the second sample set, the parameters of the fully connected layer included in the neural network to be trained, the value of the parameters of the fully connected layer changing from the second parameter value to a third parameter value after training;
    fixing the value of the parameters of the fully connected layer to the third parameter value, and training, based on the first sample set and the second sample set, the parameters of the convolutional layer of the neural network to be trained, to obtain the intelligent model corresponding to the neural network to be trained.
  3. The method according to claim 2, characterized in that training, based on the first sample set and the second sample set, the parameters of the fully connected layer included in the neural network to be trained comprises:
    inputting each first training sample in the first sample set to the neural network to be trained to acquire the first output result corresponding to each first training sample output by the neural network to be trained, and inputting each second training sample in the second sample set to the neural network to be trained to acquire the first output result corresponding to each second training sample;
    acquiring a first loss function value through a first loss function based on the first output result corresponding to each first training sample and the label included in each first training sample, and acquiring a second loss function value through a distance loss function based on the first output result corresponding to each second training sample, the first loss function being a loss function other than the distance loss function;
    when a fully-connected-layer loss function value is not the minimum, adjusting the value of the parameters of the fully connected layer based on the fully-connected-layer loss function value, the fully-connected-layer loss function value comprising the first loss function value and the second loss function value.
  4. The method according to claim 3, characterized in that training, based on the first sample set and the second sample set, the parameters of the convolutional layer of the neural network to be trained comprises:
    inputting each first training sample in the first sample set to the neural network to be trained to acquire the second output result corresponding to each first training sample output by the neural network to be trained, and inputting each second training sample in the second sample set to the neural network to be trained to acquire the second output result corresponding to each second training sample;
    acquiring a third loss function value through the first loss function based on the second output result corresponding to each first training sample and the label included in that first training sample, and acquiring a fourth loss function value through the distance loss function based on the second output result corresponding to each second training sample;
    when a convolutional-layer loss function value is not the minimum, adjusting the value of the parameters of the convolutional layer based on the convolutional-layer loss function value, the convolutional-layer loss function value comprising the third loss function value and the fourth loss function value.
  5. The method according to claim 1, characterized in that training each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain the second model set comprises:
    inputting each second training sample in the second sample set to each intelligent model in the first model set to acquire a plurality of third output results corresponding to each second training sample output by the intelligent models;
    acquiring the label corresponding to each second training sample based on the plurality of third output results corresponding to that second training sample;
    training each neural network in the second network set based on the first sample set, each second training sample and the label corresponding to each second training sample, to obtain the second model set.
  6. The method according to claim 1, characterized in that the method further comprises:
    for each first device among n first devices, determining, based on the resource information of the first device, a third model set corresponding to the first device, the third model set corresponding to the first device comprising at least one second intelligent model, the resources of the first device satisfying the resources required by each second intelligent model in the at least one second intelligent model, and n being an integer greater than 1;
    selecting, based on the first intelligent model, a second intelligent model for the first device from the third model set corresponding to the first device.
  7. The method according to claim 6, characterized in that selecting, based on the first intelligent model, a second intelligent model for the first device from the third model set corresponding to the first device comprises:
    acquiring, based on the second sample set, difference information between the first intelligent model and each second intelligent model in the third model set corresponding to the first device;
    selecting one second intelligent model from the at least one second intelligent model based on the difference information between the first intelligent model and each second intelligent model.
  8. The method according to claim 7, characterized in that acquiring, based on the second sample set, the difference information between the first intelligent model and each second intelligent model in the third model set corresponding to the first device comprises:
    inputting each second training sample in the second sample set to the first intelligent model to acquire the fourth output result corresponding to each second training sample output by the first intelligent model;
    inputting each second training sample in the second sample set to a target intelligent model to acquire the fifth output result corresponding to each second training sample output by the target intelligent model, the target intelligent model being one second intelligent model in the third model set corresponding to the first device;
    acquiring the difference information between the first intelligent model and the target intelligent model based on the fourth output result and the fifth output result corresponding to each second training sample.
  9. The method according to any one of claims 1-8, characterized in that the method is applied to a central-computing/edge-device system architecture, the system architecture comprising a central device and a plurality of edge intelligent devices, each edge intelligent device being deployed in a different area, the computing capability and storage capability of the central device being greater than the computing capability and storage capability of each edge intelligent device, the method being executed by the central device, and the second training samples in the second sample set being data collected by each edge intelligent device and received by the central device.
  10. An apparatus for training an intelligent model, characterized in that the apparatus comprises:
    an acquisition module configured to acquire a first network set and a second network set from a first neural network, the first network set comprising the first neural network and m-1 second neural networks, m being an integer greater than 1, the second network set comprising m second neural networks, the m-1 second neural networks and the m second neural networks being different sub-networks of the first neural network, and the network scale of each neural network in the first network set being greater than the network scale of each neural network in the second network set;
    a first training module configured to train each neural network in the first network set based on a first sample set and a second sample set to obtain a first model set, the first sample set comprising a plurality of first training samples, each first training sample comprising first training data and a label, the label indicating a target in the first training data, the second sample set comprising a plurality of second training samples, each second training sample comprising second training data, and the first model set comprising a first intelligent model obtained by training the first neural network and m-1 second intelligent models obtained by training the m-1 second neural networks;
    a second training module configured to train each neural network in the second network set based on the first sample set, the second sample set and the first model set to obtain a second model set, the second model set comprising m second intelligent models obtained by training the m second neural networks.
  11. An electronic device, characterized in that the electronic device comprises a processor and a memory;
    the memory is configured to store machine-executable instructions;
    the processor is configured to read and execute the machine-executable instructions stored in the memory, to implement the method according to any one of claims 1-9.
  12. A machine-readable storage medium, characterized in that machine-executable instructions are stored in the machine-readable storage medium, and when executed by a processor, the machine-executable instructions implement the method according to any one of claims 1-9.
PCT/CN2022/131038 2021-11-10 2022-11-10 Method, apparatus, electronic device and storage medium for training an intelligent model WO2023083240A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111329061.9A CN114118236A (zh) 2021-11-10 2021-11-10 Method and apparatus for training an intelligent model
CN202111329061.9 2021-11-10

Publications (1)

Publication Number Publication Date
WO2023083240A1 true WO2023083240A1 (zh) 2023-05-19

Family

ID=80378256

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131038 WO2023083240A1 (zh) 2021-11-10 2022-11-10 训练智能模型的方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN114118236A (zh)
WO (1) WO2023083240A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118236A (zh) * 2021-11-10 2022-03-01 Hangzhou Hikvision Digital Technology Co., Ltd. Method and apparatus for training an intelligent model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107851213A (zh) * 2015-07-22 2018-03-27 Qualcomm Incorporated Transfer learning in neural networks
JP2020027659A (ja) 2018-08-10 2020-02-20 NAVER Corporation Method for training a convolutional recurrent neural network, and method for semantic segmentation of an input video using a trained convolutional recurrent neural network
CN111492381A (zh) * 2017-12-13 2020-08-04 Advanced Micro Devices, Inc. Simultaneous training of functional subnetworks of a neural network
US20210089883A1 (en) * 2019-09-24 2021-03-25 Salesforce.Com, Inc. System and Method for Learning with Noisy Labels as Semi-Supervised Learning
CN113366496A (zh) * 2018-12-21 2021-09-07 Waymo LLC Neural networks for coarse- and fine-object classifications
CN114118236A (zh) * 2021-11-10 2022-03-01 Hangzhou Hikvision Digital Technology Co., Ltd. Method and apparatus for training an intelligent model


Also Published As

Publication number Publication date
CN114118236A (zh) 2022-03-01

Similar Documents

Publication Publication Date Title
CN111091132B (zh) Artificial-intelligence-based image recognition method and apparatus, computer device and medium
CN110807361B (zh) Human body recognition method and apparatus, computer device and storage medium
CN110222789B (zh) Image recognition method and storage medium
CN110490179B (zh) License plate recognition method and apparatus, and storage medium
CN110839128B (zh) Photographing behavior detection method and apparatus, and storage medium
CN111104980B (zh) Method, apparatus, device and storage medium for determining a classification result
CN110796248A (zh) Data augmentation method, apparatus, device and storage medium
CN111127509A (zh) Target tracking method, apparatus and computer-readable storage medium
CN111738365B (zh) Image classification model training method and apparatus, computer device and storage medium
CN110705614A (zh) Model training method and apparatus, electronic device and storage medium
CN110647881A (zh) Method, apparatus, device and storage medium for determining the card type corresponding to an image
CN110675473B (zh) Method and apparatus for generating a GIF animation, electronic device and medium
WO2023083240A1 (zh) Method, apparatus, electronic device and storage medium for training an intelligent model
CN113724189A (zh) Image processing method, apparatus, device and storage medium
CN112818979A (zh) Text recognition method, apparatus, device and storage medium
CN113343709B (zh) Intention recognition model training method, intention recognition method, apparatus and device
CN115937738A (zh) Video annotation model training method, apparatus, device and storage medium
CN113205069B (zh) Fake license plate detection method, apparatus and computer storage medium
CN113936240A (zh) Method, apparatus, device and storage medium for determining sample images
CN114283395A (zh) Lane line detection method, apparatus, device and computer-readable storage medium
CN111582184B (zh) Page detection method, apparatus, device and storage medium
CN112990424B (zh) Neural network model training method and apparatus
CN113920222A (zh) Method, apparatus, device and readable storage medium for acquiring mapping data
CN111488895B (zh) Adversarial data generation method, apparatus, device and storage medium
CN113569894A (zh) Image classification model training method, image classification method, apparatus and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892034

Country of ref document: EP

Kind code of ref document: A1