CN112836801A - Deep learning network determination method and device, electronic equipment and storage medium - Google Patents

Deep learning network determination method and device, electronic equipment and storage medium

Info

Publication number
CN112836801A
CN112836801A (Application CN202110150032.XA)
Authority
CN
China
Prior art keywords
network
target
sampling
determining
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110150032.XA
Other languages
Chinese (zh)
Inventor
王师广
张行程
郑华滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202110150032.XA priority Critical patent/CN112836801A/en
Publication of CN112836801A publication Critical patent/CN112836801A/en
Priority to KR1020227009161A priority patent/KR20220113919A/en
Priority to PCT/CN2021/100150 priority patent/WO2022166069A1/en

Classifications

    • G06N 3/045 — Combinations of networks
    • G06N 20/00 — Machine learning
    • G06N 3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • G06V 10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds


Abstract

The disclosure relates to a deep learning network determination method and apparatus, an electronic device, and a storage medium. In an embodiment of the disclosure, a target event to be processed by a target device is determined; an original network is determined from a set of trained networks according to the target event; device resource information of the original device on which the original network was trained and device resource information of the target device are acquired; and a target network is segmented from the original network according to the device resource information of the target device and the device resource information of the original device. The target network is to be deployed on the target device to process the target event; it has the same number of network modules as the original network, and the connection structure between the network modules is the same. Because the target network is derived from the original network, researchers do not need to design and train a separate network for each device, which saves the time spent on network design and training and greatly improves working efficiency.

Description

Deep learning network determination method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a deep learning network determining method and apparatus, an electronic device, and a storage medium.
Background
With the development of science and technology, the application of deep learning networks on devices has become increasingly common. However, different devices provide different device resources; the resources a mobile handset can provide clearly differ from those of a server. To use a deep learning network on different devices, researchers are often required to manually design a different network for each device and to build each deep learning network from scratch, from constructing its network structure to training it on different sample data to obtain a trained network. As a result, researchers spend a large amount of time on network design and training, delaying other work.
Disclosure of Invention
The disclosure provides a deep learning network determination method, a deep learning network determination device, an electronic device and a storage medium, which are used for reducing the time spent by a researcher in network design and training.
In some possible embodiments, the present disclosure provides a deep learning network determination method, the method comprising:
determining a target event to be processed by a target device;
determining an original network from a set of trained networks according to the target event; the original network is the network in the set that matches the target event;
acquiring device resource information of the original device on which the original network was trained and device resource information of the target device;
segmenting a target network from the original network according to the device resource information of the target device and the device resource information of the original device; the target network is to be deployed on the target device to process the target event, has the same number of network modules as the original network, and has the same connection structure between network modules.
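As a purely illustrative sketch of the four steps above (the helper names, the dictionary-based network and resource descriptions, and the memory-ratio shrinking rule are all assumptions introduced here, not details from the disclosure):

```python
# Illustrative sketch of the flow: pick the trained network that matches the
# target event, compare device resources, and derive a smaller "target network"
# from the original one. All names and numbers are hypothetical.

def match_network(network_set, target_event):
    """Return the trained network whose event tag matches the target event."""
    return next(n for n in network_set if n["event"] == target_event)

def slice_network(original, orig_res, target_res):
    """Shrink per-module widths by the ratio of available device resources,
    keeping the number of modules and their connections unchanged."""
    ratio = min(1.0, target_res["memory"] / orig_res["memory"])
    return {
        "event": original["event"],
        "modules": [
            {"name": m["name"], "channels": max(1, int(m["channels"] * ratio))}
            for m in original["modules"]
        ],
    }

network_set = [
    {"event": "face_detection",
     "modules": [{"name": "stem", "channels": 64}, {"name": "head", "channels": 128}]},
]
original = match_network(network_set, "face_detection")
target = slice_network(original, {"memory": 16}, {"memory": 4})
print([m["channels"] for m in target["modules"]])  # fewer channels, same module count
```

The key invariant carried over from the disclosure is that slicing changes per-module widths, never the number of modules or their connections.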
In some possible embodiments, the method further comprises the step of training the original network:
acquiring a sample data set, wherein the sample data set comprises a plurality of input data and a label corresponding to each input data;
constructing a preset learning network, determining a plurality of sampling networks from the preset learning network, and setting initial parameters for each sampling network;
and training the plurality of sampling networks by using the sample data set, taking the trained plurality of sampling networks as a plurality of standby networks, and determining an original network from the plurality of standby networks.
In some possible embodiments, the plurality of sampling networks includes a first-level network, at least one second-level network, and at least one third-level network; the first-level network, the at least one second-level network, and the at least one third-level network have the same number of network modules and the same connection structure between network modules;
the output resolution of each network module in the first-level network is greater than the output resolution of the corresponding network module in the second-level network; the output resolution of each network module in the second-level network is greater than the output resolution of the corresponding network module in the third-level network;
the number of output channels of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the number of output channels of each network module in the second-level network is greater than that of the corresponding network module in the third-level network;
the number of convolution modules of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the number of convolution modules of each network module in the second-level network is greater than that of the corresponding network module in the third-level network.
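The ordering constraints among the three levels can be made concrete with hypothetical per-module configurations; the numbers below are illustrative only, not values from the disclosure:

```python
# Hypothetical configurations for one corresponding network module at each level.
# Each quantity decreases from the first level to the second, and from the
# second to the third, as described above.
levels = {
    "first":  {"resolution": 224, "out_channels": 256, "conv_modules": 4},
    "second": {"resolution": 160, "out_channels": 128, "conv_modules": 3},
    "third":  {"resolution": 112, "out_channels": 64,  "conv_modules": 2},
}

for key in ("resolution", "out_channels", "conv_modules"):
    assert levels["first"][key] > levels["second"][key] > levels["third"][key]
print("level ordering holds for resolution, channels and depth")
```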
In some possible embodiments, training the plurality of sampling networks using the sample data set, and using the trained plurality of sampling networks as a plurality of backup networks, includes:
taking each sampling network in the plurality of sampling networks as a current sampling network;
processing a plurality of input data based on a current sampling network, and determining output data corresponding to each input data;
determining a loss value based on the output data and the label corresponding to the input data;
and when the loss value is less than or equal to the preset threshold value, determining the current sampling network as a standby network.
In some possible embodiments, before determining the current sampling network as the standby network when the loss value is less than or equal to the preset threshold, the method further includes:
when the loss value is larger than the preset threshold value, performing back propagation based on the loss value, updating parameters of the current sampling network to obtain an updated sampling network, and determining the updated sampling network as the current sampling network;
repeating the steps: processing a plurality of input data based on a current sampling network, and determining output data corresponding to each input data; based on the labels corresponding to the output data and the input data, a loss value is determined.
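The loop above can be illustrated with a toy stand-in: a one-parameter linear model plays the role of the sampling network, and manual gradient descent plays the role of back propagation. All numbers are hypothetical.

```python
# Toy version of the loop described above: forward pass, loss against labels,
# and parameter updates repeated until the loss drops below a preset threshold,
# at which point the network is kept as a "standby network".
inputs = [1.0, 2.0, 3.0]
labels = [2.0, 4.0, 6.0]          # true relationship: y = 2x
weight = 0.0                      # initial parameter
threshold = 1e-4
learning_rate = 0.05

while True:
    outputs = [weight * x for x in inputs]
    loss = sum((o - y) ** 2 for o, y in zip(outputs, labels)) / len(inputs)
    if loss <= threshold:
        standby = weight          # loss small enough: keep as standby network
        break
    # "back propagation": gradient of the mean squared error w.r.t. the weight
    grad = sum(2 * (o - y) * x for o, y, x in zip(outputs, labels, inputs)) / len(inputs)
    weight -= learning_rate * grad

print(round(standby, 2))  # close to 2.0
```

The structure matches the described loop: forward pass, loss against the labels, a threshold test, and a repeated update step otherwise.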
In some possible embodiments, training the plurality of sampling networks using the sample data set, and using the trained plurality of sampling networks as a plurality of backup networks, includes:
taking the first level network as a current first sampling network, and taking each network of at least one second level network and at least one third level network as a current second sampling network;
processing a plurality of input data based on a current first sampling network, and determining first output data corresponding to each input data; determining a first loss value based on the first output data corresponding to each input data and the label corresponding to each input data;
for each current second sampling network of the plurality of current second sampling networks: processing a plurality of input data based on a current second sampling network, and determining second output data corresponding to each input data; determining a second loss value based on the second output data corresponding to each input data and the first output data corresponding to each input data;
and when the first loss value is less than or equal to a first preset threshold value and the second loss value is less than or equal to a second preset threshold value, determining the current first sampling network and the current second sampling network as standby networks.
In some possible embodiments, before determining the current first sampling network and the current second sampling network as standby networks when the first loss value is less than or equal to the first preset threshold and the second loss value is less than or equal to the second preset threshold, the method further includes:
when the first loss value is larger than a first preset threshold value, performing back propagation based on the first loss value, performing parameter updating on the current first sampling network to obtain an updated first sampling network, and determining the updated first sampling network as the current first sampling network; when the second loss value is larger than a second preset threshold value, performing back propagation based on the second loss value, performing parameter updating on the current second sampling network to obtain an updated second sampling network, and determining the updated second sampling network as the current second sampling network; repeating the steps: processing a plurality of input data based on a current first sampling network, and determining first output data corresponding to each input data; determining a first loss value based on the first output data corresponding to each input data and the label corresponding to each input data; processing a plurality of input data based on a current second sampling network, and determining second output data corresponding to each input data; a second penalty value is determined based on the second output data corresponding to each input data and the first output data corresponding to each input data.
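This joint scheme resembles knowledge distillation: the first-level network learns from the labels, while each second network learns from the first network's outputs. A toy sketch using the same kind of one-parameter stand-in (all values hypothetical, not from the disclosure):

```python
# Toy sketch: the first sampling network fits the labels; each second sampling
# network is then fitted to the *outputs of the first network* rather than to
# the labels, mirroring the first-loss / second-loss split described above.
inputs = [1.0, 2.0, 3.0]
labels = [3.0, 6.0, 9.0]          # true relationship: y = 3x

def fit(targets, lr=0.05, threshold=1e-6):
    """Gradient descent on a one-parameter model until the loss threshold."""
    w = 0.0
    while True:
        outs = [w * x for x in inputs]
        loss = sum((o - t) ** 2 for o, t in zip(outs, targets)) / len(inputs)
        if loss <= threshold:
            return w, outs
        grad = sum(2 * (o - t) * x for o, t, x in zip(outs, targets, inputs)) / len(inputs)
        w -= lr * grad

first_w, first_outputs = fit(labels)          # first loss: against the labels
second_w, _ = fit(first_outputs)              # second loss: against first outputs
print(round(first_w, 2), round(second_w, 2))  # both near 3.0
```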
In some possible embodiments, the original network comprises one master branch and at least one slave branch; the master branch comprises a plurality of network modules, and the network modules in each slave branch are a subset of the network modules in the master branch;
if the original network has a plurality of slave branches, the slave branches share at least one network module.
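The branch structure can be illustrated with simple module lists (the module names are hypothetical): the master branch carries all modules, and each slave branch reuses a subset of them.

```python
# Illustrative branch layout: slave branches are built from subsets of the
# master branch's modules, and the slave branches share at least one module.
master = ["m1", "m2", "m3", "m4"]
slaves = [["m1", "m2"], ["m1", "m3"]]

assert all(set(s) <= set(master) for s in slaves)     # subsets of the master branch
shared = set.intersection(*(set(s) for s in slaves))
assert shared                                         # at least one shared module
print(sorted(shared))  # ['m1']
```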
In some possible embodiments, each network module comprises at least one convolution module, and each convolution module comprises at least one convolution layer, one batch normalization (BN) layer, and one activation layer;
the convolution layer, the batch normalization (BN) layer, and the activation layer are connected in series;
the convolution layer is a depthwise convolution layer, a pointwise convolution layer, or a two-dimensional convolution layer.
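The three convolution types differ mainly in parameter cost, which is one reason depthwise and pointwise layers suit resource-constrained target devices. A quick comparison for hypothetical channel counts and kernel size (the numbers are illustrative, not from the disclosure):

```python
# Weight counts (ignoring biases) for the three convolution types named above,
# for a layer taking c_in channels to c_out channels with a k x k kernel.
k, c_in, c_out = 3, 64, 128

standard = k * k * c_in * c_out      # ordinary two-dimensional convolution
depthwise = k * k * c_in             # one k x k filter per input channel
pointwise = 1 * 1 * c_in * c_out     # 1 x 1 convolution mixing channels

print(standard, depthwise, pointwise)
# a depthwise + pointwise pair replaces one standard convolution far more cheaply
print((depthwise + pointwise) / standard)
```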
In some possible embodiments, the device resource information of the original device at least includes: video memory capacity data, video memory performance data, network performance data, memory capacity data, and computing capability data of the original device;
the device resource information of the target device at least includes: video memory capacity data, video memory performance data, network performance data, memory capacity data, and computing capability data of the target device.
In some possible embodiments, determining an original network from the trained network set according to the target event includes:
decomposing the target event, and determining a target task identifier and a target object identifier of the target event;
acquiring a task identifier and an object identifier of each network in a network set;
determining a target task identifier and a first matching degree value of each task identifier;
determining a target object identifier and a second matching degree value of each object identifier;
and determining an original network from the network set according to the first matching degree value and the second matching degree value.
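One hedged way to realize the matching-degree values above is a simple similarity score between identifiers; the scoring rule, the identifier strings, and the equal weighting below are assumptions introduced for illustration only.

```python
# Illustrative sketch: score each trained network's task/object identifiers
# against the target identifiers and pick the best combined match.
def match_score(a, b):
    """Crude similarity between two identifiers: 1.0 for an exact match,
    0.5 for a shared first token, 0.0 otherwise (hypothetical rule)."""
    if a == b:
        return 1.0
    if a.split("_")[0] == b.split("_")[0]:
        return 0.5
    return 0.0

networks = [
    {"name": "net_a", "task_id": "detect_face", "object_id": "person"},
    {"name": "net_b", "task_id": "segment_road", "object_id": "vehicle"},
]
target_task, target_object = "detect_pedestrian", "person"

best = max(
    networks,
    key=lambda n: match_score(target_task, n["task_id"])      # first matching value
               + match_score(target_object, n["object_id"]),  # second matching value
)
print(best["name"])  # net_a
```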
In some possible embodiments, segmenting the target network from the original network according to the device resource information of the target device and the device resource information of the original device includes:
adjusting the output resolution, the number of output channels, and the number of convolution modules of each network module in the original network according to the device resource information of the target device and the device resource information of the original device, using a preset search rule;
and determining the adjusted original network as a target network.
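A minimal sketch of such a search rule, assuming the adjustment is a proportional scaling by the most constrained resource ratio (the actual rule is not specified in the disclosure; the scaling choices below are assumptions):

```python
# Hypothetical "preset search rule": scale each module's output resolution,
# output-channel count, and number of convolution modules by the most
# constrained resource ratio between target and original device.
import math

def adjust_module(module, orig_res, target_res):
    ratio = min(target_res[k] / orig_res[k] for k in orig_res)   # weakest resource
    return {
        "resolution": max(1, int(module["resolution"] * math.sqrt(ratio))),
        "channels": max(1, int(module["channels"] * ratio)),
        "conv_modules": max(1, round(module["conv_modules"] * ratio)),
    }

orig_res = {"memory": 32, "compute": 100}
target_res = {"memory": 8, "compute": 50}     # memory is the bottleneck: ratio 0.25
module = {"resolution": 224, "channels": 128, "conv_modules": 4}
print(adjust_module(module, orig_res, target_res))
```

Resolution is scaled by the square root of the ratio because activation size grows with the square of the spatial resolution, a common heuristic rather than anything stated in the disclosure.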
In some possible embodiments, the present disclosure provides a deep learning network determination apparatus, comprising:
the target event determining unit is used for determining a target event to be processed by the target equipment;
an original network determining unit, configured to determine an original network from the trained network set according to the target event; the original network is a network matched with the target event in the network set;
a resource information obtaining unit, configured to obtain device resource information of an original device that trains the original network and device resource information of the target device;
the target network segmentation unit is used for segmenting a target network from the original network according to the device resource information of the target device and the device resource information of the original device; the target network is to be deployed on the target device to process the target event, has the same number of network modules as the original network, and has the same connection structure between network modules.
In some possible embodiments, the apparatus further comprises a network training unit comprising:
the data acquisition subunit is used for acquiring a sample data set, wherein the sample data set comprises a plurality of input data and a label corresponding to each input data;
the network construction subunit is used for constructing a preset learning network, determining a plurality of sampling networks from the preset learning network, and setting initial parameters for each sampling network;
and the network training subunit is used for training the plurality of sampling networks by using the sample data set, taking the plurality of trained sampling networks as a plurality of standby networks, and determining the original network from the plurality of standby networks.
In some possible embodiments, the plurality of sampling networks includes a first-level network, at least one second-level network, and at least one third-level network; the first-level network, the at least one second-level network, and the at least one third-level network have the same number of network modules and the same connection structure between network modules;
the output resolution of each network module in the first-level network is greater than the output resolution of the corresponding network module in the second-level network; the output resolution of each network module in the second-level network is greater than the output resolution of the corresponding network module in the third-level network;
the number of output channels of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the number of output channels of each network module in the second-level network is greater than the number of output channels of the corresponding network module in the third-level network;
the number of convolution modules of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the number of convolution modules of each network module in the second-level network is greater than the number of convolution modules of the corresponding network module in the third-level network.
In some possible embodiments, the network training subunit is configured to:
taking each sampling network in the plurality of sampling networks as a current sampling network;
processing a plurality of input data based on a current sampling network, and determining output data corresponding to each input data;
determining a loss value based on the output data and the label corresponding to the input data;
and when the loss value is less than or equal to the preset threshold value, determining the current sampling network as a standby network.
In some possible embodiments, the network training subunit is configured to:
when the loss value is larger than the preset threshold value, performing back propagation based on the loss value, updating parameters of the current sampling network to obtain an updated sampling network, and determining the updated sampling network as the current sampling network;
repeating the steps: processing a plurality of input data based on a current sampling network, and determining output data corresponding to each input data; based on the labels corresponding to the output data and the input data, a loss value is determined.
In some possible embodiments, the network training subunit is configured to:
taking the first level network as a current first sampling network, and taking each network of at least one second level network and at least one third level network as a current second sampling network;
processing a plurality of input data based on a current first sampling network, and determining first output data corresponding to each input data; determining a first loss value based on the first output data corresponding to each input data and the label corresponding to each input data;
for each current second sampling network of the plurality of current second sampling networks: processing a plurality of input data based on a current second sampling network, and determining second output data corresponding to each input data; determining a second loss value based on the second output data corresponding to each input data and the first output data corresponding to each input data;
and when the first loss value is less than or equal to a first preset threshold value and the second loss value is less than or equal to a second preset threshold value, determining the current first sampling network and the current second sampling network as standby networks.
In some possible embodiments, the network training subunit is configured to:
when the first loss value is larger than a first preset threshold value, performing back propagation based on the first loss value, performing parameter updating on the current first sampling network to obtain an updated first sampling network, and determining the updated first sampling network as the current first sampling network; when the second loss value is larger than a second preset threshold value, performing back propagation based on the second loss value, performing parameter updating on the current second sampling network to obtain an updated second sampling network, and determining the updated second sampling network as the current second sampling network;
repeating the steps: processing a plurality of input data based on a current first sampling network, and determining first output data corresponding to each input data; determining a first loss value based on the first output data corresponding to each input data and the label corresponding to each input data; processing a plurality of input data based on a current second sampling network, and determining second output data corresponding to each input data; a second penalty value is determined based on the second output data corresponding to each input data and the first output data corresponding to each input data.
In some possible embodiments, the original network comprises one master branch and at least one slave branch; the master branch comprises a plurality of network modules, and the network modules in each slave branch are a subset of the network modules in the master branch;
if the original network has a plurality of slave branches, the slave branches share at least one network module.
In some possible embodiments, each network module comprises at least one convolution module, and each convolution module comprises at least one convolution layer, one batch normalization (BN) layer, and one activation layer;
the convolution layer, the batch normalization (BN) layer, and the activation layer are connected in series;
the convolution layer is a depthwise convolution layer, a pointwise convolution layer, or a two-dimensional convolution layer.
In some possible embodiments, the device resource information of the original device at least includes: video memory capacity data, video memory performance data, network performance data, memory capacity data, and computing capability data of the original device;
the device resource information of the target device at least includes: video memory capacity data, video memory performance data, network performance data, memory capacity data, and computing capability data of the target device.
In some possible embodiments, the original network determining unit is configured to:
decomposing the target event, and determining a target task identifier and a target object identifier of the target event;
acquiring a task identifier and an object identifier of each network in a network set;
determining a target task identifier and a first matching degree value of each task identifier;
determining a target object identifier and a second matching degree value of each object identifier;
and determining an original network from the network set according to the first matching degree value and the second matching degree value.
In some possible embodiments, the target network segmentation unit is to:
adjusting the output resolution, the number of output channels, and the number of convolution modules of each network module in the original network according to the device resource information of the target device and the device resource information of the original device, using a preset search rule;
and determining the adjusted original network as a target network.
In some possible embodiments, the present disclosure also provides an electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the at least one processor implements the deep learning network determination method described above by executing the instructions stored in the memory.
In some possible embodiments, the present disclosure also provides a computer-readable storage medium having at least one instruction or at least one program stored therein, the at least one instruction or the at least one program being loaded and executed by a processor to implement the deep learning network determination method described above.
In some possible embodiments, the present disclosure also provides a computer program product containing instructions which, when run on a computer, cause the computer to implement the deep learning network determination method described above.
In an embodiment of the disclosure, a target event to be processed by a target device is determined; an original network is determined from a set of trained networks according to the target event; device resource information of the original device on which the original network was trained and device resource information of the target device are acquired; and a target network is segmented from the original network according to the device resource information of the target device and the device resource information of the original device. The target network is to be deployed on the target device to process the target event; it has the same number of network modules as the original network, and the connection structure between the network modules is the same. Because the target network is derived from the original network, researchers do not need to design and train a separate network for each device, which saves the time spent on network design and training and greatly improves working efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present specification, and that those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 shows a block diagram of a deep learning network determination system according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a deep learning network determination method according to an embodiment of the disclosure;
FIG. 3 illustrates a flow chart of a method of determining an original network according to an embodiment of the present disclosure;
FIG. 4 illustrates a flow diagram of training an original network according to an embodiment of the present disclosure;
fig. 5 shows a schematic structural diagram of a preset learning network according to an embodiment of the present disclosure;
FIG. 6 shows a block diagram of a convolution module according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of a convolution module according to an embodiment of the present disclosure;
FIG. 8 shows a block diagram of a convolution module according to an embodiment of the present disclosure;
fig. 9 shows a block diagram of a deep learning network determination apparatus according to an embodiment of the present disclosure;
FIG. 10 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure;
FIG. 11 shows a block diagram of another electronic device in accordance with an embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. It is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments in the present description without inventive effort belong to the protection scope of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Referring to fig. 1, fig. 1 illustrates a deep learning network determination system according to an embodiment of the present disclosure, as shown in fig. 1, including a first device 10 and a second device 20.
Optionally, the first device 10 may collect various deep learning networks from various other devices, put all the collected deep learning networks into a set, and store the set in a storage area, so as to later determine from the set an original network related to the target event and further determine the target network. Alternatively, the other devices described above may include the first device itself. The original network may be a deep learning network related to the target event among all the collected deep learning networks.
The first device 10 may include, but is not limited to, a smart phone, a desktop computer, a tablet computer, a laptop computer, a smart speaker, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, a smart wearable device, and the like; it may also be a server. Alternatively, the operating system running on the electronic device or the server may include, but is not limited to, Android, iOS, Linux, Windows, Unix, and the like.
Alternatively, the second device 20 may be a target device on which a target network needs to be deployed, and is configured to receive a target network split from an original network, deploy the target network on the device, and process a target event. The second device 20 may include, but is not limited to, a smart phone, a desktop computer, a tablet computer, a laptop computer, a smart speaker, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, a smart wearable device, and the like; it may also be a server. Alternatively, the operating system running on the electronic device or the server may include, but is not limited to, Android, iOS, Linux, Windows, Unix, and the like.
In some possible embodiments, the first device 10 determines a target event to be processed by the target device (the second device 20), and determines an original network from a trained network set according to the target event, where the original network is a network in the network set that matches the target event. The first device 10 acquires the device resource information of the original device that trained the original network and the device resource information of the target device, and splits a target network from the original network according to the two pieces of device resource information, wherein the target network is to be deployed on the target device (the second device 20) to process the target event, and the target network has the same number of network modules as the original network and the same connection structure between the network modules.
In an alternative embodiment, the first device 10 and the second device 20 may be connected by a wireless link or a wired link.
The following describes a deep learning network determination method according to an embodiment of the present disclosure, taking a server as an execution subject. The deep learning network determination method may be implemented by way of a processor invoking computer readable instructions stored in a memory.
Fig. 2 shows a flowchart of a deep learning network determination method according to an embodiment of the present disclosure, and as shown in fig. 2, the method includes:
in step S201, a target event to be processed by the target device is determined.
In the embodiment of the disclosure, the server is to deploy a network capable of processing the target event on the target device, and since some previously trained networks are pre-stored, the server may directly determine an original network capable of processing the target event from the networks, and perform some adjustments on the original network based on the device resource information of the target device to obtain the target network. Based on the above description, a target event to be processed by the target device is first determined.
In some possible embodiments, the target event may include a variety of events, such as image recognition, image segmentation, (device, mechanical) fault recognition, obstacle recognition, trajectory prediction, article scoring, and so forth. For convenience of understanding, the following description will be given by taking image segmentation as a target event, and for other events, reference is made to the image segmentation event, and details will not be repeated.
In some possible embodiments, the target device may be a server, that is, the server determines the target event to be processed for itself. In other possible embodiments, the target device is a device other than a server. Specifically, the server may receive a network requirement instruction of the target device, where the network requirement instruction carries an identifier of the target event, and the network requirement instruction is used to instruct the server to determine the target event according to the identifier of the target event.
In step S202, an original network is determined from the trained network set according to the target event; the original network is the network in the network set matched with the target event.
In the embodiment of the present disclosure, optionally, the trained network set may be stored in the server in advance, and after the target event is determined, the original network may be determined directly from the trained network set. Optionally, after determining the target event, the server may obtain a plurality of trained networks from each device connected to the server, and determine an original network from these trained networks, where the original network is a network in the network set that matches the target event; in other words, the original network is a network capable of processing the target event. For example, assuming that the target event is image segmentation, the determined original network is a network that has processed image segmentation events.
However, in some alternative embodiments, there is no network in the trained network set corresponding to the target event; for example, no network in the network set has previously processed image segmentation. In this case, fig. 3 shows a flowchart of a method for determining an original network according to an embodiment of the present disclosure. As shown in fig. 3, the method includes:
in step S2021, the target event is decomposed, and the target task identifier and the target object identifier of the target event are determined.
In some possible embodiments, the server may identify the target event, and obtain a target task identifier and a target object identifier of the target event.
In other possible embodiments, the network requirement instruction carries an identifier of a target event, and the identifier of the target event may include a target task identifier and a target object identifier. After receiving the identifier of the target event, the server may decompose the identifier of the target event to obtain a target task identifier and a target object identifier. The target object identifier is used for indicating an object which is finally determined by the server and processed by a target network, and the target task identifier is used for indicating a specific mode which is finally determined by the server and processed by the target network.
For example, the event "image recognition" can be decomposed as follows: the object to be processed is an "image", and the specific way of processing the object is "recognition"; thus, the task identifier of "image recognition" may be the identifier corresponding to "recognition", and the object identifier may be the identifier corresponding to "image". As another example, the event "image segmentation" can be decomposed as follows: the object to be processed is an "image", and the specific way of processing the object is "segmentation"; thus, the task identifier of "image segmentation" may be the identifier corresponding to "segmentation", and the object identifier may be the identifier corresponding to "image". As another example, the event "article scoring" can be decomposed as follows: the object to be processed is an "article", and the specific way of processing the object is "scoring"; thus, the task identifier of "article scoring" may be the identifier corresponding to "scoring", and the object identifier may be the identifier corresponding to "article". In some possible embodiments, the task identifiers and the object identifiers may be unique identifiers; for example, among the object identifiers, the identifier corresponding to "image" is A001, the identifier corresponding to "article" is A002, and so on; among the task identifiers, the identifier corresponding to "recognition" is B001, the identifier corresponding to "segmentation" is B002, the identifier corresponding to "scoring" is B003, and so on.
Based on the above example, in step S2021, if the target event is "image segmentation", the server may determine that the target task identifier of the target event is B002 and the target object identifier is A001.
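The decomposition of an event into a task identifier and an object identifier can be sketched in Python as follows. The lookup tables and the function name `decompose_event` are illustrative assumptions, not part of the patent; the identifier values follow the examples in the text.

```python
# Illustrative identifier tables following the examples above.
OBJECT_IDS = {"image": "A001", "article": "A002"}
TASK_IDS = {"recognition": "B001", "segmentation": "B002", "scoring": "B003"}

def decompose_event(event):
    """Split an event such as 'image segmentation' into a
    (task identifier, object identifier) pair (step S2021)."""
    obj, task = event.split(" ", 1)
    return TASK_IDS[task], OBJECT_IDS[obj]

task_id, object_id = decompose_event("image segmentation")
# task_id == "B002", object_id == "A001"
```

A real system would of course derive the split from the event identifier carried by the network requirement instruction rather than from the English phrase.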
In step S2022, the task identifier and the object identifier of each network in the network set are acquired.
In some optional embodiments, when the server acquires each network, each network may carry its own task identifier and object identifier; in other optional embodiments, when the server acquires each network, each network may carry an identifier of an event, and resolve a task identifier and an object identifier corresponding to each event from the identifier of each event.
In step S2023, the target task identity and the first matching degree value of each task identity are determined.
In step S2024, the target object id and the second matching degree value of each object id are determined.
Continuing with image segmentation as the target event: in some possible embodiments, the target task identifier is directly matched against the task identifier of each network in the network set; if the two are the same, the first matching degree value is 1, and if they are not the same, the first matching degree value is 0. Similarly, the target object identifier is directly matched against the object identifier of each network in the network set; if the two are the same, the second matching degree value is 1, and if they are not the same, the second matching degree value is 0.
However, the matching method in the preceding paragraph is too absolute: if the target object identifier is "image A001" and an object identifier is "picture A101", the second matching degree value is directly 0 according to that method, which is obviously not ideal and may even prevent the whole scheme from proceeding normally. Therefore, in some possible embodiments, taking the target object identifier as an example, similar extended object identifiers may be determined based on the target object identifier; for example, the similar extended object identifiers "picture A101", "photo A012", etc. may be determined from "image A001", and then the target object identifier and the extended object identifiers are respectively matched against each object identifier, so as to obtain the second matching degree value between the target object identifier and each object identifier. Specifically, if the first matching degree value between the target task identifier and each task identifier is to be determined, the extended task identifiers corresponding to the target task identifier may likewise be determined first, by referring to the determination process for the object identifiers.
In step S2025, an original network is determined from the network set according to the first matching degree value and the second matching degree value.
In some possible embodiments, a network in which both the first matching degree value and the second matching degree value are 1 may be determined as the original network.
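Steps S2023 to S2025 can be sketched as follows, using exact matching only (the extended-identifier variant is omitted for brevity). All function and variable names are illustrative.

```python
def match_value(target_id, candidate_id):
    """Binary matching degree value: 1 if identical, otherwise 0."""
    return 1 if target_id == candidate_id else 0

def select_original_network(target_task, target_object, networks):
    """networks: list of (name, task_id, object_id) tuples.
    Returns the first network whose first and second matching
    degree values are both 1 (steps S2023-S2025)."""
    for name, task_id, object_id in networks:
        first = match_value(target_task, task_id)       # step S2023
        second = match_value(target_object, object_id)  # step S2024
        if first == 1 and second == 1:                  # step S2025
            return name
    return None

nets = [("seg_net", "B002", "A001"), ("score_net", "B003", "A002")]
select_original_network("B002", "A001", nets)  # -> "seg_net"
```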
The disclosed embodiment further includes a process of training to obtain an original network. In an alternative embodiment, the original network may include, but is not limited to, a deep learning network such as a convolutional neural network, a recurrent neural network, or a recursive neural network. A deep learning network can be considered a form of machine learning. Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction. Machine learning can be divided into supervised machine learning, unsupervised machine learning, and semi-supervised machine learning.
In a specific embodiment, taking a convolutional neural network as an example, fig. 4 shows a flowchart of training to obtain an original network according to an embodiment of the present disclosure. As shown in fig. 4, the method includes:
in step S401, a sample data set is obtained, where the sample data set includes a plurality of input data and a tag corresponding to each input data.
Since the original network is to process image segmentation events, two preparations are required before training the original network: first, constructing a preset learning network; second, collecting a sample data set. Taking image segmentation as an example, the server needs to acquire images.
In some possible embodiments, the server may download images from an image library on the Internet, and may also obtain images through other electronic devices; for example, images may be captured from a video stream recorded by a vehicle camera. For instance, if images are captured from a video stream recorded by a vehicle camera at 30 frames per second, with one image captured every 10 frames, then 3 images are captured every second, and the captured images are transmitted to the server. In some possible embodiments, the server may pre-process the images according to actual requirements, where the pre-processing includes image resizing, gray value processing, Gaussian filtering, gamma correction, or the like. Optionally, in addition to each image, the server needs to obtain the label corresponding to the image, such as the objects (person, dog, plant, furniture, etc.) contained in the image; further, the label may also include the positions, on the image, of the objects contained in the image.
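The frame-sampling arithmetic above (a 30 frames-per-second stream with one frame kept every 10 frames yields 3 images per second) can be sketched as follows; the function name and parameters are illustrative.

```python
def sampled_frame_indices(fps, stride, seconds):
    """Indices of the frames kept when sampling every `stride`-th
    frame from a stream of `fps` frames per second."""
    total_frames = fps * seconds
    return list(range(0, total_frames, stride))

idx = sampled_frame_indices(fps=30, stride=10, seconds=1)
# idx == [0, 10, 20] -> 3 images captured per second
```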
In step S402, a preset learning network is constructed, a plurality of sampling networks are determined from the preset learning network, and initial parameters are set for each sampling network.
Fig. 5 shows a schematic structural diagram of a preset learning network according to an embodiment of the present disclosure, and as shown in fig. 5, the preset learning network in step S402 may be referred to as a super network or a search space, and may be a network including only one branch as shown in fig. 5(a), a network including two branches as shown in fig. 5(b), or a network including five branches as shown in fig. 5 (c). The number of branches in the preset learning network may be set according to an actual situation, and the embodiment of the present disclosure is not limited.
Considering that the network should satisfy both the real-time requirement and the requirement that the network be small in capacity, the embodiments of the present disclosure will be described by taking the network including two branches as an example. The two-branch network shown in fig. 5(b) includes an upper master branch and a lower slave branch. The master branch includes a first network module, a second network module, a third network module, a fourth network module, a fifth network module, and a sixth network module connected in series from front to back, and can be used to capture high-level semantic information (such as type information, correlations between objects, context information, and the like) in an image segmentation event. The slave branch includes the first network module, the second network module, and the third network module connected in series from front to back, and can be used to capture low-level semantic information (such as contour information, size information, color information, and the like) in an image segmentation event. The first network module, the second network module, and the third network module may be shared by the master branch and the slave branch.
In addition, the present disclosure may also preset the resolution, the type, the number of channels, and/or the number of convolution modules of each network module in the two-branch network, and since a plurality of sampling networks need to be determined from the preset learning network, the resolution, the number of channels, and/or the number of convolution modules of each network module may be a range, as shown in table 1:
table 1: network structure unit
Network module | Output resolution range | Convolution module type | Number of convolution modules | Output channel range
first network module | 896 × 1792 to 1536 × 3072 | 2-dimensional convolution | 1 | 24 to 40
second network module | 448 × 896 to 768 × 1536 | depth separable convolution | 1 | 32 to 64
third network module | 224 × 448 to 384 × 768 | depth separable convolution | 1 | 48 to 72
fourth network module | 112 × 224 to 192 × 384 | expanded depth separable convolution | 2 to 4 | 48 to 72
fifth network module | 56 × 112 to 96 × 192 | expanded depth separable convolution | 2 to 4 | 72 to 128
sixth network module | 28 × 56 to 48 × 96 | expanded depth separable convolution | 2 to 4 | 96 to 160
The second row indicates that the output resolution range of the first network module is 896 × 1792 to 1536 × 3072, the type of the convolution module is 2-dimensional convolution, the number of convolution modules is 1, and the output channel range is 24 to 40.
The third row indicates that the output resolution range of the second network module is 448 × 896 to 768 × 1536, the type of the convolution module is depth separable convolution, the number of convolution modules is 1, and the output channel range is 32 to 64.
The fourth row indicates that the output resolution range of the third network module is 224 × 448 to 384 × 768, the type of the convolution module is depth separable convolution, the number of convolution modules is 1, and the output channel range is 48 to 72.
The fifth row indicates that the output resolution range of the fourth network module is 112 × 224 to 192 × 384, the type of the convolution module is expanded depth separable convolution, the number of convolution modules is 2 to 4, and the output channel range is 48 to 72.
The sixth row indicates that the output resolution range of the fifth network module is 56 × 112 to 96 × 192, the type of the convolution module is expanded depth separable convolution, the number of convolution modules is 2 to 4, and the output channel range is 72 to 128.
The seventh row indicates that the output resolution range of the sixth network module is 28 × 56 to 48 × 96, the type of the convolution module is expanded depth separable convolution, the number of convolution modules is 2 to 4, and the output channel range is 96 to 160.
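The ranges in Table 1 can be re-expressed as a search-space description from which sampling networks are drawn. The field layout and names below are illustrative assumptions; the numeric values are taken from the table.

```python
# One entry per network module: (min resolution, max resolution,
# convolution type, min/max convolution-module count, min/max channels).
SEARCH_SPACE = [
    ((896, 1792), (1536, 3072), "2-dimensional convolution",          1, 1, 24, 40),
    ((448, 896),  (768, 1536),  "depth separable convolution",        1, 1, 32, 64),
    ((224, 448),  (384, 768),   "depth separable convolution",        1, 1, 48, 72),
    ((112, 224),  (192, 384),   "expanded depth separable convolution", 2, 4, 48, 72),
    ((56, 112),   (96, 192),    "expanded depth separable convolution", 2, 4, 72, 128),
    ((28, 56),    (48, 96),     "expanded depth separable convolution", 2, 4, 96, 160),
]

# A sampled network picks, per module, a resolution, channel count and
# module count within these ranges; the first-level (maximum) network
# takes every upper bound, the third-level (minimum) network every
# lower bound.
first_level_channels = [c_max for *_, c_max in SEARCH_SPACE]
# first_level_channels == [40, 64, 72, 72, 128, 160]
```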
In step S402, determining a plurality of sampling networks from the preset learning network may include obtaining a very large number of candidate sampling networks through combinations of the different resolutions, convolution module types, channel numbers, and/or numbers of convolution modules of each network module, or determining a plurality of sampling networks in consideration of the computing power of the device training the sampling networks or of the practical application.
In some possible embodiments, the plurality of sampling networks may include sampling networks that differ in randomly determined resolution, number of channels, and/or number of convolution modules. Alternatively, the plurality of sampling networks may include a first level network (or referred to as a maximum model), at least one second level network and a third level network (or referred to as a minimum model). The number of each network module in the first-level network, the at least one second-level network and the third-level network is the same, for example, as in fig. 5(b), the master is 6 network modules, the slave is 3 network modules, and the connection structure between the network modules is the same.
Specifically, the first-level network may include a network in which the output resolution of each network module is the maximum, the number of output channels is the maximum, and the number of convolution modules is the maximum, which is described by data in the table, the output resolution of the first network module in the first-level network is 1536 × 3072, the type of convolution module is 2-dimensional convolution, the number of convolution modules is 1, and the number of output channels is 40; the output resolution of the second network module is 768 × 1536, the type of convolution module is depth separable convolution, the number of convolution modules is 1, and the number of output channels is 64; the output resolution of the third network module is 384 × 768, the type of convolution module is depth separable convolution, the number of convolution modules is 1, and the number of output channels is 72; the output resolution of the fourth network module is 192 × 384, the type of convolution module is expanded depth separable convolution, the number of convolution modules is 4, and the number of output channels is 72; the output resolution of the fifth network module is 96 × 192, the type of convolution module is expanded depth separable convolution, the number of convolution modules is 4, and the number of output channels is 128; the output resolution of the sixth network module is 48 × 96, the type of convolution module is expanded depth separable convolution, the number of convolution modules is 4, and the number of output channels is 160.
The third-level network may include a network with the minimum output resolution of each network module, the minimum number of output channels, and the minimum number of convolution modules, which are described by data in the table, where the output resolution of the first network module in the third-level network is 896 × 1792, the type of convolution module is 2-dimensional convolution, the number of convolution modules is 1, and the number of output channels is 24; the output resolution of the second network module is 448 × 896, the type of convolution module is depth separable convolution, the number of convolution modules is 1, and the number of output channels is 32; the output resolution of the third network module is 224 x 448, the type of convolution module is depth separable convolution, the number of convolution modules is 1, and the number of output channels is 48; the output resolution of the fourth network module is 112 × 224, the type of convolution module is expanded depth separable convolution, the number of convolution modules is 2, and the number of output channels is 48; the output resolution of the fifth network module is 56 × 112, the type of convolution module is expanded depth separable convolution, the number of convolution modules is 2, and the number of output channels is 72; the output resolution of the sixth network module is 28 × 56, the type of convolution module is expanded depth separable convolution, the number of convolution modules is 2, and the number of output channels is 96.
Each network module of the second-level network is arranged between the first-level network and the third-level network, and the output resolution of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the output resolution of each network module in the second-level network is greater than the output resolution of the corresponding network module in the third-level network; the number of output channels of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the number of output channels of each network module in the second-level network is greater than that of the corresponding network module in the third-level network; the number of convolution modules of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the number of convolution modules of each network module in the second-level network is greater than that of the corresponding network module in the third-level network.
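The ordering constraints above can be checked mechanically. The sketch below, which is only illustrative, verifies one dimension (the output channel count) using the values of the first-level network, second-level network 1, and the third-level network given in this document; resolutions and module counts would be checked the same way.

```python
def levels_ordered(first, second, third):
    """True if, for every network module, the per-module value is
    strictly decreasing from first-level to second-level to third-level."""
    return all(f > s > t for f, s, t in zip(first, second, third))

first_level  = [40, 64, 72, 72, 128, 160]  # maximum model (channel counts)
second_level = [36, 48, 56, 68, 108, 156]  # second-level network 1
third_level  = [24, 32, 48, 48, 72, 96]    # minimum model

levels_ordered(first_level, second_level, third_level)  # -> True
```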
Assuming that there are 4 sampling networks in total, the number of second-level networks is 2. Second-level network 1 may then be: the output resolution of the first network module is 1236 × 2456, the type of the convolution module is 2-dimensional convolution, the number of convolution modules is 1, and the number of output channels is 36; the output resolution of the second network module is 668 × 1036, the type of the convolution module is depth separable convolution, the number of convolution modules is 1, and the number of output channels is 48; the output resolution of the third network module is 364 × 668, the type of the convolution module is depth separable convolution, the number of convolution modules is 1, and the number of output channels is 56; the output resolution of the fourth network module is 184 × 358, the type of the convolution module is expanded depth separable convolution, the number of convolution modules is 3, and the number of output channels is 68; the output resolution of the fifth network module is 86 × 168, the type of the convolution module is expanded depth separable convolution, the number of convolution modules is 3, and the number of output channels is 108; the output resolution of the sixth network module is 36 × 72, the type of the convolution module is expanded depth separable convolution, the number of convolution modules is 3, and the number of output channels is 156.
The second level network 2 may be: the output resolution of the first network module is 1036 × 1856, the type of the convolution module is 2-dimensional convolution, the number of the convolution modules is 1, and the number of output channels is 28; the output resolution of the second network module is 548 x 936, the type of convolution module is depth separable convolution, the number of convolution modules is 1, and the number of output channels is 32; the output resolution of the third network module is 264 × 548, the type of convolution module is depth separable convolution, the number of convolution modules is 1, and the number of output channels is 50; the output resolution of the fourth network module is 124 × 256, the type of the convolution module is expanded depth separable convolution, the number of the convolution modules is 2, and the number of output channels is 60; the output resolution of the fifth network module is 66 × 128, the type of convolution module is expanded depth separable convolution, the number of convolution modules is 3, and the number of output channels is 108; the output resolution of the sixth network module is 32 x 72, the type of convolution module is expanded depth separable convolution, the number of convolution modules is 3, and the number of output channels is 114.
In some possible embodiments, since the computation of a convolution module is concentrated in its convolution layers, the server may set different convolution modules in order to reduce the amount of computation. The computation amount of a convolution layer may be expressed as Cout × Cin × k × k, where Cout, Cin, and k are the number of output channels, the number of input channels, and the size of the convolution kernel, respectively; the computation amount and capacity of the convolution layer, and thus the total computation amount, can be reduced by reducing the number of channels and/or the size of the convolution kernel. Thus, in some more computation-intensive network modules, a depth separable convolution with a smaller computation amount may be used in place of the 2-dimensional convolution, because the computation amount of the depth separable convolution is smaller than that of the 2-dimensional convolution in an equivalent computing environment. The depth separable convolution decomposes a Cout × Cin × k × k convolution into a depth convolution (one k × k kernel per input channel, i.e., Cin × k × k) and a point-by-point convolution (Cout × Cin × 1 × 1), and can therefore be split into two convolution layers: a depth convolution layer and a point-by-point convolution layer.
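The computation-amount comparison above can be made concrete. The sketch below counts the per-output-position multiplications of a 2-dimensional convolution versus a depth separable convolution, following the Cout × Cin × k × k formula in the text; the example channel numbers are illustrative.

```python
def conv2d_cost(c_out, c_in, k):
    """Multiplications per output position of a standard 2-d convolution."""
    return c_out * c_in * k * k

def separable_cost(c_out, c_in, k):
    """Depthwise part: one k x k kernel per input channel (Cin * k * k);
    pointwise part: a 1 x 1 convolution across channels (Cout * Cin)."""
    depthwise = c_in * k * k
    pointwise = c_out * c_in
    return depthwise + pointwise

# Example: 64 input channels, 128 output channels, 3 x 3 kernel.
conv2d_cost(128, 64, 3)     # 73728
separable_cost(128, 64, 3)  # 576 + 8192 = 8768, roughly 8x cheaper
```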
In some possible embodiments, each convolution module may include at least one convolution layer according to the type to which it belongs. Optionally, if the convolution module is a 2-dimensional convolution, the convolution module may only include one two-dimensional convolution layer. If the convolution module is a depth-separable convolution, the convolution module may include two convolution layers of a depth convolution layer and a point-by-point convolution layer in series, and if the convolution module is an expanded depth-separable convolution, the convolution module may include three convolution layers of a point-by-point convolution layer, a depth convolution layer, and a point-by-point convolution layer in series.
As shown in fig. 6, fig. 6 illustrates a structure diagram of convolution modules according to an embodiment of the present disclosure, with a depth separable convolution module on the left and an expanded depth separable convolution module on the right. The depth separable convolution module comprises a depth convolution layer with a 3 × 3 convolution kernel and a point-by-point convolution layer with a 1 × 1 convolution kernel; the expanded depth separable convolution module comprises a point-by-point convolution layer with a 1 × 1 convolution kernel, a depth convolution layer with a 3 × 3 convolution kernel, and a point-by-point convolution layer with a 1 × 1 convolution kernel. In practical applications, data is often not linearly separable, so in order to introduce a non-linear factor, an activation layer (ReLU) may be placed after each convolution layer. Further, in order to normalize, shift, and scale the data output by the convolution layer and prevent saturation, a batch normalization (BN) layer may be provided between the convolution layer and the activation layer, yielding the structure diagram of the convolution module shown in fig. 7. Further, the network degradation problem in the expanded depth separable convolution can be alleviated by using a residual structure in the convolution module, as shown in fig. 8.
In some possible embodiments, before training each sampling network, the server may set initial parameters for each sampling network. For example, all convolution layers may be initialized using kaiming_init, Xavier, or similar schemes; for the batch normalization (BN) layers, the scaling parameter γ is initialized to 1 and the translation parameter β to 0. In particular, for the residual structure in the expanded depth separable convolution, the scaling parameter γ of the last batch normalization (BN) layer may be set to 0, so that the output of each residual branch before the residual addition is an all-0 vector, which can improve the stability of training.
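The effect of the zero-γ initialization can be seen in a pure-Python toy (a sketch under simplifying assumptions: the residual branch is an arbitrary nonlinear function and γ stands in for the last BN layer's scale; this is not a real BN implementation):

```python
def branch(x):
    """Stand-in for the conv/BN/ReLU stack inside a residual branch."""
    return [max(0.0, 2.0 * xi + 1.0) for xi in x]


def residual_block(x, branch_fn, gamma):
    """y = x + gamma * branch_fn(x); gamma models the last BN scale parameter."""
    return [xi + gamma * bi for xi, bi in zip(x, branch_fn(x))]


x = [0.5, -1.0, 2.0]
# With gamma = 0 the branch contributes nothing, so the block is an identity
# map at the start of training -- the stabilizing property described above.
assert residual_block(x, branch, gamma=0.0) == x
```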
In step S403, a plurality of sampling networks are trained using the sample data set, the trained sampling networks are used as a plurality of standby networks, and an original network is determined from the plurality of standby networks.
In some possible embodiments, the specific training process may include: performing forward propagation on each sampling network whose initial parameters have been set as above to obtain its output under a given input; obtaining the gradients of the input and the parameters by back propagation; and then performing gradient processing (such as limiting the gradient range) and parameter updating, finally obtaining a plurality of trained standby networks.
In a possible embodiment, each sampling network is trained in the same way. Specifically, for each sampling network of the plurality of sampling networks, the server takes the sampling network as a current sampling network, processes a plurality of input data based on the current sampling network, and determines output data corresponding to each input data. Subsequently, the server may determine a loss value based on the output data and the labels corresponding to the input data; when the loss value is greater than a preset threshold value, the server performs back propagation based on the loss value, updates the parameters of the current sampling network to obtain an updated sampling network, and determines the updated sampling network as the current sampling network. The steps are then repeated: processing a plurality of input data based on the current sampling network and determining output data corresponding to each input data; and determining a loss value based on the output data and the labels corresponding to the input data. When the loss value is less than or equal to the preset threshold value, the current sampling network is determined as a standby network. In this approach, each sampling network is nominally trained as the current sampling network, but in practice all sampling networks may be trained at the same time. Alternatively, the server may train the sampling networks in different time periods, which prevents the heavy consumption of computing power from affecting other processes and thereby causing server downtime.
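The loop above (forward pass, loss, back propagation, repeat until the loss falls to the preset threshold) can be sketched with a one-parameter least-squares toy in place of a real sampling network; all names and the threshold value are illustrative assumptions:

```python
def train_until_threshold(inputs, labels, w=0.0, lr=0.05,
                          threshold=1e-3, max_steps=10_000):
    """Train a toy linear model y = w * x until loss <= threshold,
    mirroring the current-sampling-network loop described above."""
    loss = float("inf")
    for _ in range(max_steps):
        outputs = [w * x for x in inputs]                         # forward pass
        loss = sum((o, y) and (o - y) ** 2
                   for o, y in zip(outputs, labels)) / len(inputs)
        if loss <= threshold:
            return w, loss                       # determine as standby network
        # back propagation: mean-squared-error gradient w.r.t. w
        grad = sum(2 * (w * x - y) * x
                   for x, y in zip(inputs, labels)) / len(inputs)
        w -= lr * grad                                        # parameter update
    return w, loss


w, loss = train_until_threshold([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The fitted weight converges toward the true value 2.0 once the loss drops below the threshold.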
In another possible embodiment, in order to save the computing resources of the server and further reduce its burden, the server may, according to a preset setting or directly, use the first-level network as a "teacher" network and use the second-level networks and third-level networks as "student" networks, so that the performance of the first-level network may be migrated directly to the second-level networks and the third-level networks during network training.
Specifically, the server takes the first-level network as a current first sampling network, and takes each of at least one second-level network and at least one third-level network as a current second sampling network. Processing a plurality of input data based on a current first sampling network, and determining first output data corresponding to each input data; a first loss value is determined based on the first output data corresponding to each input data and the label corresponding to each input data. For each current second sampling network of the plurality of current second sampling networks: processing a plurality of input data based on a current second sampling network, and determining second output data corresponding to each input data; a second penalty value is determined based on the second output data corresponding to each input data and the first output data corresponding to each input data. When the first loss value is larger than a first preset threshold value, performing back propagation based on the first loss value, performing parameter updating on the current first sampling network to obtain an updated first sampling network, and determining the updated first sampling network as the current first sampling network; when the second loss value is larger than a second preset threshold value, performing back propagation based on the second loss value, performing parameter updating on the current second sampling network to obtain an updated second sampling network, and determining the updated second sampling network as the current second sampling network; repeating the steps: processing a plurality of input data based on a current first sampling network, and determining first output data corresponding to each input data; determining a first loss value based on the first output data corresponding to each input data and the label corresponding to each input data; processing a plurality of input data based on a current second sampling network, and 
determining second output data corresponding to each input data; a second penalty value is determined based on the second output data corresponding to each input data and the first output data corresponding to each input data. And determining the current first sampling network and the current second sampling network as standby networks only when the first loss value is less than or equal to a first preset threshold value and when the second loss value is less than or equal to a second preset threshold value.
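The teacher/student procedure above can be sketched with toy linear models, where the current first sampling network's outputs serve as the soft labels against which the current second sampling network is trained (the function names, the linear stand-ins, and the threshold are illustrative assumptions, not the actual networks of the disclosure):

```python
def mse(a, b):
    """Mean squared error between two output sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distill(inputs, teacher_w, student_w=0.0, lr=0.05,
            threshold=1e-4, max_steps=10_000):
    """Train a toy student toward the teacher's outputs (the soft labels)."""
    soft_labels = [teacher_w * x for x in inputs]   # first output data
    loss = float("inf")
    for _ in range(max_steps):
        student_out = [student_w * x for x in inputs]   # second output data
        loss = mse(student_out, soft_labels)            # second loss value
        if loss <= threshold:
            break                                       # standby network
        grad = sum(2 * (student_w * x - t) * x
                   for x, t in zip(inputs, soft_labels)) / len(inputs)
        student_w -= lr * grad                          # parameter update
    return student_w, loss


student_w, loss = distill([1.0, 2.0, 3.0], teacher_w=1.5)
```

The student's weight converges toward the teacher's, illustrating how the first-level network's behavior is migrated into the smaller networks.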
The first output data output by the current first sampling network may be a soft label, for example, in the case of an image segmentation event, a plurality of segmented objects together with a probability value for each object, rather than only a hard label of the object. Therefore, when the current second sampling network is trained, the soft label can be used as the ground truth against which the second output data is compared to determine the loss value. In this way, useful information from a large, complex model with good performance (the first-level network) can be extracted and transferred to a smaller network (a second-level or third-level network), so that the trained small network can achieve performance close to that of the first-level network while greatly saving computing resources. Optionally, the server needs to regularize the large model (the first-level network), that is, add a regularization layer to the network model. This is because only the largest model directly contacts the labels, while the other, smaller models are supervised by the output of the large model, i.e. by soft-label distillation.
In some possible embodiments, during training the large model will typically reach its accuracy peak and then begin to overfit, while the small models are still under-fitted. The server can address this by modifying the learning rate according to preset rules. For example, the server may first reduce the learning rate using an exponential decay method, and then keep the learning rate constant once it reaches a preset minimum value. The benefit of this is that the slightly larger learning rate in the later stage lets the small models converge faster, while keeping the learning rate constant after the large model reaches its accuracy peak makes the gradient oscillate, which alleviates the overfitting phenomenon.
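The schedule described above, exponential decay down to a fixed floor, can be written in one line; the particular initial rate, decay factor, and floor are assumptions for illustration:

```python
def learning_rate(step: int, lr0: float = 0.1, decay: float = 0.9,
                  lr_min: float = 0.01) -> float:
    """Exponentially decayed learning rate, held constant once it
    reaches the preset minimum value lr_min."""
    return max(lr0 * decay ** step, lr_min)


# Decays early on, then stays at the floor for the rest of training.
early = learning_rate(0)     # 0.1
late = learning_rate(100)    # 0.01 (the floor)
```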
The server may determine a plurality of standby networks through the two specific embodiments above or through other training manners. In an alternative embodiment, each standby network may be used as an original network capable of processing the image segmentation event; alternatively, the performance of each network may be evaluated, for example by determining on experimental samples the accuracy of each network and the computing resources it requires, and one or several original networks may then be determined according to the performance of each network.
Since the original network and the corresponding sampling network (or super network) are identical in structure, the original network includes a master branch and at least one slave branch, wherein the master branch may include a plurality of network modules, and the network modules in each slave branch comprise a subset of the network modules in the master branch. If the original network has a plurality of slave branches, the plurality of slave branches share at least one network module. Furthermore, each network module comprises at least one convolution module, and each convolution module comprises at least a convolution layer, a batch normalization (BN) layer, and an activation layer, connected in series; the convolution layer may be a depth convolution layer, a point-by-point convolution layer, or a two-dimensional convolution layer.
In step S203, device resource information of the original device training the original network and device resource information of the target device are acquired.
In some possible embodiments, the device resource information of the original device at least includes: the video memory capacity data, the video memory performance data, the network performance data, the memory capacity data and the computing capacity data of the original equipment; the device resource information of the target device includes at least: the video memory capacity data, the video memory performance data, the network performance data, the memory capacity data and the computing capacity data of the target equipment.
Optionally, the device resource information of the original device may be carried within the original network itself, in which case the server may extract it directly from the original network; alternatively, it may be obtained from the device resource information storage area of the original device itself and transmitted to the server.
Optionally, the server may obtain a target event to be processed by the target device, and obtain device resource information of the target device from the device resource information storage area of the target device. These two steps may be performed simultaneously or one after the other; which of the two precedes is not specified and may be decided as appropriate.
In step S204, the target network is segmented from the original network according to the device resource information of the target device and the device resource information of the original device; the target network is used for being deployed on the target equipment and processing the target event, the number of the network modules of the target network and the original network is the same, and the connection structures among the network modules are the same.
In some possible embodiments, the server may use a preset search rule to adjust the output resolution, the number of output channels, and the number of convolution modules of each network module in the original network according to the device resource information of the target device and the device resource information of the original device, and determine the adjusted original network as the target network. The preset search rule may include a random search algorithm, a genetic search algorithm, a differentiable network structure parameter search, a reinforcement-learning-based search algorithm, a grid search algorithm, and the like.
For example, suppose the original network includes a first network module with an output resolution of 896 × 1792, a convolution module type of 2-dimensional convolution, 1 convolution module, and 24 output channels; a second network module with an output resolution of 448 × 896, a convolution module type of depth separable convolution, 1 convolution module, and 32 output channels; a third network module with an output resolution of 224 × 448, a convolution module type of depth separable convolution, 1 convolution module, and 48 output channels; a fourth network module with an output resolution of 112 × 224, a convolution module type of expanded depth separable convolution, 2 convolution modules, and 48 output channels; a fifth network module with an output resolution of 56 × 112, a convolution module type of expanded depth separable convolution, 2 convolution modules, and 72 output channels; and a sixth network module with an output resolution of 28 × 56, a convolution module type of expanded depth separable convolution, 2 convolution modules, and 96 output channels. If each device resource of the target device is half the corresponding device resource of the original device, for example if the original device is a desktop computer and the target device is a mobile phone, then the output resolution of each network module can be reduced, the number of output channels can be reduced, and the number of convolution modules can be reduced.
Alternatively, if the device resource information of the target device is substantially similar to that of the original device and only one device resource is larger than the corresponding resource of the original device, the output resolution of part of the network modules can be increased, the number of output channels can be increased, or the number of convolution modules can be increased.
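The half-resources example above can be sketched as a simple proportional scaling rule (a stand-in for the preset search rule; the disclosure's actual search algorithms, such as genetic or grid search, are more involved, and all names here are illustrative):

```python
def scale_module(module: dict, ratio: float) -> dict:
    """Scale a network module's resolution, channel count, and convolution
    module count by a device-resource ratio, keeping the convolution type."""
    h, w = module["resolution"]
    return {
        "resolution": (max(1, int(h * ratio)), max(1, int(w * ratio))),
        "channels": max(1, int(module["channels"] * ratio)),
        "num_convs": max(1, int(module["num_convs"] * ratio)),
        "conv_type": module["conv_type"],   # the type is kept as-is
    }


# First network module of the example above, scaled for a target device with
# half the resources of the original device.
original = {"resolution": (896, 1792), "channels": 24, "num_convs": 1,
            "conv_type": "conv2d"}
target = scale_module(original, 0.5)
```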
In some possible embodiments, in order to ensure that the target network is well suited to the target device, after the target network is deployed on the target device, the target network may be run on the target device, and whether further adjustments need to be made to the output resolution, the number of output channels, and the number of convolution modules of each network module may be determined based on the running result.
In some possible embodiments, in order to decouple training and searching and thereby obtain better performance of the target network, the process of training the original network and the process of searching the original network for the target network may be separated as described above. The two processes may also be interleaved; which arrangement to use is determined by the actual situation of the application and is not limited in this disclosure.
In summary, the embodiment of the present application provides a preset learning network (a super network, such as the two-branch network described herein). After the preset learning network is trained, a target network suitable for each target device can be segmented directly from the original network using a network structure search algorithm according to the obtained device resource information, without retraining each target network. This reduces the time researchers spend on network design and training, and gives the trained original network greater utilization value.
Fig. 9 shows a block diagram of a deep learning network determination apparatus according to an embodiment of the present disclosure, and as shown in fig. 9, the deep learning network determination apparatus includes:
the target event determining unit 901 is configured to determine a target event to be processed by a target device;
the original network determining unit 902 is configured to determine an original network from the trained network set according to the target event; the original network is a network matched with the target event in the network set;
the resource information obtaining unit 903 is configured to obtain device resource information of an original device that trains the original network and device resource information of the target device;
the target network splitting unit 904 is configured to split a target network from the original network according to the device resource information of the target device and the device resource information of the original device; the target network is used for being deployed on the target device and processing the target event, the number of the network modules of the target network is the same as that of the original network, and the connection structures between the network modules are the same.
In some possible embodiments, the apparatus further comprises a network training unit comprising:
the data acquisition subunit is used for acquiring a sample data set, wherein the sample data set comprises a plurality of input data and a label corresponding to each input data;
the network construction subunit is used for constructing a preset learning network, determining a plurality of sampling networks from the preset learning network, and setting initial parameters for each sampling network;
and the network training subunit is used for training the plurality of sampling networks by using the sample data set, taking the plurality of trained sampling networks as a plurality of standby networks, and determining the original network from the plurality of standby networks.
In some possible embodiments, the plurality of sampling networks includes a first level network, at least one second level network and a third level network; the number of the network modules of the first-level network, at least one second-level network and at least one third-level network is the same, and the connection structures among the network modules are the same;
the output resolution of each network module in the first-level network is greater than the output resolution of the corresponding network module in the second-level network; the output resolution of each network module in the second-level network is greater than the output resolution of the corresponding network module in the third-level network;
the number of output channels of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the number of output channels of each network module in the second-level network is greater than the number of output channels of the corresponding network module in the third-level network;
the number of convolution modules of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the number of convolution modules of each network module in the second-level network is greater than the number of convolution modules of the corresponding network module in the third-level network.
In some possible embodiments, the network training subunit is configured to:
for each sampling network of a plurality of sampling networks:
taking each sampling network as a current sampling network;
processing a plurality of input data based on a current sampling network, and determining output data corresponding to each input data;
determining a loss value based on the output data and the label corresponding to the input data;
when the loss value is larger than the preset threshold value, performing back propagation based on the loss value, updating parameters of the current sampling network to obtain an updated sampling network, and determining the updated sampling network as the current sampling network; repeating the steps: processing a plurality of input data based on a current sampling network, and determining output data corresponding to each input data; determining a loss value based on the output data and the label corresponding to the input data;
and when the loss value is less than or equal to the preset threshold value, determining the current sampling network as a standby network.
In some possible embodiments, the network training subunit is configured to:
taking the first level network as a current first sampling network, and taking each network of at least one second level network and at least one third level network as a current second sampling network;
processing a plurality of input data based on a current first sampling network, and determining first output data corresponding to each input data; determining a first loss value based on the first output data corresponding to each input data and the label corresponding to each input data;
for each current second sampling network of the plurality of current second sampling networks: processing a plurality of input data based on a current second sampling network, and determining second output data corresponding to each input data; determining a second loss value based on the second output data corresponding to each input data and the first output data corresponding to each input data;
when the first loss value is larger than a first preset threshold value, performing back propagation based on the first loss value, performing parameter updating on the current first sampling network to obtain an updated first sampling network, and determining the updated first sampling network as the current first sampling network; when the second loss value is larger than a second preset threshold value, performing back propagation based on the second loss value, performing parameter updating on the current second sampling network to obtain an updated second sampling network, and determining the updated second sampling network as the current second sampling network; repeating the steps: processing a plurality of input data based on a current first sampling network, and determining first output data corresponding to each input data; determining a first loss value based on the first output data corresponding to each input data and the label corresponding to each input data; processing a plurality of input data based on a current second sampling network, and determining second output data corresponding to each input data; determining a second loss value based on the second output data corresponding to each input data and the first output data corresponding to each input data;
and when the first loss value is less than or equal to a first preset threshold value and when the second loss value is less than or equal to a second preset threshold value, determining the current first sampling network and the current second sampling network as standby networks.
In some possible embodiments, the original network comprises one master leg and at least one slave leg; the master branch comprises a plurality of network modules, and the network modules in each slave branch comprise partial network modules in the master branch;
if the original network has a plurality of slave branches, the plurality of slave branches share at least one network module.
In some possible embodiments, each network module comprises at least one convolution module, and each convolution module comprises at least one convolution layer, one batch normalization layer (BN layer), and one activation layer;
the convolution layer, the batch normalization layer (BN layer) and the active layer are connected in series;
the convolutional layer is a deep convolutional layer, a point-by-point convolutional layer, or a two-dimensional convolutional layer.
In some possible embodiments, the device resource information of the original device at least includes: the video memory capacity data, the video memory performance data, the network performance data, the memory capacity data and the computing capacity data of the original equipment;
the device resource information of the target device includes at least: the video memory capacity data, the video memory performance data, the network performance data, the memory capacity data and the computing capacity data of the target equipment.
In some possible embodiments, the original network determining unit is configured to:
decomposing the target event, and determining a target task identifier and a target object identifier of the target event;
acquiring a task identifier and an object identifier of each network in a network set;
determining a target task identifier and a first matching degree value of each task identifier;
determining a target object identifier and a second matching degree value of each object identifier;
and determining an original network from the network set according to the first matching degree value and the second matching degree value.
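The selection flow above can be illustrated with a toy scorer: each candidate network is scored by a task-identifier match (the first matching degree value) plus an object-identifier match (the second matching degree value), and the best-scoring network becomes the original network. Equality-based scoring and all names here are assumptions for illustration:

```python
def match_score(target_id: str, candidate_id: str) -> float:
    """Degree of match between a target identifier and a candidate's."""
    return 1.0 if target_id == candidate_id else 0.0


def select_original(networks, target_task, target_object):
    """Pick the network maximizing first + second matching degree values."""
    best = max(networks,
               key=lambda n: match_score(target_task, n["task"])
                             + match_score(target_object, n["object"]))
    return best["name"]


nets = [{"name": "netA", "task": "segmentation", "object": "person"},
        {"name": "netB", "task": "detection", "object": "person"}]
chosen = select_original(nets, "segmentation", "person")  # "netA"
```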
In some possible embodiments, the target network segmentation unit is to:
adjusting the output resolution, the number of output channels, and the number of convolution modules of each network module in the original network according to the device resource information of the target device and the device resource information of the original device by using a preset search rule;
and determining the adjusted original network as a target network.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above embodiments, and for brevity, will not be described again here.
The embodiment of the present disclosure also provides a computer-readable storage medium in which at least one instruction or at least one program is stored; the at least one instruction or the at least one program is loaded and executed by a processor to implement the above method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Embodiments of the present disclosure provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the deep learning network determination method of the present disclosure.
FIG. 10 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure. For example, the electronic device 1000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or a similar terminal.
Referring to fig. 10, electronic device 1000 may include one or more of the following components: processing component 1002, memory 1004, power component 1006, multimedia component 1008, audio component 1010, input/output (I/O) interface 1012, sensor component 1014, and communications component 1016.
The processing component 1002 generally controls overall operation of the electronic device 1000, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 1002 may include one or more processors 1020 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 1002 may include one or more modules that facilitate interaction between processing component 1002 and other components. For example, the processing component 1002 may include a multimedia module to facilitate interaction between the multimedia component 1008 and the processing component 1002.
The memory 1004 is configured to store various types of data to support operations at the electronic device 1000. Examples of such data include instructions for any application or method operating on the electronic device 1000, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1004 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 1006 provides power to the various components of the electronic device 1000. The power components 1006 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 1000.
The multimedia component 1008 includes a screen that provides an output interface between the electronic device 1000 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1008 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 1000 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 1010 is configured to output and/or input audio signals. For example, the audio component 1010 may include a Microphone (MIC) configured to receive external audio signals when the electronic device 1000 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 1004 or transmitted via the communication component 1016. In some embodiments, audio component 1010 also includes a speaker for outputting audio signals.
I/O interface 1012 provides an interface between processing component 1002 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1014 includes one or more sensors for providing various aspects of status assessment for the electronic device 1000. For example, the sensor assembly 1014 may detect the open/closed state of the electronic device 1000 and the relative positioning of components, such as the display and keypad of the electronic device 1000. The sensor assembly 1014 may also detect a change in position of the electronic device 1000 or a component of the electronic device 1000, the presence or absence of user contact with the electronic device 1000, the orientation or acceleration/deceleration of the electronic device 1000, and a change in temperature of the electronic device 1000. The sensor assembly 1014 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1016 is configured to facilitate wired or wireless communication between the electronic device 1000 and other devices. The electronic device 1000 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1016 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 1016 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1000 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 1004, is also provided that includes computer program instructions executable by the processor 1020 of the electronic device 1000 to perform the above-described methods.
FIG. 11 shows a block diagram of another electronic device in accordance with an embodiment of the disclosure. For example, the electronic device 1100 may be provided as a server. Referring to fig. 11, electronic device 1100 includes a processing component 1122 that further includes one or more processors and memory resources, represented by memory 1132, for storing instructions, such as application programs, that are executable by processing component 1122. The application programs stored in memory 1132 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1122 is configured to execute instructions to perform the above-described method.
The electronic device 1100 may also include a power component 1126 configured to perform power management of the electronic device 1100, a wired or wireless network interface 1150 configured to connect the electronic device 1100 to a network, and an input/output (I/O) interface 1158. The electronic device 1100 may operate based on an operating system stored in memory 1132, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1132, is also provided that includes computer program instructions executable by the processing component 1122 of the electronic device 1100 to perform the methods described above.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (15)

1. A method for deep learning network determination, the method comprising:
determining a target event to be processed by target equipment;
determining an original network from the trained network set according to the target event; the original network is a network matched with the target event in the network set;
acquiring device resource information of an original device for training the original network and device resource information of the target device;
segmenting a target network from the original network according to the device resource information of the target device and the device resource information of the original device; wherein the target network is used for being deployed on the target device to process the target event, the target network has the same number of network modules as the original network, and the connection structure between the network modules is the same.
2. The method of determining a deep learning network of claim 1, further comprising the step of training the original network:
acquiring a sample data set, wherein the sample data set comprises a plurality of input data and a label corresponding to each input data;
constructing a preset learning network, determining a plurality of sampling networks from the preset learning network, and setting initial parameters for each sampling network;
and training the plurality of sampling networks by using the sample data set, taking the trained plurality of sampling networks as a plurality of standby networks, and determining the original network from the plurality of standby networks.
3. The deep learning network determination method of claim 2, wherein the plurality of sampling networks includes a first level network, at least one second level network and a third level network; the number of the network modules of the first-level network, the at least one second-level network and the third-level network is the same, and the connection structures among the network modules are the same;
the output resolution of each network module in the first-level network is greater than the output resolution of the corresponding network module in the second-level network; the output resolution of each network module in the second-level network is greater than the output resolution of the corresponding network module in the third-level network;
the number of output channels of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the number of output channels of each network module in the second-level network is greater than that of the corresponding network module in the third-level network;
the number of convolution modules of each network module in the first-level network is greater than that of the corresponding network module in the second-level network; the number of convolution modules of each network module in the second-level network is greater than the number of convolution modules of the corresponding network module in the third-level network.
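The three-level hierarchy in claim 3 — each level shrinking the output resolution, output-channel count, and convolution-module count of every network module relative to the level above it — can be sketched as follows. All concrete resolutions, channel counts, and module counts here are illustrative assumptions, not values from the patent.

```python
# Illustrative sketch of the three-level sampling networks of claim 3.
# Every number below is an assumed example value.
LEVELS = {
    "first":  {"resolution": [224, 112, 56], "channels": [64, 128, 256], "convs": [4, 4, 4]},
    "second": {"resolution": [160,  80, 40], "channels": [48,  96, 192], "convs": [3, 3, 3]},
    "third":  {"resolution": [128,  64, 32], "channels": [32,  64, 128], "convs": [2, 2, 2]},
}

def decreases_across_levels(levels, key):
    """Check that each network module's value shrinks from one level to the next,
    as claim 3 requires for resolution, channel count, and convolution-module count."""
    order = ["first", "second", "third"]
    return all(
        a > b
        for hi, lo in zip(order, order[1:])
        for a, b in zip(levels[hi][key], levels[lo][key])
    )
```

All three levels share the same number of network modules (three lists of equal length), mirroring the claim's requirement that only per-module capacity, not topology, changes between levels.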
4. The method according to any one of claims 2 to 3, wherein the training the plurality of sampling networks by using the sample data set, and using the trained plurality of sampling networks as a plurality of backup networks comprises:
taking each sampling network of the plurality of sampling networks as a current sampling network;
processing the plurality of input data based on the current sampling network, and determining output data corresponding to each input data;
determining a loss value based on the output data and a label corresponding to the input data;
and when the loss value is less than or equal to a preset threshold value, determining the current sampling network as a standby network.
5. The method according to claim 4, wherein when the loss value is less than or equal to the preset threshold, before determining the current sampling network as a backup network, the method further comprises:
when the loss value is larger than a preset threshold value, performing back propagation based on the loss value, updating parameters of the current sampling network to obtain an updated sampling network, and determining the updated sampling network as the current sampling network;
repeating the steps: processing the plurality of input data based on the current sampling network, and determining output data corresponding to each input data; determining a loss value based on the output data and the label corresponding to the input data.
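The loop in claims 4 and 5 — forward pass, loss against the label, then backpropagation and parameter update repeated while the loss exceeds the preset threshold — can be sketched with a toy one-parameter linear model. The model, squared-error loss, learning rate, and threshold are all illustrative assumptions, not the patent's actual networks.

```python
def train_until_standby(w, data, threshold=1e-3, lr=0.1, max_iters=1000):
    """Process the input data, compute the loss against the labels, and update the
    parameter by (manual) backpropagation until the loss drops to the threshold,
    at which point the current sampling network becomes a standby network."""
    loss = float("inf")
    for _ in range(max_iters):
        # Forward pass: mean squared error of the outputs against the labels.
        loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
        if loss <= threshold:
            return w, loss, True  # loss at or below threshold: standby network
        # Backward pass: gradient of the loss w.r.t. w, then a gradient step.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w, loss, False
```

On data generated by y = 2x, the loop converges to w ≈ 2 within a handful of iterations and returns the trained parameter as a "standby" result.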
6. The method according to claim 3, wherein the training the plurality of sampling networks by using the sample data set, and using the trained plurality of sampling networks as a plurality of standby networks comprises:
taking the first level network as a current first sampling network and each of the at least one second level network and the third level network as a current second sampling network;
processing the plurality of input data based on the current first sampling network, and determining first output data corresponding to each input data; determining a first loss value based on first output data corresponding to each input data and a label corresponding to each input data;
for each of the plurality of current second sampling networks: processing the plurality of input data based on the current second sampling network, and determining second output data corresponding to each input data; determining a second loss value based on the second output data corresponding to each input data and the first output data corresponding to each input data;
and when the first loss value is less than or equal to a first preset threshold value and the second loss value is less than or equal to a second preset threshold value, determining the current first sampling network and the current second sampling network as standby networks.
7. The deep learning network determination method according to claim 6, wherein before determining the current first sampling network and the current second sampling network as standby networks when the first loss value is less than or equal to the first preset threshold and the second loss value is less than or equal to the second preset threshold, the method further comprises:
when the first loss value is larger than the first preset threshold value, performing back propagation based on the first loss value, updating parameters of the current first sampling network to obtain an updated first sampling network, and determining the updated first sampling network as the current first sampling network; when the second loss value is larger than the second preset threshold value, performing back propagation based on the second loss value, updating parameters of the current second sampling network to obtain an updated second sampling network, and determining the updated second sampling network as the current second sampling network;
repeating the steps: processing the plurality of input data based on the current first sampling network, and determining first output data corresponding to each input data; determining a first loss value based on first output data corresponding to each input data and a label corresponding to each input data; processing the plurality of input data based on the current second sampling network, and determining second output data corresponding to each input data; and determining a second loss value based on the second output data corresponding to each input data and the first output data corresponding to each input data.
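Claims 6 and 7 describe a distillation-style joint loop: the first sampling network is trained against the labels, while each second sampling network is trained against the first network's outputs, and both continue updating until each loss falls to its own threshold. A minimal sketch with two scalar "networks" (all models, losses, and hyperparameters are illustrative assumptions):

```python
def joint_train(w1, w2, data, t1=1e-3, t2=1e-3, lr=0.1, max_iters=2000):
    """Train the first sampling network (w1) against the labels and the second
    sampling network (w2) against the first network's outputs, updating whichever
    loss is still above its threshold, until both are standby networks."""
    for _ in range(max_iters):
        out1 = [w1 * x for x, _ in data]  # first network's outputs act as soft targets
        # First loss: first network's outputs vs. the labels.
        loss1 = sum((o - y) ** 2 for o, (_, y) in zip(out1, data)) / len(data)
        # Second loss: second network's outputs vs. the first network's outputs.
        loss2 = sum((w2 * x - o) ** 2 for (x, _), o in zip(data, out1)) / len(data)
        if loss1 <= t1 and loss2 <= t2:
            return w1, w2, True  # both become standby networks
        if loss1 > t1:  # back-propagate the first loss and update the first network
            g1 = sum(2 * (w1 * x - y) * x for x, y in data) / len(data)
            w1 -= lr * g1
        if loss2 > t2:  # back-propagate the second loss and update the second network
            g2 = sum(2 * (w2 * x - o) * x for (x, _), o in zip(data, out1)) / len(data)
            w2 -= lr * g2
    return w1, w2, False
```

The second network never sees the labels directly; it only tracks the first network's outputs, which is the distinguishing feature of the claim 6 loop versus the per-network loop of claims 4 and 5.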
8. The deep learning network determination method of any one of claims 1-2, wherein the original network comprises a master branch and at least one slave branch; the master branch comprises a plurality of network modules, and the network modules in each slave branch comprise some of the network modules in the master branch;
if the original network has a plurality of slave branches, each of the plurality of slave branches shares at least one network module.
9. The method according to claim 8, wherein each network module comprises at least one convolution module, and each convolution module comprises one convolution layer, one batch normalization (BN) layer, and one activation layer;
the convolution layer, the batch normalization (BN) layer, and the activation layer are connected in series;
the convolution layer is a depthwise convolution layer, a pointwise convolution layer, or a two-dimensional convolution layer.
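The convolution module of claim 9 — a convolution layer, a BN layer, and an activation layer connected in series, where the convolution may be depthwise, pointwise, or a plain 2-D convolution — can be sketched as a small descriptor builder. The descriptor fields are illustrative and are not a real framework API; in a real framework a depthwise convolution is typically a grouped convolution with `groups == in_channels` and a pointwise convolution a 1×1 convolution.

```python
def make_conv_module(kind, in_ch, out_ch):
    """Return layer descriptors for one convolution module: a convolution layer,
    a batch normalization (BN) layer, and an activation layer connected in series.
    `kind` selects between depthwise, pointwise, and plain 2-D convolution."""
    conv = {
        "depthwise": {"op": "conv2d", "kernel": 3, "groups": in_ch},  # groups == in_channels
        "pointwise": {"op": "conv2d", "kernel": 1, "groups": 1},      # 1x1 kernel
        "conv2d":    {"op": "conv2d", "kernel": 3, "groups": 1},      # plain 2-D conv
    }[kind]
    conv.update(in_channels=in_ch, out_channels=out_ch)
    # Series connection: conv -> BN -> activation, as recited in claim 9.
    return [conv, {"op": "batchnorm2d", "channels": out_ch}, {"op": "relu"}]
```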
10. The deep learning network determination method of claim 1, wherein the device resource information of the original device at least comprises: video memory capacity data, video memory performance data, network performance data, memory capacity data, and computing capability data of the original device;
the device resource information of the target device at least comprises: video memory capacity data, video memory performance data, network performance data, memory capacity data, and computing capability data of the target device.
11. The method of claim 1, wherein the determining an original network from a trained network set according to the target event comprises:
decomposing the target event, and determining a target task identifier and a target object identifier of the target event;
acquiring a task identifier and an object identifier of each network in the network set;
determining the target task identifier and a first matching degree value of each task identifier;
determining a second matching degree value of the target object identifier and each object identifier;
and determining an original network from the network set according to the first matching degree value and the second matching degree value.
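The selection in claim 11 combines a first matching degree (target task identifier vs. each network's task identifier) and a second matching degree (target object identifier vs. each network's object identifier). A minimal sketch, in which the 0/1 exact-match scoring and the simple sum are assumed placeholders for the patent's unspecified matching-degree computation:

```python
def pick_original_network(target_task, target_object, networks):
    """Score each trained network by a first matching degree (task identifier)
    and a second matching degree (object identifier), then return the network
    with the highest combined score as the original network."""
    def degree(a, b):
        # Assumed placeholder: exact identifier match scores 1.0, otherwise 0.0.
        return 1.0 if a == b else 0.0

    return max(
        networks,
        key=lambda n: degree(target_task, n["task_id"]) + degree(target_object, n["object_id"]),
    )
```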
12. The method of claim 1, wherein the segmenting the target network from the original network according to the device resource information of the target device and the device resource information of the original device comprises:
adjusting the output resolution, the number of output channels, and the number of convolution modules of each network module in the original network according to the device resource information of the target device and the device resource information of the original device by using a preset search rule;
and determining the adjusted original network as the target network.
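Claim 12 adjusts three knobs per network module (output resolution, output-channel count, convolution-module count) according to the two devices' resource information. A hypothetical sketch of one such "preset search rule"; the linear scaling by compute-capability ratio is an assumption for illustration, as the patent does not fix a concrete rule:

```python
def shrink_module(module, target_compute, original_compute):
    """Hypothetical 'preset search rule': scale a module's output resolution,
    output-channel count, and convolution-module count by the ratio of the
    target device's compute capability to the original device's (capped at 1,
    so a stronger target device keeps the original module unchanged)."""
    ratio = min(1.0, target_compute / original_compute)
    return {
        "resolution": max(1, int(module["resolution"] * ratio)),
        "channels": max(1, int(module["channels"] * ratio)),
        "convs": max(1, round(module["convs"] * ratio)),
    }
```

Applying such a rule module by module leaves the module count and connection structure intact, which matches claim 1's requirement that the segmented target network keep the original network's topology.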
13. An apparatus for deep learning network determination, the apparatus comprising:
the target event determining unit is used for determining a target event to be processed by the target equipment;
the original deep learning network determining unit is used for determining an original network from the trained network set according to the target event; the original network is a network matched with the target event in the network set;
a resource information obtaining unit, configured to obtain device resource information of an original device that trains the original network and device resource information of the target device;
the target network segmentation unit is used for segmenting a target network from the original network according to the equipment resource information of the target equipment and the equipment resource information of the original equipment; the target network is used for being deployed on the target device and processing the target event, the number of the network modules of the target network is the same as that of the original network, and the connection structures between the network modules are the same.
14. An electronic device comprising at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the at least one processor implements the deep learning network determination method of any one of claims 1 to 12 by executing the instructions stored in the memory.
15. A computer-readable storage medium, having at least one instruction or at least one program stored therein, the at least one instruction or at least one program being loaded and executed by a processor to implement the deep learning network determination method of any one of claims 1 to 12.
CN202110150032.XA 2021-02-03 2021-02-03 Deep learning network determination method and device, electronic equipment and storage medium Pending CN112836801A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202110150032.XA CN112836801A (en) 2021-02-03 2021-02-03 Deep learning network determination method and device, electronic equipment and storage medium
KR1020227009161A KR20220113919A (en) 2021-02-03 2021-06-15 Deep learning network determination method, apparatus, electronic device and storage medium
PCT/CN2021/100150 WO2022166069A1 (en) 2021-02-03 2021-06-15 Deep learning network determination method and apparatus, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110150032.XA CN112836801A (en) 2021-02-03 2021-02-03 Deep learning network determination method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112836801A (en) 2021-05-25

Family

ID=75931862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110150032.XA Pending CN112836801A (en) 2021-02-03 2021-02-03 Deep learning network determination method and device, electronic equipment and storage medium

Country Status (3)

Country Link
KR (1) KR20220113919A (en)
CN (1) CN112836801A (en)
WO (1) WO2022166069A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114528975A (en) * 2022-01-20 2022-05-24 珠高智能科技(深圳)有限公司 Deep learning model training method, system and medium
WO2022166069A1 (en) * 2021-02-03 2022-08-11 上海商汤智能科技有限公司 Deep learning network determination method and apparatus, and electronic device and storage medium
CN116883950A (en) * 2023-08-15 2023-10-13 广东省科学院广州地理研究所 Dynamic monitoring method and device for rural human living environment based on remote sensing satellite data

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN115393682A (en) * 2022-08-17 2022-11-25 龙芯中科(南京)技术有限公司 Target detection method, target detection device, electronic device, and medium
CN115638731B (en) * 2022-09-07 2023-08-15 清华大学 Super-resolution-based vibrating table test computer vision displacement measurement method
CN115527031B (en) * 2022-09-16 2024-04-12 山东科技大学 Bone marrow cell image segmentation method, computer device and readable storage medium

Citations (11)

Publication number Priority date Publication date Assignee Title
CN108898168A (en) * 2018-06-19 2018-11-27 清华大学 The compression method and system of convolutional neural networks model for target detection
WO2019120032A1 (en) * 2017-12-21 2019-06-27 Oppo广东移动通信有限公司 Model construction method, photographing method, device, storage medium, and terminal
CN110197258A (en) * 2019-05-29 2019-09-03 北京市商汤科技开发有限公司 Neural network searching method, image processing method and device, equipment and medium
CN110348572A (en) * 2019-07-09 2019-10-18 上海商汤智能科技有限公司 The processing method and processing device of neural network model, electronic equipment, storage medium
CN110929867A (en) * 2019-10-29 2020-03-27 北京小米移动软件有限公司 Method, device and storage medium for evaluating and determining neural network structure
CN111144561A (en) * 2018-11-05 2020-05-12 杭州海康威视数字技术股份有限公司 Neural network model determining method and device
CN111553821A (en) * 2020-05-13 2020-08-18 电子科技大学 Automatic problem solving method for application problems based on teacher-student network and multi-head decoder
US20200272907A1 (en) * 2019-02-22 2020-08-27 Huazhong University Of Science And Technology Deep learning heterogeneous computing method based on layer-wide memory allocation and system thereof
CN111709522A (en) * 2020-05-21 2020-09-25 哈尔滨工业大学 Deep learning target detection system based on server-embedded cooperation
CN111881966A (en) * 2020-07-20 2020-11-03 北京市商汤科技开发有限公司 Neural network training method, device, equipment and storage medium
CN111898061A (en) * 2020-07-29 2020-11-06 北京字节跳动网络技术有限公司 Method, device, electronic equipment and computer readable medium for searching network

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN109460613A (en) * 2018-11-12 2019-03-12 北京迈格威科技有限公司 Model method of cutting out and device
CN113424199A (en) * 2019-01-23 2021-09-21 谷歌有限责任公司 Composite model scaling for neural networks
US20220188633A1 (en) * 2019-03-15 2022-06-16 Interdigital Vc Holdings, Inc. Low displacement rank based deep neural network compression
CN111340222B (en) * 2020-02-25 2023-06-13 北京百度网讯科技有限公司 Neural network model searching method and device and electronic equipment
CN111667054B (en) * 2020-06-05 2023-09-01 北京百度网讯科技有限公司 Method, device, electronic equipment and storage medium for generating neural network model
CN111667056B (en) * 2020-06-05 2023-09-26 北京百度网讯科技有限公司 Method and apparatus for searching model structures
CN112836801A (en) * 2021-02-03 2021-05-25 上海商汤智能科技有限公司 Deep learning network determination method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
KR20220113919A (en) 2022-08-17
WO2022166069A1 (en) 2022-08-11

Similar Documents

Publication Publication Date Title
CN112836801A (en) Deep learning network determination method and device, electronic equipment and storage medium
CN111881956B (en) Network training method and device, target detection method and device and electronic equipment
CN110781957B (en) Image processing method and device, electronic equipment and storage medium
CN110598504B (en) Image recognition method and device, electronic equipment and storage medium
CN110633755A (en) Network training method, image processing method and device and electronic equipment
CN109145970B (en) Image-based question and answer processing method and device, electronic equipment and storage medium
CN110659690B (en) Neural network construction method and device, electronic equipment and storage medium
CN109685041B (en) Image analysis method and device, electronic equipment and storage medium
CN110781813A (en) Image recognition method and device, electronic equipment and storage medium
CN111814538B (en) Method and device for identifying category of target object, electronic equipment and storage medium
CN109903252B (en) Image processing method and device, electronic equipment and storage medium
CN109447258B (en) Neural network model optimization method and device, electronic device and storage medium
CN113705653A (en) Model generation method and device, electronic device and storage medium
CN111984765B (en) Knowledge base question-answering process relation detection method and device
CN115512116B (en) Image segmentation model optimization method and device, electronic equipment and readable storage medium
CN111046780A (en) Neural network training and image recognition method, device, equipment and storage medium
CN109635926B (en) Attention feature acquisition method and device for neural network and storage medium
CN111488964A (en) Image processing method and device and neural network training method and device
CN113486978B (en) Training method and device for text classification model, electronic equipment and storage medium
CN110659726B (en) Image processing method and device, electronic equipment and storage medium
CN114648649A (en) Face matching method and device, electronic equipment and storage medium
CN114186535A (en) Structure diagram reduction method, device, electronic equipment, medium and program product
CN110765943A (en) Network training and recognition method and device, electronic equipment and storage medium
KR20240046777A (en) Activity recognition methods and devices, electronic devices and storage media
CN111626398B (en) Operation method, device and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40049231
Country of ref document: HK