CN113792876A

CN113792876A - Backbone network generation method, device, equipment and storage medium

Info

Publication number: CN113792876A
Application number: CN202111088473.8A
Authority: CN
Inventors: 崔程; 郜廷权; 魏胜禹; 杜宇宁; 郭若愚; 陆彬; 周颖; 吕雪莹; 刘其文; 胡晓光; 于佃海; 马艳军
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-09-16
Filing date: 2021-09-16
Publication date: 2021-12-14
Anticipated expiration: 2041-09-16
Also published as: CN113792876B; JP7324891B2; US20220247626A1; JP2022091919A; EP4095761A1; US11929871B2

Abstract

The disclosure provides a method, an apparatus, a device, and a storage medium for generating a backbone network, and relates to the technical field of artificial intelligence, in particular to deep learning and computer vision technologies. The method comprises: acquiring a training image set, an inference image set, and an initial backbone network set; for each initial backbone network in the initial backbone network set, training the initial backbone network with the training image set and running inference with it on the inference image set to obtain the inference time and the inference accuracy of the trained backbone network during inference; determining a basic backbone network based on the inference time and the inference accuracy of each trained backbone network; and obtaining a target backbone network based on the basic backbone network and a preset target network. The method improves the inference speed and the inference accuracy of the backbone network on an Intel central processing unit (CPU).

Description

Backbone network generation method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, in particular to deep learning and computer vision technologies, and more particularly to a method, an apparatus, a device, and a storage medium for generating a backbone network.

Background

In deep-learning-based computer vision tasks such as image classification, object detection, semantic image segmentation, and metric learning, a backbone network is indispensable, and its importance as a feature extractor is self-evident. For some existing lightweight backbone networks, such as ShuffleNetV2 and MobileNetV3, the inference time on an Intel CPU (Intel central processing unit) is still not ideal, so tasks such as object detection and image segmentation cannot run in real time on an Intel CPU.

Disclosure of Invention

The disclosure provides a method, an apparatus, a device, and a storage medium for generating a backbone network.

According to a first aspect of the present disclosure, a method for generating a backbone network is provided, including: acquiring a training image set, an inference image set, and an initial backbone network set; for each initial backbone network in the initial backbone network set, training the initial backbone network with the training image set and running inference with it on the inference image set to obtain the inference time and the inference accuracy of the trained backbone network during inference; determining a basic backbone network based on the inference time and the inference accuracy of each trained backbone network; and obtaining a target backbone network based on the basic backbone network and a preset target network.

According to a second aspect of the present disclosure, there is provided an image classification method, comprising: acquiring an image to be classified; extracting features of the image to be classified by using a pre-generated backbone network to obtain image features, wherein the backbone network is generated by the method described in any implementation of the first aspect; and classifying the image features to obtain a classification result.

According to a third aspect of the present disclosure, there is provided a backbone network generation apparatus, including: a first acquisition module configured to acquire a training image set, an inference image set, and an initial backbone network set; a training module configured to, for each initial backbone network in the initial backbone network set, train the initial backbone network with the training image set and run inference with it on the inference image set to obtain the inference time and the inference accuracy of the trained backbone network during inference; a determining module configured to determine a basic backbone network based on the inference time and the inference accuracy of each trained backbone network; and an obtaining module configured to obtain a target backbone network based on the basic backbone network and a preset target network.

According to a fourth aspect of the present disclosure, there is provided an image classification apparatus including: a second obtaining module configured to obtain an image to be classified; the extraction module is configured to extract features of an image to be classified by using a pre-generated backbone network to obtain image features, wherein the backbone network is generated by the method described in any one of the implementation manners of the first aspect; and the classification module is configured to classify the image features to obtain a classification result.

According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect or the second aspect.

According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method as described in any one of the implementation manners of the first or second aspect.

According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method as described in any of the implementations of the first or second aspect.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;

FIG. 2 is a flow diagram of one embodiment of a method of generating a backbone network according to the present disclosure;

FIG. 3 is a flow diagram of another embodiment of a method of generating a backbone network according to the present disclosure;

FIG. 4 is a flow diagram of yet another embodiment of a method of generating a backbone network according to the present disclosure;

FIG. 5 is a flow diagram for one embodiment of an image classification method according to the present disclosure;

FIG. 6 is a schematic structural diagram of an embodiment of a generation apparatus of a backbone network according to the present disclosure;

FIG. 7 is a schematic structural diagram of one embodiment of an image classification apparatus according to the present disclosure;

FIG. 8 is a block diagram of an electronic device for implementing a method of generating a backbone network or a method of classifying images according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 shows an exemplary system architecture 100 to which an embodiment of a backbone network generation method or a backbone network generation apparatus of the present disclosure may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or transmit information or the like. Various client applications may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-described electronic devices, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not specifically limited herein.

The server 105 may provide various services. For example, the server 105 may analyze and process the training image set, the inference image set, and the initial backbone network set acquired from the terminal devices 101, 102, 103, and generate a processing result (e.g., a target backbone network).

The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not specifically limited herein.

It should be noted that the method for generating the backbone network provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the device for generating the backbone network is generally disposed in the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to fig. 2, a flow 200 of one embodiment of a method of generating a backbone network according to the present disclosure is shown. The method for generating the backbone network comprises the following steps:

Step 201, acquiring a training image set, an inference image set, and an initial backbone network set.

In this embodiment, the executing entity (e.g., the server 105 shown in fig. 1) of the method for generating the backbone network may acquire the training image set, the inference image set, and the initial backbone network set. The training image set is used for training the initial backbone networks in the initial backbone network set and comprises at least one image. The training image set may be an existing image set, such as the ImageNet-1k image data set, or an image set including a certain number of images collected from existing images, which is not specifically limited in this embodiment. The inference image set comprises at least one image, and the trained backbone network runs inference on the images in the inference image set. The initial backbone network set includes at least one initial backbone network, and an initial backbone network may be an existing backbone network or a backbone network obtained through training, which is not specifically limited in this embodiment.

Step 202, for each initial backbone network in the initial backbone network set, training the initial backbone network with the training image set and running inference with it on the inference image set to obtain the inference time and the inference accuracy of the trained backbone network during inference.

In this embodiment, for each initial backbone network in the initial backbone network set obtained in step 201, the executing entity may train the initial backbone network with the training image set obtained in step 201 and run inference with it on the inference image set obtained in step 201, so as to obtain the inference time and inference accuracy of the trained backbone network during inference.

For example, the executing entity may train the initial backbone network using the training image set to obtain a trained backbone network, and then use the trained backbone network to run inference on the images in the inference image set, thereby obtaining the inference time and inference accuracy of the trained backbone network. The inference image set includes at least one image. When it includes only one image, the executing entity may take the time spent inferring that image and the accuracy of the inference result as the inference time and inference accuracy of the trained backbone network. When it includes a plurality of images, the executing entity records the time spent and the accuracy of the inference result for each image, averages the inference times and the inference accuracies over all images separately, and takes the averages as the inference time and inference accuracy of the trained backbone network.
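
The per-image measurement and averaging described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the disclosure: the `predict` callable, the `(image, label)` pair format, and the use of a correct/incorrect score as the per-image accuracy are all assumptions made for the example.

```python
import time
from statistics import mean
from typing import Callable, Iterable, Tuple


def evaluate_backbone(
    predict: Callable[[object], int],
    inference_set: Iterable[Tuple[object, int]],
) -> Tuple[float, float]:
    """Return (mean per-image inference time in seconds, mean accuracy).

    `predict` is a hypothetical stand-in for a trained backbone plus
    classifier; each item of `inference_set` is an (image, label) pair.
    """
    latencies, hits = [], []
    for image, label in inference_set:
        start = time.perf_counter()
        predicted = predict(image)
        latencies.append(time.perf_counter() - start)
        # Assumption: per-image accuracy is 1.0 for a correct label, else 0.0.
        hits.append(1.0 if predicted == label else 0.0)
    if not latencies:
        raise ValueError("the inference image set must contain at least one image")
    # With a single image the means reduce to that image's own values.
    return mean(latencies), mean(hits)
```

With a single-image set the function returns that image's time and score directly, which matches the one-image case described above.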

Optionally, since the inference process is executed on the Intel CPU, MKLDNN (a deep learning acceleration library) may be enabled during inference, so as to improve the inference speed of the backbone network on the Intel CPU.

Step 203, determining a basic backbone network based on the inference time and the inference accuracy of each trained backbone network during inference.

In this embodiment, the executing entity may determine the basic backbone network based on the inference time and the inference accuracy of each trained backbone network during inference. The basic backbone network is the trained backbone network with a short inference time and a high inference accuracy, that is, the backbone network with the best overall effect in the initial backbone network set.

After step 202, the inference time and inference accuracy of the trained backbone network corresponding to each initial backbone network in the initial backbone network set are available. It can be understood that a backbone network with a shorter inference time and a higher inference accuracy is better; for the same inference time, the backbone network with the higher inference accuracy is better; and for the same inference accuracy, the backbone network with the shorter inference time is better. The executing entity can therefore determine the basic backbone network based on these rules. When different backbone networks differ in both inference time and inference accuracy, the basic backbone network can be determined based on the ratio between inference time and inference accuracy.
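
One possible reading of these rules is sketched below in Python. The sketch first discards any candidate that another candidate beats or ties on both measures, then breaks remaining ties with the accuracy-to-time ratio. The function names, the dictionary layout, and the choice of accuracy divided by time as the ratio are assumptions for illustration; the disclosure does not fix a specific formula.

```python
from typing import Dict, Tuple

Metrics = Tuple[float, float]  # (inference time, inference accuracy)


def dominates(a: Metrics, b: Metrics) -> bool:
    """True if `a` is no slower and no less accurate than `b`, and strictly better in one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def select_base_backbone(results: Dict[str, Metrics]) -> str:
    """Pick a basic backbone from {name: (time, accuracy)}; times must be positive."""
    # Rules 1-3: drop every candidate that some other candidate dominates.
    front = [
        name
        for name, metrics in results.items()
        if not any(dominates(other, metrics) for other in results.values())
    ]
    # Remaining candidates trade time against accuracy; as an assumed
    # tie-break, prefer the highest accuracy per unit of inference time.
    return max(front, key=lambda name: results[name][1] / results[name][0])
```

For example, with `{"a": (2.0, 0.70), "b": (2.0, 0.75)}` candidate `a` is discarded because `b` is equally fast and more accurate.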

Optionally, after determining the basic backbone network, the executing entity may identify the design rule underlying the basic backbone network, determine other backbone networks with a similar structure based on that design rule, and repeat step 202 and step 203 to obtain the inference time and inference accuracy of those other backbone networks. A backbone network with a better effect is then determined based on the inference time and inference accuracy, and is used as the basic backbone network.

Step 204, obtaining a target backbone network based on the basic backbone network and a preset target network.

In this embodiment, the executing entity may obtain the target backbone network based on the basic backbone network obtained in step 203 and a preset target network. The preset target network is a pre-constructed network that can further improve the inference accuracy of the basic backbone network while hardly affecting its inference time; for example, the target network may include a larger fully connected layer or a stronger activation function.

Since the basic backbone network obtained in step 203 already strikes a good balance between inference time and inference accuracy, in this step the executing entity acquires the preset target network and adds it to the basic backbone network obtained in step 203 to obtain the target backbone network, thereby further improving the inference accuracy of the target backbone network.

In the method for generating a backbone network provided by this embodiment, a training image set, an inference image set, and an initial backbone network set are first acquired; then, for each initial backbone network in the initial backbone network set, the initial backbone network is trained with the training image set and used to run inference on the inference image set, to obtain the inference time and inference accuracy of the trained backbone network; next, a basic backbone network is determined based on the inference time and inference accuracy of each trained backbone network; and finally, the target backbone network is obtained based on the basic backbone network and a preset target network. Because the method in this embodiment is based on the Intel CPU, the resulting target backbone network has higher inference accuracy and faster inference speed on the Intel CPU; in addition, the migration cost of the target backbone network is low, which makes migration more convenient.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.

With continued reference to fig. 3, fig. 3 illustrates a flow 300 of another embodiment of a method of generating a backbone network according to the present disclosure. The method for generating the backbone network comprises the following steps:

Step 301, acquiring a training image set, an inference image set, and an initial backbone network set.

In this embodiment, the executing entity (e.g., the server 105 shown in fig. 1) of the method for generating the backbone network may acquire a training image set, an inference image set, and an initial backbone network set. Step 301 is substantially the same as step 201 in the foregoing embodiment; for the specific implementation, refer to the foregoing description of step 201, which is not repeated here.

In some optional implementations of this embodiment, the initial set of backbone networks includes at least one initial backbone network; and the initial backbone network is obtained by the following steps: acquiring network modules of each lightweight backbone network to obtain a network module set; and randomly combining the network modules in the network module set to obtain an initial backbone network.

In this implementation, the network modules (blocks) of each existing lightweight backbone network may first be obtained, so as to obtain a network module set including at least one block. For example, the network module set may include one or more of the following: DepthSepConv (depthwise separable convolution module), Channel-Shuffle block, Inverted residual block, Ghost block, and Fire block, where DepthSepConv is the block used by the backbone network MobileNetV1, the Channel-Shuffle block is the block used by the backbone networks ShuffleNetV1/V2, the Inverted residual block is the block used by the backbone networks MobileNetV2/V3, the Ghost block is the block used by the backbone network GhostNet, and the Fire block is the block used by SqueezeNet.

Then, the executing entity may randomly combine the network modules in the network module set to obtain at least one combined initial backbone network, and the at least one initial backbone network forms the initial backbone network set. Randomly combining the network modules means the resulting initial backbone networks are not limited to any one structure, which enriches the variety of initial backbone network structures.

It should be noted that any two network modules in the network module set may be combined, and any three network modules in the network module set may also be combined, and the number of blocks used in the random combination is not limited in this embodiment.
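
The random combination described above can be sketched in Python as follows. Each candidate backbone is represented only as an ordered list of block names; the depth range, the seed parameter, and the function name are assumptions made for this illustration, since the embodiment does not limit how many blocks are combined.

```python
import random
from typing import List, Optional

# Block names taken from the module set listed above.
BLOCK_TYPES = ["DepthSepConv", "ChannelShuffle", "InvertedResidual", "Ghost", "Fire"]


def sample_initial_backbones(
    num_networks: int,
    min_depth: int = 2,
    max_depth: int = 6,
    seed: Optional[int] = None,
) -> List[List[str]]:
    """Return `num_networks` candidate backbones as ordered lists of block names."""
    rng = random.Random(seed)  # local generator so sampling is reproducible
    backbones = []
    for _ in range(num_networks):
        depth = rng.randint(min_depth, max_depth)  # inclusive on both ends
        backbones.append([rng.choice(BLOCK_TYPES) for _ in range(depth)])
    return backbones
```

Each sampled list would then be instantiated as an actual network and passed through the training and inference steps that follow.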

Step 302, for each initial backbone network in the initial backbone network set, training the initial backbone network by using the training image set to obtain a trained backbone network.

In this embodiment, for each initial backbone network in the initial backbone network set, the executing entity (for example, the server 105 shown in fig. 1) of the backbone network generation method may train the initial backbone network by using the training image set obtained in step 301, so as to obtain a trained backbone network. Preferably, the executing entity trains the initial backbone network by using the ImageNet-1k image dataset to obtain the trained backbone network.

Step 303, converting the trained backbone network into an inference network, and running inference on the inference image set by using the inference network to obtain the inference time and the inference accuracy of the inference network during inference.

In this embodiment, the executing entity may convert the backbone network trained in step 302 into an inference network; the conversion may be implemented by using the prior art and is not described here. The executing entity then runs inference on the inference image set by using the obtained inference network to obtain the inference time and the inference accuracy for each image in the inference image set, averages the inference times and the inference accuracies over all images separately, and takes the averages as the inference time and the inference accuracy of the inference network. The inference time and inference accuracy obtained in this way therefore represent the average level of the inference network during inference.

Step 304, plotting the inference time and the inference accuracy of each inference network during inference as a point in a two-dimensional coordinate system.

In this embodiment, the executing entity may take the inference time as the abscissa and the inference accuracy as the ordinate, and plot the inference time and inference accuracy of each inference network as one point in a two-dimensional coordinate system, so as to obtain a two-dimensional coordinate system containing one point per inference network.

Step 305, determining a target point from the points in the two-dimensional coordinate system, and determining the initial backbone network corresponding to the target point as the basic backbone network.

In this embodiment, the executing entity may determine a target point from the points in the two-dimensional coordinate system, and determine the initial backbone network corresponding to the target point as the basic backbone network. It can be understood that the closer a point lies to the upper left of the two-dimensional coordinate system, the shorter the inference time and the higher the inference accuracy of the corresponding initial backbone network, and therefore the better that initial backbone network. In this embodiment, the point closest to the upper left of the two-dimensional coordinate system is therefore used as the target point, and the initial backbone network corresponding to the target point is determined as the basic backbone network, so that the obtained basic backbone network has a higher inference accuracy and a shorter inference time.
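
The embodiment does not define how closeness to the upper left is measured. As one hedged illustration, the Python sketch below rescales both axes to the range [0, 1] and picks the point with the smallest Euclidean distance to the ideal corner (shortest time, highest accuracy). The normalization, the distance measure, and the function name are all assumptions.

```python
import math
from typing import Dict, Tuple


def pick_upper_left(points: Dict[str, Tuple[float, float]]) -> str:
    """Return the name whose (time, accuracy) point is nearest the upper-left corner."""
    times = [t for t, _ in points.values()]
    accs = [a for _, a in points.values()]
    t_min, a_min = min(times), min(accs)
    # Fall back to a span of 1.0 when all values on an axis are equal.
    t_span = (max(times) - t_min) or 1.0
    a_span = (max(accs) - a_min) or 1.0

    def distance(name: str) -> float:
        t, a = points[name]
        x = (t - t_min) / t_span  # 0.0 is the fastest candidate
        y = (a - a_min) / a_span  # 1.0 is the most accurate candidate
        return math.hypot(x - 0.0, y - 1.0)

    return min(points, key=distance)
```

Rescaling matters here because inference time (for example, milliseconds) and accuracy (a fraction) have very different ranges; without it, one axis would dominate the distance.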

Step 306, obtain the target network.

In this embodiment, the executing entity may acquire a target network, where the target network is a pre-constructed network that can further improve the inference accuracy of the basic backbone network while hardly affecting its inference time. The target network comprises at least one of the following: an activation function, a fully connected layer. As an example, the activation function may be the h-swish activation function, a stronger activation function that yields better results; alternatively, a larger fully connected layer may be used.
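
For reference, h-swish is commonly defined as h-swish(x) = x * ReLU6(x + 3) / 6, where ReLU6 clips its input to the range [0, 6]. A minimal scalar sketch in Python follows; the function names are illustrative and are not taken from the disclosure.

```python
def relu6(x: float) -> float:
    """Clip x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def h_swish(x: float) -> float:
    """Hard-swish: a piecewise approximation of x * sigmoid(x)."""
    return x * relu6(x + 3.0) / 6.0
```

The function is zero for inputs at or below -3 and equals the identity for inputs at or above 3, and it needs only a clip, a multiply, and a divide, with no exponential.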

Step 307, adding the target network to the basic backbone network to obtain the target backbone network.

In this embodiment, the executing entity may add the target network obtained in step 306 to the basic backbone network, so as to obtain the target backbone network. For example, a larger fully connected layer is added to the end of the basic backbone network to obtain the target backbone network. In this way, the inference accuracy of the target backbone network is further improved while its inference time is kept essentially unchanged.

As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, in the method for generating a backbone network in this embodiment, a training image set, an inference image set, and an initial backbone network set are first acquired, and each initial backbone network in the initial backbone network set is trained with the training image set to obtain a trained backbone network; next, the trained backbone network is converted into an inference network, and the inference network is used to run inference on the inference image set to obtain the inference time and the inference accuracy of the inference network; the inference time and inference accuracy of each inference network are then plotted as a point in a two-dimensional coordinate system, a target point is determined from the points in the two-dimensional coordinate system, and the initial backbone network corresponding to the target point is determined as the basic backbone network; finally, a target network is acquired and added to the basic backbone network to obtain the target backbone network. The method in this embodiment further improves the inference accuracy of the target backbone network while keeping its inference time on the Intel CPU essentially unchanged.

With continuing reference to fig. 4, fig. 4 illustrates a flow 400 of yet another embodiment of a method of generating a backbone network according to the present disclosure. The method for generating the backbone network comprises the following steps:

Step 401, acquiring a training image set, an inference image set, and an initial backbone network set.

Step 402, for each initial backbone network in the initial backbone network set, training the initial backbone network by using the training image set to obtain a trained backbone network.

Step 403, converting the trained backbone network into an inference network, and running inference on the inference image set by using the inference network to obtain the inference time and the inference accuracy of the inference network during inference.

Step 404, plotting the inference time and the inference accuracy of each inference network during inference as a point in a two-dimensional coordinate system.

Step 405, determining a target point from each point in the two-dimensional coordinate system, and determining an initial backbone network corresponding to the target point as a basic backbone network.

Step 406, obtain the target network.

Step 407, adding the target network to the basic backbone network to obtain the target backbone network.

The steps 401-.

Step 408, updating the convolution kernel size of the target backbone network.

In this embodiment, the executing entity (for example, the server 105 shown in fig. 1) of the backbone network generation method may update the convolution kernel size of the target backbone network, that is, change the convolution kernel size (kernel-size) of the target backbone network to a preset size larger than the current convolution kernel size. The preset size may be set according to the specific situation and is not limited in this embodiment. Updating the convolution kernel size of the target backbone network further improves its inference accuracy.
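
As a rough illustration of such an update, the Python sketch below treats the backbone as a list of convolution layer descriptions and raises every kernel size that is smaller than the preset size. The `ConvSpec` class, its fields, the restriction to odd sizes, and the padding rule are hypothetical and introduced only for this example; the embodiment does not state which layers are updated.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ConvSpec:
    """Hypothetical description of one convolution layer."""
    in_channels: int
    out_channels: int
    kernel_size: int
    stride: int = 1

    @property
    def padding(self) -> int:
        # Assumption: padding of kernel_size // 2 keeps the spatial size for odd kernels.
        return self.kernel_size // 2


def enlarge_kernels(layers: List[ConvSpec], new_size: int) -> List[ConvSpec]:
    """Return a copy of `layers` with every smaller kernel raised to `new_size`."""
    if new_size % 2 == 0:
        raise ValueError("this sketch assumes an odd preset kernel size")
    return [
        replace(layer, kernel_size=new_size) if layer.kernel_size < new_size else layer
        for layer in layers
    ]
```

Note that this simple rule would also enlarge 1x1 pointwise convolutions; a practical variant would likely restrict the update to selected layers, which is a design choice left open here.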

Step 409, adding the SE module to a preset target adding position in the target backbone network to obtain a final backbone network.

In this embodiment, the executing entity may add an SE (Squeeze-and-Excitation) module to a preset target addition position in the target backbone network, so as to obtain a final backbone network. The SE module learns the correlation among channels and derives channel-wise attention from it, which can further improve the accuracy of the network model, and the module can be inserted into an existing network model framework. In this embodiment, the SE module is inserted into the target backbone network obtained in step 408 to obtain the final backbone network, so that the inference accuracy of the final backbone network is further improved.
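
To make the channel-attention idea concrete, the Python sketch below applies the usual squeeze, excitation, and scale stages to a small feature map held as nested lists. It is a simplified illustration under stated assumptions: biases are omitted, the weights are passed in by the caller rather than learned, and the function and parameter names are not from the disclosure.

```python
import math
from typing import List

Matrix = List[List[float]]


def _dense(weights: Matrix, vec: List[float]) -> List[float]:
    """Matrix-vector product; each row of `weights` yields one output value."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def squeeze_excite(
    feature_map: List[Matrix], w_reduce: Matrix, w_expand: Matrix
) -> List[Matrix]:
    """Rescale each channel of `feature_map` (channels x height x width)."""
    # Squeeze: global average pooling gives one value per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: reduce -> ReLU -> expand -> sigmoid gives one gate per channel.
    hidden = [max(0.0, h) for h in _dense(w_reduce, squeezed)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in _dense(w_expand, hidden)]
    # Scale: multiply every value of a channel by that channel's gate.
    return [
        [[gate * v for v in row] for row in ch]
        for gate, ch in zip(gates, feature_map)
    ]
```

Because the gates depend on all channels through the two dense layers, each channel is reweighted according to its relation to the others, which is the channel correlation the text refers to.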

In some optional implementations of this embodiment, the target adding position is determined by: adding the SE module to different positions in the target backbone network to obtain a corresponding first backbone network set; for each first backbone network in the first backbone network set, performing inference on the inference image set by using the first backbone network to obtain the inference time consumption and the inference precision of the first backbone network in the inference process; and determining the target adding position based on the inference time consumption and the inference precision of each first backbone network in the inference process.

In this implementation, the SE module is first added to different positions of the target backbone network, so as to obtain a plurality of corresponding first backbone networks, which form a first backbone network set. Then, for each first backbone network in the first backbone network set, inference is performed on the inference images in the inference image set by using the first backbone network, thereby obtaining the inference time consumption and the inference precision of the first backbone network in the inference process. Finally, the first backbone network with the optimal effect is determined based on the inference time consumption and the inference precision of each first backbone network in the inference process; the specific determination process may refer to the foregoing embodiments and is not described herein again. The adding position of the SE module in the optimal first backbone network is the target adding position. Determining the target adding position of the SE module according to the inference time consumption and the inference precision of each first backbone network improves the inference precision of the final backbone network to which the SE module is added.
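The search over candidate adding positions can be sketched as below. The selection rule used here (highest inference precision among candidates that stay within a latency budget) is an assumption chosen for illustration, since the determination process itself is defined in the foregoing embodiments; the position names and the measured values are invented placeholders, not results.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Position = TypeVar("Position")
Metrics = Tuple[float, float]  # (inference time in ms, inference precision)


def choose_se_position(
    positions: Iterable[Position],
    evaluate: Callable[[Position], Metrics],
    max_latency_ms: float,
) -> Position:
    """Evaluate the network obtained for each candidate position and return
    the position whose network is most precise within the latency budget."""
    results = [(pos, *evaluate(pos)) for pos in positions]
    feasible = [r for r in results if r[1] <= max_latency_ms]
    if not feasible:
        raise ValueError("no candidate position meets the latency budget")
    # Highest precision wins; ties are broken by lower inference time.
    best = max(feasible, key=lambda r: (r[2], -r[1]))
    return best[0]


# Placeholder measurements (invented for illustration only); in practice
# `evaluate` would build the first backbone network for the given position
# and run it on the inference image set.
measured = {
    "stage_1": (2.10, 0.712),
    "stage_3": (2.05, 0.718),
    "last_stage": (2.02, 0.721),
    "all_stages": (2.60, 0.724),
}
target_position = choose_se_position(measured, measured.__getitem__,
                                     max_latency_ms=2.2)
```

Passing `evaluate` as a callable keeps the search independent of how the candidate networks are built and measured.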

As can be seen from fig. 4, compared with the embodiment corresponding to fig. 3, the method for generating the backbone network in this embodiment highlights the steps of updating the convolution kernel size of the target backbone network and adding the SE module to the target backbone network, thereby obtaining a final backbone network and further improving the inference precision of the final backbone network.

With continued reference to fig. 5, a flow 500 of one embodiment of an image classification method according to the present disclosure is shown. The image classification method comprises the following steps:

Step 501, obtaining an image to be classified.

In this embodiment, an executing subject (for example, the server 105 shown in fig. 1) of the image classification method may acquire an image to be classified, where the image to be classified may be selected and uploaded from existing images by a user, or may be captured by a camera of a terminal device by the user, and the image to be classified may be an image including any person or thing, which is not specifically limited in this embodiment.

Step 502, extracting the features of the image to be classified by using a pre-generated backbone network to obtain the image features.

In this embodiment, the executing entity may extract features of the image to be classified by using a backbone network trained in advance to obtain image features, where the backbone network may be obtained by the method described in the foregoing embodiment. Specifically, the executing body may input the image to be classified acquired in step 501 into a backbone network generated in advance, so that the backbone network extracts features of the image to be classified, thereby obtaining image features corresponding to the image to be classified.

Step 503, classifying the image features to obtain a classification result.

In this embodiment, the executing entity may classify the image features obtained in step 502, so as to obtain a final classification result. Specifically, based on the image features of each dimension extracted by the backbone network, the executing entity may assign a classification label to the image feature of each dimension, and obtain the final classification result based on the classification labels.
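As an illustration of step 503, a classification head that maps the extracted image features to a label might look as follows. This disclosure does not specify the classifier; the linear layer followed by softmax used here is a common substitute chosen for the sketch, and the weights, biases and labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    labels: Sequence[str],
) -> Tuple[str, float]:
    """Return the most probable label and its probability.

    `weights` has one row per class; each row has len(features) entries.
    """
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical two-dimensional image feature and two-class head.
label, confidence = classify(
    features=[1.0, 0.0],
    weights=[[2.0, 0.0], [0.0, 2.0]],
    biases=[0.0, 0.0],
    labels=["cat", "dog"],
)
```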

In the image classification method provided by the embodiment of the disclosure, an image to be classified is first obtained; then, the features of the image to be classified are extracted by using a pre-generated backbone network to obtain image features; and finally, the image features are classified to obtain a classification result. Because the features of the image to be classified are extracted by using the pre-generated backbone network, the speed and accuracy of feature extraction are improved, and the accuracy of the final classification result is improved accordingly.

With further reference to fig. 6, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of an apparatus for generating a backbone network, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.

As shown in fig. 6, the backbone network generation apparatus 600 of the present embodiment includes: a first obtaining module 601, a training module 602, a determining module 603 and an obtaining module 604. The first obtaining module 601 is configured to obtain a training image set, an inference image set, and an initial backbone network set; the training module 602 is configured to, for each initial backbone network in the initial backbone network set, train the initial backbone network and perform inference with it by using the training image set and the inference image set, so as to obtain the inference time consumption and the inference precision of the trained backbone network in the inference process; the determining module 603 is configured to determine a basic backbone network based on the inference time consumption and the inference precision of each trained backbone network in the inference process; and the obtaining module 604 is configured to obtain a target backbone network based on the basic backbone network and a preset target network.

In this embodiment, for the specific processing of the first obtaining module 601, the training module 602, the determining module 603 and the obtaining module 604 in the backbone network generation apparatus 600, and the technical effects thereof, reference may be made to the related descriptions of steps 201 to 204 in the embodiment corresponding to fig. 2, which are not repeated herein.

In some optional implementations of this embodiment, the training module includes: a training submodule configured to train the initial backbone network by using the training image set to obtain a trained backbone network; and an inference submodule configured to convert the trained backbone network into an inference network, and perform inference on the inference image set by using the inference network to obtain the inference time consumption and the inference precision of the inference network in the inference process.
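The measurement performed by the inference submodule can be sketched as follows, assuming that the trained backbone network has already been converted into an inference network exposed as a `predict` callable. The function name `benchmark`, the use of mean time per image and top-1 accuracy as the two metrics, and the injectable `clock` parameter are assumptions made for this illustration.

```python
import time
from typing import Any, Callable, Iterable, Tuple


def benchmark(
    predict: Callable[[Any], Any],
    samples: Iterable[Tuple[Any, Any]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Run `predict` on every (image, label) pair and return
    (mean inference time per image in seconds, top-1 accuracy)."""
    elapsed = 0.0
    correct = 0
    count = 0
    for image, label in samples:
        start = clock()
        predicted = predict(image)
        elapsed += clock() - start      # only the forward pass is timed
        correct += int(predicted == label)
        count += 1
    if count == 0:
        raise ValueError("the inference image set is empty")
    return elapsed / count, correct / count


# Toy stand-in for an inference network: "images" are integers and the
# predicted class is their parity.
mean_seconds, accuracy = benchmark(lambda x: x % 2, [(1, 1), (2, 0), (3, 0)])
```

Timing only the call to `predict` keeps data loading out of the measurement, so the reported value reflects the network itself.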

In some optional implementations of this embodiment, the determining module includes: a plotting submodule configured to plot the inference time consumption and the inference precision of each inference network in the inference process as points in a two-dimensional coordinate system; and a determining submodule configured to determine a target point from the points in the two-dimensional coordinate system, and determine the initial backbone network corresponding to the target point as a basic backbone network.
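One possible way to narrow the points down to candidate target points is to keep only those that are not dominated in both inference time and inference precision (a Pareto front). This criterion is an assumption for illustration and is not stated in this passage; the network names and coordinates below are invented placeholders.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (inference time, inference precision)


def pareto_front(points: Dict[str, Point]) -> List[str]:
    """Return the names of the networks that no other network dominates.

    A network is dominated when another network is at least as fast and at
    least as precise, and strictly better in one of the two.
    """
    front = []
    for name, (t, p) in points.items():
        dominated = any(
            (t2 <= t and p2 >= p) and (t2 < t or p2 > p)
            for other, (t2, p2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    # Sort by inference time so the trade-off curve reads left to right.
    return sorted(front, key=lambda n: points[n][0])


# Placeholder points (invented for illustration only).
candidates = {
    "net_a": (1.0, 0.70),
    "net_b": (2.0, 0.75),
    "net_c": (2.5, 0.72),
    "net_d": (1.0, 0.65),
}
front = pareto_front(candidates)
```

The target point can then be picked from `front` according to the speed and precision requirements of the deployment.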

In some optional implementations of this embodiment, the obtaining module includes: an acquisition submodule configured to acquire a target network, wherein the target network comprises at least one of: an activation function, a fully connected layer; and an obtaining submodule configured to add the target network to the basic backbone network to obtain the target backbone network.

In some optional implementations of this embodiment, the above backbone network generation apparatus 600 further includes: an update module configured to update a convolution kernel size of the target backbone network.

In some optional implementations of this embodiment, the above backbone network generation apparatus 600 further includes: and the adding module is configured to add the SE module to a preset target adding position in the target backbone network to obtain a final backbone network.

In some optional implementations of this embodiment, the target adding position is determined by: adding the SE module to different positions in the target backbone network to obtain a corresponding first backbone network set; for each first backbone network in the first backbone network set, performing inference on the inference image set by using the first backbone network to obtain the inference time consumption and the inference precision of the first backbone network in the inference process; and determining the target adding position based on the inference time consumption and the inference precision of each first backbone network in the inference process.

With further reference to fig. 7, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an image classification apparatus, which corresponds to the method embodiment shown in fig. 5, and which is particularly applicable to various electronic devices.

As shown in fig. 7, the image classification apparatus 700 of the present embodiment includes: a second obtaining module 701, an extracting module 702 and a classifying module 703. The second obtaining module 701 is configured to obtain an image to be classified; an extraction module 702 configured to extract features of an image to be classified by using a pre-generated backbone network to obtain image features; the classification module 703 is configured to classify the image features to obtain a classification result.

In the present embodiment, for the specific processing of the second obtaining module 701, the extracting module 702 and the classifying module 703 in the image classification apparatus 700, and the technical effects thereof, reference may be made to the related descriptions of steps 501 to 503 in the embodiment corresponding to fig. 5, which are not repeated herein.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 executes the respective methods and processes described above, such as the backbone network generation method or the image classification method. For example, in some embodiments, the backbone network generation method or the image classification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the above-described backbone network generation method or image classification method may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the backbone network generation method or the image classification method by any other suitable means (e.g. by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method for generating a backbone network comprises the following steps:

acquiring a training image set, an inference image set and an initial backbone network set;

for each initial backbone network in the initial backbone network set, training the initial backbone network and performing inference with it by using the training image set and the inference image set to obtain the inference time consumption and the inference precision of the trained backbone network in the inference process;

determining a basic backbone network based on the inference time consumption and the inference precision of each trained backbone network in the inference process;

and obtaining a target backbone network based on the basic backbone network and a preset target network.

2. The method of claim 1, wherein the training the initial backbone network and performing inference with it by using the training image set and the inference image set to obtain the inference time consumption and the inference precision of the trained backbone network in the inference process comprises:

training the initial backbone network by using the training image set to obtain a trained backbone network;

and converting the trained backbone network into an inference network, and performing inference on the inference image set by using the inference network to obtain the inference time consumption and the inference precision of the inference network in the inference process.

3. The method of claim 2, wherein the determining a basic backbone network based on the inference time consumption and the inference precision of each trained backbone network in the inference process comprises:

drawing the inference time consumption and the inference precision of each inference network in the inference process as points in a two-dimensional coordinate system;

and determining a target point from each point in the two-dimensional coordinate system, and determining an initial backbone network corresponding to the target point as a basic backbone network.

4. The method of claim 1, wherein the deriving a target backbone network based on the base backbone network and a preset target network comprises:

obtaining a target network, wherein the target network comprises at least one of: an activation function, a fully connected layer;

and adding the target network to the basic backbone network to obtain a target backbone network.

5. The method of any of claims 1-4, further comprising:

and updating the convolution kernel size of the target backbone network.

6. The method of any of claims 1-5, further comprising:

and adding the SE module to a preset target adding position in the target backbone network to obtain a final backbone network.

7. The method of claim 6, wherein the target addition location is determined by:

adding an SE module to different positions in the target backbone network to obtain a corresponding first backbone network set;

for each first backbone network in the first backbone network set, performing inference on the inference image set by using the first backbone network to obtain the inference time consumption and the inference precision of the first backbone network in the inference process;

and determining the target adding position based on the inference time consumption and the inference precision of each first backbone network in the inference process.

8. The method of claim 1, wherein the initial set of backbone networks comprises at least one initial backbone network; and

the initial backbone network is obtained by the following steps:

acquiring network modules of each lightweight backbone network to obtain a network module set;

and randomly combining the network modules in the network module set to obtain the initial backbone network.

9. An image classification method, comprising:

acquiring an image to be classified;

extracting features of the image to be classified by using a pre-generated backbone network to obtain image features, wherein the backbone network is generated by the method of any one of claims 1-8;

and classifying the image features to obtain a classification result.

10. An apparatus for generating a backbone network, comprising:

a first acquisition module configured to acquire a training image set, an inference image set, and an initial backbone network set;

a training module configured to, for each initial backbone network in the initial backbone network set, train the initial backbone network and perform inference with it by using the training image set and the inference image set, so as to obtain the inference time consumption and the inference precision of the trained backbone network in the inference process;

the determining module is configured to determine a basic backbone network based on the inference time consumption and the inference precision of each trained backbone network in the inference process;

an obtaining module configured to obtain a target backbone network based on the basic backbone network and a preset target network.

11. The apparatus of claim 10, wherein the training module comprises:

a training submodule configured to train the initial backbone network using the training image set, to obtain a trained backbone network;

and an inference submodule configured to convert the trained backbone network into an inference network, and perform inference on the inference image set by using the inference network to obtain the inference time consumption and the inference precision of the inference network in the inference process.

12. The apparatus of claim 11, wherein the determining module comprises:

a plotting submodule configured to plot the inference time consumption and the inference precision of each inference network in the inference process as points in a two-dimensional coordinate system;

and the determining submodule is configured to determine a target point from each point in the two-dimensional coordinate system, and determine an initial backbone network corresponding to the target point as a basic backbone network.

13. The apparatus of claim 10, wherein the obtaining module comprises:

an acquisition submodule configured to acquire a target network, wherein the target network comprises at least one of: an activation function, a fully connected layer;

an obtaining submodule configured to add the target network to the basic backbone network to obtain a target backbone network.

14. The apparatus of any of claims 10-13, further comprising:

an update module configured to update a convolution kernel size of the target backbone network.

15. The apparatus of any of claims 10-14, further comprising:

and the adding module is configured to add the SE module to a preset target adding position in the target backbone network to obtain a final backbone network.

16. The apparatus of claim 15, wherein the target addition location is determined by:

17. The apparatus of claim 10, wherein the initial set of backbone networks comprises at least one initial backbone network; and

the initial backbone network is obtained by the following steps:

18. An image classification apparatus comprising:

a second obtaining module configured to obtain an image to be classified;

an extraction module configured to extract features of the image to be classified by using a pre-generated backbone network to obtain image features, wherein the backbone network is generated by the method according to any one of claims 1 to 8;

and the classification module is configured to classify the image features to obtain a classification result.

19. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.

20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.

21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.