CN115618271A - Object type identification method, device, equipment and storage medium - Google Patents

Object type identification method, device, equipment and storage medium

Info

Publication number
CN115618271A
Authority
CN
China
Prior art keywords
classification model
layer
pruning
scaling
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210478922.8A
Other languages
Chinese (zh)
Other versions
CN115618271B (en)
Inventor
刘文然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210478922.8A priority Critical patent/CN115618271B/en
Publication of CN115618271A publication Critical patent/CN115618271A/en
Application granted granted Critical
Publication of CN115618271B publication Critical patent/CN115618271B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an object class identification method, apparatus, device, and storage medium, which can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent traffic, and internet of vehicles. The method includes: acquiring target data of a target object; and performing class identification processing on the target data based on an object classification model to obtain a target class label of the target object. The object classification model is obtained by performing object class recognition training on a pruning classification model based on sample data of a sample object; the pruning classification model is obtained by pruning channels to be pruned from an initial classification model, the channels to be pruned being the channels of the initial classification model whose scaling parameters have absolute values smaller than a preset threshold; the initial classification model is obtained by performing object class recognition training on a preset network based on the sample data; and the preset network comprises an attention network provided with a scaling layer. The method and apparatus reduce the computation of the object classification model, increase the calculation speed of the model, and improve the recognition speed of the object class.

Description

Object type identification method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for identifying an object class.
Background
In the related art, for a neural network containing CNN or MLP layers, unimportant parameters are usually pruned according to parameter importance in order to shrink the model. For the Attention layer, however, the parameters mainly reside in fully connected (FC) layers, and the connection pattern of these FC layers differs from that of CNNs and MLPs; if channel pruning is applied directly, the pruned network can no longer be computed correctly.
Disclosure of Invention
The application provides an object class identification method, apparatus, device, and storage medium, which can improve the speed of object class identification.
In one aspect, the present application provides an object class identification method, including:
acquiring target data of a target object;
performing class identification processing on the target data based on an object classification model to obtain a target class label of the target object; the object classification model is obtained by performing object class recognition training on a pruning classification model based on sample data of a sample object; the pruning classification model is a model obtained by pruning channels to be pruned from an initial classification model, a channel to be pruned being a channel of the initial classification model whose scaling parameter has an absolute value smaller than a preset threshold; the initial classification model is obtained by performing object class recognition training on a preset network based on the sample data; the preset network comprises an updated attention network, which is an attention network provided with a scaling layer; the scaling parameter of each channel in the initial classification model is determined based on the scaling layer; and the sample data is labeled with a sample class label of the sample object.
Another aspect provides an object class identification apparatus, including:
the target data acquisition module is used for acquiring target data of a target object;
the target class determination module is used for performing class identification processing on the target data based on an object classification model to obtain a target class label of the target object; the object classification model is obtained by performing object class recognition training on a pruning classification model based on sample data of a sample object; the pruning classification model is a model obtained by pruning channels to be pruned from an initial classification model, a channel to be pruned being a channel of the initial classification model whose scaling parameter has an absolute value smaller than a preset threshold; the initial classification model is obtained by performing object class recognition training on a preset network based on the sample data; the preset network comprises an updated attention network, which is an attention network provided with a scaling layer; the scaling parameter of each channel in the initial classification model is determined based on the scaling layer; and the sample data is labeled with a sample class label of the sample object.
Another aspect provides an object class identification device, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded by the processor and executed to implement the object class identification method as described above.
Another aspect provides a computer storage medium storing at least one instruction or at least one program, which is loaded and executed by a processor to implement the object class identification method as described above.
Another aspect provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes to implement the object class identification method as described above.
The object type identification method, device, equipment and storage medium provided by the application have the following technical effects:
the method comprises the steps of obtaining target data of a target object, and performing class identification processing on the target data based on an object classification model to obtain a target class label of the target object; the object classification model is obtained by performing object class recognition training on a pruning classification model based on sample data of a sample object; the pruning classification model is a model obtained by pruning channels to be pruned from an initial classification model, a channel to be pruned being a channel of the initial classification model whose scaling parameter has an absolute value smaller than a preset threshold; the initial classification model is obtained by performing object class recognition training on a preset network based on the sample data; the preset network comprises an updated attention network, which is an attention network provided with a scaling layer; the scaling parameter of each channel in the initial classification model is determined based on the scaling layer; and the sample data is labeled with a sample class label of the sample object. By arranging a scaling layer in the attention network of the preset network, determining the channels to be pruned from the parameters of the scaling layer, performing pruning in a model containing an attention network, and then determining the object classification model from the pruning classification model, the application reduces the computation of the object classification model, increases the calculation speed of the model, and improves the recognition speed of the object class.
Drawings
In order to more clearly illustrate the technical solutions and advantages of the embodiments of the present application or the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a schematic diagram of an object class identification system according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an object class identification method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for training an object classification model according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a method for adding a scaling layer to the original attention network to obtain the updated attention network according to an embodiment of the present application;
FIG. 5 is a schematic flow chart diagram illustrating a method for determining the object classification model based on the initial classification model according to an embodiment of the present application;
fig. 6 is a schematic flowchart of a method for pruning the channel to be pruned from the initial classification model to obtain the pruning classification model according to the embodiment of the present application;
fig. 7 is a schematic flowchart of a method for performing object class identification training on the pruning classification model based on the sample data to obtain the object classification model according to the embodiment of the present application;
fig. 8 is a schematic structural diagram of a picture classification model provided in an embodiment of the present application;
FIG. 9 is a diagram comparing the structure of Attention before and after adding a scaling layer according to an embodiment of the present application;
FIG. 10 is a comparison graph of the initial classification model according to the present application before and after channel pruning;
FIG. 11 is a diagram illustrating a structural comparison of Attention before and after the addition of an index pooling layer according to an embodiment of the present application;
FIG. 12 is a flowchart of a method for constructing an object classification model according to an embodiment of the present disclosure;
fig. 13 is a schematic structural diagram of an object class identification apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
First, some terms or terms appearing in the description of the embodiments of the present application are explained as follows:
artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Specifically, the scheme provided by the embodiment of the application relates to the field of machine learning of artificial intelligence. Machine Learning (ML) is a multi-domain cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach to make computers have intelligence, and is applied in various fields of artificial intelligence.
Intelligent transportation makes full use of new-generation information technologies such as the Internet of Things, spatial perception, cloud computing and the mobile internet across the whole transportation field, and comprehensively applies theories and tools such as transportation science, systems methodology, artificial intelligence and knowledge mining. Aiming at comprehensive perception, deep fusion, active service and scientific decision-making, it builds a real-time dynamic information service system and deeply mines transportation-related data to form problem-analysis models, thereby improving the industry's capabilities in resource allocation optimization, public decision-making, industry management and public service, making transportation safer, more efficient, more convenient, more economical, more environmentally friendly and more comfortable, and driving the transformation and upgrading of related transportation industries.
Attention: a neural network layer whose mechanism is inspired by the human visual attention mechanism. Generally, when people perceive things visually, they do not look at a scene from beginning to end every time; instead, they observe a specific part according to their needs. And when we find that something we want to observe often appears in a certain part of a scene, we learn to pay attention to that part when similar scenes reappear in the future.
Transformer: a neural network module; a model that uses the attention mechanism to increase the training speed of the model.
ImageNet dataset: a computer vision dataset created under the direction of Professor Fei-Fei Li of Stanford University. The dataset includes 14,197,122 pictures and 21,841 Synset indices. A Synset is a node in the WordNet hierarchy, which is in turn a set of synonyms. The ImageNet dataset has long been the benchmark for evaluating the performance of image classification algorithms.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic diagram of an object class identification system according to an embodiment of the present disclosure, and as shown in fig. 1, the object class identification system may at least include a server 01 and a client 02.
Specifically, in this embodiment of the application, the server 01 may include an independently operating server, or a distributed server, or a server cluster including a plurality of servers, and may also be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), and a big data and artificial intelligence platform. The server 01 may comprise a network communication unit, a processor and a memory, etc. In particular, the server 01 may be configured to train an object classification model, and determine a class label of a target object based on the object classification model.
Specifically, in this embodiment, the client 02 may include a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, a smart wearable device, a smart speaker, a vehicle-mounted terminal, a smart television, and other types of physical devices, or may include software running in the physical devices, such as a web page provided by some service providers to a user, or an application provided by the service providers to the user. Specifically, the client 02 may be used to query the category of the target object online.
An object class identification method of the present application is described below. Fig. 2 is a schematic flowchart of an object class identification method provided in an embodiment of the present application. The present specification provides the method operation steps as described in the embodiments or the flowchart, but more or fewer operation steps may be included based on conventional or non-creative labor. The order of steps recited in the embodiments is merely one of many orders in which the steps may be performed and does not represent the only order of execution. In practice, the system or server product may execute the steps sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the embodiments or the methods shown in the figures. Specifically, as shown in fig. 2, the method may be applied to the server 01 shown in fig. 1, and may include:
s201: target data of a target object is acquired.
In the embodiment of the present application, the target object may be an object in various application scenarios and fields, and may include a person, an animal, a commodity, a living article, and the like, for example, may include but is not limited to a user, a store, an address, an animal, an electronic device, and the like. The target data is data corresponding to the target object and can represent the attribute of the target object. The target data may include, but is not limited to, characters, text, images, and the like.
In this embodiment of the present application, the acquiring target data of the target object may include:
and receiving the target data sent by the terminal in response to the object class identification instruction.
In the embodiment of the application, the target data corresponding to the target object can be acquired through the terminal corresponding to the target object.
S203: performing class identification processing on the target data based on an object classification model to obtain a target class label of the target object; the object classification model is obtained by performing object class recognition training on a pruning classification model based on sample data of a sample object; the pruning classification model is a model obtained by pruning channels to be pruned from an initial classification model, a channel to be pruned being a channel of the initial classification model whose scaling parameter has an absolute value smaller than a preset threshold; the initial classification model is obtained by performing object class recognition training on a preset network based on the sample data; the preset network comprises an updated attention network, which is an attention network provided with a scaling layer; the scaling parameter of each channel in the initial classification model is determined based on the scaling layer; and the sample data is labeled with a sample class label of the sample object.
In the embodiment of the present application, the sample object and the target object are objects of the same application scenario and the same type, and the sample data and the target data are data of the same type; for example, if the target data is an image, the sample data is also an image. The preset network may be any of various networks containing an attention network (Attention); for example, the preset network may be a Transformer, or another type of network. The updated attention network is an attention network provided with a scaling (Scale) layer. The preset threshold corresponding to the scaling parameter may be set according to the actual situation, and may be set to a value close to zero.
In the embodiment of the application, an initial classification model is obtained according to preset network training, then the channels to be pruned are determined according to the scaling parameters of each channel corresponding to the scaling layer when the model converges, so that a pruning classification model containing the attention network is determined, and then the model is further trained to obtain an object classification model. The object classification model in the embodiment can be applied to different scenes to classify different objects; the object classification model of the embodiment reduces the calculation amount of Attention during network deployment, and can be applied to models using an Attention algorithm, such as text classification, picture classification, video classification and the like. For example, in an image classification scenario, various images may be classified by an object classification model; in an App scene, classifying users according to the associated data of the users and App service indexes; in the advertisement scene, whether the user is interested in a specific advertisement can be judged according to the associated data of the user and the advertisement service index.
In a specific embodiment, the Transformer is used as a basic neural network module and plays an important role in natural language processing and computer vision tasks. For example, in a picture classification task, Transformer layers can be stacked to build a picture classification network, and a classifier is finally attached to classify the pictures. As shown in fig. 8, fig. 8 is a schematic structural diagram of a picture classification model composed of multiple Transformer layers, each of which contains an improved Attention, so that the computation of the classification model can be reduced and the classification speed of the model can be increased.
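As a rough illustration of this structure, the following sketch stacks Transformer encoder layers and attaches a linear classifier (a minimal PyTorch-style sketch; the patch-embedding front end, the standard nn.TransformerEncoderLayer blocks, and all hyperparameters are placeholder assumptions, not details taken from this application):

```python
# Minimal sketch of a picture classification network built by stacking
# Transformer layers and attaching a classifier head.
# Assumptions: PyTorch, a simple patch embedding, and the standard
# nn.TransformerEncoderLayer as a stand-in for the improved Transformer block.
import torch
import torch.nn as nn

class PictureClassifierSketch(nn.Module):
    def __init__(self, num_classes=1000, dim=384, depth=12, heads=6, patch=16):
        super().__init__()
        # Split the image into patches and project each patch to `dim` channels.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)  # stacked Transformer layers
        self.classifier = nn.Linear(dim, num_classes)                 # classification head

    def forward(self, images):                 # images: (B, 3, H, W)
        x = self.patch_embed(images)           # (B, dim, H/patch, W/patch)
        x = x.flatten(2).transpose(1, 2)       # (B, num_patches, dim)
        x = self.blocks(x)                     # Transformer layers
        return self.classifier(x.mean(dim=1))  # pool tokens, then classify

logits = PictureClassifierSketch()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```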
In some embodiments, as shown in fig. 3, the training method of the object classification model includes:
s301: acquiring an original attention network;
in the embodiment of the present application, the original Attention network may be Attention.
S303: adding a scaling layer in the original attention network to obtain the updated attention network;
in the embodiment of the present application, a scaling (Scale) layer can be added to the Attention, so as to obtain an updated Attention network.
In some embodiments, as shown in fig. 4, the scaling layer includes a first scaling layer and a second scaling layer, and adding a scaling layer to the original attention network to obtain the updated attention network includes:
s3031: determining a first linear layer, a second linear layer, a first matrix multiplication layer, and a second matrix multiplication layer in the original attention network; the first linear layer is connected with the first matrix multiplication layer, and the second linear layer is connected with the second matrix multiplication layer;
in the embodiment of the present application, as shown in fig. 9, (a) in fig. 9 is a schematic structural diagram of Attention, which includes linear layers and matrix multiplication (MatMul) layers. The linear layers include a first linear layer (corresponding to the value linear transformation matrix, V) and a second linear layer (corresponding to the key linear transformation matrix, K); the matrix multiplication layers include a first matrix multiplication layer and a second matrix multiplication layer; the first linear layer is connected with the first matrix multiplication layer, and the second linear layer is connected with the second matrix multiplication layer. The original attention network further comprises a normalized exponential function (Softmax) layer and a third linear layer (corresponding to the query linear transformation matrix, Q); the third linear layer is connected with the second matrix multiplication layer, the second matrix multiplication layer is connected with the normalized exponential function layer, and the normalized exponential function layer is connected with the first matrix multiplication layer. The first linear layer is used to determine the content vector of the data, the second linear layer is used to determine the key against which content is looked up, and the third linear layer is used to determine the query vector of the data. Q is the query vector, K is the vector that is looked up (the key), and V is the content vector. Q is best suited to initiating a lookup, K to being looked up, and V to carrying the content; the three do not need to be identical, so the network maintains three separate vectors and learns the most suitable Q, K and V in order to strengthen the capability of the network.
S3033: adding the first scaling layer between the first linear layer and the first matrix multiplication layer, and adding the second scaling layer between the second linear layer and the second matrix multiplication layer, so as to obtain the updated attention network.
In the embodiment of the present application, two scaling layers may be added to the Attention to obtain the Scaled Attention to network (Scaled Attention); as shown in fig. 9, (b) of fig. 9 is a schematic structural diagram of an Attention (Scaled Attention) with a scaling layer added.
S305: constructing the preset network based on the updated attention network;
in the embodiment of the application, a preset network can be constructed according to the updated attention network; for example, a Transformer can be constructed from Scaled Attention.
S307: according to the sample data of the sample object, carrying out object class identification training on the preset network to obtain an initial classification model; the initial classification model comprises at least two channels;
in the embodiment of the application, the preset network comprises at least two channels, and the number of the channels of the initial classification model obtained by training is the same as that of the channels of the preset network.
In some embodiments, the performing, according to sample data of the sample object, object class recognition training on the preset network to obtain an initial classification model may include:
inputting sample data of the sample object into the preset network for object class identification training, and continuously adjusting parameters of the preset network in the training process until an object class label output by the preset network is matched with a labeled object class label;
and taking the preset network corresponding to the parameters when the output object class label is matched with the labeled object class label as the initial classification model.
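A schematic of this training procedure might look like the following sketch (assuming a PyTorch-style supervised loop; the cross-entropy loss, Adam optimizer, learning rate and loader names are illustrative choices, not specified by this application):

```python
# Sketch: object class recognition training of the preset network on sample data.
# `preset_network` and `sample_loader` are assumed to exist; the stopping rule
# here is a fixed epoch count rather than the label-matching criterion above.
import torch
import torch.nn as nn

def train_initial_model(preset_network, sample_loader, epochs=10, lr=1e-4):
    optimizer = torch.optim.Adam(preset_network.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for sample_data, sample_labels in sample_loader:
            logits = preset_network(sample_data)     # predicted object class labels
            loss = criterion(logits, sample_labels)  # compare with labeled sample classes
            optimizer.zero_grad()
            loss.backward()                          # adjust the network parameters
            optimizer.step()
    return preset_network                            # serves as the initial classification model
```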
S309: determining the object classification model based on the initial classification model.
In some embodiments, as shown in fig. 5, the determining the object classification model based on the initial classification model includes:
s3091: obtaining a target function corresponding to the scaling layer in the initial classification model;
s3093: determining a scaling parameter corresponding to each channel in the initial classification model according to the objective function;
in the embodiment of the present application, the scaling parameter corresponding to each channel in the initial classification model may be determined according to the coefficient in the objective function.
S3095: determining channels to be cut out based on the scaling parameters corresponding to each channel in the initial classification model; the channel to be cut is a channel of which the scaling parameter absolute value is smaller than a preset threshold value;
in this embodiment of the application, the preset threshold may be set according to an actual requirement, for example, the preset threshold may be a value close to 0.
In some embodiments, the determining a channel to be pruned based on the scaling parameter corresponding to each channel in the initial classification model may include:
determining the total quantity of channels in the initial classification model and the proportion of channels to be cut;
in the embodiment of the present application, the proportion of the channels to be pruned is the proportion of the channels to be pruned in the total number of the channels in the model, and may be set according to actual situations, for example, may be set to 10%, 20%, and the like.
Determining the number of channels to be pruned according to the total number of the channels and the proportion of the channels to be pruned;
in the embodiment of the present application, a product of the total number of channels and the proportion of the channels to be pruned may be calculated to obtain the number of the channels to be pruned.
Determining the preset threshold according to the scaling parameter corresponding to each channel in the initial classification model and the number of the channels to be cut;
and determining the identification information of the channel to be cut according to the scaling parameter corresponding to each channel in the initial classification model and the preset threshold.
In the embodiment of the application, after all Attentions in the model are converted into Scaled Attentions, the model is trained on the original task, and the obtained model is consistent with the original model in task precision. A portion of the channels is then selected for pruning according to the parameters of the Scale layer: the closer a Scale parameter is to 0, the less important the features in that channel are, so channels can be pruned according to the absolute values of the Scale parameters. For example, if 20% of the channels are preset to be pruned, the 20% of channels whose Scale parameters have absolute values closest to 0 are selected; that is, the preset threshold may be a value close to 0. The serial numbers of these channels are recorded, and they are not considered in subsequent computation. In a specific embodiment, as shown in fig. 10, (a) in fig. 10 shows the scaling parameters of 5 channels in the initial classification model, and (b) shows the scaling parameters of the remaining channels after two channels are pruned from the model.
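As an illustration of this selection rule, the sketch below picks the channels whose Scale parameters are closest to zero in absolute value, given a preset pruning proportion (PyTorch-style; the function and variable names are assumptions for illustration):

```python
# Sketch: choose channels to prune from one Scale layer's parameters.
# `scale_params` is the per-channel Scale parameter vector; `prune_ratio`
# is the preset proportion of channels to prune (e.g. 0.2 for 20%).
import torch

def channels_to_prune(scale_params: torch.Tensor, prune_ratio: float):
    total = scale_params.numel()                      # total number of channels
    num_prune = int(total * prune_ratio)              # number of channels to prune
    # The channels with the smallest |scale| are the least important; the
    # implied preset threshold is the largest magnitude among those selected.
    _, pruned = torch.topk(scale_params.abs(), num_prune, largest=False)
    kept = [i for i in range(total) if i not in set(pruned.tolist())]
    return pruned.tolist(), kept                      # pruned ids, retained ids

pruned, kept = channels_to_prune(torch.tensor([0.9, 0.02, -0.7, -0.01, 0.5]), 0.4)
print(pruned, kept)  # [3, 1] [0, 2, 4]
```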
In some embodiments, one channel in the model corresponds to a set of characteristic parameters, the sum of squares of the characteristic parameters corresponding to each channel can be calculated, the characteristic parameters of the channel include scaling parameters, and then the channel to be pruned is determined according to the sum of squares of the characteristic parameters corresponding to each channel of the model; for example, a channel whose sum of squares is smaller than a preset value may be determined as a channel to be clipped.
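A corresponding sketch for this alternative criterion (comparing each channel's sum of squared characteristic parameters against a preset value; the names and the preset value are assumptions for illustration) might be:

```python
# Sketch: prune channels whose characteristic-parameter sum of squares falls
# below a preset value. `channel_params` maps channel id -> parameter tensor.
import torch

def prune_by_sum_of_squares(channel_params: dict, preset_value: float):
    return [cid for cid, params in channel_params.items()
            if float((params ** 2).sum()) < preset_value]

example = {0: torch.tensor([0.9, 0.3]), 1: torch.tensor([0.01, 0.02])}
print(prune_by_sum_of_squares(example, 0.05))  # [1]
```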
S3097: pruning the channel to be pruned in the initial classification model to obtain the pruning classification model;
in some embodiments, as shown in fig. 6, the pruning the channel to be pruned from the initial classification model to obtain the pruning classification model includes:
s30971: determining identification information to be cut of the channel to be cut;
s30973: adding an index pooling layer after the scaling layer of the initial classification model;
in the embodiment of the application, unimportant channels in K and V are found through a Scale layer, the serial number of the channel is recorded, then an IndexPooling layer is added behind the Scale layer in the Scaled Attention, namely, the retained channels are selected through Pooling, and a new Attention is obtained, which is called IndexPoolingAttention.
In some embodiments, the index pooling layers include a first index pooling layer and a second index pooling layer, the adding an index pooling layer after the scaling layer of the initial classification model includes:
adding a first index pooling layer between the first scaling layer and the first matrix multiplication layer;
and adding a second index pooling layer between the second scaling layer and the second matrix multiplication layer.
In a specific embodiment, as shown in fig. 11, (a) in fig. 11 is a schematic structural diagram of the Attention with scaling layers added (Scaled Attention), and (b) in fig. 11 is a schematic structural diagram of the Attention with index pooling layers added (IndexPooling Attention).
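A minimal sketch of the index pooling idea follows (PyTorch-style; names are illustrative assumptions). It gathers only the retained channels by their recorded serial numbers. One detail the figures do not spell out is how Q stays aligned with the pruned K; as an assumption here, the same retained indices are applied to Q so that the Q·Kᵀ product remains well-defined:

```python
# Sketch of an IndexPooling layer placed after the Scale layers: the retained
# channels are selected by their recorded indices. Applying K's kept indices
# to Q as well is an assumption made so the matrix shapes stay consistent.
import math
import torch
import torch.nn as nn

class IndexPoolingSketch(nn.Module):
    def __init__(self, kept_indices):
        super().__init__()
        self.register_buffer("kept", torch.as_tensor(kept_indices, dtype=torch.long))

    def forward(self, x):          # x: (B, L, N)
        return x[..., self.kept]   # keep only the retained channels

def index_pooling_attention(q, k, v, kept_k, kept_v):
    pool_k, pool_v = IndexPoolingSketch(kept_k), IndexPoolingSketch(kept_v)
    q, k = pool_k(q), pool_k(k)    # assumption: Q gathered with K's retained indices
    v = pool_v(v)                  # V keeps only its own retained channels
    attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(k.size(-1)), dim=-1)
    return attn @ v                # smaller matrix multiplications than before pruning
```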
S30975: and pruning the channel to be pruned corresponding to the identification information to be pruned from the channel of the initial classification model based on the index pooling layer to obtain the pruning classification model.
In the examples of the present application, the retained channels are selected by pooling, and the new Attention obtained in this way is called IndexPooling Attention.
S3099: and carrying out object class identification training on the pruning classification model based on the sample data to obtain the object classification model.
In the embodiment of the present application, after the scaledAttentions in the model are all converted into IndexPoolingAttentions, the model is retrained.
In the embodiment of the application, because the computation of the Scale layer and the IndexPooling layer is far smaller than that of the matrix multiplications in Attention, pruning some of the channel features of K and V reduces the computation in the matrix multiplication and softmax operations, thereby reducing the overall computation and increasing the recognition speed of the model.
In some embodiments, as shown in fig. 7, the performing object class recognition training on the pruning classification model based on the sample data to obtain the object classification model includes:
s30991: acquiring initial model parameters of the initial classification model;
s30993: taking the initial model parameters as initial training parameters of the pruning classification model;
s30995: and carrying out object class identification training on the pruning classification model based on the sample data and the initial training parameters to obtain the object classification model.
In the embodiment of the application, since the initial classification model is trained to be convergent, the trained parameters in the initial classification model are loaded into the current pruning classification model and then trained, so that the convergence speed of the model can be increased, and the accuracy of the model is easily maintained.
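A sketch of this warm-start step is shown below (PyTorch-style; the function name is illustrative). Copying only the parameters whose names and shapes still match after pruning is an assumption about how the loading could be realized, not a detail specified by this application:

```python
# Sketch: initialize the pruning classification model with the trained
# parameters of the converged initial classification model, then fine-tune.
import torch.nn as nn

def warm_start(pruned_model: nn.Module, initial_model: nn.Module) -> nn.Module:
    init_state = initial_model.state_dict()
    pruned_state = pruned_model.state_dict()
    # Keep only tensors whose name and shape survive the pruning unchanged.
    compatible = {name: tensor for name, tensor in init_state.items()
                  if name in pruned_state and pruned_state[name].shape == tensor.shape}
    pruned_state.update(compatible)
    pruned_model.load_state_dict(pruned_state)  # initial training parameters
    return pruned_model  # ready for further object class recognition training
```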
In a specific embodiment, as shown in fig. 12, fig. 12 is a flowchart of a method for constructing an object classification model, including:
the Attention module in the model is added to the Scale layer first. Input with original Attention model
Figure BDA0003626828540000121
After passing through a Linear layer, obtaining
Figure BDA0003626828540000122
Figure BDA0003626828540000123
The original Attention operation can be represented by formula (1). When the input features are received by a Linear layer, they are received in a form that is flattened into a one-dimensional tensor, and then multiplied by a weight matrix. This matrix multiplication produces output signatures.
Figure BDA0003626828540000124
After passing through the Linear layer, the pair
Figure BDA0003626828540000125
And
Figure BDA0003626828540000126
adding Scale layer operation, namely multiplying each channel of N channels of the vector by a parameter, thereby obtaining the importance of each channel through the learning of the parameter in training, wherein the importance of each channel can be realizedCharacterized by the magnitude of the absolute value of this parameter. The Attention operation obtained after adding the Scale layer can be represented by formula (2), wherein S (×) represents the Scale layer.
Figure BDA0003626828540000131
Since the parameters in the Scale layer are learned during training, when the Scale parameters are all 1 the operation is equivalent to the original Attention operation; after learning, the closer a channel's Scale parameter is to 0, the lower the importance of that channel, and the channel can be pruned.
Specifically, in the embodiment of the application, compared with the unpruned model, the pruned model is optimized in terms of computation and inference speed. As shown in Table 1, DeiT is a Transformer-based image classification model and is divided into DeiT-Small, DeiT-Base, and so on according to model size. After the original DeiT network is pruned with the method of this embodiment, the pruned model reduces the computation and increases the inference speed while maintaining the accuracy of the original model on the ImageNet dataset.
TABLE 1. Comparison of the pruned DeiT models of this embodiment with the models before pruning
Model | Top-1 accuracy | Computation (GFLOPS) | Inference speed (images/s)
DeiT-Small | 79.8 | 4.6 | 930
DeiT-Base | 81.8 | 17.6 | 290
DeiT-Small after pruning | 79.5 | 3.7 | 1120
DeiT-Base after pruning | 81.3 | 14.0 | 350
According to the technical solution provided by the embodiments of the application, target data of a target object is acquired, and class identification processing is performed on the target data based on an object classification model to obtain a target class label of the target object; the object classification model is obtained by performing object class recognition training on a pruning classification model based on sample data of a sample object; the pruning classification model is a model obtained by pruning channels to be pruned from an initial classification model, a channel to be pruned being a channel of the initial classification model whose scaling parameter has an absolute value smaller than a preset threshold; the initial classification model is obtained by performing object class recognition training on a preset network based on the sample data; the preset network comprises an updated attention network, which is an attention network provided with a scaling layer; the scaling parameter of each channel in the initial classification model is determined based on the scaling layer; and the sample data is labeled with a sample class label of the sample object. By arranging a scaling layer in the attention network of the preset network, determining the channels to be pruned from the parameters of the scaling layer, performing pruning in a model containing an attention network, and then determining the object classification model from the pruning classification model, the application reduces the computation of the object classification model, increases the calculation speed of the model, and improves the recognition speed of the object class.
An embodiment of the present application further provides an object class identification apparatus, as shown in fig. 13, the apparatus includes:
a target data obtaining module 1310 for obtaining target data of a target object;
a target category determining module 1320, configured to perform category identification processing on the target data based on an object classification model to obtain a target category label of the target object; the object classification model is obtained by performing object class recognition training on a pruning classification model based on sample data of a sample object; the pruning classification model is a model obtained by pruning channels to be pruned from an initial classification model, a channel to be pruned being a channel of the initial classification model whose scaling parameter has an absolute value smaller than a preset threshold; the initial classification model is obtained by performing object class recognition training on a preset network based on the sample data; the preset network comprises an updated attention network, which is an attention network provided with a scaling layer; the scaling parameter of each channel in the initial classification model is determined based on the scaling layer; and the sample data is labeled with a sample class label of the sample object.
In some embodiments, the apparatus may further comprise:
the original attention network acquisition module is used for acquiring an original attention network;
an attention network updating module, configured to add a scaling layer to the original attention network to obtain an updated attention network;
the preset network construction module is used for constructing the preset network based on the updated attention network;
the initial classification model determining module is used for carrying out object class recognition training on the preset network according to sample data of the sample object to obtain an initial classification model; the initial classification model comprises at least two channels;
and the object classification model determining module is used for determining the object classification model based on the initial classification model.
In some embodiments, the object classification model determination module comprises:
an objective function obtaining unit, configured to obtain an objective function corresponding to the scaling layer in the initial classification model;
a scaling parameter determining unit, configured to determine, according to the objective function, a scaling parameter corresponding to each channel in the initial classification model;
a to-be-pruned channel determining unit, configured to determine a to-be-pruned channel based on a scaling parameter corresponding to each channel in the initial classification model; the channel to be cut is a channel of which the scaling parameter absolute value is smaller than a preset threshold value;
a pruning classification model determining unit, configured to prune the channel to be pruned from the initial classification model to obtain the pruning classification model;
and the object classification model determining unit is used for carrying out object class identification training on the pruning classification model based on the sample data to obtain the object classification model.
In some embodiments, the scaling layer comprises a first scaling layer and a second scaling layer, and the attention network update module may comprise:
a network layer determining unit, configured to determine a first linear layer, a second linear layer, a first matrix multiplication layer, and a second matrix multiplication layer in the original attention network; the first linear layer is connected with the first matrix multiplication layer, and the second linear layer is connected with the second matrix multiplication layer; the first linear layer is used for determining a content vector of data, and the second linear layer is used for determining a content vector query identification of the data;
a scaling layer adding unit, configured to add the first scaling layer between the first linear layer and the first matrix multiplication layer connection layer, and add the second scaling layer between the second linear layer and the second matrix multiplication layer connection layer, so as to obtain the updated attention network.
In some embodiments, the pruning classification model determination unit may include:
the identification information to be pruned determining subunit is used for determining the identification information to be pruned of the channel to be pruned;
an index pooling layer adding subunit, configured to add an index pooling layer after the scaling layer of the initial classification model;
and the channel pruning subunit is configured to prune, from the channels of the initial classification model, the channels to be pruned corresponding to the identification information to be pruned based on the index pooling layer, to obtain the pruning classification model.
In some embodiments, the index pooling layers include a first index pooling layer and a second index pooling layer, the apparatus may further include:
in some embodiments, the index pooling layer adding subunit may include:
a first adding subunit for adding a first index pooling layer between the first scaling layer and the first matrix multiplication layer;
a second adding subunit for adding a second index pooling layer between the second scaling layer and the second matrix multiplication layer.
In some embodiments, the object classification model determining unit may include:
an initial model parameter determining subunit, configured to obtain initial model parameters of the initial classification model;
an initial training parameter determining subunit, configured to use the initial model parameter as an initial training parameter of the pruning classification model;
and the object classification model determining subunit is used for performing object class identification training on the pruning classification model based on the sample data and the initial training parameters to obtain the object classification model.
The device embodiments described above and the method embodiments are based on the same inventive concept.
The embodiment of the present application provides an object class identification device, which includes a processor and a memory, where the memory stores at least one instruction or at least one program, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the object class identification method provided by the above method embodiment.
Embodiments of the present application further provide a computer storage medium, where the storage medium may be disposed in a terminal to store at least one instruction or at least one program for implementing an object class identification method in the method embodiments, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the object class identification method provided in the method embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the computer instructions to implement the object class identification method provided by the method embodiment.
Alternatively, in an embodiment of the present application, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
The memory according to the embodiments of the present application may be used to store software programs and modules, and the processor may execute various functional applications and data processing by operating the software programs and modules stored in the memory. The memory can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system, application programs needed by functions and the like; the storage data area may store data created according to use of the apparatus, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may also include a memory controller to provide the processor access to the memory.
The object class identification method provided by the embodiment of the application can be executed on a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on a server as an example, fig. 14 is a hardware structure block diagram of a server for the object class identification method provided in the embodiment of the present application. As shown in fig. 14, the server 1400 may vary greatly depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 1410 (the CPU 1410 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1430 for storing data, and one or more storage media 1420 (e.g., one or more mass storage devices) for storing application programs 1423 or data 1422. The memory 1430 and the storage medium 1420 may be transient storage or persistent storage. The program stored on the storage medium 1420 may include one or more modules, each of which may include a series of instruction operations on the server. Further, the central processor 1410 may be configured to communicate with the storage medium 1420 to execute, on the server 1400, the series of instruction operations in the storage medium 1420. The server 1400 may also include one or more power supplies 1460, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1440, and/or one or more operating systems 1421, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like.
The input/output interface 1440 may be used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the server 1400. In one example, the i/o Interface 1440 includes a Network Interface Controller (NIC) that can be connected to other Network devices via a base station to communicate with the internet. In one example, the i/o interface 1440 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
It will be understood by those skilled in the art that the structure shown in fig. 14 is only an illustration and is not intended to limit the structure of the electronic device. For example, server 1400 may also include more or fewer components than shown in FIG. 14, or have a different configuration than shown in FIG. 14.
As can be seen from the above embodiments of the object class identification method, apparatus, device and storage medium provided by the present application, the present application obtains target data of a target object and performs class identification processing on the target data based on an object classification model to obtain a target class label of the target object; the object classification model is obtained by performing object class recognition training on a pruning classification model based on sample data of a sample object; the pruning classification model is a model obtained by pruning channels to be pruned from an initial classification model, a channel to be pruned being a channel of the initial classification model whose scaling parameter has an absolute value smaller than a preset threshold; the initial classification model is obtained by performing object class recognition training on a preset network based on the sample data; the preset network comprises an updated attention network, which is an attention network provided with a scaling layer; the scaling parameter of each channel in the initial classification model is determined based on the scaling layer; and the sample data is labeled with a sample class label of the sample object. By arranging a scaling layer in the attention network of the preset network, determining the channels to be pruned from the parameters of the scaling layer, performing pruning in a model containing an attention network, and then determining the object classification model from the pruning classification model, the application reduces the computation of the object classification model, increases the calculation speed of the model, and improves the recognition speed of the object class.
It should be noted that the above order of the embodiments of the present application is for description only and does not imply any ranking of the embodiments by merit. Particular embodiments have been described above; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that of the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible and may be advantageous.
The embodiments in this specification are described in a progressive manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, device, and storage medium embodiments are substantially similar to the method embodiments, so their description is relatively brief; for relevant details, reference may be made to the description of the method embodiments.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. An object class identification method, characterized in that the method comprises:
acquiring target data of a target object;
performing class identification processing on the target data based on an object classification model to obtain a target class label of the target object; the object classification model is obtained by performing object class identification training on a pruning classification model based on sample data of a sample object, the pruning classification model is a model obtained by pruning channels to be pruned in an initial classification model, and a channel to be pruned is a channel of the initial classification model whose scaling parameter has an absolute value smaller than a preset threshold; the initial classification model is obtained by performing object class identification training on a preset network based on the sample data; the preset network comprises an updated attention network, and the updated attention network is an attention network provided with a scaling layer; a scaling parameter of each channel in the initial classification model is determined based on the scaling layer; and the sample data is labeled with a sample class label of the sample object.
2. The method of claim 1, wherein the training method of the object classification model comprises:
acquiring an original attention network;
adding a scaling layer in the original attention network to obtain the updated attention network;
constructing the preset network based on the updated attention network;
according to the sample data of the sample object, carrying out object class identification training on the preset network to obtain an initial classification model; the initial classification model comprises at least two channels;
determining the object classification model based on the initial classification model.
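As an illustration of the scaling layer referred to in claim 2, the following is a minimal sketch of a learnable per-channel scaling module, assuming a PyTorch-style implementation; the class name and the initialisation to 1 (so the layer starts as an identity) are assumptions, since the application does not give the exact parameterisation.

```python
import torch
import torch.nn as nn

class ScalingLayer(nn.Module):
    """Learnable per-channel scaling factors applied element-wise."""

    def __init__(self, num_channels: int):
        super().__init__()
        # One scaling parameter per channel, initialised to 1.
        self.scale = nn.Parameter(torch.ones(num_channels))

    def forward(self, x):          # x: (..., num_channels)
        return x * self.scale
```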
3. The method of claim 2, wherein determining the object classification model based on the initial classification model comprises:
obtaining an objective function corresponding to the scaling layer in the initial classification model;
determining a scaling parameter corresponding to each channel in the initial classification model according to the objective function;
determining channels to be pruned based on the scaling parameter corresponding to each channel in the initial classification model; wherein a channel to be pruned is a channel whose scaling parameter has an absolute value smaller than the preset threshold;
pruning the channel to be pruned in the initial classification model to obtain the pruning classification model;
and carrying out object class identification training on the pruning classification model based on the sample data to obtain the object classification model.
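The claim refers to an objective function corresponding to the scaling layer without stating its exact form. A common choice in channel pruning is an L1 sparsity penalty on the scaling parameters, which pushes unimportant channels toward zero; the sketch below uses that choice purely as an assumption, in a PyTorch-style implementation with illustrative names.

```python
import torch

def scaling_sparsity_penalty(scaling_params_list, lam: float = 1e-4):
    """L1 penalty summed over all per-channel scaling parameters.

    scaling_params_list: iterable of 1-D tensors, one per scaling layer.
    lam: regularisation strength (illustrative value).
    """
    return lam * sum(p.abs().sum() for p in scaling_params_list)

# During training (sketch), the penalty is added to the classification loss:
# loss = classification_loss + scaling_sparsity_penalty([layer1.scale, layer2.scale])
```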
4. The method of claim 2, wherein the scaling layer comprises a first scaling layer and a second scaling layer, and wherein adding a scaling layer to the original attention network to obtain the updated attention network comprises:
determining a first linear layer, a second linear layer, a first matrix multiplication layer, and a second matrix multiplication layer in the original attention network; the first linear layer is connected with the first matrix multiplication layer, and the second linear layer is connected with the second matrix multiplication layer; the first linear layer is used for determining a content vector of the data, and the second linear layer is used for determining a query identification of the content vector of the data;
and adding the first scaling layer between the first linear layer and the first matrix multiplication layer, and adding the second scaling layer between the second linear layer and the second matrix multiplication layer, to obtain the updated attention network.
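For illustration, the placement described in claim 4 can be sketched in a simplified single-head attention block, assuming a PyTorch-style implementation. Reading the "content vector" path as the value projection and the "query identification" path as the key projection is an assumption made only for this example; the scaling is applied to each projection's output before the matrix multiplication that consumes it.

```python
import torch
import torch.nn as nn

class ScaledChannelAttention(nn.Module):
    """Single-head attention with per-channel scaling inserted between the
    key/value projections and their matrix multiplications (a sketch)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)              # read as the "query identification" path
        self.v_proj = nn.Linear(dim, dim)              # read as the "content vector" path
        self.k_scale = nn.Parameter(torch.ones(dim))   # second scaling layer
        self.v_scale = nn.Parameter(torch.ones(dim))   # first scaling layer
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                              # x: (batch, seq_len, dim)
        q = self.q_proj(x)
        k = self.k_proj(x) * self.k_scale              # scale key channels before Q·Kᵀ
        v = self.v_proj(x) * self.v_scale              # scale value channels before attn·V
        attn = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return self.out_proj(attn @ v)
```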
5. The method according to claim 3, wherein pruning the channel to be pruned in the initial classification model to obtain the pruning classification model comprises:
determining identification information of the channel to be pruned;
adding an index pooling layer after the scaling layer of the initial classification model;
and pruning, based on the index pooling layer, the channel to be pruned corresponding to the identification information from the channels of the initial classification model, to obtain the pruning classification model.
6. The method of claim 5, wherein the index pooling layer comprises a first index pooling layer and a second index pooling layer, and wherein adding an index pooling layer after the scaling layer of the initial classification model comprises:
adding a first index pooling layer between the first scaling layer and the first matrix multiplication layer;
and adding a second index pooling layer between the second scaling layer and the second matrix multiplication layer.
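As an illustration of the index pooling layer in claims 5 and 6, the following sketch keeps only the channels whose identification information (indices) survives pruning, so that pruned channels never reach the subsequent matrix multiplication. The name is taken from the claims; the implementation, assuming a PyTorch-style framework, is an assumption.

```python
import torch
import torch.nn as nn

class IndexPooling(nn.Module):
    """Selects only the listed channel indices of its input along the last dimension."""

    def __init__(self, keep_idx):
        super().__init__()
        # Indices of the channels that are kept after pruning.
        self.register_buffer("keep_idx", torch.as_tensor(keep_idx, dtype=torch.long))

    def forward(self, x):               # x: (..., num_channels)
        return x[..., self.keep_idx]

# Usage sketch: inserted between a scaling layer and the matrix multiplication
# that follows it, e.g. v = pool(v_proj(x) * v_scale).
```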
7. The method of claim 3, wherein performing object class identification training on the pruning classification model based on the sample data to obtain the object classification model comprises:
acquiring initial model parameters of the initial classification model;
taking the initial model parameters as initial training parameters of the pruning classification model;
and carrying out object class identification training on the pruning classification model based on the sample data and the initial training parameters to obtain the object classification model.
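Claim 7 uses the initial model parameters as the initial training parameters of the pruning classification model. A conservative way to sketch this, assuming a PyTorch-style implementation with illustrative names, is to copy every parameter whose shape is unchanged by pruning and leave the remaining parameters to the pruned model's own initialisation.

```python
def inherit_initial_parameters(initial_state_dict, pruned_model):
    """Initialise the pruned model from the initial model's parameters.

    Parameters whose shapes still match are copied directly; parameters whose
    channel dimension was pruned keep the pruned model's own initialisation.
    """
    own_state = pruned_model.state_dict()
    compatible = {
        name: tensor
        for name, tensor in initial_state_dict.items()
        if name in own_state and own_state[name].shape == tensor.shape
    }
    own_state.update(compatible)
    pruned_model.load_state_dict(own_state)
    return pruned_model
```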
8. An object class identification apparatus, characterized in that the apparatus comprises:
the target data acquisition module is used for acquiring target data of a target object;
the target class determination module is used for performing class identification processing on the target data based on an object classification model to obtain a target class label of the target object; the object classification model is obtained by performing object class identification training on a pruning classification model based on sample data of a sample object, the pruning classification model is a model obtained by pruning channels to be pruned in an initial classification model, and the channels to be pruned are channels of the initial classification model whose scaling parameters have absolute values smaller than a preset threshold; the initial classification model is obtained by performing object class identification training on a preset network based on the sample data; the preset network comprises an updated attention network, and the updated attention network is an attention network provided with a scaling layer; a scaling parameter of each channel in the initial classification model is determined based on the scaling layer; and the sample data is labeled with a sample class label of the sample object.
9. An object class identification device, characterized in that the device comprises a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the object class identification method according to any one of claims 1 to 7.
10. A computer storage medium having stored therein at least one instruction which is loaded and executed by a processor to implement the object class identification method of any one of claims 1 to 7.
CN202210478922.8A 2022-05-05 2022-05-05 Object category identification method, device, equipment and storage medium Active CN115618271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210478922.8A CN115618271B (en) 2022-05-05 2022-05-05 Object category identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115618271A (en) 2023-01-17
CN115618271B (en) 2023-11-17

Family

ID=84856723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210478922.8A Active CN115618271B (en) 2022-05-05 2022-05-05 Object category identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115618271B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308019A (en) * 2020-11-19 2021-02-02 中国人民解放军国防科技大学 SAR ship target detection method based on network pruning and knowledge distillation
CN112668630A (en) * 2020-12-24 2021-04-16 华中师范大学 Lightweight image classification method, system and equipment based on model pruning
CN113011308A (en) * 2021-03-15 2021-06-22 山东大学 Pedestrian detection method introducing attention mechanism
CN113065558A (en) * 2021-04-21 2021-07-02 浙江工业大学 Lightweight small target detection method combined with attention mechanism
US20220036194A1 (en) * 2021-10-18 2022-02-03 Intel Corporation Deep neural network optimization system for machine learning model scaling
CN114048774A (en) * 2021-11-10 2022-02-15 厦门大学 Se-block-based resnet communication radiation source identification method and system
CN114120205A (en) * 2021-12-02 2022-03-01 云南电网有限责任公司信息中心 Target detection and image recognition method for safety belt fastening of distribution network operators
CN114332620A (en) * 2021-12-30 2022-04-12 杭州电子科技大学 Airborne image vehicle target identification method based on feature fusion and attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FELIX WU et al.: "Pay Less Attention with Lightweight and Dynamic Convolutions", arXiv, pages 1-14 *
YIMING HU et al.: "A novel channel pruning method for deep neural network compression", arXiv, pages 1-10 *
YANG Hongbing et al.: "Knowledge distillation based on pruned networks for remote sensing satellite image classification", Application Research of Computers, vol. 38, no. 8, pages 2469-2473 *
SHEN Zhuo: "Research on neural network model compression algorithms based on structured pruning", China Master's Theses Full-text Database, Information Science and Technology, no. 01, pages 138-2536 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116662814A (en) * 2023-07-28 2023-08-29 腾讯科技(深圳)有限公司 Object intention prediction method, device, computer equipment and storage medium
CN116662814B (en) * 2023-07-28 2023-10-31 腾讯科技(深圳)有限公司 Object intention prediction method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115618271B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
CN111444428B (en) Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
CN112632385B (en) Course recommendation method, course recommendation device, computer equipment and medium
US20230009814A1 (en) Method for training information recommendation model and related apparatus
CN109344314B (en) Data processing method and device and server
CN113627447B (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN107895277A (en) Method, electronic installation and the medium of push loan advertisement in the application
US20180329985A1 (en) Method and Apparatus for Compressing Topic Model
WO2021155691A1 (en) User portrait generating method and apparatus, storage medium, and device
CN112785005A (en) Multi-target task assistant decision-making method and device, computer equipment and medium
CN113641797A (en) Data processing method, device, equipment, storage medium and computer program product
CN113761220A (en) Information acquisition method, device, equipment and storage medium
CN112131261A (en) Community query method and device based on community network and computer equipment
CN115618271A (en) Object type identification method, device, equipment and storage medium
CN113111198B (en) Demonstration manuscript recommendation method based on collaborative filtering algorithm and related equipment
CN117726884A (en) Training method of object class identification model, object class identification method and device
CN113761291A (en) Processing method and device for label classification
CN116957128A (en) Service index prediction method, device, equipment and storage medium
CN110262906B (en) Interface label recommendation method and device, storage medium and electronic equipment
CN112749005B (en) Resource data processing method, device, computer equipment and storage medium
CN114780809A (en) Knowledge pushing method, device, equipment and storage medium based on reinforcement learning
CN114548242A (en) User tag identification method, device, electronic equipment and computer readable storage medium
CN115204436A (en) Method, device, equipment and medium for detecting abnormal reasons of business indexes
CN113076450B (en) Determination method and device for target recommendation list
CN115455306B (en) Push model training method, information push device and storage medium
CN113792163B (en) Multimedia recommendation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant