CN114463556B

CN114463556B - Equal-variation network training method and device, and image recognition method and device

Info

Publication number: CN114463556B
Application number: CN202210082330.4A
Authority: CN
Inventors: 陈智强; 余山; 陈阳
Original assignee: Beijing Zhiyuan Artificial Intelligence Research Institute
Current assignee: Beijing Zhiyuan Artificial Intelligence Research Institute
Priority date: 2022-01-24
Filing date: 2022-01-24
Publication date: 2022-12-16
Anticipated expiration: 2042-01-24
Also published as: CN114463556A

Abstract

The invention discloses an equal transformation network training method, which comprises the following steps: constructing a network model consisting of convolutional layers; converting the convolution operation of the convolution layer in the network model into equal-variation convolution operation by using the dimension parameters corresponding to the target transformation group; and training the network model by using the images in the training image set to obtain the equal variation network. By adding a target transformation group dimension in the convolution layer of the network, namely converting the convolution operation of the convolution layer in the network into the equal-variation convolution operation by using the dimension parameter corresponding to the target transformation group, the equal-variation network on the target transformation group is realized, and the transformation of the image can be decoupled from the essential characteristics. The target transformation group can be any linear transformation group, including but not limited to a translation transformation group, a rotation transformation group, a scaling transformation group, a miscut transformation group and a three-dimensional rotation transformation group in a homogeneous space, so that the scheme can realize the realization of a transformation network such as any linear group and the like, and has strong universality.

Description

Equal-variation network training method and device, and image recognition method and device

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to an equal transformation network training method and device and an image recognition method and device.

Background

Generally, the appearance of a visual object is changed by rotating, scaling and other transformations of the visual object in an image, so that a great learning space is brought to a neural network. The feasible solution is to enhance the decoupling capability of the neural network and obtain a more compact potential learning space, and the convolutional neural network has the intrinsic translational decoupling capability and can learn the basic features regardless of the position of the object in the input. In order to further improve the decoupling capability of the network, a group equivariant theory and a rotating group equivariant network are proposed.

However, in the prior art, only the invariant network on the simple transformation group, such as translation, rotation and mirror image, has no general method for realizing the arbitrary linear group invariant network.

Disclosure of Invention

The invention aims to provide an equal variation network training method and device, an image recognition method and device and an electronic device aiming at the defects of the prior art, and the aim is realized by the following technical scheme.

The first aspect of the present invention provides an equal-variation network training method, including:

constructing a network model consisting of convolutional layers;

converting the convolution operation of the convolution layer in the network model into equal-variation convolution operation by using the dimension parameters corresponding to the target transformation group;

and training the network model by using the images in the training image set to obtain the uniform variable network.

A second aspect of the present invention provides an image recognition method, including:

inputting an image into an invariant network obtained by training according to the method of the first aspect, performing a plurality of times of invariant convolution operations on the image by the invariant network to obtain an invariant feature map of a target transformation group, and performing target task recognition according to the feature map;

and acquiring the identification result output by the equal transformer network.

A third aspect of the present invention provides an equal-variation network training apparatus, including:

the construction module is used for constructing a network model consisting of convolutional layers;

the conversion module is used for converting the convolution operation of the convolution layer in the network model into the equal-variation convolution operation by utilizing the dimension parameters corresponding to the target transformation group;

and the training module is used for training the network model by utilizing the images in the training image set so as to obtain the equal variation network.

A fourth aspect of the present invention proposes an image recognition apparatus comprising:

an identification module, configured to input an image into an invariant network trained by the method according to the first aspect, perform a plurality of invariant convolution operations on the image by the invariant network to obtain an invariant feature map of a target transform group, and perform target task identification according to the feature map;

and the acquisition module is used for acquiring the identification result output by the equal transformer network.

A fifth aspect of the present invention proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to the first or second aspect when executing the program.

Based on the above-mentioned equal-variation network training method and image recognition method of the first aspect and the second aspect, the present invention has at least the following advantages or advantages:

by adding a target transformation group dimension in the convolution layer of the network, namely converting the convolution operation of the convolution layer in the network into the equal-variation convolution operation by using the dimension parameter corresponding to the target transformation group, the equal-variation network on the target transformation group is realized, and the transformation of the image can be decoupled from the essential characteristics.

Furthermore, the target transformation group can be any linear transformation group, including but not limited to a translation transformation group, a rotation transformation group, a scaling transformation group, a miscut transformation group, and a three-dimensional rotation transformation group in a homogeneous space, so that the scheme can realize the realization of a transformation network such as any linear group, and has strong universality.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow chart illustrating an embodiment of a method for training an equal variation network according to an exemplary embodiment of the present invention;

FIG. 2 is a flowchart illustrating an implementation of an equal variation network training according to an exemplary embodiment of the present invention;

FIG. 3 is a flowchart illustrating an embodiment of an image recognition method according to an exemplary embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating an operation of equal-variation convolution of a first layer convolution layer of the equal-variation network according to the embodiment shown in FIG. 3;

FIG. 5 is a schematic diagram illustrating an operation process of equal-variation convolution of subsequent layer convolution layers of the equal-variation network according to the embodiment shown in FIG. 3;

FIG. 6 is a schematic structural diagram illustrating an equal transformation network training apparatus according to an exemplary embodiment of the present invention;

FIG. 7 is a schematic diagram illustrating an exemplary embodiment of an image recognition apparatus according to the present invention;

FIG. 8 is a diagram illustrating a hardware configuration of an electronic device in accordance with an exemplary embodiment of the present invention;

fig. 9 is a schematic diagram of a storage medium according to an exemplary embodiment of the present invention.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if," as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination," depending on the context.

In order to realize the construction of an arbitrary group of equal-variation networks, the application provides an equal-variation network training method, namely a network model consisting of convolution layers is constructed, the convolution operation of the convolution layers in the network model is converted into equal-variation convolution operation by utilizing the dimension parameters corresponding to a target transformation group, and then the network model is trained by utilizing images in a training image set so as to obtain the equal-variation networks.

The technical effects that can be achieved based on the above description are:

the invention has at least the following beneficial effects or advantages:

Furthermore, the target transformation group can be any linear transformation group, including but not limited to a translation transformation group, a rotation transformation group, a scaling transformation group, a miscut transformation group, and a three-dimensional rotation transformation group in a homogeneous space, so that the scheme can realize the realization of a transformation network such as any group, and has strong universality.

In order to make the technical solutions of the embodiments of the present application better understood, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

The first embodiment is as follows:

fig. 1 is a flowchart illustrating an embodiment of an equal-variation network training method according to an exemplary embodiment of the present invention, and as shown in fig. 1, the equal-variation network training method includes the following steps:

step 101: a network model is constructed that is composed of convolutional layers.

Wherein the initially constructed network model may be a conventional deep convolutional network, which is composed of a plurality of conventional convolutional layers.

Step 102: and converting the convolution operation of the convolution layer in the network model into the equal-variation convolution operation by using the dimension parameters corresponding to the target transformation group.

In this embodiment, the image f (x) is considered as a function of position, and is denoted as Lg for the transforms on the target transform group G, where G ∈ G, [ Lgf ] (x) = f (Gg · x), and Gg · x is the transform operation performed on the position space x, and the target transform group is an arbitrary group of linear transforms that need to satisfy the linearity condition on the position space and the parameter space G, as follows:

where the first equation is a linear condition of the parameter space g and the second equation is a linear condition of the position space x.

The transformation groups satisfying the linear condition include, but are not limited to, a translation transformation group, a rotation transformation group, a scaling transformation group, a miscut transformation group, and a three-dimensional rotation transformation group in homogeneous space:

the first matrix is a miscut transformation group transformation matrix, the second matrix is a scaling transformation group transformation matrix, the third matrix is a transformation matrix when the miscut and the scaling are transformed simultaneously, and the fourth matrix and the fifth matrix are three-dimensional rotation transformation group transformation matrices in a homogeneous space.

In one possible implementation, the convolution operation in the convolutional layer is typically implemented using a convolution kernel, and thus the convolution kernel used in the convolutional layer may be transformed using the dimension parameters corresponding to the target transform group to convert the convolution operation of the convolutional layer into an equal-variant convolution operation.

Wherein, the dimension parameters corresponding to the target transformation group generally include transformation range and sampling interval,

in an optional embodiment, for the process of transforming the convolution kernel used in the convolution layer by using the dimension parameter corresponding to the target transformation group, each transformation parameter may be taken from the transformation range at a sampling interval, and the convolution kernel used in the convolution layer may be transformed by using the taken transformation parameter.

For example, the dimension parameters corresponding to the rotation transformation group include an angle range of 0 degree to 360 degrees and a sampling interval of 45 degrees, from 0 degree to 360 degrees, one rotation transformation is performed on the convolution kernel by sampling an angle every 45 degrees to obtain a transformed convolution kernel, and finally, several times of sampling are performed to obtain several transformed convolution kernels.

Step 103: and training the network model by using the images in the training image set to obtain the equal variation network.

Wherein, the identification task of the equal variation network is a task for realizing the training image set. For example, if the training image set is used to implement a classification task (i.e., the label of the image label is a classification label), the recognition result of the invariant network is the classification result of the input image.

It can be understood by those skilled in the art that the above description of the classification task is only an exemplary illustration, and the task type of the peer-to-peer network in the present application is not particularly limited, for example, when the task of the peer-to-peer network is the target detection task, the processing result of the peer-to-peer network is the target detection result.

Thus, the training process shown in fig. 1 is completed, and a dimension of the target transformation group is added to the convolutional layer of the network, that is, the convolutional operation of the convolutional layer in the network is converted into an equal-variation convolutional operation by using a dimension parameter corresponding to the target transformation group, so that an equal-variation network on the target transformation group is realized, and the transformation of the image can be decoupled from the essential features.

Example two:

fig. 2 is a flowchart illustrating a specific implementation process of an equal-variation network training according to an exemplary embodiment of the present invention, based on the embodiment illustrated in fig. 1, as illustrated in fig. 2, the specific implementation process of the equal-variation network training includes the following steps:

step 201: and constructing a network model consisting of the convolutional layers, and converting the convolutional layers in the network model into the equal-variation convolutional layers by using the dimensional parameters corresponding to the target transformation group.

For specific implementation of converting the convolutional layer into the equal-variation convolutional layer, reference may be made to the relevant description in the above embodiments, and details are not repeated herein.

Step 202: and performing Gaussian modulation on the equal-variation convolution layer in the network model by using a Gaussian sampling function.

Since the layers of equal convolution in the network model are defined in discrete space, both the feature mapping and the actual convolution kernel are performed under discrete sampling. Under discrete sampling, the isotacticity of the convolution is limited, being able to be invariant only over discrete groups. The Gaussian modulated equal-variation convolutional layer is guided in space position and angle, and position and angle parameters can be optimized, so that parameters can be defined in a continuous space, the trained equal-variation network can overcome the limitation of discrete sampling, and the equal-variation on a continuous conversion group is obtained.

In a specific implementation, the convolution kernel used in the equal-variation convolution layer is modulated by using a Gaussian sampling function, so that the equal-variation convolution layer performs equal-variation convolution operation on the input characteristic by using the modulated convolution kernel.

In one possible implementation, the specific modulation process for the convolution kernel is to perform an equal-variant convolution operation on the convolution kernel used by the equal-variant convolution layer using a gaussian sampling function to obtain a modulated convolution kernel.

Wherein, the modulation formula is as follows:

in the above equation, Ψ is the modulated convolution kernel,

for the convolution kernel before modulation, ζ (x) is a gaussian sampling function, and the specific form of the function is as follows:

where σ is the variance and T represents the transposed symbol.

Step 203: and training the network model after Gaussian modulation by using the images in the training image set to obtain the equal-variation network.

To this end, the training flow shown in fig. 2 is completed, after the equal-variation network of the general-purpose transformation group is constructed, the equal-variation convolution layer in the equal-variation network is further gaussian-modulated by using a gaussian sampling function, so that the equal-variation network can overcome the limitation of discrete sampling, obtain the equal-variation property on the continuous transformation group, and train the obtained continuous transformation group equal-variation network.

Example three:

fig. 3 is a flowchart of an embodiment of an image recognition method according to an exemplary embodiment of the present invention, based on the embodiment shown in fig. 1, an invariant network used in this embodiment is obtained by training using the training method shown in fig. 1, and as shown in fig. 3, the image recognition method includes the following steps:

step 301: the image is input into the training equal-variation network, the image is subjected to a plurality of times of equal-variation convolution operations by the equal-variation network to obtain an equal-variation feature map of the target transformation group, and target task recognition is carried out according to the feature map.

The equal-variant network includes a plurality of convolutional layers for performing equal-variant convolution operations, that is, each convolutional layer is actually an equal-variant convolutional layer.

In one possible implementation, in the process of performing a plurality of times of equal-variation convolution operations on an image by an equal-variation network, first, the first layer of convolutional layer performs equal-variation convolution operations on the image by using the transformed convolution kernel, and outputs the obtained equal-variation feature map to the second layer of convolutional layer, and then, from the second layer of convolutional layer to the last layer of convolutional layer, performs equal-variation convolution operations on the input feature map by using the transformed convolution kernel through each convolutional layer, and outputs the equal-variation feature map.

In specific implementation, the first layer convolution layer performs an equal-variation convolution operation on the image by using the transformed convolution kernel as follows:

where f (x) is the image, x represents the spatial position of the spatial dimension of the image, f (x) is two-dimensional, h ₁ (x, a) is an output characteristic diagram which is three-dimensional, wherein two dimensions are the space dimension corresponding to x, the other dimension is the transformation dimension of the target transformation group corresponding to a, a represents different transformation parameters of the G dimension of the target transformation group, and psi ₀ (G _a X) is the transformed convolution kernel in the first convolutional layer, a indicates an equal sign of a transformed convolution, a indicates a sign of a convolution, and y is an integral variable.

Therefore, by continuously changing the value of a, a series of characteristic diagram outputs h can be obtained ₁ (x,a)。

As shown in fig. 4, assuming that the values of a are three a1, a2, and a3, a1 corresponds to three transformed convolution kernels, a2 corresponds to one transformed convolution kernel, and a3 corresponds to one transformed convolution kernel, so that the images are convolved by the three transformed convolution kernels, respectively, to obtain a feature map corresponding to a1, a feature map corresponding to a2, and a feature map corresponding to a 3.

Starting from the second convolutional layer to the last convolutional layer (i.e. the subsequent convolutional layer), each convolutional layer uses the transformed convolutional core to perform the equal-variation convolution operation on the input feature map as follows:

wherein h is _n+1 (x, a) is the output characteristic diagram of the nth convolution layer, h _n (x, a) is the input feature map of the nth convolutional layer, Ψ _n (G _a X) is the transformed convolution kernel in the nth convolution layer, h _n+1 、h _n And psi _n The convolution layers are three-dimensional, wherein two dimensions are space dimensions corresponding to x, the other dimension is transformation dimension corresponding to a, the lines represent equal-variation convolution symbols, the marks represent convolution symbols, the number of channels of the transformed convolution kernels in the nth convolution layer is consistent with the number of input feature maps, one channel processes one input feature map, y is an integral variable of the space dimension, and b is an integral variable of the G dimension of the target transformation group.

Therefore, by continuously transforming the value of a, the characteristic diagram output h corresponding to each a can be obtained _n+1 (x,a)。

As shown in fig. 5, assuming that there are an input feature map corresponding to a1, an input feature map corresponding to a2, and an input feature map corresponding to a3, the convolution kernel obtained by using a1 transform, the convolution kernel obtained by using a2 transform, and the convolution kernel obtained by using a3 transform all have 3 channels, each channel is used to process one input feature map, and after each input feature map is subjected to convolution processing by the three transformed convolution kernels, three feature map outputs are still obtained, that is, an output feature map corresponding to a1, an output feature map corresponding to a2, and an output feature map corresponding to a 3.

Step 302: and acquiring the identification result output by the equal transformation network.

If the classification task is realized by the equal transformation network, the output identification result is the classification result of the image, and the identification task of the equal transformation network is not specifically limited in the application.

At this point, the image recognition process shown in fig. 3 is completed.

Corresponding to the embodiment of the equal transformation network training method, the invention also provides an embodiment of the equal transformation network training device.

Fig. 6 is a schematic structural diagram of an equal-variation network training apparatus according to an exemplary embodiment of the present invention, the apparatus is configured to execute the equal-variation network training method provided in any of the above embodiments, and as shown in fig. 6, the equal-variation network training apparatus includes:

a constructing module 610 for constructing a network model composed of convolutional layers;

a conversion module 620, configured to convert, by using the dimension parameter corresponding to the target transformation group, the convolution operation of the convolution layer in the network model into an equal-transformation convolution operation;

a training module 630, configured to train the network model by using images in the training image set to obtain an invariant network.

Fig. 7 is a schematic structural diagram of an image recognition apparatus according to an exemplary embodiment of the present invention, the apparatus being configured to perform the image recognition method provided in any of the above embodiments, as shown in fig. 7, the image recognition apparatus includes:

the identification module 710 is configured to input an image into a trained invariant network, perform a plurality of invariant convolution operations on the image by the invariant network to obtain an invariant feature map of a target transform group, and perform target task identification according to the feature map;

an obtaining module 720, configured to obtain the identification result output by the equal transformation network.

The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.

The embodiment of the present invention further provides an electronic device corresponding to the training method for an invariant network or the image recognition method provided in the foregoing embodiment, so as to execute the training method for an invariant network or the image recognition method.

Fig. 8 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present invention, the electronic device including: a communication interface 601, a processor 602, a memory 603, and a bus 604; the communication interface 601, the processor 602 and the memory 603 communicate with each other via a bus 604. The processor 602 may execute the above-described training method or image recognition method by reading and executing machine executable instructions corresponding to the control logic of the training method or image recognition method in the memory 603, and the specific contents of the method are described in the above embodiments and will not be described herein again.

The memory 603 referred to in this disclosure may be any electronic, magnetic, optical, or other physical storage device that can contain stored information, such as executable instructions, data, and the like. Specifically, the Memory 603 may be a RAM (Random Access Memory), a flash Memory, a storage drive (e.g., a hard disk drive), any type of storage disk (e.g., an optical disk, a DVD, etc.), or similar storage medium, or a combination thereof. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 601 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used.

Bus 604 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 603 is used for storing a program, and the processor 602 executes the program after receiving the execution instruction.

The processor 602 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 602. The Processor 602 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.

The electronic device provided by the embodiment of the application and the equal-variation network training method or the image recognition method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.

Referring to fig. 9, the computer readable storage medium is an optical disc 30, and a computer program (i.e., a program product) is stored thereon, and when being executed by a processor, the computer program may execute the equivalent network training method or the image recognition method provided in any of the foregoing embodiments.

It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memories (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical and magnetic storage media, which are not described in detail herein.

The computer-readable storage medium provided by the above-mentioned embodiment of the present application and the equal-variation network training method or the image recognition method provided by the embodiment of the present application have the same inventive concept and have the same beneficial effects as the method adopted, run or implemented by the application program stored in the computer-readable storage medium.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. An image recognition method, characterized in that the method comprises:

inputting an image into a trained equal-variation network, carrying out a plurality of equal-variation convolution operations on the image by the equal-variation network to obtain an equal-variation feature map of a target transformation group, and carrying out target task identification according to the feature map; the equal-variation network comprises a plurality of layers of convolution layers for executing equal-variation convolution operation, and the equal-variation network performs multiple equal-variation convolution operations on the image, and comprises the following steps:

performing, by the first layer convolutional layer, an equal-variation convolution operation on the image using the transformed convolutional layer, and outputting an obtained equal-variation feature map to the second layer convolutional layer, where the performing of the equal-variation convolution operation on the image is as follows:

wherein h is ₁ (x, a) is an output feature map, f (x) is an image, x represents the spatial position of the spatial dimension of the image, a represents the transformation parameters of the G dimension of the target transformation group, the transformation parameters including transformation range and sampling interval, Ψ ₀ (G _a X) is the transformed convolution kernel in the first convolutional layer, ", indicates an equal sign of the transformed convolution,", indicates a sign of the convolution, and y is an integral variable;

starting from the second layer of convolutional layer to the last layer of convolutional layer, performing equal-variation convolution operation on the input feature graph by using the transformed convolutional core through each convolutional layer and outputting the feature graph, wherein the process of performing equal-variation convolution operation on the input feature graph comprises the following steps:

wherein h is _n+1 (x, a) is the output characteristic diagram of the nth convolution layer, h _n (x, a) is the input feature map of the nth convolutional layer, Ψ _n (G _a X) is a transformed convolution kernel in the nth convolutional layer, wherein the constant sign indicates a convolution symbol, the number of channels of the transformed convolution kernel in the nth convolutional layer is the same as the number of input feature maps, one channel processes one input feature map, y is an integral variable of a spatial dimension, and b is an integral variable of a G dimension of the target transform group;

wherein, the training process of the equal variation network comprises the following steps: constructing a network model consisting of convolutional layers; converting the convolution operation of the convolution layer in the network model into an equal-variation convolution operation by utilizing the dimension parameters corresponding to the target transformation group; and training the network model by using the images in the training image set to obtain the equal variation network.

2. The method of claim 1, wherein transforming convolution operations of convolution layers in the network model into invariant convolution operations using dimensional parameters corresponding to a target transformation group comprises:

and transforming the convolution kernels used in the convolution layers by using the dimension parameters corresponding to the target transformation group so as to convert the convolution operation of the convolution layers into the uniform-variation convolution operation.

3. The method of claim 1, wherein the dimension parameters corresponding to the target transformation group include a transformation range and a sampling interval;

transforming the convolution kernels used in the convolution layer by using the dimension parameters corresponding to the target transformation group, comprising:

and taking each transformation parameter from the transformation range according to the sampling interval, and transforming the convolution kernel used in the convolution layer by using the taken transformation parameter.

4. An image recognition apparatus, characterized in that the apparatus comprises:

the identification module is used for inputting images into a trained equal-variation network, carrying out multiple equal-variation convolution operations on the images by the equal-variation network to obtain an equal-variation feature map of a target transformation group, and identifying a target task according to the feature map; the equal-variation network comprises a plurality of layers of convolution layers for executing equal-variation convolution operation, and the equal-variation network performs multiple equal-variation convolution operations on the image, and comprises the following steps:

starting from the second layer of convolutional layer to the last layer of convolutional layer, carrying out equal-variation convolution operation on the input feature graph by each convolutional layer by using the transformed convolutional core and outputting the feature graph, wherein the process of carrying out equal-variation convolution operation on the input feature graph comprises the following steps:

wherein h is _n+1 (x, a) is the output characteristic diagram of the nth convolution layer, h _n (x, a) is the input feature map of the nth convolutional layer, Ψ _n (G _a X) is a transformed convolution kernel in the nth convolution layer, wherein an indicates an equal-shift convolution symbol, an indicates a convolution symbol, the number of channels of the transformed convolution kernel in the nth convolution layer is consistent with the number of input feature maps, one channel processes one input feature map, y is an integral variable of a space dimension, and b is an integral variable of a G dimension of a target transformation group;

wherein the apparatus further comprises:

the training module is used for constructing a network model consisting of convolutional layers; converting the convolution operation of the convolution layer in the network model into equal-variation convolution operation by using the dimension parameters corresponding to the target transformation group; and training the network model by using the images in the training image set to obtain the equal variation network.

5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-3 when executing the program.