CN114463556B - Equivariant network training method and device, and image recognition method and device - Google Patents

Equivariant network training method and device, and image recognition method and device

Info

Publication number
CN114463556B
Authority
CN
China
Prior art keywords
equivariant
convolution
network
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210082330.4A
Other languages
Chinese (zh)
Other versions
CN114463556A (en)
Inventor
陈智强
余山
陈阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiyuan Artificial Intelligence Research Institute
Original Assignee
Beijing Zhiyuan Artificial Intelligence Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhiyuan Artificial Intelligence Research Institute
Priority to CN202210082330.4A
Publication of CN114463556A
Application granted
Publication of CN114463556B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an equivariant network training method, which comprises the following steps: constructing a network model composed of convolutional layers; converting the convolution operation of the convolutional layers in the network model into an equivariant convolution operation by using the dimension parameters corresponding to a target transformation group; and training the network model with images from a training image set to obtain the equivariant network. By adding a target-transformation-group dimension to the convolutional layers of the network, that is, by converting the convolution operation of the convolutional layers into an equivariant convolution operation using the dimension parameters corresponding to the target transformation group, a network that is equivariant on the target transformation group is realized, and transformations of the image can be decoupled from its essential features. The target transformation group can be any linear transformation group, including but not limited to the translation group, the rotation group, the scaling group, the shear group, and the three-dimensional rotation group in homogeneous space; the scheme can therefore realize an equivariant network for an arbitrary linear group and has strong generality.

Description

Equivariant network training method and device, and image recognition method and device
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an equivariant network training method and device and an image recognition method and device.
Background
Generally, transformations such as rotation and scaling of a visual object in an image change the object's appearance, which greatly enlarges the space a neural network must learn. A feasible solution is to enhance the decoupling capability of the neural network so as to obtain a more compact latent learning space. Convolutional neural networks have an intrinsic translational decoupling capability and can learn basic features regardless of an object's position in the input. To further improve the decoupling capability of networks, group equivariance theory and rotation-group equivariant networks have been proposed.
However, the prior art only realizes equivariant networks on simple transformation groups such as translation, rotation and mirroring; there is no general method for realizing an equivariant network for an arbitrary linear group.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides an equivariant network training method and device, an image recognition method and device, and an electronic device; this aim is achieved by the following technical scheme.
The first aspect of the present invention provides an equivariant network training method, including:
constructing a network model composed of convolutional layers;
converting the convolution operation of the convolutional layers in the network model into an equivariant convolution operation by using the dimension parameters corresponding to a target transformation group;
and training the network model with images from a training image set to obtain the equivariant network.
A second aspect of the present invention provides an image recognition method, including:
inputting an image into an equivariant network trained by the method of the first aspect, the equivariant network performing a plurality of equivariant convolution operations on the image to obtain a feature map equivariant with respect to the target transformation group and performing target task recognition according to the feature map;
and acquiring the recognition result output by the equivariant network.
A third aspect of the present invention provides an equivariant network training apparatus, including:
a construction module, configured to construct a network model composed of convolutional layers;
a conversion module, configured to convert the convolution operation of the convolutional layers in the network model into an equivariant convolution operation by using the dimension parameters corresponding to the target transformation group;
and a training module, configured to train the network model with images from the training image set to obtain the equivariant network.
A fourth aspect of the present invention provides an image recognition apparatus, including:
a recognition module, configured to input an image into an equivariant network trained by the method of the first aspect, the equivariant network performing a plurality of equivariant convolution operations on the image to obtain a feature map equivariant with respect to the target transformation group and performing target task recognition according to the feature map;
and an acquisition module, configured to acquire the recognition result output by the equivariant network.
A fifth aspect of the present invention proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to the first or second aspect when executing the program.
Based on the equivariant network training method of the first aspect and the image recognition method of the second aspect, the invention has at least the following advantages:
By adding a target-transformation-group dimension to the convolutional layers of the network, that is, by converting the convolution operation of the convolutional layers into an equivariant convolution operation using the dimension parameters corresponding to the target transformation group, a network that is equivariant on the target transformation group is realized, and transformations of the image can be decoupled from its essential features.
Furthermore, the target transformation group can be any linear transformation group, including but not limited to the translation group, the rotation group, the scaling group, the shear group, and the three-dimensional rotation group in homogeneous space; the scheme can therefore realize an equivariant network for an arbitrary linear group and has strong generality.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of an embodiment of an equivariant network training method according to an exemplary embodiment of the present invention;
FIG. 2 is a flowchart of a specific implementation of equivariant network training according to an exemplary embodiment of the present invention;
FIG. 3 is a flowchart of an embodiment of an image recognition method according to an exemplary embodiment of the present invention;
FIG. 4 is a schematic diagram of the equivariant convolution operation of the first convolutional layer of the equivariant network in the embodiment shown in FIG. 3;
FIG. 5 is a schematic diagram of the equivariant convolution operation of the subsequent convolutional layers of the equivariant network in the embodiment shown in FIG. 3;
FIG. 6 is a schematic structural diagram of an equivariant network training apparatus according to an exemplary embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an image recognition apparatus according to an exemplary embodiment of the present invention;
FIG. 8 is a diagram of a hardware configuration of an electronic device according to an exemplary embodiment of the present invention;
FIG. 9 is a schematic diagram of a storage medium according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish information of one type from another. For example, first information may also be referred to as second information and, similarly, second information may be referred to as first information, without departing from the scope of the present invention. The word "if," as used herein, may be interpreted as "when," "upon," or "in response to a determination," depending on the context.
In order to realize the construction of equivariant networks for arbitrary groups, the application provides an equivariant network training method: a network model composed of convolutional layers is constructed, the convolution operation of the convolutional layers is converted into an equivariant convolution operation by using the dimension parameters corresponding to a target transformation group, and the network model is then trained with images from a training image set to obtain the equivariant network.
The invention has at least the following beneficial effects or advantages:
By adding a target-transformation-group dimension to the convolutional layers of the network, that is, by converting the convolution operation of the convolutional layers into an equivariant convolution operation using the dimension parameters corresponding to the target transformation group, a network that is equivariant on the target transformation group is realized, and transformations of the image can be decoupled from its essential features.
Furthermore, the target transformation group can be any linear transformation group, including but not limited to the translation group, the rotation group, the scaling group, the shear group, and the three-dimensional rotation group in homogeneous space; the scheme can therefore realize an equivariant network for an arbitrary linear group and has strong generality.
In order to make the technical solutions of the embodiments of the present application better understood, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Embodiment one:
Fig. 1 is a flowchart of an embodiment of an equivariant network training method according to an exemplary embodiment of the present invention. As shown in fig. 1, the equivariant network training method includes the following steps:
Step 101: a network model composed of convolutional layers is constructed.
The initially constructed network model may be a conventional deep convolutional network composed of a plurality of conventional convolutional layers.
Step 102: the convolution operation of the convolutional layers in the network model is converted into an equivariant convolution operation by using the dimension parameters corresponding to the target transformation group.
In this embodiment, the image f(x) is regarded as a function of position. A transformation on the target transformation group G is denoted L_g, where g ∈ G and [L_g f](x) = f(G_g · x); here G_g · x is the transformation applied to the position space x. The target transformation group is an arbitrary linear transformation group, which must satisfy the following linearity conditions on the parameter space g and the position space x:
G_{g1} · G_{g2} = G_{g1·g2}
G_g · (x1 + x2) = G_g · x1 + G_g · x2
where the first equation is the linearity condition on the parameter space g and the second equation is the linearity condition on the position space x.
The transformation groups satisfying the linearity conditions include, but are not limited to, the translation group, the rotation group, the scaling group, the shear group, and the three-dimensional rotation group in homogeneous space:
[Transformation matrix images not reproduced.]
The first matrix is the transformation matrix of the shear group, the second is that of the scaling group, the third is the transformation matrix when shear and scaling are applied simultaneously, and the fourth and fifth matrices are transformation matrices of the three-dimensional rotation group in homogeneous space.
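These transformation matrices act on positions by ordinary matrix multiplication. As a sketch (NumPy, with illustrative function names and parameterizations; the patent's exact matrices are in images not reproduced here), the shear and scaling matrices and their composition can be built as:

```python
import numpy as np

def shear_matrix(s):
    # Shear transformation of the plane by factor s along the first axis.
    return np.array([[1.0, s],
                     [0.0, 1.0]])

def scale_matrix(k):
    # Isotropic scaling transformation by factor k.
    return np.array([[k, 0.0],
                     [0.0, k]])

def shear_scale_matrix(s, k):
    # Simultaneous shear and scaling: compose the two transforms.
    return scale_matrix(k) @ shear_matrix(s)

# Applying a group element G_g to a position x is matrix multiplication:
x = np.array([1.0, 2.0])
print(shear_matrix(0.5) @ x)  # shear shifts x along the first axis
```

The composition in `shear_scale_matrix` illustrates the group structure: products of linear transformation matrices stay within the linear group.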
In one possible implementation, the convolution operation in a convolutional layer is typically implemented with a convolution kernel; therefore, the convolution kernel used by the convolutional layer may be transformed with the dimension parameters corresponding to the target transformation group, converting the convolution operation of the layer into an equivariant convolution operation.
The dimension parameters corresponding to the target transformation group generally include a transformation range and a sampling interval.
In an optional embodiment, to transform the convolution kernel used by a convolutional layer with the dimension parameters corresponding to the target transformation group, transformation parameters may be taken from the transformation range at the sampling interval, and the convolution kernel transformed by each sampled parameter.
For example, the dimension parameters corresponding to the rotation group may include an angle range of 0 to 360 degrees and a sampling interval of 45 degrees: from 0 to 360 degrees, an angle is sampled every 45 degrees and the convolution kernel is rotated accordingly, yielding one transformed kernel per sample; after all samples, several transformed convolution kernels are obtained.
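The sampling procedure above can be sketched as follows. This is a minimal NumPy illustration with nearest-neighbour resampling; the patent does not specify an interpolation scheme, so `transform_kernel` and `rotation` are assumptions for illustration only:

```python
import numpy as np

def transform_kernel(kernel, g):
    """Resample a 2D kernel under a 2x2 linear transform g
    (nearest-neighbour sampling about the kernel centre)."""
    h, w = kernel.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = np.zeros_like(kernel)
    g_inv = np.linalg.inv(g)
    for i in range(h):
        for j in range(w):
            # Pull the output coordinate back through g to sample the input.
            y, x = g_inv @ np.array([i - cy, j - cx])
            si, sj = int(round(y + cy)), int(round(x + cx))
            if 0 <= si < h and 0 <= sj < w:
                out[i, j] = kernel[si, sj]
    return out

def rotation(theta_deg):
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

# Sample the rotation group from 0 to 360 degrees at a 45-degree interval:
kernel = np.random.rand(5, 5)
bank = [transform_kernel(kernel, rotation(a))
        for a in np.arange(0.0, 360.0, 45.0)]
print(len(bank))  # 8 transformed kernels
```

Each sampled angle yields one transformed copy of the kernel, so the resulting kernel bank has one entry per point of the discretized group.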
Step 103: the network model is trained with images from the training image set to obtain the equivariant network.
The recognition task of the equivariant network is the task realized by the training image set. For example, if the training image set is used for a classification task (i.e., the image labels are class labels), the recognition result of the equivariant network is the classification result of the input image.
Those skilled in the art will understand that the above description of the classification task is merely illustrative; the task type of the equivariant network is not specifically limited in this application. For example, when the task of the equivariant network is object detection, its processing result is the object detection result.
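The patent does not fix the design of the recognition head. One common choice (an assumption here, not part of the claims) is to pool the equivariant feature map over its group axis, which yields features robust to the group's transformations, before a linear classifier:

```python
import numpy as np

def classify(features, weights):
    """Hypothetical recognition head (not specified by the patent):
    max-pool the equivariant feature map over its group axis, then
    apply a linear classifier to the pooled features."""
    pooled = features.max(axis=0)            # (H, W) <- (A, H, W)
    logits = weights @ pooled.reshape(-1)    # flatten, project to class scores
    return int(np.argmax(logits))

# Toy usage: 3 group samples of a 4x4 feature map, 10 classes.
feats = np.random.rand(3, 4, 4)
W = np.random.rand(10, 16)
print(classify(feats, W))
```

Because the max over the group axis is unchanged when the group index is permuted, the pooled features are invariant to the sampled transformations, which suits classification.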
Thus, the training flow shown in fig. 1 is complete: a target-transformation-group dimension is added to the convolutional layers of the network, that is, the convolution operation of the convolutional layers is converted into an equivariant convolution operation using the dimension parameters corresponding to the target transformation group, so that a network equivariant on the target transformation group is realized and transformations of the image can be decoupled from its essential features.
Furthermore, the target transformation group can be any linear transformation group, including but not limited to the translation group, the rotation group, the scaling group, the shear group, and the three-dimensional rotation group in homogeneous space; the scheme can therefore realize an equivariant network for an arbitrary group and has strong generality.
Embodiment two:
Fig. 2 is a flowchart of a specific implementation of equivariant network training according to an exemplary embodiment of the present invention, based on the embodiment illustrated in fig. 1. As shown in fig. 2, the specific implementation includes the following steps:
Step 201: a network model composed of convolutional layers is constructed, and the convolutional layers in the network model are converted into equivariant convolutional layers by using the dimension parameters corresponding to the target transformation group.
For the specific implementation of converting a convolutional layer into an equivariant convolutional layer, reference may be made to the relevant description in the above embodiments, which is not repeated here.
Step 202: Gaussian modulation is applied to the equivariant convolutional layers in the network model by using a Gaussian sampling function.
Since the equivariant convolutional layers in the network model are defined in a discrete space, both the feature maps and the actual convolution kernels are discretely sampled. Under discrete sampling, the equivariance of the convolution is limited: the network can be equivariant only over discrete groups. A Gaussian-modulated equivariant convolutional layer is differentiable in spatial position and angle, so the position and angle parameters can be optimized; the parameters can thus be defined in a continuous space, allowing the trained equivariant network to overcome the limitation of discrete sampling and obtain equivariance on a continuous transformation group.
In a specific implementation, the convolution kernel used by the equivariant convolutional layer is modulated with the Gaussian sampling function, so that the layer performs the equivariant convolution operation on its input features with the modulated kernel.
In one possible implementation, the kernel is modulated by convolving the kernel used by the equivariant convolutional layer with the Gaussian sampling function to obtain the modulated kernel.
The modulation formula is:
Ψ(x) = [ζ * Ψ̃](x)
where Ψ is the modulated convolution kernel, Ψ̃ is the convolution kernel before modulation, and ζ(x) is a Gaussian sampling function of the form:
ζ(x) = exp(−xᵀx / (2σ²)) / (2πσ²)
where σ controls the variance and T denotes the transpose.
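A minimal NumPy sketch of the Gaussian modulation, assuming for simplicity that the Gaussian window is applied pointwise on the discrete kernel grid (the exact way ζ is combined with the kernel in the patent's formula image is not fully recoverable, so this is illustrative only):

```python
import numpy as np

def gaussian_window(size, sigma):
    """zeta(x) = exp(-x^T x / (2*sigma^2)) / (2*pi*sigma^2),
    sampled on a size x size grid centred on the kernel."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    r2 = (yy - c) ** 2 + (xx - c) ** 2
    return np.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)

def modulate(kernel, sigma=1.0):
    # Pointwise Gaussian modulation of the (transformed) convolution kernel,
    # so that kernel values fall off smoothly away from the centre.
    return kernel * gaussian_window(kernel.shape[0], sigma)

psi = np.ones((5, 5))
psi_mod = modulate(psi, sigma=1.0)
print(psi_mod[2, 2] > psi_mod[0, 0])  # centre weight dominates after modulation
```

The smooth Gaussian envelope is what makes the modulated kernel differentiable with respect to continuous position and angle parameters, which is the motivation given in step 202.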
Step 203: the Gaussian-modulated network model is trained with images from the training image set to obtain the equivariant network.
Thus, the training flow shown in fig. 2 is complete: after the equivariant network for a general transformation group is constructed, its equivariant convolutional layers are further Gaussian-modulated with a Gaussian sampling function, so that the network can overcome the limitation of discrete sampling and obtain equivariance on a continuous transformation group; training then yields a continuous-transformation-group equivariant network.
Embodiment three:
Fig. 3 is a flowchart of an embodiment of an image recognition method according to an exemplary embodiment of the present invention. The equivariant network used in this embodiment is obtained by the training method shown in fig. 1. As shown in fig. 3, the image recognition method includes the following steps:
Step 301: the image is input into the trained equivariant network, which performs a plurality of equivariant convolution operations on the image to obtain a feature map equivariant with respect to the target transformation group, and performs target task recognition according to the feature map.
The equivariant network includes a plurality of convolutional layers for performing equivariant convolution operations; that is, each convolutional layer is in fact an equivariant convolutional layer.
In one possible implementation, during the multiple equivariant convolution operations, the first convolutional layer performs the equivariant convolution operation on the image with its transformed convolution kernels and outputs the resulting equivariant feature map to the second convolutional layer; then, from the second convolutional layer to the last, each convolutional layer performs the equivariant convolution operation on its input feature map with its transformed convolution kernels and outputs an equivariant feature map.
In a specific implementation, the first convolutional layer performs the equivariant convolution operation on the image with the transformed convolution kernels as follows:
h_1(x, a) = [f ⊛ Ψ_0](x, a) = ∫ f(y) Ψ_0(G_a · (y − x)) dy
where f(x) is the input image (two-dimensional), with x the position in the spatial dimensions; h_1(x, a) is the output feature map, which is three-dimensional: two dimensions are the spatial dimensions corresponding to x, and the third is the transformation dimension of the target transformation group corresponding to a, where a indexes the transformation parameters along the group dimension of G; Ψ_0(G_a · x) is the transformed convolution kernel of the first convolutional layer; ⊛ denotes the equivariant convolution, and y is the integration variable.
Thus, by successively varying the value of a, a series of feature maps h_1(x, a) is obtained.
As shown in fig. 4, assume a takes the three values a1, a2 and a3, each corresponding to one transformed convolution kernel. The image is then convolved with the three transformed kernels respectively, yielding the feature map corresponding to a1, the feature map corresponding to a2, and the feature map corresponding to a3.
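The first-layer operation of fig. 4 can be sketched as a plain correlation of the image with each transformed kernel, stacking the results along a new group axis. This is an illustrative NumPy loop with valid padding only, not the patent's implementation:

```python
import numpy as np

def lift_conv(image, kernel_bank):
    """First-layer equivariant convolution: correlate the image with each
    transformed kernel, producing a feature map with an extra group axis."""
    H, W = image.shape
    kh, kw = kernel_bank[0].shape
    out = np.zeros((len(kernel_bank), H - kh + 1, W - kw + 1))
    for a, psi in enumerate(kernel_bank):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[a, i, j] = np.sum(image[i:i + kh, j:j + kw] * psi)
    return out

image = np.random.rand(8, 8)
bank = [np.random.rand(3, 3) for _ in range(3)]  # kernels for a1, a2, a3
features = lift_conv(image, bank)
print(features.shape)  # (3, 6, 6): one 6x6 map per transformation parameter
```

The output is two-dimensional in space plus one group dimension, matching the three-dimensional h_1(x, a) described in the text.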
From the second convolutional layer to the last convolutional layer (i.e., the subsequent convolutional layers), each convolutional layer performs the equivariant convolution operation on its input feature map with the transformed convolution kernels as follows:
h_{n+1}(x, a) = [h_n ⊛ Ψ_n](x, a) = ∫_G ∫ h_n(y, b) Ψ_n(G_a · (y − x), b) dy db
where h_{n+1}(x, a) is the output feature map of the nth convolutional layer, h_n(x, a) is its input feature map, and Ψ_n(G_a · x) is the transformed convolution kernel of the nth convolutional layer; h_{n+1}, h_n and Ψ_n are all three-dimensional, with two spatial dimensions corresponding to x and one transformation dimension corresponding to a; the number of channels of the transformed convolution kernel in the nth layer equals the number of input feature maps, one channel processing one input feature map; y is the integration variable over the spatial dimensions, and b is the integration variable over the group dimension of G.
Thus, by successively varying the value of a, the feature map h_{n+1}(x, a) corresponding to each a is obtained.
As shown in fig. 5, assume there are input feature maps corresponding to a1, a2 and a3. The convolution kernels obtained by the a1, a2 and a3 transformations each have 3 channels, one channel per input feature map. After the input feature maps are convolved with the three transformed kernels, three outputs are again obtained: the output feature maps corresponding to a1, a2 and a3.
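The subsequent-layer operation of fig. 5 can be sketched the same way, except that each transformed kernel now carries one channel per input map and the products are summed over the group axis b as well as space (illustrative NumPy, valid padding only):

```python
import numpy as np

def group_conv(feat, kernel_bank):
    """Subsequent-layer equivariant convolution: the input already carries a
    group axis, so each transformed kernel has one channel per input map and
    the products are summed over that axis (the b integral in the text)."""
    B, H, W = feat.shape                  # B input maps, one per parameter b
    A = len(kernel_bank)                  # A output parameters a
    kh, kw = kernel_bank[0].shape[1:]
    out = np.zeros((A, H - kh + 1, W - kw + 1))
    for a in range(A):
        psi = kernel_bank[a]              # shape (B, kh, kw)
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[a, i, j] = np.sum(feat[:, i:i + kh, j:j + kw] * psi)
    return out

feat = np.random.rand(3, 6, 6)                      # maps for b = a1, a2, a3
bank = [np.random.rand(3, 3, 3) for _ in range(3)]  # 3-channel kernel per a
print(group_conv(feat, bank).shape)  # (3, 4, 4)
```

The output again has one map per output parameter a, so layers of this form can be stacked, which is what the network does across its subsequent convolutional layers.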
Step 302: the recognition result output by the equivariant network is acquired.
If the equivariant network implements a classification task, the output recognition result is the classification result of the image; the application does not specifically limit the recognition task of the equivariant network.
At this point, the image recognition flow shown in fig. 3 is complete.
Corresponding to the embodiments of the equivariant network training method, the invention also provides embodiments of an equivariant network training apparatus.
Fig. 6 is a schematic structural diagram of an equivariant network training apparatus according to an exemplary embodiment of the present invention. The apparatus is configured to execute the equivariant network training method provided in any of the above embodiments. As shown in fig. 6, the equivariant network training apparatus includes:
a construction module 610, configured to construct a network model composed of convolutional layers;
a conversion module 620, configured to convert the convolution operation of the convolutional layers in the network model into an equivariant convolution operation by using the dimension parameters corresponding to the target transformation group;
a training module 630, configured to train the network model with images from the training image set to obtain the equivariant network.
Fig. 7 is a schematic structural diagram of an image recognition apparatus according to an exemplary embodiment of the present invention. The apparatus is configured to perform the image recognition method provided in any of the above embodiments. As shown in fig. 7, the image recognition apparatus includes:
a recognition module 710, configured to input an image into the trained equivariant network, which performs a plurality of equivariant convolution operations on the image to obtain a feature map equivariant with respect to the target transformation group and performs target task recognition according to the feature map;
an acquisition module 720, configured to acquire the recognition result output by the equivariant network.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
The embodiment of the present invention further provides an electronic device corresponding to the equivariant network training method or the image recognition method provided in the foregoing embodiments, so as to execute the equivariant network training method or the image recognition method.
Fig. 8 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present invention, the electronic device including: a communication interface 601, a processor 602, a memory 603, and a bus 604; the communication interface 601, the processor 602 and the memory 603 communicate with each other via a bus 604. The processor 602 may execute the above-described training method or image recognition method by reading and executing machine executable instructions corresponding to the control logic of the training method or image recognition method in the memory 603, and the specific contents of the method are described in the above embodiments and will not be described herein again.
The memory 603 referred to in this disclosure may be any electronic, magnetic, optical, or other physical storage device that can contain stored information, such as executable instructions, data, and the like. Specifically, the Memory 603 may be a RAM (Random Access Memory), a flash Memory, a storage drive (e.g., a hard disk drive), any type of storage disk (e.g., an optical disk, a DVD, etc.), or similar storage medium, or a combination thereof. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 601 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used.
Bus 604 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 603 is used for storing a program, and the processor 602 executes the program after receiving the execution instruction.
The processor 602 may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 602, or by instructions in the form of software. The processor 602 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, capable of implementing or performing the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor.
The electronic device provided by the embodiment of the present application and the equivariant network training method or the image recognition method provided by the embodiment of the present application are based on the same inventive concept, and have the same beneficial effects as the method adopted, run, or implemented by the electronic device.
Referring to fig. 9, the computer-readable storage medium is an optical disc 30 storing a computer program (i.e., a program product) which, when executed by a processor, performs the equivariant network training method or the image recognition method provided in any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memories (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above embodiment of the present application and the equivariant network training method or the image recognition method provided by the embodiment of the present application are based on the same inventive concept, and have the same beneficial effects as the method adopted, run, or implemented by the application program stored therein.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (5)

1. An image recognition method, characterized in that the method comprises:
inputting an image into a trained equivariant network, performing a plurality of equivariant convolution operations on the image by the equivariant network to obtain an equivariant feature map over a target transformation group, and performing target task recognition according to the feature map; the equivariant network comprises a plurality of convolutional layers for performing equivariant convolution operations, and the performing of the plurality of equivariant convolution operations on the image by the equivariant network comprises:
performing, by the first convolutional layer, an equivariant convolution operation on the image using the transformed convolution kernel, and outputting the obtained equivariant feature map to the second convolutional layer, wherein the equivariant convolution operation on the image is as follows:
h_1(x, a) = [f ⊛ Ψ_0](x, a) = [f ⋆ Ψ_0(G_a ·)](x) = ∫ f(y) Ψ_0(G_a(y − x)) dy
wherein h_1(x, a) is the output feature map, f(x) is the image, x denotes a spatial position in the spatial dimensions of the image, a denotes a transformation parameter in the G dimension of the target transformation group, the transformation parameters comprising a transformation range and a sampling interval, Ψ_0(G_a x) is the transformed convolution kernel in the first convolutional layer, ⊛ denotes the equivariant convolution operator, ⋆ denotes the convolution operator, and y is the integration variable;
from the second convolutional layer to the last convolutional layer, performing, by each convolutional layer, an equivariant convolution operation on the input feature map using the transformed convolution kernel and outputting the resulting feature map, wherein the equivariant convolution operation on the input feature map is as follows:
h_{n+1}(x, a) = [h_n ⊛ Ψ_n](x, a) = ∬ h_n(y, b) Ψ_n(G_a(y − x), b) dy db
wherein h_{n+1}(x, a) is the output feature map of the n-th convolutional layer, h_n(x, a) is the input feature map of the n-th convolutional layer, Ψ_n(G_a x) is the transformed convolution kernel in the n-th convolutional layer, ⊛ denotes the equivariant convolution operator, ⋆ denotes the convolution operator, the number of channels of the transformed convolution kernel in the n-th convolutional layer equals the number of input feature maps, one channel processing one input feature map, y is the integration variable of the spatial dimensions, and b is the integration variable of the G dimension of the target transformation group;
wherein the training process of the equivariant network comprises: constructing a network model composed of convolutional layers; converting the convolution operations of the convolutional layers in the network model into equivariant convolution operations by using dimension parameters corresponding to the target transformation group; and training the network model by using images in a training image set to obtain the equivariant network.
2. The method of claim 1, wherein converting the convolution operations of the convolutional layers in the network model into equivariant convolution operations by using the dimension parameters corresponding to the target transformation group comprises:
transforming the convolution kernels used in the convolutional layers by using the dimension parameters corresponding to the target transformation group, so as to convert the convolution operations of the convolutional layers into equivariant convolution operations.
3. The method of claim 1, wherein the dimension parameters corresponding to the target transformation group include a transformation range and a sampling interval;
transforming the convolution kernels used in the convolution layer by using the dimension parameters corresponding to the target transformation group, comprising:
sampling each transformation parameter from the transformation range at the sampling interval, and transforming the convolution kernel used in the convolutional layer with each sampled transformation parameter.
4. An image recognition apparatus, characterized in that the apparatus comprises:
the identification module is configured to input an image into a trained equivariant network, perform a plurality of equivariant convolution operations on the image by the equivariant network to obtain an equivariant feature map over a target transformation group, and perform target task recognition according to the feature map; the equivariant network comprises a plurality of convolutional layers for performing equivariant convolution operations, and the performing of the plurality of equivariant convolution operations on the image by the equivariant network comprises:
performing, by the first convolutional layer, an equivariant convolution operation on the image using the transformed convolution kernel, and outputting the obtained equivariant feature map to the second convolutional layer, wherein the equivariant convolution operation on the image is as follows:
h_1(x, a) = [f ⊛ Ψ_0](x, a) = [f ⋆ Ψ_0(G_a ·)](x) = ∫ f(y) Ψ_0(G_a(y − x)) dy
wherein h_1(x, a) is the output feature map, f(x) is the image, x denotes a spatial position in the spatial dimensions of the image, a denotes a transformation parameter in the G dimension of the target transformation group, the transformation parameters comprising a transformation range and a sampling interval, Ψ_0(G_a x) is the transformed convolution kernel in the first convolutional layer, ⊛ denotes the equivariant convolution operator, ⋆ denotes the convolution operator, and y is the integration variable;
from the second convolutional layer to the last convolutional layer, performing, by each convolutional layer, an equivariant convolution operation on the input feature map using the transformed convolution kernel and outputting the resulting feature map, wherein the equivariant convolution operation on the input feature map is as follows:
h_{n+1}(x, a) = [h_n ⊛ Ψ_n](x, a) = ∬ h_n(y, b) Ψ_n(G_a(y − x), b) dy db
wherein h_{n+1}(x, a) is the output feature map of the n-th convolutional layer, h_n(x, a) is the input feature map of the n-th convolutional layer, Ψ_n(G_a x) is the transformed convolution kernel in the n-th convolutional layer, ⊛ denotes the equivariant convolution operator, ⋆ denotes the convolution operator, the number of channels of the transformed convolution kernel in the n-th convolutional layer equals the number of input feature maps, one channel processing one input feature map, y is the integration variable of the spatial dimensions, and b is the integration variable of the G dimension of the target transformation group;
wherein the apparatus further comprises:
the training module is configured to construct a network model composed of convolutional layers; convert the convolution operations of the convolutional layers in the network model into equivariant convolution operations by using dimension parameters corresponding to the target transformation group; and train the network model by using images in a training image set to obtain the equivariant network.
5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-3 when executing the program.
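As a discrete sketch of the later-layer operation in claim 1, where the n-th layer integrates the incoming feature map h_n(y, b) over both the spatial variable y and the G-dimension variable b with one kernel channel per input map, the following minimal Python illustration may help. It assumes a rotation group; names are hypothetical, and for brevity it omits the additional permutation of the b axis by the output group element that a fully equivariant implementation would perform.

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.signal import correlate2d

def group_conv(h_n, kernel, angles):
    """Later-layer equivariant convolution, producing h_{n+1}(x, a).

    h_n has shape (|G|, H, W) and the kernel has one channel per input
    feature map, shape (|G|, k, k). For each output group element G_a the
    kernel stack is transformed, each channel processes one input map, and
    the integral over b is approximated by the sum over channels.
    """
    out = []
    for angle in angles:
        acc = np.zeros(h_n.shape[1:])
        for b in range(h_n.shape[0]):
            acc += correlate2d(
                h_n[b],
                rotate(kernel[b], angle, reshape=False, order=1),
                mode='same')
        out.append(acc)
    return np.stack(out)

angles = np.arange(0.0, 360.0, 90.0)
h1 = np.random.rand(4, 16, 16)       # e.g. output of the first (lifting) layer
kernel = np.random.rand(4, 3, 3)     # one channel per input feature map
h2 = group_conv(h1, kernel, angles)  # shape (4, 16, 16)
```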
CN202210082330.4A 2022-01-24 2022-01-24 Equal-variation network training method and device, and image recognition method and device Active CN114463556B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210082330.4A CN114463556B (en) 2022-01-24 2022-01-24 Equal-variation network training method and device, and image recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210082330.4A CN114463556B (en) 2022-01-24 2022-01-24 Equal-variation network training method and device, and image recognition method and device

Publications (2)

Publication Number Publication Date
CN114463556A CN114463556A (en) 2022-05-10
CN114463556B true CN114463556B (en) 2022-12-16

Family

ID=81411677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210082330.4A Active CN114463556B (en) 2022-01-24 2022-01-24 Equal-variation network training method and device, and image recognition method and device

Country Status (1)

Country Link
CN (1) CN114463556B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188933B (en) * 2023-05-04 2023-09-01 泉州装备制造研究所 Method and device for predicting target direction of aerial view based on group-wise change

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL2016285B1 (en) * 2016-02-19 2017-09-20 Scyfer B V Device and method for generating a group equivariant convolutional neural network.
CN110930369B * 2019-11-01 2023-05-05 中山大学 Pathological section identification method based on group equivariant neural network and conditional probability field
CN111401452B (en) * 2020-03-17 2022-04-26 北京大学 Image classification method of equal-variation convolution network model based on partial differential operator
CN112257753B (en) * 2020-09-23 2023-04-07 北京大学 Image classification method of generalized equal-variation convolution network model based on partial differential operator
CN113723472B (en) * 2021-08-09 2023-11-24 北京大学 Image classification method based on dynamic filtering constant-variation convolutional network model

Also Published As

Publication number Publication date
CN114463556A (en) 2022-05-10

Similar Documents

Publication Publication Date Title
Wang et al. Learning feature descriptors using camera pose supervision
CN109711481B (en) Neural networks for drawing multi-label recognition, related methods, media and devices
CN113361428B (en) Image-based traffic sign detection method
JP2021100247A (en) Distorted document image correction method and device
US11017269B2 (en) System and method for optimization of deep learning architecture
CN114463556B (en) Equal-variation network training method and device, and image recognition method and device
Germain et al. Neural reprojection error: Merging feature learning and camera pose estimation
KR20200144398A (en) Apparatus for performing class incremental learning and operation method thereof
Li et al. A closer look at invariances in self-supervised pre-training for 3d vision
CN110569379A (en) Method for manufacturing picture data set of automobile parts
CN116469110A (en) Image classification method, device, electronic equipment and computer readable storage medium
CN114528976B (en) Equal transformation network training method and device, electronic equipment and storage medium
EP4006789A1 (en) Conversion device, conversion method, program, and information recording medium
CN111914949B (en) Zero sample learning model training method and device based on reinforcement learning
CN110751061B (en) SAR image recognition method, device, equipment and storage medium based on SAR network
CN116363368A (en) Image semantic segmentation method and device based on convolutional neural network
CN114913382A (en) Aerial photography scene classification method based on CBAM-AlexNet convolutional neural network
CN114528977B (en) Equal variable network training method and device, electronic equipment and storage medium
CN115330998A (en) Target detection model training method and device, and target detection method and device
CN113239909B (en) Question processing method, device, equipment and medium
Björk et al. Simpler is better: Spectral regularization and up-sampling techniques for variational autoencoders
CN114565528A (en) Remote sensing image noise reduction method and system based on multi-scale and attention mechanism
Mukherjee et al. Generative semantic domain adaptation for perception in autonomous driving
CN114241446A (en) Method, device and equipment for marking corner points of guideboard and storage medium
CN113496228A (en) Human body semantic segmentation method based on Res2Net, TransUNet and cooperative attention

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant