CN115631343A - Image generation method, device, equipment and storage medium based on a full pulse network
- Publication number: CN115631343A (application number CN202211170135.3A)
- Authority: CN (China)
- Prior art keywords: image, feature, pulse, pose, original
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06N3/08—Computing arrangements based on biological models; Neural networks; Learning methods
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention relates to the field of computers and provides an image generation method, apparatus, device, and storage medium based on a full pulse network, the method comprising the following steps: encoding at least one original image through a first pulse encoder to obtain original image features; processing each original pose image and each target pose image through a second pulse encoder to obtain original pose features, wherein the target pose images and the original pose images are acquired at different sampling times; processing the original image features and the original pose features through a feature processor to obtain target image features; and decoding the target image features through a pulse decoder to generate a target image of the target object. The image generation method based on the full pulse network provided by the embodiments of the invention can run on neuromorphic devices and, under the guidance of the target pose image, generates an image whose fidelity meets the requirement, thereby improving the fidelity of the generated image.
Description
Technical Field
The present invention relates to the field of computers, and in particular, to a method, an apparatus, a device, and a storage medium for generating an image based on a full pulse network.
Background
In recent years, much research has focused on improving the accuracy of Spiking Neural Networks (SNNs) on tasks such as image classification and object detection, achieving performance comparable to or even exceeding that of Artificial Neural Networks (ANNs) with fewer parameters and less computation, whereas implementing generative models on SNNs has been little studied. Image generation implemented with ANNs, however, often requires heavy computation, making it difficult to quickly generate images of high fidelity on edge devices.
Disclosure of Invention
The invention provides an image generation method, apparatus, device, and storage medium based on a full pulse network, with the aim of improving the fidelity of the generated image.
In a first aspect, the present invention provides an image generation method based on a full-pulse network, where the full-pulse network includes a first pulse encoder, a second pulse encoder, a feature processor, and a pulse decoder;
the image generation method based on the full pulse network comprises the following steps:
encoding at least one original image through the first pulse encoder to obtain original image features;
processing each original pose image and each target pose image through the second pulse encoder to obtain original pose features, wherein the target pose images and the original pose images are acquired at different sampling times;
processing the original image features and the original pose features through the feature processor to obtain target image features;
decoding the target image features through the pulse decoder to generate a target image of the target object;
wherein the at least one original image is obtained by copying an image to be processed, and the original pose image is the pose image corresponding to the original image.
In one embodiment, the feature processor comprises a plurality of identical feature processing modules cascaded in sequence.
The plurality of feature processing modules comprises a first feature processing module, a second feature processing module, and a third feature processing module;
processing the original image features and the original pose features through the feature processor to obtain the target image features includes:
processing the original image features and the original pose features through the first feature processing module to obtain first image features and first pose features;
processing the first image features and the first pose features through the second feature processing module to obtain second image features and second pose features;
and processing the second image features and the second pose features through the third feature processing module to obtain the target image features.
Any one of the feature processing modules comprises a first pulse convolution block, a second pulse convolution block, and a third pulse convolution block;
processing the second image features and the second pose features through the third feature processing module to obtain the target image features includes:
processing the second image features through the first pulse convolution block to obtain image features to be processed;
processing the second pose features through the second pulse convolution block to obtain pose features to be processed;
fusing the image features to be processed and the pose features to be processed through the third pulse convolution block to obtain fused image features;
and obtaining the target image features based on the fused image features.
Any one of the feature processing modules further comprises a stacking module;
fusing the image features to be processed and the pose features to be processed through the third pulse convolution block to obtain the fused image features includes:
stacking the image features to be processed and the pose features to be processed along the channel dimension through the stacking module to obtain stacked features;
and performing information fusion on the stacked features through the third pulse convolution block to obtain the fused image features.
Any one of the feature processing modules further comprises an exclusive-OR (XOR) module;
obtaining the target image features based on the fused image features includes:
performing pulse residual processing on the fused image features and the second image features through the XOR module to obtain the target image features.
The target image features are image features spanning T time steps;
decoding the target image features through the pulse decoder to generate the target image of the target object includes:
performing a weighted summation over the membrane potentials of the target image features at the T time steps through the last layer of the pulse decoder to generate the target image.
In a second aspect, the present invention provides an image generation apparatus based on a full pulse network, the full pulse network comprising a first pulse encoder, a second pulse encoder, a feature processor and a pulse decoder;
the image generation device based on the full pulse network comprises:
an encoding unit, configured to encode at least one original image through the first pulse encoder to obtain original image features;
a first processing unit, configured to process each original pose image and each target pose image through the second pulse encoder to obtain original pose features, wherein the target pose images and the original pose images are pose images of the target object acquired at different sampling times;
a second processing unit, configured to process the original image features and the original pose features through the feature processor to obtain target image features;
a decoding unit, configured to decode the target image features through the pulse decoder to generate a target image of the target object;
wherein the at least one original image is obtained by copying an image to be processed, and the original pose image is the pose image corresponding to the original image.
In a third aspect, the present invention further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the full pulse network-based image generation method of the first aspect is implemented.
In a fourth aspect, the present invention further provides a non-transitory computer-readable storage medium comprising a computer program which, when executed by a processor, implements the full pulse network-based image generation method of the first aspect.
In a fifth aspect, the present invention further provides a computer program product comprising a computer program which, when executed by a processor, implements the full pulse network-based image generation method of the first aspect.
The invention provides an image generation method, apparatus, device, and storage medium based on a full pulse network. A first pulse encoder encodes at least one original image to obtain original image features; a second pulse encoder processes each original pose image and each target pose image to obtain original pose features, wherein the target pose images and the original pose images are pose images of the target object acquired at different sampling times; a feature processor processes the original image features and the original pose features to obtain target image features; and a pulse decoder decodes the target image features to generate a target image of the target object. The at least one original image is obtained by copying an image to be processed, and the original pose image is the pose image corresponding to the original image.
In the image generation process based on the full pulse network, the pulse encoders, the feature processor, and the pulse decoder in the full pulse network, combined with the plurality of original images, original pose images, and target pose images, can generate an image whose fidelity meets the requirement under the guidance of the target pose images, thereby improving the fidelity of the generated image.
Drawings
In order to illustrate the technical solutions of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a full pulse network based image generation method provided by the present invention;
FIG. 2 is a schematic diagram of the overall framework of the full-pulse network provided by the present invention;
FIG. 3 is a block diagram of a feature processing module provided by the present invention;
FIG. 4 is a block diagram of an image generation device based on a full pulse network according to the present invention;
fig. 5 is a block diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that in the description of the embodiments of the present invention, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element. The terms "upper", "lower", and the like indicate orientations or positional relationships based on those shown in the drawings; they are merely for convenience in describing the present invention and to simplify the description, and do not indicate or imply that the referenced devices or elements must have a particular orientation or be constructed and operated in a particular orientation, and thus are not to be construed as limiting the present invention. Unless expressly stated or limited otherwise, the terms "mounted," "connected," and "coupled" are to be understood broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium, or internal to two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
The terms "first," "second," and the like in the description and in the claims of the present application are used to distinguish between similar objects and not necessarily to describe a particular sequence or chronological order. It should be appreciated that data so used may be interchanged under appropriate circumstances, so that the embodiments of the application can be practiced in sequences other than those illustrated or described herein. Moreover, the terms "first," "second," and the like are generally used generically and do not limit the number of objects; for example, a first object may be one object or more than one. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and succeeding objects.
Further, the image generation method, apparatus, device and storage medium based on the full pulse network provided by the present invention are described with reference to fig. 1 to 5. FIG. 1 is a flow chart of a full pulse network based image generation method provided by the present invention; FIG. 2 is a schematic diagram of the overall framework of the full-pulse network provided by the present invention; FIG. 3 is a block diagram of a feature processing module provided by the present invention; FIG. 4 is a block diagram of an image generation apparatus based on a full pulse network according to the present invention; fig. 5 is a block diagram of an electronic device provided by the present invention.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in a different order than that shown or described herein.
The embodiments of the present invention take an electronic device as the execution subject, and an image generation system is used here as an example of the electronic device.
Referring to fig. 1, fig. 1 is a flowchart of an image generation method based on a full pulse network according to the present invention. The image generation method based on the full pulse network provided by the embodiment of the invention comprises the following steps:
s101, encoding at least one original image through the first pulse encoder to obtain original image characteristics;
s102, processing each original attitude image and each target attitude image through the second pulse encoder to obtain original attitude characteristics, wherein the target attitude images and the original attitude images are acquired at different sampling moments;
s103, processing the original image characteristics and the original posture characteristics through the characteristic processor to obtain target image characteristics;
and S104, decoding the target image characteristics through the pulse decoder to generate a target image of the target object.
It should be noted that the image generation system provided by the present invention is a system equipped with a full pulse network. As shown in fig. 2, which is a schematic diagram of the overall framework of the full pulse network provided by the present invention, the full pulse network comprises at least a first pulse encoder, a second pulse encoder, a feature processor, and a pulse decoder.
Further, the feature processor includes T sequentially cascaded feature processing modules, namely feature processing module 1, feature processing module 2, through feature processing module T; the internal structure and operating principle of each of these modules are the same.
Further, referring to fig. 2, the full pulse network can be divided into two branches, each with a pulse encoder at its input. One branch processes the image information, i.e. the image branch; the other processes the pose images, i.e. the pose branch. That is, the first pulse encoder in the full pulse network processes the image information and the second pulse encoder processes the pose images.
Before the original image and the pose image are input to the pulse encoders, an image to be processed must be acquired from a target object (the target object includes, but is not limited to, an animal or any jointed object that can move and swing). The image to be processed is copied T times to obtain T identical original images, so that the input to the first pulse encoder in the image branch has size C_I × H_I × W_I × T, where C_I is the number of image channels, H_I is the image height, W_I is the image width, and T is the number of time steps.
It should be noted that the pose information in each original image may be extracted and converted into the original pose image corresponding to that original image. Since the image to be processed is copied T times to obtain T identical original images, the original pose image is likewise copied T times to obtain T identical original pose images. Further, after the target object changes its motion, the target pose image of the target object is collected. It can thus be understood that the target pose image and the original pose image are pose images of the target object collected at different sampling times: the original pose image is the pose image before the motion change, and the target pose image is the pose image after the motion change. Meanwhile, the target pose image is copied T times to obtain T identical target pose images.
Further, the target pose image and the original pose image form a pose image pair; since the pose image pair is copied T times, the input to the second pulse encoder in the pose branch has size 2·C_P × H_I × W_I × T, where C_P is the number of pose joint points. It should be further noted that the pose image includes, but is not limited to, pose information, pose joint point information, and the number of pose joint points.
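For illustration only, the input construction described above may be sketched in PyTorch-style Python as follows; the function name make_inputs and the use of PyTorch are assumptions for exposition and are not part of the disclosure:

```python
import torch

def make_inputs(image, src_pose, tgt_pose, T):
    """Replicate the image and the pose pair T times along a new time axis."""
    img_seq = image.unsqueeze(0).repeat(T, 1, 1, 1)       # (T, C_I, H_I, W_I)
    pose_pair = torch.cat([src_pose, tgt_pose], dim=0)    # (2*C_P, H_I, W_I) pose image pair
    pose_seq = pose_pair.unsqueeze(0).repeat(T, 1, 1, 1)  # (T, 2*C_P, H_I, W_I)
    return img_seq, pose_seq
```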
Specifically, after receiving the T input original images, the first pulse encoder of the image branch encodes them to obtain the original image features, denoted F_I^0 below; after receiving the T input pose image pairs (target pose image and original pose image), the second pulse encoder of the pose branch encodes them to obtain the original pose features, denoted F_P^0. Below, F_I^t and F_P^t denote the image features and pose features output by feature processing module t.
Further, the first pulse encoder of the image branch transmits the original image features F_I^0 to the feature processor, and the second pulse encoder of the pose branch transmits the original pose features F_P^0 to the feature processor. After the feature processor receives F_I^0 and F_P^0, the feature processing modules in the feature processor process them to obtain the target image features F_I^T. It should be further noted that the target image features are the image features of the target object after its motion change, obtained on the basis of the original pose image and the changed target pose image.
Further, the feature processor transmits the target image features F_I^T to the pulse decoder. After receiving F_I^T from feature processing module T, the pulse decoder decodes the target image features to generate the target image in the target pose. It should be further noted that the target image is an image of the target object after its motion change, generated on the basis of the original pose image and the changed target pose image.
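As a non-limiting sketch of how the four components described above could be composed, assuming PyTorch, with the encoder, feature processing module, and decoder internals left as submodules (their construction is an illustrative assumption, with building-block sketches given later in this description):

```python
import torch.nn as nn

class FullPulseNetwork(nn.Module):
    """Composition of the four components; each is passed in as a submodule."""
    def __init__(self, image_encoder, pose_encoder, feature_modules, decoder):
        super().__init__()
        self.image_encoder = image_encoder                     # first pulse encoder
        self.pose_encoder = pose_encoder                       # second pulse encoder
        self.feature_modules = nn.ModuleList(feature_modules)  # T cascaded feature processing modules
        self.decoder = decoder                                 # pulse decoder

    def forward(self, img_seq, pose_seq):
        f_img = self.image_encoder(img_seq)    # original image features F_I^0
        f_pose = self.pose_encoder(pose_seq)   # original pose features F_P^0
        for module in self.feature_modules:    # module t: (F_I^(t-1), F_P^(t-1)) -> (F_I^t, F_P^t)
            f_img, f_pose = module(f_img, f_pose)
        return self.decoder(f_img)             # decode the target image features F_I^T
```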
According to the image generation method based on the full pulse network provided by the embodiments of the invention, at least one original image is encoded through the first pulse encoder to obtain original image features; each original pose image and each target pose image are processed through the second pulse encoder to obtain original pose features, wherein the target pose images and the original pose images are acquired at different sampling times; the original image features and the original pose features are processed through the feature processor to obtain target image features; and the target image features are decoded through the pulse decoder to generate a target image of the target object. The at least one original image is obtained by copying an image to be processed, and the original pose image is the pose image corresponding to the original image.
In the image generation process based on the full pulse network, the pulse encoders, the feature processor, and the pulse decoder in the full pulse network, combined with the plurality of original images, original pose images, and target pose images, can generate an image whose fidelity meets the requirement under the guidance of the target pose images, thereby improving the fidelity of the generated image. This can be further understood as follows: after the target object changes its motion, a high-fidelity target image of the target object after the motion change is generated under the guidance of the target pose image, on the basis of the original image, the original pose image, and the changed target pose image.
The processing of the original image features and the original pose features by the feature processor to obtain the target image features, described in S103, is analyzed in detail as follows:
processing the original image features and the original pose features through the first feature processing module to obtain first image features and first pose features;
processing the first image features and the first pose features through the second feature processing module to obtain second image features and second pose features;
and processing the second image features and the second pose features through the third feature processing module to obtain the target image features.
It should be noted that the terms first, second, and third feature processing module in the embodiments of the present invention do not mean that there are only three feature processing modules; they merely indicate the order of the modules. The second feature processing module may be an intermediate feature processing module in the feature processor, the first feature processing module may be any feature processing module before that intermediate module, and the third feature processing module may be any feature processing module after it. For example, with T feature processing modules in the feature processor, the second feature processing module may be an intermediate feature processing module t, the first feature processing module may be any of feature processing modules 1 through t-1, and the third feature processing module may be any of feature processing modules t+1 through T.
Specifically, after feature processing module 1 receives the original image features F_I^0 and the original pose features F_P^0, it processes them and outputs the first image features F_I^1 and the first pose features F_P^1. Further, feature processing module 2 takes F_I^1 and F_P^1 output by module 1 as input, processes them, and outputs the second image features F_I^2 and the second pose features F_P^2. In turn, feature processing module t takes the (t-1)-th image features F_I^(t-1) and (t-1)-th pose features F_P^(t-1) output by module t-1 as input, processes them, and outputs the t-th image features F_I^t and the t-th pose features F_P^t.
In turn, feature processing module T-1 takes the (T-2)-th image features F_I^(T-2) and (T-2)-th pose features F_P^(T-2) output by module T-2 as input, processes them, and outputs the (T-1)-th image features F_I^(T-1) and (T-1)-th pose features F_P^(T-1). Further, feature processing module T takes F_I^(T-1) and F_P^(T-1) as input, processes them, and outputs the target image features F_I^T.
In the embodiments of the invention, the feature processing modules in the feature processor process the original image features and the original pose features in sequence, and under the guidance of the target pose image the fidelity of the finally generated target image features meets the requirement, so that an image whose fidelity meets the requirement can be generated and the fidelity of the generated image is improved.
Further, the processing of the second image features and the second pose features by the third feature processing module to obtain the target image features is analyzed in detail as follows:
processing the second image features through the first pulse convolution block to obtain image features to be processed;
processing the second pose features through the second pulse convolution block to obtain pose features to be processed;
fusing the image features to be processed and the pose features to be processed through the third pulse convolution block to obtain fused image features;
and obtaining the target image features based on the fused image features.
Further, referring to fig. 3, fig. 3 shows the structure of a feature processing module provided by the present invention; every feature processing module in the feature processor shares the internal structure and operating principle of fig. 3. As shown in fig. 3, a feature processing module likewise contains two branches: one processes the image features, i.e. the image branch, and the other processes the pose features, i.e. the pose branch. The feature processing module includes a first pulse convolution block and a second pulse convolution block, that is, the image branch contains the first pulse convolution block and the pose branch contains the second pulse convolution block, each consisting of two 3×3 pulse convolution blocks. The feature processing module further includes two stacking modules, a third pulse convolution block, and an XOR module; the third pulse convolution block is a 1×1 pulse convolution block. Each pulse convolution block consists of a convolution layer, a batch normalization layer, and a LIF (Leaky Integrate-and-Fire) layer, and the XOR module consists of two linear layers.
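A minimal sketch of such a pulse convolution block (convolution, batch normalization, and a LIF layer iterated over the T time steps) is given below for illustration. The LIF time constant, threshold, and hard-reset behavior are assumptions, and training such a block would in practice require a surrogate gradient for the non-differentiable spike function:

```python
import torch
import torch.nn as nn

class LIF(nn.Module):
    """Leaky Integrate-and-Fire layer, iterated over the T time steps."""
    def __init__(self, tau=2.0, v_threshold=1.0):
        super().__init__()
        self.tau, self.v_threshold = tau, v_threshold

    def forward(self, x):                  # x: (T, N, C, H, W)
        v = torch.zeros_like(x[0])         # membrane potential, zero initial state
        spikes = []
        for t in range(x.shape[0]):
            v = v + (x[t] - v) / self.tau            # leaky integration
            s = (v >= self.v_threshold).float()      # spike where the threshold is crossed
            v = v * (1.0 - s)                        # hard reset of fired neurons
            spikes.append(s)
        return torch.stack(spikes)         # binary spike tensor, (T, N, C, H, W)

class PulseConvBlock(nn.Module):
    """Pulse convolution block: Conv2d + BatchNorm + LIF, applied per time step."""
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.bn = nn.BatchNorm2d(out_ch)
        self.lif = LIF()

    def forward(self, x):                  # x: (T, N, C, H, W)
        T, N = x.shape[0], x.shape[1]
        y = self.bn(self.conv(x.flatten(0, 1)))       # fold time into the batch axis
        return self.lif(y.view(T, N, *y.shape[1:]))   # unfold and apply LIF over time
```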
Specifically, the embodiments of the present invention take the t-th feature processing module as an example. The input of the t-th feature processing module is the image features F_I^(t-1) and pose features F_P^(t-1) output by the (t-1)-th feature processing module; that is, the second image features are F_I^(t-1) and the second pose features are F_P^(t-1). The second image features have size C × H × W × T, and the second pose features have size 2 × C × H × W × T.
Further, in the image branch of the t-th feature processing module, the second image features F_I^(t-1) are processed through the first pulse convolution block (two 3×3 pulse convolution blocks) to obtain the image features to be processed. In the pose branch of the t-th feature processing module, the second pose features F_P^(t-1) are processed through the second pulse convolution block (two 3×3 pulse convolution blocks) to obtain the pose features to be processed. Further, the image features to be processed and the pose features to be processed are fused through the third pulse convolution block (a 1×1 pulse convolution block) in the t-th feature processing module to obtain the fused image features.
Further, based on the second image features F_I^(t-1), the fused image features, and the pose features to be processed, the t-th feature processing module outputs the t-th image features F_I^t and the t-th pose features F_P^t, whose sizes are the same as those of the second image features and the second pose features, respectively; that is, the t-th image features F_I^t have size C × H × W × T and the t-th pose features F_P^t have size 2 × C × H × W × T. Finally, the target image features F_I^T are obtained from the image features output by the T-th feature processing module.
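For illustration, one feature processing module could be sketched as follows, reusing the PulseConvBlock sketch above; the pulse residual module is passed in as a parameter and is sketched in the XOR discussion further below. The channel counts (the pose branch reducing 2c channels to c so that the output pose features again have 2c channels, matching the stated sizes) are assumptions:

```python
import torch
import torch.nn as nn

class FeatureProcessingModule(nn.Module):
    """One of the T cascaded modules; c is the image-feature channel count."""
    def __init__(self, c, xor_module):
        super().__init__()
        # first pulse convolution block (image branch): two 3x3 pulse conv blocks
        self.img_branch = nn.Sequential(PulseConvBlock(c, c, 3), PulseConvBlock(c, c, 3))
        # second pulse convolution block (pose branch): two 3x3 pulse conv blocks
        self.pose_branch = nn.Sequential(PulseConvBlock(2 * c, c, 3), PulseConvBlock(c, c, 3))
        # third pulse convolution block: 1x1 fusion over the channel-stacked features
        self.fuse = PulseConvBlock(2 * c, c, 1)
        self.xor = xor_module              # pulse residual (XOR), sketched further below

    def forward(self, f_img, f_pose):      # (T, N, c, H, W) and (T, N, 2c, H, W)
        x = self.img_branch(f_img)         # image features to be processed
        p = self.pose_branch(f_pose)       # pose features to be processed
        stacked = torch.cat([x, p], dim=2)       # stacking module: stack along channels
        fused = self.fuse(stacked)               # information fusion
        f_img_out = self.xor(fused, f_img)       # pulse residual with the module input
        f_pose_out = torch.cat([f_img_out, p], dim=2)  # second stacking module
        return f_img_out, f_pose_out
```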
In the embodiments of the invention, the image features and the pose features are processed through the first, second, and third pulse convolution blocks in the feature processing module, and under the guidance of the target pose image the fidelity of the finally generated target image features meets the requirement, so that an image whose fidelity meets the requirement can be generated and the fidelity of the generated image is improved.
Further, fusing the image features to be processed and the pose features to be processed through the third pulse convolution block to obtain the fused image features is analyzed in detail as follows:
stacking the image features to be processed and the pose features to be processed along the channel dimension through the stacking module to obtain stacked features;
and performing information fusion on the stacked features through the third pulse convolution block to obtain the fused image features.
Specifically, the stacking module stacks the image features to be processed and the pose features to be processed along the channel dimension to obtain the stacked features. Further, the 1×1 pulse convolution block in the feature processing module performs information fusion on the stacked features to obtain the fused image features.
In the embodiments of the invention, the image features and the pose features are processed through the stacking module and the third pulse convolution block, and under the guidance of the target pose image the fidelity of the generated target image features meets the requirement, so that an image whose fidelity meets the requirement can be generated and the fidelity of the generated image is improved.
Further, obtaining the target image features based on the fused image features is analyzed in detail as follows:
performing pulse residual processing on the fused image features and the second image features through the XOR module to obtain the target image features.
It should be noted that, in the embodiments of the present invention, the pulse residual in the pulse network is implemented by the exclusive-OR (XOR) operation of the XOR module, and the XOR operation is implemented by two linear layers.
Specifically, the two inputs to the residual computation are first flattened separately into features of size M × T. The two are stacked to obtain a feature of size M × T × 2, which is passed through linear layers of sizes 2 × 2 and 2 × 1, respectively, to obtain the result of the XOR operation (of size M × T × 1); the result is then reshaped back to C × H × W × T.
Therefore, taking the t-th feature processing module as an example in the embodiments of the present invention, the module performs pulse residual processing on the fused image features and the second image features F_I^(t-1) through the XOR module to obtain the t-th image features F_I^t. Further, through the stacking module, the module stacks the t-th image features F_I^t along the channel dimension with the pose features to be processed (obtained by processing the second pose features through the second pulse convolution block) to obtain the t-th pose features F_P^t.
This proceeds in sequence until the T-th feature processing module determines the image features of its image branch as the target image features F_I^T.
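A hedged sketch of the XOR module as reconstructed above (two linear layers of sizes 2 × 2 and 2 × 1 applied to flattened spike pairs) follows. The interposed nonlinearity is an assumption, since XOR is not computable by a purely linear map:

```python
import torch
import torch.nn as nn

class XORModule(nn.Module):
    """Learned XOR over paired spike values, used as the pulse residual."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 2)     # reconstructed layer sizes: 2 -> 2
        self.fc2 = nn.Linear(2, 1)     # then 2 -> 1
        self.act = nn.ReLU()           # assumed nonlinearity between the two linear layers

    def forward(self, a, b):           # a, b: (T, N, C, H, W) spike tensors
        shape = a.shape
        pairs = torch.stack([a.reshape(-1), b.reshape(-1)], dim=-1)  # (M*T, 2) value pairs
        out = self.fc2(self.act(self.fc1(pairs)))                    # (M*T, 1) learned XOR
        return out.reshape(shape)                                    # back to (T, N, C, H, W)
```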
In the embodiments of the invention, pulse residual processing is performed on the fused image features and the second image features through the XOR module, and under the guidance of the target pose image the fidelity of the generated target image features meets the requirement, so that an image whose fidelity meets the requirement can be generated and the fidelity of the generated image is improved.
Further, the decoding of the target image features by the pulse decoder to generate the target image of the target object, described in S104, is analyzed in detail as follows:
performing a weighted summation over the membrane potentials of the target image features at the T time steps through the last layer of the pulse decoder to generate the target image.
Specifically, the last layer of the pulse decoder performs a weighted summation over the membrane potentials of the target image features F_I^T at the T time steps to generate the target image.
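For illustration, this final readout could be sketched as a learnable weighted sum over the time dimension; the use of one learnable weight per time step is an assumption:

```python
import torch
import torch.nn as nn

class MembraneReadout(nn.Module):
    """Final decoder layer: weighted sum of membrane potentials over T time steps."""
    def __init__(self, T):
        super().__init__()
        self.w = nn.Parameter(torch.full((T,), 1.0 / T))  # one learnable weight per step

    def forward(self, v):              # v: (T, N, C, H, W) membrane potentials
        return torch.einsum('t,tnchw->nchw', self.w, v)   # (N, C, H, W) generated image
```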
In the embodiments of the invention, taking a weighted sum of the membrane potentials over the T time steps allows an image whose fidelity meets the requirement to be generated, improving the fidelity of the generated image.
Further, the overall procedure of the present invention is analyzed as follows:
after a first pulse encoder of an image branch receives T input original images, the T original images are encoded to obtain original image characteristicsFurthermore, after receiving the input T attitude image pairs, the second pulse encoder of the attitude branch performs encoding processing on the T attitude image pairs to obtain original attitude characteristics
After feature processing module 1 receives the original image features F_I^0 and the original pose features F_P^0, it processes F_I^0 through the first pulse convolution block (two 3×3 pulse convolution blocks) to obtain the first image features to be processed, and processes F_P^0 through the second pulse convolution block (two 3×3 pulse convolution blocks) to obtain the first pose features to be processed. Further, through the stacking module, module 1 stacks the first image features to be processed and the first pose features to be processed along the channel dimension to obtain the stacked features. Further, through the third pulse convolution block (a 1×1 pulse convolution block), module 1 performs information fusion on the stacked features to obtain the fused image features. Further, through the XOR module, module 1 performs pulse residual processing on the fused image features and the original image features F_I^0 to obtain the first image features F_I^1. Further, through the stacking module, module 1 stacks F_I^1 with the first pose features to be processed along the channel dimension to obtain the first pose features F_P^1.
Further, feature processing module 2 takes the first image features F_I^1 and the first pose features F_P^1 output by feature processing module 1 as input.
After feature processing module 2 receives F_I^1 and F_P^1, it processes F_I^1 through the first pulse convolution block (two 3×3 pulse convolution blocks) to obtain the second image features to be processed, and processes F_P^1 through the second pulse convolution block (two 3×3 pulse convolution blocks) to obtain the second pose features to be processed. Further, through the stacking module, module 2 stacks the second image features to be processed and the second pose features to be processed along the channel dimension to obtain the stacked features. Further, through the third pulse convolution block (a 1×1 pulse convolution block), module 2 performs information fusion on the stacked features to obtain the fused image features. Further, through the XOR module, module 2 performs pulse residual processing on the fused image features and F_I^1 to obtain the second image features F_I^2. Further, through the stacking module, module 2 stacks F_I^2 with the second pose features to be processed along the channel dimension to obtain the second pose features F_P^2.
Feature processing modules 3 through T-1 proceed in the same manner in sequence.
Feature processing module T takes the (T-1)-th image features F_I^(T-1) and the (T-1)-th pose features F_P^(T-1) output by feature processing module T-1 as input.
After feature processing module T receives F_I^(T-1) and F_P^(T-1), it processes F_I^(T-1) through the first pulse convolution block (two 3×3 pulse convolution blocks) to obtain the T-th image features to be processed, and processes F_P^(T-1) through the second pulse convolution block (two 3×3 pulse convolution blocks) to obtain the T-th pose features to be processed. Further, through the stacking module, module T stacks the T-th image features to be processed and the T-th pose features to be processed along the channel dimension to obtain the stacked features. Further, through the third pulse convolution block (a 1×1 pulse convolution block), module T performs information fusion on the stacked features to obtain the fused image features. Further, through the XOR module, module T performs pulse residual processing on the fused image features and F_I^(T-1) to obtain the T-th image features F_I^T, which are determined as the target image features.
The target loss function in the embodiments of the invention is L_full, composed of an L1 loss L_1, a perceptual loss L_per, and a generative adversarial loss L_GAN; that is, L_full = argmin_G max_D (λ_1·L_GAN + λ_2·L_1 + λ_3·L_per).
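A sketch of this objective from the generator's side is given below for illustration; the non-saturating GAN term, the feature network used for the perceptual loss, and the weight values lam1 through lam3 are assumptions, not values given by the disclosure:

```python
import torch
import torch.nn.functional as F

def full_loss(fake, real, d_fake_logits, feat_net, lam1=1.0, lam2=10.0, lam3=10.0):
    """Generator-side objective combining GAN, L1, and perceptual terms."""
    l_gan = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))    # adversarial term (fool D)
    l_1 = F.l1_loss(fake, real)                           # pixel-wise L1 term
    l_per = F.l1_loss(feat_net(fake), feat_net(real))     # perceptual term in feature space
    return lam1 * l_gan + lam2 * l_1 + lam3 * l_per
```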
Further, the image generation apparatus based on the full pulse network provided by the present invention is described below; the apparatus and the image generation method based on the full pulse network described above may be referred to in correspondence with each other.
As shown in fig. 4, fig. 4 is a structural diagram of the image generation apparatus based on the full pulse network according to the present invention, where the full pulse network includes a first pulse encoder, a second pulse encoder, a feature processor, and a pulse decoder;
the image generation device based on the full pulse network comprises:
an encoding unit 401, configured to encode at least one original image through the first pulse encoder to obtain original image features;
a first processing unit 402, configured to process each original pose image and each target pose image through the second pulse encoder to obtain original pose features, wherein the target pose images and the original pose images are pose images of the target object acquired at different sampling times;
a second processing unit 403, configured to process the original image features and the original pose features through the feature processor to obtain target image features;
a decoding unit 404, configured to decode the target image features through the pulse decoder to generate a target image of the target object;
wherein the at least one original image is obtained by copying an image to be processed, and the original pose image is the pose image corresponding to the original image.
Further, the second processing unit 403 is further configured to:
process the original image features and the original pose features through the first feature processing module to obtain first image features and first pose features;
process the first image features and the first pose features through the second feature processing module to obtain second image features and second pose features;
and process the second image features and the second pose features through the third feature processing module to obtain the target image features.
Further, the second processing unit 403 is further configured to:
process the second image features through the first pulse convolution block to obtain image features to be processed;
process the second pose features through the second pulse convolution block to obtain pose features to be processed;
fuse the image features to be processed and the pose features to be processed through the third pulse convolution block to obtain fused image features;
and obtain the target image features based on the fused image features.
Further, the second processing unit 403 is further configured to:
stack the image features to be processed and the pose features to be processed along the channel dimension through the stacking module to obtain stacked features;
and perform information fusion on the stacked features through the third pulse convolution block to obtain the fused image features.
Further, the second processing unit 403 is further configured to:
perform pulse residual processing on the fused image features and the second image features through the XOR module to obtain the target image features.
Further, the decoding unit 404 is further configured to:
perform a weighted summation over the membrane potentials of the target image features at the T time steps through the last layer of the pulse decoder to generate the target image.
The specific embodiment of the image generating apparatus based on the full pulse network provided by the present invention is basically the same as each embodiment of the image generating method based on the full pulse network, and details are not described herein.
Fig. 5 illustrates the physical structure of an electronic device. As shown in fig. 5, the electronic device may include: a processor 510, a communication interface 520, a memory 530, and a communication bus 540, where the processor 510, the communication interface 520, and the memory 530 communicate with each other via the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform the full pulse network-based image generation method, which comprises:
encoding at least one original image through the first pulse encoder to obtain original image features;
processing each original pose image and each target pose image through the second pulse encoder to obtain original pose features, wherein the target pose images and the original pose images are pose images of the target object acquired at different sampling times;
processing the original image features and the original pose features through the feature processor to obtain target image features;
decoding the target image features through the pulse decoder to generate a target image of the target object;
wherein the at least one original image is obtained by copying an image to be processed, and the original pose image is the pose image corresponding to the original image.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium; the computer program comprises program instructions which, when executed by a computer, enable the computer to execute the full pulse network-based image generation method provided above, the method comprising:
encoding at least one original image through the first pulse encoder to obtain original image features;
processing each original pose image and each target pose image through the second pulse encoder to obtain original pose features, wherein the target pose images and the original pose images are acquired at different sampling times;
processing the original image features and the original pose features through the feature processor to obtain target image features;
decoding the target image features through the pulse decoder to generate a target image of the target object;
wherein the at least one original image is obtained by copying an image to be processed, and the original pose image is the pose image corresponding to the original image.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the full pulse network-based image generation method provided above, the method comprising:
encoding at least one original image through the first pulse encoder to obtain original image features;
processing each original pose image and each target pose image through the second pulse encoder to obtain original pose features, wherein the target pose images and the original pose images are acquired at different sampling times;
processing the original image features and the original pose features through the feature processor to obtain target image features;
decoding the target image features through the pulse decoder to generate a target image of the target object;
wherein the at least one original image is obtained by copying an image to be processed, and the original pose image is the pose image corresponding to the original image.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. An image generation method based on a full pulse network, characterized in that the full pulse network comprises a first pulse encoder, a second pulse encoder, a feature processor and a pulse decoder;
the image generation method based on the full pulse network comprises the following steps:
encoding at least one original image through the first pulse encoder to obtain original image features;
processing each original pose image and each target pose image through the second pulse encoder to obtain original pose features, wherein the target pose images and the original pose images are pose images of a target object acquired at different sampling moments;
processing the original image features and the original pose features through the feature processor to obtain target image features;
decoding the target image features through the pulse decoder to generate a target image of the target object;
wherein the at least one original image is obtained by copying an image to be processed, and each original pose image is the pose image corresponding to the original image.
2. The full pulse network-based image generation method of claim 1, wherein the feature processor comprises a plurality of identical feature processing modules cascaded in sequence.
3. The full pulse network-based image generation method according to claim 2, wherein the plurality of feature processing modules comprises a first feature processing module, a second feature processing module and a third feature processing module;
the processing of the original image features and the original pose features through the feature processor to obtain target image features comprises:
processing the original image features and the original pose features through the first feature processing module to obtain first image features and first pose features;
processing the first image features and the first pose features through the second feature processing module to obtain second image features and second pose features;
and processing the second image features and the second pose features through the third feature processing module to obtain the target image features.
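To make the cascade of claim 3 concrete, the following hedged sketch chains three instances of one module: the first two hand an (image features, pose features) pair forward, and the third yields the target image features. `FeatureProcessingModule` is a placeholder here; a guess at its interior, matching claims 4 to 6, is sketched after claim 6 below.

```python
import torch
import torch.nn as nn

class FeatureProcessingModule(nn.Module):
    """Placeholder for one of the three identical modules; a guess at its
    interior appears after claim 6. Here each stream merely passes a conv."""
    def __init__(self, c):
        super().__init__()
        self.img_conv = nn.Conv2d(c, c, 3, padding=1)
        self.pose_conv = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, img_f, pose_f):          # (N, C, H, W) each, for brevity
        return self.img_conv(img_f), self.pose_conv(pose_f)

m1, m2, m3 = (FeatureProcessingModule(16) for _ in range(3))
img_f, pose_f = torch.rand(4, 16, 32, 32), torch.rand(4, 16, 32, 32)
img_f, pose_f = m1(img_f, pose_f)     # first image / pose features
img_f, pose_f = m2(img_f, pose_f)     # second image / pose features
target_f, _ = m3(img_f, pose_f)       # third module yields the target image features
```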
4. The full pulse network-based image generation method according to claim 3, wherein each of the feature processing modules comprises a first pulse convolution block, a second pulse convolution block and a third pulse convolution block;
the processing of the second image features and the second pose features through the third feature processing module to obtain the target image features comprises:
processing the second image features through the first pulse convolution block to obtain image features to be processed;
processing the second pose features through the second pulse convolution block to obtain pose features to be processed;
fusing the image features to be processed and the pose features to be processed through the third pulse convolution block to obtain fused image features;
and obtaining the target image features based on the fused image features.
5. The full pulse network-based image generation method according to claim 4, wherein each of the feature processing modules further comprises a stacking module;
the fusing of the image features to be processed and the pose features to be processed through the third pulse convolution block to obtain fused image features comprises:
stacking, through the stacking module, the image features to be processed and the pose features to be processed along the channel dimension to obtain stacked features;
and performing information fusion on the stacked features through the third pulse convolution block to obtain the fused image features.
6. The full pulse network-based image generation method according to claim 4, wherein each of the feature processing modules further comprises an XOR module;
the obtaining of the target image features based on the fused image features comprises:
performing pulse residual processing on the fused image features and the second image features through the XOR module to obtain the target image features.
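Read together, claims 4 to 6 describe one module's interior: the first and second pulse convolution blocks refine the image and pose streams, the stacking module concatenates the two results along the channel axis, the third pulse convolution block fuses the stack, and the XOR module forms a pulse residual against the incoming image features. The sketch below is hedged accordingly: reading "pulse residual processing" as an element-wise XOR on binary spike tensors, the spiking counterpart of an additive skip connection, is an interpretation, as is the toy LIF unit.

```python
import torch
import torch.nn as nn

def lif(x, tau=2.0, v_th=1.0):
    """Toy LIF: turn a (T, B, C, H, W) current sequence into binary spikes."""
    v, out = torch.zeros_like(x[0]), []
    for t in range(x.shape[0]):
        v = v + (x[t] - v) / tau
        s = (v >= v_th).float()
        v = v * (1.0 - s)
        out.append(s)
    return torch.stack(out)

class PulseConvBlock(nn.Module):
    """Per-time-step convolution followed by the spiking nonlinearity."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)

    def forward(self, x):                          # x: (T, B, C, H, W)
        T, B = x.shape[:2]
        return lif(self.conv(x.flatten(0, 1)).unflatten(0, (T, B)))

class FeatureProcessingModule(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.img_block = PulseConvBlock(c, c)      # first pulse convolution block
        self.pose_block = PulseConvBlock(c, c)     # second pulse convolution block
        self.fuse_block = PulseConvBlock(2 * c, c) # third pulse convolution block

    def forward(self, img_feat, pose_feat):
        a = self.img_block(img_feat)               # image features to be processed
        b = self.pose_block(pose_feat)             # pose features to be processed
        stacked = torch.cat([a, b], dim=2)         # stacking module: channel concat
        fused = self.fuse_block(stacked)           # fused image features
        # XOR module: with 0/1 spike tensors, element-wise XOR against the
        # incoming image features plays the role of a residual connection
        # while keeping the result binary for downstream spiking layers.
        out = torch.logical_xor(fused.bool(), img_feat.bool()).float()
        return out, b

spikes = lambda: torch.rand(4, 1, 16, 32, 32).round()   # dummy binary spike trains
out_feat, next_pose = FeatureProcessingModule(16)(spikes(), spikes())
```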
7. The full pulse network-based image generation method according to any one of claims 1 to 6, wherein the target image features are image features spanning T time steps;
the decoding of the target image features through the pulse decoder to generate the target image of the target object comprises:
computing, at the last layer of the pulse decoder, a weighted sum of the membrane potentials of the target image features over the T time steps to generate the target image.
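The readout of claim 7 can be pictured as the last decoder layer integrating a membrane potential at each of the T time steps without firing or resetting, then collapsing the sequence to one real-valued image through a weighted sum over time. In this minimal sketch the learned per-step weights and the final sigmoid squashing are assumptions:

```python
import torch
import torch.nn as nn

class WeightedMembraneReadout(nn.Module):
    """Assumed last decoder layer: leaky integration with no spike or reset,
    followed by a learned weighted sum of the T membrane potentials."""
    def __init__(self, T, tau=2.0):
        super().__init__()
        self.tau = tau
        self.w = nn.Parameter(torch.full((T,), 1.0 / T))  # one weight per time step

    def forward(self, x):                        # x: (T, B, C, H, W) input currents
        v, pots = torch.zeros_like(x[0]), []
        for t in range(x.shape[0]):
            v = v + (x[t] - v) / self.tau        # integrate, never reset
            pots.append(v)
        pots = torch.stack(pots)                 # (T, B, C, H, W) membrane potentials
        img = (self.w.view(-1, 1, 1, 1, 1) * pots).sum(dim=0)
        return torch.sigmoid(img)                # real-valued target image in (0, 1)

target = WeightedMembraneReadout(T=4)(torch.rand(4, 1, 3, 64, 64))  # -> (1, 3, 64, 64)
```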
8. An image generation device based on a full pulse network, characterized in that the full pulse network comprises a first pulse encoder, a second pulse encoder, a feature processor and a pulse decoder;
the image generation device based on the full pulse network comprises:
an encoding unit, configured to encode at least one original image through the first pulse encoder to obtain original image features;
a first processing unit, configured to process each original pose image and each target pose image through the second pulse encoder to obtain original pose features, wherein the target pose images and the original pose images are pose images of a target object acquired at different sampling moments;
a second processing unit, configured to process the original image features and the original pose features through the feature processor to obtain target image features;
a decoding unit, configured to decode the target image features through the pulse decoder to generate a target image of the target object;
wherein the at least one original image is obtained by copying an image to be processed, and each original pose image is the pose image corresponding to the original image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the full pulse network-based image generation method of any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the full pulse network-based image generation method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211170135.3A CN115631343A (en) | 2022-09-22 | 2022-09-22 | Image generation method, device and equipment based on full pulse network and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211170135.3A CN115631343A (en) | 2022-09-22 | 2022-09-22 | Image generation method, device and equipment based on full pulse network and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115631343A (en) | 2023-01-20 |
Family
ID=84901927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211170135.3A Pending CN115631343A (en) | 2022-09-22 | 2022-09-22 | Image generation method, device and equipment based on full pulse network and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115631343A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116989800A (*) | 2023-09-27 | 2023-11-03 | Anhui University | Mobile robot visual navigation decision-making method based on pulse reinforcement learning |
CN116989800B (*) | 2023-09-27 | 2023-12-15 | Anhui University | Mobile robot visual navigation decision-making method based on pulse reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108319932B (en) | | Multi-image face alignment method and device based on generative adversarial network |
KR102081854B1 (en) | | Method and apparatus for sign language or gesture recognition using 3D EDM |
CN110009691B (en) | | Parallax image generation method and system based on binocular stereo vision matching |
US20210334621A1 (en) | | Arithmetic processing system using hierarchical network |
CN110659573B (en) | | Face recognition method and device, electronic equipment and storage medium |
CN105981050B (en) | | Method and system for extracting facial features from facial image data |
CN110543901A (en) | | Image recognition method, device and equipment |
CN112200057B (en) | | Face living body detection method and device, electronic equipment and storage medium |
CN114511798B (en) | | Driver distraction detection method and device based on transformer |
US20220067888A1 (en) | | Image processing method and apparatus, storage medium, and electronic device |
CN114140831B (en) | | Human body posture estimation method and device, electronic equipment and storage medium |
CN110598601A (en) | | Face 3D key point detection method and system based on distributed heat maps |
CN113516133B (en) | | Multi-modal image classification method and system |
CN113344003B (en) | | Target detection method and device, electronic equipment and storage medium |
CN113781164B (en) | | Virtual fitting model training method, virtual fitting method and related devices |
CN112749603A (en) | | Living body detection method, living body detection device, electronic apparatus, and storage medium |
WO2023071806A1 (en) | | Apriori space generation method and apparatus, and computer device, storage medium, computer program and computer program product |
CN115019135A (en) | | Model training method, target detection method, device, electronic equipment and storage medium |
CN115631343A (en) | | Image generation method, device and equipment based on full pulse network and storage medium |
CN114978189A (en) | | Data coding method and related equipment |
CN118202388A (en) | | Attention-based depth point cloud compression method |
Lyu et al. | | Identifiability-guaranteed simplex-structured post-nonlinear mixture learning via autoencoder |
CN113079136B (en) | | Motion capture method, motion capture device, electronic equipment and computer-readable storage medium |
CN111127632B (en) | | Human modeling model acquisition method and device, electronic equipment and storage medium |
CN117392260A (en) | | Image generation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |