WO2019076188A1 - Image target recognition method, apparatus and computer device - Google Patents
Image target recognition method, apparatus and computer device
- Publication number: WO2019076188A1
- Application: PCT/CN2018/108301
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature
- sequence
- image
- horizontal
- feature sequence
- Prior art date: 2017-10-18
Classifications
- G06F18/00—Pattern recognition
- G06F18/24—Classification techniques
- G06F18/253—Fusion techniques of extracted features
- G06V10/72—Data preparation, e.g. statistical preprocessing of image or video features
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
- G06V30/18019—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by matching or filtering
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06V30/10—Character recognition
Description
- This application claims priority to Chinese Patent Application No. 201710969721.7, filed with the Chinese Patent Office on October 18, 2017 and entitled "Image target recognition method, apparatus and computer device", the entire contents of which are incorporated herein by reference.
- The present application relates to the field of machine vision technology, and in particular, to an image target recognition method, apparatus, and computer device.
- Image target recognition mainly refers to locating and identifying a pre-specified target in a given image.
- In the field of image processing, target recognition is mainly carried out by three methods: template matching, feature point matching, and deep learning.
- However, real recognition scenes contain many tilted, rotated, and curved targets. The above three methods perform target recognition based on feature information of a pre-specified target, and that feature information is usually limited to targets with a fixed posture; if the target in the image is tilted or deformed, these three methods cannot identify the target accurately.
- To address this problem, an image detection method capable of recognizing multi-angle targets has been proposed. Before target recognition, an angle recognizer judges the angle of the image; based on that angle, a target classifier generating device generates a target classifier for that angle, and the target classifier is used to identify the specified target from the image.
- However, if several targets in one image are tilted at different angles, a target classifier must be generated separately for each tilt angle, and the angle of each target must be judged in advance. Running these steps serially makes the operation slow, while running them in parallel requires a processor with highly efficient processing capability; the target recognition efficiency of this method is therefore low.
- An object of the embodiments of the present application is to provide an image object recognition method, apparatus, and computer device to improve the efficiency of target recognition.
- the specific technical solutions are as follows:
- In a first aspect, an embodiment of the present application provides an image target recognition method, where the method includes:
- Feature extraction is performed along the horizontal viewing direction and the vertical viewing direction of the image, respectively, and the horizontal feature sequence and the vertical feature sequence of the image are obtained correspondingly;
- the horizontal feature sequence and the vertical feature sequence are fused to obtain a fusion feature; the fusion feature is activated by using a preset activation function to obtain image features;
- the target in the image is identified by decoding the image features.
- Optionally, the performing feature extraction along the horizontal viewing direction and the vertical viewing direction of the image to obtain the horizontal feature sequence and the vertical feature sequence of the image includes: performing a convolution operation along the horizontal viewing direction of the image to obtain a first convolution result, and determining the first convolution result as a horizontal feature sequence; performing a convolution operation along the vertical viewing direction of the image to obtain a second convolution result, and determining the second convolution result as a vertical feature sequence.
- Optionally, before the fusing of the horizontal feature sequence and the vertical feature sequence to obtain a fused feature, the method further includes: performing a convolution operation on the image based on the horizontal feature sequence and the vertical feature sequence, and extracting deformation parameters representing the weights of the horizontal feature sequence and the vertical feature sequence, respectively, in the image deformation;
- the fusing of the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature includes: fusing the horizontal feature sequence and the vertical feature sequence in a weighted-sum manner according to the deformation parameters to obtain a fusion feature.
- Optionally, the performing feature extraction along the horizontal viewing direction and the vertical viewing direction of the image to obtain the horizontal feature sequence and the vertical feature sequence of the image includes: performing a convolution operation along the horizontal viewing direction of the image to obtain a first convolution result; reversely arranging the row vectors in the first convolution result to obtain a first reverse sequence; determining the first convolution result and the first reverse sequence as a horizontal feature sequence; performing a convolution operation along the vertical viewing direction of the image to obtain a second convolution result; reversely arranging the column vectors in the second convolution result to obtain a second reverse sequence; and determining the second convolution result and the second reverse sequence as a vertical feature sequence.
- Optionally, before the fusing of the horizontal feature sequence and the vertical feature sequence to obtain a fused feature, the method further includes: performing a convolution operation on the image based on the first convolution result and the first reverse sequence in the horizontal feature sequence and the second convolution result and the second reverse sequence in the vertical feature sequence, and extracting deformation parameters representing the weights of the first convolution result, the first reverse sequence, the second convolution result, and the second reverse sequence, respectively, in the image deformation;
- the fusing of the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature includes: fusing the first convolution result, the first reverse sequence, the second convolution result, and the second reverse sequence in a weighted-sum manner according to the deformation parameters to obtain a fusion feature.
- Optionally, the fusing of the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature includes: fusing the horizontal feature sequence and the vertical feature sequence in a splicing manner to obtain a fusion feature.
- In a second aspect, an embodiment of the present application provides an image target recognition apparatus, where the apparatus includes:
- a feature extraction module configured to perform feature extraction along a horizontal viewing direction and a vertical viewing direction of the image respectively, and correspondingly obtain a horizontal feature sequence and a vertical feature sequence of the image;
- a fusion module configured to fuse the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature
- An activation module configured to activate the fusion feature by using a preset activation function to obtain an image feature
- a decoding module configured to identify a target in the image by decoding the image feature.
- the feature extraction module is specifically configured to:
- perform a convolution operation along the horizontal viewing direction of the image to obtain a first convolution result; determine the first convolution result as a horizontal feature sequence; perform a convolution operation along the vertical viewing direction of the image to obtain a second convolution result; and determine the second convolution result as a vertical feature sequence.
- the device further includes:
- a first deformation parameter extraction module configured to perform a convolution operation on the image based on the horizontal feature sequence and the vertical feature sequence, and to extract deformation parameters representing the weights of the horizontal feature sequence and the vertical feature sequence, respectively, in the image deformation;
- the fusion module is specifically configured to:
- fuse the horizontal feature sequence and the vertical feature sequence in a weighted-sum manner according to the deformation parameters to obtain a fusion feature.
- the feature extraction module is specifically configured to:
- perform a convolution operation along the horizontal viewing direction of the image to obtain a first convolution result; reversely arrange the row vectors in the first convolution result to obtain a first reverse sequence; determine the first convolution result and the first reverse sequence as a horizontal feature sequence; perform a convolution operation along the vertical viewing direction of the image to obtain a second convolution result; reversely arrange the column vectors in the second convolution result to obtain a second reverse sequence; and determine the second convolution result and the second reverse sequence as a vertical feature sequence.
- the device further includes:
- a second deformation parameter extraction module configured to perform a convolution operation on the image based on the first convolution result and the first reverse sequence in the horizontal feature sequence and the second convolution result and the second reverse sequence in the vertical feature sequence, and to extract deformation parameters representing the weights of the first convolution result, the first reverse sequence, the second convolution result, and the second reverse sequence, respectively, in the image deformation;
- the fusion module is specifically configured to: fuse the first convolution result, the first reverse sequence, the second convolution result, and the second reverse sequence in a weighted-sum manner according to the deformation parameters to obtain a fusion feature.
- Optionally, the fusion module is specifically configured to: fuse the horizontal feature sequence and the vertical feature sequence in a splicing manner to obtain a fusion feature.
- In a third aspect, an embodiment of the present application provides a computer device, including a processor and a memory, where
- the memory is configured to store a computer program
- the processor is configured to implement the method steps described in the first aspect when executing the computer program stored on the memory.
- In the solutions provided by the embodiments of the present application, feature extraction is performed along the horizontal viewing direction and the vertical viewing direction of the image, so that the horizontal feature sequence and the vertical feature sequence of the image are obtained; the two sequences are then fused to obtain a fusion feature; finally, the image features obtained by activating the fusion feature are decoded to identify the target in the image. A tilted target has different components in different viewing directions; therefore, by extracting features along the different viewing directions and then fusing them, the complete feature information of the target under each viewing direction is obtained. The preset activation function then activates the fusion feature to yield image features that can be matched against a template, and the target is finally identified by decoding. This method does not need to judge the angle of a tilted target in advance, and therefore does not need to generate a separate target classifier for each angle; it simplifies the implementation of target recognition and, while maintaining recognition accuracy, improves recognition efficiency.
- FIG. 1 is a schematic diagram of a tilted target in a real-world target recognition scene;
- FIG. 2 is a schematic flowchart of an image target recognition method according to an embodiment of the present application;
- FIG. 3a is a schematic diagram of horizontal convolution of an image according to an embodiment of the present application;
- FIG. 3b is a schematic diagram of vertical convolution of an image according to an embodiment of the present application;
- FIG. 4 is another schematic flowchart of an image target recognition method according to an embodiment of the present application;
- FIG. 5 is a diagram showing an example of target recognition according to an embodiment of the present application;
- FIG. 6 is a schematic structural diagram of an image target recognition apparatus according to an embodiment of the present application;
- FIG. 7 is another schematic structural diagram of an image target recognition apparatus according to an embodiment of the present application;
- FIG. 8 is still another schematic structural diagram of an image target recognition apparatus according to an embodiment of the present application;
- FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
- In real target recognition scenes there are many tilted, rotated, perspective-distorted, and curved targets, such as inverted text, as shown in FIG. 1. Traditional target recognition methods are limited by fixed templates and cannot accurately identify such targets. Therefore, in order to recognize these targets accurately and to improve the efficiency of target recognition, the embodiments of the present application provide an image target recognition method, apparatus, and computer device.
- The image target recognition method provided by the embodiments of the present application is first introduced below.
- The execution subject of the image target recognition method provided by the embodiments of the present application may be a computer device with an image recognition function, or a camera with an image recognition function; the execution subject includes at least a core processing chip with data processing capability. The method may be implemented by at least one of software, a hardware circuit, and a logic circuit disposed in the execution subject.
- As shown in FIG. 2, an image target recognition method provided by an embodiment of the present application may include the following steps.
- S201: Perform feature extraction along the horizontal viewing direction and the vertical viewing direction of the image, respectively, to obtain the horizontal feature sequence and the vertical feature sequence of the image.
- For a given image, feature extraction can be performed separately along the two viewing directions, that is, along the horizontal viewing direction and the vertical viewing direction of the image.
- Feature extraction along the horizontal viewing direction of the image may be a convolution operation along that direction, called horizontal convolution, as shown in FIG. 3a; feature extraction along the vertical viewing direction may be a convolution operation along that direction, called vertical convolution, as shown in FIG. 3b. Of course, other feature extraction methods may also be used, as long as features along the horizontal viewing direction and features along the vertical viewing direction of the image can be extracted; these are not enumerated here.
- To further improve processing efficiency, feature extraction along the two viewing directions can be performed in parallel. Through feature extraction, the horizontal feature sequence and the vertical feature sequence of the image are obtained accordingly.
- The horizontal feature sequence is a feature sequence whose number of columns is effectively 1, and the vertical feature sequence is a feature sequence whose number of rows is effectively 1. Here, "effectively 1" does not mean that the number of columns or rows must equal 1; it only means that the sequence can be processed as a single whole.
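- As an illustration of S201, the following is a minimal Python sketch of the two convolution passes. The kernel sizes, the averaging kernel, and the collapsing of each result into an "effectively one column / one row" sequence are all assumptions for illustration, not the patent's actual network.

```python
import numpy as np
from scipy.signal import convolve2d

def horizontal_features(image, k=3):
    # Slide an assumed 1 x k averaging kernel along each row (horizontal
    # viewing direction), then collapse the width so the number of columns
    # is "effectively 1".
    kernel = np.ones((1, k)) / k
    conv = convolve2d(image, kernel, mode="valid")   # first convolution result
    return conv.mean(axis=1, keepdims=True)          # shape (H, 1)

def vertical_features(image, k=3):
    # Slide an assumed k x 1 kernel along each column (vertical viewing
    # direction), then collapse the height so the number of rows is
    # "effectively 1".
    kernel = np.ones((k, 1)) / k
    conv = convolve2d(image, kernel, mode="valid")   # second convolution result
    return conv.mean(axis=0, keepdims=True)          # shape (1, W)

image = np.random.rand(32, 32)      # stand-in for an input image
v = horizontal_features(image)      # horizontal feature sequence, shape (32, 1)
h = vertical_features(image)        # vertical feature sequence, shape (1, 32)
```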
- S202: Fuse the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature.
- Since the horizontal feature sequence and the vertical feature sequence are feature sets of the image under different viewing directions, and recognizing a target in the image requires the image's complete feature information, the two sequences need to be fused; the resulting fusion feature carries the complete feature information of the image.
- The feature fusion may be done by splicing (Concat), directly splicing the horizontal feature sequence and the vertical feature sequence together to obtain the fusion feature; by feature-point superposition (Eltwise Add), adding the corresponding feature values to obtain the fusion feature; or by a weighted sum based on weights representing the degree of deformation of the horizontal feature sequence and the vertical feature sequence. Of course, these three fusion methods are only examples; other fusion methods also fall within the protection scope of the embodiments of the present application and are not enumerated here.
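- The following minimal Python sketch shows the three fusion options named above on flattened sequences of equal length; the shapes and the example weights are assumptions for illustration.

```python
import numpy as np

v = np.random.rand(30)    # horizontal feature sequence (flattened)
h = np.random.rand(30)    # vertical feature sequence (flattened)

# Splicing fusion (Concat): the sequences are joined end to end.
fused_concat = np.concatenate([v, h])            # length 60

# Feature-point superposition (Eltwise Add): corresponding values are added.
fused_add = v + h                                # length 30

# Weighted-sum fusion: assumed deformation weights scale each sequence.
w = np.array([0.6, 0.4])
fused_weighted = w[0] * v + w[1] * h             # length 30
```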
- S203: Activate the fusion feature by using a preset activation function to obtain image features.
- The fusion feature obtained by fusion reflects the complete feature information of the image; however, the features of a single target are often scattered, so the fusion feature must be passed through the preset activation function, which preserves and maps out the features of each target to produce image features that can be matched against a template.
- the preset activation function may be a hyperbolic tanh function, an S-type growth curve Sigmoid function, a modified linear unit ReLU function, or the like in the nonlinear activation function, or may be other types of activation functions, which are not enumerated here.
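- As a minimal illustration, the activation step can be sketched as follows in Python; which function is the "preset" one is a configuration choice, and the input here is a stand-in.

```python
import numpy as np

def activate(fused, kind="tanh"):
    # The three candidate nonlinear activation functions named above.
    if kind == "tanh":      # hyperbolic tangent
        return np.tanh(fused)
    if kind == "sigmoid":   # S-shaped growth curve
        return 1.0 / (1.0 + np.exp(-fused))
    if kind == "relu":      # rectified linear unit
        return np.maximum(0.0, fused)
    raise ValueError(f"unknown activation: {kind}")

fused = np.random.rand(30)          # stand-in fusion feature
image_features = activate(fused)    # image features ready for decoding
```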
- S204: Identify the target in the image by decoding the image features.
- After the image features are obtained through activation, they can be decoded by an attached decoder, and different decoders can be used for different targets. For example, if the specified target to be recognized is text, an Attention decoder can be selected. Of course, a conventional target classifier can also be selected, identifying the category of the target by confidence.
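- A minimal sketch of the classifier-style decoding branch mentioned above: a linear layer followed by softmax yields per-class confidences, and the most confident class is taken as the recognized target. The weights below are random stand-ins, not a trained model, and the class count is assumed.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

num_classes = 10                       # assumed number of target categories
W = np.random.rand(num_classes, 30)    # stand-in classifier weights
b = np.zeros(num_classes)

image_features = np.random.rand(30)
confidences = softmax(W @ image_features + b)
target_class = int(np.argmax(confidences))   # category identified by confidence
```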
- By applying this embodiment, feature extraction is performed along the horizontal viewing direction and the vertical viewing direction of the image, so that the horizontal feature sequence and the vertical feature sequence of the image are obtained; the two sequences are then fused to obtain a fusion feature; finally, the image features obtained by activating the fusion feature are decoded to identify the target in the image. A tilted target has different components in different viewing directions; therefore, by extracting features along the different viewing directions and then fusing them, the complete feature information of the target under each viewing direction is obtained. The preset activation function then activates the fusion feature to yield image features that can be matched against a template, and the target is finally identified by decoding. This method does not need to judge the angle of a tilted target in advance, and therefore does not need to generate a separate target classifier for each angle; it simplifies the implementation of target recognition and, while maintaining recognition accuracy, improves recognition efficiency.
- Based on the embodiment shown in FIG. 2, an embodiment of the present application further provides another image target recognition method. As shown in FIG. 4, the method includes the following steps.
- S401: Perform feature extraction along the horizontal viewing direction and the vertical viewing direction of the image, respectively, to obtain the horizontal feature sequence and the vertical feature sequence of the image.
- In this embodiment, to improve the efficiency of feature extraction, convolution is used: a convolution operation is performed along the horizontal viewing direction of the image to obtain a first convolution result, which is determined as the horizontal feature sequence; a convolution operation is performed along the vertical viewing direction of the image to obtain a second convolution result, which is determined as the vertical feature sequence. That is, the horizontal feature sequence contains only the convolution result of convolving the row vectors of the image, and the vertical feature sequence contains only the convolution result of convolving the column vectors of the image.
- S402: Perform a convolution operation on the image based on the horizontal feature sequence and the vertical feature sequence, and extract deformation parameters representing the weights of the horizontal feature sequence and the vertical feature sequence, respectively, in the image deformation.
- For the obtained horizontal and vertical feature sequences, attributes such as the length and parameters of each sequence indicate the proportion of the features in that viewing direction, that is, the weight of that direction's feature sequence in the image deformation; by performing horizontal and vertical convolution operations on the image, the weight corresponding to each viewing direction can be extracted.
- S403: According to the deformation parameters, fuse the horizontal feature sequence and the vertical feature sequence in a weighted-sum manner to obtain a fusion feature.
- The deformation parameters represent the proportion of the feature sequence of each viewing direction in the image deformation; the feature sequences corresponding to the different viewing directions can therefore be fused by a weighted sum, and the resulting fusion feature reflects the degree of deformation in each viewing direction. For example, if feature extraction yields a horizontal feature sequence v and a vertical feature sequence h, and the deformation parameters obtained by the convolution operation are (0.6, 0.4), then each value of the fusion feature is 0.6 × v_ij + 0.4 × h_ij.
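- A minimal Python sketch of S402-S403 under stated assumptions: the patent's learned convolution for extracting the deformation parameters is replaced here by a simple per-direction statistic plus softmax, purely to make the weighted-sum fusion concrete.

```python
import numpy as np

def deformation_parameters(image):
    # Stand-in for the horizontal/vertical convolution the method uses to
    # extract per-direction weights: one statistic per viewing direction,
    # normalized so the weights sum to 1 (e.g. (0.6, 0.4)).
    stats = np.array([image.var(axis=1).mean(),   # horizontal-direction statistic
                      image.var(axis=0).mean()])  # vertical-direction statistic
    e = np.exp(stats)
    return e / e.sum()

image = np.random.rand(32, 32)
v = np.random.rand(30)                  # horizontal feature sequence
h = np.random.rand(30)                  # vertical feature sequence

w = deformation_parameters(image)       # deformation parameters, e.g. (0.6, 0.4)
fusion = w[0] * v + w[1] * h            # each value is w0 * v_ij + w1 * h_ij
```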
- S404: Activate the fusion feature by using the preset activation function to obtain image features.
- S405: Identify the target in the image by decoding the image features.
- S404 and S405 in this embodiment are the same as S203 and S204 in the embodiment shown in FIG. 2 and are not described again here.
- By applying this embodiment, feature extraction is performed along the horizontal viewing direction and the vertical viewing direction of the image, so that the horizontal feature sequence and the vertical feature sequence of the image are obtained; the two sequences are then fused to obtain a fusion feature; finally, the image features obtained by activating the fusion feature are decoded to identify the target in the image. A tilted target has different components in different viewing directions; therefore, by extracting features along the different viewing directions and then fusing them, the complete feature information of the target under each viewing direction is obtained. The preset activation function then activates the fusion feature to yield image features that can be matched against a template, and the target is finally identified by decoding. This method does not need to judge the angle of a tilted target in advance, and therefore does not need to generate a separate target classifier for each angle; it simplifies the implementation of target recognition and, while maintaining recognition accuracy, improves recognition efficiency. Moreover, by performing convolution operations on the basis of the horizontal feature sequence and the vertical feature sequence, deformation parameters representing the weights of the two sequences in the image deformation are obtained, and the features are fused by weighted sum, so that the fusion feature reflects more faithfully the degree of deformation of the target in the different viewing directions, which can further improve the accuracy of target recognition.
- To further ensure high recognition accuracy, the embodiment of the present application may also, after convolving the row vectors of the image along the horizontal viewing direction to obtain the first convolution result, reversely arrange the row vectors in the first convolution result to obtain a first reverse sequence; and, after convolving the column vectors of the image along the vertical viewing direction to obtain the second convolution result, reversely arrange the column vectors in the second convolution result to obtain a second reverse sequence. The first convolution result and the first reverse sequence are determined as the horizontal feature sequence, and the second convolution result and the second reverse sequence as the vertical feature sequence. Within these sequences, the forward and reverse arrangements of the convolution results represent the tilt of the target more intuitively.
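- A minimal Python sketch of the forward/reverse arrangement; whether "reversely arranging the row vectors" means reversing the element order within each row (as assumed below) or reversing the order of the rows themselves is not settled by this excerpt.

```python
import numpy as np

first_conv = np.random.rand(4, 6)       # result of the horizontal convolution
first_reverse = first_conv[:, ::-1]     # row vectors reversely arranged (assumed reading)

second_conv = np.random.rand(6, 4)      # result of the vertical convolution
second_reverse = second_conv[::-1, :]   # column vectors reversely arranged (assumed reading)

horizontal_sequence = (first_conv, first_reverse)  # horizontal feature sequence
vertical_sequence = (second_conv, second_reverse)  # vertical feature sequence
```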
- This embodiment is described below with reference to a specific example. FIG. 5 shows an example of target recognition according to an embodiment of the present application.
- In the first step, the row vectors of the image are convolved along the horizontal viewing direction of the image to obtain a first convolution result, as shown in FIG. 5; the row vectors in the first convolution result are reversely arranged to obtain the first reverse sequence, and the first convolution result and the first reverse sequence are determined as the horizontal feature sequence.
- In the second step, the column vectors of the image are convolved along the vertical viewing direction of the image to obtain a second convolution result, as shown in FIG. 5; the column vectors in the second convolution result are reversely arranged to obtain the second reverse sequence, and the second convolution result and the second reverse sequence are determined as the vertical feature sequence.
- In the third step, a convolution operation is performed on the image based on the first convolution result and the first reverse sequence in the horizontal feature sequence and the second convolution result and the second reverse sequence in the vertical feature sequence, extracting the deformation parameters α = (α1, β1, α2, β2)^T, which represent the weights of the first convolution result, the first reverse sequence, the second convolution result, and the second reverse sequence, respectively, in the image deformation.
- In the fourth step, the first convolution result, the first reverse sequence, the second convolution result, and the second reverse sequence are fused in a weighted-sum manner by formula (1) to obtain an n×1 fusion feature.
- In the fifth step, the fusion feature is activated by formula (2) using the preset hyperbolic tangent activation function to obtain the image feature h.
- In the sixth step, the target in the image is identified by decoding the image feature h.
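- A minimal end-to-end Python sketch of this example under stated assumptions: formulas (1) and (2) are not reproduced in this excerpt, so a plain weighted sum over the four sequences and a tanh stand in for them, and the α values are illustrative.

```python
import numpy as np

n = 30
# The four sequences: first conv result, first reverse sequence,
# second conv result, second reverse sequence (each flattened to length n).
sequences = np.random.rand(4, n)

# Assumed deformation parameters alpha = (a1, b1, a2, b2)^T.
alpha = np.array([0.4, 0.1, 0.4, 0.1])

fusion = alpha @ sequences     # stand-in for formula (1): n-vector fusion feature
h = np.tanh(fusion)            # stand-in for formula (2): image feature h
# h would then be handed to the decoder to identify the target.
```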
- With this solution, a tilted target has different components in different viewing directions; therefore, by extracting features along the different viewing directions and then fusing them, the complete feature information of the target under each viewing direction is obtained. The preset activation function then activates the fusion feature to yield image features that can be matched against a template, and the target is finally identified by decoding. This method does not need to judge the angle of a tilted target in advance, and therefore does not need to generate a separate target classifier for each angle; it simplifies the implementation of target recognition and, while maintaining recognition accuracy, improves recognition efficiency. Moreover, in the horizontal and vertical feature sequences, the forward and reverse arrangements of the convolution results represent the tilt of the target more intuitively; the deformation parameters obtained by convolution represent the weights of the sequences in the image deformation, and fusing the features by weighted sum allows the fusion feature to reflect more faithfully the degree of deformation of the target in the different viewing directions, further improving the accuracy of target recognition.
- Corresponding to the above method embodiments, an embodiment of the present application provides an image target recognition apparatus. As shown in FIG. 6, the apparatus may include:
- the feature extraction module 610 is configured to perform feature extraction along the horizontal viewing direction and the vertical viewing direction of the image, respectively, and obtain a horizontal feature sequence and a vertical feature sequence of the image.
- the fusion module 620 is configured to fuse the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature.
- the activation module 630 is configured to activate the fusion feature by using a preset activation function to obtain an image feature.
- the decoding module 640 is configured to identify a target in the image by decoding the image feature.
- By applying this embodiment, feature extraction is performed along the horizontal viewing direction and the vertical viewing direction of the image, so that the horizontal feature sequence and the vertical feature sequence of the image are obtained; the two sequences are then fused to obtain a fusion feature; finally, the image features obtained by activating the fusion feature are decoded to identify the target in the image. A tilted target has different components in different viewing directions; therefore, by extracting features along the different viewing directions and then fusing them, the complete feature information of the target under each viewing direction is obtained. The preset activation function then activates the fusion feature to yield image features that can be matched against a template, and the target is finally identified by decoding. This method does not need to judge the angle of a tilted target in advance, and therefore does not need to generate a separate target classifier for each angle; it simplifies the implementation of target recognition and, while maintaining recognition accuracy, improves recognition efficiency.
- Optionally, the fusion module 620 is specifically configured to fuse the horizontal feature sequence and the vertical feature sequence in a splicing manner to obtain a fusion feature.
- The image target recognition apparatus of this embodiment applies the image target recognition method of the embodiment shown in FIG. 2; all embodiments of that method are applicable to this apparatus, and all achieve the same or similar beneficial effects.
- Based on the embodiment shown in FIG. 6, an embodiment of the present application provides another image target recognition apparatus. As shown in FIG. 7, the apparatus may include:
- a feature extraction module 710 configured to perform a convolution operation along the horizontal viewing direction of the image to obtain a first convolution result; determine the first convolution result as a horizontal feature sequence; perform a convolution operation along the vertical viewing direction of the image to obtain a second convolution result; and determine the second convolution result as a vertical feature sequence;
- a first deformation parameter extraction module 720 configured to perform a convolution operation on the image based on the horizontal feature sequence and the vertical feature sequence, and to extract deformation parameters representing the weights of the horizontal feature sequence and the vertical feature sequence, respectively, in the image deformation;
- the fusion module 730 is configured to fuse the horizontal feature sequence and the vertical feature sequence in a weighted sum manner according to the deformation parameter to obtain a fusion feature.
- the activation module 740 is configured to activate the fusion feature by using a preset activation function to obtain an image feature.
- the decoding module 750 is configured to identify a target in the image by decoding the image feature.
- By applying this embodiment, feature extraction is performed along the horizontal viewing direction and the vertical viewing direction of the image, so that the horizontal feature sequence and the vertical feature sequence of the image are obtained; the two sequences are then fused to obtain a fusion feature; finally, the image features obtained by activating the fusion feature are decoded to identify the target in the image. A tilted target has different components in different viewing directions; therefore, by extracting features along the different viewing directions and then fusing them, the complete feature information of the target under each viewing direction is obtained. The preset activation function then activates the fusion feature to yield image features that can be matched against a template, and the target is finally identified by decoding. This method does not need to judge the angle of a tilted target in advance, and therefore does not need to generate a separate target classifier for each angle; it simplifies the implementation of target recognition and, while maintaining recognition accuracy, improves recognition efficiency. Moreover, by performing convolution operations on the basis of the horizontal feature sequence and the vertical feature sequence, deformation parameters representing the weights of the two sequences in the image deformation are obtained, and the features are fused by weighted sum, so that the fusion feature reflects more faithfully the degree of deformation of the target in the different viewing directions, which can further improve the accuracy of target recognition.
- Based on the embodiment shown in FIG. 6, an embodiment of the present application provides yet another image target recognition apparatus. As shown in FIG. 8, the apparatus may include:
- a feature extraction module 810 configured to perform a convolution operation along the horizontal viewing direction of the image to obtain a first convolution result; reversely arrange the row vectors in the first convolution result to obtain a first reverse sequence; determine the first convolution result and the first reverse sequence as a horizontal feature sequence; perform a convolution operation along the vertical viewing direction of the image to obtain a second convolution result; reversely arrange the column vectors in the second convolution result to obtain a second reverse sequence; and determine the second convolution result and the second reverse sequence as a vertical feature sequence;
- a second deformation parameter extraction module 820 configured to perform a convolution operation on the image based on the first convolution result and the first reverse sequence in the horizontal feature sequence and the second convolution result and the second reverse sequence in the vertical feature sequence, and to extract deformation parameters representing the weights of the first convolution result, the first reverse sequence, the second convolution result, and the second reverse sequence, respectively, in the image deformation;
- a fusion module 830 configured to fuse the first convolution result, the first reverse sequence, the second convolution result, and the second reverse sequence in a weighted-sum manner according to the deformation parameters to obtain a fusion feature;
- the activation module 840 is configured to activate the fusion feature by using a preset activation function to obtain an image feature.
- the decoding module 850 is configured to identify a target in the image by decoding the image feature.
- By applying this embodiment, a tilted target has different components in different viewing directions; therefore, by extracting features along the different viewing directions and then fusing them, the complete feature information of the target under each viewing direction is obtained. The preset activation function then activates the fusion feature to yield image features that can be matched against a template, and the target is finally identified by decoding. This method does not need to judge the angle of a tilted target in advance, and therefore does not need to generate a separate target classifier for each angle; it simplifies the implementation of target recognition and, while maintaining recognition accuracy, improves recognition efficiency. Moreover, in the horizontal and vertical feature sequences, the forward and reverse arrangements of the convolution results represent the tilt of the target more intuitively; the deformation parameters obtained by convolution represent the weights of the sequences in the image deformation, and fusing the features by weighted sum allows the fusion feature to reflect more faithfully the degree of deformation of the target in the different viewing directions, further improving the accuracy of target recognition.
- the embodiment of the present application provides a computer device, as shown in FIG. 9, including a processor 901 and a memory 902, wherein the memory 902 is configured to store a computer program;
- the processor 901 is configured to perform the following steps when executing the computer program stored on the memory 902:
- performing feature extraction along the horizontal viewing direction and the vertical viewing direction of the image, respectively, to obtain the horizontal feature sequence and the vertical feature sequence of the image; fusing the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature; activating the fusion feature by using a preset activation function to obtain image features; and identifying the target in the image by decoding the image features.
- Optionally, when implementing the step of performing feature extraction along the horizontal viewing direction and the vertical viewing direction of the image to obtain the horizontal feature sequence and the vertical feature sequence of the image, the processor 901 may specifically implement the following steps: performing a convolution operation along the horizontal viewing direction of the image to obtain a first convolution result; determining the first convolution result as a horizontal feature sequence; performing a convolution operation along the vertical viewing direction of the image to obtain a second convolution result; and determining the second convolution result as a vertical feature sequence.
- Optionally, the processor 901 may further implement the following steps: performing a convolution operation on the image based on the horizontal feature sequence and the vertical feature sequence, and extracting deformation parameters representing the weights of the horizontal feature sequence and the vertical feature sequence, respectively, in the image deformation.
- When implementing the step of fusing the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature, the processor 901 may specifically implement: fusing the horizontal feature sequence and the vertical feature sequence in a weighted-sum manner according to the deformation parameters to obtain a fusion feature.
- Optionally, when implementing the step of performing feature extraction along the horizontal viewing direction and the vertical viewing direction of the image to obtain the horizontal feature sequence and the vertical feature sequence of the image, the processor 901 may specifically implement the following steps: performing a convolution operation along the horizontal viewing direction of the image to obtain a first convolution result; reversely arranging the row vectors in the first convolution result to obtain a first reverse sequence; determining the first convolution result and the first reverse sequence as a horizontal feature sequence; performing a convolution operation along the vertical viewing direction of the image to obtain a second convolution result; reversely arranging the column vectors in the second convolution result to obtain a second reverse sequence; and determining the second convolution result and the second reverse sequence as a vertical feature sequence.
- Optionally, the processor 901 may further implement the following steps: performing a convolution operation on the image based on the first convolution result and the first reverse sequence in the horizontal feature sequence and the second convolution result and the second reverse sequence in the vertical feature sequence, and extracting deformation parameters representing the weights of the first convolution result, the first reverse sequence, the second convolution result, and the second reverse sequence, respectively, in the image deformation.
- When implementing the step of fusing the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature, the processor 901 may specifically implement: fusing the first convolution result, the first reverse sequence, the second convolution result, and the second reverse sequence in a weighted-sum manner according to the deformation parameters to obtain a fusion feature.
- Optionally, when implementing the step of fusing the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature, the processor 901 may specifically implement: fusing the horizontal feature sequence and the vertical feature sequence in a splicing manner to obtain a fusion feature.
- Data can be transmitted between the memory 902 and the processor 901 through a wired or wireless connection, and the computer device can communicate with other devices through a wired or wireless communication interface. It should be noted that FIG. 9 only shows an example in which the processor 901 and the memory 902 transmit data through a bus; this is not a limitation on the specific transmission mode.
- the above memory may include a RAM (Random Access Memory), and may also include an NVM (Non-volatile Memory), such as at least one disk storage.
- Optionally, the memory may also be at least one storage device located remotely from the foregoing processor.
- The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- In this embodiment, the processor of the computer device reads and runs the computer program stored in the memory, and the computer program, when run, executes the image target recognition method provided by the embodiments of the present application. It can thereby realize the following: a tilted target has different components in different viewing directions; therefore, by extracting features along the different viewing directions and then fusing them, the complete feature information of the target under each viewing direction is obtained; the preset activation function then activates the fusion feature to yield image features that can be matched against a template, and the target is finally identified by decoding. This method does not need to judge the angle of a tilted target in advance, and therefore does not need to generate a separate target classifier for each angle; it simplifies the implementation of target recognition and, while maintaining recognition accuracy, improves recognition efficiency.
- In addition, corresponding to the image target recognition method provided by the above embodiments, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when run, executes the image target recognition method provided by the embodiments of the present application.
- In this embodiment, the computer-readable storage medium stores a computer program that, when run, executes the image target recognition method provided by the embodiments of the present application. It can thereby realize the following: a tilted target has different components in different viewing directions; therefore, by extracting features along the different viewing directions and then fusing them, the complete feature information of the target under each viewing direction is obtained; the preset activation function then activates the fusion feature to yield image features that can be matched against a template, and the target is finally identified by decoding. This method does not need to judge the angle of a tilted target in advance, and therefore does not need to generate a separate target classifier for each angle; it simplifies the implementation of target recognition and, while maintaining recognition accuracy, improves recognition efficiency.
- In addition, corresponding to the image target recognition method provided by the above embodiments, an embodiment of the present application provides an application program for executing, at runtime, the image target recognition method provided by the embodiments of the present application.
- In this embodiment, the application program, when run, executes the image target recognition method provided by the embodiments of the present application. It can thereby realize the following: a tilted target has different components in different viewing directions; therefore, by extracting features along the different viewing directions and then fusing them, the complete feature information of the target under each viewing direction is obtained; the preset activation function then activates the fusion feature to yield image features that can be matched against a template, and the target is finally identified by decoding. This method does not need to judge the angle of a tilted target in advance, and therefore does not need to generate a separate target classifier for each angle; it simplifies the implementation of target recognition and, while maintaining recognition accuracy, improves recognition efficiency.
- Since the method content involved in the computer-readable storage medium and application program embodiments is basically similar to that of the foregoing method embodiments, their descriptions are relatively brief; for relevant details, refer to the description of the method embodiments.
Abstract
Embodiments of the present application provide an image target recognition method, apparatus, and computer device. The image target recognition method includes: performing feature extraction along the horizontal viewing direction and the vertical viewing direction of an image, respectively, to obtain a horizontal feature sequence and a vertical feature sequence of the image; fusing the horizontal feature sequence and the vertical feature sequence to obtain a fusion feature; activating the fusion feature by using a preset activation function to obtain image features; and identifying a target in the image by decoding the image features. This solution improves the efficiency of target recognition.
Description
本申请要求于2017年10月18日提交中国专利局、申请号为201710969721.7发明名称为“一种图像目标识别方法、装置及计算机设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及机器视觉技术领域,特别涉及一种图像目标识别方法、装置及计算机设备。
图像目标识别主要是指在给定的图像上定位识别出预先指定的目标。在图像处理领域,主要通过如下三种方法进行目标识别:模板匹配法、特征点匹配法和深度学习的方法。但是,在实际目标识别场景中,存在很多倾斜、旋转、弯曲的目标,上述三种方法均是基于预先指定的目标的特征信息进行目标识别,该指定的目标的特征信息往往限定为具有固定姿态的目标的特征信息,如果图像中的目标发生倾斜或者变形,利用上述三种方法则无法识别出准确的目标。
针对上述问题,相应的提出了能识别多角度目标的图像检测方法,在进行目标识别之前,利用角度辨识器判断图像的角度,基于该角度,利用目标分类器生成装置生成针对该角度的目标分类器,使用目标分类器从图像中识别指定的目标。
但是,如果一个图像中有多个目标的角度发生倾斜,则需要分别针对每个倾斜角度生成目标分类器,并且,需要对每个目标的角度预先进行判断,如果串行运行会导致运行速率较慢,如果并行运行则需要处理器具有高效的处理能力,因此,上述方法的目标识别的效率较低。
发明内容
本申请实施例的目的在于提供一种图像目标识别方法、装置及计算机设备,以提高目标识别的效率。具体技术方案如下:
第一方面,本申请实施例提供了一种图像目标识别方法,所述方法包括:
分别沿图像的水平视角方向和垂直视角方向进行特征提取,相应得到所述图像的横向特征序列和纵向特征序列;
将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征;
利用预设激活函数,对所述融合特征进行激活,得到图像特征;
通过对所述图像特征进行解码,识别所述图像中的目标。
可选的,所述分别沿图像的水平视角方向和垂直视角方向进行特征提取,相应得到所述图像的横向特征序列和纵向特征序列,包括:
沿图像的水平视角方向进行卷积操作,得到第一卷积结果;将所述第一卷积结果确定为横向特征序列;
沿所述图像的垂直视角方向进行卷积操作,得到第二卷积结果;将所述第二卷积结果确定为纵向特征序列。
可选的,在所述将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征之前,所述方法还包括:
基于所述横向特征序列及所述纵向特征序列,对所述图像进行卷积操作,提取用于表示所述横向特征序列及所述纵向特征序列分别在图像形变中所占权值的形变参数;
所述将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征,包括:
根据所述形变参数,以加权和的方式,将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征。
可选的,所述分别沿图像的水平视角方向和垂直视角方向进行特征提取,相应得到所述图像的横向特征序列和纵向特征序列,包括:
沿图像的水平视角方向进行卷积操作,得到第一卷积结果;
将所述第一卷积结果中的行向量进行反向排列,得到第一反向序列;
将所述第一卷积结果及所述第一反向序列确定为横向特征序列;
沿所述图像的垂直视角方向进行卷积操作,得到第二卷积结果;
将所述第二卷积结果中的列向量进行反向排列,得到第二反向序列;
将所述第二卷积结果及所述第二反向序列确定为纵向特征序列。
可选的,在所述将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征之前,所述方法还包括:
基于所述横向特征序列中的第一卷积结果、第一反向序列,以及所述纵向特征序列中的第二卷积结果、第二反向序列,对所述图像进行卷积操作,提取用于表示所述第一卷积结果、所述第一反向序列、所述第二卷积结果及所述第二反向序列分别在图像形变中所占权值的形变参数;
所述将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征,包括:
根据所述形变参数,以加权和的方式,将所述第一卷积结果、所述第一反向序列、所述第二卷积结果及所述第二反向序列进行融合,得到融合特征。
可选的,所述将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征,包括:
以拼接融合的方式,将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征。
第二方面,本申请实施例提供了一种图像目标识别装置,所述装置包括:
特征提取模块,用于分别沿图像的水平视角方向和垂直视角方向进行特征提取,相应得到所述图像的横向特征序列和纵向特征序列;
融合模块,用于将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征;
激活模块,用于利用预设激活函数,对所述融合特征进行激活,得到图像特征;
解码模块,用于通过对所述图像特征进行解码,识别所述图像中的目标。
可选的,所述特征提取模块,具体用于:
沿图像的水平视角方向进行卷积操作,得到第一卷积结果;将所述第一卷积结果确定为横向特征序列;
沿所述图像的垂直视角方向进行卷积操作,得到第二卷积结果;将所述第二卷积结果确定为纵向特征序列。
可选的,所述装置还包括:
第一形变参数提取模块,用于基于所述横向特征序列及所述纵向特征序列,对所述图像进行卷积操作,提取用于表示所述横向特征序列及所述纵向特征序列分别在图像形变中所占权值的形变参数;
所述融合模块,具体用于:
根据所述形变参数,以加权和的方式,将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征。
可选的,所述特征提取模块,具体用于:
沿图像的水平视角方向进行卷积操作,得到第一卷积结果;
将所述第一卷积结果中的行向量进行反向排列,得到第一反向序列;
将所述第一卷积结果及所述第一反向序列确定为横向特征序列;
沿所述图像的垂直视角方向进行卷积操作,得到第二卷积结果;
将所述第二卷积结果中的列向量进行反向排列,得到第二反向序列;
将所述第二卷积结果及所述第二反向序列确定为纵向特征序列。
可选的,所述装置还包括:
第二形变参数提取模块,用于基于所述横向特征序列中的第一卷积结果、第一反向序列,以及所述纵向特征序列中的第二卷积结果、第二反向序列,对所述图像进行卷积操作,提取用于表示所述第一卷积结果、所述第一反向序列、所述第二卷积结果及所述第二反向序列分别在图像形变中所占权值的形变参数;
所述融合模块,具体用于:
根据所述形变参数,以加权和的方式,将所述第一卷积结果、所述第一反向序列、所述第二卷积结果及所述第二反向序列进行融合,得到融合特征。
可选的,所述融合模块,具体用于:
以拼接融合的方式,将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征。
第三方面,本申请实施例提供了一种计算机设备,包括处理器和存储器,其中,
所述存储器,用于存放计算机程序;
所述处理器,用于执行所述存储器上所存放的计算机程序时,实现如第一方面所述的方法步骤。
综上可见,本申请实施例提供的方案中,基于沿图像的水平视角方向和垂直视角方向分别进行特征提取,能够得到图像的横向特征序列和纵向特征序列,然后将横向特征序列和纵向特征序列进行融合,得到融合特征,最后,通过对融合特征进行激活后得到的图像特征进行解码,识别图像中的目标。针对发生倾斜的目标,目标在不同视角方向上的分量不同,因此,通过沿不同视角进行特征提取,然后经过特征融合的方式,得到目标在各视角下的完整的特征信息,再利用预设激活函数,将融合特征进行激活,得到能够与模板匹配的图像特征,最后通过解码即可识别目标,此方法无需提前对发生倾斜的目标进行角度判断,也就不需要针对不同角度的目标生成各种目标分类器,简化了目标识别的实现步骤,在保证目标识别的准确率的基础上,提高了目标识别的效率。
为了更清楚地说明本申请实施例和现有技术的技术方案,下面对实施例和现有技术中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为现实的目标识别场景中目标发生倾斜的示意图;
图2为本申请实施例的图像目标识别方法的一种流程示意图;
图3a为本申请实施例的图像横向卷积示意图;
图3b为本申请实施例的图像纵向卷积示意图;
图4为本申请实施例的图像目标识别方法的另一种流程示意图;
图5为本申请实施例的目标识别示例图;
图6为本申请实施例的图像目标识别装置的一种结构示意图;
图7为本申请实施例的图像目标识别装置的另一种结构示意图;
图8为本申请实施例的图像目标识别装置的再一种结构示意图;
图9为本申请实施例的计算机设备的结构示意图。
为使本申请的目的、技术方案、及优点更加清楚明白,以下参照附图并举实施例,对本申请进一步详细说明。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
下面通过具体实施例,对本申请进行详细的说明。
在现实的目标识别场景中,存在很多倾斜、旋转、透视及弯曲的目标,例如倒置的文字等,如图1所示。传统的目标识别方法受限于模板的固定,无法准确识别出发生倾斜、旋转、透视或者弯曲的目标。因此,为了能够准确识别出上述目标,并且提高目标识别的效率,本申请实施例提供了一种图像目标识别方法、装置及计算机设备。下面,首先对本申请实施例所提供的图像目标识别方法进行介绍。
本申请实施例所提供的一种图像目标识别方法的执行主体可以为具有图像识别功能的计算机设备,也可以为具有图像识别功能的摄像机,执行主体中至少包括具有数据处理能力的核心处理芯片。实现本申请实施例所提供的一种图像目标识别方法的方式可以为设置于执行主体中的软件、硬件电路和 逻辑电路的至少一种方式。
如图2所示,为本申请实施例所提供的一种图像目标识别方法,该图像目标识别方法可以包括如下步骤。
S201,分别沿图像的水平视角方向和垂直视角方向进行特征提取,相应得到图像的横向特征序列和纵向特征序列。
对于一图像,可以沿两个视角方向分别进行特征提取,即分别沿图像的水平视角方向和垂直视角方向进行特征提取。沿图像的水平视角方向进行特征提取的方式,可以是沿图像的水平视角方向进行卷积操作,称之为横向卷积,如图3a所示;沿图像的垂直视角方向进行特征提取的方式,可以是沿图像的垂直视角方向进行卷积操作,称之为纵向卷积,如图3b所示。当然,特征提取的方式还可以为其他特征提取的方式,可以提取到沿图像的水平视角方向的特征以及沿图像的垂直视角方向的特征即可,这里不再一一赘述。
为了能够进一步提高处理效率,沿两个视角方向分别进行特征提取可以是并行执行的。通过特征提取,相应地可以得到图像的横向特征序列和纵向特征序列。其中,横向特征序列为列数等效为1的特征序列,纵向特征序列为行数等效为1的特征序列,在这里,列数或者行数等效为1,并不表示列数或者行数一定等于1,仅表示可以作为一个整体的数据进行处理。
S202,将横向特征序列及纵向特征序列进行融合,得到融合特征。
由于横向特征序列及纵向特征序列是图像在不同视角下的特征集合,对图像中的目标进行识别,需要图像完整的特征信息,因此,需要对横向特征序列及纵向特征序列进行融合,得到的融合特征具有图像完整的特征信息。特征融合可以是通过拼接融合Concat的方式,将横向特征序列和纵向特征序列直接拼接到一起,得到融合特征;也可以是通过特征点叠加Eltwise Add的方式,将对应的特征值相加,得到融合特征;还可以是基于用于表示横向特征序列和纵向特征序列的形变程度的权值,通过加权和的方式,得到融合特征。当然这三种特征融合的方式只是举例说明,特征融合的方式不仅限于此,其他特征融合的方式均属于本申请实施例的保护范围,这里不再一一赘述。
S203,利用预设激活函数,对融合特征进行激活,得到图像特征。
通过融合得到的融合特征可以体现图像完整的特征信息,但是,同一个目标的特征往往较为分散,需要将融合特征通过预设激活函数把各目标的特征保留并映射出来,得到图像特征,所得到的图像特征可以与模板匹配。其中,预设激活函数可以为非线性激活函数中的双曲线tanh函数、S型生长曲线Sigmoid函数、修正线性单元ReLU函数等,也可以为其他类别的激活函数,这里不再一一列举。
S204,通过对图像特征进行解码,识别图像中的目标。
在通过激活得到图像特征后,可以通过后接解码器,对图像特征进行解码,针对不同的目标可以采用不同的解码器,例如,需要识别的指定目标为文字,则可以选择Attention解码器。当然,也可以选择常规的目标分类器,通过置信度对目标的类别进行识别。
应用本实施例,基于沿图像的水平视角方向和垂直视角方向分别进行特征提取,能够得到图像的横向特征序列和纵向特征序列,然后将横向特征序列和纵向特征序列进行融合,得到融合特征,最后,通过对融合特征进行激活后得到的图像特征进行解码,识别图像中的目标。针对发生倾斜的目标,目标在不同视角方向上的分量不同,因此,通过沿不同视角进行特征提取,然后经过特征融合的方式,得到目标在各视角下的完整的特征信息,再利用预设激活函数,将融合特征进行激活,得到能够与模板匹配的图像特征,最后通过解码即可识别目标,此方法无需提前对发生倾斜的目标进行角度判断,也就不需要针对不同角度的目标生成各种目标分类器,简化了目标识别的实现步骤,在保证目标识别的准确率的基础上,提高了目标识别的效率。
基于图2所示实施例,本申请实施例还提供了另一种图像目标识别方法,如图4所示,该图像目标识别方法包括如下步骤。
S401,分别沿图像的水平视角方向和垂直视角方向进行特征提取,相应得到图像的横向特征序列和纵向特征序列。
本实施例中,为了提高特征提取的效率,采用卷积运算,沿图像的水平视角方向进行卷积操作,得到第一卷积结果,并将第一卷积结果确定为横向 特征序列;沿图像的垂直视角方向进行卷积操作,得到第二卷积结果,并将第二卷积结果确定为纵向特征序列。也就是说,横向特征序列中仅包含对图像的行向量进行卷积操作得到的卷积结果,纵向特征序列中仅包括对图像的列向量进行卷积操作得到的卷积结果。
S402,基于横向特征序列及纵向特征序列,对图像进行卷积操作,提取用于表示横向特征序列及纵向特征序列分别在图像形变中所占权值的形变参数。
针对得到的横向特征序列和纵向特征序列,特征序列的长度、参数等属性,说明了该视角方向的特征所占比重,即该视角方向的特征序列在图像形变中所占权值,通过对图像进行横纵卷积操作,可以提取到各视角方向对应的权值。
S403,根据形变参数,以加权和的方式,将横向特征序列及纵向特征序列进行融合,得到融合特征。
在得到形变参数之后,形变参数表示了各视角方向的特征序列在图像形变中所占比重,则可以通过加权和的方式,将不同视角方向对应的特征序列进行融合,融合后得到的融合特征可以反映各视角方向的形变程度。例如,通过特征提取,得到的横向特征序列为v、纵向特征序列为h,通过卷积操作得到的形变参数为(0.6,0.4),则融合特征中每个特征值为0.6×v
ij+0.4×h
ij。
S404,利用预设激活函数,对融合特征进行激活,得到图像特征。
S405,通过对图像特征进行解码,识别图像中的目标。
本实施例中S404、S405与图2所示实施例中的S203、S204相同,这里不再赘述。
应用本实施例,基于沿图像的水平视角方向和垂直视角方向分别进行特征提取,能够得到图像的横向特征序列和纵向特征序列,然后将横向特征序列和纵向特征序列进行融合,得到融合特征,最后,通过对融合特征进行激活后得到的图像特征进行解码,识别图像中的目标。针对发生倾斜的目标,目标在不同视角方向上的分量不同,因此,通过沿不同视角进行特征提取,然后经过特征融合的方式,得到目标在各视角下的完整的特征信息,再利用 预设激活函数,将融合特征进行激活,得到能够与模板匹配的图像特征,最后通过解码即可识别目标,此方法无需提前对发生倾斜的目标进行角度判断,也就不需要针对不同角度的目标生成各种目标分类器,简化了目标识别的实现步骤,在保证目标识别的准确率的基础上,提高了目标识别的效率。并且,通过对横向特征序列和纵向特征序列进行卷积操作,得到用于表示横向特征序列及纵向特征序列分别在图像形变中所占权值的形变参数,通过加权和的方式进行特征融合,使得融合特征可以更真实的反映目标在不同视角方向的形变程度,可以进一步提高目标识别的准确性。
为了保证目标识别的高准确性,本申请实施例还可以在沿图像水平视角方向对图像的行向量进行卷积操作得到第一卷积结果之后,再将第一卷积结果中的行向量进行反向排列得到第一反向序列,在沿图像垂直视角方向对图像的列向量进行卷积操作得到第二卷积结果之后,再将第二卷积结果中的列向量进行反向排列得到第二反向序列。将第一卷积结果和第一反向序列确定为横向特征序列、将第二卷积结果和第二反向序列确定为纵向特征序列。横向特征序列和纵向特征序列中,通过卷积结果的正向排列和反向排列,可以更为直观的表示目标的倾斜情况。下面,结合具体实例,对该实施例进行介绍。
如图5所示,为本申请实施例的目标识别示例图。
第一步,沿图像的水平视角方向,对图像的行向量进行横向卷积,得到第一卷积结果,如图5中所示,第一卷积结果为
将第一卷积结果中的行向量进行反向排列,得到第一反向序列
将第一卷积结果及第一反向序列确定为横向特征序列。
第二步,沿图像的垂直视角方向,对图像的列向量进行卷积操作,得到第二卷积结果,如图5中所示,第二卷积结果为
将第二卷积结果中的列向量进行反向排列,得到第二反向序列
将第二卷积结果及第二反向序列确定为纵向特征序列。
第三步,基于横向特征序列中的第一卷积结果、第一反向序列,以及纵向特征序列中的第二卷积结果、第二反向序列,对图像进行卷积操作,提取用于表示第一卷积结果、第一反向序列、第二卷积结果及第二反向序列分别在图像形变中所占权值的形变参数α:(α
1,β
1,α
2,β
2)
T。
第五步,利用预设的双曲线激活函数,通过公式(2),对融合特征进行激活,得到图像特征h。
第六步,通过对图像特征进行解码,识别图像中的目标。
通过本方案,针对发生倾斜的目标,目标在不同视角方向上的分量不同,因此,通过沿不同视角进行特征提取,然后经过特征融合的方式,得到目标在各视角下的完整的特征信息,再利用预设激活函数,将融合特征进行激活,得到能够与模板匹配的图像特征,最后通过解码即可识别目标,此方法无需提前对发生倾斜的目标进行角度判断,也就不需要针对不同角度的目标生成各种目标分类器,简化了目标识别的实现步骤,在保证目标识别的准确率的基础上,提高了目标识别的效率。并且,横向特征序列和纵向特征序列中,通过卷积结果的正向排列和反向排列,可以更为直观的表示目标的倾斜情况,通过对横向特征序列和纵向特征序列进行卷积操作,得到用于表示横向特征序列及纵向特征序列分别在图像形变中所占权值的形变参数,通过加权和的方式进行特征融合,使得融合特征可以更真实的反映目标在不同视角方向的 形变程度,可以进一步提高目标识别的准确性。
相应于上述方法实施例,本申请实施例提供了一种图像目标识别装置,如图6所示,该图像目标识别装置可以包括:
特征提取模块610,用于分别沿图像的水平视角方向和垂直视角方向进行特征提取,相应得到所述图像的横向特征序列和纵向特征序列。
融合模块620,用于将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征。
激活模块630,用于利用预设激活函数,对所述融合特征进行激活,得到图像特征。
解码模块640,用于通过对所述图像特征进行解码,识别所述图像中的目标。
应用本实施例,基于沿图像的水平视角方向和垂直视角方向分别进行特征提取,能够得到图像的横向特征序列和纵向特征序列,然后将横向特征序列和纵向特征序列进行融合,得到融合特征,最后,通过对融合特征进行激活后得到的图像特征进行解码,识别图像中的目标。针对发生倾斜的目标,目标在不同视角方向上的分量不同,因此,通过沿不同视角进行特征提取,然后经过特征融合的方式,得到目标在各视角下的完整的特征信息,再利用预设激活函数,将融合特征进行激活,得到能够与模板匹配的图像特征,最后通过解码即可识别目标,此方法无需提前对发生倾斜的目标进行角度判断,也就不需要针对不同角度的目标生成各种目标分类器,简化了目标识别的实现步骤,在保证目标识别的准确率的基础上,提高了目标识别的效率。
可选的,所述融合模块620,具体可以用于:以拼接融合的方式,将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征。
本申请实施例的图像目标识别装置为应用如图2所示实施例的图像目标识别方法的装置,则图2所示实施例的图像目标识别方法的所有实施例均适用于该图像目标识别装置,且均能达到相同或相似的有益效果。
基于图6所示实施例,本申请实施例提供了另一种图像目标识别装置,如图7所示,该图像目标识别装置可以包括:
特征提取模块710,用于沿图像的水平视角方向进行卷积操作,得到第一卷积结果;将所述第一卷积结果确定为横向特征序列;沿所述图像的垂直视角方向进行卷积操作,得到第二卷积结果;将所述第二卷积结果确定为纵向特征序列。
第一形变参数提取模块720,用于基于所述横向特征序列及所述纵向特征序列,对所述图像进行卷积操作,提取用于表示所述横向特征序列及所述纵向特征序列分别在图像形变中所占权值的形变参数。
融合模块730,用于根据所述形变参数,以加权和的方式,将所述横向特征序列及所述纵向特征序列进行融合,得到融合特征。
激活模块740,用于利用预设激活函数,对所述融合特征进行激活,得到图像特征。
解码模块750,用于通过对所述图像特征进行解码,识别所述图像中的目标。
应用本实施例,基于沿图像的水平视角方向和垂直视角方向分别进行特征提取,能够得到图像的横向特征序列和纵向特征序列,然后将横向特征序列和纵向特征序列进行融合,得到融合特征,最后,通过对融合特征进行激活后得到的图像特征进行解码,识别图像中的目标。针对发生倾斜的目标,目标在不同视角方向上的分量不同,因此,通过沿不同视角进行特征提取,然后经过特征融合的方式,得到目标在各视角下的完整的特征信息,再利用预设激活函数,将融合特征进行激活,得到能够与模板匹配的图像特征,最后通过解码即可识别目标,此方法无需提前对发生倾斜的目标进行角度判断,也就不需要针对不同角度的目标生成各种目标分类器,简化了目标识别的实现步骤,在保证目标识别的准确率的基础上,提高了目标识别的效率。并且,通过对横向特征序列和纵向特征序列进行卷积操作,得到用于表示横向特征序列及纵向特征序列分别在图像形变中所占权值的形变参数,通过加权和的方式进行特征融合,使得融合特征可以更真实的反映目标在不同视角方向的 形变程度,可以进一步提高目标识别的准确性。
Based on the embodiment shown in FIG. 6, an embodiment of the present application provides another image target recognition apparatus. As shown in FIG. 8, the apparatus may include:
a feature extraction module 810, configured to perform a convolution operation along the horizontal viewing-angle direction of an image to obtain a first convolution result, arrange row vectors of the first convolution result in reverse order to obtain a first reverse sequence, determine the first convolution result and the first reverse sequence as a lateral feature sequence, perform a convolution operation along the vertical viewing-angle direction of the image to obtain a second convolution result, arrange column vectors of the second convolution result in reverse order to obtain a second reverse sequence, and determine the second convolution result and the second reverse sequence as a longitudinal feature sequence;
a second deformation parameter extraction module 820, configured to perform a convolution operation on the image based on the first convolution result and the first reverse sequence in the lateral feature sequence and the second convolution result and the second reverse sequence in the longitudinal feature sequence, to extract deformation parameters representing the weights of the first convolution result, the first reverse sequence, the second convolution result and the second reverse sequence, respectively, in the image deformation;
a fusion module 830, configured to fuse the first convolution result, the first reverse sequence, the second convolution result and the second reverse sequence by weighted sum according to the deformation parameters, to obtain a fused feature;
an activation module 840, configured to activate the fused feature with a preset activation function to obtain an image feature;
a decoding module 850, configured to recognize the target in the image by decoding the image feature.
By applying this embodiment, a tilted target, whose components differ across viewing-angle directions, is handled by extracting features along different viewing angles and fusing them to obtain complete feature information of the target under each viewing angle; the fused feature is activated with the preset activation function to obtain an image feature that can be matched against a template, and the target is recognized by decoding. The method requires no prior estimation of the tilt angle and therefore no family of classifiers for targets at different angles, which simplifies the implementation of target recognition and improves its efficiency while maintaining its accuracy. Furthermore, within the lateral and longitudinal feature sequences, the forward and reverse arrangements of the convolution results represent the tilt of the target more intuitively; performing a convolution operation on these sequences yields deformation parameters representing their respective weights in the image deformation, and fusing by weighted sum lets the fused feature reflect the target's degree of deformation in the different viewing-angle directions more faithfully, further improving recognition accuracy.
In addition, corresponding to the image target recognition method provided by the above embodiments, an embodiment of the present application provides a computer device. As shown in FIG. 9, the device includes a processor 901 and a memory 902, wherein the memory 902 is configured to store a computer program, and the processor 901 is configured to implement the following steps when executing the computer program stored in the memory 902:
performing feature extraction along the horizontal viewing-angle direction and the vertical viewing-angle direction of an image, respectively, to correspondingly obtain a lateral feature sequence and a longitudinal feature sequence of the image; fusing the lateral feature sequence and the longitudinal feature sequence to obtain a fused feature; activating the fused feature with a preset activation function to obtain an image feature; and recognizing the target in the image by decoding the image feature.
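These four steps are the same pipeline sketched earlier. For illustration, running the toy recognizer from the apparatus section on a small image could look like the following (hypothetical throughout, and `decode` remains a stub):

```python
import numpy as np

# Assumes the ImageTargetRecognizer sketch from the apparatus section above.
image = np.random.rand(32, 32)                     # stand-in for an input image
recognizer = ImageTargetRecognizer()
lateral, longitudinal = recognizer.extract(image)  # feature extraction step
fused = recognizer.fuse(lateral, longitudinal)     # fusion step
image_feature = recognizer.activate(fused)         # activation step
# recognizer.decode(image_feature) would perform the final decoding step.
```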
Optionally, when implementing the step of performing feature extraction along the horizontal and vertical viewing-angle directions of the image, respectively, to correspondingly obtain the lateral and longitudinal feature sequences of the image, the processor 901 may specifically implement the following steps: performing a convolution operation along the horizontal viewing-angle direction of the image to obtain a first convolution result; determining the first convolution result as the lateral feature sequence; performing a convolution operation along the vertical viewing-angle direction of the image to obtain a second convolution result; and determining the second convolution result as the longitudinal feature sequence.
Optionally, the processor 901 may further implement the following step: performing a convolution operation on the image based on the lateral feature sequence and the longitudinal feature sequence, to extract deformation parameters representing the weights of the lateral feature sequence and the longitudinal feature sequence, respectively, in the image deformation.
When implementing the step of fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature, the processor 901 may then specifically implement: fusing the lateral feature sequence and the longitudinal feature sequence by weighted sum according to the deformation parameters, to obtain the fused feature.
Optionally, when implementing the step of performing feature extraction along the horizontal and vertical viewing-angle directions of the image, respectively, to correspondingly obtain the lateral and longitudinal feature sequences of the image, the processor 901 may specifically implement the following steps: performing a convolution operation along the horizontal viewing-angle direction of the image to obtain a first convolution result; arranging row vectors of the first convolution result in reverse order to obtain a first reverse sequence; determining the first convolution result and the first reverse sequence as the lateral feature sequence; performing a convolution operation along the vertical viewing-angle direction of the image to obtain a second convolution result; arranging column vectors of the second convolution result in reverse order to obtain a second reverse sequence; and determining the second convolution result and the second reverse sequence as the longitudinal feature sequence.
Optionally, the processor 901 may further implement the following step: performing a convolution operation on the image based on the first convolution result and the first reverse sequence in the lateral feature sequence and the second convolution result and the second reverse sequence in the longitudinal feature sequence, to extract deformation parameters representing the weights of the first convolution result, the first reverse sequence, the second convolution result and the second reverse sequence, respectively, in the image deformation.
When implementing the step of fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature, the processor 901 may then specifically implement: fusing the first convolution result, the first reverse sequence, the second convolution result and the second reverse sequence by weighted sum according to the deformation parameters, to obtain the fused feature.
Optionally, when implementing the step of fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature, the processor 901 may specifically implement: fusing the lateral feature sequence and the longitudinal feature sequence by concatenation to obtain the fused feature.
Data may be transferred between the memory 902 and the processor 901 over a wired or wireless connection, and the computer device may communicate with other devices through a wired or wireless communication interface. It should be noted that FIG. 9 only shows an example in which the processor 901 and the memory 902 exchange data over a bus; this is not a limitation on the transmission mode.
The memory may include RAM (Random Access Memory) or NVM (Non-volatile Memory), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), or the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In this embodiment, the processor of the computer device reads and runs the computer program stored in the memory, and the computer program, when run, performs the image target recognition method provided by the embodiments of the present application. It is thereby possible to achieve: for a tilted target, whose components differ across viewing-angle directions, extracting features along different viewing angles and fusing them to obtain complete feature information of the target under each viewing angle; activating the fused feature with the preset activation function to obtain an image feature that can be matched against a template; and recognizing the target by decoding. The method requires no prior estimation of the tilt angle and therefore no family of classifiers for targets at different angles, which simplifies the implementation of target recognition and improves its efficiency while maintaining its accuracy.
In addition, corresponding to the image target recognition method provided by the above embodiments, an embodiment of the present application provides a computer-readable storage medium for storing a computer program which, when run, performs the image target recognition method provided by the embodiments of the present application.
In this embodiment, the computer-readable storage medium stores a computer program which, when run, performs the image target recognition method provided by the embodiments of the present application, thereby achieving the same beneficial effects as described above for the computer device embodiment.
In addition, corresponding to the image target recognition method provided by the above embodiments, an embodiment of the present application provides an application program which, when run, performs the image target recognition method provided by the embodiments of the present application.
In this embodiment, the application program, when run, performs the image target recognition method provided by the embodiments of the present application, thereby likewise achieving the beneficial effects described above.
As the method content involved in the computer device, computer-readable storage medium and application program embodiments is substantially similar to the foregoing method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device comprising that element.
The embodiments in this specification are described in a related manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, computer device, computer-readable storage medium and application program embodiments are substantially similar to the method embodiments, so their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of protection of the present application.
Claims (18)
1. An image target recognition method, wherein the method comprises: performing feature extraction along a horizontal viewing-angle direction and a vertical viewing-angle direction of an image, respectively, to correspondingly obtain a lateral feature sequence and a longitudinal feature sequence of the image; fusing the lateral feature sequence and the longitudinal feature sequence to obtain a fused feature; activating the fused feature with a preset activation function to obtain an image feature; and recognizing a target in the image by decoding the image feature.
2. The method according to claim 1, wherein performing feature extraction along the horizontal and vertical viewing-angle directions of the image, respectively, to correspondingly obtain the lateral and longitudinal feature sequences of the image comprises: performing a convolution operation along the horizontal viewing-angle direction of the image to obtain a first convolution result; determining the first convolution result as the lateral feature sequence; performing a convolution operation along the vertical viewing-angle direction of the image to obtain a second convolution result; and determining the second convolution result as the longitudinal feature sequence.
3. The method according to claim 2, wherein before fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature, the method further comprises: performing a convolution operation on the image based on the lateral feature sequence and the longitudinal feature sequence, to extract deformation parameters representing weights of the lateral feature sequence and the longitudinal feature sequence, respectively, in the image deformation; and fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature comprises: fusing the lateral feature sequence and the longitudinal feature sequence by weighted sum according to the deformation parameters, to obtain the fused feature.
4. The method according to claim 1, wherein performing feature extraction along the horizontal and vertical viewing-angle directions of the image, respectively, to correspondingly obtain the lateral and longitudinal feature sequences of the image comprises: performing a convolution operation along the horizontal viewing-angle direction of the image to obtain a first convolution result; arranging row vectors of the first convolution result in reverse order to obtain a first reverse sequence; determining the first convolution result and the first reverse sequence as the lateral feature sequence; performing a convolution operation along the vertical viewing-angle direction of the image to obtain a second convolution result; arranging column vectors of the second convolution result in reverse order to obtain a second reverse sequence; and determining the second convolution result and the second reverse sequence as the longitudinal feature sequence.
5. The method according to claim 4, wherein before fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature, the method further comprises: performing a convolution operation on the image based on the first convolution result and the first reverse sequence in the lateral feature sequence and the second convolution result and the second reverse sequence in the longitudinal feature sequence, to extract deformation parameters representing weights of the first convolution result, the first reverse sequence, the second convolution result and the second reverse sequence, respectively, in the image deformation; and fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature comprises: fusing the first convolution result, the first reverse sequence, the second convolution result and the second reverse sequence by weighted sum according to the deformation parameters, to obtain the fused feature.
6. The method according to claim 1, wherein fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature comprises: fusing the lateral feature sequence and the longitudinal feature sequence by concatenation, to obtain the fused feature.
7. An image target recognition apparatus, wherein the apparatus comprises: a feature extraction module, configured to perform feature extraction along a horizontal viewing-angle direction and a vertical viewing-angle direction of an image, respectively, to correspondingly obtain a lateral feature sequence and a longitudinal feature sequence of the image; a fusion module, configured to fuse the lateral feature sequence and the longitudinal feature sequence to obtain a fused feature; an activation module, configured to activate the fused feature with a preset activation function to obtain an image feature; and a decoding module, configured to recognize a target in the image by decoding the image feature.
8. The apparatus according to claim 7, wherein the feature extraction module is specifically configured to: perform a convolution operation along the horizontal viewing-angle direction of the image to obtain a first convolution result; determine the first convolution result as the lateral feature sequence; perform a convolution operation along the vertical viewing-angle direction of the image to obtain a second convolution result; and determine the second convolution result as the longitudinal feature sequence.
9. The apparatus according to claim 8, wherein the apparatus further comprises: a first deformation parameter extraction module, configured to perform a convolution operation on the image based on the lateral feature sequence and the longitudinal feature sequence, to extract deformation parameters representing weights of the lateral feature sequence and the longitudinal feature sequence, respectively, in the image deformation; and the fusion module is specifically configured to: fuse the lateral feature sequence and the longitudinal feature sequence by weighted sum according to the deformation parameters, to obtain the fused feature.
10. The apparatus according to claim 7, wherein the feature extraction module is specifically configured to: perform a convolution operation along the horizontal viewing-angle direction of the image to obtain a first convolution result; arrange row vectors of the first convolution result in reverse order to obtain a first reverse sequence; determine the first convolution result and the first reverse sequence as the lateral feature sequence; perform a convolution operation along the vertical viewing-angle direction of the image to obtain a second convolution result; arrange column vectors of the second convolution result in reverse order to obtain a second reverse sequence; and determine the second convolution result and the second reverse sequence as the longitudinal feature sequence.
11. The apparatus according to claim 10, wherein the apparatus further comprises: a second deformation parameter extraction module, configured to perform a convolution operation on the image based on the first convolution result and the first reverse sequence in the lateral feature sequence and the second convolution result and the second reverse sequence in the longitudinal feature sequence, to extract deformation parameters representing weights of the first convolution result, the first reverse sequence, the second convolution result and the second reverse sequence, respectively, in the image deformation; and the fusion module is specifically configured to: fuse the first convolution result, the first reverse sequence, the second convolution result and the second reverse sequence by weighted sum according to the deformation parameters, to obtain the fused feature.
12. The apparatus according to claim 7, wherein the fusion module is specifically configured to: fuse the lateral feature sequence and the longitudinal feature sequence by concatenation, to obtain the fused feature.
13. A computer device, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to implement the following steps when executing the computer program stored on the memory: performing feature extraction along a horizontal viewing-angle direction and a vertical viewing-angle direction of an image, respectively, to correspondingly obtain a lateral feature sequence and a longitudinal feature sequence of the image; fusing the lateral feature sequence and the longitudinal feature sequence to obtain a fused feature; activating the fused feature with a preset activation function to obtain an image feature; and recognizing a target in the image by decoding the image feature.
14. The computer device according to claim 13, wherein, when implementing the step of performing feature extraction along the horizontal and vertical viewing-angle directions of the image, respectively, to correspondingly obtain the lateral and longitudinal feature sequences of the image, the processor specifically implements the following steps: performing a convolution operation along the horizontal viewing-angle direction of the image to obtain a first convolution result; determining the first convolution result as the lateral feature sequence; performing a convolution operation along the vertical viewing-angle direction of the image to obtain a second convolution result; and determining the second convolution result as the longitudinal feature sequence.
15. The computer device according to claim 14, wherein the processor further implements the following step: performing a convolution operation on the image based on the lateral feature sequence and the longitudinal feature sequence, to extract deformation parameters representing weights of the lateral feature sequence and the longitudinal feature sequence, respectively, in the image deformation; and, when implementing the step of fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature, the processor specifically implements the following step: fusing the lateral feature sequence and the longitudinal feature sequence by weighted sum according to the deformation parameters, to obtain the fused feature.
16. The computer device according to claim 13, wherein, when implementing the step of performing feature extraction along the horizontal and vertical viewing-angle directions of the image, respectively, to correspondingly obtain the lateral and longitudinal feature sequences of the image, the processor specifically implements the following steps: performing a convolution operation along the horizontal viewing-angle direction of the image to obtain a first convolution result; arranging row vectors of the first convolution result in reverse order to obtain a first reverse sequence; determining the first convolution result and the first reverse sequence as the lateral feature sequence; performing a convolution operation along the vertical viewing-angle direction of the image to obtain a second convolution result; arranging column vectors of the second convolution result in reverse order to obtain a second reverse sequence; and determining the second convolution result and the second reverse sequence as the longitudinal feature sequence.
17. The computer device according to claim 16, wherein the processor further implements the following step: performing a convolution operation on the image based on the first convolution result and the first reverse sequence in the lateral feature sequence and the second convolution result and the second reverse sequence in the longitudinal feature sequence, to extract deformation parameters representing weights of the first convolution result, the first reverse sequence, the second convolution result and the second reverse sequence, respectively, in the image deformation; and, when implementing the step of fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature, the processor specifically implements the following step: fusing the first convolution result, the first reverse sequence, the second convolution result and the second reverse sequence by weighted sum according to the deformation parameters, to obtain the fused feature.
18. The computer device according to claim 13, wherein, when implementing the step of fusing the lateral feature sequence and the longitudinal feature sequence to obtain the fused feature, the processor specifically implements the following step: fusing the lateral feature sequence and the longitudinal feature sequence by concatenation, to obtain the fused feature.
Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
US16/756,427 (US11347977B2) | 2017-10-18 | 2018-09-28 | Lateral and longitudinal feature based image object recognition method, computer device, and non-transitory computer readable storage medium
EP18867472.5A (EP3699818A4) | 2017-10-18 | 2018-09-28 | Process for recognizing image object, apparatus, and computer device
Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201710969721.7A (CN109685058B) | 2017-10-18 | 2017-10-18 | Image target recognition method and apparatus, and computer device
CN201710969721.7 | 2017-10-18 | |
Publications (1)

Publication Number | Publication Date
---|---
WO2019076188A1 | 2019-04-25
Family ID: 66173116
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2018/108301 (WO2019076188A1) | Image target recognition method and apparatus, and computer device | 2017-10-18 | 2018-09-28
Country Status (4)

Country | Link
---|---
US (1) | US11347977B2
EP (1) | EP3699818A4
CN (1) | CN109685058B
WO (1) | WO2019076188A1
Families Citing this family (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN110197206B | 2019-05-10 | 2021-07-13 | 杭州深睿博联科技有限公司 | Image processing method and apparatus
CN113221709B | 2021-04-30 | 2022-11-25 | 芜湖美的厨卫电器制造有限公司 | Method and apparatus for recognizing user motion, and water heater
Patent Citations (7)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN101944174A | 2009-07-08 | 2011-01-12 | 西安电子科技大学 | License plate character recognition method
CN101833653A | 2010-04-02 | 2010-09-15 | 上海交通大学 | Person recognition method in low-resolution video
US20120027305A1 | 2010-07-27 | 2012-02-02 | Pantech Co., Ltd. | Apparatus to provide guide for augmented reality object recognition and method thereof
CN106803090A | 2016-12-05 | 2017-06-06 | 中国银联股份有限公司 | Image recognition method and apparatus
CN106960206A | 2017-02-08 | 2017-07-18 | 北京捷通华声科技股份有限公司 | Character recognition method and character recognition system
CN107122712A | 2017-03-27 | 2017-09-01 | 大连大学 | Palmprint image recognition method based on convolutional neural network and bidirectional local feature aggregation descriptor vectors
CN107103331A | 2017-04-01 | 2017-08-29 | 中北大学 | Image fusion method based on deep learning
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN111291627A | 2020-01-16 | 2020-06-16 | 广州酷狗计算机科技有限公司 | Face recognition method and apparatus, and computer device
CN111291627B | 2020-01-16 | 2024-04-19 | 广州酷狗计算机科技有限公司 | Face recognition method and apparatus, and computer device
Also Published As

Publication number | Publication date
---|---
US11347977B2 | 2022-05-31
EP3699818A1 | 2020-08-26
US20200334504A1 | 2020-10-22
EP3699818A4 | 2020-12-23
CN109685058B | 2021-07-09
CN109685058A | 2019-04-26
Legal Events

Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18867472; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
ENP | Entry into the national phase | Ref document number: 2018867472; Country of ref document: EP; Effective date: 20200518