CN108229492B - Method, device and system for extracting features - Google Patents

Info

Publication number
CN108229492B
Authority
CN
China
Prior art keywords
feature
features
region
image
regions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710195256.6A
Other languages
Chinese (zh)
Other versions
CN108229492A (en)
Inventor
伊帅
赵海宇
田茂清
闫俊杰
王晓刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201710195256.6A priority Critical patent/CN108229492B/en
Publication of CN108229492A publication Critical patent/CN108229492A/en
Application granted granted Critical
Publication of CN108229492B publication Critical patent/CN108229492B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses a method, a device and a system for extracting features, wherein the method for extracting features comprises the following steps: generating initial image features corresponding to an object in an image; generating a plurality of region features corresponding respectively to a plurality of regions of the image; and fusing the initial image features and the plurality of region features to obtain target image features of the object. According to this technical scheme for extracting features, in the process of obtaining the features, feature extraction is performed not only on the whole image containing the object but also on the multiple regions within the image, so that the detail features within those regions are at least partially retained and the finally obtained object features are more discriminative.

Description

Method, device and system for extracting features
Technical Field
The application relates to the field of computer vision and image processing, in particular to a method, a device and a system for extracting features.
Background
With the development of computer vision technology and the increase of the amount of image information, image recognition technology is applied in more and more fields, such as pedestrian retrieval, video monitoring, video classification, and the like, and in the image recognition technology, feature extraction is of great importance.
In conventional image recognition technology, features are generally extracted from the picture as a whole to describe it, so the resulting overall features describe every object and the background in the picture at the same level of detail. Such an overall feature can be used for picture recognition, for example by comparing it with the feature of a target classification to determine whether the object in the picture belongs to that classification.
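As an illustration of this conventional whole-image comparison, the following minimal sketch (an assumption for illustration only, not taken from the patent; PyTorch is used merely for convenience) compares an overall feature vector against a target-class feature using cosine similarity:

    import torch
    import torch.nn.functional as F

    def matches_target(overall_feature: torch.Tensor,
                       target_feature: torch.Tensor,
                       threshold: float = 0.8) -> bool:
        # Compare a whole-image feature against a target-class feature by
        # cosine similarity; the threshold is purely illustrative.
        sim = F.cosine_similarity(overall_feature.flatten(),
                                  target_feature.flatten(), dim=0)
        return sim.item() >= threshold

    # Example with two random 256-dimensional feature vectors.
    print(matches_target(torch.randn(256), torch.randn(256)))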
Disclosure of Invention
The embodiment of the application provides a technical scheme for extracting features.
One aspect of the embodiments of the present application discloses a method for extracting features, the method comprising: generating initial image features corresponding to objects in an image; generating a plurality of region features corresponding to a plurality of regions of the image, respectively; fusing the plurality of region features into a fused feature, wherein in a first overlapping region of the plurality of region features, a feature with the highest identifiability is selected as a feature corresponding to the first overlapping region, and the fused feature is generated based on the feature corresponding to the first overlapping region; and fusing the fused feature with the initial image feature as a target image feature of the object.
According to this technical scheme for extracting features, in the process of obtaining the object features, feature extraction is performed not only on the whole image containing the object but also on the multiple regions within the image, so that the detail features within those regions are at least partially retained and the finally obtained object features are more discriminative.
Another aspect of the embodiments of the present application discloses an apparatus for extracting object features, the apparatus comprising: an image feature generation module that generates an initial image feature corresponding to an object in an image; a region feature generation module that generates a plurality of region features corresponding to a plurality of regions of the image, respectively; and a fusion module for fusing the initial image feature and the plurality of region features to obtain a target image feature of the object, the fusion module comprising: the first fusion submodule is used for fusing the plurality of region features into fusion features, wherein in a first overlapping region in the plurality of region features, the feature with the highest identifiability is selected as the feature corresponding to the first overlapping region, and the fusion features are generated based on the feature corresponding to the first overlapping region; and a second fusion submodule for fusing the fusion feature and the initial image feature into a target image feature of the object.
Another aspect of the embodiments of the present application further discloses a system for extracting object features, where the system includes: a memory storing executable instructions; one or more processors in communication with the memory to execute the executable instructions to: generating initial image features corresponding to objects in an image; generating a plurality of region features corresponding to a plurality of regions of the image, respectively; fusing the plurality of region features into a fused feature, wherein in a first overlapping region of the plurality of region features, a feature with the highest identifiability is selected as a feature corresponding to the first overlapping region, and the fused feature is generated based on the feature corresponding to the first overlapping region; and fusing the fused feature with the initial image feature as a target image feature of the object.
Yet another aspect of an embodiment of the present application discloses a non-transitory computer storage medium storing computer-readable instructions that, when executed, cause a processor to: generating initial image features corresponding to objects in an image; generating a plurality of region features corresponding to a plurality of regions of the image, respectively; and fusing the initial image features and the plurality of region features to obtain target image features of the object.
Drawings
In the following, exemplary and non-limiting embodiments of the present application are described with reference to the accompanying drawings. The figures are merely illustrative and generally do not represent exact proportions. The same or similar elements in different drawings are denoted by the same reference numerals.
FIG. 1 is a flow diagram illustrating a method 1000 of extracting features according to an embodiment of the present application;
FIG. 2 is a schematic diagram showing the joint points of a pedestrian, used to illustrate a method according to an embodiment of the present application;
FIG. 3 is a schematic diagram showing a feature map extraction process for explaining a method according to an embodiment of the present application;
FIG. 4 is a schematic diagram showing a feature map extraction process for explaining a method according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a fusion process according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a fusion process according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating an apparatus 700 for extracting features according to an embodiment of the present application; and
FIG. 8 is a schematic diagram of a computer system 800 suitable for implementing embodiments of the present application.
Detailed Description
Hereinafter, embodiments of the present application will be described in detail with reference to the detailed description and the accompanying drawings.
Fig. 1 is a flow chart illustrating a method 1000 of extracting features according to an embodiment of the present application. As shown in fig. 1, the method 1000 includes: step S1100, generating initial image features corresponding to objects in the image; step S1200 of generating a plurality of region features corresponding to a plurality of regions of the image, respectively; and step S1300, fusing the initial image feature and the plurality of region features to obtain the target image feature of the object.
In step S1100, the operation of generating initial image features corresponding to objects in the image may be implemented by a Convolutional Neural Network (CNN), which may include a plurality of perception modules. When an image including an object is input to the CNN, the plurality of perception modules perform convolution and pooling on the input image to obtain an overall feature of the image, where the overall feature is the initial image feature. Taking the case where the object is a pedestrian as an example, the initial image feature may be a feature that reflects the pedestrian as a whole. It should be noted that, in the embodiment of the present invention, the object in the image is not limited, and any other object, such as a vehicle or a human face, may also be used.
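A minimal sketch of such a CNN is given below; the "perception module" structure (one convolution followed by pooling) and all layer sizes are assumptions, since the patent does not specify them:

    import torch
    import torch.nn as nn

    class PerceptionModule(nn.Module):
        # One convolution + pooling block; the internal structure of the
        # patent's perception modules is assumed, not specified.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
            self.pool = nn.MaxPool2d(2)

        def forward(self, x):
            return self.pool(torch.relu(self.conv(x)))

    # Several stacked perception modules convolve and pool the input image
    # into the whole-image (initial) feature.
    backbone = nn.Sequential(PerceptionModule(3, 32),
                             PerceptionModule(32, 64),
                             PerceptionModule(64, 128))
    image = torch.randn(1, 3, 192, 96)        # dummy pedestrian image, NCHW
    initial_image_feature = backbone(image)   # shape (1, 128, 24, 12)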
In step S1200, a plurality of region features corresponding respectively to a plurality of regions of the image may be generated. In this step S1200, region features for different regions in the image may be obtained: for example, the region features corresponding to different regions may be obtained by performing operations such as convolution and pooling on different regions of the image through a CNN, or feature extraction may be performed on the entire image first and the region features then taken from different regions of the overall image feature (e.g., the initial image feature); the present application is not limited in this respect. The plurality of regions may be extracted based on the structure of the object. As shown in fig. 2, in the case where the object is a pedestrian, a plurality of regions may be extracted in accordance with the human body structure; for example, three regions may be extracted, namely a region 201 including the head and shoulders of the pedestrian (hereinafter also referred to as the head-shoulder), a region 202 including the upper body of the pedestrian, and a region 203 including the lower body of the pedestrian, wherein the three regions 201-203 may partially overlap with each other.
In some embodiments, the plurality of regions of the image may be obtained by dividing the image into the plurality of regions according to extraction reference points. An extraction reference point may be a connection point of a main part of the object structure. When the object is a pedestrian, the extraction reference points may be human body localization points of the pedestrian, for example the joints of the human body. As shown in fig. 2, fourteen joint points 1 to 14 of the pedestrian may be selected as extraction reference points, and a plurality of regions may then be extracted based on the coordinates of the fourteen joint points 1 to 14. For example, the joint points included in the head-shoulder are joint points 1 to 4, whose coordinates are, for example, (5,5), (5,2), (9,1), and (1,1), respectively. The smallest vertical coordinate among joint points 1 to 4 (i.e., 1, from (9,1) and (1,1)) and the largest vertical coordinate (i.e., 5, from (5,5)) may be taken as the minimum and maximum vertical coordinates of the region 201 including the head-shoulder, and the smallest abscissa (i.e., 1, from (1,1)) and the largest abscissa (i.e., 9, from (9,1)) may be taken as the minimum and maximum abscissas of the region 201, so that the finally obtained region 201 including the head-shoulder is a rectangle with the coordinates (1,1), (9,1), (9,5), and (1,5) as its four corners; the regions 202 and 203 may be obtained in a similar manner. In the embodiment of the present invention, as shown in fig. 2, joint point 1 may be the vertex (head-top) joint point, joint point 2 the neck joint point, joint point 3 the left shoulder joint point, joint point 4 the right shoulder joint point, joint point 5 the left elbow joint point, joint point 6 the left wrist joint point, joint point 7 the right elbow joint point, joint point 8 the right wrist joint point, joint point 9 the left hip joint point, joint point 10 the right hip joint point, joint point 11 the left knee joint point, joint point 12 the left foot joint point, joint point 13 the right knee joint point, and joint point 14 the right foot joint point. The plurality of joint points can be selected manually or extracted by a convolutional neural network pre-trained with back-propagation: for example, when an image is input into the convolutional neural network, a convolutional layer in the network convolves the image so that the features of the pedestrian's joint areas are extracted into a feature map while the features of areas that are not joints of the pedestrian are removed or zeroed, and the network then outputs a feature map representing the positions of the joint points of the pedestrian. An illustrative computation of such a region from joint coordinates is sketched below.
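The following sketch reproduces the bounding-box computation described above for the head-shoulder region; the function name is hypothetical:

    def region_from_joints(joints):
        # Smallest axis-aligned rectangle (x_min, y_min, x_max, y_max)
        # enclosing the given joint coordinates.
        xs = [x for x, _ in joints]
        ys = [y for _, y in joints]
        return min(xs), min(ys), max(xs), max(ys)

    # Joint points 1-4 of the head-shoulder: (5,5), (5,2), (9,1), (1,1).
    print(region_from_joints([(5, 5), (5, 2), (9, 1), (1, 1)]))
    # (1, 1, 9, 5), i.e. the rectangle with corners (1,1), (9,1), (9,5), (1,5)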
For example, as shown in fig. 3, the initial image feature 310 may be generated from the image 300 using a convolutional neural network, and the region features respectively corresponding to the plurality of regions, i.e., region features 321-323, may then be generated from the initial image feature 310, where the region features 321-323 may be feature maps or feature vectors. This step may be performed by region-of-interest (ROI) pooling using a convolutional neural network: for example, the initial image feature 310 is a 96×96 feature map, which may be input into the convolutional neural network; the ROI pooling layer in the network then pools the portions of the initial image feature corresponding to the plurality of regions separately, for example into 24×24 features, which are the region features 321-323 respectively corresponding to the plurality of regions. It should be noted that the feature sizes mentioned above are merely exemplary. Alternatively, the region features 321-323 may also be generated directly from the corresponding regions of the image using a convolutional neural network, as described above, in which case they may contain more detailed features; the present application is not limited in this respect. An illustrative sketch of such region-of-interest pooling is given below.
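The sketch below uses torchvision's roi_pool for this step; the box coordinates and the 96×96 / 24×24 sizes are illustrative assumptions rather than values fixed by the patent:

    import torch
    from torchvision.ops import roi_pool

    # Assumed 96x96 initial image feature with 64 channels.
    initial_image_feature = torch.randn(1, 64, 96, 96)
    # Boxes as (batch_index, x1, y1, x2, y2) in feature-map coordinates,
    # roughly covering head-shoulder, upper body and lower body.
    boxes = torch.tensor([[0.,  0.,  0., 95., 40.],
                          [0.,  0., 25., 95., 70.],
                          [0.,  0., 55., 95., 95.]])
    # Pool each region to a fixed 24x24 grid.
    region_features = roi_pool(initial_image_feature, boxes, output_size=(24, 24))
    print(region_features.shape)   # torch.Size([3, 64, 24, 24])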
In another embodiment, step S1100 may include: performing convolution and pooling on the image to generate intermediate image features; and generating initial image features corresponding to objects in the image based on the intermediate image features. The operation of convolving and pooling the image to generate the intermediate image features may be implemented using a CNN that includes a plurality of perception modules. Specifically, an image of the object may be input into the CNN, and the plurality of perception modules then perform convolution and pooling on the input image to obtain an overall feature of the image, which is the intermediate image feature. The intermediate image feature may be a global feature of the image and may be a feature map or a feature vector. After obtaining the intermediate image feature, the CNN may be used to perform convolution and pooling on the intermediate image feature to generate the initial image feature corresponding to the object in the image; specifically, the intermediate image feature may be input into the CNN, and the plurality of perception modules then perform convolution and pooling on it to obtain its overall feature, which is the initial image feature. Since the initial image feature is obtained by convolving and pooling the intermediate image feature through the convolutional neural network, the initial image feature may contain finer features than the intermediate image feature.
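The two-stage extraction described in this embodiment might look as follows; the channel counts and spatial sizes are assumptions, chosen only to match the 24×24 and 12×12 examples given later:

    import torch
    import torch.nn as nn

    # Stage 1 produces the intermediate image feature; stage 2 convolves and
    # pools it further into the (finer) initial image feature.
    stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                           nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
    stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))

    image = torch.randn(1, 3, 96, 96)
    intermediate_image_feature = stage1(image)                   # (1, 64, 24, 24)
    initial_image_feature = stage2(intermediate_image_feature)   # (1, 128, 12, 12)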
In this embodiment, step S1200 may include: extracting a plurality of regions from an image; pooling the intermediate image features based on the plurality of regions to generate a plurality of intermediate region features corresponding to the plurality of regions, respectively; and generating a plurality of region features respectively corresponding to the plurality of regions based on the plurality of intermediate region features.
For the step of extracting a plurality of regions from the image, the method described above with reference to fig. 2 may be used. Then, the intermediate image features may be pooled based on the plurality of regions to generate a plurality of intermediate region features respectively corresponding to the plurality of regions, and a plurality of region features respectively corresponding to the plurality of regions may be generated based on the plurality of intermediate region features. As shown in fig. 4, the pooling method described with reference to fig. 3 may be used to obtain a plurality of intermediate region features 331-333 from the intermediate image feature, and the intermediate region features 331-333 may then be convolved and pooled to generate the region features 321-323 described above. Since the region features 321-323 are generated by convolution and pooling on the basis of the intermediate region features 331-333, the region features 321-323 have finer or more discriminative features than the intermediate region features 331-333.
After obtaining the initial image feature and the plurality of region features respectively corresponding to the plurality of regions, in S1300 the initial image feature and the plurality of region features may be fused to obtain a target image feature of the object. For example, in the case where the object is a pedestrian, the region feature corresponding to the region including the head-shoulder and the head-shoulder portion of the initial image feature may be fused into the head-shoulder portion of the target image feature, the region feature corresponding to the region including the upper body and the upper-body portion of the initial image feature may be fused into the upper-body portion of the target image feature, and the region feature corresponding to the region including the lower body and the lower-body portion of the initial image feature may be fused into the lower-body portion of the target image feature. Since the region features at least partially contain finer or more discriminative features, obtaining the target image feature by fusing the region features and the initial image feature can effectively improve accuracy in applications such as image recognition. In the fusion process, a feature in a region feature may be used directly as the feature of the corresponding region in the target image feature, or only part of the region feature may be used while the remaining part of the corresponding region in the target image feature is taken from the initial image feature; for example, the head feature in the head-shoulder region feature may be used as the head feature in the target image feature, while the shoulder feature in the target image feature is taken from the initial image feature.
In one embodiment, fusing the initial image feature and the plurality of region features to obtain a target image feature of the object includes: fusing the plurality of region features into a fused feature, and fusing the fused feature with the initial image feature into a target image feature of the object. For example, as shown in fig. 5, in the case where the object is a pedestrian, a region feature 321 corresponding to a region including the head and shoulders, a region feature 322 corresponding to a region including the upper body, and a region feature 323 corresponding to a region including the lower body may be fused first into a fused feature 400, and then the fused feature 400 and the initial image feature 310 may be fused into a target image feature 500 of the object.
In one embodiment, fusing the plurality of region features into a fused feature comprises: selecting, in a first overlapping region of the plurality of region features, the feature with the highest identifiability as the feature corresponding to the first overlapping region; and generating the fused feature based on the feature corresponding to the first overlapping region. The plurality of region features may partially overlap each other, and an overlapped region may be referred to as a first overlapping region. For example, as shown in fig. 6, the region 201 including the head-shoulder of a pedestrian partially overlaps the region 202 including the upper body of the pedestrian, and the region 202 partially overlaps the region 203 including the lower body of the pedestrian; the region features respectively corresponding to the regions 201-203 are schematically represented in fig. 6 as 321-323, where the features are shown in numerical form and a feature with a larger value has higher identifiability, i.e., is more distinguishable from other features. In the process of fusing the region features 321-323, the features of two partially overlapping region features may be compared within their overlapping region (i.e., the first overlapping region), and the feature with the larger value is taken as the feature of that region. Taking the region features 321 and 322 as an example, the features inside the dashed frame of region feature 321 are clearly larger than those inside the dashed frame of region feature 322, so the features inside the dashed frame of region feature 321 are used as the features of that region in the fused feature, as shown in the dashed frame of the fused feature 400 in fig. 6. Using this method, the fused feature 400 shown in fig. 6 can be obtained. Because features with higher identifiability are retained and features with lower identifiability are discarded during fusion, this competition strategy makes the finally obtained target image feature of the object contain the more identifiable features. The fusion can be performed by a fusion unit in the convolutional neural network: the plurality of region features to be fused (or the fused feature and the initial image feature) are input into the fusion unit, which outputs, within the overlapping region of the input features, the larger feature as the feature of that region; the convolutional neural network may further include an inner product (fully connected) layer for converting the output fusion layer into a feature map usable for subsequent fusion. An illustrative sketch of this competition fusion is given below.
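In the sketch below, each region feature is pasted into a shared canvas at its region location, and wherever regions overlap the element-wise maximum is kept as the more identifiable feature. The layout, the use of nearest-neighbour resizing, and all sizes are assumptions made only to illustrate the competition rule:

    import torch
    import torch.nn.functional as F

    def fuse_region_features(region_features, boxes, canvas_hw):
        # Paste each (C, h, w) region feature into a (C, H, W) canvas at its
        # box location, keeping the element-wise maximum where boxes overlap.
        C = region_features[0].shape[0]
        H, W = canvas_hw
        fused = torch.full((C, H, W), float('-inf'))
        for feat, (x1, y1, x2, y2) in zip(region_features, boxes):
            resized = F.interpolate(feat.unsqueeze(0), size=(y2 - y1, x2 - x1),
                                    mode='nearest').squeeze(0)
            fused[:, y1:y2, x1:x2] = torch.maximum(fused[:, y1:y2, x1:x2], resized)
        # Positions not covered by any region stay zero.
        return torch.where(torch.isinf(fused), torch.zeros_like(fused), fused)

    # Three region features (cf. 321-323) and their boxes on a 12x12 canvas.
    feats = [torch.randn(64, 4, 12) for _ in range(3)]
    boxes = [(0, 0, 12, 5), (0, 3, 12, 9), (0, 7, 12, 12)]
    fused_feature = fuse_region_features(feats, boxes, (12, 12))   # (64, 12, 12)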
In one embodiment, fusing the fused feature with the initial image feature into a target image feature of the object comprises: selecting, in a second overlapping region in which the fused feature and the initial image feature overlap each other, the feature with the highest identifiability as the feature corresponding to the second overlapping region; and generating the target image feature of the object based on the feature corresponding to the second overlapping region. Since the fused feature is obtained from the region features, it overlaps the initial image feature, and the overlapped region may be referred to as a second overlapping region. The fused feature and the initial image feature may therefore be fused in the same way as described with reference to fig. 6: in the second overlapping region, the feature with the larger value (i.e., the higher identifiability) between the fused feature and the initial image feature is taken as the feature at that region in the resulting target image feature, and the target image feature obtained by this fusion may be used as the feature of the object.
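Continuing the sketch above, when the fused feature and the initial image feature share the same spatial extent this second fusion reduces to an element-wise maximum (again an assumption about the concrete layout):

    # Second fusion: keep, at every position, the more identifiable of the
    # fused region feature and the initial image feature.
    initial_image_feature = torch.randn(64, 12, 12)
    target_image_feature = torch.maximum(fused_feature, initial_image_feature)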
Compared with the conventional method, the method can obtain target image features that are finer and more identifiable: in the process of obtaining the target image features, features are extracted not only from the whole image containing the object but also from a plurality of regions within the image, so that the detail features in those regions are at least partially retained; in addition, the method of the embodiment of the present application adopts a competition strategy during fusion, i.e., features with high identifiability are retained and features with low identifiability are discarded, so the finally obtained target image features are more discriminative. This is advantageous for applications such as object recognition and picture retrieval. For example, in pedestrian retrieval, if the pedestrians in two pictures both wear a white shirt and black pants, a conventional feature extraction method that extracts features from the whole image is likely to ignore the slight differences between the two pedestrians. In the method according to the embodiment of the present application, because features are extracted from a plurality of regions of the picture, the detail features within those regions can be captured; for example, in the region including the pedestrian's face, the detail features of the face can be extracted, and the highly identifiable facial features are retained during fusion, so that the two pedestrians can be distinguished by the differences in their facial details and a correct result is obtained.
It should be noted that, in the present application, features, such as intermediate image features, initial image features, intermediate region features, fusion features, and the like, may be expressed in the form of feature maps and feature vectors. In addition, although the description is made using an example in which the object is a pedestrian in the present application, the present application is not limited thereto, and for example, the object may also be a human face, a vehicle, or the like.
A specific application of the method of extracting features according to the embodiment of the present application to a pedestrian will be described below.
When the object is a pedestrian, an intermediate image feature (e.g., 24 × 24 in size) may be generated by performing a convolution and pooling operation on an image of the pedestrian through the CNN in step S1100, and then an initial image feature (e.g., 12 × 12 in size) may be generated based on the intermediate image feature, for example, by performing a convolution and pooling operation on the intermediate image feature through the CNN. Next, in step S1200, a plurality of regions are extracted from the image of the pedestrian, for example, regions corresponding to the head-shoulder, upper-body, and lower-body of the pedestrian, and then, based on the plurality of regions corresponding to the head-shoulder, upper-body, and lower-body, the intermediate image features are pooled by using the CNN, and a plurality of intermediate region features (for example, 12 × 12 in size) corresponding to the head-shoulder, upper-body, and lower-body are generated. Then, region features corresponding to the head-shoulder, the upper-body, and the lower-body are generated based on the plurality of intermediate region features corresponding to the head-shoulder, the upper-body, and the lower-body, respectively, and specifically, the plurality of intermediate region features corresponding to the head-shoulder, the upper-body, and the lower-body may be convolved and pooled by the CNN to generate region features corresponding to the head-shoulder, the upper-body, and the lower-body. Then, in step S1300, the region feature corresponding to the upper body, the region feature corresponding to the lower body, and the region feature corresponding to the head and shoulder are fused into a fusion feature, and the fusion feature and the initial image feature are fused into a target image feature of the pedestrian.
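The paragraph above can be tied together into one end-to-end sketch; every module, box, and size below is an assumption made only to mirror the 24×24 and 12×12 examples, and the spatial placement of the fusion is simplified to element-wise maxima:

    import torch
    import torch.nn as nn
    from torchvision.ops import roi_pool

    stem = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4))
    head = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
    region_conv = nn.Conv2d(64, 64, 3, padding=1)

    image = torch.randn(1, 3, 96, 96)
    intermediate = stem(image)      # S1100: intermediate image feature, 24x24
    initial = head(intermediate)    # S1100: initial image feature, 12x12

    # S1200: assumed head-shoulder / upper-body / lower-body boxes on the
    # 24x24 intermediate feature, pooled to 12x12 intermediate region features.
    boxes = torch.tensor([[0., 0.,  0., 23., 10.],
                          [0., 0.,  6., 23., 17.],
                          [0., 0., 14., 23., 23.]])
    mid_regions = roi_pool(intermediate, boxes, output_size=(12, 12))
    regions = torch.relu(region_conv(mid_regions))   # region features

    # S1300: competition fusion, simplified to element-wise maxima.
    fused = regions.max(dim=0, keepdim=True).values
    target_image_feature = torch.maximum(fused, initial)   # (1, 64, 12, 12)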
In another embodiment, the method for extracting features of a pedestrian may further include extracting and fusing features of the limbs of the pedestrian. Specifically, in step S1100, the image of the pedestrian may be convolved and pooled by the CNN to generate an intermediate image feature (e.g., 24 × 24 in size), which may then be convolved and pooled by the CNN to generate a convolved intermediate image feature (e.g., 12 × 12 in size); the convolved intermediate image feature may in turn be convolved and pooled to generate the initial image feature (e.g., 6 × 6 in size). Then, in step S1200, regions corresponding to the head-shoulder, the upper body, and the lower body of the pedestrian are extracted from the image of the pedestrian, and the intermediate image feature is pooled by the CNN based on these regions to generate a plurality of intermediate region features (e.g., 12 × 12 in size) corresponding to the head-shoulder, the upper body, and the lower body, respectively. This step may further include extracting from the image a plurality of regions corresponding respectively to the left arm, right arm, left leg, and right leg of the pedestrian, e.g., regions 204-207 in fig. 2; the region extraction may use the same method as described with reference to fig. 2. The convolved intermediate image feature is then pooled, for example by the CNN, based on the regions corresponding to the left arm, the right arm, the left leg, and the right leg, to generate a plurality of intermediate region features (e.g., 12 × 12 in size) corresponding to the left arm, the right arm, the left leg, and the right leg, respectively. Then, the intermediate region features corresponding to the head-shoulder, the upper body, the lower body, the left arm, the right arm, the left leg, and the right leg are each convolved and pooled by the CNN to generate convolved intermediate region features (e.g., 6 × 6 in size) corresponding to these parts. After obtaining the convolved intermediate region features corresponding to the head-shoulder, upper body, lower body, left arm, right arm, left leg, and right leg, the fusion method described in connection with fig. 6 may be used to fuse the convolved intermediate region features corresponding to the left leg and the right leg into a leg fusion feature; fuse the convolved intermediate region features corresponding to the left arm and the right arm into an arm fusion feature; fuse the arm fusion feature and the convolved intermediate region feature corresponding to the upper body into the region feature corresponding to the upper body; fuse the leg fusion feature and the convolved intermediate region feature corresponding to the lower body into the region feature corresponding to the lower body; and use the convolved intermediate region feature corresponding to the head-shoulder as the region feature corresponding to the head-shoulder.
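The hierarchical limb fusion described in this paragraph can be sketched as below; the fusion of two equally sized 6×6 part features is simplified to an element-wise maximum, and the dictionary keys are hypothetical names:

    import torch

    def fuse(a, b):
        # Competition fusion of two equally sized part features, simplified
        # to an element-wise maximum (spatial alignment omitted).
        return torch.maximum(a, b)

    # Convolved intermediate region features for the seven parts (assumed 64x6x6).
    parts = {name: torch.randn(64, 6, 6) for name in
             ['head_shoulder', 'upper_body', 'lower_body',
              'left_arm', 'right_arm', 'left_leg', 'right_leg']}

    leg_fusion = fuse(parts['left_leg'], parts['right_leg'])
    arm_fusion = fuse(parts['left_arm'], parts['right_arm'])
    upper_body_region = fuse(arm_fusion, parts['upper_body'])
    lower_body_region = fuse(leg_fusion, parts['lower_body'])
    head_shoulder_region = parts['head_shoulder']   # used as-is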
Then, in step S1300, the region feature corresponding to the upper body, the region feature corresponding to the lower body, and the region feature corresponding to the head and shoulder may be fused into a fusion feature by the fusion method described with reference to fig. 6, and the fusion feature and the initial image feature may be fused into the target image feature of the pedestrian.
Fig. 7 exemplarily shows an apparatus 700 for extracting object features according to an embodiment of the present application. The device includes: an image feature generation module 710 that generates initial image features corresponding to objects in an image; a region feature generation module 720 that generates a plurality of region features corresponding to a plurality of regions of the image, respectively; and a fusion module 730 for fusing the initial image feature and the plurality of region features to obtain a target image feature of the object.
In one embodiment, the image feature generation module 710 includes: an intermediate image feature generation submodule 711 for performing convolution and pooling on the image to generate an intermediate image feature; and an initial image feature generation sub-module 712 that generates initial image features corresponding to objects in the image based on the intermediate image features.
In one embodiment, the region feature generation module 720 includes: a region extraction sub-module 721 that extracts a plurality of regions from the image; an intermediate region feature generation sub-module 722 pools the intermediate image features based on the plurality of regions to generate a plurality of intermediate region features corresponding to the plurality of regions, respectively, and a region feature generation sub-module 723 generates a plurality of region features corresponding to the plurality of regions based on the plurality of intermediate region features, respectively.
In one embodiment, the fusion module 730 includes: a first fusion submodule 731 fusing the plurality of region features into a fusion feature; and a second fusion sub-module 732 that fuses the fused features and the initial image features into target image features of the object.
In one embodiment, the first fusion submodule 731 is further configured to: selecting the feature with the highest identification in a first overlapping area in the plurality of area features as the feature corresponding to the first overlapping area; and generating a fusion feature based on the feature corresponding to the first overlapping area.
In one embodiment, the second fusion submodule 732 is further configured to: selecting the feature with the highest identification as the feature corresponding to the second overlapping area in which the fused feature and the initial image feature are overlapped with each other; and generating a target image feature of the object based on the feature corresponding to the second overlapping area.
In one embodiment, the region extraction sub-module 721 is configured to: the image is divided into a plurality of regions according to the extraction reference points.
In one embodiment, the object comprises a pedestrian, a human face, or a vehicle.
In one embodiment, when the object is a pedestrian, the extracted reference points include human body localization points of the pedestrian.
In one embodiment, when the object is a pedestrian, the region extraction sub-module 721 is configured to: extracting regions corresponding to the head and shoulder, the upper body, and the lower body of the pedestrian, respectively, from the image; and a middle region feature generation submodule 722 for: the intermediate image features are pooled based on a plurality of regions corresponding to the head-shoulder, upper-body, and lower-body, respectively, to generate a plurality of intermediate region features corresponding to the head-shoulder, upper-body, and lower-body, respectively.
In one embodiment, when the object is a pedestrian, the region feature generation submodule 723 is configured to: regional features corresponding to the head-shoulder, upper-body, and lower-body are generated based on a plurality of intermediate regional features corresponding to the head-shoulder, upper-body, and lower-body, respectively.
In one embodiment, when the subject is a pedestrian, the first fusion sub-module 731 is configured to: a region feature corresponding to the upper half, a region feature corresponding to the lower half, and a region feature corresponding to the head-shoulder are fused into a fused feature.
In one embodiment, when the object is a pedestrian, the initial image feature generation sub-module 712 is configured to: convolving and pooling the intermediate image features to generate convolved intermediate image features; and convolving and pooling the convolved intermediate image features to generate initial image features.
In one embodiment, when the object is a pedestrian, the region extraction sub-module 721 is further configured to: a plurality of regions respectively corresponding to a left arm, a right arm, a left leg, and a right leg of a pedestrian are extracted from an image.
In one embodiment, when the object is a pedestrian, the region feature generation sub-module 723 is configured to pool the convolved intermediate image features based on a plurality of regions corresponding to the left arm, the right arm, the left leg, and the right leg, respectively, to generate a plurality of intermediate region features corresponding to the left arm, the right arm, the left leg, and the right leg, respectively; convolving and pooling a plurality of intermediate region features corresponding to the head-shoulder, the upper-body, the lower-body, the left-arm, the right-arm, the left-leg, and the right-leg, respectively, to generate convolved intermediate region features corresponding to the head-shoulder, the upper-body, the lower-body, the left-arm, the right-arm, the left-leg, and the right-leg; fusing the convolved mid-region features corresponding to the left leg and the right leg, respectively, into leg-fused features; fusing the convolved mid-region features corresponding to the left arm and the right arm, respectively, into arm fusion features; fusing the arm fusion feature and the convolved middle region feature corresponding to the upper torso into a region feature corresponding to the upper torso; fusing the leg fusion feature and the convolved middle region feature corresponding to the lower body into a region feature corresponding to the lower body; and using the convolved intermediate region features corresponding to the head-shoulder as the region features corresponding to the head-shoulder.
In one embodiment, the features include a feature map or a feature vector.
As will be appreciated by those skilled in the art, the apparatus 700 for extracting object features may be implemented in the form of an integrated circuit (IC), including but not limited to a digital signal processor, a graphics processing IC, an image processing IC, an audio processing IC, and the like. From the teaching provided in this application, those skilled in the art will know whether to implement the apparatus 700 for extracting object features in hardware or in software. For example, the present application may be implemented in the form of a storage medium storing computer-executable instructions which, when executed by a computer, implement the functions of the apparatus 700 for extracting object features described above. The apparatus 700 for extracting object features of the present application may also be implemented by a computer system comprising a memory storing computer-executable instructions and a processor in communication with the memory, the processor executing the executable instructions to realize the functions of the apparatus 700 described above with reference to fig. 7.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for implementing embodiments of the present application. The computer system 800 may include a processing unit (e.g., a central processing unit (CPU) 801, a graphics processing unit (GPU), etc.) that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the system 800 can also be stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card (e.g., a LAN card, a modem, and the like). The communication section 809 may perform communication processing via a network such as the Internet. A drive 810 may also be connected to the I/O interface 805 as necessary, and a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory may be mounted on the drive 810 so that a computer program read therefrom is installed into the storage section 808 as necessary.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules referred to in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor. The names of these units or modules should not be construed as limiting these units or modules.
The above description is only an exemplary embodiment of the present application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the present application is not limited to embodiments with the specific combination of the above-mentioned features, but also covers other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept, for example embodiments in which the above features are replaced by technical features with similar functions disclosed in the present application.

Claims (29)

1. A method of extracting features, comprising:
generating initial image features corresponding to objects in an image;
generating a plurality of region features corresponding to a plurality of regions of the image, respectively;
fusing the plurality of region features into a fused feature, wherein in a first overlapping region of the plurality of region features, a feature with the highest identifiability is selected as a feature corresponding to the first overlapping region, and the fused feature is generated based on the feature corresponding to the first overlapping region; and
fusing the fused feature and the initial image feature as a target image feature of the object.
2. The method of claim 1, generating initial image features corresponding to objects in the image comprising:
performing convolution and pooling on the image to generate intermediate image features; and
generating initial image features corresponding to objects in the image based on the intermediate image features.
3. The method of claim 2, generating a plurality of region features corresponding to a plurality of regions of the image, respectively, comprising:
extracting a plurality of regions from the image;
pooling the intermediate image features based on the plurality of regions to generate a plurality of intermediate region features corresponding to the plurality of regions, respectively, an
Generating a plurality of region features respectively corresponding to the plurality of regions based on the plurality of intermediate region features.
4. The method of claim 1, wherein fusing the fused feature with the initial image feature to a target image feature of the object comprises:
selecting, in a second overlapping area in which the fused feature and the initial image feature overlap each other, the feature with the highest identifiability as the feature corresponding to the second overlapping area; and
generating a target image feature of the object based on the feature corresponding to the second overlapping area.
5. The method of claim 3, wherein extracting a plurality of regions from the image comprises:
dividing the image into a plurality of the regions according to the extraction reference points.
6. The method of claim 5, wherein the object comprises a pedestrian, a human face, or a vehicle.
7. The method of claim 6, wherein, when the object is a pedestrian, the extracted reference points comprise human body localization points of the pedestrian.
8. The method of claim 7, wherein, when the object is a pedestrian,
extracting a plurality of regions from the image comprises:
extracting regions corresponding to the head and shoulder, the upper body, and the lower body of the pedestrian, respectively, from the image; and
pooling the intermediate image features based on the plurality of regions, and generating a plurality of intermediate region features corresponding to the plurality of regions, respectively, comprises:
the intermediate image features are pooled based on a plurality of regions corresponding to the head-shoulder, the upper-body, and the lower-body, respectively, to generate a plurality of intermediate region features corresponding to the head-shoulder, the upper-body, and the lower-body, respectively.
9. The method of claim 8, wherein generating a plurality of region features respectively corresponding to the plurality of regions based on the plurality of intermediate region features when the object is a pedestrian comprises:
generating region features corresponding to the head-shoulder, the upper-body, and the lower-body based on a plurality of middle region features corresponding to the head-shoulder, the upper-body, and the lower-body, respectively.
10. The method of claim 9, wherein fusing the plurality of region features into a fused feature comprises, when the object is a pedestrian:
and fusing the region feature corresponding to the upper body, the region feature corresponding to the lower body, and the region feature corresponding to the head-shoulder portion into a fused feature.
11. The method of claim 10, wherein generating initial image features corresponding to objects in the image based on the intermediate image features comprises, when the object is a pedestrian:
convolving and pooling the intermediate image features to generate convolved intermediate image features; and
convolving and pooling the convolved intermediate image features to generate initial image features.
12. The method of claim 11, wherein, when the object is a pedestrian, extracting a plurality of regions from the image further comprises:
a plurality of regions respectively corresponding to a left arm, a right arm, a left leg, and a right leg of the pedestrian are extracted from the image.
13. The method of claim 12, wherein, when the object comprises a pedestrian, generating region features corresponding to the head-shoulder, the upper-body, and the lower-body based on a plurality of intermediate region features corresponding to the head-shoulder, the upper-body, and the lower-body, respectively, comprises:
pooling the convolved intermediate image features based on a plurality of regions corresponding to the left arm, the right arm, the left leg, and the right leg, respectively, to generate a plurality of intermediate region features corresponding to the left arm, the right arm, the left leg, and the right leg, respectively;
convolving and pooling a plurality of intermediate region features corresponding to the head-shoulder, the upper-body, the lower-body, the left-arm, the right-arm, the left-leg, and the right-leg, respectively, to generate convolved intermediate region features corresponding to the head-shoulder, the upper-body, the lower-body, the left-arm, the right-arm, the left-leg, and the right-leg;
fusing the convolved mid-region features corresponding to the left leg and the right leg, respectively, into leg-fused features;
fusing the convolved mid-region features corresponding to the left arm and the right arm, respectively, into arm fusion features;
fusing the arm fusion feature and the convolved middle region feature corresponding to the upper torso into a region feature corresponding to the upper torso;
fusing the leg fusion feature and the convolved middle region feature corresponding to the lower body into a region feature corresponding to the lower body; and
using the convolved intermediate region features corresponding to the head-shoulder as the region features corresponding to the head-shoulder.
14. The method of claim 1, wherein the feature comprises a feature map or a feature vector.
15. An apparatus for extracting features, comprising:
an image feature generation module that generates an initial image feature corresponding to an object in an image;
a region feature generation module that generates a plurality of region features corresponding to a plurality of regions of the image, respectively; and
a fusion module for fusing the initial image feature and the plurality of region features to obtain a target image feature of the object,
the fusion module includes:
a first fusion submodule, configured to fuse the plurality of region features into a fusion feature, where in a first overlapping region of the plurality of region features, a feature with a highest identifiability is selected as a feature corresponding to the first overlapping region, and the fusion feature is generated based on the feature corresponding to the first overlapping region; and
and the second fusion submodule fuses the fusion feature and the initial image feature into a target image feature of the object.
16. The apparatus of claim 15, wherein the image feature generation module comprises:
the intermediate image feature generation submodule is used for performing convolution and pooling on the image to generate intermediate image features; and
an initial image feature generation sub-module that generates initial image features corresponding to objects in the image based on the intermediate image features.
17. The apparatus of claim 16, wherein the regional feature generation module comprises:
a region extraction sub-module that extracts a plurality of regions from the image;
an intermediate region feature generation sub-module that pools the intermediate image features based on the plurality of regions to generate a plurality of intermediate region features respectively corresponding to the plurality of regions; and
a region feature generation sub-module that generates a plurality of region features respectively corresponding to the plurality of regions based on the plurality of intermediate region features.
18. The apparatus of claim 15, wherein the second fusion submodule is further to:
selecting, in a second overlapping area in which the fused feature and the initial image feature overlap each other, the feature with the highest identifiability as the feature corresponding to the second overlapping area; and
generating a target image feature of the object based on the feature corresponding to the second overlapping area.
19. The apparatus of claim 17, wherein the region extraction submodule is to:
dividing the image into a plurality of the regions according to the extraction reference points.
20. The apparatus of claim 19, wherein the object comprises a pedestrian, a human face, or a vehicle.
21. The apparatus of claim 20, wherein, when the object is a pedestrian, the extracted reference point comprises a human body localization point of the pedestrian.
22. The apparatus of claim 21, wherein, when the object is a pedestrian,
the region extraction submodule is configured to:
extracting regions corresponding to the head and shoulder, the upper body, and the lower body of the pedestrian, respectively, from the image; and
the middle region feature generation submodule is configured to:
the intermediate image features are pooled based on a plurality of regions corresponding to the head-shoulder, the upper-body, and the lower-body, respectively, to generate a plurality of intermediate region features corresponding to the head-shoulder, the upper-body, and the lower-body, respectively.
23. The apparatus of claim 22, wherein, when the object is a pedestrian, the region feature generation submodule is configured to:
generate region features corresponding to the head-shoulder, the upper body, and the lower body based on the plurality of intermediate region features respectively corresponding to the head-shoulder, the upper body, and the lower body.
24. The apparatus of claim 23, wherein, when the object is a pedestrian, the first fusion submodule is configured to:
fuse the region feature corresponding to the upper body, the region feature corresponding to the lower body, and the region feature corresponding to the head-shoulder into the fused feature.
25. The apparatus of claim 24, wherein, when the object is a pedestrian, the initial image feature generation submodule is configured to:
convolve and pool the intermediate image features to generate convolved intermediate image features; and
convolve and pool the convolved intermediate image features to generate the initial image feature.
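For claims 16 and 25, here is a minimal PyTorch-style sketch of the two-stage structure (image to intermediate image features, then two further convolution-and-pooling stages to the initial image feature); the channel counts, kernel sizes, and input resolution are arbitrary assumptions, and the patent does not prescribe any particular framework or layer configuration.

```python
import torch
import torch.nn as nn

intermediate_stage = nn.Sequential(                 # image -> intermediate image features
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2))

initial_stage = nn.Sequential(                      # intermediate -> initial image feature
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # convolved intermediate features
    nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2))  # initial image feature

image = torch.randn(1, 3, 128, 64)                  # toy pedestrian crop, N x C x H x W
intermediate = intermediate_stage(image)            # (1, 64, 64, 32)
initial = initial_stage(intermediate)               # (1, 256, 16, 8)
print(intermediate.shape, initial.shape)
```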
26. The apparatus of claim 25, wherein, when the object is a pedestrian, the region extraction submodule is further configured to:
extract, from the image, a plurality of regions respectively corresponding to the left arm, the right arm, the left leg, and the right leg of the pedestrian.
27. The apparatus of claim 26, wherein, when the object is a pedestrian, the region feature generation submodule is configured to:
pool the convolved intermediate image features based on the plurality of regions respectively corresponding to the left arm, the right arm, the left leg, and the right leg, to generate a plurality of intermediate region features respectively corresponding to the left arm, the right arm, the left leg, and the right leg;
convolve and pool the plurality of intermediate region features respectively corresponding to the head-shoulder, the upper body, the lower body, the left arm, the right arm, the left leg, and the right leg, to generate convolved intermediate region features corresponding to the head-shoulder, the upper body, the lower body, the left arm, the right arm, the left leg, and the right leg;
fuse the convolved intermediate region features respectively corresponding to the left leg and the right leg into a leg fusion feature;
fuse the convolved intermediate region features respectively corresponding to the left arm and the right arm into an arm fusion feature;
fuse the arm fusion feature and the convolved intermediate region feature corresponding to the upper body into the region feature corresponding to the upper body;
fuse the leg fusion feature and the convolved intermediate region feature corresponding to the lower body into the region feature corresponding to the lower body; and
use the convolved intermediate region feature corresponding to the head-shoulder as the region feature corresponding to the head-shoulder.
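The fusion order in claim 27 can be summarized with the short sketch below, assuming every part feature has already been convolved to a common shape and again using the element-wise larger response as a stand-in for the unspecified identifiability score; all names and shapes are illustrative.

```python
import numpy as np

def fuse(a, b):
    """Keep the stronger of two aligned responses at each location."""
    return np.where(np.abs(a) >= np.abs(b), a, b)

part_names = ["head_shoulder", "upper_body", "lower_body",
              "left_arm", "right_arm", "left_leg", "right_leg"]
parts = {name: np.random.randn(128, 8, 4) for name in part_names}   # convolved intermediate region features

leg_fusion = fuse(parts["left_leg"], parts["right_leg"])            # leg fusion feature
arm_fusion = fuse(parts["left_arm"], parts["right_arm"])            # arm fusion feature
upper_body_region = fuse(arm_fusion, parts["upper_body"])           # region feature for the upper body
lower_body_region = fuse(leg_fusion, parts["lower_body"])           # region feature for the lower body
head_shoulder_region = parts["head_shoulder"]                       # used as-is per the claim
print(upper_body_region.shape, lower_body_region.shape, head_shoulder_region.shape)
```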
28. The apparatus of claim 15, wherein the feature comprises a feature map or a feature vector.
29. A system for extracting features, comprising:
a memory storing executable instructions;
one or more processors in communication with the memory to execute the executable instructions to:
generating initial image features corresponding to objects in an image;
generating a plurality of region features corresponding to a plurality of regions of the image, respectively;
fusing the plurality of region features into a fused feature, wherein in a first overlapping region of the plurality of region features, a feature with the highest identifiability is selected as a feature corresponding to the first overlapping region, and the fused feature is generated based on the feature corresponding to the first overlapping region; and
fusing the fused feature and the initial image feature into a target image feature of the object.
CN201710195256.6A 2017-03-29 2017-03-29 Method, device and system for extracting features Active CN108229492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710195256.6A CN108229492B (en) 2017-03-29 2017-03-29 Method, device and system for extracting features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710195256.6A CN108229492B (en) 2017-03-29 2017-03-29 Method, device and system for extracting features

Publications (2)

Publication Number Publication Date
CN108229492A CN108229492A (en) 2018-06-29
CN108229492B true CN108229492B (en) 2020-07-28

Family

ID=62657374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710195256.6A Active CN108229492B (en) 2017-03-29 2017-03-29 Method, device and system for extracting features

Country Status (1)

Country Link
CN (1) CN108229492B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109920016B (en) * 2019-03-18 2021-06-25 北京市商汤科技开发有限公司 Image generation method and device, electronic equipment and storage medium
US12094189B2 (en) 2019-05-13 2024-09-17 Nippon Telegraph And Telephone Corporation Learning method, learning program, and learning device to accurately identify sub-objects of an object included in an image
CN110705345A (en) * 2019-08-21 2020-01-17 重庆特斯联智慧科技股份有限公司 Pedestrian re-identification method and system based on deep learning
CN111444928A (en) * 2020-03-30 2020-07-24 北京市商汤科技开发有限公司 Key point detection method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631413A (en) * 2015-12-23 2016-06-01 中通服公众信息产业股份有限公司 Cross-scene pedestrian searching method based on depth learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631413A (en) * 2015-12-23 2016-06-01 中通服公众信息产业股份有限公司 Cross-scene pedestrian searching method based on depth learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fusion of Global and Local Features Using KCCA for Automatic Target Recognition; Jiong Zhao et al.; 2009 Fifth International Conference on Image and Graphics; 2010-08-30; abstract and page 1, left column *
Good Practice in CNN Feature Transfer; Liang Zheng et al.; arXiv:1604.00133v1; 2016-04-01; entire document *

Also Published As

Publication number Publication date
CN108229492A (en) 2018-06-29

Similar Documents

Publication Publication Date Title
CN110176027B (en) Video target tracking method, device, equipment and storage medium
CN110135455B (en) Image matching method, device and computer readable storage medium
CN108229492B (en) Method, device and system for extracting features
CN112052839B (en) Image data processing method, apparatus, device and medium
CN108875523B (en) Human body joint point detection method, device, system and storage medium
CN111583097A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
US11037325B2 (en) Information processing apparatus and method of controlling the same
US20230237666A1 (en) Image data processing method and apparatus
AU2018200164A1 (en) Forecasting human dynamics from static images
CN110084154B (en) Method and device for rendering image, electronic equipment and computer readable storage medium
CN110858414A (en) Image processing method and device, readable storage medium and augmented reality system
CN112508989B (en) Image processing method, device, server and medium
CN112651881A (en) Image synthesis method, apparatus, device, storage medium, and program product
CN114782911B (en) Image processing method, device, equipment, medium, chip and vehicle
CN114842035A (en) License plate desensitization method, device and equipment based on deep learning and storage medium
CN113837130A (en) Human hand skeleton detection method and system
CN108564058A (en) A kind of image processing method, device and computer readable storage medium
CN115761885B (en) Behavior recognition method for common-time and cross-domain asynchronous fusion driving
CN117529758A (en) Methods, systems, and media for identifying human collaborative activity in images and videos using neural networks
CN113544701B (en) Method and device for detecting associated object, electronic equipment and storage medium
CN107633498A (en) Image dark-state Enhancement Method, device and electronic equipment
CN116883770A (en) Training method and device of depth estimation model, electronic equipment and storage medium
CN112258435A (en) Image processing method and related product
CN112580544A (en) Image recognition method, device and medium and electronic equipment thereof
CN114565521B (en) Image restoration method, device, equipment and storage medium based on virtual reloading

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant