CN107545268B - Color feature extraction method and system

Publication number: CN107545268B
Application number: CN201610499736.7A
Authority: CN (China)
Other versions: CN107545268A
Inventor: 张默 (Zhang Mo)
Assignee: Beijing Moshanghua Technology Co., Ltd.
Legal status: Active
Classification: Image Analysis

Abstract

The application discloses a color feature extraction method and system, wherein the method comprises the following steps: extracting candidate regions where a characteristic part of a target object in an image to be processed is located; determining candidate regions of the target object by using each candidate region of the characteristic part; selecting an optimal candidate region of the target object from the candidate regions of the target object as the target region of the target object; and extracting the color features of the target region as the color features of the image to be processed. The method and system improve the accuracy of the extracted color features.

Description

Color feature extraction method and system
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a color feature extraction method and system.
Background
The color feature is one of the image features used to describe the surface properties of an image. In the prior art, color features are mostly represented by a color histogram, so color feature extraction mainly amounts to extracting the color histogram of an image.
However, an image typically contains a target object together with background information: the region where the target object is located occupies only part of the whole image, and the position and posture of the target object within the image are largely random. Because the color features of the background are mixed in, the prior-art extraction method cannot capture the color features of the target object well, so the extracted color features are not accurate enough and the error rate is high.
Disclosure of Invention
In view of this, the technical problem to be solved by the present application is that the color features extracted in the prior art are inaccurate and error-prone.
To solve this technical problem, the present application discloses a color feature extraction method:
extracting a candidate region where a characteristic part of a target object in an image to be processed is located;
determining a candidate region of the target object by using each candidate region of the characteristic part;
selecting an optimal candidate region of the target object from the candidate regions of the target object as a target region of the target object;
and extracting the color features of the target area as the color features of the image to be processed.
Preferably, the extracting the candidate region where the feature of the target object is located in the image to be processed includes:
constructing a pyramid characteristic image of the image to be processed;
constructing an integral channel characteristic image of each pyramid characteristic image;
carrying out window sliding on each integral channel characteristic image of each pyramid characteristic image to extract a candidate window;
extracting a plurality of rectangular candidate frames from the candidate window;
selecting a candidate region containing the characteristic part of the target object from the plurality of rectangular candidate frames by using a classifier; the classifier is pre-trained with a first training sample, wherein the first training sample comprises positive samples containing the characteristic part of the target object and negative samples not containing it.
Preferably, the determining the candidate region of the target object by using each candidate region of the feature part includes:
and acquiring an enclosing region where the target object is located by utilizing the candidate region of each characteristic part and utilizing an enclosing box algorithm, and determining the enclosing region as the candidate region of the target object.
Preferably, the selecting an optimal candidate region of the target object from the candidate regions of the target object as the target region of the target object includes:
matching each candidate region with an object model, and selecting the best-matching candidate region as the optimal target region of the target object; the object model is pre-trained with a second training sample, wherein the second training sample comprises positive samples containing the target object and negative samples not containing the target object; the object model comprises a main model and a plurality of sub-models.
Preferably, the matching each candidate region with the object model and selecting the best matching candidate region as the optimal target region of the target object includes:
dividing each candidate region of the target object to obtain a plurality of sub-regions;
calculating a first matching score of each candidate region of the target object with a main model of an object model;
calculating a second matching score of each sub-region in each candidate region of the target object and the corresponding sub-model in the object model;
adding the second matching scores corresponding to the multiple sub-areas of each candidate area of the target object to obtain a third matching score corresponding to each candidate area;
adding the first matching score and the third matching score corresponding to each candidate area of the target object to obtain a fourth matching score;
and selecting the candidate area with the highest fourth matching score as the optimal target area of the target object.
Preferably, the extracting the color feature of the target region as the color feature of the image to be processed includes:
extracting a three-channel color histogram of the target area;
and calculating the posterior annotation color characteristic selected when the probability in the conditional random field is maximum by using a conditional random field algorithm and the three-channel color histogram of the target region, wherein the posterior annotation color characteristic is used as the color characteristic of the image to be processed.
The application also discloses a color feature extraction system:
the first extraction module is used for extracting a candidate region where a characteristic part of a target object in an image to be processed is located;
a first determination module, configured to determine a candidate region of the target object by using each candidate region of the feature;
the first selection module is used for selecting the optimal candidate area of the target object from the candidate areas of the target object as the target area of the target object;
and the second extraction module is used for extracting the color features of the target area as the color features of the image to be processed.
Preferably, the first extraction module comprises:
the first construction unit is used for constructing a pyramid characteristic image of the image to be processed;
the second construction unit is used for constructing an integral channel characteristic image of each pyramid characteristic image;
the first extraction unit is used for carrying out window sliding on each integral channel characteristic image of each pyramid characteristic image to extract a candidate window;
a second extraction unit configured to extract a plurality of rectangular candidate frames from the candidate window;
a first classification unit configured to select, by using a classifier, a candidate region containing the characteristic part of the target object from the plurality of rectangular candidate frames; the classifier is pre-trained with a first training sample, wherein the first training sample comprises positive samples containing the characteristic part of the target object and negative samples not containing it.
Preferably, the first determining module comprises:
and the first determining unit is used for acquiring an enclosing region where the target object is located by utilizing the candidate region of each characteristic part and utilizing an enclosing box algorithm, and determining the enclosing region as the candidate region of the target object.
Preferably, the first selection module comprises:
the first selection unit is used for matching each candidate region with the object model and selecting the best-matching candidate region as the optimal target region of the target object; the object model is pre-trained with a second training sample, wherein the second training sample comprises positive samples containing the target object and negative samples not containing the target object; the object model comprises a main model and a plurality of sub-models.
Preferably, the first selection unit includes:
a first dividing unit, configured to divide each candidate region of the target object to obtain a plurality of sub-regions;
a first calculating subunit, configured to calculate a first matching score between each candidate region of the target object and a main model of an object model;
the second calculating subunit is used for calculating a second matching score of each sub-region in each candidate region of the target object and the corresponding sub-model in the object model;
the third calculation subunit is used for adding the second matching scores corresponding to the sub-regions of each candidate region of the target object to obtain a third matching score corresponding to each candidate region;
and the fourth calculating subunit is configured to add the first matching score and the third matching score corresponding to each candidate region of the target object to obtain a fourth matching score.
And the first selection subunit is used for selecting the candidate region with the highest fourth matching score as the optimal target region of the target object.
Preferably, the second extraction module comprises:
a third extraction unit, configured to extract a three-channel color histogram of the target region;
and the second selection unit is used for calculating, by using the conditional random field algorithm and the three-channel color histogram of the target region, the posterior labeling color feature selected when the probability in the conditional random field is maximum, and taking it as the color feature of the image to be processed.
Compared with the prior art, the present application can achieve the following technical effects:
The region in the image to be processed where the target object appears is located accurately, and the corresponding color features are extracted from that region. This improves the specificity of the location from which color features are extracted and the accuracy of color feature extraction, and reduces the probability of errors in the extracted color features.
Of course, it is not necessary for any one product to achieve all of the above-described technical effects simultaneously.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of one embodiment of a color feature extraction method according to an embodiment of the present application;
FIG. 2 is a flowchart of yet another embodiment of a color feature extraction method according to an embodiment of the present application;
FIG. 3 is a schematic representation of a human body model according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a color feature extraction system according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail with reference to the drawings and examples, so that how to implement technical means to solve technical problems and achieve technical effects of the present application can be fully understood and implemented.
The color features of an image have important applications in the field of image processing: by extracting them, the colors of the image can be analyzed in detail. For example, the color features of one person's clothing in a video can be extracted and then used as a parameter to detect clothing of the same type; likewise, the color features of an oil painting can be extracted to analyze its color composition. Existing color feature extraction pays little attention to the region where the target object appears. For example, when the color histogram of an image is extracted as its color feature, the region where the target object appears is not considered; instead, every pixel value in the image to be processed is counted directly into the histogram. Such color features cannot capture the color of the target object in the image well, so the extraction lacks specificity, the result is not accurate enough, and the error rate is high.
In order to solve the technical problem, the inventors propose a technical solution of the present application after a series of studies. In the present embodiment, first, candidate regions of the feature portion of the target object are obtained; determining a candidate region of the target object by using the candidate region of the characteristic part; then selecting the optimal candidate area where the target object is located from the candidate areas of the target object as a target area; and finally, extracting color features from the target area as the color features of the image.
In the embodiment of the application, the robustness of obtaining the target area where the target object is located can be improved, and the accuracy of obtaining the target area where the target object is located is improved, so that the accuracy of the extracted color features is improved, and the probability of errors of the color features is reduced.
The technical solution of the present application is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a flowchart of an embodiment of a color feature extraction method provided in the present application is provided, where the method mainly includes the following steps:
101: extracting a candidate region where a characteristic part of a target object in an image to be processed is located;
wherein the characteristic part of the target object may be a relatively robust feature of the target object. For example, if the target object is a person, the characteristic part may be the person's head-shoulder region; if the target object is a bicycle, the characteristic parts may be the two wheels.
102: determining a candidate region of the target object by using each candidate region of the characteristic part;
preferably, the determining the candidate region of the target object by using each candidate region of the feature part may include:
and acquiring an enclosing region where the target object is located by utilizing the candidate region of each characteristic part and utilizing an enclosing box algorithm, and determining the enclosing region as the candidate region of the target object.
The specific parameters of the bounding box algorithm may be configured as follows: the width is kept unchanged, the center point is shifted down by twice the height of the original (characteristic-part) region, and the height is set to 3.5 times the height of the original region. Estimating the bounding box coordinates with this algorithm makes the acquired target region more accurate.
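For illustration, a minimal sketch of this coordinate estimate in Python, assuming boxes are given as (x, y, w, h) with (x, y) the top-left corner; the helper name and box convention are not from the source:

```python
def body_box_from_part_box(x, y, w, h):
    """Estimate the enclosing (body) region from a characteristic-part box.

    Width is kept; the center is shifted down by twice the part height;
    the new height is 3.5 times the part height.
    """
    cx = x + w / 2.0
    cy = y + h / 2.0 + 2.0 * h   # shift the center down by 2x the part height
    new_h = 3.5 * h              # enlarge the height to 3.5x the part height
    return (cx - w / 2.0, cy - new_h / 2.0, w, new_h)

# e.g., a 40x30 head-shoulder box at (100, 50):
print(body_box_from_part_box(100, 50, 40, 30))  # -> (100.0, 72.5, 40, 105.0)
```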
103: selecting an optimal candidate region of the target object from the candidate regions of the target object as a target region of the target object;
as a possible implementation manner, in order to make the obtained target region more accurate, a target detection algorithm may be used to select an optimal candidate region of the target object from the candidate regions of the target object. The target detection algorithm may be a DPM (Deformable Parts Model) target detection algorithm.
104: and extracting the color features of the target area as the color features of the image to be processed.
Wherein, the extracted color feature of the target region may be a color histogram.
As a possible implementation, in order to improve the accuracy of the color features of the image to be processed, extracting the color features of the target region as the color features of the image to be processed includes:
extracting a three-channel color histogram of the target region;
and calculating, by using a conditional random field algorithm and the three-channel color histogram of the target region, the posterior labeling color feature selected when the probability in the conditional random field is maximum, and taking it as the color features of the image to be processed.
Preferably, the three-channel color histogram of the target region may be input, as the observed color feature, into the probability formula of the conditional random field:

$$P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k\, t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l\, s_l(y_i, x, i) \right)$$

where $i$ denotes the $i$-th position of the observation sequence, $t_k$ is a transition feature function between label positions $i-1$ and $i$ of the observed color feature, $s_l$ is a state feature function at position $i$ of the observed color feature, $\lambda_k$ and $\mu_l$ are the corresponding weights, which can be obtained by maximum likelihood estimation, and $Z(x)$ is the normalization factor. $x$ denotes the prior observed color feature and $y$ the posterior labeling color feature; the posterior labeling color feature at which the probability formula of the conditional random field attains its maximum probability value is the color feature of the image to be processed.
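As a toy illustration of this selection step (not the patent's implementation): with hypothetical feature functions and weights, and since $Z(x)$ does not depend on $y$, the posterior labeling can be found by maximizing the unnormalized score, here by brute force over a short sequence:

```python
import itertools

def t0(y_prev, y_cur, x, i):      # toy transition feature between positions i-1 and i
    return 1.0 if y_prev == y_cur else 0.0

def s0(y_cur, x, i):              # toy state feature at position i
    return 1.0 if x[i] > 0.5 and y_cur == 1 else 0.0

LAMBDA, MU = [0.8], [1.2]         # stand-in weights; real ones come from MLE training
LABELS = (0, 1)

def unnormalized_score(y, x):
    score = sum(m * s(y[i], x, i)
                for i in range(len(x)) for s, m in zip([s0], MU))
    score += sum(w * t(y[i - 1], y[i], x, i)
                 for i in range(1, len(x)) for t, w in zip([t0], LAMBDA))
    return score

x = [0.9, 0.7, 0.2]               # toy observed sequence derived from histograms
best = max(itertools.product(LABELS, repeat=len(x)),
           key=lambda y: unnormalized_score(y, x))
print(best)  # -> (1, 1, 1): the posterior labeling maximizing P(y|x)
```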
In the embodiment, the robustness of obtaining the target region where the target object is located can be improved, and the accuracy of obtaining the target region where the target object is located is improved, so that the accuracy of the extracted color features is improved, and the probability of errors of the color features is reduced.
As a further embodiment, to further improve the accuracy of the color features, in step 101: extracting the candidate region where the feature of the target object is located in the image to be processed may include:
constructing a pyramid characteristic image of the image to be processed;
constructing an integral channel characteristic image of each pyramid characteristic image;
carrying out window sliding on each integral channel characteristic image of each pyramid characteristic image to extract a candidate window;
extracting a plurality of rectangular candidate frames from the candidate window;
selecting a candidate region containing the characteristic part of the target object from the plurality of rectangular candidate frames by using a classifier; the classifier is pre-trained with a first training sample, wherein the first training sample comprises positive samples containing the characteristic part of the target object and negative samples not containing it.
Preferably, the pyramid feature image of the image to be processed may be constructed as follows: halving the length and width of the original image defines one step (octave); each step may contain m layers, where m may be, for example, 4 or 12; and the width and height of each layer are scaled by 2^(-1/m) relative to the previous layer (i.e., reduced by a factor of 2^(1/m)).
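A minimal sketch of the resulting scale schedule (the octave and layer counts are illustrative; resampling the actual image is omitted):

```python
def pyramid_sizes(width, height, n_octaves=3, m=4):
    """Per-layer (width, height): each layer shrinks by 2**(-1/m),
    so after m layers the size has halved (one octave)."""
    sizes = []
    for i in range(n_octaves * m):
        s = 2 ** (-i / m)
        sizes.append((max(1, round(width * s)), max(1, round(height * s))))
    return sizes

for size in pyramid_sizes(640, 480):
    print(size)   # (640, 480), (538, 404), ..., halving every m layers
```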
Preferably, the integral channel feature image may include: the three color channels, the gradient magnitude, and gradient histograms in six orientations.
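A rough numpy-only sketch of these ten channel maps (finite-difference gradients; building the integral images over each channel is omitted):

```python
import numpy as np

def integral_channel_features(rgb):
    """rgb: HxWx3 float array in [0, 1]. Returns 10 channel maps:
    3 color channels + gradient magnitude + 6 orientation-binned magnitudes."""
    gray = rgb.mean(axis=2)
    gy, gx = np.gradient(gray)                 # finite-difference gradients
    mag = np.hypot(gx, gy)                     # gradient magnitude channel
    ang = np.mod(np.arctan2(gy, gx), np.pi)    # orientation folded to [0, pi)
    channels = [rgb[..., c] for c in range(3)] + [mag]
    bins = np.minimum((ang / np.pi * 6).astype(int), 5)
    for b in range(6):                         # per-orientation gradient histograms
        channels.append(np.where(bins == b, mag, 0.0))
    return channels

feats = integral_channel_features(np.random.rand(64, 48, 3))
print(len(feats), feats[0].shape)  # 10 (64, 48)
```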
Preferably, the window sliding over the integral channel feature image may consist of defining a window and a sliding step size, and sliding the window from the top-left corner of the integral channel feature image to the bottom-right corner, in left-to-right, top-to-bottom order.
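A minimal generator for this scan order (window size and stride are illustrative values, not from the source):

```python
def slide_windows(height, width, win_h=36, win_w=29, stride=4):
    """Yield (top, left) positions of a win_h x win_w window scanned
    left-to-right, top-to-bottom with the given stride."""
    for top in range(0, height - win_h + 1, stride):
        for left in range(0, width - win_w + 1, stride):
            yield top, left

print(sum(1 for _ in slide_windows(64, 48)))  # number of candidate windows
```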
Preferably, extracting the plurality of rectangular candidate frames from the candidate window may consist of randomly sampling, within the window, a first predetermined number of rectangular frames of random size that satisfy certain limits; the first predetermined number is a value large enough to capture the characteristic part of at least one target object, for example 40,000. The limits may require the aspect ratio of a rectangular frame to stay within a certain range, for example between 1/8 and 8, which may not be exceeded. At the same time, maximum and/or minimum values of the width and/or height may also be imposed; for example, neither the width nor the height may exceed 36 pixels.
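A sketch of this constrained random sampling (the minimum side of 4 pixels and the seed are assumptions for illustration):

```python
import random

def sample_boxes(win_h, win_w, n=40_000, max_side=36, min_side=4, seed=0):
    """Sample n random (top, left, h, w) boxes inside a win_h x win_w window,
    with sides <= max_side and aspect ratio w/h within [1/8, 8]."""
    rng = random.Random(seed)
    boxes = []
    while len(boxes) < n:
        h = rng.randint(min_side, min(max_side, win_h))
        w = rng.randint(min_side, min(max_side, win_w))
        if not (1 / 8 <= w / h <= 8):
            continue                      # enforce the aspect-ratio limit
        top = rng.randint(0, win_h - h)
        left = rng.randint(0, win_w - w)
        boxes.append((top, left, h, w))
    return boxes

print(len(sample_boxes(128, 96, n=5)))  # 5
```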
Preferably, the classifier may be a soft-cascade classifier, which consists of a single strong classifier built from a plurality of weak classifiers. The classification process of an AdaBoost classifier can be regarded as a cascade in which each weak classifier is one layer; when a detected rectangular candidate frame passes through a layer, the score of that layer depends on all of the preceding layers. Whether the frame passes a given layer is therefore decided from the accumulated output scores of all layers up to and including it. The decision formula is as follows:
$$H_t(x) = \sum_{i=1}^{t} h_i(x)$$

where $t$ indexes the current layer, composed of the first $t$ weak classifiers, $H_t$ is the accumulated score at the current layer, $h_i$ is the (weighted) output of the $i$-th weak classifier, and $x$ is the input to the layer, i.e., the detected rectangular candidate frame. A threshold test may be used to decide whether the detected rectangular candidate frame passes the layer: a threshold $\theta$ is set, and if the layer's score is less than $\theta$, the candidate frame is judged not to pass the layer and hence not to contain the characteristic part of the target object.
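A minimal sketch of this early-rejection behavior (the weak classifiers and the threshold are toy values; trained soft cascades often use a per-layer threshold rather than a single θ):

```python
def soft_cascade_score(x, weak_classifiers, theta):
    """Accumulate H_t(x) layer by layer; reject as soon as the running
    score drops below theta. Returns (passed, final_score)."""
    h = 0.0
    for clf in weak_classifiers:
        h += clf(x)                 # h is H_t(x) after layer t
        if h < theta:
            return False, h         # early rejection: frame lacks the part
    return True, h

# Toy weak classifiers on a box (top, left, h, w): favor wide-ish boxes.
weak = [lambda b: 0.5 if b[3] >= b[2] else -0.5,
        lambda b: 0.3 if b[2] * b[3] > 100 else -0.3]
print(soft_cascade_score((0, 0, 10, 20), weak, theta=-0.1))  # (True, 0.8)
```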
Preferably, the classifier may be pre-trained with a first training sample comprising positive and negative samples. The first training sample may be obtained by manually labeling images containing the target object downloaded from the Internet, or taken, via coordinate transformation, from an open data set such as the INRIA (the French national institute for research in computer science and automation) pedestrian data set or the CVC (Computer Vision Center) virtual pedestrian data set.
A positive sample is an image containing the characteristic part of the target object, obtained by manual labeling or coordinate transformation. Manual labeling may consist of manually selecting, in images downloaded from the network, the region where the characteristic part of the target object is located; coordinate transformation may consist of obtaining that region from the data set via a coordinate transformation. The resulting images are taken as positive samples.
The positive samples may include training samples and test samples: a first number of images is selected as training samples and a second number as test samples. For example, if 10,000 images are obtained by manual labeling or coordinate transformation, the first number may be 8,000 and the second number 2,000.
The negative samples are images that do not contain the characteristic part of the target object; they may be a third number of images in which no target object appears and whose background is complex. For example, the third number may be 10,000.
To make the sample information more accurate, the images of the first training sample can be required to satisfy a certain scale ratio; for example, the aspect ratio may be limited, e.g., to 1:0.85. The image size of the first training sample may also be fixed, e.g., to 34 × 29 pixels.
The training of the classifier may include multiple rounds, for example five. The first round may select a fourth number of negative samples from the images that contain no target object and have complex backgrounds, the fourth number being less than or equal to the third number. After each round, the classifier obtained in that round is tested with the test samples; misclassified images are added to the negative samples as new examples, and the fourth number is increased accordingly. After several rounds, a classifier with a stronger classification effect is obtained.
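In outline, the round structure could look like the following toy sketch, where the "classifier" is just a score threshold and, as described above, misclassified test images are appended to the negative set; all names and the toy model are hypothetical:

```python
def train_classifier(pos, neg):
    """Toy stand-in for a round of training: the 'model' is just a
    decision threshold derived from the class balance."""
    return {"threshold": len(neg) / (len(pos) + len(neg))}

def classify(clf, sample_score):
    return 1 if sample_score > clf["threshold"] else 0

def train_with_hard_negatives(pos, neg_pool, test_set, rounds=5, n_neg=4):
    negatives = neg_pool[:n_neg]
    clf = None
    for _ in range(rounds):
        clf = train_classifier(pos, negatives)
        # as described: misclassified test images join the negative set
        mistakes = [s for s, label in test_set if classify(clf, s) != label]
        negatives = negatives + mistakes
    return clf

clf = train_with_hard_negatives([0.9, 0.8], [0.1, 0.2, 0.3, 0.4, 0.5],
                                [(0.6, 1), (0.2, 0)])
print(clf)
```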
In this embodiment, because the rectangular candidate frames contain only the characteristic part of the target object, each candidate frame occupies less memory; the less memory, the faster the computation and the fewer computational errors arise. In this way, a large amount of background information can be excluded quickly and accurately, which improves the accuracy of the extracted candidate regions of the target object and, in turn, the accuracy of the color features.
As yet another embodiment, to improve the accuracy of the obtained target region, the optimal candidate region may be extracted with a target detection algorithm, for example the DPM target detection algorithm. In that case, step 103: selecting an optimal candidate region of the target object from the candidate regions of the target object as the target region of the target object may include:
matching each candidate region with the object model, and selecting the best-matching candidate region as the optimal target region of the target object; the object model is pre-trained with a second training sample, which comprises positive samples containing the target object, test samples, and negative samples not containing the target object.
Preferably, the object model consists of a main model and a plurality of sub-models. The main model is an ideal template of the target object; each sub-model represents the ideal location, within the target object, of one of its robust characteristic parts. There may be, for example, six sub-models.
Therefore, as a further embodiment, the matching each candidate region with the object model and selecting the best matching one candidate region as the optimal target region of the target object may include:
dividing each candidate region of the target object to obtain a plurality of sub-regions;
Preferably, the sub-regions of a candidate region and the sub-models of the object model are equal in number and correspond one-to-one by position.
Calculating a first matching score of each candidate region of the target object with a main model of an object model;
Preferably, calculating the first matching score of each candidate region of the target object with the object model may consist of computing the matching score of each candidate region with the main model of the object model and taking it as the first matching score.
Calculating a second matching score of each sub-region in each candidate region of the target object and the corresponding sub-model in the object model;
Preferably, the second matching score accounts for deformation: for each sub-region of a candidate region, the further it deviates from the ideal position represented by the corresponding sub-model in the object model, the higher the deformation cost and the lower the score.
Adding the second matching scores corresponding to the multiple sub-areas of each candidate area of the target object to obtain a third matching score corresponding to each candidate area;
and adding the first matching score and the third matching score corresponding to each candidate area of the target object to obtain a fourth matching score.
The fourth matching score may be calculated with the following formula:

$$\mathrm{score}(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,\,l_0-\lambda}\big(2(x_0, y_0) + v_i\big) + b$$

where $R_{0,l_0}(x_0, y_0)$ denotes the first matching score, $D_{i,\,l_0-\lambda}\big(2(x_0, y_0) + v_i\big)$ the second matching score corresponding to the $i$-th sub-region, the summation the third matching score, and $\mathrm{score}(x_0, y_0, l_0)$ the fourth matching score. $(x_0, y_0)$ is the coordinate of the upper-left corner of the root filter on the root feature map (the root filter can be taken as the main model and the root feature map as the candidate region, over which the root filter is slid as a window); $2(x_0, y_0) + v_i$ is the coordinate of part $i$ (a sub-model) in the root filter: the part is located on a part feature map whose resolution is twice that of the root feature map (hence the factor 2), and $v_i$ is the offset of the part (e.g., from the upper-left corner to the head position, or from the upper-left corner to the arm position); $b$ is the root offset, set to align the various parts.
And selecting the candidate area with the highest fourth matching score as the optimal target area of the target object.
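A toy sketch of combining the four scores and taking the argmax (the dot-product matching scores and the squared-distance deformation penalty are simplifications in the spirit of DPM, not the source's exact form):

```python
import numpy as np

def fourth_score(candidate, main_model, sub_models, anchors):
    """candidate: dict with a 'root' feature vector and per-part
    (feature, position) entries. Returns first + third scores + b."""
    first = float(np.dot(candidate["root"], main_model["w"]))   # 1st: root match
    third = 0.0
    for i, sub in enumerate(sub_models):
        feat, pos = candidate["parts"][i]
        appearance = float(np.dot(feat, sub["w"]))
        dx, dy = pos[0] - anchors[i][0], pos[1] - anchors[i][1]
        # 2nd score: part match minus deformation cost, summed into the 3rd
        third += appearance - sub["d"] * (dx * dx + dy * dy)
    return first + third + main_model["b"]                      # 4th score

def best_region(candidates, main_model, sub_models, anchors):
    scores = [fourth_score(c, main_model, sub_models, anchors) for c in candidates]
    return candidates[int(np.argmax(scores))]                   # highest 4th score wins

rng = np.random.default_rng(0)
main = {"w": rng.normal(size=8), "b": 0.0}
subs = [{"w": rng.normal(size=4), "d": 0.1} for _ in range(2)]
anchors = [(0, 0), (5, 5)]                  # ideal part positions v_i (toy)
cands = [{"root": rng.normal(size=8),
          "parts": [(rng.normal(size=4), (0, 1)), (rng.normal(size=4), (5, 4))]}
         for _ in range(3)]
print(best_region(cands, main, subs, anchors)["root"][:2])
```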
As still another embodiment, in order to further improve the accuracy of the extracted color feature, the extracting the color feature of the target region as the color feature of the image to be processed may include:
extracting a three-channel color histogram of each sub-area in the target area;
calculating, by using a conditional random field algorithm and the three-channel color histogram of each sub-region in the target region, the posterior labeling color feature selected for each sub-region when the probability in the conditional random field is maximum;
and aggregating the posterior labeling color features of the sub-regions as the color features of the image to be processed.
Extracting a three-channel color histogram for each sub-region allows the color feature extraction result to be further refined, statistically, with the conditional random field algorithm.
Preferably, the three-channel color histogram of each sub-region in the target region may be used as the observed color feature and input into the probability formula of the conditional random field:

$$P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k\, t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l\, s_l(y_i, x, i) \right)$$

where $i$ denotes the $i$-th position of the observation sequence, $t_k$ is a transition feature function between label positions $i-1$ and $i$ of the observed sequence, $s_l$ is a state feature function at position $i$ of the observed sequence, $\lambda_k$ and $\mu_l$ are the corresponding weights (obtainable by maximum likelihood estimation), and $Z(x)$ is the normalization factor. $x$ is the prior observed color feature and $y$ the posterior labeling color feature; for each sub-region, the posterior labeling color feature at which the probability formula of the conditional random field attains its maximum probability value is selected.
The posterior labeling color feature of each sub-region may be used alone or in combination with the posterior labeling color feature of the target region, which is not limited herein.
In this embodiment of the application, the optimal target region where the target object appears in the image is acquired first, so the position of the target region is known accurately before color feature extraction is performed. This improves the specificity of the color feature extraction and thus further improves the accuracy of the extracted color features.
The technical solution of the present application will be described with reference to a specific application scenario, as shown in fig. 2, which is a flowchart of another embodiment of a color feature extraction method provided in the embodiment of the present application, in which a target object is taken as a human body as an example for description.
The method may comprise the steps of:
201: constructing a pyramid characteristic image of the image to be processed;
202: constructing an integral channel characteristic image of each pyramid characteristic image;
203: carrying out window sliding on each integral channel characteristic image of each pyramid characteristic image to extract a candidate window;
204: extracting a plurality of rectangular candidate frames from the candidate window;
205: selecting a candidate region containing the human head and shoulders from the plurality of rectangular candidate frames by using a head and shoulder classifier; the head and shoulder classifier is obtained by utilizing a third training sample through pre-training, wherein the third training sample comprises a positive sample of a human head and shoulder area and a negative sample which does not contain the human head and shoulder area.
Preferably, the head-shoulder classifier may adopt a soft-cascade classifier, pre-trained with a third training sample. The third training sample may be obtained by manually labeling images containing a human body downloaded from the Internet, or taken, via coordinate transformation, from an open pedestrian data set such as the INRIA (the French national institute for research in computer science and automation) pedestrian data set or the CVC (Computer Vision Center) virtual pedestrian data set.
The positive samples are images containing the human head-shoulder region, obtained by manual labeling or coordinate transformation. Manual labeling may consist of manually selecting, in images downloaded from the network, the region where the head and shoulders are located; coordinate transformation may consist of obtaining that region from the data set via a coordinate transformation. The resulting images are taken as positive samples.
The positive samples may include training samples and test samples: a fifth number of images is selected as training samples and a sixth number as test samples. For example, if 10,000 images are obtained by manual labeling or coordinate transformation, the fifth number may be 8,000 and the sixth number 2,000.
The negative samples are images that do not contain the human head-shoulder region; they may be a seventh number of images in which no human body appears and whose background is complex. For example, the seventh number may be 10,000.
To make the sample information more accurate, the images of the third training sample can be required to satisfy a certain scale ratio; for example, the aspect ratio may be limited, e.g., to 1:0.85. The image size of the third training sample may also be fixed, e.g., to 34 × 29 pixels.
The training of the head-shoulder classifier may include multiple rounds, for example five. The first round may select an eighth number of negative samples from the images that contain no human head-shoulder region, the eighth number being less than or equal to the seventh number. After each round, the head-shoulder classifier obtained in that round is tested with the test samples; misclassified images are added to the negative samples as new examples, and the eighth number is increased accordingly. After several rounds, a head-shoulder classifier with a stronger classification effect is obtained.
206: acquiring an enclosing region where the human body is located by utilizing the candidate region of each human head and shoulder and utilizing an enclosing box coordinate estimation algorithm, and determining the enclosing region as the candidate region of the human body;
207: selecting an optimal candidate region of the human body from the candidate regions of the human body as a target region of the human body;
an optimal candidate region of the human body may be selected from the candidate regions of the human body using an object detection algorithm. The target detection algorithm may be a DPM target detection algorithm, and selecting an optimal candidate region of the human body from the candidate regions of the human body may specifically include:
matching each candidate region with a human body model, and selecting the best-matching candidate region as the optimal target region of the human body; the human body model is pre-trained with a fourth training sample, which comprises positive samples containing a human body and negative samples not containing a human body; the human body model comprises a main model and a plurality of sub-models.
Preferably, the human body model is composed of a main model and a plurality of sub models. The main model is an ideal template of a human body. Each sub-model represents an ideal location of a robust feature of the human body (e.g., the head-shoulder region) in the human body. The plurality of sub-models may be 6 sub-models. Fig. 3 is a schematic diagram of a human body model and a human body submodel in the DPM algorithm.
The matching each candidate region with the human body model, and selecting a best matching candidate region as the optimal target region of the human body may include:
dividing each candidate region of the human body to obtain a plurality of human body sub-regions;
Preferably, the human body sub-regions of a candidate region and the sub-models of the human body model are equal in number and correspond one-to-one by position.
Calculating a first matching score of each candidate region of the human body and a main model of a human body model;
Preferably, calculating the first matching score of each candidate region of the human body with the human body model may consist of computing the matching score of each candidate region with the main model of the human body model and taking it as the first matching score.
Calculating a second matching score of each human body sub-region in each candidate region of the human body and the corresponding sub-model in the human body model;
Preferably, the second matching score accounts for deformation: for each human body sub-region of a candidate region, the further it deviates from the ideal position represented by the corresponding sub-model in the human body model, the higher the deformation cost and the lower the score.
Adding the second matching scores corresponding to the plurality of human body sub-regions of each candidate region of the human body to obtain a third matching score corresponding to each candidate region;
And adding the first matching score and the third matching score corresponding to each candidate region of the human body to obtain the fourth matching score.
The fourth matching score may be calculated with the following formula:

$$\mathrm{score}(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,\,l_0-\lambda}\big(2(x_0, y_0) + v_i\big) + b$$

where $R_{0,l_0}(x_0, y_0)$ denotes the first matching score, $D_{i,\,l_0-\lambda}\big(2(x_0, y_0) + v_i\big)$ the second matching score corresponding to the $i$-th human body sub-region, the summation the third matching score, and $\mathrm{score}(x_0, y_0, l_0)$ the fourth matching score. $(x_0, y_0)$ is the coordinate of the upper-left corner of the root filter on the root feature map (the root filter can be taken as the human body model and the root feature map as the candidate region, over which the root filter is slid as a window); $2(x_0, y_0) + v_i$ is the coordinate of part $i$ (a human body sub-model) in the root filter: the part is located on a part feature map whose resolution is twice that of the root feature map (hence the factor 2), and $v_i$ is the offset of the part (e.g., from the upper-left corner to the head position, or from the upper-left corner to the arm position); $b$ is the root offset, set to align the various parts.
And selecting the candidate region with the highest fourth matching score as the optimal target region of the human body.
208: and extracting a three-channel color histogram of the target area.
209: and calculating, by using a conditional random field algorithm and the color histogram of the target region, the posterior labeling color feature selected when the probability in the conditional random field is maximum, and taking it as the color feature of the image to be processed.
In this embodiment of the application, combined with a specific application scenario, the precise region where a person appears in the image is acquired, and the color features of the person's clothing are then extracted from that region. This improves the accuracy of the extracted clothing color features.
As shown in fig. 4, a schematic structural diagram of an embodiment of a color feature extraction system provided in the present application may include the following modules:
The first extraction module 401 is configured to extract a candidate region where a characteristic part of a target object in an image to be processed is located;
The first determination module 402 is configured to determine a candidate region of the target object by using each candidate region of the characteristic part;
preferably, the first determining module may include:
and the first determining unit is used for acquiring an enclosing region where the target object is located by utilizing the candidate region of each characteristic part and utilizing an enclosing box algorithm, and determining the enclosing region as the candidate region of the target object.
The first selection module 403 is configured to select an optimal candidate region of the target object from the candidate regions of the target object as the target region of the target object;
The second extraction module 404 is configured to extract the color features of the target region as the color features of the image to be processed.
Preferably, the second extraction module may include:
a third extraction unit, configured to extract a three-channel color histogram of the target region;
and the second selection unit is used for calculating, by using the conditional random field algorithm and the three-channel color histogram, the posterior labeling color feature selected when the probability in the conditional random field is maximum, as the color feature of the image to be processed.
Preferably, the second selection unit is specifically configured to input the three-channel color histogram of the target region, as the observed color feature, into the probability formula of the conditional random field:

$$P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k\, t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l\, s_l(y_i, x, i) \right)$$

where $i$ denotes the $i$-th position of the observation sequence, $t_k$ is a transition feature function between label positions $i-1$ and $i$ of the observed color feature, $s_l$ is a state feature function at position $i$ of the observed color feature, $\lambda_k$ and $\mu_l$ are the corresponding weights (obtainable by maximum likelihood estimation), and $Z(x)$ is the normalization factor. $x$ denotes the prior observed color feature and $y$ the posterior labeling color feature; the posterior labeling color feature at which the probability formula of the conditional random field attains its maximum probability value is the color feature of the image to be processed.
In the embodiment, the robustness of obtaining the target region where the target object is located can be improved, and the accuracy of obtaining the target region where the target object is located is improved, so that the accuracy of the extracted color features is improved, and the probability of errors of the color features is reduced.
As still another embodiment, in order to further improve the accuracy of the color feature, the first extraction module may include:
the first construction unit is used for constructing a pyramid characteristic image of the image to be processed;
the second construction unit is used for constructing an integral channel characteristic image of each pyramid characteristic image;
the first extraction unit is used for carrying out window sliding on each integral channel characteristic image of each pyramid characteristic image to extract a candidate window;
a second extraction unit configured to extract a plurality of rectangular candidate frames from the candidate window;
a first classification unit configured to select, by using a classifier, a candidate region containing the characteristic part of the target object from the plurality of rectangular candidate frames; the classifier is pre-trained with a first training sample, wherein the first training sample comprises positive samples containing the characteristic part of the target object and negative samples not containing it.
In this embodiment, since the plurality of rectangular candidate frames only include the feature portion of the target object, each candidate frame occupies a smaller memory, and the smaller the memory is, the faster the calculation speed is, and the less calculation error occurs in the calculation process, so that by using this way, a large amount of background information can be quickly and accurately excluded, the accuracy of the extracted target object candidate region is improved, and the accuracy of the color feature is further improved.
As still another embodiment, in order to improve the accuracy of the obtained target region, an optimal candidate region may be extracted using a target detection algorithm. The target detection algorithm may be a DPM target detection algorithm, and the first selection module may include:
the first selection unit is used for matching each candidate region with the object model and selecting the best-matching candidate region as the optimal target region of the target object; the object model is pre-trained with a second training sample, wherein the second training sample comprises positive samples containing the target object and negative samples not containing the target object; the object model comprises a main model and a plurality of sub-models.
Preferably, the first selection unit includes:
a first dividing unit, configured to divide each candidate region of the target object to obtain a plurality of sub-regions;
a first calculating subunit, configured to calculate a first matching score between each candidate region of the target object and a main model of an object model;
the second calculating subunit is used for calculating a second matching score of each sub-region in each candidate region of the target object and the corresponding sub-model in the object model;
the third calculation subunit is used for adding the second matching scores corresponding to the sub-regions of each candidate region of the target object to obtain a third matching score corresponding to each candidate region;
and the fourth calculating subunit is configured to add the first matching score and the third matching score corresponding to each candidate region of the target object to obtain a fourth matching score.
And the first selection subunit is used for selecting the candidate region with the highest fourth matching score as the optimal target region of the target object.
Preferably, the fourth matching score may be calculated with the following formula:

$$\mathrm{score}(x_0, y_0, l_0) = R_{0,l_0}(x_0, y_0) + \sum_{i=1}^{n} D_{i,\,l_0-\lambda}\big(2(x_0, y_0) + v_i\big) + b$$

where $R_{0,l_0}(x_0, y_0)$ denotes the first matching score, $D_{i,\,l_0-\lambda}\big(2(x_0, y_0) + v_i\big)$ the second matching score corresponding to the $i$-th sub-region, the summation the third matching score, and $\mathrm{score}(x_0, y_0, l_0)$ the fourth matching score. $(x_0, y_0)$ is the coordinate of the upper-left corner of the root filter on the root feature map (the root filter can be taken as the main model and the root feature map as the candidate region, over which the root filter is slid as a window); $2(x_0, y_0) + v_i$ is the coordinate of part $i$ (a sub-model) in the root filter: the part is located on a part feature map whose resolution is twice that of the root feature map (hence the factor 2), and $v_i$ is the offset of the part (e.g., from the upper-left corner to the head position, or from the upper-left corner to the arm position); $b$ is the root offset, set to align the various parts.
As still another embodiment, in order to further improve the accuracy of the extracted color features, the second extraction module may include:
the fourth extraction unit is used for extracting a three-channel color histogram of each sub-area in the target area;
the third selection unit is used for calculating, by using a conditional random field algorithm and the three-channel color histogram of each sub-region in the target region, the posterior labeling color feature selected for each sub-region when the probability in the conditional random field is maximum;
and the first statistics unit is used for aggregating the posterior labeling color features of the sub-regions as the color features of the image to be processed.
In this embodiment of the application, the optimal target region where the target object appears in the image is acquired first, so the position of the target region is known accurately before color feature extraction is performed. This improves the specificity of the color feature extraction and thus further improves the accuracy of the extracted color features.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
As used in the specification and in the claims, certain terms are used to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names. This specification and claims do not intend to distinguish between components that differ in name but not function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion, and thus should be interpreted to mean "include, but not limited to. "substantially" means within an acceptable error range, and a person skilled in the art can solve the technical problem within a certain error range to substantially achieve the technical effect. Furthermore, the term "coupled" is intended to encompass any direct or indirect electrical coupling. Thus, if a first device couples to a second device, that connection may be through a direct electrical coupling or through an indirect electrical coupling via other devices and couplings. The description which follows is a preferred embodiment of the present application, but is made for the purpose of illustrating the general principles of the application and not for the purpose of limiting the scope of the application. The protection scope of the present application shall be subject to the definitions of the appended claims.
It is also noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that an article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such an article or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the article or system that includes the element.
The foregoing shows and describes several preferred embodiments of the present application. It is to be understood, however, that the application is not limited to the forms disclosed herein; these are not to be construed as excluding other embodiments, and the application is capable of use in various other combinations, modifications, and environments, and of changes within the scope of the inventive concept described herein, commensurate with the above teachings or with the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the application shall fall within the protection of the claims appended hereto.

Claims (8)

1. A method of color feature extraction, the method comprising:
extracting a candidate region where a characteristic part of a target object in an image to be processed is located;
determining a candidate region of the target object by using each candidate region of the characteristic part;
selecting an optimal candidate region of the target object from the candidate regions of the target object as a target region of the target object;
extracting color features of the target area to serve as the color features of the image to be processed;
wherein,
the selecting an optimal candidate region of the target object from the candidate regions of the target object as the target region of the target object includes:
matching each candidate region with an object model, and selecting the best-matching candidate region as the optimal target region of the target object, wherein the object model is obtained by pre-training with a second training sample, the second training sample comprising a positive sample containing the target object and a negative sample not containing the target object, and the object model comprises a main model and a plurality of sub-models;
wherein the matching each candidate region with the object model and selecting the best-matching candidate region as the optimal target region of the target object includes:
dividing each candidate region of the target object into a plurality of sub-regions;
calculating a first matching score between each candidate region of the target object and the main model of the object model;
calculating a second matching score between each sub-region of each candidate region of the target object and the corresponding sub-model in the object model;
adding the second matching scores corresponding to the plurality of sub-regions of each candidate region of the target object to obtain a third matching score corresponding to that candidate region;
adding the first matching score and the third matching score corresponding to each candidate region of the target object to obtain a fourth matching score; and
selecting the candidate region with the highest fourth matching score as the optimal target region of the target object.
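
(Illustrative sketch, not part of the claims.) The scoring in claim 1 combines a whole-region match against a main model with summed sub-region matches against sub-models, in the spirit of part-based matching. The Python sketch below assumes, purely for illustration, that each region is a fixed-length feature vector and that each model component scores a region with a dot product; `match_score`, `split_region`, and `select_optimal_region` are hypothetical names, not the patent's implementation.

```python
import numpy as np

def match_score(feat, weights):
    # Hypothetical scorer: a model component's linear response to a feature vector.
    return float(np.dot(feat, weights))

def split_region(feat, parts):
    # Hypothetical divider: split a region's feature vector into equal chunks,
    # one chunk per sub-model, standing in for dividing the region into sub-regions.
    return np.array_split(feat, parts)

def select_optimal_region(candidates, main_model, sub_models):
    # First (main-model) score plus the third score (sum of per-sub-region
    # second scores) gives the fourth score; the highest fourth score wins.
    best_region, best_fourth = None, float("-inf")
    for region in candidates:
        first = match_score(region, main_model)
        seconds = [match_score(sub, model)
                   for sub, model in zip(split_region(region, len(sub_models)),
                                         sub_models)]
        fourth = first + sum(seconds)
        if fourth > best_fourth:
            best_region, best_fourth = region, fourth
    return best_region
```

A linear scorer is only one plausible choice; the claim fixes the way the four scores are combined, not the form of the individual matching scores.
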
2. The method of claim 1, wherein
the extracting of the candidate region where the characteristic part of the target object in the image to be processed is located includes:
constructing pyramid feature images of the image to be processed;
constructing an integral channel feature image for each pyramid feature image;
sliding a window over each integral channel feature image of each pyramid feature image to extract candidate windows;
extracting a plurality of rectangular candidate frames from the candidate windows; and
selecting, by using a classifier, a candidate region containing the characteristic part of the target object from the plurality of rectangular candidate frames, wherein the classifier is obtained by pre-training with a first training sample, the first training sample comprising a positive sample containing the characteristic part of the target object and a negative sample not containing the characteristic part of the target object.
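
(Illustrative sketch, not part of the claims.) Claim 2 chains an image pyramid, per-level integral channel feature images, and a sliding window whose features are passed to a pre-trained classifier. The sketch below works on a single grayscale channel, uses nearest-neighbour downsampling, collapses the rectangular-candidate-frame step into a single rectangle-sum feature per window, and takes the classifier as a caller-supplied `classify` predicate; all of these simplifications are assumptions for illustration.

```python
import numpy as np

def build_pyramid(image, levels=4, scale=0.5):
    # Repeatedly downsample (nearest neighbour, to stay dependency-free),
    # yielding feature images at several resolutions.
    pyramid, current = [image], image
    for _ in range(levels - 1):
        h, w = current.shape
        nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
        ys = np.minimum((np.arange(nh) / scale).astype(int), h - 1)
        xs = np.minimum((np.arange(nw) / scale).astype(int), w - 1)
        current = current[ys][:, xs]
        pyramid.append(current)
    return pyramid

def integral_image(channel):
    # Zero-padded cumulative sums: any rectangle sum then costs four lookups,
    # which is what makes integral channel features cheap to evaluate.
    ii = channel.astype(np.float64).cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def candidate_windows(ii, win=32, stride=16):
    # Slide a fixed-size window and yield its box plus its rectangle-sum feature.
    h, w = ii.shape[0] - 1, ii.shape[1] - 1
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            s = ii[y + win, x + win] - ii[y, x + win] - ii[y + win, x] + ii[y, x]
            yield (x, y, win, win), s

def detect_parts(image, classify):
    # classify: a pre-trained binary predicate over window features, a stand-in
    # for the claim's classifier trained on positive and negative samples.
    hits = []
    for level, img in enumerate(build_pyramid(image)):
        ii = integral_image(img)
        hits += [(level, box) for box, feat in candidate_windows(ii) if classify(feat)]
    return hits
```
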
3. The method of claim 1, wherein
the determining the candidate region of the target object by using each candidate region of the characteristic part comprises:
acquiring, by applying an enclosing box algorithm to the candidate regions of the characteristic parts, an enclosing region where the target object is located, and determining the enclosing region as the candidate region of the target object.
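
(Illustrative sketch, not part of the claims.) If each characteristic-part candidate region is an axis-aligned box `(x1, y1, x2, y2)`, the enclosing box step of claim 3 reduces to coordinate-wise minima and maxima; `enclosing_region` below is a hypothetical name for that computation.

```python
def enclosing_region(part_boxes):
    # Smallest axis-aligned box containing every characteristic-part box.
    x1 = min(box[0] for box in part_boxes)
    y1 = min(box[1] for box in part_boxes)
    x2 = max(box[2] for box in part_boxes)
    y2 = max(box[3] for box in part_boxes)
    return (x1, y1, x2, y2)
```

For example, `enclosing_region([(10, 10, 40, 40), (30, 5, 60, 35)])` returns `(10, 5, 60, 40)`.
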
4. The method of claim 1, wherein
the extracting the color features of the target region as the color features of the image to be processed includes:
extracting a three-channel color histogram of the target region; and
calculating, by using a conditional random field algorithm and the three-channel color histogram of the target region, the posterior labeled color feature selected when the probability of the conditional random field is maximum, the posterior labeled color feature serving as the color feature of the image to be processed.
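
(Illustrative sketch, not part of the claims.) The first half of claim 4 is a per-channel color histogram over the target region; the sketch below computes a normalized three-channel histogram. The conditional-random-field step that selects the maximum-posterior color labeling is deliberately omitted, since the claims do not specify the CRF's structure or potentials.

```python
import numpy as np

def three_channel_histogram(region, bins=16):
    # region: H x W x 3 array of 8-bit color values; one histogram per channel,
    # concatenated and normalized into a single descriptor.
    hists = [np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(np.float64)
    total = h.sum()
    return h / total if total > 0 else h
```
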
5. A color feature extraction system, the system comprising:
a first extraction module, configured to extract a candidate region where a characteristic part of a target object in an image to be processed is located;
a first determination module, configured to determine a candidate region of the target object by using each candidate region of the characteristic part;
a first selection module, configured to select an optimal candidate region of the target object from the candidate regions of the target object as a target region of the target object; and
a second extraction module, configured to extract color features of the target region as the color features of the image to be processed;
wherein,
the first selection module comprises:
a first selection unit, configured to match each candidate region with an object model and select the best-matching candidate region as the optimal target region of the target object, wherein the object model is obtained by pre-training with a second training sample, the second training sample comprising a positive sample containing the target object and a negative sample not containing the target object, and the object model comprises a main model and a plurality of sub-models;
the first selection unit includes:
a first dividing subunit, configured to divide each candidate region of the target object into a plurality of sub-regions;
a first calculating subunit, configured to calculate a first matching score between each candidate region of the target object and the main model of the object model;
a second calculating subunit, configured to calculate a second matching score between each sub-region of each candidate region of the target object and the corresponding sub-model in the object model;
a third calculating subunit, configured to add the second matching scores corresponding to the plurality of sub-regions of each candidate region of the target object to obtain a third matching score corresponding to that candidate region;
a fourth calculating subunit, configured to add the first matching score and the third matching score corresponding to each candidate region of the target object to obtain a fourth matching score; and
a first selection subunit, configured to select the candidate region with the highest fourth matching score as the optimal target region of the target object.
6. The system of claim 5, wherein
the first extraction module comprises:
a first construction unit, configured to construct pyramid feature images of the image to be processed;
a second construction unit, configured to construct an integral channel feature image for each pyramid feature image;
a first extraction unit, configured to slide a window over each integral channel feature image of each pyramid feature image to extract candidate windows;
a second extraction unit, configured to extract a plurality of rectangular candidate frames from the candidate windows; and
a first classification unit, configured to select, by using a classifier, a candidate region containing the characteristic part of the target object from the plurality of rectangular candidate frames, wherein the classifier is obtained by pre-training with a first training sample, the first training sample comprising a positive sample containing the characteristic part of the target object and a negative sample not containing the characteristic part of the target object.
7. The system of claim 5, wherein
the first determination module includes:
a first determination unit, configured to acquire, by applying an enclosing box algorithm to the candidate regions of the characteristic parts, an enclosing region where the target object is located, and to determine the enclosing region as the candidate region of the target object.
8. The system of claim 5, wherein
the second extraction module comprises:
a third extraction unit, configured to extract a three-channel color histogram of the target region; and
a second selection unit, configured to calculate, by using a conditional random field algorithm and the three-channel color histogram of the target region, the posterior labeled color feature selected when the probability of the conditional random field is maximum, the posterior labeled color feature serving as the color feature of the image to be processed.
CN201610499736.7A 2016-06-29 2016-06-29 Color feature extraction method and system Active CN107545268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610499736.7A CN107545268B (en) 2016-06-29 2016-06-29 Color feature extraction method and system

Publications (2)

Publication Number Publication Date
CN107545268A CN107545268A (en) 2018-01-05
CN107545268B (en) 2020-07-03

Family

ID=60965969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610499736.7A Active CN107545268B (en) 2016-06-29 2016-06-29 Color feature extraction method and system

Country Status (1)

Country Link
CN (1) CN107545268B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101325690A (en) * 2007-06-12 2008-12-17 上海正电科技发展有限公司 Method and system for detecting human flow analysis and crowd accumulation process of monitoring video flow
CN104850844A (en) * 2015-05-27 2015-08-19 成都新舟锐视科技有限公司 Pedestrian detection method based on rapid construction of image characteristic pyramid
US9158989B1 (en) * 2014-05-08 2015-10-13 Tandent Vision Science, Inc. Color pure scale-spaced pyramid arrangement for use in an image segregation

Also Published As

Publication number Publication date
CN107545268A (en) 2018-01-05

Similar Documents

Publication Publication Date Title
US9824294B2 (en) Saliency information acquisition device and saliency information acquisition method
CN108549870B (en) Method and device for identifying article display
CN108717531B (en) Human body posture estimation method based on Faster R-CNN
CN108875600A (en) A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO
CN105740780B (en) Method and device for detecting living human face
CN111860494B (en) Optimization method and device for image target detection, electronic equipment and storage medium
WO2015161776A1 (en) Hand motion identification method and apparatus
CN104835175B (en) Object detection method in a kind of nuclear environment of view-based access control model attention mechanism
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
US20170011523A1 (en) Image processing apparatus, image processing method, and storage medium
CN102214309B (en) Special human body recognition method based on head and shoulder model
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN105335725A (en) Gait identification identity authentication method based on feature fusion
CN109919002B (en) Yellow stop line identification method and device, computer equipment and storage medium
CN110334703B (en) Ship detection and identification method in day and night image
CN107766864B (en) Method and device for extracting features and method and device for object recognition
CN104463240B (en) A kind of instrument localization method and device
CN104376575A (en) Pedestrian counting method and device based on monitoring of multiple cameras
CN111723814A (en) Cross-image association based weak supervision image semantic segmentation method, system and device
CN110599463A (en) Tongue image detection and positioning algorithm based on lightweight cascade neural network
CN116883588A (en) Method and system for quickly reconstructing three-dimensional point cloud under large scene
CN108734200A (en) Human body target visible detection method and device based on BING features
CN110930384A (en) Crowd counting method, device, equipment and medium based on density information
CN108447092B (en) Method and device for visually positioning marker
CN115620022A (en) Object detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20180105
Assignee: Apple R&D (Beijing) Co., Ltd.
Assignor: BEIJING MOSHANGHUA TECHNOLOGY CO., LTD.
Contract record no.: 2019990000054
Denomination of invention: Color feature extraction method and system
License type: Exclusive License
Record date: 20190211

GR01 Patent grant