WO2013075295A1 - Clothing identification method and system for low-resolution video - Google Patents

Clothing identification method and system for low-resolution video

Info

Publication number
WO2013075295A1
WO2013075295A1 (application PCT/CN2011/082705)
Authority
WO
WIPO (PCT)
Prior art keywords
clothing
human body
target
foreground image
frame
Prior art date
Application number
PCT/CN2011/082705
Other languages
French (fr)
Chinese (zh)
Inventor
李响
李俐
张超
陈晓娟
Original Assignee
浙江晨鹰科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江晨鹰科技有限公司 filed Critical 浙江晨鹰科技有限公司
Priority to PCT/CN2011/082705 priority Critical patent/WO2013075295A1/en
Publication of WO2013075295A1 publication Critical patent/WO2013075295A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features

Definitions

  • the present invention relates to the field of image information processing technologies, and more particularly to a garment recognition method and system for low resolution video.
  • identity recognition based on face recognition mainly adopts a multi-level detection system, roughly divided into four levels: face detection, uniform-area detection, accessory detection, and collar-insignia recognition, filtering out a large amount of irrelevant data at each level to improve detection accuracy and efficiency.
  • the specific identification process is shown in Figure 1.
  • the face data obtained after face detection serves the next level, uniform-area detection; the uniform-area mask obtained after uniform-area detection serves the next level, accessory and collar-insignia detection.
  • the present invention provides a clothing recognition method and system for low-resolution video, to overcome the problem that prior-art methods based on face recognition cannot realize clothing and identity recognition of persons in low-resolution video.
  • the present invention provides the following technical solutions:
  • a clothing recognition method for low resolution video comprising:
  • determining a current time series in the received video stream, extracting a foreground image from the video stream, determining a human body target from the foreground image, and extracting contour information of the human body target;
  • decomposing the contour information of the human body target, and extracting a clothing feature value corresponding to each block in the contour information of the human body target according to preset clothing categories; comparing the extracted clothing feature values with preset clothing feature thresholds, and identifying the clothing category of each block in the current frame;
  • fusing the clothing categories of the blocks, and performing a voting decision according to pre-stored clothing categories, to determine the clothing category of the human body target in the current time series;
  • returning to the step of determining a current time series in the video stream, acquiring and fusing the clothing categories of the same human body target in frames of different time series in the video stream, and performing a voting decision according to the pre-stored clothing categories, to determine the clothing category of the moving target.
  • a garment recognition system for low resolution video comprising:
  • Extracting means configured to determine a current time sequence in the received video stream, and extract a foreground image of the video stream time series, determine a human body target from the foreground image, and extract contour information of the human body target;
  • Decomposing means configured to decompose the contour information of the human body target, and extract a clothing feature value corresponding to each block in the contour information of the human body target according to the preset clothing category;
  • a comparison identifying device configured to compare the obtained clothing feature values of each segment with a preset clothing feature threshold, and identify a clothing category of each segment in the current frame
  • a merging device configured to fuse a clothing category of each of the blocks in the same time sequence or different time series in the video stream
  • a determining device, configured to perform a voting decision according to the pre-stored clothing categories, to determine the clothing category of the human body target in the current time series, and, after the clothing categories of the same human body target in frames of different time series are fused, to determine the clothing category of that human body target.
  • the present invention discloses a clothing recognition method and system for low-resolution video, based on a spatio-temporal classifier fusion technique. First, the foreground image in the acquired video stream is extracted, together with the contour information of the moving human body; then, the moving human body target is identified according to the extracted contour information; multi-point feature recognition is performed separately on different blocks of the same human body target within the same frame of the video stream, and a voting decision is made on the recognition results; finally, a voting decision is made according to the recognition results for the same human body target across multiple frames of the video stream, and the clothing category of that human body target is finally determined.
  • by performing moving human body target recognition according to a background model, the algorithm is pre-filtered, so that objects in the video background whose color is similar to the target can be excluded and interference reduced.
  • by judging the clothing features of the same human body target across multiple video frames, the clothing category of the moving target is finally determined, thereby achieving high-efficiency, high-quality, and high-accuracy identity and clothing recognition.
  • FIG. 1 is a flow chart of a method for identifying a face recognition based on the prior art
  • FIG. 2 is a flowchart of a method for recognizing a low resolution video according to an embodiment of the present invention
  • FIG. 3 is a flowchart of extracting a foreground image according to an embodiment of the present invention;
  • FIG. 4a-4c are diagrams showing an effect of the process of identifying a human body object according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of performing various feature information extraction according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of performing multi-feature weak classifier fusion according to an embodiment of the present invention
  • FIG. 7 is an effect diagram of finalizing garment recognition processing in low-resolution video according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a clothing recognition system for low resolution video disclosed in an embodiment of the present invention.
  • the technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the accompanying drawings. It is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
  • Embodiment 1 discloses a clothing recognition method for low-resolution video. Based on the spatio-temporal classifier fusion technology, various uniforms, general clothing, camouflage clothes, and the like can be identified and classified, and the final identity information can be recognized efficiently, reliably, and accurately. The specific process is described in detail by the following examples.
  • FIG. 2 is a flow chart of a clothing recognition method for low-resolution video according to the present invention, which mainly includes the following steps:
  • Step S101: Extract a foreground image from the received video stream.
  • the specific process of step S101 is as follows:
  • Step S1011 The video stream is read into a computer or a related device that can be analyzed, and the obtained video stream is decomposed, and a plurality of single-frame video sequences are obtained according to a time series.
  • Step S1012 Acquire a foreground image corresponding to the plurality of single-frame video sequences.
  • the process of acquiring the foreground image corresponding to a single-frame video sequence in step S1012 is: first, performing background modeling on the video according to the content of the video sequence; secondly, determining the current single-frame video sequence and the current background frame; then, determining the foreground image corresponding to the current single-frame video sequence from the difference between the current single-frame video sequence and the background frame; and finally, updating the background frame in real time according to the current single-frame video sequence, to ensure the accuracy of the background frame for the next frame.
  • the current background frame is determined by background modeling implemented with a single-Gaussian or mixed-Gaussian method; the frame-difference principle is then adopted to obtain the corresponding foreground image from the difference between the current single-frame video sequence and the background frame.
  • the above background modeling of video can adopt single Gaussian, mixed Gaussian, Kernel-based, Eigen-Background and the like.
  • the background modeling is performed by using a mixed Gaussian method, that is, the background frame is obtained, and the mixed Gaussian model is defined as:
  • P(X_t) = Σ_{i=1..K} ω_{i,t} · η(X_t, μ_{i,t}, Σ_{i,t}), that is, each pixel is modeled by a probability mixture of K Gaussian distributions, to facilitate background modeling in the video stream.
  • the foreground image of the moving object is extracted by the above process; that is, the foreground image is obtained by subtracting the background image from the video image, so that the frame difference improves the foreground extraction effect and a more accurate foreground image of the moving target, i.e. the contour foreground image, is obtained.
  • the effect after performing the above process is shown in Figs. 4a-4c, wherein Fig. 4a is a video image; Fig. 4b is a background image (background frame); and Fig. 4c is a foreground image corresponding to the current moving target.
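  The background-subtraction step above can be sketched as follows. This is a minimal illustration only: it uses a running-average background (a single-Gaussian stand-in for the mixed-Gaussian model in the patent), and the learning rate and difference threshold are made-up values, not taken from the disclosure.

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Running-average background update (a simple stand-in for the
    mixed-Gaussian background model described in the text)."""
    return (1.0 - alpha) * background + alpha * frame

def extract_foreground(frame, background, threshold=30.0):
    """Frame-difference foreground mask: pixels that differ from the
    background by more than `threshold` are marked as foreground."""
    diff = np.abs(frame.astype(np.float64) - background)
    return diff > threshold

# Toy example: a static background with one bright moving "object".
background = np.zeros((8, 8))
frame = background.copy()
frame[2:5, 2:5] = 200.0          # the moving target

mask = extract_foreground(frame, background)
print(mask.sum())                 # 9 foreground pixels
```

  In the full pipeline the mask would be denoised (the "noise and cavity removal" mentioned for the removing device 16) before contour extraction.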
  • Step S102 Determine a current time series in the video stream, determine a moving target from the foreground image, and extract contour information of the moving human target.
  • the human body target generally refers to the moving human body, that is, the human body appearing in the foreground image in the current time series.
  • the contour information acquired during the execution of step S102 is determined based on the contour width and the contour height of the human body target.
  • the process of specifically extracting the contour information of the human body target is: first, extracting the features of the moving object from the foreground image, analyzing the ratio of the width and height of the moving object, and identifying the moving human body; and then analyzing and acquiring the contour information of the human body.
  • a more specific description is: Extracting the contour features of the moving object based on the plane geometry knowledge.
  • the distance between the leftmost point and the rightmost point of the contour of each moving object is taken as the width of the object; the distance between the uppermost point and the lowermost point is taken as the height of the moving object.
  • calculate the aspect ratio of each moving object and compare it with the threshold of the conventional human shoulder-width to height ratio, excluding other moving objects such as vehicles, thereby determining the moving target, i.e. the moving human body, and extracting the contour information of the moving human body from it.
  • comparison against the shoulder-width to height ratio threshold can effectively overcome the influence of objects such as trees and buildings on the recognition result, and reduces the interference of non-human moving objects when recognizing the moving human body.
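  The aspect-ratio screening described above can be illustrated as below; the ratio bounds are hypothetical stand-ins for the patent's shoulder-width to height threshold, not values from the disclosure.

```python
def is_moving_human(contour_points, min_ratio=0.2, max_ratio=0.6):
    """Decide whether a contour belongs to a human by its width/height
    ratio. The bounds stand in for the shoulder-width-to-height
    threshold mentioned in the text; they are illustrative values."""
    xs = [p[0] for p in contour_points]
    ys = [p[1] for p in contour_points]
    width = max(xs) - min(xs)    # leftmost to rightmost point
    height = max(ys) - min(ys)   # uppermost to lowermost point
    if height == 0:
        return False
    ratio = width / height
    return min_ratio <= ratio <= max_ratio

# An upright person-like contour (taller than wide) passes...
person = [(0, 0), (40, 0), (40, 170), (0, 170)]
# ...while a car-like contour (wider than tall) is rejected.
car = [(0, 0), (400, 0), (400, 150), (0, 150)]
print(is_moving_human(person), is_moving_human(car))  # True False
```

  This is the cheap geometric filter that lets the later, more expensive clothing classification run only on likely human contours.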
  • Step S103: Decompose the contour information of the human body target, and extract a clothing feature value corresponding to each block in the contour information according to the preset clothing categories. Step S104: Compare the extracted clothing feature values with the preset clothing feature thresholds, and identify the clothing category of each block in the current frame.
  • Step S105: The clothing categories of the blocks are fused, and a voting decision is performed according to the pre-stored clothing categories, to determine the clothing category of the human body target in the current time series.
  • Step S1031 Decompose the contour information of the human body, and divide the human body according to the biological characteristics of the human body.
  • Step S1032 Perform eigenvalue training, and perform calculation of the corresponding clothing feature value according to the preset clothing category.
  • Step S1033 Extract a clothing feature value corresponding to each block in the contour information of the human body.
  • identifying the clothing category of each block in the current frame corresponds to step S104; identifying and determining the clothing category of the human body target in the current time series, and the fusion of the plural characteristic information of each block, correspond to step S105.
  • the process can also be described as follows: first, one human body target is determined from the targets obtained in step S102; the contour information of the determined target is decomposed, that is, the individual body is divided, for example into arms, tops, pants, and the like, according to human body characteristics (step S1031); then the clothing feature values of each block are calculated and extracted; the clothing categories of the different blocks are identified, and the clothing categories of the respective blocks are fused, the fusion being implemented based on the spatio-temporal classifier fusion technique; finally, a voting decision is made by majority decision, and the clothing category of the human body target in the current time series is determined.
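  Step S1031's decomposition of the body contour into blocks might look like the following sketch; the head/torso/legs proportions are illustrative assumptions, since the patent does not specify them.

```python
def decompose_body(x, y, w, h):
    """Split a human bounding box (x, y, w, h) into head / torso / legs
    sub-blocks using fixed proportions -- illustrative stand-ins for the
    'biological characteristics of the human body' in the text."""
    head_h = int(0.15 * h)
    torso_h = int(0.40 * h)
    return {
        "head":  (x, y, w, head_h),
        "torso": (x, y + head_h, w, torso_h),                          # tops
        "legs":  (x, y + head_h + torso_h, w, h - head_h - torso_h),   # pants
    }

blocks = decompose_body(10, 20, 40, 200)
print(blocks["torso"])  # (10, 50, 40, 80)
```

  Each returned sub-block is then classified independently, so that one noisy region does not dominate the whole-body decision.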
  • the above process can be summarized as: for the clothing categories of the same individual human body target within the same frame of the video stream, the spatial correlation of the image is used to perform a voting judgment on the multiple recognition results, and the multiple recognition results for the same human body are combined by majority decision to obtain the clothing recognition result.
  • since no single feature is stable on its own, a plurality of weak classifiers is fused to form a strong classifier that is stable for a single block image, to implement clothing recognition for that block image.
  • the process of specifically performing multi-feature weak classifier fusion can be seen in FIG. 6.
  • the color and texture features are selected as the classification features for the calculation of the feature values; the calculation process corresponds to the feature-value calculation portion in FIG. 6.
  • training of the color and texture features for the different clothing categories can be performed offline: several hundred samples are selected, and the clothing color and texture features are calculated separately; that is, feature values such as color and gray-level symmetry are calculated for each identified block (human body region).
  • for the color feature calculation, each area of the segmented sample is converted from the RGB color space into the HSV color space; the HSV color system is closer to human visual perception.
  • the specific conversion method is as follows, where R, G, and B are the color values in the RGB color space, H, S, and V are the values in the HSV color space, and MAX = max(R, G, B), MIN = min(R, G, B):
  • V = MAX; S = (MAX - MIN) / MAX (with S = 0 when MAX = 0);
  • H = 60° × (G - B) / (MAX - MIN) when MAX = R; H = 60° × (2 + (B - R) / (MAX - MIN)) when MAX = G; H = 60° × (4 + (R - G) / (MAX - MIN)) when MAX = B (H is increased by 360° if the result is negative).
  • the three channels of H, S, and V are separated for the converted picture.
  • from the statistics of these channel values over the training samples, the value range of the preset clothing color feature threshold can be determined.
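  The RGB-to-HSV conversion used for the color feature can be written as a small function following the standard formulas above (inputs assumed normalized to [0, 1]; H returned in degrees):

```python
def rgb_to_hsv(r, g, b):
    """Standard RGB -> HSV conversion for r, g, b in [0, 1];
    returns H in degrees and S, V in [0, 1]."""
    v = max(r, g, b)
    mn = min(r, g, b)
    d = v - mn
    s = 0.0 if v == 0 else d / v
    if d == 0:
        h = 0.0                                # achromatic (gray)
    elif v == r:
        h = 60.0 * (((g - b) / d) % 6)         # between yellow and magenta
    elif v == g:
        h = 60.0 * ((b - r) / d + 2)           # between cyan and yellow
    else:
        h = 60.0 * ((r - g) / d + 4)           # between magenta and cyan
    return h, s, v

print(rgb_to_hsv(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0) -- pure red
```

  The three returned channels are then histogrammed per block to produce the color feature values compared against the preset thresholds.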
  • the gray level co-occurrence matrix is used for texture feature calculation.
  • the gray level co-occurrence matrix has a total of 15 eigenvalues.
  • four of these are selected for calculation: the angular second moment, the contrast, the correlation, and the entropy.
  • the angular second moment, also called energy, can be expressed by the formula: ASM = Σ_i Σ_j P(i, j)²   (5)
  • the angular second moment is a measure of the uniformity of the gray scale of the image texture, reflecting the uniformity of the image gray distribution and the coarseness of the texture.
  • the contrast CON is used to measure how the values of the matrix are distributed and how much of the image changes locally, reflecting the sharpness of the image and the depth of the texture.
  • the correlation CORRLN is used to measure the similarity of the spatial gray-level co-occurrence matrix elements in the row or column direction; the magnitude of the correlation value therefore reflects the local gray-level correlation in the image. It can be expressed as: CORRLN = (Σ_i Σ_j (i · j) · P(i, j) - μ_x · μ_y) / (σ_x · σ_y), where μ_x, μ_y and σ_x, σ_y are the means and standard deviations of the row and column marginal distributions of P.
  • the angular second moment, contrast, correlation, and entropy of the different clothing categories are calculated on the sample RGB images converted into grayscale images, and the preset clothing texture feature threshold range is finally determined.
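  A compact illustration of the gray-level co-occurrence matrix and the four selected texture features (angular second moment, contrast, entropy, correlation) follows; it uses a single horizontal pixel offset, which is an assumption, as the patent does not fix the offset:

```python
import numpy as np

def glcm(image, levels):
    """Gray-level co-occurrence matrix for horizontally adjacent
    pixels, normalized to a probability distribution P(i, j)."""
    m = np.zeros((levels, levels))
    for row in image:
        for a, b in zip(row[:-1], row[1:]):
            m[a, b] += 1
    return m / m.sum()

def glcm_features(p):
    """Angular second moment (energy), contrast, entropy, and
    correlation, as selected in the text."""
    i, j = np.indices(p.shape)
    asm = (p ** 2).sum()                       # ASM = sum_ij P(i,j)^2
    contrast = ((i - j) ** 2 * p).sum()
    nz = p[p > 0]
    entropy = -(nz * np.log2(nz)).sum()
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    corr = ((i * j * p).sum() - mu_i * mu_j) / (sd_i * sd_j)
    return asm, contrast, entropy, corr

img = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
p = glcm(img, 4)
asm, contrast, entropy, corr = glcm_features(p)
print(round(contrast, 3))  # 0.333
```

  In practice libraries such as scikit-image provide these computations; the explicit version is shown only to make the formulas concrete.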
  • the color and texture features of the human body region in the video image of the recognized human body are calculated block by block according to the above method; for each of the selected color and texture features, a weak classifier is constructed, and the value of each weak classifier for each image block is determined by the threshold range; since each classifier has the same feature weight, for N features each feature weight is 1/N.
  • the weak classifiers are cascaded and fused into a strong classifier output that is stable for a single block image; that is, the clothing category of each sub-block is obtained by fusing the clothing-category weak classifiers corresponding to that sub-block, and the result of the fusion is output through the strong classifier.
  • the clothing identification is then performed using the majority decision, and the final recognition result is obtained.
  • by comprehensively considering multiple features such as color and texture, the embodiment of the invention forms a strong classifier from multiple weak classifiers and, after majority decision, uses the recognition results of different blocks of the same human body to optimize the clothing recognition result, ensuring a recognition effect of good quality and high reliability.
  • the preset clothing texture feature threshold and the preset clothing color feature threshold are both part of the preset clothing feature threshold. That is to say, in the process of comparing, the obtained clothing feature values of each block of the same type are compared with the feature thresholds of the same type in the preset clothing feature threshold.
  • Step S106: Return to step S102 to realize tracking of the human body target, that is, recognition of the clothing category of the human body target in the next or an adjacent time series.
  • Step S107: Acquire the clothing categories of the same human body target in frames of different time series in the video stream, and perform a voting decision according to the pre-stored clothing categories to determine the clothing category of the moving target.
  • Step S106 is performed by using a simple single-target tracking method: according to the human body morphological characteristics (aspect ratio) and the position correlation of the target, the same moving human body target is tracked in the next or adjacent time series of the video sequence. Steps S102 to S105 are then repeated to obtain the clothing category of the same moving target in the current time series. Finally, step S107 is performed: a vote is taken over the clothing-category recognition results of the plural adjacent video sequences, and the majority result is used to smooth out erroneous recognition results, thereby completing the garment recognition processing in the low-resolution video. See FIG. 7 for the specific recognition effect.
  • the embodiment of the present invention summarizes the recognition results of adjacent frames by using the time-series correlation, and votes on the multi-frame clothing recognition results of the same human body based on the tracking result.
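  The majority vote over per-frame labels in step S107 reduces to a few lines; the label strings are illustrative:

```python
from collections import Counter

def vote_clothing_category(per_frame_labels):
    """Majority (large-number) decision over the clothing labels
    recognized for the same tracked person in adjacent frames;
    isolated misrecognitions are smoothed out."""
    counts = Counter(per_frame_labels)
    label, _ = counts.most_common(1)[0]
    return label

# One frame was misrecognized as "plain clothes"; the vote fixes it.
labels = ["uniform", "uniform", "plain clothes", "uniform", "uniform"]
print(vote_clothing_category(labels))  # uniform
```

  The same voting rule is applied twice in the pipeline: spatially over blocks within one frame, and temporally over frames of the tracked target.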
  • the embodiments disclosed in the present invention can improve the real-time performance of clothing recognition by reducing the number of detection targets through motion detection, performing feature training offline, and using an equal-weight calculation method.
  • the above-disclosed embodiments of the present invention describe in detail a clothing recognition method for low-resolution video. The method of the present invention can be implemented by systems of various forms; the present invention therefore also discloses a clothing recognition system for low-resolution video, which is described in detail below with reference to specific embodiments.
  • a garment recognition system for low-resolution video disclosed in the embodiment of the present invention mainly includes: an extracting device 11, a decomposing device 12, a comparison identifying device 13, a fusing device 14, and a determining device 15.
  • the extracting device 11 is configured to determine a current time sequence in the received video stream, and extract a foreground image in the video stream, determine a moving human target from the foreground image, and extract contour information of the human target.
  • the decomposing device 12 is configured to decompose the contour information of the human body target, and extract the clothing feature value corresponding to each segment in the contour information of the human body target according to the preset clothing category.
  • the comparison identifying means 13 is configured to compare the obtained clothing feature values of the respective blocks with the preset clothing feature thresholds, and identify the clothing categories of the respective blocks in the current frame.
  • the fusing device 14 is configured to fuse the clothing categories of the respective blocks in frames of the same time series or of different time series in the video stream.
  • the determining device 15 is configured to perform a voting decision according to the pre-stored clothing categories, to determine the clothing category of the human body target in the current time series, and, after the clothing categories of the same human body target in frames of different time series are fused, to determine the clothing category of that human body target.
  • in another embodiment, the system further includes:
  • the removing device 16 is configured to perform noise and cavity removal operations on the acquired foreground image.
  • the above-disclosed system of the present invention corresponds to the method disclosed in the above-mentioned first embodiment, and the principle or the process of execution of each part can be referred to the above-disclosed method and its related parts.
  • the method and system disclosed by the present invention are based on a spatio-temporal classifier fusion technique; on the basis of motion detection, human body recognition, and clothing recognition, the clothing features of the same human body target are judged across multiple frames of the video stream, and the clothing category and identity of the moving target are finally determined, thereby achieving high-efficiency, high-quality, and high-accuracy identity and clothing recognition.

Abstract

Disclosed are a clothing identification method and system for low-resolution video. The method is based on a spatio-temporal classifier fusion technique. It concerns particularly: extracting a foreground image from a video stream, and extracting contour information about a moving object in the foreground image; identifying a moving human body target according to the extracted contour information; performing multi-point feature identification on different blocks of the same human body target in the same frame of the video stream and voting to decide the identification result; and performing a voting decision over the decision results for the same human body target across a plurality of frames of the video stream to finally determine the clothing type of the moving human body target. By way of the disclosed method, the clothing features of a plurality of video frames of the same moving target are decided on the basis of motion detection, human body identification and clothing identification, and the clothing type and identity of the human body target are finally determined, thus realizing identity and clothing identification with high efficiency, high quality, and high accuracy.

Description

Garment Recognition Method and System for Low-Resolution Video

TECHNICAL FIELD The present invention relates to the field of image information processing technologies, and more particularly to a garment recognition method and system for low-resolution video.

BACKGROUND OF THE INVENTION With the continuous advancement of technology, traditional biometric technology has had difficulty meeting the security-protection requirements of security-sensitive sites (such as military compounds and armed police compounds under military jurisdiction). Therefore, a demand has arisen for intelligent visual monitoring systems that can identify persons automatically and in real time; in recent years, non-contact long-distance person identification technology has received extensive attention from researchers and has developed accordingly.

At present, related research mainly concerns long-distance person detection, classification, and identification within multi-mode, large-scale visual monitoring technology, specifically including two approaches: "gait-based identification" and "identification based on face recognition".

Gait-based long-distance person identification requires a rich gait knowledge base; it therefore cannot satisfy the requirement of non-contact long-distance person identification at sites with large numbers of people distinguished by uniforms.

Identification based on face recognition mainly adopts a multi-level detection system, roughly divided into four levels: face detection, uniform-area detection, accessory detection, and collar-insignia recognition, filtering out a large amount of irrelevant data at each level to improve detection accuracy and efficiency. The specific identification process is shown in FIG. 1: the face data obtained after face detection serves the next level, uniform-area detection; the uniform-area mask obtained after uniform-area detection serves the next level, accessory and collar-insignia detection.

In the prior art, the clothing recognition performed during the above face-recognition-based identification requires face recognition first, followed by uniform recognition according to the accessory and collar-insignia features. However, face and uniform recognition demand high-resolution images with good background conditions; for low-resolution images, recognition accuracy drops and the missed-detection rate is high. Moreover, the method is suited to clothing recognition on still images rather than video, and thus cannot meet or adapt to clothing recognition in video, let alone low-resolution video. SUMMARY OF THE INVENTION In view of this, the present invention provides a clothing recognition method and system for low-resolution video, to overcome the problem that prior-art face-recognition-based methods cannot realize clothing and identity recognition of persons in low-resolution video.
为实现上述目的, 本发明提供如下技术方案:  To achieve the above object, the present invention provides the following technical solutions:
一种低分辨率视频的服装识别方法, 包括:  A clothing recognition method for low resolution video, comprising:
确定接收到的视频流中的当前时间序列,提取所述视频流中的前景图像,从所 述前景图像中确定人体目标, 并提取所述人体目标的轮廓信息; Determining a current time series in the received video stream, extracting a foreground image in the video stream, determining a human body target from the foreground image, and extracting contour information of the human body target;
分解所述人体目标的轮廓信息,依据预设服装类别提取所述人体目标的轮 廓信息中各分块对应的服装特征值; 帧中各分块的服装类别;  Decomposing the contour information of the human body target, extracting a clothing feature value corresponding to each block in the contour information of the human body target according to the preset clothing category; and a clothing category of each of the blocks in the frame;
融合所述各分块的服装类别, 并依据预存储的服装类别进行投票判决,确 定当前时间序列中所述人体目标的服装类别;  Converging the clothing categories of the respective segments, and performing a voting decision according to the pre-stored clothing categories to determine a clothing category of the human target in the current time series;
返回执行确定所述视频流中的当前时间序列这一步骤,获取所述视频流中 不同时间序列中各帧中同一人体目标的服装类别进行融合,并依据预存储的服 装类别进行投票判决, 确定所述运动目标的服装类别。  Returning to the step of determining a current time sequence in the video stream, acquiring a clothing category of the same human target in each frame in different time series in the video stream, and performing a voting decision according to the pre-stored clothing category, determining The clothing category of the sports target.
A clothing recognition system for low-resolution video, comprising:

an extracting device, configured to determine the current time sequence in a received video stream, extract a foreground image of the video-stream time sequence, determine a human body target from the foreground image, and extract contour information of the human body target;

a decomposing device, configured to decompose the contour information of the human body target and to extract, according to preset clothing categories, the clothing feature value corresponding to each block of the contour information;

a comparison and identification device, configured to compare the extracted clothing feature value of each block with a preset clothing feature threshold and to identify the clothing category of each block in the current frame;

a fusion device, configured to fuse the clothing categories of the blocks across the frames of the same or of different time sequences in the video stream;

a decision device, configured to perform a voting decision against pre-stored clothing categories, determining the clothing category of the human body target in the current time sequence and, once the clothing categories of the same human body target in the frames of different time sequences have been fused, the final clothing category of that target.
As the above technical solutions show, in contrast to the prior art the present invention discloses a clothing recognition method and system for low-resolution video built on spatio-temporal classifier fusion. First, the foreground image is extracted from the received video stream and the contour information of the moving human body is obtained; next, the moving human body target is identified from that contour information; within a single video frame, multi-point feature recognition is applied separately to the different blocks of the same human body target and a voting decision is taken over the per-block results; finally, a voting decision over the per-frame results for the same human body target across the video stream determines that target's clothing category. Pre-processing the algorithm with a background model for moving-target detection excludes background objects whose color resembles the target and so reduces interference. By jointly considering several clothing features and combining motion detection, human body recognition, and clothing recognition, the decisions over multiple video frames of the same target yield its final clothing category, achieving efficient, high-quality, and highly accurate identity and clothing recognition.
BRIEF DESCRIPTION OF THE DRAWINGS

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for the embodiments or for the prior-art description are briefly introduced below. Evidently, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a prior-art identity recognition method based on face recognition;

FIG. 2 is a flowchart of a clothing recognition method for low-resolution video according to an embodiment of the present invention;

FIG. 3 is a flowchart of foreground-image extraction according to an embodiment of the present invention;

FIG. 4a to FIG. 4c are effect diagrams of the human-body-target identification process according to an embodiment of the present invention;

FIG. 5 is a flowchart of multi-feature information extraction according to an embodiment of the present invention;

FIG. 6 is a flowchart of multi-feature weak-classifier fusion according to an embodiment of the present invention;

FIG. 7 is an effect diagram of completed clothing recognition in low-resolution video according to an embodiment of the present invention;

FIG. 8 is a block diagram of a clothing recognition system for low-resolution video according to an embodiment of the present invention.

DETAILED DESCRIPTION

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The following embodiments of the present invention disclose a clothing recognition method and system for low-resolution video. Built on spatio-temporal classifier fusion, they can identify and classify uniforms of various kinds, ordinary clothing, camouflage, and the like, and recognize the final identity information of a person efficiently, reliably, and accurately. The specific process is detailed in the embodiments below.

Embodiment 1

Referring to FIG. 2, the flowchart of the clothing recognition method for low-resolution video disclosed by the present invention mainly includes the following steps:
Step S101: extract the foreground image from the received video stream.

Referring to FIG. 3, step S101 proceeds as follows:

Step S1011: read the video stream into a computer or other device capable of analysis, decompose the received stream, and obtain multiple single-frame video sequences in time order.

Step S1012: obtain the foreground image corresponding to each single-frame video sequence. For one single-frame sequence this runs as follows: first, build a background model of the video from the content of the video sequence; second, determine the current single-frame sequence and the current background frame; third, determine the foreground image of the current single-frame sequence from the difference between the current frame and the background frame; finally, to keep the background frame accurate for the next frame, update the background frame from the current single-frame sequence, this update being performed in real time.

It should be noted that the current background frame referred to above is determined by background modeling implemented with a single-Gaussian or mixture-of-Gaussians method, and that the corresponding foreground image is then obtained, on the frame-difference principle, from the difference between the current single-frame sequence and the background frame.
Background modeling of the video may use single-Gaussian, mixture-of-Gaussians, kernel-based, eigen-background, or other methods. The embodiment disclosed here uses a mixture of Gaussians for background modeling, i.e., to obtain the background frame. The Gaussian mixture model is defined as

P(x_N) = Σ_{j=1}^{K} w_j η(x_N; μ_j, Σ_j)   (1)

where w_j is the weight of the j-th Gaussian kernel, K is the number of Gaussian kernels, and η(x_N; μ_j, Σ_j) is the j-th Gaussian distribution with mean μ_j and covariance Σ_j. Equation (1) states that at time N the value observed at each pixel has a probability P(x_N) described by a mixture of K Gaussians, which makes the model suitable for background modeling in a video stream.
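Equation (1) can be made concrete with a small sketch. The patent does not give an implementation, so the following assumes a single gray-level value per pixel (each covariance Σ_j collapses to a scalar variance) and illustrative weights and means; it only shows how the mixture density separates background-like values from foreground-like ones.

```python
import math

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density eta(x; mu, var) used as one mixture component."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_probability(x, weights, means, variances):
    """Equation (1): P(x_N) = sum_j w_j * eta(x_N; mu_j, Sigma_j),
    here for a scalar pixel value."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# A pixel whose background model has two modes (e.g. flickering illumination);
# the numbers are illustrative, not taken from the patent:
weights = [0.7, 0.3]          # w_j, summing to 1
means = [100.0, 180.0]        # mu_j
variances = [25.0, 25.0]      # Sigma_j (scalar here)

p_background = mixture_probability(101.0, weights, means, variances)
p_foreground = mixture_probability(20.0, weights, means, variances)
# A value near a background mode is far more probable than a foreground value,
# which is the basis for classifying the pixel as background or foreground.
```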
With the above procedure the foreground image of a moving object is extracted by subtracting the background image from the video image; this frame-difference approach improves the quality of foreground extraction and yields a more accurate foreground image of the moving target, i.e., a contour foreground image.

In addition, to suppress the noise and holes that arise during foreground extraction, a filtering step removes the noise and mathematical morphology removes the holes, giving a cleaner foreground image of the moving target. The results of this processing are shown in FIG. 4a to FIG. 4c, where FIG. 4a is the video image, FIG. 4b the background image (background frame), and FIG. 4c the foreground image of the current moving target.
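The frame differencing and clean-up just described can be sketched as follows. The patent names filtering and mathematical morphology without specifying operators, so this sketch substitutes a 3x3 majority filter as a crude stand-in for both the denoising and the hole filling; the grid sizes and threshold are illustrative.

```python
def foreground_mask(frame, background, thresh=25):
    """Frame-difference foreground extraction: |frame - background| > thresh."""
    h, w = len(frame), len(frame[0])
    return [[1 if abs(frame[y][x] - background[y][x]) > thresh else 0
             for x in range(w)] for y in range(h)]

def majority_filter3(mask):
    """3x3 majority filter: removes isolated noise pixels and fills
    single-pixel holes (a stand-in for the morphological clean-up)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            votes = [mask[y + dy][x + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = 1 if sum(votes) * 2 > len(votes) else 0
    return out

background = [[100] * 5 for _ in range(5)]
frame = [row[:] for row in background]
for y in range(1, 4):          # a 3x3 moving object...
    for x in range(1, 4):
        frame[y][x] = 200
frame[2][2] = 100              # ...with a one-pixel hole inside it
frame[0][4] = 160              # plus one isolated noise pixel

clean = majority_filter3(foreground_mask(frame, background))
# The hole at (2,2) is filled and the noise pixel at (0,4) is removed.
```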
Step S102: determine the current time sequence in the video stream, determine the moving target from the foreground image, and extract the contour information of the moving human body target. A human body target here generally means a moving human body, i.e., a body appearing in the foreground image of the current time sequence.

The contour information obtained in step S102 is determined from the contour width and contour height of the human body target. The extraction runs as follows: first, extract the features of the moving objects from the foreground image and analyze the ratio of each object's width to its height to identify the moving human bodies; then derive the contour information of each human body.

More concretely, the contour features of a moving object are obtained with plane geometry: the distance between the leftmost and rightmost points of the object's contour is taken as its width, and the distance between the topmost and bottommost points as its height. The aspect ratio of each moving object is computed and compared with threshold ratios of shoulder width to height for a typical human body; other moving objects such as vehicles are thereby excluded, the moving target (the moving human body) is determined, and its contour information is extracted.

The foreground-extraction method of steps S101 and S102, the way moving targets are identified in the foreground image, and the pre-processing applied to them (the thresholds on the typical shoulder-width-to-height ratio) together effectively overcome the influence of trees, buildings, and other objects on the recognition result and reduce the interference of non-human moving objects with the recognition of moving human bodies.
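The geometric filtering above can be sketched directly. The patent does not quantify its shoulder-width-to-height thresholds, so the bounds used here are illustrative assumptions; only the bounding-box construction (leftmost-to-rightmost, topmost-to-bottommost) follows the text.

```python
def bounding_box(contour):
    """Width and height of a contour given as (x, y) points: the
    leftmost-to-rightmost and topmost-to-bottommost distances."""
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    return max(xs) - min(xs), max(ys) - min(ys)

def is_human(contour, min_ratio=2.0, max_ratio=5.0):
    """Keep objects whose height/width ratio falls in a plausible human
    range; min_ratio and max_ratio are assumed values, not the patent's."""
    width, height = bounding_box(contour)
    if width == 0:
        return False
    return min_ratio <= height / width <= max_ratio

pedestrian = [(10, 0), (30, 0), (10, 60), (30, 60)]   # 20 wide, 60 tall
vehicle = [(0, 0), (90, 0), (0, 40), (90, 40)]        # 90 wide, 40 tall
```

A pedestrian-shaped blob (ratio 3.0) passes the filter while a vehicle-shaped one (ratio about 0.44) is excluded, which is exactly the pre-processing step used to discard non-human moving objects.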
Step S103: decompose the contour information of the human body target and extract, according to the preset clothing categories, the clothing feature value corresponding to each block of the contour information. Step S104: compare the extracted clothing feature value of each block with the preset clothing feature threshold to identify the clothing category of each block in the current frame.

Step S105: fuse the clothing categories of the blocks and perform a voting decision against the pre-stored clothing categories to determine the clothing category of the human body target in the current time sequence.

The extraction, fusion, and identification of the several kinds of feature information of the blocks proceed as follows. The extraction of the multiple feature kinds belongs to step S103; referring to FIG. 5, it mainly includes the following steps:

Step S1031: decompose the contour information of the human body and divide the body into blocks according to human biometric features.
Step S1032: perform feature-value training, computing the corresponding clothing feature values for the preset clothing categories.

Step S1033: extract the clothing feature value corresponding to each block of the human body contour information.

Identifying the clothing category of each block in the current frame belongs to step S104; identifying and determining the clothing category of the human body target in the current time sequence belongs to step S105, as does the fusion of the several kinds of feature information of the blocks. The process may also run concretely as follows: first, one of the human body targets obtained in step S102 is selected. Its contour information is decomposed, i.e., the same body is divided into blocks such as arms, upper garment, and trousers according to body features (step S1031); next, the clothing feature values of each block are computed and extracted; then the clothing categories of the different blocks are identified and fused, the fusion being realized with spatio-temporal classifier fusion; finally a majority-rule vote determines the clothing category of the human body target in the current time sequence.

The above can be summarized as follows: for the clothing categories of the different blocks of the same human body target within the same frame of the video stream, the spatial correlation of the image is exploited to vote over the per-block recognition results, and a majority-rule decision fuses the multiple results for the same body into a single clothing recognition result.
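The per-frame majority-rule fusion can be sketched in a few lines. The patent does not say how ties are resolved, so the tie-breaking by first occurrence below is an assumption; the block labels are illustrative.

```python
from collections import Counter

def majority_vote(labels):
    """Majority-rule decision over per-block clothing labels; ties are
    broken by first occurrence (an assumption -- the patent is silent)."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:             # first label reaching the top count wins
        if counts[label] == best:
            return label

# Per-block results for one person in one frame (e.g. arms / torso / legs):
block_labels = ["camouflage", "camouflage", "plain"]
frame_label = majority_vote(block_labels)
```

One mislabelled block ("plain") is outvoted by the other two, so the frame-level decision for this person is "camouflage".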
Moreover, for low-resolution video no single feature is stable on its own. Therefore, to guarantee clothing recognition for moving targets in low-resolution video, the present invention fuses and cascades multiple weak classifiers into a strong classifier that is stable on a single block image, so that clothing recognition can be carried out per block. The multi-feature weak-classifier fusion process is shown in FIG. 6.
As the clothing feature values of the blocks, this embodiment selects color and texture features as the classification features; their computation corresponds to the feature-value calculation part of FIG. 5.

First, the color and texture features of the different clothing categories are trained, and this training can be done offline. Several hundred samples are selected and their clothing color and texture features are computed, i.e., feature values such as color and the gray-level co-occurrence matrix are calculated for each identified block (human body region).
For the color features, the RGB color space of each segmented sample region is converted to the HSV color space, which is closer to human visual perception. With max = max(R, G, B) and min = min(R, G, B), the conversion is as follows (equation (2) is reproduced here as the standard hue formula, the original rendering being illegible):

H = 60° × (G − B)/(max − min) if max = R (plus 360° if negative); H = 60° × (B − R)/(max − min) + 120° if max = G; H = 60° × (R − G)/(max − min) + 240° if max = B.   (2)

S = (max(R, G, B) − min(R, G, B)) / max(R, G, B)   (3)

V = max(R, G, B) / 255   (4)

where R, G, and B are the color values of the RGB color space and H, S, and V are the color values of the HSV color space.

Using equations (2) to (4), the converted image is split into the three channels H, S, and V. Furthermore, by collecting statistics of the H, S, and V values of the different clothing categories, the value range of the preset clothing color feature threshold can be determined.
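Equations (2) to (4) translate directly into code. As noted above, the hue branch follows the standard formula because equation (2) is not legible in the source; S and V follow equations (3) and (4) as printed.

```python
def rgb_to_hsv(r, g, b):
    """RGB (0-255) -> HSV: S and V per equations (3)-(4) of the text
    (V scaled to 0-1 by the division by 255); H per the standard hue
    formula, since equation (2) is illegible in the source."""
    mx, mn = max(r, g, b), min(r, g, b)
    d = mx - mn
    if d == 0:
        h = 0.0                              # achromatic: hue undefined
    elif mx == r:
        h = (60.0 * (g - b) / d) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / d + 120.0
    else:
        h = 60.0 * (r - g) / d + 240.0
    s = 0.0 if mx == 0 else d / mx           # equation (3)
    v = mx / 255.0                           # equation (4)
    return h, s, v

h, s, v = rgb_to_hsv(0, 128, 0)   # a dark green, e.g. camouflage cloth
# h == 120.0 (green), s == 1.0, v ~= 0.50
```

Collecting (h, s, v) statistics over many sample blocks of one clothing category is then what fixes the color-feature threshold range mentioned above.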
For the texture features, a gray-level co-occurrence matrix is computed for the different clothing samples. The gray-level co-occurrence matrix offers fifteen feature values in all; the embodiment disclosed here selects the four with the best statistical behavior: angular second moment, contrast, correlation, and entropy.

The angular second moment, also called energy, is expressed as

ASM = Σ_i Σ_j P(i, j)²   (5)

It is a measure of the uniformity of gray-level variation in the image texture, reflecting how evenly the gray levels are distributed and how coarse the texture is.

The contrast is

CON = Σ_i Σ_j (i − j)² P(i, j)   (6)

It measures how the matrix values are distributed and how much local variation there is in the image, reflecting the sharpness of the image and the depth of the texture grooves.

The correlation

CORRLN = [Σ_i Σ_j (i · j · P(i, j)) − μ_x μ_y] / (σ_x σ_y)   (7)

measures how similar the elements of the spatial gray-level co-occurrence matrix are along the row or column direction, so its magnitude reflects the local gray-level correlation in the image.

The entropy

ENT = −Σ_i Σ_j P(i, j) log P(i, j)   (8)

measures the randomness of the image texture: it reaches its maximum when all values of the co-occurrence matrix are equal and is small when the values are very uneven.

Using equations (5) to (8), the angular second moment, contrast, correlation, and entropy statistics of the different clothing categories are computed on the sample RGB images converted to gray scale, and the value range of the preset clothing texture feature threshold is finally determined.
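Equations (5) to (8) can be sketched on a toy patch. The patent does not fix the co-occurrence offset or the number of gray levels, so the horizontal offset (dx = 1) and the 4-level quantization below are illustrative assumptions.

```python
import math

def glcm(img, dx=1, dy=0, levels=4):
    """Normalized gray-level co-occurrence matrix P(i, j) for one offset."""
    p = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    n = 0
    for y in range(h):
        for x in range(w):
            if 0 <= y + dy < h and 0 <= x + dx < w:
                p[img[y][x]][img[y + dy][x + dx]] += 1
                n += 1
    return [[v / n for v in row] for row in p]

def texture_features(p):
    """Equations (5)-(8): ASM, contrast, correlation, entropy of P(i, j)."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    asm = sum(p[i][j] ** 2 for i, j in cells)
    con = sum((i - j) ** 2 * p[i][j] for i, j in cells)
    mu_x = sum(i * p[i][j] for i, j in cells)
    mu_y = sum(j * p[i][j] for i, j in cells)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * p[i][j] for i, j in cells))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * p[i][j] for i, j in cells))
    num = sum(i * j * p[i][j] for i, j in cells) - mu_x * mu_y
    corr = num / (sd_x * sd_y) if sd_x * sd_y > 0 else 0.0
    ent = -sum(p[i][j] * math.log(p[i][j]) for i, j in cells if p[i][j] > 0)
    return asm, con, corr, ent

smooth = [[0, 0, 0, 0]] * 4     # uniform patch
striped = [[0, 3, 0, 3]] * 4    # strongly textured patch
f_smooth = texture_features(glcm(smooth))
f_striped = texture_features(glcm(striped))
# The uniform patch has maximal ASM and zero contrast/entropy; the striped
# patch shows high contrast, so thresholds can separate the two textures.
```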
After the color and texture features of the different clothing categories have been trained, the color and texture features of the human body regions identified in the video images are computed block by block with the above methods. The selected color and texture features form N distinct features in all; for the j-th feature a weak classifier h_j is constructed, and the value of each weak classifier for each block image of clothing is determined by the trained threshold range. Since the classifier features are weighted equally, each of the N features carries the weight 1/N. Finally the weak classifiers are fused and cascaded into a strong classifier that is stable on a single block image: the clothing categories of the blocks are fused by the per-block weak clothing classifiers, and the fused result is output through the strong classifier.

Finally, from the recognition results of the different blocks belonging to the same human body target, i.e., the same body, a majority-rule decision performs the clothing recognition and yields the final recognition result.

By jointly considering several features such as color and texture, forming a strong classifier from multiple weak classifiers, and refining the result by majority rule over the recognition results of the different blocks of the same body, the embodiment of the present invention ensures a recognition result of good quality and high reliability.
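The equal-weight fusion of the weak classifiers can be sketched as follows. The patent fixes the 1/N weights but not the decision cut-off, so the 0.5 threshold, the particular feature set, and the trained ranges below are all assumed for illustration.

```python
def weak_classifier(value, lo, hi):
    """h_j = 1 when a feature value falls inside the trained threshold range."""
    return 1 if lo <= value <= hi else 0

def strong_classify(features, thresholds):
    """Equal-weight fusion: each of the N weak classifiers contributes 1/N;
    declare a match when the weighted sum exceeds 0.5 (assumed cut-off)."""
    n = len(features)
    score = sum(weak_classifier(v, lo, hi)
                for v, (lo, hi) in zip(features, thresholds)) / n
    return score > 0.5

# Hypothetical trained ranges for one clothing category, over the four
# features (H, S, ASM, contrast) discussed above:
ranges = [(90.0, 150.0), (0.3, 1.0), (0.2, 0.8), (2.0, 12.0)]
block = [120.0, 0.9, 0.55, 9.0]       # block matching all four ranges
other = [20.0, 0.1, 0.95, 0.5]        # block matching none of the ranges
```

Because every feature carries the same 1/N weight, no single unstable feature can decide the outcome on its own, which is the point of the weak-to-strong cascade for low-resolution imagery.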
It should be noted that the preset clothing texture feature threshold and the preset clothing color feature threshold are both parts of the preset clothing feature threshold. That is, during comparison, each extracted clothing feature value of a block is compared with the feature threshold of the same type within the preset clothing feature threshold.
Step S106: return to step S102 to track the human body target, i.e., carry out the recognition of the clothing category of the human body target in the next or an adjacent time sequence.

Step S107: fuse the clothing categories of the same human body target across the frames of the different time sequences of the video stream, and perform a voting decision against the pre-stored clothing categories to determine the clothing category of the moving target.

Step S106 uses the simplest moving-target tracking method: in the next or an adjacent time sequence of the video, the same moving target is tracked from the morphological characteristics of the body (its aspect ratio) and its positional correlation. Steps S102 to S105 are then repeated to obtain the clothing category of the same moving target in the current time sequence. Finally, in step S107, a vote is taken over the clothing-category recognition results of several adjacent video sequences, and the majority-rule result smooths away erroneous recognitions, completing the clothing recognition in the low-resolution video. The recognition result is shown in FIG. 7.

In steps S106 and S107 the embodiment of the present invention thus exploits time-sequence correlation to aggregate the recognition results of adjacent frames: guided by the tracking result, a voting decision over the multi-frame clothing recognitions of the same body smooths away erroneous results, and the spatio-temporal fusion finally achieves fast, high-precision clothing recognition.

In addition, the disclosed embodiment reduces the number of detection targets through motion detection, performs the feature training offline, and uses a simplified weight-setting scheme, measures that improve the real-time performance of the clothing recognition of the present invention.

The above embodiment describes a clothing recognition method for low-resolution video in detail. The method of the present invention may be implemented by systems of many forms, so the present invention also discloses a clothing recognition system for low-resolution video, detailed in the specific embodiment below.
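The temporal smoothing of step S107 is again a majority vote, this time over the per-frame decisions of one tracked target. The track below is illustrative; it shows a single misclassified frame being outvoted.

```python
from collections import Counter

def temporal_decision(per_frame_labels):
    """Vote over the per-frame clothing decisions for one tracked target;
    the majority label smooths out occasional misclassified frames."""
    return Counter(per_frame_labels).most_common(1)[0][0]

# One tracked person over 7 frames; frame 4 was misclassified:
track = ["uniform", "uniform", "uniform", "plain",
         "uniform", "uniform", "uniform"]
final = temporal_decision(track)
```

The single "plain" frame is discarded by the vote and the target's final clothing category is "uniform", which is the smoothing effect described above.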
Referring to FIG. 8, the clothing recognition system for low-resolution video disclosed by the embodiment of the present invention mainly comprises an extracting device 11, a decomposing device 12, a comparison and identification device 13, a fusion device 14, and a decision device 15.

The extracting device 11 is configured to determine the current time sequence in the received video stream, extract the foreground image from the video stream, determine the moving human body target from the foreground image, and extract the contour information of the human body target.

The decomposing device 12 is configured to decompose the contour information of the human body target and extract, according to the preset clothing categories, the clothing feature value corresponding to each block of the contour information.

The comparison and identification device 13 is configured to compare the extracted clothing feature value of each block with the preset clothing feature threshold and identify the clothing category of each block in the current frame.

The fusion device 14 is configured to fuse the clothing categories of the blocks across the frames of the same or of different time sequences in the video stream.

The decision device 15 is configured to perform the voting decision against the pre-stored clothing categories, determining the clothing category of the human body target in the current time sequence and, after the clothing categories of the same human body target in the frames of different time sequences have been fused, the final clothing category of that target.

On the basis of the clothing recognition system for low-resolution video disclosed by the above embodiment, the system further comprises:

a removing device 16, configured to remove noise and holes from the extracted foreground image.

The system disclosed above corresponds to the method disclosed in Embodiment 1; for the principles and processes carried out by each of its parts, reference may be made to the corresponding parts of the method disclosed above. In summary:
本发明所公开的方法和系统基于时空分类器融合技术,基于运动检测、人 体识别和服装识别,对同一人体目标多个视频流中的服装特征进行判决, 最终 确定该运动目标的服装类别和身份, 从而实现高效率、 高质量, 高准确度的身 份及服装识别目的。  The method and system disclosed by the present invention are based on a spatiotemporal classifier fusion technique, based on motion detection, human body recognition and clothing recognition, determine clothing characteristics in multiple video streams of the same human target, and finally determine the clothing category and identity of the moving target. In order to achieve high efficiency, high quality, high accuracy identity and clothing recognition purposes.
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是 与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于 实施例公开的装置而言, 由于其与实施例公开的方法相对应, 所以描述的比较 筒单, 相关之处参见方法部分说明即可。  The various embodiments in the present specification are described in a progressive manner, and each embodiment focuses on differences from other embodiments, and the same similar parts between the various embodiments may be referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the comparison is described, and the relevant part can be referred to the method part.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random-access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A clothing recognition method for low-resolution video, characterized by comprising:
extracting a foreground image from a received video stream;
determining a current time series in the video stream, determining a moving target from the foreground image, recognizing a human body target, and extracting contour information of the human body target;
decomposing the contour information of the human body target, and extracting, according to preset clothing categories, a clothing feature value corresponding to each block of the contour information of the human body target; comparing the obtained clothing feature value of each block with a preset clothing feature threshold to identify the clothing category of each block in the current frame;
fusing the clothing categories of the blocks, and performing a voting decision according to pre-stored clothing categories to determine the clothing category of the human body target in the current time series;
returning to the step of determining the current time series in the video stream, acquiring and fusing the clothing categories of the same human body target in the frames of different time series of the video stream, and performing a voting decision according to the pre-stored clothing categories to determine the clothing category of the moving target.
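The two-level voting decision in claim 1 — first over the blocks of one frame, then over the per-frame decisions of different time series — can be sketched as a simple majority vote. The category names, the pre-stored category set, and the tie-breaking behaviour below are illustrative assumptions, not details given in the claim:

```python
from collections import Counter

def vote_clothing_category(labels, prestored=("uniform", "casual", "suit")):
    """Majority-vote decision over per-block (or per-frame) clothing labels.

    Only labels that appear in the pre-stored clothing categories take part
    in the vote; ties break by first-seen order.
    """
    votes = Counter(lbl for lbl in labels if lbl in prestored)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Per-block labels within one frame of the current time series:
frame_decision = vote_clothing_category(["uniform", "uniform", "casual"])
# Per-frame decisions across different time series for the same target:
final_decision = vote_clothing_category([frame_decision, "uniform", "casual", "uniform"])
```

The same function serves both voting levels, which mirrors how the claim reuses the "voting decision according to pre-stored clothing categories" step.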
2. The method according to claim 1, characterized in that, after the foreground image is received and before the moving target is determined from the foreground image, the method further comprises:
performing noise and hole removal on the acquired foreground image.
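The noise and hole removal of claim 2 is commonly done with binary morphology: an opening to remove isolated noise pixels, then a closing to fill small holes. The sketch below implements both with a 3x3 cross structuring element in plain NumPy; the element size and the open-then-close order are assumptions, since the patent does not specify them:

```python
import numpy as np

def dilate(mask):
    """Binary dilation with a 3x3 cross structuring element (shift-based)."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def erode(mask):
    """Binary erosion, as the complement of dilating the complement."""
    return ~dilate(~mask)

def clean_foreground(mask):
    opened = dilate(erode(mask))   # opening removes speckle noise
    closed = erode(dilate(opened)) # closing fills small holes
    return closed

mask = np.zeros((8, 8), dtype=bool)
mask[1:7, 1:7] = True   # a solid foreground region
mask[3, 3] = False      # a small hole inside it
mask[0, 7] = True       # an isolated noise pixel
cleaned = clean_foreground(mask)
```

After cleaning, the noise pixel is gone and the hole is filled, leaving a mask suitable for the contour extraction of the following claims.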
3. The method according to claim 1 or 2, characterized in that extracting the foreground image from the received video stream specifically comprises:
decomposing the acquired video stream to obtain a plurality of single-frame video sequences in time order;
acquiring the foreground images corresponding to the plurality of single-frame video sequences.
4. The method according to claim 3, characterized in that acquiring the foreground images corresponding to the plurality of single-frame video sequences specifically comprises:
performing background modeling of the video according to the content of the preceding several frames of the video sequence;
determining a current single-frame video sequence and a current background frame;
determining the foreground image corresponding to the current single-frame video sequence according to the difference between the current single-frame video sequence and the background frame, and updating the background frame in real time according to the current frame of the video sequence.
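One minimal reading of claim 4 — a background modelled from the first few frames, foreground taken from the per-pixel difference against the background, and the background updated in real time — might look as follows. The running-average update rule and the values of `alpha` and `thresh` are illustrative choices, not taken from the patent:

```python
import numpy as np

def foreground_masks(frames, model_frames=5, alpha=0.05, thresh=30.0):
    """Yield one foreground mask per frame by background differencing.

    The background is initialised from the mean of the first `model_frames`
    frames (claim 4's "content of the preceding frames") and then updated
    in real time with a running average; a pixel is foreground when its
    absolute difference from the background exceeds `thresh`.
    """
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    background = np.mean(frames[:model_frames], axis=0)
    for frame in frames:
        mask = np.abs(frame - background) > thresh
        background = (1 - alpha) * background + alpha * frame  # real-time update
        yield mask

frames = [np.zeros((4, 4)) for _ in range(5)]
moving = np.zeros((4, 4))
moving[1:3, 1:3] = 100.0   # a bright object enters in the last frame
frames.append(moving)
masks = list(foreground_masks(frames))
```

The running-average update lets slow illumination changes be absorbed into the background while fast-moving objects stay in the foreground mask.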
5. The method according to claim 1 or 2, characterized in that the specific process of determining the human body target from the foreground image comprises:
extracting features of a moving object from the foreground image, and analyzing them to obtain contour information of the moving object;
calculating the aspect ratio of the moving object, setting a threshold according to typical human shoulder-width and height proportions, and recognizing the human body target.
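Claim 5's aspect-ratio test can be sketched as follows. The patent does not disclose its shoulder-width/height threshold, so the `min_ratio`/`max_ratio` bounds below are stand-in values:

```python
def is_human_target(contour_points, min_ratio=2.0, max_ratio=5.0):
    """Decide whether a moving object is a human target from the
    height/width ratio of its bounding box, per claim 5.
    """
    xs = [x for x, _ in contour_points]
    ys = [y for _, y in contour_points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    if width == 0:
        return False
    return min_ratio <= height / width <= max_ratio

# A roughly 1:3 upright silhouette passes; a wide, low shape does not.
upright = is_human_target([(0, 0), (10, 0), (10, 30), (0, 30)])
flat = is_human_target([(0, 0), (40, 0), (40, 10), (0, 10)])
```

A cheap test like this is well suited to low-resolution video, where finer body-part detectors would be unreliable.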
6. The method according to claim 1 or 2, characterized in that decomposing the contour information of the human body and extracting, according to preset clothing categories, the clothing feature value corresponding to each block of the contour information of the human body specifically comprises:
decomposing the contour information of the human body, and dividing the human body into blocks according to human biometric features; performing feature value training, and calculating the corresponding clothing feature values according to the preset clothing categories; and extracting the clothing feature value corresponding to each block of the contour information of the human body.
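As one illustration of dividing the body into blocks by human biometric features, the target's vertical extent could be split into head, torso, and leg bands with fixed proportions. The fractions below (15% head, 45% torso) are assumptions; the claim only says the blocks follow human biometric features:

```python
def split_body_blocks(top, bottom, head_frac=0.15, torso_frac=0.45):
    """Split a silhouette's vertical extent [top, bottom) into head, torso
    and leg blocks using fixed biometric proportions. Returns a
    (row_start, row_end) pair per block, suitable for cropping the contour
    region before per-block clothing-feature extraction.
    """
    height = bottom - top
    head_end = top + int(round(head_frac * height))
    torso_end = head_end + int(round(torso_frac * height))
    return {"head": (top, head_end),
            "torso": (head_end, torso_end),
            "legs": (torso_end, bottom)}

blocks = split_body_blocks(0, 100)
```

Fixed proportional blocks keep the decomposition usable at low resolution, where landmark-based segmentation of shoulders or waist would often fail.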
7. The method according to claim 4, characterized by comprising: establishing the current background frame by a single Gaussian, mixture of Gaussians, kernel-based, or eigen-background method.
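Of the four background-modelling methods claim 7 lists, the single-Gaussian model is the simplest: each pixel keeps a running mean and variance, and a pixel is foreground when it deviates from the mean by more than k standard deviations. The `alpha`, `k`, and initial-variance values below are illustrative, not taken from the patent:

```python
import numpy as np

class SingleGaussianBackground:
    """Per-pixel single-Gaussian background model (one of the methods
    named in claim 7)."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=100.0):
        self.mean = np.asarray(first_frame, dtype=np.float64).copy()
        self.var = np.full(self.mean.shape, init_var)
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = np.asarray(frame, dtype=np.float64)
        d = frame - self.mean
        fg = np.abs(d) > self.k * np.sqrt(self.var)  # foreground test
        bg = ~fg  # update the model only where the pixel matched the background
        self.mean[bg] += self.alpha * d[bg]
        self.var[bg] = (1 - self.alpha) * self.var[bg] + self.alpha * d[bg] ** 2
        return fg

model = SingleGaussianBackground(np.zeros((4, 4)))
for _ in range(3):
    model.apply(np.zeros((4, 4)))  # quiet background frames tighten the variance
spike = np.zeros((4, 4))
spike[2, 2] = 200.0
mask = model.apply(spike)          # the bright pixel is flagged as foreground
```

Updating the statistics only at background pixels keeps a loitering foreground object from being absorbed into the model too quickly.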
8. The method according to claim 1 or 2, characterized by comprising:
fusing the clothing categories of the blocks by means of the clothing-category weak classifiers corresponding to the blocks, and forming the fused result into a strong classifier.
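Claim 8's fusion of per-block weak classifiers into a strong classifier can be sketched as a weighted vote in the AdaBoost style. The threshold weak classifiers and the weights below are hypothetical stand-ins; the patent does not specify how the weak classifiers or their weights are trained:

```python
def strong_classify(block_features, weak_classifiers, weights):
    """Fuse per-block weak clothing-category classifiers into a strong
    classifier by weighted voting over their predicted labels.
    """
    scores = {}
    for feat, clf, w in zip(block_features, weak_classifiers, weights):
        label = clf(feat)
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Hypothetical threshold-based weak classifiers, one per body block:
head_clf = lambda v: "uniform" if v > 0.5 else "casual"
torso_clf = lambda v: "uniform" if v > 0.3 else "casual"
legs_clf = lambda v: "uniform" if v > 0.7 else "casual"

decision = strong_classify([0.6, 0.4, 0.2],
                           [head_clf, torso_clf, legs_clf],
                           weights=[0.5, 1.0, 0.4])
```

Here the torso vote carries the largest weight, reflecting that the torso block usually carries the most clothing information.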
9. A clothing recognition system for low-resolution video, characterized by comprising:
an extraction device, configured to extract a foreground image from a received video stream, and, after a current time series in the video stream is determined, to determine a human body target from the foreground image and extract contour information of the human body target;
a decomposition device, configured to decompose the contour information of the human body target and extract, according to preset clothing categories, a clothing feature value corresponding to each block of the contour information of the human body target;
a comparison and recognition device, configured to compare the obtained clothing feature value of each block with a preset clothing feature threshold and identify the clothing category of each block in the current frame;
a fusion device, configured to fuse the clothing categories of the blocks in the frames of the same time series or of different time series in the video stream;
a decision device, configured to perform a voting decision according to pre-stored clothing categories to determine the clothing category of the human body target in the current time series, and, after the clothing categories of the same human body target in the frames of different time series are fused, to decide the clothing category of the human body target.
10. The system according to claim 9, characterized by further comprising:
a removal device, configured to perform noise and hole removal on the acquired foreground image.
PCT/CN2011/082705 2011-11-23 2011-11-23 Clothing identification method and system for low-resolution video WO2013075295A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/082705 WO2013075295A1 (en) 2011-11-23 2011-11-23 Clothing identification method and system for low-resolution video

Publications (1)

Publication Number Publication Date
WO2013075295A1 true WO2013075295A1 (en) 2013-05-30

Family

ID=48468998


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886305A (en) * 2014-04-08 2014-06-25 中国人民解放军国防科学技术大学 Specific face searching method for grassroots policing, safeguard stability and counter-terrorism
CN111476336A (en) * 2019-01-23 2020-07-31 阿里巴巴集团控股有限公司 Piece counting method, device and equipment for clothes
CN112613376A (en) * 2020-12-17 2021-04-06 深圳集智数字科技有限公司 Re-recognition method and device and electronic equipment
CN112711966A (en) * 2019-10-24 2021-04-27 阿里巴巴集团控股有限公司 Video file processing method and device and electronic equipment
CN113221928A (en) * 2020-01-21 2021-08-06 海信集团有限公司 Clothing classification information display device, method and storage medium
CN113283369A (en) * 2021-06-08 2021-08-20 苏州市伏泰信息科技股份有限公司 Port wharf operating personnel safety protection measure monitoring system and method
CN113409076A (en) * 2021-06-11 2021-09-17 广州天辰信息科技有限公司 Method and system for constructing user portrait based on big data and cloud platform
CN114040140A (en) * 2021-11-15 2022-02-11 北京医百科技有限公司 Video matting method, device and system and storage medium
US20220180551A1 (en) * 2020-12-04 2022-06-09 Shopify Inc. System and method for generating recommendations during image capture of a product
CN114040140B (en) * 2021-11-15 2024-04-12 北京医百科技有限公司 Video matting method, device, system and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007164690A (en) * 2005-12-16 2007-06-28 Matsushita Electric Ind Co Ltd Image processor and image processing method
US20070239764A1 (en) * 2006-03-31 2007-10-11 Fuji Photo Film Co., Ltd. Method and apparatus for performing constrained spectral clustering of digital image data
JP2007293912A (en) * 2004-06-09 2007-11-08 Matsushita Electric Ind Co Ltd Image processing method, and image processing apparatus
US20090116698A1 (en) * 2007-11-07 2009-05-07 Palo Alto Research Center Incorporated Intelligent fashion exploration based on clothes recognition
WO2009091259A1 (en) * 2008-01-18 2009-07-23 Nederlandse Organisatie Voor Toegepast-Natuurwetenschappelijk Onderzoek Tno Method of improving the resolution of a moving object in a digital image sequence
CN101527838A (en) * 2008-03-04 2009-09-09 华为技术有限公司 Method and system for feedback-type object detection and tracing of video object
CN101763634A (en) * 2009-08-03 2010-06-30 北京智安邦科技有限公司 simple objective classification method and device
CN101833653A (en) * 2010-04-02 2010-09-15 上海交通大学 Figure identification method in low-resolution video
CN101882217A (en) * 2010-02-26 2010-11-10 杭州海康威视软件有限公司 Target classification method of video image and device
CN102521565A (en) * 2011-11-23 2012-06-27 浙江晨鹰科技有限公司 Garment identification method and system for low-resolution video

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 11876088; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: DE)
122 Ep: PCT application non-entry in European phase (ref document number: 11876088; country of ref document: EP; kind code of ref document: A1)