CN108334804B - Image processing apparatus and method, and image processing system - Google Patents

Image processing apparatus and method, and image processing system

Info

Publication number
CN108334804B
CN108334804B CN201710051462.XA
Authority
CN
China
Prior art keywords
regions
feature
pair
region
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710051462.XA
Other languages
Chinese (zh)
Other versions
CN108334804A (en)
Inventor
李荣军
黄耀海
谭诚
那森
松下昌弘
清水智之
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to CN201710051462.XA priority Critical patent/CN108334804B/en
Publication of CN108334804A publication Critical patent/CN108334804A/en
Application granted granted Critical
Publication of CN108334804B publication Critical patent/CN108334804B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image processing apparatus and method, and an image processing system. One aspect of the present invention discloses an image processing apparatus. The image processing apparatus includes: a unit configured to acquire a face image; a unit configured to determine at least one region pair from the face image, wherein the regions in the region pair are symmetrical to each other about one of the symmetry lines within the face; and a unit configured to determine a feature from one region of the region pair based on a first direction and determine a feature from the other region of the region pair based on a second direction, wherein the first direction and the second direction are symmetrical to each other about the symmetry line of the region pair. According to the present invention, the features used for performing a human image retrieval (HIR) process or a face identification (FID) process will be more accurate, which will improve the accuracy of the HIR process or the FID process.

Description

Image processing apparatus and method, and image processing system
Technical Field
The present invention relates to an image processing apparatus and method, and an image processing system.
Background
During video surveillance, in order to obtain information about a particular person, such as the behavior of that person at a particular location (e.g., an airport, a supermarket, etc.) during a particular period of time, a human image retrieval (Human Image Retrieval, HIR) technique is typically used to retrieve related images (e.g., face images) of that person from captured video frames, so that an operator can analyze the person's behavior from the retrieved images. During a recognition process, in order to determine whether a specific person is one of the persons registered in a specific system (e.g., a gate control system, a payment system, etc.), a face identification (Face Identification, FID) technique is generally used to determine whether the face image of that person matches the registered face image of one of the registered persons.
In general, the accuracy of an HIR process (e.g., a face image retrieval process) or an FID process depends on the accuracy of the features extracted from the input face image (i.e., the query image) and/or from the recorded/registered face images (e.g., captured video frames, registered images, etc.). To obtain more accurate features so that the accuracy of the face image retrieval process and the FID process can be improved, an exemplary technique is disclosed in Chinese patent CN102136062A, which includes: extracting multi-resolution Local Binary Pattern (LBP) features (i.e., feature vectors) from an input face image; generating an index based on the multi-resolution LBP features extracted from recorded/registered face images; and retrieving recorded/registered face images similar to the input face image based on the extracted multi-resolution LBP features and the generated index.
However, during video monitoring or recognition, the faces in both recorded/registered face images and input face images are often accompanied by various occlusions, for example, occlusion caused by overlap between several people (as shown in fig. 1A), occlusion caused by accessories such as hair (as shown in fig. 1B), and the like. As shown in fig. 1A and 1B, these occlusions make a portion of the face invisible, which makes the features extracted from that portion unreliable. As a result, the features extracted from that portion directly affect the accuracy of the features of the entire face. Therefore, during video monitoring or recognition, in the case where a face in a recorded/registered face image or an input face image is accompanied by occlusion, the features used for performing the HIR process or the FID process will be inaccurate. Thus, the accuracy of the HIR process or the FID process will be affected.
Disclosure of Invention
Accordingly, in view of the above description in the background art, the present disclosure aims to solve the above-described problems.
According to an aspect of the present invention, there is provided an image processing apparatus including: an image acquisition unit configured to acquire a face image; a region pair determining unit configured to determine at least one region pair from the face image, wherein the regions in the region pair are symmetrical to each other about one of the symmetry lines within the face; and a feature determination unit configured to determine a feature from one region of the region pair based on a first direction and determine a feature from the other region of the region pair based on a second direction, wherein the first direction and the second direction are symmetrical to each other about the symmetry line of the region pair. The symmetry lines within the face include at least the symmetry line of the face or a symmetry line of a component of the face. For example, the first direction is a clockwise direction and the second direction is a counterclockwise direction.
With the present invention, in the case where a face in a recorded/registered face image or an input face image is accompanied by occlusion, the features used for performing the HIR process or the FID process will be more accurate, which will improve the accuracy of the HIR process or the FID process.
Further characteristic features and advantages of the invention will be apparent from the following description with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1A to 1B schematically show an exemplary face in a face image accompanied by occlusion.
Fig. 2 schematically illustrates a symmetry line of an exemplary face and a symmetry line of components of an exemplary face.
Fig. 3A to 3B schematically show exemplary regions of interest (ROIs) of an exemplary face in a face image.
Fig. 4 is a block diagram schematically illustrating a hardware structure in which techniques according to embodiments of the present invention may be implemented.
Fig. 5 is a block diagram showing the structure of an image processing apparatus according to the first embodiment of the present invention.
Fig. 6 schematically shows a flow chart of image processing according to a first embodiment of the invention.
Fig. 7 schematically shows a flow chart of the feature determination step S630 shown in fig. 6 according to the present invention.
Fig. 8A to 8F schematically illustrate the manner in which the feature determination unit 530 shown in fig. 5 determines orientation-independent features according to the present invention.
Fig. 9A to 9F schematically illustrate the manner in which the feature determination unit 530 shown in fig. 5 determines orientation-related features according to the present invention.
Fig. 10 schematically shows another flow chart of the feature determination step S630 shown in fig. 6 according to the present invention.
Fig. 11 schematically shows another flow chart of the feature determination step S630 shown in fig. 6 according to the present invention.
Fig. 12 is a block diagram showing the structure of an image processing apparatus according to a second embodiment of the present invention.
Fig. 13 shows an arrangement of an exemplary image processing system according to the present invention.
Fig. 14 shows an arrangement of another exemplary image processing system according to the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the following description is merely illustrative, exemplary in nature and is in no way intended to limit the invention, its application, or uses. The relative arrangement of the components and steps, numerical expressions and numerical values set forth in the examples do not limit the scope of the present invention unless otherwise specifically indicated. In addition, techniques, methods, and apparatus known to those of skill in the art may not be discussed in detail, but should be part of the present specification where appropriate.
Note that like reference numerals and letters refer to like items in the figures, and thus once an item is defined in a figure, it need not be discussed in the following figures.
As described above, in the case where a face (e.g., a human face) in a face image is accompanied by some kind of occlusion (e.g., an occlusion caused by overlap between several people or an occlusion caused by an accessory), the accuracy of the features of the entire face will be affected. That is, the accuracy of the features for performing the HIR process or FID process will be affected.
Statistics indicate that a face (e.g., a human face) has symmetry. That is, the face is almost bilaterally symmetrical. For example, the dashed line 210 schematically shown in fig. 2 is the symmetry line of the face. In addition, a face typically includes several components, such as a facial contour component, an eyebrow component, an eye component, a nose component, and a mouth component. Statistics also indicate that the components of the face have symmetry. That is, the components of the face are also nearly left-right and/or up-down symmetrical. For example, the dashed lines 220 and 230 schematically shown in fig. 2 are symmetry lines of the eye components of the face, and the dashed line 240 schematically shown in fig. 2 is the symmetry line of the mouth component of the face.
In addition, the face in a face image typically contains certain regions, e.g., eye regions, a nose region, a mouth region, and regions determined based on feature points of the facial components. Hereinafter, these regions are referred to as regions of interest (ROIs). As shown in fig. 3A, for example, the eye regions, the nose region, and the mouth region of a face serve as the corresponding ROIs. As shown in fig. 3B, for example, regions determined based on feature points of the facial components serve as the corresponding ROIs. Features (e.g., feature vectors) for performing the HIR process or the FID process are typically determined from the corresponding ROIs of the face.
The inventors have found that in one aspect, similar features can be determined from a corresponding ROI with consideration of symmetry of the face and/or symmetry of components of the face in determining the corresponding ROI. In other words, where a pair of ROIs (i.e., a pair of regions as described herein) can be determined from a facial image, features determined from one of the ROIs will be similar to features determined from the other of the ROIs, where the ROIs in the pair of ROIs are symmetrical to each other about a line of symmetry of the face or a line of symmetry of a component of the face. For example, as shown in fig. 3A, the region 301 and the region 302 can be regarded as one ROI pair in consideration of symmetry of the face. For example, as shown in fig. 3B, in a case where symmetry of a face is considered, the region 310 and the region 340 may be regarded as one ROI pair, and the region 320 and the region 330 may be regarded as another ROI pair. Also, in consideration of symmetry of the eye, the region 310 and the region 320 may be regarded as one ROI pair, and the region 330 and the region 340 may be regarded as another ROI pair.
On the other hand, for any one of the ROI pairs, when the corresponding feature is determined from that ROI pair, the feature determined from the corresponding ROI of that ROI pair will be nearly the same, also taking into account the symmetry of that ROI pair. In other words, for any one of the ROI pairs, in the case where the direction for determining the feature from one of the ROIs and the direction for determining the feature from the other of the ROIs are symmetrical to each other about the symmetry line of the ROI pair, the features determined based on these symmetry directions are mirror-symmetrical to each other. Thus, for any one of the ROI pairs, features determined from the corresponding ROIs may be interchanged with one another.
Thus, for any one of the pair of ROIs, where one of the ROIs is an occluded ROI (i.e., an occlusion region as described herein), instead of determining features directly from the occluded ROI, features determined from the other ROI of the pair may be used directly as corresponding features of the occluded ROI. In other words, even if a face in a face image is accompanied by some occlusion, the accuracy of features for the entire face is hardly affected. That is, the accuracy of the features for performing the HIR process or FID process will be hardly affected.
Thus, according to the present invention, during the HIR process or the FID process, in the case where a face in a recorded/registered face image or an input face image is accompanied by occlusion, the accuracy of the features used for performing the HIR process or the FID process can be improved. In other words, in this case the features used for performing the HIR process or the FID process will be more accurate, which will improve the accuracy of the HIR process or the FID process.
(hardware construction)
First, a hardware structure in which the techniques described below can be implemented will be described with reference to fig. 4.
For example, the hardware structure 400 includes a Central Processing Unit (CPU) 410, a Random Access Memory (RAM) 420, a Read-Only Memory (ROM) 430, a hard disk 440, an input device 450, an output device 460, a network interface 470, and a system bus 480. Furthermore, the hardware structure 400 may be implemented by, for example, a camera, a Personal Digital Assistant (PDA), a mobile phone, a tablet computer, a laptop computer, a desktop computer, or other suitable electronic device.
In a first implementation, the image processing according to the present invention is implemented by hardware or firmware and serves as a module or component of the hardware structure 400. For example, the image processing apparatus 500, which will be described in detail below with reference to fig. 5, and the image processing apparatus 1200, which will be described in detail below with reference to fig. 12, serve as modules or components of the hardware structure 400. In a second implementation, the image processing according to the present invention is implemented by software stored in the ROM 430 or the hard disk 440 and executed by the CPU 410. For example, the image processing procedure 600 described in detail below with reference to fig. 6 serves as a program stored in the ROM 430 or the hard disk 440.
CPU 410 is any suitable programmable control device (e.g., processor) and may perform various functions to be described below by executing various application programs stored in ROM 430 or hard disk 440. RAM 420 is used to temporarily store programs or data loaded from ROM 430 or hard disk 440, and is also used as a space in which CPU 410 performs various processes, for example, implementing a technique and other available functions that will be described in detail below with reference to fig. 6. The hard disk 440 stores various information such as an Operating System (OS), various applications, control programs, and data stored in advance or defined in advance by a manufacturer.
In one implementation, input device 450 is used to allow a user to interact with hardware structure 400. In one example, a user may input image/video/data through the input device 450. In another example, a user may trigger the corresponding image processing of the present invention via input device 450. Further, the input device 450 may take various forms, such as buttons, a keyboard, or a touch screen. In another implementation, the input device 450 is used to receive images/video output from a particular electronic device, such as a digital still camera, video camera, and/or web cam.
In one implementation, the output device 460 is used to display the processing results (e.g., similar facial images used to input facial images) to the user. Also, the output device 460 may take various forms, such as a Cathode Ray Tube (CRT) or a liquid crystal display. In another implementation, the output device 460 is used to output the processing results (e.g., the determined features) to subsequent operations such as an HIR process, an FID process, or a human attribute recognition (Human Attribute Recognition, HAR) process.
The network interface 470 provides an interface for connecting the hardware structure 400 to a network. For example, the hardware structure 400 may communicate data via the network interface 470 with other electronic devices connected via the network. Alternatively, a wireless interface may be provided for the hardware structure 400 for wireless data communication. The system bus 480 may provide a data transmission path for transmitting data between the CPU 410, the RAM 420, the ROM 430, the hard disk 440, the input device 450, the output device 460, the network interface 470, and the like. Although referred to as a bus, the system bus 480 is not limited to any particular data transmission technology.
The hardware architecture 400 described above is merely illustrative and is in no way intended to limit the invention, its applications, or uses. Also, for simplicity, only one hardware configuration is shown in fig. 4. However, various hardware configurations may be used as needed.
(image processing)
Next, image processing according to the present invention will be described with reference to fig. 5 to 14.
Fig. 5 is a block diagram showing the structure of an image processing apparatus 500 according to the first embodiment of the present invention. Wherein some or all of the blocks shown in fig. 5 may be implemented by dedicated hardware. As shown in fig. 5, the image processing apparatus 500 includes an image acquisition unit 510, a region pair determination unit 520, and a feature determination unit 530.
First, the input device 450 shown in fig. 4 receives a face image (e.g., a human face image shown in fig. 1B) output from a special electronic apparatus (e.g., a camera) or input by a user. Next, the input device 450 transmits the received face image to the image acquisition unit 510 via the system bus 480.
Then, as shown in fig. 5, the image acquisition unit 510 acquires a face image from the input device 450 through the system bus 480.
The region pair determining unit 520 determines at least one region pair from the face image, wherein the regions in the region pair are symmetrical to each other about one of symmetry lines within the face, and the regions in the region pair are, for example, ROIs of the face. Further, as depicted in FIG. 2, the symmetry line within the face includes at least a symmetry line of the face (e.g., symmetry line 210) or a symmetry line of a component of the face (e.g., symmetry line 220/230/240). And, as depicted in fig. 3A and 3B, the region pairs are ROI pairs, such as a pair of ROI 310 and ROI 340, a pair of ROI 320 and ROI 330, a pair of ROI 310 and ROI 320, and a pair of ROI 330 and ROI 340.
For any one of the pairs of regions (e.g., the pair of ROI 310 and ROI 340 shown in fig. 3B), the feature determination unit 530 determines a feature from one of the regions of the pair of regions (e.g., ROI 310) based on the first direction and determines a feature from the other of the pair of regions (e.g., ROI 340) based on the second direction.
The first direction and the second direction are symmetrical to each other about the symmetry line of the region pair (e.g., the symmetry line of the face). In one implementation, the second direction is obtained by reversing the first direction about the symmetry line of the region pair. For example, in the case where the first direction is a clockwise direction, the second direction is a counterclockwise direction. In addition, in one implementation, the determined features are orientation-related features, such as Scale-Invariant Feature Transform (SIFT) features or Speeded-Up Robust Features (SURF). In another implementation, the determined features are orientation-independent features, such as Local Binary Pattern (LBP) features.
The flowchart 600 shown in fig. 6 is a corresponding procedure of the image processing apparatus 500 shown in fig. 5.
As shown in fig. 6, in the image acquisition step S610, the image acquisition unit 510 acquires a face image from the input device 450 through the system bus 480.
In the region pair determining step S620, the region pair determining unit 520 determines at least one region pair from the face image. Taking the pair of the region 310 and the region 340 shown in fig. 3B as an example, in one implementation, the region pair determining unit 520 determines the corresponding region pair by the following process.
First, the region pair determining unit 520 detects feature points from the components of the face by using an existing feature point detection method such as a supervised descent method. For example, the points shown in fig. 3B represent detected feature points.
Second, the region pair determining unit 520 determines a pair of feature points (i.e., a feature point pair) that are symmetrical to each other about a symmetry line of the region pair. For example, the symmetry line of a pair of regions 310 and 340 is the symmetry line of the face. Thus, as shown in fig. 3B, a pair of feature points 311 and 341 can be determined.
Then, the region pair determining unit 520 determines a corresponding region pair based on the determined feature point pair. In one implementation, one of the pair of regions is determined by centering around feature point 311 and the other of the pair of regions is determined by centering around feature point 341. In addition, the regions in the pair of regions may also be determined based on the pair of feature points by using other methods as long as the respective regions are determined in the same manner, have the same size, and are symmetrical to each other.
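As an illustration of this step, the following is a minimal Python sketch of building a region pair around a detected feature point and its mirror about a vertical face symmetry line. The landmark detection itself is assumed to have been performed already, and the function names (mirror_point, make_region_pair), the half-size of 16 pixels, and the toy data are illustrative rather than taken from the patent.

```python
import numpy as np

def mirror_point(point, symmetry_x):
    """Reflect a feature point about a vertical symmetry line x = symmetry_x."""
    x, y = point
    return (2 * symmetry_x - x, y)

def make_region_pair(image, point, symmetry_x, half_size=16):
    """Build two same-sized regions centered on a feature point and on its
    mirror about the face symmetry line (cf. feature points 311/341)."""
    mirrored = mirror_point(point, symmetry_x)
    def crop(center):
        cx, cy = int(round(center[0])), int(round(center[1]))
        return image[cy - half_size:cy + half_size + 1,
                     cx - half_size:cx + half_size + 1]
    return crop(point), crop(mirrored)

# Usage (toy data): a 128x128 gray image, symmetry line at x = 64,
# a landmark at (44, 50) paired with its mirror at (84, 50).
img = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
region_a, region_b = make_region_pair(img, (44, 50), symmetry_x=64)
```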
Returning to fig. 6, in the feature determining step S630, for any one of the pair of regions, the feature determining unit 530 determines a feature from one of the regions of the pair of regions based on the first direction, and determines a feature from the other of the regions of the pair of regions based on the second direction. Taking one region pair (for example, a pair of regions 310 and 340 shown in fig. 3B) as an example, in one implementation, the feature determination unit 530 determines a feature with reference to fig. 7.
As shown in fig. 7, in step S710, for each pixel in each region, the feature determination unit 530 determines a difference value based on the gray level of the pixel and the gray levels of the neighboring pixels of the pixel. In one implementation, the gray level of a pixel is the intensity value of the pixel.
In one implementation, to obtain an orientation-independent feature (e.g., an LBP feature) from one region pair, taking the pair of the region 310 and the region 340 shown in fig. 3B as an example, the feature determination unit 530 determines the corresponding difference values with reference to fig. 8A to 8F. For one pixel (e.g., the feature point 311) in the region 310, first, the feature determination unit 530 obtains the gray level of the pixel and the gray levels of the neighboring pixels (e.g., 8 neighboring pixels) of the pixel. As shown in fig. 8A, for example, the value "83" is the gray level of the pixel 311, and the other values around the value "83" are the gray levels of the neighboring pixels of the pixel 311. Then, the feature determination unit 530 determines the corresponding difference values by comparing the obtained gray levels. In this implementation, the determined difference values are the binary values of the pixel 311. As shown in fig. 8A, assuming that 8 neighboring pixels are used to determine the difference values, in the case where the gray level of the pixel 311 (i.e., the value "83") is greater than or equal to the gray level of a neighboring pixel (e.g., the value "43"), the feature determination unit 530 determines the corresponding difference value as the binary value "0", for example. Otherwise, in the case where the gray level of the pixel 311 (i.e., the value "83") is smaller than the gray level of a neighboring pixel (e.g., the value "204"), the feature determination unit 530 determines the corresponding difference value as the binary value "1", for example. Fig. 8C shows, for example, the corresponding difference values determined by the feature determination unit 530 for the pixel 311 in the region 310. Furthermore, it is obvious to those skilled in the art that the number of neighboring pixels used to determine the difference values is not limited to 8. Further, for one pixel (e.g., the feature point 341) in the region 340, fig. 8B shows the corresponding gray levels, and fig. 8D shows the corresponding difference values determined by the feature determination unit 530 in a manner similar to that described with reference to fig. 8A and 8C.
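A minimal sketch of this comparison step, assuming an 8-neighbourhood and Python/NumPy. The gray levels in the toy patch are made up except for the values 83, 43 and 204 that appear in the text, so the printed difference map only illustrates the comparison rule, not fig. 8C itself.

```python
import numpy as np

def binary_differences(patch, cy, cx):
    """Compare a pixel with its 8 neighbours: the difference value is 0 if
    the centre gray level is >= the neighbour's, and 1 otherwise.
    The centre cell of the returned 3x3 map is unused and left at 0."""
    center = patch[cy, cx]
    diffs = np.zeros((3, 3), dtype=np.uint8)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            diffs[dy + 1, dx + 1] = 0 if center >= patch[cy + dy, cx + dx] else 1
    return diffs

# Toy gray levels loosely modelled on fig. 8A (centre value 83).
patch = np.array([[43, 204, 79],
                  [82,  83, 90],
                  [51,  71, 60]], dtype=np.int32)
print(binary_differences(patch, 1, 1))
```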
In another implementation, to obtain an orientation-related feature (e.g., a SIFT feature) from one region pair, taking the pair of the region 310 and the region 340 shown in fig. 3B as an example, the feature determination unit 530 determines the corresponding difference values with reference to fig. 9A to 9F. For one pixel (e.g., the feature point 311) in the region 310, first, the feature determination unit 530 obtains the gray level of the pixel and the gray levels of the neighboring pixels (e.g., 8 neighboring pixels) of the pixel. As shown in fig. 9A, for example, the value "a" is the gray level of the pixel 311, and the other values around the value "a" are the gray levels of the neighboring pixels of the pixel 311. Then, the feature determination unit 530 determines the corresponding difference values by comparing the obtained gray levels. In this implementation, the determined difference values are a value in the horizontal direction of the pixel 311 and a value in the vertical direction of the pixel 311, respectively. As shown in fig. 9A, assuming that the neighboring pixels having the gray levels "(2)", "(4)", "(5)", and "(7)" are used to determine the difference values, the feature determination unit 530 determines one difference value (e.g., the value in the horizontal direction of the pixel 311) by, for example, subtracting the value "(4)" from the value "(5)", and determines the other difference value (e.g., the value in the vertical direction of the pixel 311) by, for example, subtracting the value "(2)" from the value "(7)". Fig. 9C shows, for example, the corresponding difference values determined by the feature determination unit 530 for the pixel 311 in the region 310. Furthermore, it will be apparent to those skilled in the art that the manner of determining the difference values is not limited to the implementation described above, and other techniques may also be used. For one pixel (e.g., the feature point 341) in the region 340, fig. 9B shows the corresponding gray levels, and fig. 9D shows the corresponding difference values determined by the feature determination unit 530 in a manner similar to that described with reference to fig. 9A and 9C.
Returning to fig. 7, in step S720, the feature determination unit 530 determines a first value of a pixel in one of the regions in the region pair based on the first direction and the corresponding difference value, and determines a second value of a pixel in the other of the regions in the region pair based on the second direction and the corresponding difference value.
In one implementation, to obtain orientation-independent features from one region pair, taking the pixel 311 in the region 310 shown in fig. 3B as an example, the corresponding difference values are shown in fig. 8C. Accordingly, the feature determination unit 530 determines the corresponding first value of the pixel 311 based on the first direction and the corresponding difference values shown in fig. 8C. As shown in fig. 8E, the curve with an arrow represents, for example, the first direction (e.g., the clockwise direction). Thus, following the clockwise direction, the feature determination unit 530 determines the first value of the pixel 311 based on the binary value sequence obtained by arranging the difference values from the upper left to the lower left. As shown in fig. 8E, the obtained binary value sequence is "00010011", and the decimal value of the obtained binary value sequence (i.e., "19") is regarded as the first value of the pixel 311. Taking the pixel 341 in the region 340 shown in fig. 3B as an example, as described above, the corresponding difference values are shown in fig. 8D. Also, as shown in fig. 8F, the curve with an arrow represents, for example, the second direction (e.g., the counterclockwise direction). Thus, in a manner similar to that described with reference to fig. 8E, the binary value sequence "00010011" is obtained by arranging the difference values from the upper right to the lower right, and the decimal value of the obtained binary value sequence (i.e., "19") is regarded as the second value of the pixel 341. In other words, the first value and the second value are decimal values corresponding to the orientation-independent features. In addition, as described above, the gray level is an intensity value; thus, the first and second values are intensity values.
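The following sketch shows how mirrored scan orders yield the same decimal value for mirrored difference maps. The exact starting neighbour and scan order are assumptions here (the text only describes them as "from the upper left to the lower left" and "from the upper right to the lower right"); the toy difference map is chosen so that both readings reproduce the binary sequence "00010011" (decimal 19) mentioned above.

```python
import numpy as np

# Clockwise scan starting at the upper-left neighbour (first direction) and
# its mirror image, a counter-clockwise scan starting at the upper-right
# neighbour (second direction); coordinates are (row, col) in the 3x3 map.
CLOCKWISE = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
COUNTERCLOCKWISE = [(0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2)]

def lbp_value(diffs, order):
    """Turn the 8 binary differences into a decimal value by reading them
    in the given order (e.g. '00010011' -> 19 as in figs. 8E/8F)."""
    bits = ''.join(str(diffs[r, c]) for r, c in order)
    return int(bits, 2)

diffs_a = np.array([[0, 0, 0], [1, 0, 1], [1, 0, 0]], dtype=np.uint8)
diffs_b = np.fliplr(diffs_a)   # the mirrored region yields a mirrored difference map
print(lbp_value(diffs_a, CLOCKWISE))         # 19, first value
print(lbp_value(diffs_b, COUNTERCLOCKWISE))  # 19, same value for the mirrored region
```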
In another implementation, to obtain orientation-related features from one region pair, taking the pixel 311 in the region 310 shown in fig. 3B as an example, the corresponding difference values are shown in fig. 9C. Accordingly, the feature determination unit 530 determines the corresponding first value of the pixel 311 based on the first direction and the corresponding difference values shown in fig. 9C. As shown in fig. 9E, the curve with an arrow represents, for example, the first direction (e.g., the clockwise direction). Thus, based on the clockwise direction, the value of the pixel 311 in the horizontal direction, and the value of the pixel 311 in the vertical direction, the feature determination unit 530 determines the first value of the pixel 311 by using, for example, a formula of the form:
θ(x, y) = arctan(V_vertical(x, y) / V_horizontal(x, y))
where (x, y) represents the coordinates of the pixel 311, and V_horizontal and V_vertical are the difference values of the pixel 311 in the horizontal and vertical directions. As shown in fig. 9E, the first value of the pixel 311 is, for example, 45 degrees.
Taking the pixel 341 in the region 340 shown in fig. 3B as an example, the corresponding difference values are shown in fig. 9D. Also, as shown in fig. 9F, the curve with an arrow represents, for example, the second direction (e.g., the counterclockwise direction). Thus, in a manner similar to that described with reference to fig. 9E, the feature determination unit 530 determines the second value of the pixel 341 by using, for example, the mirrored form of the above formula, in which the difference value in the horizontal direction is taken in the reversed order:
θ(x, y) = arctan(V_vertical(x, y) / (−V_horizontal(x, y)))
As shown in fig. 9F, the second value of the pixel 341 is, for example, 45 degrees. In other words, the first and second values are angle values corresponding to the orientation-related features.
Returning to fig. 7, in step S730, the feature determination unit 530 determines a feature from one of the regions in the region pair based on the frequency of occurrence of the first value, and determines a feature from the other of the regions in the region pair based on the frequency of occurrence of the second value. As described above, the feature determination unit 530 will determine the corresponding first value of each pixel in one of the regions of the region pair, and will determine the corresponding second value of each pixel in the other of the regions of the region pair.
In one implementation, for obtaining orientation-independent features from a region pair, the first and second values are decimal values, as described above. Thus, for one of the regions in the pair of regions, a numerical histogram will be determined based on the frequency of occurrence of the respective first values in that region, and that numerical histogram will be considered as the corresponding feature determined from that region. In addition, by using a similar manner, another numerical histogram determined based on the frequency of occurrence of each second value will be considered as a corresponding feature determined from the other of the regions in the region pair.
In another implementation, for obtaining the orientation-related feature from a region pair, the first and second values are angle values, as described above. Thus, for one of the regions in the pair of regions, an angle histogram will be determined based on the frequency of occurrence of the respective first values in that region, and that angle histogram will be considered as the corresponding feature determined from that region. In addition, by using a similar manner, another angle histogram determined based on the frequency of occurrence of the respective second values will be considered as a corresponding feature determined from the other of the regions in the region pair.
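A small sketch of this histogram step, assuming NumPy: an LBP-style value histogram with 256 bins for the orientation-independent case and a 36-bin angle histogram for the orientation-related case. The bin counts and the normalisation are illustrative choices, not specified by the patent.

```python
import numpy as np

def lbp_histogram(values):
    """Orientation-independent case: histogram of decimal values (0..255),
    normalised so that region size does not matter."""
    values = np.asarray(values)
    hist, _ = np.histogram(values, bins=256, range=(0, 256))
    return hist.astype(np.float32) / max(values.size, 1)

def orientation_histogram(angles_deg, n_bins=36):
    """Orientation-related case: histogram of angle values over 0..360 degrees."""
    angles = np.mod(np.asarray(angles_deg, dtype=np.float64), 360.0)
    hist, _ = np.histogram(angles, bins=n_bins, range=(0.0, 360.0))
    return hist.astype(np.float32) / max(angles.size, 1)

# Usage: per-pixel first values of one region -> that region's feature vector.
first_values = np.random.randint(0, 256, size=(31, 31))
feature = lbp_histogram(first_values)
```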
Further, referring to fig. 7, for one region pair, a feature is determined based on each pixel in the region pair. Since the size of the region will directly affect the accuracy of the feature, in order to obtain a more accurate feature from the region pairs, in another implementation, the feature determination unit 530 determines the feature with reference to fig. 10 for at least one of the region pairs.
Taking one region pair (for example, a pair of regions 301 and 302 shown in fig. 3A) as an example, as shown in fig. 10, in step S1010, the feature determination unit 530 obtains at least one sub-region pair from the region pair by dividing the region of the region pair into sub-regions. Wherein the sub-regions of a pair of sub-regions are symmetrical to each other about a line of symmetry of the pair of regions. In other words, the number and size of sub-regions in each region of the region pair are the same. As shown in fig. 3A, the region 301 and the region 302 are divided into four sub-regions, and the sub-region 303 and the sub-region 304 are regarded as one sub-region pair.
In step S1020, the feature determination unit 530 determines a feature from the pair of sub-regions. Wherein, for each sub-region pair (e.g., a pair of sub-regions 303 and 304) of the region pair (e.g., a pair of regions 301 and 302), the feature determination unit 530 determines a corresponding feature from the sub-region pair by using a similar manner as described with reference to fig. 7 to 9.
Then, in step S1030, for each region in the pair of regions, the feature determination unit 530 links together the corresponding features determined from the sub-regions of the region to construct the corresponding features of the region.
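A minimal sketch of this sub-region scheme, assuming a 2×2 grid as in fig. 3A; per_subregion_feature stands for whichever per-region feature computation (e.g., the histogram sketch above) is being used, and the sub-region ordering is assumed to be mirrored consistently between the two regions of a pair so that corresponding sub-regions line up.

```python
import numpy as np

def split_into_subregions(region, grid=(2, 2)):
    """Divide a region into a grid of equally sized sub-regions (fig. 3A
    shows 4 sub-regions per region); any remainder rows/cols are dropped."""
    rows, cols = grid
    h = region.shape[0] // rows
    w = region.shape[1] // cols
    return [region[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)]

def region_feature(region, per_subregion_feature, grid=(2, 2)):
    """Link (concatenate) the sub-region features into the region feature."""
    parts = [per_subregion_feature(sub) for sub in split_into_subregions(region, grid)]
    return np.concatenate(parts)
```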
Further, for example, as shown in fig. 1A, the face 110 is occluded by the face 120, the region 111 and the region 112 form one region pair, and the region 111 is an occluded region. As described above, in the present invention, even if a face in a face image is accompanied by some occlusion, the accuracy of the features for the entire face is hardly affected. Thus, in another implementation, for at least one of the region pairs, the feature determination unit 530 determines the features with reference to fig. 11. In this case, the regions in the region pair are preferably symmetrical to each other about the symmetry line of the face.
Taking one region pair (for example, the pair of the region 111 and the region 112 shown in fig. 1A) as an example, as shown in fig. 11, in step S1110, the feature determination unit 530 determines whether or not one region of the region pair is an occlusion region. And, in the case where one of the regions in the pair of regions is an occlusion region, the process advances to step S1130; otherwise, the process advances to step S1120.
In one implementation, for each region in the region pair (e.g., the region 111), the feature determination unit 530 performs the corresponding determination based on the black pixel density of the region. For example, first, the feature determination unit 530 binarizes the image corresponding to the region (e.g., the image within the region 111 as shown in fig. 1A) by using an existing binarization algorithm such as the Otsu algorithm, an adaptive threshold algorithm, a threshold algorithm, or the like. Then, the feature determination unit 530 calculates the black pixel density of the region by using, for example, the following formula:
α = (number of black pixels in the region) / (total number of pixels in the region)
where α represents the black pixel density. Finally, the feature determination unit 530 determines whether the black pixel density of the region is greater than a predetermined threshold (e.g., TH1). When the black pixel density is greater than TH1, the region is determined to be an occluded region; otherwise, the region is a non-occluded region.
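A sketch of this black-pixel-density test using OpenCV's Otsu binarisation; the input is assumed to be an 8-bit grayscale region, and the threshold value TH1 = 0.5 is a placeholder, as the patent does not fix it.

```python
import cv2
import numpy as np

def is_occluded(region_gray, th1=0.5):
    """Occlusion test by black-pixel density: binarise the 8-bit grayscale
    region with Otsu's method and flag it as occluded when the fraction of
    black pixels exceeds TH1 (0.5 here is only a placeholder)."""
    _, binary = cv2.threshold(region_gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    black_density = float(np.count_nonzero(binary == 0)) / binary.size
    return black_density > th1
```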
In another implementation, for each region (e.g., region 111) in the region pair, the feature determination unit 530 performs a corresponding determination based on an existing track detection method such as a head tracking method, an Ω tracking method, or the like. Taking the face 110 and the face 120 shown in fig. 1A as an example, first, the feature determination unit 530 detects the trajectory of the face 110 (for example, trajectory 1) and the trajectory of the face 120 (for example, trajectory 2) by the head tracking method. Then, the feature determination unit 530 determines the overlapping position between the track 1 and the track 2. Finally, the feature determination unit 530 predicts a region whose position is to be at the overlapping position as an occlusion region. For the pair of regions 111 and 112 shown in fig. 1A, the region 111 will be determined as an occlusion region.
Returning to fig. 11, in step S1120, the feature determination unit 530 determines a feature from a pair of regions (for example, a pair of regions 111 and 112 shown in fig. 1A) by using a similar manner as described with reference to fig. 7 to 9. For example, a corresponding feature is determined from region 111 based on a first direction (e.g., clockwise) and a corresponding feature is determined from region 112 based on a second direction (e.g., counterclockwise).
In step S1130, the feature determination unit 530 determines a feature from the other region (i.e., the region 112) in the pair of regions based on the second direction (e.g., the counterclockwise direction). In addition, in the case where the region 111 is not an occlusion region, a corresponding feature is determined from the region 111 based on the second direction. Accordingly, the feature determination unit 530 determines the corresponding feature from the region 112 based on the first direction.
Then, in step S1140, the feature determination unit 530 regards the feature determined from the region 112 as a corresponding feature of the occlusion region (i.e., region 111). For example, a copy of the features determined from region 112 is considered to be a corresponding feature of the occlusion region.
In addition, as an alternative solution, taking one region pair as an example, after determining the corresponding features (e.g., feature vector 1 and feature vector 2) from the regions in the region pair, the feature determination unit 530 may further link feature vector 1 and feature vector 2 to obtain the corresponding feature of the region pair in the following manner. In one example, feature vector 1 and feature vector 2 are linked in a serial fashion. That is, the length of the linked feature is the sum of the length of feature vector 1 and the length of feature vector 2. In another example, in order to reduce the length of the linked feature, feature vector 1 and feature vector 2 are linked in parallel. That is, the length of the linked feature is equal to the length of either one of feature vector 1 and feature vector 2. In another example, to reduce the size of the linked feature, feature vector 1 and feature vector 2 are linked by using an existing machine learning method such as a Principal Component Analysis (PCA) method.
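A small sketch of the two linking styles, assuming NumPy; "parallel" linking is interpreted here as an element-wise combination (averaging of equal-length vectors), which is only one possible reading since the text does not specify the element-wise rule.

```python
import numpy as np

def link_serial(f1, f2):
    """Serial linking: concatenate, so the length is len(f1) + len(f2)."""
    return np.concatenate([np.asarray(f1), np.asarray(f2)])

def link_parallel(f1, f2):
    """Parallel linking: combine element-wise (here, averaging two vectors of
    equal length), so the length equals that of either input vector."""
    return (np.asarray(f1, dtype=np.float64) + np.asarray(f2, dtype=np.float64)) / 2.0
```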
As described above, in the present invention, on one hand, the symmetry of the face and/or the symmetry of the components of the face is considered when determining the corresponding region pairs; in other words, the region pairs are determined symmetrically. On the other hand, the symmetry of the region pair is also considered when determining features from the corresponding region pair; in other words, the features are determined symmetrically from the region pair. Therefore, in the case where a face in a recorded/registered face image or an input face image is accompanied by occlusion, the features determined according to the present invention will be more accurate, which will improve the accuracy of the HIR process or the FID process.
As described above, the features are symmetrically determined from the symmetrically determined region pairs. Thus, for any one of the pair of regions, the feature determined from one of the regions in the pair of regions is nearly identical to the feature determined from the other of the regions in the pair of regions. Therefore, as an exemplary application of the above-described image processing with reference to fig. 5 to 11, another image processing apparatus 1200 will be described next with reference to fig. 12.
Fig. 12 is a block diagram showing the structure of an image processing apparatus 1200 according to the second embodiment of the present invention. Wherein some or all of the blocks shown in fig. 12 may be implemented by dedicated hardware. As shown in fig. 12, the image processing apparatus 1200 includes an image processing apparatus 500 and an occlusion determination unit 1210.
First, with respect to an input face image, the image processing apparatus 500 determines corresponding features from respective pairs of regions with reference to fig. 5 to 11.
Then, for at least one of the region pairs, the occlusion judgment unit 1210 judges whether occlusion occurs between the regions in the corresponding region pair. More specifically, taking one region pair as an example, the occlusion judgment unit 1210 first calculates a similarity measure between the features determined from the regions in the region pair. The similarity measure is calculated as, for example, a cosine distance, a Euclidean distance, or a Mahalanobis distance. Then, in the case where the similarity measure is greater than or equal to a predefined threshold (e.g., TH2), the occlusion judgment unit 1210 judges that no occlusion occurs between the regions. Otherwise, the occlusion judgment unit 1210 judges that occlusion occurs between the regions.
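A minimal sketch of this occlusion judgment using cosine similarity; the threshold TH2 = 0.8 is an arbitrary placeholder.

```python
import numpy as np

def cosine_similarity(f1, f2):
    """Cosine similarity between two feature vectors (1 minus cosine distance)."""
    f1 = np.asarray(f1, dtype=np.float64)
    f2 = np.asarray(f2, dtype=np.float64)
    denom = np.linalg.norm(f1) * np.linalg.norm(f2)
    return float(np.dot(f1, f2) / denom) if denom > 0 else 0.0

def occlusion_between(feature_a, feature_b, th2=0.8):
    """Occlusion is judged to occur when the similarity of the two region
    features falls below the threshold TH2 (0.8 is only a placeholder)."""
    return cosine_similarity(feature_a, feature_b) < th2
```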
In other words, the features determined according to the image processing with reference to fig. 5 to 11 may be applied to determine whether a face in a face image is accompanied by some occlusion. As shown in fig. 1A, for a pair of the region 121 and the region 122 in the face 120, the features determined from the region 121 and the features determined from the region 122 are almost the same according to the image processing with reference to fig. 5 to 11. That is, the similarity measure between the determined features is greater than or equal to TH2. Therefore, it will be judged that no occlusion occurs between the region 121 and the region 122. Also, in the case where it is determined that no occlusion has occurred between all the pairs of regions in the face 120, it will be determined that the face 120 is not accompanied by any occlusion. Also, as shown in fig. 1A, for a pair of the region 111 and the region 112 in the face 110, the features determined from the region 111 and the features determined from the region 112 are dissimilar to each other according to the image processing with reference to fig. 5 to 11. That is, the similarity measure between the determined features is less than TH2. Therefore, it will be judged that occlusion occurs between the region 111 and the region 112. Therefore, it will be determined that the face 110 is accompanied by some occlusion.
As described above, HIR techniques are typically used during video surveillance to retrieve face images related to an input face image. As an exemplary application of the image processing described above with reference to fig. 5 to 11, an exemplary image processing system 1300 (e.g., an HIR system) will be described next with reference to fig. 13.
As shown in fig. 13, the image processing system 1300 includes a retrieval client 1310 and a retrieval server 1320. A user may interact with the image processing system 1300 via the retrieval client 1310. The retrieval server 1320 includes an index server 1321 and a processor 1322. A predetermined index is stored in the index server 1321. Alternatively, the index server 1321 may be replaced by an external server. That is, the predetermined index may be stored in an external server instead of in an internal server of the retrieval server 1320.
Further, in one implementation, the retrieval client 1310 and the retrieval server 1320 are connected to each other via a system bus. In another implementation, the retrieval client 1310 and the retrieval server 1320 are connected to each other via a network. In addition, the retrieval client 1310 and the retrieval server 1320 may be implemented on the same electronic device (e.g., a computer, a PDA, a mobile phone, or a camera). Alternatively, the retrieval client 1310 and the retrieval server 1320 may also be implemented on different electronic devices.
As shown in fig. 13, first, the retrieval client 1310 acquires a face image input by a user (i.e., an input face image). Next, the retrieval client 1310 determines corresponding features from respective pairs of regions in the input face image with reference to fig. 5 to 11. The retrieval client 1310 then obtains features of the input face image based on the features determined from the region pairs. For example, features of an input face image are obtained by linking features determined from pairs of regions. Alternatively, the features of the input facial image are obtained by the retrieval server 1320, rather than the retrieval client 1310. In other words, in this case, the retrieval client 1310 receives only the face image input by the user, and transmits the input face image to the retrieval server 1320.
The retrieval server 1320 (and in particular the processor 1322) then obtains feature candidates from the predetermined index stored in the index server 1321. Wherein one feature candidate is determined based on the features determined from one sample face image (e.g., registered face image) with reference to fig. 5 to 11. Feature candidates are determined, for example, by linking features determined from corresponding sample face images. Further, in one implementation, the predetermined index includes feature candidates and hyperlinks, where one hyperlink corresponds to one sample face image from which the corresponding feature candidate is determined. In another implementation, the predetermined index includes feature candidates and corresponding sample facial images.
After obtaining the corresponding feature candidates, the retrieval server 1320 (particularly, the processor 1322) determines, for each of the sample face images corresponding to the obtained feature candidates, whether the sample face image is similar to the input face image. In one implementation, first, the retrieval server 1320 calculates a similarity measure between the obtained features of the input face image and the obtained feature candidates corresponding to the sample face image. Wherein the similarity measure is calculated as a cosine distance, a euclidean distance or a mahalanobis distance, for example. Then, in the case where the similarity measure is greater than or equal to a predefined threshold (e.g., TH 3), the retrieval server 1320 determines that the sample face image is similar to the input face image. Otherwise, the retrieval server 1320 determines that the sample face image is not similar to the input face image. Alternatively, the similarity determination is performed by the search client 1310, rather than by the search server 1320. In other words, in this case, the retrieval server 1320 obtains only feature candidates and transmits the corresponding feature candidates to the retrieval client 1310.
In addition, as an exemplary application, the top N sample face images determined to be similar to the input face image are output to the user, where N is an integer greater than or equal to 1.
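A sketch of this retrieval step, assuming the predetermined index can be iterated as (sample_id, feature_vector) pairs and that cosine similarity is used; the index layout, TH3 = 0.6 and N = 5 are illustrative assumptions rather than details from the patent.

```python
import numpy as np

def retrieve_top_n(query_feature, index, n=5, th3=0.6):
    """Rank indexed sample images by cosine similarity to the query feature
    and return the top-N sample ids whose similarity reaches TH3.
    `index` is assumed to be an iterable of (sample_id, feature_vector) pairs."""
    q = np.asarray(query_feature, dtype=np.float64)
    q = q / (np.linalg.norm(q) + 1e-12)
    scored = []
    for sample_id, feat in index:
        f = np.asarray(feat, dtype=np.float64)
        sim = float(np.dot(q, f / (np.linalg.norm(f) + 1e-12)))
        if sim >= th3:
            scored.append((sim, sample_id))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [sample_id for _, sample_id in scored[:n]]
```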
As described above, during the recognition process, the FID technique is generally used to recognize whether or not the face image of the person matches the registered face image of one of the registered persons. As an exemplary application of the above-described image processing with reference to fig. 5 to 11, an exemplary image processing system (e.g., a door control system, a payment system) will be described next with reference to fig. 14.
As shown in fig. 14, the image processing system 1400 includes a payment client 1410 (or gate control client 1420) and an identification server 1430. Wherein the recognition server 1430 includes a server 1431 and a processor 1432. In the case where the image processing system 1400 is used as a payment system, the server 1431 stores a face image of a registered person. For example, one registered person corresponds to one Identification (ID) number. Also, in the case where the image processing system 1400 is used as a gate control system, the server 1431 stores a predetermined index. Wherein the predetermined index stored in server 1431 is similar to the predetermined index stored in index server 1321. Alternatively, the server 1431 may be replaced by an external server. That is, the face image or the predetermined index of the registered person may also be stored in an external server instead of the internal server of the recognition server 1430.
Further, in one implementation, the payment client 1410, the gate control client 1420, and the identification server 1430 are connected to each other via a system bus. In another implementation, the payment client 1410, the gate control client 1420, and the identification server 1430 are connected to each other via a network. In addition, the payment client 1410, the gate control client 1420, and the identification server 1430 may be implemented via the same electronic device (e.g., computer, PDA, mobile phone, camera). Alternatively, the payment client 1410, the gate control client 1420, and the identification server 1430 may also be implemented via different electronic devices.
As shown in fig. 14, in the case where the image processing system 1400 is used as a payment system, on one hand, the payment client 1410 acquires a face image input by a registered person (i.e., an input face image). Then, the payment client 1410 obtains the features of the input face image by linking the features determined, with reference to fig. 5 to 11, from the region pairs in the input face image. Alternatively, the features of the input face image are obtained by the recognition server 1430 instead of the payment client 1410. In other words, in this case, the payment client 1410 only receives the face image input by the registered person and transmits the input face image to the recognition server 1430. On the other hand, the payment client 1410 obtains an ID number entered by the registered person and transmits the ID number to the recognition server 1430.
After acquiring the ID number, the recognition server 1430 (particularly the processor 1432) acquires the registered face image of the registered person from the image processing system 1400 (particularly the server 1431). For example, the recognition server 1430 acquires the corresponding registered face image from the server 1431 by comparing the acquired ID number with the ID number corresponding to the registered face image. Then, the recognition server 1430 (particularly the processor 1432) obtains the features of the registered face image of the registered person by linking the determined features from the region pairs in the registered face image of the registered person with reference to fig. 5 to 11.
After obtaining the features of the input face image and the features of the registered face image of the registered person, the recognition server 1430 (particularly the processor 1432) recognizes whether the face in the input face image belongs to the face of the registered person in the registered face image. In one implementation, first, the recognition server 1430 calculates a similarity measure between features of the input face image and features of the registered face image of the registered person. Wherein the similarity measure is calculated as a cosine distance, a euclidean distance or a mahalanobis distance, for example. Then, in the case where the similarity measure is greater than or equal to a predefined threshold (e.g., TH 4), the recognition server 1430 determines that the face in the input face image belongs to the face of the registered person in the registered face image. Otherwise, the recognition server 1430 determines that the face in the input face image does not belong to the face of the registered person in the registered face image. Alternatively, the similarity determination is performed by payment client 1410, rather than by identification server 1430. In other words, in this case, the recognition server 1430 obtains only the features of the registered face image of the registered person and transmits the corresponding features to the payment client 1410.
In addition, as an exemplary application, in the case where a face in an input face image is determined to belong to a face of a registered person in a registered face image, the registered person may perform a payment activity via the image processing system 1400.
Further, as shown in fig. 14, in the case where the image processing system 1400 is used as a gate control system, the corresponding processing performed by the image processing system 1400 is similar to that performed by the image processing system 1300, so the detailed description is not repeated here. The only difference is that the gate will be opened for the user only when the image processing system 1400 can determine one sample face image that is most similar to the face image (i.e., the input face image) captured by the gate control client 1420.
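A minimal sketch of this most-similar-sample search is given below; the gallery data structure and the acceptance threshold are assumptions made for illustration, not details taken from the patent.

```python
import numpy as np

def find_most_similar(query_feat, gallery, threshold=0.6):
    """Return the ID of the most similar registered sample, or None.

    gallery: iterable of (person_id, feature_vector) pairs; both the data
    layout and the threshold value are assumed for this sketch.
    """
    best_id, best_score = None, -1.0
    for person_id, feat in gallery:
        score = float(np.dot(query_feat, feat) /
                      (np.linalg.norm(query_feat) * np.linalg.norm(feat) + 1e-12))
        if score > best_score:
            best_id, best_score = person_id, score
    # Open the gate only if the best match is confident enough.
    return best_id if best_score >= threshold else None
```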
As described above, the features determined according to the present invention will be more accurate. Therefore, the accuracy of the HIR process and FID process will also be improved.
In addition, as another application of the image processing described above with reference to fig. 5 to 11, the determined features may also be used for human attribute recognition (HAR) processing. More specifically, first, the corresponding features are determined from the input face image with reference to fig. 5 to 11. Then, the corresponding attributes of the person in the input face image are determined based on a classifier and the determined features. Here, for example, the attributes of a person include the age group of the person (e.g., elderly, adult, child), the race of the person (e.g., Caucasian, Asian, African), the sex of the person (e.g., male, female), and the like.
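As an illustration of this step, a minimal sketch using a linear SVM from scikit-learn as the classifier is shown below; the choice of classifier, the feature dimensionality, and the training data are assumptions, since the text does not name a specific classifier.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder training data: X holds feature vectors obtained as described
# above (one row per face image); y holds attribute labels, e.g. the sex.
X = np.random.rand(100, 236)
y = np.random.choice(["male", "female"], size=100)

# Train one classifier per attribute of interest.
sex_classifier = LinearSVC()
sex_classifier.fit(X, y)

# Predict the attribute for a new face whose features were determined
# from its region pairs in the same way.
new_feature = np.random.rand(1, 236)
predicted_sex = sex_classifier.predict(new_feature)[0]
```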
All of the units described above are exemplary and/or preferred modules for implementing the processes described in this disclosure. These units may be hardware units (e.g., Field Programmable Gate Arrays (FPGAs), digital signal processors, application specific integrated circuits, etc.) and/or software modules (e.g., computer-readable programs). The units for implementing the respective steps are not described exhaustively above; however, where there is a step of executing a certain process, there may be a corresponding functional module or unit (realized by hardware and/or software) for realizing the same process. Technical solutions defined by all combinations of the described steps and of the units corresponding to these steps are included in the disclosure of the present application, as long as they constitute a complete and applicable technical solution.
The method and apparatus of the present application can be implemented in a number of ways. For example, the methods and apparatus of the present application may be implemented by software, hardware, firmware, or any combination thereof. The above-described sequence of steps of the method is intended to be illustrative only, and the steps of the method of the present application are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present application may also be implemented as a program recorded in a recording medium including machine-readable instructions for implementing the method according to the present application. Therefore, the present application also covers a recording medium storing a program for implementing the method according to the present application.
While certain specific embodiments of the present invention have been illustrated in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are intended to be illustrative only and not to limit the scope of the invention. It will be appreciated by those skilled in the art that modifications may be made to the embodiments described above without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (15)

1. An image processing apparatus comprising:
an image acquisition unit configured to acquire a face image;
a region pair determining unit configured to determine at least one region pair from the face image, wherein regions in the region pair are symmetrical to each other about one of the symmetry lines within the face; and
a feature determining unit configured to determine, from one of the regions in the pair of regions, a feature based on a first direction and a value of a pixel, and determine, from the other of the regions in the pair of regions, a feature based on a second direction and a value of a pixel, wherein the first direction and the second direction are symmetrical to each other about a symmetry line of the pair of regions.
2. The apparatus of claim 1, wherein the line of symmetry within the face comprises at least one of: the symmetry line of the face, the symmetry line of the components of the face.
3. The apparatus according to claim 2, wherein the region pair determining unit determines the region pair based on a feature point pair determined from a component of the face, wherein feature points in the feature point pair are symmetrical to each other about a symmetry line of the region pair.
4. The apparatus according to claim 1, wherein, for any one of the pair of regions, the feature determining unit determines the feature based on a frequency of occurrence of a first value of the pixel and a frequency of occurrence of a second value of the pixel;
wherein, for any one of the pixels in said one of the regions, a corresponding first value is determined based on the first direction and a difference value, the difference value being determined based on the gray level of the pixel and the gray level of neighboring pixels of the pixel;
wherein for any one of the pixels in the other one of the regions, a corresponding second value is determined based on the second direction and a difference value, the difference value being determined based on the gray level of the pixel and the gray level of the neighboring pixels of the pixel.
5. The apparatus of claim 4, wherein the first and second values are intensity values of pixels or angle values of pixels.
6. The apparatus according to claim 4, wherein the feature determining unit further obtains at least one pair of sub-regions from at least one of the pairs of regions, wherein the sub-regions of the pair of sub-regions are symmetrical to each other about a symmetry line of the corresponding pair of regions.
7. The apparatus according to claim 1, wherein, for at least one of the pair of regions, in a case where an occlusion region is determined from the pair of regions, the feature determining unit determines a feature from the other of the pair of regions based on the first direction or the second direction, and regards the determined feature as a corresponding feature of the occlusion region.
8. The apparatus of claim 1, wherein the first direction is a clockwise direction and the second direction is a counterclockwise direction.
9. The apparatus of claim 1, wherein the apparatus further comprises:
an occlusion judgment unit configured to judge, for at least one of the pair of regions, whether occlusion occurs between the regions, based on a similarity measure between features determined from the regions.
10. An image processing method, comprising:
an image acquisition step of acquiring a face image;
a region pair determining step of determining at least one region pair from the face image, wherein regions in the region pair are symmetrical to each other about one of the symmetry lines within the face; and
a feature determining step of determining, from one of the regions in the pair of regions, a feature based on a first direction and a value of a pixel, and determining, from the other of the regions in the pair of regions, a feature based on a second direction and a value of a pixel, wherein the first direction and the second direction are symmetrical to each other about a symmetry line of the pair of regions.
11. The method according to claim 10, wherein, for any one of the pair of regions, in the feature determining step, a feature is determined based on the frequency of occurrence of the first value of the pixel and the frequency of occurrence of the second value of the pixel;
wherein, for any one of the pixels in one of the regions, a corresponding first value is determined based on the first direction and a difference value, the difference value being determined based on the gray level of the pixel and the gray level of the neighboring pixels of the pixel;
wherein for any one of the pixels in the other of the regions, a corresponding second value is determined based on the second direction and a difference value, the difference value being determined based on the gray level of the pixel and the gray level of the neighboring pixels of the pixel.
12. The method according to claim 10, wherein for at least one of the pairs of regions, in case an occlusion region is determined from the pair of regions, in the feature determining step, a feature is determined from the other of the pair of regions based on the first direction or the second direction, and the feature is regarded as a corresponding feature of the occlusion region.
13. The method of claim 10, wherein the method further comprises:
judging, for at least one of the pairs of regions, whether or not occlusion occurs between the regions, based on a similarity measure between features determined from the regions.
14. An image processing system, comprising:
a first image processing apparatus configured to obtain features based on features determined from the input face image according to any one of claims 1 to 8;
a second image processing apparatus configured to obtain at least one feature candidate from a predetermined index including at least feature candidates, wherein one of the feature candidates is determined based on the feature determined from one of the sample face images according to any one of claims 1 to 8; and
a third image processing apparatus configured to determine, for at least one of the sample face images, whether the sample face image is similar to the input face image, based on a similarity measure between the obtained feature and the obtained feature candidate corresponding to the sample face image.
15. An image processing system, comprising:
a first image processing apparatus configured to acquire a face image of a person from the image processing system;
a second image processing apparatus configured to obtain a first feature based on the features determined from the acquired face image according to any one of claims 1 to 8, and to obtain a second feature based on the features determined from the input face image according to any one of claims 1 to 8; and
a third image processing apparatus configured to identify, based on a similarity measure between the first feature and the second feature, whether or not the face in the input face image belongs to the face of the person in the acquired face image.
CN201710051462.XA 2017-01-20 2017-01-20 Image processing apparatus and method, and image processing system Active CN108334804B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710051462.XA CN108334804B (en) 2017-01-20 2017-01-20 Image processing apparatus and method, and image processing system

Publications (2)

Publication Number Publication Date
CN108334804A CN108334804A (en) 2018-07-27
CN108334804B true CN108334804B (en) 2023-10-31

Family

ID=62922350

Country Status (1)

Country Link
CN (1) CN108334804B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961436B (en) * 2019-04-04 2021-05-18 北京大学口腔医学院 Median sagittal plane construction method based on artificial neural network model

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678315A (en) * 2012-08-31 2014-03-26 富士通株式会社 Image processing device, image processing method and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011130169A (en) * 2009-12-17 2011-06-30 Sanyo Electric Co Ltd Image processing apparatus and photographing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A face detection method based on the gray-level distribution of the face core region (一种基于人脸核心区域灰度分布的人脸检测方法); Xia Liang et al.; Computer Era (《计算机时代》); Dec. 15, 2007 (No. 12); full text *

Similar Documents

Publication Publication Date Title
US10318797B2 (en) Image processing apparatus and image processing method
WO2021203863A1 (en) Artificial intelligence-based object detection method and apparatus, device, and storage medium
Dahmani et al. User-independent system for sign language finger spelling recognition
CN108229297B (en) Face recognition method and device, electronic equipment and computer storage medium
US10783357B2 (en) Apparatus and method for recognizing expression of a face, image processing apparatus and system
EP4099217A1 (en) Image processing model training method and apparatus, device, and storage medium
JP4894741B2 (en) Information processing apparatus, information processing method, program, and recording medium
Zhou et al. Semi-supervised salient object detection using a linear feedback control system model
US20120027270A1 (en) Method and System of Person Identification by Facial Image
US20200012887A1 (en) Attribute recognition apparatus and method, and storage medium
CN110569731A (en) face recognition method and device and electronic equipment
US20220237943A1 (en) Method and apparatus for adjusting cabin environment
US11908157B2 (en) Image processing device, image processing method, and recording medium in which program is stored
CN106471440A (en) Eye tracking based on efficient forest sensing
US11887331B2 (en) Information processing apparatus, control method, and non-transitory storage medium
US20210073590A1 (en) Method Apparatus and System for Generating a Neural Network and Storage Medium Storing Instructions
US11170512B2 (en) Image processing apparatus and method, and image processing system
Ghebleh et al. Clothing-invariant human gait recognition using an adaptive outlier detection method
US11386702B2 (en) Recognition apparatus and method
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices
Alshaikhli et al. Face-Fake-Net: The Deep Learning Method for Image Face Anti-Spoofing Detection: Paper ID 45
CN109753859B (en) Device and method for detecting human body component in image and image processing system
CN108334804B (en) Image processing apparatus and method, and image processing system
KR102246471B1 (en) Apparatus for detecting nose of animal in image and method thereof
US20200167587A1 (en) Detection apparatus and method and image processing apparatus and system, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant