CN111767914A - Target object detection device and method, image processing system, and storage medium
- Publication number
- CN111767914A (Application No. CN201910255843.9A)
- Authority
- CN
- China
- Prior art keywords
- target object
- detection
- candidate
- geometric information
- generated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V2201/07—Target detection (indexing scheme relating to image or video recognition or understanding)
Abstract
The present disclosure discloses a target object detection device and method, an image processing system, and a storage medium. The target object detection device includes: a unit that extracts features from an image; a unit that generates, on the image, candidate detection regions having pre-generated geometric information based on the extracted features and the pre-generated geometric information, wherein the pre-generated geometric information is capable of describing at least the overall shape of a target object; a unit that detects candidate target objects in the image from the generated candidate detection regions based on the extracted features; and a unit that determines the target object in the image based on the detected candidate target objects. According to the disclosed method and device, the detection accuracy of the target object can be effectively improved, that is, the recall rate of the target object and the accuracy of its positioning can be effectively improved.
Description
Technical Field
The present invention relates to image processing, and more particularly, for example, to the detection of a target object in an image.
Background
Detecting a target object (such as a human face or a human body) in an image or a video is of great value to subsequent image processing applications (such as face recognition, person tracking, and people counting). The current conventional approach to detecting a target object is to scan the image in a sliding-window manner using a plurality of pre-generated regular rectangular detection windows (i.e., rectangular detection regions), thereby detecting the target object in the image.
For example, an exemplary technique for detecting a target object using a neural network is disclosed in U.S. Patent No. US9858496B2. The neural network used in this exemplary technique includes a region proposal network (RPN) layer, and the RPN layer uses a plurality of rectangular detection regions with fixed aspect ratios and fixed scales. The exemplary technique mainly operates as follows: first, the features extracted from an input image are scanned by the RPN layer in a sliding-window manner, and the scanning results are projected onto the input image to obtain a plurality of candidate rectangular detection regions; then, classification and positioning operations are performed on the target object using these candidate rectangular detection regions to determine the final target object.
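For readers less familiar with this prior-art layout, the following is a minimal, illustrative sketch (not taken from the cited patent) of how fixed-aspect-ratio, fixed-scale rectangular anchors are typically laid out over a feature map and projected back onto the input image; the stride, scales, and ratios are assumed values.

```python
# Illustrative sketch of a conventional anchor layout (not the cited patent's code):
# fixed-ratio, fixed-scale rectangles are centred at every sliding-window position
# of the feature map and projected onto the image via an assumed stride.
import itertools

def generate_rectangular_anchors(feat_h, feat_w, stride=16,
                                 scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return a list of (x1, y1, x2, y2) anchors in image coordinates."""
    anchors = []
    for fy, fx in itertools.product(range(feat_h), range(feat_w)):
        # Centre of the sliding-window position, projected onto the image.
        cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride
        for scale, ratio in itertools.product(scales, ratios):
            w = scale * (ratio ** 0.5)
            h = scale / (ratio ** 0.5)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```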
As described above, the method described above uses pre-generated detection regions consisting of a plurality of regular rectangular regions, so it can effectively detect target objects presented in a regular form, that is, target objects having a regular shape. However, in many scenes the target object is often occluded (e.g., partially blocked) and/or deformed (e.g., the pose of the target object changes, or the capture angle of the image changes). That is, in many scenes the target object is presented in a local form and/or a deformed form, so that it presents only a local shape. Since a rectangular detection region cannot describe the local shape of the target object well, a target object having only a local shape cannot be detected well from the image by the above method, and the detection accuracy of the target object is affected. In other words, the recall rate of the target object and the accuracy of its positioning are affected.
Disclosure of Invention
In view of the above description in the background, the present disclosure is directed to solving at least one of the problems set forth above.
According to an aspect of the present disclosure, there is provided a target object detection apparatus including: an extraction unit that extracts features from an image; a generation unit that generates, on the image, candidate detection regions having pre-generated geometric information based on the extracted features and the pre-generated geometric information, the pre-generated geometric information being capable of describing at least the overall shape of a target object; a detection unit that detects candidate target objects in the image from the generated candidate detection regions based on the extracted features; and a determination unit that determines the target object in the image based on the detected candidate target objects.
According to another aspect of the present disclosure, there is provided a target object detection method including: an extraction step of extracting features from an image; a generation step of generating, on the image, candidate detection regions having pre-generated geometric information based on the extracted features and the pre-generated geometric information, wherein the pre-generated geometric information is capable of describing at least the overall shape of a target object; a detection step of detecting candidate target objects in the image from the generated candidate detection regions based on the extracted features; and a determination step of determining the target object in the image based on the detected candidate target objects.
Wherein the pre-generated geometric information is further capable of describing a local shape of the target object. Wherein the pre-generated geometric information is constituted by, for example, a bitmap or key points of the target object.
According to still another aspect of the present disclosure, there is provided an image processing system including: an acquisition device for acquiring an image or video; the target object detection apparatus as described above, for detecting a face in the acquired image or video; and post-processing means for performing a subsequent image processing operation based on the detected face; wherein the acquisition device, the target object detection device, and the post-processing device are connected to each other via a network.
According to yet another aspect of the present disclosure, there is provided a storage medium having stored thereon instructions that, when executed by a processor, cause a target object detection method to be performed, the target object detection method comprising: an extraction step of extracting features from an image; a generation step of generating, on the image, candidate detection regions having pre-generated geometric information based on the extracted features and the pre-generated geometric information, wherein the pre-generated geometric information is capable of describing at least the overall shape of a target object; a detection step of detecting candidate target objects in the image from the generated candidate detection regions based on the extracted features; and a determination step of determining the target object in the image based on the detected candidate target objects. Wherein the pre-generated geometric information is further capable of describing a local shape of the target object. Wherein the pre-generated geometric information is constituted by, for example, a bitmap or key points of the target object.
In the present disclosure, when generating the candidate detection regions, geometric information that can describe the shape of the target object is used, so that candidate detection regions carrying information that describes the shape of the target object can be generated. The described shape may be the overall shape of the target object or a local shape of the target object. Therefore, the present disclosure can effectively detect the target object in the corresponding scene from the image, regardless of whether the target object is presented in its overall form or in a local and/or deformed form. Therefore, according to the present disclosure, the detection accuracy of the target object, that is, the recall rate of the target object and the accuracy of its positioning, can be effectively improved.
Other features and advantages of the present disclosure will become apparent from the following description of exemplary embodiments, which refers to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the disclosure and together with the description of the embodiments, serve to explain the principles of the disclosure.
Fig. 1A to 1C schematically illustrate examples of occluded faces according to the present disclosure.
Fig. 2 schematically shows an example of a partially occluded human body according to the present disclosure.
Fig. 3A to 3D schematically show examples of geometric information capable of describing the overall/local shape of a target object according to the present disclosure.
Fig. 4 is a block diagram schematically illustrating a hardware configuration in which the technology according to the embodiment of the present disclosure can be implemented.
Fig. 5 is a block diagram illustrating the configuration of a target object detection apparatus according to an embodiment of the present disclosure.
Fig. 6 schematically illustrates a schematic structure of a pre-generated neural network that may be used with embodiments of the present disclosure.
Fig. 7 schematically illustrates a flow chart of a target object detection method according to an embodiment of the present disclosure.
Fig. 8A to 8C schematically show examples of a detection region candidate generated, a detected target object candidate, and a determined target object according to an embodiment of the present disclosure.
Fig. 9A to 9E schematically show examples of determination of the degree of overlap between two candidate target objects according to an embodiment of the present disclosure.
Fig. 10 shows an arrangement of an exemplary image processing system according to the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. It should be noted that the following description is merely illustrative and exemplary in nature and is in no way intended to limit the disclosure, its application, or uses. The relative arrangement of components and steps, numerical expressions, and numerical values set forth in the embodiments do not limit the scope of the present disclosure unless specifically stated otherwise. Additionally, techniques, methods, and apparatus known to those skilled in the art may not be discussed in detail, but are intended to be part of the present specification where appropriate.
Note that like reference numerals and letters refer to like items in the drawings, and thus, once an item is defined in a drawing, it is not necessary to discuss it in the following drawings.
As described above, in many scenes some target objects are occluded and/or deformed and are therefore presented in a local form and/or a deformed form, so that they present only a local shape. The inventors have found that, on the one hand, a given kind of target object that presents a local shape will usually present only a certain small number of local or deformed forms. For example, in the case where the target object is a human face, the local forms it presents are typically that the eyes are blocked by a foreign object (for example, as shown in Fig. 1A), the mouth is blocked by a mask (for example, as shown in Fig. 1B), or the face is partially blocked by another person (for example, the person 110 as shown in Fig. 1C). In the case where the target object is a human body, the local form it presents is typically that the body is partially occluded by others (e.g., the person 210 as shown in Fig. 2). Therefore, the inventors consider that, for a given kind of target object, geometric information capable of describing its local shape can be obtained by a clustering or statistical method based on samples of that kind of target object presented in local and/or deformed form under the detection view angle, where the local shape of the target object is annotated in those samples. In addition, some target objects are presented in their overall form and thus present their overall shape; for such target objects, geometric information capable of describing the overall shape can likewise be obtained by a clustering or statistical method based on samples of that kind of target object presented in its overall form under the detection view angle, where the overall shape of the target object is annotated in those samples. In this way, geometric information that can describe the overall/local shape of such target objects, i.e., the "pre-generated geometric information" used in the present disclosure, can be obtained.
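As an illustration of the clustering idea described above, the sketch below assumes the annotated local/overall shapes are available as binary foreground masks and applies a plain k-means over masks resized to a common grid; the function name, grid size, and number of prototypes are hypothetical choices, not values from the disclosure.

```python
# Hypothetical sketch: cluster annotated binary shape masks into a small set of
# prototype bitmaps, each of which can serve as one piece of pre-generated
# geometric information. Names and parameters are illustrative assumptions.
import numpy as np

def cluster_shape_prototypes(masks, n_prototypes=8, grid=(32, 32), iters=20, seed=0):
    """masks: list of 2-D binary arrays; returns (n_prototypes, *grid) binary prototypes."""
    rng = np.random.default_rng(seed)
    # Nearest-neighbour resize of every mask to the common grid, then flatten.
    X = np.stack([
        m[np.linspace(0, m.shape[0] - 1, grid[0]).astype(int)]
         [:, np.linspace(0, m.shape[1] - 1, grid[1]).astype(int)]
         .astype(float).ravel()
        for m in masks])
    centres = X[rng.choice(len(X), n_prototypes, replace=False)]
    for _ in range(iters):                      # plain k-means iterations
        assign = np.argmin(((X[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        for k in range(n_prototypes):
            if (assign == k).any():
                centres[k] = X[assign == k].mean(axis=0)
    # Threshold cluster centres back into binary prototype bitmaps.
    return (centres.reshape(n_prototypes, *grid) > 0.5).astype(np.uint8)
```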
On the other hand, whether a target object presents its overall shape or a local shape, the inventors believe that, in one way, where the shape of the target object itself is rectangular, the geometric information that can describe the shape of such a target object may be constituted, for example, directly by rectangular areas of different sizes (as shown, for example, in Fig. 3A). In another way, the geometric information that can describe the shape of the target object may be constituted, for example, by a bitmap that projects the foreground contour of the target object onto rectangular areas of different sizes. For example, for a target object as shown in Fig. 1A, the corresponding geometric information is, for example, the bitmap shown in Fig. 3B; for a target object as shown in Fig. 2, the corresponding geometric information is, for example, the bitmap shown in Fig. 3C. In yet another way, the geometric information that can describe the shape of the target object may be constituted, for example, by a set of key points (landmarks) that can describe the shape contour or apparent structure of the target object within rectangular regions of different sizes. For example, for a target object as shown in Fig. 1C, the corresponding geometric information is, for example, the set of key points shown in Fig. 3D.
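A minimal sketch of how the three forms of geometric information described above (rectangle, bitmap, key points) might be held together in code; the class and field names are assumptions for illustration only.

```python
# Illustrative container for one piece of pre-generated geometric information.
# All names here are hypothetical, not taken from the disclosure.
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple
import numpy as np

@dataclass
class GeometricInfo:
    box: Tuple[float, float, float, float]                      # enclosing rectangle (x1, y1, x2, y2)
    bitmap: Optional[np.ndarray] = None                         # binary foreground projection, or None
    keypoints: Optional[Sequence[Tuple[float, float]]] = None   # landmark set, or None

# For instance, a face whose eyes are occluded (Fig. 1A / Fig. 3B) could be described
# by a rectangle plus a bitmap covering only the visible lower part of the face.
```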
Thus, the inventors consider that the present disclosure can generate the respective candidate detection regions by using geometric information obtained and constituted in the above-described manner, so as to detect the respective target objects from the image. As described above, since such geometric information can describe both the overall shape and the local shape of the target object, the present disclosure can effectively detect the target object in the corresponding scene from the image, regardless of whether the target object is presented in its overall form or in a local and/or deformed form. Therefore, according to the present disclosure, the detection accuracy of the target object, that is, the recall rate of the target object and the accuracy of its positioning, can be effectively improved.
(hardware construction)
A hardware configuration that can implement the technique described hereinafter will be described first with reference to fig. 4.
The hardware configuration 400 includes, for example, a Central Processing Unit (CPU) 410, a Random Access Memory (RAM) 420, a Read Only Memory (ROM) 430, a hard disk 440, an input device 450, an output device 460, a network interface 470, and a system bus 480. Further, in one implementation, hardware configuration 400 may be implemented by a computer, such as a tablet, laptop, desktop, or other suitable electronic device. In another implementation, hardware configuration 400 may be implemented by a monitoring device, such as a digital camera, video camera, web camera, or other suitable electronic device. Where hardware configuration 400 is implemented by a monitoring device, hardware configuration 400 also includes, for example, an optical system 490.
In one implementation, a target object detection apparatus in accordance with the present invention is constructed from hardware or firmware and used as a module or component of hardware configuration 400. For example, a target object detection apparatus 500, which will be described in detail below with reference to fig. 5, may be used as a module or component of the hardware configuration 400. In another implementation, the apparatus for detecting a target object according to the present invention is constructed by software stored in the ROM 430 or the hard disk 440 and executed by the CPU 410. For example, a process 700, which will be described in detail below with reference to fig. 7, may be used as a program stored in the ROM 430 or the hard disk 440.
In one implementation, input device 450 is used to allow a user to interact with hardware configuration 400. In one example, a user may input video/images through input device 450. In another example, a user may trigger a corresponding process of the present invention through input device 450. In addition, input device 450 may take a variety of forms, such as a button, a keyboard, or a touch screen. In another implementation, the input device 450 is used to receive video/images output from specialized electronic devices such as digital cameras, video cameras, and/or web cameras. Additionally, where hardware configuration 400 is implemented by a monitoring device, optical system 490 in hardware configuration 400 would directly capture video/images of the monitoring site.
In one implementation, the output device 460 is used to display the detection results (such as the detected target object) to the user. Also, the output device 460 may take various forms such as a Cathode Ray Tube (CRT) or a liquid crystal display. In another implementation, the output device 460 is used to output detection results to subsequent image processing such as face recognition, person tracking, people counting, and the like.
The hardware configuration 400 described above is merely illustrative and is in no way intended to limit the present invention, its applications, or uses. Also, only one hardware configuration is shown in FIG. 4 for simplicity. However, a plurality of hardware configurations may be used as necessary.
(target object detecting device and method)
The detection process according to the present invention will be described next with reference to fig. 5 to 9E.
Fig. 5 is a block diagram illustrating the configuration of a target object detection apparatus 500 according to an embodiment of the present disclosure. Wherein some or all of the modules shown in figure 5 may be implemented by dedicated hardware. As shown in fig. 5, the target object detection apparatus 500 includes an extraction unit 510, a generation unit 520, a detection unit 530, and a determination unit 540.
In addition, the storage device 550 shown in fig. 5 stores, for example, at least pre-generated geometric information capable of describing the shape (e.g., overall shape, local shape) of the target object. In one implementation, the storage device 550 is the ROM430 or the hard disk 440 shown in FIG. 4. In another implementation, the storage device 550 is a server or an external storage device connected to the target object detection apparatus 500 via a network (not shown).
First, in one implementation, for example, in a case where the hardware configuration 400 shown in fig. 4 is implemented by a computer, the input device 450 receives an image output from a dedicated electronic device (e.g., a camera or the like) or input by a user. The input device 450 then transmits the received image to the target object detection apparatus 500 via the system bus 480. In another implementation, for example, where the hardware configuration 400 is implemented by a monitoring device, the target object detection apparatus 500 directly uses the image captured by the optical system 490.
Then, as shown in fig. 5, the extraction unit 510 extracts features from the received image (i.e., the entire image). In one implementation, the extraction unit 510 extracts, for example, a deep convolutional feature map from the received image using various feature extraction operators, such as, but clearly not limited to, convolutional neural networks with structures such as VGG16, ResNet, SENet, and the like.
The generation unit 520 generates a candidate detection region having the pre-generated geometric information capable of describing at least the overall shape of the target object on the received image based on the features extracted by the extraction unit 510 and the pre-generated geometric information stored in the storage device 550. For a class of target objects, the geometric information describing the overall shape of the class of target objects may be obtained, for example, by clustering or statistical methods based on samples of the class of target objects in their overall shape from the detection perspective, where the samples are labeled with the overall shape of the class of target objects. As described above, in order to effectively detect the target object presented in the local form and/or the deformed form from the image as well, the pre-generated geometric information can further describe the local shape of the target object. That is, the pre-generated geometric information can describe not only the overall shape of the target object, but also the local shape of the target object. For a class of target objects, geometric information describing the local shape of the class of target objects may be obtained, for example, by clustering or statistical methods based on samples of the class of target objects presented in their local shape and/or deformed shape from the detection perspective, where the local shape of the class of target objects is labeled in the samples. As mentioned above, the pre-generated geometric information may be constituted by a bitmap or key points of the target object, for example.
In one implementation, the generating unit 520 may generate the candidate detection region on the received image, for example, by: the corresponding regions are first determined on the extracted features by pre-generated geometric information obtained from the storage device 550, and then the determined regions are mapped onto the received image to obtain candidate detection regions.
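A hedged sketch of the mapping just described, assuming the usual constant feature stride between the feature map and the received image; the stride value and the function name are illustrative, not specified by the disclosure.

```python
# Illustrative only: project a region determined on the feature map back onto the
# received image using an assumed feature stride.
def map_feature_region_to_image(region, stride=16):
    """region: (x1, y1, x2, y2) in feature-map coordinates -> image coordinates."""
    x1, y1, x2, y2 = region
    return (x1 * stride, y1 * stride, (x2 + 1) * stride, (y2 + 1) * stride)
```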
After generating the candidate detection regions, the detection unit 530 detects the candidate target objects in the received image from the candidate detection regions generated by the generation unit 520 based on the features extracted by the extraction unit 510. Also, the determination unit 540 determines a target object in the received image based on the target object candidate detected by the detection unit 530.
Finally, the determination unit 540 transmits the detection result (e.g., the detected target object) to the output device 460 via the system bus 480 shown in fig. 4 for displaying the detection result to the user or for outputting the detection result to subsequent image processing such as face recognition, person tracking, people counting, and the like.
Furthermore, preferably, in one implementation, each unit (i.e., the extraction unit 510, the generation unit 520, the detection unit 530, and the determination unit 540) in the target object detection apparatus 500 shown in fig. 5 may perform its corresponding operation using a pre-generated neural network. On the one hand, as shown, for example, in fig. 6, a pre-generated neural network that may be used with embodiments of the present disclosure includes, for example, a portion (i.e., a sub-network) for extracting features, a portion for generating candidate detection regions, a portion for detecting candidate target objects, and a portion for determining target objects. In the present disclosure, these portions of the neural network may, for example, be generated in advance based on the above-mentioned pre-generated geometric information capable of describing the overall/local shape of the target object, through end-to-end training and back-propagation updating. On the other hand, the pre-generated neural network may be stored in the storage device 550, for example.
Specifically, in one aspect, the target object detection apparatus 500 retrieves a pre-generated neural network from the storage device 550. On the other hand, the extraction unit 510 extracts features from the received image using a portion for extracting features in the neural network. The generating unit 520 generates candidate detection regions on the received image based on the features extracted by the extracting unit 510 and the pre-generated geometric information, using a portion of the neural network for generating the candidate detection regions. The detection unit 530 detects a candidate target object in the received image from the candidate detection area generated by the generation unit 520 based on the feature extracted by the extraction unit 510, using a portion for detecting a candidate target object in the neural network. The determination unit 540 determines the target object in the received image based on the candidate target object detected by the detection unit 530, using a portion for determining the target object in the neural network.
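The following structural sketch (in PyTorch, assumed here purely for illustration) mirrors the portions named above; the backbone layers, the number of geometric prototypes, and the wiring of the heads are assumptions rather than the patent's specification, and the non-learned determination portion (NMS) is omitted.

```python
# Structural sketch only: four-part detector mirroring the units of Fig. 5/6.
# Layer sizes, backbone, and prototype handling are illustrative assumptions.
import torch
import torch.nn as nn

class TargetObjectDetector(nn.Module):
    def __init__(self, num_prototypes=8, feat_dim=256):
        super().__init__()
        # Part 1: feature extraction (stand-in for a VGG16/ResNet-style backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=4, padding=1), nn.ReLU())
        # Part 2: candidate-region generation, one score map per geometric prototype.
        self.region_head = nn.Conv2d(feat_dim, num_prototypes, 1)
        # Part 3: candidate detection = classification + position regression per prototype.
        self.cls_head = nn.Conv2d(feat_dim, num_prototypes, 1)
        self.reg_head = nn.Conv2d(feat_dim, num_prototypes * 4, 1)

    def forward(self, image):
        feats = self.backbone(image)
        region_scores = self.region_head(feats)   # where each prototype fits on the image
        cls_scores = self.cls_head(feats)         # target / non-target per candidate
        deltas = self.reg_head(feats)              # position refinement per candidate
        return region_scores, cls_scores, deltas   # part 4 (determination/NMS) is non-learned
```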
The flowchart 700 shown in fig. 7 is a corresponding process of the target object detection apparatus 500 shown in fig. 5.
As shown in fig. 7, in the extraction step S710, the extraction unit 510 extracts features from the received image.
In the generating step S720, the generation unit 520 obtains the corresponding pre-generated geometric information from the storage device 550 according to the type of the target object, and generates candidate detection regions having the pre-generated geometric information on the received image based on the extracted features and the obtained pre-generated geometric information, wherein the pre-generated geometric information can describe at least the overall shape of the target object. Thus, in one implementation, the candidate detection regions generated by the generation unit 520 include detection regions having the overall shape of the target object. In another implementation, the pre-generated geometric information can describe not only the overall shape of the target object but also its local shape. In this case, the candidate detection regions generated by the generation unit 520 include detection regions having the overall shape of the target object (which may be referred to as "first candidate detection regions", for example) and detection regions having a local shape of the target object (which may be referred to as "second candidate detection regions", for example). For example, taking the target object shown in fig. 1C as an example, after the generating step S720 the generated first candidate detection regions are, for example, the regions 811 to 813 drawn with solid lines in fig. 8A, and the generated second candidate detection regions are, for example, the regions 814 to 817 drawn with dotted lines in fig. 8A, where the pre-generated geometric information used is, for example, constituted by key points of the target object (shown as dots in fig. 8A).
After the candidate detection regions are generated, in the detection step S730 the detection unit 530 detects candidate target objects in the received image from the generated candidate detection regions based on the extracted features. In one implementation, on the one hand, the detection unit 530 performs a classification operation on the generated candidate detection regions based on the geometric information included in them. For example, the detection unit 530 performs a discriminative classification of the target object for each candidate detection region by means of a pre-generated classifier or the above-described pre-generated neural network: a candidate detection region that contains the overall shape or a local shape of the target object is determined to be a "candidate target object", while a candidate detection region that contains neither is determined to be a "non-target-object candidate". On the other hand, the detection unit 530 performs a positioning operation on the generated candidate detection regions based on the extracted features and the geometric information included in them. For example, the detection unit 530 performs regression processing on each candidate detection region based on the extracted features by means of a pre-generated regressor or the above-described pre-generated neural network to obtain the final position information of each candidate detection region, where the geometric information in the generated candidate detection region can be regarded as its initial position information. Thus, for a "candidate target object" obtained by the classification operation, its final position information can also be obtained by the positioning operation. Further, in the case where the pre-generated geometric information used is constituted by key points of the target object, the final position information may be obtained by performing a regression operation on the key points in the generated candidate detection regions. For example, taking the target object shown in fig. 1C as an example, after the detection step S730 the detected candidate target objects are, for example, the regions drawn with solid and dashed lines in fig. 8B.
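As an illustration of the positioning operation, the sketch below refines a candidate's initial box, taken from its geometric information, with predicted regression deltas in the common (dx, dy, dw, dh) parameterisation; this parameterisation is an assumption, not something stated in the disclosure.

```python
# Illustrative only: refine a candidate's initial box with predicted regression
# deltas in the usual (dx, dy, dw, dh) parameterisation (assumed here).
import math

def apply_box_deltas(box, deltas):
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    cx, cy, w, h = (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1
    cx, cy = cx + dx * w, cy + dy * h            # shift the centre
    w, h = w * math.exp(dw), h * math.exp(dh)    # rescale width and height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```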
Returning to fig. 7, in the determination step S740, the determination unit 540 determines the target object in the received image based on the detected candidate target objects. In one implementation, the determination unit 540 determines the final target object by performing a selection or merging operation on the detected candidate target objects, based on the geometric information included in the candidate target objects, through a Non-Maximum Suppression (NMS) method. For example, the determination unit 540 performs the selection or merging operation on the candidate target objects by determining whether there are candidate target objects that belong to the same target object. Specifically, the determination unit 540 determines the final target object, for example, as follows:
first, for any two candidate target objects, the determination unit 540 calculates the distance between the two candidate target objects based on the geometric information possessed in the two candidate target objects. As described above, the geometric information capable of describing the overall/local shape of the target object used by the present disclosure may be constituted by a bitmap which projects the foreground contour of the target object onto a rectangular region, a set of key points within the rectangular region which may describe the shape contour or apparent structure of the target object, or a rectangular region which may directly describe the shape of the target object itself. Therefore, for any two candidate target objects, the degree of overlap between the geometric information possessed in the two candidate target objects may be used as the distance between the two candidate target objects.
For example, in the case where the pre-generated geometric information used is constituted by bitmaps of the target objects, the degree of overlap between the bitmaps contained in the two candidate target objects may be calculated as the distance between the two candidate target objects. For example, assuming that the bitmaps contained in the two candidate target objects are as shown by the hatched portions in fig. 9A and fig. 9B, respectively, and that the two candidate target objects overlap as shown in fig. 9C, the truly overlapping portion between them is as shown by the hatched portion in fig. 9D, from which it can be seen that the degree of overlap between the two candidate target objects is small (e.g., 3). In contrast, without the present disclosure, the overlapping portion between the two candidate target objects would be taken as shown by the hatched portion in fig. 9E, that is, the degree of overlap between the rectangular regions in which they are located would be used as their degree of overlap; the accuracy of the degree of overlap obtained in that case is low, which affects the determination of whether the two candidate target objects belong to the same target object. As another example, in the case where the pre-generated geometric information used is constituted by key points of the target objects, the degree of overlap between the polygons formed by the key points contained in the two candidate target objects may be calculated as the distance between them. As yet another example, in the case where the pre-generated geometric information used is constituted by rectangular regions that directly describe the shape of the target object itself, the degree of overlap between the rectangular regions contained in the two candidate target objects may be calculated as the distance between them.
Then, after the distances between all the candidate target objects have been calculated, the determination unit 540 merges the candidate target objects belonging to the same target object by the NMS method to obtain the final target objects. For example, for any two candidate target objects, if the distance between them is greater than or equal to a predefined threshold (e.g., TH), the two candidate target objects are judged to belong to the same target object, and therefore only one of them is retained, or the two are merged into one. This operation is repeated until the distances between all remaining candidate target objects are less than TH, and the remaining candidate target objects are then regarded as the final target objects.
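A hedged sketch of the merging described above, in which the overlap fed to NMS is computed between the candidates' bitmaps pasted onto a common canvas rather than between their enclosing rectangles; the data layout, the threshold, and the choice to retain the higher-scoring candidate are illustrative assumptions, not the disclosure's prescribed implementation.

```python
# Illustrative sketch: NMS whose overlap is computed from bitmap geometric
# information rather than from enclosing rectangles.
import numpy as np

def bitmap_iou(cand_a, cand_b, canvas_shape):
    """Each candidate is (box=(x1, y1, x2, y2) ints, bitmap=2-D binary array)."""
    def paste(cand):
        canvas = np.zeros(canvas_shape, dtype=bool)
        (x1, y1, _, _), bm = cand
        canvas[y1:y1 + bm.shape[0], x1:x1 + bm.shape[1]] = bm.astype(bool)
        return canvas
    a, b = paste(cand_a), paste(cand_b)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def nms_by_geometry(candidates, scores, canvas_shape, thresh=0.5):
    """Keep high-scoring candidates; suppress others whose bitmap IoU >= thresh."""
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(bitmap_iou(candidates[i], candidates[j], canvas_shape) < thresh
               for j in keep):
            keep.append(i)
    return keep
```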
As described above, when determining whether two candidate target objects belong to the same target object, the present disclosure determines the degree of overlap between them by using geometric information capable of describing the overall/local shape of the target object, so that a more accurate degree of overlap can be obtained and the detection accuracy of the target object can be further improved. For example, taking the target object shown in fig. 1C as an example, after the determination step S740 the determined final target objects are, for example, the portion in the solid-line region and the portion in the dashed-line region shown in fig. 8C.
Finally, returning to fig. 7, the determination unit 540 transmits the detection result (e.g., the detected target object) to the output device 460 via the system bus 480 shown in fig. 4, for displaying the detection result to the user or for outputting the detection result to subsequent image processing such as face recognition, person tracking, people counting, and the like.
Further, as described in fig. 5, each unit (i.e., the extraction unit 510, the generation unit 520, the detection unit 530, and the determination unit 540) in the target object detection apparatus 500 may perform a corresponding operation using a neural network generated in advance. Therefore, the steps shown in fig. 7 (i.e., the extracting step S710, the generating step S720, the detecting step S730, and the determining step S740) may also perform corresponding operations using the pre-generated neural network.
As described above, the present disclosure can generate the respective candidate detection regions by using geometric information capable of describing the shape of the target object, so as to detect the respective target objects from an image. Since the geometric information used can describe both the overall shape and the local shape of the target object, the present disclosure can effectively detect the target object in the corresponding scene from the image, regardless of whether the target object is presented in its overall form or in a local and/or deformed form. Therefore, according to the present disclosure, the detection accuracy of the target object, that is, the recall rate of the target object and the accuracy of its positioning, can be effectively improved.
(applications)
Further, as described above, the present invention may be implemented by a computer (e.g., a client server). Thus, as an application, taking the case where the present invention is implemented by a client server as an example, fig. 10 shows the arrangement of an exemplary image processing system 1000 according to the present invention. In this application, the image processing system 1000 is used, for example, for face recognition, person tracking, or people counting. As shown in fig. 10, the image processing system 1000 includes an acquisition device 1010 (e.g., at least one web camera), a post-processing device 1020, and the target object detection apparatus 500 shown in fig. 5, wherein the acquisition device 1010, the post-processing device 1020, and the target object detection apparatus 500 are connected to each other via a network 1030. The post-processing device 1020 and the target object detection apparatus 500 may be implemented by the same client server or by different client servers.
As shown in fig. 10, first, the acquisition device 1010 captures an image or video of a place of interest (e.g., a mall entrance, a supermarket entrance, etc.) and transmits the captured image/video to the target object detection device 500 via the network 1030.
The target object detection apparatus 500 detects faces from the captured image/video as described with reference to fig. 5 to 9E. That is, in this application the target object is a face (e.g., a person's face). Accordingly, the geometric information capable of describing the overall/local shape of a face used by the target object detection apparatus 500 is constituted, for example, by a set of face key points (e.g., eye key points, mouth key points, nose-tip key points, etc.) that can describe the shape outline of the face within a rectangular region. Further, in this application, the geometric information capable of describing the overall/local shape of a face can be obtained, for example, based on face samples at various detection angles, such as various types of face samples (e.g., frontal faces, profile faces, etc.), face samples of various sizes (e.g., large faces, small faces, etc.), and occluded face samples (e.g., faces wearing glasses/sunglasses, faces wearing masks, etc.).
The post-processing device 1020 performs subsequent image processing operations, such as face recognition, person tracking, or people counting, based on the detected face.
All of the units described above are exemplary and/or preferred modules for implementing the processes described in this disclosure. These units may be hardware units (such as field programmable gate arrays (FPGAs), digital signal processors, application specific integrated circuits, etc.) and/or software modules (such as computer readable programs). The units for carrying out the individual steps are not described exhaustively above; however, wherever there is a step that performs a specific procedure, there may be a corresponding functional module or unit (implemented by hardware and/or software) that implements that procedure. Technical solutions formed by all combinations of the described steps and of the units corresponding to those steps are included in the disclosure of the present application, as long as the technical solutions they form are complete and applicable.
The method and apparatus of the present invention may be implemented in a variety of ways. For example, the methods and apparatus of the present invention may be implemented in software, hardware, firmware, or any combination thereof. The above-described order of the steps of the method is intended to be illustrative only and the steps of the method of the present invention are not limited to the order specifically described above unless specifically indicated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, which includes machine-readable instructions for implementing a method according to the present invention. Accordingly, the present invention also covers a recording medium storing a program for implementing the method according to the present invention.
While some specific embodiments of the present invention have been shown in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are intended to be illustrative only and are not limiting upon the scope of the invention. It will be appreciated by those skilled in the art that the above-described embodiments may be modified without departing from the scope and spirit of the invention. The scope of the invention is to be limited only by the following claims.
Claims (13)
1. A target object detection apparatus, characterized by comprising:
an extraction unit that extracts a feature from an image;
a generation unit that generates a detection region candidate having pre-generated geometric information on the image based on the extracted features and the pre-generated geometric information, the pre-generated geometric information being capable of describing at least an overall shape of a target object;
a detection unit that detects a target object candidate in the image from the generated detection region candidate based on the extracted features; and
a determination unit that determines a target object in the image based on the detected target object candidates.
2. The target object detection apparatus of claim 1, wherein the pre-generated geometric information further describes a local shape of the target object;
wherein the generated detection region candidates include a detection region having an overall shape of the target object and a detection region having a local shape of the target object.
3. The target object detection apparatus according to claim 1 or 2, wherein, for the geometric information that can describe the overall shape of the target object in the pre-generated geometric information, for a class of target objects, the corresponding geometric information is obtained based on a sample in which the class of target object is presented in its overall form under the detection perspective, wherein the overall shape of the class of target object is noted in the sample.
4. The target object detection apparatus according to claim 2, wherein, for the geometric information that can describe the local shape of the target object in the pre-generated geometric information, for one kind of target object, the corresponding geometric information is obtained based on a sample in which the kind of target object is presented in its local form and/or deformed form under the detection perspective, wherein the local shape of the kind of target object is marked in the sample.
5. The target object detection apparatus according to claim 1 or 2, wherein the pre-generated geometric information is constituted by a bitmap or key points of the target object.
6. The target object detection device according to claim 1 or 2, wherein the detection unit detects the target object candidate in the image by performing a classification and localization operation on the generated candidate detection regions based on the extracted features and geometric information possessed therein.
7. The target object detection device according to claim 1 or 2, wherein the determination unit determines the target object in the image by performing a selection or merging operation on the detected candidate target objects based on a distance between the detected candidate target objects;
wherein the distance between any two candidate target objects is obtained by the geometric information of the two candidate target objects.
8. The target object detection apparatus according to claim 1, wherein the extraction unit, the generation unit, the detection unit, and the determination unit perform respective operations using a pre-generated neural network.
9. A target object detection method, characterized by comprising:
an extraction step of extracting features from the image;
a generation step of generating a candidate detection region having pre-generated geometric information on the image based on the extracted features and the pre-generated geometric information, wherein the pre-generated geometric information can describe at least an overall shape of a target object;
a detection step of detecting a target object candidate in the image from the generated detection region candidate based on the extracted features; and
a determination step of determining a target object in the image based on the detected target object candidates.
10. The target object detection method of claim 9, wherein the pre-generated geometric information further describes a local shape of the target object;
wherein the generated detection region candidates include a detection region having an overall shape of the target object and a detection region having a local shape of the target object.
11. The target object detection method according to claim 9, wherein in the extraction step, the generation step, the detection step, and the determination step, respective operations are performed using a neural network generated in advance.
12. An image processing system, characterized in that the image processing system comprises:
an acquisition device for acquiring an image or video;
the target object detection apparatus according to any one of claims 1 to 8, for detecting a face in the acquired image or video; and
a post-processing device that performs a subsequent image processing operation based on the detected face;
wherein the acquisition device, the target object detection device, and the post-processing device are connected to each other via a network.
13. A storage medium having stored thereon instructions that, when executed by a processor, cause a target object detection method to be performed, the target object detection method comprising:
an extraction step of extracting features from the image;
a generation step of generating a candidate detection region having pre-generated geometric information on the image based on the extracted features and the pre-generated geometric information, wherein the pre-generated geometric information can describe at least an overall shape of a target object;
a detection step of detecting a target object candidate in the image from the generated detection region candidate based on the extracted features; and
a determination step of determining a target object in the image based on the detected target object candidates.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910255843.9A CN111767914A (en) | 2019-04-01 | 2019-04-01 | Target object detection device and method, image processing system, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111767914A (en) | 2020-10-13 |
Family
ID=72718873
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910255843.9A Pending CN111767914A (en) | 2019-04-01 | 2019-04-01 | Target object detection device and method, image processing system, and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111767914A (en) |
- 2019-04-01: CN application CN201910255843.9A filed, published as CN111767914A (en); status: active, Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002092592A (en) * | 2000-09-12 | 2002-03-29 | Toko Electric Corp | System and method for automatic patrol |
CN103679128A (en) * | 2012-09-24 | 2014-03-26 | 中国航天科工集团第二研究院二O七所 | Anti-cloud-interference airplane target detection method |
CN107622277A (en) * | 2017-08-28 | 2018-01-23 | 广东工业大学 | A kind of complex-curved defect classification method based on Bayes classifier |
CN108596952A (en) * | 2018-04-19 | 2018-09-28 | 中国电子科技集团公司第五十四研究所 | Fast deep based on candidate region screening learns Remote Sensing Target detection method |
CN109242848A (en) * | 2018-09-21 | 2019-01-18 | 西华大学 | Based on OTSU and GA-BP neural network wallpaper defects detection and recognition methods |
CN109274891A (en) * | 2018-11-07 | 2019-01-25 | 北京旷视科技有限公司 | A kind of image processing method, device and its storage medium |
Non-Patent Citations (1)
Title |
---|
Su Songzhi, Li Shaozi, Cai Guorong: "Pedestrian Detection: Theory and Practice" (行人检测 理论与实践), 31 March 2016, Xiamen University Press (厦门大学出版社), pages 165-167 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |