CN108197536B - Image processing method and device, computer device and readable storage medium - Google Patents


Info

Publication number
CN108197536B
Authority
CN
China
Prior art keywords
pooling
image
processed
matching degree
maximum
Prior art date
Legal status
Active
Application number
CN201711385169.3A
Other languages
Chinese (zh)
Other versions
CN108197536A (en)
Inventor
陈乐 (Chen Le)
Current Assignee
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN201711385169.3A priority Critical patent/CN108197536B/en
Publication of CN108197536A publication Critical patent/CN108197536A/en
Application granted granted Critical
Publication of CN108197536B publication Critical patent/CN108197536B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

A method of image processing, the method comprising: acquiring the pooling step length and the pooling window size of maximum pooling according to a given filtering threshold; obtaining the matching degree score between each pixel point of an image to be processed and a target; and performing, according to the matching degree score of each pixel point of the image to be processed, maximum pooling on the image to be processed with the pooling step length and the pooling window size to obtain the image to be processed with the repeated candidate frames removed. The invention also provides an image processing device, a computer device and a readable storage medium. The invention can rapidly perform candidate frame deduplication on an image according to the matching degree scores of its pixel points.

Description

Image processing method and device, computer device and readable storage medium
Technical Field
The invention relates to the technical field of computer vision, and in particular to an image processing method and device, a computer device and a readable storage medium.
Background
In the target detection process, multiple candidate frames are usually obtained (for example, multiple candidate face frames are obtained in face detection). To eliminate redundant candidate frames, a non-maximum suppression (NMS) algorithm is typically used to deduplicate the candidate frames in an image, i.e., to remove repeated candidate frames. However, the conventional NMS algorithm sorts the candidate frames by their matching degree scores, repeatedly selects the candidate frame with the highest matching degree score, traverses the other candidate frames, and deletes every candidate frame whose coincidence rate with the selected frame exceeds the filtering threshold. The conventional NMS algorithm is therefore slow and inefficient at removing repeated candidate frames.
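For reference, the conventional sort-and-suppress procedure described above can be sketched as follows. This is a minimal illustration of the prior-art baseline, not the patented method; the function name, box layout, and the use of the joint-mode coincidence rate (IoU) are assumptions.

```python
import numpy as np

def classic_nms(boxes, scores, threshold):
    """Conventional NMS: sort by score, keep the best box, delete every
    remaining box whose joint-mode coincidence rate exceeds the threshold."""
    boxes = np.asarray(boxes, dtype=float)  # rows: (x1, y1, x2, y2)
    order = np.argsort(scores)[::-1]        # indices by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection of `best` with every remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (area[best] + area[rest] - inter)
        order = rest[iou <= threshold]      # drop duplicates of `best`
    return keep
```

Note the sorting step and the repeated traversal of the remaining boxes, which are exactly the costs the patented max-pooling approach avoids.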
Disclosure of Invention
In view of the foregoing, it is desirable to provide an image processing method and apparatus, a computer apparatus and a readable storage medium, which can rapidly perform candidate frame deduplication on an image according to the matching degree scores of its pixel points.
A first aspect of the present application provides an image processing method, the method comprising:
Acquiring the maximum pooling step length and the size of a pooling window according to a given filtering threshold;
Obtaining the matching degree score of each pixel point of the image to be processed and a target;
And according to the matching degree score of each pixel point of the image to be processed, performing maximum pooling on the image to be processed according to the pooling step length and the pooling window size to obtain the image to be processed without the repeated candidate frame.
In another possible implementation manner, the obtaining the maximum pooling step size and the pooling window size according to the given filtering threshold includes:
Establishing a filtering threshold and a calculation formula of the maximum pooling step length and the pooling window size, and calculating the maximum pooling step length and the pooling window size corresponding to the given filtering threshold according to the calculation formula; or
Establishing a corresponding relation table of different filtering thresholds and the maximum pooling step length and the pooling window size, and searching the maximum pooling step length and the pooling window size corresponding to the given filtering threshold from the corresponding relation table.
In another possible implementation manner, the obtaining a matching degree score of each pixel point of the image to be processed and the target includes:
Receiving pixel matching degree characteristics of an image to be processed, and acquiring a matching degree score of each pixel point of the image to be processed and a target from the pixel matching degree characteristics; or
Receiving an image to be processed, extracting pixel matching degree characteristics of the image to be processed, and obtaining a matching degree score of each pixel point of the image to be processed and a target.
In another possible implementation manner, the performing, according to the matching degree score of each pixel point of the image to be processed, the maximum pooling of the image to be processed by using the pooling step size and the pooling window size includes:
And acquiring the pooling windows of the image to be processed one by one according to the pooling step length and the pooling window size, retaining the maximum matching degree score in each pooling window, and zeroing out the non-maximum matching degree scores.
In another possible implementation manner, in the image to be processed after the repeated candidate frames are removed, each pixel point with a non-zero matching degree score corresponds to one deduplicated candidate frame, and the position of that pixel point is the upper left corner of the deduplicated candidate frame.
A second aspect of the present application provides an image processing apparatus, the apparatus comprising:
A first obtaining unit, configured to obtain a pooling step size and a pooling window size of a maximum pooling according to a given filtering threshold;
The second acquisition unit is used for acquiring the matching degree score of each pixel point of the image to be processed and the target;
And the pooling unit is used for performing maximum pooling on the image to be processed according to the matching degree score of each pixel point of the image to be processed by using the pooling step length and the pooling window size to obtain the image to be processed without the repeated candidate frame.
In another possible implementation manner, the first obtaining unit is specifically configured to:
Establishing a filtering threshold and a calculation formula of the maximum pooling step length and the pooling window size, and calculating the maximum pooling step length and the pooling window size corresponding to the given filtering threshold according to the calculation formula; or
Establishing a corresponding relation table of different filtering thresholds and the maximum pooling step length and the pooling window size, and searching the maximum pooling step length and the pooling window size corresponding to the given filtering threshold from the corresponding relation table.
In another possible implementation manner, the pooling unit is specifically configured to:
And acquiring the pooling windows of the image to be processed one by one according to the pooling step length and the pooling window size, retaining the maximum matching degree score in each pooling window, and zeroing out the non-maximum matching degree scores to obtain the image to be processed with the repeated candidate frames removed.
A third aspect of the application provides a computer apparatus comprising a processor for implementing the image processing method when executing a computer program stored in a memory.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method.
The method comprises: acquiring the pooling step length and the pooling window size of maximum pooling according to a given filtering threshold; obtaining the matching degree score between each pixel point of the image to be processed and a target; and performing, according to the matching degree score of each pixel point of the image to be processed, maximum pooling on the image to be processed with the pooling step length and the pooling window size to obtain the image to be processed with the repeated candidate frames removed. According to the invention, the candidate frames in the image are deduplicated through maximum pooling: the position information of the pixel point arrangement is preserved, the pixel points do not need to be sorted, and all repeated candidate frames can be removed by traversing the pixel points only once to obtain the final result. The execution efficiency is thereby greatly improved, and candidate frame deduplication can be performed rapidly on the image according to the matching degree scores of the pixel points.
Drawings
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of candidate block deduplication by max pooling.
Fig. 3 is a schematic diagram of maximum pooling of images.
Fig. 4 is a structural diagram of an image processing apparatus according to a second embodiment of the present invention.
Fig. 5 is a schematic diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. The described embodiments are merely some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the image processing method of the present invention is applied in one or more computer apparatuses. The computer device is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and the hardware thereof includes, but is not limited to, a processor, an external storage medium, a memory, and the like.
The computer device can be a main device such as a desktop computer, a notebook computer, a palm computer and a cloud server. The computer device can be in man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
Example one
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present invention. The image processing method is applied to a computer device. The image processing method performs candidate frame deduplication on the image to be processed through maximum pooling. The image processing method can be applied to target detection, for example, in face detection of various video monitoring scenes (such as intelligent traffic, access control systems, urban security and the like), and is used for carrying out candidate frame duplicate removal on the monitored images.
As shown in fig. 1, the image processing method specifically includes the following steps:
101: Acquire the pooling step length and pooling window size of maximum pooling according to a given filtering threshold.
The filtering threshold is a threshold of a coincidence rate set for a candidate frame when a non-maximum suppression (NMS) algorithm is used to perform deduplication on the candidate frame in an image. If the coincidence rate of the two candidate frames is greater than the filtering threshold, the two candidate frames are considered as repeated candidate frames, and the candidate frames with lower scores (namely the candidate frames with lower probability of belonging to the target) are filtered. The filtering threshold may be set by a user, for example, based on empirical values, or may be a default for the system.
A calculation formula (see fig. 2) of the filtering threshold and the maximum pooled pooling step size and the pooling window size may be established, and the maximum pooled pooling step size and the pooling window size corresponding to a given filtering threshold may be calculated according to the calculation formula.
Or, the maximum pooling step length and the pooling window size corresponding to different filtering thresholds may be pre-calculated, a corresponding relationship table of different filtering thresholds and the maximum pooling step length and the pooling window size may be established, and the maximum pooling step length and the pooling window size corresponding to a given filtering threshold may be searched from the corresponding relationship table. The given filtering threshold may correspond to a plurality of groups (two or more groups) of pooling step sizes and pooling window sizes, and one group of pooling step sizes and pooling window sizes may be selected from the corresponding plurality of groups of pooling step sizes and pooling window sizes as needed. It should be noted that, if the given filtering threshold is not included in the correspondence table, the filtering threshold with the minimum difference (may be an absolute value of the difference) from the given filtering threshold is searched from the correspondence table, and the pooling step size and the pooling window size corresponding to the filtering threshold with the minimum difference from the given filtering threshold are used as the pooling step size and the pooling window size corresponding to the given filtering threshold.
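The correspondence-table lookup with nearest-threshold fallback described above can be sketched as follows. The table contents, function name, and the convention of storing a list of (step length, window size) pairs per threshold are illustrative assumptions.

```python
def lookup_pooling_params(filter_threshold, table):
    """Look up a (pooling step length, pooling window size) pair for a given
    filtering threshold. If the exact threshold is absent from the table,
    fall back to the entry whose threshold has the smallest absolute
    difference from the given one, as the text describes."""
    if filter_threshold in table:
        entries = table[filter_threshold]
    else:
        nearest = min(table, key=lambda t: abs(t - filter_threshold))
        entries = table[nearest]
    # Several (step, window) groups may correspond to one threshold;
    # here we simply take the first as "selected as needed".
    return entries[0]

# Hypothetical precomputed table: filtering threshold -> [(step, window), ...]
PARAM_TABLE = {0.532: [(1, 2)], 0.25: [(2, 3)]}
```

For example, a query of 0.5 would fall back to the 0.532 entry, since |0.5 - 0.532| is smaller than |0.5 - 0.25|.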
The pooling step length and the pooling window size are integers. The pooling step length represents the sliding step of the pooling window over the image, and may include step lengths in the horizontal and vertical directions; for example, a pooling step length of 1 × 1 means that the step lengths in both the horizontal and vertical directions are 1 (i.e., 1 pixel).
The pooling window size is the size of the region pooled in each pooling operation. For example, a pooling window size of 2 × 2 means that a 2 × 2 region is pooled at a time.
Referring to fig. 2, a schematic diagram of candidate block deduplication by max pooling is shown.
As shown in fig. 2, the maximum pooling window size is 2 × 2, the pooling step length is 1 × 1, and the candidate frame size is 6 × 6. Assume that A is at a position retained after maximum pooling (i.e., A is the maximum in every 2 × 2 pooling window containing A); the nearby positions whose values may also be retained are the hatched positions in the figure, and the position farthest from A (in pixel distance) that must be filtered out is the position of B. Drawing a 6 × 6 candidate frame from each of the positions of A and B (i.e., with A and B as the upper left corners of the candidate frames, denoted by 21 and 22 in the figure), the coincidence rate of the two candidate frames 21 and 22 can be calculated, for example, as (5 × 5)/(6 × 6 × 2 - 5 × 5) = 25/47 ≈ 0.532 (this is the joint-mode coincidence rate). Therefore, maximum pooling of the image with a pooling step length of 1 × 1 and a pooling window size of 2 × 2 is equivalent to deduplicating the candidate frames in the image with an NMS algorithm whose filtering threshold is 0.532, i.e., when the coincidence rate is greater than 0.532, the candidate frame with the lower score is filtered out. By adjusting the pooling window size and the pooling step length, NMS algorithms with different filtering thresholds can be realized.
It should be noted that, assuming the area of the first candidate frame is area1, the area of the second candidate frame is area2, and the overlapping area of the two candidate frames is area, the coincidence rate may be area/(area1 + area2 - area) (i.e., the joint-mode coincidence rate) or area/min(area1, area2) (i.e., the minimum-mode coincidence rate), where min(area1, area2) represents the minimum of area1 and area2. Thus, maximum pooling of an image with a particular pooling step length and pooling window size corresponds to deduplicating the candidate frames in the image with an NMS algorithm using a particular filtering threshold under a particular coincidence-rate mode, which may be either the joint mode or the minimum mode. Taking fig. 2 as an example, the joint-mode coincidence rate of the candidate frame 21 and the candidate frame 22 is (5 × 5)/(6 × 6 × 2 - 5 × 5) = 25/47 ≈ 0.532, and the minimum-mode coincidence rate is (5 × 5)/min(6 × 6, 6 × 6) = 25/36 ≈ 0.694. Therefore, maximum pooling of the image with a pooling step length of 1 × 1 and a pooling window size of 2 × 2 may be equivalent to deduplicating the candidate frames in the image with an NMS algorithm whose filtering threshold is 0.532 under the joint-mode coincidence rate, or equivalently with an NMS algorithm whose filtering threshold is 0.694 under the minimum-mode coincidence rate.
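The two coincidence-rate formulas can be checked with a short sketch. The function name and the (dx, dy) offset parameters are illustrative assumptions; for the fig. 2 configuration the retained position A and the filtered position B are offset by one pixel in each direction.

```python
def overlap_ratios(box, dx, dy):
    """Joint-mode and minimum-mode coincidence rates of two equal
    box-by-box candidate frames whose upper left corners are offset
    by (dx, dy) pixels."""
    inter = max(0, box - abs(dx)) * max(0, box - abs(dy))
    area = box * box
    joint = inter / (area + area - inter)  # area / (area1 + area2 - area)
    minimum = inter / area                 # min(area1, area2) = area here
    return joint, minimum
```

For box = 6 and an offset of (1, 1), the overlap is 25 pixels, giving 25/47 ≈ 0.532 in joint mode and 25/36 ≈ 0.694 in minimum mode, matching the values in the text.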
102: and acquiring the matching degree score of each pixel point of the image to be processed and the target.
The image to be processed may be an image received from an external device, for example, a pedestrian monitoring image captured by a camera near a zebra crossing on a road and received from the camera.
Alternatively, the image to be processed may be an image taken by the computer device, such as a pedestrian monitoring image taken by the computer device.
Alternatively, the image to be processed may also be an image read from a memory of the computer device, for example a pedestrian monitoring image read from the memory of the computer device.
The target is an object to be detected in the image to be processed. For example, when face detection is performed on the image to be processed, the target is a face in the image to be processed.
The matching degree score of the pixel point and the target represents the probability that the pixel point is the pixel point of the target. The higher the matching degree score of the pixel point and the target is, the higher the probability that the pixel point is the pixel point in the target is.
The pixel matching degree feature of the image to be processed may be received, and the matching degree score between each pixel point of the image to be processed and the target is obtained from the pixel matching degree feature. In a preferred embodiment, the pixel matching degree feature of the image to be processed output by a Multi-task Cascaded Convolutional Network (MTCNN) may be received. The convolution layers of the MTCNN extract the pixel matching degree feature of the image to be processed, and the feature includes the matching degree score of each pixel point of the image to be processed.
Or, an image to be processed may be received, and the pixel matching degree feature of the image to be processed is extracted, so as to obtain the matching degree score between each pixel point of the image to be processed and the target. For example, performing convolution operation on the image to be processed to obtain a matching degree score of each pixel point of the image to be processed and a target. The pixel matching degree feature of the to-be-processed image can be obtained by referring to the prior art, and details are not repeated here.
103: and according to the matching degree score of each pixel point of the image to be processed, performing maximum pooling on the image to be processed according to the pooling step length and the pooling window size to obtain the image to be processed without the repeated candidate frame.
According to the matching degree score of each pixel point of the image to be processed, performing maximum pooling on the image to be processed with the pooling step length and the pooling window size includes: acquiring the pooling windows of the image to be processed one by one according to the pooling step length and the pooling window size, retaining the maximum matching degree score in each pooling window, and zeroing out the non-maximum matching degree scores. In the image to be processed after the repeated candidate frames are removed, each pixel point with a non-zero matching degree score corresponds to one deduplicated candidate frame (for example, a deduplicated face frame), and the position of that pixel point is the upper left corner of the deduplicated candidate frame. The size of the deduplicated candidate frame is a preset size. For example, referring to fig. 2, with a maximum pooling window size of 2 × 2, a pooling step length of 1 × 1, and a candidate frame size of 6 × 6, the coincidence rate is (5 × 5)/(6 × 6 × 2 - 5 × 5) = 25/47 ≈ 0.532. Then, maximum pooling is performed on the image to be processed with the pooling step length 1 × 1 and the pooling window size 2 × 2 corresponding to the filtering threshold 0.532 (i.e., coincidence rate 0.532), and the size of each deduplicated candidate frame is 6 × 6.
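The mapping from the pooled score map back to deduplicated candidate frames described above can be sketched as follows. The function name is an assumption; the frame size is the preset size mentioned in the text, and boxes are reported as (x1, y1, x2, y2, score) with the non-zero position as the upper left corner.

```python
import numpy as np

def boxes_from_pooled(pooled, box_size):
    """Each non-zero score in the pooled map marks the upper left corner
    of one surviving candidate frame of the preset size."""
    ys, xs = np.nonzero(pooled)
    return [(int(x), int(y), int(x) + box_size, int(y) + box_size,
             float(pooled[y, x]))
            for y, x in zip(ys, xs)]
```

No sorting is needed at this stage either: the surviving positions are read off the map in a single pass.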
Referring to fig. 3, a schematic diagram of maximum pooling of images is shown.
As shown in fig. 3, the maximum pooling window size is 2 × 2, the pooling step length is 1 × 1, and 31 and 32 are pooling windows. Each pooling window contains four values, and each value represents the matching degree score of a corresponding pixel point, where A < B < C. Maximum pooling retains the maximum value in each pooling window and zeroes out the non-maximum values; each retained non-zero position is the upper left corner of a deduplicated candidate frame (for example, a deduplicated face frame).
The pooling window 31 is first maximally pooled. The pooling window 31 contains two values greater than 0, A and B; since A < B, B is retained and A is cleared. Then the pooling window is shifted one pixel to the right to obtain the pooling window 32, and the pooling window 32 is maximally pooled. The pooling window 32 contains two values greater than 0, B and C; since B < C, C is retained and B is cleared; and so on. After the two maximum pooling operations, only C remains and the other positions are 0, as shown in fig. 3.
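The sliding-window maximum pooling walked through above can be sketched as follows. This is a simplified illustration with an assumed function name; scores tied with the window maximum are all retained, and the in-place updates reproduce the fig. 3 behavior where a value retained by one window can still be cleared by a later one.

```python
import numpy as np

def maxpool_dedup(scores, window=2, stride=1):
    """Slide the pooling window over the score map with the given step
    length; in each window, keep the maximum score and zero the rest."""
    out = scores.astype(float).copy()
    h, w = out.shape
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = out[y:y + window, x:x + window]  # view into `out`
            patch[patch < patch.max()] = 0.0         # zero non-maxima in place
    return out
```

On the fig. 3 example with scores A < B < C in one row, the first window keeps B and clears A, the second window keeps C and clears B, so only C survives after a single pass over the pixel points.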
The traditional NMS algorithm sorts according to the matching degree score of each candidate frame, continuously finds the candidate frame with the highest matching degree score, traverses other candidate frames, deletes the candidate frame with the coincidence rate exceeding the filtering threshold, and has low execution efficiency. According to the method, the candidate frames in the image are subjected to de-duplication through maximum pooling, the position information of the arrangement of the pixel points is reserved, the pixel points do not need to be sequenced, all repeated candidate frames can be removed only by traversing the pixel points once, the image to be processed after the repeated candidate frames are removed is obtained, the execution efficiency is greatly improved, and the candidate frame de-duplication can be quickly performed on the image.
The image processing method in the first embodiment obtains the maximum pooling step length and the pooling window size according to a given filtering threshold; obtaining the matching degree score of each pixel point of the image to be processed and a target; and according to the matching degree score of each pixel point of the image to be processed, performing maximum pooling on the image to be processed according to the pooling step length and the pooling window size to obtain the image to be processed without the repeated candidate frame. The image processing method according to the first embodiment performs deduplication on candidate frames in an image through maximum pooling, retains position information of pixel point arrangement, does not need to sequence pixel points, can remove all repeated candidate frames only by traversing the pixel points once, obtains a final result, greatly improves execution efficiency, and achieves rapid deduplication of the candidate frames of the image according to matching degree scores of the pixel points.
Example two
Fig. 4 is a structural diagram of an image processing apparatus according to a second embodiment of the present invention. The image processing apparatus 10 is applied to a computer apparatus. The image processing apparatus 10 performs candidate frame deduplication on images to be processed through maximum pooling. The image processing apparatus 10 may be applied to target detection, for example, in face detection for various video monitoring scenes (such as intelligent transportation, access control systems, and city security), to perform candidate frame deduplication on monitored images.
As shown in fig. 4, the image processing apparatus 10 may include: a first acquisition unit 401, a second acquisition unit 402, a pooling unit 403.
A first obtaining unit 401, configured to obtain a pooling step size and a pooling window size of a maximum pooling according to a given filtering threshold.
The filtering threshold is a threshold of a coincidence rate set for a candidate frame when a non-maximum suppression (NMS) algorithm is used to perform deduplication on the candidate frame in an image. If the coincidence rate of the two candidate frames is greater than the filtering threshold, the two candidate frames are considered as repeated candidate frames, and the candidate frames with lower scores (namely the candidate frames with lower probability of belonging to the target) are filtered. The filtering threshold may be set by a user, for example, based on empirical values, or may be a default for the system.
A calculation formula (see fig. 2) of the filtering threshold and the maximum pooled pooling step size and the pooling window size may be established, and the maximum pooled pooling step size and the pooling window size corresponding to a given filtering threshold may be calculated according to the calculation formula.
Or, the pooling step lengths and pooling window sizes of the maximum pooling corresponding to different filtering thresholds may be pre-calculated, a correspondence table of the different filtering thresholds and the pooling step lengths and pooling window sizes may be established, and the pooling step length and pooling window size corresponding to a given filtering threshold may be searched from the correspondence table. The given filtering threshold may correspond to a plurality of groups (two or more groups) of pooling step lengths and pooling window sizes, and one group may be selected from them as needed. It should be noted that, if the given filtering threshold is not included in the correspondence table, the filtering threshold with the minimum difference (which may be the absolute value of the difference) from the given filtering threshold is searched from the correspondence table, and its pooling step length and pooling window size are used as those corresponding to the given filtering threshold.
The pooling step length and the pooling window size are integers. The pooling step length represents the sliding step of the pooling window over the image, and may include step lengths in the horizontal and vertical directions; for example, a pooling step length of 1 × 1 means that the step lengths in both the horizontal and vertical directions are 1 (i.e., 1 pixel).
The pooling window size is the size of the region pooled in each pooling operation. For example, a pooling window size of 2 × 2 means that a 2 × 2 region is pooled at a time.
Referring to fig. 2, a schematic diagram of candidate block deduplication by max pooling is shown.
As shown in fig. 2, the maximum pooling window size is 2 × 2, the pooling step size is 1 × 1, and the candidate box size is 6 × 6. Assuming that the position of A is retained after maximum pooling (i.e., A is the maximum in every 2 × 2 pooling window containing A), the nearest positions whose values may also be retained are the hatched positions in the figure; the farthest position (in pixel distance) whose value must be filtered out is the position of B. Drawing a 6 × 6 candidate box from each of A and B (i.e., with A and B as the boxes' upper-left corners; the boxes are denoted 21 and 22 in the drawing), the overlap ratio of the two candidate boxes 21 and 22 can be calculated as (5 × 5)/(6 × 6 × 2 − 5 × 5) = 25/47 ≈ 0.532 (this is the union-mode overlap ratio). Therefore, maximum pooling of the image with a pooling step size of 1 × 1 and a pooling window size of 2 × 2 is equivalent to deduplicating the candidate boxes in the image with an NMS algorithm whose filtering threshold is 0.532, i.e., when the overlap ratio is greater than 0.532, the candidate box with the lower score is filtered out. By adjusting the pooling window size and the pooling step size, NMS algorithms with different filtering thresholds can be realized.
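The overlap-ratio arithmetic of fig. 2 can be checked directly: two 6 × 6 boxes whose upper-left corners differ by one pixel in each direction overlap in a 5 × 5 region.

```python
# Reproducing the union-mode overlap ratio for fig. 2.
box = 6 * 6                                  # area of each 6x6 candidate box
overlap = 5 * 5                              # overlapping region is 5x5
union_ratio = overlap / (2 * box - overlap)  # 25 / 47
print(round(union_ratio, 3))                 # prints 0.532
```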
It should be noted that, assuming the area of the first candidate box is area1, the area of the second candidate box is area2, and their overlapping area is area, the overlap ratio may be computed as area/(area1 + area2 − area) (the union-mode overlap ratio) or as area/min(area1, area2) (the minimum-mode overlap ratio), where min(area1, area2) denotes the smaller of area1 and area2. Thus, maximum pooling of an image with a particular pooling step size and pooling window size corresponds to deduplicating the candidate boxes in the image with an NMS algorithm using a particular filtering threshold under a particular overlap-ratio mode, which may be either the union mode or the minimum mode. Taking fig. 2 as an example, the union-mode overlap ratio of candidate boxes 21 and 22 is (5 × 5)/(6 × 6 × 2 − 5 × 5) ≈ 0.532, and the minimum-mode overlap ratio is (5 × 5)/min(6 × 6, 6 × 6) = 25/36 ≈ 0.694. Therefore, maximum pooling of the image with a pooling step size of 1 × 1 and a pooling window size of 2 × 2 may be regarded as equivalent to deduplicating the candidate boxes with an NMS algorithm whose filtering threshold is 0.532 under the union-mode overlap ratio, or equivalently with a filtering threshold of 0.694 under the minimum-mode overlap ratio.
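Both overlap-ratio modes described above can be expressed as one small helper; the function name and signature are illustrative assumptions.

```python
def overlap_ratio(area1, area2, inter, mode="union"):
    """Overlap ratio of two boxes given their areas and intersection area.
    mode="union" is the union-mode (joint) ratio; mode="min" is the
    minimum-mode ratio."""
    if mode == "union":
        return inter / (area1 + area2 - inter)
    return inter / min(area1, area2)

# Fig. 2 example: two 6x6 boxes overlapping in a 5x5 region
print(round(overlap_ratio(36, 36, 25, "union"), 3))  # prints 0.532
print(round(overlap_ratio(36, 36, 25, "min"), 3))    # prints 0.694
```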
A second obtaining unit 402, configured to obtain a matching degree score between each pixel point of the image to be processed and the target.
The image to be processed may be an image received from an external device, for example, a pedestrian monitoring image captured by a camera near a zebra crossing on a road and received from that camera.
Alternatively, the image to be processed may be an image captured by the computer device itself, such as a pedestrian monitoring image taken by the computer device.
Alternatively, the image to be processed may also be an image read from a memory of the computer device, for example a pedestrian monitoring image read from that memory.
The target is an object to be detected in the image to be processed. For example, when a face detection is performed on an image to be processed, a target is a face in the image to be processed.
The matching degree score of the pixel point and the target represents the probability that the pixel point is the target. The higher the matching degree score of the pixel point and the target is, the higher the probability that the pixel point is the target is.
The pixel matching degree feature of the image to be processed may be received, and the matching degree score between each pixel point of the image to be processed and the target may be obtained from that feature. In a preferred embodiment, the pixel matching degree feature output by a Multi-task Cascaded Convolutional Neural Network (MTCNN) may be received: the convolutional layers of the MTCNN extract the pixel matching degree feature of the image to be processed, which contains the matching degree score of each pixel point.
Or, an image to be processed may be received, and the pixel matching degree feature of the image to be processed is extracted, so as to obtain the matching degree score between each pixel point of the image to be processed and the target. For example, performing convolution operation on the image to be processed to obtain a matching degree score of each pixel point of the image to be processed and a target. The pixel matching degree feature of the to-be-processed image can be obtained by referring to the prior art, and details are not repeated here.
And the pooling unit 403 is configured to perform maximum pooling on the image to be processed with the obtained pooling step size and pooling window size, according to the matching degree score of each pixel point, so as to obtain the image to be processed with the repeated candidate boxes removed.
Performing maximum pooling on the image to be processed with the pooling step size and pooling window size, according to the matching degree score of each pixel point, includes: acquiring the pooling windows of the image one by one according to the pooling step size and pooling window size, retaining the maximum matching degree score in each pooling window, and zeroing the non-maximum scores. In the resulting image, each pixel point with a non-zero matching degree score corresponds to one deduplicated candidate box (for example, a deduplicated face box), whose upper-left corner is at that non-zero position. The size of the deduplicated candidate box is a preset size. For example, referring to fig. 2, with a maximum pooling window size of 2 × 2, a pooling step size of 1 × 1, and a candidate box size of 6 × 6, the overlap ratio is (5 × 5)/(6 × 6 × 2 − 5 × 5) ≈ 0.532. The image to be processed is then maximum-pooled with the pooling step size 1 × 1 and pooling window size 2 × 2 corresponding to the filtering threshold 0.532 (i.e., overlap ratio 0.532), and the size of each deduplicated candidate box is 6 × 6.
Referring to fig. 3, a schematic diagram of maximum pooling of images is shown.
As shown in fig. 3, the maximum pooling window size is 2 × 2, the pooling step size is 1 × 1, and 31 and 32 are pooling windows. Each pooling window contains four values, each representing the matching degree score of the corresponding pixel point, where A < B < C. Maximum pooling retains the maximum value in each pooling window and zeroes the non-maximum values; each retained non-zero position is the upper-left corner of a deduplicated candidate box (for example, a deduplicated face box).
The pooling window 31 is pooled first. It contains two values greater than 0, A and B; since A < B, the position of B is retained and A is cleared to zero. The pooling window is then shifted right by one pixel to obtain pooling window 32, which is pooled next. It contains two values greater than 0, B and C; since B < C, the position of C is retained and B is cleared to zero; and so on. After the two maximum-pooling steps, only C remains and the other positions are 0, as shown in fig. 3.
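The sequential window-by-window procedure above (retain the window maximum, zero the rest) can be sketched as follows, assuming the score map is a NumPy array; the function name is illustrative.

```python
import numpy as np

def max_pool_dedup(scores, window=2, stride=1):
    """Slide a pooling window over the score map; in each window keep
    only the maximum value and zero the others, as in fig. 3."""
    s = scores.astype(float).copy()
    h, w = s.shape
    for i in range(0, h - window + 1, stride):
        for j in range(0, w - window + 1, stride):
            patch = s[i:i + window, j:j + window]  # view into s
            m = patch.max()
            patch[patch < m] = 0.0  # zero the non-maximum entries in place
    return s

# Analogue of fig. 3: A=0.2 < B=0.5 along one row; only the largest survives
scores = np.array([[0.0, 0.2, 0.5, 0.0],
                   [0.0, 0.0, 0.0, 0.0]])
out = max_pool_dedup(scores, window=2, stride=1)
# out keeps 0.5 at its original position and zeroes everything else
```

Each surviving non-zero entry of `out` marks the upper-left corner of one deduplicated candidate box; note that because zeroing happens in place, later windows compare against already-suppressed values, matching the step-by-step walk-through in the text.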
The traditional NMS algorithm sorts the candidate boxes by matching degree score, repeatedly selects the box with the highest score, traverses the remaining boxes, and deletes those whose overlap ratio with it exceeds the filtering threshold, which is inefficient. In the present method, the candidate boxes are deduplicated by maximum pooling: the positional arrangement of the pixel points is preserved, no sorting is needed, and a single pass over the pixel points removes all repeated candidate boxes, yielding the image to be processed with the repeated candidate boxes removed and greatly improving execution efficiency.
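For contrast, here is a minimal sketch of the traditional sort-and-suppress NMS loop that the paragraph describes; the (x, y, size, score) box format and the helper names are assumptions made for illustration.

```python
def iou(a, b):
    """Union-mode overlap ratio of two axis-aligned square boxes,
    each given as (x, y, size, score)."""
    x1, y1, s1, _ = a
    x2, y2, s2, _ = b
    ix = max(0, min(x1 + s1, x2 + s2) - max(x1, x2))
    iy = max(0, min(y1 + s1, y2 + s2) - max(y1, y2))
    inter = ix * iy
    return inter / (s1 * s1 + s2 * s2 - inter)

def nms(boxes, threshold=0.532):
    """Classic NMS: sort by score, keep the best, drop boxes whose
    overlap with any kept box exceeds the filtering threshold."""
    boxes = sorted(boxes, key=lambda b: b[3], reverse=True)
    kept = []
    for cand in boxes:
        if all(iou(cand, k) <= threshold for k in kept):
            kept.append(cand)
    return kept

# Two 6x6 boxes offset by one pixel overlap at ratio 25/47 ~ 0.532,
# so with threshold 0.5 the lower-scoring box is suppressed.
kept = nms([(0, 0, 6, 0.9), (1, 1, 6, 0.8)], threshold=0.5)
```

Unlike the max-pooling approach, this loop requires an explicit sort and pairwise overlap checks against every kept box, which is the inefficiency the patent targets.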
The image processing apparatus of the second embodiment obtains the maximum-pooling step size and pooling window size according to a given filtering threshold, obtains the matching degree score between each pixel point of the image to be processed and the target, and performs maximum pooling on the image with that step size and window size according to the scores, obtaining the image with the repeated candidate boxes removed. By deduplicating candidate boxes through maximum pooling, the apparatus preserves the positional arrangement of the pixel points, requires no sorting, and removes all repeated candidate boxes in a single pass over the pixel points, greatly improving execution efficiency and enabling fast candidate-box deduplication based on the pixel matching degree scores.
EXAMPLE III
Fig. 5 is a schematic diagram of a computer device according to a third embodiment of the present invention. The computer device 1 comprises a memory 20, a processor 30 and a computer program 40, such as an image processing program, stored in the memory 20 and executable on the processor 30. The processor 30, when executing the computer program 40, implements the steps of the above-described embodiments of the image processing method, such as the steps 101 to 103 shown in fig. 1. Alternatively, the processor 30, when executing the computer program 40, implements the functions of the modules/units in the above-mentioned device embodiments, such as the units 401 to 403 in fig. 4.
Illustratively, the computer program 40 may be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 40 in the computer apparatus 1. For example, the computer program 40 may be divided into a first obtaining unit 401, a second obtaining unit 402, and a pooling unit 403 in fig. 4, and the specific functions of each unit are shown in embodiment two.
The computer device 1 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. It will be understood by those skilled in the art that fig. 5 is only an example of the computer device 1 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device 1 may further include input and output devices, a network access device, a bus, and the like.
The processor 30 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The processor 30 is the control center of the computer device 1 and connects the various parts of the whole device through various interfaces and lines.
The memory 20 may be used to store the computer program 40 and/or the modules/units; the processor 30 implements the various functions of the computer device 1 by running or executing the computer programs and/or modules/units stored in the memory 20 and by calling data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the computer device 1 (such as audio data or a phone book). The memory 20 may include an external storage medium and may also include internal memory. In addition, the memory 20 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The modules/units integrated with the computer device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
In the embodiments provided in the present invention, it should be understood that the disclosed computer apparatus and method can be implemented in other ways. For example, the above-described embodiments of the computer apparatus are merely illustrative, and for example, the division of the units is only one logical function division, and there may be other divisions when the actual implementation is performed.
in addition, functional units in the embodiments of the present invention may be integrated into the same processing unit, or each unit may exist alone physically, or two or more units are integrated into the same unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Units or computer apparatus recited in the claims may also be implemented by the same unit or apparatus, whether in software or in hardware. The terms first, second, etc. denote names and do not imply any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (8)

1. An image processing method, characterized in that the method comprises:
Acquiring the maximum pooling step length and the size of a pooling window according to a given filtering threshold, wherein the filtering threshold is a threshold of a coincidence rate set for a candidate frame when a non-maximum suppression algorithm is adopted to perform de-duplication on the candidate frame obtained by target detection in an image;
Obtaining a matching degree score of each pixel point of an image to be processed and a target, wherein the image to be processed comprises a plurality of candidate frames obtained through target detection;
And acquiring the pooling windows of the image to be processed one by one according to the pooling step length and the size of the pooling windows, reserving the maximum matching degree score in each pooling window, and resetting the non-maximum matching degree score to obtain the image to be processed after the repeated candidate frame is removed, wherein in the image to be processed after the repeated candidate frame is removed, each pixel point with the non-zero matching degree score corresponds to one candidate frame after the repetition is removed, and the position of the upper left corner of the candidate frame after the repetition is the position with the non-zero matching degree score.
2. The method of claim 1, wherein said obtaining a pooling step size and a pooling window size for maximum pooling based on a given filtering threshold comprises:
Establishing a filtering threshold and a calculation formula of the maximum pooling step length and the pooling window size, and calculating the maximum pooling step length and the pooling window size corresponding to the given filtering threshold according to the calculation formula; or
Establishing a corresponding relation table of different filtering thresholds and the maximum pooling step length and the pooling window size, and searching the maximum pooling step length and the pooling window size corresponding to the given filtering threshold from the corresponding relation table.
3. The method of claim 1, wherein the obtaining a matching degree score of each pixel point of the image to be processed with the target comprises:
Receiving pixel matching degree characteristics of an image to be processed, and acquiring a matching degree score of each pixel point of the image to be processed and a target from the pixel matching degree characteristics; or
Receiving an image to be processed, extracting pixel matching degree characteristics of the image to be processed, and obtaining a matching degree score of each pixel point of the image to be processed and a target.
4. an image processing apparatus, characterized in that the apparatus comprises:
The device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring the maximum pooling step length and the pooling window size according to a given filtering threshold, and the filtering threshold is a threshold of a coincidence rate set aiming at a candidate frame when a non-maximum suppression algorithm is adopted to perform de-duplication on the candidate frame obtained by target detection in an image;
The second acquisition unit is used for acquiring the matching degree score of each pixel point of an image to be processed and a target, wherein the image to be processed comprises a plurality of candidate frames obtained through target detection;
And the pooling unit is used for acquiring pooling windows of the images to be processed one by one according to the pooling step length and the size of the pooling windows, reserving the maximum matching degree score in each pooling window, and resetting the non-maximum matching degree score to obtain the images to be processed after the repeated candidate frames are removed, wherein in the images to be processed after the repeated candidate frames are removed, each pixel point with the non-zero matching degree score corresponds to one candidate frame after the repetition is removed, and the position of the upper left corner of the candidate frame after the repetition is the position with the non-zero matching degree score.
5. The apparatus of claim 4, wherein the first obtaining unit is specifically configured to:
establishing a filtering threshold and a calculation formula of the maximum pooling step length and the pooling window size, and calculating the maximum pooling step length and the pooling window size corresponding to the given filtering threshold according to the calculation formula; or
Establishing a corresponding relation table of different filtering thresholds and the maximum pooling step length and the pooling window size, and searching the maximum pooling step length and the pooling window size corresponding to the given filtering threshold from the corresponding relation table.
6. The apparatus of claim 4, wherein the second acquiring unit acquiring the matching degree score of each pixel point of the image to be processed with the target comprises:
Receiving pixel matching degree characteristics of an image to be processed, and acquiring a matching degree score of each pixel point of the image to be processed and a target from the pixel matching degree characteristics; or
Receiving an image to be processed, extracting pixel matching degree characteristics of the image to be processed, and obtaining a matching degree score of each pixel point of the image to be processed and a target.
7. A computer arrangement, characterized in that the computer arrangement comprises a processor for implementing the image processing method as claimed in any one of claims 1-3 when executing a computer program stored in a memory.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the image processing method according to any one of claims 1 to 3.
CN201711385169.3A 2017-12-20 2017-12-20 Image processing method and device, computer device and readable storage medium Active CN108197536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711385169.3A CN108197536B (en) 2017-12-20 2017-12-20 Image processing method and device, computer device and readable storage medium

Publications (2)

Publication Number Publication Date
CN108197536A CN108197536A (en) 2018-06-22
CN108197536B true CN108197536B (en) 2019-12-17

Family

ID=62577463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711385169.3A Active CN108197536B (en) 2017-12-20 2017-12-20 Image processing method and device, computer device and readable storage medium

Country Status (1)

Country Link
CN (1) CN108197536B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503193B (en) * 2019-07-25 2022-02-22 瑞芯微电子股份有限公司 ROI-based pooling operation method and circuit
CN112686269B (en) * 2021-01-18 2024-06-25 北京灵汐科技有限公司 Pooling method, apparatus, device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706576A (en) * 2009-11-13 2010-05-12 山东大学 Radar image based moving target morphology detecting and tracking method
US9626579B2 (en) * 2014-05-05 2017-04-18 Qualcomm Incorporated Increasing canny filter implementation speed
CN106600631A (en) * 2016-11-30 2017-04-26 郑州金惠计算机系统工程有限公司 Multiple target tracking-based passenger flow statistics method



Similar Documents

Publication Publication Date Title
CN109918969B (en) Face detection method and device, computer device and computer readable storage medium
CN109376596B (en) Face matching method, device, equipment and storage medium
CN110147722A (en) A kind of method for processing video frequency, video process apparatus and terminal device
CN105894464B (en) A kind of medium filtering image processing method and device
CN109583345B (en) Road recognition method, device, computer device and computer readable storage medium
US20210201068A1 (en) Image processing method and apparatus, and electronic device
JP2014531097A (en) Text detection using multi-layer connected components with histograms
US10943098B2 (en) Automated and unsupervised curation of image datasets
CN108197536B (en) Image processing method and device, computer device and readable storage medium
CN111179265A (en) Image-based fingerprint quality evaluation method and device and electronic equipment
CN112132033B (en) Vehicle type recognition method and device, electronic equipment and storage medium
WO2019201029A1 (en) Candidate box update method and apparatus
AU2020294190B2 (en) Image processing method and apparatus, and electronic device
CN110633636A (en) Trailing detection method and device, electronic equipment and storage medium
CN113688839B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN112132215B (en) Method, device and computer readable storage medium for identifying object type
CN113762220A (en) Object recognition method, electronic device, and computer-readable storage medium
CN111079624B (en) Sample information acquisition method and device, electronic equipment and medium
CN112418089A (en) Gesture recognition method and device and terminal
CN112235598A (en) Video structured processing method and device and terminal equipment
CN110210425B (en) Face recognition method and device, electronic equipment and storage medium
CN113077469A (en) Sketch image semantic segmentation method and device, terminal device and storage medium
CN107861990B (en) Video searching method and system and terminal equipment
CN110276050A (en) To the method and device of high dimension vector similarity system design
CN114724128A (en) License plate recognition method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant