Object detection method, device and system, and storage medium

Publication number: CN108875723B
Application number: CN201810005939.5A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 吕鹏原, 姚聪
Assignee: Beijing Kuangshi Technology Co Ltd
Other versions: CN108875723A
Legal status: Active (granted)

Classifications

    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/242: Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G06V 10/243: Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations

Abstract

Embodiments of the present invention provide an object detection method, apparatus and system, and a storage medium. The object detection method includes: acquiring an image to be detected; performing object detection on the image to be detected to determine at least one candidate region indicating the position of a target object; for each of the at least one candidate region, dividing the candidate region into m sub-regions, where m is a positive integer greater than or equal to 2; calculating, for each of the m sub-regions, the sub-region probability that the target object is present; calculating, from all the sub-region probabilities of the candidate region, the candidate region probability that the target object is present in the candidate region; and selecting, according to the candidate region probabilities of the at least one candidate region, the candidate regions that meet a preset rule to obtain an object detection result. The object detection method provided by the embodiments of the present invention can effectively handle rotated and overly long objects (such as text).

Description

Object detection method, device and system and storage medium
Technical Field
The present invention relates to the field of object recognition, and more particularly, to an object detection method, apparatus and system, and a storage medium.
Background
Object detection determines whether a target object (e.g., a pedestrian, text, a specific pattern) is present in an image. When the shape and orientation of the target region in which a target object lies are irregular in the image, detection becomes difficult. Text detection is taken as an example in the following description. Text detection is an important research direction in the field of computer vision and aims to locate text regions in images. Text in natural scenes often exhibits rotation, perspective distortion, and extreme length, and existing text detection techniques struggle to overcome these problems.
Disclosure of Invention
The present invention has been made in view of the above problems. The invention provides an object detection method, device and system and a storage medium.
According to an aspect of the present invention, there is provided an object detection method. The method comprises the following steps: acquiring an image to be detected; performing object detection on the image to be detected to determine at least one candidate region indicating the position of a target object; for each of the at least one candidate region, dividing the candidate region into m sub-regions, where m is a positive integer greater than or equal to 2; calculating, for each of the m sub-regions, the sub-region probability that the target object is present; calculating, from all the sub-region probabilities of the candidate region, the candidate region probability that the target object is present in the candidate region; and selecting, according to the candidate region probabilities of the at least one candidate region, the candidate regions that meet a preset rule to obtain an object detection result.
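As a non-authoritative illustration only, the steps above can be sketched as the following Python skeleton; every function name is a placeholder (the helpers are sketched later in the detailed description), not an API defined by the patent:

```python
import numpy as np

def detect_objects(image, detector, m=4, prob_threshold=0.65):
    """Hypothetical end-to-end skeleton of the claimed method."""
    # Step 2: object detection yields candidate regions and the m
    # position-sensitive probability maps described below.
    candidates, prob_maps = detector(image)
    scored = []
    for region in candidates:
        # Steps 3-4: divide into m sub-regions and score each against
        # the probability map with the matching relative position.
        sub_regions = split_region(region, m)
        sub_probs = [sub_region_probability(s, p)
                     for s, p in zip(sub_regions, prob_maps)]
        # Step 5: candidate region probability from sub-region probabilities.
        scored.append((region, float(np.mean(sub_probs))))
    # Step 6: keep high-probability candidates, then suppress duplicates
    # (the NMS sketch given later assumes axis-aligned boxes; rotated
    # candidates would need a rotated IoU instead).
    kept = [(r, p) for r, p in scored if p > prob_threshold]
    return non_maximum_suppression(kept)
```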
Illustratively, performing object detection on the image to be detected to determine at least one candidate region indicating the position of the target object comprises: performing object detection on the image to be detected to obtain position information related to the position of the target object, wherein the position information comprises the coordinates of n corner points, the relative position attribute of each of the n corner points within its corresponding rectangular region, and the short-side length of the rectangular region corresponding to each of the n corner points, n being an integer; and combining the n corner points according to the position information to obtain the at least one candidate region.
Illustratively, the n corner points are divided into an upper left corner point group, a lower left corner point group, an upper right corner point group and a lower right corner point group, the numbers of corner points in these groups being n_i, i = 1, 2, 3, 4, respectively, where n_i ≥ 0. Combining the n corner points according to the position information to obtain the at least one candidate region comprises: selecting two matched corner points, one from each of the two corner point groups of any one of 4 corner point group pairs, and combining them into a corner point pair, wherein the 4 corner point group pairs are: the upper left group with the upper right group, the upper right group with the lower right group, the lower left group with the lower right group, and the upper left group with the lower left group; and determining the at least one candidate region based on the coordinates of the two corner points in each corner point pair obtained from the 4 corner point group pairs and the short-side lengths corresponding to the two corner points.
Illustratively, selecting two matched corner points from the two corner point groups of any one of the 4 corner point group pairs to form a corner point pair comprises: selecting one corner point from each of the two corner point groups of the corner point group pair; and judging whether the two selected corner points meet preset requirements, and if so, determining that they are matched and combining them into a corner point pair; wherein the preset requirements include: the difference between the short-side lengths corresponding to the two selected corner points is smaller than a preset difference; the two selected corner points satisfy a preset spatial position relation; and the short-side length of the rectangular region formed by combining the two selected corner points is within a preset range.
Illustratively, performing object detection on the image to be detected to obtain the position information related to the position of the target object comprises: processing the image to be detected using a fully convolutional network to obtain the position information.
Exemplarily, performing object detection on the image to be detected to determine the at least one candidate region indicating the position of the target object further includes: performing object detection on the image to be detected to obtain m probability maps, the m probability maps having m relative position attributes that correspond one-to-one with the relative position attributes of the m sub-regions of each candidate region, wherein each of the m probability maps indicates, for the part of each predicted object region (a region where the target object is predicted to lie) that has the relative position attribute of that probability map, the probability that each pixel belongs to the target object. For each of the at least one candidate region, calculating the sub-region probabilities for the m sub-regions respectively comprises: mapping the m sub-regions onto the m probability maps according to the correspondence of relative position attributes, to determine, for every pixel in each of the m sub-regions, the probability that the pixel belongs to the target object; and for each of the m sub-regions, averaging the probabilities of all pixels in the sub-region to obtain the sub-region probability of that sub-region.
Illustratively, processing the image to be detected using the fully convolutional network to obtain the position information includes: processing the image to be detected using the fully convolutional network to obtain the position information and the m probability maps.
For example, for each of the at least one candidate region, calculating the candidate region probability that the target object is present in the candidate region from all the sub-region probabilities of the candidate region includes: averaging all the sub-region probabilities of the candidate region to obtain its candidate region probability.
Exemplarily, selecting the candidate regions that meet the preset rule according to the candidate region probability of each of the at least one candidate region to obtain the object detection result comprises: selecting, from the at least one candidate region, the candidate regions whose candidate region probability is greater than a preset probability threshold; and performing non-maximum suppression on the selected candidate regions to obtain the suppressed candidate regions as the object detection result.
According to an aspect of the present invention, there is provided an object detection apparatus. The apparatus comprises: an acquisition module configured to acquire an image to be detected; a detection module configured to perform object detection on the image to be detected to determine at least one candidate region indicating the position of a target object; a probability calculation module configured to, for each of the at least one candidate region, divide the candidate region into m sub-regions, where m is a positive integer greater than or equal to 2, calculate, for each of the m sub-regions, the sub-region probability that the target object is present, and calculate, from all the sub-region probabilities of the candidate region, the candidate region probability that the target object is present in the candidate region; and a selection module configured to select, according to the candidate region probability of the at least one candidate region, the candidate regions that meet a preset rule to obtain an object detection result.
According to another aspect of the present invention, there is provided an object detection system comprising a processor and a memory, wherein the memory has stored therein computer program instructions for executing the above object detection method when executed by the processor.
According to another aspect of the present invention, there is provided a storage medium having stored thereon program instructions for performing the above-described object detection method when executed.
According to the object detection method, apparatus and system, and the storage medium of the embodiments of the present invention, the candidate region probability is calculated from the sub-region probabilities of each candidate region, and the candidate regions are then screened by candidate region probability to obtain the object detection result. The object detection method provided by the embodiments of the present invention can effectively handle rotated and overly long objects (such as text).
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally indicate like parts or steps.
FIG. 1 shows a schematic block diagram of an example electronic device for implementing an object detection method and apparatus in accordance with embodiments of the present invention;
FIG. 2 shows a schematic flow diagram of an object detection method according to an embodiment of the invention;
FIG. 3 shows a schematic diagram of an image to be detected and text thereon, according to an example;
FIGS. 4a and 4b show schematic diagrams of a partitioning of a candidate region of the image to be detected shown in FIG. 3, according to an example;
FIGS. 5a-5d respectively show schematic diagrams of 4 probability maps obtained based on the detection of the image to be detected shown in FIG. 3, according to one embodiment of the present invention;
FIG. 6 shows a schematic block diagram of an object detection apparatus according to an embodiment of the present invention; and
FIG. 7 shows a schematic block diagram of an object detection system according to one embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the invention described herein without inventive step, shall fall within the scope of protection of the invention.
In order to solve the above problem, embodiments of the present invention provide an object detection method, apparatus and system, and a storage medium. According to the embodiment of the invention, the problems of object rotation, overlong length and the like can be solved by adopting a position-sensitive object detection method. The object detection method provided by the embodiment of the invention can be applied to any fields related to object detection, such as the fields of security monitoring, internet finance, banking business and the like.
First, an example electronic device 100 for implementing an object detection method and apparatus according to an embodiment of the present invention is described with reference to fig. 1.
As shown in FIG. 1, electronic device 100 includes one or more processors 102, one or more memory devices 104, an input device 106, an output device 108, and an image capture device 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The processor 102 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA). The processor 102 may be one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Application-Specific Integrated Circuit (ASIC), or other forms of processing units with data processing and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.
The storage 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage media and may be executed by the processor 102 to implement the client-side functionality (implemented by the processor) and/or other desired functionality in the embodiments of the invention described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage media.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images and/or sounds) to an external (e.g., user), and may include one or more of a display, a speaker, etc.
The image capture device 110 may capture images (including video frames) and store the captured images in the storage device 104 for use by other components. The image capture device 110 may be an image sensor in a camera. It should be understood that the image capture device 110 is merely an example, and the electronic device 100 may not include the image capture device 110. In this case, the to-be-processed image may be captured by using another device having an image capturing capability, and the captured image may be transmitted to the electronic apparatus 100.
Illustratively, an exemplary electronic device for implementing the object detection method and apparatus according to embodiments of the present invention may be implemented on a device such as a personal computer or a remote server.
Next, an object detection method according to an embodiment of the present invention will be described with reference to fig. 2. Fig. 2 shows a schematic flow diagram of an object detection method 200 according to an embodiment of the invention. As shown in fig. 2, the object detection method 200 includes the following steps.
In step S210, an image to be detected is acquired.
The image to be detected may be any image for which it is desired to detect the presence of a target object. The target object described herein may be any object, including but not limited to: text, a particular pattern, a person or a part of a human body (such as a human face), an animal, a vehicle, a building, etc.
The image to be detected can be a static image or a video frame in a video. The image to be detected may be an original image acquired by an image acquisition device, or may be an image obtained after preprocessing (such as digitizing, normalizing, smoothing, and the like) the original image.
In step S220, object detection is performed on the image to be detected to determine at least one candidate region indicating a location of the target object.
Step S220 may be implemented using any existing or future-emerging object detection algorithm.
For example, a neural network model may be trained in advance, and the image to be detected is input into the trained neural network model, and the neural network model may output the position information of the target object. For example, the position of an object region where the target object is located may be detected, the object region may be represented by a rectangular region (or a rectangular box), and the position information of the target object may include corner coordinates of the rectangular region. Exemplarily, the corner points refer to four corners of a rectangular area: upper left, upper right, lower left, lower right.
Exemplarily, step S220 may include: performing object detection on the image to be detected to obtain position information related to the position of the target object, wherein the position information comprises the coordinates of n corner points, the relative position attribute of each of the n corner points within its corresponding rectangular region, and the short-side length of the rectangular region corresponding to each of the n corner points, n being an integer; and combining the n corner points according to the position information to obtain the at least one candidate region.
The following description takes text detection as an example. Fig. 3 shows a schematic view of an image 300 to be detected and the text in it, according to an example. As shown in fig. 3, the image 300 to be detected contains two text instances, "Hello" and "World". Each rectangular region 310 is a region containing text as detected by the text detection algorithm. The two rectangular regions 310 shown in fig. 3 each contain 4 corner points 320, so the coordinates of 8 corner points can be detected by the text detection algorithm. In addition, the relative position attribute of each of the 8 corner points can be detected, namely whether it is an upper left, lower left, upper right or lower right corner (2 corner points per attribute in this example). The short-side length of the rectangular region 310 corresponding to each corner point, i.e., the length of the short side 330, can also be detected. Although the two rectangular regions in fig. 3 are drawn equally large with equally long short sides, this is merely an example; the size of each rectangular region is determined by the actual circumstances and need not be the same.
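For concreteness, the per-corner detector output described above might be represented as in the following sketch; the type and field names are illustrative assumptions, not structures defined by the patent:

```python
from dataclasses import dataclass

# Relative position attributes of a corner within its rectangular region.
UL, UR, LL, LR = "upper_left", "upper_right", "lower_left", "lower_right"

@dataclass
class Corner:
    x: float            # corner coordinates in the image
    y: float
    attribute: str      # one of UL, UR, LL, LR
    short_side: float   # short-side length of the rectangle the corner belongs to

# The Fig. 3 example would yield 8 such corners: one per attribute
# for each of the two text regions "Hello" and "World".
```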
Illustratively, the n corner points are divided into an upper left corner point group, a lower left corner point group, an upper right corner point group and a lower right corner point group, the numbers of corner points in these groups being n_i, i = 1, 2, 3, 4, respectively, where n_i ≥ 0. Combining the n corner points according to the position information to obtain the at least one candidate region comprises: selecting two matched corner points, one from each of the two corner point groups of any one of 4 corner point group pairs, and combining them into a corner point pair, wherein the 4 corner point group pairs are: the upper left group with the upper right group, the upper right group with the lower right group, the lower left group with the lower right group, and the upper left group with the lower left group; and determining the at least one candidate region based on the coordinates of the two corner points in each corner point pair obtained from the 4 corner point group pairs and the short-side lengths corresponding to the two corner points.
The n corner points may thus be divided into 4 groups by relative position attribute: an upper left group, a lower left group, an upper right group and a lower right group. The numbers of corner points in the groups are not necessarily equal, since the image may contain no target object and some corner points may be missed by detection. Suppose the numbers of corner points in the upper left, lower left, upper right and lower right groups are n_1, n_2, n_3 and n_4, respectively; for each group, n_i ≥ 0, i = 1, 2, 3, 4.
Continuing the example shown in fig. 3, note that in the above example the text detection algorithm does not directly output the rectangular regions 310; rather, it outputs the 8 corner points for sampling and combination. Any 2 corner points that can form a rectangular region are combined according to the corner coordinates, the relative position attributes and the short-side lengths.
It will be appreciated that any two corner points, combined with the short-side length, can define a rectangular region. Compared with randomly sampling 4 corner points to form a candidate region, sampling two corner points greatly reduces computational complexity. Specifically, there are 4 sampling combinations: (top left, top right), (top right, bottom right), (bottom left, bottom right) and (top left, bottom left). Illustratively, based on these four combinations, two corner points may be randomly selected from the corresponding corner point groups, and the selected corner points are then filtered. An exemplary filtering mechanism is described below.
Exemplarily, selecting two matched corner points from the two corner point groups of any one of the 4 corner point group pairs to form a corner point pair comprises: selecting one corner point from each of the two corner point groups of the corner point group pair; and judging whether the two selected corner points meet preset requirements, and if so, determining that they are matched and combining them into a corner point pair; wherein the preset requirements include: the difference between the short-side lengths corresponding to the two selected corner points is smaller than a preset difference; the two selected corner points satisfy a preset spatial position relation; and the short-side length of the rectangular region formed by combining the two selected corner points is within a preset range.
Corner points satisfying the following three conditions can be considered matched: 1. the short-side lengths corresponding to the two corner points are close, e.g., their difference is smaller than a preset difference; 2. the two corner points satisfy a preset (or reasonable) spatial position relation, e.g., the x coordinate of an upper left corner point is smaller than the x coordinate of its upper right partner; 3. the short-side length of the rectangular region formed by combining the two corner points is within a preset range, e.g., greater than a preset length threshold.
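A minimal sketch of this filtering predicate, assuming the hypothetical `Corner` structure above; the threshold values are illustrative placeholders, since the patent leaves the preset values open:

```python
def corners_match(a, b, max_side_diff=4.0, min_short_side=8.0):
    """Return True if corner points a and b satisfy the three conditions."""
    # 1. The short-side lengths attached to the two corners are close.
    if abs(a.short_side - b.short_side) >= max_side_diff:
        return False
    # 2. The corners satisfy a reasonable spatial relation, e.g. an
    #    upper-left corner must lie to the left of an upper-right one.
    if a.attribute == UL and b.attribute == UR and a.x >= b.x:
        return False
    # 3. The short side of the rectangle the pair would define is in a
    #    preset range (the averaged short side stands in for it here).
    if (a.short_side + b.short_side) / 2 <= min_short_side:
        return False
    return True
```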
For example, with reference to fig. 3, assume that the four corner points of the left rectangular region are upper left UL1, lower left LL1, upper right UR1 and lower right LR1, and the four corner points of the right rectangular region are upper left UL2, lower left LL2, upper right UR2 and lower right LR2. There may then be combinations such as (UL1, UR1), (UL1, UR2), (UR1, LR1), (LL1, LR2) and (UL1, LL1). Some combinations yield consistent (or substantially consistent) rectangular regions; for example, the combinations (UL1, UR1) and (LL1, LR1) both correspond to the rectangular region (UL1, LL1, UR1, LR1).
Through the above corner point combination, roughly three kinds of candidate regions can be obtained: (UL1, LL1, UR1, LR1), (UL1, LL1, UR2, LR2) and (UL2, LL2, UR2, LR2), denoted A1, A2 and A3, respectively (the number of candidate regions of each kind need not be 1; only one of each kind is shown in figs. 4a-4b described below). With more corner points there are more possible candidate regions, and the combination of corner points becomes more complex. As the above example shows, the number of candidate regions generally does not match the number of object regions actually containing the target object, so the at least one candidate region is subsequently filtered to find the candidate regions consistent with the actual object regions.
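As an illustration of the geometry involved, a matched (upper left, upper right) pair plus the short-side length suffices to complete a possibly rotated rectangle. The following sketch assumes image coordinates with y growing downward and the hypothetical `Corner` type from above:

```python
import numpy as np

def candidate_from_top_pair(ul, ur):
    """Complete a rectangle from a matched (upper-left, upper-right)
    corner pair; returns 4 corner coordinates ordered UL, UR, LR, LL."""
    p0 = np.array([ul.x, ul.y], dtype=float)
    p1 = np.array([ur.x, ur.y], dtype=float)
    top = p1 - p0                            # top edge (long direction)
    normal = np.array([-top[1], top[0]])     # perpendicular, pointing
    normal /= np.linalg.norm(normal)         # "down" into the rectangle
    h = (ul.short_side + ur.short_side) / 2  # short-side length estimate
    return np.stack([p0, p1, p1 + h * normal, p0 + h * normal])
```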
In step S230, for each of at least one candidate region, the candidate region is divided into m sub-regions, where m is a positive integer greater than or equal to 2.
Figs. 4a and 4b are schematic diagrams illustrating the division of the candidate regions of the image to be detected shown in fig. 3, according to an example; fig. 4a illustrates the division of the candidate regions A1 and A3, and fig. 4b illustrates the division of the candidate region A2. In the example shown in figs. 4a and 4b, each candidate region is equally divided into four sub-regions, i.e., m = 4. The following description uses the example of m = 4; however, this does not limit the invention, and the number of sub-regions into which each candidate region is divided, as well as the manner of division, may be determined as needed. For example, each candidate region may be divided non-uniformly into a number of sub-regions other than 4. Each divided sub-region has a relative position attribute within the candidate region to which it belongs; for example, the candidate region A1 includes four sub-regions whose relative position attributes are upper left, lower left, upper right and lower right, respectively.
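A sketch of the 2×2 division used in the example, written so that it also covers rotated candidate regions by splitting the quadrilateral at its edge midpoints; the ordering convention is an assumption chosen to match figs. 5a-5d:

```python
import numpy as np

def split_region(quad, m=4):
    """Split a quadrilateral (4x2 array ordered UL, UR, LR, LL) into
    four sub-quadrilaterals, returned in the order UL, UR, LL, LR."""
    assert m == 4, "only the 2x2 split used in the example is sketched"
    ul, ur, lr, ll = np.asarray(quad, dtype=float)
    top, bottom = (ul + ur) / 2, (ll + lr) / 2   # edge midpoints
    left, right = (ul + ll) / 2, (ur + lr) / 2
    center = (ul + ur + lr + ll) / 4
    return [np.stack(q) for q in (
        (ul, top, center, left),      # upper-left sub-region
        (top, ur, right, center),     # upper-right sub-region
        (left, center, bottom, ll),   # lower-left sub-region
        (center, right, lr, bottom),  # lower-right sub-region
    )]
```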
In step S240, for each of the at least one candidate region, a sub-region probability that the target object exists in the m sub-regions is calculated, respectively.
Exemplarily, step S240 may include: performing object detection on the image to be detected to obtain m probability maps, the m probability maps having m relative position attributes that correspond one-to-one with the relative position attributes of the m sub-regions of each candidate region, wherein each of the m probability maps indicates, for the part of each predicted object region that has the relative position attribute of that probability map, the probability that each pixel belongs to the target object. Step S240 may then include: for each of the at least one candidate region, mapping the m sub-regions onto the m probability maps according to the correspondence of relative position attributes, to determine, for every pixel in each of the m sub-regions, the probability that the pixel belongs to the target object; and for each of the m sub-regions, averaging the probabilities of all pixels in the sub-region to obtain the sub-region probability of that sub-region. A predicted object region is an object region predicted by the algorithm; it may not completely coincide with the actual object region, although the drawings herein show them as coinciding.
Illustratively, m probability maps (which may also be called object segmentation probability maps) can also be obtained by performing object detection on the image to be detected; the number of probability maps thus coincides with the number of sub-regions into which each candidate region is divided.
In one example, the operations of obtaining the position information of the target object and obtaining the m probability maps may be performed using a trained fully convolutional network. After the image to be detected is input into the fully convolutional network, the network can output the position information of the target object and the m probability maps. The size of each probability map may be identical to the size of the input image to be detected.
Figs. 5a-5d respectively show schematic diagrams of the 4 probability maps obtained by detection on the image to be detected shown in fig. 3, according to an embodiment of the present invention. The probability maps described herein are position-sensitive segmentation probability maps, each with its own relative position attribute. For example, the probability map shown in fig. 5a has the relative position attribute "upper left"; that is, it indicates the probability that each pixel in the upper left part of every predicted text region in the image to be detected belongs to text. In the probability map shown in fig. 5a, the value of each pixel corresponding to the upper left part (indicated by hatching) of the two text regions "Hello" and "World" of fig. 3 represents the probability that the pixel belongs to text (figs. 5a-5d assume that the actual and predicted text regions coincide). Ideally, a pixel in the upper left part of a text region takes the value 1 if it belongs to text and 0 if it does not. Pixels outside the upper left parts of the text regions (indicated by white) are set to a default value, for example 0, regardless of whether the corresponding pixels in the image to be detected belong to text. In practice, the value of each pixel in the probability map shown in fig. 5a may lie in the range [0, 1].
The probability maps shown in fig. 5b-5d are probability maps with relative position attributes of upper right, lower left, and lower right, respectively, and the principle is similar to that in fig. 5a, and will not be described again.
When training the fully convolutional network, the position of each text region in a sample image may be labeled (e.g., by a rectangular box), and the position of the upper left part of each text region may also be labeled (likewise by a rectangular box); a fully convolutional network capable of producing the probability map shown in fig. 5a can thus be trained. The training principle for the probability maps shown in figs. 5b-5d is similar to that for fig. 5a and is not repeated.
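A sketch of how such labels could be rasterized into training targets, under the same assumptions as the earlier sketches (it reuses the hypothetical `split_region` helper, and OpenCV is assumed purely for rasterization; the patent does not prescribe this construction):

```python
import numpy as np
import cv2

def make_position_sensitive_targets(image_shape, text_quads):
    """Build 4 target maps (UL, UR, LL, LR as in figs. 5a-5d): pixels in
    the matching quadrant of each labeled text region are set to 1."""
    maps = [np.zeros(image_shape[:2], dtype=np.float32) for _ in range(4)]
    for quad in text_quads:
        for target, sub in zip(maps, split_region(quad, m=4)):
            mask = np.zeros(image_shape[:2], dtype=np.uint8)
            cv2.fillPoly(mask, [np.round(sub).astype(np.int32)], 1)
            target[mask.astype(bool)] = 1.0
    return maps
```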
The m sub-regions of each candidate region may be mapped to the m probability maps in one-to-one correspondence. For example, the upper left sub-region of the candidate region A1 may be mapped to the upper-left probability map shown in fig. 5a to obtain the probability that each pixel in that sub-region belongs to text. The probabilities of all pixels in the upper left sub-region of the candidate region A1 may then be averaged, and the resulting mean is used as the sub-region probability of the upper left sub-region. Similarly, the upper right, lower left and lower right sub-regions of the candidate region A1 may be mapped to the probability maps shown in figs. 5b-5d, respectively, to determine the sub-region probability of each sub-region.
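A sketch of this mapping-and-averaging step, again using a rasterized polygon mask (OpenCV assumed purely for rasterization, and `split_region` from the earlier sketch):

```python
import numpy as np
import cv2

def sub_region_probability(sub_quad, prob_map):
    """Average the probability-map values over the pixels inside one
    sub-region (the map must match the sub-region's position attribute)."""
    mask = np.zeros(prob_map.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [np.round(sub_quad).astype(np.int32)], 1)
    inside = mask.astype(bool)
    return float(prob_map[inside].mean()) if inside.any() else 0.0

def candidate_region_probability(quad, prob_maps):
    """Step S250: mean of the m sub-region probabilities, pairing
    sub-regions with probability maps by relative position attribute."""
    subs = split_region(quad, m=len(prob_maps))
    return float(np.mean([sub_region_probability(s, p)
                          for s, p in zip(subs, prob_maps)]))
```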
In step S250, for each of at least one candidate region, a candidate region probability that the target object exists in the candidate region is calculated according to all the sub-region probabilities of the candidate region.
Exemplarily, step S250 may include: for each of the at least one candidate region, averaging all the sub-region probabilities of the candidate region to obtain the candidate region probability of that candidate region.
For example, the candidate region probability of the candidate region A1 may be obtained by averaging the 4 sub-region probabilities of the candidate region A1. Similarly, similar operations may be performed on the candidate regions A2, A3 to obtain a candidate region probability for each candidate region.
In step S260, a candidate region meeting a preset rule is selected according to the candidate region probability of at least one candidate region, so as to obtain an object detection result.
Optionally, the preset rule may include a requirement of filtering by candidate region probability and performing non-maximum suppression on the remaining candidate regions. Exemplarily, step S260 may include: selecting, from the at least one candidate region, the candidate regions whose candidate region probability is greater than a preset probability threshold; and performing non-maximum suppression on the selected candidate regions to obtain the suppressed candidate regions as the object detection result.
The candidate region probability of each candidate region may be used as a score with which to filter the candidate regions. For example, candidate regions with a candidate region probability exceeding 0.65 may be retained, and candidate regions below 0.65 filtered out. After filtering, non-maximum suppression (NMS) may be performed on the remaining candidate regions to suppress redundant ones. The implementation of non-maximum suppression can be understood by those skilled in the art and is not described here. The candidate regions remaining after suppression are the final object detection result.
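For completeness, a standard greedy NMS sketch over axis-aligned boxes (x1, y1, x2, y2) with scores; the patent does not specify its NMS variant, and rotated candidates would need a rotated IoU in place of the box IoU used here:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_maximum_suppression(scored_boxes, iou_threshold=0.5):
    """Keep each box unless it overlaps a kept, higher-scoring one."""
    kept = []
    for box, prob in sorted(scored_boxes, key=lambda bp: bp[1], reverse=True):
        if all(iou(box, k) < iou_threshold for k, _ in kept):
            kept.append((box, prob))
    return kept
```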
It can be understood that when each sub-region of the candidate region A1 or A3 is mapped onto its corresponding probability map, few of its pixels fall on the white region of the probability map, so the calculated candidate region probability is relatively high. In contrast, the candidate region A2 covers the two text regions "Hello" and "World" as well as the blank area between them; when each of its sub-regions is mapped onto the corresponding probability map, more pixels fall on the white region, which lowers the probability mean of each sub-region and thus the candidate region probability of A2. The calculated candidate region probability of A2 is therefore lower than those of A1 and A3, and A2 can be filtered out by setting a suitable preset probability threshold. Dividing candidate regions into sub-regions and computing the candidate region probability from sub-region probabilities is thus a position-sensitive object detection method, which greatly reduces wrong object grouping, e.g., two text instances being detected as one. At the same time, this position-sensitive approach improves detection accuracy; because of the position sensitivity, the detection effect is well preserved even when the object is rotated or deformed, improving the robustness of object detection. When the target object is text, the method can reliably detect text of arbitrary orientation in natural scenes.
Illustratively, the object detection method according to embodiments of the present invention may be implemented in a device, apparatus, or system having a memory and a processor.
According to the object detection method provided by the embodiments of the present invention, the candidate region probability is calculated from the sub-region probabilities of each candidate region, and the candidate regions are then screened by candidate region probability to obtain the object detection result. The object detection method provided by the embodiments of the present invention can effectively handle rotated and overly long objects (such as text).
The object detection method can be deployed at an image acquisition end, for example, the object detection method can be deployed at the image acquisition end of an access control system in the field of security application; in the field of financial applications, it may be deployed at personal terminals such as smart phones, tablets, personal computers, and the like.
Alternatively, the object detection method according to the embodiment of the present invention may also be distributively deployed at the server side and the personal terminal side. For example, in the security application field, an image may be collected at an image collection end, the image collection end transmits the collected image to a server end (or a cloud end), and the server end (or the cloud end) performs object detection.
Illustratively, the above steps S230-S250 may be implemented using a rotated position-sensitive region-of-interest average pooling layer (rotated position-sensitive ROI average pooling layer). This pooling handles rotated candidate regions well, further improving the reliability of the object detection method on the text rotation problem. The at least one candidate region and the m probability maps may be input into the rotated position-sensitive region-of-interest average pooling layer, which may output the candidate region probability of each candidate region. Illustratively, when training the above fully convolutional network, the rotated position-sensitive region-of-interest average pooling layer may be trained simultaneously.
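Functionally, and under the assumptions of the earlier sketches, the forward behaviour of such a layer can be emulated by applying the polygon-mask averaging to every candidate; this sketches the inference path only, not the trainable layer from the patent:

```python
def rotated_ps_roi_average_pool(candidate_quads, prob_maps):
    """Emulate the rotated position-sensitive ROI average pooling:
    one candidate region probability per (possibly rotated) candidate."""
    return [candidate_region_probability(quad, prob_maps)
            for quad in candidate_quads]
```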
According to another aspect of the present invention, an object detecting apparatus is provided. Fig. 6 shows a schematic block diagram of an object detection apparatus 600 according to an embodiment of the present invention.
As shown in fig. 6, the object detection apparatus 600 according to an embodiment of the present invention includes an obtaining module 610, a detection module 620, a calculation module 630 and a selection module 640. The modules may respectively perform the steps/functions of the object detection method described above in connection with figs. 2-5d. Only the main functions of the components of the object detection apparatus 600 are described below; details already described above are omitted.
The obtaining module 610 is configured to obtain an image to be detected. The obtaining module 610 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
The detection module 620 is configured to perform object detection on the image to be detected to determine at least one candidate region indicating the position of a target object. The detection module 620 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
The calculation module 630 is configured to, for each of the at least one candidate region, divide the candidate region into m sub-regions, where m is a positive integer greater than or equal to 2; calculate, for each of the m sub-regions, the sub-region probability that the target object is present; and calculate, from all the sub-region probabilities of the candidate region, the candidate region probability that the target object is present in the candidate region. The calculation module 630 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
The selection module 640 is configured to select, according to the candidate region probability of each of the at least one candidate region, the candidate regions that meet a preset rule to obtain an object detection result. The selection module 640 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
Illustratively, the detection module 620 is specifically configured to: perform object detection on the image to be detected to obtain position information related to the position of the target object, wherein the position information comprises the coordinates of n corner points, the relative position attribute of each of the n corner points within its corresponding rectangular region, and the short-side length of the rectangular region corresponding to each of the n corner points, n being an integer; and combine the n corner points according to the position information to obtain the at least one candidate region.
Illustratively, the n corner points are divided into an upper left corner point group, a lower left corner point group, an upper right corner point group and a lower right corner point group, the numbers of corner points in these groups being n_i, i = 1, 2, 3, 4, respectively, where n_i ≥ 0, and the detection module 620 is specifically further configured to: select two matched corner points, one from each of the two corner point groups of any one of 4 corner point group pairs, and combine them into a corner point pair, wherein the 4 corner point group pairs are: the upper left group with the upper right group, the upper right group with the lower right group, the lower left group with the lower right group, and the upper left group with the lower left group; and determine the at least one candidate region based on the coordinates of the two corner points in each corner point pair obtained from the 4 corner point group pairs and the short-side lengths corresponding to the two corner points.
Exemplarily, the detection module 620 is further specifically configured to: select one corner point from each of the two corner point groups of any one of the 4 corner point group pairs; and judge whether the two selected corner points meet preset requirements, and if so, determine that they are matched and combine them into a corner point pair; wherein the preset requirements include: the difference between the short-side lengths corresponding to the two selected corner points is smaller than a preset difference; the two selected corner points satisfy a preset spatial position relation; and the short-side length of the rectangular region formed by combining the two selected corner points is within a preset range.
Illustratively, the detection module 620 is specifically configured to: process the image to be detected using a fully convolutional network to obtain the position information.
Exemplarily, the detection module 620 is further specifically configured to: perform object detection on the image to be detected to obtain m probability maps, the m probability maps having m relative position attributes that correspond one-to-one with the relative position attributes of the m sub-regions of each candidate region, wherein each of the m probability maps indicates, for the part of each predicted object region that has the relative position attribute of that probability map, the probability that each pixel belongs to the target object. For each of the at least one candidate region, calculating the sub-region probabilities for the m sub-regions respectively comprises: mapping the m sub-regions onto the m probability maps according to the correspondence of relative position attributes, to determine, for every pixel in each of the m sub-regions, the probability that the pixel belongs to the target object; and for each of the m sub-regions, averaging the probabilities of all pixels in the sub-region to obtain the sub-region probability of that sub-region.
Illustratively, the detection module 620 is specifically configured to: process the image to be detected using the fully convolutional network to obtain the position information and the m probability maps.
Illustratively, the calculation module 630 is specifically configured to: for each of the at least one candidate region, average all the sub-region probabilities of the candidate region to obtain the candidate region probability of that candidate region.
Illustratively, the selection module 640 is specifically configured to: select, from the at least one candidate region, the candidate regions whose candidate region probability is greater than a preset probability threshold; and perform non-maximum suppression on the selected candidate regions to obtain the suppressed candidate regions as the object detection result.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
FIG. 7 shows a schematic block diagram of an object detection system 700 according to one embodiment of the present invention. Object detection system 700 includes an image acquisition device 710, a storage device 720, and a processor 730.
The image capturing device 710 is used for capturing an image to be detected. Image acquisition device 710 is optional and object detection system 700 may not include image acquisition device 710. In this case, an image to be detected may be acquired using other image acquisition devices and the acquired image may be transmitted to the object detection system 700.
The storage means 720 stores computer program instructions for implementing the respective steps in the object detection method according to an embodiment of the present invention.
The processor 730 is configured to run the computer program instructions stored in the storage 720 to execute the corresponding steps of the object detection method according to the embodiment of the present invention, and is configured to implement the obtaining module 610, the detecting module 620, the calculating module 630 and the selecting module 640 in the object detection apparatus 600 according to the embodiment of the present invention.
In one embodiment, the computer program instructions, when executed by the processor 730, are used for performing the following steps: acquiring an image to be detected; performing object detection on the image to be detected to determine at least one candidate region indicating the position of a target object; for each of the at least one candidate region, dividing the candidate region into m sub-regions, where m is a positive integer greater than or equal to 2; calculating, for each of the m sub-regions, the sub-region probability that the target object is present; calculating, from all the sub-region probabilities of the candidate region, the candidate region probability that the target object is present in the candidate region; and selecting, according to the candidate region probabilities of the at least one candidate region, the candidate regions that meet a preset rule to obtain an object detection result.
Illustratively, the step, performed when the computer program instructions are executed by the processor 730, of performing object detection on the image to be detected to determine at least one candidate region indicating the position of the target object includes: performing object detection on the image to be detected to obtain position information related to the position of the target object, wherein the position information comprises the coordinates of n corner points, the relative position attribute of each of the n corner points within its corresponding rectangular region, and the short-side length of the rectangular region corresponding to each of the n corner points, n being an integer; and combining the n corner points according to the position information to obtain the at least one candidate region.
Illustratively, the n corner points are divided into an upper left corner point group, a lower left corner point group, an upper right corner point group and a lower right corner point group, the numbers of corner points in these groups being n_i, i = 1, 2, 3, 4, respectively, where n_i ≥ 0, and the step, performed when the computer program instructions are executed by the processor 730, of combining the n corner points based on the position information to obtain at least one candidate region comprises: selecting two matched corner points, one from each of the two corner point groups of any one of 4 corner point group pairs, and combining them into a corner point pair, wherein the 4 corner point group pairs are: the upper left group with the upper right group, the upper right group with the lower right group, the lower left group with the lower right group, and the upper left group with the lower left group; and determining the at least one candidate region based on the coordinates of the two corner points in each corner point pair obtained from the 4 corner point group pairs and the short-side lengths corresponding to the two corner points.
Illustratively, the step, performed when the computer program instructions are executed by the processor 730, of selecting two matched corner points from the two corner point groups of any one of the 4 corner point group pairs to form a corner point pair comprises: selecting one corner point from each of the two corner point groups of the corner point group pair; and judging whether the two selected corner points meet preset requirements, and if so, determining that they are matched and combining them into a corner point pair; wherein the preset requirements include: the difference between the short-side lengths corresponding to the two selected corner points is smaller than a preset difference; the two selected corner points satisfy a preset spatial position relation; and the short-side length of the rectangular region formed by combining the two selected corner points is within a preset range.
Illustratively, the step, executed by the processor 730, of performing object detection on the image to be detected to obtain the position information related to the position of the target object includes: processing the image to be detected with a fully convolutional network to obtain the position information.
For example, the step, executed by the processor 730, of performing object detection on the image to be detected to determine at least one candidate region indicating the position of the target object further includes: performing object detection on the image to be detected to obtain m probability maps, where the m probability maps carry m relative position attributes that correspond one-to-one to the relative position attributes of the m sub-regions within each candidate region of the at least one candidate region, and each of the m probability maps indicates, for each predicted object region in which the target object may be located, the probability that the pixels at the corresponding relative position belong to the target object. The step, executed by the processor 730, of calculating, for each of the at least one candidate region, the sub-region probabilities that the target object exists in the m sub-regions includes: for each of the at least one candidate region, mapping the m sub-regions to the m probability maps according to the correspondence of relative position attributes, so as to determine the probability that each pixel in each of the m sub-regions belongs to the target object; and, for each of the m sub-regions, averaging the probabilities of all pixels in that sub-region to obtain its sub-region probability.
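The mapping-and-averaging step might look as follows, assuming the m probability maps are stacked into an array of shape (m, H, W) and the m sub-region boxes are ordered so that box i shares the relative position attribute of map i; both assumptions belong to the sketch, not to the embodiment.

```python
import numpy as np

def subregion_probabilities(sub_boxes, prob_maps):
    """sub_boxes : m boxes (x0, y0, x1, y1), ordered by relative position.
    prob_maps : array of shape (m, H, W); map i matches sub-region i.
    Returns the m sub-region probabilities (mean pixel probability each)."""
    probs = []
    for i, (x0, y0, x1, y1) in enumerate(sub_boxes):
        # Read each sub-region from the probability map with the matching
        # relative position attribute, then average over its pixels.
        patch = prob_maps[i, y0:y1, x0:x1]
        probs.append(float(patch.mean()) if patch.size else 0.0)
    return probs
```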
Illustratively, the step, executed by the processor 730, of processing the image to be detected with a fully convolutional network to obtain the position information includes: processing the image to be detected with a fully convolutional network to obtain both the position information and the m probability maps.
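One way such a network could expose both outputs is a shared backbone feeding two convolutional heads, as in the PyTorch sketch below. The channel layout (a score, two offsets and a short-side length for each of the 4 corner types) and all sizes are assumptions of the sketch; the embodiment does not fix a particular architecture.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Illustrative fully convolutional head with two branches: one for the
    corner position information, one for the m position-sensitive maps."""
    def __init__(self, in_ch=256, m=4):
        super().__init__()
        # 4 corner types x (score, dx, dy, short side) = 16 channels.
        self.corner_branch = nn.Conv2d(in_ch, 4 * 4, kernel_size=1)
        self.prob_branch = nn.Conv2d(in_ch, m, kernel_size=1)

    def forward(self, feats):
        corners = self.corner_branch(feats)                 # B x 16 x H x W
        prob_maps = torch.sigmoid(self.prob_branch(feats))  # B x m x H x W
        return corners, prob_maps
```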
Illustratively, the step, executed by the processor 730, of calculating, for each of the at least one candidate region, the candidate region probability that the target object exists in the candidate region from all the sub-region probabilities of the candidate region includes: for each of the at least one candidate region, averaging the probabilities of all sub-regions of the candidate region to obtain the candidate region probability of that candidate region.
For example, the step, executed by the processor 730, of selecting, according to the candidate region probability of each of the at least one candidate region, the candidate regions that meet the preset rule to obtain the object detection result includes: selecting, from the at least one candidate region, the candidate regions whose candidate region probability exceeds a preset probability threshold; and performing non-maximum suppression on the selected candidate regions to obtain the suppressed candidate regions as the object detection result.
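The selection rule could be sketched as below: keep the candidates above the probability threshold, then apply greedy non-maximum suppression. The IoU threshold of 0.5 is an illustrative default, not a value specified by the embodiment.

```python
def select_detections(scored_boxes, prob_threshold=0.7, iou_threshold=0.5):
    """scored_boxes: list of ((x0, y0, x1, y1), probability) pairs."""
    def iou(a, b):
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        ix0, iy0 = max(ax0, bx0), max(ay0, by0)
        ix1, iy1 = min(ax1, bx1), min(ay1, by1)
        inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
        union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
        return inter / union if union else 0.0

    # Keep only candidates whose probability exceeds the preset threshold.
    boxes = sorted((b for b in scored_boxes if b[1] > prob_threshold),
                   key=lambda t: t[1], reverse=True)
    kept = []
    while boxes:  # greedy non-maximum suppression
        best = boxes.pop(0)
        kept.append(best)
        boxes = [b for b in boxes if iou(best[0], b[0]) < iou_threshold]
    return kept
```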
Further, according to an embodiment of the present invention, there is also provided a storage medium on which program instructions are stored, which when executed by a computer or a processor are used for executing the respective steps of the object detection method according to an embodiment of the present invention and for implementing the respective modules in the object detection apparatus according to an embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the above storage media.
In one embodiment, the program instructions, when executed by a computer or a processor, may cause the computer or the processor to implement the respective functional modules of the object detection apparatus according to the embodiment of the present invention and/or may perform the object detection method according to the embodiment of the present invention.
In one embodiment, the program instructions, when run, are used to perform the following steps: acquiring an image to be detected; performing object detection on the image to be detected to determine at least one candidate region indicating the position of a target object; for each of the at least one candidate region, dividing the candidate region into m sub-regions, where m is a positive integer greater than or equal to 2; calculating, for each of the m sub-regions, the sub-region probability that the target object exists in that sub-region; calculating, from all the sub-region probabilities of the candidate region, the candidate region probability that the target object exists in the candidate region; and selecting, according to the candidate region probabilities of the at least one candidate region, the candidate regions that meet the preset rule, so as to obtain the object detection result.

For example, the step, executed when the program instructions are run, of performing object detection on the image to be detected to determine at least one candidate region indicating the position of the target object includes: performing object detection on the image to be detected to obtain position information related to the position of the target object, where the position information includes the coordinates of n corner points, the relative position attribute of each of the n corner points within its corresponding rectangular region, and the side length of the short side of the rectangular region corresponding to each of the n corner points, n being an integer; and combining the n corner points according to the position information to obtain the at least one candidate region.

Illustratively, the n corner points are divided into an upper-left corner point group, a lower-left corner point group, an upper-right corner point group and a lower-right corner point group, where the numbers of corner points in these four groups are n_i, i = 1, 2, 3, 4, respectively, with n_i ≥ 0. The step, executed when the program instructions are run, of combining the n corner points according to the position information to obtain the at least one candidate region includes: selecting two matched corner points, one from each of the two corner point groups of any one of the 4 corner point group pairs, to form a corner point pair, where the 4 corner point group pairs are: the upper-left group paired with the upper-right group, the upper-right group paired with the lower-right group, the lower-left group paired with the lower-right group, and the upper-left group paired with the lower-left group; and determining the at least one candidate region based on the coordinates of the two corner points in each corner point pair obtained from the 4 corner point group pairs and the side lengths of the short sides corresponding to those two corner points.

Illustratively, the step, executed when the program instructions are run, of selecting two matched corner points from the two corner point groups of any one of the 4 corner point group pairs to form a corner point pair includes: selecting one corner point from each of the two corner point groups of the pair; and judging whether the two selected corner points meet the preset requirements, and if so, determining that they are matched and combining them into a corner point pair. The preset requirements include: the difference between the short-side lengths corresponding to the two selected corner points is smaller than a preset difference; the two selected corner points satisfy a preset spatial position relation; and the short-side length of the rectangular region formed by combining the two selected corner points falls within a preset range.

Illustratively, the step, executed when the program instructions are run, of performing object detection on the image to be detected to obtain the position information related to the position of the target object includes: processing the image to be detected with a fully convolutional network to obtain the position information.

For example, the step, executed when the program instructions are run, of performing object detection on the image to be detected to determine at least one candidate region indicating the position of the target object further includes: performing object detection on the image to be detected to obtain m probability maps, where the m probability maps carry m relative position attributes that correspond one-to-one to the relative position attributes of the m sub-regions within each candidate region of the at least one candidate region, and each of the m probability maps indicates, for each predicted object region in which the target object may be located, the probability that the pixels at the corresponding relative position belong to the target object. The step, executed when the program instructions are run, of calculating, for each of the at least one candidate region, the sub-region probabilities that the target object exists in the m sub-regions includes: for each of the at least one candidate region, mapping the m sub-regions to the m probability maps according to the correspondence of relative position attributes, so as to determine the probability that each pixel in each of the m sub-regions belongs to the target object; and, for each of the m sub-regions, averaging the probabilities of all pixels in that sub-region to obtain its sub-region probability.

Illustratively, the step, executed when the program instructions are run, of processing the image to be detected with a fully convolutional network to obtain the position information includes: processing the image to be detected with a fully convolutional network to obtain both the position information and the m probability maps.

Illustratively, the step of calculating, for each of the at least one candidate region, the candidate region probability that the target object exists in the candidate region from all the sub-region probabilities of the candidate region includes: for each of the at least one candidate region, averaging the probabilities of all sub-regions of the candidate region to obtain the candidate region probability of that candidate region.

Illustratively, the step, executed when the program instructions are run, of selecting, according to the candidate region probability of each of the at least one candidate region, the candidate regions that meet the preset rule to obtain the object detection result includes: selecting, from the at least one candidate region, the candidate regions whose candidate region probability exceeds a preset probability threshold; and performing non-maximum suppression on the selected candidate regions to obtain the suppressed candidate regions as the object detection result.
The modules in the object detection system according to the embodiment of the present invention may be implemented by having the processor of the electronic device for object detection according to the embodiment of the present invention run computer program instructions stored in the memory, or they may be implemented when the computer instructions stored in the computer-readable storage medium of the computer program product according to the embodiment of the present invention are run by a computer.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the method of the present invention should not be construed to reflect the intent: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments, not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that some or all of the functionality of some of the modules in an object detection apparatus according to embodiments of the present invention may be implemented in practice using a microprocessor, digital signal processor (DSP), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), or the like. The present invention may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
The above description concerns only specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. An object detection method, comprising:
acquiring an image to be detected;
performing object detection on the image to be detected to determine at least one candidate region for indicating the position of a target object;
for each of the at least one candidate region,
dividing the candidate area into m sub-areas, wherein m is a positive integer greater than or equal to 2;
respectively calculating the sub-region probability of the target object existing in the m sub-regions;
calculating the candidate region probability of the target object in the candidate region according to all the sub-region probabilities of the candidate region;
selecting, according to the candidate region probability of each candidate region, a candidate region that meets a preset rule, so as to obtain an object detection result;
the object detection of the image to be detected to determine at least one candidate region for indicating the position of the target object includes:
performing object detection on the image to be detected to obtain position information related to the position of the target object, wherein the position information comprises coordinates of n corner points, relative position attributes of the n corner points in the corresponding rectangular region, and the length of the short side of the rectangular region corresponding to the n corner points, and n is an integer; and
combining the n corner points according to the position information to obtain the at least one candidate region.
2. The method of claim 1, wherein the n corner points are divided into an upper left corner point group, a lower left corner point group, an upper right corner point group and a lower right corner point group, wherein the numbers of corner points in the upper left corner point group, the lower left corner point group, the upper right corner point group and the lower right corner point group are n_i, i = 1, 2, 3, 4, respectively, wherein n_i ≥ 0,
The combining the n corner points according to the location information to obtain the at least one candidate region comprises:
selecting two matched corner points from the two corner point groups of any one of 4 corner point group pairs to form a corner point pair, wherein the 4 corner point group pairs comprise a pair consisting of the upper left corner point group and the upper right corner point group, a pair consisting of the upper right corner point group and the lower right corner point group, a pair consisting of the lower left corner point group and the lower right corner point group, and a pair consisting of the upper left corner point group and the lower left corner point group; and
determining the at least one candidate region based on the coordinates of the two corner points in each corner point pair obtained by combining over the 4 corner point group pairs and the side lengths of the short sides corresponding to the two corner points.
3. The method of claim 2, wherein said selecting two matched corner points from the two corner point groups of any one of the 4 corner point group pairs to form a corner point pair comprises:
selecting two corner points from two corner point groups of any one of the 4 corner point group pairs respectively;
judging whether the two selected corner points meet preset requirements or not, if so, determining that the two selected corner points are matched corner points and combining the two selected corner points into a corner point pair;
wherein the preset requirements include:
the difference between the side lengths of the short sides corresponding to the two selected corner points is smaller than a preset difference;
the two selected corner points satisfy a preset spatial position relationship; and
the side length of the short side of the rectangular region formed by combining the two selected corner points is within a preset range.
4. The method as claimed in claim 1, wherein said performing object detection on said image to be detected to obtain position information related to a position where said target object is located comprises:
and processing the image to be detected by utilizing a full convolution network to obtain the position information.
5. The method of any one of claims 1 to 3, wherein said performing object detection on the image to be detected to determine at least one candidate region indicating a location of a target object further comprises:
performing object detection on the image to be detected to obtain m probability maps, wherein the m probability maps respectively have m relative position attributes, the relative position attributes of the m probability maps correspond one-to-one to, and are consistent with, the relative position attributes, within the candidate region, of the m sub-regions of each candidate region of the at least one candidate region, and each of the m probability maps is used for indicating the probability that a pixel at the relative position attribute corresponding to that probability map, within each predicted object region where the target object in the image to be detected is located, belongs to the target object;
the calculating, for each of the at least one candidate region, a sub-region probability that the target object exists for the m sub-regions respectively comprises:
for each of the at least one candidate region,
mapping the m sub-regions to the m probability maps, respectively, based on a correspondence of relative position attributes, to determine a probability that all pixels in each of the m sub-regions belong to the target object; and
for each sub-region of the m sub-regions, the probabilities that all pixels in the sub-region belong to the target object are averaged to obtain a sub-region probability for the sub-region.
6. The method as claimed in claim 5, wherein said performing object detection on said image to be detected to obtain position information related to a position of said target object comprises:
and processing the image to be detected by utilizing a full convolution network to obtain the position information.
7. The method of claim 6, wherein the processing the image to be detected using a full convolution network to obtain the position information comprises:
and processing the image to be detected by utilizing a full convolution network so as to obtain the position information and the m probability maps.
8. The method of any one of claims 1 to 4, wherein said calculating, for each of said at least one candidate region, a candidate region probability that the target object exists for the candidate region based on all sub-region probabilities for the candidate region comprises:
for each of the at least one candidate region, the probabilities of all sub-regions of the candidate region are averaged to obtain a candidate region probability for the candidate region.
9. The method according to any one of claims 1 to 4, wherein the selecting the candidate region according to the candidate region probability of each of the at least one candidate region to obtain the object detection result comprises:
selecting a candidate region with a candidate region probability greater than a preset probability threshold from the at least one candidate region; and
and performing non-maximum suppression on the selected candidate region to obtain a suppressed candidate region as the object detection result.
10. An object detecting apparatus comprising:
the acquisition module is used for acquiring an image to be detected;
the detection module is used for carrying out object detection on the image to be detected so as to determine at least one candidate region for indicating the position of the target object;
a calculating module, configured to divide each of the at least one candidate region into m sub-regions, where m is a positive integer greater than or equal to 2; respectively calculating the sub-region probability of the target object existing in the m sub-regions; calculating the probability of the candidate region with the target object according to the probability of all the sub-regions of the candidate region;
the selection module is used for selecting a candidate region which accords with a preset rule according to the probability of the respective candidate region of the at least one candidate region so as to obtain an object detection result;
wherein the detection module is specifically configured to:
performing object detection on the image to be detected to obtain position information related to the position of the target object, wherein the position information comprises coordinates of n corner points, relative position attributes of the n corner points in the corresponding rectangular area and the length of the short side of the rectangular area corresponding to the n corner points, and n is an integer; and
and combining the n corner points according to the position information to obtain the at least one candidate region.
11. An object detection system comprising a processor and a memory, wherein the memory has stored therein computer program instructions for execution by the processor for performing the object detection method of any one of claims 1-9.
12. A storage medium having stored thereon program instructions for performing, when executed, the object detection method of any one of claims 1-9.
CN201810005939.5A 2018-01-03 2018-01-03 Object detection method, device and system and storage medium Active CN108875723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810005939.5A CN108875723B (en) 2018-01-03 2018-01-03 Object detection method, device and system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810005939.5A CN108875723B (en) 2018-01-03 2018-01-03 Object detection method, device and system and storage medium

Publications (2)

Publication Number Publication Date
CN108875723A CN108875723A (en) 2018-11-23
CN108875723B (en) 2023-01-06

Family

ID=64325950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810005939.5A Active CN108875723B (en) 2018-01-03 2018-01-03 Object detection method, device and system and storage medium

Country Status (1)

Country Link
CN (1) CN108875723B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259878A (en) * 2018-11-30 2020-06-09 中移(杭州)信息技术有限公司 Method and equipment for detecting text
CN109784147A (en) * 2018-12-05 2019-05-21 北京达佳互联信息技术有限公司 Critical point detection method, apparatus, electronic equipment and storage medium
CN110334752B (en) * 2019-06-26 2022-11-08 电子科技大学 Irregular-shape object detection method based on trapezoidal convolution
CN110321854B (en) * 2019-07-05 2021-12-24 阿波罗智联(北京)科技有限公司 Method and apparatus for detecting target object
CN112767723B (en) * 2019-11-05 2022-04-22 深圳市大富科技股份有限公司 Road condition detection method, computer storage device, vehicle-mounted terminal and vehicle
CN111372122B (en) * 2020-02-27 2022-03-15 腾讯科技(深圳)有限公司 Media content implantation method, model training method and related device
CN111680685B (en) * 2020-04-14 2023-06-06 上海高仙自动化科技发展有限公司 Positioning method and device based on image, electronic equipment and storage medium
CN111681284A (en) * 2020-06-09 2020-09-18 商汤集团有限公司 Corner point detection method and device, electronic equipment and storage medium
CN111797932B (en) * 2020-07-10 2023-11-14 抖音视界有限公司 Image classification method, apparatus, device and computer readable medium
CN112132163B (en) * 2020-09-21 2024-04-02 杭州睿琪软件有限公司 Method, system and computer readable storage medium for identifying object edges
CN112966609B (en) * 2021-03-05 2023-08-11 北京百度网讯科技有限公司 Target detection method and device
CN118015029A (en) * 2022-07-18 2024-05-10 宁德时代新能源科技股份有限公司 Method and device for detecting corner points of tabs and storage medium
CN117152890B (en) * 2023-03-22 2024-03-08 宁德祺朗科技有限公司 Designated area monitoring method, system and terminal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10068146B2 (en) * 2016-02-25 2018-09-04 Conduent Business Services, Llc Method and system for detection-based segmentation-free license plate recognition
US9886629B2 (en) * 2016-04-26 2018-02-06 Adobe Systems Incorporated Techniques for restoring content from a torn document

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339726A (en) * 2015-07-17 2017-01-18 佳能株式会社 Method and device for handwriting recognition
CN106355172A (en) * 2016-08-11 2017-01-25 无锡天脉聚源传媒科技有限公司 Character recognition method and device
CN107092912A (en) * 2017-04-25 2017-08-25 北京精英智通科技股份有限公司 A kind of recognition methods of car plate and device
CN107292229A (en) * 2017-05-08 2017-10-24 北京三快在线科技有限公司 A kind of image-recognizing method and device
CN107480665A (en) * 2017-08-09 2017-12-15 北京小米移动软件有限公司 Character detecting method, device and computer-readable recording medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Region-based discriminative feature pooling for scene text recognition;Chen-Yu Lee等;《2014 IEEE Conference on Computer Vision and Pattern Recognition》;20140925;第4050-4057页 *
Text from corners: a novel approach to detect text and caption in videos;Xu Zhao等;《IEEE Transactions on Image Processing》;20100819;第20卷(第3期);第790-799页 *
Research on methods for text detection and recognition in complex image and video scenes; Yan Jianqiang; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20150115 (No. (2015)01); pp. I138-43 *
Research on text detection and recognition in natural images; Yao Cong; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20150715 (No. (2015)07); pp. I138-90 *

Also Published As

Publication number Publication date
CN108875723A (en) 2018-11-23

Similar Documents

Publication Publication Date Title
CN108875723B (en) Object detection method, device and system and storage medium
CN108256404B (en) Pedestrian detection method and device
CN109815843B (en) Image processing method and related product
CN108875535B (en) Image detection method, device and system and storage medium
CN106650662B (en) Target object shielding detection method and device
WO2021051604A1 (en) Method for identifying text region of osd, and device and storage medium
CN108875731B (en) Target identification method, device, system and storage medium
CN108875534B (en) Face recognition method, device, system and computer storage medium
CN108986152B (en) Foreign matter detection method and device based on difference image
CN111626163B (en) Human face living body detection method and device and computer equipment
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN109948521B (en) Image deviation rectifying method and device, equipment and storage medium
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN108875504B (en) Image detection method and image detection device based on neural network
WO2022002262A1 (en) Character sequence recognition method and apparatus based on computer vision, and device and medium
CN111353325A (en) Key point detection model training method and device
CN114638294A (en) Data enhancement method and device, terminal equipment and storage medium
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
WO2018058573A1 (en) Object detection method, object detection apparatus and electronic device
CN108875538B (en) Attribute detection method, device and system and storage medium
CN106663317B (en) Morphological processing method and digital image processing device for digital image
CN114972492A (en) Position and pose determination method and device based on aerial view and computer storage medium
CN110222576B (en) Boxing action recognition method and device and electronic equipment
CN110827194A (en) Image processing method, device and computer storage medium
CN115223173A (en) Object identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant