CN113674346A - Image detection method, image detection device, electronic equipment and computer-readable storage medium - Google Patents

Image detection method, image detection device, electronic equipment and computer-readable storage medium

Info

Publication number
CN113674346A
Authority
CN
China
Prior art keywords
image
point cloud
map
determining
image detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010408358.3A
Other languages
Chinese (zh)
Other versions
CN113674346B (en)
Inventor
吴凯
杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee
Beijing Jingdong Qianshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Qianshi Technology Co Ltd filed Critical Beijing Jingdong Qianshi Technology Co Ltd
Priority to CN202010408358.3A priority Critical patent/CN113674346B/en
Publication of CN113674346A publication Critical patent/CN113674346A/en
Application granted granted Critical
Publication of CN113674346B publication Critical patent/CN113674346B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image detection method, an image detection device, an electronic device and a computer-readable storage medium, and relates to the field of computer technology. The image detection method includes the following steps: performing point cloud slicing on a three-dimensional point cloud image to obtain corresponding point cloud slice images; inputting the point cloud slice images into a backbone convolutional neural network to obtain a corresponding feature map; performing multi-feature-head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch; determining a corresponding second probability map according to the first probability map and the circle center detection branch; and determining the detection result of the three-dimensional point cloud image according to the positioning map and the second probability map. By slicing the point cloud, regressing positions pixel by pixel, and introducing a new probability map optimization method, the technical scheme reduces the amount of computation needed for image detection, suppresses the spurious boxes produced at object edges, and improves the reliability and accuracy of image detection.

Description

Image detection method, image detection device, electronic equipment and computer-readable storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image detection method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Object detection is an important branch of computer vision. The problem it solves is to find targets in an image and give each target's category and coordinate frame. Over the last few years object detection has achieved remarkable results: many algorithms have been proposed, and detection accuracy keeps improving.
Existing object detection schemes are largely based on setting a large number of anchors and their corresponding parameters, such as size, scale and resolution. However, anchor-based solutions face a number of problems, for example:
(1) The anchor settings are critical, and the associated hyper-parameters are difficult to design.
(2) A fixed set of anchors struggles to cover targets of all shapes, so generalization is poor.
(3) A large number of anchors is usually needed to reach an acceptable recall ratio, which inflates computation and video memory consumption.
(4) Spurious boxes are easily produced at the edges of real objects, and the detection of an occluded object is biased by the occluder, leading to low recall.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide an image detection method, apparatus, electronic device, and computer-readable storage medium that overcome, at least to some extent, the heavy computation and low detection speed of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided an image detection method including: performing point cloud slicing on a three-dimensional point cloud image to obtain a corresponding point cloud slice image; inputting the point cloud slice image into a backbone convolutional neural network to obtain a corresponding feature map; performing multi-feature-head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch; determining a corresponding second probability map according to the first probability map and the circle center detection branch; and determining the detection result of the three-dimensional point cloud image according to the positioning map and the second probability map.
In an embodiment of the present disclosure, the point cloud slicing performed on the three-dimensional point cloud image to obtain the corresponding point cloud slice image specifically includes the following steps: converting the three-dimensional point cloud image into a three-dimensional multilayer image along a specified dimension direction by projection; and determining the point cloud slice image from the three-dimensional multilayer image.
In an embodiment of the present disclosure, performing multi-feature-head prediction on the feature map to obtain the first probability map, the positioning map and the circle center detection branch specifically includes the following steps: determining the edge frame of a real object in the three-dimensional point cloud image; determining the pixel points in the point cloud slice image that correspond to feature positions in the feature map, and recording them as target pixel points; and regressing the target pixel points against the edge frame of the real object to obtain the first probability map and the positioning map.
In an embodiment of the present disclosure, the regression parameter of the target pixel point includes a distance between the target pixel point and the edge frame, the edge frame of the real object includes an upper frame, a lower frame, a left frame, and a right frame, and the distance includes a distance between the target pixel point and the left frame, a distance between the target pixel point and the right frame, a distance between the target pixel point and the upper frame, and a distance between the target pixel point and the lower frame.
In one embodiment of the present disclosure, the regression parameters of the target pixel point further include a pose angle of an edge frame of the real object with respect to the feature map.
In an embodiment of the present disclosure, the generating of the circle center detection branch according to the regression result of the target pixel includes the following steps: calculating Euclidean distance and regularization distance between a target pixel point and the central point of a real object; performing minimization processing on the Euclidean distance according to an excitation constant of the Euclidean distance to obtain a minimized Euclidean distance; and determining the circle center detection branch according to the minimized Euclidean distance and the regularization distance.
In one embodiment of the present disclosure, the image detection method further includes: judging whether the calculation result of the minimized Euclidean distance is larger than 1; and if the calculation result of the minimized Euclidean distance is judged to be larger than 1, setting the calculation result of the minimized Euclidean distance to be 1.
In one embodiment of the present disclosure, the image detection method further includes: determining a focus loss function according to the relation between the prediction probability value and the real probability value of the probability map; determining a smooth loss function according to the relation between the predicted positioning value and the real positioning value of the positioning map; and generating a loss function for training the backbone convolutional neural network according to the focus loss function and the smooth loss function.
According to another aspect of the present disclosure, there is provided an image detection apparatus including: a slicing module, configured to perform point cloud slicing on the three-dimensional point cloud image to obtain a corresponding point cloud slice image; a convolution module, configured to input the point cloud slice image into the backbone convolutional neural network to obtain a corresponding feature map; a prediction module, configured to perform multi-feature-head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch; and a determining module, configured to determine a corresponding second probability map according to the first probability map and the circle center detection branch; the determining module is further configured to determine the detection result of the three-dimensional point cloud image according to the positioning map and the second probability map.
According to still another aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions for the processor; wherein the processor is configured to perform the image detection method of any one of the above via execution of executable instructions.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image detection method of any one of the above.
According to the image detection scheme provided by the embodiments of the present disclosure, the three-dimensional point cloud image is sliced and an anchor-free, pixel-by-pixel positioning regression is performed, which avoids setting a large number of anchors and their corresponding hyper-parameters, reduces the amount of computation, and improves the image detection speed.
Furthermore, the new circle center detection branch optimizes the probability map, in particular reducing the influence of spurious boxes produced at the edges of real objects, and improves the accuracy and recall of the regression results.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 is a schematic diagram illustrating an image inspection system configuration according to an embodiment of the present disclosure;
FIG. 2 shows a flow chart of an image detection method in an embodiment of the present disclosure;
FIG. 3 shows a flow chart of another image detection method in an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating a further method of image detection in an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of an image detection scheme of the prior art;
FIG. 6 shows a schematic diagram of an image detection scheme in an embodiment of the present disclosure;
FIG. 7 shows a flow chart of another image detection method in an embodiment of the disclosure;
FIG. 8 shows a flow chart of yet another image detection method in an embodiment of the present disclosure;
FIG. 9 shows a flow chart of yet another image detection method in an embodiment of the present disclosure;
FIG. 10 is a flow chart illustrating yet another image detection method in an embodiment of the present disclosure;
FIG. 11 is a flow chart illustrating yet another image detection method in an embodiment of the present disclosure;
FIG. 12 is a schematic diagram illustrating a functional distribution of an image detection scheme in an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of an image inspection interface of the prior art;
FIG. 14 is a schematic diagram illustrating an image detection interface in an embodiment of the present disclosure;
FIG. 15 is a schematic diagram of an image detection apparatus in an embodiment of the present disclosure;
FIG. 16 shows a schematic view of an electronic device in an embodiment of the disclosure; and
FIG. 17 shows a schematic diagram of a computer-readable storage medium in an embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In this scheme, the three-dimensional point cloud image is sliced and an anchor-free, pixel-by-pixel positioning regression is performed, so neither anchors nor their corresponding hyper-parameters need to be set; the amount of computation is reduced and the image detection speed is improved. For ease of understanding, several terms used in this application are explained first.
Computer Vision (CV) is the science of making machines "see": cameras and computers replace human eyes to recognize, track and measure targets, and further image processing makes the result better suited for human observation or for transmission to downstream instruments. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D (3-Dimension) technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
Instead of specifying each image category of interest directly in code, an image detection algorithm supplies the computer with many examples of each category and then applies a learning algorithm that looks at those examples and learns the visual appearance of each category. That is, a training set of labeled images is first accumulated and then fed to the computer, which processes the data.
The currently popular image classification architecture is the convolutional neural network (CNN): an image is fed into the network, which then classifies it. A convolutional neural network starts with an input "scanner" that does not parse all of the training data at once. For example, for an input image of size 100 × 100, you do not need a network layer with 10,000 nodes. Instead, you create a scanning input layer of size 10 × 10 and scan the first 10 × 10 pixels of the image. The scanner then moves one pixel to the right and scans the next 10 × 10 pixels; this is the sliding window.
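As a toy illustration only (the patent itself contains no code; the window size, stride and image contents below are assumptions), the following Python sketch enumerates the 10 × 10 sliding windows just described:

import numpy as np

def sliding_windows(image, win=10, stride=1):
    """Yield every win x win patch of a 2-D image, moving `stride` pixels at a time."""
    h, w = image.shape
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            yield image[top:top + win, left:left + win]

image = np.random.rand(100, 100)        # stand-in for a 100 x 100 input image
patches = list(sliding_windows(image))  # 91 x 91 = 8281 overlapping 10 x 10 patches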
The scheme provided by the embodiment of the application relates to technologies such as graphic processing and image recognition of a computer vision technology, and is specifically explained by the following embodiment.
Fig. 1 shows a schematic structural diagram of an image detection system in an embodiment of the present disclosure, which includes a plurality of terminals 120 and a server cluster 140.
The terminal 120 may be a mobile terminal such as a mobile phone, a game console, a tablet Computer, an e-book reader, smart glasses, an MP4(Moving Picture Experts Group Audio Layer IV) player, an intelligent home device, an AR (Augmented Reality) device, a VR (Virtual Reality) device, or a Personal Computer (PC), such as a laptop Computer and a desktop Computer.
Among them, an application program for providing image detection may be installed in the terminal 120.
The terminals 120 are connected to the server cluster 140 through a communication network. Optionally, the communication network is a wired network or a wireless network.
The server cluster 140 is a server, or is composed of a plurality of servers, or is a virtualization platform, or is a cloud computing service center. The server cluster 140 is used to provide background services for applications that provide image detection. Optionally, the server cluster 140 undertakes primary computational work and the terminal 120 undertakes secondary computational work; alternatively, the server cluster 140 undertakes secondary computing work and the terminal 120 undertakes primary computing work; alternatively, the terminal 120 and the server cluster 140 perform cooperative computing by using a distributed computing architecture.
In some alternative embodiments, the server cluster 140 is used to store image inspection information, such as images to be inspected, a library of reference images, and images for which inspection is completed.
Alternatively, the clients of the applications installed in different terminals 120 are the same, or the clients of the applications installed on two terminals 120 are clients of the same type of application of different control system platforms. Based on different terminal platforms, the specific form of the client of the application program may also be different, for example, the client of the application program may be a mobile phone client, a PC client, or a World Wide Web (Web) client.
Those skilled in the art will appreciate that the number of terminals 120 described above may be greater or fewer. For example, the number of the terminals may be only one, or several tens or hundreds of the terminals, or more. The number of terminals and the type of the device are not limited in the embodiments of the present application.
Optionally, the system may further include a management device (not shown in fig. 1), and the management device is connected to the server cluster 140 through a communication network. Optionally, the communication network is a wired network or a wireless network.
Optionally, the wireless network or wired network described above uses standard communication techniques and/or protocols. The Network is typically the Internet, but may be any Network including, but not limited to, a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wireline or wireless Network, a private Network, or any combination of virtual private networks. In some embodiments, data exchanged over a network is represented using techniques and/or formats including Hypertext Mark-up Language (HTML), Extensible markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), Internet protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may also be used in place of, or in addition to, the data communication techniques described above.
Hereinafter, each step of the image detection method in the present exemplary embodiment will be described in more detail with reference to the drawings and examples.
Fig. 2 shows a flowchart of an image detection method in an embodiment of the present disclosure. The method provided by the embodiment of the present disclosure may be performed by any electronic device with computing processing capability, for example, the terminal 120 and/or the server cluster 140 in fig. 1. In the following description, the terminal 120 is taken as an execution subject for illustration.
As shown in fig. 2, the terminal 120 performs an image detection method including the steps of:
step S202, point cloud slicing processing is carried out on the three-dimensional point cloud image to obtain a corresponding point cloud slice image.
For example, with reference to xyz coordinates, the three-dimensional point cloud image is usually sliced along the z-axis direction, typically into equal intervals, but is not limited thereto.
As shown in FIG. 3, the points whose height Z satisfies Z1 ≤ Z < Z2 form the first slice, i.e., the first pseudo two-dimensional map; the points satisfying Z2 ≤ Z < Z3 form the second slice, i.e., the second pseudo two-dimensional map; the points satisfying Z3 ≤ Z < Z4 form the third slice, i.e., the third pseudo two-dimensional map; and the points satisfying Z4 ≤ Z < Z5 form the fourth slice, i.e., the fourth pseudo two-dimensional map.
Further, the pseudo-two-dimensional map comprises at least one object to be detected and a real object frame.
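A hedged Python sketch of this slicing step (the grid resolution, x-y range and height edges Z1..Z5 below are illustrative assumptions, not values from the patent):

import numpy as np

def slice_point_cloud(points, z_edges, grid=(512, 512), xy_range=((-40.0, 40.0), (-40.0, 40.0))):
    """Split an (N, 3) point cloud into height slices and project each slice onto
    the x-y plane as a pseudo two-dimensional occupancy map (one map per slice)."""
    (x0, x1), (y0, y1) = xy_range
    h, w = grid
    maps = np.zeros((len(z_edges) - 1, h, w), dtype=np.float32)
    for k in range(len(z_edges) - 1):
        z_lo, z_hi = z_edges[k], z_edges[k + 1]
        sl = points[(points[:, 2] >= z_lo) & (points[:, 2] < z_hi)]   # Zk <= z < Zk+1
        rows = ((sl[:, 1] - y0) / (y1 - y0) * (h - 1)).astype(int)
        cols = ((sl[:, 0] - x0) / (x1 - x0) * (w - 1)).astype(int)
        ok = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
        maps[k, rows[ok], cols[ok]] = 1.0   # mark occupied cells in slice k
    return maps

# four slices between hypothetical heights Z1 = 0.0 and Z5 = 2.0
maps = slice_point_cloud(np.random.randn(10000, 3), z_edges=[0.0, 0.5, 1.0, 1.5, 2.0])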
And step S204, inputting the point cloud slice image into a backbone convolutional neural network to obtain a corresponding characteristic image.
In the embodiments of the present disclosure, features are extracted from the point cloud slice image pixel by pixel, with no anchors or corresponding hyper-parameters to set, which reduces both the amount of computation and the difficulty of configuring the model. The generated feature map carries at least the parameters H, W and C, representing its height, width and number of channels respectively.
As shown in FIG. 4, the point cloud slice image passes through three stages of convolution, producing first-stage results f1, f2, f3 and f4, a second-stage result f5, and third-stage results f6 through f10, but is not limited thereto.
In actual use, the final feature maps of the convolution process may be assigned the following detection roles, but are not limited thereto: f6, detection of cars; f7, detection of bicycles and tricycles; f8, detection of pedestrians; and f10, detection of trucks and buses.
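A minimal PyTorch backbone sketch, assuming the stacked slice images enter as input channels; the stage structure, channel counts and the mapping of scales to object classes are assumptions loosely mirroring f1–f10 above, not the patented architecture:

import torch
from torch import nn

class TinyBackbone(nn.Module):
    """Three-stage convolutional backbone; each stage halves the resolution."""
    def __init__(self, in_ch=4):  # in_ch = number of point cloud slices
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):               # x: (B, in_ch, H, W) stacked slice images
        f_fine = self.stage1(x)         # fine scale, e.g. pedestrians (cf. f8)
        f_mid = self.stage2(f_fine)     # medium scale, e.g. cars, bicycles (cf. f6, f7)
        f_coarse = self.stage3(f_mid)   # coarse scale, e.g. trucks, buses (cf. f10)
        return f_fine, f_mid, f_coarse

features = TinyBackbone()(torch.randn(1, 4, 512, 512))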
And step S206, performing multi-feature head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch.
The present disclosure performs multi-feature-head prediction by applying a 1 × 1 convolution to the feature map, generally to adjust its feature values, but is not limited thereto. As shown in FIG. 4, four 1 × 1 convolutions are applied to f6, f7, f8 and f10 to generate the multi-feature heads; since similar objects have similar sizes across the different point cloud slice images, the target range detected by the technical scheme of the present disclosure is well constrained.
As shown in FIG. 4, the result of the multi-feature-head prediction includes classification, the circle center detection branch (center-ness), and bounding box regression (Bbox regression), where Bbox regression means transforming the currently predicted Bbox to bring it closer to the ground-truth box (which generally refers to the labeled data in a supervised learning process).
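A hedged PyTorch sketch of such a multi-feature head: three 1 × 1 convolutions per feature map, producing the first probability map (classes plus one background channel), the circle center detection branch, and the (l, r, t, b, yaw) positioning map. The channel layout is an assumption:

import torch
from torch import nn

class MultiFeatureHead(nn.Module):
    """1 x 1 convolution head predicting, per pixel of the feature map:
    class probabilities, a center-ness score, and (l, r, t, b, yaw)."""
    def __init__(self, ch, num_classes):
        super().__init__()
        self.cls = nn.Conv2d(ch, num_classes + 1, 1)   # classes + background
        self.center = nn.Conv2d(ch, 1, 1)              # circle center detection branch
        self.bbox = nn.Conv2d(ch, 5, 1)                # l, r, t, b, yaw

    def forward(self, f):
        prob = torch.sigmoid(self.cls(f))        # first probability map
        center = torch.sigmoid(self.center(f))   # center-ness map
        loc = self.bbox(f)                       # positioning map
        return prob, center, loc

prob, center, loc = MultiFeatureHead(ch=128, num_classes=4)(torch.randn(1, 128, 64, 64))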
In FIGS. 5 and 6, the grid represents the final feature map; point C is the center (x_c, y_c) of the object, and P is the position being detected (x_p, y_p).
As shown in fig. 5, in the related art, in the three-dimensional object recognition process, regression calculation of k anchors is performed for each position of each feature map.
As shown in FIG. 6, each regressed position predicts (l, r, t, b) relative to the real object at that point: the four distances to the left, right, top and bottom borders of the real object frame, respectively.
For example, in an anchor-based detection method there are more than 10 anchors for each pixel of the feature map, and each anchor carries five regression parameters (the height value matters little in the autonomous driving field and is ignored for now). In the present scheme each pixel carries five regression parameters in total, namely l, r, b, t and the attitude angle yaw, so the image detection scheme of the present disclosure computes roughly ten times fewer regression parameters than anchor-based methods.
Wherein l is the distance between the P point and the left frame of the real frame, r is the distance between the P point and the right frame of the real frame, t is the distance between the P point and the upper frame of the real frame, and b is the distance between the P point and the lower frame of the real frame.
In addition, since every pixel of the feature map produces a regressed frame, the results need to be filtered through a probability map. The probability map has the same size as the positioning map, and its number of channels varies with the method. Let c_i ∈ C, i ∈ {1, …, len(C)}, where C denotes all categories to be detected (such as car, person, bicycle and tricycle), c_i denotes the i-th category, and len(C) is the total number of object categories to be detected. Accordingly, in the image detection method of the present disclosure, the size of the probability map is H × W × (len(C) + 1).
And step S208, determining a corresponding second probability map according to the first probability map and the circle center detection branch.
Although pixel-by-pixel regression solves the inaccuracy of regressed frames caused by occlusion between different objects, the low-quality boxes detected at object edges still make detection inaccurate. Boxes detected at the edges of an object are generally less accurate, in both class identification and regression, than boxes detected near the object center, yet they are not filtered out because of their small intersection with better boxes.
The present disclosure therefore optimizes the first probability map with a new circle center detection branch: the branch smooths the attenuation of the image density, so that the decay from the center point to the edge points is gentler and more realistic, effectively improving the reliability and accuracy of object detection.
And step S210, determining the detection result of the three-dimensional point cloud picture according to the positioning picture and the second probability picture.
As shown in fig. 7, in an embodiment of the present disclosure, in step S202, a point cloud slicing process is performed on the three-dimensional point cloud image to obtain a corresponding point cloud slice image, which specifically includes the following steps:
step S2022, converting the three-dimensional point cloud image into a three-dimensional multi-layer image with a specified dimension direction by projection.
If the three-dimensional point cloud image is placed in an xyz-axis coordinate system, the specified dimension may be, but is not limited to, one of the z-axis, x-axis and y-axis directions.
The number of point cloud slices is typically determined by the projection density; for example, when slicing is performed along the z-axis direction, the number of slices is chosen based on the density of points projected onto the x-y plane.
And step S2024, determining a slice point cloud picture according to the three-dimensional multilayer picture.
As shown in fig. 8, in an embodiment of the present disclosure, in step S206, performing multi-feature-head prediction on the feature map to obtain a first probability map, a positioning map, and a circle center detection branch, specifically including the following steps:
and step S2062, determining the edge frame of the real object in the three-dimensional point cloud picture.
Step S2064, determining pixel points in the point cloud slice image corresponding to the characteristic positions in the characteristic image, and recording the pixel points as target pixel points.
Step S2066, the target pixel points are regressed according to the edge frame of the real object, so that a first probability map and a positioning map are obtained.
In an embodiment of the present disclosure, the attitude angle of the real frame is yaw, the regression parameter of the target pixel point includes a distance between the target pixel point and the edge frame, the edge frame of the real object includes an upper frame, a lower frame, a left frame, and a right frame, and the distance includes a distance l between the target pixel point and the left frame, a distance r between the target pixel point and the right frame, a distance t between the target pixel point and the upper frame, and a distance b between the target pixel point and the lower frame.
In one embodiment of the present disclosure, the regression parameters of the target pixel point further include a pose angle of an edge frame of the real object with respect to the feature map.
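A small sketch of these per-pixel regression targets, assuming an axis-aligned ground-truth frame for simplicity; the patent's handling of the attitude angle yaw is not fully specified here, so it is carried through unchanged:

def regression_targets(px, py, box):
    """Targets (l, r, t, b, yaw) for pixel (px, py) inside a ground-truth frame.
    box = (x_min, y_min, x_max, y_max, yaw) in feature map coordinates."""
    x_min, y_min, x_max, y_max, yaw = box
    l = px - x_min   # distance to the left border
    r = x_max - px   # distance to the right border
    t = py - y_min   # distance to the upper border (y grows downward)
    b = y_max - py   # distance to the lower border
    return l, r, t, b, yaw

targets = regression_targets(12.0, 8.0, (5.0, 2.0, 25.0, 32.0, 0.3))  # hypothetical box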
As shown in fig. 9, in an embodiment of the present disclosure, generating a circle center detection branch according to a regression result of a target pixel point specifically includes the following steps:
step S2068, calculating the euclidean distance and the regularization distance between the target pixel point and the center point of the real object.
Therefore, the new circle center detection branch proposed by the present disclosure can simultaneously depict the Euclidean distance and the regularization distance from the point P of the detection point to the center,
step S20610, performing minimization on the euclidean distance according to the excitation constant of the euclidean distance to obtain a minimized euclidean distance.
In one embodiment of the present disclosure, the excitation constant has a value in the range of 6 ± 1, but is not limited thereto. Further, the minimized Euclidean distance limits the rapid growth in the central region, keeping the attenuation amplitude normal.
Step S20612, according to the minimized Euclidean distance and the regularization distance, the circle center detection branch is determined.
Since the minimized Euclidean distance exhibits no attenuation in the central region, the regularization distance is introduced; that is, the minimized Euclidean distance and the regularization distance are combined to achieve normal attenuation from the center outward.
In one embodiment of the present disclosure, euclidean_limit is the minimized Euclidean distance, α is the excitation constant, centerness is the circle center detection branch, l is the distance between point P and the left border of the real frame, r the distance to the right border, t the distance to the upper border, and b the distance to the lower border. The circle center detection branch may be expressed as follows:
[Equations (1) and (2): the expressions for euclidean_limit and centerness, rendered as images in the original publication.]
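Because the published formulas are only available as images, the following Python sketch is one plausible reading of the branch: an excited Euclidean term clamped to 1 (flat near the center, as described above) combined with an FCOS-style regularization term built from (l, r, t, b). It is an assumption, not the patented expression:

import numpy as np

ALPHA = 6.0  # excitation constant; the patent suggests a range of roughly 6 ± 1

def circle_center_branch(px, py, cx, cy, l, r, t, b, alpha=ALPHA):
    """Assumed form of the circle center detection branch at position P=(px, py)
    for an object centered at C=(cx, cy) with per-pixel distances (l, r, t, b)."""
    eps = 1e-6
    d = np.hypot(px - cx, py - cy)              # Euclidean distance from P to center C
    half_diag = 0.5 * np.hypot(l + r, t + b)    # rough half-extent of the frame
    euclidean_limit = alpha * (1.0 - d / max(half_diag, eps))
    euclidean_limit = min(max(euclidean_limit, 0.0), 1.0)   # results > 1 are set to 1
    regularized = np.sqrt((min(l, r) / max(l, r, eps)) * (min(t, b) / max(t, b, eps)))
    return euclidean_limit * regularized        # decays smoothly from center to edge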
as shown in fig. 10, in one embodiment of the present disclosure, the image detection method further includes:
in step S212, it is determined whether the calculation result of the minimized euclidean distance is greater than 1.
In step S214, if it is determined that the calculation result of the minimized euclidean distance is greater than 1, the calculation result of the minimized euclidean distance is set to 1.
Since the minimized Euclidean distance is applied to a probability map, values greater than 1 are meaningless.
As shown in fig. 11, in one embodiment of the present disclosure, the image detection method further includes:
step S216, determining a focus loss function according to the relation between the predicted probability value and the real probability value of the probability map.
In embodiments of the present disclosure, the accuracy of the subsequent output probability map of the convolutional neural network may be optimized by the above-described focus loss function.
Step S218, determining a smooth loss function according to the relationship between the predicted positioning value and the real positioning value of the positioning map.
In embodiments of the present disclosure, the accuracy of the subsequent output localization map of the convolutional neural network may be optimized by the above-described smoothing loss function.
Step S220, generating a loss function for training the backbone convolutional neural network according to the focus loss function and the smooth loss function.
In the embodiment of the disclosure, the convolutional neural network is trained by combining the focus loss function and the smoothing loss function, so that the trained convolutional neural network outputs a more accurate and reliable regression result.
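A hedged sketch of the combined training loss under common choices: a focal ("focus") loss between the predicted and ground-truth probability maps plus a smooth-L1 loss between the predicted and ground-truth positioning maps. The γ value and the equal weighting of the two terms are assumptions, not values from the patent:

import torch
import torch.nn.functional as F

def training_loss(pred_prob, true_prob, pred_loc, true_loc, gamma=2.0):
    """Focal loss on the probability map + smooth-L1 loss on the positioning map."""
    p_t = pred_prob * true_prob + (1 - pred_prob) * (1 - true_prob)
    focal = (-(1 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-6))).mean()
    smooth = F.smooth_l1_loss(pred_loc, true_loc)   # regression loss on (l, r, t, b, yaw)
    return focal + smooth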
As shown in FIG. 12, line L1 is the image density distribution generated from the Euclidean distance, line L2 the distribution generated from the regularization distance, and line L3 the distribution generated by the circle center detection branch of the embodiment of the present disclosure.
For example, for a real object 20 cm long and 30 cm wide, the visualized attenuation probability can be seen in FIGS. 13 and 14. FIG. 13 shows the real object D1, and FIG. 14 shows the attenuation result D2 after processing by the circle center detection branch; the attenuation spreads circularly from the center of the image toward its periphery.
According to one embodiment of the present disclosure, the circle center detection branch is also very simple to use: it is multiplied directly with the first probability map to suppress the probabilities at the edge portions of objects, producing the optimized probability map, i.e., the second probability map.
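Continuing the MultiFeatureHead sketch above, this optimization reduces to an element-wise product (the 0.5 score threshold is an assumed value for illustration):

# second (optimized) probability map: suppresses boxes predicted near object edges
second_prob = prob * center                  # element-wise multiplication, broadcast over classes
keep = second_prob.max(dim=1).values > 0.5   # per-pixel detection mask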
It is to be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to an exemplary embodiment of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," module "or" system.
An image detection apparatus 900 according to this embodiment of the present invention is described below with reference to fig. 15. The image detection apparatus 900 shown in fig. 15 is only an example, and should not bring any limitation to the functions and the range of use of the embodiment of the present invention.
The image detection apparatus 900 is represented in the form of a hardware module. The components of the image detection apparatus 900 may include, but are not limited to: a slicing module 902, configured to perform point cloud slicing on the three-dimensional point cloud image to obtain a corresponding point cloud slice image; a convolution module 904, configured to input the point cloud slice image into a backbone convolution neural network to obtain a corresponding feature map; a prediction module 906, configured to perform multi-feature head prediction on the feature map to obtain a first probability map, a positioning map, and a circle center detection branch; a determining module 908, configured to determine a corresponding second probability map according to the first probability map and the circle center detection branch; the determining module 908 is further configured to determine a detection result of the three-dimensional point cloud image according to the positioning map and the second probability map.
In one embodiment of the present disclosure, the slicing module 902 is further configured to: convert the three-dimensional point cloud image into a three-dimensional multilayer image along a specified dimension direction by projection; and determine the point cloud slice image from the three-dimensional multilayer image.
In an embodiment of the present disclosure, the prediction module 906 is further configured to: determining an edge frame of a real object in the three-dimensional point cloud picture; determining pixel points in the point cloud slice image corresponding to the characteristic positions in the characteristic image, and recording the pixel points as target pixel points; and regressing the target pixel points according to the edge frame of the real object to obtain a first probability map and a positioning map.
In an embodiment of the present disclosure, the regression parameter of the target pixel point includes a distance between the target pixel point and the edge frame, the edge frame of the real object includes an upper frame, a lower frame, a left frame, and a right frame, and the distance includes a distance between the target pixel point and the left frame, a distance between the target pixel point and the right frame, a distance between the target pixel point and the upper frame, and a distance between the target pixel point and the lower frame.
In one embodiment of the present disclosure, the regression parameters of the target pixel point further include a pose angle of an edge frame of the real object with respect to the feature map.
In an embodiment of the present disclosure, the prediction module 906 is further configured to: calculating Euclidean distance and regularization distance between a target pixel point and the central point of a real object; performing minimization processing on the Euclidean distance according to an excitation constant of the Euclidean distance to obtain a minimized Euclidean distance; and determining the circle center detection branch according to the minimized Euclidean distance and the regularization distance.
In one embodiment of the present disclosure, the image detection apparatus further includes: a judging module 910, configured to judge whether a calculation result of the minimized euclidean distance is greater than 1; and if the calculation result of the minimized Euclidean distance is judged to be larger than 1, setting the calculation result of the minimized Euclidean distance to be 1.
In one embodiment of the present disclosure, the image detection apparatus further includes: a training module 912, configured to determine a focus loss function according to a relationship between the predicted probability value and the actual probability value of the probability map; determining a smooth loss function according to the relation between the predicted positioning value and the real positioning value of the positioning map; and generating a loss function for training the backbone convolutional neural network according to the focus loss function and the smooth loss function.
An electronic device 1000 according to this embodiment of the invention is described below with reference to FIG. 16. The electronic device 1000 shown in FIG. 16 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present invention.
As shown in FIG. 16, the electronic device 1000 is embodied in the form of a general purpose computing device. The components of the electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one memory unit 1020, and a bus 1030 that couples various system components including the memory unit 1020 and the processing unit 1010.
The storage unit stores program code that may be executed by the processing unit 1010, so that the processing unit 1010 performs the steps according to various exemplary embodiments of the present invention described in the "exemplary methods" section above in this specification. For example, the processing unit 1010 may perform steps S202, S204, S206, S208 and S210 as shown in FIG. 2, and other steps defined in the image detection method of the present disclosure.
The storage unit 1020 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)10201 and/or a cache memory unit 10202, and may further include a read-only memory unit (ROM) 10203.
The memory unit 1020 may also include a program/utility 10204 having a set (at least one) of program modules 10205, such program modules 10205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1030 may be any one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, and a local bus using any of a variety of bus architectures.
The electronic device 1000 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 1050. Also, the electronic device 1000 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above-mentioned "exemplary methods" section of the present description, when the program product is run on the terminal device.
Referring to fig. 17, a program product 1100 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (11)

1. An image detection method, comprising:
carrying out point cloud slicing processing on the three-dimensional point cloud image to obtain a corresponding point cloud slice image;
inputting the point cloud slice image into a backbone convolutional neural network to obtain a corresponding characteristic image;
performing multi-feature head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch;
determining a corresponding second probability map according to the first probability map and the circle center detection branch;
and determining the detection result of the three-dimensional point cloud picture according to the positioning picture and the second probability picture.
2. The image detection method of claim 1, wherein the point cloud slicing processing on the three-dimensional point cloud image to obtain a corresponding point cloud slice image comprises:
converting the three-dimensional point cloud picture into a three-dimensional multilayer picture in a specified dimension direction in a projection mode;
and determining the slice point cloud picture according to the three-dimensional multilayer picture.
3. The image detection method according to claim 1, wherein the performing multi-feature-head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch comprises:
determining an edge frame of a real object in the three-dimensional point cloud picture;
determining pixel points in the point cloud slice image corresponding to the characteristic positions in the characteristic image, and recording the pixel points as target pixel points;
and regressing the target pixel points according to the edge frame of the real object to obtain the first probability map and the positioning map.
4. The image detection method according to claim 3,
the regression parameters of the target pixel points comprise the distance between the target pixel points and the edge frame, the edge frame of the real object comprises an upper frame, a lower frame, a left frame and a right frame, and the distance comprises the distance between the target pixel points and the left frame, the distance between the target pixel points and the right frame, the distance between the target pixel points and the upper frame and the distance between the target pixel points and the lower frame.
5. The image detection method according to claim 3, wherein
the regression parameters of the target pixel points further comprise an attitude angle of the edge frame of the real object relative to the feature map.
6. The image detection method according to claim 3, wherein generating the circle center detection branch according to the regression result of the target pixel points comprises:
calculating a Euclidean distance and a regularization distance between the target pixel points and a center point of the real object;
minimizing the Euclidean distance according to an excitation constant of the Euclidean distance to obtain a minimized Euclidean distance; and
determining the circle center detection branch according to the minimized Euclidean distance and the regularization distance.
7. The image detection method according to claim 6, further comprising:
determining whether the minimized Euclidean distance is greater than 1; and
if the minimized Euclidean distance is greater than 1, setting it to 1.
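By way of example only, claims 6 and 7 together can be read as computing a clamped, scaled distance to the object center; the closing formula that folds in the regularization distance is an assumption, since the claims do not spell out the combination:

    import math

    def center_score(px, py, cx, cy, excitation=0.05, reg_dist=1.0):
        d = math.hypot(px - cx, py - cy)   # Euclidean distance to the center point
        d_min = min(excitation * d, 1.0)   # claim 7: clamp the minimized distance at 1
        return (1.0 - d_min) / reg_dist    # assumed combination with the regularization distance

Pixels at the center score highest, and the score decays to zero as the clamped distance reaches 1.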
8. The image detection method according to any one of claims 1 to 7, further comprising:
determining a focal loss function according to the relation between the predicted probability values and the true probability values of the probability map;
determining a smooth loss function according to the relation between the predicted positioning values and the true positioning values of the positioning map; and
generating a loss function for training the backbone convolutional neural network according to the focal loss function and the smooth loss function.
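A hedged sketch of the combined training loss of claim 8, with the focal-loss hyper-parameters (alpha, gamma) and the weighting factor lam taken as assumptions:

    import torch
    import torch.nn.functional as F

    def detection_loss(prob_pred, prob_gt, loc_pred, loc_gt,
                       alpha=0.25, gamma=2.0, lam=1.0):
        # Focal loss on the probability map: down-weight easy examples
        # via the (1 - p_t) ** gamma modulating factor.
        p_t = prob_gt * prob_pred + (1 - prob_gt) * (1 - prob_pred)
        a_t = prob_gt * alpha + (1 - prob_gt) * (1 - alpha)
        focal = (-a_t * (1 - p_t) ** gamma
                 * torch.log(p_t.clamp(min=1e-6))).mean()
        # Smooth-L1 loss on the positioning map.
        smooth = F.smooth_l1_loss(loc_pred, loc_gt)
        return focal + lam * smooth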
9. An image detection apparatus, characterized by comprising:
a slicing module configured to perform point cloud slicing on a three-dimensional point cloud image to obtain a corresponding point cloud slice image;
a convolution module configured to input the point cloud slice image into a backbone convolutional neural network to obtain a corresponding feature map;
a prediction module configured to perform multi-feature-head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch; and
a determining module configured to determine a corresponding second probability map according to the first probability map and the circle center detection branch,
wherein the determining module is further configured to determine a detection result of the three-dimensional point cloud image according to the positioning map and the second probability map.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the image detection method of any one of claims 1 to 8 via execution of the executable instructions.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the image detection method according to any one of claims 1 to 8.
CN202010408358.3A 2020-05-14 2020-05-14 Image detection method, image detection device, electronic equipment and computer readable storage medium Active CN113674346B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010408358.3A CN113674346B (en) 2020-05-14 2020-05-14 Image detection method, image detection device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113674346A (en) 2021-11-19
CN113674346B CN113674346B (en) 2024-04-16

Family

ID=78537335

Country Status (1)

Country Link
CN (1) CN113674346B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9286538B1 (en) * 2014-05-01 2016-03-15 Hrl Laboratories, Llc Adaptive 3D to 2D projection for different height slices and extraction of robust morphological features for 3D object recognition
CN106340015A (en) * 2016-08-30 2017-01-18 沈阳东软医疗系统有限公司 Key point positioning method and device
US20180157899A1 (en) * 2016-12-07 2018-06-07 Samsung Electronics Co., Ltd. Method and apparatus detecting a target
CN108682017A (en) * 2018-04-11 2018-10-19 浙江工业大学 Super-pixel method for detecting image edge based on Node2Vec algorithms
CN109948661A (en) * 2019-02-27 2019-06-28 江苏大学 A kind of 3D vehicle checking method based on Multi-sensor Fusion
CN111144486A (en) * 2019-12-27 2020-05-12 电子科技大学 Heart nuclear magnetic resonance image key point detection method based on convolutional neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GUANG HAN: "Fully Conventional Anchor-Free Siamese Networks for Object Tracking", IEEE Access *
WEN HAO et al.: "Slice-Based Window Detection from Scene Point Clouds", 2018 International Conference on Virtual Reality and Visualization (ICVRV) *
GU Junhua: "A survey of segmentation methods based on point cloud data", Journal of Yanshan University *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant