CN113674346B

CN113674346B - Image detection method, image detection device, electronic equipment and computer readable storage medium

Info

Publication number: CN113674346B
Application number: CN202010408358.3A
Authority: CN
Inventors: 吴凯; 杨磊
Original assignee: Beijing Jingdong Qianshi Technology Co Ltd
Current assignee: Beijing Jingdong Qianshi Technology Co Ltd
Priority date: 2020-05-14
Filing date: 2020-05-14
Publication date: 2024-04-16
Anticipated expiration: 2040-05-14
Also published as: CN113674346A

Abstract

The disclosure provides an image detection method, an image detection device, electronic equipment and a computer readable storage medium, and relates to the technical field of computers. The image detection method comprises the following steps: performing point cloud slicing processing on the three-dimensional point cloud image to obtain a corresponding point cloud slice image; inputting the point cloud slice image into a backbone convolutional neural network to obtain a corresponding feature image; carrying out multi-feature head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch; determining a corresponding second probability map according to the first probability map and the circle center detection branch; and determining the detection result of the three-dimensional point cloud image according to the localization image and the second probability image. According to the technical scheme, point cloud slicing and pixel-by-pixel positioning regression are performed on the three-dimensional point cloud image, a novel probability image optimization method is provided, the data calculation amount of image detection is reduced, the problem that an interference frame is generated at the edge of an object is solved, and the reliability and the accuracy of image detection are improved.

Description

Image detection method, image detection device, electronic equipment and computer readable storage medium

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to an image detection method, an image detection device, an electronic device, and a computer readable storage medium.

Background

Object detection is an important branch of computer vision, and solves the problem that in one image, targets in the image are found, categories and coordinate frames of the targets are given, and in the last years, object detection has obtained favorable results, a plurality of algorithms are proposed, and the accuracy of target detection is also higher and higher.

Existing object detection schemes are based in large part on the provision of a large number of anchors and corresponding parameters, such as size, dimension, and resolution. However, anchor-based solutions face a number of problems, such as:

(1) The setting of the anchor is very important, and the super-parameter design is difficult.

(2) Anchors have difficulty covering all shapes of targets and have poor generalization ability.

(3) A large number of anchors are usually selected to obtain a good recall rate, and the calculated amount and the memory consumption are relatively large.

(4) The edge of the real object is easy to generate an interference frame, and the detection of the blocked object can be possibly biased by the blocked object, so that the problem of low recall rate is caused.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

An object of the present disclosure is to provide an image detection method, apparatus, electronic device, and computer-readable storage medium, which overcome, at least to some extent, the problems of large calculation amount and low detection speed due to the related art.

Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.

According to one aspect of the present disclosure, there is provided an image detection method including: performing point cloud slicing processing on the three-dimensional point cloud image to obtain a corresponding point cloud slice image; inputting the point cloud slice image into a backbone convolutional neural network to obtain a corresponding feature image; carrying out multi-feature head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch; determining a corresponding second probability map according to the first probability map and the circle center detection branch; and determining the detection result of the three-dimensional point cloud image according to the localization image and the second probability image.

In one embodiment of the present disclosure, a point cloud slice process is performed on a three-dimensional point cloud image to obtain a corresponding point cloud slice image, which specifically includes the following steps: converting the three-dimensional point cloud picture into a three-dimensional multilayer picture in a specified dimension direction in a projection mode; and determining a slice point cloud picture according to the three-dimensional multilayer picture.

In one embodiment of the present disclosure, multi-feature header prediction is performed on a feature map to obtain a first probability map, a positioning map, and a circle center detection branch, and the method specifically includes the following steps: determining an edge frame of a real object in the three-dimensional point cloud picture; determining pixel points in the point cloud slice corresponding to the feature positions in the feature map, and recording the pixel points as target pixel points; and returning the target pixel points according to the edge frame of the real object to obtain a first probability map and a positioning map.

In one embodiment of the present disclosure, the regression parameters of the target pixel point include a distance between the target pixel point and the edge frame, the edge frame of the real object includes an upper frame, a lower frame, a left frame, and a right frame, and the distance includes a distance between the target pixel point and the left frame, a distance between the target pixel point and the right frame, a distance between the target pixel point and the upper frame, and a distance between the target pixel point and the lower frame.

In one embodiment of the present disclosure, the regression parameters of the target pixel points further include an attitude angle of the edge frame of the real object with respect to the feature map.

In one embodiment of the present disclosure, generating a circle center detection branch according to a regression result of a target pixel point specifically includes the following steps: calculating the Euclidean distance and regularization distance between the target pixel point and the center point of the real object; performing minimization processing on the Euclidean distance according to the excitation constant of the Euclidean distance to obtain a minimized Euclidean distance; and determining a circle center detection branch according to the minimized Euclidean distance and the regularized distance.

In one embodiment of the present disclosure, the image detection method further includes: judging whether the calculation result of the minimized Euclidean distance is larger than 1; if it is determined that the calculation result of the minimized euclidean distance is greater than 1, the calculation result of the minimized euclidean distance is set to 1.

In one embodiment of the present disclosure, the image detection method further includes: determining a focus loss function according to the relation between the predicted probability value and the true probability value of the probability map; determining a smooth loss function according to the relation between the predicted positioning value and the real positioning value of the positioning map; a loss function for training the backbone convolutional neural network is generated based on the focal loss function and the smoothing loss function.

According to another aspect of the present disclosure, there is provided an image detection apparatus including: the slicing module is used for carrying out point cloud slicing processing on the three-dimensional point cloud image so as to obtain a corresponding point cloud slice image; the convolution module is used for inputting the point cloud slice image into the backbone convolution neural network to obtain a corresponding feature image; the prediction module is used for carrying out multi-feature head prediction on the feature map so as to obtain a first probability map, a positioning map and a circle center detection branch; the determining module is used for determining a corresponding second probability map according to the first probability map and the circle center detection branch; the determining module is further used for determining a detection result of the three-dimensional point cloud image according to the positioning image and the second probability image.

According to still another aspect of the present disclosure, there is provided an electronic apparatus including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the image detection method of any of the above via execution of the executable instructions.

According to still another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image detection method of any one of the above.

According to the image detection scheme provided by the embodiment of the disclosure, the three-dimensional point cloud image is sliced, and the anchor-free technology is adopted to perform pixel-by-pixel positioning regression processing, so that a large number of anchors and super-parameters corresponding to the anchors are prevented from being set, the calculated amount is reduced, and the image detection speed is improved.

Furthermore, by providing a new circle center detection branch, the probability map is optimized, especially the influence of an interference frame generated by the edge of a real object is reduced, and the accuracy and recall rate of a regression result are improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.

FIG. 1 is a schematic diagram showing the structure of an image detection system in an embodiment of the present disclosure;

FIG. 2 illustrates a flow chart of an image detection method in an embodiment of the present disclosure;

FIG. 3 illustrates a flow chart of another image detection method in an embodiment of the present disclosure;

FIG. 4 illustrates a flow chart of yet another image detection method in an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of an image detection scheme in the prior art;

FIG. 6 shows a schematic diagram of an image detection scheme in an embodiment of the present disclosure;

FIG. 7 illustrates a flow chart of another image detection method in an embodiment of the present disclosure;

FIG. 8 illustrates a flow chart of yet another image detection method in an embodiment of the present disclosure;

FIG. 9 illustrates a flow chart of yet another image detection method in an embodiment of the present disclosure;

FIG. 10 illustrates a flow chart of yet another image detection method in an embodiment of the present disclosure;

FIG. 11 illustrates a flow chart of yet another image detection method in an embodiment of the present disclosure;

FIG. 12 is a schematic diagram showing a functional distribution of an image detection scheme in an embodiment of the present disclosure;

FIG. 13 is a schematic diagram of an image detection interface in the prior art;

FIG. 14 illustrates a schematic diagram of an image detection interface in an embodiment of the present disclosure;

fig. 15 shows a schematic diagram of an image detection apparatus in an embodiment of the present disclosure;

FIG. 16 shows a schematic diagram of an electronic device in an embodiment of the disclosure; and

fig. 17 shows a schematic diagram of a computer-readable storage medium in an embodiment of the disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.

According to the scheme, the three-dimensional point cloud image is sliced, the anchor free technology is adopted to carry out pixel-by-pixel positioning regression processing, anchors and super parameters corresponding to the anchors are not required to be set, the calculated amount is reduced, and the image detection speed is improved. For ease of understanding, several terms referred to in this application are first explained below.

Computer Vision (CV) is a science of studying how to "look" a machine, and more specifically, to replace human eyes with a camera and a Computer to perform machine Vision such as recognition, tracking and measurement on a target, and further perform graphic processing to make the Computer process into an image more suitable for human eyes to observe or transmit to an instrument to detect. As a scientific discipline, computer vision research-related theory and technology has attempted to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition ), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D (three-dimensional) techniques, virtual reality, augmented reality, synchronous positioning, and map construction, and the like, as well as common biometric recognition techniques such as face recognition, fingerprint recognition, and the like.

Instead of specifying each image category of interest directly in the code, the image detection algorithm provides many examples for each image category of the computer, and then designs a learning algorithm to look at these examples and learn the visual appearance of each category. That is, a training set with a marker image is first accumulated and then input into a computer, which processes the data.

The currently more popular image classification architecture is Convolutional Neural Network (CNN): the image is fed into a network, which then sorts the image data. Convolutional neural networks begin with an input "scanner" that also does not parse all training data at once. For example, input an image of size 100 x 100, you do not need a network layer with 10000 nodes. Instead you only need to create a 10 x 10 size scan-in layer, scanning the first 10 x 10 pixels of the image. The scanner then moves one pixel to the right and scans the next 10 x 10 pixel, which is the sliding window.

The solution provided in the embodiments of the present application relates to technologies such as graphic processing and image recognition of computer vision technologies, and is specifically described by the following embodiments.

Fig. 1 shows a schematic structural diagram of an image detection system according to an embodiment of the present disclosure, including a plurality of terminals 120 and a server cluster 140.

The terminal 120 may be a mobile terminal such as a mobile phone, a game console, a tablet computer, an electronic book reader, a smart glasses, an MP4 (Moving Picture Experts Group Audio Layer IV, dynamic image expert compression standard audio layer 4) player, a smart home device, an AR (Augmented Reality ) device, a VR (Virtual Reality) device, or the terminal 120 may be a personal computer (Personal Computer, PC) such as a laptop portable computer and a desktop computer, etc.

Among them, an application program for providing image detection may be installed in the terminal 120.

The terminal 120 is connected to the server cluster 140 through a communication network. Optionally, the communication network is a wired network or a wireless network.

The server cluster 140 is a server, or is composed of several servers, or is a virtualized platform, or is a cloud computing service center. The server cluster 140 is used to provide background services for applications that provide image detection. Optionally, the server cluster 140 takes on primary computing work and the terminal 120 takes on secondary computing work; alternatively, the server cluster 140 takes on secondary computing work and the terminal 120 takes on primary computing work; alternatively, a distributed computing architecture is employed between the terminal 120 and the server cluster 140 for collaborative computing.

In some alternative embodiments, the server cluster 140 is used to store image detection information, such as images to be detected, a reference image library, and detected images.

Alternatively, the clients of the applications installed in different terminals 120 are the same, or the clients of the applications installed on both terminals 120 are clients of the same type of application of different control system platforms. The specific form of the client of the application program may also be different based on the difference of the terminal platforms, for example, the application program client may be a mobile phone client, a PC client, or a World Wide Web (Web) client.

Those skilled in the art will appreciate that the number of terminals 120 may be greater or lesser. Such as the above-mentioned terminals may be only one, or the above-mentioned terminals may be several tens or hundreds, or more. The number of terminals and the device type are not limited in the embodiment of the present application.

Optionally, the system may further comprise a management device (not shown in fig. 1), which is connected to the server cluster 140 via a communication network. Optionally, the communication network is a wired network or a wireless network.

Alternatively, the wireless network or wired network described above uses standard communication techniques and/or protocols. The network is typically the Internet, but may be any network including, but not limited to, a local area network (Local Area Network, LAN), metropolitan area network (Metropolitan Area Network, MAN), wide area network (Wide Area Network, WAN), mobile, wired or wireless network, private network, or any combination of virtual private networks. In some embodiments, data exchanged over a network is represented using techniques and/or formats including HyperText Mark-up Language (HTML), extensible markup Language (Extensible MarkupLanguage, XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as secure sockets layer (Secure Socket Layer, SSL), transport layer security (Transport Layer Security, TLS), virtual private network (Virtual Private Network, VPN), internet protocol security (Internet ProtocolSecurity, IPsec), etc. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of or in addition to the data communication techniques described above.

Next, each step of the image detection method in the present exemplary embodiment will be described in more detail with reference to the drawings and examples.

Fig. 2 shows a flowchart of an image detection method in an embodiment of the present disclosure. The methods provided by embodiments of the present disclosure may be performed by any electronic device having computing processing capabilities, such as, for example, terminal 120 and/or server cluster 140 in fig. 1. In the following illustration, the terminal 120 is exemplified as an execution subject.

As shown in fig. 2, the terminal 120 performs an image detection method including the steps of:

step S202, performing point cloud slicing processing on the three-dimensional point cloud image to obtain a corresponding point cloud slice image.

For example, the three-dimensional point cloud image is generally processed by point cloud slicing, typically, but not limited to, halving slicing in the z-axis direction with reference to the xyz coordinate system.

As shown in FIG. 3, an image slice whose height Z satisfies Z1. Ltoreq.Z < Z2 is taken as a first slice, i.e., a first pseudo two-dimensional map. And slicing the image with the height Z which is less than or equal to Z2 and less than Z3 into a second slice diagram, namely a second pseudo two-dimensional diagram. And (3) slicing the image with the height Z which is less than or equal to Z3 and less than Z4 into a third slice diagram, namely a third pseudo two-dimensional diagram. And (3) slicing the image with the height Z which is less than or equal to Z4 and less than Z5 into a fourth slice diagram, namely a fourth pseudo two-dimensional diagram.

Further, the pseudo two-dimensional graph comprises at least one object to be detected and a real object frame.

And step S204, inputting the point cloud slice into a backbone convolutional neural network to obtain a corresponding feature map.

According to the embodiment of the disclosure, the pixel-by-pixel characteristic extraction is performed on the point cloud slice image, and anchors and corresponding super parameters are not required to be set, so that the operand is reduced, and the difficulty in setting the model is also reduced. The generated characteristic diagram at least comprises H, W and C parameters which respectively represent the length, the width and the channel number.

As shown in fig. 4, the slice point cloud image is subjected to three convolution processes to obtain first convolution results f1, f2, f3, and f4, second convolution result f4, and third convolution results f6, f7, f8, and f9, but is not limited thereto.

In actual use, the final feature map of the convolution process may represent, but is not limited to, the following identification, such as f ₆ : identification of a vehicle, f ₇ : prediction of bicycles and tricycles, f ₈ : prediction of pedestrians and f ₁₀ : prediction of trucks and buses.

Step S206, multi-feature head prediction is performed on the feature map to obtain a first probability map, a positioning map and a circle center detection branch.

The present disclosure provides a way to perform multi-feature header prediction by multiplying a feature map with a 1×1 convolutional network, typically for the purpose of enlarging the feature values of the feature map, but is not limited thereto. As shown in fig. 4, the operations of generating multi-feature heads by performing 4 1x1 convolutions on f6, f7, f8 and f10 are performed, and since similar objects have size similarity on different point cloud slice pictures, the target range detected by the technical scheme of the disclosure can be well limited.

As shown in fig. 4, the results of the multi-feature header prediction include classification (classification), circle-center detection branch (circle-center), and region matrix regression (Bbox regression), where Bbox regression refers to a method of transforming a currently predicted Bbox to be closer to a ground truth box (groundtry box, groundtry box generally refers to the tag data in the supervised learning process) using a transform method.

In fig. 5 and 6, the grid represents the final feature map, point C being the center of the object (x _c ,y _c ) The point P is the position (x _p ,y _p )。

As shown in fig. 5, in the related art, in the three-dimensional object recognition process, regression calculation of k anchors is performed for each position of each feature map.

As shown in fig. 6, the (l, r, t, b) of each position relative to the real object at that point is regressed to four distances to the left, right, top and bottom, respectively, of the real object frame.

For example, there are more than 10 anchors for each pixel of the feature map based on the anchor detection method, the regression parameters of each anchor are five (the height value is not important in the automatic driving field and is ignored temporarily), and the regression parameters of each pixel are l, r, b, t and yaw, the attitude angle is five, the image detection scheme of the present disclosure is 10 times smaller than the parameter calculation based on the anchor method.

Wherein l is the distance between the P point and the left frame of the real frame, r is the distance between the P point and the right frame of the real frame, t is the distance between the P point and the upper frame of the real frame, and b is the distance between the P point and the lower frame of the real frame.

In addition, since each feature map pixel produces a regression result of the box, screening by probability maps is required. The probability map is the same as the localization map in size, and the number of channels (channels) varies according to the method. c ⁱ C, i e {1, …, len (C) }, C representing all classes to be detected, e.g. car, person, bicycle and tricycle, C ⁱ Representing class i, len (C) represents how many kinds of objects in common need to be detected. Therefore, based on the image detection method of the present disclosure, the size of the probability map is h×w×len (c+1).

Step S208, determining a corresponding second probability map according to the first probability map and the circle center detection branch.

Although the pixel-by-pixel regression architecture can solve the problem of inaccurate regression frames between different objects due to occlusion, detection is inaccurate due to a low quality recognition frame detected at the edge of the object. The detection frames at the edges of the object are typically less accurate in category identification and regression than the frames detected around the center of the object, but are not filtered out due to the smaller cross-over.

The first probability map is optimized by providing a new circle center detection branch, and the image density attenuation is optimized based on the circle center detection branch, so that even if the center point is smoothly and truly attenuated to the edge point, the reliability and the accuracy of object detection are effectively improved.

Step S210, determining a detection result of the three-dimensional point cloud image according to the localization image and the second probability image.

As shown in fig. 7, in an embodiment of the present disclosure, step S202 performs a point cloud slice process on a three-dimensional point cloud image to obtain a corresponding point cloud slice image, and specifically includes the following steps:

in step S2022, the three-dimensional point cloud image is converted into a three-dimensional multi-layer image with a specified dimension direction in a projection manner.

Wherein, if the three-dimensional point cloud image is placed under an xyz-axis coordinate system, the designated dimension may be one of a z-axis direction, an x-axis direction, and a y-axis direction, but is not limited thereto.

In addition, for the number of point cloud slices, which is typically determined by projection density, such as determining that the point cloud slice processing is performed in the z-axis direction, the number of slices is determined based on the pixel density projected on the x-y plane.

Step S2024, determining a slice point cloud image according to the three-dimensional multi-layer picture.

As shown in fig. 8, in an embodiment of the present disclosure, step S206 performs multi-feature header prediction on a feature map to obtain a first probability map, a positioning map, and a circle center detection branch, and specifically includes the following steps:

in step S2062, an edge frame of a real object in the three-dimensional point cloud image is determined.

In step S2064, the pixel point in the point cloud slice corresponding to the feature position in the feature map is determined and is recorded as the target pixel point.

In step S2066, the target pixel is regressed according to the edge frame of the real object to obtain the first probability map and the localization map.

In one embodiment of the present disclosure, the attitude angle of the real frame is yaw, the regression parameters of the target pixel point include a distance between the target pixel point and the edge frame, the edge frame of the real object includes an upper frame, a lower frame, a left frame and a right frame, and the distance includes a distance l between the target pixel point and the left frame, a distance r between the target pixel point and the right frame, a distance t between the target pixel point and the upper frame, and a distance b between the target pixel point and the lower frame.

As shown in fig. 9, in one embodiment of the present disclosure, the circle center detection branch is generated according to the regression result of the target pixel point, and specifically includes the following steps:

in step S2068, the euclidean distance and regularization distance between the target pixel point and the center point of the real object are calculated.

Therefore, the novel circle center detection branch provided by the present disclosure can simultaneously draw the Euclidean distance and regularized distance from the point P to the center of the detection point,

in step S20610, the euclidean distance is minimized according to the excitation constant of the euclidean distance to obtain a minimized euclidean distance.

In one embodiment of the present disclosure, the excitation constant has a value range of 6±1, but is not limited thereto, and further, the rapid increase of the center region may be limited by the minimized euclidean distance, so that the attenuation amplitude is normal.

In step S20612, the center detection branch is determined based on the minimized euclidean distance and the regularized distance.

Since the minimized Euclidean distance does not allow the center region to exhibit attenuation, a regularized distance is introduced, i.e., a combination of minimized Euclidean distance and regularized distance to achieve normal attenuation from center to periphery.

In one embodiment of the present disclosure, euclidean _limit For the minimized Euclidean distance, alpha is an excitation constant, center is a circle center detection branch, l is the distance between a P point and the left frame of a real frame, r is the distance between the P point and the right frame of the real frame, t is the distance between the P point and the upper frame of the real frame, b is the distance between the P point and the lower frame of the real frame, and the expression of the circle center detection branch can adopt the following modes:

as shown in fig. 10, in one embodiment of the present disclosure, the image detection method further includes:

step S212, judging whether the calculation result of the minimized Euclidean distance is larger than 1.

In step S214, if it is determined that the calculation result of the minimized euclidean distance is greater than 1, the calculation result of the minimized euclidean distance is set to 1.

Among them, since the minimized euclidean distance is applied to the probability map, it is meaningless to be greater than 1.

As shown in fig. 11, in one embodiment of the present disclosure, the image detection method further includes:

step S216, determining a focus loss function according to the relation between the predicted probability value and the true probability value of the probability map.

In embodiments of the present disclosure, the accuracy of the subsequent output probability map of the convolutional neural network may be optimized by the focus loss function described above.

Step S218, determining a smooth loss function according to the relation between the predicted positioning value and the real positioning value of the positioning map.

In embodiments of the present disclosure, the accuracy of the subsequent output localization map of the convolutional neural network may be optimized by the above-described smoothing loss function.

Step S220, generating a loss function for training the backbone convolution neural network according to the focus loss function and the smooth loss function.

In the embodiment of the disclosure, the convolutional neural network is trained by combining the focus loss function and the smooth loss function, so that the trained convolutional neural network outputs more accurate and reliable regression results.

As shown in fig. 12, a line L1 is an image density distribution generated based on the euclidean distance, a line L2 is an image density distribution generated based on the regularized distance, and a line L3 is an image density distribution generated based on the center detection branch of the embodiment of the present disclosure.

The result of the visualization of the attenuation probability using a real object as, for example, an object having a length of 20cm and a width of 30cm is shown in fig. 13 and 14. Fig. 13 shows a real object D1, and fig. 14 shows a result D2 of attenuation after the center detection branch processing, that is, attenuation in a circular shape from the center to the periphery of the image.

According to an embodiment of the disclosure, the circle center detection branch is also very simple to use, and is directly multiplied by the first probability map to weaken the probability of the edge part of the object, so that an optimized probability map, namely a second probability map, is generated.

It is noted that the above-described figures are only schematic illustrations of processes involved in a method according to an exemplary embodiment of the invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.) or an embodiment combining hardware and software aspects may be referred to herein as a "circuit," module "or" system.

An image detection apparatus 900 according to this embodiment of the present invention is described below with reference to fig. 15. The image detection apparatus 900 shown in fig. 15 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.

The image detection apparatus 900 is represented in the form of a hardware module. The components of image detection device 900 may include, but are not limited to: the slicing module 902 is configured to perform a point cloud slicing process on the three-dimensional point cloud image to obtain a corresponding point cloud slice image; the convolution module 904 is configured to input the point cloud slice map into a backbone convolution neural network to obtain a corresponding feature map; a prediction module 906, configured to perform multi-feature head prediction on the feature map to obtain a first probability map, a positioning map, and a circle center detection branch; a determining module 908, configured to determine a corresponding second probability map according to the first probability map and the circle center detection branch; the determining module 908 is further configured to determine a detection result of the three-dimensional point cloud image according to the localization map and the second probability map.

In one embodiment of the present disclosure, the slicing module 902 is further configured to: converting the three-dimensional point cloud picture into a three-dimensional multilayer picture in a specified dimension direction in a projection mode; and determining a slice point cloud picture according to the three-dimensional multilayer picture.

In one embodiment of the present disclosure, the prediction module 906 is further configured to: determining an edge frame of a real object in the three-dimensional point cloud picture; determining pixel points in the point cloud slice corresponding to the feature positions in the feature map, and recording the pixel points as target pixel points; and returning the target pixel points according to the edge frame of the real object to obtain a first probability map and a positioning map.

In one embodiment of the present disclosure, the prediction module 906 is further configured to: calculating the Euclidean distance and regularization distance between the target pixel point and the center point of the real object; performing minimization processing on the Euclidean distance according to the excitation constant of the Euclidean distance to obtain a minimized Euclidean distance; and determining a circle center detection branch according to the minimized Euclidean distance and the regularized distance.

In one embodiment of the present disclosure, the image detection apparatus further includes: a judging module 910, configured to judge whether the calculation result of the minimized euclidean distance is greater than 1; if it is determined that the calculation result of the minimized euclidean distance is greater than 1, the calculation result of the minimized euclidean distance is set to 1.

In one embodiment of the present disclosure, the image detection apparatus further includes: a training module 912 for determining a focus loss function according to a relationship between the predicted probability value and the true probability value of the probability map; determining a smooth loss function according to the relation between the predicted positioning value and the real positioning value of the positioning map; a loss function for training the backbone convolutional neural network is generated based on the focal loss function and the smoothing loss function.

An electronic device 1000 according to this embodiment of the present invention is described below with reference to fig. 15. The electronic device 1000 shown in fig. 15 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in fig. 15, the electronic device 1000 is embodied in the form of a general purpose computing device. Components of electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one memory unit 1020, and a bus 1030 that connects the various system components, including the memory unit 1020 and the processing unit 1010.

Wherein the storage unit stores program code that is executable by the processing unit 1010 such that the processing unit 1010 performs steps according to various exemplary embodiments of the present invention described in the above section of the "exemplary method" of the present specification. For example, the processing unit 1010 may perform steps S202, S204, S206, S2010, and S210 as shown in fig. 1, as well as other steps defined in the image detection method of the present disclosure.

The memory unit 1020 may include readable media in the form of volatile memory units such as Random Access Memory (RAM) 10201 and/or cache memory unit 10202, and may further include Read Only Memory (ROM) 10203.

The storage unit 1020 may also include a program/utility 10204 having a set (at least one) of program modules 10205, such program modules 10205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

Bus 1030 may be representing one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 1000 may also be in communication with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1050. Also, electronic device 1000 can communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 1060. As shown, the network adapter 1060 communicates with other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with an electronic device, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary method" section of this specification, when the program product is run on the terminal device.

Referring to fig. 17, a program product 1100 for implementing the above-described method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. An image detection method, comprising:

performing point cloud slicing processing on the three-dimensional point cloud image to obtain a corresponding point cloud slice image;

inputting the point cloud slice image into a backbone convolutional neural network to obtain a corresponding feature image;

performing multi-feature head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch, wherein the multi-feature head prediction comprises the following steps:

determining an edge frame of a real object in the three-dimensional point cloud picture;

determining pixel points in the point cloud slice corresponding to the feature positions in the feature map, and recording the pixel points as target pixel points;

returning the target pixel points according to the edge frame of the real object to obtain the first probability map and the positioning map;

calculating Euclidean distance and regularization distance between the target pixel point and the center point of the real object;

performing minimization treatment on the Euclidean distance according to the excitation constant of the Euclidean distance to obtain a minimized Euclidean distance;

determining the circle center detection branch according to the minimized Euclidean distance and the regularized distance;

determining a corresponding second probability map according to the first probability map and the circle center detection branch;

And determining the detection result of the three-dimensional point cloud image according to the localization image and the second probability image.

2. The method of claim 1, wherein performing a point cloud slice process on the three-dimensional point cloud image to obtain a corresponding point cloud slice image comprises:

converting the three-dimensional point cloud picture into a three-dimensional multilayer picture in a specified dimension direction in a projection mode;

and determining the point cloud slice according to the three-dimensional multilayer picture.

3. The image detection method according to claim 1, wherein,

the regression parameters of the target pixel point comprise distances between the target pixel point and the edge frame, the edge frame of the real object comprises an upper frame, a lower frame, a left frame and a right frame, and the distances comprise distances between the target pixel point and the left frame, distances between the target pixel point and the right frame, distances between the target pixel point and the upper frame and distances between the target pixel point and the lower frame.

4. The image detection method according to claim 1, wherein,

the regression parameters of the target pixel points further comprise attitude angles of the edge frames of the real objects relative to the feature map.

5. The image detection method according to claim 1, characterized by further comprising:

judging whether the calculation result of the minimized Euclidean distance is more than 1;

and if the calculation result of the minimized Euclidean distance is judged to be larger than 1, setting the calculation result of the minimized Euclidean distance to be 1.

6. The image detection method according to any one of claims 1 to 5, further comprising:

determining a focus loss function according to the relation between the predicted probability value and the true probability value of the probability map;

determining a smooth loss function according to the relation between the predicted positioning value and the real positioning value of the positioning map;

and generating a loss function for training the backbone convolutional neural network according to the focus loss function and the smooth loss function.

7. An image detection apparatus, comprising:

the slicing module is used for carrying out point cloud slicing processing on the three-dimensional point cloud image so as to obtain a corresponding point cloud slice image;

the convolution module is used for inputting the point cloud slice image into a backbone convolution neural network to obtain a corresponding feature image;

the prediction module is configured to perform multi-feature head prediction on the feature map to obtain a first probability map, a positioning map and a circle center detection branch, and includes:

the determining module is used for determining a corresponding second probability map according to the first probability map and the circle center detection branch;

the determining module is further configured to determine a detection result of the three-dimensional point cloud image according to the positioning image and the second probability image.

8. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the image detection method of any one of claims 1 to 6 via execution of the executable instructions.

9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the image detection method according to any one of claims 1 to 6.