CN115131314A - Target detection method, device, equipment and storage medium - Google Patents

Target detection method, device, equipment and storage medium

Info

Publication number
CN115131314A
Authority
CN
China
Prior art keywords
image
detection frame
detection
target object
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210746451.4A
Other languages
Chinese (zh)
Inventor
邹智康
叶晓青
孙昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210746451.4A priority Critical patent/CN115131314A/en
Publication of CN115131314A publication Critical patent/CN115131314A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a target detection method, apparatus, device, and storage medium, relating to the technical field of artificial intelligence, specifically to image processing, computer vision, virtual reality, deep learning, and similar fields, and applicable to scenes such as 3D vision, smart cities, and intelligent transportation. The method includes: acquiring an image to be detected; dividing the image to be detected into at least two image blocks based on depth information, where the at least two image blocks overlap; performing target detection on each of the at least two image blocks to obtain detection frames corresponding to target objects in the image block; and, in response to determining that the same target object corresponds to at least one detection frame, determining a target detection frame of the target object based on the position of the at least one detection frame. The target detection method provided by the disclosure improves the efficiency and accuracy of target detection.

Description

Target detection method, device, equipment and storage medium
Technical Field
The present disclosure relates to the technical field of artificial intelligence, and specifically to the technical fields of image processing, computer vision, virtual reality, deep learning, and the like, and can be applied to scenes such as 3D vision, smart cities, and intelligent transportation.
Background
Monocular 3D (3-dimensional) object detection (Monocular 3D Object Detection) is one of the most widely used computer vision technologies and can be applied to fields such as vehicle automatic driving systems, intelligent robots, and intelligent transportation. At present, monocular 3D target detection relies heavily on a single model to estimate the 3D attributes of target objects across the whole scene. However, a model's scene perception capability differs across distance ranges, and without a dedicated division strategy the model tends to overfit scenes within a particular distance range, which degrades 3D target detection performance.
Disclosure of Invention
The disclosure provides a target detection method, a target detection device, a target detection apparatus and a storage medium.
According to a first aspect of the present disclosure, there is provided an object detection method, including: acquiring an image to be detected; dividing the image to be detected into at least two image blocks based on depth information, wherein the at least two image blocks overlap; performing target detection on each of the at least two image blocks to obtain a detection frame corresponding to a target object in the image block; and, in response to determining that the same target object corresponds to at least one detection frame, determining a target detection frame of the target object based on the position of the at least one detection frame.
According to a second aspect of the present disclosure, there is provided an object detection apparatus, comprising: an acquisition module configured to acquire an image to be detected; a dividing module configured to divide the image to be detected into at least two image blocks based on depth information, wherein the at least two image blocks overlap; a detection module configured to perform target detection on each of the at least two image blocks to obtain detection frames corresponding to target objects in the image block; and a determining module configured to, in response to determining that the same target object corresponds to at least one detection frame, determine a target detection frame of the target object based on the position of the at least one detection frame.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in any one of the implementations of the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method as described in any of the implementation manners of the first aspect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a target detection method according to the present disclosure;
FIG. 3 is a flow diagram of another embodiment of a target detection method according to the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a target detection method according to the present disclosure;
FIG. 5 is a diagram of an application scenario of the object detection method according to the present disclosure;
FIG. 6 is a schematic structural diagram of one embodiment of an object detection apparatus according to the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing the object detection method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the object detection method or object detection apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or transmit information or the like. Various client applications may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices described above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not particularly limited herein.
The server 105 may provide various services. For example, the server 105 may analyze and process the image to be detected acquired from the terminal apparatuses 101, 102, 103, and generate a processing result (e.g., a target detection frame of the target object).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not particularly limited herein.
It should be noted that the object detection method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the object detection device is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a target detection method according to the present disclosure is shown. The target detection method comprises the following steps:
step 201, an image to be detected is obtained.
In this embodiment, the executing body of the target detection method (e.g., the server 105 shown in fig. 1) acquires an image to be detected. The image to be detected in this embodiment may be captured by a monocular camera, which may send the captured image to the executing body in real time or at a preset time interval.
Step 202, dividing the image to be detected into at least two image blocks based on the depth information.
In this embodiment, the executing body divides the image to be detected into at least two image blocks based on the depth information, wherein at least two image blocks have an overlap therebetween. The depth information here refers to a depth range, i.e. a perception range of the image to be detected, for example 0-30 meters, 30-50 meters, etc. For example, the executing entity may divide the image to be detected based on a distance adaptive policy to obtain a plurality of image blocks in different depth ranges, and the executing entity may divide the image to be detected on the vertical axis of the image. Generally, the execution subject divides the image to be detected according to three depth ranges, namely a near range, a middle range and a far range.
For example, if the executing body determines that the overall sensing range of the image to be detected is 70 meters, it divides the whole space of the image to be detected into near, middle, and long-distance image blocks covering 0-30 meters, 30-50 meters, and 50-70 meters, respectively. The three resulting image blocks overlap, so edge information is not lost when the image is segmented and the integrity of the image information is ensured. It should be noted that because a nearby object occupies more pixels in the image, the lower image block (i.e., the image block with the smallest depth range, 0-30 meters) is larger.
Optionally, in some scenes, in order to ensure consistency between the pixel proportion of the divided image blocks and the pixel proportion of the undivided image, the executing entity may divide the image to be detected into N × N image blocks, that is, divide the image to be detected into N blocks on the vertical axis, and then divide the image to be detected into N blocks on the horizontal axis, so as to obtain N × N image blocks, where N is a positive integer.
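As an illustration of the division described above, the following minimal sketch divides an image along its vertical axis into overlapping blocks, one per depth range. The depth ranges, the pixel rows assigned to each range, and the overlap margin are assumptions introduced here for illustration; in practice they would be determined by the distance-adaptive policy or the division model described later.

```python
import numpy as np

def divide_by_depth(image: np.ndarray, row_splits, overlap: int = 32):
    """Split an H x W x C image into vertically overlapping blocks.

    row_splits: one (top_row, bottom_row) pair per depth range, e.g. for
                0-30 m, 30-50 m and 50-70 m. Nearby objects occupy more
                pixels, so the near-range (lowest) block is usually tallest.
    overlap:    extra rows added on each side so that edge information is
                not lost at block boundaries.
    """
    height = image.shape[0]
    blocks = []
    for top, bottom in row_splits:
        top = max(0, top - overlap)
        bottom = min(height, bottom + overlap)
        blocks.append(image[top:bottom])
    return blocks

# Example: a 720-row image split into near / middle / far blocks.
image = np.zeros((720, 1280, 3), dtype=np.uint8)
near, middle, far = divide_by_depth(
    image, row_splits=[(400, 720), (260, 460), (180, 320)]
)
```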
Step 203, performing target detection on the image blocks for each of the at least two image blocks to obtain a detection frame corresponding to the target object in the image block.
In this embodiment, for each of the at least two divided image blocks, the executing body detects target objects in the image block to obtain detection frames corresponding to the target objects, where a target object may be any obstacle, such as a vehicle or a pedestrian. For example, the executing body may first obtain the depth range information of each image block and input the image block into the detection model corresponding to that depth range, so that the detection model detects the target objects in the image block and outputs the corresponding detection frames. Here, a corresponding detection model is trained in advance for each depth range; for the 0-30 meter range, for instance, training data within that range is collected and used to train the corresponding detection model. Because the distance range covered by the scene of the image to be detected is large, it is difficult to directly predict the 3D attributes of all objects in the scene with a single model, and the regressed values are often inaccurate. Therefore, in this embodiment the regression task is converted into a classification task: the image to be detected is first classified and divided, and different detection models are then used for 3D detection, which improves the accuracy of the detection results.
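A minimal sketch of this dispatch step, assuming one pre-trained detector per depth range, is shown below. The `models` mapping and the `detect(block)` interface are placeholders introduced here for illustration, not an API defined by this disclosure.

```python
def detect_per_range(blocks_with_ranges, models):
    """Run each image block through the detector trained for its depth range.

    blocks_with_ranges: iterable of (depth_range, image_block) pairs, where a
                        depth range is e.g. the tuple (0, 30) in meters.
    models:             dict mapping each depth range to a detector object
                        exposing detect(block) -> list of (box, confidence).
    """
    detections = []
    for depth_range, block in blocks_with_ranges:
        detector = models[depth_range]  # model trained only on this range
        for box, confidence in detector.detect(block):
            detections.append({"range": depth_range, "box": box, "conf": confidence})
    return detections
```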
Step 204, in response to determining that the same target object corresponds to at least one detection frame, determining a target detection frame of the target object based on the position of the at least one detection frame.
In this embodiment, since different image blocks are overlapped and each image block is detected by using the corresponding detection model, the overlapped portion of different image blocks has a plurality of different detection frames, that is, the same target object corresponds to a plurality of detection frames. Furthermore, even for the same target object in the same image block, it may correspond to a plurality of detection frames. Based on the position information of the detection frames, the execution body determines a target detection frame from the detection frames for the target object.
For example, if multiple detection frames corresponding to the same target object are in the same image block, the target detection frame may be determined based on the confidence values of the detection frames. If a plurality of detection frames corresponding to the same target object are located in different image blocks, the execution subject determines a target detection frame of the target object from the plurality of detection frames based on a Non-Maximum Suppression (NMS) method. The NMS algorithm is widely applied to a target detection scenario, and aims to find an optimal detection box in order to eliminate redundant candidate boxes.
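The selection rule just described can be summarized in a short sketch, assuming the candidate detection frames for one target object have already been collected; the `nms` argument stands for any standard non-maximum suppression routine and is not defined by this disclosure.

```python
def choose_target_box(candidates, same_block: bool, nms):
    """candidates: list of dicts with at least a 'conf' (confidence) field."""
    if same_block:
        # All candidates come from one image block: keep the most confident one.
        return max(candidates, key=lambda c: c["conf"])
    # Candidates come from different, overlapping image blocks:
    # suppress redundant boxes and keep the best remaining one.
    return nms(candidates)
```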
The target detection method provided by the embodiment of the disclosure includes the steps of firstly, obtaining an image to be detected; then dividing the image to be detected into at least two image blocks based on the depth information; then, target detection is carried out on each image block in at least two image blocks to obtain a detection frame corresponding to a target object in the image block; and finally, in response to the fact that the same target object corresponds to at least one detection frame, determining a target detection frame of the target object based on the position of the at least one detection frame. According to the target detection method in the embodiment, the image to be detected is divided according to the depth range, and then detection is performed in different distance ranges, so that the precision of 3D target detection is improved, and the accuracy of a 3D target detection result is also improved.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
With continued reference to fig. 3, fig. 3 illustrates a flow 300 of another embodiment of a target detection method according to the present disclosure. The target detection method comprises the following steps:
step 301, obtaining an image to be detected.
Step 302, dividing the image to be detected into at least two image blocks based on the depth information.
Step 303, performing target detection on the image block for each of the at least two image blocks to obtain a detection frame corresponding to the target object in the image block.
Steps 301 to 303 are substantially the same as steps 201 to 203 of the foregoing embodiment; for their specific implementation, reference may be made to the foregoing description of steps 201 to 203, which is not repeated here.
Step 304, in response to determining that at least one detection frame corresponding to the same target object is in the same image block, determining the confidence of each detection frame, and taking the detection frame with the highest confidence value as the target detection frame of the target object.
In this embodiment, if it is determined that the at least one detection frame corresponding to the same target object is located in the same image block, the executing body of the target detection method (e.g., the server 105 shown in fig. 1) determines the target detection frame based on the confidence value of each detection frame. That is, the executing body obtains the confidence value of each detection frame and takes the detection frame with the highest confidence value as the target detection frame of the target object. When an image block is detected with the detection model, the model also outputs a confidence value for each detection frame, so the executing body can directly obtain these confidence values and determine the target detection frame from them. This ensures the accuracy of the target detection frame.
Step 305, in response to determining that at least one detection frame corresponding to the same target object is located in different image blocks where overlapping exists, determining a target detection frame of the target object from the at least one detection frame based on a non-maximum suppression method.
In this embodiment, if it is determined that the at least one detection frame corresponding to the same target object is located in different, overlapping image blocks, the executing body determines the target detection frame of the target object from the detection frames based on a non-maximum suppression method. The NMS algorithm is widely used in target detection scenarios; its purpose is to eliminate redundant candidate boxes and find the optimal detection box. This ensures the accuracy of the target detection frame.
In some optional implementations of this embodiment, step 305 includes: for each detection frame in at least one detection frame, determining the distance weight of the detection frame based on the depth information of the image block where the detection frame is located and the position information of the detection frame in the image block; determining the score of the detection frame based on the confidence coefficient and the distance weight of the detection frame; and taking the detection frame with the highest score in the at least one detection frame as the target detection frame of the target object.
In this implementation, since the detection frames corresponding to the same target object are located in overlapping image blocks, the executing body may first determine the depth information of the image block in which each detection frame is located and the position information of the detection frame within that image block, and then set a distance weight for the detection frame based on the depth information and the position information. Generally, because near-range perception is relatively more accurate, a higher distance weight is set for a detection frame that is nearer and occupies a larger proportion of its image block. The distance weight is then multiplied by the confidence value of the detection frame to obtain its final score. Finally, the detection frame with the highest score among the at least one detection frame is taken as the target detection frame of the target object. This ensures that the determined target detection frame has stronger detection capability.
As an example, assume that an image Y is divided according to depth information into a near-distance image block A and a middle-distance image block B, which overlap. The detection frame of the target object X in the near-distance image block A is A1, A1 occupies 60% of A, and the confidence of A1 is 0.9. The detection frame of the target object X in the middle-distance image block B is B1, B1 occupies 30% of B, and the confidence of B1 is 0.6. Since A1 lies in the nearer image block and occupies a larger proportion, the executing body sets the distance weight of A1 to 1.2 and the distance weight of B1 to 1.0. The score of A1 is therefore 1.2 × 0.9 = 1.08 and the score of B1 is 1.0 × 0.6 = 0.6, so the executing body takes A1 as the target detection frame of the target object X.
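The example above amounts to the following small calculation; the distance weights 1.2 and 1.0 are the illustrative values from the example, not values prescribed by the method.

```python
candidates = [
    {"name": "A1", "conf": 0.9, "distance_weight": 1.2},  # near-distance block, larger box
    {"name": "B1", "conf": 0.6, "distance_weight": 1.0},  # middle-distance block
]
for c in candidates:
    c["score"] = c["conf"] * c["distance_weight"]  # A1: 1.08, B1: 0.60

target = max(candidates, key=lambda c: c["score"])
print(target["name"])  # A1 is kept as the target detection frame
```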
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the method for detecting the target in the embodiment highlights the process of determining the target detection frame, thereby ensuring that the determined target detection frame has better detection capability and improving the accuracy of the target detection result.
With continued reference to fig. 4, fig. 4 illustrates a flow 400 of yet another embodiment of a target detection method according to the present disclosure. The target detection method comprises the following steps:
step 401, obtaining an image to be detected.
Step 402, inputting the image to be detected into a division model, and outputting at least two divided image blocks.
In this embodiment, the executing body of the target detection method (for example, the server 105 shown in fig. 1) inputs the image to be detected into a division model, which outputs at least two divided image blocks; the division model is used to divide the image to be detected into the at least two image blocks according to different depth information. That is, in this embodiment the division model is trained in advance so that it can divide the image to be detected into a plurality of image blocks according to depth information. Because the division model has the ability to partition the image by depth-information interval, using it for the division improves division efficiency and ensures the reasonableness of the division result.
In some optional implementations of this embodiment, the partition model is trained based on the following steps: acquiring a training sample set, wherein training samples in the training sample set comprise sample images and sub-images corresponding to the sample images, and the sub-images are obtained by dividing the sample images according to depth information; and taking the sample image as input, taking the sub-image corresponding to the sample image as output, and training an initial division model to obtain a division model after training.
In this implementation, the execution subject may first obtain the sample image and the plurality of sub-images corresponding to each sample image, and use the sample image and the sub-images corresponding to the sample image as training samples to obtain a training sample set, where the sample image may be divided into the plurality of sub-images according to different depth information by using an artificial division method. And then, the execution subject takes the sample image as input and takes the sub-image corresponding to the sample image as output to train the initial division model, so as to obtain the trained division model. The division model obtained by training in the mode can quickly and accurately divide the image to be detected according to the depth range.
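A minimal training sketch for the division model is given below, assuming it is a neural network that predicts a depth-range (sub-image) label for each region of the sample image. The architecture, loss function, and data loader are assumptions for illustration; the disclosure only specifies the input/output pairing of sample images and their sub-images.

```python
import torch
import torch.nn as nn

def train_division_model(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-3):
    """loader yields (sample_image_batch, depth_range_label_batch) pairs, where the
    labels encode which sub-image each region of the sample image belongs to."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # depth-range assignment treated as classification
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            logits = model(images)       # per-region depth-range scores
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
    return model
```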
Step 403, for each of the at least two image blocks, determining depth information of the image block.
In this embodiment, for each of the at least two image blocks, the executing body may first determine the depth information of the image block. Since the image to be detected is divided according to depth information, the executing body may directly obtain the depth information that was determined for each image block at the time of division.
Step 404, inputting the image block into the detection model corresponding to the depth information, and outputting to obtain a detection frame corresponding to the target object in the image block.
In this embodiment, after determining the depth information of each image block, the executing entity determines a detection model corresponding to the depth information and inputs the image block into the detection model, so as to obtain a detection frame corresponding to a target object in the image block. Here, a plurality of detection models corresponding to the depth information are trained in advance, so that each image block can be accurately detected, and the accuracy of the detection result is improved.
In some optional implementations of this embodiment, the detection model is trained based on the following steps: acquiring a training data set, wherein training data in the training data set comprises an original image and a detection frame corresponding to the original image, the training data in the training data set has the same depth information, and the detection frame is obtained by labeling a target object in the original image; and taking the original image as input, taking the detection frame corresponding to the original image as output, training an initial detection model, and obtaining the trained detection model.
In this implementation manner, the executing entity may first obtain the original images and the detection frames corresponding to each original image, and use the original images and the detection frames corresponding to the original images as training data to obtain a training data set, where a manual labeling manner may be adopted to label the target object in the original images, so as to obtain the corresponding detection frames. And then, the executing body takes the original image as input and takes the detection frame corresponding to the original image as output to train the initial detection model, so as to obtain the trained detection model. The detection model obtained through the training in the mode can quickly and accurately detect the target object in the image block.
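One way to organize the training described above is to group the labeled data by depth range and train one detector per range, as sketched below. The record structure and the `train_detector` callable are assumptions introduced for illustration.

```python
from collections import defaultdict

def train_range_specific_detectors(annotated_images, depth_ranges, train_detector):
    """annotated_images: iterable of dicts with 'image', 'boxes' and 'depth_range' keys.
    depth_ranges:      e.g. [(0, 30), (30, 50), (50, 70)] in meters.
    train_detector:    callable that trains and returns a detector from (images, boxes).
    """
    per_range = defaultdict(list)
    for record in annotated_images:
        per_range[record["depth_range"]].append(record)

    models = {}
    for depth_range in depth_ranges:
        samples = per_range[depth_range]  # only samples labeled for this range
        models[depth_range] = train_detector(
            [s["image"] for s in samples], [s["boxes"] for s in samples]
        )
    return models
```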
Step 405, in response to determining that at least one detection frame corresponding to the same target object is located in the same image block, respectively calculating the confidence of each detection frame, and using the detection frame with the highest confidence value as the target detection frame of the target object.
And step 406, in response to determining that at least one detection frame corresponding to the same target object is located in different image blocks with overlapping, determining a target detection frame of the target object from the at least one detection frame based on a non-maximum suppression method.
The steps 405 and 406 are substantially the same as the steps 304 and 305 of the foregoing embodiment, and the specific implementation manner can refer to the foregoing description of the steps 304 and 305, which is not described herein again.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 3, in the target detection method in this embodiment, the step of dividing the image to be detected and the step of determining the detection frame corresponding to the target object in the image block are highlighted, so that the efficiency and accuracy of 3D target detection are further improved.
With further reference to fig. 5, fig. 5 illustrates an application scenario of the object detection method according to the present disclosure. In this application scenario, the executing body inputs the image to be detected into a distance segmentation network, which divides the image according to depth information into three groups of image blocks: near-distance, middle-distance, and long-distance image blocks. The image blocks overlap, which ensures that edge information of the image is not lost during segmentation. As shown, the lower image blocks are larger because obstacles at close range occupy more pixels in the image.
Then, the executing body inputs each image block into the corresponding sensing network, so as to obtain a detection frame corresponding to the target object in each image block. For example, three near-distance image blocks are input into the near-distance sensing network, three middle-distance image blocks are input into the middle-distance sensing network, and three long-distance image blocks are input into the long-distance sensing network.
Finally, distance-aware weighting is performed based on an NMS strategy: because near-range perception is relatively more accurate, a higher weight is set for the near-distance detection boxes; the 3D detection boxes produced by all of the distance-specific networks are then combined and subjected to NMS, that is, the final target detection box is determined based on the weight and the confidence value of each 3D detection box.
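Putting the scenario together, the end-to-end flow might look like the sketch below; every helper here (the segmentation, per-range detectors, distance weighting, and NMS) is passed in as a placeholder corresponding to the components named above, not a concrete implementation from this disclosure.

```python
def detect_3d_objects(image, segment_by_distance, models, distance_weight, nms):
    """segment_by_distance(image) -> (depth_range, block) pairs;
    models[depth_range].detect(block) -> (box, confidence) pairs;
    distance_weight(depth_range, box) -> float, larger for near-range boxes;
    nms(detections) -> final target detection boxes."""
    detections = []
    for depth_range, block in segment_by_distance(image):
        for box, confidence in models[depth_range].detect(block):
            detections.append({
                "box": box,
                "conf": confidence,
                "score": confidence * distance_weight(depth_range, box),
            })
    return nms(detections)  # distance-aware weighted NMS keeps the best box per object
```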
With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an object detection apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 6, the object detection device 600 of the present embodiment includes: an acquisition module 601, a division module 602, a detection module 603, and a determination module 604. The acquiring module 601 is configured to acquire an image to be detected; a dividing module 602 configured to divide an image to be detected into at least two image blocks based on the depth information, wherein an overlap exists between the at least two image blocks; the detection module 603 is configured to perform target detection on an image block for each of at least two image blocks to obtain a detection frame corresponding to a target object in the image block; the determining module 604 is configured to, in response to determining that the same target object corresponds to at least one detection box, determine a target detection box of the target object based on a position of the at least one detection box.
In this embodiment, for the specific processing of the acquisition module 601, the dividing module 602, the detection module 603, and the determining module 604 of the object detection apparatus 600, and the technical effects thereof, reference may be made to the related descriptions of steps 201 to 204 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the determining module includes: the first determining submodule is configured to determine the confidence of each detection frame in response to determining that at least one detection frame corresponding to the same target object is in the same image block, and the detection frame with the highest confidence value is used as the target detection frame of the target object; and the second determining submodule is configured to determine the target detection frame of the target object from the at least one detection frame based on a non-maximum suppression method in response to determining that the at least one detection frame corresponding to the same target object is in different image blocks with overlapping.
In some optional implementations of this embodiment, the second determination submodule is further configured to: for each detection frame in at least one detection frame, determining the distance weight of the detection frame based on the depth information of the image block where the detection frame is located and the position information of the detection frame in the image block; determining the score of the detection frame based on the confidence coefficient and the distance weight of the detection frame; and taking the detection frame with the highest score in the at least one detection frame as the target detection frame of the target object.
In some optional implementations of this embodiment, the dividing module includes: and the division submodule is configured to input the image to be detected into the division model and output the image to be detected to obtain at least two divided image blocks, wherein the division model is used for dividing the image to be detected into the at least two image blocks according to different depth information.
In some optional implementations of the present embodiment, the object detecting apparatus 600 further includes a first training module for training the partition model, and the first training module is configured to: acquiring a training sample set, wherein training samples in the training sample set comprise sample images and sub-images corresponding to the sample images, and the sub-images are obtained by dividing the sample images according to depth information; and taking the sample image as input, taking the sub-image corresponding to the sample image as output, training an initial division model, and obtaining a division model after training.
In some optional implementations of this embodiment, the detecting module includes: a third determining sub-module configured to determine depth information of the image block; and the detection sub-module is configured to input the image block into a detection model corresponding to the depth information, and output a detection frame corresponding to the target object in the image block.
In some optional implementations of this embodiment, the target detection apparatus 600 further includes a second training module for training the detection model, and the second training module is configured to: acquire a training data set, wherein training data in the training data set comprises an original image and a detection frame corresponding to the original image, the training data in the training data set has the same depth information, and the detection frame is obtained by labeling a target object in the original image; and take the original image as input and the detection frame corresponding to the original image as output to train an initial detection model, obtaining the trained detection model.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Computing unit 701 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 701 executes the respective methods and processes described above, such as the target detection method. For example, in some embodiments, the object detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communications unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the object detection method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the object detection method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
Cloud computing refers to a technology system that accesses a flexibly extensible shared pool of physical or virtual resources through a network, where the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and that can be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capability for technical applications and model training in artificial intelligence, blockchain, and other fields.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A method of target detection, comprising:
acquiring an image to be detected;
dividing the image to be detected into at least two image blocks based on the depth information, wherein the at least two image blocks are overlapped;
performing target detection on each image block of the at least two image blocks to obtain a detection frame corresponding to a target object in the image block;
in response to determining that the same target object corresponds to at least one detection frame, determining a target detection frame of the target object based on a position of the at least one detection frame.
2. The method of claim 1, wherein, in response to determining that the same target object corresponds to at least one detection box, determining the target detection box of the target object based on a position of the at least one detection box comprises:
in response to the fact that at least one detection frame corresponding to the same target object is located in the same image block, determining the confidence of each detection frame, and taking the detection frame with the highest confidence value as the target detection frame of the target object;
and in response to determining that at least one detection frame corresponding to the same target object is in different image blocks with overlapping, determining a target detection frame of the target object from the at least one detection frame based on a non-maximum suppression method.
3. The method of claim 2, wherein the determining a target detection box for the target object from the at least one detection box based on a non-maximum suppression method comprises:
for each detection frame in the at least one detection frame, determining a distance weight of the detection frame based on depth information of an image block where the detection frame is located and position information of the detection frame in the image block; determining a score for the detection box based on the confidence of the detection box and the distance weight;
and taking the detection frame with the highest score in the at least one detection frame as the target detection frame of the target object.
4. The method of claim 1, wherein the dividing the image to be detected into at least two image blocks based on the depth information comprises:
and inputting the image to be detected into a division model, and outputting to obtain at least two divided image blocks, wherein the division model is used for dividing the image to be detected into the at least two image blocks according to different depth information.
5. The method of claim 4, wherein the partition model is trained based on the following steps:
acquiring a training sample set, wherein training samples in the training sample set comprise sample images and sub-images corresponding to the sample images, and the sub-images are obtained by dividing the sample images according to depth information;
and taking the sample image as input, taking the sub-image corresponding to the sample image as output, training an initial division model, and obtaining a division model after training.
6. The method according to claim 1, wherein the performing target detection on the image block to obtain a detection frame corresponding to a target object in the image block includes:
determining depth information of the image block;
and inputting the image block into a detection model corresponding to the depth information, and outputting to obtain a detection frame corresponding to a target object in the image block.
7. The method of claim 6, wherein the detection model is trained based on:
acquiring a training data set, wherein training data in the training data set comprises an original image and a detection frame corresponding to the original image, the training data in the training data set has the same depth information, and the detection frame is obtained by labeling a target object in the original image;
and taking the original image as input, taking a detection frame corresponding to the original image as output, training an initial detection model, and obtaining a trained detection model.
8. An object detection device comprising:
an acquisition module configured to acquire an image to be detected;
a dividing module configured to divide the image to be detected into at least two image blocks based on depth information, wherein there is an overlap between the at least two image blocks;
the detection module is configured to perform target detection on each image block of the at least two image blocks to obtain a detection frame corresponding to a target object in the image block;
the determining module is configured to determine a target detection frame of the target object based on a position of at least one detection frame in response to determining that the same target object corresponds to the at least one detection frame.
9. The apparatus of claim 8, wherein the means for determining comprises:
the first determining sub-module is configured to determine the confidence of each detection frame in response to determining that at least one detection frame corresponding to the same target object is in the same image block, and take the detection frame with the highest confidence value as the target detection frame of the target object;
and the second determining submodule is configured to determine a target detection frame of the target object from at least one detection frame corresponding to the same target object based on a non-maximum suppression method in response to determining that the at least one detection frame is located in different image blocks with overlapping.
10. The apparatus of claim 9, wherein the second determination submodule is further configured to:
for each detection frame in the at least one detection frame, determining a distance weight of the detection frame based on depth information of an image block where the detection frame is located and position information of the detection frame in the image block; determining a score for the detection box based on the confidence of the detection box and the distance weight;
and taking the detection frame with the highest score in the at least one detection frame as the target detection frame of the target object.
11. The apparatus of claim 8, wherein the means for dividing comprises:
and the division submodule is configured to input the image to be detected into a division model and output the image to be detected to obtain at least two divided image blocks, wherein the division model is used for dividing the image to be detected into the at least two image blocks according to different depth information.
12. The apparatus of claim 11, wherein the apparatus further comprises a first training module for training a partitioning model, the first training module configured to:
acquiring a training sample set, wherein training samples in the training sample set comprise sample images and sub-images corresponding to the sample images, and the sub-images are obtained by dividing the sample images according to depth information;
and taking the sample image as input, taking the sub-image corresponding to the sample image as output, and training an initial division model to obtain a trained division model.
13. The apparatus of claim 8, wherein the detection module comprises:
a third determining sub-module configured to determine depth information of the image block;
and the detection sub-module is configured to input the image block into a detection model corresponding to the depth information, and output a detection frame corresponding to a target object in the image block.
14. The apparatus of claim 13, wherein the apparatus further comprises a second training module for training a detection model, the second training module configured to:
acquiring a training data set, wherein training data in the training data set comprises an original image and a detection frame corresponding to the original image, the training data in the training data set has the same depth information, and the detection frame is obtained by labeling a target object in the original image;
and taking the original image as input, taking a detection frame corresponding to the original image as output, training an initial detection model, and obtaining a trained detection model.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202210746451.4A 2022-06-28 2022-06-28 Target detection method, device, equipment and storage medium Pending CN115131314A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210746451.4A CN115131314A (en) 2022-06-28 2022-06-28 Target detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210746451.4A CN115131314A (en) 2022-06-28 2022-06-28 Target detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115131314A true CN115131314A (en) 2022-09-30

Family

ID=83379441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210746451.4A Pending CN115131314A (en) 2022-06-28 2022-06-28 Target detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115131314A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination