CN113642510A - Target detection method, device, equipment and computer readable medium - Google Patents

Target detection method, device, equipment and computer readable medium

Info

Publication number
CN113642510A
Authority
CN
China
Prior art keywords
bounding box
target
image
processed
bounding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110998557.9A
Other languages
Chinese (zh)
Inventor
付小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee
Beijing Jingdong Qianshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Qianshi Technology Co Ltd filed Critical Beijing Jingdong Qianshi Technology Co Ltd
Priority to CN202110998557.9A priority Critical patent/CN113642510A/en
Publication of CN113642510A publication Critical patent/CN113642510A/en
Pending legal-status Critical Current

Abstract

Embodiments of the present disclosure disclose a target detection method, a target detection apparatus, an electronic device and a computer readable medium. One embodiment of the method comprises: performing target detection on an image to be processed to obtain a set of bounding boxes corresponding to an object displayed in the image to be processed and a score for each bounding box; selecting a target bounding box from the bounding box set according to the score of each bounding box, and determining the bounding boxes in the set other than the target bounding box as a suppression bounding box subset; selecting, from the suppression bounding box subset, a bounding box whose overlap rate with the target bounding box is greater than or equal to a preset threshold as a candidate bounding box; and determining weighted position information of the candidate bounding box and the position information of the target bounding box, and taking the weighted position information as the position information of the object in the image to be processed. The embodiments improve the accuracy of the finally determined positioning frame and thereby improve the accuracy of target detection.

Description

Target detection method, device, equipment and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a target detection method, apparatus, device, and computer-readable medium.
Background
The target detection technology is widely applied to various fields. In the process of target detection, a plurality of detection frames are obtained for a certain object (for example, a pedestrian) in an image. Related object detection techniques typically perform de-duplication processing on the detection boxes by averaging the detection boxes or using a model with a multi-scale filter.
However, when de-duplication is performed in this way, it is difficult to accurately determine the positioning frame, and the accuracy of target detection is therefore low.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Some embodiments of the present disclosure propose target detection methods, apparatuses, devices and computer readable media to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a target detection method, the method comprising: performing target detection on an image to be processed to obtain a set of bounding boxes corresponding to an object displayed in the image to be processed and a score for each bounding box; selecting a target bounding box from the bounding box set according to the score of each bounding box, and determining the bounding boxes in the set other than the target bounding box as a suppression bounding box subset; selecting, from the suppression bounding box subset, a bounding box whose overlap rate with the target bounding box is greater than or equal to a preset threshold as a candidate bounding box; and determining weighted position information of the candidate bounding box and the position information of the target bounding box, and taking the weighted position information as the position information of the object in the image to be processed.
In a second aspect, some embodiments of the present disclosure provide an object detection apparatus, the apparatus comprising: the detection unit is configured to perform target detection on the image to be processed to obtain a bounding box set corresponding to an object displayed in the image to be processed and a score of each bounding box; a first selecting unit configured to select a target bounding box from the bounding box set according to the score of each bounding box, and determine each bounding box in the bounding box set except the target bounding box as a suppression bounding box subset; a second selecting unit configured to select, as a candidate bounding box, a bounding box from the suppression bounding box subset, the bounding box having an overlap rate with the target bounding box that is greater than or equal to a preset threshold; a determination unit configured to determine weighted position information of the candidate bounding box and the position information of the target bounding box, and to take the weighted position information as position information of the object in the image to be processed.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.
The above embodiments of the present disclosure have the following advantages: the accuracy of the finally obtained positioning frame can be improved by weighting the position information of the suppression bounding box subset and the target bounding box. In practice, the reason for the low accuracy of target detection is found to be the following: when the detection frames are de-duplicated by averaging them or by a multi-scale filter model, some features are lost and some erroneous features are introduced. Based on this, weighting the position information of the suppression bounding box subset and the target bounding box can reduce the feature loss and the erroneous features introduced by the de-duplication operation. Therefore, the accuracy of the finally determined positioning frame can be improved, and the accuracy of target detection can be improved.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that the elements are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of one application scenario of an object detection method according to some embodiments of the present disclosure;
FIG. 2 is a flow diagram of some embodiments of a target detection method according to the present disclosure;
FIG. 3 shows a graph of the loss function when convolution kernels of different sizes are used;
FIG. 4 is a flow chart of further embodiments of a target detection method according to the present disclosure;
FIG. 5 is a schematic structural diagram of some embodiments of an object detection apparatus according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an" and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art should understand them as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of one application scenario of an object detection method according to some embodiments of the present disclosure.
In this application scenario, take the images collected by an inspection robot operating in an unmanned warehouse as an example. If the accuracy of detecting people in the images is low, the route planning of the inspection robot may be unreasonable, and phenomena such as detours around or collisions with personnel may occur, which in turn reduces the working efficiency of the inspection robot.
First, the execution subject of the object detection method may be the computing device 101. Based on this, the computing device 101 may perform target detection on the image to be processed, and obtain a set of bounding boxes corresponding to the object displayed in the image to be processed and a score for each bounding box. As shown in the figure, the bounding box set of the human body displayed in the image to be processed includes a bounding box 102, a bounding box 103, and a bounding box 104. Assume that bounding box 102 scores 0.1, bounding box 103 scores 0.3, and bounding box 104 scores 0.6. On this basis, a target bounding box (bounding box 104) is selected from the bounding box set according to the score of each bounding box, and each bounding box in the set other than the target bounding box is determined as the suppression bounding box subset; in this scenario, the suppression bounding box subset includes bounding boxes 102 and 103. Then, the bounding box whose overlap rate with the target bounding box is greater than or equal to a preset threshold is selected from the suppression bounding box subset as the candidate bounding box; in this scenario, the candidate bounding box is bounding box 103. Finally, weighted position information of the candidate bounding box and the position information of the target bounding box can be determined, and the weighted position information is taken as the position information of the object in the image to be processed. In this scenario, the weighted position information of bounding box 103 and bounding box 104 may be determined; for example, the average of the coordinates of the two bounding boxes may be computed. For convenience of explanation, the weighted position information is represented as a bounding box, resulting in the bounding box 105 shown in the figure.
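As a minimal sketch of this scenario, the following Python snippet computes the coordinate-wise average of the two overlapping boxes. The concrete coordinate values are hypothetical (the scenario above only gives scores), and Python is used purely for illustration.

# Hypothetical (x1, y1, x2, y2) coordinates for the two overlapping boxes;
# the scenario above only specifies their scores (0.3 and 0.6).
box_103 = (52.0, 40.0, 148.0, 210.0)   # candidate bounding box
box_104 = (50.0, 38.0, 150.0, 205.0)   # target bounding box

# One simple form of "weighted position information": the coordinate-wise average.
box_105 = tuple((a + b) / 2 for a, b in zip(box_103, box_104))
print(box_105)  # (51.0, 39.0, 149.0, 207.5)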
With continued reference to fig. 2, a flow 200 of some embodiments of a target detection method according to the present disclosure is shown. The target detection method comprises the following steps:
step 201, performing target detection on the image to be processed to obtain a bounding box set corresponding to the object displayed in the image to be processed and a score of each bounding box.
In some embodiments, the executing subject of the target detection method may perform target detection on the image to be processed through various target detection algorithms, so as to obtain a set of bounding boxes corresponding to the object displayed in the image to be processed and a score for each bounding box.
As an example, target detection can be performed on the image to be processed by Fast R-CNN (Fast Region-based Convolutional Network) or R-CNN (Region-based Convolutional Network). Fast R-CNN obtains the regions of interest of the image (for example, about 2000 regions) through a selective search algorithm. Then, ROI pooling (region of interest pooling) is performed on the regions of interest. The output of the ROI pooling layer is taken as the feature vector of each region of interest, and the feature vectors of the regions of interest are connected to a fully connected layer. In addition, a multi-task loss function is defined and connected to a softmax classifier and a bounding box regressor respectively, so as to obtain the category of the current region of interest and the coordinates of its bounding box. Non-maximum suppression (NMS) is performed on all the obtained bounding boxes to obtain the final detection result, that is, the set of bounding boxes corresponding to the object displayed in the image to be processed and the score of each bounding box. The probability value output by the softmax classifier may be determined as the score of each bounding box.
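For reference, a minimal sketch of the classic greedy NMS step mentioned above is given below (intersection-over-union as the overlap rate, and a 0.5 threshold as an assumed example value; Python is used only for illustration). Note that this is the conventional scheme, not the weighted scheme of the present disclosure, which is described in the later steps.

def iou(box_a, box_b):
    """Intersection-over-union (overlap rate) of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_threshold=0.5):
    """Classic greedy NMS: keep the highest-scoring box, drop heavy overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep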
In addition, an artificial neural network can be built as needed to perform target detection on the image to be processed. For example, a number of convolutional layers and pooling layers may be provided. The convolutional layers extract image features by convolving the image with different convolution kernels, and the pooling layers further condense the representative image features and speed up the computation. As an example, a 5×5 convolution kernel may be employed. Fig. 3 shows the curves of the loss function when convolution kernels of different sizes are used; it can be seen that the loss function is minimized with the 5×5 convolution kernel. Therefore, the 5×5 convolution kernel can be adopted to improve the robustness of the network and the accuracy of the prediction results.
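A minimal sketch of such a hand-built backbone is given below. The layer counts and channel widths are assumptions, since the text only specifies the 5×5 kernel, and PyTorch is used only for illustration (the patent does not name a framework).

import torch
import torch.nn as nn

class SmallBackbone(nn.Module):
    """Illustrative stack of convolutional and pooling layers with 5x5 kernels."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2),   # 5x5 kernel, as suggested by Fig. 3
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # pooling condenses the features
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.features(x)

feature_map = SmallBackbone()(torch.randn(1, 3, 224, 224))
print(feature_map.shape)  # torch.Size([1, 32, 56, 56])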
Step 202, selecting a target bounding box from the bounding box set according to the score of each bounding box, and determining each bounding box except the target bounding box in the bounding box set as a suppression bounding box subset.
In some optional implementations of some embodiments, the execution subject may select the target bounding box by:
Step 1: select the bounding box with the highest score from the bounding box set, and add the selected bounding box into a first bounding box set.
Step 2: from the unselected bounding boxes in the bounding box set, delete the bounding boxes whose overlap rate with the bounding box selected this time is higher than a preset overlap rate threshold, to obtain a remaining bounding box set.
Step 3: if the remaining bounding box set is not empty, take the remaining bounding box set as the bounding box set, and continue to execute step 1 and step 2.
Step 4: if the remaining bounding box set is empty and the first bounding box set contains exactly one bounding box, determine that bounding box as the target bounding box.
Step 5: if the remaining bounding box set is empty and the first bounding box set contains more than one bounding box, take the first bounding box set as the bounding box set, adjust the preset overlap rate threshold, and continue to execute steps 1 to 4 until only one bounding box remains, which is the target bounding box.
Of course, other ways of determining the target bounding box may be used, as desired. For example, the bounding box with the highest score may be selected as the target bounding box.
For example, the bounding box set includes four bounding boxes A, B, C and D, with the following scores: bounding box A scored 0.98, bounding box B scored 0.86, bounding box C scored 0.7, and bounding box D scored 0.79.
On this basis, the following steps may be performed:
step 1: firstly, selecting the bounding box A with the highest score, and adding the selected bounding box A into the first bounding box set.
Step 2: and deleting the boundary frames, of which the overlapping rates with the boundary frames selected this time are higher than a preset overlapping rate threshold value, in the unselected boundary frames in the boundary frame set to obtain a residual boundary frame set. In this example, the unselected bounding boxes in the set of bounding boxes include B, C, D because A has already been selected. The overlapping rate of the bounding box B and the bounding box A is assumed to be higher than a preset overlapping rate threshold value. Therefore, the bounding box B needs to be deleted. Thus, the remaining set of bounding boxes includes C, D.
Step 3: if the remaining bounding box set is not empty, take the remaining bounding box set as the bounding box set, and continue to execute step 1 and step 2 until the remaining bounding box set is empty.
In this example, the remaining bounding box set includes C and D, i.e., it is not empty. Therefore, the remaining bounding box set is used as the bounding box set, and steps 1 and 2 are executed again. Specifically, the bounding box D with the highest score is selected from this bounding box set and added into the first bounding box set. At this time, the first bounding box set includes bounding box D and bounding box A. Then, since bounding box D has been selected, the only unselected bounding box in the bounding box set is bounding box C. The overlap rate of bounding box C with the selected bounding box D is assumed to be higher than the preset overlap rate threshold, so bounding box C is deleted. At this point, the remaining bounding box set is empty, and the procedure jumps to step 4 or step 5.
Step 4: if the remaining bounding box set is empty and the first bounding box set contains exactly one bounding box, determine that bounding box as the target bounding box.
Step 5: if the remaining bounding box set is empty and the first bounding box set contains more than one bounding box, take the first bounding box set as the bounding box set and continue to execute steps 1 to 4 until the first bounding box set contains only one bounding box. In practice, after the first bounding box set is used as the bounding box set, all bounding boxes currently included in the first bounding box set need to be deleted from it.
In this example, after performing step 1 and step 2 twice, the remaining bounding box set is empty and the first bounding box set contains more than one bounding box. Therefore, the first bounding box set is used as the bounding box set, the preset overlap rate threshold is adjusted (for example, it may be decreased), and steps 1 to 4 are executed again.
Specifically, the bounding box with the highest score, bounding box A, is selected from the bounding box set and added into the first bounding box set. Here, since all the bounding boxes previously added to the first bounding box set have been deleted from it, only bounding box A is present. The only unselected bounding box in the bounding box set at this time is bounding box D. Since the preset overlap rate threshold has been decreased, the overlap rate of bounding box D with the bounding box selected this time (bounding box A) is higher than the threshold, so bounding box D is deleted. At this point, the remaining bounding box set is empty, the first bounding box set currently contains exactly one bounding box, and that bounding box is determined as the target bounding box. That is, the target bounding box is bounding box A, and the bounding boxes B, C and D deleted in the process constitute the suppression bounding box subset.
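A compact sketch of steps 1 to 5 above is given below, reusing the iou helper from the earlier NMS sketch. The amount by which the overlap threshold is shrunk in step 5 is an assumption, since the text only states that the threshold is adjusted.

def select_target_box(boxes, scores, overlap_threshold=0.5, threshold_step=0.05):
    """Return (target_index, suppressed_indices) following steps 1-5 above."""
    candidates = list(range(len(boxes)))      # current "bounding box set"
    suppressed = []                           # bounding boxes deleted along the way
    while True:
        first_set = []
        while candidates:                     # steps 1-3: greedy selection rounds
            best = max(candidates, key=lambda i: scores[i])
            first_set.append(best)
            remaining = []
            for i in candidates:
                if i == best:
                    continue
                if iou(boxes[best], boxes[i]) > overlap_threshold:
                    suppressed.append(i)      # overlap too high -> deleted this round
                else:
                    remaining.append(i)
            candidates = remaining
        if len(first_set) == 1:               # step 4: a single survivor is the target
            return first_set[0], suppressed
        candidates = first_set                # step 5: retry with a smaller threshold
        overlap_threshold -= threshold_step

Applied to the example above (scores 0.98, 0.86, 0.7, 0.79 and sufficiently high overlaps), this returns bounding box A as the target and B, C, D as the suppression bounding box subset.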
Step 203, selecting a bounding box with the overlapping rate of the target bounding box being greater than or equal to a preset threshold value from the suppression bounding box subset as a candidate bounding box.
In some embodiments, the execution subject may select, as the candidate bounding box, a bounding box from the suppression bounding box subset whose overlap rate with the target bounding box is greater than or equal to a preset threshold (e.g., 50%). The candidate bounding boxes may include one or more bounding boxes, as appropriate.
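A one-function sketch of this selection step, again reusing the iou helper above (the 0.5 threshold is the example value from the text):

def select_candidates(boxes, target_idx, suppressed, threshold=0.5):
    """Keep the suppressed boxes whose overlap rate with the target box is >= threshold."""
    return [i for i in suppressed if iou(boxes[i], boxes[target_idx]) >= threshold]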
And step 204, determining weighted position information of the candidate bounding box and the position information of the target bounding box, and taking the weighted position information as the position information of the object in the image to be processed.
In some embodiments, the execution body may determine weighted location information of the candidate bounding box and the location information of the target bounding box.
Specifically, the candidate bounding box is at least one bounding box. The candidate bounding boxes are denoted by Y' = {(S_j, B_j)}, where B_j indicates the position information of the j-th bounding box contained in the candidate bounding boxes, and S_j indicates the score of the j-th bounding box.
Wherein, for each bounding box of the at least one bounding box, the weight value of the bounding box can be determined according to the score of the bounding box. Specifically, the weight w_j of B_j can be determined by the following formula:
w_j = max(0, s_j)
On this basis, weighted position information of the candidate bounding box and the position information of the target bounding box can be determined according to the determined weight value.
As an example, the weighted position information B'_i of each of the candidate bounding boxes is determined by the following formula (published only as an image, BDA0003234719540000081, in the original text; from the surrounding definitions it is a weighted combination of the candidate box positions with the weights w_j, of the form B'_i = Σ_j w_j · B_j / Σ_j w_j).
Then, the weighted position information of B'_i and the position information of the target bounding box may be determined, and this weighted position information may be taken as the position information of the object in the image to be processed.
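A minimal sketch of this weighting step is given below, fusing the candidate boxes and the target box in one pass. The normalised weighted average used here is one reading of the image-only formula and is an assumption, not the verbatim patent equation; the weights w_j = max(0, s_j) follow the text.

import numpy as np

def weighted_position(boxes, scores, target_idx, candidate_idxs):
    """Fuse candidate boxes with the target box using score-derived weights."""
    idxs = list(candidate_idxs) + [target_idx]
    weights = np.array([max(0.0, scores[i]) for i in idxs])       # w_j = max(0, s_j)
    coords = np.array([boxes[i] for i in idxs], dtype=float)      # shape (n, 4)
    return (weights[:, None] * coords).sum(axis=0) / (weights.sum() + 1e-9)

In the fig. 1 scenario, this would be called with the target box 104 and the single candidate box 103, and the result corresponds to the fused box 105.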
According to the method provided by some embodiments of the present disclosure, the accuracy of the finally obtained location frame can be improved by weighting the position information of the inhibition bounding box subset and the target bounding box. In practice, the reason for the low accuracy of target detection is found to be: by means of the averaging of the detection frames or the multi-scale filter model, some characteristics are lost and some error characteristics are introduced after the detection frames are subjected to de-duplication. Based on this, by weighting the position information of the suppression bounding box subset and the target bounding box, the feature loss or the introduction of error features caused by the deduplication operation can be reduced. Therefore, the accuracy of the finally determined positioning frame can be improved, and the accuracy of target detection can be improved.
With further reference to fig. 4, a flow 400 of further embodiments of a target detection method is shown. The process 400 of the target detection method includes the following steps:
step 401, performing feature extraction on an image to be processed to obtain a first feature map.
In some embodiments, the execution subject of the target detection method may perform feature extraction, for example, by using a convolutional neural network or a feature extraction network, to obtain the first feature map.
And 402, inputting the first feature diagram into an edge detection network to obtain a second feature diagram.
In some embodiments, the executing entity may continue to input the first feature map into the edge detection network to obtain the second feature map. The edge detection network can be used to implement the Edge-Boxes algorithm to extract features such as edges and textures, thereby obtaining the second feature map. The Edge-Boxes algorithm essentially performs feature extraction through a number of sliding windows; therefore, if a large number of bounding boxes were generated directly by the Edge-Boxes algorithm, the computation would be heavy and time-consuming.
Step 403, inputting the second feature map into the candidate region extraction network, and obtaining a bounding box set corresponding to the object displayed in the image to be processed and a score of each bounding box.
In some embodiments, the executing entity may input the second feature map into the candidate region extraction network, and obtain the set of bounding boxes corresponding to the object displayed in the image to be processed and the score of each bounding box. Here, the candidate region extraction network may be an RPN (Region Proposal Network), which is a convolutional network; because it has no fully connected layer, it can support inputs of different sizes. It then outputs the set of bounding boxes corresponding to the object displayed in the image to be processed and the score of each bounding box.
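A skeleton of steps 401 to 403 is sketched below, with the three sub-networks left as placeholders since the text names the stages but not their internals (PyTorch is used only for illustration).

import torch.nn as nn

class DetectionPipeline(nn.Module):
    """Step 401: backbone -> first feature map; step 402: edge network -> second
    feature map; step 403: region proposal network -> bounding boxes and scores."""
    def __init__(self, backbone, edge_net, rpn):
        super().__init__()
        self.backbone = backbone
        self.edge_net = edge_net
        self.rpn = rpn

    def forward(self, image):
        first_feature_map = self.backbone(image)
        second_feature_map = self.edge_net(first_feature_map)
        boxes, scores = self.rpn(second_feature_map)
        return boxes, scores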
And step 404, selecting a target bounding box from the bounding box set according to the score of each bounding box, and determining each bounding box except the target bounding box in the bounding box set as a suppression bounding box subset.
Step 405, selecting a bounding box with the overlapping rate of the target bounding box being greater than or equal to a preset threshold value from the suppression bounding box subset as a candidate bounding box.
Step 406, determining weighted position information of the candidate bounding box and the position information of the target bounding box, and taking the weighted position information as the position information of the object in the image to be processed.
In some embodiments, specific implementations of steps 404 to 406 and their technical effects may refer to steps 202 to 204 in the embodiments corresponding to fig. 2, and are not described herein again.
As can be seen from fig. 4, compared with the description of some embodiments corresponding to fig. 2, the process 400 of the target detection method in some embodiments corresponding to fig. 4 performs a rough search through the edge detection network and then a feature-based search through the candidate region extraction network. Therefore, the amount of computation can be reduced and the time consumption can be lowered.
With further reference to fig. 5, as an implementation of the methods illustrated in the above figures, the present disclosure provides some embodiments of an object detection apparatus, which correspond to those of the method embodiments illustrated in fig. 2, and which may be particularly applicable in various electronic devices.
As shown in fig. 5, the object detection apparatus 500 of some embodiments includes: a detection unit 501, a first selection unit 502, a second selection unit 503, and a determination unit 504. The detection unit 501 is configured to perform target detection on an image to be processed, and obtain a bounding box set corresponding to an object displayed in the image to be processed and a score of each bounding box. The first selecting unit 502 is configured to select a target bounding box from the set of bounding boxes according to the score of each bounding box, and determine each bounding box in the set of bounding boxes except the target bounding box as a suppression bounding box subset. The second selecting unit 503 is configured to select a bounding box from the suppression bounding box subset, where an overlap ratio with the target bounding box is greater than or equal to a preset threshold, as a candidate bounding box. The determining unit 504 is configured to determine weighted position information of the candidate bounding box and the position information of the target bounding box, and to take the weighted position information as the position information of the object in the image to be processed.
In an optional implementation manner of some embodiments, the detection unit 501 is further configured to perform feature extraction on the image to be processed, so as to obtain a first feature map; inputting the first feature map into an edge detection network to obtain a second feature map; and inputting the second feature map into a candidate region extraction network to obtain a bounding box set corresponding to the displayed object in the image to be processed and the score of each bounding box.
In an optional implementation of some embodiments, the detection unit 501 is further configured to: and performing target detection on the image to be processed to obtain a bounding box set corresponding to the displayed object in the image to be processed and the score of each bounding box, and obtaining the object category corresponding to each bounding box.
In an alternative implementation of some embodiments,
it will be understood that the elements described in the apparatus 500 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 600 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: carrying out target detection on the image to be processed to obtain a bounding box set corresponding to an object displayed in the image to be processed and a score of each bounding box; selecting a target boundary frame from the boundary frame set according to the score of each boundary frame, and determining each boundary frame except the target boundary frame in the boundary frame set as a suppression boundary frame subset; selecting a boundary frame with the overlapping rate of the boundary frame with the target boundary frame being greater than or equal to a preset threshold value from the suppression boundary frame subset as a candidate boundary frame; determining weighted position information of the candidate bounding box and the position information of the target bounding box, and taking the weighted position information as the position information of the object in the image to be processed.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor includes a detection unit, a first selection unit, a second selection unit, and a determination unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, the detection unit may also be described as a "unit for object detection of an image to be processed".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is only a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept defined above, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure.

Claims (11)

1. A method of target detection, comprising:
carrying out target detection on an image to be processed to obtain a bounding box set corresponding to an object displayed in the image to be processed and a score of each bounding box;
selecting a target bounding box from the bounding box set according to the score of each bounding box, and determining each bounding box except the target bounding box in the bounding box set as a suppression bounding box subset;
selecting a bounding box with the overlapping rate of the target bounding box being greater than or equal to a preset threshold value from the suppression bounding box subset as a candidate bounding box;
determining weighted position information of the candidate bounding box and the position information of the target bounding box, and taking the weighted position information as the position information of the object in the image to be processed.
2. The method according to claim 1, wherein the performing target detection on the image to be processed to obtain the bounding box set corresponding to the object displayed in the image to be processed and the score of each bounding box comprises:
extracting the features of the image to be processed to obtain a first feature map;
inputting the first feature map into an edge detection network to obtain a second feature map;
and inputting the second feature map into a candidate region extraction network to obtain a bounding box set corresponding to the object displayed in the image to be processed and the score of each bounding box.
3. The method according to claim 1, wherein the performing target detection on the image to be processed to obtain the bounding box set corresponding to the object displayed in the image to be processed and the score of each bounding box comprises:
and carrying out target detection on the image to be processed to obtain a bounding box set corresponding to the object displayed in the image to be processed and the score of each bounding box, and obtaining the object category corresponding to each bounding box.
4. The method of claim 1, wherein the selecting a target bounding box from the set of bounding boxes according to the score for each bounding box comprises:
selecting the bounding box with the highest score from the bounding box set, and adding the selected bounding box into a first bounding box set;
deleting, from the unselected bounding boxes in the bounding box set, the bounding boxes whose overlap rate with the bounding box selected this time is higher than a preset overlap rate threshold, to obtain a residual bounding box set;
if the residual bounding box set is not empty, taking the residual bounding box set as the bounding box set, and continuing to execute the steps;
and if the residual bounding box set is empty and the number of the bounding boxes currently contained in the first bounding box set is one, determining the bounding boxes contained in the first bounding box set as target bounding boxes.
5. The method of claim 4, wherein said selecting a target bounding box from said set of bounding boxes according to the score of each bounding box further comprises:
if the remaining bounding box set is empty and the number of bounding boxes included in the first bounding box set is greater than one, the first bounding box set is used as the bounding box set, the preset overlap rate threshold is adjusted, and the steps are continuously executed until the number of bounding boxes included in the first bounding box set is one, and the bounding boxes included in the first bounding box set are determined as the target bounding boxes.
6. The method of claim 1, wherein the candidate bounding box is at least one bounding box; and
the determining weighted position information of the candidate bounding box and the position information of the target bounding box comprises:
for each bounding box of the at least one bounding box, determining a weight value of the bounding box according to the score of the bounding box;
determining weighted position information of the candidate bounding box and the position information of the target bounding box according to the determined weight value.
7. An object detection device comprising:
the detection unit is configured to perform target detection on an image to be processed to obtain a bounding box set corresponding to an object displayed in the image to be processed and a score of each bounding box;
a first selecting unit, configured to select a target bounding box from the bounding box set according to the score of each bounding box, and determine each bounding box in the bounding box set except the target bounding box as a suppression bounding box subset;
a second selecting unit configured to select, as a candidate bounding box, a bounding box from the suppression bounding box subset, the bounding box having an overlap rate with the target bounding box that is greater than or equal to a preset threshold;
a determination unit configured to determine weighted position information of the candidate bounding box and the position information of the target bounding box, and to use the weighted position information as position information of the object in the image to be processed.
8. The apparatus of claim 7, wherein the detection unit is further configured to:
extracting the features of the image to be processed to obtain a first feature map;
inputting the first feature map into an edge detection network to obtain a second feature map;
and inputting the second feature map into a candidate region extraction network to obtain a bounding box set corresponding to the object displayed in the image to be processed and the score of each bounding box.
9. The apparatus of claim 7, wherein the detection unit is further configured to:
and carrying out target detection on the image to be processed to obtain a bounding box set corresponding to the object displayed in the image to be processed and the score of each bounding box, and obtaining the object category corresponding to each bounding box.
10. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
11. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN202110998557.9A 2021-08-27 2021-08-27 Target detection method, device, equipment and computer readable medium Pending CN113642510A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110998557.9A CN113642510A (en) 2021-08-27 2021-08-27 Target detection method, device, equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110998557.9A CN113642510A (en) 2021-08-27 2021-08-27 Target detection method, device, equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN113642510A true CN113642510A (en) 2021-11-12

Family

ID=78424216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110998557.9A Pending CN113642510A (en) 2021-08-27 2021-08-27 Target detection method, device, equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN113642510A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056101A (en) * 2016-06-29 2016-10-26 哈尔滨理工大学 Non-maximum suppression method for face detection
CN107730553A (en) * 2017-11-02 2018-02-23 哈尔滨工业大学 A kind of Weakly supervised object detecting method based on pseudo- true value search method
CN110032916A (en) * 2018-01-12 2019-07-19 北京京东尚科信息技术有限公司 A kind of method and apparatus detecting target object
CN109214389A (en) * 2018-09-21 2019-01-15 上海小萌科技有限公司 A kind of target identification method, computer installation and readable storage medium storing program for executing
CN110059548A (en) * 2019-03-08 2019-07-26 北京旷视科技有限公司 Object detection method and device
WO2021077743A1 (en) * 2019-10-25 2021-04-29 浪潮电子信息产业股份有限公司 Method and system for image target detection, electronic device, and storage medium
CN111783797A (en) * 2020-06-30 2020-10-16 杭州海康威视数字技术股份有限公司 Target detection method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHENGCHENG N. et al.: "Inception Single Shot MultiBox Detector for object detection", 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), pages 549-554 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998438A (en) * 2022-08-02 2022-09-02 深圳比特微电子科技有限公司 Target detection method and device and machine-readable storage medium
CN114998438B (en) * 2022-08-02 2022-11-01 深圳比特微电子科技有限公司 Target detection method and device and machine-readable storage medium

Similar Documents

Publication Publication Date Title
US11321593B2 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
US20210158533A1 (en) Image processing method and apparatus, and storage medium
CN111967467B (en) Image target detection method and device, electronic equipment and computer readable medium
CN112668588B (en) Parking space information generation method, device, equipment and computer readable medium
CN113538235B (en) Training method and device for image processing model, electronic equipment and storage medium
CN110211195B (en) Method, device, electronic equipment and computer-readable storage medium for generating image set
CN111444807B (en) Target detection method, device, electronic equipment and computer readable medium
CN108229495B (en) Target object detection method and device, electronic equipment and storage medium
CN114898177B (en) Defect image generation method, model training method, device, medium and product
CN110633717A (en) Training method and device for target detection model
CN113627536A (en) Model training method, video classification method, device, equipment and storage medium
CN112183627A (en) Method for generating predicted density map network and vehicle annual inspection mark number detection method
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN111783777B (en) Image processing method, apparatus, electronic device, and computer readable medium
CN113642510A (en) Target detection method, device, equipment and computer readable medium
CN110852250B (en) Vehicle weight removing method and device based on maximum area method and storage medium
CN110852242A (en) Watermark identification method, device, equipment and storage medium based on multi-scale network
CN113780239B (en) Iris recognition method, iris recognition device, electronic device and computer readable medium
CN111968030B (en) Information generation method, apparatus, electronic device and computer readable medium
CN111582456B (en) Method, apparatus, device and medium for generating network model information
CN111680754B (en) Image classification method, device, electronic equipment and computer readable storage medium
CN114844889B (en) Video processing model updating method and device, electronic equipment and storage medium
CN114494818B (en) Image processing method, model training method, related device and electronic equipment
CN115456167B (en) Lightweight model training method, image processing device and electronic equipment
CN111815656B (en) Video processing method, apparatus, electronic device and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination