WO2022160229A1 - Apparatus and method for processing candidate boxes by using plurality of cores

Info

Publication number
WO2022160229A1
WO2022160229A1 PCT/CN2021/074313 CN2021074313W WO2022160229A1 WO 2022160229 A1 WO2022160229 A1 WO 2022160229A1 CN 2021074313 W CN2021074313 W CN 2021074313W WO 2022160229 A1 WO2022160229 A1 WO 2022160229A1
Authority
WO
WIPO (PCT)
Prior art keywords
candidate
candidate frame
frames
frame
sequence
Prior art date
Application number
PCT/CN2021/074313
Other languages
French (fr)
Chinese (zh)
Inventor
林鑫
汪昊
刘虎
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202180092237.5A (published as CN116762092A)
Priority to PCT/CN2021/074313 (published as WO2022160229A1)
Publication of WO2022160229A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing

Definitions

  • The present application relates to the technical field of data processing, and in particular to an apparatus and method for processing candidate frames using multiple cores.
  • Computer vision is an integral part of intelligent and autonomous systems in many application fields, such as manufacturing, inspection, document analysis, medical diagnosis, and military applications.
  • In essence, it equips a computer with eyes (cameras/camcorders) and a brain (algorithms) so that the computer, instead of the human eye, identifies, tracks, and measures targets and can thereby perceive its environment.
  • Because perception can be viewed as extracting information from sensory signals, computer vision can also be viewed as the science of making artificial systems "perceive" from images or multidimensional data.
  • In general, computer vision uses imaging systems in place of the visual organs to obtain input information, and a computer in place of the brain to process and interpret that information.
  • The ultimate research goal of computer vision is to enable computers to observe and understand the world through vision as humans do, and to adapt to the environment autonomously.
  • Object detection is a technique commonly used in the field of computer vision.
  • Non-maximum suppression (NMS) is widely used in computer vision algorithms, such as image recognition, edge detection, and object detection.
  • A classifier is an important part of the NMS pipeline. It detects objects in the image, for example by deciding whether a region is a face. For one object in the image, the classifier generates multiple candidate frames (also referred to as candidate boxes).
  • The classifier computes a confidence level, also known as a score, for each candidate frame. To detect or recognize objects accurately, only one optimal candidate frame is retained per object, and the content of that frame is taken as the recognized or detected object.
  • In the NMS algorithm, the candidate frame with the highest classifier score among all candidate frames in the image is selected first. The other candidate frames are then traversed, the overlapping area between each of them and the highest-scoring frame is calculated, and candidate frames are deleted according to the relationship between the overlapping area and a preset threshold.
  • The candidate frame with the highest classifier score is then selected again from the candidate frames that have not been deleted, and the traversal calculation of the overlapping area is repeated until every candidate frame other than the optimal candidate frames has been deleted.
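  • The conventional procedure described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes frames are given as (x1, y1, x2, y2) corner coordinates, that the overlap is measured as intersection over union, and that a frame is deleted when that ratio exceeds the threshold. The names `iou` and `greedy_nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two frames given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(frames, scores, threshold=0.5):
    """Return the indices of the frames kept by sequential NMS."""
    # Candidates ordered from highest to lowest confidence.
    remaining = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)          # highest-scoring frame still alive
        kept.append(best)
        # Delete every remaining frame that overlaps the selected one too much.
        remaining = [i for i in remaining
                     if iou(frames[best], frames[i]) <= threshold]
    return kept


frames = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(frames, scores)  # frame 1 overlaps frame 0 and is deleted
```

  • Each pass of the `while` loop depends on the result of the previous pass, which is why the traversal cost grows quickly with the number of candidate frames.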
  • In practice, the number of candidate frames is very large, which increases the amount of traversal calculation and therefore the computation time of the NMS algorithm.
  • The embodiments of the present application provide an apparatus and method for processing candidate frames by using multiple cores, which improve the calculation speed of the NMS algorithm and the efficiency of obtaining target candidate frames.
  • A first aspect of the present application provides an apparatus for processing candidate frames using multiple cores. The apparatus may include multiple single-core processors.
  • All candidate frames of the image to be detected can be processed by the apparatus. Each of these candidate frames corresponds to a sequence number, and the sequence numbers of any two candidate frames are different.
  • The sequence numbers of the candidate frames are consecutive.
  • Different sequence numbers may be randomly assigned to the candidate frames.
  • Alternatively, sequence numbers may be assigned to all candidate frames in order of confidence level from high to low.
  • Alternatively, sequence numbers may be assigned to all candidate frames in order of confidence level from low to high.
  • Multiple first single-core processors among the multiple single-core processors are configured to execute the following process in parallel: acquire some of the candidate frames (partial candidate frames) from all the candidate frames.
  • The partial candidate frames may be acquired from all the candidate frames according to the identification information of each first single-core processor.
  • The identification information is unique: the identification information of any two first single-core processors is different, and the partial candidate frames obtained by any two first single-core processors are different. Each first single-core processor then obtains the suppression relationship between each candidate frame in its partial candidate frames and each candidate frame in all the candidate frames.
  • The suppression relationship between a candidate frame and itself need not be obtained.
  • A second single-core processor among the multiple single-core processors is configured to obtain the final candidate frame according to the suppression relationship obtained by each first single-core processor and the confidence level corresponding to each candidate frame.
  • The second single-core processor is any one of the multiple first single-core processors, or the second single-core processor is different from every one of the multiple first single-core processors.
  • The solution provided by the present application uses a multi-core processor to process computations without dependencies in parallel, so as to speed up processing.
  • Computations without dependencies can be understood as computations that are independent of the ordering of the confidence levels.
  • For example, the step of acquiring the suppression relationship may be performed by the multi-core processor.
  • Each first single-core processor obtains a part of the candidate frames and calculates the suppression relationship between each candidate frame in that part and each candidate frame in all candidate frames.
  • In this way, the suppression relationship between any two candidate frames in all candidate frames can be obtained at one time, and the calculation that depends on the confidence level is then performed by the second single-core processor.
  • The final candidate frame is sometimes also referred to as the target candidate frame. Both terms denote the candidate frames that remain after all candidate frames have been processed by the NMS algorithm.
  • Each first single-core processor is specifically configured to: acquire the area of each candidate frame in all candidate frames; obtain the overlapping area between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames; obtain the overlap ratio between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames; and, according to the relationship between the overlap ratio and the preset threshold, obtain the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames. This embodiment gives a process for obtaining the suppression relationship between any two candidate frames. In the solution provided in this application, this process is executed in parallel by the multi-core processor, which increases the calculation speed.
  • At least one of the above steps is performed in parallel by the multi-core processor, and the remaining steps may be performed by a single-core processor. For example, the three steps of obtaining the area of each candidate frame in all candidate frames, obtaining the overlapping area between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames, and obtaining the overlap ratio between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames may be performed by the multi-core processor, while the step of obtaining the suppression relationship according to the relationship between the overlap ratio and the preset threshold may be performed by the second single-core processor.
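  • A minimal sketch of the per-processor steps listed above (frame areas, overlapping areas, ratio against a threshold) is given below. It assumes frames are (x1, y1, x2, y2) tuples and that the ratio is intersection over union; the function names and the boolean row layout are illustrative assumptions, not taken from the application.

```python
def area(frame):
    """Area of a frame given as (x1, y1, x2, y2)."""
    return max(0.0, frame[2] - frame[0]) * max(0.0, frame[3] - frame[1])


def overlap_area(a, b):
    """Area of the intersection of two frames."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return w * h


def suppression_rows(part_indices, frames, threshold=0.5):
    """For each frame in the part, flag which of ALL frames it would suppress."""
    areas = [area(f) for f in frames]            # step 1: areas of all frames
    rows = {}
    for i in part_indices:
        row = []
        for j, other in enumerate(frames):
            if i == j:
                row.append(False)                # no relationship with itself
                continue
            inter = overlap_area(frames[i], other)       # step 2: overlap
            union = areas[i] + areas[j] - inter
            ratio = inter / union if union > 0 else 0.0  # step 3: ratio
            row.append(ratio > threshold)        # step 4: compare to threshold
        rows[i] = row
    return rows


frames = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
rows = suppression_rows([0, 1], frames)
```

  • None of these steps depends on the confidence ordering, so different processors can run `suppression_rows` on different parts at the same time.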
  • Each first single-core processor is specifically configured to acquire, in order and according to its identification information, some adjacent candidate frames from the sequence of candidate frames.
  • The sequence of candidate frames is obtained by sorting all the candidate frames by confidence level from high to low.
  • The candidate frames obtained by the multi-core processor are thus already sorted by confidence level, which is convenient for subsequent calculation.
  • Each first single-core processor is specifically configured to obtain, in order and according to its identification information, the same number of adjacent partial candidate frames from the sequence of candidate frames.
  • In this embodiment, every first single-core processor obtains the same number of partial candidate frames. For example, the first of the first single-core processors obtains the candidate frames with sequence numbers 1-20, the second obtains the candidate frames with sequence numbers 21-40, and so on.
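  • The equal, contiguous split described above can be sketched as follows. The sketch assumes the identification information is a zero-based core index and uses zero-based frame positions, so positions 0-19 correspond to sequence numbers 1-20 in the example. `partial_range` is a hypothetical helper name.

```python
def partial_range(core_id, num_cores, num_frames):
    """Positions in the sorted frame sequence handled by one core."""
    per_core = -(-num_frames // num_cores)       # ceiling division
    start = min(core_id * per_core, num_frames)
    stop = min(start + per_core, num_frames)
    return range(start, stop)


# 100 sorted candidate frames split across 5 cores, 20 frames each.
chunks = [partial_range(core, 5, 100) for core in range(5)]
```

  • When the number of frames is not a multiple of the number of cores, the last cores in this sketch simply receive fewer (possibly zero) frames.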
  • The second single-core processor is further configured to obtain N bit sequences from each first single-core processor, where N is the number of partial candidate frames. The N bit sequences represent the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames. Each of the N bit sequences represents the suppression relationship between a first candidate frame and each candidate frame in all the candidate frames, the first candidate frame being one candidate frame in the partial candidate frames.
  • Each first single-core processor sends N bit sequences to the second single-core processor.
  • When the second single-core processor is different from every one of the multiple first single-core processors, the second single-core processor receives the N bit sequences sent by each first single-core processor.
  • When the second single-core processor is one of the multiple first single-core processors, it obtains the N bit sequences generated by itself and receives the N bit sequences sent by each of the other first single-core processors.
  • In this embodiment, the suppression relationship between one candidate frame and each candidate frame in all the candidate frames can be represented by a bit sequence, which benefits the subsequent calculation process and also increases the diversity of the scheme.
  • Each bit sequence may include M bits. Each of the M bits indicates either that the first candidate frame is suppressed by a second candidate frame or that the first candidate frame is not suppressed by the second candidate frame, the second candidate frame being one candidate frame among all the candidate frames.
  • M is the number of all candidate frames.
  • Each of the M positions can use 1 bit to indicate the suppression relationship. For example, when the bit is 0, the second candidate frame is suppressed by the first candidate frame, and when the bit is 1, the second candidate frame is not suppressed by the first candidate frame.
  • Each first single-core processor is further configured to perform an initialization operation on each bit sequence according to the sorting of the obtained partial candidate frames, so that, after initialization, the first P bits of each bit sequence indicate that the first candidate frame is suppressed by the second candidate frame and the last M-P bits indicate that the first candidate frame is not suppressed by the second candidate frame.
  • P is the sequence number of the first candidate frame in the sorting of the partial candidate frames.
  • When the suppression relationship is obtained, a candidate frame does not need to obtain the suppression relationship with itself, nor with candidate frames whose confidence levels rank before its own. Therefore, when the bit sequence is initialized, the first P of the M bits indicate that the first candidate frame is suppressed by the second candidate frame, and the remaining M-P bits indicate that it is not.
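  • The initialization of one M-bit sequence can be sketched as follows. The sketch assumes the 1-bit encoding from the example in the text (0 for "suppressed", 1 for "not suppressed") and stores the bits in a Python list for readability rather than in packed machine words. Both helper names are hypothetical.

```python
def init_bit_sequence(p, m):
    """Build one M-bit sequence: first p bits are 0, last m - p bits are 1."""
    if not 0 <= p <= m:
        raise ValueError("p must lie between 0 and m")
    return [0] * p + [1] * (m - p)


def record_suppression(bits, j):
    """Clear bit j once the frame at position j is found to be suppressed."""
    bits[j] = 0
    return bits


# Frame with sequence number P = 3 among M = 8 frames.
row = init_bit_sequence(3, 8)      # [0, 0, 0, 1, 1, 1, 1, 1]
record_suppression(row, 5)         # [0, 0, 0, 1, 1, 0, 1, 1]
```

  • Pre-clearing the first P bits means the overlap with the frame itself and with higher-ranked frames never has to be evaluated.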
  • The second single-core processor is specifically configured to process the acquired bit sequences multiple times to obtain the target candidate frame. Any one of the multiple rounds of processing includes: acquiring the bit sequence to be processed from the acquired bit sequences according to a sequence number sequence, where the sequence number sequence indicates the sequence number of each bit sequence and the sequence number of the bit sequence to be processed is determined according to the sequence number of the second candidate frame. The bit sequence to be processed represents the suppression relationship between the second candidate frame and each candidate frame in all candidate frames. The second candidate frame is one of all the candidate frames, and the sequence number of each candidate frame in all candidate frames is obtained from the sequence of candidate frames.
  • The updated global sequence is used to obtain the target candidate frame.
  • In this embodiment, a specific solution for obtaining the final candidate frame by the second single-core processor is given.
  • The second single-core processor is specifically configured to generate a sequence number sequence according to the sequence of candidate frames, where the sequence of candidate frames indicates the confidence levels of the candidate frames.
  • The sequence number sequence is screened multiple times to obtain multiple sequence numbers, and the multiple sequence numbers are used to obtain the final candidate frame. Any one of the multiple screenings may include: obtaining the bit sequence of a specific candidate frame according to the sequence number obtained by the previous screening, where the bit sequence of the specific candidate frame represents the suppression relationship between the specific candidate frame and the second candidate frame.
  • The updated global sequence is obtained according to the suppression relationship represented by the bit sequence of the specific candidate frame and the global sequence obtained by the previous screening.
  • The updated global sequence indicates at least one sequence number, and the candidate frame corresponding to the at least one sequence number is suppressed neither by the specific candidate frame nor by the other candidate frames whose sequence numbers rank before the specific candidate frame.
  • In this embodiment, a specific solution for obtaining the final candidate frame by the second single-core processor is given.
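  • One way to read the repeated screening is sketched below. It assumes the bit sequences are ordered by confidence, that bit value 1 means "not suppressed", and that the global sequence is updated by a bitwise AND with the bit sequence of the most recently selected frame. These are assumptions made for illustration; the name `screen` is hypothetical.

```python
def screen(bit_rows):
    """Return positions of the frames that survive, given one row per frame."""
    m = len(bit_rows)
    global_seq = [1] * m          # 1 = still a possible final candidate
    kept = []
    for idx in range(m):
        if global_seq[idx] == 0:
            continue              # suppressed by an earlier selected frame
        kept.append(idx)          # next sequence number produced by screening
        row = bit_rows[idx]
        # Update only the positions ranked after idx; earlier ones are decided.
        global_seq = [g & r if j > idx else g
                      for j, (g, r) in enumerate(zip(global_seq, row))]
    return kept


# Four frames sorted by confidence: frame 0 suppresses frame 1,
# frame 2 suppresses frame 3.
bit_rows = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
kept = screen(bit_rows)
```

  • Only this loop depends on the confidence ordering, and it touches one bit sequence per selected frame, so the sequential part stays small.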
  • The second single-core processor is specifically configured to: acquire the bit sequence to be processed from the acquired bit sequences according to the sequence numbers of the acquired bit sequences, where the sequence number of the bit sequence to be processed is determined according to the sequence number of the second candidate frame, the bit sequence to be processed represents the suppression relationship between the second candidate frame and each candidate frame in all candidate frames, and the second candidate frame is one of all the candidate frames.
  • The sequence number of each candidate frame in all candidate frames is obtained from the sequence of candidate frames. The target candidate frame is obtained according to the bit sequence to be processed and the processed bit sequences.
  • The target candidate frame is a candidate frame that is suppressed neither by the second candidate frame nor by the other candidate frames ranked before the second candidate frame.
  • Each processed bit sequence represents the suppression relationship between one of the other candidate frames ranked before the second candidate frame and each candidate frame in all candidate frames.
  • The partial candidate frames obtained by all the first single-core processors together constitute all the candidate frames.
  • A second aspect of the present application provides a method for processing candidate frames using multiple cores, which may include: acquiring the candidate frames of an image to be detected; obtaining, by using the respective identification information of multiple first single-core processors, partial candidate frames from all the candidate frames; acquiring, by each first single-core processor, the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames; and obtaining, by a second single-core processor, the final candidate frame according to the suppression relationship obtained by each first single-core processor and the confidence level corresponding to each candidate frame.
  • Acquiring, by each first single-core processor, the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames may include the following, each performed by each first single-core processor: obtaining the area of each candidate frame in all candidate frames; obtaining the overlapping area between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames; obtaining the overlap ratio between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames; and obtaining the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames according to the relationship between the overlap ratio and the preset threshold.
  • The method may further include: sorting all the candidate frames by their confidence levels from high to low, so as to obtain a sequence of candidate frames.
  • Obtaining partial candidate frames from all the candidate frames by using the respective identification information of the multiple first single-core processors may include: obtaining, in order, adjacent partial candidate frames from the sequence of candidate frames by using the respective identification information of the multiple first single-core processors.
  • Obtaining, in order, adjacent partial candidate frames from the sequence of candidate frames by using the respective identification information of the multiple first single-core processors may include: obtaining the same number of adjacent partial candidate frames from the sequence of candidate frames by using the respective identification information of the multiple first single-core processors.
  • The method may further include: obtaining, by the second single-core processor, N bit sequences from each first single-core processor, where N is the number of partial candidate frames. Each of the N bit sequences represents the suppression relationship between a first candidate frame and a second candidate frame, the first candidate frame being one candidate frame in the partial candidate frames and the second candidate frame being each candidate frame in all candidate frames.
  • Each bit sequence may include M bits. Each of the M bits indicates either that the first candidate frame is suppressed by the second candidate frame or that the first candidate frame is not suppressed by the second candidate frame, and M is the number of all candidate frames.
  • The method may further include: performing, by each first single-core processor, an initialization operation on each bit sequence according to the order of the obtained partial candidate frames, so that the first P of the M bits indicate that the first candidate frame is suppressed by the second candidate frame and the last M-P of the M bits indicate that the first candidate frame is not suppressed by the second candidate frame, where P is the sequence number of the first candidate frame in the ordering of the partial candidate frames.
  • Obtaining the final candidate frame by the second single-core processor according to the suppression relationship obtained by each first single-core processor and the confidence level corresponding to each candidate frame may include: generating a sequence number sequence according to the sequence of candidate frames, and screening the sequence number sequence multiple times to obtain multiple sequence numbers that are used to obtain the final candidate frame. Any one of the multiple screenings may include: obtaining the bit sequence of a specific candidate frame according to the sequence number obtained by the previous screening, where the bit sequence of the specific candidate frame represents the suppression relationship between the specific candidate frame and the second candidate frame; and obtaining the updated global sequence according to the suppression relationship represented by the bit sequence of the specific candidate frame and the global sequence obtained by the previous screening. The updated global sequence indicates at least one sequence number, and the candidate frame corresponding to the at least one sequence number is suppressed neither by the specific candidate frame nor by the other candidate frames whose sequence numbers rank before the specific candidate frame.
  • Obtaining the target candidate frame by the second single-core processor according to the suppression relationship obtained by each first single-core processor and the confidence level corresponding to each candidate frame in all the candidate frames includes: processing the acquired bit sequences multiple times by the second single-core processor to obtain the target candidate frame. Any one of the multiple rounds of processing includes: acquiring the bit sequence to be processed from the acquired bit sequences according to the sequence number sequence, where the sequence number sequence indicates the sequence number of each bit sequence, the sequence number of the bit sequence to be processed is determined according to the sequence number of the second candidate frame, and the bit sequence to be processed represents the suppression relationship between the second candidate frame and each candidate frame in all candidate frames. The second candidate frame is one of all the candidate frames, and the sequence number of each candidate frame in all candidate frames is obtained from the sequence of candidate frames.
  • The partial candidate frames obtained by all the first single-core processors together constitute all the candidate frames.
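  • The method of the second aspect can be sketched end to end as follows. This is an illustrative sketch under several assumptions: worker threads from Python's `concurrent.futures.ThreadPoolExecutor` stand in for the first single-core processors (in CPython they illustrate the structure rather than give a real speedup), the calling thread plays the second single-core processor, frames are (x1, y1, x2, y2) tuples, and the overlap ratio is intersection over union. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def iou(a, b):
    """Intersection over union of two frames given as (x1, y1, x2, y2)."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def worker(core_id, num_cores, frames, threshold):
    """One 'first processor': bit rows for its contiguous share of the frames."""
    m = len(frames)
    per_core = -(-m // num_cores)                # ceiling division
    start = min(core_id * per_core, m)
    stop = min(start + per_core, m)
    rows = []
    for i in range(start, stop):
        # First i + 1 bits are pre-cleared; later bits are 1 when frame j
        # is NOT suppressed by frame i.
        tail = [int(iou(frames[i], frames[j]) <= threshold)
                for j in range(i + 1, m)]
        rows.append([0] * (i + 1) + tail)
    return rows


def multi_core_nms(frames, scores, num_cores=2, threshold=0.5):
    """Return original indices of the final candidate frames."""
    order = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    ordered = [frames[i] for i in order]
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        parts = list(pool.map(
            lambda c: worker(c, num_cores, ordered, threshold),
            range(num_cores)))
    matrix = [row for rows in parts for row in rows]   # bit sequence matrix
    alive = [1] * len(ordered)                         # global sequence
    kept = []
    for i in range(len(ordered)):                      # sequential screening
        if alive[i]:
            kept.append(order[i])
            for j in range(i + 1, len(ordered)):
                alive[j] &= matrix[i][j]
    return kept


frames = [(1, 1, 11, 11), (0, 0, 10, 10), (20, 20, 30, 30)]
scores = [0.8, 0.9, 0.7]
result = multi_core_nms(frames, scores)
```

  • `pool.map` returns results in submission order, so concatenating the parts reproduces the rows in confidence order without extra bookkeeping.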
  • A third aspect of the present application provides a neural network device.
  • The neural network device may include an apparatus for processing candidate frames using multiple cores.
  • The apparatus for processing candidate frames using multiple cores is the apparatus described in the first aspect.
  • A fourth aspect of the present application provides an apparatus for processing candidate frames using multiple cores, which may include a memory for storing computer-readable instructions. It may also include a processor coupled to the memory and configured to execute the computer-readable instructions in the memory to perform the method described in the second aspect.
  • A fifth aspect of the present application provides a chip system. The chip system may include a processor and a communication interface. The processor obtains program instructions through the communication interface, and the method described in the second aspect is implemented when the program instructions are executed by the processor.
  • A sixth aspect of the present application provides a computer-readable storage medium, which may include a program that, when executed by a processing unit, performs the method described in the second aspect.
  • A seventh aspect of the present application provides a computer program product which, when run on a computer, causes the computer to perform the method of the second aspect.
  • The solution provided in this application separates the parallelizable part of the NMS calculation from the data-dependent part.
  • The suppression relationship between candidate frames is obtained through a multi-core processor, whose parallel execution capability increases the flexibility of the calculation and improves the calculation speed of the NMS algorithm.
  • The suppression relationship between any candidate frame and each candidate frame in all candidate frames can be obtained, and a bit sequence matrix can then be obtained.
  • Each bit sequence in the bit sequence matrix indicates the suppression relationship between two candidate frames with 1 bit, so the amount of data is small. This effectively reduces the amount of data transmitted from the multi-core processor to the second single-core processor and shortens the calculation time.
  • The second single-core processor may extract the bit sequences corresponding to the unsuppressed candidate frames from the bit sequence matrix and delete the bit sequences of the suppressed candidate frames.
  • The vreduce instruction may be used to implement this process.
  • The solution provided in this application can speed up the calculation process of the NMS algorithm and quickly obtain the final candidate frame.
  • FIG. 1 is a schematic diagram of the architecture of an RPN;
  • FIG. 2a is a schematic structural diagram of an apparatus for processing candidate frames using multiple cores provided by an embodiment of the present application;
  • FIG. 2b is a schematic structural diagram of an apparatus for processing candidate frames using multiple cores provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of each single-core processor in a multi-core processor acquiring some candidate frames in an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of simultaneously obtaining the areas of multiple candidate frames through SIMD instructions;
  • FIG. 5 is a schematic flowchart of simultaneously obtaining the overlap area between each candidate frame in the first candidate frame and all candidate frames through SIMD instructions;
  • FIG. 6 is a schematic diagram of the suppression relationship between the first candidate frame and the second candidate frame represented by a bit sequence in an embodiment of the present application;
  • FIG. 7a is a schematic diagram of the second single-core processor screening sequence numbers in an embodiment of the present application;
  • FIG. 7b is a schematic diagram of the second single-core processor screening sequence numbers in an embodiment of the present application;
  • FIG. 7c is a schematic diagram of the second single-core processor screening sequence numbers in an embodiment of the present application;
  • FIG. 8 is a schematic diagram of the second single-core processor screening sequence numbers in an embodiment of the present application;
  • FIG. 9 is a schematic flowchart of a method for processing candidate frames using multiple cores provided by an embodiment of the present application;
  • FIG. 10 is a schematic flowchart of a method for processing candidate frames using multiple cores provided by an embodiment of the present application;
  • FIG. 11 is a schematic flowchart of a method for processing candidate frames using multiple cores provided by an embodiment of the present application;
  • FIG. 12 is a schematic flowchart of the first single-core processor acquiring the suppression relationship between candidate frames in an embodiment of the present application;
  • FIG. 13 is a schematic flowchart of obtaining a final candidate frame by a second single-core processor in an embodiment of the present application;
  • FIG. 14 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • The present application provides an apparatus and method for processing candidate frames using multiple cores.
  • The apparatus for processing candidate frames using multiple cores provided in this application can improve the computing speed of the NMS algorithm.
  • The basic idea of the NMS algorithm is to search for local maxima and suppress elements that are not maxima.
  • NMS is widely used in fields such as video tracking and object recognition, for example in edge detection, face detection, and target detection scenarios.
  • The key to target detection is to accurately locate the target of interest in the scene and correctly determine the type of the target.
  • Object detection systems usually employ two stages to locate and identify objects of interest, namely the candidate region stage and the region detection stage.
  • The candidate region stage aims to find hundreds or thousands of candidate frames over the possible positions and scales of the target, so that every target is contained in these candidate frames.
  • The region detection stage further identifies and locates the potential targets in these candidate frames, so as to accurately determine the category of the target.
  • RPN region proposal network
  • FIG. 1 it is a schematic diagram of the architecture of an RPN.
  • the input of RPN200 is a feature map (future map).
  • the feature map can be obtained by extracting the features of the image to be processed through a convolutional neural network (CNN).
  • a convolutional neural network (CNN) 100 may include an input layer 110 and a convolutional/pooling layer 120.
  • the convolutional/pooling layer 120 may include, for example, layers 121-126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer. In another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 121 may include many convolution operators, which are also called kernels, and their role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator can essentially be a weight matrix, which is usually predefined. In the process of convolving an image, the weight matrix is usually moved across the input image along the horizontal direction one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to extract specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • during the convolution operation, the weight matrix extends through the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same dimensions are applied.
  • the output of each weight matrix is stacked to form the depth dimension of the convolutional image.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, yet another weight matrix is used to blur unwanted noise in the image, and so on.
  • the dimensions of the multiple weight matrices are the same, so the dimensions of the feature maps extracted by these weight matrices are also the same, and the multiple extracted feature maps with the same dimensions are then combined to form the output of the convolution operation.
  • the weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can extract information from the input image, thereby helping the convolutional neural network 100 to make correct predictions.
  • compared with the initial convolutional layer (for example, 121), the features extracted by the later convolutional layers become more and more complex, such as high-level semantic features.
  • among the layers 121-126 shown as 120 in FIG. 1, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
  • the pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the average value of the pixel values in the image within a certain range.
  • the max pooling operator can take the pixel with the largest value within a specific range as the result of max pooling. Also, just as the size of the weight matrix used in the convolutional layer should be related to the size of the image, the operators in the pooling layer should also be related to the size of the image.
  • the size of the output image after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
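  • The pooling behaviour described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pool2d`, the non-overlapping window layout, and the sample 4x4 image are assumptions made for the example and do not come from the application.

```python
def pool2d(image, size, mode="max"):
    """Downsample a 2-D list with non-overlapping size x size windows.

    Each output value is the maximum (mode="max") or the average
    (any other mode) of the corresponding sub-region of the input.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(window))
            else:
                out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


# Hypothetical 4x4 single-channel image.
image = [
    [1, 3, 2, 0],
    [5, 7, 4, 8],
    [9, 1, 6, 2],
    [3, 5, 0, 4],
]
max_pooled = pool2d(image, 2, "max")      # [[7, 8], [9, 6]]
avg_pooled = pool2d(image, 2, "average")  # [[4.0, 3.5], [4.5, 3.0]]
```

  • The 4x4 input becomes a 2x2 output, and each output pixel is the maximum or the average of one 2x2 sub-region, which matches the description of the pooling layer.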
  • the convolutional neural network 100 may also include other structures, such as a hidden layer and an output layer. Because these structures are not related to this solution, they are not described here.
  • the feature map is input into the RPN200.
  • the RPN200 uses a sliding window to capture the features of each position of the image on the feature map, and maps the features of each position to k anchor windows of different scales and aspect ratios, where k is generally a positive integer greater than 1, for example, k is 3.
  • each anchor window is determined to include or exclude the object to be detected.
  • This application also refers to the middle layer as a classifier.
  • An anchor window including the object to be detected can be considered as a candidate frame, and the candidate frame is also scored, and a candidate frame with a higher score indicates a higher probability of including the object to be detected.
  • the RPN may further include other processing steps, such as performing reshape processing on the acquired feature map. The embodiments of the present application do not limit the additional processing steps that the RPN may include.
  • a large number of candidate frames may be generated, and these candidate frames may overlap with each other.
  • in the NMS algorithm, the large number of obtained candidate frames, also known as bounding boxes, are processed serially. In each round, the candidate frame with the highest confidence is selected, and the overlap ratio (the ratio of overlapping area) between each remaining candidate frame and the selected candidate frame is examined. Candidate frames with a high overlap ratio are suppressed in this round.
  • the candidate frame selected in this round will be deleted from the candidate frame list and will not appear in the next round. Then start the next round, repeat the above process, select the candidate frame with the highest confidence, and suppress the candidate frame with high overlap rate.
  • For example, assume that there are five candidate frames and that candidate frame 1 has the highest confidence. The overlap ratio between candidate frame 1 and each of the remaining candidate frames 2, 3, 4, and 5 is examined. Assuming that candidate frame 4 and candidate frame 1 have a high overlap ratio (that is, the overlap ratio exceeds a preset threshold), candidate frame 4 is suppressed by candidate frame 1. In this round, candidate frame 1 is selected as the first final candidate frame, and candidate frame 4 is determined to be suppressed.
  • In the next round, assume that candidate frame 3 is suppressed by candidate frame 2.
  • In this round, candidate frame 2 is selected as the second final candidate frame, and at the same time, it is determined that candidate frame 3 is suppressed.
  • Only candidate frame 5 is then left. Candidate frame 5 is not suppressed by any candidate frame, so candidate frame 5 is the third final candidate frame. It can be seen from the above process that the candidate frame selected in each round depends on the selection result of the previous round.
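  • The serial procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the box coordinates, confidence values, and threshold are hypothetical, and the overlap ratio is taken here as intersection over union, which is one common choice.

```python
def area(box):
    """Area of a box given as (x1, y1, x2, y2) with inclusive pixel corners."""
    x1, y1, x2, y2 = box
    return (x2 - x1 + 1) * (y2 - y1 + 1)


def overlap_ratio(a, b):
    """Intersection over union of two boxes (an assumed definition)."""
    w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if w <= 0 or h <= 0:
        return 0.0
    inter = w * h
    return inter / (area(a) + area(b) - inter)


def serial_nms(boxes, scores, threshold):
    """Greedy NMS: each round depends on the result of the previous round."""
    remaining = sorted(boxes, key=lambda k: scores[k], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)          # highest remaining confidence
        kept.append(best)
        # Drop every remaining box that overlaps the selected box too much.
        remaining = [k for k in remaining
                     if overlap_ratio(boxes[best], boxes[k]) <= threshold]
    return kept


# Hypothetical boxes arranged so that 1 suppresses 4 and 2 suppresses 3.
boxes = {
    1: (0, 0, 9, 9),
    2: (20, 20, 29, 29),
    3: (21, 21, 30, 30),
    4: (1, 1, 10, 10),
    5: (50, 50, 59, 59),
}
scores = {1: 0.95, 2: 0.90, 3: 0.80, 4: 0.70, 5: 0.60}
final = serial_nms(boxes, scores, threshold=0.5)  # [1, 2, 5]
```

  • The `while` loop makes the dependency visible: the list of remaining boxes for a round is only known after the previous round has finished, which is what prevents the rounds from running in parallel.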
  • the embodiment of the present application provides an apparatus for processing candidate frames using multiple cores, which splits the calculation process of the NMS algorithm, and executes the calculations without dependencies in parallel, so as to improve the calculation speed of the NMS algorithm. This will be described in detail below.
  • An apparatus for processing candidate frames by using multiple cores may include multiple first single-core processors 2021 and second single-core processors 203 .
  • the second single-core processor 203 is any one of the multiple first single-core processors 2021 , or the second single-core processor 203 is different from any one of the multiple first single-core processors 2021 .
  • FIG. 2a is a schematic structural diagram of an apparatus for processing candidate frames using multiple cores provided by an embodiment of the present application. As shown in FIG. 2a, when the second single-core processor 203 is different from any one of the multiple first single-core processors 2021, the multiple first single-core processors 2021 can be regarded as a whole as a multi-core processor 202.
  • FIG. 2b is a schematic structural diagram of another apparatus for processing candidate frames by using multiple cores provided by an embodiment of the present application.
  • the apparatus includes a plurality of first single-core processors 2021. The structure shown in FIG. 2b can be considered as being based on FIG. 2a, with the second single-core processor 203 being any one of the plurality of first single-core processors 2021.
  • Each first single-core processor 2021 is configured to obtain partial candidate frames from all candidate frames of the image to be detected, and to obtain the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames. How to obtain the candidate frames of the image to be detected can be understood with reference to the process described for FIG. 1, and details are not repeated here. In a possible implementation manner, each first single-core processor 2021 may acquire partial candidate frames from all candidate frames of the image to be detected according to its own identification information. In a possible implementation manner, the partial candidate frames obtained by the individual first single-core processors 2021 together constitute all the candidate frames. It should be noted that, in this application, an apparatus for processing candidate frames by using multiple cores may also be referred to as a data processing apparatus, and the two have the same meaning.
  • each candidate frame in all the candidate frames corresponds to a sequence number, and the sequence numbers of any two candidate frames are different.
  • the sequence numbers corresponding to each candidate frame are consecutive.
  • different sequence numbers may be randomly assigned to each candidate frame.
  • sequence numbers may be assigned to all candidate frames according to the order of the confidence levels of all candidate frames from high to low.
  • sequence numbers may be assigned to all candidate frames according to the order of confidence of all candidate frames from low to high.
  • For example, all candidate frames include candidate frame A, candidate frame B, candidate frame C, candidate frame D, and candidate frame E. Sorting the candidate frames by confidence from high to low gives the following order: candidate frame E has the highest confidence, candidate frame C is second, candidate frame A is third, candidate frame D is lower than candidate frame A, and candidate frame B has the lowest confidence.
  • Accordingly, candidate frame E is assigned sequence number 1, candidate frame C is assigned sequence number 2, candidate frame A is assigned sequence number 3, candidate frame D is assigned sequence number 4, and candidate frame B is assigned sequence number 5.
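  • The assignment of sequence numbers in descending order of confidence can be sketched as follows. The confidence values are hypothetical and were chosen only so that the resulting order matches the example (E, C, A, D, B); the function name is likewise an assumption.

```python
def assign_sequence_numbers(confidences):
    """Map each candidate frame to a consecutive sequence number starting at 1.

    The frame with the highest confidence receives sequence number 1.
    """
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    return {name: rank for rank, name in enumerate(ranked, start=1)}


# Hypothetical confidence values for candidate frames A to E.
confidences = {"A": 0.70, "B": 0.20, "C": 0.85, "D": 0.55, "E": 0.95}
sequence_numbers = assign_sequence_numbers(confidences)
# {'E': 1, 'C': 2, 'A': 3, 'D': 4, 'B': 5}
```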
  • all the candidate frames may be sent to each of the first single-core processors 2021 in the multi-core processor 202 .
  • all candidate frames sorted according to the confidence may be sent to each first single-core processor 2021 in the multi-core processor 202 .
  • all candidate boxes that are not sorted according to the confidence may also be sent to each first single-core processor 2021 in the multi-core processor 202 .
  • After each first single-core processor 2021 obtains all the candidate frames, it selects partial candidate frames from all the candidate frames as the candidate frames to be processed according to its own identification information, and the partial candidate frames obtained by any two first single-core processors 2021 are not the same.
  • the identification information is unique, the identification information of any two first single-core processors 2021 is different, and the partial candidate frames obtained by any two first single-core processors 2021 are different.
  • the number of partial candidate frames acquired by each first single-core processor 2021 may be the same. For example, assuming that there are N candidate frames in total and five first single-core processors 2021 in total, the number of partial candidate frames obtained by each first single-core processor 2021 is N/5.
  • For example, when N is 100, the first of the first single-core processors 2021 can obtain the candidate frames corresponding to sequence numbers 1-20 from all candidate frames according to its preset identification information, the second obtains the candidate frames corresponding to sequence numbers 21-40, the third obtains the candidate frames corresponding to sequence numbers 41-60, the fourth obtains the candidate frames corresponding to sequence numbers 61-80, and the fifth obtains the candidate frames corresponding to sequence numbers 81-100, each according to its own preset identification information.
  • In other words, each of the first single-core processors 2021 obtains partial candidate frames from all the candidate frames according to its own identification information, and the numbers of partial candidate frames obtained are the same.
  • When the candidate frames are ordered by confidence, the first of the first single-core processors 2021 obtains the candidate frames ranked 1st to 20th, the second obtains the candidate frames ranked 21st to 40th, the third obtains the candidate frames ranked 41st to 60th, the fourth obtains the candidate frames ranked 61st to 80th, and the fifth obtains the candidate frames ranked 81st to 100th.
  • FIG. 3 is a schematic diagram of each single-core processor in the multi-core processor 202 obtaining partial candidate frames.
  • the sorted candidate boxes are used as input to the apparatus for processing candidate boxes using multiple cores. It is assumed that there are 2560 sequence numbers in total, and each sequence number corresponds to a candidate frame. The smaller the value of the sequence number, the higher the confidence of the candidate frame.
  • the sorted candidate frames are copied to the multi-core processor 202 , and specifically to each of the first single-core processors 2021 in the multi-core processor 202 . As shown in FIG. 3 , each first single-core processor 2021 selects its own partial candidate frame from the 2560 candidate frames according to its own identification information.
  • the input sorted candidate frames are evenly divided into R region tiles (tiling), and each region tile includes the same number of candidate frames.
  • the black boxes in FIG. 3 indicate the region tile obtained by each first single-core processor 2021, which can also be understood as the partial candidate frames obtained by each first single-core processor 2021.
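  • One way a core could derive its own tile from its identification information is sketched below. It is assumed here, purely for illustration, that the identification information is a 0-based core index and that the tiles are contiguous ranges of sequence numbers; the function name and the choice of eight cores for the 2560-frame case are hypothetical.

```python
def tile_for_core(core_id, num_cores, total):
    """Return the inclusive (first, last) sequence numbers owned by a core.

    core_id is a 0-based index standing in for the identification
    information of a first single-core processor (an assumption).
    """
    if total % num_cores != 0:
        raise ValueError("this sketch assumes an even split")
    size = total // num_cores
    first = core_id * size + 1
    return first, first + size - 1


# 100 candidate frames on five cores, as in the example above.
tiles = [tile_for_core(i, 5, 100) for i in range(5)]
# [(1, 20), (21, 40), (41, 60), (61, 80), (81, 100)]

# 2560 candidate frames on a hypothetical eight cores: 320 frames per tile.
last_tile = tile_for_core(7, 8, 2560)  # (2241, 2560)
```

  • Because every core applies the same rule to a different identifier, no two cores select the same candidate frames and the tiles together cover all candidate frames.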
  • the number of partial candidate frames acquired by each first single-core processor 2021 may also be different. For example, there are N candidate frames in total, N is 100, and there are five first single-core processors 2021 in total. The first of the first single-core processors 2021 can obtain the candidate frames corresponding to sequence numbers 1-20 from all the candidate frames according to its preset identification information, the second obtains the candidate frames corresponding to sequence numbers 21-60, the third obtains the candidate frames corresponding to sequence numbers 61-70, the fourth obtains the candidate frames corresponding to sequence numbers 71-85, and the fifth obtains the candidate frames corresponding to sequence numbers 86-100, each according to its own preset identification information.
  • After each first single-core processor 2021 acquires partial candidate frames from all candidate frames according to its own identification information, it acquires the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames. This processing is performed in parallel. After each first single-core processor 2021 obtains the suppression relationship between each candidate frame in its own partial candidate frames and each candidate frame in all the candidate frames, the results output by the individual first single-core processors 2021 are combined, so that the suppression relationship between each candidate frame in all the candidate frames and all the other candidate frames is obtained. The description below continues with the example in which the number of partial candidate frames obtained by each first single-core processor 2021 is the same.
  • For example, the first single-core processor obtains the suppression relationship between the candidate frame with sequence number 1 and each candidate frame in all N candidate frames, the second single-core processor obtains the suppression relationship between the candidate frame with sequence number 21 and each candidate frame in all N candidate frames, the third single-core processor obtains the suppression relationship between the candidate frame with sequence number 41 and each candidate frame in all N candidate frames, the fourth single-core processor obtains the suppression relationship between the candidate frame with sequence number 61 and each candidate frame in all N candidate frames, and the fifth single-core processor obtains the suppression relationship between the candidate frame with sequence number 81 and each candidate frame in all N candidate frames.
  • Overall, the first single-core processor obtains the suppression relationship between each of the candidate frames corresponding to sequence numbers 1 to 20 and each candidate frame in all candidate frames, the second single-core processor does so for sequence numbers 21 to 40, the third single-core processor for sequence numbers 41 to 60, the fourth single-core processor for sequence numbers 61 to 80, and the fifth single-core processor for sequence numbers 81 to 100.
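  • The independence of the per-tile work can be sketched with a thread pool standing in for the first single-core processors. This is only an illustration under several assumptions: the helper names are hypothetical, the overlap ratio is taken as intersection over union, and Python threads are used for clarity (in CPython they do not give true parallel speed-up for this kind of computation).

```python
from concurrent.futures import ThreadPoolExecutor


def overlap_ratio(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes (assumed metric)."""
    w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if w <= 0 or h <= 0:
        return 0.0
    inter = w * h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def rows_for_tile(boxes, first, last, threshold):
    """Work of one core: one row per owned frame, compared with all frames.

    boxes[i] is the frame with sequence number i + 1. A True entry means
    the overlap ratio exceeds the threshold.
    """
    return {seq: [overlap_ratio(boxes[seq - 1], other) > threshold
                  for other in boxes]
            for seq in range(first, last + 1)}


def suppression_table(boxes, num_workers, threshold):
    """Split the frames into equal tiles and process the tiles concurrently."""
    size = len(boxes) // num_workers
    tiles = [(i * size + 1, (i + 1) * size) for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = list(pool.map(
            lambda t: rows_for_tile(boxes, t[0], t[1], threshold), tiles))
    table = {}
    for part in parts:                 # combine the per-core results
        table.update(part)
    return table


# Four hypothetical frames, already ordered by confidence.
boxes = [(0, 0, 9, 9), (1, 1, 10, 10), (20, 20, 29, 29), (21, 21, 30, 30)]
table = suppression_table(boxes, num_workers=2, threshold=0.5)
```

  • No row needs the result of any other row, which is why the rows can be computed by different cores at the same time. Each row in this sketch also contains the comparison of a frame with itself.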
  • each first single-core processor 2021 acquires the area of each candidate frame in all candidate frames. Since the areas of the candidate frames are calculated independently of one another, single instruction, multiple data (SIMD) instructions can be used, so that each first single-core processor 2021 in the multi-core processor 202 can calculate the areas of multiple candidate frames simultaneously. In another possible implementation, each first single-core processor 2021 calculates the area of each candidate frame in the partial candidate frames it has obtained, and the area of each candidate frame in all candidate frames is then obtained from the calculation results of all the first single-core processors 2021. Each first single-core processor 2021 may acquire the areas of the candidate frames calculated by the other first single-core processors 2021.
  • For example, the first of the first single-core processors 2021 may obtain the area of each of the candidate frames whose sequence numbers are from 21 to 100.
  • Alternatively, one single-core processor may acquire the area of each candidate frame in all candidate frames, and send the acquired areas to the other first single-core processors 2021.
  • Each candidate frame corresponds to a coordinate, which is used to represent the coordinate information of the candidate frame on the feature map, and the area of the candidate frame can be obtained according to the coordinates of a candidate frame.
  • a candidate frame can usually be represented by the coordinates of its upper left corner and its lower right corner. Assuming that the coordinates of the upper left corner are (x1, y1) and the coordinates of the lower right corner are (x2, y2), the area of the candidate frame can be expressed as (x2 - x1 + 1) * (y2 - y1 + 1).
  • FIG. 4 is a schematic flowchart of simultaneously obtaining the areas of multiple candidate frames through SIMD instructions. In FIG. 4, the abscissa of the upper left corner of each candidate frame is x1, the ordinate of the upper left corner of each candidate frame is y1, the abscissa of the lower right corner of each candidate frame is x2, and the ordinate of the lower right corner of each candidate frame is y2.
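  • The area formula can be sketched with the coordinates stored as four separate lists, one per coordinate, which mirrors the layout that SIMD lanes would typically consume. Plain Python does not issue SIMD instructions, so this sketch only illustrates that the same expression is applied independently to every candidate frame; the function name and sample coordinates are hypothetical.

```python
def box_areas(x1, y1, x2, y2):
    """Apply (x2 - x1 + 1) * (y2 - y1 + 1) to every candidate frame.

    Each argument is a list holding one coordinate for all frames, so
    element i of every list belongs to the same frame.
    """
    return [(right - left + 1) * (bottom - top + 1)
            for left, top, right, bottom in zip(x1, y1, x2, y2)]


# Three hypothetical candidate frames.
x1 = [0, 1, 20]
y1 = [0, 1, 20]
x2 = [9, 10, 24]
y2 = [9, 10, 29]
areas = box_areas(x1, y1, x2, y2)  # [100, 100, 50]
```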
  • After the area of each candidate frame in all candidate frames is obtained, the overlap ratio between two candidate frames also needs to be calculated in order to determine the suppression relationship between them. Specifically, the overlapping area between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames is obtained, and the superimposed area (the sum of the areas of the two candidate frames) of each candidate frame in the partial candidate frames and each candidate frame in all candidate frames is obtained. The suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames is then obtained according to the relationship between a preset threshold and the ratio of the overlapping area to the superimposed area. A detailed description is given below.
  • each first single-core processor 2021 calculates, for each candidate frame it has obtained, the overlapping area between that candidate frame and each of all the candidate frames. Assume that the overlapping area between the first candidate frame and each of all the other candidate frames is currently being calculated, that the coordinates of the upper left corner of the first candidate frame are (sel-x1, sel-y1) and the coordinates of its lower right corner are (sel-x2, sel-y2), and that the coordinates of the upper left corner of one of all the candidate frames are (x1, y1) and the coordinates of its lower right corner are (x2, y2).
  • the overlapping area between the two candidate frames can then be expressed as [min(sel-x2, x2) - max(sel-x1, x1) + 1] * [min(sel-y2, y2) - max(sel-y1, y1) + 1].
  • FIG. 5 is a schematic flowchart of simultaneously obtaining the overlapping area between the first candidate frame and each candidate frame in all candidate frames through SIMD instructions.
  • In FIG. 5, the coordinates of the upper left corner of the currently selected candidate frame (the first candidate frame) are (sel-x1, sel-y1), and the coordinates of the lower right corner are (sel-x2, sel-y2).
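  • The overlapping-area formula can be sketched as follows, comparing one selected candidate frame with every candidate frame in a single pass. The names `sel_x1` and so on stand for the coordinates written sel-x1 and so on in the text. One assumption is added and labelled in the code: the width and height are clamped at zero, because for frames that do not overlap the formula as written yields non-positive factors.

```python
def overlap_areas(sel, x1, y1, x2, y2):
    """Overlapping area between the selected frame and every frame.

    sel is (sel_x1, sel_y1, sel_x2, sel_y2); the other arguments are
    per-coordinate lists covering all candidate frames.
    """
    sel_x1, sel_y1, sel_x2, sel_y2 = sel
    result = []
    for left, top, right, bottom in zip(x1, y1, x2, y2):
        width = min(sel_x2, right) - max(sel_x1, left) + 1
        height = min(sel_y2, bottom) - max(sel_y1, top) + 1
        # Assumption: clamp at zero so disjoint frames get an area of 0.
        result.append(max(width, 0) * max(height, 0))
    return result


# Hypothetical frames; the first one is also the selected frame.
x1 = [0, 1, 20]
y1 = [0, 1, 20]
x2 = [9, 10, 24]
y2 = [9, 10, 29]
overlaps = overlap_areas((0, 0, 9, 9), x1, y1, x2, y2)  # [100, 81, 0]
```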
  • each first single-core processor 2021 also calculates, for each candidate frame it has obtained, the superimposed area of that candidate frame and each of all the candidate frames. As described above, the area of each candidate frame is obtained from the coordinates of the candidate frame, and the superimposed area of two candidate frames is obtained by adding the areas of the two candidate frames.
  • Assume that the coordinates of the upper left corner of the first candidate frame are (sel-x1, sel-y1) and the coordinates of its lower right corner are (sel-x2, sel-y2), and that the coordinates of the upper left corner of one of all the candidate frames are (x1, y1) and the coordinates of its lower right corner are (x2, y2).
  • the superimposed area of the two candidate frames can then be expressed as [(x2 - x1 + 1) * (y2 - y1 + 1)] + [(sel-x2 - sel-x1 + 1) * (sel-y2 - sel-y1 + 1)].
  • the superimposed area of the first candidate frame and each candidate frame in all candidate frames can be obtained simultaneously through SIMD instructions, for example, according to the formula [(x2 - x1 + 1) * (y2 - y1 + 1)] + [(sel-x2 - sel-x1 + 1) * (sel-y2 - sel-y1 + 1)].
  • The suppression relationship between each candidate frame and the other candidate frames is obtained according to the relationship between a preset threshold and the ratio of the overlapping area between the candidate frame and another candidate frame to the superimposed area of the two candidate frames. If the ratio is less than the threshold, the overlap ratio of the two candidate frames is small, and the suppression relationship between the two candidate frames is "not suppressed". If the ratio is greater than the threshold, the overlap ratio of the two candidate frames is large, and the suppression relationship between the two candidate frames is "suppressed".
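  • The threshold test can be sketched as follows, with the ratio computed as the overlapping area divided by the sum of the two areas, as described above. The function names and sample boxes are hypothetical. The helper `equivalent_threshold` relies on a general algebraic identity that is not stated in this application: for a threshold t, the test "intersection over union > t" is the same test as "overlapping area / (sum of areas) > t / (1 + t)".

```python
def area(box):
    x1, y1, x2, y2 = box
    return (x2 - x1 + 1) * (y2 - y1 + 1)


def overlap_area(a, b):
    """Overlapping area, clamped at zero for disjoint boxes (an assumption)."""
    w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return max(w, 0) * max(h, 0)


def suppression_ratio(sel, other):
    """Overlapping area divided by the sum of the two areas."""
    return overlap_area(sel, other) / (area(sel) + area(other))


def is_suppressed(sel, other, threshold):
    """True when the ratio exceeds the preset threshold."""
    return suppression_ratio(sel, other) > threshold


def equivalent_threshold(iou_threshold):
    """Threshold on the sum-based ratio matching an IoU threshold."""
    return iou_threshold / (1 + iou_threshold)


sel = (0, 0, 9, 9)
near = (1, 1, 10, 10)     # overlapping area 81, sum of areas 200
far = (5, 5, 14, 14)      # overlapping area 25, sum of areas 200
threshold = equivalent_threshold(0.5)        # 1/3
near_result = is_suppressed(sel, near, threshold)  # True
far_result = is_suppressed(sel, far, threshold)    # False
```

  • Because the denominator is the sum of the areas rather than their union, the ratio of two identical frames is 0.5, so a preset threshold for this ratio is meaningful only below 0.5.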
  • each candidate frame can be regarded as the selected candidate frame in turn, and the suppression relationship between the selected candidate frame and each candidate frame in all the candidate frames is obtained.
  • multiple candidate frames can also be selected as the selected candidate frames at the same time, and the suppression relationship between each of the multiple selected candidate frames and each candidate frame in all the candidate frames is obtained at one time.
  • the suppression relationship between the selected candidate frame and each candidate frame in all candidate frames can be represented by a bit sequence.
  • each bit sequence is used to represent the suppression relationship between a first candidate frame and a second candidate frame, where the first candidate frame is one of the partial candidate frames, and the second candidate frame is each candidate frame in all the candidate frames.
  • Each candidate frame corresponds to a bit sequence, and each bit sequence includes M positions, where M is the number of all candidate frames.
  • For example, assume that the selected candidate frame is the first candidate frame and that the bit sequence corresponding to the first candidate frame is the first bit sequence. The first bit sequence includes M positions, and each of the M positions is used to represent the suppression relationship between one candidate frame in all the candidate frames and the first candidate frame.
  • the suppression relationship between the first candidate frame and each candidate frame may be determined according to the sequence numbers of the candidate frames and sequentially according to the sequence numbers.
  • each of the M positions may indicate the suppression relationship through a plurality of bits.
  • each position in the M positions can also indicate the suppression relationship by 1 bit. For example, when the bit is 0, it indicates that the second candidate frame is suppressed by the first candidate frame, and when the bit is 1, it indicates that the second candidate frame is not suppressed by the first candidate frame.
  • Take the first bit sequence shown in FIG. 6, which corresponds to the candidate frame with sequence number 1, as an example. In the first bit sequence, the 1st position, the 3rd position, the 6th position, the 9th position, the 12th position, the 13th position, and the 15th position are 1, and the remaining positions are 0. It can therefore be considered that the candidate frames with sequence numbers 3, 6, 9, 12, 13, and 15 are not suppressed by the candidate frame with sequence number 1, and that the candidate frames with the remaining sequence numbers are all suppressed by the candidate frame with sequence number 1, that is, the candidate frames with sequence numbers 2, 4, 5, 7, 8, 10, 11, 14, and 16.
  • Similarly, in the second bit sequence shown in FIG. 6, which corresponds to the candidate frame with sequence number 2, the 1st position, the 2nd position, the 4th to 7th positions, the 9th position, and the 14th position are 1, and the remaining positions are 0. It can be considered that the candidate frames with sequence number 1, sequence numbers 4 to 7, sequence number 9, and sequence number 14 are not suppressed by the candidate frame with sequence number 2, and that the candidate frames with the remaining sequence numbers are suppressed by the candidate frame with sequence number 2, that is, the candidate frames with sequence number 3, sequence number 8, sequence numbers 10 to 13, sequence number 15, and sequence number 16.
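  • The reading of the two bit sequences from FIG. 6 can be sketched as follows. A bit sequence is represented here as a string whose first character is position 1, with 1 meaning "not suppressed" and 0 meaning "suppressed"; this string representation, the function names, and the sequence length of 16 (inferred from the highest sequence number mentioned in the example) are assumptions made for illustration.

```python
def encode(not_suppressed_positions, m):
    """Build an m-position bit sequence; position 1 is the first character."""
    return "".join("1" if position in not_suppressed_positions else "0"
                   for position in range(1, m + 1))


def suppressed_by(bits):
    """Sequence numbers of the candidate frames whose bit is 0."""
    return [position for position, bit in enumerate(bits, start=1)
            if bit == "0"]


# First bit sequence (candidate frame with sequence number 1).
first = encode({1, 3, 6, 9, 12, 13, 15}, 16)
first_suppressed = suppressed_by(first)
# [2, 4, 5, 7, 8, 10, 11, 14, 16]

# Second bit sequence (candidate frame with sequence number 2).
second = encode({1, 2, 4, 5, 6, 7, 9, 14}, 16)
second_suppressed = suppressed_by(second)
# [3, 8, 10, 11, 12, 13, 15, 16]
```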
  • M bit sequences may be initially set, and a first number of initial bit sequences may be configured for each first single-core processor 2021, where the first number is the number of partial candidate frames obtained by that first single-core processor 2021.
  • When the suppression relationship is obtained, each candidate frame does not need to obtain the suppression relationship with itself, and does not need to consider the suppression relationship with candidate frames whose confidence ranks before its own. Therefore, when the bit sequence is initialized, the first P bits of the M bits are all used to indicate that the first candidate frame is suppressed by the second candidate frame, and the remaining M-P bits of the M bits are used to indicate that the first candidate frame is not suppressed by the second candidate frame, where P is the sequence number of the first candidate frame in the sorting of the partial candidate frames.
  • For example, referring to Table 1, it is assumed that there are 15 candidate frames in total, each candidate frame corresponds to a sequence number, the 15 sequence numbers are consecutive, and the smaller the sequence number, the higher the confidence. Then, when the bit sequence with sequence number 7 is initialized and each position is indicated by 1 bit, the first 7 bits are 1 and the last 8 bits are 0.
  • since each candidate frame does not need to obtain the suppression relationship with itself, and does not need to consider the suppression relationship with candidate frames whose confidence levels are ranked before itself, it is also possible, after the bit sequence of each candidate frame is acquired, to process each acquired bit sequence with a check bit sequence.
  • the check bit sequence makes the first P bits of the M bits all used to indicate that the first candidate frame is suppressed by the second candidate frame, and the last M-P bits of the M bits used to indicate that the first candidate frame is not suppressed by the second candidate frame, where P is the sequence number of the first candidate frame in the sorting of some candidate frames.
  • Table 1 can also be understood as the check bit sequence of the bit sequence numbered 7: a bitwise OR is performed on this check bit sequence and the bit sequence numbered 7 to obtain the checked bit sequence numbered 7.
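The check (parity) bit step can be illustrated as follows. This is a minimal sketch that only reproduces the Table 1 pattern (M = 15, P = 7, first 7 bits set to 1) and the bitwise OR; the function names and the sample raw bit sequence are hypothetical.

```python
def check_bit_sequence(m, p):
    """Check sequence for sequence number p: first p bits are 1, the rest 0."""
    return [1] * p + [0] * (m - p)


def apply_check(raw_bits, p):
    """Bitwise OR of a raw bit sequence with its check bit sequence."""
    mask = check_bit_sequence(len(raw_bits), p)
    return [b | c for b, c in zip(raw_bits, mask)]


# Table 1 pattern: 15 candidate frames, bit sequence numbered 7.
mask_7 = check_bit_sequence(15, 7)

# Hypothetical raw bit sequence for frame 7 (values are illustrative only).
raw_7 = [0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
checked_7 = apply_check(raw_7, 7)
```

After the OR, the first 7 positions no longer depend on what was computed for them, which matches the statement that relationships with the frame itself and with higher-ranked frames need not be considered.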
  • N bit sequences can be obtained, where N is the number of partial candidate frames obtained by the first single-core processor 2021.
  • a total of M bit sequences can be obtained.
  • Each first single-core processor 2021 sends the acquired suppression relationship between each candidate frame in some candidate frames and each candidate frame in all candidate frames to the second single-core processor 203 .
  • each first single-core processor 2021 sends the acquired N bit sequences to the second single-core processor 203 .
  • each first single-core processor 2021 sends the obtained N bit sequences after checking to the second single-core processor 203 .
  • each first single-core processor 2021 may send its N bit sequences to the storage module, and the storage module sorts the bit sequences according to the sequence numbers of the candidate frames and stores the sorted bit sequences. The sorted bit sequences are used as the input of the second single-core processor 203.
  • the second single-core processor may acquire the N bit sequences generated by itself, and receive the N bit sequences sent by each other first single-core processor, which will not be repeated below.
  • the second single-core processor 203 is configured to obtain the final candidate frame according to the suppression relationship obtained by each first single-core processor 2021 and the corresponding confidence level of each candidate frame.
  • the second single-core processor 203 may first obtain the bit sequence corresponding to the candidate frame with the highest confidence and, according to this bit sequence, obtain the suppression relationship between each candidate frame and the candidate frame with the highest confidence. The candidate frames that are not suppressed by the candidate frame with the highest confidence are obtained and enter the next round of screening.
  • in the next round of screening, the bit sequence corresponding to the candidate frame with the highest confidence in this round is selected, the candidate frames that are not suppressed by that candidate frame are obtained according to this bit sequence, and these candidate frames enter the following round of screening. The above process is repeated until the preset stop condition is met.
  • the stop condition can be understood as outputting a preset number of final candidate frames, or traversing all candidate frames, or another set stop condition. To better understand this process, an example is given here. Continuing to refer to FIG. 6, the candidate frame with sequence number 1 is the candidate frame with the highest confidence among all candidate frames.
  • the candidate frames corresponding to sequence number 3, sequence number 6, sequence number 9, sequence number 12, sequence number 13, and sequence number 15 are not suppressed by the candidate frame whose sequence number is 1. The candidate frames corresponding to sequence number 3, sequence number 6, sequence number 9, sequence number 12, sequence number 13, and sequence number 15 are then subjected to the next round of screening.
  • among the candidate frames with sequence number 3, sequence number 6, sequence number 9, sequence number 12, sequence number 13, and sequence number 15, the candidate frame with sequence number 3 has the highest confidence, so the suppression relationship between the candidate frame with sequence number 3 and the candidate frames with sequence number 6, sequence number 9, sequence number 12, sequence number 13, and sequence number 15 is obtained according to the bit sequence corresponding to sequence number 3 (for example, the third bit sequence). After these two rounds of screening, two final candidate frames are obtained, namely the candidate frames with sequence number 1 and sequence number 3.
  • a sequence number sequence is generated according to the sequence of candidate boxes.
  • the sequence number sequence is screened multiple times to obtain multiple sequence numbers, and the multiple sequence numbers are used to obtain the final candidate frame. Any one of the multiple screenings includes: obtaining the bit sequence of a specific candidate frame according to the sequence number obtained by the previous screening, where the bit sequence of the specific candidate frame represents the suppression relationship between the specific candidate frame and the second candidate frame; and obtaining an updated global sequence according to the bit sequence of the specific candidate frame and the global sequence obtained from the previous screening.
  • the updated global sequence indicates at least one sequence number, and the candidate frame corresponding to the at least one sequence number is not suppressed by the specific candidate frame or by the other candidate frames ranked before the specific candidate frame.
  • the following example uses 1 bit to represent the suppression relationship.
  • all candidate frames are sorted in descending order of confidence, so that each candidate frame corresponds to a sequence number, and each sequence number in the sequence number sequence is consistent with the sequence number of the corresponding candidate frame. For example, assuming that there are 15 candidate frames in total, after all candidate frames are sorted in descending order of confidence, the sequence number of the first candidate frame is 1, and sequence number 1 in the sequence number sequence is used to obtain the bit sequence of the first candidate frame. Likewise, the sequence number of the seventh candidate frame is 7, and sequence number 7 in the sequence number sequence is used to obtain the bit sequence of the seventh candidate frame.
  • the second single-core processor 203 initializes the global sequence, the initialized global sequence includes M bits, and each bit is set to 1, which is used to indicate that in the initial state, each candidate frame in all candidate frames is not suppressed , that is, each candidate frame in all candidate frames is reserved.
  • of the M bits, the first bit to the Mth bit respectively indicate the suppression relationship between the currently selected candidate frame and each candidate frame, taken in descending order of confidence.
  • the candidate frames corresponding to sequence numbers 2 to 6, sequence number 10, sequence number 12, and sequence number 16 are not suppressed by the candidate frame corresponding to sequence number 1, and the candidate frames corresponding to the other sequence numbers are suppressed by the candidate frame corresponding to sequence number 1.
  • sequence numbers 2 to 6, sequence number 10, sequence number 12, and sequence number 16 are obtained. Since the sequence numbers are assigned to the candidate frames in descending order of confidence, the smaller the sequence number, the higher the confidence.
  • the judgment starts from the bit sequence of the candidate frame with sequence number 2: an AND operation is performed on the bit sequence of the candidate frame with sequence number 2 and the global sequence obtained after the first update, and the result of the AND operation is used as the updated global sequence.
  • the global sequence shown in FIG. 7b is specifically the global sequence after the second update.
  • Figure 7b continues to illustrate on the basis of Figure 7a.
  • the candidate frames corresponding to sequence numbers 4 to 6 are not suppressed by the candidate frame corresponding to sequence number 2, and the candidate frames corresponding to the other sequence numbers are suppressed by the candidate frame corresponding to sequence number 2.
  • since the suppression relationship between the candidate frame corresponding to sequence number 1 and the candidate frame corresponding to sequence number 2 has already been judged, the judgment is not repeated here.
  • the candidate frame corresponding to sequence number 1 has already been selected as the first final candidate frame.
  • the sequence numbers 4 to 6 are obtained.
  • in each cycle, the sequence numbers reserved after the current AND operation are obtained, and from them the sequence number used to determine a final candidate frame is selected; the sequence numbers to be reserved next time are then screened according to the reserved sequence numbers. This cyclic screening process is repeated until the stop condition is satisfied.
  • the final candidate frames are obtained according to the sequence numbers corresponding to the positions whose value is 1 in the global sequence obtained by the final update. Refer to FIG. 7c for understanding: assuming that the finally obtained global sequence is as shown in FIG. 7c, it is determined that the sequence numbers corresponding to the positions whose value is 1 in the global sequence are sequence numbers 1 to 6, sequence number 10, sequence number 12, and sequence number 15, and the candidate frames corresponding to sequence numbers 1 to 6, sequence number 10, sequence number 12, and sequence number 15 are determined to be the final candidate frames.
  • if the sequence numbers are assigned to the candidate frames in descending order of confidence, the candidate frames ranked 1st to 6th, 10th, 12th, and 15th in confidence are the final candidate frames.
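The cyclic screening with a global sequence can be sketched as below. It is a simplified illustration, not the claimed implementation: it assumes each bit sequence has already been checked so that its first P bits are 1 (with 1 meaning "not suppressed"), it traverses all candidate frames as the stop condition, and the function name `screen` and the five-frame sample data are hypothetical.

```python
def screen(bit_sequences):
    """Greedy screening driven by a global sequence.

    bit_sequences[i] is the checked bit sequence (list of 0/1) of the
    candidate frame with sequence number i + 1, sorted by confidence.
    Returns (selected sequence numbers, final global sequence).
    """
    m = len(bit_sequences)
    global_seq = [1] * m          # initial state: nothing is suppressed
    selected = []
    for pos in range(m):          # stop condition: traverse all frames
        if global_seq[pos] == 1:  # still reserved, so it becomes final
            selected.append(pos + 1)
            global_seq = [g & b for g, b in zip(global_seq, bit_sequences[pos])]
    return selected, global_seq


# Illustrative data: frame 1 suppresses frames 2 and 4, frame 2 suppresses
# frame 3, frame 3 suppresses frame 5.
sequences = [
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
selected, final_global = screen(sequences)
```

Frame 2 is already cleared by frame 1, so its bit sequence is never ANDed in and frame 3 survives; the positions that are still 1 in the final global sequence are exactly the selected sequence numbers.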
  • the output of the second single-core processor 203 may be the coordinates of the final candidate frame, such as outputting the coordinates of the upper left corner and the lower right corner of each final candidate frame.
  • when the second single-core processor 203 performs multiple screenings on the sequence number sequence, each screening actually only needs to acquire the first few sequence numbers, or only the first sequence number.
  • for the global sequence obtained after the first update, only sequence number 2 actually needs to be obtained before the next screening can be performed, without obtaining sequence numbers 3 to 6, sequence number 10, sequence number 12, and sequence number 16.
  • for the global sequence obtained after the second update, only sequence number 4 needs to be obtained before the next screening can be performed, without obtaining sequence number 5 and sequence number 6. Therefore, in a possible implementation, an early-stop indicator can be designed in the apparatus for processing candidate frames using multiple cores.
  • the indicator is used to indicate that only a preset number of sequence numbers are acquired each time, such as one or three, so as to reduce redundant computation.
  • the apparatus for processing candidate frames using multiple cores acquires, for the updated global sequence, the sequence numbers of the first three positions whose value is 1, and then stops acquiring the sequence numbers of the other positions whose value is 1.
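The effect of the early-stop indicator can be sketched as a bounded scan of the global sequence. The function name `first_reserved` and its parameters are hypothetical; the sample global sequence follows the first-update example above (positions 1 to 6, 10, 12, and 16 are 1).

```python
def first_reserved(global_seq, after, limit):
    """Return at most `limit` sequence numbers greater than `after`
    whose position in the global sequence is 1, then stop scanning."""
    found = []
    for number in range(after + 1, len(global_seq) + 1):
        if global_seq[number - 1] == 1:
            found.append(number)
            if len(found) == limit:
                break  # early stop: remaining positions are not inspected
    return found


ones = {1, 2, 3, 4, 5, 6, 10, 12, 16}
global_after_first_update = [1 if n in ones else 0 for n in range(1, 17)]

next_one = first_reserved(global_after_first_update, after=1, limit=1)
next_three = first_reserved(global_after_first_update, after=1, limit=3)
```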
  • the amount of candidate frame data input to the apparatus for processing candidate frames using multiple cores may be too large, or the number of first single-core processors 2021 included in the multi-core processor 202 may be too small, which may cause the amount of data that needs to be processed by a first single-core processor 2021, or the amount of data acquired by the first single-core processor 2021, to exceed the maximum storage space of the first single-core processor 2021.
  • all candidate frames may be input into the apparatus for processing candidate frames by using multiple cores.
  • all candidate frames are divided into multiple groups of candidate frames, such as a first group of candidate frames and a second group of candidate frames, and each first single-core processor 2021 obtains some candidate frames from the first group of candidate frames according to its respective identification information.
  • each first single-core processor 2021 obtains the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames.
  • the second single-core processor 203 obtains the final candidate frame in the first group of candidate frames according to the suppression relationship obtained by each first single-core processor 2021 and the confidence level corresponding to each candidate frame in the first group of candidate frames.
  • each first single-core processor 2021 then obtains some candidate frames from the second group of candidate frames according to its respective identification information, and obtains the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all candidate frames.
  • the second single-core processor 203 obtains the final candidate frame in the second group of candidate frames according to the suppression relationship obtained by each first single-core processor 2021 and the confidence level corresponding to each candidate frame in the second group of candidate frames.
  • the multi-core processor 202 processes the first group of candidate frames, and sends the processed data to the second single-core processor 203. For this part of the process, refer to the solutions described in FIGS. 1 to 8 for understanding; details are not repeated here.
  • the multi-core processor 202 processes the second group of candidate frames, and sends the processed data to the second single-core processor 203. At this point, the second single-core processor 203 has obtained the last updated global sequence for the first group of candidate frames (hereinafter referred to as the first set of global sequences).
  • the first set of global sequences and the bit sequences corresponding to the candidate frames with the highest confidence in the second set of candidate frames are ANDed to obtain the first updated global sequence for the second set of candidate frames, and the subsequent cyclic calculation process
  • the global sequence in the initialization state of the second group of candidate boxes is the first group of global sequences. All final candidate boxes can be obtained according to the last updated global sequence of the second group of candidate boxes.
  • the final candidate frame may not be obtained according to the second group of candidate frames.
  • the above-mentioned two groups of candidate frames are only exemplary, and in fact, all candidate frames may be divided into more groups of candidate frames.
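The hand-over between groups can be sketched as follows: the global sequence left after the first group becomes the initial global sequence of the second group. This is a simplified illustration under the same assumptions as before (checked bit sequences over all M candidate frames, 1 meaning "not suppressed"); `screen_group` and the six-frame sample data are hypothetical.

```python
def screen_group(global_seq, group_bits):
    """Update the global sequence with one group of candidate frames.

    group_bits maps a sequence number to its checked bit sequence over
    all M candidate frames. Frames are visited in descending confidence.
    """
    for number in sorted(group_bits):
        if global_seq[number - 1] == 1:  # frame is still reserved
            global_seq = [g & b for g, b in zip(global_seq, group_bits[number])]
    return global_seq


first_group = {
    1: [1, 0, 1, 1, 0, 1],  # frame 1 suppresses frames 2 and 5
    2: [1, 1, 0, 1, 1, 1],  # frame 2 suppresses frame 3
    3: [1, 1, 1, 1, 1, 1],
}
second_group = {
    4: [1, 1, 1, 1, 1, 0],  # frame 4 suppresses frame 6
    5: [1, 1, 1, 1, 1, 1],
    6: [1, 1, 1, 1, 1, 1],
}

after_first = screen_group([1] * 6, first_group)
after_second = screen_group(after_first, second_group)  # starts from group 1 result
final_numbers = [i + 1 for i, g in enumerate(after_second) if g == 1]
```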
  • all candidate frames may also be input into the apparatus for processing candidate frames using multiple cores at one time, with the total number of partial candidate frames obtained by the first single-core processors 2021 being less than the total number of all candidate frames. For example, the total number of partial candidate frames acquired by the first single-core processors 2021 is half of the total number of all candidate frames. How each first single-core processor 2021 processes the partial candidate frames it obtains, and how the second single-core processor 203 outputs the final candidate frame according to the data sent by each first single-core processor 2021, have been described in detail above and are not repeated here.
  • the apparatus for processing candidate frames using multi-core does not process the remaining half of the candidate frames.
  • FIG. 10 is a schematic flowchart of a method for processing candidate frames using multiple cores provided by an embodiment of the present application. Specifically, the method includes the following steps:
  • the obtained candidate frames of the image to be detected are candidate frames that have been sorted in descending order of confidence, and, in descending order of confidence, each candidate frame corresponds to a sequence number; the smaller the sequence number, the higher the confidence of the candidate frame.
  • a total of 2560 candidate frames are obtained, each candidate frame corresponds to a sequence number, and a total of 2560 sequence numbers are included.
  • each of the plurality of first single-core processors can obtain all candidate frames, but each uses only some of the candidate frames it obtains as its candidate frames to be processed, and the candidate frames to be processed are processed according to step 1003.
  • X first single-core processors are included in total, and each single-core processor obtains 16 candidate frames from all candidate frames as candidate frames to be processed.
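How a core might pick its 16 candidate frames can be sketched as below. The source only states that each first single-core processor selects its partial candidate frames according to its identification information; the contiguous block assignment, the zero-based `core_id`, and both function names are assumptions made for illustration.

```python
def partial_sequence_numbers(core_id, per_core=16):
    """Sequence numbers handled by one core, assuming contiguous blocks
    and a zero-based core identifier (both are assumptions)."""
    first = core_id * per_core + 1
    return list(range(first, first + per_core))


def cores_needed(total_frames, per_core=16):
    """Number of cores required to cover all frames in a single pass."""
    return -(-total_frames // per_core)  # ceiling division


block_of_core_0 = partial_sequence_numbers(0)
block_of_core_1 = partial_sequence_numbers(1)
single_pass_cores = cores_needed(2560)
```

Under this assumed assignment, covering all 2560 candidate frames in one pass would take 160 cores; the text leaves the actual value of X open.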
  • the value of each bit in the bit sequence is only for illustration, and does not represent the real suppression relationship.
  • each first single-core processor is completely consistent.
  • FIG. 12 is a schematic flowchart of obtaining the suppression relationship between candidate frames by the first single-core processor in an embodiment of the present application. Taking one of the single-core processors as an example, the first single-core processor can perform the following steps:
  • steps 1201 to 1205 can be understood with reference to the relevant steps performed by the first single-core processor 2021 in the embodiments corresponding to FIG. 2a and FIG. 2b, and details are not repeated here. Referring to FIG. 11, when obtaining the suppression relationship, each candidate frame does not need to obtain the suppression relationship with itself, and does not need to consider the suppression relationship with candidate frames whose confidence is ranked before itself. Therefore, among the bit sequences of the candidate frames output in step 1205, in the bit sequence corresponding to the candidate frame with serial number P, the first M-P bits do not need to be considered in the calculation; for example, all of the first M-P bits can be set to 1.
  • each bit sequence output by each first single-core processor may be formed into a bit sequence matrix according to the serial number. This matrix of bit sequences can be stored in a memory unit as an input to the second single core processor.
  • the values in the bit sequence matrix are schematic illustrations and do not represent the real suppression relationship.
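Assembling the bit sequence matrix from the per-core outputs can be sketched as follows. The dictionary-based representation and the function name `build_bit_sequence_matrix` are hypothetical; the only behaviour taken from the text is that the bit sequences are arranged by serial number before being handed to the second single-core processor.

```python
def build_bit_sequence_matrix(per_core_outputs):
    """Merge per-core outputs into one matrix ordered by serial number.

    per_core_outputs is a list with one dict per core, mapping a serial
    number to that candidate frame's bit sequence (list of 0/1).
    """
    merged = {}
    for output in per_core_outputs:
        merged.update(output)
    return [merged[number] for number in sorted(merged)]


# Illustrative values only; cores may finish in any order.
core_b = {3: [1, 1, 1, 0], 4: [1, 1, 1, 1]}
core_a = {1: [1, 0, 1, 1], 2: [1, 1, 0, 1]}
matrix = build_bit_sequence_matrix([core_b, core_a])
```

Row i of the matrix then holds the bit sequence of the candidate frame with serial number i + 1, regardless of which core produced it.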
  • FIG. 13 is a schematic flowchart of obtaining a final candidate frame by the second single-core processor in an embodiment of the present application.
  • the second single-core processor may perform the following steps:
  • step 1304. Repeat step 1302 and step 1303 until the stop condition is satisfied.
  • Steps 1301 to 1305 can be understood with reference to the relevant steps performed by the second single-core processor 203 in the embodiment corresponding to FIG. 2a , and details are not repeated here.
  • steps 1301 to 1305 may be understood with reference to the related steps performed by the first single-core processor 2021 in the embodiment corresponding to FIG. 2b, and details are not repeated here.
  • the second single-core processor performs multiple screening on the sequence number sequence to obtain the final candidate frame.
  • the above-mentioned apparatus for processing candidate frames using multiple cores includes corresponding hardware structures and/or software modules for executing each function.
  • the present application can be implemented in hardware or in the form of a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
  • the apparatus for processing candidate frames using multiple cores in FIG. 2a to FIG. 13 can be implemented by one physical device, can be implemented jointly by multiple physical devices, or can be a logical function module in one physical device, which is not specifically limited in this embodiment of the present application.
  • the apparatus for processing candidate frames using multiple cores can also be implemented by the computer device shown in FIG. 14.
  • FIG. 14 is a schematic diagram of the hardware structure of the computer device. It includes a communication interface 1401 and a processor 1402, and may also include a memory 1403.
  • the communication interface 1401 can use any device such as a transceiver for communicating with other devices or communication networks.
  • the end-side device can use the communication interface 1401 to communicate with the server, such as uploading a model or downloading a model.
  • the communication interface 1401 may use technologies such as Ethernet, radio access network (RAN), and wireless local area networks (WLAN) to communicate with the server.
  • the processor 1402 includes but is not limited to one or more of a central processing unit (CPU), a network processor (NP), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD).
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
  • the processor 1402 is responsible for managing the communication line 1404 and for general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions.
  • the memory 1403 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. It can also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, without limitation.
  • the memory may exist independently and be connected to the processor 1402 through the communication line 1404 .
  • the memory 1403 may also be integrated with the processor 1402. If the memory 1403 and the processor 1402 are separate devices, the memory 1403 and the processor 1402 are connected, for example, the memory 1403 and the processor 1402 can communicate through a communication line.
  • the communication interface 1401 and the processor 1402 can communicate through a communication line, and the communication interface 1401 can also be directly connected to the processor 1402 .
  • the communication line 1404, which may include any number of interconnected buses and bridges, links together various circuits including one or more processors, represented by the processor 1402, and memory, represented by the memory 1403. The communication line 1404 may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further herein.
  • the second single-core processor is any one of the multiple first single-core processors, it can be considered that the processor 1402 in this application includes the multi-core processor 202 and the second single-core processor in the embodiment corresponding to FIG.
  • the processor 1402 in this application includes the first single-core processor 2021 in the embodiment corresponding to FIG. 2b. It should be understood that the above is only an example provided by the embodiments of the present application, and the apparatus for processing candidate frames using multiple cores may have more or fewer components than those shown, may combine two or more components, or may have a different configuration of components.
  • the apparatus for processing candidate frames using multiple cores may be a chip, and the chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip executes the steps performed by the apparatus for processing candidate frames using multiple cores described in the embodiments shown in FIG. 2a to FIG. 13.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or it may be any conventional processor or the like.
  • U disk, mobile hard disk, ROM, random access memory (RAM), magnetic disk or CD, etc.
  • a computer device which can be a personal computer, server, or network device, etc. to execute the methods described in the various embodiments of the present application.
  • Embodiments of the present application further provide a computer-readable storage medium, where a program for training a model is stored in the computer-readable storage medium, and when the program runs on a computer, the computer is caused to execute the steps in the method described in the embodiments shown in FIG. 9 or FIG. 10 above.
  • Embodiments of the present application also provide a computer-readable storage medium, where a program for data processing is stored in the computer-readable storage medium, and when the program runs on a computer, the computer is caused to execute the steps in the method described in the embodiments shown in FIG. 6, FIG. 8, FIG. 9, and FIG. 11, or to execute the steps in the method described in the embodiment shown in FIG. 12 above.
  • the embodiments of the present application also provide a digital processing chip.
  • the digital processing chip integrates circuits and one or more interfaces for realizing the above-mentioned processor or the functions of the processor.
  • the digital processing chip can perform the method steps of any one or more of the foregoing embodiments.
  • when the digital processing chip does not integrate a memory, it can be connected to an external memory through the communication interface.
  • the digital processing chip implements the actions performed by the training device/translation device in the above embodiment according to the program codes stored in the external memory.
  • Embodiments of the present application also provide a computer program product, where the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be stored by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.
  • modules may be combined or integrated into another system, or some features may be ignored.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some ports, and the indirect coupling or communication connection between modules may be electrical or other similar forms.
  • the modules or sub-modules described as separate components may or may not be physically separated, may or may not be physical modules, or may be distributed into multiple circuit modules, and some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this application.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is an apparatus for processing candidate boxes by using a plurality of cores. The apparatus comprises a plurality of single-core processors. A plurality of first single-core processors from among the plurality of single-core processors are used for executing the following processes in parallel: acquiring a portion of candidate boxes from all candidate boxes of an image to be detected; and acquiring a suppression relationship between each candidate box in the portion of candidate boxes and each candidate box in all the candidate boxes. A second single-core processor from among the plurality of single-core processors is used for acquiring a target candidate box according to the suppression relationship acquired by each first single-core processor and a confidence corresponding to each candidate box, wherein the second single-core processor is any one of the plurality of first single-core processors, or the second single-core processor is different from any one of the plurality of first single-core processors. By means of the solution provided in the present application, the calculation speed of an NMS algorithm can be improved.

Description

Apparatus and method for processing candidate frames using multiple cores
Technical Field
The present application relates to the technical field of data processing, and in particular, to an apparatus and method for processing candidate frames using multiple cores.
Background
Computer vision is an integral part of various intelligent/autonomous systems in application fields such as manufacturing, inspection, document analysis, medical diagnosis, and the military. It is the study of how to use cameras/camcorders and computers to obtain the data and information that people need about a photographed subject. Figuratively speaking, it equips a computer with eyes (cameras/camcorders) and a brain (algorithms) so that the computer, in place of the human eye, can identify, track, and measure targets and thereby perceive its environment. Because perception can be viewed as extracting information from sensory signals, computer vision can also be viewed as the science of how to make artificial systems "perceive" from images or multidimensional data. In general, computer vision uses various imaging systems in place of the visual organs to obtain input information, and a computer in place of the brain to process and interpret that information. The ultimate research goal of computer vision is to enable computers to observe and understand the world through vision as humans do, and to adapt to the environment autonomously. Object detection is a commonly used technique in the field of computer vision.
Non-maximum suppression (NMS) is widely used in computer vision algorithms, for example in image recognition, edge detection, and object detection. A classifier is an important part of the NMS algorithm; it can be used to detect objects in an image, for example to determine whether an object is a human face. For one object in the image, the classifier generates multiple candidate frames and computes a confidence, also called a score, for each candidate frame. To detect or recognize objects accurately, only one optimal candidate frame is retained for each object, and the content of the optimal candidate frame is taken as the recognized or detected object.
Specifically, the candidate frame with the highest classifier score among all candidate frames in the image is selected, the overlapping area between each other candidate frame and this highest-scoring candidate frame is computed by traversal, and some of the other candidate frames are deleted according to the relationship between the overlapping area and a preset threshold. The candidate frame with the highest classifier score is then selected again from the other candidate frames that have not been deleted, and the traversal calculation of the overlapping area is performed again, until all candidate frames other than the optimal candidate frames have been deleted. In a more complex image the number of candidate frames is very large, which increases the amount of traversal calculation and therefore the amount of computation of the NMS algorithm, so that the NMS algorithm takes a long time to compute.
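The sequential procedure described above can be sketched as follows. This is only an illustrative baseline, not code from the application: the box layout `(x1, y1, x2, y2)`, the function names `iou` and `greedy_nms`, and the use of an intersection-over-union ratio as the "relationship between the overlapping area and a preset threshold" are all assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, threshold):
    """Return the indices of the retained boxes, highest score first."""
    # Candidate indices ordered from highest to lowest score.
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest-scoring frame not yet deleted
        kept.append(best)
        # Traverse the surviving frames and delete those that overlap `best` too much.
        remaining = [i for i in remaining if iou(boxes[best], boxes[i]) <= threshold]
    return kept
```

Each round depends on the result of the previous round, and every round traverses all surviving frames, which is the cost the background section points to when the number of candidate frames is large.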
Summary
Embodiments of the present application provide an apparatus and method for processing candidate frames using multiple cores, which increase the calculation speed of the NMS algorithm and improve the efficiency of obtaining target candidate frames.
To achieve the above purpose, the embodiments of the present application provide the following technical solutions:
A first aspect of the present application provides an apparatus for processing candidate frames using multiple cores. The apparatus may include multiple single-core processors and can be used to process all candidate frames of an image to be detected, where each of the candidate frames corresponds to a sequence number and the sequence numbers of any two candidate frames are different. In a possible implementation, the sequence numbers of the candidate frames are consecutive. In a possible implementation, different sequence numbers may be randomly assigned to the candidate frames. In a possible implementation, sequence numbers may be assigned to all candidate frames in descending order of confidence. In a possible implementation, sequence numbers may be assigned to all candidate frames in ascending order of confidence. Multiple first single-core processors among the multiple single-core processors are configured to execute the following process in parallel: acquiring partial candidate frames from all the candidate frames, and acquiring the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames. In a possible implementation, each first single-core processor may acquire its partial candidate frames from all the candidate frames according to its own identification information. The identification information is unique: the identification information of any two first single-core processors is different, and the partial candidate frames acquired by any two first single-core processors are different. In a possible implementation, the suppression relationship between a candidate frame and itself need not be acquired. A second single-core processor among the multiple single-core processors is configured to obtain the final candidate frames according to the suppression relationships acquired by the first single-core processors and the confidence corresponding to each candidate frame. The second single-core processor is any one of the multiple first single-core processors, or the second single-core processor is different from every one of the multiple first single-core processors.
As can be seen from the solution provided in the first aspect, computations that have no dependencies are processed in parallel by multiple cores to speed up processing. Here, computations without dependencies can be understood as computations that are independent of the confidence ordering. In the solution provided in this application, the step of acquiring the suppression relationships may be performed by the multi-core processing model: each first single-core processor acquires a portion of the candidate frames and computes the suppression relationship between each candidate frame it acquired and each candidate frame in all the candidate frames. In this way the suppression relationship between any two of the candidate frames can be obtained in a single pass by the multiple cores. The computation that depends on confidence is then performed by the second single-core processor, which obtains the final candidate frames according to the acquired suppression relationships and the confidence corresponding to each candidate frame. This application sometimes also refers to a final candidate frame as a target candidate frame; both terms denote a candidate frame that is retained after all candidate frames have been processed by the NMS algorithm.
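The split between confidence-independent and confidence-dependent work can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed apparatus: worker threads from Python's `concurrent.futures` stand in for the first single-core processors (in CPython, threads illustrate the structure rather than deliver a real speed-up), the main thread stands in for the second single-core processor, and the names `iou`, `suppression_rows`, and `multi_core_nms` as well as the box layout `(x1, y1, x2, y2)` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def suppression_rows(boxes, start, stop, threshold):
    """Work of one "first" core: suppression relation of frames start..stop-1 vs. all frames."""
    m = len(boxes)
    return {
        i: [j != i and iou(boxes[i], boxes[j]) > threshold for j in range(m)]
        for i in range(start, stop)
    }


def multi_core_nms(boxes, scores, threshold, num_workers=2):
    """Return the original indices of the retained boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    sorted_boxes = [boxes[i] for i in order]
    m = len(sorted_boxes)
    chunk = max(1, -(-m // num_workers))  # ceiling division, at least 1
    ranges = [(s, min(s + chunk, m)) for s in range(0, m, chunk)]
    rows = {}
    # Confidence-independent step: every worker handles its own slice in parallel.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(suppression_rows, sorted_boxes, s, e, threshold)
                   for s, e in ranges]
        for future in futures:
            rows.update(future.result())
    # Confidence-dependent step: one sequential pass in descending confidence order.
    suppressed, kept = set(), []
    for i in range(m):
        if i in suppressed:
            continue
        kept.append(order[i])
        suppressed.update(j for j in range(m) if rows[i][j])
    return kept
```

The pairwise relation is computed once for all pairs, so the sequential pass only looks up precomputed results instead of recomputing overlaps in every round.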
In a possible implementation, each first single-core processor is specifically configured to: acquire the area of each candidate frame in all the candidate frames; acquire the overlap area between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames; acquire the superimposed area of each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames; and acquire, according to the relationship between a preset threshold and the ratio of the overlap area to the superimposed area, the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames. This implementation gives a process for acquiring the suppression relationship between any two candidate frames. In the solution provided in this application, this process is executed in parallel by multiple cores, which increases the calculation speed. In addition, in some possible implementations, at least one of the above steps is executed in parallel by the multiple cores and the remaining steps may be executed by a single-core processor. For example, the three steps of acquiring the area of each candidate frame, acquiring the overlap area, and acquiring the superimposed area are performed by the multiple cores, while the step of acquiring the suppression relationship according to the relationship between the ratio and the preset threshold may be performed in the second single-core processor.
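The ratio test described above can be written out as a short worked example. The notation below is not from the application: it assumes that the superimposed area is the area covered by the two frames together (their union), that the ratio is the familiar intersection-over-union, and that a frame is suppressed when the ratio exceeds the preset threshold T. The symbols S_A, S_B, r, and T are introduced here for illustration only.

```latex
% Assumed definitions: overlap area = intersection, superimposed area = union.
\[
r(A,B) = \frac{S_{A \cap B}}{S_{A \cup B}}, \qquad
S_{A \cup B} = S_A + S_B - S_{A \cap B}
\]
% Worked example: A = (0,0,10,10), B = (1,1,11,11), preset threshold T = 0.5.
\[
S_A = S_B = 100, \quad S_{A \cap B} = 9 \times 9 = 81, \quad
S_{A \cup B} = 100 + 100 - 81 = 119, \quad
r = \tfrac{81}{119} \approx 0.68 > T
\]
```

Under these assumptions the lower-confidence frame of the pair would be marked as suppressed by the higher-confidence one.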
In a possible implementation, each first single-core processor is specifically configured to acquire, according to its own identification information, partial candidate frames that are adjacent in order from a candidate frame sequence. The candidate frame sequence is obtained by sorting all the candidate frames in descending order of confidence. In this implementation, the candidate frames acquired by the multiple cores are already sorted by confidence, which facilitates subsequent calculation.
In a possible implementation, each first single-core processor is specifically configured to acquire, according to its own identification information, the same number of partial candidate frames that are adjacent in order from the candidate frame sequence. In this implementation, every first single-core processor acquires the same number of partial candidate frames; for example, the first of the first single-core processors acquires the candidate frames with sequence numbers 1-20, the second acquires the candidate frames with sequence numbers 21-40, and so on. This design facilitates subsequent calculation, and the first single-core processors can all have the same performance.
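One way a core could derive its equal-sized, contiguous share from its identification information is sketched below. The function name `partial_sequence_numbers`, the use of a 0-based integer `core_id` as the identification information, and the requirement that the total divide evenly are assumptions made for this illustration.

```python
def partial_sequence_numbers(core_id, num_cores, total):
    """Return the (first, last) 1-based sequence numbers owned by `core_id`.

    `core_id` is assumed to be a unique integer in 0..num_cores-1, and the
    candidate frames are assumed to be numbered 1..total in sorted order.
    """
    if not 0 <= core_id < num_cores:
        raise ValueError("core_id must be in 0..num_cores-1")
    if total % num_cores != 0:
        raise ValueError("this sketch assumes total is divisible by num_cores")
    n = total // num_cores          # frames per core
    first = core_id * n + 1
    return first, first + n - 1
```

With 100 candidate frames and 5 cores, core 0 owns sequence numbers 1-20 and core 1 owns 21-40, matching the example in the text; because the ranges are disjoint and cover 1..total, no two cores acquire the same frame.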
In a possible implementation, the second single-core processor is further configured to acquire N bit sequences from each first single-core processor, where N is the number of partial candidate frames. The N bit sequences represent the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames; each of the N bit sequences represents the suppression relationship between a first candidate frame and each candidate frame in all the candidate frames, where the first candidate frame is one of the partial candidate frames. When the second single-core processor is different from every one of the first single-core processors, each first single-core processor sends its N bit sequences to the second single-core processor; in other words, the second single-core processor receives the N bit sequences sent by each first single-core processor. When the second single-core processor is one of the first single-core processors, it obtains the N bit sequences that it generated itself and receives the N bit sequences sent by each of the other first single-core processors. In this implementation, the suppression relationship between one candidate frame and each of all the candidate frames can be represented by a bit sequence, which benefits the subsequent calculation and also increases the diversity of the solution.
In a possible implementation, each bit sequence may include M bits, where M is the number of all candidate frames. Each of the M bits indicates either that the first candidate frame is suppressed by a second candidate frame or that the first candidate frame is not suppressed by the second candidate frame, where the second candidate frame is one of all the candidate frames. In this implementation, to reduce the amount of calculation and the amount of data to be transmitted and to increase the calculation speed of the algorithm, each of the M positions can indicate the suppression relationship with 1 bit; for example, a bit value of 0 indicates that the second candidate frame is suppressed by the first candidate frame, and a bit value of 1 indicates that the second candidate frame is not suppressed by the first candidate frame.
In a possible implementation, each first single-core processor is further configured to initialize each bit sequence according to the order of the acquired partial candidate frames, so that after initialization the first P bits of each bit sequence all indicate that the first candidate frame is suppressed by the second candidate frame and the last M-P bits all indicate that the first candidate frame is not suppressed by the second candidate frame, where P is the sequence number of the first candidate frame in the ordering of the partial candidate frames. When the suppression relationships are acquired, a candidate frame does not need its suppression relationship with itself, and the suppression relationships with candidate frames whose confidence ranks ahead of it need not be considered; this is why the bit sequence is initialized in this way.
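The M-bit encoding and its initialization can be sketched as follows. The sketch assumes the example encoding given above (0 for "suppressed", 1 for "not suppressed") and assumes that P is the 1-based position of the frame in the confidence-sorted sequence of all M frames; the function names and the `is_suppressed` callback are hypothetical.

```python
def init_bit_sequence(p, m):
    """Initial bit sequence for the frame at 1-based position p among m sorted frames.

    The first p bits are 0 ("suppressed"): the frame itself and every frame
    ranked ahead of it need no further comparison. The last m - p bits are
    1 ("not suppressed") until a comparison says otherwise.
    """
    if not 1 <= p <= m:
        raise ValueError("p must be in 1..m")
    return [0] * p + [1] * (m - p)


def fill_bit_sequence(p, m, is_suppressed):
    """Clear the bit of every lower-ranked frame q for which is_suppressed(p, q) holds."""
    bits = init_bit_sequence(p, m)
    for q in range(p + 1, m + 1):      # only frames ranked after p are compared
        if is_suppressed(p, q):
            bits[q - 1] = 0
    return bits
```

For example, `init_bit_sequence(2, 5)` yields `[0, 0, 1, 1, 1]`, so only the three lower-ranked frames remain to be compared against the frame at position 2.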
In a possible implementation, the second single-core processor is specifically configured to process the acquired bit sequences multiple times to obtain the target candidate frames, where any one of the multiple rounds of processing includes the following. A bit sequence to be processed is obtained from the acquired bit sequences according to a sequence number sequence, which indicates the sequence number of each bit sequence. The sequence number of the bit sequence to be processed is determined according to the sequence number of a second candidate frame, and the bit sequence to be processed represents the suppression relationship between the second candidate frame and each candidate frame in all the candidate frames; the second candidate frame is one of all the candidate frames, and the sequence number of each candidate frame is obtained according to the candidate frame sequence. An updated global sequence is then obtained according to the bit sequence to be processed and the global sequence obtained in the previous round; the updated global sequence indicates the candidate frames that are not suppressed by the second candidate frame or by the other candidate frames ranked before the second candidate frame, and it is used to obtain the target candidate frames. This implementation gives a specific scheme by which the second single-core processor obtains the final candidate frames.
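The repeated update of the global sequence can be sketched as a bitwise AND over the precomputed bit sequences. This is one plausible reading of the scheme, not a definitive implementation: it assumes 0 means "suppressed", assumes the bit sequences are indexed in descending confidence order with their leading bits already cleared by initialization, and uses the hypothetical name `select_with_global_sequence`.

```python
def select_with_global_sequence(bit_sequences):
    """Return the 0-based sorted positions of the frames that are retained.

    bit_sequences[i][j] == 0 means frame j is suppressed by frame i, or that
    j is ranked at or before i (cleared during initialization).
    """
    m = len(bit_sequences)
    global_seq = [1] * m               # 1 = still a candidate for retention
    kept = []
    while any(global_seq):
        current = global_seq.index(1)  # highest-confidence frame still alive
        kept.append(current)
        # Update: a frame survives only if it survived before AND is not
        # suppressed by the frame that was just retained.
        global_seq = [g & b for g, b in zip(global_seq, bit_sequences[current])]
        global_seq[current] = 0        # guard; already 0 with initialized rows
    return kept
```

After each round the global sequence marks exactly the frames not suppressed by the retained frame or by any frame ranked before it, so the first remaining 1 always identifies the next frame to retain.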
In a possible implementation, the second single-core processor is specifically configured to generate a sequence number sequence according to the candidate frame sequence, where the candidate frame sequence indicates the relative confidence of the candidate frames, and to screen the sequence number sequence multiple times to obtain multiple sequence numbers, which are used to obtain the final candidate frames. Any one of the multiple rounds of screening may include: obtaining the bit sequence of a specific candidate frame according to the sequence number obtained in the previous round of screening, where the bit sequence of the specific candidate frame represents the suppression relationship between the specific candidate frame and the second candidate frame; and obtaining an updated global sequence according to the suppression relationship between the bit sequence of the specific candidate frame and the global sequence obtained in the previous round of screening. The updated global sequence indicates at least one sequence number, and the candidate frame corresponding to each such sequence number is not suppressed by the specific candidate frame or by the other candidate frames ranked before the specific candidate frame. This implementation gives a specific scheme by which the second single-core processor obtains the final candidate frames.
In a possible implementation, the second single-core processor is specifically configured to: obtain a bit sequence to be processed from the acquired bit sequences according to the sequence numbers of the acquired bit sequences, where the sequence number of the bit sequence to be processed is determined according to the sequence number of a second candidate frame, the bit sequence to be processed represents the suppression relationship between the second candidate frame and each candidate frame in all the candidate frames, the second candidate frame is one of all the candidate frames, and the sequence number of each candidate frame is obtained according to the candidate frame sequence; and obtain the target candidate frames according to the bit sequence to be processed and the processed bit sequences, where a target candidate frame is a candidate frame that is not suppressed by the second candidate frame or by the other candidate frames ranked before the second candidate frame, and each processed bit sequence represents the suppression relationship between one of the other candidate frames ranked before the second candidate frame and each candidate frame in all the candidate frames.
In a possible implementation, the partial candidate frames acquired by the first single-core processors together make up all the candidate frames.
A second aspect of the present application provides a method for processing candidate frames using multiple cores, which may include: acquiring the candidate frames of an image to be detected; acquiring, by each of multiple first single-core processors according to its own identification information, partial candidate frames from all the candidate frames; acquiring, by each first single-core processor, the suppression relationship between each candidate frame in its partial candidate frames and each candidate frame in all the candidate frames; and obtaining, by a second single-core processor, the final candidate frames according to the suppression relationships acquired by the first single-core processors and the confidence corresponding to each candidate frame.
In a possible implementation, acquiring, by each first single-core processor, the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames may include: acquiring, by each first single-core processor, the area of each candidate frame in all the candidate frames; acquiring, by each first single-core processor, the overlap area between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames; acquiring, by each first single-core processor, the superimposed area of each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames; and acquiring, by each first single-core processor according to the relationship between a preset threshold and the ratio of the overlap area to the superimposed area, the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames.
In a possible implementation, the method may further include: sorting all the candidate frames in descending order of their corresponding confidence to obtain a candidate frame sequence. Acquiring partial candidate frames from all the candidate frames by the multiple first single-core processors according to their respective identification information may include: acquiring, by the multiple first single-core processors according to their respective identification information, partial candidate frames that are adjacent in order from the candidate frame sequence.
In a possible implementation, acquiring, by the multiple first single-core processors according to their respective identification information, partial candidate frames that are adjacent in order from the candidate frame sequence may include: acquiring, by the multiple first single-core processors according to their respective identification information, the same number of partial candidate frames that are adjacent in order from the candidate frame sequence.
In a possible implementation, the method may further include: acquiring, by the second single-core processor, N bit sequences from each first single-core processor, where N is the number of partial candidate frames, each of the N bit sequences represents the suppression relationship between a first candidate frame and a second candidate frame, the first candidate frame is one of the partial candidate frames, and the second candidate frame is each candidate frame in all the candidate frames.
In a possible implementation, each bit sequence may include M bits, where M is the number of all candidate frames, and each of the M bits indicates either that the first candidate frame is suppressed by the second candidate frame or that the first candidate frame is not suppressed by the second candidate frame.
In a possible implementation, the method may further include: initializing, by each first single-core processor, each bit sequence according to the order of the acquired partial candidate frames, so that the first P of the M bits all indicate that the first candidate frame is suppressed by the second candidate frame and the last M-P of the M bits all indicate that the first candidate frame is not suppressed by the second candidate frame, where P is the sequence number of the first candidate frame in the ordering of the partial candidate frames.
In a possible implementation, obtaining, by the second single-core processor, the final candidate frames according to the suppression relationships acquired by the first single-core processors and the confidence corresponding to each candidate frame may include: generating a sequence number sequence according to the candidate frame sequence, and screening the sequence number sequence multiple times to obtain multiple sequence numbers, which are used to obtain the final candidate frames. Any one of the multiple rounds of screening may include: obtaining the bit sequence of a specific candidate frame according to the sequence number obtained in the previous round of screening, where the bit sequence of the specific candidate frame represents the suppression relationship between the specific candidate frame and the second candidate frame; and obtaining an updated global sequence according to the suppression relationship between the bit sequence of the specific candidate frame and the global sequence obtained in the previous round of screening. The updated global sequence indicates at least one sequence number, and the candidate frame corresponding to each such sequence number is not suppressed by the specific candidate frame or by the other candidate frames ranked before the specific candidate frame.
In a possible implementation, obtaining the target candidate frames by the second single-core processor according to the suppression relationships obtained by each first single-core processor and the confidence corresponding to each of all the candidate frames includes: processing the obtained bit sequences multiple times by the second single-core processor to obtain the target candidate frames. Any one of the multiple processing passes includes: obtaining a to-be-processed bit sequence from the obtained bit sequences according to a serial number sequence, where the serial number sequence indicates the serial number of each bit sequence, the serial number of the to-be-processed bit sequence is determined according to the serial number of a second candidate frame, the to-be-processed bit sequence represents the suppression relationship between the second candidate frame and each of all the candidate frames, the second candidate frame is one of all the candidate frames, and the serial number of each of all the candidate frames is obtained according to the candidate frame sequence; and obtaining an updated global sequence according to the to-be-processed bit sequence and the global sequence obtained in the previous pass. The updated global sequence indicates the candidate frames that are suppressed neither by the second candidate frame nor by any other candidate frame ranked before the second candidate frame, and the updated global sequence is used to obtain the target candidate frames.
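The repeated processing described above can be pictured as one pass over bit masks. The following is a minimal sketch under stated assumptions, not the claimed implementation: candidate frames are indexed from 0 in descending order of confidence, `bit_rows[i]` is an integer whose bit j is 1 when frame i suppresses frame j, and the global sequence is an integer whose bit j stays 1 while frame j is unsuppressed. The names `screen`, `bit_rows`, and `global_seq` are illustrative only.

```python
def screen(bit_rows):
    """Return (kept indexes, final global sequence) for confidence-ordered frames.

    Assumption: bit_rows[i] has bit j set when frame i suppresses frame j.
    """
    n = len(bit_rows)
    global_seq = (1 << n) - 1  # bit j set: frame j has not been suppressed so far
    kept = []
    for i in range(n):
        if not (global_seq >> i) & 1:
            continue  # frame i was suppressed by an earlier kept frame
        kept.append(i)
        # Clear the bits of every frame that frame i suppresses.
        global_seq &= ~bit_rows[i]
    return kept, global_seq


# Five frames: frame 0 overlaps frame 3, frame 1 overlaps frame 2 (symmetric).
example_rows = [0b01000, 0b00100, 0b00010, 0b00001, 0b00000]
kept_indexes, final_global = screen(example_rows)
```

In this sketch only the bit sequences of frames that are still unsuppressed are ever combined into the global sequence, which mirrors the data dependency between passes.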
In a possible implementation, the partial candidate frames obtained by the first single-core processors together constitute all the candidate frames.
A third aspect of the present application provides a neural network apparatus. The neural network apparatus may include an apparatus for processing candidate frames using multiple cores, which is the apparatus described in the first aspect.
A fourth aspect of the present application provides an apparatus for processing candidate frames using multiple cores. The apparatus may include a memory configured to store computer-readable instructions, and may further include a processor coupled to the memory and configured to execute the computer-readable instructions in the memory so as to perform the method described in the second aspect.
A fifth aspect of the present application provides a chip system. The chip system may include a processor and a communication interface. The processor obtains program instructions through the communication interface, and the method described in the second aspect is implemented when the program instructions are executed by the processor.
A sixth aspect of the present application provides a computer-readable storage medium, which may include a program that, when executed by a processing unit, performs the method described in the second aspect.
A seventh aspect of the present application provides a computer program product that, when run on a computer, causes the computer to perform the method described in the second aspect.
The beneficial effects of the solutions described in the second to seventh aspects can be understood with reference to the beneficial effects of the solution described in the first aspect, and are not repeated here.
The solution provided in this application separates the parts of the NMS algorithm that can be computed in parallel from the parts that have data dependencies. The suppression relationships between candidate frames are obtained by a multi-core processor, whose parallel execution capability increases the flexibility of the computation and raises the computing speed of the NMS algorithm. Without adding any special processing instruction, the suppression relationship between any one of all the candidate frames and each of all the candidate frames can be obtained, and a bit sequence matrix can then be obtained. Each bit sequence in the bit sequence matrix uses 1 bit to indicate the suppression relationship between two candidate frames, so the data volume is small, which effectively reduces the amount of data transmitted from the multi-core processor to the second single-core processor and shortens the computing time. The second single-core processor may extract from the bit sequence matrix the bit sequences corresponding to candidate frames that are not suppressed and delete the suppressed bit sequences; for example, this process may be implemented by a vreduce instruction. The solution provided in this application can accelerate the computation of the NMS algorithm and quickly obtain the final candidate frames.
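To make the "1 bit per pair" point concrete, the following hedged sketch packs one row of suppression flags into bytes and estimates the size of a full bit sequence matrix. The function names, the least-significant-bit-first layout, and the example count of 2560 candidate frames are assumptions for illustration; the actual memory layout used by the apparatus is not stated here.

```python
def pack_bits(flags):
    """Pack a list of booleans into bytes, 8 flags per byte (bit 0 first)."""
    out = bytearray((len(flags) + 7) // 8)
    for j, flag in enumerate(flags):
        if flag:
            out[j >> 3] |= 1 << (j & 7)
    return bytes(out)


def matrix_bytes(num_frames):
    """Bytes needed for a num_frames x num_frames matrix at 1 bit per pair."""
    bytes_per_row = (num_frames + 7) // 8
    return num_frames * bytes_per_row


packed_row = pack_bits([True, False, False, True, False, False, False, False, True])
packed_size = matrix_bytes(2560)          # 1 bit per pair
one_byte_per_pair_size = 2560 * 2560      # for comparison: 1 byte per pair
```

Under these assumptions a 2560 by 2560 matrix occupies 819,200 bytes when packed, one eighth of what a byte-per-pair layout would need.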
Description of Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Clearly, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
FIG. 1 is a schematic diagram of the architecture of an RPN;
FIG. 2a is a schematic structural diagram of an apparatus for processing candidate frames using multiple cores provided by an embodiment of the present application;
FIG. 2b is a schematic structural diagram of an apparatus for processing candidate frames using multiple cores provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of each single-core processor in a multi-core processor obtaining partial candidate frames in an embodiment of the present application;
FIG. 4 is a schematic flowchart of obtaining the areas of multiple candidate frames simultaneously through SIMD instructions;
FIG. 5 is a schematic flowchart of obtaining, simultaneously through SIMD instructions, the overlap areas between a first candidate frame and each of all the candidate frames;
FIG. 6 is a schematic diagram of representing the suppression relationship between a first candidate frame and a second candidate frame by a bit sequence in an embodiment of the present application;
FIG. 7a is a schematic diagram of the second single-core processor screening serial numbers in an embodiment of the present application;
FIG. 7b is a schematic diagram of the second single-core processor screening serial numbers in an embodiment of the present application;
FIG. 7c is a schematic diagram of the second single-core processor screening serial numbers in an embodiment of the present application;
FIG. 8 is a schematic diagram of the second single-core processor screening serial numbers in an embodiment of the present application;
FIG. 9 is a schematic flowchart of a method for processing candidate frames using multiple cores provided by an embodiment of the present application;
FIG. 10 is a schematic flowchart of a method for processing candidate frames using multiple cores provided by an embodiment of the present application;
FIG. 11 is a schematic flowchart of a method for processing candidate frames using multiple cores provided by an embodiment of the present application;
FIG. 12 is a schematic flowchart of the first single-core processor obtaining the suppression relationships between candidate frames in an embodiment of the present application;
FIG. 13 is a schematic flowchart of the second single-core processor obtaining the final candidate frames in an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
Detailed Description
The following describes the embodiments of the present application with reference to the accompanying drawings. Clearly, the described embodiments are only some rather than all of the embodiments of the present application. A person of ordinary skill in the art knows that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
The present application provides an apparatus and a method for processing candidate frames using multiple cores. The apparatus for processing candidate frames using multiple cores provided in this application can raise the computing speed of the NMS algorithm.
Because this application involves a large amount of NMS-related knowledge, NMS and related knowledge are introduced first so that the solution provided in this application can be better understood.
The basic idea of the NMS algorithm is to search for local maxima and suppress elements that are not maxima. NMS is widely used in fields such as video tracking and object recognition, for example in edge detection, face detection, and target detection scenarios. Target detection is used as an example. The key to target detection is to accurately locate the target of interest in the scene and correctly determine its category. A target detection system usually locates and identifies the target of interest in two stages: a candidate region stage and a region detection stage. The candidate region stage aims to find hundreds or thousands of candidate frames among the positions and scales at which the target may appear, so that the target is fully contained in these candidate frames. The region detection stage further identifies and locates the potential targets in these candidate frames, so as to accurately determine the category of the target. With the advent of deep learning, most target detection systems are based on deep neural networks. At present, the model commonly used in the candidate region stage is the region proposal network (RPN).

FIG. 1 is a schematic diagram of the architecture of an RPN. The input of the RPN 200 is a feature map, which can be obtained by performing feature extraction on the to-be-processed image through a convolutional neural network (CNN). The convolutional neural network (CNN) 100 may include an input layer 110 and a convolutional layer/pooling layer 120. The convolutional layer/pooling layer 120 may include, for example, layers 121 to 126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer. In another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.

Convolutional layer 121 is used as an example. It may include many convolution operators, also called kernels. In image processing, a convolution operator acts as a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually moved across the input image along the horizontal direction one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride) to extract a specific feature from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends through the entire depth of the input image. Convolution with a single weight matrix therefore produces a convolved output with a single depth dimension. In most cases, however, multiple weight matrices of the same dimensions are applied instead of a single one, and the outputs of the weight matrices are stacked to form the depth dimension of the convolved image. Different weight matrices can be used to extract different features of the image: for example, one weight matrix extracts edge information, another extracts a specific color, and yet another blurs unwanted noise in the image. Because the multiple weight matrices have the same dimensions, the feature maps they extract also have the same dimensions, and these feature maps are combined to form the output of the convolution operation.

In practical applications, the weight values in these weight matrices are obtained through extensive training. The weight matrices formed by the trained weight values can extract information from the input image, thereby helping the convolutional neural network 100 make correct predictions. When the convolutional neural network 100 has multiple convolutional layers, the initial convolutional layers (for example, 121) tend to extract more general features, which may also be called low-level features. As the depth of the convolutional neural network 100 increases, the features extracted by later convolutional layers (for example, 126) become increasingly complex, such as high-level semantic features; features with higher-level semantics are better suited to the problem to be solved.

Because the number of training parameters often needs to be reduced, pooling layers often need to be introduced periodically after convolutional layers. For layers 121 to 126 illustrated at 120 in FIG. 1, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. Taking image data as an example, the sole purpose of the pooling layer during image processing is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a max pooling operator, which sample the input image to obtain an image of smaller size. The average pooling operator computes the average of the pixel values within a specific range of the image. The max pooling operator takes the pixel with the largest value within a specific range as the max pooling result. In addition, just as the size of the weight matrix in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The image output by the pooling layer may be smaller than the image input to it, and each pixel of the output image represents the average or maximum value of the corresponding subregion of the input image. In addition to the convolutional layer/pooling layer 120, the convolutional neural network 100 may include other structures, such as hidden layers and an output layer; because they are unrelated to this solution, they are not described further.

After the convolutional layer/pooling layer 120 obtains the feature map, the feature map is input into the RPN 200. On the feature map, the RPN 200 uses a sliding window to capture the features at each position of the image and maps the features at each position to k anchor windows of different scales and aspect ratios, where k is generally a positive integer greater than 1, for example 3. When candidate frames are generated, the features at each position are input into an intermediate layer (for example, a regression network or a fully connected network), which determines for each anchor window whether it includes a to-be-detected object. The intermediate layer is also referred to herein as a classifier. An anchor window that includes a to-be-detected object can be regarded as a candidate frame. Candidate frames are also scored, and a higher score indicates a higher probability that the candidate frame includes the to-be-detected object. In a possible implementation, the RPN may further include other processing steps, such as compression (reshape) processing of the obtained feature map; the embodiments of the present application do not limit what further processing the RPN may include.
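As a small illustration of the average and max pooling operators described above, the following sketch pools a 4 by 4 image with non-overlapping 2 by 2 windows. It is a plain-Python illustration under assumed parameters (window size, stride equal to the window size, and the sample pixel values are all invented for the example) and is not part of the described apparatus.

```python
def pool2d(image, size, op):
    """Apply op to each non-overlapping size x size window of a 2-D list."""
    height, width = len(image), len(image[0])
    pooled = []
    for r in range(0, height - size + 1, size):
        row = []
        for c in range(0, width - size + 1, size):
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(op(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
max_pooled = pool2d(image, 2, max)    # largest value of each 2 x 2 subregion
avg_pooled = pool2d(image, 2, mean)   # average value of each 2 x 2 subregion
```

Each output pixel summarizes one subregion of the input, so the 4 by 4 input shrinks to 2 by 2 in both cases.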
For the same to-be-detected object, a large number of candidate frames may be generated, and these candidate frames may overlap one another. NMS is then needed to find the best candidate frame and eliminate the redundant ones. In the conventional NMS implementation, the large number of obtained candidate frames (also called bounding boxes) are processed serially. In each round, the candidate frame with the highest confidence is selected; every remaining candidate frame that has a high overlap rate (overlap rate of area) with the selected candidate frame is then suppressed in that round. The candidate frame selected in this round is deleted from the candidate frame list and does not appear in the next round. The next round then starts and repeats the process: the candidate frame with the highest confidence is selected, and the candidate frames with a high overlap rate are suppressed.

For example, suppose multiple candidate frames are sorted by confidence from high to low to obtain candidate frame 1, candidate frame 2, candidate frame 3, candidate frame 4, and candidate frame 5. Candidate frame 1 has the highest confidence, so the overlap rates between candidate frame 1 and the remaining candidate frames 2, 3, 4, and 5 are examined. Suppose candidate frame 4 has a high overlap rate with candidate frame 1 (a high overlap rate can be understood as an overlap rate exceeding a preset threshold); candidate frame 4 is then suppressed by candidate frame 1. This round selects candidate frame 1 as the first final candidate frame and determines that candidate frame 4 is suppressed. In the next round, the candidate frame with the highest confidence, namely candidate frame 2, is selected from candidate frames 2, 3, and 5, and the overlap rates between candidate frame 2 and the remaining candidate frames 3 and 5 are examined. Suppose candidate frame 3 has a high overlap rate with candidate frame 2; candidate frame 3 is then suppressed by candidate frame 2. This round selects candidate frame 2 as the second final candidate frame and determines that candidate frame 3 is suppressed. In the next round, only candidate frame 5 remains; it is not suppressed by any candidate frame, so candidate frame 5 is the third final candidate frame. As this process shows, the candidate frame selected in each round depends on the selection results of the previous rounds: selecting the second final candidate frame depends on the result of the first round, and selecting the third final candidate frame depends on the results of the first and second rounds. This makes parallel optimization of the NMS algorithm considerably difficult.
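The serial procedure in the five-frame example above can be sketched as follows. This is a conventional greedy NMS written for illustration only; it assumes that the overlap rate is the intersection over union of two axis-aligned frames given as `(x1, y1, x2, y2)`, and the coordinates, scores, and threshold of 0.5 are invented so that frame 1 overlaps frame 4 and frame 2 overlaps frame 3.

```python
def overlap_rate(a, b):
    """Intersection over union of two frames (x1, y1, x2, y2); assumed metric."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(frames, scores, threshold):
    """Return indexes of the final candidate frames, most confident first."""
    remaining = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    final = []
    while remaining:
        best = remaining.pop(0)  # highest confidence among the frames left
        final.append(best)
        # Frames that overlap the selected frame too much are suppressed this round.
        remaining = [i for i in remaining
                     if overlap_rate(frames[best], frames[i]) <= threshold]
    return final


# Index 0..4 stands for candidate frames 1..5 of the example.
frames = [(0, 0, 10, 10), (20, 20, 30, 30), (21, 21, 31, 31),
          (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
final_indexes = greedy_nms(frames, scores, 0.5)
```

Because each round filters the list left by the previous round, the loop cannot be parallelized as written, which is the dependency the paragraph above points out.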
To solve the foregoing technical problem, an embodiment of the present application provides an apparatus for processing candidate frames using multiple cores. The apparatus splits the computation of the NMS algorithm and executes the computations that have no dependencies in parallel, so as to raise the computing speed of the NMS algorithm. This is described in detail below.
An apparatus for processing candidate frames using multiple cores provided in an embodiment of the present application may include multiple first single-core processors 2021 and a second single-core processor 203. The second single-core processor 203 is any one of the multiple first single-core processors 2021, or the second single-core processor 203 is different from every one of the multiple first single-core processors 2021. FIG. 2a is a schematic structural diagram of an apparatus for processing candidate frames using multiple cores provided by an embodiment of the present application. As shown in FIG. 2a, when the second single-core processor 203 is different from every one of the multiple first single-core processors 2021, the multiple first single-core processors 2021 as a whole can be regarded as a multi-core processor 202. FIG. 2b is a schematic structural diagram of another apparatus for processing candidate frames using multiple cores provided by an embodiment of the present application. As shown in FIG. 2b, the apparatus includes multiple first single-core processors. The structure shown in FIG. 2b can be regarded as the structure of FIG. 2a in which the second single-core processor 203 is any one of the multiple first single-core processors 2021.
Each first single-core processor 2021 is configured to obtain partial candidate frames from all the candidate frames of the to-be-detected image, and to obtain the suppression relationship between each of the partial candidate frames and each of all the candidate frames. How the candidate frames of the to-be-detected image are obtained can be understood with reference to the process described for FIG. 1, and details are not repeated here. In a possible implementation, each first single-core processor 2021 may obtain its partial candidate frames from all the candidate frames of the to-be-detected image according to its own identification information. In a possible implementation, the partial candidate frames obtained by the first single-core processors 2021 together constitute all the candidate frames. Note that this application sometimes also refers to the apparatus for processing candidate frames using multiple cores as a data processing apparatus; the two terms have the same meaning.
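The per-core work described above can be sketched as a function that takes the indexes of one core's partial candidate frames and compares each of them with all the candidate frames. This is a hedged illustration: the overlap metric (intersection over union), the threshold, the choice to leave a frame's own bit at 0, and the names `suppression_rows` and `all_frames` are assumptions, and the sample coordinates are invented.

```python
def overlap_rate(a, b):
    """Intersection over union of two frames (x1, y1, x2, y2); assumed metric."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def suppression_rows(partial_indexes, all_frames, threshold):
    """Map each partial frame index to a bit mask over all frames.

    Bit j of the mask for frame i is 1 when the overlap rate between
    frame i and frame j exceeds the threshold (j != i by assumption).
    """
    rows = {}
    for i in partial_indexes:
        bits = 0
        for j, other in enumerate(all_frames):
            if j != i and overlap_rate(all_frames[i], other) > threshold:
                bits |= 1 << j
        rows[i] = bits
    return rows


all_frames = [(0, 0, 10, 10), (20, 20, 30, 30), (1, 1, 11, 11), (21, 21, 31, 31)]
rows_core_0 = suppression_rows([0, 1], all_frames, 0.5)  # first core's share
rows_core_1 = suppression_rows([2, 3], all_frames, 0.5)  # second core's share
```

Each call depends only on its own inputs, so the calls for different cores can run independently, and their results together cover every candidate frame.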
Each of all the candidate frames corresponds to one serial number, and no two candidate frames have the same serial number. In a possible implementation, the serial numbers of the candidate frames are consecutive. In a possible implementation, a different serial number may be randomly assigned to each candidate frame. In a possible implementation, serial numbers may be assigned to all the candidate frames in descending order of confidence. In a possible implementation, serial numbers may be assigned to all the candidate frames in ascending order of confidence. For example, all the candidate frames include candidate frame A, candidate frame B, candidate frame C, candidate frame D, and candidate frame E. Sorting them by confidence from high to low shows that candidate frame E has the highest confidence, followed by candidate frame C, then candidate frame A, then candidate frame D, and candidate frame B has the lowest confidence. In descending order of confidence, serial number 1 is then assigned to candidate frame E, serial number 2 to candidate frame C, serial number 3 to candidate frame A, serial number 4 to candidate frame D, and serial number 5 to candidate frame B.
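The descending-confidence assignment in the A to E example can be sketched as below. The confidence values are invented solely to reproduce the stated order (E, C, A, D, B), and the function name is illustrative.

```python
def assign_serial_numbers(confidences):
    """Assign serial numbers 1..N in descending order of confidence."""
    ordered = sorted(confidences, key=confidences.get, reverse=True)
    return {name: serial for serial, name in enumerate(ordered, start=1)}


# Hypothetical confidences chosen so that E > C > A > D > B.
confidences = {"A": 0.7, "B": 0.5, "C": 0.8, "D": 0.6, "E": 0.9}
serial_numbers = assign_serial_numbers(confidences)
```

With this ordering, a smaller serial number always means a higher confidence.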
After obtaining all the candidate frames, the apparatus for processing candidate frames using multiple cores may send all the candidate frames to each first single-core processor 2021 in the multi-core processor 202. In a possible implementation, all the candidate frames sorted by confidence may be sent to each first single-core processor 2021 in the multi-core processor 202. In a possible implementation, all the candidate frames that have not been sorted by confidence may instead be sent to each first single-core processor 2021 in the multi-core processor 202.
After obtaining all the candidate frames, each first single-core processor 2021 selects partial candidate frames from all the candidate frames as its to-be-processed candidate frames according to its own identification information. The identification information is unique: any two first single-core processors 2021 have different identification information, and the partial candidate frames obtained by any two first single-core processors 2021 are different.
In a possible implementation, each first single-core processor 2021 may obtain the same number of partial candidate frames. For example, if there are N candidate frames in total and five first single-core processors 2021, each first single-core processor 2021 obtains N/5 partial candidate frames. Suppose N is 100. According to its preset identification information, the first of the first single-core processors 2021 obtains the candidate frames with sequence numbers 1-20 from all the candidate frames, the second obtains those with sequence numbers 21-40, the third those with sequence numbers 41-60, the fourth those with sequence numbers 61-80, and the fifth those with sequence numbers 81-100. In this example, each first single-core processor 2021 obtains partial candidate frames from all the candidate frames according to its own identification information, and every processor obtains the same number of them. When the candidate frames obtained by each first single-core processor 2021 have been sorted by confidence, then in the above example the first of the first single-core processors 2021 obtains the 20 candidate frames with the highest confidence, the second obtains those ranked 21st to 40th, the third those ranked 41st to 60th, the fourth those ranked 61st to 80th, and the fifth those ranked 81st to 100th.
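The even split in this example can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `frames_for_core`, the use of a 1-based integer core ID as the "identification information", and the contiguous-range layout are hypothetical choices, not something the application prescribes.

```python
def frames_for_core(core_id: int, num_cores: int, num_frames: int) -> range:
    """Sequence numbers (1-based) owned by the core with 1-based ID `core_id`."""
    if num_frames % num_cores != 0:
        raise ValueError("this sketch only covers an even split")
    per_core = num_frames // num_cores
    first = (core_id - 1) * per_core + 1
    return range(first, first + per_core)


# Five cores and N = 100 candidate frames, as in the example above.
shares = {core: frames_for_core(core, 5, 100) for core in range(1, 6)}
for core, share in shares.items():
    print(f"core {core}: sequence numbers {share[0]}-{share[-1]}")
```

Because every core derives its range from its own ID alone, no coordination between cores is needed to guarantee that the shares are disjoint.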
FIG. 3 is a schematic diagram of how each single-core processor in the multi-core processor 202 obtains its partial candidate frames in an embodiment of the present application. In this implementation, the sorted candidate frames are the input to the apparatus for processing candidate frames using multiple cores. Assume there are 2560 sequence numbers in total, each corresponding to one candidate frame; the smaller the sequence number, the higher the confidence of the candidate frame. The sorted candidate frames are copied to the multi-core processor 202, specifically to each first single-core processor 2021 in the multi-core processor 202. As shown in FIG. 3, each first single-core processor 2021 selects its own partial candidate frames from the 2560 candidate frames according to its identification information. In the example of FIG. 3, the sorted input candidate frames are evenly divided into R tiles (tiling), each containing the same number of candidate frames. Each black box in FIG. 3 represents the tile obtained by one first single-core processor 2021, that is, the partial candidate frames obtained by that first single-core processor 2021.
In a possible implementation, the first single-core processors 2021 may obtain different numbers of partial candidate frames. For example, suppose there are N candidate frames in total, N is 100, and there are five first single-core processors 2021. According to its preset identification information, the first of the first single-core processors 2021 obtains the candidate frames with sequence numbers 1-20 from all the candidate frames, the second obtains those with sequence numbers 21-60, the third those with sequence numbers 61-70, the fourth those with sequence numbers 71-85, and the fifth those with sequence numbers 86-100.
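For an uneven split, one option is a preset lookup table keyed by core ID. The sketch below uses the ranges of this example; the table name `UNEVEN_SHARES` and the helper `is_valid_partition` are hypothetical, and the check simply confirms that the ranges are disjoint and cover every sequence number exactly once.

```python
# Hypothetical lookup table: core ID -> (first, last) sequence number, inclusive.
UNEVEN_SHARES = {1: (1, 20), 2: (21, 60), 3: (61, 70), 4: (71, 85), 5: (86, 100)}


def is_valid_partition(shares, num_frames):
    """True if the ranges are disjoint and together cover 1..num_frames."""
    expected_first = 1
    for first, last in sorted(shares.values()):
        if first != expected_first or last < first:
            return False
        expected_first = last + 1
    return expected_first == num_frames + 1


# Number of candidate frames each core handles.
sizes = {core: last - first + 1 for core, (first, last) in UNEVEN_SHARES.items()}
print(sizes, is_valid_partition(UNEVEN_SHARES, 100))
```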
After each first single-core processor 2021 obtains its partial candidate frames from all the candidate frames according to its identification information, it obtains the suppression relationship between each of its partial candidate frames and each of all the candidate frames. This is a parallel process: once every first single-core processor 2021 has obtained the suppression relationships for its own partial candidate frames, the results output by the first single-core processors 2021 are collected, which yields the suppression relationship between each candidate frame and all the other candidate frames. Continuing the above example in which every first single-core processor 2021 obtains the same number of partial candidate frames: while the first single-core processor obtains the suppression relationship between the candidate frame with sequence number 1 and each of the N candidate frames, the second, third, fourth and fifth single-core processors simultaneously obtain the suppression relationships between the candidate frames with sequence numbers 21, 41, 61 and 81, respectively, and each of the N candidate frames. Likewise, while the first single-core processor obtains the suppression relationship between each of the candidate frames with sequence numbers 1 to 20 and each of all the candidate frames, the second single-core processor does the same for sequence numbers 21 to 40, the third for sequence numbers 41 to 60, the fourth for sequence numbers 61 to 80, and the fifth for sequence numbers 81 to 100.
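The "compute per share in parallel, then collect" pattern described here can be sketched with Python's standard `concurrent.futures` module. Threads merely stand in for the first single-core processors; the function names, the toy one-dimensional "frames" and the toy `close` predicate are all assumptions made for illustration and are not the suppression test of the application.

```python
from concurrent.futures import ThreadPoolExecutor


def relations_for_share(share, frames, suppresses):
    """One worker: relate each frame in its share to every frame."""
    return {seq: [suppresses(frames[seq - 1], other) for other in frames]
            for seq in share}


def gather_relations(shares, frames, suppresses):
    """Run one worker per share and merge the per-worker results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(shares)) as pool:
        futures = [pool.submit(relations_for_share, share, frames, suppresses)
                   for share in shares]
        for future in futures:
            merged.update(future.result())
    return merged


def close(a, b):
    """Toy stand-in predicate: two distinct values closer than 2 units."""
    return a != b and abs(a - b) < 2


toy_frames = [0, 1, 5, 6, 10, 20]                      # toy data, not real boxes
toy_shares = [range(1, 3), range(3, 5), range(5, 7)]   # three workers, two frames each
relations = gather_relations(toy_shares, toy_frames, close)
print(relations[1])
```

Each worker writes only the rows for its own sequence numbers, so merging the results needs no conflict handling.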
By way of example, one way of calculating the suppression relationship between each of the partial candidate frames and each of all the candidate frames is given below.
First, the area of each of all the candidate frames is obtained. In a possible implementation, each first single-core processor 2021 obtains the area of each of all the candidate frames. Because the area of each candidate frame is calculated independently, single instruction, multiple data (SIMD) instructions can be used so that each first single-core processor 2021 in the multi-core processor 202 calculates the areas of multiple candidate frames at the same time. In another possible implementation, each first single-core processor 2021 calculates the area of each of its own partial candidate frames, and the areas of all the candidate frames are obtained from the calculation results of the individual first single-core processors 2021. Each first single-core processor 2021 can obtain the areas calculated by the other first single-core processors 2021. In the above example, the first of the first single-core processors 2021 can obtain the area of each candidate frame with sequence numbers 21 to 100. In a possible implementation, one single-core processor obtains the area of each of all the candidate frames and sends these areas to the other first single-core processors 2021.
Each candidate frame has coordinates that indicate its position on the feature map, and the area of a candidate frame can be obtained from its coordinates. A candidate frame is usually represented by the coordinates of its upper left corner and lower right corner. If the upper left corner is (x1, y1) and the lower right corner is (x2, y2), the area of the candidate frame can be expressed as (x2 - x1 + 1) * (y2 - y1 + 1). FIG. 4 is a schematic flowchart of obtaining the areas of multiple candidate frames simultaneously through SIMD instructions. The example of FIG. 4 includes 16 candidate frames; for each candidate frame, the abscissa and ordinate of the upper left corner are x1 and y1, and the abscissa and ordinate of the lower right corner are x2 and y2. According to the formula (x2 - x1 + 1) * (y2 - y1 + 1), the area of each of the 16 candidate frames can be obtained at the same time.
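As a small illustration of the area formula, the sketch below applies (x2 - x1 + 1) * (y2 - y1 + 1) element-wise to four coordinate lists, mirroring the one-lane-per-frame layout of a SIMD instruction. A plain Python list comprehension is only a stand-in for real SIMD hardware, and the function name and sample coordinates are made up for the example.

```python
def frame_areas(x1, y1, x2, y2):
    """Element-wise (x2 - x1 + 1) * (y2 - y1 + 1) over equally long coordinate lists."""
    return [(right - left + 1) * (bottom - top + 1)
            for left, top, right, bottom in zip(x1, y1, x2, y2)]


# Three hypothetical frames, stored column-wise as a SIMD layout would be.
areas = frame_areas(x1=[0, 10, 3], y1=[0, 10, 4], x2=[9, 19, 3], y2=[4, 29, 4])
print(areas)  # [50, 200, 1]
```

The "+ 1" terms treat coordinates as inclusive pixel indices, so a frame whose two corners coincide has area 1 rather than 0.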
After the area of each of all the candidate frames is obtained, the overlap ratio between two candidate frames must also be calculated in order to determine the suppression relationship between them. Specifically, the overlapping area between each of the partial candidate frames and each of all the candidate frames is obtained, and the combined area (the sum of the two frames' areas) of each of the partial candidate frames and each of all the candidate frames is obtained. The suppression relationship between each of the partial candidate frames and each of all the candidate frames is then obtained by comparing the ratio of the overlapping area to the combined area with a preset threshold. This is described in detail below.
Each first single-core processor 2021 calculates, for each candidate frame it has obtained, the overlapping area between that candidate frame and all the candidate frames. Assume the overlapping area between a first candidate frame and each of the other candidate frames is currently being calculated. Let the upper left corner of the first candidate frame be (sel-x1, sel-y1) and its lower right corner be (sel-x2, sel-y2), and let the upper left corner of one of all the candidate frames be (x1, y1) and its lower right corner be (x2, y2). The overlapping area between the two candidate frames can then be expressed as [min(sel-x2, x2) - max(sel-x1, x1) + 1] * [min(sel-y2, y2) - max(sel-y1, y1) + 1]. FIG. 5 is a schematic flowchart of obtaining, through SIMD instructions, the overlapping area between the first candidate frame and each of all the candidate frames simultaneously. The example of FIG. 5 includes 16 candidate frames. The upper left corner of the currently selected candidate frame (the first candidate frame) is (sel-x1, sel-y1) and its lower right corner is (sel-x2, sel-y2); the upper left corner of each of all the candidate frames is (x1, y1) and its lower right corner is (x2, y2). According to the formula [min(sel-x2, x2) - max(sel-x1, x1) + 1] * [min(sel-y2, y2) - max(sel-y1, y1) + 1], the overlapping area between the selected candidate frame and each of all the candidate frames can be obtained at the same time.
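The overlap formula can be sketched as below, with the selected frame compared against a list of frames in one pass. One detail is an assumption that the text does not state: the width and height are clamped at 0. Without the clamp, two frames that are disjoint along both axes would give two negative factors whose product is positive. The function name and sample frames are hypothetical.

```python
def overlap_areas(sel, frames):
    """Overlap of the selected frame with each frame; frames are (x1, y1, x2, y2)."""
    sel_x1, sel_y1, sel_x2, sel_y2 = sel
    result = []
    for x1, y1, x2, y2 in frames:
        width = min(sel_x2, x2) - max(sel_x1, x1) + 1
        height = min(sel_y2, y2) - max(sel_y1, y1) + 1
        # Assumption (not stated in the text): clamp so disjoint frames give 0.
        result.append(max(width, 0) * max(height, 0))
    return result


overlaps = overlap_areas(
    (0, 0, 9, 9),
    [(0, 0, 9, 9), (5, 5, 14, 14), (20, 20, 29, 29)],  # identical, partial, disjoint
)
print(overlaps)  # [100, 25, 0]
```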
Each first single-core processor 2021 also calculates, for each candidate frame it has obtained, the combined area of that candidate frame and each of all the candidate frames. Obtaining the area of each candidate frame from its coordinates has been described above; adding the areas of any two candidate frames gives their combined area. Specifically, let the upper left corner of the first candidate frame be (sel-x1, sel-y1) and its lower right corner be (sel-x2, sel-y2), and let the upper left corner of one of all the candidate frames be (x1, y1) and its lower right corner be (x2, y2). The combined area of the two candidate frames can then be expressed as [(x2 - x1 + 1) * (y2 - y1 + 1)] + [((sel-x2) - (sel-x1) + 1) * ((sel-y2) - (sel-y1) + 1)]. In a possible implementation, the combined area of the first candidate frame and each of all the candidate frames can be obtained simultaneously through SIMD instructions according to the same formula.
The suppression relationship between each candidate frame and the other candidate frames is obtained by comparing, against a preset threshold, the ratio of the overlapping area between the two candidate frames to their combined area. If the ratio is less than the threshold, the overlap between the two candidate frames is small and the suppression relationship between them is "not suppressed"; specifically, the candidate frame with the lower confidence is not suppressed by the candidate frame with the higher confidence. If the ratio is greater than the threshold, the overlap between the two candidate frames is large and the suppression relationship between them is "suppressed"; specifically, the candidate frame with the lower confidence is suppressed by the candidate frame with the higher confidence.
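Putting the three quantities together, the threshold test can be sketched as follows. The sketch follows the text literally and divides the overlap by the plain sum of the two areas; note that the conventional intersection-over-union measure instead subtracts the overlap from that sum, so the two ratios are not interchangeable. The helper names, the clamp at 0, and the handling of a ratio exactly equal to the threshold (treated here as "not suppressed") are assumptions.

```python
def area(frame):
    x1, y1, x2, y2 = frame
    return (x2 - x1 + 1) * (y2 - y1 + 1)


def overlap(a, b):
    width = min(a[2], b[2]) - max(a[0], b[0]) + 1
    height = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return max(width, 0) * max(height, 0)  # assumption: clamp disjoint frames to 0


def is_suppressed(higher, lower, threshold):
    """True if the lower-confidence frame is suppressed by the higher-confidence one."""
    ratio = overlap(higher, lower) / (area(higher) + area(lower))
    return ratio > threshold  # assumption: equality counts as "not suppressed"


print(is_suppressed((0, 0, 9, 9), (0, 0, 9, 9), 0.3))    # ratio 100 / 200 = 0.5
print(is_suppressed((0, 0, 9, 9), (5, 5, 14, 14), 0.3))  # ratio 25 / 200 = 0.125
```

With this denominator the ratio never exceeds 0.5 (two identical frames), so a usable threshold has to be chosen below that value.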
In a possible implementation, each of the partial candidate frames is taken in turn, in order of sequence number, as the selected candidate frame, and the suppression relationship between the selected candidate frame and each of all the candidate frames is obtained. In a possible implementation, multiple candidate frames may instead be selected at the same time, and the suppression relationships between each of the selected candidate frames and each of all the candidate frames are obtained in one pass.
In a possible implementation, the suppression relationship between the selected candidate frame and each of all the candidate frames may be represented by a bit sequence. For example, each bit sequence represents the suppression relationship between a first candidate frame and the second candidate frames, where the first candidate frame is one of the partial candidate frames and the second candidate frames are each of all the candidate frames. Each candidate frame corresponds to one bit sequence, and each bit sequence includes M positions, where M is the total number of candidate frames. Assume the selected candidate frame is the first candidate frame and its bit sequence is the first bit sequence; each of the M positions of this bit sequence represents the suppression relationship between one of all the candidate frames and the first candidate frame. In a possible implementation, the suppression relationship between the first candidate frame and each candidate frame is determined in order of the candidate frames' sequence numbers. In a possible implementation, each of the M positions may indicate the suppression relationship using multiple bits. In a possible implementation, to reduce the amount of computation, reduce the amount of data to be transmitted and speed up the algorithm, each of the M positions may indicate the suppression relationship using 1 bit: for example, a bit value of 0 indicates that the second candidate frame is suppressed by the first candidate frame, and a bit value of 1 indicates that the second candidate frame is not suppressed by the first candidate frame.
To show this representation more intuitively, refer to FIG. 6, which is a schematic diagram of representing the suppression relationship between the first candidate frame and the second candidate frames by a bit sequence in an embodiment of the present application. In this embodiment, it is assumed that for each candidate frame the suppression relationships are determined in ascending order of sequence number. As shown in FIG. 6, assume there are 16 candidate frames in total. In the first bit sequence, which corresponds to the candidate frame with sequence number 1, positions 1, 3, 6, 9, 12, 13 and 15 are 1 and the remaining positions are 0. The candidate frames with sequence numbers 3, 6, 9, 12, 13 and 15 are therefore not suppressed by the candidate frame with sequence number 1, and the candidate frames with the remaining sequence numbers, namely 2, 4, 5, 7, 8, 10, 11, 14 and 16, are suppressed by it. Similarly, in the second bit sequence, which corresponds to the candidate frame with sequence number 2, positions 1, 2, 4 to 7, 9 and 14 are 1 and the remaining positions are 0. The candidate frames with sequence numbers 1, 4 to 7, 9 and 14 are therefore not suppressed by the candidate frame with sequence number 2, and the candidate frames with the remaining sequence numbers, namely 3, 8, 10 to 13, 15 and 16, are suppressed by it.
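Reading a 1-bit-per-position sequence can be sketched as below, using the first bit sequence of the FIG. 6 example (0 marks "suppressed", 1 marks "not suppressed"). The list-of-ints representation and the helper name `split_by_bit` are illustrative assumptions; a real implementation would more likely pack the bits into machine words.

```python
M = 16
ONES_IN_FIRST_SEQUENCE = {1, 3, 6, 9, 12, 13, 15}  # positions set to 1 in FIG. 6

first_sequence = [1 if pos in ONES_IN_FIRST_SEQUENCE else 0 for pos in range(1, M + 1)]


def split_by_bit(bits, owner):
    """Return (not suppressed, suppressed) sequence numbers for the owner's bit sequence."""
    kept = [pos for pos, bit in enumerate(bits, start=1) if bit == 1 and pos != owner]
    suppressed = [pos for pos, bit in enumerate(bits, start=1) if bit == 0]
    return kept, suppressed


kept, suppressed = split_by_bit(first_sequence, owner=1)
print("not suppressed by frame 1:", kept)
print("suppressed by frame 1:", suppressed)
```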
In a possible implementation, M bit sequences may be initialized: each first single-core processor 2021 is configured with a first number of initial bit sequences, where the first number is the number of partial candidate frames obtained by that first single-core processor 2021. When suppression relationships are obtained, a candidate frame does not need its suppression relationship with itself, nor with the candidate frames ranked ahead of it by confidence. Therefore, when a bit sequence is initialized, the first P of its M bits are all set to indicate that the first candidate frame is suppressed by the second candidate frame, and the last M-P bits are all set to indicate that the first candidate frame is not suppressed by the second candidate frame, where P is the sequence number of the first candidate frame in the ordering of the partial candidate frames. For example, referring to Table 1, assume there are 15 candidate frames in total, each with its own sequence number; the 15 sequence numbers are consecutive, and a smaller sequence number indicates a higher confidence. When the bit sequence with sequence number 7 is initialized using 1-bit indication, the first 7 bits are 1 and the last 8 bits are 0.
Table 1:
1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
In a possible implementation, because a candidate frame does not need its suppression relationship with itself, nor with the candidate frames ranked ahead of it by confidence, each bit sequence may instead be processed with a check bit sequence after the bit sequence of each candidate frame has been obtained. In the check bit sequence, the first P of the M bits are all set to indicate that the first candidate frame is suppressed by the second candidate frame, and the last M-P bits are all set to indicate that the first candidate frame is not suppressed by the second candidate frame, where P is the sequence number of the first candidate frame in the ordering of the partial candidate frames. For example, Table 1 can also be read as the check bit sequence of the bit sequence with sequence number 7: a bitwise OR of this check bit sequence and the bit sequence with sequence number 7 gives the checked bit sequence with sequence number 7.
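The check step can be sketched as building a mask whose first P bits are 1 and whose last M-P bits are 0, then combining it with a computed bit sequence by bitwise OR. The function names and the sample `computed` sequence are hypothetical; only the mask shape and the OR operation come from the text.

```python
def check_sequence(p, m):
    """Mask with the first p bits set to 1 and the last m - p bits set to 0."""
    return [1] * p + [0] * (m - p)


def apply_check(bits, p):
    """Bitwise OR of a computed bit sequence with its check sequence."""
    return [bit | mask for bit, mask in zip(bits, check_sequence(p, len(bits)))]


# Sequence number 7 out of 15 candidate frames, matching Table 1.
mask = check_sequence(7, 15)
computed = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]  # made-up input
checked = apply_check(computed, 7)
print(mask)
print(checked)
```

The OR forces the first P positions to 1 whatever was computed for them and leaves the last M-P positions unchanged.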
Through the above steps, each first single-core processor 2021 obtains N bit sequences, where N here denotes the number of partial candidate frames obtained by that first single-core processor 2021. Across all the first single-core processors 2021, M bit sequences are obtained in total.
Each first single-core processor 2021 sends to the second single-core processor 203 the suppression relationships it has obtained between each of its partial candidate frames and each of all the candidate frames. In a possible implementation, each first single-core processor 2021 sends its N bit sequences to the second single-core processor 203. In a possible implementation, each first single-core processor 2021 sends its N checked bit sequences to the second single-core processor 203. In a possible implementation, each first single-core processor 2021 sends its N bit sequences to a storage module; the storage module sorts the bit sequences by the sequence numbers of the candidate frames, and the sorted bit sequences are used as the input of the second single-core processor 203. When the second single-core processor is itself one of the first single-core processors, it uses the N bit sequences it generated itself and receives the N bit sequences sent by each of the other first single-core processors; this is not repeated below.
The second single-core processor 203 is configured to obtain the final candidate frames according to the suppression relationships obtained by each first single-core processor 2021 and the confidence of each candidate frame.
In a possible implementation, the second single-core processor 203 first obtains the bit sequence of the candidate frame with the highest confidence. From this bit sequence it obtains the suppression relationship between each of all the candidate frames and the candidate frame with the highest confidence, and thereby the candidate frames that are not suppressed by it; these enter the next round of screening. In the next round, the bit sequence of the candidate frame with the highest confidence among the remaining candidate frames is selected, the candidate frames not suppressed by that candidate frame are obtained from its bit sequence, and these enter the following round. The process is repeated until a preset stop condition is met. The stop condition may be that a preset number of final candidate frames has been output, that all the candidate frames have been traversed, or another configured condition.
To illustrate, refer again to FIG. 6. The candidate frame with sequence number 1 has the highest confidence of all the candidate frames. From the first bit sequence, which corresponds to sequence number 1, it follows that the candidate frames with sequence numbers 3, 6, 9, 12, 13 and 15 are not suppressed by the candidate frame with sequence number 1, so these candidate frames enter the next round of screening. Among them, the candidate frame with sequence number 3 has the highest confidence, so the suppression relationships between it and the candidate frames with sequence numbers 6, 9, 12, 13 and 15 are obtained from the bit sequence corresponding to sequence number 3 (for example, the third bit sequence). After these two rounds of screening, two final candidate frames have been obtained: the candidate frame with sequence number 1 and the candidate frame with sequence number 3. The screening process is repeated in the same way until the preset stop condition is met.
In a possible implementation, a sequence of sequence numbers is generated according to the candidate frame sequence.
The sequence-number sequence is screened multiple times to obtain multiple sequence numbers, which are used to obtain the final candidate frames. Any one of the screenings includes: obtaining the bit sequence of a specific candidate frame according to the sequence number obtained in the previous screening, where the bit sequence of the specific candidate frame represents the suppression relationship between the specific candidate frame and the second candidate frame; and obtaining an updated global sequence according to the bit sequence of the specific candidate frame and the global sequence obtained in the previous screening. The updated global sequence indicates at least one sequence number, and the candidate frame corresponding to each such sequence number is suppressed neither by the specific candidate frame nor by the other candidate frames ranked before it. The following uses 1 bit per suppression relationship as an example.
For example, all candidate frames are sorted in descending order of confidence so that each candidate frame corresponds to one sequence number; each sequence number in the sequence-number sequence is then the same as the sequence number of the corresponding candidate frame. Suppose there are 15 candidate frames in total. After they are sorted in descending order of confidence, the first candidate frame has sequence number 1, so sequence number 1 in the sequence-number sequence is used to obtain the bit sequence of the first candidate frame; likewise, the seventh candidate frame has sequence number 7, so sequence number 7 is used to obtain the bit sequence of the seventh candidate frame.
The second single-core processor 203 initializes the global sequence. The initialized global sequence contains M bits, each set to 1, indicating that in the initial state no candidate frame is suppressed, that is, every candidate frame is retained. The first to the M-th bits respectively represent the suppression relationship between the currently selected candidate frame and each candidate frame, ordered from high to low confidence.
The judgment starts from the bit sequence of the candidate frame with sequence number 1: that bit sequence is ANDed with the global sequence, and the result is taken as the updated global sequence. Refer to FIG. 7a, whose example contains 16 candidate frames in total. After the bit sequence of candidate frame 1 is ANDed with the global sequence, the updated global sequence is obtained. A 0 at a position in the updated global sequence indicates that the candidate frame at that position is suppressed by candidate frame 1; a 1 indicates that it is not. In the example of FIG. 7a, the candidate frames with sequence numbers 2 to 6, 10, 12 and 16 are not suppressed by candidate frame 1, and the candidate frames with the other sequence numbers are suppressed by it. The global sequence after the first update therefore yields sequence numbers 2 to 6, 10, 12 and 16. Because sequence numbers were assigned in descending order of confidence, a smaller sequence number means a higher confidence.
Next, the judgment continues with the bit sequence of the candidate frame with sequence number 2: it is ANDed with the global sequence obtained after the first update, and the result is the global sequence after the second update. Refer to FIG. 7b, which continues the example of FIG. 7a. Here, the candidate frames with sequence numbers 4 to 6 are not suppressed by candidate frame 2, and the candidate frames with the other sequence numbers are suppressed by it. The suppression relationship between candidate frame 1 and candidate frame 2 has already been judged and is not judged again; candidate frame 1 has already been selected as the first final candidate frame. The global sequence after the second update yields sequence numbers 4 to 6.
Then the judgment continues with the bit sequence of the candidate frame with sequence number 4: it is ANDed with the global sequence obtained after the second update, and the result is the global sequence after the third update, from which the sequence number for the next screening is obtained, and so on. Each iteration selects the sequence numbers that remain after the AND operation and the sequence number chosen for determining a final candidate frame, and the remaining sequence numbers are then screened again in the next iteration. This loop is repeated until the stop condition is met, at which point the final candidate frames are obtained from the sequence numbers of the positions holding 1 in the last updated global sequence.
Refer to FIG. 7c. Assuming the finally obtained global sequence is as shown in FIG. 7c, the positions holding 1 correspond to sequence numbers 1 to 6, 10, 12 and 15, so the candidate frames with sequence numbers 1 to 6, 10, 12 and 15 are determined to be the final candidate frames. If sequence numbers were assigned in descending order of confidence, the final candidate frames are those ranked 1st to 6th, 10th, 12th and 15th by confidence.
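The AND-based screening loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the apparatus's actual implementation: each bit sequence is held as a Python integer, bit j (counting from the least significant bit) stands for sequence number j+1, a 1 means "not suppressed", and the bits for a candidate frame itself and for all higher-confidence frames are preset to 1. The function name `screen`, the parameter `max_final`, and the sample masks are hypothetical.

```python
def screen(bit_rows, max_final=None):
    """Select final candidate frames by repeatedly ANDing bit sequences.

    bit_rows[i] is the bit sequence of the frame with sequence number i+1
    (assumed layout: bit j set means frame j+1 is NOT suppressed by it).
    Frames are assumed to be sorted by descending confidence.
    """
    m = len(bit_rows)
    global_seq = (1 << m) - 1          # initial state: every frame retained
    finals = []
    for i in range(m):
        if (global_seq >> i) & 1:      # frame i+1 survived all earlier rounds
            global_seq &= bit_rows[i]  # update the global sequence
            finals.append(i + 1)
            if max_final is not None and len(finals) >= max_final:
                break                  # stop condition: enough final frames
    return finals, global_seq


# Hypothetical 5-frame example: frame 1 suppresses frames 2 and 4,
# frame 3 suppresses frame 5.
rows = [0b10101, 0b11111, 0b01111, 0b11111, 0b11111]
finals, last_global = screen(rows)
```

When the loop runs over all frames, the positions holding 1 in the returned global sequence coincide with the selected sequence numbers, which matches the way the final candidate frames are read out of the last updated global sequence.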
In a possible implementation, the output of the second single-core processor 203 may be the coordinates of the final candidate frames, for example the coordinates of the upper-left corner and the lower-right corner of each final candidate frame.
While the second single-core processor 203 screens the sequence-number sequence multiple times, each screening actually needs only the first few sequence numbers, or only the first one. In the example above, the global sequence after the first update only needs to yield sequence number 2 for the next screening to proceed; sequence numbers 3 to 6, 10, 12 and 16 need not be obtained. Likewise, the global sequence after the second update only needs to yield sequence number 4; sequence numbers 5 and 6 need not be obtained. Therefore, in a possible implementation, an early-stop indicator may be designed into the apparatus for processing candidate frames using multiple cores. The indicator specifies that only a preset number of sequence numbers, for example 1 or 3, is obtained each time, which reduces redundant computation. Referring to FIG. 8, when the indicator specifies 3 sequence numbers, the apparatus obtains the sequence numbers of the first 3 positions holding 1 in the updated global sequence and then stops looking for further positions holding 1.
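The early-stop idea can be illustrated with a short Python sketch. It assumes the global sequence is stored as an integer whose bit j (least significant bit first) corresponds to sequence number j+1; the function name `first_set_positions` and its parameters are hypothetical and only stand in for the indicator's effect.

```python
def first_set_positions(global_seq, start, limit):
    """Return at most `limit` bit positions >= start that hold 1.

    Scanning stops as soon as `limit` positions are found, so the
    remaining 1 bits of the global sequence are never inspected.
    """
    found = []
    remaining = global_seq >> start
    pos = start
    while remaining and len(found) < limit:
        if remaining & 1:
            found.append(pos)
        remaining >>= 1
        pos += 1
    return found


# Hypothetical global sequence with 1s at bit positions 1, 2, 5 and 7.
print(first_set_positions(0b10100110, 0, 3))
```

With the indicator set to 3, only the first three positions are reported and the scan ends there; with the indicator set to 1, only the next surviving position is returned.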
In some scenarios, the amount of candidate frame data input to the apparatus for processing candidate frames using multiple cores is too large, or the multi-core processor 202 contains too few first single-core processors 2021, or similar, so that the amount of data a first single-core processor 2021 must process, or the amount of data it obtains, exceeds its maximum storage space. For these scenarios, in a possible implementation, the candidate frames may be input to the apparatus in multiple batches. For example, all candidate frames are split into multiple groups, such as a first group and a second group. Each first single-core processor 2021 obtains some candidate frames from the first group according to its own identification information and obtains the suppression relationship between each of those candidate frames and each of all the candidate frames; the second single-core processor 203 then obtains the final candidate frames of the first group according to the suppression relationships obtained by the first single-core processors 2021 and the confidence of each candidate frame in the first group. Each first single-core processor 2021 then obtains some candidate frames from the second group according to its own identification information and obtains the suppression relationship between each of those candidate frames and each of all the candidate frames; the second single-core processor 203 obtains the final candidate frames of the second group according to those suppression relationships and the confidence of each candidate frame in the second group.
Refer to FIG. 9 for an example. The multi-core processor 202 processes the first group of candidate frames and sends the processed data to the second single-core processor 203; this part can be understood with reference to the solutions described for FIG. 1 to FIG. 8 and is not repeated here. The multi-core processor 202 then processes the second group of candidate frames and sends the processed data to the second single-core processor 203. By this point the second single-core processor 203 has obtained the last updated global sequence for the first group (hereinafter the first-group global sequence). The first-group global sequence is ANDed with the bit sequence of the highest-confidence candidate frame in the second group to obtain the first updated global sequence for the second group; the subsequent iterations can be understood with reference to the solutions described for FIG. 1 to FIG. 8 and are not repeated here. In other words, the initial global sequence for the second group is the first-group global sequence. All final candidate frames can be obtained from the last updated global sequence of the second group.
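A rough Python sketch of carrying the global sequence from one group to the next is given below. It rests on several assumptions that the text does not spell out: the groups are contiguous slices of the confidence-sorted candidate frames, every bit sequence still spans all M frames, bit j (least significant bit first) stands for sequence number j+1, and a surviving frame is skipped if its own bit in the global sequence is already 0. The names `process_group` and `process_in_groups` are hypothetical.

```python
def process_group(global_seq, group_rows, offset):
    """Screen one group, starting from the global sequence left by earlier groups."""
    finals = []
    for local, row in enumerate(group_rows):
        idx = offset + local                # position of this frame among all frames
        if (global_seq >> idx) & 1:         # not suppressed so far
            global_seq &= row
            finals.append(idx + 1)          # report the 1-based sequence number
    return finals, global_seq


def process_in_groups(groups, total):
    """Feed groups one after another; the global sequence persists across groups."""
    global_seq = (1 << total) - 1           # initial state for the first group only
    finals, offset = [], 0
    for rows in groups:
        kept, global_seq = process_group(global_seq, rows, offset)
        finals.extend(kept)
        offset += len(rows)
    return finals, global_seq


# Hypothetical data: 5 frames split into a group of 3 and a group of 2.
groups = [[0b10101, 0b11111, 0b01111], [0b11111, 0b11111]]
finals, last_global = process_in_groups(groups, 5)
```

Because the second group starts from the first-group global sequence rather than from all 1s, frames in the second group that were already suppressed by first-group frames are never selected.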
In a possible implementation, if the number of final candidate frames obtained from the first group of candidate frames already meets a preset requirement, final candidate frames need not be obtained from the second group. Note also that the two groups above are only an example; in practice, the candidate frames may be split into more groups.
In a possible implementation, all candidate frames may instead be input to the apparatus at once, with the total number of candidate frames taken by the first single-core processors 2021 being less than the total number of candidate frames, for example half of it. How each first single-core processor 2021 processes the candidate frames it takes, and how the second single-core processor 203 outputs the final candidate frames according to the data sent by the first single-core processors 2021, has been described in detail above and is not repeated here. In a possible implementation, when the total number taken by the first single-core processors 2021 is less than the total number of candidate frames, the remaining candidate frames need not be processed if the second single-core processor 203 has already output a preset number of final candidate frames. For example, when the first single-core processors 2021 together take half of all candidate frames and the second single-core processor 203 has already output the preset number of final candidate frames from that half, the apparatus does not process the other half.
Refer to FIG. 10, which is a schematic flowchart of a method for processing candidate frames using multiple cores according to an embodiment of this application. Specifically, the method includes the following steps:
1001. Obtain the candidate frames of the image to be detected.
How the candidate frames of the image to be detected are obtained can be understood with reference to the process described for FIG. 1 and is not repeated here.
In a preferred implementation, the obtained candidate frames have already been sorted in descending order of confidence, and each candidate frame corresponds to one sequence number in that order; a smaller sequence number represents a higher confidence. Referring to FIG. 11, 2560 candidate frames are obtained in total, each with its own sequence number, giving 2560 sequence numbers.
1002. Using the respective identification information of the multiple first single-core processors, obtain some candidate frames from all the candidate frames for each of them.
Because the first single-core processors have different identification information, the candidate frames each of them obtains are different. Note that every first single-core processor may obtain all the candidate frames, but treats only its own portion as candidate frames to be processed, and processes them according to step 1003. Continuing to refer to FIG. 11, suppose there are X first single-core processors in total, and each takes 16 of the candidate frames as its candidate frames to be processed. The bit values shown in the bit sequences are illustrative only and do not represent real suppression relationships.
1003. Through each first single-core processor, obtain the suppression relationship between each candidate frame in its portion and each of all the candidate frames.
1004. Through the second single-core processor, obtain the final candidate frames according to the suppression relationships obtained by each first single-core processor and the confidence corresponding to each candidate frame.
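Step 1002 leaves open how identification information maps to a portion of the candidate frames. One simple possibility, shown here purely as an assumption, is that a core with numeric identifier k takes the k-th contiguous block of 16 sequence numbers. The function name `candidates_for_core` and the numeric identifier are hypothetical.

```python
def candidates_for_core(core_id, per_core, total):
    """Return the 0-based candidate indices assigned to one core.

    Assumed mapping: core k takes the contiguous block
    [k * per_core, (k + 1) * per_core), clipped to `total`.
    """
    start = core_id * per_core
    return range(min(start, total), min(start + per_core, total))


# With 2560 candidate frames and 16 frames per core, 160 cores cover all frames.
block = candidates_for_core(1, 16, 2560)
print(block[0], block[-1])
```

Other mappings, such as interleaving sequence numbers across cores, would satisfy the same requirement that different identifiers yield different portions.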
In a preferred implementation, all first single-core processors execute exactly the same procedure. Refer to FIG. 12, which is a schematic flowchart of a first single-core processor obtaining the suppression relationships between candidate frames in an embodiment of this application. Taking one of them as an example, the first single-core processor may perform the following steps:
1201. Obtain the area of each candidate frame.
1202. Initialize the bit sequences according to the sequence numbers of its portion of the candidate frames.
1203. Calculate the overlap area between each candidate frame in its portion and each of all the candidate frames.
1204. Calculate the combined area of each candidate frame in its portion and each of all the candidate frames.
1205. Calculate the ratio of the overlap area to the combined area to obtain the bit sequence of each candidate frame.
Steps 1201 to 1205 can be understood with reference to the related steps performed by the first single-core processor 2021 in the embodiments corresponding to FIG. 2a and FIG. 2b, and are not repeated here. As an example, continuing to refer to FIG. 11: when obtaining suppression relationships, a candidate frame does not need its suppression relationship with itself, nor with the candidate frames ranked before it by confidence. Therefore, in the bit sequence output in step 1205 for the candidate frame with sequence number P, the first M-P bits need not be considered in the calculation; for example, they may all be set to 1. In addition, continuing to refer to FIG. 11, the bit sequences output by the first single-core processors may be arranged by sequence number into a bit sequence matrix, which may be stored in a storage unit as the input of the second single-core processor. Note that the values shown in the bit sequence matrix are illustrative and do not represent real suppression relationships.
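Steps 1201 to 1205 can be sketched in Python as below. The sketch makes several assumptions that are not fixed by this passage: frames are given as (x1, y1, x2, y2) corner coordinates, the combined area is taken as the union (sum of both areas minus the overlap), a frame is marked as suppressed when the ratio exceeds a threshold of 0.5, and bit j (least significant bit first) stands for sequence number j+1. The function names and the threshold value are hypothetical.

```python
def area(box):
    """Area of a frame given as (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def overlap_area(a, b):
    """Area of the intersection of two frames (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def bit_sequence(p, boxes, threshold=0.5):
    """Bit sequence of frame p (0-based) against all frames; 1 = not suppressed."""
    m = len(boxes)
    areas = [area(b) for b in boxes]               # step 1201
    bits = (1 << m) - 1                            # step 1202: start with all 1s
    for j in range(p + 1, m):                      # only lower-confidence frames
        inter = overlap_area(boxes[p], boxes[j])   # step 1203
        union = areas[p] + areas[j] - inter        # step 1204 (assumed: union)
        if union > 0 and inter / union > threshold:
            bits &= ~(1 << j)                      # step 1205: mark as suppressed
    return bits


# Hypothetical frames, already sorted by descending confidence.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
rows = [bit_sequence(p, boxes) for p in range(len(boxes))]
```

Each core would run `bit_sequence` only for the frames in its own portion; stacking the resulting rows by sequence number gives the bit sequence matrix that the second single-core processor consumes.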
Refer to FIG. 13, which is a schematic flowchart of the second single-core processor obtaining the final candidate frames in an embodiment of this application.
As shown in FIG. 13, the second single-core processor may perform the following steps:
1301. Generate the sequence-number sequence according to the sequence numbers of the candidate frame sequence, and initialize the global sequence.
1302. Obtain a candidate frame according to the sequence number obtained in the previous screening, AND the bit sequence of that candidate frame with the last updated global sequence, and update the global sequence again.
1303. Obtain the sequence number for the next screening according to the updated global sequence.
1304. Repeat step 1302 and step 1303 until the stop condition is met.
1305. Obtain the final candidate frames according to the last updated global sequence.
Steps 1301 to 1305 can be understood with reference to the related steps performed by the second single-core processor 203 in the embodiment corresponding to FIG. 2a, and are not repeated here. When the second single-core processor 203 is any one of the multiple first single-core processors 2021, steps 1301 to 1305 can be understood with reference to the related steps performed by the first single-core processor 2021 in the embodiment corresponding to FIG. 2b, and are not repeated here. Continuing to refer to FIG. 11, the second single-core processor screens the sequence-number sequence multiple times to obtain the final candidate frames.
The foregoing describes the apparatus for processing candidate frames using multiple cores and the method for processing candidate frames using multiple cores provided by this application. It can be understood that, to implement the foregoing functions, the apparatus includes corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that, in combination with the modules and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of this application.
In terms of hardware structure, the apparatus for processing candidate frames using multiple cores in FIG. 2a to FIG. 13 may be implemented by one physical device, jointly by multiple physical devices, or as a logical function module within one physical device. This is not specifically limited in the embodiments of this application.
In addition to the structure described above, the apparatus for processing candidate frames using multiple cores may also be implemented by the computer device in FIG. 14. FIG. 14 is a schematic diagram of the hardware structure of a computer device according to an embodiment of this application. The computer device includes a communication interface 1401 and a processor 1402, and may further include a memory 1403.
The communication interface 1401 may be any transceiver-type apparatus for communicating with other devices or communication networks. In this solution, an end-side device may use the communication interface 1401 to communicate with a server, for example to upload or download a model. In a possible implementation, the communication interface 1401 may communicate with the server using technologies such as Ethernet, a radio access network (RAN), or wireless local area networks (WLAN).
The processor 1402 includes, but is not limited to, one or more of a central processing unit (CPU), a network processor (NP), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 1402 is responsible for the communication line 1404 and general processing, and may also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions.
The memory 1403 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other compact disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor 1402 through the communication line 1404. The memory 1403 may also be integrated with the processor 1402. If the memory 1403 and the processor 1402 are separate devices, they are connected to each other; for example, they may communicate through a communication line. The communication interface 1401 and the processor 1402 may communicate through a communication line, or the communication interface 1401 may be directly connected to the processor 1402.
The communication line 1404 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors represented by the processor 1402 and a memory represented by the memory 1403. The communication line 1404 may also link together various other circuits, such as peripheral devices, voltage regulators, and power management circuits. These are well known in the art and are therefore not described further in this application. When the second single-core processor is any one of the multiple first single-core processors, the processor 1402 in this application may be considered to include the multi-core processor 202 and the second single-core processor 203 in the embodiment corresponding to FIG. 2a. When the second single-core processor is different from every one of the multiple first single-core processors, the processor 1402 in this application may be considered to include the first single-core processor 2021 in the embodiment corresponding to FIG. 2b. It should be understood that the foregoing is only one example provided by the embodiments of this application, and that the apparatus for processing candidate frames using multiple cores may have more or fewer components than those shown, may combine two or more components, or may be implemented with a different configuration of components.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The apparatus for processing candidate frames using multiple cores provided in the embodiments of this application may be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. When the training device is a chip, the processing unit may execute computer-executable instructions stored in a storage unit, so that the chip performs the steps performed by the apparatus for processing candidate frames using multiple cores described in the embodiments shown in FIG. 2a to FIG. 13. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache. The storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, the foregoing processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus the necessary general-purpose hardware, and may of course also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can readily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function may take various forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For this application, however, a software implementation is the preferred implementation in most cases. Based on this understanding, the technical solutions of this application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores a program for training a model, and when the program runs on a computer, the computer is caused to perform the steps in the method described in the embodiment shown in FIG. 9 or FIG. 10 above.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores a program for data processing, and when the program runs on a computer, the computer is caused to perform the steps in the method described in the embodiments shown in FIG. 6, FIG. 8, FIG. 9, and FIG. 11 above, or the steps in the method described in the embodiment shown in FIG. 12 above.
An embodiment of this application further provides a digital processing chip. The digital processing chip integrates one or more interfaces and a circuit for implementing the foregoing processor or the functions of the processor. When a memory is integrated in the digital processing chip, the chip can perform the method steps of any one or more of the foregoing embodiments. When no memory is integrated in the digital processing chip, the chip can be connected to an external memory through a communication interface, and it implements the actions performed by the training device/translation device in the foregoing embodiments according to the program code stored in the external memory.
An embodiment of this application further provides a computer program product that includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk (SSD)).
A person of ordinary skill in the art can understand that all or some of the steps in the methods of the foregoing embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a ROM, a RAM, a magnetic disk, an optical disc, or the like.
The apparatus and method for processing candidate frames using multiple cores provided in the embodiments of this application have been described in detail above. Specific examples are used in this document to explain the principles and implementations of this application, and the description of the foregoing embodiments is intended only to help in understanding the method of this application and its core idea. A person of ordinary skill in the art may, based on the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting this application.
The terms "first", "second", and the like in the specification, the claims, and the accompanying drawings of this application are used to distinguish between similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so termed are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in an order other than the one illustrated or described. The term "and/or" in this application merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" in this document generally indicates an "or" relationship between the objects before and after it. Furthermore, the terms "include" and "have", and any variants thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or modules is not necessarily limited to the steps or modules expressly listed, and may include other steps or modules that are not expressly listed or that are inherent to the process, method, product, or device. The naming or numbering of steps in this application does not mean that the steps in a method flow must be performed in the temporal or logical order indicated by the naming or numbering. The order in which named or numbered steps are performed may be changed according to the technical purpose to be achieved, provided that the same or a similar technical effect is achieved. The division into modules in this application is a logical division, and other divisions are possible in an actual implementation. For example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some ports, and the indirect coupling or communication connection between modules may be electrical or of another similar form; none of this is limited in this application. Moreover, modules or sub-modules described as separate components may or may not be physically separate, may or may not be physical modules, or may be distributed across multiple circuit modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of this application.

Claims (20)

  1. An apparatus for processing candidate frames using multiple cores, wherein the apparatus comprises a plurality of single-core processors;
    a plurality of first single-core processors among the plurality of single-core processors are configured to execute the following process in parallel:
    obtaining partial candidate frames from all candidate frames of an image to be detected; and
    obtaining a suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames; and
    a second single-core processor among the plurality of single-core processors is configured to obtain a target candidate frame according to the suppression relationships obtained by each of the first single-core processors and a confidence corresponding to each candidate frame in all the candidate frames, wherein the second single-core processor is any one of the plurality of first single-core processors, or the second single-core processor is different from every one of the plurality of first single-core processors.
  2. The apparatus according to claim 1, wherein each of the first single-core processors is specifically configured to:
    obtain the partial candidate frames from all the candidate frames according to its own identification information.
  3. The apparatus according to claim 2, wherein each of the first single-core processors is specifically configured to:
    obtain, according to its own identification information, the same number of adjacently ordered partial candidate frames from a candidate frame sequence, wherein the candidate frame sequence indicates the relative confidence levels of all the candidate frames.
  4. The apparatus according to any one of claims 1 to 3, wherein the second single-core processor is further configured to:
    obtain N bit sequences from each of the first single-core processors, wherein N is the number of the partial candidate frames, and the N bit sequences represent the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames.
  5. The apparatus according to claim 4, wherein each bit sequence comprises M bits, M is the number of all the candidate frames, the M bits represent the suppression relationship between a first candidate frame and each candidate frame in all the candidate frames, and the first candidate frame is one of the partial candidate frames.
  6. The apparatus according to claim 4 or 5, wherein the second single-core processor is specifically configured to:
    obtain a to-be-processed bit sequence from the obtained bit sequences according to the sequence numbers of the obtained bit sequences, wherein the sequence number of the to-be-processed bit sequence is determined according to the sequence number of a second candidate frame, the to-be-processed bit sequence represents the suppression relationship between the second candidate frame and each candidate frame in all the candidate frames, the second candidate frame is one of all the candidate frames, and the sequence number of each candidate frame in all the candidate frames is obtained according to the candidate frame sequence; and
    obtain the target candidate frame according to the to-be-processed bit sequence and processed bit sequences, wherein the target candidate frame is a candidate frame that is suppressed neither by the second candidate frame nor by any other candidate frame ordered before the second candidate frame, and each processed bit sequence represents the suppression relationship between one of the other candidate frames ordered before the second candidate frame and each candidate frame in all the candidate frames.
  7. The apparatus according to any one of claims 1 to 6, wherein the partial candidate frames obtained by the first single-core processors together constitute all the candidate frames.
  8. The apparatus according to any one of claims 1 to 7, wherein each of the first single-core processors is specifically configured to:
    obtain the area of each candidate frame in all the candidate frames;
    obtain the overlap (intersection) area between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames;
    obtain the union area between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames; and
    obtain the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames according to how the ratio of the overlap area to the union area compares with a preset threshold.
  9. A method for processing candidate frames using multiple cores, comprising:
    executing the following process in parallel by a plurality of first single-core processors:
    obtaining, by each of the first single-core processors, partial candidate frames from all candidate frames of an image to be detected; and
    obtaining, by each of the first single-core processors, a suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames; and
    obtaining, by a second single-core processor, a target candidate frame according to the suppression relationships obtained by each of the first single-core processors and a confidence corresponding to each candidate frame in all the candidate frames, wherein the second single-core processor is any one of the plurality of first single-core processors, or the second single-core processor is different from every one of the plurality of first single-core processors.
  10. The method according to claim 9, wherein the obtaining, by each of the first single-core processors, partial candidate frames from all candidate frames of an image to be detected comprises:
    obtaining the partial candidate frames from a candidate frame sequence by using the respective identification information of each of the first single-core processors.
  11. The method according to claim 10, wherein the obtaining the partial candidate frames from a candidate frame sequence by using the respective identification information of each of the first single-core processors comprises:
    obtaining, by using the respective identification information of each of the first single-core processors, the same number of adjacently ordered partial candidate frames from the candidate frame sequence, wherein the candidate frame sequence indicates the relative confidence levels of all the candidate frames.
  12. The method according to any one of claims 9 to 11, wherein the method further comprises:
    obtaining, by the second single-core processor, N bit sequences from each of the first single-core processors, wherein the N bit sequences represent the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames.
  13. The method according to claim 12, wherein each bit sequence comprises M bits, M is the number of all the candidate frames, the M bits represent the suppression relationship between a first candidate frame and each candidate frame in all the candidate frames, and the first candidate frame is one of the partial candidate frames.
  14. The method according to claim 12 or 13, wherein the obtaining, by a second single-core processor, a target candidate frame according to the suppression relationships obtained by each of the first single-core processors and a confidence corresponding to each candidate frame in all the candidate frames comprises:
    obtaining a to-be-processed bit sequence from the obtained bit sequences according to the sequence numbers of the obtained bit sequences, wherein the sequence number of the to-be-processed bit sequence is determined according to the sequence number of a second candidate frame, the to-be-processed bit sequence represents the suppression relationship between the second candidate frame and each candidate frame in all the candidate frames, the second candidate frame is one of all the candidate frames, and the sequence number of each candidate frame in all the candidate frames is obtained according to the candidate frame sequence; and
    obtaining the target candidate frame according to the to-be-processed bit sequence and processed bit sequences, wherein the target candidate frame is a candidate frame that is suppressed neither by the second candidate frame nor by any other candidate frame ordered before the second candidate frame, and each processed bit sequence represents the suppression relationship between one of the other candidate frames ordered before the second candidate frame and each candidate frame in all the candidate frames.
  15. The method according to any one of claims 9 to 14, wherein the partial candidate frames obtained by the first single-core processors together constitute all the candidate frames.
  16. The method according to any one of claims 9 to 15, wherein the obtaining, by each of the first single-core processors, a suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames comprises:
    obtaining, by each of the first single-core processors, the area of each candidate frame in all the candidate frames;
    obtaining, by each of the first single-core processors, the overlap (intersection) area between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames;
    obtaining, by each of the first single-core processors, the union area between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames; and
    obtaining, by each of the first single-core processors, the suppression relationship between each candidate frame in the partial candidate frames and each candidate frame in all the candidate frames according to how the ratio of the overlap area to the union area compares with a preset threshold.
  17. An apparatus for processing candidate frames using multiple cores, comprising:
    a memory, configured to store computer-readable instructions; and
    a processor coupled to the memory, configured to execute the computer-readable instructions in the memory so as to perform the method according to any one of claims 9 to 16.
  18. A chip system, comprising a processor and a communication interface, wherein the processor obtains program instructions through the communication interface, and the method according to any one of claims 9 to 16 is implemented when the program instructions are executed by the processor.
  19. A computer-readable storage medium, comprising a program that, when executed by a processing unit, performs the method according to any one of claims 9 to 16.
  20. A computer program product that, when run on a computer, causes the computer to perform the method according to any one of claims 9 to 16.
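
The flow recited in claims 1 to 8 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. All function names are hypothetical, frames are assumed to be `(x1, y1, x2, y2)` tuples, a frame is assumed to suppress another when the overlap-to-union ratio exceeds the threshold, and the "first single-core processors" are simulated by an ordinary loop rather than real parallel cores.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)

def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def iou(a: Box, b: Box) -> float:
    """Ratio of overlap (intersection) area to union area, as in claim 8."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, w) * max(0.0, h)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def core_slice(core_id: int, num_cores: int, m: int) -> range:
    """Adjacent block of the confidence-sorted sequence chosen by core ID.
    Assumption: if m is not a multiple of num_cores, the last cores get fewer."""
    n = (m + num_cores - 1) // num_cores
    return range(min(m, core_id * n), min(m, (core_id + 1) * n))

def suppression_rows(boxes: Sequence[Box], core_id: int,
                     num_cores: int, threshold: float) -> List[int]:
    """Work of one 'first' core: one M-bit integer per assigned frame.
    Bit j of a row is set when the assigned frame suppresses frame j."""
    rows = []
    for i in core_slice(core_id, num_cores, len(boxes)):
        bits = 0
        for j, other in enumerate(boxes):
            if j != i and iou(boxes[i], other) > threshold:
                bits |= 1 << j
        rows.append(bits)
    return rows

def select_targets(rows: Sequence[int]) -> List[int]:
    """Work of the 'second' core: serial scan in descending confidence."""
    suppressed = 0
    keep = []
    for i, bits in enumerate(rows):
        if (suppressed >> i) & 1:
            continue  # already suppressed by a higher-confidence kept frame
        keep.append(i)
        suppressed |= bits
    return keep

def nms_multicore(boxes: Sequence[Box], scores: Sequence[float],
                  num_cores: int = 2, threshold: float = 0.5) -> List[int]:
    """Returns indices (into the input) of the surviving target frames."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    sorted_boxes = [boxes[k] for k in order]
    rows: List[int] = []
    for core_id in range(num_cores):  # sequential stand-in for parallel cores
        rows.extend(suppression_rows(sorted_boxes, core_id, num_cores, threshold))
    return [order[i] for i in select_targets(rows)]
```

In this sketch the quadratic pairwise comparison is the part split across cores, while the final scan touches each bit sequence once, which is why it can be left to a single core.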
PCT/CN2021/074313 2021-01-29 2021-01-29 Apparatus and method for processing candidate boxes by using plurality of cores WO2022160229A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180092237.5A CN116762092A (en) 2021-01-29 2021-01-29 Apparatus and method for processing candidate frame by using multi-core
PCT/CN2021/074313 WO2022160229A1 (en) 2021-01-29 2021-01-29 Apparatus and method for processing candidate boxes by using plurality of cores

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/074313 WO2022160229A1 (en) 2021-01-29 2021-01-29 Apparatus and method for processing candidate boxes by using plurality of cores

Publications (1)

Publication Number Publication Date
WO2022160229A1 true WO2022160229A1 (en) 2022-08-04

Family

ID=82652853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074313 WO2022160229A1 (en) 2021-01-29 2021-01-29 Apparatus and method for processing candidate boxes by using plurality of cores

Country Status (2)

Country Link
CN (1) CN116762092A (en)
WO (1) WO2022160229A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409517A (en) * 2018-09-30 2019-03-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN109800809A (en) * 2019-01-22 2019-05-24 华南理工大学 A kind of candidate region extracting method decomposed based on dimension
CN110647794A (en) * 2019-07-12 2020-01-03 五邑大学 Attention mechanism-based multi-scale SAR image recognition method and device
CN111626916A (en) * 2020-06-01 2020-09-04 上海商汤智能科技有限公司 Information processing method, device and equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051548A (en) * 2023-03-14 2023-05-02 中国铁塔股份有限公司 Positioning method and device
CN116051548B (en) * 2023-03-14 2023-08-11 中国铁塔股份有限公司 Positioning method and device

Also Published As

Publication number Publication date
CN116762092A (en) 2023-09-15

Gu et al. Thermal image colorization using Markov decision processes
CN114155278A (en) Target tracking and related model training method, related device, equipment and medium
CN110135428A (en) Image segmentation processing method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21921836; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase
    Ref document number: 202180092237.5; Country of ref document: CN
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 21921836; Country of ref document: EP; Kind code of ref document: A1