WO2022193074A1 - Post-processing method and device for an RPN network - Google Patents

Post-processing method and device for an RPN network

Info

Publication number
WO2022193074A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
region
target candidate
score
candidate regions
Prior art date
Application number
PCT/CN2021/080811
Other languages
English (en)
French (fr)
Inventor
闫隆鑫
陈创荣
符雷
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2021/080811
Publication of WO2022193074A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/30 Post-processing

Definitions

  • The present application relates to the field of computer technology, and in particular to a post-processing method and device for an RPN network.
  • RPN: Region Proposal Network.
  • Post-processing of the RPN network is an important step in implementing a target detection algorithm.
  • The post-processing of the RPN network includes: 1. calculating the size of each candidate region (ROI, region of interest) according to the output of the RPN network and the preset prior regions; 2. filtering out ROIs that are too small; 3. sorting the ROIs by score and keeping the N ROIs with the highest scores; 4. performing non-maximum suppression (NMS) on the N ROIs to obtain the final M ROIs.
  • The present application provides a post-processing method and device for an RPN network, which can address the long processing time and low efficiency of the post-processing in the prior art.
  • an embodiment of the present application provides a post-processing method for an RPN network, including:
  • An embodiment of the present application provides a post-processing device for an RPN network, including: an acquisition module and a processing module;
  • the acquisition module is used to obtain the score of each candidate region output by the RPN network, and the offset between the candidate region and the corresponding prior region;
  • the processing module is configured to determine the N target candidate regions with the largest scores from all the candidate regions;
  • the present application provides a computer-readable storage medium, the computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method described in the above aspects.
  • the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method described in the above aspects.
  • In the present application, the candidate regions can first be sorted and screened according to the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions, so as to determine the N target candidate regions with the largest scores; the sizes of the N target candidate regions are then calculated and the overlap rate between the candidate regions is considered. Because the number N of target candidate regions is much smaller than the number of candidate regions output by the RPN network, the amount of calculation involved in computing candidate region sizes is greatly reduced, a large amount of redundant calculation is avoided, and the load on the processor is reduced.
  • FIG. 1 is a flowchart of a post-processing method for an RPN network provided by an embodiment of the present application
  • FIG. 2 is a specific flowchart of a post-processing method for an RPN network provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a TOP N algorithm provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another TOP N algorithm provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an NMS algorithm provided by an embodiment of the present application.
  • FIG. 6 is a block diagram of a post-processing apparatus of an RPN network provided by an embodiment of the present application.
  • A fixed-point neural network model using an RPN network can be used to improve the efficiency of target detection. The RPN network takes a feature map as input and outputs, for the candidate regions in the feature map that may contain a target, a score and an offset relative to the preset prior region.
  • The post-processing of the RPN network refers to computing on the output of the RPN network to finally obtain the specific sizes of M candidate regions (the value of M can be set according to actual needs).
  • Because the selection of these M candidate regions takes both the score and the overlap rate into account, the selected regions are of higher value.
  • the M candidate regions are subjected to subsequent pooling, classification and regression operations, and the detection category and position of the target can be output to complete the target detection.
  • Calculating the specific size of a candidate region involves converting the fixed-point data output by the RPN network into floating-point format and performing exponential operations in floating point, which sharply increases the processor load.
  • When the sizes of all candidate regions output by the RPN network are calculated first, the load on the processor is very high.
  • Moreover, the subsequent screening and sorting operations discard part of the candidate regions, so the computing resources spent on the sizes of the discarded regions are wasted, resulting in redundant computation.
  • In the embodiments of the present application, the candidate regions can first be sorted and screened according to the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions, so as to determine the N target candidate regions with the highest scores; the sizes of the N target candidate regions are then calculated and the overlap rate between candidate regions is considered. Since the number N of target candidate regions is much smaller than the number of candidate regions output by the RPN network, the amount of calculation involved in computing candidate region sizes is greatly reduced, a large amount of redundant calculation is avoided, and the load on the processor is reduced.
  • FIG. 1 is a flowchart of a post-processing method for an RPN network provided by an embodiment of the present application. As shown in FIG. 1 , the method may include:
  • Step 101 Obtain the score of the candidate region output by the RPN network, and the offset between the candidate region and the corresponding prior region.
  • The RPN network takes the feature map as input and outputs, for each candidate region in the feature map that may contain a target, the offset of the candidate region relative to the preset prior region, and a score reflecting how likely the candidate region is to match the prior region; the greater the probability that the candidate region matches the prior region, the greater the score.
  • A prior region is a region whose frame size and position are set in advance, using prior rules, according to the sizes and likely locations of common targets in the scene.
  • The candidate regions that the RPN network extracts from the feature map as possibly containing a target are matched with the prior regions to obtain the score of each candidate region and the offset between the candidate region and the corresponding prior region.
  • Step 102 From all the candidate regions, determine N target candidate regions with the largest scores.
  • sorting and screening operations can be performed first, so as to obtain the N (TOPN) target candidate regions with the largest scores.
  • the value of N can be set according to actual requirements. For example, when the computing resources are sufficient, the value of N can be relatively large, and when the computing resources are insufficient, the value of N can be relatively small. Sorting and filtering operations can be specifically performed by using a variety of sorting and filtering algorithms, such as a quick sort algorithm, a bucket sorting algorithm, and the like.
  • Step 103 Calculate the size of the target candidate region according to the size of the target priori region corresponding to the target candidate region and the target offset between the target candidate region and the target priori region.
  • Because sizes are calculated only for the N target candidate regions, the embodiment of the present application can greatly reduce the amount of calculation involved in computing candidate region sizes compared with the related art, which removes redundant computation and reduces the load on the processor.
  • Step 104 Calculate the overlap ratio between the target candidate regions according to the size of the target candidate regions, and select M target candidate regions from the N target candidate regions according to the overlap ratio.
  • The overlap ratio between the M target candidate regions is less than or equal to the preset overlap ratio threshold.
  • If the overlap ratio between two candidate regions is too large, the two candidate regions are highly similar and processing both is redundant.
  • Therefore, of two such candidate regions, the one with the larger score can be retained, and the M target candidate regions are selected from the retained regions.
  • the value of M can be set according to actual requirements. For example, when the computing resources are sufficient, the value of M can be relatively large, and when the computing resources are insufficient, the value of M can be relatively small.
  • This step can be completed using a variety of algorithms. For example, in one implementation, a non-maximum suppression (NMS) algorithm can be used.
  • The idea of NMS is to sort the target candidate regions by score and then, for a given target candidate region, exclude the lower-scoring target candidate regions whose overlap rate with it is high.
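  • The NMS idea described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: it assumes that the overlap rate is the intersection over union (IoU) of two axis-aligned boxes given as (x1, y1, x2, y2), and the names `iou` and `nms` are hypothetical.

```python
def iou(a, b):
    """Overlap ratio (assumed here to be IoU) of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold):
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    # Visit boxes in descending order of score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in kept):
            kept.append(i)
    return kept
```

  • With three boxes where the second heavily overlaps the first, `nms` keeps the first and third; the second is excluded because its score is lower and its overlap exceeds the threshold.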
  • The post-processing method of an RPN network provided by the embodiment of the present application obtains the score of each candidate region output by the RPN network and the offset between the candidate region and the corresponding prior region; determines, from all the candidate regions, the N target candidate regions with the largest scores; calculates the size of each target candidate region according to the size of the target prior region corresponding to it and the target offset between it and the target prior region; and, according to the sizes of the target candidate regions, calculates the overlap rate between the target candidate regions and selects M target candidate regions from the N target candidate regions according to the overlap rate, the overlap rate between the M target candidate regions being less than or equal to the preset overlap rate threshold.
  • The application can thus first sort and filter the candidate regions to determine the N target candidate regions with the highest scores, and only then calculate the sizes of the N target candidate regions and consider the overlap rate between them. Since N is much smaller than the number of candidate regions output by the RPN network, the amount of calculation involved in computing candidate region sizes is greatly reduced, a large amount of redundant calculation is avoided, and the load on the processor is reduced.
  • FIG. 2 is a specific flowchart of a post-processing method for an RPN network provided by an embodiment of the present application. As shown in FIG. 2 , the method may include:
  • Step 201 Obtain the score of the candidate region output by the RPN network, and the offset between the candidate region and the corresponding prior region.
  • For details of step 201, reference may be made to the foregoing step 101, which will not be repeated here.
  • Step 202 Establish a bucket sorting model according to the bucket sorting rule, based on the value range and dimensions of the scores of the candidate regions; the bucket sorting model is used to store index numbers and, when a preset termination condition is reached, to output the N target candidate regions with the largest scores.
  • The score of a candidate region includes: the score of the candidate region and its index numbers in the three dimensions.
  • In the related art, a sorting algorithm such as quick sort is used to obtain the N candidate regions with the highest scores; its algorithm complexity is O(n·log(n)), where n is the number of candidate regions participating in the sort.
  • When n is large, a complexity of O(n·log(n)) is relatively high, so this process depends heavily on computing resources and takes a long time on processors with weak computing performance.
  • The embodiment of the present application uses a bucket sorting algorithm to output the N target candidate regions with the largest scores.
  • The bucket sorting algorithm stores the elements of the set to be sorted that fall in the same value range into the same bucket; that is, it splits the set into multiple regions according to the values of the elements, and the buckets formed by the split are ordered with respect to the value range. If the elements within each bucket are then sorted, the elements of all buckets taken together are sorted.
  • The bucket sorting algorithm requires the values to be sorted to be discrete and finite, and it exploits this property to replace the sorting operation with a classification operation.
  • The algorithm complexity of the bucket sorting algorithm is O(n); compared with the original algorithm, it requires less memory and greatly reduces the amount of computation.
  • FIG. 3 shows a schematic diagram of a TOP N algorithm provided by an embodiment of the present application. The RPN network outputs the rpn_cls_score branch (score branch) and the rpn_bbox_pred branch (offset branch). On the rpn_cls_score branch, the scores of the candidate regions include scores corresponding to different prior regions, and each score corresponds to index numbers in three dimensions.
  • The three dimensions are the C dimension, the W dimension and the H dimension. The C dimension reflects the number of prior regions corresponding to each feature location in the feature map.
  • The W dimension reflects the number of rows in the feature map where prior regions exist.
  • The H dimension reflects the number of columns in the feature map where prior regions exist.
  • The index numbers of a candidate region in the three dimensions constitute an index, which is used to look up the size of the prior region corresponding to the candidate region for the subsequent calculation of the candidate region size.
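  • How three per-dimension index numbers can act as a single lookup key may be sketched as follows. The row-major (C, H, W) memory layout and the helper names `unravel` and `ravel` are assumptions made for illustration; the application does not specify the layout.

```python
def unravel(flat_index, shape):
    """Split a flat position into (c, h, w), assuming a row-major (C, H, W) layout."""
    _, H, W = shape
    c, rem = divmod(flat_index, H * W)
    h, w = divmod(rem, W)
    return c, h, w


def ravel(c, h, w, shape):
    """Inverse of unravel: combine (c, h, w) back into one flat position."""
    _, H, W = shape
    return (c * H + h) * W + w


# Hypothetical example: 3 prior regions per location on a 4 x 5 feature map.
shape = (3, 4, 5)
index = unravel(37, shape)          # index numbers in the C, H and W dimensions
flat = ravel(*index, shape)         # the same position as a single key
```

  • The flat key (or the tuple itself) can then address a table of prior region sizes and a table of offsets stored in the same order as the scores.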
  • The output of the RPN network is in a fixed-point format, so the output takes at most 256 integer values. For example, the output score ranges from -128 to 127, so the values on the rpn_cls_score branch are discrete and finite.
  • Therefore, the embodiment of the present application can adopt the idea of "bucket sorting" to keep only the largest N of all the score values, as a simplified and equivalent replacement for the conventional quick sort algorithm, so as to reduce the computational load and improve computational efficiency.
  • A bucket sorting model can be established according to the bucket sorting rules, based on the value range and dimensions of the scores of the candidate regions.
  • The bucket sorting model can store index numbers and, when a preset termination condition is reached, output the N target candidate regions with the largest scores. For example, for the rpn_cls_score branch, the output score ranges from -128 to 127 and corresponds to the three dimensions C, H and W. Therefore, three groups of "buckets" can be established, each group containing 256 "buckets", namely "bucket" -128, "bucket" -127, "bucket" -126, ..., "bucket" 126, "bucket" 127.
  • the value range of the score includes K scores, and step 202 may specifically include:
  • Sub-step 2021 Establish three storage area groups corresponding to the three dimensions to obtain the bucket sorting model.
  • Each storage area group includes K storage areas, which are in one-to-one correspondence with the K scores, and the upper limit of the capacity of each storage area is N. Each time an index number is stored in a storage area, the occupied capacity of that storage area is incremented by one; when the occupied capacity reaches N, the storage area is full.
  • Step 203 Input the index numbers of the scores of the candidate regions into the bucket sorting model, and obtain the N target candidate regions with the largest scores output by the bucket sorting model, together with the index numbers of the score of each target candidate region in the three dimensions.
  • The index numbers of the scores on the rpn_cls_score branch can be input into the bucket sorting model, and each score on the branch is compared with the score corresponding to each "bucket" in the model. If the two scores are equal, they match, and the index number corresponding to the score is stored in that "bucket". When the preset termination condition is reached, the three groups of "buckets" output the index numbers, in the three dimensions, of the scores of the N target candidate regions with the largest scores.
  • For example, each group of "buckets" contains 256 "buckets", namely "bucket" -128, "bucket" -127, "bucket" -126, ..., "bucket" 126, "bucket" 127, and each "bucket" has a capacity of N.
  • The index number corresponding to a score is placed, dimension by dimension, in the "bucket" of the corresponding dimension.
  • When the capacity of the "bucket" with the highest score (127) reaches the upper limit N, or when all the scores have been traversed, the termination condition is considered reached.
  • When the "bucket" with the highest score is full, the index numbers, in the three dimensions, of the scores of the N target candidate regions can be extracted from the three highest-scoring "buckets" (C127, W127, H127). When all the scores have been traversed but the capacity of the "bucket" with the highest score has not reached the upper limit N, start from the "bucket" with the highest score and extract, in sequence, the index numbers in the "bucket" corresponding to each score until the index numbers, in the three dimensions, of the scores of N target candidate regions are obtained.
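  • The bucket-based TOP N selection and its termination condition can be sketched as follows. This is a simplified illustration under stated assumptions: instead of three parallel groups of buckets (one per dimension), a single group stores each (c, h, w) index tuple, which carries the same information; the function name `top_n_by_buckets` and the default range -128..127 are chosen for the example.

```python
def top_n_by_buckets(scored, n, lo=-128, hi=127):
    """Select up to n entries with the largest integer scores without sorting.

    scored: iterable of (score, index) pairs with lo <= score <= hi.
    Returns a list of (score, index) pairs in descending order of score.
    """
    # One bucket per possible score value; each holds at most n indices.
    buckets = [[] for _ in range(hi - lo + 1)]
    for score, index in scored:
        bucket = buckets[score - lo]
        if len(bucket) < n:
            bucket.append(index)
        # Termination condition: the highest-score bucket is full.
        if score == hi and len(bucket) == n:
            break

    # Walk the buckets from the highest score downwards until n are collected.
    result = []
    for score in range(hi, lo - 1, -1):
        for index in buckets[score - lo]:
            result.append((score, index))
            if len(result) == n:
                return result
    return result
```

  • Each score is classified into its bucket in constant time, so the traversal is O(n) in the number of candidate regions, and entries with equal scores come out in the order in which they were seen.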
  • step 203 may specifically include:
  • Sub-step 2031 When the score of a target dimension of the candidate region matches the score corresponding to a storage area in the target storage area group corresponding to that target dimension, store the index number corresponding to the score of the target dimension in the storage area corresponding to the score; the target dimension is any one of the three dimensions of the candidate region.
  • Sub-step 2032 When the index numbers of all the scores of the candidate regions have been stored, or when the capacity of the storage area with the largest score in a storage area group reaches N, determine that the termination condition is reached, and extract from the storage area groups the index numbers, in the three dimensions, corresponding to the scores of the target candidate regions.
  • FIG. 4 shows a schematic diagram of another TOP N algorithm provided by the embodiment of the present application, in which it is assumed that the score of the candidate region ranges from 0 to 3. The score of the candidate region corresponds to three dimensions, so three storage area groups can be established, and the capacity of each storage area is 3. Each time a set of index numbers is stored in a storage area, its occupied capacity is increased by one; when the capacity reaches 3, the storage area is full.
  • sub-step 2032 may specifically include:
  • Sub-step A1 When the capacity of the storage area with the largest score reaches N, use the index numbers extracted from that storage area as the index numbers, in the three dimensions, of the scores of the target candidate regions.
  • Sub-step A2 When the index numbers of all the scores of the candidate regions have been stored, extract the index numbers from the storage areas in descending order of the scores of the storage areas, as the index numbers of the target candidate regions in the three dimensions, until the index numbers of N target candidate regions in the three dimensions have been obtained.
  • That is, starting from the "bucket" with the highest score, the index numbers in the "bucket" corresponding to each score are extracted in turn. For example, the index numbers in "bucket" 127 are extracted first; if they do not amount to N target candidate regions, index numbers are further extracted from "bucket" 126, and so on, until the index numbers, in the three dimensions, of the scores of N target candidate regions are obtained.
  • Step 204 According to the index constructed from the index numbers of the score of the target candidate region in the three dimensions, obtain the target offset corresponding to the target candidate region and the size of the target prior region.
  • The score of a candidate region includes the score and its index numbers in the three dimensions; there is a correspondence among the index constructed from the index numbers of the three dimensions, the size of the prior region corresponding to the candidate region, and the offset of the candidate region.
  • This correspondence is held in the memory of the electronic device, so the size of the prior region corresponding to a candidate region and the offset of the candidate region can be retrieved from it using the index constructed from the index numbers of the three dimensions.
  • The offsets and the sizes of the prior regions may be respectively stored in contiguous memory slices provided by the single instruction multiple data (SIMD) processor, with the values of the offsets and of the prior region sizes in one-to-one correspondence with the memory slices. In this case, step 204 can be implemented by extracting, according to the index, the target offset and the size of the target prior region corresponding to the target candidate region from the contiguous memory slices through an extraction operation.
  • In the embodiment of the present application, the processor performing the operation can be a single instruction multiple data (SIMD) processor. Through SIMD instructions, a SIMD processor can copy multiple operands and pack them into a large register for processing as a group.
  • In a SIMD processor, after an instruction is decoded, several execution units can access memory at the same time and obtain all operands for the operation at once. This feature makes SIMD especially suitable for multimedia applications and other data-intensive operations.
  • For the sizes of the prior regions and the offsets of the candidate regions, since the vector registers of the SIMD processor are relatively long, these data can be stored respectively in contiguous memory slices provided by the SIMD processor, with the values of the offsets and of the prior region sizes in one-to-one correspondence with the memory slices. This makes it more convenient to use vectors for parallel computation: when extracting the prior region sizes and candidate region offsets according to the index, a single SIMD instruction can extract all the required prior region sizes and offsets at once, making full use of the bandwidth and reducing the time spent reading data from memory into vector registers, so as to obtain efficient memory access and improve the calculation speed.
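  • The contiguous, one-value-per-slot layout described above can be illustrated with a scalar sketch. It does not use SIMD instructions; it only shows the structure-of-arrays arrangement that makes an index-driven extraction (gather) straightforward. The buffers `delta_w` and `prior_w`, their contents and the helper `gather` are hypothetical.

```python
from array import array

# Hypothetical structure-of-arrays layout: one contiguous buffer per field,
# all stored in the same order, so slot i of every buffer describes region i.
delta_w = array("b", [4, -3, 10, 0, 7])               # fixed-point offsets (int8)
prior_w = array("f", [16.0, 32.0, 64.0, 16.0, 32.0])  # prior region widths


def gather(buffer, indices):
    """Extract the values at the given indices into a new contiguous buffer."""
    return array(buffer.typecode, (buffer[i] for i in indices))


top_indices = [2, 4]                     # e.g. indices produced by the TOP N step
target_dw = gather(delta_w, top_indices)
target_pw = gather(prior_w, top_indices)
```

  • On a SIMD processor, the loop inside `gather` would correspond to a vector gather or load instruction operating on several slots at once; the scalar version only documents the data layout.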
  • The bucket sorting model, the sizes of the prior regions and the offsets are stored in the tightly coupled memory of the processor.
  • The processor performing the post-processing operations may also support tightly coupled memory (TCM).
  • TCM is the memory closest to the computing unit in the processor; the computing unit fetches data from the TCM at the same frequency as its own main frequency, with the lowest delay.
  • Storing the bucket sorting model, the sizes of the prior regions and the offsets in the TCM can therefore effectively reduce the computing delay and improve computational efficiency.
  • Step 205 Calculate the size of the target candidate region according to the size of the target priori region corresponding to the target candidate region and the target offset between the target candidate region and the target priori region.
  • For details of step 205, reference may be made to the foregoing step 103, which will not be repeated here.
  • Sub-step 2051 Calculate the center point coordinates (Xb, Yb) of the target candidate region according to formula 1 and formula 2.
  • Sub-step 2052 Calculate the width Wb and height Hb of the target candidate region according to formula 3 and formula 4.
  • S is the conversion coefficient from fixed-point number to floating-point number
  • v1 is the variance of the center point coordinate of the prior region and the width of the prior region
  • dw = Δw × S × v2
  • dh = Δh × S × v2
  • v2 is the variance of the center point coordinates of the prior region and the height of the prior region.
  • The output of the RPN network is in fixed-point format, while the final output of the RPN post-processing is required to be in floating-point format; therefore, the conversion coefficient S is used to convert between fixed-point and floating-point numbers.
  • The value of e^dw is obtained by querying the first operation table, and the value of e^dh is obtained by querying the second operation table.
  • The operation tables are obtained in advance; the value range of Δw and Δh includes J fixed-point integers.
  • For example, all 256 results of e^(Δw × S × v2) can be calculated in advance and stored in memory as the first operation table, and all 256 results of e^(Δh × S × v2) can be stored in memory as the second operation table.
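  • The operation-table idea can be sketched as follows. The table construction follows the text (precompute e^(Δ × S × v2) for every fixed-point value of Δ). The decoding step is an assumption: it uses the form Wb = Wa × e^dw and Hb = Ha × e^dh that is common in RPN-style detectors, where Wa and Ha are the prior region width and height; formulas 3 and 4 themselves are not reproduced here. The values of S and v2 in the example are hypothetical.

```python
import math


def build_exp_table(S, v2, lo=-128, hi=127):
    """Precompute e^(d * S * v2) for every fixed-point integer d in [lo, hi]."""
    return [math.exp(d * S * v2) for d in range(lo, hi + 1)]


def decode_size(delta_w, delta_h, prior_w, prior_h, table_w, table_h, lo=-128):
    """Look up e^dw and e^dh instead of calling exp() at run time.

    Assumed form (not taken from the source): Wb = Wa * e^dw, Hb = Ha * e^dh.
    """
    return prior_w * table_w[delta_w - lo], prior_h * table_h[delta_h - lo]


# Hypothetical conversion coefficient and variance.
first_table = build_exp_table(S=1 / 64, v2=0.2)    # for the width offset
second_table = build_exp_table(S=1 / 64, v2=0.2)   # for the height offset
size = decode_size(0, 0, 16.0, 32.0, first_table, second_table)
```

  • Each table has 256 entries, so the floating-point exponential is evaluated once per possible offset value in advance rather than once per candidate region.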
  • Step 206 Sort the N target candidate regions in descending order of scores.
  • Step 207 Calculate the overlap ratio between each target candidate region with a higher score and all target candidate regions with a lower score according to the size of the target candidate region.
  • Step 208 Delete the target candidate regions with smaller scores whose overlap ratio is greater than the preset overlap ratio threshold.
  • Step 209 Select M target candidate regions from the remaining target candidate regions.
  • If the overlap ratio between two candidate regions is too large, the two candidate regions are highly similar and processing both is redundant.
  • Therefore, of two such candidate regions, the one with the larger score can be retained; M target candidate regions are selected from the retained regions, and the overlap ratio between the M target candidate regions is less than or equal to the preset overlap ratio threshold.
  • The value of M can be set according to actual requirements. For example, when computing resources are sufficient, the value of M can be relatively large, and when computing resources are insufficient, the value of M can be relatively small.
  • That is, for a given target candidate region, the lower-scoring target candidate regions whose overlap rate with it is high are excluded.
  • Step 209 may be implemented by selecting from the remaining target candidate regions, starting from the target candidate region with the largest score, until M target candidate regions have been selected.
  • In general, the NMS algorithm needs to perform NMS on all N target candidate regions in descending order of score, set to 0 (that is, delete) the score of each lower-scoring target candidate region whose overlap ratio exceeds the preset overlap ratio threshold, then sort the scores again and retain the M target candidate regions with the highest scores as the final candidate regions. In fact, during the NMS calculation, as soon as M target candidate regions are found that have not been "suppressed", the NMS can be stopped; the remaining NMS calculations are redundant.
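  • Stopping NMS as soon as M unsuppressed regions have been found can be sketched as follows. The sketch assumes that the boxes are already sorted in descending order of score, that the overlap ratio is IoU over (x1, y1, x2, y2) boxes, and uses the hypothetical names `iou` and `nms_early_stop`.

```python
def iou(a, b):
    """Overlap ratio (assumed here to be IoU) of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms_early_stop(boxes, threshold, m):
    """Greedy NMS over score-sorted boxes that stops once m boxes are kept."""
    suppressed = [False] * len(boxes)
    kept = []
    for i in range(len(boxes)):
        if suppressed[i]:
            continue
        kept.append(i)
        if len(kept) == m:
            break  # the remaining comparisons would be redundant
        # Suppress every lower-scoring box that overlaps box i too much.
        for j in range(i + 1, len(boxes)):
            if not suppressed[j] and iou(boxes[i], boxes[j]) > threshold:
                suppressed[j] = True
    return kept
```

  • Because the boxes are visited in score order, the first M unsuppressed boxes are exactly the M highest-scoring survivors of a full NMS pass, so no second sort is needed.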
  • FIG. 5 shows a schematic diagram of an NMS algorithm provided by an embodiment of the present application.
  • In the embodiment of the present application, the sorting operation can be replaced by a copy operation. Suppose that, after the scores of all lower-scoring target candidate regions whose overlap ratio exceeds the preset overlap ratio threshold are set to 0, the target candidate regions at positions 2, 5, M-3, M+1, M+2, M+4 and N are suppressed (scores set to 0). The embodiment of the present application can then copy the retained target candidate regions after the M-th position (those whose score is not set to 0) into the suppressed "holes" before the M-th position (the positions whose score is set to 0).
  • That is, in front-to-back order, the target candidate region at the M-th position is copied to the 2nd position, the target candidate region at the (M+3)-th position is copied to the 5th position, and the target candidate region at the (M+5)-th position is copied to the (M-3)-th position, so that all suppressed "holes" before the M-th position are filled and the final M target candidate regions are obtained.
  • This approach reduces redundant computation and replaces the sorting operation with a copy operation, which reduces the consumption of computing resources and improves computing efficiency.
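  • Replacing the final sort with copies can be sketched as follows. This is one possible reading of the hole-filling idea, with 0-based positions: the "holes" are the suppressed entries among the first M positions, and the donors are the surviving entries from position M onward, taken front to back. The function name `compact_first_m` is hypothetical, and the result contains the M best survivors but is not itself in score order.

```python
def compact_first_m(regions, suppressed, m):
    """Fill suppressed slots among the first m entries with later survivors.

    regions: candidates sorted by descending score.
    suppressed: parallel list of booleans set by NMS.
    Assumes at least m candidates survive; otherwise some holes stay unfilled.
    """
    out = list(regions[:m])
    holes = [i for i in range(min(m, len(regions))) if suppressed[i]]
    donors = (j for j in range(m, len(regions)) if not suppressed[j])
    # Pair each hole with the next surviving region after the first m slots.
    for hole, donor in zip(holes, donors):
        out[hole] = regions[donor]
    return out


regions = ["a", "b", "c", "d", "e", "f", "g", "h"]
suppressed = [False, True, False, True, False, True, False, False]
final = compact_first_m(regions, suppressed, 4)
```

  • In this example the survivors in score order are a, c, e, g, h; the two holes among the first four slots are filled with e and g, so `final` holds the four best survivors using only two copies.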
  • After step 209, the method may further include:
  • Step 210 The feature maps corresponding to the M target candidate regions are subjected to a pooling operation and then input to a convolutional neural network model to obtain a result of identifying the content of the M target candidate regions output by the convolutional neural network model.
  • the selection of the finally obtained M candidate regions takes into account the influence of the score and the overlap rate at the same time, and has better value.
  • the M candidate regions are pooled and classified in the subsequent convolutional neural network model. Regression and other operations can output the detection category and position of the target to complete target detection.
  • In summary, the post-processing method for an RPN network provided by the embodiments of the present application obtains the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions; determines, from all the candidate regions, the N target candidate regions with the largest scores; calculates the size of each target candidate region according to the size of the target prior region corresponding to it and the target offset between the target candidate region and the target prior region; calculates the overlap ratios between the target candidate regions according to their sizes; and selects M target candidate regions from the N target candidate regions according to the overlap ratios, where the overlap ratio between the M target candidate regions is less than or equal to the preset overlap ratio threshold.
  • The present application can first sort and filter the candidate regions to determine the N target candidate regions with the largest scores, and only then calculate the sizes of those N target candidate regions and consider the overlap ratios between candidate regions. Because the number of the N target candidate regions is far smaller than the number of candidate regions output by the RPN network, the amount of computation involved in calculating candidate region sizes is greatly reduced, a large amount of redundant computation is avoided, and the load on the processor is lowered.
  • FIG. 6 is a block diagram of a post-processing apparatus of an RPN network provided by an embodiment of the present application.
  • the post-processing apparatus 300 of the RPN network may include: an acquisition module 301 and a processing module 302;
  • the obtaining module 301 is configured to perform: obtaining the score of the candidate region output by the RPN network, and the offset between the candidate region and the corresponding prior region;
  • the processing module 302 is used to execute:
  • the scores of the candidate regions include: scores and index numbers of the candidate regions corresponding to three dimensions respectively;
  • the processing module is specifically used for:
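  • establishing a bucket sorting model according to the value range of the scores of the candidate regions and the dimensions, following bucket sorting rules, where the bucket sorting model is used to store the index numbers and to output the N target candidate regions with the largest scores when a preset termination condition is reached.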
  • the value range of the score includes K scores; the processing module is specifically configured to execute:
  • each storage area group includes K storage areas, the K storage areas are in one-to-one correspondence with the K scores, and the upper limit of the capacity of each storage area is N; each time an index number is stored in a storage area, the capacity of that storage area is incremented by one.
  • processing module is specifically configured to execute:
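  • when the score of a target dimension of the candidate region matches the score corresponding to a storage area in the target storage area group corresponding to that target dimension, storing the index number corresponding to the score of the target dimension in the storage area corresponding to that score;
  • the target dimension is any one of the three dimensions of the candidate region;
  • when the index numbers of all the scores of the candidate regions have been stored, or the capacity of the storage area in the storage area group with the largest score reaches N, determining that the termination condition is reached, and extracting from the storage area groups the index numbers corresponding, in the three dimensions, to the scores of the target candidate regions.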
  • processing module is specifically configured to execute:
  • when the capacity of the storage area in the storage area group with the largest score reaches N, using the index numbers extracted from the storage area group with the largest score as the index numbers corresponding, in the three dimensions, to the scores of the target candidate regions;
  • when the index numbers of all the scores of the candidate regions have been stored, extracting the index numbers in each storage area group, in descending order of the scores of the storage area groups, as the index numbers corresponding to the target candidate regions in the three dimensions, until the index numbers corresponding to N target candidate regions in the three dimensions have been obtained.
  • Optionally, the scores of the candidate regions include: the scores of the candidate regions corresponding to three dimensions respectively, and the index numbers; there is a correspondence among the index constructed from the index numbers of the dimensions, the prior region corresponding to the candidate region, and the offset of the candidate region;
  • the processing module is also used to execute:
  • the target offset corresponding to the target candidate region and the size of the target prior region are obtained according to the index constructed by the index numbers of the score of the target candidate region in the three dimensions.
  • Optionally, the offset and the size of the prior region are each stored in a continuous memory slice provided by the single instruction multiple data stream processor; the values of the offset and of the size of the prior region correspond one-to-one with the memory slices;
  • the processing module is specifically used to execute:
  • the target offset corresponding to the target candidate region and the size of the target prior region are extracted from the continuous memory slice through one extraction operation.
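  • Optionally, the size of the prior region includes: the center point coordinates (Xa, Ya), the width Wa and the height Ha of the prior region; the offset includes: the center point offsets δx and δy, the width offset δw and the height offset δh;
  • the processing module is specifically configured to execute:
  • calculating the center point coordinates (Xb, Yb) of the target candidate region according to Formula 1 and Formula 2, and the width Wb and height Hb of the target candidate region according to Formula 3 and Formula 4;
  • Formula 1: Xb = δx × S × v1 × Wa + Xa; Formula 2: Yb = δy × S × v1 × Ha + Ya;
  • Formula 3: Wb = e^{dw} × Wa; Formula 4: Hb = e^{dh} × Ha;
  • where S is the conversion coefficient from fixed-point numbers to floating-point numbers; v1 is the variance of the center point coordinates of the prior region and the width of the prior region; dw = δw × S × v2; dh = δh × S × v2; and v2 is the variance of the center point coordinates of the prior region and the height of the prior region.
  • Optionally, when the width Wb and the height Hb are calculated by Formula 3 and Formula 4, the value of e^{dw} is obtained by querying a first operation table and the value of e^{dh} is obtained by querying a second operation table; the value range of δw and δh includes J fixed-point integers;
  • the first operation table includes J results in floating-point format calculated according to e^{dw} = e^{δw × S × v2}, and the second operation table includes J results in floating-point format calculated according to e^{dh} = e^{δh × S × v2}.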
  • processing module is specifically configured to execute:
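  • sorting the N target candidate regions in descending order of score;
  • calculating, according to the sizes of the target candidate regions, the overlap ratio between each higher-scoring target candidate region and all lower-scoring target candidate regions;
  • deleting each lower-scoring target candidate region whose overlap ratio is greater than the preset overlap ratio threshold;
  • selecting M target candidate regions from the remaining target candidate regions.
  • Optionally, the processing module is specifically configured to execute:
  • among the remaining target candidate regions, selecting starting from the target candidate region with the largest score until M target candidate regions have been selected.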
  • processing module is further configured to execute:
  • the feature maps corresponding to the M target candidate regions are subjected to a pooling operation and then input to a convolutional neural network model to obtain a result of identifying the contents of the M target candidate regions output by the convolutional neural network model.
  • the bucket sorting model, the size of the prior region and the offset are stored in a tightly coupled memory processor.
  • In summary, the post-processing apparatus for an RPN network provided by the embodiments of the present application obtains the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions; determines, from all the candidate regions, the N target candidate regions with the largest scores; calculates the size of each target candidate region according to the size of the target prior region corresponding to it and the target offset between the target candidate region and the target prior region; calculates the overlap ratios between the target candidate regions according to their sizes; and selects M target candidate regions from the N target candidate regions according to the overlap ratios, where the overlap ratio between the M target candidate regions is less than or equal to the preset overlap ratio threshold.
  • The present application can first sort and filter the candidate regions to determine the N target candidate regions with the largest scores, and only then calculate the sizes of those N target candidate regions and consider the overlap ratios between candidate regions. Because the number of the N target candidate regions is far smaller than the number of candidate regions output by the RPN network, the amount of computation involved in calculating candidate region sizes is greatly reduced, a large amount of redundant computation is avoided, and the load on the processor is lowered.
  • Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the foregoing embodiments of the post-processing method for an RPN network is implemented and the same technical effect can be achieved; to avoid repetition, details are not repeated here.
  • The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
  • the acquiring module may be an interface connecting the external control terminal with the post-processing device of the RPN network.
  • For example, the external control terminal may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a control terminal that has an identification module, an audio input/output (I/O) port, a video I/O port, a headphone port, and so on.
  • The acquisition module may be used to receive input (for example, data information or power) from an external control terminal and transmit the received input to one or more elements within the post-processing apparatus of the RPN network, or may be used to transmit data between the post-processing apparatus of the RPN network and the external control terminal.
  • For example, at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
  • The processor is the control center of the control terminal. It uses various interfaces and lines to connect the parts of the entire control terminal, and performs the various functions of the control terminal and processes data by running or executing the software programs and/or modules stored in the memory and calling the data stored in the memory, thereby monitoring the control terminal as a whole.
  • The processor may include one or more processing units. Preferably, the processor may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor.
  • the embodiments of the present application may be provided as a method, a control terminal, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction control terminal, and the instruction control terminal implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

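A post-processing method and apparatus for an RPN network. The method includes: obtaining the scores of candidate regions output by the RPN network, and the offsets between the candidate regions and the corresponding prior regions (101); determining, from all the candidate regions, the N target candidate regions with the largest scores (102); calculating the sizes of the target candidate regions (103); and calculating, according to the sizes of the target candidate regions, the overlap ratios between the target candidate regions, and selecting M target candidate regions from the N target candidate regions according to the overlap ratios (104). The present application can first sort and filter the candidate regions to determine the N target candidate regions with the largest scores, and only then calculate the sizes of those N target candidate regions. Because the number of the N target candidate regions is far smaller than the number of candidate regions output by the RPN network, the amount of computation involved in calculating candidate region sizes can be greatly reduced.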

Description

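Post-processing method and apparatus for an RPN network
Technical Field
The present application relates to the field of computer technology, and in particular to a post-processing method and apparatus for an RPN network.
Background
In scenarios that require object detection, such as autonomous driving, unmanned aerial vehicles and smart terminals, a region proposal network (RPN) is usually used to generate candidate regions for targets.
Post-processing of the RPN network is an important step in implementing an object detection algorithm. In the related art, post-processing of the RPN network includes: 1. calculating the size of each candidate region (ROI, region of interest) from the output of the RPN network and the preset prior regions; 2. filtering out ROIs that are too small; 3. sorting the ROIs by score and keeping the N ROIs with the highest scores; 4. performing non-maximum suppression (NMS) on the N ROIs to obtain the final M ROIs.
However, calculating the size of a candidate region involves floating-point exponential operations. In current solutions, the size must be calculated one by one for a large number of candidate regions, which makes post-processing time-consuming and inefficient.
Summary
The present application provides a post-processing method and apparatus for an RPN network, which can solve the problem in the prior art that post-processing is time-consuming and inefficient.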
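In a first aspect, an embodiment of the present application provides a post-processing method for an RPN network, including:
obtaining the scores of the candidate regions output by the RPN network, and the offsets between the candidate regions and the corresponding prior regions;
determining, from all the candidate regions, the N target candidate regions with the largest scores;
calculating the size of the target candidate region according to the size of the target prior region corresponding to the target candidate region and the target offset between the target candidate region and the target prior region;
calculating, according to the sizes of the target candidate regions, the overlap ratios between the target candidate regions, and selecting, according to the overlap ratios, M target candidate regions from the N target candidate regions, where the overlap ratio between the M target candidate regions is less than or equal to a preset overlap ratio threshold.
In a second aspect, an embodiment of the present application provides a post-processing apparatus for an RPN network, including: an acquisition module and a processing module;
the acquisition module is configured to obtain the scores of the candidate regions output by the RPN network, and the offsets between the candidate regions and the corresponding prior regions;
the processing module is configured to determine, from all the candidate regions, the N target candidate regions with the largest scores;
calculate the size of the target candidate region according to the size of the target prior region corresponding to the target candidate region and the target offset between the target candidate region and the target prior region;
calculate, according to the sizes of the target candidate regions, the overlap ratios between the target candidate regions, and select, according to the overlap ratios, M target candidate regions from the N target candidate regions, where the overlap ratio between the M target candidate regions is less than or equal to a preset overlap ratio threshold.
In a third aspect, the present application provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to perform the method described in the above aspect.
In a fourth aspect, the present application provides a computer program product including instructions which, when run on a computer, cause the computer to perform the method described in the above aspect.
In the embodiments of the present application, based on the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions, the candidate regions can first be sorted and filtered to determine the N target candidate regions with the largest scores, and only then are the sizes of the N target candidate regions calculated and the overlap ratios between candidate regions considered. Because the number of the N target candidate regions is far smaller than the number of candidate regions output by the RPN network, the amount of computation involved in calculating candidate region sizes can be greatly reduced, a large amount of redundant computation is eliminated, and the load on the processor is lowered.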
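Brief Description of the Drawings
FIG. 1 is a flowchart of a post-processing method for an RPN network provided by an embodiment of the present application;
FIG. 2 is a detailed flowchart of a post-processing method for an RPN network provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a TOP N algorithm provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of another TOP N algorithm provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an NMS algorithm provided by an embodiment of the present application;
FIG. 6 is a block diagram of a post-processing apparatus for an RPN network provided by an embodiment of the present application.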
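Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
In the embodiments of the present application, to meet the need for object detection in various scenarios, a fixed-point neural network model that uses an RPN network can be employed to improve detection efficiency. The RPN network takes a feature map as input and outputs, for each candidate region in the feature map that may contain a target, its offset relative to a preset prior region, together with a score reflecting how likely the candidate region is to match the prior region. The more likely the candidate region matches the prior region, the larger the score.
Specifically, post-processing of the RPN network means computing on the output of the RPN network to finally obtain the concrete sizes of M candidate regions (the value of M can be set according to actual needs). These M candidate regions are selected by taking both the score and the overlap ratio into account and are therefore of high value. Subsequent operations such as pooling, classification and regression are performed on these M candidate regions to output the detected category and position of the target, completing target detection.
In the related art, calculating the concrete size of a candidate region involves converting the fixed-point data output by the RPN network into floating-point format and then performing floating-point exponential operations. This is computationally very expensive and sharply increases the processor load. Because the related art first calculates the size of every candidate region output by the RPN network, it places considerable pressure on the processor. In addition, after the sizes have been calculated, the subsequent filtering and sorting operations discard some of the candidate regions, so the computing resources spent on the sizes of the discarded candidate regions are wasted, which amounts to redundant computation.
In the embodiments of the present application, by contrast, based on the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions, the candidate regions can first be sorted and filtered to determine the N target candidate regions with the largest scores, and only then are the sizes of the N target candidate regions calculated and the overlap ratios between candidate regions considered. Because the number of the N target candidate regions is far smaller than the number of candidate regions output by the RPN network, the amount of computation involved in calculating candidate region sizes can be greatly reduced, a large amount of redundant computation is eliminated, and the load on the processor is lowered.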
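FIG. 1 is a flowchart of a post-processing method for an RPN network provided by an embodiment of the present application. As shown in FIG. 1, the method may include:
Step 101: Obtain the scores of the candidate regions output by the RPN network, and the offsets between the candidate regions and the corresponding prior regions.
In the embodiments of the present application, the RPN network takes a feature map as input and outputs, for each candidate region in the feature map that may contain a target, its offset relative to a preset prior region, together with a score reflecting how likely the candidate region is to match the prior region. The more likely the candidate region matches the prior region, the larger the score.
Specifically, a prior region (anchor, also called an anchor region) is a region that, based on prior rules, has the box size and likely position of targets commonly found in the scene. The RPN network matches the candidate regions extracted from the feature map that may contain a target against the prior regions, and thereby obtains the scores of the candidate regions and the offsets between the candidate regions and the corresponding prior regions.
Step 102: Determine, from all the candidate regions, the N target candidate regions with the largest scores.
In this step, sorting and filtering can first be performed on all the candidate regions output by the RPN network to obtain the N (TOP N) target candidate regions with the largest scores. The value of N can be set according to actual needs: N can be larger when computing resources are plentiful and smaller when they are not. The sorting and filtering can be carried out with a variety of algorithms, such as quicksort or bucket sort.
Step 103: Calculate the size of the target candidate region according to the size of the target prior region corresponding to the target candidate region and the target offset between the target candidate region and the target prior region.
In this step, because the number of the N target candidate regions is far smaller than the number of candidate regions output by the RPN network, the embodiments of the present application, compared with the related art, greatly reduce the amount of computation involved in calculating candidate region sizes, eliminate a large amount of redundant computation, and lower the load on the processor.
Step 104: Calculate, according to the sizes of the target candidate regions, the overlap ratios between the target candidate regions, and select, according to the overlap ratios, M target candidate regions from the N target candidate regions, where the overlap ratio between the M target candidate regions is less than or equal to a preset overlap ratio threshold.
In the embodiments of the present application, if the overlap ratio between two candidate regions is too large, the two candidate regions are highly similar and keeping both leads to redundant computation. In that case, of two candidate regions whose overlap ratio is too large (for example, greater than 90%), only the one with the larger score needs to be kept. The embodiments of the present application can therefore calculate the overlap ratios between the target candidate regions according to the sizes of the N target candidate regions and, according to the overlap ratios, select M target candidate regions from the N target candidate regions such that the overlap ratio between the M target candidate regions is less than or equal to the preset overlap ratio threshold. The value of M can be set according to actual needs: M can be larger when computing resources are plentiful and smaller when they are not. This step can be carried out with a variety of algorithms. In one implementation, the non-maximum suppression (NMS) algorithm can be used: after the target candidate regions are sorted by score, NMS removes, from among the target candidate regions that score lower than a given target candidate region, those that overlap it too much.
In summary, the post-processing method for an RPN network provided by the embodiments of the present application obtains the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions; determines, from all the candidate regions, the N target candidate regions with the largest scores; calculates the size of each target candidate region according to the size of the target prior region corresponding to it and the target offset between the target candidate region and the target prior region; calculates the overlap ratios between the target candidate regions according to their sizes; and selects M target candidate regions from the N target candidate regions according to the overlap ratios, where the overlap ratio between the M target candidate regions is less than or equal to the preset overlap ratio threshold. Based on the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions, the present application can first sort and filter the candidate regions to determine the N target candidate regions with the largest scores, and only then calculate the sizes of the N target candidate regions and consider the overlap ratios between candidate regions. Because the number of the N target candidate regions is far smaller than the number of candidate regions output by the RPN network, the amount of computation involved in calculating candidate region sizes can be greatly reduced, a large amount of redundant computation is eliminated, and the load on the processor is lowered.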
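FIG. 2 is a detailed flowchart of a post-processing method for an RPN network provided by an embodiment of the present application. As shown in FIG. 2, the method may include:
Step 201: Obtain the scores of the candidate regions output by the RPN network, and the offsets between the candidate regions and the corresponding prior regions.
For step 201, refer to step 101 above; details are not repeated here.
Step 202: Establish a bucket sorting model according to the value range and the dimensions of the scores of the candidate regions, following bucket sorting rules, where the bucket sorting model is used to store the index numbers and to output the N target candidate regions with the largest scores when a preset termination condition is reached.
The scores of the candidate regions include: the scores of the candidate regions corresponding to three dimensions respectively, and the index numbers.
In practice, the related art uses the quicksort algorithm to obtain the N candidate regions with the highest scores. If n candidates take part in the sort, the complexity of quicksort and similar sorting algorithms is O(n×log(n)). Since n is usually large, this process depends heavily on computing resources and takes a long time on processors with weak computing performance.
The embodiments of the present application provide a bucket sorting algorithm to output the N target candidate regions with the largest scores. Bucket sorting places the elements of the set to be sorted that fall in the same value range into the same bucket; that is, the set is split into several ranges according to element value, so that the resulting buckets are ordered with respect to value range. Once the elements within each bucket are sorted, the set formed by all the buckets is sorted. Bucket sorting requires the objects to be sorted to be discrete and finite; it exploits this property to replace sorting with classification, so that its complexity is O(n). Compared with the original algorithm, it needs less memory and far less computation.
Specifically, refer to FIG. 3, which shows a schematic diagram of a TOP N algorithm provided by an embodiment of the present application. The RPN network outputs an rpn_cls_score branch (the score branch) and an rpn_bbox_pred branch (the offset branch). For the rpn_cls_score branch, the scores of the candidate regions include the scores corresponding to the different prior regions, and each score has index numbers (index) in three dimensions: the C dimension, the W dimension and the H dimension. The C dimension reflects the number of prior regions corresponding to each position in the feature map. The W dimension reflects the number of rows of positions in the feature map at which prior regions exist. The H dimension reflects the number of columns of positions in the feature map at which prior regions exist. In addition, the index numbers of a candidate region in the three dimensions form a group of indexes, which is used to look up the size of the prior region corresponding to that candidate region for the subsequent calculation of the candidate region's size.
Further, for a fixed-point RPN network, the output of the RPN network is in fixed-point format. Taking an 8-bit RPN network as an example, its output can take at most 256 integer values; for the rpn_cls_score branch, the output scores range from -128 to 127, so this branch has the discreteness and finiteness of the fixed-point format. The embodiments of the present application can therefore adopt the idea of "bucket sorting" and keep only the N largest of all the score values, which simplifies and equivalently replaces the conventional quicksort algorithm, reducing the computational load and improving computing efficiency.
In this step, a bucket sorting model can be established according to the value range and the dimensions of the scores of the candidate regions, following bucket sorting rules. The bucket sorting model can store the index numbers and output the N target candidate regions with the largest scores when a preset termination condition is reached. For example, for the rpn_cls_score branch, the output scores range from -128 to 127 and correspond to the three dimensions C, H and W, so three groups of "buckets" can be established. Each group contains 256 "buckets", namely "bucket" -128, "bucket" -127, "bucket" -126, ..., "bucket" 126 and "bucket" 127, and each "bucket" has a capacity of N. Each score in the rpn_cls_score branch is matched against the scores corresponding to the "buckets"; on a match, the index number corresponding to that score is stored in the "bucket". When the preset termination condition is reached, the three groups of "buckets" can output the index numbers corresponding, in the three dimensions, to the scores of the N target candidate regions with the largest scores.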
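Optionally, the value range of the scores includes K scores, and step 202 may specifically include:
Sub-step 2021: Establish three storage area groups corresponding to the three dimensions to obtain the bucket sorting model.
Each storage area group includes K storage areas, the K storage areas are in one-to-one correspondence with the K scores, and the upper limit of the capacity of each storage area is N; each time an index number is stored in a storage area, the capacity of that storage area is incremented by one.
Following the above idea, in this step three storage area groups can be established according to the K fixed-point scores in the value range of the candidate region scores and the three dimensions. Each storage area group contains K storage areas, and each storage area has a capacity of N. When index numbers are stored in a storage area, its capacity is incremented by one for each group of index numbers stored, until the capacity reaches N and the storage area is full.
Step 203: Input the index numbers of the scores of the candidate regions into the bucket sorting model to obtain the N target candidate regions with the largest scores output by the bucket sorting model, and the index numbers corresponding, in the three dimensions, to the score of each target candidate region.
In the embodiments of the present application, the index numbers of the scores in the rpn_cls_score branch can be input into the bucket sorting model, and each score in the rpn_cls_score branch is matched against the score corresponding to each "bucket" in the model. The two match if they are equal, in which case the index number corresponding to the score is stored in that "bucket". When the preset termination condition is reached, the three groups of "buckets" can output the index numbers corresponding, in the three dimensions, to the scores of the N target candidate regions with the largest scores.
For example, in the example of FIG. 3 given in step 202, three groups of "buckets" are established. Each group contains 256 "buckets", namely "bucket" -128, "bucket" -127, "bucket" -126, ..., "bucket" 126 and "bucket" 127, and each "bucket" has a capacity of N.
Each score in the rpn_cls_score branch is traversed in turn, and the index numbers corresponding to the score are placed, by dimension, in the "bucket" of the corresponding dimension. The termination condition is considered to be reached when the "bucket" with the highest score (127) reaches its capacity limit N or when all the scores have been traversed. If the highest-scoring "bucket" is full, the index numbers corresponding, in the three dimensions, to the scores of the N target candidate regions can be extracted from the three highest-scoring "buckets" (C127, W127, H127). If all the scores have been traversed but the highest-scoring "bucket" has not reached its capacity limit N, the index numbers are extracted from the "bucket" of each score in turn, starting from the highest-scoring "bucket", until the index numbers corresponding, in the three dimensions, to the scores of N target candidate regions are obtained.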
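The traversal and termination rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `top_n_by_buckets`, the `(score, (c, h, w))` input layout and the use of Python lists as "buckets" are all hypothetical choices made for readability.

```python
from typing import Iterable, List, Tuple

Index = Tuple[int, int, int]  # hypothetical (c, h, w) index numbers of one score


def top_n_by_buckets(entries: Iterable[Tuple[int, Index]], n: int,
                     lo: int = -128, hi: int = 127) -> List[Tuple[int, Index]]:
    """Return up to n (score, index) pairs with the largest fixed-point scores."""
    k = hi - lo + 1  # number of distinct scores, i.e. buckets per group
    # Three parallel bucket groups, one per dimension (C, H, W).
    group_c = [[] for _ in range(k)]
    group_h = [[] for _ in range(k)]
    group_w = [[] for _ in range(k)]

    for score, (c, h, w) in entries:
        b = score - lo
        if len(group_c[b]) >= n:
            continue  # this bucket already holds n entries; drop the overflow
        group_c[b].append(c)
        group_h[b].append(h)
        group_w[b].append(w)
        # Termination: the highest-scoring bucket is full, nothing can beat it.
        if score == hi and len(group_c[b]) == n:
            break

    # Extract from the highest-scoring bucket downward until n entries are found.
    result: List[Tuple[int, Index]] = []
    for b in range(k - 1, -1, -1):
        for c, h, w in zip(group_c[b], group_h[b], group_w[b]):
            result.append((b + lo, (c, h, w)))
            if len(result) == n:
                return result
    return result
```

Calling `top_n_by_buckets(entries, 3, lo=-3, hi=3)` corresponds to a toy score range of -3 to 3 with N = 3. Entries with equal scores come back in traversal order, and anything beyond the capacity N of a bucket is simply dropped, which is why no comparison-based sort is needed.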
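Optionally, step 203 may specifically include:
Sub-step 2031: When the score of a target dimension of the candidate region matches the score corresponding to a storage area in the target storage area group corresponding to that target dimension, store the index number corresponding to the score of the target dimension in the storage area corresponding to that score; the target dimension is any one of the three dimensions of the candidate region.
Sub-step 2032: When the index numbers of all the scores of the candidate regions have been stored, or the capacity of the storage area in the storage area group with the largest score reaches N, determine that the termination condition is reached, and extract from the storage area groups the index numbers corresponding, in the three dimensions, to the scores of the target candidate regions.
The embodiments of the present application provide a concrete example to describe sub-steps 2031 and 2032. Refer to FIG. 4, which shows a schematic diagram of another TOP N algorithm provided by an embodiment of the present application. Assume that the scores of the candidate regions range from -3 to 3 and N = 3, so the value range includes 7 fixed-point scores; C_index ranges from 0 to 2, H_index from 0 to 2 and W_index from 0 to 3. The scores of the candidate regions correspond to three dimensions, so three storage area groups can be established. Each storage area group contains 7 storage areas, corresponding to the 7 possible scores from -3 to 3, and each storage area has a capacity of 3. When index numbers are stored in a storage area, its capacity is incremented by one for each group of index numbers stored, until the capacity reaches 3 and the storage area is full.
After the bucket sorting model has been created, the scores in the rpn_cls_score branch are traversed starting from the first value, from left to right and from top to bottom, and for each score its index number in the C dimension (C_index), its index number in the H dimension (H_index) and its index number in the W dimension (W_index) are placed in the storage area of the corresponding score. When the third score equal to 3 is reached, the storage area for score = 3 is full and the traversal ends. The index numbers corresponding, in the three dimensions, to the scores of the TOP 3 target candidate regions can then be extracted from the storage area for score = 3.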
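Optionally, sub-step 2032 may specifically include:
Sub-step A1: When the capacity of the storage area in the storage area group with the largest score reaches N, use the index numbers extracted from the storage area group with the largest score as the index numbers corresponding, in the three dimensions, to the scores of the target candidate regions.
In one case, referring to FIG. 3, the termination condition is considered to be reached when the "bucket" with the highest score (127) reaches its capacity limit N or when all the scores have been traversed. When the highest-scoring "bucket" is full, the index numbers corresponding, in the three dimensions, to the scores of the N target candidate regions can be extracted from the three highest-scoring "buckets" (C127, W127, H127).
Sub-step A2: When the index numbers of all the scores of the candidate regions have been stored, extract the index numbers in each storage area group, in descending order of the scores of the storage area groups, as the index numbers corresponding to the target candidate regions in the three dimensions, until the index numbers corresponding to N target candidate regions in the three dimensions have been obtained.
In the other case, when all the scores have been traversed but the highest-scoring "bucket" has not reached its capacity limit N, the index numbers are extracted from the "bucket" of each score in turn, starting from the highest-scoring "bucket". That is, all the index numbers in "bucket" 127 are extracted; if the required index numbers of N target candidate regions have not yet been obtained, index numbers are further extracted from "bucket" 126, and so on, until the index numbers corresponding, in the three dimensions, to the scores of N target candidate regions are obtained.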
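Step 204: Obtain the target offset and the size of the target prior region corresponding to the target candidate region according to the index constructed from the index numbers, in the three dimensions, of the score of the target candidate region.
The scores of the candidate regions include: the scores of the candidate regions corresponding to three dimensions respectively, and the index numbers; there is a correspondence among the index constructed from the index numbers of the dimensions, the size of the prior region corresponding to the candidate region, and the offset of the candidate region.
In the embodiments of the present application, in the memory of the electronic device there is a correspondence among the index constructed from the index numbers of the three dimensions, the size of the prior region corresponding to the candidate region, and the offset of the candidate region. After the index numbers in the three dimensions of the score of each target candidate region have been obtained, the index constructed from those three index numbers can be used to extract from the correspondence the size of the prior region and the offset for the corresponding candidate region.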
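One way to picture the index-based lookup is sketched below. The source does not state how the three index numbers are combined, so the row-major (C, H, W) flattening in `flat_index`, as well as the names `gather_targets`, `anchors` and `offsets`, are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple

Quad = Tuple[float, float, float, float]


def flat_index(c: int, h: int, w: int, height: int, width: int) -> int:
    """Combine three per-dimension index numbers into one flat index.

    Assumes a row-major (C, H, W) layout; the real layout is not given in the source.
    """
    return (c * height + h) * width + w


def gather_targets(indices: Sequence[Tuple[int, int, int]],
                   anchors: Sequence[Quad],
                   offsets: Sequence[Quad],
                   height: int, width: int) -> List[Tuple[Quad, Quad]]:
    """Fetch (prior region size, offset) for each selected candidate only."""
    picked = []
    for c, h, w in indices:
        i = flat_index(c, h, w, height, width)
        # anchors[i] = (Xa, Ya, Wa, Ha); offsets[i] = (dx, dy, dw, dh), still fixed-point
        picked.append((anchors[i], offsets[i]))
    return picked
```

The point of the sketch is that only the N selected index groups are ever dereferenced, so the prior region sizes and offsets of the discarded candidates are never read.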
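Optionally, the offset and the size of the prior region are each stored in a continuous memory slice provided by a single instruction multiple data stream processor, and the values of the offset and of the size of the prior region correspond one-to-one with the memory slices. Step 204 can then be implemented by extracting, according to the index and through a single extraction operation, the target offset and the size of the target prior region corresponding to the target candidate region from the continuous memory slice.
In the embodiments of the present application, computing efficiency can be improved further by optimizing data access. Specifically, the processor that performs the computation can be a single instruction multiple data (SIMD) processor. SIMD is a set of instructions that can copy multiple operands and pack them into large registers. Taking an add instruction as an example, in a SIMD processor several execution units can access memory simultaneously after the instruction is decoded and obtain all the operands at once for computation, which makes SIMD particularly suitable for data-intensive operations such as multimedia applications. Therefore, because the vector registers of a SIMD processor are long, the sizes of the prior regions and the offsets of the candidate regions can be stored, before they are extracted, in continuous memory slices provided by the SIMD processor, with the values of the offsets and of the prior region sizes corresponding one-to-one with the memory slices. This makes vectorized parallel computation more convenient: when the prior region sizes and candidate region offsets are extracted according to the index, a single SIMD instruction can extract all of them at once, which makes fuller use of the bandwidth, reduces the time needed to read data from memory into the vector registers, achieves the most efficient memory access, and increases computing speed.
Optionally, the bucket sorting model, the size of the prior region and the offset are stored in a tightly coupled memory processor.
In the embodiments of the present application, the processor used for the post-processing computation may also support tightly coupled memory (TCM). TCM is the memory closest to the computing unit in the processor; the computing unit fetches data from TCM at the same frequency as its own clock, and with the lowest latency. Storing the bucket sorting model, the size of the prior region and the offset in the tightly coupled memory processor can effectively shorten the computation latency and improve computing efficiency.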
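Step 205: Calculate the size of the target candidate region according to the size of the target prior region corresponding to the target candidate region and the target offset between the target candidate region and the target prior region.
For step 205, refer to step 103 above; details are not repeated here.
Specifically, the size of the prior region includes: the center point coordinates (Xa, Ya), the width Wa and the height Ha of the prior region; the offset includes: the center point offsets δx and δy, the width offset δw and the height offset δh. Step 205 may specifically include:
Sub-step 2051: Calculate the center point coordinates (Xb, Yb) of the target candidate region according to Formula 1 and Formula 2.
Sub-step 2052: Calculate the width Wb and the height Hb of the target candidate region according to Formula 3 and Formula 4.
Formula 1: Xb = δx × S × v1 × Wa + Xa; Formula 2: Yb = δy × S × v1 × Ha + Ya;
Formula 3: Wb = e^{dw} × Wa; Formula 4: Hb = e^{dh} × Ha;
Here, S is the conversion coefficient from fixed-point numbers to floating-point numbers; v1 is the variance of the center point coordinates of the prior region and the width of the prior region; dw = δw × S × v2; dh = δh × S × v2; and v2 is the variance of the center point coordinates of the prior region and the height of the prior region.
In the embodiments of the present application, object detection is performed with a fixed-point neural network model that uses an RPN network. The output of the RPN network is in fixed-point format, whereas the final output of RPN post-processing is required to be in floating-point format, so the conversion coefficient S must be used to convert between fixed-point and floating-point numbers.
Substituting the center point coordinates (Xa, Ya), the width Wa and the height Ha of the prior region, together with the offsets from the rpn_bbox_pred (offset) branch output by the RPN network, namely the center point offsets δx and δy, the width offset δw and the height offset δh, into Formulas 1 to 4 above yields the center point coordinates (Xb, Yb) of the target candidate region and its width Wb and height Hb.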
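Formulas 1 to 4 translate almost line for line into code. The sketch below assumes the prior region is given as `(Xa, Ya, Wa, Ha)` and the fixed-point offsets as `(δx, δy, δw, δh)`; the function name `decode_box` and the argument order are hypothetical.

```python
import math
from typing import Tuple

Quad = Tuple[float, float, float, float]


def decode_box(anchor: Quad, delta: Tuple[int, int, int, int],
               s: float, v1: float, v2: float) -> Quad:
    """Apply Formulas 1 to 4 to one target candidate region."""
    xa, ya, wa, ha = anchor        # prior region: center, width, height
    dx, dy, dw_q, dh_q = delta     # fixed-point offsets from rpn_bbox_pred
    xb = dx * s * v1 * wa + xa     # Formula 1
    yb = dy * s * v1 * ha + ya     # Formula 2
    dw = dw_q * s * v2             # dw = δw × S × v2
    dh = dh_q * s * v2             # dh = δh × S × v2
    wb = math.exp(dw) * wa         # Formula 3
    hb = math.exp(dh) * ha         # Formula 4
    return xb, yb, wb, hb
```

Each call costs two floating-point exponentials, which is why the method decodes only the N selected candidates rather than every candidate output by the RPN network.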
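Optionally, when the width Wb and the height Hb are calculated by Formula 3 and Formula 4, the value of e^{dw} is obtained by querying a first operation table and the value of e^{dh} is obtained by querying a second operation table; the value range of δw and δh includes J fixed-point integers.
The first operation table includes J results in floating-point format calculated according to e^{dw} = e^{δw × S × v2}; the second operation table includes J results in floating-point format calculated according to e^{dh} = e^{δh × S × v2}.
In the embodiments of the present application, it can be seen that calculating the width Wb = e^{dw} × Wa and the height Hb = e^{dh} × Ha of the target candidate region, with dw = δw × S × v2 and dh = δh × S × v2, involves floating-point exponential operations, whose high complexity degrades the computing efficiency of the processor. Since δw and δh are both fixed-point numbers in the range -128 to 127, and S, v1 and v2 are constants, the embodiments of the present application can compute all 256 possible results of e^{δw × S × v2} in advance and store them in memory as the first operation table, and likewise compute all 256 possible results of e^{δh × S × v2} in advance and store them in memory as the second operation table.
During the actual computation, the processor can then obtain the corresponding values of e^{dw} and e^{dh} directly by a look-up table (LUT) operation on the first and second operation tables in memory, so that the floating-point exponential operation is removed from the whole calculation and computing efficiency is improved.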
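A look-up table of this kind is small enough to build once at start-up. The sketch below assumes 8-bit offsets in the range -128 to 127; the names `build_exp_table` and `lookup_exp` are hypothetical, and the same builder would be called once for the first operation table and once for the second.

```python
import math
from typing import List


def build_exp_table(s: float, v2: float, lo: int = -128, hi: int = 127) -> List[float]:
    """Precompute e^(q × S × v2) for every fixed-point value q in [lo, hi]."""
    return [math.exp(q * s * v2) for q in range(lo, hi + 1)]


def lookup_exp(table: List[float], q: int, lo: int = -128) -> float:
    """Replace a run-time exponential with a single table read."""
    return table[q - lo]
```

With `table_w = build_exp_table(S, v2)`, the width becomes `Wb = lookup_exp(table_w, δw) * Wa`, so the per-candidate cost drops to one memory read and one multiplication.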
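Step 206: Sort the N target candidate regions in descending order of score.
Step 207: Calculate, according to the sizes of the target candidate regions, the overlap ratio between each higher-scoring target candidate region and all lower-scoring target candidate regions.
Step 208: Delete each lower-scoring target candidate region whose overlap ratio is greater than the preset overlap ratio threshold.
Step 209: Select M target candidate regions from the remaining target candidate regions.
In the embodiments of the present application, if the overlap ratio between two candidate regions is too large, the two candidate regions are highly similar and keeping both leads to redundant computation. In that case, of two candidate regions whose overlap ratio is too large (for example, greater than 90%), only the one with the larger score needs to be kept. The embodiments of the present application can therefore calculate the overlap ratios between the target candidate regions according to the sizes of the N target candidate regions and, according to the overlap ratios, select M target candidate regions from the N target candidate regions such that the overlap ratio between the M target candidate regions is less than or equal to the preset overlap ratio threshold. The value of M can be set according to actual needs: M can be larger when computing resources are plentiful and smaller when they are not.
Specifically, following the idea of the NMS algorithm, after the target candidate regions are sorted by score, those that overlap a given target candidate region too much are removed from among the target candidate regions that score lower than it.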
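Steps 206 to 208 can be sketched as a plain greedy NMS. The source does not define how the overlap ratio is computed, so intersection-over-union is used here as an assumption, with regions given as `(xc, yc, w, h)`; the function names `overlap_ratio` and `greedy_nms` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (xc, yc, w, h)


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection-over-union of two center-format boxes (an assumed definition)."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], thresh: float) -> List[int]:
    """Return the indices of the surviving boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)  # step 206
    suppressed = [False] * len(boxes)
    keep: List[int] = []
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)
        for j in order[pos + 1:]:                      # step 207: lower-scoring boxes only
            if not suppressed[j] and overlap_ratio(boxes[i], boxes[j]) > thresh:
                suppressed[j] = True                   # step 208: delete the lower scorer
    return keep
```

Step 209 then amounts to taking the first M entries of the returned list, since the survivors are already ordered by score.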
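Optionally, step 209 can be implemented by selecting, among the remaining target candidate regions, starting from the target candidate region with the largest score until M target candidate regions have been selected.
In the embodiments of the present application, a conventional NMS algorithm processes all N target candidate regions in descending order of score: for every pair whose overlap ratio exceeds the preset overlap ratio threshold, the score of the lower-scoring target candidate region is set to 0 (that is, the region is deleted); the scores are then sorted again, and the M highest-scoring target candidate regions are kept as the final candidate regions. In practice, however, NMS can stop as soon as M target candidate regions are found that have not been "suppressed"; any further NMS computation is redundant.
Furthermore, obtaining the M highest-scoring target candidate regions by sorting is not strictly necessary either, because the subsequent algorithms of the fixed-point neural network model that uses the RPN network impose no requirement on the order of these M target candidate regions.
Therefore, refer to FIG. 5, which shows a schematic diagram of an NMS algorithm provided by an embodiment of the present application. In the NMS step, the sorting operation can be replaced by a copy operation. After the scores of all lower-scoring target candidate regions whose overlap ratio exceeds the preset overlap ratio threshold have been set to 0, the 2nd, 5th, (M-3)-th, (M+1)-th, (M+2)-th, (M+4)-th and N-th positions are suppressed (score set to 0). The embodiment of the present application can then copy the retained target candidate regions after the M-th position (those whose score has not been set to 0), in order, into the suppressed "holes" before the M-th position (the positions of the target candidate regions whose score has been set to 0). That is, in order from front to back, the target candidate region at the M-th position is copied to the 2nd position, the one at the (M+3)-th position to the 5th position, and the one at the (M+5)-th position to the (M-3)-th position, so that all suppressed "holes" before the M-th position are filled and the final M target candidate regions are obtained. This approach reduces redundant computation and replaces the sorting operation with a copy operation, which lowers the consumption of computing resources and improves computing efficiency.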
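The early stop and the copy-instead-of-sort idea can be combined as in the sketch below. It is an illustration under assumptions rather than the procedure of FIG. 5 itself: the input is assumed to be already sorted by descending score, the overlap ratio is assumed to be intersection-over-union on `(xc, yc, w, h)` boxes, and the name `nms_first_m` is hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (xc, yc, w, h)


def _iou(a: Box, b: Box) -> float:
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms_first_m(boxes: Sequence[Box], m: int, thresh: float) -> List[Box]:
    """boxes must be sorted by descending score; the result is not score-ordered."""
    n = len(boxes)
    suppressed = [False] * n
    kept, last = 0, -1
    for i in range(n):
        if suppressed[i]:
            continue
        kept, last = kept + 1, i
        if kept == m:
            break                          # early stop: M survivors found
        for j in range(i + 1, n):
            if not suppressed[j] and _iou(boxes[i], boxes[j]) > thresh:
                suppressed[j] = True       # "score set to 0"
    if kept < m:                           # fewer than M survivors exist at all
        return [b for i, b in enumerate(boxes) if not suppressed[i]]
    out = list(boxes[:m])
    holes = [i for i in range(m) if suppressed[i]]
    tail = [i for i in range(m, last + 1) if not suppressed[i]]
    for hole, src in zip(holes, tail):     # copy survivors into the holes, no re-sort
        out[hole] = boxes[src]
    return out
```

Because the loop stops at the M-th survivor, every position up to that point has a final status, and the number of holes among the first M positions always equals the number of survivors found after them. The output is therefore exactly M regions in an arbitrary order, which the source states is acceptable for the later stages.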
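Optionally, after step 209, the method may further include:
Step 210: The feature maps corresponding to the M target candidate regions are pooled and then input to a convolutional neural network model, to obtain the result, output by the convolutional neural network model, of identifying the content of the M target candidate regions.
In the embodiments of the present application, the M candidate regions finally obtained are selected by taking both the score and the overlap ratio into account, and are therefore of high value. Subsequent operations in the convolutional neural network model, such as pooling, classification and regression, are performed on these M candidate regions to output the detected category and position of the target, completing target detection.
In summary, the post-processing method for an RPN network provided by the embodiments of the present application obtains the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions; determines, from all the candidate regions, the N target candidate regions with the largest scores; calculates the size of each target candidate region according to the size of the target prior region corresponding to it and the target offset between the target candidate region and the target prior region; calculates the overlap ratios between the target candidate regions according to their sizes; and selects M target candidate regions from the N target candidate regions according to the overlap ratios, where the overlap ratio between the M target candidate regions is less than or equal to the preset overlap ratio threshold. Based on the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions, the present application can first sort and filter the candidate regions to determine the N target candidate regions with the largest scores, and only then calculate the sizes of the N target candidate regions and consider the overlap ratios between candidate regions. Because the number of the N target candidate regions is far smaller than the number of candidate regions output by the RPN network, the amount of computation involved in calculating candidate region sizes can be greatly reduced, a large amount of redundant computation is eliminated, and the load on the processor is lowered.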
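FIG. 6 is a block diagram of a post-processing apparatus for an RPN network provided by an embodiment of the present application. As shown in FIG. 6, the post-processing apparatus 300 of the RPN network may include: an acquisition module 301 and a processing module 302;
the acquisition module 301 is configured to execute: obtaining the scores of the candidate regions output by the RPN network, and the offsets between the candidate regions and the corresponding prior regions;
the processing module 302 is configured to execute:
determining, from all the candidate regions, the N target candidate regions with the largest scores;
calculating the size of the target candidate region according to the size of the target prior region corresponding to the target candidate region and the target offset between the target candidate region and the target prior region;
calculating, according to the sizes of the target candidate regions, the overlap ratios between the target candidate regions, and selecting, according to the overlap ratios, M target candidate regions from the N target candidate regions, where the overlap ratio between the M target candidate regions is less than or equal to a preset overlap ratio threshold.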
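Optionally, the scores of the candidate regions include: the scores of the candidate regions corresponding to three dimensions respectively, and the index numbers;
the processing module is specifically configured to:
establish a bucket sorting model according to the value range of the scores of the candidate regions and the dimensions, following bucket sorting rules, where the bucket sorting model is used to store the index numbers and to output the N target candidate regions with the largest scores when a preset termination condition is reached;
input the index numbers of the scores of the candidate regions into the bucket sorting model to obtain the N target candidate regions with the largest scores output by the bucket sorting model, and the index numbers corresponding, in the three dimensions, to the score of each target candidate region.
Optionally, the value range of the scores includes K scores; the processing module is specifically configured to execute:
establishing three storage area groups corresponding to the three dimensions to obtain the bucket sorting model;
where each storage area group includes K storage areas, the K storage areas are in one-to-one correspondence with the K scores, and the upper limit of the capacity of each storage area is N; each time an index number is stored in a storage area, the capacity of that storage area is incremented by one.
Optionally, the processing module is specifically configured to execute:
when the score of a target dimension of the candidate region matches the score corresponding to a storage area in the target storage area group corresponding to that target dimension, storing the index number corresponding to the score of the target dimension in the storage area corresponding to that score; the target dimension is any one of the three dimensions of the candidate region;
when the index numbers of all the scores of the candidate regions have been stored, or the capacity of the storage area in the storage area group with the largest score reaches N, determining that the termination condition is reached, and extracting from the storage area groups the index numbers corresponding, in the three dimensions, to the scores of the target candidate regions.
Optionally, the processing module is specifically configured to execute:
when the capacity of the storage area in the storage area group with the largest score reaches N, using the index numbers extracted from the storage area group with the largest score as the index numbers corresponding, in the three dimensions, to the scores of the target candidate regions;
when the index numbers of all the scores of the candidate regions have been stored, extracting the index numbers in each storage area group, in descending order of the scores of the storage area groups, as the index numbers corresponding to the target candidate regions in the three dimensions, until the index numbers corresponding to N target candidate regions in the three dimensions have been obtained.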
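Optionally, the scores of the candidate regions include: the scores of the candidate regions corresponding to three dimensions respectively, and the index numbers; there is a correspondence among the index constructed from the index numbers of the dimensions, the prior region corresponding to the candidate region, and the offset of the candidate region;
the processing module is further configured to execute:
obtaining the target offset and the size of the target prior region corresponding to the target candidate region according to the index constructed from the index numbers, in the three dimensions, of the score of the target candidate region.
Optionally, the offset and the size of the prior region are each stored in a continuous memory slice provided by a single instruction multiple data stream processor; the values of the offset and of the size of the prior region correspond one-to-one with the memory slices;
the processing module is specifically configured to execute:
extracting, according to the index and through a single extraction operation, the target offset and the size of the target prior region corresponding to the target candidate region from the continuous memory slice.
Optionally, the size of the prior region includes: the center point coordinates (Xa, Ya), the width Wa and the height Ha of the prior region; the offset includes: the center point offsets δx and δy, the width offset δw and the height offset δh;
the processing module is specifically configured to execute:
calculating the center point coordinates (Xb, Yb) of the target candidate region according to Formula 1 and Formula 2;
calculating the width Wb and the height Hb of the target candidate region according to Formula 3 and Formula 4;
Formula 1: Xb = δx × S × v1 × Wa + Xa; Formula 2: Yb = δy × S × v1 × Ha + Ya;
Formula 3: Wb = e^{dw} × Wa; Formula 4: Hb = e^{dh} × Ha;
where S is the conversion coefficient from fixed-point numbers to floating-point numbers; v1 is the variance of the center point coordinates of the prior region and the width of the prior region; dw = δw × S × v2; dh = δh × S × v2; and v2 is the variance of the center point coordinates of the prior region and the height of the prior region.
Optionally, when the width Wb and the height Hb are calculated by Formula 3 and Formula 4, the value of e^{dw} is obtained by querying a first operation table and the value of e^{dh} is obtained by querying a second operation table; the value range of δw and δh includes J fixed-point integers;
where the first operation table includes J results in floating-point format calculated according to e^{dw} = e^{δw × S × v2}, and the second operation table includes J results in floating-point format calculated according to e^{dh} = e^{δh × S × v2}.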
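Optionally, the processing module is specifically configured to execute:
sorting the N target candidate regions in descending order of score;
calculating, according to the sizes of the target candidate regions, the overlap ratio between each higher-scoring target candidate region and all lower-scoring target candidate regions;
deleting each lower-scoring target candidate region whose overlap ratio is greater than the preset overlap ratio threshold;
selecting M target candidate regions from the remaining target candidate regions.
Optionally, the processing module is specifically configured to execute:
among the remaining target candidate regions, selecting starting from the target candidate region with the largest score until M target candidate regions have been selected.
Optionally, the processing module is further configured to execute:
pooling the feature maps corresponding to the M target candidate regions and then inputting them to a convolutional neural network model, to obtain the result, output by the convolutional neural network model, of identifying the content of the M target candidate regions.
Optionally, the bucket sorting model, the size of the prior region and the offset are stored in a tightly coupled memory processor.
In summary, the post-processing apparatus for an RPN network provided by the embodiments of the present application obtains the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions; determines, from all the candidate regions, the N target candidate regions with the largest scores; calculates the size of each target candidate region according to the size of the target prior region corresponding to it and the target offset between the target candidate region and the target prior region; calculates the overlap ratios between the target candidate regions according to their sizes; and selects M target candidate regions from the N target candidate regions according to the overlap ratios, where the overlap ratio between the M target candidate regions is less than or equal to the preset overlap ratio threshold. Based on the scores of the candidate regions output by the RPN network and the offsets between the candidate regions and the corresponding prior regions, the present application can first sort and filter the candidate regions to determine the N target candidate regions with the largest scores, and only then calculate the sizes of the N target candidate regions and consider the overlap ratios between candidate regions. Because the number of the N target candidate regions is far smaller than the number of candidate regions output by the RPN network, the amount of computation involved in calculating candidate region sizes can be greatly reduced, a large amount of redundant computation is eliminated, and the load on the processor is lowered.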
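Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the foregoing embodiments of the post-processing method for an RPN network is implemented and the same technical effect can be achieved; to avoid repetition, details are not repeated here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The acquisition module may be an interface through which an external control terminal is connected to the post-processing apparatus of the RPN network. For example, the external control terminal may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a control terminal that has an identification module, an audio input/output (I/O) port, a video I/O port, a headphone port, and so on. The acquisition module may be used to receive input (for example, data information or power) from an external control terminal and transmit the received input to one or more elements within the post-processing apparatus of the RPN network, or may be used to transmit data between the post-processing apparatus of the RPN network and the external control terminal.
For example, at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The processor is the control center of the control terminal. It uses various interfaces and lines to connect the parts of the entire control terminal, and performs the various functions of the control terminal and processes data by running or executing the software programs and/or modules stored in the memory and calling the data stored in the memory, thereby monitoring the control terminal as a whole. The processor may include one or more processing units. Preferably, the processor may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor.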
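The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar among the embodiments, reference may be made to one another.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a control terminal, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, the terminal device (system) and the computer program product according to the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce a control terminal for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction control terminal, and the instruction control terminal implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or terminal device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or terminal device that includes the element.
The present application has been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help understand the method of the present application and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.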

Claims (27)

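  1. A post-processing method for an RPN network, characterized in that the method comprises:
    obtaining the scores of the candidate regions output by the RPN network, and the offsets between the candidate regions and the corresponding prior regions;
    determining, from all the candidate regions, the N target candidate regions with the largest scores;
    calculating the size of the target candidate region according to the size of the target prior region corresponding to the target candidate region and the target offset between the target candidate region and the target prior region;
    calculating, according to the sizes of the target candidate regions, the overlap ratios between the target candidate regions, and selecting, according to the overlap ratios, M target candidate regions from the N target candidate regions, where the overlap ratio between the M target candidate regions is less than or equal to a preset overlap ratio threshold.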
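  2. The method according to claim 1, characterized in that the scores of the candidate regions include: the scores of the candidate regions corresponding to three dimensions respectively, and the index numbers;
    the determining, from all the candidate regions, the N target candidate regions with the largest scores comprises:
    establishing a bucket sorting model according to the value range of the scores of the candidate regions and the dimensions, following bucket sorting rules, where the bucket sorting model is used to store the index numbers and to output the N target candidate regions with the largest scores when a preset termination condition is reached;
    inputting the index numbers of the scores of the candidate regions into the bucket sorting model to obtain the N target candidate regions with the largest scores output by the bucket sorting model, and the index numbers corresponding, in the three dimensions, to the score of each target candidate region.
  3. The method according to claim 2, characterized in that the value range of the scores includes K scores; the establishing a bucket sorting model according to the value range of the scores of the candidate regions and the dimensions, following bucket sorting rules, comprises:
    establishing three storage area groups corresponding to the three dimensions to obtain the bucket sorting model;
    where each storage area group includes K storage areas, the K storage areas are in one-to-one correspondence with the K scores, and the upper limit of the capacity of each storage area is N; each time an index number is stored in a storage area, the capacity of that storage area is incremented by one.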
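  4. The method according to claim 3, characterized in that the inputting the index numbers of the scores of the candidate regions into the bucket sorting model to obtain the N target candidate regions with the largest scores output by the bucket sorting model, and the index numbers corresponding, in the three dimensions, to the score of each target candidate region, comprises:
    when the score of a target dimension of the candidate region matches the score corresponding to a storage area in the target storage area group corresponding to that target dimension, storing the index number corresponding to the score of the target dimension in the storage area corresponding to that score; the target dimension is any one of the three dimensions of the candidate region;
    when the index numbers of all the scores of the candidate regions have been stored, or the capacity of the storage area in the storage area group with the largest score reaches N, determining that the termination condition is reached, and extracting from the storage area groups the index numbers corresponding, in the three dimensions, to the scores of the target candidate regions.
  5. The method according to claim 4, characterized in that the extracting from the storage area groups the index numbers corresponding, in the three dimensions, to the scores of the target candidate regions comprises:
    when the capacity of the storage area in the storage area group with the largest score reaches N, using the index numbers extracted from the storage area group with the largest score as the index numbers corresponding, in the three dimensions, to the scores of the target candidate regions;
    when the index numbers of all the scores of the candidate regions have been stored, extracting the index numbers in each storage area group, in descending order of the scores of the storage area groups, as the index numbers corresponding to the target candidate regions in the three dimensions, until the index numbers corresponding to N target candidate regions in the three dimensions have been obtained.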
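  6. The method according to claim 1, characterized in that the scores of the candidate regions include: the scores of the candidate regions corresponding to three dimensions respectively, and the index numbers; there is a correspondence among the index constructed from the index numbers of the dimensions, the prior region corresponding to the candidate region, and the offset of the candidate region;
    before the calculating the size of the target candidate region according to the size of the target prior region corresponding to the target candidate region and the target offset between the target candidate region and the target prior region, the method further comprises:
    obtaining the target offset and the size of the target prior region corresponding to the target candidate region according to the index constructed from the index numbers, in the three dimensions, of the score of the target candidate region.
  7. The method according to claim 1, characterized in that the offset and the size of the prior region are each stored in a continuous memory slice provided by a single instruction multiple data stream processor; the values of the offset and of the size of the prior region correspond one-to-one with the memory slices;
    the obtaining the target offset and the size of the target prior region corresponding to the target candidate region according to the index constructed from the index numbers, in the three dimensions, of the score of the target candidate region comprises:
    extracting, according to the index and through a single extraction operation, the target offset and the size of the target prior region corresponding to the target candidate region from the continuous memory slice.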
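  8. The method according to claim 1, characterized in that the size of a prior region comprises: center-point coordinates $(X_a, Y_a)$, a width $W_a$ and a height $H_a$ of the prior region; the offset comprises: center-point offsets $\delta_x$ and $\delta_y$, a width offset $\delta_w$ and a height offset $\delta_h$;
    the calculating a size of the target candidate region according to the size of the target prior region corresponding to the target candidate region and the target offset between the target candidate region and the target prior region comprises:
    calculating the center-point coordinates $(X_b, Y_b)$ of the target candidate region according to Formula 1 and Formula 2;
    calculating the width $W_b$ and the height $H_b$ of the target candidate region according to Formula 3 and Formula 4;
    Formula 1: $X_b = \delta_x \times S \times v_1 \times W_a + X_a$; Formula 2: $Y_b = \delta_y \times S \times v_1 \times H_a + Y_a$;
    Formula 3: $W_b = e^{d_w} \times W_a$; Formula 4: $H_b = e^{d_h} \times H_a$;
    wherein $S$ is the conversion coefficient from fixed-point numbers to floating-point numbers; $v_1$ is the variance of the center-point coordinates of the prior region and the width of the prior region; $d_w = \delta_w \times S \times v_2$; $d_h = \delta_h \times S \times v_2$; $v_2$ is the variance of the center-point coordinates of the prior region and the height of the prior region.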
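Formulas 1 to 4 translate directly into code. The sketch below is a plain Python rendering under the assumption that the prior region and the offsets are passed as 4-tuples; the function name `decode_region` and the argument order are illustrative only.

```python
import math


def decode_region(prior, delta, scale, v1, v2):
    """Apply Formulas 1 to 4 to one prior region.

    prior: (Xa, Ya, Wa, Ha) center, width and height of the prior region.
    delta: (dx, dy, dw, dh) fixed-point offsets output by the network.
    scale: S, the fixed-point to floating-point conversion coefficient.
    """
    xa, ya, wa, ha = prior
    dx, dy, dw, dh = delta
    xb = dx * scale * v1 * wa + xa          # Formula 1
    yb = dy * scale * v1 * ha + ya          # Formula 2
    wb = math.exp(dw * scale * v2) * wa     # Formula 3
    hb = math.exp(dh * scale * v2) * ha     # Formula 4
    return xb, yb, wb, hb
```

With all offsets equal to zero the function returns the prior region unchanged, which is a quick sanity check on the signs and factors.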
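  9. The method according to claim 8, characterized in that, in calculating the width $W_b$ and the height $H_b$ through Formula 3 and Formula 4, the value of $e^{d_w}$ is obtained by querying a first operation table and the value of $e^{d_h}$ is obtained by querying a second operation table; the value range of $\delta_w$ and $\delta_h$ comprises J fixed-point integers;
    wherein the first operation table comprises J results in floating-point format calculated according to $e^{d_w} = e^{\delta_w \times S \times v_2}$; the second operation table comprises J results in floating-point format calculated according to $e^{d_h} = e^{\delta_h \times S \times v_2}$.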
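The operation tables of claim 9 amount to precomputing the exponential for every possible fixed-point offset. A minimal sketch follows, assuming that the J offset values form a contiguous integer range such as -128 to 127; the names `build_exp_table` and `lookup_exp` and the example coefficients are hypothetical.

```python
import math


def build_exp_table(delta_min, delta_max, scale, v2):
    """Precompute exp(delta * S * v2) for every integer delta in the range."""
    return [math.exp(d * scale * v2) for d in range(delta_min, delta_max + 1)]


def lookup_exp(table, delta, delta_min):
    """Replace a run-time exp() call with a single table read."""
    return table[delta - delta_min]


# Hypothetical 8-bit offsets: J = 256 entries per table.
table = build_exp_table(-128, 127, scale=1 / 64, v2=0.2)
```

The same builder can fill both the first and the second table. The design trades J stored floating-point values per table for the removal of every exponential evaluation from the decoding loop.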
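  10. The method according to claim 1, characterized in that the calculating overlap rates between the target candidate regions according to the sizes of the target candidate regions, and selecting M target candidate regions from the N target candidate regions according to the overlap rates, comprises:
    sorting the N target candidate regions in descending order of score;
    calculating, according to the sizes of the target candidate regions, the overlap rate between each higher-scoring target candidate region and all lower-scoring target candidate regions;
    deleting the lower-scoring target candidate regions whose overlap rate is greater than the preset overlap-rate threshold;
    selecting M target candidate regions from the remaining target candidate regions.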
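  11. The method according to claim 10, characterized in that the selecting M target candidate regions from the remaining target candidate regions comprises:
    among the remaining target candidate regions, selecting starting from the target candidate region with the highest score, until M target candidate regions have been selected.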
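The selection steps of claims 10 and 11 correspond to a greedy non-maximum suppression. The sketch below assumes that the overlap rate is the intersection over union (IoU) of two boxes given as (center x, center y, width, height), and that a region already deleted does not suppress others; both are assumptions, as are the names `iou` and `select_regions`.

```python
def iou(a, b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax0, ay0, ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx0, by0, bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def select_regions(boxes, scores, m, threshold):
    """Return the indices of up to m boxes kept after suppression."""
    # Sort candidate indices by descending score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept, removed = [], set()
    for pos, i in enumerate(order):
        if i in removed:
            continue
        kept.append(i)
        if len(kept) == m:  # stop as soon as M regions are selected
            break
        # Delete every lower-scoring box that overlaps box i too much.
        for j in order[pos + 1:]:
            if j not in removed and iou(boxes[i], boxes[j]) > threshold:
                removed.add(j)
    return kept
```

Stopping once M regions are kept avoids computing overlap rates for candidates that can no longer be selected.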
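  12. The method according to any one of claims 1 to 11, characterized in that, after the selecting M target candidate regions from the N target candidate regions, the method further comprises:
    performing a pooling operation on the feature maps corresponding to the M target candidate regions and inputting the result into a convolutional neural network model, to obtain a recognition result, output by the convolutional neural network model, for the content of the M target candidate regions.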
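  13. The method according to claim 2, characterized in that the bucket-sort model, the sizes of the prior regions and the offsets are stored in a tightly coupled memory processor.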
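  14. A post-processing device for an RPN network, characterized in that the device comprises: an obtaining module and a processing module;
    the obtaining module is configured to obtain scores of candidate regions output by the RPN network, and offsets between the candidate regions and their corresponding prior regions;
    the processing module is configured to determine, from all of the candidate regions, the N target candidate regions with the highest scores;
    calculate a size of each target candidate region according to a size of the target prior region corresponding to the target candidate region and a target offset between the target candidate region and the target prior region;
    calculate overlap rates between the target candidate regions according to the sizes of the target candidate regions, and select M target candidate regions from the N target candidate regions according to the overlap rates, wherein the overlap rates between the M target candidate regions are less than or equal to a preset overlap-rate threshold.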
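  15. The device according to claim 14, characterized in that the score of a candidate region comprises: scores of the candidate region respectively corresponding to three dimensions, and index numbers;
    the processing module is specifically configured to:
    establish a bucket-sort model according to the value range of the scores of the candidate regions and the dimensions, following a bucket-sort rule, wherein the bucket-sort model is used to store the index numbers and to output the N target candidate regions with the highest scores when a preset termination condition is reached;
    input the index numbers of the scores of the candidate regions into the bucket-sort model to obtain the N target candidate regions with the highest scores output by the bucket-sort model, and the index numbers corresponding to the score of each target candidate region in the three dimensions.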
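  16. The device according to claim 15, characterized in that the value range of the scores comprises K scores; the processing module is specifically configured to perform:
    establishing three storage-area groups corresponding to the three dimensions to obtain the bucket-sort model;
    wherein each storage-area group comprises K storage areas, the K storage areas are in one-to-one correspondence with the K scores, and the capacity upper limit of each storage area is N; when one index number is stored into a storage area, the capacity of that storage area increases by one.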
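  17. The device according to claim 16, characterized in that the processing module is specifically configured to perform:
    when the score of a target dimension of a candidate region matches the score corresponding to a storage area in the target storage-area group corresponding to the target dimension, storing the index number corresponding to the score of the target dimension into the storage area corresponding to that score, wherein the target dimension is any one of the three dimensions of the candidate region;
    when the index numbers of all scores of the candidate regions have been stored, or when the capacity of the storage area in the storage-area group with the highest score reaches N, determining that the termination condition is reached, and extracting, from the storage-area groups, the index numbers corresponding to the scores of the target candidate regions in the three dimensions.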
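  18. The device according to claim 17, characterized in that the processing module is specifically configured to perform:
    when the capacity of the storage area in the storage-area group with the highest score reaches N, taking the index numbers extracted from the storage-area group with the highest score as the index numbers corresponding to the scores of the target candidate regions in the three dimensions;
    when the index numbers of all scores of the candidate regions have been stored, extracting the index numbers in each storage-area group, in descending order of the scores of the storage-area groups, as the index numbers corresponding to the target candidate regions in the three dimensions, until the index numbers corresponding to N target candidate regions in the three dimensions have been extracted.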
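  19. The device according to claim 14, characterized in that the score of a candidate region comprises: scores of the candidate region respectively corresponding to three dimensions, and index numbers; there is a correspondence among the index constructed from the index numbers of the dimensions, the prior region corresponding to the candidate region, and the offset of the candidate region;
    the processing module is further configured to perform:
    obtaining the target offset and the size of the target prior region corresponding to the target candidate region according to the index constructed from the index numbers of the score of the target candidate region in the three dimensions.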
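  20. The device according to claim 14, characterized in that the offsets and the sizes of the prior regions are respectively stored in contiguous memory areas provided by a single-instruction multiple-data (SIMD) processor; the values of the offsets and of the sizes of the prior regions are in one-to-one correspondence with the memory areas;
    the processing module is specifically configured to perform:
    extracting, according to the index, the target offset and the size of the target prior region corresponding to the target candidate region from the contiguous memory areas through a single fetch operation.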
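  21. The device according to claim 14, characterized in that the size of a prior region comprises: center-point coordinates $(X_a, Y_a)$, a width $W_a$ and a height $H_a$ of the prior region; the offset comprises: center-point offsets $\delta_x$ and $\delta_y$, a width offset $\delta_w$ and a height offset $\delta_h$;
    the processing module is specifically configured to perform:
    calculating the center-point coordinates $(X_b, Y_b)$ of the target candidate region according to Formula 1 and Formula 2;
    calculating the width $W_b$ and the height $H_b$ of the target candidate region according to Formula 3 and Formula 4;
    Formula 1: $X_b = \delta_x \times S \times v_1 \times W_a + X_a$; Formula 2: $Y_b = \delta_y \times S \times v_1 \times H_a + Y_a$;
    Formula 3: $W_b = e^{d_w} \times W_a$; Formula 4: $H_b = e^{d_h} \times H_a$;
    wherein $S$ is the conversion coefficient from fixed-point numbers to floating-point numbers; $v_1$ is the variance of the center-point coordinates of the prior region and the width of the prior region; $d_w = \delta_w \times S \times v_2$; $d_h = \delta_h \times S \times v_2$; $v_2$ is the variance of the center-point coordinates of the prior region and the height of the prior region.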
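  22. The device according to claim 21, characterized in that, in calculating the width $W_b$ and the height $H_b$ through Formula 3 and Formula 4, the value of $e^{d_w}$ is obtained by querying a first operation table and the value of $e^{d_h}$ is obtained by querying a second operation table; the value range of $\delta_w$ and $\delta_h$ comprises J fixed-point integers;
    wherein the first operation table comprises J results in floating-point format calculated according to $e^{d_w} = e^{\delta_w \times S \times v_2}$; the second operation table comprises J results in floating-point format calculated according to $e^{d_h} = e^{\delta_h \times S \times v_2}$.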
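  23. The device according to claim 14, characterized in that the processing module is specifically configured to perform:
    sorting the N target candidate regions in descending order of score;
    calculating, according to the sizes of the target candidate regions, the overlap rate between each higher-scoring target candidate region and all lower-scoring target candidate regions;
    deleting the lower-scoring target candidate regions whose overlap rate is greater than the preset overlap-rate threshold;
    selecting M target candidate regions from the remaining target candidate regions.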
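  24. The device according to claim 23, characterized in that the processing module is specifically configured to perform:
    among the remaining target candidate regions, selecting starting from the target candidate region with the highest score, until M target candidate regions have been selected.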
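  25. The device according to any one of claims 14 to 24, characterized in that the processing module is further configured to perform:
    performing a pooling operation on the feature maps corresponding to the M target candidate regions and inputting the result into a convolutional neural network model, to obtain a recognition result, output by the convolutional neural network model, for the content of the M target candidate regions.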
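  26. The device according to claim 15, characterized in that the bucket-sort model, the sizes of the prior regions and the offsets are stored in a tightly coupled memory processor.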
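  27. A computer-readable storage medium, characterized by comprising instructions which, when run on a computer, cause the computer to execute the post-processing method for an RPN network according to any one of claims 1 to 14.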
PCT/CN2021/080811 2021-03-15 2021-03-15 Post-processing method and device for an RPN network WO2022193074A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/080811 WO2022193074A1 (zh) 2021-03-15 2021-03-15 Post-processing method and device for an RPN network

Publications (1)

Publication Number Publication Date
WO2022193074A1 (zh) 2022-09-22

Family

ID=83321758

Country Status (1)

Country Link
WO (1) WO2022193074A1 (zh)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21930671

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21930671

Country of ref document: EP

Kind code of ref document: A1