CN111626916B - Information processing method, device and equipment - Google Patents
- Publication number
- CN111626916B (application CN202010484303.0A)
- Authority
- CN
- China
- Prior art keywords
- objects
- processing device
- processing
- thread
- task execution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
Abstract
The invention discloses an information processing method, device and equipment. The method includes: a plurality of first thread blocks of a first processing device respectively acquire a corresponding object subset in an object set, and obtain a first object in the corresponding object subset by processing a plurality of objects included in the corresponding object subset, wherein the plurality of object subsets corresponding to the plurality of first thread blocks are contained in the object set; a second thread block of the first processing device obtains a second object in the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks; and the plurality of first thread blocks respectively filter the objects in their corresponding object subsets based on the second object obtained by the second thread block, to obtain updated corresponding object subsets, wherein the updated object subsets corresponding to the plurality of first thread blocks are contained in the updated object set.
Description
Technical Field
The present disclosure relates to computer vision, and in particular, to an information processing method, apparatus, and device.
Background
In the process of target detection through a neural network model, a large number of candidate boxes are generated. These candidate boxes are currently typically screened using non-maximum suppression (NMS).
As target detection demands grow and detection scenes become more diverse and complex, how to improve the efficiency of non-maximum suppression processing is a problem that needs to be solved.
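For context, the conventional sequential screening that this disclosure aims to parallelize can be sketched as follows. This is a minimal illustration only; the (x1, y1, x2, y2) box format, the IoU overlap measure, and the 0.5 threshold are assumptions, not details taken from the patent:

```python
def iou(a, b):
    # intersection-over-union of two (x1, y1, x2, y2) boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    # greedy sequential NMS: repeatedly keep the highest-scoring box and
    # discard every remaining box that overlaps it beyond the threshold
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep
```

The outer loop is inherently serial, which is precisely the bottleneck a thread-block-parallel scheme targets.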
Disclosure of Invention
The present disclosure provides an information processing scheme.
According to an aspect of the present disclosure, there is provided an information processing method including: a plurality of first thread blocks of a first processing device respectively acquire a corresponding object subset in an object set, and a first object in the corresponding object subset is obtained by processing a plurality of objects included in the corresponding object subset, wherein the plurality of object subsets corresponding to the plurality of first thread blocks are contained in the object set; a second thread block of the first processing device obtains a second object in the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks; and the plurality of first thread blocks respectively filter a plurality of objects in the corresponding object subsets based on the second object obtained by the second thread block, to obtain updated corresponding object subsets, wherein the updated object subsets corresponding to the plurality of first thread blocks are contained in the updated object set.
In connection with any one of the embodiments provided in this disclosure, each object included in the subset of objects is processed by a separate thread in the first thread block.
In combination with any one of the embodiments provided in the present disclosure, the filtering processing is performed on a plurality of objects in a corresponding subset of objects by using the plurality of first thread blocks based on the second object obtained by the second thread block, to obtain an updated corresponding subset of objects, including: the first thread block determines a weight coefficient of each object in a plurality of objects contained in the corresponding object subset based on the second object; the first thread block filters the plurality of objects based on a weight coefficient of each object in the plurality of objects.
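One plausible reading of this weight-coefficient filtering is a soft-NMS-style rescoring, sketched below. The Gaussian decay, the `sigma` parameter, and the score threshold are illustrative assumptions rather than the patent's actual rule:

```python
import math

def weight_coefficient(overlap, sigma=0.5):
    # soft-NMS-style Gaussian decay: the more an object overlaps the second
    # object, the smaller its weight coefficient
    return math.exp(-(overlap ** 2) / sigma)

def filter_subset(subset, overlaps, scores, sigma=0.5, score_thresh=0.2):
    # a first thread block rescales each object's score by its weight
    # coefficient and keeps only objects still above a preset threshold
    kept = []
    for obj in subset:
        if scores[obj] * weight_coefficient(overlaps[obj], sigma) >= score_thresh:
            kept.append(obj)
    return kept
```

Because each object's decision depends only on its own overlap with the second object, every object in a subset can be handled by a separate thread, matching the one-thread-per-object mapping described above.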
In combination with any one of the embodiments provided in the present disclosure, the processing a plurality of objects included in the corresponding object subset to obtain a first object in the corresponding object subset includes: in response to receiving a first task execution instruction sent by a second processing device, the plurality of first thread blocks respectively obtain a first object in a corresponding object subset by processing a plurality of objects included in the corresponding object subset, wherein the first task execution instruction comprises the plurality of object subsets; the method further comprises the steps of: the first processing device sends a first task execution response to the second processing device, wherein the first task execution response comprises the first objects respectively obtained by the plurality of first thread blocks.
In combination with any one of the embodiments provided in the present disclosure, the processing, by the second thread block of the first processing device, of the plurality of first objects obtained by the plurality of first thread blocks, to obtain a second object of the plurality of first objects includes: in response to receiving a second task execution instruction sent by the second processing device, the second thread block obtains a second object in the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks, wherein the second task execution instruction comprises the plurality of first objects; the method further comprises the steps of: and the first processing equipment sends a second task execution response to the second processing equipment, wherein the second task execution response comprises the second object obtained by the second thread block.
In combination with any one of the embodiments provided in the present disclosure, the filtering processing, by the plurality of first thread blocks, on the basis of the second objects obtained by the second thread blocks, the plurality of objects in the corresponding object subsets includes: in response to receiving a third task execution instruction sent by the second processing device, the plurality of first thread blocks respectively perform filtering processing on a plurality of objects in a corresponding object subset based on the second object obtained by the second thread block, wherein the third task execution instruction comprises the second object; the method further comprises the steps of: the first processing device sends a third task execution response to the second processing device, the third task execution response comprising the filtered subset of the plurality of objects.
In connection with any one of the embodiments provided by the present disclosure, the method further comprises: the first processing device performs data synchronization among the plurality of first thread blocks and/or inside the second thread blocks in a memory fence mode.
In connection with any one of the embodiments provided by the present disclosure, the method further comprises: the first processing device establishes a memory fence; each time the first processing device detects that one of the plurality of first thread blocks has completed a write operation to a memory, an accumulation count is incremented, wherein the write operation writes the first object obtained by that first thread block into the memory; and when the count value reaches the number of the plurality of first thread blocks, the second thread block of the first processing device reads the plurality of first objects from the memory.
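The fence-and-counter handshake described above can be simulated on the host side, with Python threads standing in for thread blocks. This is a rough analogue under stated assumptions; a real GPU kernel would use a memory fence instruction and an atomic add rather than a lock:

```python
import threading

class BlockResults:
    # emulates the memory region written by the first thread blocks,
    # guarded by an accumulation counter
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.results = [None] * num_blocks
        self.count = 0
        self.lock = threading.Lock()
        self.all_written = threading.Event()

    def write(self, block_id, first_object):
        # a first thread block writes its first object, then bumps the count;
        # the lock plays the role of the fence plus atomic increment
        with self.lock:
            self.results[block_id] = first_object
            self.count += 1
            if self.count == self.num_blocks:
                self.all_written.set()

    def read_all(self, timeout=5.0):
        # the second thread block reads only once the count has reached
        # the number of first thread blocks
        self.all_written.wait(timeout)
        return list(self.results)
```

The point of the counter is that the second thread block never observes a partially written result array, which is exactly what the memory fence guarantees on the device.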
In connection with any of the embodiments provided in the present disclosure, the first processing device includes a GPU and the second processing device includes a CPU.
According to an aspect of the present disclosure, there is provided an information processing method including: the second processing device obtains an object set, the object set comprising a plurality of object subsets, each object subset comprising a plurality of objects; sending a first task execution instruction to a first processing device, the first task execution instruction comprising the plurality of object subsets; and receiving a first task execution response sent by the first processing device, wherein the first task execution response comprises results of the first processing device processing the plurality of object subsets through a plurality of thread blocks.
In connection with any of the embodiments provided herein, each object included in the subset of objects is assigned to a separate thread in the thread block for processing.
In combination with any one of the embodiments provided in the present disclosure, the first task execution response includes a first object obtained from a corresponding subset of objects by each of a plurality of first thread blocks of the first processing device.
In connection with any one of the embodiments provided by the present disclosure, the method further comprises: sending a second task execution instruction to the first processing device, wherein the second task execution instruction comprises a plurality of first objects obtained by the plurality of first thread blocks; and receiving a second task execution response sent by the first processing device, wherein the second task execution response comprises the second object obtained by a second thread block of the first processing device.
In connection with any one of the embodiments provided by the present disclosure, the method further comprises: sending a third task execution instruction to the first processing device, wherein the third task execution instruction comprises the second object obtained by the second thread block; and receiving a third task execution response sent by the first processing device, wherein the third task execution response comprises the plurality of filtered object subsets.
In combination with any one of the embodiments provided in the present disclosure, the first task execution instruction is configured to instruct the first processing device to perform data synchronization inside a thread block and/or between thread blocks by using a memory fence; the first task execution response includes the updated subset of the plurality of objects obtained by filtering the subset of the plurality of objects by the first processing device.
According to an aspect of the present disclosure, there is provided an information processing apparatus including: an obtaining unit, configured to enable a plurality of first thread blocks of a first processing device to obtain corresponding object subsets in an object set, respectively, and obtain a first object in the corresponding object subsets by processing a plurality of objects included in the corresponding object subsets, where the plurality of object subsets corresponding to the plurality of first thread blocks are included in the object set; the processing unit is used for enabling a second thread block of the first processing device to obtain a second object in the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks; and the filtering unit is used for enabling the plurality of first thread blocks to filter a plurality of objects in the corresponding object subsets based on the second objects obtained by the second thread blocks to obtain updated corresponding object subsets, wherein the updated object subsets corresponding to the plurality of first thread blocks are contained in the updated object sets.
In connection with any one of the embodiments provided in this disclosure, each object included in the subset of objects is processed by a separate thread in the first thread block.
In combination with any one of the embodiments provided in the present disclosure, the filtering unit is specifically configured to: the first thread block determines a weight coefficient of each object in a plurality of objects contained in the corresponding object subset based on the second object; the first thread block filters the plurality of objects based on a weight coefficient of each object in the plurality of objects.
In combination with any one of the embodiments provided in the present disclosure, the obtaining unit is specifically configured to: in response to receiving a first task execution instruction sent by a second processing device, the plurality of first thread blocks respectively obtain a first object in a corresponding object subset by processing a plurality of objects included in the corresponding object subset, wherein the first task execution instruction comprises the plurality of object subsets; the device further comprises a first sending unit, configured to enable the first processing device to send a first task execution response to the second processing device, where the first task execution response includes the first objects obtained by the plurality of first thread blocks respectively.
In combination with any one of the embodiments provided in the present disclosure, the processing unit is specifically configured to: in response to receiving a second task execution instruction sent by the second processing device, the second thread block obtains a second object in the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks, wherein the second task execution instruction comprises the plurality of first objects; the device further comprises a second sending unit, configured to send, by the first processing device, a second task execution response to the second processing device, where the second task execution response includes the second object obtained by the second thread block.
In combination with any one of the embodiments provided in the present disclosure, the filtering unit is specifically configured to: in response to receiving a third task execution instruction sent by the second processing device, the plurality of first thread blocks respectively perform filtering processing on a plurality of objects in a corresponding object subset based on the second object obtained by the second thread block, wherein the third task execution instruction comprises the second object; the apparatus further includes a third sending unit configured to send, by the first processing device, a third task execution response to the second processing device, where the third task execution response includes the filtered subset of the plurality of objects.
In combination with any one of the embodiments provided in the present disclosure, the apparatus further includes a first synchronization unit, configured to synchronize data between the plurality of first thread blocks and/or inside the second thread blocks by using a memory fence by using the first processing device.
In combination with any one of the embodiments provided in the present disclosure, the apparatus further includes a second synchronization unit, configured to cause the first processing device to establish a memory fence; each time the first processing device detects that one of the plurality of first thread blocks has completed a write operation to a memory, an accumulation count is incremented, wherein the write operation writes the first object obtained by that first thread block into the memory; and when the count value reaches the number of the plurality of first thread blocks, the second thread block of the first processing device reads the plurality of first objects from the memory.
In connection with any of the embodiments provided in the present disclosure, the first processing device includes a GPU and the second processing device includes a CPU.
According to an aspect of the present disclosure, there is provided an information processing apparatus including: an acquisition unit configured to cause a second processing device to acquire an object set including a plurality of object subsets, each object subset including a plurality of objects; a sending unit, configured to send a first task execution instruction to a first processing device, where the first task execution instruction includes the plurality of object subsets; and a receiving unit, configured to receive a first task execution response sent by the first processing device, where the first task execution response includes results of the first processing device processing the plurality of object subsets through a plurality of thread blocks.
In connection with any of the embodiments provided herein, each object included in the subset of objects is assigned to a separate thread in the thread block for processing.
In combination with any one of the embodiments provided in the present disclosure, the first task execution response includes a first object obtained from a corresponding subset of objects by each of a plurality of first thread blocks of the first processing device.
In combination with any one of the embodiments provided in the present disclosure, the apparatus further includes a second sending unit, configured to send a second task execution instruction to the first processing device, where the second task execution instruction includes a plurality of the first objects obtained by the plurality of first thread blocks; and receiving a second task execution response sent by the first processing device, wherein the second task execution response comprises the second object obtained by a second thread block of the first processing device.
In combination with any one of the embodiments provided in the present disclosure, the apparatus further includes a third sending unit, configured to send a third task execution instruction to the first processing device, where the third task execution instruction includes the second object obtained by the second thread block; and receive a third task execution response sent by the first processing device, where the third task execution response includes the plurality of filtered object subsets.
In combination with any one of the embodiments provided in the present disclosure, the first task execution instruction is configured to instruct the first processing device to perform data synchronization inside a thread block and/or between thread blocks by using a memory fence; the first task execution response includes the updated subset of the plurality of objects obtained by filtering the subset of the plurality of objects by the first processing device.
According to an aspect of the present disclosure, there is provided an information processing apparatus including a processor and a memory for storing computer instructions executable on the processor, wherein the processor is configured to execute the computer instructions to implement the information processing method according to any embodiment of the present disclosure.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a first processing device, implements the information processing method of any of the embodiments of the present disclosure, and when executed by a second processing device, implements the information processing method of any of the embodiments of the present disclosure.
According to an aspect of the present disclosure, there is provided a computer system including a first processing device and a second processing device, wherein the first processing device and the second processing device are information processing apparatuses according to any embodiment of the present disclosure; or, the first processing device and the second processing device are processors for implementing the information processing method according to any embodiment of the disclosure.
According to the information processing method, device and equipment provided by any embodiment of the disclosure, a plurality of first thread blocks of a first processing device respectively acquire a corresponding object subset in an object set, and obtain a first object in the corresponding object subset by processing a plurality of objects included in the corresponding object subset; a second thread block of the first processing device obtains a second object in the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks; and the plurality of first thread blocks respectively filter the objects in their corresponding object subsets based on the second object to obtain updated corresponding object subsets, thereby updating the object set in parallel and improving processing efficiency.
Drawings
In order to more clearly illustrate the technical solutions in one or more embodiments of the present specification or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings in the following description are merely some of the embodiments described in one or more embodiments of the present specification, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a flow chart of an information processing method provided by at least one embodiment of the present disclosure;
FIG. 2A is a schematic diagram illustrating a process of obtaining a second object in an information processing method according to at least one embodiment of the present disclosure;
FIG. 2B is a schematic diagram illustrating a process of filtering a plurality of objects in a subset of objects in an information processing method according to at least one embodiment of the present disclosure;
FIG. 3 is a flow chart of another information processing method provided by at least one embodiment of the present disclosure;
FIG. 4 is a flow chart of another information processing method provided by at least one embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an information processing apparatus according to at least one embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of another information processing apparatus according to at least one embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device according to at least one embodiment of the present disclosure.
Detailed Description
In order to enable a person skilled in the art to better understand the technical solutions in one or more embodiments of the present specification, the technical solutions will be described below clearly and completely with reference to the drawings in one or more embodiments of the present specification. Apparently, the described embodiments are merely some rather than all of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art based on one or more embodiments of the present specification without creative effort shall fall within the scope of the present disclosure.
Fig. 1 is a flowchart of an information processing method according to at least one embodiment of the present disclosure. As shown in fig. 1, the method includes steps 101 to 103.
In step 101, a plurality of first thread blocks of a first processing device respectively acquire a corresponding subset of objects in an object set, and a first object in the corresponding subset of objects is obtained by processing a plurality of objects included in the corresponding subset of objects, where the plurality of subsets of objects corresponding to the plurality of first thread blocks are included in the object set.
In the disclosed embodiments, the first processing device may include a graphics processing unit (GPU) or another processor with parallel processing capability, such as various types of AI processors, but the disclosed embodiments are not limited in this regard.
In the first processing device, the smallest logical execution unit is a thread; several threads form a thread block (block), and a thread block is loaded onto the hardware resources of the first processing device to run.
In the disclosed embodiment, the object set includes a plurality of processing objects of an information processing task, and the type of the objects may vary with the scenario and the information processing task. For example, in a scenario in which target detection is performed on an image through a neural network model, the object set may be the plurality of candidate boxes obtained by the neural network model, that is, a candidate box set; accordingly, the information processing task is to perform non-maximum suppression on the candidate box set.
The object set may be divided into a plurality of object subsets according to the number of objects it contains, each object subset including a plurality of objects. The specific division manner may be determined based on different requirements, for example, even distribution or distribution based on a specific policy, and may be specified by a user or by a preset rule; the embodiments of the present disclosure are not limited in this respect. The plurality of object subsets contained in the object set are distributed to different thread blocks of the first processing device, so that each thread block acquires a corresponding object subset in the object set and can process it. By processing the plurality of objects contained in the corresponding object subset, a thread block can obtain the first object in that subset. The first object may be one or more objects selected, according to the information processing task, from the plurality of objects included in the object subset. For example, the first object may be an object in the subset whose parameter value meets a specific condition (such as being the maximum or minimum); alternatively, the thread block may determine a score for each object in the subset based on a certain rule, for example a correlation between the object and a reference object, and then determine the object with the largest score as the first object. In this step, for ease of description, a thread block that acquires an object subset is referred to as a first thread block.
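The partitioning and per-block selection of step 101 can be sketched sequentially as follows. The round-robin split and the score-maximum selection rule are one possible policy, consistent with but not mandated by the text:

```python
def split_into_subsets(objects, num_blocks):
    # one simple division policy: round-robin assignment of objects to blocks
    return [objects[i::num_blocks] for i in range(num_blocks)]

def local_first_object(subset, score):
    # within a thread block each object would be handled by its own thread;
    # sequentially, the first object is simply the highest-scoring member
    return max(subset, key=score)
```

On a GPU, each subset would typically reside in a block's shared memory and the maximum would be found by a parallel reduction; the selection here is sequential for clarity.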
In step 102, a second thread block of the first processing device obtains a second object of the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks.
The second object can be obtained by processing the plurality of first objects obtained by the respective first thread blocks. The second object may be one or more objects selected from the plurality of first objects according to the information processing task, for example, an object among the plurality of first objects that satisfies a specific condition, such as a certain parameter value satisfying the condition or a result computed from the object satisfying the condition. As an example, the second object may be the object with the largest score among the plurality of first objects, but the embodiments of the present disclosure are not limited thereto.
In step 103, the plurality of first thread blocks respectively perform filtering processing on a plurality of objects in a corresponding object subset based on the second object obtained by the second thread block, so as to obtain an updated corresponding object subset, where the updated plurality of object subsets corresponding to the plurality of first thread blocks are included in the updated object set.
Each first thread block in the plurality of first thread blocks acquires the second object and filters the plurality of objects in its corresponding object subset according to the second object to obtain an updated object subset. As each object subset corresponding to a first thread block is filtered into an updated object subset, the object set as a whole is updated. The manner in which each first thread block filters its corresponding object subset according to the second object may be determined according to the information processing task. As one example, a certain parameter value or a specific calculation result of each object in the object subset may be compared with that of the second object, and the object is retained or filtered according to the relation between the two values. As another example, the parameter value or the specific calculation result of each object in the object subset may be updated based on the second object, and whether to retain or filter the object is determined based on the updated result and a preset threshold, but the embodiments of the disclosure are not limited thereto.
In the embodiment of the disclosure, a plurality of first thread blocks of a first processing device respectively acquire a corresponding object subset in an object set and obtain a first object in the corresponding object subset by processing the plurality of objects it includes; a second thread block of the first processing device obtains a second object among the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks; and the plurality of first thread blocks respectively filter the plurality of objects in their corresponding object subsets based on the second object obtained by the second thread block, obtaining the updated corresponding object subsets. The object set is thus updated in parallel, which improves processing efficiency.
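The three steps above can be sketched as a sequential Python simulation (function and field names such as `update_object_set` and `"score"` are illustrative assumptions; real thread blocks would execute the subset reductions in parallel):

```python
def update_object_set(subsets, keep):
    """One round of the update, simulated sequentially.

    subsets: list of object subsets, one per first thread block.
    keep:    predicate deciding whether an object survives filtering
             against the second object.
    """
    # Step 101: each first thread block reduces its subset to a first object.
    firsts = [max(subset, key=lambda obj: obj["score"]) for subset in subsets]
    # Step 102: the second thread block reduces the first objects to the
    # globally best one, the second object.
    second = max(firsts, key=lambda obj: obj["score"])
    # Step 103: every first thread block filters its own subset against
    # the second object, yielding the updated object set.
    updated = [[obj for obj in subset if keep(second, obj)] for subset in subsets]
    return second, updated
```

A caller would supply a task-specific `keep` predicate, for example one that compares each object's score against a threshold derived from the second object.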
The information processing method proposed by at least one embodiment of the present disclosure is described below by taking a GPU to perform non-maximum suppression processing on a candidate frame set obtained in a target detection process as an example. Those skilled in the art will appreciate that the present method may also be applied to other scenarios, not just those described below.
During target detection, a series of candidate boxes is generated, each containing a location parameter and a confidence score (Confidence Score). From the generated series of candidate boxes, a candidate box set (object set) may be obtained, which may be part or all of the generated candidate boxes. The candidate box set is divided into a number of candidate box subsets (object subsets) according to the number of candidate boxes it contains, each candidate box subset comprising a plurality of candidate boxes (objects).
Taking the example that the candidate frame set includes 4 candidate frame subsets, each including 4 candidate frames, and referring to the processing procedure shown in FIG. 2A: 4 first thread blocks 201 in the GPU respectively obtain one candidate frame subset (object subset) 211, and each first thread block 201 may obtain, by comparing the confidence scores of the 4 candidate frames included in its candidate frame subset 211, the candidate frame with the highest confidence score in the subset, that is, the first object 212.
The second thread block 202 acquires the first objects 212 in the 4 first thread blocks 201, and compares the confidence scores of the acquired 4 first objects 212 to obtain a candidate frame with the highest confidence score, namely, a second object 213. The second object 213 is the candidate box with the highest confidence in the entire set of candidate boxes.
The next processing procedure is shown in FIG. 2B: the 4 first thread blocks 201 respectively obtain the second object 213 and respectively filter their candidate frame subsets 211 based on the second object 213 to obtain updated candidate frame subsets 214. The updated candidate frame subsets 214 are included in the updated candidate frame set; that is, filtering the entire candidate frame set is implemented by filtering each candidate frame subset.
In some embodiments, all of the plurality of objects included in the object subset are processed by the same thread in the first thread block; in this case, the first thread block may include only one thread. Alternatively, some of the objects in the object subset are processed by the same thread in the first thread block; in this case, two or more objects processed by the same thread may be processed in a certain order, while processing between different threads is performed in parallel, but the embodiments of the disclosure are not limited thereto.
In some embodiments, each object included in the subset of objects is processed by a separate thread in the first thread block.
The objects contained in the object subset are distributed to different threads in the first thread block, so that each thread obtains one object and each object can be processed by a separate thread. This parallelizes the process of obtaining the first object and improves information processing efficiency.
In some embodiments, the filtering of the plurality of objects in the subset of objects may be accomplished in the following manner.
First, the first thread block determines a weight coefficient of each object of a plurality of objects included in a corresponding subset of objects based on the second object.
Each thread of the first thread block is caused to obtain the second object by assigning the second object to each thread. In the previous processing, each thread obtained an object in the object subset; according to a certain parameter value of that object and the parameter value of the second object, a weight coefficient of the object can be determined. For example, the parameter values of the object and of the second object may be taken as the input of a weight function whose output is smaller than 1. Taking non-maximum suppression of the candidate frame set as an example, the intersection-over-union between the globally best candidate frame (the second object) and the candidate frame (the object) held by a thread in the first thread block is determined first, and the obtained intersection-over-union is used as the input of the weight function to obtain the weight coefficient corresponding to the candidate frame.
Next, the first thread block performs filtering processing on the plurality of objects based on the weight coefficient of each of the plurality of objects.
In one example, the weight coefficient of the object may be multiplied by a certain parameter value of the object to obtain an updated parameter value, and whether to retain the object is determined by judging whether the updated parameter value satisfies a set condition. For example, after the weight coefficient corresponding to a candidate frame is obtained, it may be multiplied by the confidence score of the candidate frame to obtain an updated confidence score. In the process shown in FIG. 2B, the gray filled circles contained in object subset 215 represent candidate boxes whose confidence scores have been updated. In the event that the updated confidence score is less than a predetermined retention threshold, the candidate box is not retained, i.e., it no longer participates in subsequent calculations. As shown in FIG. 2B, circles with a slash represent candidate boxes that are not retained.
In the embodiment of the disclosure, the recall rate of the target object is improved by determining the weight coefficient of each object according to the second object and filtering the plurality of objects in the object subset according to the weight coefficient.
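This weighting scheme can be sketched in Python as follows. The Gaussian weight function, the `sigma` value, and the retention threshold are illustrative assumptions; the disclosure only requires that the weight function output a value below 1:

```python
import math

def iou(a, b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0.0 else 0.0

def rescore(second_box, box, score, sigma=0.5, keep_thresh=0.05):
    # Weight coefficient: a Gaussian of the IoU with the globally best
    # candidate box (the second object), so the output never exceeds 1.
    weight = math.exp(-iou(second_box, box) ** 2 / sigma)
    new_score = weight * score
    # Retain the box only while the updated confidence score stays at or
    # above the retention threshold.
    return new_score, new_score >= keep_thresh
```

A box far from the second object keeps its score almost unchanged, while a heavily overlapping box is down-weighted and may fall below the retention threshold.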
The information processing method provided by the embodiments of the present disclosure may be divided into two parts as a whole: the second object is obtained by executing steps 101 to 102, and the updated object set is obtained by executing step 103. The process of obtaining the second object and updating the object set is performed repeatedly until all objects to be processed in the task have been traversed.
The information processing method provided by the embodiments of the disclosure is implemented through the GPU. Because operations between thread blocks of the GPU are asynchronous, the problem of how to synchronize data between blocks when data are updated needs to be considered.
In some embodiments, the first processing device updates and synchronizes data by receiving a call from a second processing device and sending a processing result to the second processing device.
In the disclosed embodiments, the second processing device may include a central processing unit (Central Processing Unit, CPU) as well as other processors having similar processing capabilities.
Fig. 3 shows a flow chart of an information processing method proposed by at least one embodiment of the present disclosure. As shown in fig. 3, the method may include:
in step 301, a second processing device obtains a set of objects.
Wherein the set of objects comprises a plurality of subsets of objects, each subset of objects comprising a plurality of objects.
In step 302, the second processing device sends a first task execution instruction to the first processing device.
Wherein the first task execution instruction includes the plurality of subsets of objects.
In one example, the first task execution instruction is configured to call the GPU kernel, allocate a plurality of object subsets to a plurality of first thread blocks of the first processing device, respectively, and allocate a plurality of objects included in the object subsets to a plurality of threads in the first thread blocks, respectively, so that each thread in each first thread block obtains an object.
Step 303, in response to receiving the first task execution instruction sent by the second processing device, the plurality of first thread blocks respectively obtain the first object in the corresponding object subset by processing the plurality of objects included in the corresponding object subset.
In one example, each thread in a first thread block of the first processing device obtains an object in the object subset, and the objects of the respective threads are processed to obtain the first object in the object subset.
Step 304, the first processing device sends a first task execution response to the second processing device, where the first task execution response includes the first objects obtained by the plurality of first thread blocks respectively.
Through steps 302 to 304, the second processing device completes one call of the first processing device, completes processing on a plurality of object subsets by using the first processing device, and obtains a processing result, namely a first object.
Step 305, the second processing device sends a second task execution instruction to the first processing device.
The second task execution instruction includes a plurality of first objects obtained by the plurality of first thread blocks.
In one example, the second task execution instruction is configured to cause the CPU to invoke the GPU kernel again so as to allocate the plurality of first objects received by the second processing device to a second thread block of the first processing device.
And step 306, responding to the received second task execution instruction sent by the second processing device, wherein the second thread block obtains a second object in the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks.
In one example, each thread in the second thread block of the first processing device obtains a first object, and the first objects of the respective threads are processed to obtain the second object among the plurality of first objects.
Step 307, the first processing device sends a second task execution response to the second processing device, where the second task execution response includes the second object obtained by the second thread block.
The second processing device completes the further invocation of the first processing device by steps 305 to 307, completes the processing of the plurality of first objects by the first processing device, and obtains a processing result, namely a second object.
Through the above steps, the second object is obtained. For non-maximum suppression in the target detection task, the candidate box with the greatest global confidence is obtained. Next, the update process is entered.
In step 308, the second processing device sends a third task execution instruction to the first processing device.
And the third task execution instruction comprises the second object obtained by the second thread block.
In one example, the third task execution instruction is configured to cause the CPU to call the GPU kernel a third time, allocate the second object received by the second processing device to the plurality of first thread blocks of the first processing device, and allocate the second object to each thread of each first thread block.
Step 309, in response to receiving a third task execution instruction sent by the second processing device, the plurality of first thread blocks respectively perform filtering processing on a plurality of objects in the corresponding object subset based on the second objects obtained by the second thread blocks.
In one example, each thread of each first thread block of the first processing device obtains the second object, and the first thread block filters the objects of the respective threads according to the second object, and obtains a plurality of filtered object subsets.
Step 310, the first processing device sends a third task execution response to the second processing device, the third task execution response including the filtered subset of the plurality of objects.
Through steps 308 to 310, the second processing device completes the third invocation of the first processing device, completes the filtering of the plurality of object subsets by using the first processing device, and obtains the plurality of filtered object subsets, namely, completes the updating of the object set.
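The three call/response rounds can be sketched with plain function calls standing in for kernel launches (all names are illustrative; a real implementation would launch GPU kernels and copy data between host and device):

```python
def kernel_first(subsets):
    # Call 1: each first thread block reduces its subset to a first object
    # (first task execution instruction and response).
    return [max(subset) for subset in subsets]

def kernel_second(firsts):
    # Call 2: the second thread block reduces the first objects to the
    # second object (second task execution instruction and response).
    return max(firsts)

def kernel_filter(subsets, second, thresh):
    # Call 3: each first thread block filters its subset against the
    # second object (here: keep scores within a fraction of the best).
    return [[s for s in subset if s == second or s >= thresh * second]
            for subset in subsets]

def host_update(subsets, thresh=0.5):
    # The host (second processing device) drives the three rounds and
    # receives a task execution response after each one.
    firsts = kernel_first(subsets)
    second = kernel_second(firsts)
    return second, kernel_filter(subsets, second, thresh)
```

Because the host collects a response after every call, each kernel always starts from data that the previous round has fully produced, which is exactly the synchronization property the three-call scheme provides.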
In the embodiment of the disclosure, the second processing device performs data distribution and receives the processing result of the first processing device on the data in each calling process by calling the first processing device for multiple times, so that updated data can be distributed to the first processing device, the first processing device can perform parallel processing based on the updated data, the processing efficiency is improved, and meanwhile, the data synchronization in the processing process of the second processing device is realized.
In some embodiments, the first processing device performs data synchronization between the plurality of first thread blocks and/or within the second thread blocks by way of a memory barrier.
Taking the processing procedure of step 101 as an example, the plurality of first thread blocks of the first processing device respectively process the plurality of objects included in their corresponding object subsets to obtain the first object in each subset. Since the operations of the respective thread blocks are invisible to one another, it cannot be ensured that the next operation is performed only after all thread blocks have completed their processing; that is, synchronization of the thread blocks cannot be determined.
To solve the above problems, at least one embodiment of the present disclosure proposes a data synchronization method. As shown in fig. 4, the method may include:
in step 401, the second processing device obtains a set of objects.
Wherein the set of objects comprises a plurality of subsets of objects, each subset of objects comprising a plurality of objects.
In step 402, the second processing device sends a first task execution instruction to the first processing device.
Wherein the first task execution instruction includes the plurality of subsets of objects.
In one example, the first task execution instruction is configured to call the GPU kernel, allocate a plurality of object subsets to a plurality of first thread blocks of the first processing device, respectively, and allocate a plurality of objects included in the object subsets to a plurality of threads in the first thread blocks, respectively, so that each thread in each first thread block obtains an object.
Step 403, in response to receiving a first task execution instruction sent by the second processing device, the plurality of first thread blocks respectively obtain a first object in the corresponding object subset by processing a plurality of objects included in the corresponding object subset.
In one example, each thread in a first thread block of the first processing device obtains an object in the object subset, and the objects of the respective threads are processed to obtain the first object in the object subset.
In step 404, the first processing device establishes a memory fence.
By establishing a memory fence, the results of the write operation of the current thread to global memory can be made visible to threads in the remaining thread blocks.
In step 405, the first processing device performs one accumulation count each time it detects that one of the plurality of first thread blocks completes a write operation on the memory, where the write operation is used to write the first object obtained by that first thread block into the memory.
The accumulated count can be realized through an atomic addition operation, so that the completion condition of writing the first object into the memory by each first thread block can be monitored.
Step 406, when the count value reaches the number of the plurality of first thread blocks, the second thread blocks of the first processing device read the plurality of first objects from the memory.
Under the condition that each first thread block is ensured to finish writing the first object into the memory, the second thread block reads the first objects from the memory, and data synchronization among the first thread blocks is realized.
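The fence-and-counter handshake of steps 404 to 406 can be simulated on the CPU, with Python threads standing in for the first thread blocks and a lock-protected counter standing in for the atomic addition (in actual CUDA code this would correspond to `__threadfence()` followed by `atomicAdd`; all names here are illustrative):

```python
import threading

NUM_BLOCKS = 4
results = [None] * NUM_BLOCKS   # global-memory slots, one per first thread block
counter = 0                     # accumulation count of completed writes
lock = threading.Lock()         # stands in for the atomic addition
all_written = threading.Event() # signals that the count is full

def first_block(block_id, subset):
    global counter
    results[block_id] = max(subset)   # write this block's first object
    with lock:                        # count the completed write atomically
        counter += 1
        if counter == NUM_BLOCKS:     # every first object is now visible
            all_written.set()

def second_block():
    all_written.wait()                # read only after all writes completed
    return max(results)               # reduce the first objects

subsets = [[3, 1], [7, 2], [5, 4], [6, 0]]
threads = [threading.Thread(target=first_block, args=(i, s))
           for i, s in enumerate(subsets)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The counter only reaches `NUM_BLOCKS` after every block's write is finished, so the reader never observes a partially written result set.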
In step 407, the second thread block obtains a second object by processing the plurality of first objects obtained by the plurality of first thread blocks. The specific process of obtaining the second object is described in step 102 and is not repeated here.
For the second object obtained in step 407, it is also necessary to ensure that the second thread block completes the operation of writing the second object into the memory and then reads the second object.
In step 408, the first processing device establishes a memory fence.
In step 409, the first processing device performs one accumulation count each time it detects that the second thread block completes a write operation on the memory, where the write operation is used to write the second object obtained by the second thread block into the memory.
In step 410, the plurality of first thread blocks of the first processing device read the second object from the memory when the count value reaches the number of second thread blocks.
Steps 408-410 are implemented in a similar manner to steps 404-406, and specific processes are described with reference to steps 404-406.
When it is ensured that each thread of the second thread block has finished writing the second object into the memory, the plurality of first thread blocks read the second object from the memory, realizing data synchronization inside the second thread block.
In step 411, the plurality of first thread blocks respectively perform filtering processing on the plurality of objects in the corresponding object subset based on the second object obtained by the second thread block.
For the second object calculated again after updating the object subset, the data synchronization among the plurality of first thread blocks can be realized by ensuring that the updating of all the object subsets is completed in the mode of establishing the memory fence.
In the embodiment of the disclosure, the first processing device performs data synchronization among the plurality of first thread blocks and/or inside the second thread blocks in a memory fence manner, so that the processing efficiency is improved, and meanwhile, the resource consumption of interaction between the first processing device and the second processing device is reduced.
In some embodiments, the second processing device may set at least one variable for each object in the object set, and the update status may be monitored by varying the value of the variable during the update.
In one example, a keep variable is set for each object in the object set, indicating whether the object is kept during the update; for example, a value of 1 indicates that the object is kept, and a value of 0 indicates that it is filtered, i.e., that the object no longer participates in subsequent computations. The initial value of the keep variable may be set to 1. An order variable may also be set, representing the order of the object in the object set; the initial value of the order variable of each object may be set to 0. Those skilled in the art will appreciate that the above variable value settings are merely examples and may be set in other ways, which the present disclosure does not limit.
In the processing of the object set according to the information processing manner set forth in at least one embodiment of the present disclosure, the first object is obtained from the object subset corresponding to each first thread block, and the second object is obtained from the first objects of the respective first thread blocks. The determination of the second object therefore involves comparing the objects within each object subset and comparing the respective first objects. The objects participating in these comparisons may thus be ordered according to a certain parameter value or a score determined based on a certain rule, and their order variables may be given values other than 0.
After the second object is determined, a stage of filtering a plurality of objects in each object subset is entered. For an object for which reservation is determined, the value of the reservation variable for the object may be set to 1; for an object that is determined to be unreserved, then the value of the reserved variable for the object may be set to 0. By reserving or filtering each object in the object subset, the update of the object subset is achieved.
After the update of the object subsets is completed, if, for every object to be processed, the order variable has been given a value other than 0 or the keep variable has the value 0, the processing ends; otherwise, the process of obtaining the second object and updating the object subsets is re-executed until the condition is met.
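The traversal with keep and order variables can be sketched as follows (the weight function and the retention threshold are illustrative assumptions, as in the earlier weighting example):

```python
def suppress(scores, weight, keep_thresh=0.05):
    """Iterate select-and-filter rounds until every object is either
    ordered (selected as a second object) or filtered out."""
    scores = list(scores)
    keep = [1] * len(scores)    # 1: retained, 0: filtered
    order = [0] * len(scores)   # 0: not yet selected as a second object
    rank = 1
    while True:
        pending = [i for i in range(len(scores)) if keep[i] and not order[i]]
        if not pending:         # termination condition described above
            break
        best = max(pending, key=lambda i: scores[i])  # the second object
        order[best] = rank
        rank += 1
        for i in pending:
            if i == best:
                continue
            scores[i] *= weight(best, i)   # down-weight by the coefficient
            if scores[i] < keep_thresh:
                keep[i] = 0                # filtered out of later rounds
    return keep, order
```

With a weight function that barely suppresses anything, every object eventually receives an order value; with an aggressive one, most objects are filtered after the first round and the loop terminates early.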
At least one embodiment of the present disclosure further provides an information processing apparatus, as shown in fig. 5, including: an obtaining unit 501, configured to enable a plurality of first thread blocks of a first processing device to obtain corresponding object subsets in an object set, respectively, and obtain a first object in the corresponding object subset by processing a plurality of objects included in the corresponding object subsets, where the plurality of object subsets corresponding to the plurality of first thread blocks are included in the object set; a processing unit 502, configured to enable a second thread block of the first processing device to obtain a second object of the plurality of first objects by processing a plurality of first objects obtained by the plurality of first thread blocks; and a filtering unit 503, configured to cause the plurality of first thread blocks to perform filtering processing on a plurality of objects in a corresponding subset of objects based on the second object obtained by the second thread block, to obtain an updated corresponding subset of objects, where the updated plurality of object subsets corresponding to the plurality of first thread blocks are included in the updated object set.
In some embodiments, each object included in the subset of objects is processed by a separate thread in the first thread block.
In some embodiments, the filter unit is specifically configured to: the first thread block determines a weight coefficient of each object in a plurality of objects contained in the corresponding object subset based on the second object; the first thread block filters the plurality of objects based on a weight coefficient of each object in the plurality of objects.
In some embodiments, the acquiring unit is specifically configured to: in response to receiving a first task execution instruction sent by a second processing device, the plurality of first thread blocks respectively obtain a first object in a corresponding object subset by processing a plurality of objects included in the corresponding object subset, wherein the first task execution instruction comprises the plurality of object subsets; the device further comprises a first sending unit, configured to enable the first processing device to send a first task execution response to the second processing device, where the first task execution response includes the first objects obtained by the plurality of first thread blocks respectively.
In some embodiments, the processing unit is specifically configured to: in response to receiving a second task execution instruction sent by the second processing device, the second thread block obtains a second object in the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks, wherein the second task execution instruction comprises the plurality of first objects; the device further comprises a second sending unit, configured to send, by the first processing device, a second task execution response to the second processing device, where the second task execution response includes the second object obtained by the second thread block.
In some embodiments, the filter unit is specifically configured to: in response to receiving a third task execution instruction sent by the second processing device, the plurality of first thread blocks respectively perform filtering processing on a plurality of objects in a corresponding object subset based on the second object obtained by the second thread block, wherein the third task execution instruction comprises the second object; the apparatus further includes a third sending unit configured to send, by the first processing device, a third task execution response to the second processing device, where the third task execution response includes the filtered subset of the plurality of objects.
In some embodiments, the apparatus further comprises a first synchronization unit configured to synchronize data among the plurality of first thread blocks and/or within the second thread blocks by means of a memory barrier.
In some embodiments, the apparatus further comprises a second synchronization unit for the first processing device to establish a memory fence; the first processing device completes one-time write operation on a memory every time when detecting the plurality of first thread blocks, and performs one-time accumulation counting, wherein the write operation is used for writing a first object obtained by the first thread blocks into the memory; and when the count value reaches the number of the plurality of first thread blocks, the second thread blocks of the first processing equipment read the plurality of first objects from the memory.
In some embodiments, the first processing device comprises a GPU and the second processing device comprises a CPU.
At least one embodiment of the present disclosure further provides an information processing apparatus, as shown in fig. 6, including: an acquisition unit 601 configured to cause a second processing device to acquire an object set, the object set including a plurality of object subsets, each object subset including a plurality of objects; a sending unit 602, configured to send a first task execution instruction to a first processing device, where the first task execution instruction includes the plurality of object subsets; and a receiving unit 603, configured to receive a first task execution response sent by the first processing device, where the first task execution response includes a processing result of the first processing device on the plurality of object subsets through a plurality of thread blocks.
In some embodiments, each object included in the subset of objects is assigned to a separate thread in the thread block for processing.
In some embodiments, the first task execution response includes a first object derived from a corresponding subset of objects for each of a plurality of first thread blocks of the first processing device.
In some embodiments, the apparatus further includes a second sending unit configured to send a second task execution instruction to the first processing device, where the second task execution instruction includes a plurality of the first objects obtained by the plurality of first thread blocks; and receiving a second task execution response sent by the first processing device, wherein the second task execution response comprises the second object obtained by a second thread block of the first processing device.
In some embodiments, the apparatus further includes a third sending unit that sends a third task execution instruction to the first processing device, where the third task execution instruction includes the second object obtained by the second thread block; and receives a third task execution response sent by the first processing device, where the third task execution response includes the plurality of filtered object subsets.
In some embodiments, the first task execution instruction is configured to instruct the first processing device to perform data synchronization inside a thread block and/or between thread blocks by using a memory fence; the first task execution response includes the updated subset of the plurality of objects obtained by filtering the subset of the plurality of objects by the first processing device.
Fig. 7 is an electronic device provided in at least one embodiment of the present disclosure, including a memory and a processor, where the memory is configured to store computer instructions executable on the processor, and the processor is configured to execute the computer instructions to implement the information processing method according to any embodiment of the present disclosure.
At least one embodiment of the present specification also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the information processing method described in any of the embodiments of the present specification.
At least one embodiment of the present disclosure further provides a computer system, including a first processing device and a second processing device, where the first processing device and the second processing device are the information processing apparatus described in any one embodiment of the present disclosure; alternatively, the first processing device and the second processing device are processors that implement the information processing method described in any embodiment of the present specification.
One skilled in the relevant art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the data processing apparatus embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, reference may be made to the description of the method embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and structural equivalents thereof, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on a manually-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general purpose and/or special purpose microprocessors, or any other type of central processing unit. Typically, the central processing unit will receive instructions and data from a read only memory and/or a random access memory. The essential elements of a computer include a central processing unit for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks, etc. However, a computer does not have to have such a device. Furthermore, the computer may be embedded in another device, such as a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features of specific embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Furthermore, the processes depicted in the accompanying drawings are not necessarily required to be in the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The foregoing description of the preferred embodiments is merely intended to illustrate the embodiments of the present invention, and is not intended to limit the present invention to the particular embodiments described.
Claims (20)
1. An information processing method, characterized in that the method comprises:
a plurality of first thread blocks of a first processing device respectively acquire a corresponding object subset in an object set, and obtain a first object in the corresponding object subset by processing a plurality of objects included in the corresponding object subset, wherein the plurality of object subsets corresponding to the plurality of first thread blocks are included in the object set, the object set comprises a plurality of processing objects of an information processing task, the object set is a plurality of candidate frames obtained by a neural network model, and the information processing task is used for performing non-maximum suppression processing on the candidate frame set;
a second thread block of the first processing device obtains a second object from the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks;
and the plurality of first thread blocks respectively filter the plurality of objects in the corresponding object subsets based on the second object obtained by the second thread block to obtain updated corresponding object subsets, wherein the updated object subsets corresponding to the plurality of first thread blocks are included in an updated object set.
2. The method of claim 1, wherein each object included in the subset of objects is processed by a separate thread in the first thread block.
3. The method according to claim 1 or 2, wherein the filtering, by the plurality of first thread blocks, of the plurality of objects in the corresponding object subsets based on the second object obtained by the second thread block to obtain the updated corresponding object subsets comprises:
the first thread block determines a weight coefficient of each object in a plurality of objects contained in the corresponding object subset based on the second object;
the first thread block filters the plurality of objects based on the weight coefficient of each object in the plurality of objects.
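The weight-coefficient filtering of claim 3 admits, for example, a soft-NMS-style decay, where each object's weight shrinks with its overlap against the second object and objects whose weighted score falls below a threshold are dropped. The following is a hedged host-side sketch only: the Gaussian form and the `sigma`/`score_thr` values are illustrative assumptions, not terms of the claim.

```python
import math

def filter_by_weight(subset, overlaps, scores, sigma=0.5, score_thr=0.2):
    # overlaps[i]: precomputed overlap of object i with the second object.
    # The weight coefficient of each object decays with that overlap; objects
    # whose weighted score drops below the threshold are filtered out.
    kept = []
    for i in subset:
        weight = math.exp(-(overlaps[i] ** 2) / sigma)  # assumed Gaussian decay
        scores[i] = scores[i] * weight
        if scores[i] >= score_thr:
            kept.append(i)
    return kept
```

A hard-NMS variant would instead use a step-function weight (1 below the overlap threshold, 0 above it); both fit the claim's "determine a weight coefficient, then filter" shape.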
4. The method of claim 1, wherein the obtaining a first object in the corresponding subset of objects by processing a plurality of objects included in the corresponding subset of objects comprises:
in response to receiving a first task execution instruction sent by a second processing device, the plurality of first thread blocks respectively obtain a first object in a corresponding object subset by processing a plurality of objects included in the corresponding object subset, wherein the first task execution instruction comprises the plurality of object subsets;
the method further comprises the steps of:
the first processing device sends a first task execution response to the second processing device, wherein the first task execution response comprises the first objects respectively obtained by the plurality of first thread blocks.
5. The method of claim 1, wherein the second thread block of the first processing device obtains a second object of the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks, comprising:
in response to receiving a second task execution instruction sent by a second processing device, the second thread block obtains a second object from the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks, wherein the second task execution instruction comprises the plurality of first objects;
the method further comprises the steps of:
the first processing device sends a second task execution response to the second processing device, wherein the second task execution response comprises the second object obtained by the second thread block.
6. The method of claim 1, wherein the filtering the plurality of objects in the corresponding subset of objects by the plurality of first thread blocks based on the second object obtained by the second thread block, respectively, comprises:
in response to receiving a third task execution instruction sent by a second processing device, the plurality of first thread blocks respectively perform filtering processing on a plurality of objects in a corresponding object subset based on the second object obtained by the second thread block, wherein the third task execution instruction comprises the second object;
the method further comprises the steps of:
the first processing device sends a third task execution response to the second processing device, wherein the third task execution response comprises the plurality of filtered object subsets.
7. The method according to claim 4, wherein the method further comprises:
the first processing device performs data synchronization between the plurality of first thread blocks and/or inside the second thread block by means of a memory fence.
8. The method according to claim 4, wherein the method further comprises:
the first processing device establishes a memory fence;
each time the first processing device detects that one of the plurality of first thread blocks has completed a write operation to a memory, the first processing device increments an accumulation count, wherein the write operation is used to write the first object obtained by that first thread block into the memory;
and when the count value reaches the number of the plurality of first thread blocks, the second thread block of the first processing device reads the plurality of first objects from the memory.
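The fence-and-count handshake of claim 8 can be emulated on the host with an atomic counter: each producer (a "first thread block") publishes its result and then increments the count, and the consumer (the "second thread block") proceeds only once the count equals the number of producers. In this sketch a `threading.Lock` and an `Event` stand in for the GPU memory fence and the GPU-side polling; the values written are arbitrary example data.

```python
import threading

N_BLOCKS = 4
results = [None] * N_BLOCKS   # shared memory: one slot per first thread block
count = 0                     # accumulation count of completed writes
lock = threading.Lock()       # stands in for the memory fence + atomic add
done = threading.Event()

def first_block(i, value):
    global count
    results[i] = value        # write this block's first object to memory...
    with lock:                # ...then increment the accumulation count
        count += 1
        if count == N_BLOCKS: # every first thread block has written
            done.set()

def second_block():
    done.wait()               # safe to read: the count reached N_BLOCKS
    return max(results)       # reduce the first objects to the second object

threads = [threading.Thread(target=first_block, args=(i, (i * 7) % 5))
           for i in range(N_BLOCKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
best = second_block()
```

On an actual GPU the corresponding pieces would be a device-scope fence after the write, an atomic increment of the counter, and the second thread block spinning on the counter; this host emulation only illustrates the ordering guarantee the claim relies on.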
9. The method of any of claims 4 to 8, wherein the first processing device comprises a GPU and the second processing device comprises a CPU.
10. An information processing method, characterized in that the method comprises:
a second processing device acquires an object set, wherein the object set comprises a plurality of object subsets, each object subset comprises a plurality of objects, the object set comprises a plurality of processing objects of an information processing task, the object set is a plurality of candidate frames obtained by a neural network model, and the information processing task is used for performing non-maximum suppression processing on the candidate frame set;
sending a first task execution instruction to a first processing device, wherein the first task execution instruction comprises the plurality of object subsets and is used for a CPU to call a GPU kernel, allocate the plurality of object subsets to a plurality of first thread blocks of the first processing device, and allocate the plurality of objects included in each object subset to a plurality of threads in the corresponding first thread block, so that each thread in each first thread block obtains one object;
and receiving a first task execution response sent by the first processing device, wherein the first task execution response comprises processing results of the first processing device on the plurality of object subsets through the plurality of first thread blocks.
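Claim 10's dispatch amounts to partitioning the object set so that first thread block b receives subset b and thread t within that block receives object t. A host-side sketch of that mapping, as it might precede a kernel launch — the helper name and the fixed block size are illustrative assumptions:

```python
def plan_launch(objects, block_size):
    # Partition the object set: first thread block b receives subsets[b], and
    # thread t within that block receives object subsets[b][t], so each thread
    # obtains exactly one object (the last block may be partially filled).
    subsets = [objects[i:i + block_size]
               for i in range(0, len(objects), block_size)]
    grid_dim = len(subsets)  # one first thread block per object subset
    return grid_dim, block_size, subsets

grid_dim, block_dim, subsets = plan_launch(list(range(10)), block_size=4)
```

In a CUDA-style launch, `grid_dim` and `block_dim` would become the grid and block dimensions, and each thread would index its object as `subsets[blockIdx][threadIdx]`.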
11. The method of claim 10, wherein each object included in the subset of objects is assigned to a separate thread in the thread block for processing.
12. The method of claim 10 or 11, wherein the first task execution response includes a first object derived from a corresponding subset of objects for each of a plurality of first thread blocks of the first processing device.
13. The method according to claim 10, wherein the method further comprises:
sending a second task execution instruction to the first processing device, wherein the second task execution instruction comprises a plurality of first objects obtained by the plurality of first thread blocks;
and receiving a second task execution response sent by the first processing device, wherein the second task execution response comprises the second object obtained by a second thread block of the first processing device.
14. The method according to claim 10, wherein the method further comprises:
sending a third task execution instruction to the first processing device, wherein the third task execution instruction comprises the second object obtained by the second thread block;
and receiving a third task execution response sent by the first processing device, wherein the third task execution response comprises the plurality of filtered object subsets.
15. The method according to claim 13 or 14, wherein the first task execution instruction is configured to instruct the first processing device to perform data synchronization inside a thread block and/or between thread blocks by means of a memory barrier;
the first task execution response includes the updated subset of the plurality of objects obtained by filtering the subset of the plurality of objects by the first processing device.
16. An information processing apparatus, characterized in that the apparatus comprises:
an obtaining unit, configured to enable a plurality of first thread blocks of a first processing device to obtain corresponding object subsets in an object set, respectively, and obtain a first object in the corresponding object subsets by processing a plurality of objects included in the corresponding object subsets, where the plurality of object subsets corresponding to the plurality of first thread blocks are included in the object set, the object set includes a plurality of processing objects of an information processing task, the object set is a plurality of candidate frames obtained by a neural network model, and the information processing task is configured to perform non-maximum suppression processing on the candidate frame set;
a processing unit, configured to enable a second thread block of the first processing device to obtain a second object from the plurality of first objects by processing the plurality of first objects obtained by the plurality of first thread blocks;
and a filtering unit, configured to enable the plurality of first thread blocks to filter the plurality of objects in the corresponding object subsets based on the second object obtained by the second thread block to obtain updated corresponding object subsets, wherein the updated object subsets corresponding to the plurality of first thread blocks are included in an updated object set.
17. An information processing apparatus, characterized in that the apparatus comprises:
an acquisition unit configured to cause a second processing apparatus to acquire an object set including a plurality of object subsets, each object subset including a plurality of objects, the object set including a plurality of processing objects of an information processing task, the object set being a plurality of candidate frames obtained by a neural network model, the information processing task being configured to perform non-maximum suppression processing on the candidate frame set;
a sending unit, configured to send a first task execution instruction to a first processing device, wherein the first task execution instruction comprises the plurality of object subsets and is used for a CPU to call a GPU kernel, allocate the plurality of object subsets to a plurality of first thread blocks of the first processing device, and allocate the plurality of objects included in each object subset to a plurality of threads in the corresponding first thread block, so that each thread in each first thread block obtains one object;
and a receiving unit, configured to receive a first task execution response sent by the first processing device, wherein the first task execution response comprises processing results of the first processing device on the plurality of object subsets through a plurality of thread blocks.
18. An information processing apparatus, comprising a processor and a memory for storing computer instructions executable on the processor, wherein the processor is configured to execute the computer instructions to implement the method of any one of claims 1 to 9 or the method of any one of claims 10 to 15.
19. A computer readable storage medium having stored thereon a computer program, wherein the program when executed by a first processing device implements the method of any of claims 1 to 9 and when executed by a second processing device implements the method of any of claims 10 to 15.
20. A computer system comprising a first processing device and a second processing device, wherein,
the first processing device is the information processing apparatus of claim 16, and the second processing device is the information processing apparatus of claim 17; or,
the first processing device is a processor implementing the method of any one of claims 1 to 9, and the second processing device is a processor implementing the method of any one of claims 10 to 15.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010484303.0A CN111626916B (en) | 2020-06-01 | 2020-06-01 | Information processing method, device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111626916A CN111626916A (en) | 2020-09-04 |
CN111626916B true CN111626916B (en) | 2024-03-22 |
Family
ID=72272645
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010484303.0A Active CN111626916B (en) | 2020-06-01 | 2020-06-01 | Information processing method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111626916B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022160229A1 (en) * | 2021-01-29 | 2022-08-04 | 华为技术有限公司 | Apparatus and method for processing candidate boxes by using plurality of cores |
CN116302504B (en) * | 2023-02-23 | 2024-08-27 | 海光信息技术股份有限公司 | Thread block processing system, method and related equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109242168A (en) * | 2018-08-27 | 2019-01-18 | 北京百度网讯科技有限公司 | Determine the method, apparatus, equipment and computer readable storage medium of shortest path |
WO2020042126A1 (en) * | 2018-08-30 | 2020-03-05 | 华为技术有限公司 | Focusing apparatus, method and related device |
Non-Patent Citations (1)
Title |
---|
Wei Chundan; Gong Yili; Li Wenhai. A GPU-based parallel processing framework for moving objects. Computer Applications and Software. 2016, (10), full text. *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110750342B (en) | Scheduling method, scheduling device, electronic equipment and readable storage medium | |
CN111626916B (en) | Information processing method, device and equipment | |
CN107026900B (en) | Shooting task allocation method and device | |
CN110494848A (en) | Task processing method, equipment and machine readable storage medium | |
CN110381310B (en) | Method and device for detecting health state of visual system | |
CN110704182A (en) | Deep learning resource scheduling method and device and terminal equipment | |
CN111245732A (en) | Flow control method, device and equipment | |
CN111246500A (en) | Bluetooth channel evaluation method and device for identifying wifi interference and storage medium | |
CN108154163A (en) | Data processing method, data identification and learning method and its device | |
CN115269118A (en) | Scheduling method, device and equipment of virtual machine | |
CN117829269B (en) | Federal learning method, apparatus, computing device, and machine-readable storage medium | |
CN111625358B (en) | Resource allocation method and device, electronic equipment and storage medium | |
CN108174055B (en) | Intelligent monitoring method, system, equipment and storage medium | |
CN109358961B (en) | Resource scheduling method and device with storage function | |
CN111143148B (en) | Model parameter determining method, device and storage medium | |
US20200226767A1 (en) | Tracking apparatus and computer readable medium | |
CN110308873B (en) | Data storage method, device, equipment and medium | |
CN116260876A (en) | AI application scheduling method and device based on K8s and electronic equipment | |
CN113343725B (en) | Anti-collision method and system for multiple RFID readers | |
CN115643178A (en) | Network target range configuration method, device, equipment and machine readable storage medium | |
CN109034174B (en) | Cascade classifier training method and device | |
CN107196866B (en) | Flow control method and device | |
CN105635596A (en) | System for controlling exposure of camera and method thereof | |
CN111290850A (en) | Data storage method, device and equipment | |
CN111381956A (en) | Task processing method and device and cloud analysis system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||