CN104123178B - Parallelism constraint detection method based on GPUs - Google Patents

Parallelism constraint detection method based on GPUs

Info

Publication number
CN104123178B
CN104123178B CN201410358441.9A
Authority
CN
China
Prior art keywords
node
current
result
constraint
pointer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410358441.9A
Other languages
Chinese (zh)
Other versions
CN104123178A (en)
Inventor
许畅
马晓星
吕建
眭骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CVIC Software Engineering Co Ltd
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201410358441.9A priority Critical patent/CN104123178B/en
Publication of CN104123178A publication Critical patent/CN104123178A/en
Application granted granted Critical
Publication of CN104123178B publication Critical patent/CN104123178B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Multi Processors (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The present invention is a method for detecting constraints in parallel based on a graphics processor. Steps: 1) Using quantifiers as split points, a constraint is divided into several processing units; by scheduling these units, recursion is eliminated from the detection process and the degree of parallelism is maximized. 2) According to the current processing unit and the information sets, a corresponding number of GPU threads is generated; each thread computes its variable assignment from its own thread id and processes the unit under that assignment. A processing unit with an assignment is called a parallel computing unit, the smallest unit that can be processed in parallel on the GPU. 3) A two-level index/result-pool storage strategy: the variable-length results produced by the nodes of all parallel computing units are stored in a result pool, while the index stores each result's starting address and length within the pool. This strategy "allocates space serially, writes results in parallel" and achieves a high write speed.

Description

Parallelized Constraint Detection Method Based on a Graphics Processor

Technical Field

The invention relates to a parallelized constraint detection method based on a graphics processor (GPU).

Background

Constraint checking is a commonly used method for verifying the validity of information. A constraint expresses a relationship that should hold for one piece of information or among several pieces. In general, a constraint is built by connecting several kinds of nodes: "universal quantifier" nodes, "existential quantifier" nodes, "and" nodes, "or" nodes, "implies" nodes, "not" nodes, and "function" nodes, each describing a specific relationship. Checking a constraint means comparing the acquired information against the predefined constraint; a piece or group of information that violates the constraint is invalid. Constraint checking is usually embedded in other applications.

There are currently two main approaches to constraint checking: incremental checking and parallel checking. Both, however, depend entirely on the central processing unit (CPU), and therefore consume a large amount of computing resources that should serve other applications. The computation of the present method no longer depends on the CPU; instead, it relies mainly on the graphics processing unit (GPU). The method thus speeds up constraint checking while ensuring that ample computing resources remain available to other applications.

Summary of the Invention

Aiming at the deficiencies of the prior art, namely that current constraint checking takes too much time and occupies too many resources, the present invention proposes a GPU-based constraint checking method whose core consists of three parts: constraint preprocessing, a parallel strategy, and a storage strategy.

The technical solution of the present invention is a parallelized constraint detection method based on a graphics processor, comprising:

Constraint preprocessing, a quantifier-based constraint splitting method, specifically:

Step 1. Designate the constraint's head node as the current node and start splitting from it;

Step 2. If the current node is a "universal quantifier" or "existential quantifier" node, split this part into two sub-parts: one ending with the quantifier node, the other starting from the quantifier node's child; designate the child of the quantifier node as the current node and continue splitting;

Step 3. If the current node is an "and", "or", or "implies" node, designate its left child as the current node and continue splitting; after the left child has been processed, designate its right child as the current node and continue splitting;

Step 4. If the current node is a "not" node, designate its child as the current node and continue splitting;

Step 5. If the current node is a "function" node, stop the recursion for the current branch;

After splitting, a constraint is transformed into several processing units; the units are disjoint, and together they constitute the constraint.

A parallel strategy, a processing-unit-based parallel processing method, specifically:

Step 1. Compute the required number of threads N. Let the variables <v1, v2, ..., vn> on the path from the parent node of the current processing unit to the constraint's head node correspond to the context information sets <S1, S2, ..., Sn>, containing <I1, I2, ..., In> pieces of information respectively; then N = I1 × I2 × ... × In. If the path from the current processing unit to the head node contains no variable, or the current processing unit contains the head node, then N = 1;

Step 2. Generate N GPU threads with ids 0 to N-1 (the id is assigned automatically by the GPU); each thread independently computes its corresponding assignment from its own id. Let the integer value Mi = j mean that variable vi takes the j-th piece of information in its corresponding set Si (0 ≤ Mi < Ii); the value of Mi is then obtained by the following steps:

i: set size = 1, cur = n;

ii: if cur ≥ 1, go to sub-step iii; otherwise stop;

iii: size = size × Icur; cur = cur - 1; go to sub-step ii.

Step 3. Each thread maps its computed assignment onto the processing unit, producing the parallel computing unit that the thread must process; each thread processes its parallel computing unit independently;

All these GPU threads execute concurrently, and there are no dependencies among them.

A storage strategy. The two-level index/result-pool storage method mainly comprises three parts: 1) an index array, with two fields per entry: the starting position pos of the result and its length len; 2) a result array; 3) a result-array position pointer Pointer (the position pointer for short), which may only be written under mutual exclusion. Let the results produced by the n threads have lengths l1, l2, ..., li, ..., ln; the two-level index/result-pool storage method is specifically:

Step 1. Each thread computes, according to its assignment, the storage position of its current node in the index array;

Step 2. The threads acquire the result-array position pointer under mutual exclusion. Suppose the i-th thread acquires it; the thread sets the node's starting position pos to the current value of Pointer, and then

Step 3. updates the value of the position pointer: Pointer ← Pointer + li, where li is the length of the result produced by the i-th thread;

Step 4. After the update, the thread releases the result-array position pointer for use by other threads, writes its result into the result array, and fills the result's length into the node's len field.

Beneficial effects of the present invention: the invention uses the GPU efficiently for constraint checking. Constraint splitting removes recursion from constraint processing, adapting it to the way a GPU works; the processing-unit-based parallel strategy lets each thread locate and process its data independently, increasing the method's concurrency; and the concurrent storage strategy significantly improves storage efficiency. While improving the efficiency of constraint checking, the invention greatly reduces the dependence on CPU resources, so that more CPU resources can serve other applications. Moreover, since the GPU and the CPU can execute simultaneously without waiting for each other, the method gains additional efficiency from this overlap.

Brief Description of the Drawings

Fig. 1 is a constraint-processing example of the present invention.

Fig. 2 shows the context mapping and the parallel strategy in the computation of the present invention.

Fig. 3 shows the two-level storage strategy of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

This embodiment is a GPU-based constraint checking method whose core consists of three parts: constraint preprocessing, a parallel strategy, and a storage strategy. Specifically:

1. Constraint preprocessing. The present invention proposes a quantifier-based constraint splitting method comprising the following steps:

a) Designate the constraint's head node as the current node and start splitting from it;

b) If the current node is a "universal quantifier" or "existential quantifier" node, split this part into two sub-parts: one ending with the quantifier node, the other starting from the quantifier node's child; designate the child of the quantifier node as the current node and continue splitting;

c) If the current node is an "and", "or", or "implies" node, designate its left child as the current node and continue splitting; after the left child has been processed, designate its right child as the current node and continue splitting;

d) If the current node is a "not" node, designate its child as the current node and continue splitting;

e) If the current node is a "function" node, stop the recursion for the current branch. After splitting, a constraint is transformed into several processing units; the units are disjoint, and together they constitute the constraint. Figure 1 shows a constraint with the following meaning: for any taxi in city A, the distance it travels within a period of time must lie within a reasonable range. Under the algorithm above, this constraint is split into three parts, as indicated by the dotted lines in the figure.
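The splitting steps a) to e) can be sketched in code. This is a minimal sketch under stated assumptions: the node representation, the names `Node` and `split`, and the tree shape assumed for the Fig. 1 constraint (two nested universal quantifiers over an implication of two function leaves) are all illustrative, not from the patent.

```python
# Minimal sketch of the quantifier-based constraint splitting (steps a-e).
FORALL, EXISTS, AND, OR, IMPLIES, NOT, FUNC = range(7)

class Node:
    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)

def split(node, unit=None, units=None):
    """Collect processing units; a new unit starts below each quantifier."""
    if units is None:
        units = []
    if unit is None:
        unit = []
        units.append(unit)
    unit.append(node)                          # the node joins the current unit
    if node.kind in (FORALL, EXISTS):          # step b): unit ends at the quantifier,
        child_unit = []
        units.append(child_unit)               # a fresh unit starts at its child
        split(node.children[0], child_unit, units)
    elif node.kind in (AND, OR, IMPLIES):      # step c): left subtree, then right
        split(node.children[0], unit, units)
        split(node.children[1], unit, units)
    elif node.kind == NOT:                     # step d): descend into the child
        split(node.children[0], unit, units)
    # FUNC is a leaf: step e), stop recursion for this branch
    return units

# Assumed shape of the Fig. 1 constraint: forall a . forall b . (f1 implies f2)
c = Node(FORALL, [Node(FORALL, [Node(IMPLIES, [Node(FUNC), Node(FUNC)])])])
print(len(split(c)))  # 3
```

On the assumed tree the sketch yields three disjoint units, matching the three dotted regions described for Fig. 1.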

2. Parallel strategy. The processing-unit-based parallel processing method comprises the following steps:

a) Compute the required number of threads N. Let the variables <v1, v2, ..., vn> on the path from the parent node of the current processing unit to the constraint's head node correspond to the context information sets <S1, S2, ..., Sn>, containing <I1, I2, ..., In> pieces of information respectively; then N = I1 × I2 × ... × In. If the path from the current processing unit to the head node contains no variable, or the current processing unit contains the head node, then N = 1;

b) Generate N GPU threads with ids 0 to N-1 (the id is assigned automatically by the GPU); each thread independently computes its corresponding assignment from its own id. Let the integer value Mi = j mean that variable vi takes the j-th piece of information in its corresponding set Si (0 ≤ Mi < Ii); the value of Mi is then obtained by the following steps:

i. set size = 1, cur = n;

ii. if cur ≥ 1, go to iii; otherwise stop;

iii. size = size × Icur; cur = cur - 1; go to ii.

c) Each thread maps its computed assignment onto the processing unit, producing the parallel computing unit that it must process; each thread processes its parallel computing unit independently.

All these GPU threads execute concurrently, with no dependencies among them. Take the constraint of Fig. 1 as an example, and suppose two pieces of taxi information from city A have been received: taxi 1 and taxi 2. When processing unit 1 is computed, by step a) there are 2 variables (a and b) on the path from this processing unit to the root node, each of which can take two values (taxi 1 or taxi 2), so N = 4; by step b), 4 threads are generated (with ids 0, 1, 2, and 3). Each thread computes the values of its variables independently. Taking the thread with id 3 as an example, the assignments of the two variables are computed as follows:

size = 1, cur = 2;

size = 1 × Icur = 1 × 2 = 2, cur = cur - 1 = 1;

since cur ≥ 1, the process continues, giving M1 = 1; finally M1 = 1 and M2 = 1. Note that information is numbered from 0, so M1 = 1, M2 = 1 means that both variables take the second piece of information in their respective sets, i.e. (a = taxi 2, b = taxi 2). Mapping these values onto the processing unit yields a parallel computing unit. Parallel computing unit group 1 in Fig. 2 shows the four generated parallel computing units.
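The id-to-assignment computation above can be sketched as follows. Note a labeled assumption: the published sub-steps i to iii show only how size is accumulated, while the extraction of each Mi as (id // size) mod Ii is supplied here as the standard mixed-radix decode, chosen because it reproduces the stated result M1 = 1, M2 = 1 for thread id 3.

```python
# Hedged sketch of the thread-id -> assignment decode. The extraction line is
# an assumption (standard mixed-radix decode), not spelled out in the patent.
def decode(tid, I):
    """Map a GPU thread id to an assignment (M1, ..., Mn) with Mi in [0, Ii)."""
    n = len(I)
    M = [0] * n
    size, cur = 1, n                              # sub-step i
    while cur >= 1:                               # sub-step ii
        M[cur - 1] = (tid // size) % I[cur - 1]   # assumed extraction step
        size, cur = size * I[cur - 1], cur - 1    # sub-step iii
    return M

# Two variables (a, b), each over {taxi 1, taxi 2}: I = [2, 2], so N = 4 threads.
for tid in range(4):
    print(tid, decode(tid, [2, 2]))
# thread 3 yields [1, 1]: (a = taxi 2, b = taxi 2), as stated in the text
```

Each of the four ids decodes to a distinct assignment, so the threads cover all variable combinations without coordination.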

3. Storage strategy. The two-level index/result-pool storage method mainly comprises three parts: 1) an index array, with two fields per entry: the starting position pos of the result and its length len; 2) a result array; 3) a result-array position pointer Pointer (the position pointer for short), which may only be written under mutual exclusion. Let the results produced by the n threads have lengths l1, l2, ..., li, ..., ln; the storage procedure of the method is as follows:

a) Each thread computes, according to its assignment, the storage position of its current node in the index array;

b) The threads acquire the position pointer under mutual exclusion. Suppose the i-th thread acquires it; the thread sets the node's starting position pos to the current value of Pointer, and then

c) updates the value of the position pointer: Pointer ← Pointer + li, where li is the length of the result produced by the i-th thread;

d) After the update, the thread releases the position pointer for use by other threads, writes its result into the result array, and fills the result's length into the node's len field.

Figure 3 illustrates this storage strategy. Suppose three threads are processing three nodes at the same time, and all of them need to write results to memory; suppose their results occupy 1, 3, and 2 storage slots respectively, so that all three simultaneously request access to the position pointer (current value 0). Suppose thread t1 obtains access first: it sets the starting position of its own result to the current value of Pointer (i.e. 0), updates Pointer to 0 + 1 = 1, and then releases the pointer for threads t2 and t3 to access. After updating the position pointer, t1 writes its result into the result array and records its length (i.e. 1) in the index array; the first element of the index array thus records the starting position (0) and length (1) of the result produced by t1. Suppose t2 obtains access to Pointer before t3. Because of t1's update, the current value of Pointer is 1, so t2's result starts at position 1; t2 updates Pointer to 1 + 3 = 4, releases it, writes its result into the result array, and records its length (i.e. 3) in the index array. Under this strategy space is allocated serially, but multiple threads write their results in parallel and without conflict.
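The Fig. 3 walkthrough can be sketched with ordinary threads and a lock standing in for the GPU's mutually exclusive pointer access; the names `ResultPool` and `write` are illustrative, not from the patent. The writes below are issued in the order t1, t2, t3 to reproduce the walkthrough's deterministic positions; with truly concurrent callers any acquisition order still yields disjoint, conflict-free spans.

```python
# Hedged sketch of the two-level index / result-pool storage (Fig. 3).
import threading

class ResultPool:
    def __init__(self, n_nodes, pool_size):
        self.index = [{"pos": None, "len": None} for _ in range(n_nodes)]
        self.results = [None] * pool_size   # the result pool
        self.pointer = 0                    # next free slot in the pool
        self.lock = threading.Lock()        # only pointer updates are serialized

    def write(self, node, data):
        with self.lock:                     # serial space allocation ...
            pos = self.pointer
            self.pointer += len(data)
        # ... followed by a conflict-free write outside the lock
        self.results[pos:pos + len(data)] = data
        self.index[node] = {"pos": pos, "len": len(data)}
        return pos

pool = ResultPool(n_nodes=3, pool_size=6)
for node, length in enumerate([1, 3, 2]):   # result lengths of t1, t2, t3
    print(pool.write(node, ["r"] * length))  # prints 0, then 1, then 4
```

Only the two pointer operations sit inside the critical section, so the expensive part, copying results into the pool, proceeds in parallel, which is the "allocate serially, write in parallel" property claimed for the strategy.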

Parts 1 and 2 above describe the method of checking constraints in parallel on the GPU; part 3 describes an efficient way, suited to the GPU, of writing the results in parallel.

The above embodiment describes only part of the functionality of the present invention; the embodiment and the drawings are not intended to limit the invention. Any equivalent change or refinement made without departing from the spirit and scope of the invention likewise falls within its scope of protection, which shall therefore be defined by the claims of this application.

Claims (4)

1. A parallelized constraint detection method based on a graphics processor, characterized in that it comprises:

a quantifier-based constraint splitting step;
a processing-unit-based parallel processing step;
a storage strategy step;

the quantifier-based constraint splitting step being specifically:

Step 1. Designate the constraint's head node as the current node and start splitting from it;
Step 2. If the current node is a "universal quantifier" or "existential quantifier" node, split the node into two sub-parts: one ending with the quantifier node, the other starting from the quantifier node's child; designate the child of the quantifier node as the current node and continue splitting;
Step 3. If the current node is an "and", "or", or "implies" node, designate its left child as the current node and continue splitting; after the left child has been processed, designate its right child as the current node and continue splitting;
Step 4. If the current node is a "not" node, designate its child as the current node and continue splitting;
Step 5. If the current node is a "function" node, stop the recursion for the current branch;
after splitting, a constraint is transformed into several processing units; the units are disjoint, and together they constitute the constraint.

2. The parallelized constraint detection method according to claim 1, characterized in that the processing-unit-based parallel processing step is specifically:

Step 1. Compute the required number of threads N. Let the variables <v1, v2, ..., vn> on the path from the parent node of the current processing unit to the constraint's head node correspond to the context information sets <S1, S2, ..., Sn>, containing <I1, I2, ..., In> pieces of information respectively; then N = I1 × I2 × ... × In. If the path from the current processing unit to the head node contains no variable, or the current processing unit contains the head node, then N = 1;
Step 2. Generate N GPU threads with ids 0 to N-1 (the id is assigned automatically by the GPU); each thread independently computes its corresponding assignment from its own id, where the integer value Mi = j means that variable vi takes the j-th piece of information in its corresponding set Si (0 ≤ Mi < Ii);
Step 3. Each thread maps its computed assignment onto the processing unit, producing the parallel computing unit that it must process; each thread processes its parallel computing unit independently;
all these GPU threads execute concurrently, with no dependencies among them.

3. The parallelized constraint detection method according to claim 2, characterized in that the value of Mi in Step 2 is obtained by the following steps:

Sub-step i: set size = 1, cur = n;
Sub-step ii: if cur ≥ 1, go to sub-step iii; otherwise stop;
Sub-step iii: size = size × Icur; cur = cur - 1; go to sub-step ii.

4. The parallelized constraint detection method according to claim 1, characterized in that the storage strategy step is specifically a two-level index/result-pool storage method comprising three parts: 1) an index array, with two fields per entry: the starting position pos of the result and its length len; 2) a result array; 3) a result-array position pointer Pointer, which may only be written under mutual exclusion;

let the results produced by the n threads have lengths l1, l2, ..., li, ..., ln; the two-level index/result-pool storage method is then:

Step 1. Each thread computes, according to its assignment, the storage position of its current node in the index array;
Step 2. The threads acquire the result-array position pointer under mutual exclusion. Suppose the i-th thread acquires it; the thread sets the node's starting position pos to the current value of Pointer, and then
Step 3. updates the value of the position pointer: Pointer ← Pointer + li, where li is the length of the result produced by the i-th thread;
Step 4. After the update, the thread releases the result-array position pointer for use by other threads, writes its result into the result array, and fills the result's length into the node's len field.
CN201410358441.9A 2014-07-25 2014-07-25 Parallelism constraint detection method based on GPUs Active CN104123178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410358441.9A CN104123178B (en) 2014-07-25 2014-07-25 Parallelism constraint detection method based on GPUs


Publications (2)

Publication Number Publication Date
CN104123178A CN104123178A (en) 2014-10-29
CN104123178B true CN104123178B (en) 2017-05-17

Family

ID=51768602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410358441.9A Active CN104123178B (en) 2014-07-25 2014-07-25 Parallelism constraint detection method based on GPUs

Country Status (1)

Country Link
CN (1) CN104123178B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016187043A1 (en) 2015-05-15 2016-11-24 Cox Automotive, Inc. Parallel processing for solution space partitions
CA3000456A1 (en) * 2015-10-05 2017-04-13 Cox Automotive, Inc. Parallel processing for solution space partitions

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7120903B2 (en) * 2001-09-26 2006-10-10 Nec Corporation Data processing apparatus and method for generating the data of an object program for a parallel operation apparatus
CN101593129A (en) * 2008-05-28 2009-12-02 国际商业机器公司 Triggering has the method and apparatus of execution of a plurality of incidents of restriction relation
CN103201764A (en) * 2010-11-12 2013-07-10 高通股份有限公司 Parallel image processing using multiple processors
CN103294775A (en) * 2013-05-10 2013-09-11 苏州祥益网络科技有限公司 Police service cloud image recognition vehicle management and control system based on geographic space-time constraint




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220126

Address after: 250014 No. 41-1 Qianfo Shandong Road, Jinan City, Shandong Province

Patentee after: SHANDONG CVIC SOFTWARE ENGINEERING Co.,Ltd.

Address before: 210093 No. 22, Hankou Road, Gulou District, Jiangsu, Nanjing

Patentee before: NANJING University