CN114722571B - CPU-GPU cooperative additive manufacturing parallel scanning line filling method - Google Patents


Info

Publication number
CN114722571B
CN114722571B (application CN202210230094.6A)
Authority
CN
China
Prior art keywords
array
task
gpu
cpu
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210230094.6A
Other languages
Chinese (zh)
Other versions
CN114722571A (en)
Inventor
李慧贤
吴陈浩
马创新
彭理想
马良
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202210230094.6A
Publication of CN114722571A
Application granted
Publication of CN114722571B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/60Memory management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/10Additive manufacturing, e.g. 3D printing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/18Manufacturability analysis or optimisation for manufacturability
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Computer Graphics (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a CPU-GPU collaborative parallel scanning line filling method for additive manufacturing. By parallelizing the algorithm, the method exploits the powerful parallel computing capability of the GPU together with the multitask coordination and comprehensive scheduling capability of a multi-core CPU to improve the traditional scanning line filling algorithm. It makes full use of the computing resources of the hardware platform and accelerates the most time-consuming stage of the computation, the intersection calculation and sorting, through the C++ threading library, the CUDA library, and similar tools.

Description

CPU-GPU cooperative additive manufacturing parallel scanning line filling method
Technical Field
The invention relates to the technical field of additive manufacturing, in particular to a CPU-GPU collaborative additive manufacturing parallel scanning line filling method.
Background
Additive manufacturing is an emerging fabrication technology that builds parts layer by layer; the data processing algorithms applied to the part model are central to the technology. Through a model data processing algorithm, the CAD model of a part is converted into G-code that an additive manufacturing machine can interpret, so that the machine can be driven to complete the additive manufacturing process.
The most important and time-consuming of the steps in model data processing for additive manufacturing is the path filling process. The document "general scan line polygon filling algorithm [ J ]. Computer engineering and application, 2000 (02): 57-59" discloses a scan line filling algorithm that is now widely used in additive manufacturing path filling. The scan line filling algorithm intersects a group of equidistant parallel lines with the cross-section contour of each slice, keeps only the segments of the parallel lines that lie inside the solid part of the contour, and finally joins these segments end to end into filling paths.
In the field of additive manufacturing, the part models to be manufactured are becoming ever finer and the CAD model data volume ever larger. Processing such a model with the original algorithm is inefficient: for a CAD model represented by a GB-scale STL file, at least several hours are needed to convert it into G-code that a printer can understand, which cannot meet the requirements of industrial production.
Disclosure of Invention
In order to improve the computational efficiency of the traditional serial scanning line algorithm, the invention provides a CPU-GPU collaborative additive manufacturing parallel filling method. By parallelizing the algorithm, the method exploits the powerful parallel computing capability of the GPU together with the multitask coordination and comprehensive scheduling capability of a multi-core CPU to improve the traditional scan line filling algorithm. It makes full use of the computing resources of the hardware platform and accelerates the most time-consuming stage of the computation, the intersection calculation and sorting, through the C++ threading library, the CUDA library, and similar tools.
The technical scheme of the invention is as follows:
the CPU-GPU collaborative additive manufacturing parallel scanning line filling method is characterized by comprising the following steps of: the method comprises the following steps:
step 1: obtaining layering data of a three-dimensional model of a part to be additively manufactured, calculating the Task quantity lineCNTs of each layer of plies of the model, taking the Task quantity of each layer of plies as an element value, and storing the element value in a Task array;
Step 2: dividing the Task number into (m-n-2) +2n parts to obtain segmented points, wherein the m-n-2 parts are processed by a CPU, and the 2n parts are processed by a GPU; m is the number of CPU cores, n is the number of GPUs; dividing the model layer sheet set into (m-n-2) +2n parts according to the segmentation points, and determining the task quantity of each part layer sheet set;
step 3: the main thread creates m threads for task processing, and marks the m threads as CPU task threads and GPU task threads according to the CPU core number and the GPU number respectively, m-n is the number of the CPU task threads, n is the number of the GPU threads, the parallel acceleration ratio of the CPU is set to be m-n-2, and the acceleration ratio of the GPU is set to be 2; creating a task queue, and adding the ply set of each part into the task queue as a task; the main thread also establishes a result_CPU array and a result_GPU array for storing calculation results of the CPU and the GPU, wherein the calculation results comprise layer numbers and paths;
step 4: collaborative filling is performed according to the task queues:
if the current task queue is empty, executing the step 7; otherwise, each task thread sequentially executes the following steps:
step 4.1: judging whether the current thread is a CPU task thread, if so, sequentially taking out a task from a task queue, reading the data of the task from a hard disk to a memory, then applying for CPU computing resources, and if the CPU task thread successfully acquires the CPU computing resources, executing step 5; otherwise, the CPU task thread is blocked until the task thread is scheduled; if the current thread is the GPU task thread, executing the step 4.2;
Step 4.2: the GPU task thread takes out the first 2n tasks with the largest task quantity from all the tasks and calls the GPU kernel for processing; after the task is fetched, the data of the 2n tasks are read from the hard disk to the memory; then applying for GPU computing resources, if the GPU computing resources are successfully obtained, copying the data from the memory to the video memory, and executing the step 6; otherwise, the task thread is blocked until the task thread is scheduled;
step 5: CPU parallel filling is carried out by the following steps:
step 5.1: if the CPU parallel filling is executed for the first time, the outline data of the initial layer sheet of the task is read; otherwise, reading contour data of the next layer of slice;
step 5.2: determining minimum values and maximum values of the contour in the x direction and the y direction according to contour data of the lamellar, recording the minimum values and the maximum values as boundaryMin and boundaryMax respectively, wherein boundaryMin (xMin, yMin) represents the minimum values of the contour boundary in the x direction and the y direction, boundaryMax (xMax, yMax) represents the maximum values of the contour boundary in the x direction and the y direction, and calculating scanning line serial numbers sMin and sMax corresponding to boundaryMin and boundaryMax respectively;
step 5.3: determining the number of scanning lines SubS intersecting the current lamellar contour according to the scanning line sequence numbers sMin and sMax, dividing the SubS scanning lines into w groups, and calculating the intersection point of one group of scanning lines and the contour by each thread by using a multithreading method;
Establishing a cutLine array for each scanning line, wherein the cutLine array is used for storing intersection point coordinates, and elements in the array consist of x values and y values;
establishing a layerResult array named layerResult_idx_i, i = 1, 2, 3, …, where i is a positive integer and idx_i represents the layer number; the layerResult array is a two-dimensional array, each element of the array stores a path endpoint, each column of the array stores one path, and one path comprises two endpoints;
step 5.4: each thread concurrently traverses all edges of the contour, obtains the intersection points of the group of scan lines assigned to the thread, and stores each intersection point into the corresponding cutLine array according to its scan line number; after a thread has calculated the intersection points of its group of scan lines with the contour, the elements in the cutLine array of each scan line are arranged in ascending order of y coordinate, and the sorted intersection points are paired in parity order to form paths, which are stored in the layerResult_idx_i array;
step 5.5: after all w threads have finished, the main thread merges the paths stored in the layerResult_idx_i arrays and sorts them in ascending order of x coordinate to obtain all paths of the contour group; all paths of the layerResult_idx_i arrays, together with the layer numbers, are stored in the result_CPU array;
step 5.6: the main thread judges whether the layer of the sheet is the last layer, if so, the step 5.7 is executed; otherwise, returning to the execution step 5.1;
step 5.7: judging whether the task queue is empty by the main thread, and executing the step 7 when the task queue is empty; otherwise, executing the step 4;
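The core of steps 5.3-5.4, intersecting one scan line with the contour edges, sorting the intersections by y coordinate, and pairing them in parity order into path segments, can be sketched as follows. This is an illustrative sketch, not the patent's implementation: it assumes vertical scan lines at a given x position and simple `Pt`/`Segment` structures, and the half-open interval test is one common way to avoid double-counting a shared vertex.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Pt { double x, y; };
struct Segment { Pt a, b; };   // one filling path: two endpoints

// Intersections of one vertical scan line x = sx with the edges of a closed
// polygonal contour. The half-open test lo <= sx < hi avoids counting an
// intersection at a shared vertex twice.
std::vector<double> cutLine(const std::vector<Pt>& poly, double sx) {
    std::vector<double> ys;
    for (size_t i = 0; i < poly.size(); ++i) {
        const Pt& p = poly[i];
        const Pt& q = poly[(i + 1) % poly.size()];
        double lo = std::min(p.x, q.x), hi = std::max(p.x, q.x);
        if (lo <= sx && sx < hi) {
            double t = (sx - p.x) / (q.x - p.x);   // safe: p.x != q.x here
            ys.push_back(p.y + t * (q.y - p.y));
        }
    }
    return ys;
}

// Step 5.4: sort the intersections in ascending y order and pair them in
// parity order; each consecutive (even, odd) pair is one interior segment.
std::vector<Segment> fillOneLine(const std::vector<Pt>& poly, double sx) {
    std::vector<double> ys = cutLine(poly, sx);
    std::sort(ys.begin(), ys.end());
    std::vector<Segment> paths;
    for (size_t i = 0; i + 1 < ys.size(); i += 2)
        paths.push_back({{sx, ys[i]}, {sx, ys[i + 1]}});
    return paths;
}
```

For a 10x10 square contour, the scan line at x = 5 crosses the bottom and top edges, yielding one interior segment from (5, 0) to (5, 10).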
step 6: GPU parallel filling is carried out through the following steps:
step 6.1: opening corresponding GPU threads according to the distributed tasks, wherein each GPU thread is responsible for processing a contour; calculating the intersection times of the profile and the scanning line in each thread; summing the intersecting times of all the contours and the scanning lines to obtain an intersecting point total number total S, and opening up a cutPoint array with the size of total S on a GPU (graphic processing unit) display memory, wherein the cutPoint array is used for storing intersecting points of the contours and the scanning lines, and each element of the cutPoint array consists of two values, namely a contour sequence number and a hash value; calculating a hash value of the intersection point of the profile and the scanning line, and then storing the profile serial number and the hash value in a cutPoint array; after the intersection point is obtained, firstly sequencing the cutPoint arrays according to the sequence from small to large of the profile sequence numbers, sequencing the cutPoint arrays according to the sequence from small to large of the hash values when the profile sequence numbers are the same, and restoring the elements in the cutPoint arrays to the original coordinates of the intersection point after sequencing;
Step 6.2: copying the cutPoint array into a host side memory, establishing 2n sliceResult arrays in the host side memory, and setting the idx of the model contained in 2n tasks calculated by the GPU side i The layers are named sliceResult_idx i The path filling information is used for storing path filling information of the corresponding ply; the sliceResult array is a two-dimensional array, each element in the array consists of an x value and a y value, each column comprises two elements which respectively correspond to two endpoints of a path, and all columns in the array jointly form path filling information of one layer sheet; storing the intersection point coordinates in the cutPoint array in the corresponding sliceResult_idx i After the array is formed, sliceResult_idx is performed i Storing all paths and layer numbers of the array into a result_GPU array; finally judging whether the task queue is empty, and executing the step 7 when the task queue is empty; otherwise, executing the step 4;
step 7: filling is accomplished by:
step 7.1: establishing a Result array as a two-dimensional array, wherein each element of the array stores a path, namely x and y coordinate values of two endpoints, and each column of the array is used for storing all paths of one slice;
step 7.2: merging the result_CPU array and the result_GPU array; storing paths in the result_CPU array and the result_GPU array to corresponding positions of the Result array according to the sequence from small to large by using the layer sequence numbers recorded in the result_CPU array and the result_GPU;
Step 7.3: after merging, creating a path file; and copying the data of the Result array into the path file in sequence, and storing the data as a final calculation Result.
Further, in step 1, the task amount lineCnts is the number of scan lines intersecting the slice contour.
Further, in step 1, the task amount lineCnts is calculated by the following steps:
step 1.1: calculating the boundary value of the lamellar contour;
step 1.2: and calculating the number of scanning lines intersected with the lamellar contour according to the boundary value of the contour and the scanning line spacing.
In step 2, a dynamic programming algorithm is used to divide the Task array into (m-n-2) +2n parts.
Further, in step 2, a dynamic programming algorithm is adopted to segment the Task array, and the specific process of determining the segment points is as follows:
step 2.1: the Problem of dividing the Task array is marked as Problem (Task, layers, k), which means that the Task array with the length of Layers is divided into k parts, wherein the Task array contains Layers elements; the sub-Problem is marked as Problem (Task, i, j), which means that the Task array is divided into j parts, wherein the Task array contains i elements; layers, k, i, j are all positive integers;
step 2.2: establishing a matrix opt of size Layers×k to record the state of the sub-problems during solving, i.e. the optimal partition value when i elements are divided into j groups; a matrix dp of size Layers×k is established to record the segmentation position, i.e. the corresponding subscript of the Task array, when i elements are divided into j parts; the optimal value of the sub-problem Problem(Task, i, j) is denoted opt[i][j], representing the optimal partition value for dividing i elements into j groups;
step 2.3: initializing boundary conditions:
opt[i][1] = a_1 + a_2 + … + a_i
opt[i][j] = opt[i][i], if j > i
where opt[i][1] is the optimal value of the sub-problem Problem(Task, i, 1), a_z is the z-th value of the array, opt[i][j] is the optimal value of the sub-problem Problem(Task, i, j), and opt[i][i] is the optimal value of the sub-problem Problem(Task, i, i); i, j and z are positive integers. This gives opt[i][1] for j = 1 and i = 1, 2, …, Layers, and opt[1][j] for i = 1 and j = 1, 2, …, k. All elements of the dp matrix are initialized to -1;
step 2.4: using the recurrence relation, traverse i = 2, …, Layers and compute the optimal value opt[i][j] of dividing the array into j (j = 2, …, k) segments, recording the segmentation point dp[i][j] at each step; the recurrence relation is as follows:
when 1 < j <= i:
opt[i][j] = min over x in {j-1, …, i-1} of max( opt[x][j-1], a_(x+1) + a_(x+2) + … + a_i )
where opt[i][j] is the optimal value of Problem(Task, i, j), opt[x][j-1] is the optimal value of the sub-problem Problem(Task, x, j-1), a_p is the p-th value of the array, and x and p are positive integers;
step 2.5: a one-dimensional array index_seg is established to store all the segmentation points, one segmentation point per element:
Based on the segmentation positions stored in the dp matrix, all segmentation points of the Task array are retrieved in reverse order, starting from dp[Layers][k], to obtain the required set of segmentation points. dp[Layers][k] stores the segmentation point n_(k-1) between the (k-1)-th part and the k-th part when the Layers elements are divided into k parts; dp[n_(k-1)][k-1] stores the segmentation point n_(k-2) between the (k-2)-th part and the (k-1)-th part when the first n_(k-1) elements are divided into k-1 parts; continuing in the same way, all segmentation points are retrieved and stored in the index_seg array, where n_(k-1) and k are positive integers.
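The dynamic program of steps 2.1-2.5 can be sketched as follows, assuming the optimal partition value is the minimized maximum group sum (the reading consistent with "task amounts as equal as possible"). The `opt`/`dp`/`index_seg` names follow the description; the prefix-sum formulation, the function signature, and the assumption k <= Layers are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Split the Task array (length Layers) into k contiguous parts so that the
// maximum part sum is as small as possible; assumes k <= Layers.
// opt[i][j] = optimal value of sub-problem Problem(Task, i, j);
// dp[i][j]  = segmentation point between part j-1 and part j for that
//             sub-problem. Returns the segmentation points (index_seg).
std::vector<int> partitionTasks(const std::vector<long long>& Task, int k) {
    int Layers = static_cast<int>(Task.size());
    std::vector<long long> prefix(Layers + 1, 0);        // prefix sums of Task
    for (int i = 0; i < Layers; ++i) prefix[i + 1] = prefix[i] + Task[i];

    std::vector<std::vector<long long>> opt(
        Layers + 1, std::vector<long long>(k + 1, LLONG_MAX));
    std::vector<std::vector<int>> dp(Layers + 1, std::vector<int>(k + 1, -1));

    for (int i = 1; i <= Layers; ++i) opt[i][1] = prefix[i];  // boundary: one part
    for (int j = 2; j <= k; ++j)
        for (int i = j; i <= Layers; ++i)
            for (int x = j - 1; x < i; ++x) {
                // last part is elements x+1..i; earlier parts solved already
                long long cand = std::max(opt[x][j - 1], prefix[i] - prefix[x]);
                if (cand < opt[i][j]) { opt[i][j] = cand; dp[i][j] = x; }
            }

    // Step 2.5: walk backwards from dp[Layers][k] to collect index_seg.
    std::vector<int> index_seg;
    for (int pos = Layers, j = k; j >= 2; --j) {
        pos = dp[pos][j];
        index_seg.push_back(pos);
    }
    std::reverse(index_seg.begin(), index_seg.end());
    return index_seg;
}
```

For Task = {1, 2, 3, 4, 5} and k = 2, the best split is {1,2,3} | {4,5} (maximum sum 9), so the single segmentation point is 3.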
Further, in step 2, after all the segmentation points are obtained, the ply set is divided into (m-n-2) +2n parts by the following procedure:
step 2.6.1: creating a task_seq array for storing (m-n-2) +2n partial parameters of the ply set after segmentation:
the task_seq array is a two-dimensional array, each element of the array is a positive integer, the first two elements of each column of the array are used for storing the initial layer number and the final layer number of the partial layer sheet set, and the third element of the column is used for storing the Task amount of the partial layer sheet set;
Step 2.6.2: storing the start layer number and the end layer number of the (m-n-2) +2n partial layer sheet sets in corresponding positions of the task_seq array; calculating the Task quantity of the (m-n-2) +2n partial ply sets, and storing the Task quantity into the corresponding position of the task_seq array;
step 2.6.3: and sequencing the task_seq array from large to small according to the total Task amount of each part of the layer sheet set.
Further, in step 5.3, the scan lines are allocated at intervals to balance the task amount between the threads.
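One natural reading of step 5.3's "allocated at intervals" is round-robin assignment: scan line s goes to thread s mod w, so that short and long scan lines are spread evenly across the w threads. The sketch below illustrates that reading; the function name and return layout are assumptions.

```cpp
#include <cassert>
#include <vector>

// Round-robin (interval) allocation: scan line s, 0-based within the SubS
// lines intersecting the contour, is handled by thread s % w, balancing the
// task amount between the w threads.
std::vector<std::vector<int>> assignScanLines(int subS, int w) {
    std::vector<std::vector<int>> groups(w);
    for (int s = 0; s < subS; ++s) groups[s % w].push_back(s);
    return groups;
}
```

With 7 scan lines and 3 threads, the groups are {0, 3, 6}, {1, 4}, {2, 5}: group sizes differ by at most one, and lines from every region of the contour appear in every group.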
Advantageous effects
The beneficial effects of the invention are as follows: the method uses CPU multithreading and GPU multithreading, dividing the scan lines into groups distributed to different threads for parallel execution, which accelerates the algorithm. Meanwhile, by load-balancing the overall filling task of the model, the slices to be filled are reasonably distributed to the CPU and the GPU for collaborative computation, making full use of the computing resources of the CPU-GPU heterogeneous platform. Compared with the original serial algorithm, the CPU-GPU collaborative parallel algorithm achieves higher filling efficiency, significantly reduces the time needed to compute path information, and improves additive manufacturing efficiency.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
fig. 1: flow chart of collaborative filling;
fig. 2: scan line allocation schematic.
Detailed Description
The invention provides a CPU-GPU collaborative additive manufacturing parallel filling method. By parallelizing the algorithm, the method exploits the powerful parallel computing capability of the GPU together with the multitask coordination and comprehensive scheduling capability of a multi-core CPU to improve the traditional scanning line filling algorithm. It makes full use of the computing resources of the hardware platform and accelerates the most time-consuming stage of the computation, the intersection calculation and sorting, through the C++ threading library, the CUDA library, and similar tools.
The method comprises the following specific steps:
step one, data preprocessing
After the layering operation, the three-dimensional model of the part to be additively manufactured is divided from bottom to top into a number of layer slices; this number is denoted Layers, a positive integer. First, the task amount of each layer slice must be calculated so that tasks can be distributed later. Since the run time of the scan line path filling algorithm depends on the number of scan lines intersecting the slice contour, the more scan lines intersect the contour, the longer the algorithm runs, and the relation between the number of intersecting scan lines and the run time can be approximated as linear. Therefore, the number lineCnts of scan lines intersecting the slice contour is used as the measure of the task amount of path filling for that layer; lineCnts is a positive integer. The preprocessing proceeds as follows:
(1) A Task array is established: a one-dimensional array whose elements are positive integers, used to store the task amount of each slice. The first layer's slice data is read, and the spacing linespace between scan lines is set, where linespace is a positive decimal number.
(2) The boundary values xMin, yMin, xMax, yMax of the layer profile are calculated. These four boundary values are integers. Where xMin represents the minimum value of the profile in the x-direction, yMin represents the minimum value of the profile in the y-direction, xMax represents the maximum value of the profile in the x-direction, and yMax represents the maximum value of the profile in the y-direction. The calculation process of the contour boundary value is as follows:
a) Initialize xMin and yMin to the largest value of type int in the C++ language (INT_MAX); initialize xMax and yMax to the smallest value of type int (INT_MIN). Read the first endpoint P_1 of the contour slice.
b) Compare the x and y coordinate values of the endpoint with xMin, yMin, xMax, yMax. If the x coordinate of the endpoint is smaller than xMin, set xMin to that x coordinate; if the x coordinate is greater than xMax, set xMax to it; if the y coordinate is smaller than yMin, set yMin to it; if the y coordinate is greater than yMax, set yMax to it.
c) Judge whether the endpoint is the last endpoint of the contour. If so, the boundary value calculation is finished, yielding the contour boundary values xMin, yMin, xMax and yMax; if not, read the next endpoint P_i of the contour and continue with step (b), where i is a positive integer.
(3) According to the boundary values of the contour and the scan line spacing, the number of scan lines intersecting the layer contour is calculated using formula (1); after the calculation, lineCnts is stored in the Task array.
lineCnts=(xMax-xMin)/linespace (1)
where lineCnts is the number of scan lines intersecting the layer contour, xMax is the maximum value of the contour in the x direction, xMin is the minimum value of the contour in the x direction, and linespace is the spacing between scan lines.
(4) Judge whether this slice is the last layer slice. If so, the task amount calculation ends, and the task amounts of all layer slices are stored in the Task array; if not, read the next slice data and perform step (2).
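The preprocessing above, the single-pass bounding-box scan of steps (a)-(c) and formula (1), can be sketched in C++ as follows. The `Point`/`Bounds` types and function names are illustrative assumptions, not the patent's actual data structures.

```cpp
#include <cassert>
#include <climits>
#include <vector>

struct Point { int x, y; };
struct Bounds { int xMin, yMin, xMax, yMax; };

// Steps (a)-(c): bounding box of one slice contour, found in a single pass
// over its endpoints, starting from INT_MAX/INT_MIN sentinels.
Bounds contourBounds(const std::vector<Point>& contour) {
    Bounds b{INT_MAX, INT_MAX, INT_MIN, INT_MIN};
    for (const Point& p : contour) {
        if (p.x < b.xMin) b.xMin = p.x;
        if (p.x > b.xMax) b.xMax = p.x;
        if (p.y < b.yMin) b.yMin = p.y;
        if (p.y > b.yMax) b.yMax = p.y;
    }
    return b;
}

// Formula (1): number of scan lines intersecting the slice contour,
// used as the task amount lineCnts of that slice.
int lineCnts(const Bounds& b, double linespace) {
    return static_cast<int>((b.xMax - b.xMin) / linespace);
}
```

For a 10x10 square contour with linespace = 2, formula (1) gives lineCnts = 5, which would be appended to the Task array for that slice.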
Step two, determining the number of the divided segments of the Task array
The CPU core count m and the GPU count n are obtained, where m and n are positive integers. Later, m threads are created for data processing: n threads perform GPU-related operations and m-n threads handle CPU tasks. Because the speed-up ratio of m-n threads executing the parallel algorithm does not actually reach m-n, the ratio is rounded down and the overall CPU parallel speed-up ratio is set to m-n-2. Secondly, according to completed GPU parallel filling experiments, the GPU parallel filling algorithm shows a good acceleration effect when the ratio of the number of intersections between the scan line set and the contour to the number of line segments is at least 1.8; the speed-up ratio stays at about 1.6 and increases with the numbers of intersections and line segments. To better distribute the tasks, the minimum speed-up ratio of a single GPU is set to 2.
According to the overall CPU speed-up ratio m-n-2 and the overall GPU speed-up ratio 2n, the Task array is divided into (m-n-2)+2n parts, of which m-n-2 parts are processed by the CPU and 2n parts by the GPU. This avoids frequent copying by the GPU: the data is transferred to the GPU for processing in as few batches as possible.
Step three, dividing the Task array
In order to fully exploit the computing power of the multi-core CPU and the GPU, all layers obtained by slicing the model must be partitioned as a whole, with load balancing. As shown in step two, the Task array must be divided into (m-n-2)+2n parts. The principle of data segmentation is to let the CPU and GPU finish their tasks at roughly the same time: the Task array is divided into (m-n-2)+2n contiguous parts whose task amounts are as equal as possible, without changing the order of the array elements. A dynamic programming algorithm is used to segment the Task array.
(1) The problem is denoted Problem(Task, Layers, k), meaning that a Task array containing Layers elements is divided into k parts. A sub-problem is denoted Problem(Task, i, j), meaning that the first i elements of the Task array are divided into j parts. Layers, k, i and j are all positive integers.
(2) A matrix opt of size Layers × k is established to record the states of the sub-problems during solving, i.e. the optimal division value of i elements into j groups (the maximum of the group sums under the best division). A matrix dp of size Layers × k is used to record where the split occurs when i elements are divided into j parts, i.e. the array index in the Task array, which is the layer number of the ply. The optimal value of sub-problem Problem(Task, i, j) is denoted opt[i][j], representing the optimal division of i elements into j groups, i.e. the maximum of the j group sums under that division.
(3) Initialize the boundary conditions. Since there is only one way to divide the first i elements of the Task array into a single segment, there is obviously the boundary condition:
opt[i][1] = a_1 + a_2 + … + a_i (2)
in which opt[i][1] is the optimal value of the sub-problem Problem(Task, i, 1), a_z is the z-th value of the array, and i and z are positive integers.
Secondly, the Task array containing i elements can be divided into i segments at most, so there is a boundary condition:
opt[i][j] = opt[i][i], if j > i (3)
wherein opt [ i ] [ j ] is the optimal value of the sub-Problem Problem (Task, i, j), opt [ i ] [ i ] is the optimal value of the sub-Problem Problem (Task, i, i), and i and j are positive integers.
From formulas (2) and (3), the values of opt[i][1] can be calculated for j = 1, i = 1, 2, …, Layers, and the values of opt[1][j] for i = 1, j = 1, 2, …, k. All elements of the dp matrix are initialized to -1;
(4) The optimal values opt[i][j] for i = 2, …, Layers divided into j (j = 2, …, k) segments are calculated by traversal using the recurrence relation, and the division point dp[i][j] is recorded at each step. The recurrence relation is as follows.
When 1 < j ≤ i, there is the recurrence relation:
opt[i][j] = min over 1 ≤ x ≤ i-1 of max( opt[x][j-1], a_{x+1} + a_{x+2} + … + a_i ) (4)
where opt[i][j] is the optimal value of the sub-problem Problem(Task, i, j), opt[x][j-1] is the optimal value of the sub-problem Problem(Task, x, j-1), a_p is the p-th value of the array, and i, x and p are positive integers.
The specific calculation process is as follows:
(a) If the first execution is performed, initializing i=2; if not, executing i=i+1;
(b) If the first execution is performed, initializing j=2; if not, then executing j=j+1;
(c) Set opt[i][j] to the maximum value of the C++ int type;
(d) Initializing x=1;
(e) Calculate S = max( opt[x][j-1], a_{x+1} + a_{x+2} + … + a_i ), where S is a positive integer;
(f) If S is smaller than opt[i][j], change the value of opt[i][j] to S and assign the value of dp[i][j] to x. Afterwards, if x+1 is less than or equal to i-1, add 1 to x and execute step (e);
(g) If x+1 is greater than i-1, returning to step (b);
(h) If j+1 is greater than k, returning to step (a);
(i) If i+1 is greater than Layers, ending the calculation process, and storing all intermediate states into an opt matrix and a dp matrix;
(5) Output the division points. A one-dimensional array index_seg is established for storing all the division points, one division point per element. According to the division positions stored in the dp matrix, all division points of the Task array can be found by searching backwards in the dp matrix starting from dp[Layers][k], obtaining the required set of division points. Specifically, dp[Layers][k] stores the division point n_{k-1} between the (k-1)-th and k-th parts when the Layers elements are divided into k parts; dp[n_{k-1}][k-1] stores the division point n_{k-2} between the (k-2)-th and (k-1)-th parts when the first n_{k-1} elements are divided into k-1 parts. By analogy, all division points can be found and stored in the index_seg array, where n_{k-1} and k are positive integers.
The specific process is as follows:
(a) Initializing idx=k, wherein idx is a positive integer;
(b) First obtain the (k-1)-th division point n_{k-1} from dp[Layers][idx] and store n_{k-1} into the index_seg array;
(c) If idx-1 is greater than 0, execute idx = idx-1 and then perform step (d); if idx-1 is less than or equal to 0, execute step (e), since all division points have then been found and stored in the index_seg array;
(d) Obtain the division point n_{idx-1} from dp[n_{idx}][idx], store n_{idx-1} into the index_seg array, and return to step (c);
(e) Ordering the data of the index_seg array from small to large;
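The dynamic programming traversal of step (4) and the backtracking of step (5) can be sketched in C++ as follows. `splitTasks` is an illustrative name, the running sums a_{x+1}+…+a_i are replaced by precomputed prefix sums (an equivalent but cheaper formulation), and x only ranges over feasible split positions.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Split the Task array into k contiguous parts so that the largest part sum
// is minimized; returns the k-1 division points (1-based, ascending), as in
// index_seg.  Requires task.size() >= k.
std::vector<int> splitTasks(const std::vector<int>& task, int k) {
    const int n = (int)task.size();                       // n == Layers
    std::vector<long long> prefix(n + 1, 0);              // prefix sums of the Task array
    for (int i = 0; i < n; ++i) prefix[i + 1] = prefix[i] + task[i];

    // opt[i][j]: best achievable "largest group sum" for the first i elements in j groups
    std::vector<std::vector<long long>> opt(n + 1, std::vector<long long>(k + 1, 0));
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(k + 1, -1));

    for (int i = 1; i <= n; ++i) opt[i][1] = prefix[i];   // boundary condition (2)

    for (int i = 2; i <= n; ++i)
        for (int j = 2; j <= std::min(i, k); ++j) {
            opt[i][j] = LLONG_MAX;                        // step (c)
            for (int x = j - 1; x <= i - 1; ++x) {        // feasible split positions
                long long s = std::max(opt[x][j - 1], prefix[i] - prefix[x]);  // recurrence (4)
                if (s < opt[i][j]) { opt[i][j] = s; dp[i][j] = x; }            // step (f)
            }
        }

    std::vector<int> index_seg;                           // step (5): backtrack from dp[n][k]
    for (int pos = n, idx = k; idx > 1; --idx) {
        pos = dp[pos][idx];
        index_seg.push_back(pos);
    }
    std::sort(index_seg.begin(), index_seg.end());        // step (e): ascending order
    return index_seg;
}
```

The O(Layers^2 · k) cost of this sketch matches the triple loop of steps (a)-(i).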
(6) After all the division points are obtained, the Task array can be divided into (m-n-2)+2n parts according to the index_seg array, in which all division points of the Task array are stored from small to large; the division points determine the start layer number and the end layer number of each partial ply set.
(a) A task_seq array is created for storing parameters of the (m-n-2) +2n parts of the ply set. The task_seq array is a two-dimensional array, each element of the array is a positive integer, the first two elements of each column of the array are used for storing the initial layer number and the final layer number of the partial layer sheet set, and the third element of the column is used for storing the Task amount of the partial layer sheet set.
(b) Storing the start layer number and the end layer number of the (m-n-2) +2n partial layer sheet sets in corresponding positions of the task_seq array;
(c) Calculating the total Task amount of the (m-n-2) +2n partial layer sheet sets, and storing the total Task amount into the corresponding position of the task_seq array;
(d) Sequencing the task_seq array from large to small according to the total amount of tasks of each part of the layer sheet set, wherein the part with the largest Task is arranged at the forefront, and the part with the smallest Task is arranged at the rearmost;
(7) The main thread creates m threads for task processing and marks them as CPU task threads or GPU task threads according to the CPU core count and the GPU count. A task queue is then created; each partial ply set is regarded as one task, and the tasks are added to the task queue from largest to smallest, following the element order of the task_seq array.
Meanwhile, the main thread establishes a result_CPU array and a result_GPU array for storing the calculation results of the CPU and the GPU. Both are two-dimensional arrays: the first element of each column stores a layer number, and each subsequent element of the column stores a path, i.e. the x and y coordinate values of its two endpoints, so that each column stores all paths of one ply.
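The task queue and worker threads described above can be modeled minimally as follows. This is a sketch under the assumption of a mutex-protected FIFO queue; `TaskQueue` and `runWorkers` are hypothetical names, and the filling work itself is replaced by a stand-in accumulation.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Tasks are enqueued largest-first (per the sorted task_seq array);
// m worker threads pop from the front until the queue is empty.
struct TaskQueue {
    std::deque<int> tasks;      // each entry: total task amount of one ply set
    std::mutex mtx;

    bool pop(int& out) {
        std::lock_guard<std::mutex> lock(mtx);
        if (tasks.empty()) return false;
        out = tasks.front();
        tasks.pop_front();
        return true;
    }
};

long long runWorkers(TaskQueue& q, int m) {
    std::atomic<long long> processed{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < m; ++t)
        workers.emplace_back([&q, &processed] {
            int task;
            while (q.pop(task)) processed += task;   // stand-in for the filling work
        });
    for (auto& w : workers) w.join();
    return processed;
}
```

In the patent, the pop would additionally dispatch to the CPU or GPU path depending on the thread's marking.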
Step four, collaborative filling
If the current task queue is empty, executing a step seven; otherwise, each thread sequentially executes the following steps:
(1) Judging whether the current thread is a CPU task thread, if so, taking out a task from a task queue, reading the data of the task from a hard disk to a memory, then applying for CPU computing resources, and if the CPU task thread successfully acquires the CPU computing resources, executing the fifth step; otherwise, the CPU task thread is blocked until it is scheduled by the computer operating system. And if the current thread is the GPU task thread, executing (2).
(2) And the GPU task thread takes out the largest 2n tasks from all the tasks and calls the GPU-kernel to process, namely the first 2n tasks of the task queue. After a task is acquired, the data of the task is read from a hard disk to a memory; then applying for GPU computing resources, if the GPU computing resources are successfully obtained, copying the data from the memory to the video memory, and executing the step six; otherwise, the thread is blocked until it is scheduled by the computer operating system.
The specific flow is shown in figure 1.
Step five, CPU parallel filling
(1) If the task is executed for the first time, the contour data of the initial layer sheet of the task is read; otherwise, reading contour data of the next layer of slice;
(2) The minimum and maximum values of the profile in the x-direction and the y-direction are determined from the profile data, and recorded as boundaryMin (xMin, yMin) and boundaryMax (xMax, yMax), respectively. Then, scan line numbers sMin and sMax corresponding to boundaryMin and boundaryMax are calculated according to equation 5.
s=(p.x+1)/lineSpacing-1 (5)
where s is the calculated scan line number, p.x is the x coordinate of the point to be calculated, and lineSpacing is the spacing between two adjacent scan lines.
(3) The difference between sMax and sMin is the number SubS of scan lines intersecting the current slice contour. The SubS scan lines are divided into w groups and, using C++ multithreading, each thread calculates the intersections of one group of scan lines with the contour. A cutLine array is established for each scan line to store the intersection coordinates; the elements of the array consist of an x value and a y value. layerResult arrays named layerResult_idx_i (i = 1, 2, 3, …) are established, i being a positive integer. Each layerResult array is a two-dimensional array in which every element stores a path endpoint and every column stores a path, a path containing two endpoints. Because some contours are complex, the number of intersections differs from scan line to scan line, sometimes greatly, so the task amounts of the threads can differ widely and the load becomes unbalanced; the scan lines are therefore assigned to the threads at intervals, as shown in fig. 2, to balance the task amount among the threads.
(4) Each thread concurrently traverses all edges of the contour group to obtain the intersections of the edges with its group of scan lines, and stores the intersections into the corresponding cutLine arrays according to the scan line numbers. After a thread has calculated the intersections of its group of scan lines with the contour, the elements of the cutLine array of each scan line are sorted by y coordinate from small to large, and the sorted intersections are parity-paired in order to form paths, which are stored in the layerResult_idx_i array.
(5) After all w threads finish, the main thread merges and sorts the paths stored in the layerResult_idx_i arrays by x coordinate from small to large to obtain all paths of the contour group, and stores all paths and layer numbers of the layerResult_idx_i arrays into the result_CPU array;
(6) The main thread judges whether the ply is the last layer; if yes, step (7) is executed; otherwise, step (1) is executed again;
(7) The main thread judges whether the task queue is empty; if empty, step seven is executed; otherwise, step four is executed to process the next task.
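Steps (1)-(5) of the CPU filling can be condensed into a serial C++ sketch (the per-thread grouping and the layer loop are omitted). `fillSlice`, the integer scan line indexing, and the shared-vertex handling are illustrative assumptions rather than the patent's exact formula 5 numbering.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };
struct Path { Pt a, b; };

// Intersect vertical scan lines (x = s * lineSpacing) with one closed contour,
// sort the hits on each line by y, and parity-pair them into fill paths.
std::vector<Path> fillSlice(const std::vector<Pt>& contour, double lineSpacing) {
    double xMin = contour[0].x, xMax = contour[0].x;
    for (const Pt& p : contour) { xMin = std::min(xMin, p.x); xMax = std::max(xMax, p.x); }

    int sMin = (int)std::ceil(xMin / lineSpacing);
    int sMax = (int)std::floor(xMax / lineSpacing);
    std::vector<std::vector<double>> cutLine(sMax - sMin + 1);   // y hits per scan line

    const size_t n = contour.size();
    for (size_t e = 0; e < n; ++e) {                  // every edge of the closed contour
        Pt p1 = contour[e], p2 = contour[(e + 1) % n];
        if (p1.x == p2.x) continue;                   // edge parallel to the scan lines
        if (p1.x > p2.x) std::swap(p1, p2);
        int lo = std::max((int)std::ceil(p1.x / lineSpacing), sMin);
        int hi = std::min((int)std::floor(p2.x / lineSpacing), sMax);
        for (int s = lo; s <= hi; ++s) {
            double x = s * lineSpacing;
            if (x == p2.x) continue;                  // count each shared vertex only once
            double y = p1.y + (p2.y - p1.y) * (x - p1.x) / (p2.x - p1.x);
            cutLine[s - sMin].push_back(y);
        }
    }

    std::vector<Path> result;                         // sort each line's hits, pair them off
    for (int s = sMin; s <= sMax; ++s) {
        std::vector<double>& ys = cutLine[s - sMin];
        std::sort(ys.begin(), ys.end());
        double x = s * lineSpacing;
        for (size_t i = 0; i + 1 < ys.size(); i += 2)
            result.push_back({{x, ys[i]}, {x, ys[i + 1]}});
    }
    return result;
}
```

In the patent's parallel version, the scan line range [sMin, sMax] is split into w interleaved groups, one per thread, before this per-line work runs.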
Step six, GPU parallel filling
Firstly, a corresponding GPU thread is opened up according to tasks distributed by a host computer, and each GPU thread is responsible for processing a contour. Then, the intersection times of the contour and the scanning line are calculated in each thread, and the specific calculation process is as follows:
(1) Let two adjacent vertices on the contour be P_1 and P_2; connecting the two points forms an edge of the contour;
(2) Find the scan line closest to P_1 along the negative x direction; its number s_1 is calculated by formula 5. In the same way, find the number s_2 of the scan line closest to P_2 along the negative x direction;
(3) The number of intersections between the edge formed by P_1 and P_2 and the scan lines is the absolute value of the difference between s_1 and s_2;
(4) And accumulating and summing the intersecting times of all edges on the contour and the scanning line to obtain the intersecting times of the contour and the scanning line.
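The per-edge count of steps (1)-(3) reduces to an absolute difference of scan line numbers. The sketch below uses a simplified floor-based numbering as a stand-in for formula 5; `edgeCrossings` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Intersection count of one edge with the vertical scan lines, as the absolute
// difference of its endpoints' scan line numbers (nearest line in the -x direction).
int edgeCrossings(double x1, double x2, double lineSpacing) {
    int s1 = (int)std::floor(x1 / lineSpacing);   // number of the line closest to P1
    int s2 = (int)std::floor(x2 / lineSpacing);   // number of the line closest to P2
    return std::abs(s1 - s2);
}
```

Summing this over all edges of a contour gives the count used to size the cutPoint array.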
The intersection counts of all contours with the scan lines are summed to obtain the total number of intersections totalS, and a cutPoint array of size totalS is opened on the GPU video memory for storing the intersections of the contours and the scan lines; each element of the cutPoint array consists of two values, a contour sequence number and a hash value. Next, the hash value of each intersection of a contour and a scan line is calculated according to formula 6, and the contour sequence number and the hash value are stored in the cutPoint array.
Hash(P) = 2^32 · p.x + p.y (6)
Where Hash (P) represents the Hash value calculated from the intersection point coordinates, P represents the intersection point, p.x represents the x coordinate of the intersection point, and p.y represents the y coordinate of the intersection point.
After the intersections are obtained, the cutPoint array is first sorted by contour group sequence number from small to large; elements with the same contour group sequence number are then sorted by hash value from small to large. After sorting, the elements of the cutPoint array are restored to the original intersection coordinates using formula 7.
p.x = Hash(p) / 2^32, p.y = Hash(p) mod 2^32 (7)
where Hash(p) is the hash value, p is the intersection point, p.x is the x coordinate of the intersection point, p.y is the y coordinate of the intersection point, "/" denotes integer division, and "a mod b" denotes the remainder of dividing a by b.
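Formulas (6) and (7) amount to packing the two coordinates into one sortable 64-bit key. A C++ sketch, under the assumption that coordinates are stored as non-negative 32-bit integers (e.g. fixed-point units):

```cpp
#include <cassert>
#include <cstdint>

// Formula (6): Hash(P) = 2^32 * p.x + p.y, as one 64-bit key.
uint64_t hashPoint(uint32_t x, uint32_t y) {
    return (uint64_t(x) << 32) + y;          // x in the high bits, y in the low bits
}

// Formula (7): p.x = Hash(p) / 2^32, p.y = Hash(p) mod 2^32.
void unhashPoint(uint64_t h, uint32_t& x, uint32_t& y) {
    x = uint32_t(h >> 32);                   // integer division by 2^32
    y = uint32_t(h & 0xFFFFFFFFull);         // remainder mod 2^32
}
```

Because x occupies the high bits, sorting by this key orders intersections first by x and then by y, which is exactly why the cutPoint sort works.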
The cutPoint array is then copied into the host memory, and sliceResult arrays are established on the host: let the plies contained in the 2n tasks calculated by the GPU have layer numbers idx_i (i = 1, …, o); then o arrays, named sliceResult_idx_i, are established for storing the path filling information of the corresponding plies. Each sliceResult array is a two-dimensional array whose elements consist of an x value and a y value; each column contains two elements corresponding to the two endpoints of a path, and all columns of an array together form the path filling information of one slice. The intersection coordinates in the cutPoint array are stored into the corresponding sliceResult_idx_i arrays, after which all paths and layer numbers of the sliceResult_idx_i arrays are stored into the result_GPU array. Finally, it is judged whether the task queue is empty; if empty, step seven is executed; otherwise, step four is executed.
Step seven, completing filling
(1) And establishing a Result array as a two-dimensional array, wherein each element of the array stores a path, namely x and y coordinate values of two endpoints, and each column of the array is used for storing all paths of one slice.
(2) Merge the result_CPU array and the result_GPU array: using the layer sequence numbers recorded in the result_CPU and result_GPU arrays, the paths in both arrays are stored to the corresponding positions of the Result array in order from small to large.
(3) After merging, creating a path file on the hard disk. And copying the data of the Result array into a file of the hard disk in sequence, and storing the data as a final calculation Result.
The result_CPU and result_GPU arrays are two-dimensional arrays: the first element of each column stores a layer number, and each subsequent element of the column stores a path, i.e. the x and y coordinate values of its two endpoints, so that each column stores all paths of one ply.
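The merge of step (2) is a gather-and-sort by layer number. In this sketch, `LayerColumn` is an illustrative stand-in for one column of the result arrays, with the path contents elided to integers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One column of result_CPU / result_GPU: a layer number followed by that ply's paths.
struct LayerColumn { int layer; std::vector<int> paths; };

// Step (2): gather both partial results and order them by layer number,
// producing the Result array that is written to the path file.
std::vector<LayerColumn> mergeResults(const std::vector<LayerColumn>& cpu,
                                      const std::vector<LayerColumn>& gpu) {
    std::vector<LayerColumn> all(cpu);
    all.insert(all.end(), gpu.begin(), gpu.end());
    std::sort(all.begin(), all.end(),
              [](const LayerColumn& a, const LayerColumn& b) { return a.layer < b.layer; });
    return all;
}
```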
The following detailed description of embodiments of the invention is exemplary and intended to be illustrative of the invention and not to be construed as limiting the invention.
Step one, data preprocessing
After the layering operation is carried out on the three-dimensional model, it is divided into a number of layers from low to high; the number of layers is Layers, here assumed to be 5000.
(1) A Task array is established: a one-dimensional array of length 5000 whose elements are positive integers, used for storing the task amount of each ply. The first slice's data is read, and the spacing lineSpacing between the scan lines is set to 0.1 mm.
(2) The boundary values xMin, yMin, xMax, yMax of the layer profile are calculated. The calculation process is as follows:
a) Initialize xMin and yMin to the maximum value of the C++ int type, and initialize xMax and yMax to the minimum value of the C++ int type. Read the first endpoint of the contour ply.
b) Compare the x and y coordinate values of the endpoint with xMin, yMin, xMax and yMax. If the x coordinate value of the endpoint is smaller than xMin, change the value of xMin to the x coordinate value of the endpoint; if the x coordinate value of the endpoint is greater than xMax, change the value of xMax to it; if the y coordinate value of the endpoint is smaller than yMin, change the value of yMin to it; if the y coordinate value of the endpoint is greater than yMax, change the value of yMax to it;
c) Judging whether the contour end point is the last end point, if yes, finishing contour boundary value calculation to obtain contour boundary values xMin, yMin, xMax and yMax; if not, reading the next endpoint of the contour, and continuing to execute the step (b);
(3) According to the contour boundary values and the scan line spacing, the number of scan lines intersecting the layer contour is calculated as lineCnts = (xMax - xMin)/lineSpacing, and after the calculation is completed lineCnts is stored in the corresponding position of the Task array. Here lineCnts is the number of scan lines intersecting the layer contour, xMax is the maximum value of the contour in the x direction, xMin is the minimum value, and lineSpacing is the spacing between the scan lines.
(4) Judge whether the layer number of the ply is 5000; if yes, the Task amount calculation ends, and the Task amounts of all plies have been calculated and stored in the Task array; if not, the next slice's data is read and step (2) is executed.
Step two, determining the number of the divided segments of the Task array
The CPU core count 10 and the GPU count 1 are obtained. 10 threads are created for data processing, of which 1 thread performs GPU-related operations and 9 threads process CPU tasks. Of the 9 threads, 1 executes the original serial filling algorithm and the others execute the CPU parallel filling algorithm in pairs. The overall speed-up ratio of CPU parallelism is set to 7 and the parallel speed-up ratio of the GPU to 2. According to the CPU core count, the CPU parallel speed-up ratio and the GPU parallel speed-up ratio, the Task array is divided into 9 parts.
Step three, dividing the Task array
The problem to be solved in the step is to divide the Task array in the step one into 9 parts, so that the Task amounts of the parts are equal as much as possible. And adopting a dynamic programming method to carry out segmentation.
The solution process of dynamic programming to the problem is described herein by way of example in terms of 5000 layers.
(1) First, a matrix opt of size 5000 × 9 and a matrix dp of size 5000 × 9 are established for storing the state values during solving: the opt matrix stores the optimal division value of i elements divided into j groups, i.e. the maximum of the group sums under the best division; the dp matrix stores where the i elements split when divided into j parts, i.e. the array index at the split between the first j-1 parts and the j-th part.
(2) Initialize the opt matrix and the dp matrix. Calculate the values of opt[i][1] for j = 1, i = 1, 2, …, 5000, and the values of opt[1][j] for i = 1, j = 1, 2, …, 9. Set all elements of the dp matrix to -1 to complete the dp matrix initialization.
(3) The optimal values opt[i][j] for i = 2, …, 5000 divided into j (j = 2, …, 9) segments are calculated by traversal, and the division point dp[i][j] is recorded at each step. The specific calculation process is as follows:
(a) If the first execution is performed, initializing i=2; if not, executing i=i+1;
(b) If the first execution is performed, initializing j=2; if not, then executing j=j+1;
(c) Set opt[i][j] to the maximum value of the C++ int type;
(d) Initializing x=1;
(e) Calculate S = max( opt[x][j-1], a_{x+1} + a_{x+2} + … + a_i ), where S is a positive integer;
(f) If S is smaller than opt[i][j], change the value of opt[i][j] to S and assign the value of dp[i][j] to x. Afterwards, if x+1 is less than or equal to i-1, add 1 to x and execute step (e);
(g) If x+1 is greater than i-1, returning to step (b);
(h) If j+1 is greater than 9, returning to step (a);
(i) If i+1 is greater than 5000, the calculation is completed, and the calculation process is ended. All intermediate states are stored in an opt matrix and a dp matrix;
(4) After the traversal calculation is completed, a one-dimensional array index_seg is established for storing all the division points. dp[5000][9] stores the division point n_8 between the 8th and 9th parts when the 5000 elements are divided into 9 parts; dp[n_8][8] stores the division point n_7 between the 7th and 8th parts when the first n_8 elements are divided into 8 parts. Similarly, the division points of the Task array can be searched backwards in the dp matrix in turn, and the required set of division points is obtained and stored in the index_seg array, where n_8 and n_7 are positive integers.
The specific process is as follows:
(a) Initializing idx=9, wherein idx is a positive integer;
(b) First obtain the 8th division point n_8 from dp[5000][9] and store n_8 into the index_seg array;
(c) If idx-1 is greater than 0, execute idx = idx-1 and then perform step (d); if idx-1 is less than or equal to 0, the calculation process ends, since all division points have then been found and stored in the index_seg array;
(d) Obtain the division point n_{idx-1} from dp[n_{idx}][idx], store n_{idx-1} into the index_seg array, and return to step (c);
(5) After all the division points are obtained, the ply set can be divided into 9 parts according to the index_seg array.
The specific process is as follows:
(a) A task_seq array is created for storing the parameters of the 9 partial ply sets. The task_seq array is a two-dimensional array whose elements are positive integers; the first two elements of each column store the start layer number and the end layer number of a partial ply set, and the third element of the column stores the Task amount of that partial ply set.
(b) Storing the initial layer number and the final layer number of the 9 partial layer sheet sets into corresponding positions of the task_seq array;
(c) Calculating the total Task amount of the 9 partial layer sheet sets, and storing the total Task amount into the corresponding position of the task_seq array;
(d) Sequencing the task_seq array from large to small according to the total amount of tasks of each part of the layer sheet set, wherein the part with the largest Task is arranged at the forefront, and the part with the smallest Task is arranged at the rearmost;
(6) And creating 10 threads by the main thread to perform task processing, and marking the threads as CPU task threads and GPU task threads according to the number of CPU cores and the number of GPUs. Then creating a Task queue, and regarding each part of the layer sheet set as a Task, namely adding the tasks into the Task queue from large to small according to the element sequence of the task_seq array. At this point there are 9 tasks in the task queue.
Then the main thread establishes a result_CPU array and a result_GPU array to store the calculation results of the CPU and the GPU, wherein the result_CPU array and the result_GPU array are two-dimensional arrays, and each column in the array stores a filling path of one layer; the first element in the column stores the layer number and each subsequent element stores a path, the x and y coordinate values of the two endpoints.
Step four, collaborative filling
If the current task queue is empty, executing a step seven; otherwise, each thread sequentially executes the following steps:
(1) Judging whether the current thread is a CPU task thread, if so, taking out a task from a task queue, reading the data of the task from a hard disk to a memory, then applying for CPU computing resources, and if the CPU task thread successfully acquires the CPU computing resources, executing the fifth step; otherwise, the CPU task thread is blocked until it is scheduled by the computer operating system. And if the current thread is the GPU task thread, executing (2).
(2) And the GPU task thread takes out the maximum 2 tasks from all the tasks and calls the GPU kernel for processing. After a task is acquired, the data of the task is read from a hard disk to a memory; then applying for GPU computing resources, if the GPU computing resources are successfully obtained, copying the data from the memory to the video memory, and executing the step six; otherwise, the thread is blocked until it is scheduled by the computer operating system.
Step five, CPU parallel filling
(1) If the task is executed for the first time, the contour data of the initial layer sheet of the task is read; otherwise, reading contour data of the next layer of slice;
(2) Calculate the boundary values boundaryMin(xMin, yMin) and boundaryMax(xMax, yMax) of the read ply contour; then calculate the scan line numbers sMin and sMax corresponding to boundaryMin and boundaryMax according to formula 5, and obtain the number SubS of scan lines intersecting the contour by subtracting sMin from sMax.
(3) Since the core counts of currently mainstream processors are mostly even numbers between 4 and 10, the SubS scan lines are divided into 2 groups and 2 CPU threads process one ply at the same time; when the core count cannot provide 2 threads, the serial algorithm is used instead. Using C++ multithreading, the two threads each compute the intersections of one group of scan lines with the contour. SubS cutLine arrays are established, each storing the intersection data between one scan line and the contour, i.e. the x and y coordinates of the intersections. The layerResult_idx_i arrays are established.
(4) The two threads concurrently traverse all edges of the contour group to obtain the intersections of the edges with their group of scan lines, and store the intersections into the corresponding cutLine arrays according to the scan line numbers. After the calculation is completed, each thread sorts the intersections on each of its scan lines by y coordinate from small to large, parity-pairs the sorted intersections in order to form paths, and stores the paths in the layerResult_idx_i array.
(5) The main thread then merges and sorts the paths stored in the layerResult_idx_i arrays by x coordinate from small to large, which yields the filling result of the current ply, and stores all paths and layer numbers of the layerResult_idx_i arrays into the result_CPU array;
(6) The main thread judges whether the ply is the last layer; if yes, step (7) is executed; otherwise, step (1) is executed again;
(7) The main thread judges whether the task queue is empty; if empty, step seven is executed; otherwise, step four is executed to process the next task.
Step six, GPU parallel filling
At the GPU end, each GPU thread is responsible for a calculation task of a contour, and a plurality of threads finish the processing of all contours in the threads in parallel. Firstly, the intersecting times between each contour and the scanning line are calculated in the GPU, and the specific calculation steps are as follows:
(1) Two adjacent points on the contour form a line segment; let the two points be P_1 and P_2;
(2) Find the scan line closest to P_1 along the negative x direction; its number s_1 is calculated by formula 5. In the same way, find the number s_2 of the scan line closest to P_2 along the negative x direction;
(3) The number of intersections between the line segment formed by P_1 and P_2 and the scan lines is the absolute value of the difference between s_1 and s_2;
(4) And accumulating and summing the intersecting times of all the line segments on the contour and the scanning lines to obtain the intersecting times of the contour and the scanning lines.
A cutPoint array of the corresponding size is opened on the GPU video memory according to the intersection counts of all contours with the scan lines; each element of the cutPoint array consists of two 16-byte numbers, a contour sequence number and a hash value, and the array is used for storing the intersections of the contours and the scan lines. The intersections of each contour with the scan lines are then calculated, and the contour group sequence number and the hash value computed from the intersection coordinates according to formula 6 are stored at the corresponding position in the cutPoint array.
After the intersections are obtained, the cutPoint array is sorted first by contour group sequence number in ascending order, and then, where the group sequence numbers are equal, by hash value in ascending order. Because the x coordinate occupies the high bits of the hash value computed from the intersection coordinates, the intersections end up sorted by contour group sequence number, then by x value, then by y value, all ascending. After sorting, each element of the cutPoint array is inverse-transformed according to formula (7) to restore the original intersection coordinates.
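The sort-by-hash scheme can be illustrated with a small sketch. Formulas 6 and 7 are not reproduced in this text, so the packing below (fixed-point x in the high bits, y in the low bits, non-negative coordinates assumed) is an illustrative stand-in, not the patent's exact transform:

```python
SCALE = 1000          # fixed-point quantization factor (assumed)
BITS  = 32            # bits reserved for the y part (assumed)

def point_hash(x, y):
    """Pack an intersection point into one integer with x in the high bits,
    so sorting by hash orders by x first, then y (stand-in for formula 6)."""
    return (round(x * SCALE) << BITS) | round(y * SCALE)

def point_unhash(h):
    """Inverse transform recovering the coordinates (stand-in for formula 7)."""
    return ((h >> BITS) / SCALE, (h & ((1 << BITS) - 1)) / SCALE)

# cutPoint entries: (contour group sequence number, hash value)
cut_point = [(1, point_hash(2.0, 0.5)), (0, point_hash(1.0, 3.0)),
             (0, point_hash(1.0, 0.5)), (0, point_hash(0.5, 9.0))]
cut_point.sort()      # sorts by group number, then x, then y
restored = [(g, *point_unhash(h)) for g, h in cut_point]
```

Because x occupies the high bits, an ordinary ascending sort of (group, hash) pairs yields exactly the group/x/y ordering described above; negative coordinates would need an offset before packing.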
Finally, the cutPoint array is copied into host memory, and o sliceResult arrays are established on the host side, where o is the number of plies contained in the two tasks assigned to the GPU end. Let the sequence numbers of the o layers calculated by the GPU end be idx1 to idxo; the arrays are then named sliceResult_idx1 to sliceResult_idxo and are used to store the final path filling information, each element of an array storing one path. The intersections in the cutPoint array are paired by parity in order to form paths, which are stored in the corresponding sliceResult array according to the layer sequence number. Then all paths and layer numbers of sliceResult_idx1 to sliceResult_idxo are stored in the result_GPU array.
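The parity pairing of one scan line's sorted intersections into fill paths can be sketched as:

```python
def pair_into_paths(points):
    """Pair sorted intersection points of one scan line by parity:
    (p0, p1), (p2, p3), ... — each pair is one fill path (segment).
    A closed contour yields an even number of intersections per line."""
    assert len(points) % 2 == 0, "expected an even intersection count"
    return [(points[i], points[i + 1]) for i in range(0, len(points), 2)]
```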
After the calculation is finished, the main thread judges whether the task queue is empty; when it is empty, step seven is executed; otherwise, step four is executed.
Step seven, completing filling
(1) A Result array is established as a two-dimensional array; each element of the array stores one path, namely the x and y coordinate values of its two endpoints, and each column of the array is used for storing all paths of one slice. The array length is 5000.
(2) The result_CPU array and the result_GPU array are merged. Using the layer sequence numbers recorded in the result_CPU and result_GPU arrays, the paths in both arrays are stored to the corresponding positions of the Result array in ascending order of layer number.
(3) After merging, a path file is created on the hard disk. The data of the Result array is copied into the file in sequence and saved as the final calculation result.
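The merge of steps (2)–(3) can be sketched as follows; here the per-device results are modeled as mappings from layer number to path list (an assumed concrete layout, which the text leaves open):

```python
def merge_results(result_cpu, result_gpu, num_layers):
    """Merge per-layer path lists from the CPU and GPU result arrays into
    one Result table indexed by layer number; each device covers a
    disjoint set of layers."""
    result = [None] * num_layers
    for layer, paths in list(result_cpu.items()) + list(result_gpu.items()):
        result[layer] = paths        # place each layer's paths at its index
    return result
```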
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives, and substitutions may be made in the above embodiments by those skilled in the art without departing from the spirit and principles of the invention.

Claims (8)

1. A CPU-GPU cooperative additive manufacturing parallel scanning line filling method is characterized in that: the method comprises the following steps:
step 1: obtaining layering data of a three-dimensional model of a part to be additively manufactured, calculating the Task quantity lineCNTs of each layer of plies of the model, taking the Task quantity of each layer of plies as an element value, and storing the element value in a Task array;
step 2: dividing the Task array into (m-n-2)+2n parts to obtain segmentation points, wherein the m-n-2 parts are processed by a CPU and the 2n parts are processed by a GPU; m is the number of CPU cores, n is the number of GPUs; dividing the model layer sheet set into (m-n-2)+2n parts according to the segmentation points, and determining the task quantity of each part of the layer sheet set;
step 3: the main thread creates m threads for task processing, and marks the m threads as CPU task threads and GPU task threads according to the CPU core number and the GPU number; creating a task queue, and adding the ply set of each part into the task queue as a task; the main thread also establishes a result_CPU array and a result_GPU array for storing calculation results of the CPU and the GPU, wherein the calculation results comprise layer numbers and paths;
Step 4: collaborative filling is performed according to the task queues:
if the current task queue is empty, executing the step 7; otherwise, each task thread sequentially executes the following steps:
step 4.1: judging whether the current thread is a CPU task thread, if so, taking out a task from a task queue, reading the data of the task from a hard disk to a memory, then applying for CPU computing resources, and if the CPU task thread successfully acquires the CPU computing resources, executing step 5; otherwise, the CPU task thread is blocked until the task thread is scheduled; if the current thread is the GPU task thread, executing the step 4.2;
step 4.2: the GPU task thread takes out 2n tasks with the largest task quantity from all tasks and calls the GPU kernel for processing; after the task is fetched, 2n tasks of data are read from the hard disk to the memory; then applying for GPU computing resources, if the GPU computing resources are successfully obtained, copying the data from the memory to the video memory, and executing the step 6; otherwise, the task thread is blocked until the task thread is scheduled;
step 5: CPU parallel filling is carried out by the following steps:
step 5.1: if the CPU parallel filling is executed for the first time, the outline data of the initial layer sheet of the task is read; otherwise, reading contour data of the next layer of slice;
Step 5.2: determining minimum and maximum values of the contour in the x direction and the y direction according to contour data of the lamellar, recording the minimum and maximum values as boundaryMin and boundaryMax respectively, and calculating scanning line serial numbers sMin and sMax corresponding to boundaryMin and boundaryMax respectively;
step 5.3: determining the number of scanning lines SubS intersecting the current lamellar contour according to the scanning line sequence numbers sMin and sMax, dividing the SubS scanning lines into a plurality of groups, and calculating the intersection point of one group of scanning lines and the contour by each thread by utilizing a multi-thread method;
establishing a cutLine array for each scanning line, wherein the cutLine array is used for storing intersection point coordinates, and elements in the array consist of x values and y values;
establishing a layerResult array named layerResult_idxi, where idxi represents the layer number and i = 1, 2, 3, …, i being a positive integer; a layerResult array is a two-dimensional array, each element of the array stores a path endpoint, each column of the array stores a path, and a path comprises two endpoints;
step 5.4: each thread concurrently traverses all edges on the outline, obtains the intersections of the group of scan lines assigned to that thread, and stores each intersection into the corresponding cutLine array according to the scan line number; after a thread has calculated the intersections of its assigned group of scan lines with the outline, the elements in the cutLine array corresponding to each scan line are arranged in ascending order of y coordinate value, and the arranged intersections are paired by parity in order to form paths, which are stored in the layerResult_idxi array;
step 5.5: after all threads have completed their computation, the main thread merges the paths stored in the layerResult_idxi array and sorts them in ascending order of x coordinate value to obtain all paths of the profile; all paths and the layer number of the layerResult_idxi array are then stored in the result_CPU array;
step 5.6: the main thread judges whether the layer of the sheet is the last layer, if so, the step 5.7 is executed; otherwise, returning to the execution step 5.1;
step 5.7: judging whether the task queue is empty by the main thread, and executing the step 7 when the task queue is empty; otherwise, executing the step 4;
step 6: GPU parallel filling is carried out through the following steps:
step 6.1: opening corresponding GPU threads according to the distributed tasks, each GPU thread being responsible for processing one contour; calculating the number of intersections between the contour and the scan lines in each thread; summing the intersection counts of all contours and the scan lines to obtain the total intersection number totalS, and opening up a cutPoint array of size totalS on the GPU video memory, the cutPoint array being used for storing the intersections of the contours and the scan lines, each element of the cutPoint array consisting of two values, namely a contour sequence number and a hash value; calculating the hash value of each intersection of the contour and the scan lines, and then storing the contour sequence number and the hash value in the cutPoint array; after the intersections are obtained, first sorting the cutPoint array in ascending order of contour sequence number, then, where the contour sequence numbers are equal, sorting in ascending order of hash value, and after sorting, restoring the elements in the cutPoint array to the original intersection coordinates;
Step 6.2: copying the cutPoint array into the memory, and establishing 2n sliceResult arrays at the host end; the array for the idxi-th layer of the model contained in the 2n tasks calculated by the GPU end is named sliceResult_idxi and is used for storing the path filling information of the corresponding ply; the sliceResult array is a two-dimensional array, each element in the array consists of an x value and a y value, each column comprises two elements respectively corresponding to the two endpoints of a path, and all columns in the array jointly form the path filling information of one layer sheet; storing the intersection coordinates in the cutPoint array into the corresponding sliceResult_idxi array, and then storing all paths and layer numbers of the sliceResult_idxi arrays into the result_GPU array; finally judging whether the task queue is empty, and executing step 7 when the task queue is empty; otherwise, executing step 4;
step 7: filling is accomplished by:
step 7.1: establishing a Result array as a two-dimensional array, wherein each element of the array stores a path, namely x and y coordinate values of two endpoints, and each column of the array is used for storing all paths of one slice;
step 7.2: merging the result_CPU array and the result_GPU array; using the layer sequence numbers recorded in the result_CPU array and the result_GPU array, storing the paths in both arrays to the corresponding positions of the Result array in ascending order of layer number;
Step 7.3: after merging, creating a path file; and copying the data of the Result array into the path file in sequence, and storing the data as a final calculation Result.
2. The CPU-GPU collaborative additive manufacturing parallel scan line filling method of claim 1, wherein: in step 1, the task amount linects is the number of scan lines intersecting the slice contour.
3. The CPU-GPU collaborative additive manufacturing parallel scan line filling method of claim 2, wherein: in the step 1, the task quantity lineCNTs is obtained through the following steps:
step 1.1: calculating the boundary value of the lamellar contour;
step 1.2: and calculating the number of scanning lines intersected with the lamellar contour according to the boundary value of the contour and the scanning line spacing.
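The task amount of steps 1.1–1.2 can be sketched as follows, assuming vertical scan lines x = x0 + k·d (x0 and d are illustrative parameters, not values fixed by the claim):

```python
import math

def task_amount(contour, x0=0.0, d=1.0):
    """lineCNTs for one slice: the number of scan lines x = x0 + k*d
    (k integer) intersecting the contour's x extent, computed from the
    contour boundary values and the scan line spacing."""
    xs = [p[0] for p in contour]
    x_min, x_max = min(xs), max(xs)        # boundary values of the contour
    k_min = math.ceil((x_min - x0) / d)    # first scan line inside the extent
    k_max = math.floor((x_max - x0) / d)   # last scan line inside the extent
    return max(0, k_max - k_min + 1)
```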
4. The CPU-GPU collaborative additive manufacturing parallel scan line filling method of claim 1, wherein: in the step 2, a dynamic programming algorithm is adopted to divide the Task array, and the Task array is divided into (m-n-2) +2n parts.
5. The CPU-GPU collaborative additive manufacturing parallel scan line filling method of claim 4, wherein: in step 2, a dynamic programming algorithm is adopted to segment the Task array, and the specific process of determining the segmentation points is as follows:
Step 2.1: the Problem of dividing the Task array is marked as Problem (Task, layers, k), which means that the Task array with the length of Layers is divided into k parts, wherein the Task array contains Layers elements; the sub-Problem is marked as Problem (Task, i, j), which means that the Task array is divided into j parts, wherein the Task array contains i elements; layers, k, i, j are all positive integers;
step 2.2: establishing a matrix opt of size Layers × k to record the state of the sub-problems during solving, namely the optimal segmentation value when i elements are divided into j groups; establishing a matrix dp of size Layers × k to record the segmentation position when i elements are divided into j parts, namely the corresponding subscript of the Task array; the optimal value of the sub-Problem Problem(Task, i, j) is denoted opt[i][j], representing the optimal segmentation value that divides i elements into j groups;
step 2.3: initializing boundary conditions:

opt[i][1] = a_1 + a_2 + … + a_i
opt[i][j] = opt[i][i], if j > i

wherein opt[i][1] is the optimal value of the sub-Problem Problem(Task, i, 1), a_z is the z-th value of the array, opt[i][j] is the optimal value of the sub-Problem Problem(Task, i, j), and opt[i][i] is the optimal value of the sub-Problem Problem(Task, i, i); i, j and z are positive integers; thus the values opt[i][1] for j = 1, i = 1, 2, …, Layers and opt[1][j] for i = 1, j = 1, 2, …, k are initialized; all elements of the dp matrix are initialized to -1;
step 2.4: traversing i = 2, …, Layers with the recurrence relation, calculating the optimal value opt[i][j] of dividing the array into j (j = 2, …, k) segments, and recording the segmentation point dp[i][j] at that time; when 1 < j <= i, the recurrence relation is:

opt[i][j] = min over x (j-1 <= x < i) of max( opt[x][j-1], a_(x+1) + … + a_i )

wherein opt[i][j] is the optimal value of the sub-Problem Problem(Task, i, j), opt[x][j-1] is the optimal value of the sub-Problem Problem(Task, x, j-1), a_p is the p-th value of the array, and x and p are positive integers;
step 2.5: establishing a one-dimensional array index_seg for storing all the segmentation points, each element of the array index_seg storing one segmentation point:
based on the segmentation point positions stored in the dp matrix, starting from dp[Layers][k], all segmentation points of the Task array are retrieved in reverse order to obtain the required set of segmentation points; dp[Layers][k] stores the segmentation point n_(k-1) between the (k-1)-th part and the k-th part when the Layers elements are divided into k parts; dp[n_(k-1)][k-1] stores the segmentation point n_(k-2) between the (k-2)-th part and the (k-1)-th part when n_(k-1) elements are divided into k-1 parts; similarly, all segmentation points are retrieved and stored in the index_seg array, where n_(k-1) and k are positive integers.
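The dynamic programming of claim 5 can be sketched as follows. The claim does not state the optimality criterion explicitly; minimizing the largest part-sum (the natural load-balancing objective) is assumed here:

```python
def partition_tasks(task, k):
    """Split `task` (per-slice workloads) into k contiguous parts so that
    the largest part-sum is minimized; returns the optimal value and the
    split points (indices where a new part begins).
    opt[i][j]: best value for the first i elements in j parts.
    dp[i][j]:  split position x chosen to reach opt[i][j]."""
    n = len(task)
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + task[i]          # prefix sums a_1 + ... + a_i
    INF = float("inf")
    opt = [[INF] * (k + 1) for _ in range(n + 1)]
    dp = [[-1] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        opt[i][1] = prefix[i]                        # boundary: one part = total sum
    for j in range(2, k + 1):
        for i in range(j, n + 1):
            for x in range(j - 1, i):                # last part is elements x+1 .. i
                cand = max(opt[x][j - 1], prefix[i] - prefix[x])
                if cand < opt[i][j]:
                    opt[i][j], dp[i][j] = cand, x
    # walk dp backwards from dp[n][k] to recover all split points
    cuts, i = [], n
    for j in range(k, 1, -1):
        i = dp[i][j]
        cuts.append(i)
    return opt[n][k], sorted(cuts)
```

For example, partition_tasks([1, 2, 3, 4, 5], 2) returns (9, [3]): the split [1, 2, 3] / [4, 5] balances the workload best.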
6. The CPU-GPU collaborative additive manufacturing parallel scan line filling method of claim 5, wherein: in step 2, after all segmentation points are obtained, the ply set is divided into (m-n-2) +2n parts by the following procedure:
step 2.6.1: creating a task_seq array for storing (m-n-2) +2n partial parameters of the ply set after segmentation:
the task_seq array is a two-dimensional array, each element of the array is a positive integer, the first two elements of each column of the array are used for storing the initial layer number and the final layer number of the partial layer sheet set, and the third element of the column is used for storing the Task amount of the partial layer sheet set;
step 2.6.2: storing the start layer number and the end layer number of the (m-n-2) +2n partial layer sheet sets in corresponding positions of the task_seq array; calculating the Task quantity of the (m-n-2) +2n partial ply sets, and storing the Task quantity into the corresponding position of the task_seq array;
step 2.6.3: and sequencing the task_seq array from large to small according to the total Task amount of each part of the layer sheet set.
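Steps 2.6.1–2.6.3 can be sketched as follows, representing each task_seq column as [start layer, end layer, task amount] and taking the segmentation points as indices where a new part begins (an assumed convention):

```python
def build_task_seq(split_points, task):
    """task_seq: one entry per part of the ply set, holding the start layer
    number, end layer number (inclusive), and total task amount, sorted by
    task amount in descending order."""
    bounds = [0] + list(split_points) + [len(task)]
    seq = []
    for a, b in zip(bounds, bounds[1:]):
        seq.append([a, b - 1, sum(task[a:b])])   # start, end, task amount
    seq.sort(key=lambda e: e[2], reverse=True)   # largest workload first
    return seq
```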
7. The CPU-GPU collaborative additive manufacturing parallel scan line filling method of claim 1, wherein: in the step 3, the number of CPU task threads is m-n, the number of GPU threads is n, the parallel acceleration ratio of the CPU is set to be m-n-2, and the acceleration ratio of the GPU is set to be 2.
8. The CPU-GPU collaborative additive manufacturing parallel scan line filling method of claim 1, wherein: in step 5.3, the scan lines are allocated at intervals to balance the amount of tasks between the threads.
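The interval allocation of claim 8 can be sketched as a strided assignment: thread t takes scan lines t, t + T, t + 2T, …, so that long and short scan lines are spread roughly evenly across the T threads:

```python
def assign_scanlines(num_lines, num_threads):
    """Allocate scan lines to threads at intervals (round-robin striding)
    to balance the task amount between the threads."""
    return [list(range(t, num_lines, num_threads))
            for t in range(num_threads)]
```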
CN202210230094.6A 2022-03-10 2022-03-10 CPU-GPU cooperative additive manufacturing parallel scanning line filling method Active CN114722571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210230094.6A CN114722571B (en) 2022-03-10 2022-03-10 CPU-GPU cooperative additive manufacturing parallel scanning line filling method

Publications (2)

Publication Number Publication Date
CN114722571A (en) 2022-07-08
CN114722571B (en) 2024-02-09

Family

ID=82238226


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464446A (en) * 2020-11-10 2021-03-09 西北工业大学 Metal additive manufacturing path filling method based on interlayer information inheritance
CN112743834A (en) * 2020-12-16 2021-05-04 华南理工大学 Multi-laser cooperative load laser selective melting additive manufacturing method
CN113191016A (en) * 2021-05-20 2021-07-30 华中科技大学 Body expression model-based multi-material product modeling and analyzing integrated method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100076941A1 (en) * 2008-09-09 2010-03-25 Microsoft Corporation Matrix-based scans on parallel processors


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Optimized processing of slice filling paths in rapid prototyping; Chen Qingguo; Liu Chaoying; Zhang Juncai; Xu Anping; Coal Mine Machinery (No. 10); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant