CN116483587B - Video super-division parallel method, server and medium based on image segmentation - Google Patents


Info

Publication number: CN116483587B (application number CN202310738868.0A; earlier publication CN116483587A)
Authority: CN (China)
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Inventors: 邓正秋, 徐振语
Current and original assignee: Hunan Malanshan Video Advanced Technology Research Institute Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Hunan Malanshan Video Advanced Technology Research Institute Co ltd; priority to CN202310738868.0A
Original language: Chinese (zh)


Classifications

    • G06F 9/5066: Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G06F 9/5083: Techniques for rebalancing the load in a distributed system
    • G06T 3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • H04N 7/0117: Conversion of standards involving conversion of the spatial resolution of the incoming video signal
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a video super-resolution parallel method, a server and a medium based on image segmentation, wherein the method comprises the following steps: S1, obtain the average value of the super-resolution computing capabilities of all GPUs on a server; S2, from the computing capabilities of the CPU on the server, obtain the CPU computing capability Pcpu_chs closest to the GPU average, and obtain the CPU thread number corresponding to Pcpu_chs; S3, generate the partition weight array Q[] of the server; S4, divide the video frame pixels according to the weight array Q[] and the total weight of all computing capabilities; S5, distribute the video blocks to the CPU parts and the GPU parts respectively through multithreading, and perform super-resolution computation on the cut video blocks in parallel; S6, synthesize the results of the super-resolution computation to obtain the super-resolution reconstructed image. The method balances the computing capabilities of the CPU and the GPUs.

Description

Video super-resolution parallel method, server and medium based on image segmentation
Technical Field
The invention relates to the technical field of image processing, in particular to a video super-resolution parallel method, a server and a medium based on image segmentation.
Background
Video super-resolution reconstruction converts a low-resolution video into a high-resolution video, so that old videos, films, television programs and cartoons can be restored and their quality improved. During computation, the video must first be decoded into original images; each image, or a group of images, is then used as input for super-resolution computation, and the computed images are re-encoded to obtain the new video. Among these stages, super-resolution computation takes the longest time and is the core of the workload.
Video super-resolution computation demands high computing power under tight time constraints, so it is necessary to exploit the full capability of all computing resources on a server and to raise the speed of single-frame super-resolution. Existing servers generally adopt a CPU + multi-GPU architecture. The computing capability of the CPU differs entirely from that of the GPUs, GPUs of different models differ in capability, and even GPUs of the same model can diverge in performance through ageing and wear. How to schedule the various computing resources on a server for video super-resolution computation, and thereby improve the computing efficiency for a single video frame, is the problem to be solved.
The currently common parallel method for video super-resolution is frame-based: each video frame is treated as a task and multiple tasks are computed in parallel, but when the number of frames is small, parallel efficiency is severely limited. When computing a single frame on a pure-CPU architecture, this method accelerates the computing operators by creating threads on multiple CPU cores; on a CPU + GPU architecture, the computing operators are accelerated by a single GPU, so the parallel granularity, and hence the achievable speed-up, is limited. When a single server has multi-socket CPUs and multiple GPUs, such methods cannot bring all CPU and GPU resources to bear on a single video frame at the same time.
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Disclosure of Invention
In view of the technical problems in the related art, the invention provides a video super-resolution parallel method based on image segmentation, comprising the following steps:
S1, obtain the average value Pgpu_ave of the super-resolution computing capabilities of all GPUs on the server;
S2, from the computing capabilities of the CPU on the server, obtain the CPU computing capability Pcpu_chs closest to the GPU average Pgpu_ave, and obtain the CPU thread number thum_cpu corresponding to Pcpu_chs;
S3, generate the partition weight array Q[] of the server, whose length equals part_all = part_cpu + n_gpu, where part_all is the total number of cut parts; Q[0] to Q[n_gpu-1] = Pgpu[0] to Pgpu[n_gpu-1], and Q[n_gpu] to Q[part_all-1] = Pcpu_chs; here n_gpu is the number of GPUs on the server and part_cpu = core_num / thum_cpu, where core_num is the total number of CPU cores of the server and part_cpu is the number of cut parts of the image that the CPU will process;
S4, divide the video frame pixels according to the weight array Q[] and its total computing-capability weight Qsum, to obtain the cut image blocks;
S5, distribute the cut video blocks to the CPU parts and the GPU parts respectively through multithreading, and perform super-resolution computation on the blocks in parallel;
S6, synthesize the results of the super-resolution computation to obtain the super-resolution reconstructed image.
Specifically, before step S1, the method includes:
Step 1, take a preset number of frames of video as the test sample;
Step 2, take the common divisors of the number N of CPU cores and set up a list of them, containing L common divisors; initialize a divisor array thum of length L whose elements store the common divisors; the server contains N CPU cores and M GPUs;
Step 3, initialize a CPU computing capability array Pcpu of length L whose elements store the super-resolution computing capability of the CPU under the different thread counts;
Step 4, initialize i = 0, where i is the index into the divisor array;
Step 5, set the number of CPU threads participating in the computation to thum[i];
Step 6, perform super-resolution computation of the preset frames of video on thum[i] CPU threads, and record the computation time as T1;
Step 7, compute the video super-resolution computing capability of the CPU, Pcpu[i] = 1/T1;
Step 8, i = i + 1;
Step 9, judge whether i is smaller than L; if so, return to step 5; if not, execute step 10;
Step 10, initialize a GPU computing capability array Pgpu of length n_gpu whose elements store the super-resolution computing capabilities of the different GPUs;
Step 11, initialize i = 0, where i is the index of the GPU card;
Step 12, perform super-resolution computation of the preset frames of video on GPU i, and record the computation time as T2;
Step 13, compute the video super-resolution computing capability of the GPU, Pgpu[i] = 1/T2;
Step 14, i = i + 1;
Step 15, judge whether i is smaller than n_gpu; if so, return to step 12; if not, end.
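The capability-measurement loop above can be sketched as follows. The stand-in workloads (`time.sleep`) are hypothetical placeholders for the real CPU and GPU super-resolution kernels, which the patent does not specify; only the timing structure of steps 1 to 15 is reproduced:

```python
import time

def divisors_desc(n):
    """Common divisors of n, largest first, e.g. 64 -> [64, 32, 16, 8, 4, 2, 1]."""
    return [d for d in range(n, 0, -1) if n % d == 0]

def benchmark(run_superres, configs):
    """Run the test clip once per configuration and record capability P = 1/T."""
    caps = []
    for cfg in configs:
        t0 = time.perf_counter()
        run_superres(cfg)                    # super-resolve the preset test frames
        caps.append(1.0 / (time.perf_counter() - t0))
    return caps

thum = divisors_desc(64)                     # candidate CPU thread counts (steps 2-9)
Pcpu = benchmark(lambda n_threads: time.sleep(0.001), thum)
Pgpu = benchmark(lambda gpu_id: time.sleep(0.001), [0, 1])   # one entry per GPU card
```

With the real kernels substituted in, `Pcpu` holds one capability per candidate thread count and `Pgpu` one capability per GPU card, exactly the two arrays consumed by step S2.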
Specifically, step S2 includes:
Step 21, take the average of the super-resolution computing capabilities of all GPUs on the server, Pgpu_ave = average(Pgpu[]);
Step 22, select the value Pcpu_chs in the Pcpu[] array closest to Pgpu_ave, i.e. the element minimizing |Pcpu[i] - Pgpu_ave|, and find the index of Pcpu[] corresponding to Pcpu_chs;
Step 23, take thum_cpu = thum[index];
Step 24, take part_cpu = core_num / thum_cpu, where core_num is the total number of CPU cores of the server, part_cpu is the number of cut parts of the image that the CPU will process, and thum_cpu is the number of threads participating in the computation for each part.
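Steps 21 to 24 can be sketched as below; the capability numbers are illustrative assumptions, not measured values:

```python
def choose_cpu_config(Pcpu, thum, Pgpu, core_num):
    """Steps 21-24: pick the CPU capability closest to the mean GPU
    capability, and derive the per-part thread count and part count."""
    Pgpu_ave = sum(Pgpu) / len(Pgpu)
    index = min(range(len(Pcpu)), key=lambda i: abs(Pcpu[i] - Pgpu_ave))
    Pcpu_chs = Pcpu[index]
    thum_cpu = thum[index]              # threads per CPU part
    part_cpu = core_num // thum_cpu     # image parts the CPU will handle
    return Pcpu_chs, thum_cpu, part_cpu

# Illustrative capabilities for a 64-core server with two GPUs:
Pcpu_chs, thum_cpu, part_cpu = choose_cpu_config(
    Pcpu=[8.0, 4.5, 2.4, 1.3, 0.7, 0.4, 0.2],
    thum=[64, 32, 16, 8, 4, 2, 1],
    Pgpu=[5.0, 4.0],
    core_num=64)
# Pgpu_ave = 4.5, so Pcpu_chs = 4.5, thum_cpu = 32 and part_cpu = 2
```

The choice makes each CPU part roughly as fast as an average GPU, which is what allows the later weight-proportional cut to balance the load.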
Specifically, step S4 includes:
Step 41, obtain the video frame requiring super-resolution, with pixel length length and pixel height width;
Step 42, take Dim equal to the larger of length and width, i.e. Dim = max(length, width);
Step 43, cut along Dim; initialize a pixel start array Dstart and a pixel end array Dend, each of length part_all;
Step 44, initialize i = 0, where i is the part index;
Step 45, if i = 0, Dstart[i] = 1;
otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1] / Qsum;
Step 46, i = i + 1;
Step 47, judge whether i is smaller than part_all; if so, return to step 45; if not, jump to step 48;
Step 48, initialize i = 0 again;
Step 49, if i = part_all - 1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1;
Step 50, i = i + 1;
Step 51, judge whether i is smaller than part_all; if so, return to step 49; if not, end.
Specifically, when a buffer area is used, step 45 becomes: if i = 0, Dstart[i] = 1; otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1] / Qsum - buff;
and step 49 becomes: if i = part_all - 1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1 + buff; where buff is the pixel width of the buffer area.
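The cutting rules above, including the buffered variant, can be sketched as one routine; rounding the weighted span to whole pixels is an assumption the patent leaves implicit:

```python
def partition(dim, Q, buff=0):
    """Split the 1-based pixel range [1, dim] into len(Q) weighted parts
    (steps 43-51); buff > 0 makes neighbouring parts overlap."""
    Qsum = sum(Q)
    n = len(Q)
    Dstart, Dend = [0] * n, [0] * n
    for i in range(n):
        # rounding to whole pixels is an assumption; the patent leaves it implicit
        Dstart[i] = 1 if i == 0 else Dstart[i - 1] + round(dim * Q[i - 1] / Qsum) - buff
    for i in range(n):
        Dend[i] = dim if i == n - 1 else Dstart[i + 1] - 1 + buff
    return Dstart, Dend

# Weights from the worked example (2 GPUs + 2 CPU parts), a 1920-pixel side:
Dstart, Dend = partition(1920, [5.0, 4.0, 4.5, 4.5])
# -> Dstart = [1, 534, 961, 1441], Dend = [533, 960, 1440, 1920] (contiguous)
Db, De = partition(1920, [5.0, 4.0, 4.5, 4.5], buff=10)
# -> each part now extends buff pixels into its neighbour
```

With buff = 0 the parts tile the range exactly; with buff > 0 every cut gains an overlap that the synthesis step later averages away.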
Specifically, step S5 is:
Step 52, start the multithreaded function;
Step 53, according to the segmented image, send the GPU parts to the video memory;
Step 54, the different CPU thread groups acquire their cut images in main memory;
Step 55, the CPU thread groups and the GPUs perform super-resolution computation on their respective images in parallel.
specifically, the step S6 is: and synthesizing the results obtained by the super-resolution calculation, and averaging the overlapped pixels in a buffer area to obtain the super-resolution reconstructed image.
Specifically, step S6 includes:
Step 61, merge the computed parts of the GPUs and the CPU;
Step 62, in the buffer areas, average the pixel values contributed by the parts on both sides;
Step 63, complete the synthesis of the picture.
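Steps 61 to 63 can be sketched as below for a single-channel image cut along its rows. `merge_parts` and its interface are illustrative; the patent's implementation merges along whichever dimension was cut, at the super-resolution scale factor (4× in the worked example):

```python
import numpy as np

def merge_parts(parts, Dstart, Dend, scale=1):
    """Blend super-resolved blocks into one image; pixels covered by more
    than one block (the buffer areas) are averaged, as in steps 61-63."""
    h = Dend[-1] * scale                  # total rows of the output image
    w = parts[0].shape[1]
    out = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for block, s, e in zip(parts, Dstart, Dend):
        rows = slice((s - 1) * scale, e * scale)   # 1-based [s, e] -> 0-based rows
        out[rows] += block
        cnt[rows] += 1
    return out / cnt                               # average wherever blocks overlap

# Tiny illustration: two blocks covering rows 1-3 and 2-4 of a 4-row strip;
# rows 2-3 are the shared buffer and come out as the average (2 + 4) / 2 = 3.
merged = merge_parts([np.full((3, 2), 2.0), np.full((3, 2), 4.0)],
                     Dstart=[1, 2], Dend=[3, 4])
```

The per-pixel count array generalizes the pairwise average of the patent's figure to any number of overlapping parts.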
In a second aspect, another embodiment of the present invention discloses a server comprising N CPUs and M GPUs, where N and M are positive integers greater than 1, the server being configured to implement the above video super-resolution parallel method based on image segmentation.
In a third aspect, another embodiment of the present invention discloses a non-volatile storage medium having stored thereon instructions that, when executed by a processor, implement the above video super-resolution parallel method based on image segmentation.
According to this method, the computing capabilities of the CPU and the GPUs in the server are quantified; the number of CPU threads participating in the computation is derived from the quantified average GPU computing capability; the corresponding partition weights are generated; and the video image is then segmented according to those weights for parallel super-resolution reconstruction. The video super-resolution parallel method based on image segmentation balances the computing capabilities of the CPU and the GPUs and does not cause load imbalance. In addition, it effectively accelerates the super-resolution of a single video frame in parallel, so the computing performance of the various resources on a single server can be fully exploited. Further, to prevent visible seams, a buffer area is required at each cut.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the video super-resolution parallel method based on image segmentation provided by an embodiment of the invention;
fig. 2 is a schematic diagram of single-frame video image cutting according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a video super-resolution parallel device based on image segmentation according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the invention, fall within the scope of protection of the invention.
Example 1
Referring to fig. 1, this embodiment discloses a video super-resolution parallel method based on image segmentation, comprising the following steps:
S1, obtain the average value Pgpu_ave of the super-resolution computing capabilities of all GPUs on the server;
In this embodiment, before step S1, the super-resolution computing capabilities of all GPUs and CPUs on the server must be obtained.
Specifically, in this embodiment a benchmark test program is set up, and the super-resolution computing capabilities of the CPU and the GPUs in the server are evaluated with it and stored in the arrays Pcpu and Pgpu; the server of this embodiment contains N CPU cores and M GPUs in total;
Step 1, take a preset number of frames of video as the test sample; in this embodiment the preset number is 10 frames;
Step 2, take the common divisors of the number N of CPU cores and set up a list of them, containing L common divisors; initialize a divisor array thum of length L whose elements store the common divisors;
For example, if the CPU has 64 cores, then L = 7, the common divisors are 64, 32, 16, 8, 4, 2, 1, and thum[0] = 64.
Step 3, initialize a CPU computing capability array Pcpu of length L whose elements store the super-resolution computing capability of the CPU under the different thread counts;
Step 4, initialize i = 0, where i is the index into the divisor array;
Step 5, set the number of CPU threads participating in the computation to thum[i];
Step 6, perform super-resolution computation of the 10 frames of video on thum[i] CPU threads, and record the computation time as T1;
Step 7, compute the video super-resolution computing capability of the CPU, Pcpu[i] = 1/T1;
Step 8, i = i + 1;
Step 9, judge whether i is smaller than L; if so, return to step 5; if not, execute step 10;
Step 10, initialize a GPU computing capability array Pgpu of length n_gpu whose elements store the super-resolution computing capabilities of the different GPUs (one server may contain GPU cards of different models, and even identical cards may have degraded in performance);
Step 11, initialize i = 0, where i is the index of the GPU card;
Step 12, perform super-resolution computation of the 10 frames of video on GPU i, and record the computation time as T2;
Step 13, compute the video super-resolution computing capability of the GPU, Pgpu[i] = 1/T2;
Step 14, i = i + 1;
Step 15, judge whether i is smaller than n_gpu; if so, return to step 12; if not, end;
S2, from the computing capabilities of the CPU on the server, obtain the CPU computing capability Pcpu_chs closest to the GPU average Pgpu_ave, and obtain the CPU thread number thum_cpu corresponding to Pcpu_chs;
Step S2 specifically comprises:
Step 21, take the average of the super-resolution computing capabilities of all GPUs on the server, Pgpu_ave = average(Pgpu[]);
Step 22, select the value Pcpu_chs in the Pcpu[] array closest to Pgpu_ave, i.e. the element minimizing |Pcpu[i] - Pgpu_ave|, and find the index of Pcpu[] corresponding to Pcpu_chs;
Step 23, take thum_cpu = thum[index];
Step 24, take part_cpu = core_num / thum_cpu, where core_num is the total number of CPU cores of the server, part_cpu is the number of cut parts of the image that the CPU will process, and thum_cpu is the number of threads participating in the computation for each part;
S3, generate the partition weight array Q[] of the server, whose length equals part_all = part_cpu + n_gpu, where part_all is the total number of cut parts; Q[0] to Q[n_gpu-1] = Pgpu[0] to Pgpu[n_gpu-1], and Q[n_gpu] to Q[part_all-1] = Pcpu_chs; here n_gpu is the number of GPUs on the server and part_cpu = core_num / thum_cpu, where core_num is the total number of CPU cores of the server and part_cpu is the number of cut parts of the image that the CPU will process;
For example, if the server has 64 CPU cores and 2 GPU cards, taking part_cpu = 2 and thum_cpu = 32, the partition weight array Q[] has four entries, assigned Pgpu[0], Pgpu[1], Pcpu_chs and Pcpu_chs respectively.
The total computing-capability weight Qsum equals the sum of the weight array Q[]: Qsum = sum(Q[]);
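The construction of Q[] and Qsum for this example can be sketched as follows; the capability values are illustrative assumptions:

```python
def build_weights(Pgpu, Pcpu_chs, part_cpu):
    """Partition weight array Q (step S3): one entry per GPU followed by
    part_cpu CPU entries, each weighted Pcpu_chs; Qsum is the total weight."""
    Q = list(Pgpu) + [Pcpu_chs] * part_cpu
    return Q, sum(Q)

# 2 GPU cards and part_cpu = 2 CPU parts, as in the example above:
Q, Qsum = build_weights(Pgpu=[5.0, 4.0], Pcpu_chs=4.5, part_cpu=2)
# Q = [5.0, 4.0, 4.5, 4.5], Qsum = 18.0
```

Each part then receives a share of the frame proportional to Q[i] / Qsum in step S4.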
S4, divide the video frame pixels according to the weight array Q[] and its total computing-capability weight Qsum, to obtain the cut image blocks;
Specifically, the steps of this embodiment include:
Step 41, obtain the video frame requiring super-resolution, with pixel length length and pixel height width;
Step 42, take Dim equal to the larger of length and width, i.e. Dim = max(length, width);
Step 43, cut along Dim; initialize a pixel start array Dstart and a pixel end array Dend, each of length part_all;
Step 44, initialize i = 0, where i is the part index;
Step 45, if i = 0, Dstart[i] = 1;
otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1] / Qsum;
Step 46, i = i + 1;
Step 47, judge whether i is smaller than part_all; if so, return to step 45; if not, jump to step 48;
Step 48, initialize i = 0 again;
Step 49, if i = part_all - 1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1;
Step 50, i = i + 1;
Step 51, judge whether i is smaller than part_all; if so, return to step 49; if not, end.
For example, assuming length > width, the cut is made along length; the server has 64 CPU cores and 2 GPU cards, with part_cpu = 2 and thum_cpu = 32. As shown in fig. 2, the middle part between cpu_0 and cpu_1 in the figure is a buffer area of length 2 × buff = 20;
In order to prevent visible seams, this embodiment provides a buffer area at each cut;
Specifically, step 45 then becomes: if i = 0, Dstart[i] = 1; otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1] / Qsum - buff;
and step 49 becomes: if i = part_all - 1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1 + buff;
S5, distribute the cut video blocks to the CPU parts and the GPU parts respectively through multithreading, and perform super-resolution computation on the blocks in parallel;
Step 52, start the multithreaded function;
Step 53, according to the segmented image, send the GPU parts to the video memory;
Step 54, the different CPU thread groups acquire their cut images in main memory;
Step 55, the CPU thread groups and the GPUs perform super-resolution computation on their respective images in parallel;
S6, synthesize the results of the super-resolution computation to obtain the super-resolution reconstructed image;
Specifically, after the buffer area buff is set, step S6 of this embodiment is: synthesize the results of the super-resolution computation and average the overlapping pixels in the buffer areas to obtain the super-resolution reconstructed image;
Step 61, merge the computed parts of the GPUs and the CPU;
Step 62, in the buffer areas, average the pixel values contributed by the parts on both sides;
For example, as shown in fig. 2, there is a buffer overlap region of 20 × width pixels between gpu_1 and cpu_0; after 4× super-resolution it becomes 80 × (4 × width). Denote the super-resolved pixel values of gpu_1 in this region buff_gpu1 and those of cpu_0 buff_cpu0; each pixel value in the 80 × (4 × width) region is taken as (buff_gpu1 + buff_cpu0) / 2.
Step 63, complete the synthesis of the picture;
Step 64, end;
According to this embodiment, the computing capabilities of the CPU and the GPUs in the server are quantified; the number of CPU threads participating in the computation is derived from the quantified average GPU computing capability; the corresponding partition weights are generated; and the video image is then segmented according to those weights for parallel super-resolution reconstruction. The video super-resolution parallel method based on image segmentation balances the computing capabilities of the CPU and the GPUs and does not cause load imbalance. In addition, it effectively accelerates the super-resolution of a single video frame in parallel, so the computing performance of the various resources on a single server can be fully exploited. Further, to prevent visible seams, a buffer area is required at each cut.
Example two
This embodiment discloses a video super-resolution parallel method based on image segmentation, comprising the following steps:
S1, set up a benchmark test program and evaluate the super-resolution computing capabilities of the CPU and the GPUs of the server according to the benchmark test program; store them in the arrays Pcpu and Pgpu;
Step 1, take 10 frames of video as the test sample;
Step 2, the selected server contains N CPU cores and M GPUs in total;
Step 3, take the common divisors of the number N of CPU cores and set up a list of them, containing L common divisors; initialize a divisor array thum of length L whose elements store the common divisors;
For example, if the CPU has 64 cores, then L = 7, the common divisors are 64, 32, 16, 8, 4, 2, 1, and thum[0] = 64.
Step 4, initialize a CPU computing capability array Pcpu of length L whose elements store the super-resolution computing capability of the CPU under the different thread counts;
Step 5, initialize i = 0, where i is the index into the divisor array;
Step 6, set the number of CPU threads participating in the computation to thum[i];
Step 7, perform super-resolution computation of the 10 frames of video on thum[i] CPU threads, and record the computation time as T1;
Step 8, compute the video super-resolution computing capability of the CPU, Pcpu[i] = 1/T1;
Step 9, i = i + 1;
Step 10, judge whether i is smaller than L; if so, return to step 6; if not, execute step 11;
Step 11, initialize a GPU computing capability array Pgpu of length n_gpu whose elements store the super-resolution computing capabilities of the different GPUs (one server may contain GPU cards of different models, and even identical cards may have degraded in performance);
Step 12, initialize i = 0, where i is the index of the GPU card;
Step 13, perform super-resolution computation of the 10 frames of video on GPU i, and record the computation time as T2;
Step 14, compute the video super-resolution computing capability of the GPU, Pgpu[i] = 1/T2;
Step 15, i = i + 1;
Step 16, judge whether i is smaller than n_gpu; if so, return to step 13; if not, end;
S2, cut the video image according to the computing capabilities of the CPU and the GPUs on the server, setting the cutting weights; considering that the image should not be cut too small, the sizes of the cut images should be kept as similar as possible;
Step 1, take the average of the super-resolution computing capabilities of all GPUs on the server, Pgpu_ave = average(Pgpu[]);
Step 2, select the value Pcpu_chs in the Pcpu[] array closest to Pgpu_ave, i.e. the element minimizing |Pcpu[i] - Pgpu_ave|, and find the index of Pcpu[] corresponding to Pcpu_chs;
Step 3, take thum_cpu = thum[index];
Step 4, take part_cpu = core_num / thum_cpu, where core_num is the total number of CPU cores of the server, part_cpu is the number of cut parts of the image that the CPU will process, and thum_cpu is the number of threads participating in the computation for each part;
Step 5, generate the new partition weight array Q[] of the current server, whose length equals part_all = part_cpu + n_gpu, where part_all is the total number of cut parts; Q[0] to Q[n_gpu-1] = Pgpu[0] to Pgpu[n_gpu-1], and Q[n_gpu] to Q[part_all-1] = Pcpu_chs;
For example, if the server has 64 CPU cores and 2 GPU cards, taking part_cpu = 2 and thum_cpu = 32, the partition weight array Q[] has four entries, assigned Pgpu[0], Pgpu[1], Pcpu_chs and Pcpu_chs respectively.
Step 6, take the total computing-capability weight Qsum equal to the sum of the weight array Q[]: Qsum = sum(Q[]);
S3, cut according to the weight values, cutting along the larger of the video pixel length and the pixel height width; to prevent visible seams, a buffer area must be provided at each cut;
Step 1, obtain the video frame requiring super-resolution, with pixel length length and pixel height width;
Step 2, take Dim equal to the larger of length and width, i.e. Dim = max(length, width);
Step 3, cut along Dim; initialize a pixel start array Dstart and a pixel end array Dend, each of length part_all;
Step 4, set a buffer area with pixel width buff; here take buff = 10;
Step 5, initialize i = 0, where i is the part index;
Step 6, if i = 0, Dstart[i] = 1;
otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1] / Qsum - buff;
Step 7, i = i + 1;
Step 8, judge whether i is smaller than part_all; if so, return to step 6; if not, jump to step 9;
Step 9, initialize i = 0 again;
Step 10, if i = part_all - 1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1 + buff;
Step 11, i = i + 1;
Step 12, judge whether i is smaller than part_all; if so, return to step 10; if not, end;
For example, assuming length > width, the cut is made along length; the server has 64 CPU cores and 2 GPU cards, with part_cpu = 2 and thum_cpu = 32. As shown in fig. 2, the middle part between cpu_0 and cpu_1 in the figure is a buffer area of length 2 × buff = 20;
s4, respectively distributing tasks to a CPU part and a GPU part through multithreading according to the cut video; performing super-resolution calculation on the cut pictures in parallel;
step 1, starting a multithreading function;
step 2, sending the GPU parts of the segmented image to video memory;
step 3, different CPU thread groups acquire cut images in the memory;
step 4, performing super-division calculation on the respective images by the CPU thread group and the GPU in parallel;
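S4 might be dispatched with a thread pool as below; `super_divide_gpu` and `super_divide_cpu` are placeholder stubs standing in for the real super-resolution kernels, and are assumptions for illustration only:

```python
from concurrent.futures import ThreadPoolExecutor

def super_divide_gpu(block, device):   # placeholder: real GPU kernel goes here
    return ("gpu", device, block)

def super_divide_cpu(block):           # placeholder: real CPU thread-group kernel
    return ("cpu", block)

def dispatch(blocks, n_gpu):
    """Send blocks[0:n_gpu] to the GPUs and the rest to CPU thread groups,
    running all super-division computations in parallel (steps 1-4)."""
    def run(i, block):
        if i < n_gpu:
            return super_divide_gpu(block, device=i)   # part copied to GPU i's video memory
        return super_divide_cpu(block)                 # CPU group reads block from host memory
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        futures = [pool.submit(run, i, b) for i, b in enumerate(blocks)]
        return [f.result() for f in futures]           # results kept in cut order
```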
s5, synthesizing results obtained by super-resolution calculation, and averaging overlapped pixels in a buffer area;
step 1, synthesizing the calculated parts of the GPU and the CPU;
step 2, over the buffer area, averaging the pixel values shared by the left and right parts;
for example, as shown in fig. 2, there is a buffer overlap region of 20 × width between gpu_1 and cpu_0; after 4× super-division this region becomes 80 × (4 × width). Denoting the super-divided pixel values of gpu_1 as Buff_gpu1 and those of cpu_0 as Buff_cpu0, each pixel value in the 80 × (4 × width) region is obtained as (Buff_gpu1 + Buff_cpu0)/2.
step 3, completing the synthesis of the picture;
step 4, ending;
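The synthesis and buffer averaging of S5 can be sketched in a simplified one-value-per-row form; `synthesize` is a hypothetical helper, not the patented implementation, and real code would operate on full pixel rows:

```python
# Sketch of S5: accumulate each super-divided block over the rows it
# covers, then divide by the coverage count so that rows inside a buffer
# overlap (covered twice) come out as the average of the two parts,
# e.g. (Buff_gpu1 + Buff_cpu0) / 2.
def synthesize(blocks, starts, ends, out_len):
    acc = [0.0] * out_len
    cnt = [0] * out_len
    for block, s, e in zip(blocks, starts, ends):
        for r, value in zip(range(s - 1, e), block):   # rows s..e, 1-based
            acc[r] += value
            cnt[r] += 1
    return [a / c for a, c in zip(acc, cnt)]

# Rows 2-3 are shared by both blocks and come out averaged.
print(synthesize([[2, 2, 2], [4, 4, 4]], starts=[1, 2], ends=[3, 4], out_len=4))
# [2.0, 3.0, 3.0, 4.0]
```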
example III
The server comprises N CPUs and M GPUs, wherein N and M are positive integers greater than 1, and the server is used for realizing the video super-division parallel method based on image segmentation.
Example IV
Referring to fig. 3, fig. 3 is a schematic structural diagram of a video super-division parallel device based on image division according to the present embodiment. The video superdivision parallel device 20 based on image segmentation of this embodiment comprises a processor 21, a memory 22 and a computer program stored in said memory 22 and executable on said processor 21. The steps of the above-described method embodiments are implemented by the processor 21 when executing the computer program. Alternatively, the processor 21 may implement the functions of the modules/units in the above-described device embodiments when executing the computer program.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 22 and executed by the processor 21 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing a specific function for describing the execution of the computer program in the image segmentation based video superparallel device 20.
The image segmentation based video superdivision parallel device 20 may include, but is not limited to, a processor 21, a memory 22. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the image segmentation based video superparallel device 20, and does not constitute a limitation of the image segmentation based video superparallel device 20, and may include more or less components than illustrated, or combine certain components, or different components, e.g., the image segmentation based video superparallel device 20 may further include an input-output device, a network access device, a bus, etc.
The processor 21 may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or any conventional processor; the processor 21 is the control center of the image segmentation based video superdivision parallel device 20 and connects the various parts of the entire device using various interfaces and lines.
The memory 22 may be used to store the computer program and/or modules, and the processor 21 implements the various functions of the image segmentation based video superdivision parallel device 20 by running or executing the computer program and/or modules stored in the memory 22 and invoking data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function); the data storage area may store data created according to use of the device (such as audio data or a phonebook). In addition, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
Wherein, if the modules/units integrated in the image segmentation based video superdivision parallel device 20 are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment through a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by the processor 21, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be adjusted according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that the above-described apparatus embodiments are merely illustrative, and the units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the invention, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. A video super-division parallel method based on image segmentation, characterized in that the method comprises the following steps:
s1, obtaining an average value Pgpu_ave of the super-division computing capacities of all the GPUs on a server; the server pre-stores the super-division computing capacities of the CPU and the GPUs, evaluated according to a reference test program and stored respectively in the Pcpu[] and Pgpu[] arrays;
s2, obtaining the CPU computing capacity Pcpu_chs closest to the average value Pgpu_ave of the GPU super-division computing capacities from among the CPU computing capacities on the server, and obtaining the CPU thread number thum_cpu corresponding to Pcpu_chs; obtaining the CPU thread number thum_cpu corresponding to Pcpu_chs specifically includes: finding the Pcpu[] index corresponding to Pcpu_chs, and obtaining the CPU thread number thum_cpu from the divisor array thum according to the index; the divisor array thum is an array of the common divisors of the N CPU cores, set up as a list;
s3, generating a partition weight array Q[] of the server, with length part_all = part_cpu + n_gpu, where part_all represents the total number of cut parts; Q[0] to Q[n_gpu-1] = Pgpu[0] to Pgpu[n_gpu-1], and Q[n_gpu] to Q[part_all-1] = Pcpu_chs; wherein n_gpu represents the number of GPUs on the server; part_cpu = core_num/thum_cpu; where core_num represents the total number of CPU cores of the server, and part_cpu represents the number of parts of the image cut that the CPU will handle;
s4, dividing the video pixels according to the weight array Q[] and the total computing-power weight Qsum of the weight array Q[] to obtain divided image blocks;
s5, distributing the cut video blocks to the CPU part and the GPU part respectively through multithreading; performing super-resolution calculation on the cut video blocks in parallel;
and S6, synthesizing the result obtained by the super-resolution calculation to obtain a super-resolution reconstructed image.
2. The method according to claim 1, characterized in that: before the step S1, the method includes:
step 1, taking a preset number of frames of video as a video test sample;
step 2, taking the common divisors of the N CPU cores and setting them up as a list containing L common divisors; initializing a divisor array thum of length L, each element storing one common divisor; the server comprises N CPU cores and M GPUs;
step 3, initializing a CPU computing capacity array Pcpu of length L, each element storing the super-division computing capacity of the CPU under a different thread count;
step 4, initializing i=0, wherein i represents a common divisor index number;
step 5, setting the number of CPU threads participating in the calculation to thum[i];
step 6, performing super-division calculation on the preset frames of video using thum[i] CPU threads; recording the calculation time as T1;
step 7, calculating video super-resolution calculation capacity Pcpu [ i ] of the CPU core, wherein Pcpu [ i ] =1/T1;
step 8, i = i+1;
step 9, judging whether i is smaller than L, if yes, returning to step 5, and if not, executing step 10;
step 10, initializing a GPU computing capability array Pgpu, wherein the length is n_gpu, and the elements store the super-division computing capabilities of different GPUs;
step 11, initializing i=0, wherein i represents the number of the GPU card;
step 12, performing super-division calculation on the preset frames of video on GPU i; recording the calculation time as T2;
step 13, calculating the video super-resolution computing capacity Pgpu[i] of the GPU, Pgpu[i] = 1/T2;
step 14, i=i+1;
step 15, judging whether i is smaller than n_gpu, if so, returning to step 12, and if not, ending.
3. The method according to claim 1, characterized in that: the step S2 specifically includes:
step 21, taking the average of all GPU super-division computing capacities on the server, Pgpu_ave = average(Pgpu[]);
step 22, selecting the value Pcpu_chs in the Pcpu[] array closest to Pgpu_ave, i.e. the Pcpu[] entry minimizing |Pcpu[] - Pgpu_ave|, and finding the Pcpu[] index corresponding to Pcpu_chs;
step 23, taking thum_cpu = thum[index];
step 24, taking part_cpu = core_num/thum_cpu; where core_num represents the total number of CPU cores of the server, part_cpu represents the number of parts of the image cut that the CPU will handle, and thum_cpu represents the number of threads each part uses in the calculation.
4. A method according to claim 3, characterized in that: the step S4 specifically includes:
step 41, obtaining a video frame requiring super-division, with image pixels length × width;
step 42, taking Dim equal to the larger of length and width, i.e. Dim = max(length, width);
step 43, cutting along Dim; initializing a pixel start array Dstart and a pixel end array Dend, each of length part_all;
step 44, initializing i=0, wherein i represents a part_all index number;
step 45, if i=0, dstart [ i ] =1;
otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1]/Qsum;
step 46, i=i+1;
step 47, judging whether i is smaller than part_all, if yes, returning to step 45, and if not, jumping to step 48;
step 48, initializing i=0, wherein i represents a part_all index number;
step 49, if i = part_all-1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1;
step 50, i=i+1;
step 51, judging whether i is smaller than part_all, if yes, returning to step 50, if no, ending.
5. The method according to claim 4, wherein: step 45 is: if i=0, Dstart[i] = 1; otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1]/Qsum - buff;
and step 49 is: if i = part_all-1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1 + buff; where buff is the buffer width.
6. The method according to any one of claims 4-5, wherein: the step S5 is as follows:
step 52, starting a multithreading function;
step 53, sending the GPU parts of the segmented image to video memory;
step 54, different CPU thread groups acquire cut images in the memory;
in step 55, the CPU thread groups and the GPUs perform super-division calculation on their respective images in parallel.
7. The method according to claim 5, wherein: the step S6 is as follows: and synthesizing the results obtained by the super-resolution calculation, and averaging the overlapped pixels in a buffer area to obtain the super-resolution reconstructed image.
8. The method according to claim 7, wherein: the step S6 is as follows:
step 61, combining the calculated parts of the GPU and the CPU;
step 62, over the buffer area, averaging the pixel values shared by the left and right parts;
step 63, completing the synthesis of the picture.
9. A server comprising N CPUs and M GPUs, wherein N, M are positive integers greater than 1, characterized in that: the server is used for realizing the video super-division parallel method based on image segmentation as claimed in any one of claims 1-8.
10. A non-volatile storage medium having instructions stored thereon, characterized by: the instructions, when executed by a processor, for implementing a video hyper-segmentation parallelism method based on image segmentation according to one of the claims 1-8.
CN202310738868.0A 2023-06-21 2023-06-21 Video super-division parallel method, server and medium based on image segmentation Active CN116483587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310738868.0A CN116483587B (en) 2023-06-21 2023-06-21 Video super-division parallel method, server and medium based on image segmentation


Publications (2)

Publication Number Publication Date
CN116483587A CN116483587A (en) 2023-07-25
CN116483587B true CN116483587B (en) 2023-09-08

Family

ID=87221811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310738868.0A Active CN116483587B (en) 2023-06-21 2023-06-21 Video super-division parallel method, server and medium based on image segmentation

Country Status (1)

Country Link
CN (1) CN116483587B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117687802B (en) * 2024-02-02 2024-04-30 湖南马栏山视频先进技术研究院有限公司 Deep learning parallel scheduling method and device based on cloud platform and cloud platform

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102497547A (en) * 2011-11-30 2012-06-13 国云科技股份有限公司 Processing method for cloud terminal video data
CN103617626A (en) * 2013-12-16 2014-03-05 武汉狮图空间信息技术有限公司 Central processing unit (CPU) and ground power unit (GPU)-based remote-sensing image multi-scale heterogeneous parallel segmentation method
CN107945098A (en) * 2017-11-24 2018-04-20 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN112988395A (en) * 2021-04-20 2021-06-18 宁波兰茜生物科技有限公司 Pathological analysis method and device of extensible heterogeneous edge computing framework
CN114398167A (en) * 2021-12-03 2022-04-26 武汉大学 Automatic load balancing method for CPU-GPU two-stage parallel computation
WO2022111631A1 (en) * 2020-11-30 2022-06-02 华为技术有限公司 Video transmission method, server, terminal, and video transmission system
CN116149841A (en) * 2022-11-22 2023-05-23 上海热璞网络科技有限公司 Processor resource dynamic superdivision method based on cloud database instance load

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2192780A1 (en) * 2008-11-28 2010-06-02 Thomson Licensing Method for video decoding supported by Graphics Processing Unit
US10489887B2 (en) * 2017-04-10 2019-11-26 Samsung Electronics Co., Ltd. System and method for deep learning image super resolution
CN109903221B (en) * 2018-04-04 2023-08-22 华为技术有限公司 Image super-division method and device
CN111489279B (en) * 2019-01-25 2023-10-31 深圳富联富桂精密工业有限公司 GPU acceleration optimization method and device and computer storage medium
US20220138596A1 (en) * 2020-11-02 2022-05-05 Adobe Inc. Increasing efficiency of inferencing digital videos utilizing machine-learning models


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Two-level granularity parallel strategy of CPU multithreading and GPU for the PMVS algorithm; Liu Jinshuo; Jiang Zhuangyi; Xu Yabo; Deng Juan; Zhang Lanxin; Computer Science, No. 02, pp. 296-301 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant