CN116483587B - Video super-division parallel method, server and medium based on image segmentation - Google Patents


Info

Publication number: CN116483587B (application number CN202310738868.0A; earlier publication CN116483587A)
Authority: CN (China)
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Inventors: 邓正秋, 徐振语
Current and original assignee: Hunan Malanshan Video Advanced Technology Research Institute Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Hunan Malanshan Video Advanced Technology Research Institute Co ltd; priority to CN202310738868.0A
Original language: Chinese (zh)


Classifications

    • G06F 9/5066: Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G06F 9/5083: Techniques for rebalancing the load in a distributed system
    • G06T 3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • H04N 7/0117: Conversion of standards involving conversion of the spatial resolution of the incoming video signal
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a video super-resolution parallel method, a server and a medium based on image segmentation, wherein the method comprises the following steps: S1, obtain the average value of the super-resolution computing capabilities of all GPUs on a server; S2, from the computing capabilities of the CPU on the server, obtain the CPU computing capability Pcpu_chs closest to the GPU average, and obtain the CPU thread number corresponding to Pcpu_chs; S3, generate the partition weight array Q[] of the server; S4, divide the video frame pixels according to the weight array Q[] and the total weight of all computing capabilities; S5, distribute the video blocks to the CPU parts and the GPU parts respectively through multithreading, and perform super-resolution computation on the cut video blocks in parallel; S6, synthesize the results of the super-resolution computation to obtain the super-resolution reconstructed image. The method balances the computing capabilities of the CPU and the GPUs.

Description

Video super-resolution parallel method, server and medium based on image segmentation
Technical Field
The invention relates to the technical field of image processing, in particular to a video super-resolution parallel method, a server and a medium based on image segmentation.
Background
Video super-resolution reconstruction converts a low-resolution video into a high-resolution video, so that old videos, films, television programs and cartoons can be restored and their quality improved. During computation, the video must first be decoded into original images; each image, or a group of images, is then used as input for super-resolution computation, and the computed images are re-encoded to obtain the new video. Among these stages, super-resolution computation takes the longest time and is the core of the workload.
Video super-resolution computation demands high computing power under tight time constraints, so it is necessary to exploit the full capability of all computing resources on a server and to raise the speed of single-frame super-resolution. Existing servers generally adopt a CPU + multi-GPU architecture. The computing capability of the CPU differs entirely from that of the GPUs, GPUs of different models differ in capability, and even GPUs of the same model can diverge in performance through ageing and wear. How to schedule the various computing resources on a server for video super-resolution computation, and thereby improve the computing efficiency for a single video frame, is the problem to be solved.
The currently common parallel method for video super-resolution is frame-based: each video frame is treated as a task and multiple tasks are computed in parallel, but when the number of frames is small, parallel efficiency is severely limited. When computing a single frame on a pure-CPU architecture, this method accelerates the computing operators by creating threads on multiple CPU cores; on a CPU + GPU architecture, the computing operators are accelerated by a single GPU, so the parallel granularity, and hence the achievable speed-up, is limited. When a single server has multi-socket CPUs and multiple GPUs, such methods cannot bring all CPU and GPU resources to bear on a single video frame at the same time.
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Disclosure of Invention
In view of the technical problems in the related art, the invention provides a video super-resolution parallel method based on image segmentation, comprising the following steps:
S1, obtain the average value Pgpu_ave of the super-resolution computing capabilities of all GPUs on the server;
S2, from the computing capabilities of the CPU on the server, obtain the CPU computing capability Pcpu_chs closest to the GPU average Pgpu_ave, and obtain the CPU thread number thum_cpu corresponding to Pcpu_chs;
S3, generate the partition weight array Q[] of the server, whose length equals part_all = part_cpu + n_gpu, where part_all is the total number of cut parts; Q[0] to Q[n_gpu-1] = Pgpu[0] to Pgpu[n_gpu-1], and Q[n_gpu] to Q[part_all-1] = Pcpu_chs; here n_gpu is the number of GPUs on the server and part_cpu = core_num / thum_cpu, where core_num is the total number of CPU cores of the server and part_cpu is the number of cut parts of the image that the CPU will process;
S4, divide the video frame pixels according to the weight array Q[] and its total computing-capability weight Qsum, to obtain the cut image blocks;
S5, distribute the cut video blocks to the CPU parts and the GPU parts respectively through multithreading, and perform super-resolution computation on the blocks in parallel;
S6, synthesize the results of the super-resolution computation to obtain the super-resolution reconstructed image.
Specifically, before step S1, the method includes:
Step 1, take a preset number of frames of video as the test sample;
Step 2, take the common divisors of the number N of CPU cores and set up a list of them, containing L common divisors; initialize a divisor array thum of length L whose elements store the common divisors; the server contains N CPU cores and M GPUs;
Step 3, initialize a CPU computing capability array Pcpu of length L whose elements store the super-resolution computing capability of the CPU under the different thread counts;
Step 4, initialize i = 0, where i is the index into the divisor array;
Step 5, set the number of CPU threads participating in the computation to thum[i];
Step 6, perform super-resolution computation of the preset frames of video on thum[i] CPU threads, and record the computation time as T1;
Step 7, compute the video super-resolution computing capability of the CPU, Pcpu[i] = 1/T1;
Step 8, i = i + 1;
Step 9, judge whether i is smaller than L; if so, return to step 5; if not, execute step 10;
Step 10, initialize a GPU computing capability array Pgpu of length n_gpu whose elements store the super-resolution computing capabilities of the different GPUs;
Step 11, initialize i = 0, where i is the index of the GPU card;
Step 12, perform super-resolution computation of the preset frames of video on GPU i, and record the computation time as T2;
Step 13, compute the video super-resolution computing capability of the GPU, Pgpu[i] = 1/T2;
Step 14, i = i + 1;
Step 15, judge whether i is smaller than n_gpu; if so, return to step 12; if not, end.
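The capability-measurement loop above can be sketched as follows. The stand-in workloads (`time.sleep`) are hypothetical placeholders for the real CPU and GPU super-resolution kernels, which the patent does not specify; only the timing structure of steps 1 to 15 is reproduced:

```python
import time

def divisors_desc(n):
    """Common divisors of n, largest first, e.g. 64 -> [64, 32, 16, 8, 4, 2, 1]."""
    return [d for d in range(n, 0, -1) if n % d == 0]

def benchmark(run_superres, configs):
    """Run the test clip once per configuration and record capability P = 1/T."""
    caps = []
    for cfg in configs:
        t0 = time.perf_counter()
        run_superres(cfg)                    # super-resolve the preset test frames
        caps.append(1.0 / (time.perf_counter() - t0))
    return caps

thum = divisors_desc(64)                     # candidate CPU thread counts (steps 2-9)
Pcpu = benchmark(lambda n_threads: time.sleep(0.001), thum)
Pgpu = benchmark(lambda gpu_id: time.sleep(0.001), [0, 1])   # one entry per GPU card
```

With the real kernels substituted in, `Pcpu` holds one capability per candidate thread count and `Pgpu` one capability per GPU card, exactly the two arrays consumed by step S2.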
Specifically, step S2 includes:
Step 21, take the average of the super-resolution computing capabilities of all GPUs on the server, Pgpu_ave = average(Pgpu[]);
Step 22, select the value Pcpu_chs in the Pcpu[] array closest to Pgpu_ave, i.e. the element minimizing |Pcpu[i] - Pgpu_ave|, and find the index of Pcpu[] corresponding to Pcpu_chs;
Step 23, take thum_cpu = thum[index];
Step 24, take part_cpu = core_num / thum_cpu, where core_num is the total number of CPU cores of the server, part_cpu is the number of cut parts of the image that the CPU will process, and thum_cpu is the number of threads participating in the computation for each part.
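Steps 21 to 24 can be sketched as below; the capability numbers are illustrative assumptions, not measured values:

```python
def choose_cpu_config(Pcpu, thum, Pgpu, core_num):
    """Steps 21-24: pick the CPU capability closest to the mean GPU
    capability, and derive the per-part thread count and part count."""
    Pgpu_ave = sum(Pgpu) / len(Pgpu)
    index = min(range(len(Pcpu)), key=lambda i: abs(Pcpu[i] - Pgpu_ave))
    Pcpu_chs = Pcpu[index]
    thum_cpu = thum[index]              # threads per CPU part
    part_cpu = core_num // thum_cpu     # image parts the CPU will handle
    return Pcpu_chs, thum_cpu, part_cpu

# Illustrative capabilities for a 64-core server with two GPUs:
Pcpu_chs, thum_cpu, part_cpu = choose_cpu_config(
    Pcpu=[8.0, 4.5, 2.4, 1.3, 0.7, 0.4, 0.2],
    thum=[64, 32, 16, 8, 4, 2, 1],
    Pgpu=[5.0, 4.0],
    core_num=64)
# Pgpu_ave = 4.5, so Pcpu_chs = 4.5, thum_cpu = 32 and part_cpu = 2
```

The choice makes each CPU part roughly as fast as an average GPU, which is what allows the later weight-proportional cut to balance the load.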
Specifically, step S4 includes:
Step 41, obtain the video frame requiring super-resolution, with pixel length length and pixel height width;
Step 42, take Dim equal to the larger of length and width, i.e. Dim = max(length, width);
Step 43, cut along Dim; initialize a pixel start array Dstart and a pixel end array Dend, each of length part_all;
Step 44, initialize i = 0, where i is the part index;
Step 45, if i = 0, Dstart[i] = 1;
otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1] / Qsum;
Step 46, i = i + 1;
Step 47, judge whether i is smaller than part_all; if so, return to step 45; if not, jump to step 48;
Step 48, initialize i = 0 again;
Step 49, if i = part_all - 1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1;
Step 50, i = i + 1;
Step 51, judge whether i is smaller than part_all; if so, return to step 49; if not, end.
Specifically, when a buffer area is used, step 45 becomes: if i = 0, Dstart[i] = 1; otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1] / Qsum - buff;
and step 49 becomes: if i = part_all - 1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1 + buff; where buff is the pixel width of the buffer area.
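The cutting rules above, including the buffered variant, can be sketched as one routine; rounding the weighted span to whole pixels is an assumption the patent leaves implicit:

```python
def partition(dim, Q, buff=0):
    """Split the 1-based pixel range [1, dim] into len(Q) weighted parts
    (steps 43-51); buff > 0 makes neighbouring parts overlap."""
    Qsum = sum(Q)
    n = len(Q)
    Dstart, Dend = [0] * n, [0] * n
    for i in range(n):
        # rounding to whole pixels is an assumption; the patent leaves it implicit
        Dstart[i] = 1 if i == 0 else Dstart[i - 1] + round(dim * Q[i - 1] / Qsum) - buff
    for i in range(n):
        Dend[i] = dim if i == n - 1 else Dstart[i + 1] - 1 + buff
    return Dstart, Dend

# Weights from the worked example (2 GPUs + 2 CPU parts), a 1920-pixel side:
Dstart, Dend = partition(1920, [5.0, 4.0, 4.5, 4.5])
# -> Dstart = [1, 534, 961, 1441], Dend = [533, 960, 1440, 1920] (contiguous)
Db, De = partition(1920, [5.0, 4.0, 4.5, 4.5], buff=10)
# -> each part now extends buff pixels into its neighbour
```

With buff = 0 the parts tile the range exactly; with buff > 0 every cut gains an overlap that the synthesis step later averages away.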
Specifically, step S5 is:
Step 52, start the multithreaded function;
Step 53, according to the segmented image, send the GPU parts to the video memory;
Step 54, the different CPU thread groups acquire their cut images in main memory;
Step 55, the CPU thread groups and the GPUs perform super-resolution computation on their respective images in parallel.
specifically, the step S6 is: and synthesizing the results obtained by the super-resolution calculation, and averaging the overlapped pixels in a buffer area to obtain the super-resolution reconstructed image.
Specifically, step S6 includes:
Step 61, merge the computed parts of the GPUs and the CPU;
Step 62, in the buffer areas, average the pixel values contributed by the parts on both sides;
Step 63, complete the synthesis of the picture.
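Steps 61 to 63 can be sketched as below for a single-channel image cut along its rows. `merge_parts` and its interface are illustrative; the patent's implementation merges along whichever dimension was cut, at the super-resolution scale factor (4× in the worked example):

```python
import numpy as np

def merge_parts(parts, Dstart, Dend, scale=1):
    """Blend super-resolved blocks into one image; pixels covered by more
    than one block (the buffer areas) are averaged, as in steps 61-63."""
    h = Dend[-1] * scale                  # total rows of the output image
    w = parts[0].shape[1]
    out = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for block, s, e in zip(parts, Dstart, Dend):
        rows = slice((s - 1) * scale, e * scale)   # 1-based [s, e] -> 0-based rows
        out[rows] += block
        cnt[rows] += 1
    return out / cnt                               # average wherever blocks overlap

# Tiny illustration: two blocks covering rows 1-3 and 2-4 of a 4-row strip;
# rows 2-3 are the shared buffer and come out as the average (2 + 4) / 2 = 3.
merged = merge_parts([np.full((3, 2), 2.0), np.full((3, 2), 4.0)],
                     Dstart=[1, 2], Dend=[3, 4])
```

The per-pixel count array generalizes the pairwise average of the patent's figure to any number of overlapping parts.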
In a second aspect, another embodiment of the present invention discloses a server comprising N CPUs and M GPUs, where N and M are positive integers greater than 1, the server being configured to implement the above video super-resolution parallel method based on image segmentation.
In a third aspect, another embodiment of the present invention discloses a non-volatile storage medium having stored thereon instructions that, when executed by a processor, implement the above video super-resolution parallel method based on image segmentation.
According to this method, the computing capabilities of the CPU and the GPUs in the server are quantified; the number of CPU threads participating in the computation is derived from the quantified average GPU computing capability; the corresponding partition weights are generated; and the video image is then segmented according to those weights for parallel super-resolution reconstruction. The video super-resolution parallel method based on image segmentation balances the computing capabilities of the CPU and the GPUs and does not cause load imbalance. In addition, it effectively accelerates the super-resolution of a single video frame in parallel, so the computing performance of the various resources on a single server can be fully exploited. Further, to prevent visible seams, a buffer area is required at each cut.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the video super-resolution parallel method based on image segmentation provided by an embodiment of the invention;
fig. 2 is a schematic diagram of single-frame video image cutting according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a video super-resolution parallel device based on image segmentation according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the invention, fall within the scope of protection of the invention.
Example 1
Referring to fig. 1, this embodiment discloses a video super-resolution parallel method based on image segmentation, comprising the following steps:
S1, obtain the average value Pgpu_ave of the super-resolution computing capabilities of all GPUs on the server;
In this embodiment, before step S1, the super-resolution computing capabilities of all GPUs and CPUs on the server must be obtained.
Specifically, in this embodiment a benchmark test program is set up, and the super-resolution computing capabilities of the CPU and the GPUs in the server are evaluated with it and stored in the arrays Pcpu and Pgpu; the server of this embodiment contains N CPU cores and M GPUs in total;
Step 1, take a preset number of frames of video as the test sample; in this embodiment the preset number is 10 frames;
Step 2, take the common divisors of the number N of CPU cores and set up a list of them, containing L common divisors; initialize a divisor array thum of length L whose elements store the common divisors;
For example, if the CPU has 64 cores, then L = 7, the common divisors are 64, 32, 16, 8, 4, 2, 1, and thum[0] = 64.
Step 3, initialize a CPU computing capability array Pcpu of length L whose elements store the super-resolution computing capability of the CPU under the different thread counts;
Step 4, initialize i = 0, where i is the index into the divisor array;
Step 5, set the number of CPU threads participating in the computation to thum[i];
Step 6, perform super-resolution computation of the 10 frames of video on thum[i] CPU threads, and record the computation time as T1;
Step 7, compute the video super-resolution computing capability of the CPU, Pcpu[i] = 1/T1;
Step 8, i = i + 1;
Step 9, judge whether i is smaller than L; if so, return to step 5; if not, execute step 10;
Step 10, initialize a GPU computing capability array Pgpu of length n_gpu whose elements store the super-resolution computing capabilities of the different GPUs (one server may contain GPU cards of different models, and even identical cards may have degraded in performance);
Step 11, initialize i = 0, where i is the index of the GPU card;
Step 12, perform super-resolution computation of the 10 frames of video on GPU i, and record the computation time as T2;
Step 13, compute the video super-resolution computing capability of the GPU, Pgpu[i] = 1/T2;
Step 14, i = i + 1;
Step 15, judge whether i is smaller than n_gpu; if so, return to step 12; if not, end;
S2, from the computing capabilities of the CPU on the server, obtain the CPU computing capability Pcpu_chs closest to the GPU average Pgpu_ave, and obtain the CPU thread number thum_cpu corresponding to Pcpu_chs;
Step S2 specifically comprises:
Step 21, take the average of the super-resolution computing capabilities of all GPUs on the server, Pgpu_ave = average(Pgpu[]);
Step 22, select the value Pcpu_chs in the Pcpu[] array closest to Pgpu_ave, i.e. the element minimizing |Pcpu[i] - Pgpu_ave|, and find the index of Pcpu[] corresponding to Pcpu_chs;
Step 23, take thum_cpu = thum[index];
Step 24, take part_cpu = core_num / thum_cpu, where core_num is the total number of CPU cores of the server, part_cpu is the number of cut parts of the image that the CPU will process, and thum_cpu is the number of threads participating in the computation for each part;
S3, generate the partition weight array Q[] of the server, whose length equals part_all = part_cpu + n_gpu, where part_all is the total number of cut parts; Q[0] to Q[n_gpu-1] = Pgpu[0] to Pgpu[n_gpu-1], and Q[n_gpu] to Q[part_all-1] = Pcpu_chs; here n_gpu is the number of GPUs on the server and part_cpu = core_num / thum_cpu, where core_num is the total number of CPU cores of the server and part_cpu is the number of cut parts of the image that the CPU will process;
For example, if the server has 64 CPU cores and 2 GPU cards, taking part_cpu = 2 and thum_cpu = 32, the partition weight array Q[] has four entries, assigned Pgpu[0], Pgpu[1], Pcpu_chs and Pcpu_chs respectively.
The total computing-capability weight Qsum equals the sum of the weight array Q[]: Qsum = sum(Q[]);
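The construction of Q[] and Qsum for this example can be sketched as follows; the capability values are illustrative assumptions:

```python
def build_weights(Pgpu, Pcpu_chs, part_cpu):
    """Partition weight array Q (step S3): one entry per GPU followed by
    part_cpu CPU entries, each weighted Pcpu_chs; Qsum is the total weight."""
    Q = list(Pgpu) + [Pcpu_chs] * part_cpu
    return Q, sum(Q)

# 2 GPU cards and part_cpu = 2 CPU parts, as in the example above:
Q, Qsum = build_weights(Pgpu=[5.0, 4.0], Pcpu_chs=4.5, part_cpu=2)
# Q = [5.0, 4.0, 4.5, 4.5], Qsum = 18.0
```

Each part then receives a share of the frame proportional to Q[i] / Qsum in step S4.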
S4, divide the video frame pixels according to the weight array Q[] and its total computing-capability weight Qsum, to obtain the cut image blocks;
Specifically, the steps of this embodiment include:
Step 41, obtain the video frame requiring super-resolution, with pixel length length and pixel height width;
Step 42, take Dim equal to the larger of length and width, i.e. Dim = max(length, width);
Step 43, cut along Dim; initialize a pixel start array Dstart and a pixel end array Dend, each of length part_all;
Step 44, initialize i = 0, where i is the part index;
Step 45, if i = 0, Dstart[i] = 1;
otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1] / Qsum;
Step 46, i = i + 1;
Step 47, judge whether i is smaller than part_all; if so, return to step 45; if not, jump to step 48;
Step 48, initialize i = 0 again;
Step 49, if i = part_all - 1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1;
Step 50, i = i + 1;
Step 51, judge whether i is smaller than part_all; if so, return to step 49; if not, end.
For example, assuming length > width, the cut is made along length; the server has 64 CPU cores and 2 GPU cards, with part_cpu = 2 and thum_cpu = 32. As shown in fig. 2, the middle part between cpu_0 and cpu_1 in the figure is a buffer area of length 2 × buff = 20;
In order to prevent visible seams, this embodiment provides a buffer area at each cut;
Specifically, step 45 then becomes: if i = 0, Dstart[i] = 1; otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1] / Qsum - buff;
and step 49 becomes: if i = part_all - 1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1 + buff;
S5, distribute the cut video blocks to the CPU parts and the GPU parts respectively through multithreading, and perform super-resolution computation on the blocks in parallel;
Step 52, start the multithreaded function;
Step 53, according to the segmented image, send the GPU parts to the video memory;
Step 54, the different CPU thread groups acquire their cut images in main memory;
Step 55, the CPU thread groups and the GPUs perform super-resolution computation on their respective images in parallel;
S6, synthesize the results of the super-resolution computation to obtain the super-resolution reconstructed image;
Specifically, after the buffer area buff is set, step S6 of this embodiment is: synthesize the results of the super-resolution computation and average the overlapping pixels in the buffer areas to obtain the super-resolution reconstructed image;
Step 61, merge the computed parts of the GPUs and the CPU;
Step 62, in the buffer areas, average the pixel values contributed by the parts on both sides;
For example, as shown in fig. 2, there is a buffer overlap region of 20 × width pixels between gpu_1 and cpu_0; after 4× super-resolution it becomes 80 × (4 × width). Denote the super-resolved pixel values of gpu_1 in this region buff_gpu1 and those of cpu_0 buff_cpu0; each pixel value in the 80 × (4 × width) region is taken as (buff_gpu1 + buff_cpu0) / 2.
Step 63, complete the synthesis of the picture;
Step 64, end;
According to this embodiment, the computing capabilities of the CPU and the GPUs in the server are quantified; the number of CPU threads participating in the computation is derived from the quantified average GPU computing capability; the corresponding partition weights are generated; and the video image is then segmented according to those weights for parallel super-resolution reconstruction. The video super-resolution parallel method based on image segmentation balances the computing capabilities of the CPU and the GPUs and does not cause load imbalance. In addition, it effectively accelerates the super-resolution of a single video frame in parallel, so the computing performance of the various resources on a single server can be fully exploited. Further, to prevent visible seams, a buffer area is required at each cut.
Example two
This embodiment discloses a video super-resolution parallel method based on image segmentation, comprising the following steps:
S1, set up a benchmark test program and evaluate the super-resolution computing capabilities of the CPU and the GPUs of the server according to the benchmark test program; store them in the arrays Pcpu and Pgpu;
Step 1, take 10 frames of video as the test sample;
Step 2, the selected server contains N CPU cores and M GPUs in total;
Step 3, take the common divisors of the number N of CPU cores and set up a list of them, containing L common divisors; initialize a divisor array thum of length L whose elements store the common divisors;
For example, if the CPU has 64 cores, then L = 7, the common divisors are 64, 32, 16, 8, 4, 2, 1, and thum[0] = 64.
Step 4, initialize a CPU computing capability array Pcpu of length L whose elements store the super-resolution computing capability of the CPU under the different thread counts;
Step 5, initialize i = 0, where i is the index into the divisor array;
Step 6, set the number of CPU threads participating in the computation to thum[i];
Step 7, perform super-resolution computation of the 10 frames of video on thum[i] CPU threads, and record the computation time as T1;
Step 8, compute the video super-resolution computing capability of the CPU, Pcpu[i] = 1/T1;
Step 9, i = i + 1;
Step 10, judge whether i is smaller than L; if so, return to step 6; if not, execute step 11;
Step 11, initialize a GPU computing capability array Pgpu of length n_gpu whose elements store the super-resolution computing capabilities of the different GPUs (one server may contain GPU cards of different models, and even identical cards may have degraded in performance);
Step 12, initialize i = 0, where i is the index of the GPU card;
Step 13, perform super-resolution computation of the 10 frames of video on GPU i, and record the computation time as T2;
Step 14, compute the video super-resolution computing capability of the GPU, Pgpu[i] = 1/T2;
Step 15, i = i + 1;
Step 16, judge whether i is smaller than n_gpu; if so, return to step 13; if not, end;
S2, cut the video image according to the computing capabilities of the CPU and the GPUs on the server, setting the cutting weights; considering that the image should not be cut too small, the sizes of the cut images should be kept as similar as possible;
Step 1, take the average of the super-resolution computing capabilities of all GPUs on the server, Pgpu_ave = average(Pgpu[]);
Step 2, select the value Pcpu_chs in the Pcpu[] array closest to Pgpu_ave, i.e. the element minimizing |Pcpu[i] - Pgpu_ave|, and find the index of Pcpu[] corresponding to Pcpu_chs;
Step 3, take thum_cpu = thum[index];
Step 4, take part_cpu = core_num / thum_cpu, where core_num is the total number of CPU cores of the server, part_cpu is the number of cut parts of the image that the CPU will process, and thum_cpu is the number of threads participating in the computation for each part;
Step 5, generate the new partition weight array Q[] of the current server, whose length equals part_all = part_cpu + n_gpu, where part_all is the total number of cut parts; Q[0] to Q[n_gpu-1] = Pgpu[0] to Pgpu[n_gpu-1], and Q[n_gpu] to Q[part_all-1] = Pcpu_chs;
For example, if the server has 64 CPU cores and 2 GPU cards, taking part_cpu = 2 and thum_cpu = 32, the partition weight array Q[] has four entries, assigned Pgpu[0], Pgpu[1], Pcpu_chs and Pcpu_chs respectively.
Step 6, take the total computing-capability weight Qsum equal to the sum of the weight array Q[]: Qsum = sum(Q[]);
S3, cut according to the weight values, cutting along the larger of the video pixel length and the pixel height width; to prevent visible seams, a buffer area must be provided at each cut;
Step 1, obtain the video frame requiring super-resolution, with pixel length length and pixel height width;
Step 2, take Dim equal to the larger of length and width, i.e. Dim = max(length, width);
Step 3, cut along Dim; initialize a pixel start array Dstart and a pixel end array Dend, each of length part_all;
Step 4, set a buffer area with pixel width buff; here take buff = 10;
Step 5, initialize i = 0, where i is the part index;
Step 6, if i = 0, Dstart[i] = 1;
otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1] / Qsum - buff;
Step 7, i = i + 1;
Step 8, judge whether i is smaller than part_all; if so, return to step 6; if not, jump to step 9;
Step 9, initialize i = 0 again;
Step 10, if i = part_all - 1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1 + buff;
Step 11, i = i + 1;
Step 12, judge whether i is smaller than part_all; if so, return to step 10; if not, end;
For example, assuming length > width, the cut is made along length; the server has 64 CPU cores and 2 GPU cards, with part_cpu = 2 and thum_cpu = 32. As shown in fig. 2, the middle part between cpu_0 and cpu_1 in the figure is a buffer area of length 2 × buff = 20;
s4, respectively distributing tasks to a CPU part and a GPU part through multithreading according to the cut video; performing super-resolution calculation on the cut pictures in parallel;
step 1, starting a multithreading function;
step 2, sending the GPU parts of the segmented image to video memory;
step 3, different CPU thread groups acquire cut images in the memory;
step 4, performing super-division calculation on the respective images by the CPU thread group and the GPU in parallel;
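S4 might be dispatched with a thread pool as below; `super_divide_gpu` and `super_divide_cpu` are placeholder stubs standing in for the real super-resolution kernels, and are assumptions for illustration only:

```python
from concurrent.futures import ThreadPoolExecutor

def super_divide_gpu(block, device):   # placeholder: real GPU kernel goes here
    return ("gpu", device, block)

def super_divide_cpu(block):           # placeholder: real CPU thread-group kernel
    return ("cpu", block)

def dispatch(blocks, n_gpu):
    """Send blocks[0:n_gpu] to the GPUs and the rest to CPU thread groups,
    running all super-division computations in parallel (steps 1-4)."""
    def run(i, block):
        if i < n_gpu:
            return super_divide_gpu(block, device=i)   # part copied to GPU i's video memory
        return super_divide_cpu(block)                 # CPU group reads block from host memory
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        futures = [pool.submit(run, i, b) for i, b in enumerate(blocks)]
        return [f.result() for f in futures]           # results kept in cut order
```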
s5, synthesizing results obtained by super-resolution calculation, and averaging overlapped pixels in a buffer area;
step 1, synthesizing the calculated parts of the GPU and the CPU;
step 2, over the buffer area, averaging the pixel values shared by the left and right parts;
for example, as shown in fig. 2, there is a buffer overlap region of 20 × width between gpu_1 and cpu_0; after 4× super-division this region becomes 80 × (4 × width). Denoting the super-divided pixel values of gpu_1 as Buff_gpu1 and those of cpu_0 as Buff_cpu0, each pixel value in the 80 × (4 × width) region is obtained as (Buff_gpu1 + Buff_cpu0)/2.
step 3, completing the synthesis of the picture;
step 4, ending;
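The synthesis and buffer averaging of S5 can be sketched in a simplified one-value-per-row form; `synthesize` is a hypothetical helper, not the patented implementation, and real code would operate on full pixel rows:

```python
# Sketch of S5: accumulate each super-divided block over the rows it
# covers, then divide by the coverage count so that rows inside a buffer
# overlap (covered twice) come out as the average of the two parts,
# e.g. (Buff_gpu1 + Buff_cpu0) / 2.
def synthesize(blocks, starts, ends, out_len):
    acc = [0.0] * out_len
    cnt = [0] * out_len
    for block, s, e in zip(blocks, starts, ends):
        for r, value in zip(range(s - 1, e), block):   # rows s..e, 1-based
            acc[r] += value
            cnt[r] += 1
    return [a / c for a, c in zip(acc, cnt)]

# Rows 2-3 are shared by both blocks and come out averaged.
print(synthesize([[2, 2, 2], [4, 4, 4]], starts=[1, 2], ends=[3, 4], out_len=4))
# [2.0, 3.0, 3.0, 4.0]
```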
example III
The server comprises N CPUs and M GPUs, wherein N and M are positive integers greater than 1, and the server is used for realizing the video super-division parallel method based on image segmentation.
Example IV
Referring to fig. 3, fig. 3 is a schematic structural diagram of a video super-division parallel device based on image division according to the present embodiment. The video superdivision parallel device 20 based on image segmentation of this embodiment comprises a processor 21, a memory 22 and a computer program stored in said memory 22 and executable on said processor 21. The steps of the above-described method embodiments are implemented by the processor 21 when executing the computer program. Alternatively, the processor 21 may implement the functions of the modules/units in the above-described device embodiments when executing the computer program.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 22 and executed by the processor 21 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing a specific function for describing the execution of the computer program in the image segmentation based video superparallel device 20.
The image segmentation based video superdivision parallel device 20 may include, but is not limited to, a processor 21, a memory 22. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the image segmentation based video superparallel device 20, and does not constitute a limitation of the image segmentation based video superparallel device 20, and may include more or less components than illustrated, or combine certain components, or different components, e.g., the image segmentation based video superparallel device 20 may further include an input-output device, a network access device, a bus, etc.
The processor 21 may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or any conventional processor; the processor 21 is the control center of the image segmentation based video superdivision parallel device 20 and connects the various parts of the entire device using various interfaces and lines.
The memory 22 may be used to store the computer program and/or modules, and the processor 21 implements the various functions of the image segmentation based video superdivision parallel device 20 by running or executing the computer program and/or modules stored in the memory 22 and invoking data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function); the data storage area may store data created according to use of the device (such as audio data or a phonebook). In addition, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
Wherein, if the modules/units integrated in the image segmentation based video superdivision parallel device 20 are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment through a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by the processor 21, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be adjusted according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that the above-described apparatus embodiments are merely illustrative, and the units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the invention, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. A video super-division parallel method based on image segmentation, characterized in that the method comprises the following steps:
s1, obtaining an average value Pgpu_ave of the super-division computing capacities of all the GPUs on a server; the server pre-stores the super-division computing capacities of the CPU and the GPUs, evaluated according to a reference test program and stored respectively in the Pcpu[] and Pgpu[] arrays;
s2, obtaining the CPU computing capacity Pcpu_chs closest to the average value Pgpu_ave of the GPU super-division computing capacities from among the CPU computing capacities on the server, and obtaining the CPU thread number thum_cpu corresponding to Pcpu_chs; obtaining the CPU thread number thum_cpu corresponding to Pcpu_chs specifically includes: finding the Pcpu[] index corresponding to Pcpu_chs, and obtaining the CPU thread number thum_cpu from the divisor array thum according to the index; the divisor array thum is an array of the common divisors of the N CPU cores, set up as a list;
s3, generating a partition weight array Q[] of the server, with length part_all = part_cpu + n_gpu, where part_all represents the total number of cut parts; Q[0] to Q[n_gpu-1] = Pgpu[0] to Pgpu[n_gpu-1], and Q[n_gpu] to Q[part_all-1] = Pcpu_chs; wherein n_gpu represents the number of GPUs on the server; part_cpu = core_num/thum_cpu; where core_num represents the total number of CPU cores of the server, and part_cpu represents the number of parts of the image cut that the CPU will handle;
s4, dividing the video pixels according to the weight array Q[] and the total computing-power weight Qsum of the weight array Q[] to obtain divided image blocks;
s5, distributing the cut video blocks to the CPU part and the GPU part respectively through multithreading; performing super-resolution calculation on the cut video blocks in parallel;
and S6, synthesizing the result obtained by the super-resolution calculation to obtain a super-resolution reconstructed image.
2. The method according to claim 1, characterized in that: before the step S1, the method includes:
step 1, taking a preset number of frames of video as a video test sample;
step 2, taking the common divisors of the N CPU cores and setting them up as a list containing L common divisors; initializing a divisor array thum of length L, each element storing one common divisor; the server comprises N CPU cores and M GPUs;
step 3, initializing a CPU computing capacity array Pcpu of length L, each element storing the super-division computing capacity of the CPU under a different thread count;
step 4, initializing i=0, wherein i represents a common divisor index number;
step 5, setting the number of CPU threads participating in the calculation to thum[i];
step 6, performing super-division calculation on the preset frames of video using thum[i] CPU threads; recording the calculation time as T1;
step 7, calculating video super-resolution calculation capacity Pcpu [ i ] of the CPU core, wherein Pcpu [ i ] =1/T1;
step 8, i = i+1;
step 9, judging whether i is smaller than L, if yes, returning to step 5, and if not, executing step 10;
step 10, initializing a GPU computing capability array Pgpu, wherein the length is n_gpu, and the elements store the super-division computing capabilities of different GPUs;
step 11, initializing i=0, wherein i represents the number of the GPU card;
step 12, performing super-division calculation on the preset frames of video on GPU i; recording the calculation time as T2;
step 13, calculating the video super-resolution computing capacity Pgpu[i] of the GPU, Pgpu[i] = 1/T2;
step 14, i=i+1;
step 15, judging whether i is smaller than n_gpu, if so, returning to step 12, and if not, ending.
3. The method according to claim 1, characterized in that: the step S2 specifically includes:
step 21, taking the average of all GPU super-division computing capacities on the server, Pgpu_ave = average(Pgpu[]);
step 22, selecting the value Pcpu_chs in the Pcpu[] array closest to Pgpu_ave, i.e. the Pcpu[] entry minimizing |Pcpu[] - Pgpu_ave|, and finding the Pcpu[] index corresponding to Pcpu_chs;
step 23, taking thum_cpu = thum[index];
step 24, taking part_cpu = core_num/thum_cpu; where core_num represents the total number of CPU cores of the server, part_cpu represents the number of parts of the image cut that the CPU will handle, and thum_cpu represents the number of threads each part uses in the calculation.
4. A method according to claim 3, characterized in that: the step S4 specifically includes:
step 41, obtaining a video frame requiring super-division, with image pixels length × width;
step 42, taking Dim equal to the larger of length and width, i.e. Dim = max(length, width);
step 43, cutting along Dim; initializing a pixel start array Dstart and a pixel end array Dend, each of length part_all;
step 44, initializing i=0, wherein i represents a part_all index number;
step 45, if i=0, dstart [ i ] =1;
otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1]/Qsum;
step 46, i=i+1;
step 47, judging whether i is smaller than part_all, if yes, returning to step 45, and if not, jumping to step 48;
step 48, initializing i=0, wherein i represents a part_all index number;
step 49, if i = part_all-1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1;
step 50, i=i+1;
step 51, judging whether i is smaller than part_all, if yes, returning to step 50, if no, ending.
5. The method according to claim 4, wherein: step 45 is: if i=0, Dstart[i] = 1; otherwise, Dstart[i] = Dstart[i-1] + Dim × Q[i-1]/Qsum - buff;
and step 49 is: if i = part_all-1, Dend[i] = Dim; otherwise, Dend[i] = Dstart[i+1] - 1 + buff; where buff is the buffer width.
6. The method according to any one of claims 4-5, wherein: the step S5 is as follows:
step 52, starting a multithreading function;
step 53, sending the GPU parts of the segmented image to video memory;
step 54, different CPU thread groups acquire cut images in the memory;
in step 55, the CPU thread groups and the GPUs perform super-division calculation on their respective images in parallel.
7. The method according to claim 5, wherein: the step S6 is as follows: and synthesizing the results obtained by the super-resolution calculation, and averaging the overlapped pixels in a buffer area to obtain the super-resolution reconstructed image.
8. The method according to claim 7, wherein: the step S6 is as follows:
step 61, combining the calculated parts of the GPU and the CPU;
step 62, over the buffer area, averaging the pixel values shared by the left and right parts;
step 63, completing the synthesis of the picture.
9. A server comprising N CPUs and M GPUs, wherein N, M are positive integers greater than 1, characterized in that: the server is used for realizing the video super-division parallel method based on image segmentation as claimed in any one of claims 1-8.
10. A non-volatile storage medium having instructions stored thereon, characterized by: the instructions, when executed by a processor, for implementing a video hyper-segmentation parallelism method based on image segmentation according to one of the claims 1-8.
CN202310738868.0A 2023-06-21 2023-06-21 Video super-division parallel method, server and medium based on image segmentation Active CN116483587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310738868.0A CN116483587B (en) 2023-06-21 2023-06-21 Video super-division parallel method, server and medium based on image segmentation


Publications (2)

Publication Number Publication Date
CN116483587A CN116483587A (en) 2023-07-25
CN116483587B true CN116483587B (en) 2023-09-08

Family

ID=87221811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310738868.0A Active CN116483587B (en) 2023-06-21 2023-06-21 Video super-division parallel method, server and medium based on image segmentation

Country Status (1)

Country Link
CN (1) CN116483587B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117687802B (en) * 2024-02-02 2024-04-30 湖南马栏山视频先进技术研究院有限公司 Deep learning parallel scheduling method and device based on cloud platform and cloud platform

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102497547A (en) * 2011-11-30 2012-06-13 国云科技股份有限公司 Processing method for cloud terminal video data
CN103617626A (en) * 2013-12-16 2014-03-05 武汉狮图空间信息技术有限公司 Central processing unit (CPU) and ground power unit (GPU)-based remote-sensing image multi-scale heterogeneous parallel segmentation method
CN107945098A (en) * 2017-11-24 2018-04-20 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN112988395A (en) * 2021-04-20 2021-06-18 宁波兰茜生物科技有限公司 Pathological analysis method and device of extensible heterogeneous edge computing framework
CN114398167A (en) * 2021-12-03 2022-04-26 武汉大学 Automatic load balancing method for CPU-GPU two-stage parallel computation
WO2022111631A1 (en) * 2020-11-30 2022-06-02 华为技术有限公司 Video transmission method, server, terminal, and video transmission system
CN116149841A (en) * 2022-11-22 2023-05-23 上海热璞网络科技有限公司 Processor resource dynamic superdivision method based on cloud database instance load

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2192780A1 (en) * 2008-11-28 2010-06-02 Thomson Licensing Method for video decoding supported by Graphics Processing Unit
US10489887B2 (en) * 2017-04-10 2019-11-26 Samsung Electronics Co., Ltd. System and method for deep learning image super resolution
CN109903221B (en) * 2018-04-04 2023-08-22 华为技术有限公司 Image super-division method and device
CN111489279B (en) * 2019-01-25 2023-10-31 深圳富联富桂精密工业有限公司 GPU acceleration optimization method and device and computer storage medium
US20220138596A1 (en) * 2020-11-02 2022-05-05 Adobe Inc. Increasing efficiency of inferencing digital videos utilizing machine-learning models


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Two-level granularity parallel strategy of CPU multithreading and GPU for the PMVS algorithm; Liu Jinshuo; Jiang Zhuangyi; Xu Yabo; Deng Juan; Zhang Lanxin; Computer Science, No. 02, pp. 296-301 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant