WO2009090964A1 - Synchronization control method and information processing apparatus - Google Patents
- Publication number
- WO2009090964A1 (PCT/JP2009/050397)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- thread
- current thread
- synchronization
- current
- waiting time
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
Definitions
- the present invention relates to synchronous control when a plurality of threads are processed in parallel.
- as a technique for executing a process corresponding to one application in an information processing apparatus, there is a technique called multithreading, in which a plurality of threads (unit processes) of the process are executed in parallel.
- the efficiency of the entire process can be improved by parallel processing of threads.
- however, the processing time may become longer than expected because of synchronization between threads. An example is given below.
- at its synchronization point a, a thread A refers to the data of another thread B at the synchronization point b of thread B.
- suppose that thread A arrives at the synchronization point a first, that is, thread A completes its processing up to the synchronization point a before thread B completes its processing up to the synchronization point b.
- thread A then waits until thread B reaches the synchronization point b, and the more thread B's progress is delayed, the longer thread A waits. As a result, the processing time of the entire process is prolonged.
- an object of the present invention is to provide a synchronization control method and an information processing apparatus that efficiently execute threads in parallel in a multithreaded process.
- the synchronization control method executes in parallel a current thread and a reference thread whose data the current thread refers to; when the current thread reaches a first synchronization point, determines whether the reference thread has reached a second synchronization point; if it has not, obtains the time required for the reference thread to reach the second synchronization point as the waiting time of the current thread; estimates the quality difference between data generated by the current thread with reference to the processing data of the reference thread at the second synchronization point and data generated by the current thread without referring to that processing data; and determines, according to the waiting time and the magnitude of the quality difference, whether to make the current thread wait until the reference thread reaches the second synchronization point.
- the information processing apparatus includes: a synchronization control unit that executes in parallel a current thread and a reference thread whose data the current thread refers to; a waiting time calculation unit that, if the reference thread has not reached a second synchronization point when the current thread reaches a first synchronization point, obtains the time required for the reference thread to reach the second synchronization point as the waiting time of the current thread; a quality difference calculation unit that estimates the quality difference between data generated by the current thread with reference to the processing data of the reference thread at the second synchronization point and data generated by the current thread without referring to that processing data; and a synchronization determination unit that determines, according to the waiting time and the magnitude of the quality difference, whether to make the current thread wait until the reference thread reaches the second synchronization point.
- FIG. 1 shows an information processing apparatus 400 according to the first embodiment of the present invention.
- the information processing apparatus 400 includes a current thread 401, a reference thread 402, a synchronization control unit 403, a waiting time calculation unit 404, a synchronization determination unit 405, and a quality difference calculation unit 406.
- the information processing apparatus 400 is a multithreading-capable computer that executes the current thread 401 and the reference thread 402 in parallel.
- the current thread 401 proceeds with its processing while referring to data handled by the reference thread 402, such as the processing results of the reference thread 402 and the data it is processing. A synchronization point A is set in the current thread 401, and a synchronization point B is set in the reference thread 402.
- the synchronization point A corresponds to the first synchronization point in the present invention, and the synchronization point B corresponds to the second synchronization point.
- if the reference thread 402 has not yet reached the synchronization point B when the current thread 401 reaches the synchronization point A, the waiting time calculation unit 404 calculates how long the current thread 401 would have to wait until the reference thread 402 reaches the synchronization point B. The waiting time can be obtained, for example, as the product of the number of blocks the reference thread 402 still has to process up to the synchronization point B and the processing time per block.
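As a rough illustration of the example estimate above, the waiting time is the number of blocks the reference thread still has to process before the synchronization point B multiplied by a per-block processing time. The Python sketch below assumes a uniform per-block time; the function name `estimate_waiting_time` is hypothetical and not taken from the patent.

```python
def estimate_waiting_time(remaining_blocks: int, time_per_block: float) -> float:
    """Estimated wait of the current thread (hypothetical helper).

    remaining_blocks: blocks the reference thread must still process
                      before reaching synchronization point B
    time_per_block:   processing time per block, assumed uniform
    """
    if remaining_blocks < 0 or time_per_block < 0:
        raise ValueError("inputs must be non-negative")
    return remaining_blocks * time_per_block


# Example: 3 blocks left at 2.5 ms each.
print(estimate_waiting_time(3, 2.5))  # 7.5
```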
- the quality difference calculation unit 406 estimates the difference in quality between the data that the current thread 401, having reached the synchronization point A, would generate by referring to the data of the reference thread 402 at the synchronization point B, and the data it would generate by referring to the current data of the reference thread 402.
- the quality difference depends on the progress status of the current thread 401, the control parameter of the block processed by each thread, the relationship between the processing blocks, and the like. For example, if the current thread 401 cannot obtain a good processing result without data at the synchronization point B of the reference thread 402, the quality difference becomes large. Conversely, if a good processing result can be obtained from the current thread 401 without the data, the quality difference is small.
- the synchronization determination unit 405 decides whether to make the current thread 401 wait for the reference thread 402, that is, whether to synchronize them, based on the waiting time calculated by the waiting time calculation unit 404 and the quality difference estimated by the quality difference calculation unit 406.
- the synchronization control unit 403 controls the progress of the current thread 401 in accordance with the determination by the synchronization determination unit 405. That is, when the synchronization determination unit 405 determines to perform synchronization, the current thread 401 is made to wait until the reference thread 402 reaches a predetermined synchronization point. If the synchronization determination unit 405 determines not to synchronize, the process is continued without causing the current thread 401 to wait.
- FIG. 2 schematically shows how the current thread 401 and the reference thread 402 are synchronized.
- the current thread 401 is made to wait until the reference thread 402 reaches the synchronization point B422.
- the time the reference thread 402 needs to reach the synchronization point B422 after the current thread 401 reaches the synchronization point A421 corresponds to the waiting time 431.
- when the reference thread 402 reaches the synchronization point B422, the current thread 401 resumes processing with reference to the data 441 at that point.
- this data 441 corresponds to the processing data at the second synchronization point in the present invention.
- in step S101, when the current thread 401 reaches the synchronization point A421, the synchronization control unit 403 checks whether the reference thread 402 has reached the synchronization point B422.
- if it has, the current thread 401 is controlled so as to continue processing with reference to the data 411 at the synchronization point B422 (step S107).
- if it has not, the waiting time calculation unit 404 calculates the time until the reference thread 402 reaches the synchronization point B422, that is, the waiting time 431 (step S103). As described above, the time from the synchronization point A421 to the synchronization point B422 corresponds to the waiting time 431.
- the quality difference calculation unit 406 then estimates the quality difference between the data that the current thread 401 would generate by referring to the data 441 of the reference thread 402 at the synchronization point B422 and the data it would generate by referring to the current data of the reference thread 402 (step S104).
- the synchronization determination unit 405 compares the quality difference estimated by the quality difference calculation unit 406 with a preset upper limit. If the quality difference exceeds the upper limit (step S105: Yes), it determines to synchronize, that is, to make the current thread 401 wait until the reference thread 402 reaches the synchronization point B422 (step S106). In other words, because the quality difference is relatively large, the current thread 401 must refer to the data 441 so that the quality of its processing after the synchronization point A421 does not deteriorate (determination I).
- upon receiving this determination, the synchronization control unit 403 suspends execution of the current thread 401 until the reference thread 402 reaches the synchronization point B422. When the data 441 is then obtained from the reference thread 402, the synchronization control unit 403 resumes the current thread 401 so that it continues processing with reference to the data 441 (step S107).
- if the estimated quality difference does not exceed the upper limit (step S105: No), the synchronization determination unit 405 compares the waiting time 431 calculated by the waiting time calculation unit 404 with a preset upper limit for the waiting time. If the waiting time 431 does not exceed this upper limit (step S108: No), it determines to make the current thread 401 wait (step S106). In other words, although the quality difference is small and referring to the data 441 is not essential for the current thread 401, the reference thread 402 will soon reach the synchronization point B422, so the current thread 401 is made to wait (determination II).
- if the waiting time exceeds the upper limit (step S108: Yes), the synchronization determination unit 405 determines not to make the current thread 401 wait (step S109). In other words, because referring to the data 441 is not essential for the current thread 401, the long waiting time 431 is skipped and processing continues (determination III). Upon receiving this determination, the synchronization control unit 403 lets the current thread 401 continue with reference to the current data of the reference thread 402 (step S110).
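The three-way decision described above (determinations I, II and III) can be sketched as a small pure function. This is a hypothetical Python rendering, not the patent's implementation; the names `decide`, `quality_limit` and `waiting_limit`, and the treatment of values exactly equal to a limit, are assumptions.

```python
from enum import Enum


class Decision(Enum):
    """Outcomes named after determinations I, II and III."""
    WAIT_FOR_QUALITY = "I"   # quality difference exceeds its limit
    WAIT_SHORT = "II"        # quality difference small, but the wait is short
    CONTINUE = "III"         # quality difference small and the wait is long


def decide(quality_diff: float, waiting_time: float,
           quality_limit: float, waiting_limit: float) -> Decision:
    """Decide whether the current thread waits for the reference thread."""
    # Step S105: a large quality difference always forces a wait.
    if quality_diff > quality_limit:
        return Decision.WAIT_FOR_QUALITY
    # Waiting-time comparison: a wait that does not exceed its limit is accepted
    # (a value equal to the limit counts as "short" here, by assumption).
    if waiting_time <= waiting_limit:
        return Decision.WAIT_SHORT
    # Otherwise skip the wait and use the reference thread's current data.
    return Decision.CONTINUE


if __name__ == "__main__":
    print(decide(0.8, 9.0, quality_limit=0.5, waiting_limit=4.0))  # Decision.WAIT_FOR_QUALITY
    print(decide(0.2, 1.0, quality_limit=0.5, waiting_limit=4.0))  # Decision.WAIT_SHORT
    print(decide(0.2, 9.0, quality_limit=0.5, waiting_limit=4.0))  # Decision.CONTINUE
```

Keeping the decision free of any thread handling makes it easy to test separately from the synchronization control unit that acts on it.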
- FIG. 4 shows a specific example of the above determinations I, II, and III.
- the illustrated graph shows the waiting time (431) and the degree of quality difference obtained at the four synchronization points 461, 462, 463, and 464 in the current thread 401.
- the standby time threshold value 451 and the quality difference threshold value 452 correspond to the above-described upper limit regarding the standby time and the quality difference.
- at a synchronization point where the quality difference exceeds the quality difference threshold value 452, the current thread 401 is made to wait based on determination I.
- at the synchronization point 463, the quality difference is small but the waiting time is short, so the current thread 401 is made to wait based on determination II. In these cases, after waiting, the current thread 401 can generate good-quality data by referring to the data 441 at the synchronization point B422.
- at a synchronization point where the quality difference is small but the waiting time is long,
- the current thread 401 continues without waiting according to determination III.
- the current thread 401 then proceeds with reference to the current data of the reference thread 402, but because the quality difference estimated in advance is small, the quality of the processing result does not deteriorate significantly.
- Such control can prevent processing delay of the current thread 401 due to synchronization with the reference thread 402.
- in the modified procedure of FIG. 5, steps S1051, S1061, and S1071 are the same as steps S105, S106, and S107 in FIG. 3. That is, if the quality difference exceeds the upper limit, the current thread 401 is made to wait.
- if the quality difference does not exceed the upper limit (step S1051: No), the synchronization determination unit 405 determines not to make the current thread 401 wait (step S1091), without considering the waiting time. As a result, the current thread 401 continues processing based on the current data of the reference thread 402 (step S1101).
- the synchronization point 463 shown in FIG. 4 is determined to make the current thread 401 wait in the control of FIG. 3 (determination II), but is determined not to wait in the control of FIG. 5 (determination II ′). Therefore, the procedure of FIG. 5 is suitable for an application that places importance on preventing delay of the current thread 401.
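To make the contrast between the two procedures concrete, here is a hypothetical Python sketch of both waiting policies side by side. The threshold and input values are invented for illustration; they only show that a point with a small quality difference and a short wait (the situation described for synchronization point 463) is treated differently by FIG. 3 and FIG. 5.

```python
def waits_fig3(quality_diff: float, waiting_time: float,
               quality_limit: float, waiting_limit: float) -> bool:
    """FIG. 3 policy: wait if the quality difference is large or the wait is short."""
    return quality_diff > quality_limit or waiting_time <= waiting_limit


def waits_fig5(quality_diff: float, quality_limit: float) -> bool:
    """FIG. 5 policy: the waiting time is not consulted at all."""
    return quality_diff > quality_limit


# Hypothetical point with a small quality difference and a short wait.
q, w = 0.2, 1.0
print(waits_fig3(q, w, quality_limit=0.5, waiting_limit=4.0))  # True
print(waits_fig5(q, quality_limit=0.5))                        # False
```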
- in the above description, when the current thread 401 does not wait for the reference thread 402,
- it refers to the current data of the reference thread 402.
- however, the present invention is not limited to this form; for example, the current thread 401 may refer to data already processed by the reference thread 402 or to data prepared in advance.
- a second embodiment of the present invention will be described. The configuration of the information processing apparatus 400 of this embodiment is basically the same as that of the first embodiment (FIG. 1).
- the difference from the above-described embodiment lies in the operation of the synchronization determination unit 405.
- the synchronization determination unit 405 of the present embodiment shortens or extends the standby time calculated by the standby time calculation unit 404 according to the quality difference.
- the adjusted value is applied to the waiting time of the current thread 401.
- the adjusted waiting time is referred to as a timeout value.
- in step S201, processing similar to steps S101 to S104 of the first embodiment is performed. That is, if the reference thread 402 has not yet reached the synchronization point B422, the waiting time and the quality difference are estimated.
- if the estimated quality difference exceeds the upper limit (step S202: Yes), the synchronization determination unit 405 adds a predetermined value $\alpha$ ($\alpha > 0$) to the waiting time $T$ and uses the result as the timeout value (step S203).
- the synchronization control unit 403 makes the current thread 401 wait until the timeout value $T + \alpha$ expires (step S204). Thereafter, as in step S107 of the first embodiment, the current thread 401 is resumed using the data 441 of the reference thread 402 (step S205).
- if the quality difference does not exceed the upper limit (step S202: No), the synchronization determination unit 405 subtracts a predetermined value $\beta$ ($\beta > 0$) from the waiting time $T$ and uses the result as the timeout value (step S206). If the timeout value $T - \beta$ is greater than zero (step S207: No), the current thread 401 is made to wait until the timeout expires (step S204).
- if the timeout value $T - \beta$ is zero or less (step S207: Yes), the synchronization determination unit 405 determines not to make the current thread 401 wait (step S208). In this case, the synchronization control unit 403 lets the current thread 401 continue without waiting, as in step S110 of the first embodiment (step S209).
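The timeout rule of the second embodiment reduces to a few lines. The Python sketch below assumes times in milliseconds and designer-chosen constants `alpha` and `beta`; the function names are hypothetical. A timeout of zero or less means the current thread does not wait at all.

```python
def timeout_value(waiting_time: float, quality_diff: float, quality_limit: float,
                  alpha: float, beta: float) -> float:
    """Adjust the estimated waiting time T according to the quality difference."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if quality_diff > quality_limit:
        return waiting_time + alpha   # large quality difference: extend to T + alpha
    return waiting_time - beta        # small quality difference: shorten to T - beta


def should_wait(timeout: float) -> bool:
    """Wait only if the adjusted timeout is still positive."""
    return timeout > 0


print(timeout_value(4.0, 0.9, 0.5, alpha=1.0, beta=1.0))               # 5.0
print(timeout_value(4.0, 0.1, 0.5, alpha=1.0, beta=1.0))               # 3.0
print(should_wait(timeout_value(2.0, 0.1, 0.5, alpha=1.0, beta=3.0)))  # False
```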
- the waiting time is extended when the quality difference is large, so that the current thread 401 can reliably capture the data 441 at the synchronization point B422.
- This is useful, for example, when the reference thread 402 is delayed for some reason.
- when the quality difference is small, on the other hand, the waiting time can be finely adjusted downward, which prevents the processing of the current thread 401 from being prolonged.
- the current thread 401 can perform processing by referring to the data processed by the reference thread 402 from the synchronization point A421 to the synchronization point B422.
- A third embodiment of the present invention will be described.
- the configuration of the information processing apparatus 400 of this embodiment is basically the same as that of the above-described embodiment (FIG. 1).
- This embodiment is different from the first embodiment in that the waiting time calculation unit 404 recalculates the waiting time while the current thread 401 is waiting.
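- while the current thread 401 is waiting after step S106 (FIG. 3) or step S1061 (FIG. 5), the following processing is repeated every predetermined period (step S301).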
- the standby time calculation unit 404 recalculates the standby time (step S302).
- the value obtained by this recalculation is the time the reference thread 402 still needs to reach the synchronization point B422 once a predetermined time has elapsed since the synchronization point A421. Therefore, if the reference thread 402 has progressed without delay since the synchronization point A421, the recalculated waiting time $T_R$ can be expressed by the following equation (1):
$$T_R = T_0 - T_P \qquad (1)$$
- where $T_0$ is the waiting time 431 calculated at the synchronization point A421,
- and $T_P$ is the elapsed time since the synchronization point A421.
- when the recalculated value $T_R$ has been obtained, the synchronization determination unit 405 checks whether the progress of the reference thread 402 has been delayed. This can be determined using equations (2) and (3).
- if the recalculated value $T_R$ is greater than zero (step S303: No) and satisfies equation (2) (step S304: No), the synchronization determination unit 405 determines that the reference thread 402 is progressing without delay. In this case, the current thread 401 keeps waiting, and the recalculation is repeated every predetermined period (steps S301 and S302). When the recalculated value $T_R$ reaches zero (step S303: Yes), the current thread 401 is resumed with reference to the data 441 at the synchronization point B422 (step S305). This processing is the same as step S107 (FIG. 3) in the first embodiment.
- if the recalculated value $T_R$ satisfies equation (3), the synchronization determination unit 405 determines that a delay has occurred in the reference thread 402 (step S304: Yes). In this case, waiting is stopped so that the resumption of the current thread 401 is not delayed (step S306), and the current thread 401 is resumed with reference to the current data of the reference thread 402 (step S307).
- the operation at the time of restart is the same as that in step S110 (FIG. 3) in the first embodiment described above.
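The monitoring loop of the third embodiment might look like the Python sketch below. Equations (2) and (3) are not reproduced in this text, so the delay test used here (treating the reference thread as delayed when the recalculated value exceeds the no-delay prediction, initial estimate minus elapsed time, by more than a tolerance) is an assumption, as are the function name, integer time units, and feeding pre-recorded recalculated values instead of measuring a live thread.

```python
from typing import Iterable, Tuple


def monitor_wait(t0: int, period: int, recalculated: Iterable[int],
                 tolerance: int = 0) -> Tuple[str, int]:
    """Re-check the waiting time every `period` while the current thread waits.

    t0:           waiting time estimated at synchronization point A (T_0)
    recalculated: recalculated waiting times T_R, one per period (hypothetical input)
    Returns the action taken and the elapsed time T_P at which it was taken.
    """
    elapsed = 0
    for t_r in recalculated:
        elapsed += period                  # one more period has passed (T_P)
        if t_r <= 0:
            # Reference thread reached synchronization point B: use its data there.
            return "resume_with_data_at_B", elapsed
        expected = t0 - elapsed            # remaining wait if there were no delay
        if t_r > expected + tolerance:     # ASSUMED stand-in for the delay test
            # Reference thread is delayed: stop waiting, use its current data.
            return "resume_with_current_data", elapsed
    return "still_waiting", elapsed


print(monitor_wait(10, 2, [8, 6, 4, 2, 0]))  # ('resume_with_data_at_B', 10)
print(monitor_wait(10, 2, [8, 7]))           # ('resume_with_current_data', 4)
```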
- A fourth embodiment of the present invention will be described.
- the configuration of the information processing apparatus 400 of this embodiment is basically the same as that of the above-described embodiment (FIG. 1).
- the present embodiment assumes that the information processing apparatus 400 executes a thread related to image processing.
- FIG. 8 shows an image frame processed by the information processing apparatus 400.
- An image to be processed includes n consecutive image frames (first frame 610-1 to n-th frame 610-n). Each frame 610-1 to 610-n is divided into m blocks 611-1 to 611-m. Each block (611-1 to 611-m) in the frame is assigned to one of a plurality of threads for processing. For example, a frame can be divided into a plurality of areas and assigned to threads in divided area units.
- FIG. 9 shows an example of frame division.
- the frame is divided horizontally, along its scanning direction, into the first divided area 631 and the second divided area 632, in order from the top.
- the reference thread 402 is assigned to the first divided area 631,
- and the current thread 401 is assigned to the second divided area 632. This assignment is intended to process the second divided area 632 with reference to the processing result of the first divided area 631.
- the waiting time calculation unit 404 can calculate the waiting time based on the numbers or positions of the blocks being processed by the current thread 401 and the blocks being processed by the reference thread 402, for example.
- the quality difference calculation unit 406 can calculate the quality difference based on, for example, the spatial position of the block being processed and the synchronization control status up to the previous frame.
- the quality difference and the waiting time are each rated on a scale from 1 to N, where N is an integer of 2 or more.
- the quality difference is rated “N” when it is largest and “1” when it is smallest.
- the waiting time is rated “N” when it is longest and “1” when it is shortest.
- the synchronization determination unit 405 determines to synchronize, that is, to make the current thread 401 wait, for example, when the evaluation of the standby time does not exceed the evaluation of the quality difference. Specifically, if the evaluation value of the standby time is “2” and the evaluation value of the quality difference is “1”, it is determined that synchronization is not performed. Further, when the evaluation value of the standby time is “4” and the evaluation value of the quality difference is “5”, it is determined that synchronization is performed. By such control, it is possible to avoid an increase in the overall processing time while suppressing a decrease in image quality.
- an offset value may be added to the waiting-time evaluation as appropriate, according to the progress of processing or an instruction from the user. This allows the synchronization decision to be controlled flexibly.
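The comparison rule and the two numerical examples above can be checked with a short Python sketch. The scale size `n = 5`, the function name, and the way the offset is applied (added to the waiting-time rating before comparison) are assumptions for illustration.

```python
def synchronize(wait_eval: int, quality_eval: int, n: int = 5, offset: int = 0) -> bool:
    """Return True if the current thread should wait for the reference thread.

    Both ratings lie on a 1..n scale (n >= 2). `offset` is an optional
    adjustment added to the waiting-time rating (assumed form).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    for rating in (wait_eval, quality_eval):
        if not 1 <= rating <= n:
            raise ValueError(f"rating {rating} outside 1..{n}")
    # Synchronize when the waiting-time rating does not exceed the quality rating.
    return wait_eval + offset <= quality_eval


print(synchronize(2, 1))            # False
print(synchronize(4, 5))            # True
print(synchronize(4, 5, offset=2))  # False
```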
- a time limit is provided for image processing. For example, when generating 30 image frames per second, it is necessary to perform decoding processing within 1/30 second per frame. When the processing time is sufficient, it does not matter if the waiting time is slightly longer. However, when there is no allowance for processing time, even a short standby time becomes a problem. Therefore, it is preferable to evaluate the standby time not in accordance with the absolute value of the standby time but in consideration of the time limit.
- the time limit of each thread is obtained from the time limit of the frame. For example, when one frame consists of 5 × 2 blocks and two threads each process five of them in parallel, the time limit of each thread equals the time limit of the frame.
- the remaining time is calculated by subtracting the elapsed time until the current block from the time limit of the thread. Then, the remaining time is divided by the number of remaining blocks to calculate a time limit for one block, that is, a processing time that can be used for processing the remaining blocks. For example, when five blocks are processed by one thread, if the processing is completed up to the third block, a value obtained by dividing the remaining time by the number of remaining processing blocks “2” is the time limit for one block.
- the processing time per block in the reference thread 402 is calculated from the elapsed time up to the present time and the number of processed blocks.
- the time obtained by multiplying the calculated processing time by the number of blocks that the current thread 401 should wait becomes the waiting time of the current thread 401.
- for example, if the current thread 401 must wait for two blocks of the reference thread 402, twice the processing time per block is the waiting time of the current thread 401.
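The per-block time limit and the waiting-time estimate described above are simple arithmetic. The Python sketch below assumes millisecond units and uses hypothetical numbers (a 30 ms thread time limit and five blocks per thread); the function names are illustrative only.

```python
def block_time_limit(thread_limit: float, elapsed: float, remaining_blocks: int) -> float:
    """Time available per remaining block: (limit - elapsed) / remaining blocks."""
    if remaining_blocks <= 0:
        raise ValueError("no blocks remain")
    return (thread_limit - elapsed) / remaining_blocks


def reference_time_per_block(elapsed: float, processed_blocks: int) -> float:
    """Average processing time per block observed so far in the reference thread."""
    if processed_blocks <= 0:
        raise ValueError("no blocks processed yet")
    return elapsed / processed_blocks


def waiting_time(per_block: float, blocks_to_wait: int) -> float:
    """Waiting time of the current thread: per-block time x blocks to wait for."""
    return per_block * blocks_to_wait


# Three of five blocks finished after 18 ms of a 30 ms limit.
limit = block_time_limit(30.0, 18.0, remaining_blocks=2)
per_block = reference_time_per_block(18.0, processed_blocks=3)
print(limit, waiting_time(per_block, blocks_to_wait=2))  # 6.0 12.0
```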
- Fig. 9 shows an example of setting the evaluation value for standby time.
- the evaluation value is “3” when the waiting time is equal to the time limit for one block. It is “4” when the waiting time is longer than the time limit by an amount equal to the time limit, and “5” when it is longer by twice the time limit or more. Conversely, it is “2” when the waiting time is shorter than the time limit by an amount equal to the time limit, and “1” when it is shorter by twice the time limit or more.
- above, the time limit for one block was obtained by dividing the remaining time by the number of remaining blocks. Alternatively, the processing time allotted to the entire thread divided by the number of blocks in that thread may be used as the time limit for one block. In the former case, the time limit changes as frame processing progresses; in the latter case, it remains unchanged until the end of the frame.
- FIG. 10 shows the parallel processing of the first divided region 631 and the second divided region 632 (FIG. 8) in one frame.
- the third block 611-3 and the subsequent fourth block 611-4 are processed by the current thread 401.
- the first block 611-1 and the second block 611-2 following the first block 611-1 are processed by the reference thread 402.
- for its processing after the synchronization point 621, the third block 611-3 requires the data 641 of the first block 611-1 at the synchronization point 622.
- the synchronization control unit 403 therefore determines whether to make the third block 611-3 wait until the synchronization point 622.
- here, it is determined to wait; the third block 611-3 then refers to the data 641 of the first block 611-1 at the synchronization point 622 and resumes processing.
- the process of the fourth block 611-4 is started.
- the fourth block 611-4 requires the data 642 of the second block 611-2 at the synchronization point 624 in the processing after the synchronization point 623.
- the synchronization control unit 403 determines whether standby is necessary. Here, it is determined not to wait.
- the current thread 401 refers to the current data of the reference thread 402 without waiting for the data 642 of the second block 611-2, and continues its processing after the synchronization point 623.
- the data referred to when the current thread 401 does not wait is not limited to the current data of the reference thread 402 as described above; it may also be data already processed by the reference thread 402. Specifically, instead of the data 642 of the second block 611-2, the current thread 401 may refer to, for example, the data at the same position as the data 642 in the first block 611-1, or the data at the synchronization point 622.
- the fourth embodiment will be described in detail with a specific example.
- the information processing apparatus 400 performs H.264 / MPEG-4 AVC format image processing as described in Non-Patent Documents 1 and 2.
- Example 1: in this example, the current thread and the reference thread both perform deblocking filter processing on macroblocks.
- a technique related to deblocking filter processing is described in Non-Patent Document 3, for example.
- the deblocking filter process is simply referred to as “filter process”.
- the frame division format and thread assignment are the same as in FIG. That is, the first divided area 631 is assigned to the reference thread 402 and the second divided area 632 is assigned to the current thread 401. It is assumed that the data of the reference thread 402 that the current thread 401 intends to refer to is a pixel value after filtering.
- the waiting time calculation unit 404 calculates the waiting time until the filtering process by the reference thread 402 ends, on which the filtering process by the current thread 401 depends.
- the quality difference calculation unit 406 obtains a quality difference between when the current thread 401 waits for the reference thread 402 and when it does not wait.
- image quality is evaluated subjectively, or objectively using a PSNR value or the like.
- if the current thread 401 does not synchronize, that is, does not wait for the reference thread 402, it advances its filtering process using the pixel values of the reference thread 402 before filtering.
- the quality difference calculation unit 406 calculates the quality difference using not only the spatial position of the macroblock and the synchronization control status up to the previous frame, but also the strength of the filter processing. Filter processing depends on the strength of the filter and the quantization step. Therefore, although the quantization step can be included in the calculation of the quality difference, an example in which control is performed using only the filter strength will be described below.
- FIG. 11 shows the dependency relationship between the blocks in the filter processing by arrows.
- the filter process basically uses the pixel data of the upper and left blocks that have already been subjected to the filter process.
- the filtering process 652 of the second block (# 2) depends on the result of the filtering process 651 of the first block (# 1).
- the filtering process 653 of the third block (# 3) also depends on the filtering process 651 of the first block (# 1).
- the filter processes 652 and 653 of the second block (# 2) and the third block (# 3) should wait for the end of the filter process 651 of the first block (# 1). Therefore, when performing the filtering process 653 of the third block (# 3) in the current thread, it is determined whether or not to wait for the processing of the filtering process 651 of the first block (# 1) in the reference thread.
- the degree of image quality deterioration when not waiting for the reference thread is related to the strength of the filter (Boundary Strength value: Bs value) as described in Non-Patent Document 3, for example.
- when the Bs value is 4, the filter strength is at its maximum,
- and its influence reaches up to 14 pixels from the boundary.
- when the filter strength is weak,
- its influence on other macroblocks is small. Therefore, an appropriate quality difference can be estimated by evaluating it based on the strength of the filter processing.
- for example, the quality difference evaluation value is “5” for the strongest filter (Bs value 4),
- and “4”, “3”, and “2” for Bs values 3, 2, and 1, respectively.
- the evaluation value of the waiting time can be determined in the form as shown in FIG.
- the synchronization determination unit 405 compares the evaluation value of the standby time with the evaluation value of the quality difference, and determines whether the current thread 401 needs to wait. For example, when the evaluation value of the standby time is equal to or less than the evaluation value of the quality difference, it is determined that synchronization is to be established.
- the quality difference evaluation value is determined based on the strength of the filter processing, but may be determined based on other factors. For example, the spatial position of the block to which the filter process is applied and the synchronization control status up to the previous frame may be considered.
- for example, macroblocks at the right and bottom edges of a frame are less likely to be affected by image quality degradation.
- for such macroblocks, “1” is subtracted from the quality difference evaluation value.
- this effectively lowers the threshold for the waiting time, so it can be determined not to wait.
- conversely, depending on the synchronization control status up to the previous frame, the image quality at a given position may already be significantly degraded. In such a case, “1” may be added to the evaluation value when evaluating the quality in the current frame.
- this effectively raises the threshold for the waiting time, which can lead to a decision to wait.
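Putting the filter-strength rating and the two positional adjustments together gives something like the Python sketch below. The mapping from Bs values 4, 3, 2, 1 to ratings 5, 4, 3, 2 is this sketch's reading of the example above; rating Bs 0 (no filtering) as 1, clamping the result to 1..5, and the flag names are assumptions.

```python
# Hypothetical mapping; Bs = 0 (no filtering) -> 1 is an assumption.
BS_TO_QUALITY_RATING = {4: 5, 3: 4, 2: 3, 1: 2, 0: 1}


def quality_rating(bs: int, at_right_or_bottom_edge: bool = False,
                   degraded_in_previous_frames: bool = False) -> int:
    """Quality-difference rating for an edge filtered with boundary strength bs."""
    rating = BS_TO_QUALITY_RATING[bs]
    if at_right_or_bottom_edge:
        rating -= 1        # degradation there matters less: favour not waiting
    if degraded_in_previous_frames:
        rating += 1        # quality already suffered here: favour waiting
    return max(1, min(5, rating))  # keep the rating on the 1..5 scale


def should_wait(wait_rating: int, bs: int, **adjustments: bool) -> bool:
    """Wait when the waiting-time rating does not exceed the quality rating."""
    return wait_rating <= quality_rating(bs, **adjustments)


print(should_wait(5, 4))                                # True
print(should_wait(5, 4, at_right_or_bottom_edge=True))  # False
```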
- in H.264, intra prediction of luminance is performed in units of 4 × 4 blocks or 16 × 16 blocks.
- nine prediction modes are available for intra prediction in 4 × 4 block units, and four for intra prediction in 16 × 16 block units.
- the quality difference is estimated based on the prediction mode in use and on whether the macroblock of the current thread 401 refers to a macroblock of the reference thread 402. This quality difference represents how much the influence of image quality degradation within the frame differs between synchronizing and not synchronizing. In the case of 4 × 4 block units, different prediction modes are applied to the sixteen 4 × 4 blocks of a macroblock, so the influence of image quality degradation does not spread over the entire frame but is limited to a few blocks.
- for example, the evaluation value can be “4” when image quality degradation in the macroblock being processed by the current thread 401 propagates to other macroblocks, “3” when it stays within that macroblock, and “2” when it stays within an 8 × 8 block of the macroblock.
- the motion prediction process for the macroblock is the current thread 401
- the deblocking filter process for the macroblock is the reference thread 402.
- the motion prediction process of the current thread 401 is advanced using the pixel value before the filter process.
- the quality difference is calculated using the prediction vector of the motion prediction process.
- the divided area assigned to the reference thread 402 is a divided area of the frame immediately before the frame to which the divided area assigned to the current thread 401 belongs.
- “4” is set when the motion vector length is zero
- “3” is set when the motion vector length is greater than zero and smaller than the average within the frame
- “2” is set when the motion vector length is larger than the average within the frame. Whether to wait for the end of the filter processing is determined based on the evaluation value of the quality difference and the evaluation value of the waiting time. In this way, synchronization control that shortens the processing time while suppressing degradation of the output image quality can be realized.
- a motion vector search process in a certain frame is a current thread 401
- a process of creating a reference image of a search area in another frame that is referred to for a motion vector search is a reference thread 402.
- an encoded image is decoded and used as a reference image for subsequent frames.
- a motion vector used for generating an inter-predicted image is obtained using the reference image.
- if the reference image of the vector search area has not yet been created, it is determined whether to wait for the creation process to finish.
- a vector search is performed using an area that can be referred to at the present time.
- the quality difference is calculated using the motion vectors of the neighboring blocks and the search evaluation value using the area that can be referred to at the present time.
- in the specific examples described above, the blocks assigned to the current thread 401 and the reference thread 402 are divided areas within the same frame. In this example, by contrast, the divided area assigned to the reference thread 402 is a divided area in the frame referenced in the motion vector search, that is, a frame different from that of the current thread 401.
- the image quality is estimated based on the amount of code generated when a macroblock is encoded using the motion vector obtained by the search.
- the motion vector of a certain block has a strong correlation with the motion vectors of its surrounding blocks. For this reason, first, the motion vector of the block is predicted using the motion vector of the peripheral block. Then, it is checked whether or not the predicted motion vector points to the allocation area of the reference thread 402. Next, search evaluation such as the sum of absolute differences is performed in an area that can be referred to at the present time, and it is checked whether or not the best value obtained thereby exceeds the upper limit. At this time, a motion vector may be predicted from the tendency of the search evaluation value, and it may be estimated whether or not it points to the allocation area of the reference thread 402.
- the evaluation value is “4” when the predicted motion vector points to the allocation area of the reference thread 402 and the sum-of-absolute-differences evaluation value in the currently referable area is smaller than the upper limit. It is “3” when only the sum-of-absolute-differences evaluation value is smaller than the upper limit, and “2” when the sum-of-absolute-differences evaluation value exceeds the upper limit but the predicted motion vector points to the allocation area of the reference thread 402. Whether to wait for the creation of the reference image of the vector search area to complete is determined based on the evaluation value of the quality difference and the evaluation value of the waiting time. In this way, synchronization control that shortens the processing time while suppressing the amount of generated code is achieved.
- the present invention is not limited to the above embodiment.
- the implementation of the present invention can be modified as appropriate within the scope of the claims of the present application.
- the present invention can be implemented as a computer program corresponding to the operation of the information processing apparatus 400 and a recording medium storing the program.
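The evaluation-value rules listed above (comparing the waiting-time and quality-difference values, the edge and previous-frame adjustments, and the quality-difference values for the intra prediction, motion prediction, and motion vector search examples) can be collected into a few small functions. The following is a minimal Python sketch under stated assumptions, not the patent's implementation: all function names are hypothetical, the motion vector length is assumed to be Euclidean, a length equal to the frame average is assumed to score “2”, and the component-wise median predictor is a swapped-in choice (it is the predictor H.264 uses) because the description only says that neighboring vectors are used. The description does not state a value for the case where the predicted vector misses the reference thread's area and the SAD (sum of absolute differences) is not below the upper limit, so the sketch returns `None` there.

```python
import math
from statistics import median
from typing import Optional, Sequence, Tuple

MotionVector = Tuple[int, int]
Block = Sequence[Sequence[int]]


def should_wait(wait_eval: int, quality_eval: int) -> bool:
    """Synchronize (wait) when the waiting-time value <= the quality-difference value."""
    return wait_eval <= quality_eval


def adjust_for_position_and_history(quality_eval: int, on_right_or_bottom_edge: bool,
                                    degraded_in_prev_frame: bool) -> int:
    """-1 for right/bottom-edge macroblocks; +1 where not waiting in the
    previous frame degraded image quality at this position (hypothetical flag)."""
    adjusted = quality_eval
    if on_right_or_bottom_edge:
        adjusted -= 1
    if degraded_in_prev_frame:
        adjusted += 1
    return adjusted


def intra_quality_eval(spreads_to_other_mbs: bool, within_8x8_block: bool) -> int:
    """Intra prediction example: 4 if degradation propagates to other macroblocks,
    2 if it is confined to one 8x8 block, otherwise 3 (stays in the macroblock)."""
    if spreads_to_other_mbs:
        return 4
    return 2 if within_8x8_block else 3


def motion_quality_eval(mv: MotionVector, frame_mvs: Sequence[MotionVector]) -> int:
    """Motion prediction example: 4 for a zero vector, 3 if shorter than the
    frame average, else 2. Euclidean length is an assumption."""
    length = math.hypot(*mv)
    if length == 0:
        return 4
    if not frame_mvs:
        raise ValueError("frame_mvs must not be empty")
    average = sum(math.hypot(*v) for v in frame_mvs) / len(frame_mvs)
    return 3 if length < average else 2  # equal to the average: assumed to give 2


def predict_mv(neighbor_mvs: Sequence[MotionVector]) -> MotionVector:
    """Predict a block's vector from its neighbors' vectors using the
    component-wise median (swapped-in choice). int() truncates a .5 median
    that can arise from an even number of neighbors."""
    return (int(median(v[0] for v in neighbor_mvs)),
            int(median(v[1] for v in neighbor_mvs)))


def sad(block: Block, candidate: Block) -> int:
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(p - q)
               for row_b, row_c in zip(block, candidate)
               for p, q in zip(row_b, row_c))


def search_quality_eval(points_to_ref_area: bool, best_sad: int,
                        sad_upper_limit: int) -> Optional[int]:
    """Motion vector search example. Returns None for the case the description
    does not cover (vector misses the area and SAD is not below the limit)."""
    sad_below_limit = best_sad < sad_upper_limit
    if points_to_ref_area and sad_below_limit:
        return 4
    if sad_below_limit:
        return 3
    if points_to_ref_area:
        return 2
    return None
```

For example, a right-edge macroblock whose degradation would otherwise score “4” is adjusted by `adjust_for_position_and_history(4, True, False)` to 3, so with a waiting-time value of 4, `should_wait(4, 3)` is false and the current thread does not wait.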
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Synchronisation In Digital Transmission Systems (AREA)
Abstract
Description
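401 Current thread
402 Reference thread
403 Synchronization control unit
404 Waiting time calculation unit
405 Synchronization determination unit
406 Quality difference calculation unit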
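FIG. 1 shows an information processing apparatus 400 according to the first embodiment of the present invention. The information processing apparatus 400 includes a current thread 401, a reference thread 402, a synchronization control unit 403, a waiting time calculation unit 404, a synchronization determination unit 405, and a quality difference calculation unit 406. The information processing apparatus 400 is a multithreading-capable computer and executes the current thread 401 and the reference thread 402 in parallel.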
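The flow carried out by the units of FIG. 1 (check the reference thread at the synchronization point, estimate the waiting time, estimate the quality difference, decide whether to wait) can be sketched with Python's `threading` module. This is a hedged illustration only: the progress counter, the rate-based waiting-time estimate, and the names `ProgressTracker` and `decide_wait` are assumptions, since this section does not say how the waiting time calculation unit 404 computes its estimate. `decide_wait` follows the rule of the dependent claims (no wait when the quality difference is below its upper limit, otherwise wait only when the waiting time is below its upper limit); ties are assumed to mean "do not wait".

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProgressTracker:
    """Completed work units of the reference thread (hypothetical progress model)."""
    done: int = 0
    start: float = field(default_factory=time.monotonic)
    cond: threading.Condition = field(default_factory=threading.Condition)

    def advance(self) -> None:
        """Called by the reference thread after finishing one unit of work."""
        with self.cond:
            self.done += 1
            self.cond.notify_all()

    def estimate_wait(self, sync_point: int) -> float:
        """Seconds until `sync_point` units are done, extrapolated from the average rate."""
        with self.cond:
            remaining = sync_point - self.done
            if remaining <= 0:
                return 0.0  # the second synchronization point has already been reached
            if self.done == 0:
                return float("inf")  # no progress yet, so no rate to extrapolate from
            elapsed = time.monotonic() - self.start
            return remaining * elapsed / self.done

    def wait_until(self, sync_point: int, timeout: Optional[float] = None) -> bool:
        """Block until `sync_point` units are done (or the timeout expires)."""
        with self.cond:
            return self.cond.wait_for(lambda: self.done >= sync_point, timeout=timeout)


def decide_wait(wait_time: float, quality_diff: float,
                wait_limit: float, quality_limit: float) -> bool:
    """Return True if the current thread should wait for the reference thread."""
    if quality_diff < quality_limit:
        return False  # the referenced data would make little difference to quality
    return wait_time < wait_limit  # wait only if the expected wait is short enough


if __name__ == "__main__":
    tracker = ProgressTracker()

    def reference_thread() -> None:
        for _ in range(10):
            time.sleep(0.01)  # simulated work for one unit
            tracker.advance()

    worker = threading.Thread(target=reference_thread)
    worker.start()
    time.sleep(0.03)  # the current thread reaches its first synchronization point here
    sync_point_b = 8  # the reference thread's second synchronization point
    wait_time = tracker.estimate_wait(sync_point_b)
    if decide_wait(wait_time, quality_diff=0.8, wait_limit=0.2, quality_limit=0.5):
        tracker.wait_until(sync_point_b)
        print(f"waited (estimate {wait_time:.3f} s); reference data is ready")
    else:
        print("did not wait; proceeding without the reference data")
    worker.join()
```

A condition variable is used so that the reference thread wakes the waiting current thread as soon as it advances, rather than the current thread polling.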
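A second embodiment of the present invention will now be described. The configuration of the information processing apparatus 400 of this embodiment is basically the same as that of the preceding embodiment (FIG. 1). The difference from the preceding embodiment lies in the operation of the synchronization determination unit 405. The synchronization determination unit 405 of this embodiment shortens or extends the waiting time calculated by the waiting time calculation unit 404 according to the magnitude of the quality difference, and applies the adjusted value as the waiting time of the current thread 401. Hereinafter, the adjusted waiting time is referred to as the timeout value.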
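The second embodiment's adjustment can be sketched as follows, using the rule stated in the claims: add a predetermined value to the waiting time when the quality difference exceeds its upper limit, and subtract it otherwise. In this minimal sketch the names are hypothetical, a quality difference exactly equal to the limit is assumed to take the subtracting branch, the result is clamped at zero (an assumption), and a `threading.Event` stands in for whatever signal the reference thread raises at its synchronization point.

```python
import threading


def timeout_value(wait_time: float, quality_diff: float,
                  quality_limit: float, delta: float) -> float:
    """Lengthen or shorten the estimated waiting time according to the quality difference."""
    if quality_diff > quality_limit:
        return wait_time + delta  # large quality difference: worth waiting a little longer
    return max(0.0, wait_time - delta)  # small quality difference: give up sooner


def wait_for_reference(ready: threading.Event, timeout: float) -> bool:
    """Wait until the reference thread sets `ready` or the timeout (seconds) expires.

    Returns True if the reference data is available, False on timeout.
    """
    return ready.wait(timeout)
```

With a 100 ms estimate and a 20 ms adjustment, `timeout_value(100, 0.9, 0.5, 20)` gives 120 and `timeout_value(100, 0.1, 0.5, 20)` gives 80; the units are the caller's choice, but the value passed to `Event.wait` must be in seconds.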
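A third embodiment of the present invention will now be described. The configuration of the information processing apparatus 400 of this embodiment is basically the same as that of the preceding embodiments (FIG. 1). This embodiment differs from the first embodiment in that the waiting time calculation unit 404 recalculates the waiting time while the current thread 401 is waiting.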
$$T_R > T_0 - T_P \tag{3}$$
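The symbols in condition (3) are not defined in this section. Reading $T_0$ as the waiting time estimated when the wait began, $T_P$ as the time already spent waiting, and $T_R$ as the recalculated remaining waiting time (an interpretation, not stated here), the condition says the reference thread now needs longer than originally predicted, that is, it is delayed, and the wait is abandoned. A polling sketch under that assumption, with hypothetical names and a caller-supplied readiness check and estimator:

```python
import time
from typing import Callable


def is_delayed(t_r: float, t_0: float, t_p: float) -> bool:
    """Condition (3): the recalculated remaining wait exceeds the originally expected remainder."""
    return t_r > t_0 - t_p


def wait_with_recalculation(is_ready: Callable[[], bool],
                            estimate_remaining: Callable[[], float],
                            t_0: float, poll_interval: float = 0.005) -> bool:
    """Wait for the reference thread, abandoning the wait if it is judged delayed.

    Returns True if the reference thread reached its synchronization point,
    False if the wait was abandoned.
    """
    start = time.monotonic()
    while not is_ready():
        t_p = time.monotonic() - start  # time spent waiting so far
        if is_delayed(estimate_remaining(), t_0, t_p):
            return False  # the reference thread is behind schedule: stop waiting
        time.sleep(poll_interval)
    return True
```

Polling keeps the sketch short; a condition variable with a timed wait could replace the `sleep` so that the current thread also wakes as soon as the reference thread finishes.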
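A fourth embodiment of the present invention will now be described. The configuration of the information processing apparatus 400 of this embodiment is basically the same as that of the preceding embodiments (FIG. 1). This embodiment assumes that the information processing apparatus 400 executes threads related to image processing.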
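In this example, the current thread and the reference thread are deblocking filter processes applied to macroblocks. A technique related to deblocking filter processing is described in, for example, Non-Patent Document 3. Hereinafter, deblocking filter processing is referred to simply as “filter processing”.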
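Another specific example of the fourth embodiment will be described. Whereas the thread content in specific example 1 above was deblocking filter processing, in this example it is intra prediction processing. In this example, when intra prediction has not yet been applied to the pixels of the reference thread 402 that the current thread 401 is about to reference in the same frame, the synchronization determination unit 405 determines whether to wait for that intra prediction processing to finish. If it does not wait, intra prediction proceeds using either the current (not yet intra-predicted) pixels of the reference thread 402, or the already intra-predicted pixel values of the previous frame at the same positions as the reference pixels.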
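Another specific example of the fourth embodiment will be described. In this example, motion prediction processing for a macroblock is the current thread 401, and deblocking filter processing for the macroblock is the reference thread 402. When the filter processing of the reference thread 402 has not yet been applied to the pixels referenced by the motion prediction processing of the current thread 401, it is determined whether to wait for that filter processing to complete. If not waiting, the motion prediction processing of the current thread 401 proceeds using the pixel values before filter processing.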
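The fallbacks in the two examples above (using the reference thread's current, unprocessed pixels, or co-located and already processed pixels from the previous frame) amount to choosing a pixel source once the wait decision is made. A minimal sketch with hypothetical names, representing a pixel block as a list of rows:

```python
from typing import List, Optional

Pixels = List[List[int]]


def select_reference_pixels(processed: Optional[Pixels], current: Pixels,
                            previous_frame: Optional[Pixels] = None) -> Pixels:
    """Choose which pixels the current thread references.

    processed: pixels after the reference thread's processing, or None if that
        processing has not finished (the current thread chose not to wait).
    current: the reference thread's pixels as they are right now
        (for example, not yet intra-predicted or not yet deblocked).
    previous_frame: co-located, already processed pixels from the previous
        frame; used instead of `current` when supplied (intra example only).
    """
    if processed is not None:
        return processed  # synchronized: use the finished data
    if previous_frame is not None:
        return previous_frame  # fallback: previous frame's processed pixels
    return current  # fallback: unprocessed pixels available now
```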
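Another specific example of the fourth embodiment will be described. In this example, motion vector search processing in a certain frame is the current thread 401, and processing that creates the reference image of the search area in another frame, which is referenced by the motion vector search, is the reference thread 402.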
Claims (25)
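- A synchronization control method comprising: executing, in parallel, a current thread and a reference thread whose data is referenced by the current thread;
determining, when the current thread reaches a first synchronization point, whether the reference thread has reached a second synchronization point;
when the determination is negative, obtaining the time required for the reference thread to reach the second synchronization point as a waiting time of the current thread;
estimating a quality difference between data that the current thread generates by referencing processing data of the reference thread at the second synchronization point and data that the current thread generates without referencing the processing data; and
determining, according to the waiting time and the magnitude of the quality difference, whether to make the current thread wait until the reference thread reaches the second synchronization point. - The synchronization control method according to claim 1, wherein it is determined not to make the current thread wait when the quality difference is below an upper limit.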
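- The synchronization control method according to claim 2, further comprising determining not to make the current thread wait when the waiting time exceeds an upper limit, and determining to make the current thread wait when the waiting time is below the upper limit.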
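- The synchronization control method according to claim 1, wherein a timeout value obtained by adding a predetermined value to the waiting time is set when the quality difference exceeds an upper limit, and a timeout value obtained by subtracting a predetermined value from the waiting time is set when the quality difference is below the upper limit,
and it is determined that the current thread is to wait until the set timeout value expires. - The synchronization control method according to any one of claims 1 to 4, further comprising: recalculating the waiting time while the current thread is waiting;
determining, using a result of the recalculation, whether the reference thread is delayed; and
stopping the waiting of the current thread when it is determined that the reference thread is delayed. - The synchronization control method according to any one of claims 1 to 5, wherein the current thread and the reference thread are image processing on an image frame.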
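- The synchronization control method according to claim 6, wherein the current thread and the reference thread are deblocking filter processes for different image blocks in the same image frame.
- The synchronization control method according to claim 6, wherein the current thread and the reference thread are intra prediction processes for different image blocks in the same image frame.
- The synchronization control method according to claim 6, wherein the current thread is motion prediction processing for an image block, and the reference thread is deblocking filter processing for the image block.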
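- The synchronization control method according to claim 6, wherein the current thread is motion prediction processing in an image frame,
and the reference thread is processing that creates a reference image of the image frame in which a motion vector is searched in the motion prediction processing. - The synchronization control method according to any one of claims 1 to 10, further comprising: obtaining evaluation values for evaluating each of the waiting time and the quality difference on an N-level scale;
and making the determination regarding the waiting of the current thread using the respective evaluation values. - The synchronization control method according to any one of claims 1 to 11, wherein the data generated without referencing the processing data is data generated by referencing current data of the reference thread.
- The synchronization control method according to any one of claims 1 to 11, wherein the data generated without referencing the processing data is data generated by referencing data already processed by the reference thread.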
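- An information processing apparatus comprising: a synchronization control unit that executes, in parallel, a current thread and a reference thread whose data is referenced by the current thread;
a waiting time calculation unit that, when the reference thread has not reached a second synchronization point at the time the current thread reaches a first synchronization point, obtains the time required for the reference thread to reach the second synchronization point as a waiting time of the current thread;
a quality difference calculation unit that estimates a quality difference between data that the current thread generates by referencing processing data of the reference thread at the second synchronization point and data that the current thread generates without referencing the processing data; and
a synchronization determination unit that determines, according to the waiting time and the magnitude of the quality difference, whether to make the current thread wait until the reference thread reaches the second synchronization point. - The information processing apparatus according to claim 14, wherein the synchronization determination unit determines not to make the current thread wait when the quality difference is below an upper limit.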
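- The information processing apparatus according to claim 15, wherein the synchronization determination unit further determines not to make the current thread wait when the waiting time exceeds an upper limit, and determines to make the current thread wait when the waiting time is below the upper limit.
- The information processing apparatus according to claim 14, wherein the synchronization determination unit sets a timeout value obtained by adding a predetermined value to the waiting time when the quality difference exceeds an upper limit, sets a timeout value obtained by subtracting a predetermined value from the waiting time when the quality difference is below the upper limit, and determines to make the current thread wait until the set timeout value expires.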
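- The information processing apparatus according to any one of claims 14 to 17, wherein the waiting time calculation unit further recalculates the waiting time while the current thread is waiting, and
the synchronization determination unit determines, using a result of the recalculation, whether the reference thread is delayed, and stops the waiting of the current thread when it determines that the reference thread is delayed. - The information processing apparatus according to any one of claims 14 to 18, wherein the current thread and the reference thread are image processing on an image frame.
- The information processing apparatus according to claim 19, wherein the current thread and the reference thread are deblocking filter processes for different image blocks in the same image frame.
- The information processing apparatus according to claim 19, wherein the current thread and the reference thread are intra prediction processes for different image blocks in the same image frame.
- The information processing apparatus according to claim 19, wherein the current thread is motion prediction processing for an image block, and the reference thread is deblocking filter processing for the image block.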
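- The information processing apparatus according to claim 19, wherein the current thread is motion prediction processing in an image frame,
and the reference thread is processing that creates a reference image of the image frame in which a motion vector is searched in the motion prediction processing. - The information processing apparatus according to any one of claims 14 to 23, wherein the synchronization determination unit further obtains evaluation values for evaluating each of the waiting time and the quality difference on an N-level scale, and makes the determination regarding the waiting of the current thread using the respective evaluation values.
- A program that causes a computer to function as the information processing apparatus according to any one of claims 14 to 24.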
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/812,936 US8555291B2 (en) | 2008-01-17 | 2009-01-14 | Synchronization control method and information processing device |
JP2009550023A JP5246603B2 (ja) | 2008-01-17 | 2009-01-14 | 同期制御方法および情報処理装置 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2008008321 | 2008-01-17 | ||
JP2008-008321 | 2008-01-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009090964A1 (ja) | 2009-07-23 |
Family
ID=40885353
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2009/050397 WO2009090964A1 (ja) | 同期制御方法および情報処理装置 | 2008-01-17 | 2009-01-14 |
Country Status (4)
Country | Link |
---|---|
US (1) | US8555291B2 (ja) |
JP (1) | JP5246603B2 (ja) |
TW (1) | TW200943175A (ja) |
WO (1) | WO2009090964A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012086041A1 (ja) * | 2010-12-22 | 2012-06-28 | 富士通株式会社 | 同期処理方法 |
JP2013509738A (ja) * | 2009-10-29 | 2013-03-14 | 日本電気株式会社 | H.264インループ・デブロッキング・フィルタの並列実行装置及び方法 |
JP2015055994A (ja) * | 2013-09-11 | 2015-03-23 | 富士通株式会社 | 演算処理装置及び演算処理装置の制御方法 |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8850436B2 (en) * | 2009-09-28 | 2014-09-30 | Nvidia Corporation | Opcode-specified predicatable warp post-synchronization |
US20110276966A1 (en) * | 2010-05-06 | 2011-11-10 | Arm Limited | Managing task dependency within a data processing system |
US20140269933A1 (en) * | 2013-03-13 | 2014-09-18 | Magnum Semiconductor, Inc. | Video synchronization techniques using projection |
US20220291971A1 (en) * | 2021-03-10 | 2022-09-15 | EMC IP Holding Company LLC | Synchronization object having a stamp for flows in a storage system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0744403A (ja) * | 1993-08-02 | 1995-02-14 | Nippon Telegr & Teleph Corp <Ntt> | 連続メディアの同期方法 |
JPH09305546A (ja) * | 1996-05-17 | 1997-11-28 | Nec Corp | マルチプロセッサシステム及びその同期方法 |
JP2000010802A (ja) * | 1998-06-24 | 2000-01-14 | Ntt Data Corp | プロセス間データ連携システム及び記録媒体 |
JP2005078244A (ja) * | 2003-08-29 | 2005-03-24 | Fujitsu Ltd | プログラム実行方法およびプログラム実行装置 |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6128640A (en) | 1996-10-03 | 2000-10-03 | Sun Microsystems, Inc. | Method and apparatus for user-level support for multiple event synchronization |
JP3730740B2 (ja) * | 1997-02-24 | 2006-01-05 | 株式会社日立製作所 | 並列ジョブ多重スケジューリング方法 |
US6026427A (en) | 1997-11-21 | 2000-02-15 | Nishihara; Kazunori | Condition variable to synchronize high level communication between processing threads |
JP3621598B2 (ja) | 1999-03-04 | 2005-02-16 | 日本電信電話株式会社 | 並列ソフトウェア画像符号化方法、および並列ソフトウェア画像符号化プログラムを記録した記録媒体 |
US6594773B1 (en) * | 1999-11-12 | 2003-07-15 | Microsoft Corporation | Adaptive control of streaming data in a graph |
US6920175B2 (en) | 2001-01-03 | 2005-07-19 | Nokia Corporation | Video coding architecture and methods for using same |
JP2005094054A (ja) | 2003-09-11 | 2005-04-07 | Hiroshima Univ | 画像符号化装置、その方法およびプログラム、並びにストリーム合成器、その方法およびプログラム |
US7584475B1 (en) * | 2003-11-20 | 2009-09-01 | Nvidia Corporation | Managing a video encoder to facilitate loading and executing another program |
JP4266894B2 (ja) | 2004-07-22 | 2009-05-20 | Necソフトウェア東北株式会社 | 性能統計情報出力システム、性能統計情報出力方法およびプログラム |
US7624208B2 (en) * | 2005-01-14 | 2009-11-24 | International Business Machines Corporation | Method, system, and computer program for managing a queuing system |
US20060215754A1 (en) * | 2005-03-24 | 2006-09-28 | Intel Corporation | Method and apparatus for performing video decoding in a multi-thread environment |
JP4828950B2 (ja) * | 2006-01-31 | 2011-11-30 | 株式会社東芝 | 動画像復号装置 |
US7930695B2 (en) * | 2006-04-06 | 2011-04-19 | Oracle America, Inc. | Method and apparatus for synchronizing threads on a processor that supports transactional memory |
US8019002B2 (en) * | 2006-06-08 | 2011-09-13 | Qualcomm Incorporated | Parallel batch decoding of video blocks |
US8000388B2 (en) * | 2006-07-17 | 2011-08-16 | Sony Corporation | Parallel processing apparatus for video compression |
KR100819289B1 (ko) * | 2006-10-20 | 2008-04-02 | 삼성전자주식회사 | 영상 데이터의 디블록킹 필터링 방법 및 디블록킹 필터 |
US8132171B2 (en) * | 2006-12-22 | 2012-03-06 | Hewlett-Packard Development Company, L.P. | Method of controlling thread access to a synchronization object |
TWI335764B (en) * | 2007-07-10 | 2011-01-01 | Faraday Tech Corp | In-loop deblocking filtering method and apparatus applied in video codec |
US20090049323A1 (en) * | 2007-08-14 | 2009-02-19 | Imark Robert R | Synchronization of processors in a multiprocessor system |
2009
- 2009-01-12 TW TW098100908A patent/TW200943175A/zh unknown
- 2009-01-14 US US12/812,936 patent/US8555291B2/en active Active
- 2009-01-14 JP JP2009550023A patent/JP5246603B2/ja active Active
- 2009-01-14 WO PCT/JP2009/050397 patent/WO2009090964A1/ja active Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013509738A (ja) * | 2009-10-29 | 2013-03-14 | 日本電気株式会社 | H.264インループ・デブロッキング・フィルタの並列実行装置及び方法 |
WO2012086041A1 (ja) * | 2010-12-22 | 2012-06-28 | 富士通株式会社 | 同期処理方法 |
JP5672311B2 (ja) * | 2010-12-22 | 2015-02-18 | 富士通株式会社 | 同期処理方法 |
US9690633B2 (en) | 2010-12-22 | 2017-06-27 | Fujitsu Limited | Synchronization method |
JP2015055994A (ja) * | 2013-09-11 | 2015-03-23 | 富士通株式会社 | 演算処理装置及び演算処理装置の制御方法 |
US9626230B2 (en) | 2013-09-11 | 2017-04-18 | Fujitsu Limited | Processor and control method of processor |
Also Published As
Publication number | Publication date |
---|---|
TW200943175A (en) | 2009-10-16 |
JPWO2009090964A1 (ja) | 2011-05-26 |
JP5246603B2 (ja) | 2013-07-24 |
US20110047556A1 (en) | 2011-02-24 |
US8555291B2 (en) | 2013-10-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP5246603B2 (ja) | 同期制御方法および情報処理装置 | |
US9918089B2 (en) | Image coding apparatus, image coding method, and program, pertaining to an image quality parameter, and image decoding apparatus, image decoding method, and program, pertaining to an image quality parameter | |
US9706203B2 (en) | Low latency video encoder | |
JP4224473B2 (ja) | 適応的モード決定による動き予測方法 | |
JP2007202150A (ja) | 可変ブロックサイズ動き予測のための符号化モードの決定方法及び装置 | |
JP2007081518A (ja) | 動画像符号化装置および動画像符号化方法 | |
JP2007067469A (ja) | フレーム内予測符号化制御方法、フレーム内予測符号化制御装置、フレーム内予測符号化制御プログラムおよびそのプログラムを記録した記録媒体 | |
US20220030250A1 (en) | Encoding device, decoding device, and program | |
KR101623064B1 (ko) | 영상 부호화 장치, 영상 부호화 방법 및 영상 부호화 프로그램 | |
US20240291979A1 (en) | Image encoding apparatus, image encoding method, image decoding apparatus, image decoding method, and non-transitory computer-readable storage medium | |
JPWO2006100946A1 (ja) | 画像信号再符号化装置及び画像信号再符号化方法 | |
JP2007180767A (ja) | 情報処理装置 | |
TWI493942B (zh) | 動畫像編碼方法、動畫像編碼裝置及動畫像編碼程式 | |
JP2005184241A (ja) | 動画像フレーム内モード判定方式 | |
JP2008311824A (ja) | 画像符号化装置および画像符号化プログラム | |
KR101525325B1 (ko) | 인트라 예측 모드 결정 방법 및 그 장치 | |
JP4232733B2 (ja) | ノイズ低減方法 | |
JP2010062999A (ja) | 動画像符号化装置、動画像符号化方法、及び、コンピュータプログラム | |
KR101145399B1 (ko) | 멀티-패스 인코딩 장치 및 방법 | |
JP2007288473A (ja) | 符号化装置、画像処理装置 | |
WO2020044985A1 (ja) | 画像復号装置、画像符号化装置、画像処理システム及びプログラム | |
JP2006191287A (ja) | 画像符号化装置、画像符号化方法および画像符号化プログラム | |
JP2009147847A (ja) | 動画像圧縮処理方法、装置、プログラム、及び媒体 | |
JP2007201675A (ja) | 動画像符号化装置および動画像符号化方法 | |
JP2007288674A (ja) | 符号化装置、復号化装置、画像処理装置。 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 09701492; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 12812936; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | WIPO information: entry into national phase | Ref document number: 2009550023; Country of ref document: JP |
122 | EP: PCT application non-entry in European phase | Ref document number: 09701492; Country of ref document: EP; Kind code of ref document: A1 |