CN111510766A - Video coding real-time evaluation and playing tool - Google Patents
- Publication number: CN111510766A (application CN202010299483.5A)
- Authority
- CN
- China
- Legal status (assumed, not a legal conclusion): Pending
Classifications
- H04N21/44004 — Processing of video elementary streams involving video buffer management, e.g. video decoder buffer or video display buffer
- H04N19/44 — Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
- H04N19/70 — Coding/decoding of digital video signals characterised by syntax aspects, e.g. related to compression standards
- H04N21/4307 — Synchronising the rendering of multiple content streams or additional data on devices
- H04N21/44008 — Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
Abstract
The invention discloses a video coding real-time evaluation and playing tool comprising the following modules: a data acquisition module, an unpacking module, a file storage module, a distribution module, a decoding module, an audio/video synchronization module, a display and play module, and a task module. The task module acts as a bridge connecting the other modules for task management, while the data acquisition, unpacking, file storage, distribution, decoding, audio/video synchronization, and display and play modules carry out the acquisition, unpacking, storage, distribution, decoding, audio/video synchronization, and display/playback of real-time audio and video data produced by integrated data recording. The invention avoids the cumbersome after-the-fact procedure of reading a storage medium for playback and improves ease of use.
Description
Technical Field
The invention belongs to the field of avionics and relates to real-time video player design, in particular to video quality comparison technology.
Background
Integrated data recording is a basic function that modern airborne avionics equipment must provide. Audio/video recording, an important component of integrated data recording, is responsible for recording the video of every display in the cockpit display system and the voice calls in the airborne intercom system. With the development of airborne avionics, particularly cockpit display systems, multi-channel, large-screen, high-resolution displays are an inevitable trend, so video recording technology must develop in step. The core of video recording technology is video coding compression: raw video data of large volume is compressed into coded bitstream data of small volume without noticeably degrading the visual impression. The currently popular video coding compression formats are:
a) H.264: a video compression standard based on the discrete cosine transform, built on MPEG-4 technology and mainly used for coding and compressing moving images. Compared with the MPEG-2 standard it provides better image quality at the same bandwidth, and it defines a Network Abstraction Layer (NAL) so that H.264-coded files can be transmitted easily over a network; H.264 also provides fault-tolerance mechanisms against packet loss in unstable network environments.
b) H.265: a newer video coding standard established after H.264. It retains some H.264 techniques while improving related technologies to optimize the trade-off among bitstream size, coding quality, latency, and algorithm complexity; 720p high-definition video can be transmitted at rates below 1-2 Mbps.
c) JPEG 2000: an image compression standard based on the wavelet transform. It supports lossless compression, achieves a better compression ratio, and has great advantages for compressing large-resolution still images where high image quality is required.
The audio/video recording flow in the airborne integrated data recording device is shown in figure 1. A coding compression algorithm is selected according to the input audio/video type; the coded audio/video files are packaged in the processor according to a predefined file format and finally written to a storage medium. When the recorded audio/video needs to be reviewed and analyzed, the files on the storage medium are read to a PC or ground playback equipment through a dedicated tool and played back with a player tool. This playback-analysis process has the following defects:
1. The audio/video data are stored on the medium as files; with such "after-the-fact analysis", the audio/video coding process cannot be tracked in real time and the quality of the video coding cannot be analyzed and judged in real time.
2. The playing tool residing on the PC or ground playback equipment cannot handle all three formats (H.264, H.265, and JPEG 2000), nor can it synchronize multiple audio/video channels.
3. The quality of the video image can only be judged by subjective visual perception; objective quantitative evaluation is difficult.
Disclosure of Invention
Addressing the defects of playback analysis of audio/video recordings in integrated data recording equipment, the invention aims to provide a video coding real-time evaluation and playing tool that decodes and plays video coded data in real time on a PC or ground playback equipment. It avoids the cumbersome after-the-fact procedure of reading a storage medium for playback, improves ease of use, makes the video coding process easy to monitor, and improves testability.
A video coding real-time evaluation and playing tool, comprising the following modules: a data acquisition module, an unpacking module, a file storage module, a distribution module, a decoding module, an audio/video synchronization module, a display and play module, and a task module;
a data acquisition module: establishing a data transmission channel with the comprehensive data recording equipment, collecting audio and video data packets on the data transmission channel in real time, and caching the audio and video data packets to a first memory cache region;
an unpacking module: analyzing the audio and video data packet on the first memory cache region to obtain a channel number, a data type, a timestamp, a coding mode and an original data frame;
a file storage module: the parsed valid data is duplicated into two identical copies; one copy is packed in a defined format into a file, named by channel and time, and stored on a large-capacity non-volatile storage medium; the other copy is passed to the distribution module;
a distribution module: caching the analyzed effective data into a second memory cache region;
a decoding module: decoding the original data frame of the second memory cache region, and caching the decoded audio and video data, the channel number, the data type, the timestamp and the coding mode in a third memory cache region;
the audio and video synchronization module: according to the time stamp of the audio and video data in the third memory cache region, reading the audio and video data from the third memory cache region in a time sequence and storing the audio and video data in a fourth memory cache region;
the display and play module: sequentially reading the audio and video from the fourth memory buffer area for playing;
a task module: a bridge connecting the modules; it implements inter-module communication and handles the business logic, and, acting as a management and control unit, monitors the state of each module.
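As a toy illustration, the module chain above can be wired through shared buffers, with the task module's monitoring role represented by a per-stage status map. All names here (`Pipeline`, `run_stage`) are invented for this sketch and are not the patent's actual software interfaces.

```python
from queue import Queue

class Pipeline:
    """Toy wiring of the claimed module chain: each stage reads from the
    previous memory buffer and writes to the next; the task module's view
    is approximated by a stats dict updated after every stage run."""
    def __init__(self):
        # Stand-ins for the first..fourth memory buffers of the claims.
        self.buffers = {k: Queue() for k in ("first", "second", "third", "fourth")}
        self.stats = {}  # task-module monitoring: items produced per stage

    def acquire(self, packet):
        """Data acquisition module: cache a raw packet in the first buffer."""
        self.buffers["first"].put(packet)

    def run_stage(self, name, src, dst, fn):
        """Generic stage driver: drain src buffer, transform, fill dst buffer."""
        while not self.buffers[src].empty():
            self.buffers[dst].put(fn(self.buffers[src].get()))
        self.stats[name] = self.buffers[dst].qsize()

# Demonstration: one packet flows from acquisition through a trivial "unpack".
pipe = Pipeline()
pipe.acquire(b"pkt")
pipe.run_stage("unpack", "first", "second", lambda d: d.upper())
```

In the real tool each stage would run in its own thread and the transforms would be the unpacking, decoding, and synchronization operations described below.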
Preferably, the second memory buffer area is divided into a plurality of small buffer areas, and the distribution module stores the valid data into the respective corresponding small buffer areas according to the analyzed data type and the encoding mode.
Preferably, the decoding module comprises CPU soft decoding and GPU hard decoding: CPU soft decoding decodes video data frames through the CPU's registers, instruction fetch, and instruction decode, while GPU hard decoding uses the dedicated video decoding circuits in the graphics processor, rendering YUV data directly, buffering multiple frames of images, and adjusting the buffered-frame-count, GOP size, and build-option parameters.
Preferably, for multiple channels of audio and video, the audio/video synchronization module records the timestamp of each channel's audio/video frames, compares the timestamps across channels, and starts a shared virtual time axis; synchronization on the virtual time axis takes the earliest timestamp as reference, and each channel then reads its audio/video frames from the third memory buffer into the fourth memory buffer in time order according to its time codes on the virtual time axis.
Preferably, the tool further comprises an image evaluation module, which evaluates the video image by obtaining data such as the video width and height, real-time frame rate, bit rate, GOP, frame count, PSNR, and SSIM from the audio/video data in the third memory buffer and stores the evaluation results in a fifth memory buffer; the display and play module reads the evaluation results from the fifth memory buffer for display.
Preferably, the display and play module uses OpenGL to display video: it creates a display window, sets a drawing function and a timer, and performs drawing with textures directly on the graphics card using the GPU, so that multiple video channels and the evaluation results are played and displayed synchronously.
The invention has the beneficial effects that:
1. The tool provided by the invention can decode and play video coded data in real time on a PC or ground playback equipment, avoiding the cumbersome after-the-fact procedure of reading the storage medium for playback; it improves ease of use, makes the video coding process easy to monitor, and improves testability.
2. The tool supports playing data in all three video coding compression formats, H.264, H.265, and JPEG 2000, and is highly versatile.
3. The tool supports simultaneous, synchronized playback of multiple audio/video channels, suits a variety of applications, and has good engineering applicability.
4. The tool can perform objective, quantitative quality evaluation of bitstreams in the H.264 and H.265 coding compression formats. The PSNR and SSIM indices accurately reflect the quality of the compressed video coding, overcoming the subjectivity of human judgment in video quality evaluation and supporting the development of video coding toward low bit rates and high definition; the tool has a notable market prospect and good economic benefit.
Drawings
Fig. 1 is a flowchart of audio-video recording processing in the integrated data recording device.
FIG. 2 is a block diagram of a video coding real-time evaluation and playback tool.
Fig. 3 is a block diagram of the overall architecture of real-time decoding and playing software.
FIG. 4 is a block diagram of the overall architecture of the real-time objective analysis software.
Fig. 5 is a real-time decoding playing software workflow.
Fig. 6 shows a method for distinguishing video types.
Fig. 7 is a flowchart of decoding data in three formats, h.264, h.265, and JPEG 2000.
Fig. 8 is a schematic diagram of performing multi-channel synchronous playback on the time axis.
FIG. 9 is a real-time objective analysis software workflow.
FIG. 10 shows an example of the real-time decoding and playing software and the real-time objective analysis software.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to fig. 2, the video coding real-time evaluation and playing tool of this embodiment receives audio/video ES streams or TS streams from the integrated data recording equipment through a gigabit Ethernet interface and decodes and analyzes them in real time, instead of "post-processing" audio/video data read back from a storage medium. The tool consists of an equipment host and supporting software. The equipment host comprises dedicated ground playback equipment (running the Windows operating system) and a display, for example a "Dell XPS8920-R19N8 desktop host (Windows 10, i7-7700, 32 GB RAM, 512 GB + 2 TB storage, GTX 1060 6 GB) with a 34-inch Dell U3415W IPS curved display". The supporting software is installed and runs on this hardware and comprises the data acquisition module, unpacking module, file storage module, distribution module, decoding module, audio/video synchronization module, image evaluation module, display and play module, and task module.
A data acquisition module: it receives a hardware IP address set by the user on the user interface; the address must be on the same network segment as the integrated data recording equipment. After establishing a data transmission channel with the integrated data recording equipment according to the corresponding network transport protocol (e.g. UDP/TCP/RTP), port address, and related information, it collects the audio/video data packets on the channel in real time and caches them in the first memory buffer.
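A minimal sketch of the UDP case of this acquisition step, assuming one audio/video data packet per datagram; the loopback demonstration, port choice, and the fake H.264 payload are illustrative assumptions, not the patent's actual protocol.

```python
import socket

def acquire_packets(sock, count, buffer):
    """Data acquisition sketch: read UDP datagrams from an already-bound
    socket and append them to an in-memory list standing in for the
    'first memory buffer'."""
    for _ in range(count):
        data, _addr = sock.recvfrom(65536)  # one A/V data packet per datagram
        buffer.append(data)
    return buffer

# Loopback demonstration: the recorder side is simulated by a second socket.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))   # ephemeral port on the same "network segment"
rx.settimeout(2.0)
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"\x00\x00\x00\x01\x67", rx.getsockname())  # fake H.264 packet
first_buffer = acquire_packets(rx, 1, [])
tx.close()
rx.close()
```

A real implementation would loop forever on a dedicated thread and hand the buffer to the unpacking module.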
An unpacking module: it first checks whether the audio/video data packet in the first memory buffer is valid; if valid, it parses the packet and sends the parse result to the file storage module.
An audio/video data packet generally includes a data channel number, a data type, a timestamp, a coding mode, an original data frame, and so on. Taking an H.264 elementary stream as an example, the packet comprises six parts: data channel number, data type, timestamp, start code, NALU header, and NALU payload. Parsing the packet means analyzing the bitstream data structure and determining the key content information contained in the bitstream data.
A file storage module: the parsed valid data is duplicated into two identical copies. One copy is packed in a defined format (for example, by adding file header, video header, audio header, and timestamp information) into a file, which is then named by channel and time and stored on a large-capacity non-volatile storage medium; the other copy is passed to the distribution module.
a distribution module: and storing the analyzed effective data into a second memory cache region. And dividing the second memory cache region into a plurality of small cache regions, and storing the effective data into the respective corresponding small cache regions according to the analyzed data types. For example, the audio data is cached in an audio buffer, the video data is further classified according to the encoding mode, the H.264 type is cached in an H.264 buffer, the H.265 type is cached in an H.265 buffer, and the JPEG2000 type is cached in a JPEG2000 buffer. Fig. 6 shows a method for distinguishing video types:
The video coding data in each buffer is organized in a specific format: each frame of data is a network abstraction layer unit (NALU). In H.264 and H.265 data frames, each frame is preceded by a 0x00000001 separator followed by the NALU type, and H.264 and H.265 bitstream data can be further distinguished by checking the range of the NALU type. The frame header of JPEG 2000 image frame data is the SOC marker 0xFF4F followed by the SIZ marker 0xFF51; the start of a JPEG 2000 frame can be recognized by this marker.
JPEG 2000 data frames are identified by the separator information 0xFF4FFF51 and placed into the JPEG 2000 buffer. H.264 and H.265 data frames are identified by the 0x00000001 separator and then by the NALU type: if 0 < (type & 0x1F) < 22, the frame is an H.264 frame; if 0 < ((type >> 1) & 0x3F) < 47, the frame is an H.265 frame. H.264 frames are cached in the H.264 buffer and H.265 frames in the H.265 buffer.
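This frame-type test can be sketched as follows. The byte layout (4-byte separator followed by the NALU type byte) and the ordering of the two range tests mirror the description above; in practice the coding mode would also be cross-checked against the packet header, since the two ranges overlap for some type values.

```python
def classify_frame(frame: bytes) -> str:
    """Distinguish H.264, H.265, and JPEG 2000 frames by their leading
    bytes, per the separator/marker rules described in the text."""
    if frame[:4] == b"\xFF\x4F\xFF\x51":         # JPEG 2000 SOC + SIZ markers
        return "jpeg2000"
    if frame[:4] == b"\x00\x00\x00\x01":         # Annex-B start code
        nalu_type = frame[4]
        if 0 < (nalu_type & 0x1F) < 22:          # H.264: type in bits 0-4
            return "h264"
        if 0 < ((nalu_type >> 1) & 0x3F) < 47:   # H.265: type in bits 1-6
            return "h265"
    return "unknown"
```

For example, an H.264 SPS starts with header byte 0x67 (type 7), while an H.265 VPS starts with 0x40 (type 32).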
A decoding module: it decodes the audio/video data in the second memory buffer and caches the decoded audio/video in the third memory buffer. The decoding module comprises CPU soft decoding and GPU hard decoding. The CPU, an indispensable hardware processor in the PC, excels at general-purpose computation: through the CPU instruction set, the coded video image is treated as an ordinary data type and processed by the CPU, finally yielding the decoded video image. The GPU is the graphics processor dedicated to graphics tasks in a PC; one of its key strengths is parallel processing. For single-instruction-multiple-data (SIMD) workloads, where the computational load far exceeds the demands of data scheduling and transfer, the GPU's parallel processing efficiency is far higher than that of a conventional CPU. The GPU contains many dedicated video decoding circuits and can perform several hard decodes at once, with high decoding efficiency and fast execution; once the parameters required for decoding are set, GPU hard decoding completes the decoding of the audio/video data.
The audio data may be soft decoded by the CPU. For CPU soft decoding, H.264, H.265, and JPEG 2000 all go through the same register, instruction-fetch, and instruction-decode process and differ only in the decoding algorithm; for GPU decoding, the process runs in the GPU's general decoding circuits and differs only in the decoding logic. Fig. 7 is a flowchart of decoding the three formats H.264, H.265, and JPEG 2000. Techniques such as the inverse integer DCT of the difference image, the inverse two-dimensional wavelet transform, high-frequency suppression with low-frequency enhancement, and intra-frame residual restoration are adopted, and the video is decoded with high fidelity and high time efficiency through algorithm stages such as intra prediction, inter prediction, transform, quantization, deblocking filtering, and entropy coding. During decoding, multi-frame buffering is used, and the buffered-frame-count, GOP size, and build-option parameters are adjusted appropriately to reduce decoding delay and improve the visual effect.
The audio/video synchronization module: using a PTS-reference scheme, it reads audio/video frames from the third memory buffer and stores them in the fourth memory buffer in time order according to each frame's PTS timestamp. When multiple channels of audio and video are played, the PTS timestamps of each channel's frames are recorded and compared across channels, a shared virtual time axis is started, and synchronization on the virtual time axis takes the earliest PTS timestamp as reference. Each channel then reads its audio/video frames from the third memory buffer into the fourth memory buffer in time order according to the PTS time codes on the virtual time axis: frames whose time has arrived are read, and frames whose time has not yet arrived are held back. Multithreading is used to raise hardware resource utilization and system efficiency, achieving synchronized playback of the multiple audio/video channels. As shown in fig. 8, suppose there are 4 video channels and 2 audio channels with start PTS times: video 0 at 9:00, video 1 at 9:13, video 2 at 8:50, video 3 at 9:28, audio 0 at 9:03, and audio 1 at 9:20. The PTS time axis takes video 2's PTS time 8:50 as its origin and extends forward at clock precision: video 2 starts playing at 8:50, video 0 at 9:00, audio 0 at 9:03, video 1 at 9:13, audio 1 at 9:20, and video 3 at 9:28, each in turn. To address the high resource demand and resource contention caused by synchronized multi-channel playback, the playing software processes the stream data with multiple threads. Multithreading is a technique for executing several threads concurrently in software or hardware; a computer with hardware multithreading support can execute more than one thread at the same time, improving overall processing performance.
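The virtual-time-axis rule above reduces to anchoring every channel at its PTS offset from the earliest start PTS. A sketch, using minutes-since-midnight as a simplified stand-in for real PTS clock values (an assumption of this illustration):

```python
def playback_offsets(start_pts: dict) -> dict:
    """Compute each channel's start offset on the virtual time axis:
    the earliest start PTS (video 2 at 8:50 in the fig. 8 example)
    becomes offset 0, and every other channel starts at its distance
    from that anchor."""
    anchor = min(start_pts.values())
    return {ch: pts - anchor for ch, pts in start_pts.items()}

# The fig. 8 example, with H:MM times converted to minutes.
channels = {"video2": 8 * 60 + 50, "video0": 9 * 60, "audio0": 9 * 60 + 3,
            "video1": 9 * 60 + 13, "audio1": 9 * 60 + 20, "video3": 9 * 60 + 28}
offsets = playback_offsets(channels)  # video2 -> 0, video3 -> 38 minutes
```

A player thread per channel would then sleep until its offset elapses on the shared axis before moving frames from the third buffer to the fourth.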
An image evaluation module: it evaluates the video image by obtaining data such as the video width and height, real-time frame rate, bit rate, GOP, frame count, PSNR (peak signal-to-noise ratio), and SSIM (structural similarity index) from the audio/video data in the third memory buffer, and stores the evaluation results in the fifth memory buffer.
Video width and height information is embedded in the audio/video data stream. Taking an H.264 elementary stream as an example, the width and height are contained in the payload of the sequence parameter set NALU (header byte typically 0x67), from which the width and height of the video can be parsed;
The frame count is the number of valid video frames obtained; a counter is used, incremented by 1 each time a valid video frame is obtained;
The real-time frame rate is the frame count normalized over time, i.e. the number of frames counted per second;
The bit rate is the amount of coded video data consumed per unit time. A target value can be set through the task module, and the actual bit rate fluctuates around the target. Generally, the larger the bit rate, the clearer the video picture and the higher the image quality, but the lower the video compression ratio; the smaller the bit rate, the higher the compression ratio, but the lower the video image quality;
The GOP is the interval between two key I-frames of the coded video, an important index of video coding; it can be obtained by counting the number of video frames between two I-frames.
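The frame-count, frame-rate, bit-rate, and GOP statistics above can be sketched together. The input shapes (a list of frame-type letters over a measurement window) are assumptions of this illustration, not the patent's actual data structures:

```python
def stream_metrics(frame_types, window_seconds, bytes_consumed):
    """Compute frame count, real-time frame rate, bit rate, and GOP size
    from frames observed over one measurement window.
    frame_types: e.g. ["I", "P", "P", "P", "I", "P"]."""
    frame_count = len(frame_types)                  # counter of valid frames
    fps = frame_count / window_seconds              # real-time frame rate
    bitrate = bytes_consumed * 8 / window_seconds   # coded bits per second
    i_positions = [i for i, t in enumerate(frame_types) if t == "I"]
    # GOP = number of frames between two consecutive I-frames (first gap).
    gop = i_positions[1] - i_positions[0] if len(i_positions) > 1 else frame_count
    return frame_count, fps, bitrate, gop
```

For instance, six frames in two seconds with I-frames at positions 0 and 4 give a frame rate of 3 fps and a GOP of 4.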
PSNR is the most commonly used objective evaluation index for video images and is usually defined via the MSE, where MSE is the mean squared error between the current image X and the reference image Y and H, W are the height and width of the image; PSNR is then obtained by taking the logarithm of the MSE. The formulas for MSE and PSNR are:

$$\mathrm{MSE} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \big( X(i,j) - Y(i,j) \big)^2$$

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^n - 1)^2}{\mathrm{MSE}}$$

where n is the bit depth per pixel (typically 8, giving a peak value of 255).
PSNR values lie roughly in the range [0, 60] dB: above 40 dB indicates excellent image quality, 30-40 dB generally indicates good quality, 20-30 dB indicates poor quality, and below 20 dB is unacceptable. For color images, PSNR is generally calculated in one of two ways:
1) calculate the PSNR of the R, G, and B channels separately, then take the average;
2) convert the picture to YCbCr format and compute the PSNR of the Y (luminance) component only.
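The MSE/PSNR formulas above translate directly into code. A sketch for 8-bit grayscale images given as nested lists of pixel rows (the representation is an assumption of this illustration):

```python
import math

def psnr(x, y, peak=255.0):
    """PSNR of image x against reference y, via the MSE formula above."""
    h, w = len(x), len(x[0])
    mse = sum((x[i][j] - y[i][j]) ** 2
              for i in range(h) for j in range(w)) / (h * w)
    if mse == 0:
        return float("inf")  # identical images: no noise
    return 10 * math.log10(peak * peak / mse)
```

A uniform pixel error of 16 gray levels, for example, yields a PSNR of about 24 dB, in the "poor quality" band described above.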
SSIM is another objective evaluation index for video images. Its basic principle is that natural images are highly structured: there are strong correlations between adjacent pixels, and these correlations carry the structural information of the objects in the scene. Since the human visual system is highly adept at understanding images and extracting information from them, structural distortion is an important consideration when measuring image quality. The structural similarity of two images X, Y is compared along three dimensions: luminance l(x, y), contrast c(x, y), and structure s(x, y), and SSIM is a function of the three:

$$\mathrm{SSIM}(x, y) = f\big(l(x, y),\, c(x, y),\, s(x, y)\big)$$
Luminance is the brightness of the image or video. If image X has N pixels with pixel values $x_i$, the mean luminance $u_x$ of image X is:

$$u_x = \frac{1}{N} \sum_{i=1}^{N} x_i$$

The mean luminance $u_y$ of image Y is computed by the same formula as $u_x$, with $y_i$ the pixel values of image Y.

The luminance similarity of the two images X, Y is then:

$$l(x, y) = \frac{2 u_x u_y + C_1}{u_x^2 + u_y^2 + C_1}$$

where $C_1$ prevents the denominator from being zero and is generally calculated as:

$$C_1 = (K_1 L)^2$$

Note: $K_1 \ll 1$ is a constant, often 0.01; L is the dynamic range of the gray scale, often 255.
Contrast is the degree of change in brightness of an image or video, i.e. the standard deviation of the pixel values. The contrast σ_x of image X is calculated as:

σ_x = sqrt( (1 / (N − 1)) · Σ_{i=1}^{N} (x_i − u_x)² )

The contrast σ_y of image Y uses the same formula as σ_x. The contrast similarity of the two images X and Y is then:

c(x, y) = (2·σ_x·σ_y + C2) / (σ_x² + σ_y² + C2)

where C2 prevents the denominator from being zero and is generally calculated using the formula:

C2 = (K2·L)²

Note: K2 is a constant much smaller than 1, often 0.03; L is the dynamic range of the gray scale, often 255.
The structure is the overall similarity of an image or video. It is represented by the vector composed of all pixels of the image, with the influence of luminance and contrast removed, and is calculated as:

s(x, y) = (σ_xy + C3) / (σ_x·σ_y + C3)

where σ_xy is the covariance of X and Y:

σ_xy = (1 / (N − 1)) · Σ_{i=1}^{N} (x_i − u_x)·(y_i − u_y)

and C3 prevents the denominator from being zero, generally taken (in the standard convention) as:

C3 = C2 / 2
Combining the above formulas for luminance l(x, y), contrast c(x, y) and structure s(x, y), with C3 = C2/2, the final SSIM formula is:

SSIM(X, Y) = (2·u_x·u_y + C1)·(2·σ_xy + C2) / ((u_x² + u_y² + C1)·(σ_x² + σ_y² + C2))
the SSIM value range is [0,1], and the larger the value is, the better the image quality is shown to be.
The image evaluation module optimizes use of the CPU instruction set so that the 32-bit and 64-bit registers are divided into Z × 8-bit lanes, and Z threads compute PSNR and SSIM simultaneously, reducing the computation time by more than half.
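The parallel split can be illustrated by dividing the pixel array into Z stripes and accumulating each stripe's squared error in its own worker (a sketch only — in Python the GIL limits thread speedup for pure-Python loops, so the patent's actual gains would come from SIMD registers and native code; all names here are illustrative):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def _sse_stripe(x, y, lo, hi):
    """Sum of squared errors over one stripe (index range [lo, hi))."""
    return sum((x[i] - y[i]) ** 2 for i in range(lo, hi))

def psnr_parallel(x, y, workers=4, max_val=255.0):
    """Split the pixel array into `workers` stripes, accumulate the
    squared error of each stripe in its own thread, then combine."""
    n = len(x)
    step = (n + workers - 1) // workers
    bounds = [(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sse = sum(pool.map(lambda b: _sse_stripe(x, y, b[0], b[1]), bounds))
    if sse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / (sse / n))
```

The partial sums are associative, so the stripes can be combined in any order; the same decomposition applies to the per-pixel sums inside SSIM.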
The display and play module uses the OpenGL video display technology: it creates a display window, sets a drawing function and a timer, and completes drawing with textures directly on the graphics card using the GPU, ensuring that multiple video channels and the evaluation results are played and displayed synchronously.
Task management: serves as a bridge connecting all process modules, enabling inter-module communication and efficient handling of business logic while reducing the coupling between process modules. As the management control unit, it also monitors the state of every process, providing whole-process monitoring and handling for the playing software.
According to the above, the working flow of the real-time decoding and playing software is shown in fig. 5 and consists of the following steps:
a) Establish a data transmission channel with the comprehensive data recording equipment, collect audio and video data packets on the channel in real time, and cache them in the first memory cache region.
b) Parse the audio and video data packets in the first memory cache region to obtain the channel number, data type, timestamp, coding mode, original data frames, etc.
c) Duplicate the parsed valid data into two identical copies: one copy is packed in a fixed format into a file, named by channel and time, and stored on a large-capacity nonvolatile storage medium; the other copy is cached in the second memory cache region.
d) Decode the original data frames in the second memory cache region, and cache the decoded audio and video data, together with the channel number, data type, timestamp, coding mode, etc., in the third memory cache region.
e) According to the PTS timestamps of the audio and video data in the third memory cache region, read the data from the third memory cache region in time order and store it in the fourth memory cache region.
f) Read the audio and video from the fourth memory cache region in sequence and play them.
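The buffer-to-buffer flow of steps a)-f) can be modeled as a chain of bounded queues with one worker thread per stage (the queue names, payload fields, and the toy "decoder" are illustrative, not the patent's implementation):

```python
import queue
import threading

# The memory cache regions from the steps above, modeled as bounded queues.
raw_packets = queue.Queue(maxsize=64)     # first region: captured packets
parsed_frames = queue.Queue(maxsize=64)   # second region: demuxed raw frames
decoded_frames = queue.Queue(maxsize=64)  # third region: decoded frames + metadata
ordered_frames = queue.Queue(maxsize=64)  # fourth region: PTS-ordered frames

STOP = object()  # sentinel to shut the pipeline down

def unpack():
    while (pkt := raw_packets.get()) is not STOP:
        parsed_frames.put({"channel": pkt["channel"], "pts": pkt["pts"],
                           "payload": pkt["payload"]})
    parsed_frames.put(STOP)

def decode():
    while (frm := parsed_frames.get()) is not STOP:
        frm["decoded"] = frm["payload"].upper()  # stand-in for a real decoder
        decoded_frames.put(frm)
    decoded_frames.put(STOP)

def sync():
    pending = []
    while (frm := decoded_frames.get()) is not STOP:
        pending.append(frm)
    for frm in sorted(pending, key=lambda f: f["pts"]):  # reorder by PTS
        ordered_frames.put(frm)
    ordered_frames.put(STOP)

threads = [threading.Thread(target=t) for t in (unpack, decode, sync)]
for t in threads:
    t.start()
# Feed two packets out of presentation order
for pkt in [{"channel": 1, "pts": 2, "payload": "b"},
            {"channel": 1, "pts": 1, "payload": "a"}]:
    raw_packets.put(pkt)
raw_packets.put(STOP)
for t in threads:
    t.join()
played = []
while (frm := ordered_frames.get()) is not STOP:
    played.append(frm["pts"])
print(played)  # frames come out in presentation order: [1, 2]
```

The bounded queues provide the buffering and back-pressure between stages; a real player would stream continuously rather than draining each stage to a sentinel.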
The workflow of the real-time objective analysis software is shown in fig. 9; the specific flow is as follows:
a) Establish a data transmission channel with the comprehensive data recording equipment, collect audio and video data packets on the channel in real time, and cache them in the first memory cache region.
b) Parse the audio and video data packets in the first memory cache region to obtain the channel number, data type, timestamp, coding mode, original data frames, etc.
c) Duplicate the parsed valid data into two identical copies: one copy is packed in a fixed format into a file, named by channel and time, and stored on a large-capacity nonvolatile storage medium; the other copy is cached in the second memory cache region.
d) Decode the original data frames in the second memory cache region, and cache the decoded audio and video data, together with the channel number, data type, timestamp, coding mode, etc., in the third memory cache region.
e) Evaluate the video image by obtaining data such as the video bandwidth, real-time frame rate, bit rate, GOP, frame count, PSNR (peak signal-to-noise ratio) and SSIM (structural similarity) from the audio and video data in the third memory cache region, and store the evaluation results in the fifth memory cache region.
f) Read the evaluation results from the fifth memory cache region and display them.
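The real-time frame rate and bit rate mentioned in step e) can be derived from frame timestamps and sizes roughly like this (`stream_stats` and its tuple format are hypothetical):

```python
def stream_stats(frames):
    """Derive the real-time frame rate (fps) and bit rate (bits/s) from
    a list of (pts_seconds, frame_size_bytes) tuples covering one
    measurement window, as an evaluation module might do per refresh."""
    if len(frames) < 2:
        return 0.0, 0.0
    span = frames[-1][0] - frames[0][0]          # window duration in seconds
    fps = (len(frames) - 1) / span               # intervals per second
    bitrate = sum(size for _, size in frames) * 8 / span
    return fps, bitrate

# A 25 fps stream of 5000-byte frames over one second
frames = [(i / 25.0, 5000) for i in range(26)]
fps, bps = stream_stats(frames)
print(round(fps), round(bps))  # → 25 1040000
```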
Fig. 10 shows an example of using the real-time decoding and playing software and the real-time objective analysis software to compare and play two video channels: the original uncompressed video is displayed on top, the reference compressed video below, and the current frame count, PSNR value and SSIM value are displayed in the status box at the lower right.
Claims (6)
1. A video coding real-time evaluation and playback tool, comprising the following modules: data acquisition module, unpacking module, file storage module, distribution module, decoding module, audio and video synchronization module, show broadcast module and task module, its characterized in that:
a data acquisition module: establishing a data transmission channel with the comprehensive data recording equipment, collecting audio and video data packets on the data transmission channel in real time, and caching the audio and video data packets to a first memory cache region;
an unpacking module: analyzing the audio and video data packet on the first memory cache region to obtain a channel number, a data type, a timestamp, a coding mode and an original data frame;
a file storage module: the analyzed effective data is copied to form two identical data, wherein one data is packed according to a certain format to form a file, named by channel and time and stored in a large-capacity nonvolatile storage medium; the other part is input into a distribution module;
a distribution module: caching the analyzed effective data into a second memory cache region;
a decoding module: decoding the original data frame of the second memory cache region, and caching the decoded audio and video data, the channel number, the data type, the timestamp and the coding mode in a third memory cache region;
the audio and video synchronization module: according to the time stamp of the audio and video data in the third memory cache region, reading the audio and video data from the third memory cache region in a time sequence and storing the audio and video data in a fourth memory cache region;
the display and play module: sequentially reading the audio and video from the fourth memory buffer area for playing;
task management: the bridge is connected with each module, realizes the mutual communication between the modules and processes the service logic, and simultaneously, is used as a management control unit to monitor the state of each module.
2. The tool of claim 1, wherein: the second memory buffer area is divided into a plurality of small buffer areas, and the distribution module stores the effective data into the corresponding small buffer areas according to the analyzed data type and the encoding mode.
3. The tool of claim 1, wherein: the decoding module comprises a CPU soft decoding module and a GPU hard decoding module; the CPU soft decoding module decodes video data frames by operating the registers, instruction fetch circuit and instruction decode circuit of the CPU, and the GPU hard decoding module uses a dedicated video decoding circuit in the graphics processor to render YUV data directly, cache multi-frame images, and decode video data frames by adjusting the cache frame number parameter, the GOP size parameter and the compile option parameters.
4. The tool of claim 1, wherein: and aiming at multiple paths of audio and video, the audio and video synchronization module records the time stamps of each path of audio and video frames, the time stamps of the multiple paths of audio and video frames are compared with each other, a virtual time axis is synchronously started, synchronization is carried out on the virtual time axis by taking the first time stamp as a reference, and then each path of audio and video frames reads the audio and video frame cache from the third memory cache region to the fourth memory cache region in time sequence on the virtual time axis according to time codes.
5. The tool of claim 1, wherein: the tool further comprises an image evaluation module, which evaluates the video image by obtaining data such as the video bandwidth, real-time frame rate, bit rate, GOP, frame count, PSNR and SSIM from the audio and video data in the third memory cache region and stores the evaluation result in the fifth memory cache region, and the display and play module reads the evaluation result from the fifth memory cache region for display.
6. The tool of claim 1, wherein the display and play module uses OpenGL to display video: it creates a display window, sets a drawing function and a timer, and completes drawing with textures directly on the graphics card using the GPU, so that multiple video channels and the evaluation results are played and displayed synchronously.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010299483.5A CN111510766A (en) | 2020-04-16 | 2020-04-16 | Video coding real-time evaluation and playing tool |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111510766A true CN111510766A (en) | 2020-08-07 |
Family
ID=71864801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010299483.5A Pending CN111510766A (en) | 2020-04-16 | 2020-04-16 | Video coding real-time evaluation and playing tool |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111510766A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103957371A (en) * | 2013-11-29 | 2014-07-30 | 中国航空无线电电子研究所 | Intelligent and visual video recording and controlling device |
CN204258983U (en) * | 2014-12-23 | 2015-04-08 | 北京国基科技股份有限公司 | Video and audio tape deck |
CN104661021A (en) * | 2015-02-12 | 2015-05-27 | 国家电网公司 | Quality assessment method and device for video streaming |
US20160176538A1 (en) * | 2011-02-08 | 2016-06-23 | Joseph Bekanich | Smart avionics system |
CN107277614A (en) * | 2017-06-27 | 2017-10-20 | 深圳市爱培科技术股份有限公司 | Audio and video remote player method, storage device and the mobile terminal of drive recorder |
CN110519641A (en) * | 2019-09-10 | 2019-11-29 | 深圳市同洲电子股份有限公司 | A kind of the video fusion transmission exchange system and method for multi-source multi-protocols |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112165623A (en) * | 2020-09-30 | 2021-01-01 | 广州光锥元信息科技有限公司 | Soft and hard combined audio and video coding and decoding device |
CN112272316A (en) * | 2020-10-29 | 2021-01-26 | 广东博华超高清创新中心有限公司 | Multi-transmission code stream synchronous UDP distribution method and system based on video display timestamp |
CN112272316B (en) * | 2020-10-29 | 2022-06-24 | 广东博华超高清创新中心有限公司 | Multi-transmission code stream synchronous UDP distribution method and system based on video display timestamp |
CN112804530A (en) * | 2020-12-31 | 2021-05-14 | 北京睿芯高通量科技有限公司 | System and method for evaluating video coding quality |
CN114390336A (en) * | 2021-12-13 | 2022-04-22 | 百度在线网络技术(北京)有限公司 | Video decoding method and device, electronic equipment and readable storage medium |
CN114866792A (en) * | 2022-03-31 | 2022-08-05 | 广州方硅信息技术有限公司 | Live video quality detection method and device, equipment and medium thereof |
CN114866792B (en) * | 2022-03-31 | 2024-07-23 | 广州方硅信息技术有限公司 | Live video quality detection method and device, equipment and medium thereof |
CN114979795A (en) * | 2022-06-01 | 2022-08-30 | 南京甄视智能科技有限公司 | Multi-channel video stream display system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111510766A (en) | Video coding real-time evaluation and playing tool | |
US20210350828A1 (en) | Reference and Non-Reference Video Quality Evaluation | |
WO2023134523A1 (en) | Content adaptive video coding method and apparatus, device and storage medium | |
WO2021114846A1 (en) | Video noise cancellation processing method and apparatus, and storage medium | |
US20110255589A1 (en) | Methods of compressing data and methods of assessing the same | |
WO2021068598A1 (en) | Encoding method and device for screen sharing, and storage medium and electronic equipment | |
US8396122B1 (en) | Video codec facilitating writing an output stream in parallel | |
CN112448962B (en) | Video anti-aliasing display method and device, computer equipment and readable storage medium | |
Gunawan et al. | Efficient reduced-reference video quality meter | |
CN113784118A (en) | Video quality evaluation method and device, electronic equipment and storage medium | |
CN118055235B (en) | Video intelligent compression method based on image analysis | |
CN117176955A (en) | Video encoding method, video decoding method, computer device, and storage medium | |
WO2023017928A1 (en) | Video recording method and device | |
CN112449182A (en) | Video encoding method, device, equipment and storage medium | |
US10841591B2 (en) | Systems and methods for deferred post-processes in video encoding | |
WO2022141515A1 (en) | Video encoding method and device and video decoding method and device | |
WO2023142715A1 (en) | Video coding method and apparatus, real-time communication method and apparatus, device, and storage medium | |
JP2000115766A (en) | Moving picture reproduction analyzer | |
Akramullah et al. | Video quality metrics | |
CN111212288B (en) | Video data encoding and decoding method and device, computer equipment and storage medium | |
Renambot et al. | Real-time compression for high-resolution content | |
CN109963158A (en) | A kind of high definition video decoding method based on GPU parallel computation | |
US20150078433A1 (en) | Reducing bandwidth and/or storage of video bitstreams | |
CN115396697B (en) | Video data transmission method, system and storage device | |
Pohl et al. | Advanced In-Home Streaming To Mobile Devices and Wearables. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20200807 |