CN108337504A - A method and device for evaluating video quality - Google Patents

A method and device for evaluating video quality

Info

Publication number
CN108337504A
CN108337504A
Authority
CN
China
Prior art keywords
video frame
view
stereoscopic video
set number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810088362.9A
Other languages
Chinese (zh)
Inventor
陈志波
周玮
李卫平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China (USTC)
Priority claimed from CN201810088362.9A
Publication of CN108337504A
Legal status: Pending

Classifications

    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 17/00 Diagnosis, testing or measuring for television systems or their details
                • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
                    • H04N 2013/0074 Stereoscopic image analysis
    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 7/00 Image analysis
                    • G06T 7/0002 Inspection of images, e.g. flaw detection
                • G06T 2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T 2207/10 Image acquisition modality
                        • G06T 2207/10016 Video; Image sequence
                        • G06T 2207/10021 Stereoscopic video; Stereoscopic image sequence
                    • G06T 2207/20 Special algorithmic details
                        • G06T 2207/20081 Training; Learning
                        • G06T 2207/20084 Artificial neural networks [ANN]
                    • G06T 2207/30 Subject of image; Context of image processing
                        • G06T 2207/30168 Image quality inspection

Abstract

The present invention proposes a method and device for evaluating video quality. A method for evaluating video quality includes: extracting from a stereoscopic video a set number of first-view video frames and a set number of second-view video frames corresponding to the set number of first-view video frames; and performing quality evaluation on the set number of first-view video frames and the set number of second-view video frames using a trained, preset two-path deep convolutional neural network, to obtain the quality evaluation result of the stereoscopic video. This stereoscopic video quality evaluation procedure requires no manual extraction of video features, achieves fully automatic video quality evaluation, and improves the accuracy and degree of automation of video quality evaluation.

Description

A method and device for evaluating video quality
Technical field
The present invention relates to the field of machine learning, and in particular to a method and device for evaluating video quality.
Background technology
With the rapid development of stereoscopic display devices, we can watch stereoscopic video on a three-dimensional television (3DTV). The left and right views of a stereoscopic video can be fused to produce depth perception, giving viewers a degree of immersion. However, throughout the stereoscopic video processing chain, e.g. during acquisition, compression, transmission, reconstruction, and display, the original stereoscopic video suffers various quality impairments. Designing an accurate algorithm to automatically evaluate the quality of experience of stereoscopic video is therefore vital to the whole stereoscopic video processing chain.
Most current research on stereoscopic video quality evaluation is based on full-reference algorithms, i.e., manual features are extracted from the original pristine stereoscopic video and the distorted stereoscopic video to evaluate the quality of the distorted stereoscopic video. In most practical settings, however, the undistorted stereoscopic video that full-reference algorithms require is unavailable; moreover, manually extracted features are insufficiently accurate and robust, require extensive prior knowledge, and leave the evaluation procedure insufficiently automated.
Summary of the invention
To remedy the above defects and shortcomings of the prior art, the present invention proposes the following technical solutions:
A method for evaluating video quality, comprising:
extracting from a stereoscopic video a set number of first-view video frames, and a set number of second-view video frames corresponding to the set number of first-view video frames;
performing quality evaluation on the set number of first-view video frames and the set number of second-view video frames using a trained, preset two-path deep convolutional neural network, to obtain the quality evaluation result of the stereoscopic video.
Preferably, after extracting from the stereoscopic video the set number of first-view video frames and the set number of second-view video frames, the method further comprises:
dividing, according to a preset image division method, each of the set number of first-view video frames and the set number of second-view video frames into image patches.
Preferably, performing quality evaluation on the set number of first-view video frames and the set number of second-view video frames using the trained, preset two-path deep convolutional neural network to obtain the quality evaluation result of the stereoscopic video comprises:
using the trained, preset two-path deep convolutional neural network, performing quality evaluation on each group of image patch pairs in each first-view video frame and its corresponding second-view video frame, to obtain a quality score for each group of patch pairs in each first-view video frame and its corresponding second-view video frame;
spatially averaging the quality scores of the patch pairs in each first-view video frame and its corresponding second-view video frame, to obtain a quality score for each first-view video frame and its corresponding second-view video frame;
temporally averaging the quality scores of the first-view video frames and their corresponding second-view video frames, to obtain the quality evaluation result of the stereoscopic video.
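The spatial-then-temporal averaging above can be sketched as follows. The scores are invented dummies, and the network that would produce the patch-pair scores is not modeled here; only the aggregation step is shown.

```python
# Sketch of the score aggregation: per-frame patch-pair scores are
# spatially averaged into frame-pair scores, which are temporally
# averaged into the video's quality score. Input scores are dummies.
def video_quality(patch_scores_per_frame):
    """patch_scores_per_frame: list (over key frames) of lists of
    patch-pair quality scores for one first-/second-view frame pair."""
    # spatial average within each frame pair
    frame_scores = [sum(s) / len(s) for s in patch_scores_per_frame]
    # temporal average across key frames
    return sum(frame_scores) / len(frame_scores)

scores = [[3.0, 5.0], [4.0, 4.0], [2.0, 6.0]]  # 3 key frames, 2 patch pairs each
print(video_quality(scores))  # 4.0
```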
Preferably, the training process of the preset two-path deep convolutional neural network comprises:
repeating the following operations until the computed quality evaluation loss is below a set threshold:
inputting a first-view video frame of a preset stereoscopic video and the second-view video frame corresponding to that first-view video frame into the preset two-path deep convolutional neural network, to obtain a video frame quality score for the stereoscopic video;
computing the quality evaluation loss from the video frame quality score and the standard quality score of the video frame;
if the quality evaluation loss is not below the set threshold, using the quality evaluation loss to perform a backward parameter update of the preset two-path deep convolutional neural network.
Preferably, extracting from the stereoscopic video the set number of first-view video frames and the set number of second-view video frames comprises:
extracting a set number of temporal key frames from the first-view video sequence of the stereoscopic video to obtain the set number of first-view video frames, and extracting from the second-view video sequence of the stereoscopic video the second-view video frames corresponding to the set number of first-view video frames, to obtain the set number of second-view video frames.
A device for evaluating video quality, comprising:
a video frame extraction unit, configured to extract from a stereoscopic video a set number of first-view video frames, and a set number of second-view video frames corresponding to the set number of first-view video frames;
a video quality evaluation unit, configured to perform quality evaluation on the set number of first-view video frames and the set number of second-view video frames using a trained, preset two-path deep convolutional neural network, to obtain the quality evaluation result of the stereoscopic video.
Preferably, the device further comprises:
a video frame division unit, configured to divide, according to a preset image division method, each of the set number of first-view video frames and the set number of second-view video frames into image patches.
Preferably, when performing quality evaluation on the set number of first-view video frames and the set number of second-view video frames using the trained, preset two-path deep convolutional neural network to obtain the quality evaluation result of the stereoscopic video, the video quality evaluation unit is specifically configured to:
using the trained, preset two-path deep convolutional neural network, perform quality evaluation on each group of image patch pairs in each first-view video frame and its corresponding second-view video frame, to obtain a quality score for each group of patch pairs in each first-view video frame and its corresponding second-view video frame;
spatially average the quality scores of the patch pairs in each first-view video frame and its corresponding second-view video frame, to obtain a quality score for each first-view video frame and its corresponding second-view video frame;
temporally average the quality scores of the first-view video frames and their corresponding second-view video frames, to obtain the quality evaluation result of the stereoscopic video.
Preferably, the video quality evaluation unit is further configured to train the preset two-path deep convolutional neural network;
when training the preset two-path deep convolutional neural network, the video quality evaluation unit is specifically configured to:
repeat the following operations until the computed quality evaluation loss is below a set threshold:
input a first-view video frame of a preset stereoscopic video and the second-view video frame corresponding to that first-view video frame into the preset two-path deep convolutional neural network, to obtain a video frame quality score for the stereoscopic video;
compute the quality evaluation loss from the video frame quality score and the standard quality score of the video frame;
if the quality evaluation loss is not below the set threshold, use the quality evaluation loss to perform a backward parameter update of the preset two-path deep convolutional neural network.
Preferably, when extracting from the stereoscopic video the set number of first-view video frames and the set number of second-view video frames, the video frame extraction unit is specifically configured to:
extract a set number of temporal key frames from the first-view video sequence of the stereoscopic video to obtain the set number of first-view video frames, and extract from the second-view video sequence of the stereoscopic video the second-view video frames corresponding to the set number of first-view video frames, to obtain the set number of second-view video frames.
When evaluating the video quality of a stereoscopic video, the present invention first extracts a set number of first-view video frames and second-view video frames from the stereoscopic video, then inputs the extracted first-view and second-view video frames into the trained two-path deep convolutional neural network for quality evaluation, obtaining the quality evaluation result of the stereoscopic video. The proposed method for evaluating video quality realizes video quality evaluation of stereoscopic video through a trained deep convolutional neural network; this stereoscopic video quality evaluation procedure requires no manual extraction of video features, achieves fully automatic video quality evaluation, and improves the accuracy and degree of automation of video quality evaluation.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flow diagram of a method for evaluating video quality provided by an embodiment of the present invention;
Fig. 2 is an architecture diagram of the two-path deep convolutional neural network provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of training the two-path deep convolutional neural network provided by an embodiment of the present invention;
Fig. 4 is a flow diagram of another method for evaluating video quality provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of extracting key video frames from a video sequence provided by an embodiment of the present invention;
Fig. 6 is a structural diagram of a device for evaluating video quality provided by an embodiment of the present invention;
Fig. 7 is a structural diagram of another device for evaluating video quality provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the embodiments of the present invention are suitable for application scenarios of evaluating stereoscopic video quality. Using the technical solutions of the embodiments of the present invention, the quality of a stereoscopic video can be evaluated automatically.
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Evidently, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention discloses a method for evaluating video quality; referring to Fig. 1, the method comprises:
S101: extracting from a stereoscopic video a set number of first-view video frames, and a set number of second-view video frames corresponding to the set number of first-view video frames;
Specifically, a stereoscopic video is composed of planar video sequences for the left and right views. Any stereoscopic video therefore contains left-view and right-view video sequences, i.e., video sequences from two views; the embodiments of the present invention distinguish these as the first-view video sequence and the second-view video sequence.
Evaluating the video quality of a stereoscopic video in fact means evaluating the video quality of each of its video frames. In an actual video evaluation procedure, the video quality of the whole stereoscopic video can be determined by evaluating the video quality of each video frame of each view's video sequence.
In the embodiments of the present invention, a certain number of key video frames are extracted from the video sequence of each view of the stereoscopic video; the video quality of the extracted key frames is evaluated to represent the video quality of the whole view's video sequence, and the video quality of the extracted key frames is further combined to determine the video quality of the whole stereoscopic video.
S102: performing quality evaluation on the set number of first-view video frames and the set number of second-view video frames using a trained, preset two-path deep convolutional neural network, to obtain the quality evaluation result of the stereoscopic video.
Specifically, the preset two-path deep convolutional neural network is a two-path deep convolutional neural network specially constructed by the embodiment of the present invention for evaluating the video quality of the left-view and right-view video sequences of a stereoscopic video. After the two-path deep convolutional neural network is built, it is trained with a large amount of sample data: it is trained to perform quality evaluation on input video sequences, and its internal parameters are adjusted according to the evaluation results, making the evaluation more accurate.
When the evaluation accuracy of the trained two-path deep convolutional neural network meets the requirement, the network is used to perform quality evaluation on the first-view and second-view video frames of the stereoscopic video, obtaining a quality evaluation result for the whole stereoscopic video.
The technical solution of the embodiment of the present invention performs quality evaluation on the left-view and right-view videos of a stereoscopic video using a trained two-path deep convolutional neural network, realizing effective evaluation of stereoscopic video quality. This evaluation procedure requires no manual extraction of video features, achieves fully automatic video quality evaluation, and improves the accuracy and degree of automation of video quality evaluation.
It can be appreciated that the technical solution of the embodiment of the present invention realizes quality evaluation of stereoscopic video through a two-path deep convolutional neural network. The evaluation procedure needs no large amount of manual feature extraction based on prior knowledge; feature learning is carried out entirely by the designed deep neural network, so that no-reference quality evaluation of stereoscopic video is performed.
The structural framework of the two-path deep convolutional neural network used by the embodiment of the present invention is shown in Fig. 2. In the two-path network of Fig. 2, the two deep convolutional neural network paths have identical structures and identical parameters; they are completely identical deep convolutional neural networks, obtained by modifying AlexNet. From input to output, each path consists of, in order: a convolutional layer, a pooling layer, a convolutional layer, a pooling layer, three convolutional layers, a pooling layer, and a fully connected layer. The fully connected layers of the two paths are then joined by a fusion layer. To suit the computational demands of processing video images, the embodiment of the present invention sets the convolution kernels of the convolutional layers of the deep convolutional neural network to 3 × 3.
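The layer sequence of one path can be sketched by tracking feature-map sizes through it. The patent only fixes the layer order and the 3 × 3 kernels; the padding, strides, pooling windows, and the 64 × 64 input size below are illustrative assumptions, not values from the patent.

```python
# One path of the two-path network: conv, pool, conv, pool, three convs,
# pool, then a fully connected layer. Only spatial sizes are tracked;
# padding/stride/pool settings are assumed, not specified in the patent.
def conv3x3(h, w, pad=1, stride=1):
    """Spatial size after a 3x3 convolution."""
    return ((h + 2 * pad - 3) // stride + 1,
            (w + 2 * pad - 3) // stride + 1)

def pool2x2(h, w):
    """Spatial size after a 2x2 max pool with stride 2."""
    return (h // 2, w // 2)

def path_output_size(h, w):
    """Feature-map size reaching the fully connected layer of one path."""
    h, w = conv3x3(h, w); h, w = pool2x2(h, w)   # conv, pool
    h, w = conv3x3(h, w); h, w = pool2x2(h, w)   # conv, pool
    for _ in range(3):                            # three convolutional layers
        h, w = conv3x3(h, w)
    h, w = pool2x2(h, w)                          # pool
    return h, w                                   # flattened into the fc layer

print(path_output_size(64, 64))  # a 64x64 input patch -> (8, 8)
```

With these assumptions, the fusion layer would then combine the two paths' fully connected outputs into a single score.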
After the two-path deep convolutional neural network shown in Fig. 2 is built, it must be trained so that it can automatically perform quality evaluation on input stereoscopic video.
Referring to Fig. 3, the training process for the two-path deep convolutional neural network proposed by the embodiment of the present invention specifically comprises:
S301: inputting a first-view video frame of a preset stereoscopic video and the second-view video frame corresponding to that first-view video frame into the two-path deep convolutional neural network, to obtain a video frame quality score for the stereoscopic video;
Specifically, any frame of a stereoscopic video contains video frames of the left and right views; the embodiment of the present invention distinguishes these as the first-view video frame and the second-view video frame.
In the embodiment of the present invention, quality evaluation of any video frame of a stereoscopic video requires quality evaluation of both the first-view video frame and the second-view video frame it contains; the evaluation results are combined to obtain the quality evaluation of that frame of the stereoscopic video. Quality evaluation of the stereoscopic video as a whole is realized through the quality evaluation of its video frames. After the first-view video frame of the stereoscopic video and its corresponding second-view video frame are input into the two-path deep convolutional neural network, each path performs a series of convolution and pooling operations on its input frame, finally yielding a video frame quality score for each of the two corresponding frames.
The mutually corresponding first-view and second-view video frames may further be divided into image patches, turning the two corresponding frames into multiple groups of mutually corresponding patch pairs. For example, the first-view video frame is divided into patches a_{i,j}, i = 1, 2, ..., m; j = 1, 2, ..., n. Using the same patch division, the second-view video frame is divided into patches b_{i,j}, i = 1, 2, ..., m; j = 1, 2, ..., n. The corresponding a_{i,j} and b_{i,j} then form a patch pair. The two-path deep convolutional neural network performs quality evaluation on each mutually corresponding patch pair contained in the first-view and second-view video frames; spatially averaging the quality evaluation results of all patch pairs yields the quality evaluation result of the mutually corresponding first-view and second-view video frames.
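A minimal sketch of this patch division, applying the same m × n grid to both views so that a[i][j] pairs with b[i][j]. The 64 × 64 patch size and the frame dimensions are assumptions; the patent does not fix them.

```python
# Divide corresponding first-view and second-view frames into the same
# grid of patches, then pair them up position by position.
import numpy as np

def divide_into_patches(frame, patch_h, patch_w):
    """Split an H x W frame into an m x n grid of patches (edges cropped)."""
    h, w = frame.shape[:2]
    m, n = h // patch_h, w // patch_w
    return [[frame[i*patch_h:(i+1)*patch_h, j*patch_w:(j+1)*patch_w]
             for j in range(n)] for i in range(m)]

left  = np.zeros((128, 256))   # first-view frame (dummy data)
right = np.zeros((128, 256))   # corresponding second-view frame
a = divide_into_patches(left, 64, 64)
b = divide_into_patches(right, 64, 64)
pairs = [(a[i][j], b[i][j]) for i in range(len(a)) for j in range(len(a[0]))]
print(len(a), len(a[0]), len(pairs))  # 2 4 8
```

Each element of `pairs` is one (a_{i,j}, b_{i,j}) patch pair that the two-path network would score.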
It should be noted that the number of training samples is a key factor affecting the training effect of a deep convolutional neural network. Training a deep convolutional neural network requires a large number of samples, and existing stereoscopic video quality evaluation procedures suffer precisely from small training sets, leading to insufficiently trained networks and thus inaccurate quality evaluation. The method of dividing stereoscopic video sequences into patches proposed by the embodiment of the present invention yields a sufficient number of training samples, improving the training of the network and the accuracy of quality evaluation. Accordingly, when implementing the technical solution of the embodiment of the present invention, patch division is applied to the key video frames extracted from the stereoscopic video, to improve the training of the two-path deep convolutional neural network.
It should further be noted that the video frame processing used when training the two-path deep convolutional neural network is identical to the video frame processing used when the network actually performs video quality evaluation; only then can the trained deep convolutional neural network exercise the computational ability acquired through training.
S302: computing the quality evaluation loss from the video frame quality score and the standard quality score of the video frame;
Specifically, when the two-path deep convolutional neural network is trained with the video frames of a pre-prepared stereoscopic video, the standard quality scores of those video frames are known. After the network performs quality evaluation on an input video frame and obtains a video frame quality score, the obtained score is compared with the frame's standard quality score to compute the quality evaluation loss. The quality evaluation loss is the difference between the quality score produced by the two-path deep convolutional neural network and the standard quality score, and represents the error of the network's quality evaluation of the input video frame.
Further, the least mean-square error between the quality scores of the image patch pairs or video frames of the stereoscopic video and the standard quality scores can be used as the quality evaluation loss of the stereoscopic video:

L = (1/p) · Σ_{i=1}^{p} (q_i − y_i)²

where q_i is the output of the two-path deep neural network, i.e., the quality score of the i-th patch pair or video frame; y_i is the corresponding standard quality score of the stereoscopic video; and i = 1, 2, ..., p, there being p training patch-pair samples or video frame samples in total.
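This least mean-square error loss is a direct computation; a minimal sketch, with invented example scores:

```python
# Mean-squared-error quality evaluation loss: q are the network's
# predicted quality scores for p patch pairs (or video frames),
# y the corresponding standard (ground-truth) quality scores.
def quality_loss(q, y):
    """L = (1/p) * sum_i (q_i - y_i)^2"""
    assert len(q) == len(y) and len(q) > 0
    return sum((qi - yi) ** 2 for qi, yi in zip(q, y)) / len(q)

print(quality_loss([3.0, 4.0], [3.0, 2.0]))  # (0 + 4) / 2 = 2.0
```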
If the quality evaluation loss is not below the set threshold, step S303 is executed: using the quality evaluation loss to perform a backward parameter update of the preset two-path deep convolutional neural network;
Specifically, if the quality evaluation loss computed in step S302 is not below the set threshold, the error of the two-path deep convolutional neural network is too large; the obtained quality evaluation loss is then back-propagated to update the network's parameters, adjusting the parameters of the two-path deep convolutional neural network so that the computed error becomes smaller.
Steps S301 to S303 are then repeated until the computed quality evaluation loss is below the set threshold. At that point, the quality evaluation results of the two-path deep convolutional neural network can be considered very close to the standard quality scores, i.e., the network's quality evaluation is sufficiently accurate and the network has the ability to evaluate video frame quality accurately.
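The shape of this S301 to S303 loop can be sketched with a deliberately tiny stand-in for the network (a single trainable scalar), since the patent describes the real two-path CNN only at the architecture level. The loop structure, forward pass, loss, threshold check, and gradient update, mirrors the text; the learning rate and threshold are invented.

```python
# Schematic train-until-threshold loop on a toy "network" (one scalar
# parameter), assuming the squared-error loss defined in the patent.
def train_until_threshold(y_standard, threshold=1e-4, lr=0.1):
    w = 0.0                               # stand-in for network parameters
    while True:
        q = w                             # S301: forward pass -> quality score
        loss = (q - y_standard) ** 2      # S302: quality evaluation loss
        if loss < threshold:              # stop once the loss is small enough
            return w, loss
        w -= lr * 2 * (q - y_standard)    # S303: backward parameter update

w, loss = train_until_threshold(4.2)
print(abs(w - 4.2) < 0.01, loss < 1e-4)  # True True
```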
A two-path deep convolutional neural network trained by the above procedure can be used to perform video quality evaluation on any stereoscopic video, i.e., video quality evaluation can be performed on any stereoscopic video according to the technical solution of the embodiment of the present invention. Referring to Fig. 4, the method for evaluating video quality proposed by the embodiment of the present invention specifically comprises:
S401: extracting a set number of temporal key frames from the first-view video sequence of a stereoscopic video to obtain the set number of first-view video frames, and extracting from the second-view video sequence of the stereoscopic video the second-view video frames corresponding to the set number of first-view video frames, to obtain the set number of second-view video frames;
Specifically, three-dimensional video-frequency is the planar video Sequence composition by left and right visual angle.Therefore, arbitrary three-dimensional video-frequency is all Including left and right multi-view video sequence, that is, include the video sequence at two visual angles, in embodiments of the present invention with the first multi-view video Sequence and the second multi-view video sequence are distinguish.
When carrying out video quality evaluation to three-dimensional video-frequency, the video of each video frame of three-dimensional video-frequency is actually evaluated Quality.In actual video evaluation procedure, each video of the video sequence at each visual angle of evaluation three-dimensional video-frequency can be passed through The video quality of frame, to determine the video quality of entire three-dimensional video-frequency.
In the embodiments of the present invention, a certain number of key video frames are extracted from the video sequence of each view of the three-dimensional video. The video quality of the extracted key frames is evaluated to represent the video quality of the whole view's video sequence, and the video quality scores of the extracted key frames are then combined to determine the video quality of the entire three-dimensional video.
When extracting key video frames from a video sequence, frames that can represent the sequence as a whole should be extracted as far as possible, so that evaluating their quality represents a quality evaluation of the entire sequence. In theory, video frames that are uniformly distributed in the time domain can represent the overall condition of the whole sequence. Furthermore, since a three-dimensional video contains the video sequences of two views, key video frames are extracted from the video sequences of both views, and it should be ensured that the key frames extracted from the two views correspond to each other.
When extracting key video frames from the three-dimensional video, the embodiment of the present invention extracts a set number of temporal key frames from the first-view video sequence to obtain a set number of first-view video frames. Specifically, as shown in Figure 5, the first frame and the last frame of the first-view video sequence are extracted first, then the middle frame between the first frame and the last frame, then the middle frame between the first frame and that middle frame and the middle frame between that middle frame and the last frame, and so on, extracting first-view video frames uniformly distributed in the time domain until the set number of first-view video frames is obtained. While the first-view key frames are extracted, the same extraction method is applied to the second-view video sequence to extract the same number of second-view video frames, so that the extracted second-view video frames correspond to the extracted first-view video frames.
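The midpoint-subdivision extraction of Figure 5 can be sketched as follows. The function name is illustrative, and the breadth-first ordering of the halved intervals is an assumption about how the "and so on" proceeds; it keeps the chosen frames evenly spread at every step.

```python
from collections import deque

def extract_keyframe_indices(n_frames, count):
    """Pick `count` temporally spread frame indices by repeated midpoint
    subdivision (cf. Figure 5): first and last frame, then the middle
    frame, then the middles of each half, and so on."""
    if count >= n_frames:
        return list(range(n_frames))
    chosen = [0, n_frames - 1]
    intervals = deque([(0, n_frames - 1)])       # halve intervals breadth-first
    while len(chosen) < count and intervals:
        lo, hi = intervals.popleft()
        mid = (lo + hi) // 2
        if mid != lo and mid != hi:              # interval still splittable
            chosen.append(mid)
            intervals.append((lo, mid))
            intervals.append((mid, hi))
    return sorted(chosen[:count])

# The same indices are applied to both views, so the extracted first-view
# and second-view frames correspond to each other.
first_view_keys = extract_keyframe_indices(101, 5)   # -> [0, 25, 50, 75, 100]
```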
S402: According to a preset image division method, perform image block division processing on the extracted set number of first-view video frames and the set number of second-view video frames respectively;
Specifically, after key video frames are extracted from the video sequences of the two views of the three-dimensional video, the embodiment of the present invention further performs image block division processing on each extracted key frame. For example, the first key video frame extracted from the first-view video sequence of the three-dimensional video is divided into 64 × 64 image blocks; correspondingly, the first key video frame extracted from the second-view video sequence of the three-dimensional video is divided into 64 × 64 image blocks. Each image block of the first key frame extracted from the first-view video sequence then forms a corresponding image block pair with the co-located image block of the first key frame extracted from the second-view video sequence. According to the above method, image block division processing is performed on each of the set number of first-view video frames and the set number of second-view video frames, obtaining every image block pair of each first-view video frame and its corresponding second-view video frame.
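A minimal sketch of the block division and pairing, assuming frames are NumPy arrays whose dimensions are divisible by the grid size; the function names are illustrative:

```python
import numpy as np

def split_into_blocks(frame, grid=64):
    """Split an H x W frame into grid x grid equal image blocks,
    returned in row-major order (a 64 x 64 grid in the example above)."""
    h, w = frame.shape[:2]
    bh, bw = h // grid, w // grid                # block height / width
    return [frame[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(grid) for c in range(grid)]

def make_block_pairs(first_view_frame, second_view_frame, grid=64):
    """Pair each block of the first-view frame with the co-located block
    of the corresponding second-view frame."""
    return list(zip(split_into_blocks(first_view_frame, grid),
                    split_into_blocks(second_view_frame, grid)))

pairs = make_block_pairs(np.zeros((128, 128)), np.ones((128, 128)), grid=64)
```

Each element of `pairs` is one image block pair ready to be fed into the two paths of the network.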
S403: Using the trained preset two-path deep convolutional neural network, perform quality evaluation on every image block pair of each first-view video frame and its corresponding second-view video frame, obtaining the quality score of every image block pair of each first-view video frame and its corresponding second-view video frame;
Specifically, after step S402 divides the first-view video frame and the corresponding second-view video frame into image blocks and obtains the image block pairs of the two mutually corresponding frames, each image block pair is input into the two-path deep convolutional neural network.
For example, assume the first-view video frame is divided into 64 × 64 image blocks a_{i,j}, i = 1, 2, ..., 64; j = 1, 2, ..., 64. Using the same block division, the second-view video frame is divided into 64 × 64 image blocks b_{i,j}, i = 1, 2, ..., 64; j = 1, 2, ..., 64. The corresponding a_{i,j} and b_{i,j} then form image block pairs: a_{1,1} and b_{1,1} constitute one image block pair, a_{1,2} and b_{1,2} constitute one image block pair, ..., and a_{64,64} and b_{64,64} constitute one image block pair. When an image block pair is input into the two-path deep convolutional neural network, the image block in the first-view video frame (e.g., a_{1,1}) is input into one path of the deep convolutional neural network, while the corresponding image block in the second-view video frame (e.g., b_{1,1}) is input into the other path; the two paths simultaneously perform quality evaluation on the input blocks, yielding the quality score of the input image block pair (a_{1,1} and b_{1,1}).
According to the above processing method, every image block pair of each first-view video frame and its corresponding second-view video frame is input into the trained two-path deep convolutional neural network for quality evaluation, obtaining the quality score of every image block pair of each first-view video frame and its corresponding second-view video frame.
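The patent does not specify the network architecture, so the sketch below is a drastically simplified, untrained stand-in that only illustrates the two-path structure: one feature path per view, fused into a single score per block pair. All names and layer choices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def path_features(block, w):
    """One 'path': flatten the block and apply a dense layer with ReLU --
    a stand-in for the per-view deep convolutional feature extractor."""
    return np.maximum(w @ block.ravel(), 0.0)

def two_path_score(first_block, second_block, params):
    """Run the two blocks of a pair through their own paths, concatenate
    the feature vectors, and regress one quality score for the pair."""
    w1, w2, w_out = params
    feats = np.concatenate([path_features(first_block, w1),
                            path_features(second_block, w2)])
    return float(w_out @ feats)

# random (untrained) parameters for 8 x 8 blocks, 16 features per path
d, f = 8 * 8, 16
params = (rng.normal(size=(f, d)), rng.normal(size=(f, d)),
          rng.normal(size=2 * f))
score = two_path_score(rng.normal(size=(8, 8)), rng.normal(size=(8, 8)), params)
```

In practice each path would be a deep convolutional stack and the parameters would come from the training process described earlier; the fusion-then-regression shape is the part this sketch is meant to show.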
S404: Perform spatial averaging on the quality scores of the image block pairs of each first-view video frame and its corresponding second-view video frame, obtaining the quality score of each first-view video frame and its corresponding second-view video frame;
Specifically, the embodiment of the present invention performs spatial averaging on the quality scores of all image block pairs contained in each pair of mutually corresponding first-view and second-view video frames, yielding the quality score of each pair of mutually corresponding first-view and second-view video frames.
Specifically, spatial averaging can be performed on the quality scores of all image block pairs contained in the mutually corresponding first-view and second-view video frames according to the following formula:

Q_j = (1/p) · Σ_{i=1}^{p} q_i

where Q_j is the quality score of the j-th video frame, j = 1, 2, ..., m indexes the video frame positions in the time domain; q_i is the output of the deep convolutional neural network for the i-th image block pair; and p is the number of image blocks contained in the video frame.
S405: Perform temporal averaging on the quality scores of the first-view video frames and their corresponding second-view video frames, obtaining the quality evaluation result of the three-dimensional video.
Specifically, after step S404 computes the quality score of each key video frame of each view of the three-dimensional video, temporal averaging is performed on the quality scores of the key video frames of each view to obtain the quality score of the video sequence of each view, i.e., the quality evaluation result of the three-dimensional video.
Specifically, temporal averaging can be performed on the quality scores of the key video frames of a view according to the following formula:

Q = (1/m) · Σ_{j=1}^{m} Q_j

where Q_j is the quality score of the j-th key video frame of the video sequence; Q is the quality score of the video sequence; and m is the number of key video frames extracted from the video sequence.
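The spatial and temporal averaging steps are both plain arithmetic means and can be combined in a few lines; a minimal sketch with illustrative names:

```python
import numpy as np

def frame_quality(block_pair_scores):
    """Spatial average: Q_j = (1/p) * sum_i q_i over the p block-pair
    scores q_i of one key-frame pair."""
    return float(np.mean(block_pair_scores))

def video_quality(frame_scores):
    """Temporal average: Q = (1/m) * sum_j Q_j over the m key frames."""
    return float(np.mean(frame_scores))

# e.g. two key-frame pairs with two block-pair scores each:
q = video_quality([frame_quality([4.0, 2.0]),    # Q_1 = 3.0
                   frame_quality([1.0, 3.0])])   # Q_2 = 2.0
# q -> 2.5
```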
From the above description it can be seen that the video quality evaluation method proposed by the embodiment of the present invention performs video frame extraction and image block division on three-dimensional video sequences, thereby obtaining a large amount of training sample data. The two-path deep convolutional neural network is trained with this training sample data so that it acquires the ability to evaluate video quality. The trained two-path deep convolutional neural network is then used to perform quality evaluation on the left- and right-view videos of a three-dimensional video, thereby realizing effective evaluation of three-dimensional video quality. The above stereoscopic video quality evaluation process requires no manual extraction of video features and achieves fully automatic video quality evaluation, which can improve the accuracy and the level of automation of video quality evaluation.
The embodiment of the invention also discloses a device for evaluating video quality. As shown in Figure 6, the device includes:
a video frame extraction unit 100, configured to extract from a three-dimensional video a set number of first-view video frames, and a set number of second-view video frames corresponding to the set number of first-view video frames;
a video quality evaluation unit 110, configured to perform quality evaluation on the set number of first-view video frames and the set number of second-view video frames using a trained preset two-path deep convolutional neural network, obtaining the quality evaluation result of the three-dimensional video.
Optionally, in another embodiment of the present invention, as shown in Figure 7, the device further includes:
a video frame division unit 120, configured to perform image block division processing on the set number of first-view video frames and the set number of second-view video frames respectively, according to a preset image division method.
Optionally, in another embodiment of the present invention, when the video quality evaluation unit 110 performs quality evaluation on the set number of first-view video frames and the set number of second-view video frames using the trained preset two-path deep convolutional neural network to obtain the quality evaluation result of the three-dimensional video, it is specifically configured to:
perform quality evaluation, using the trained preset two-path deep convolutional neural network, on every image block pair of each first-view video frame and its corresponding second-view video frame, obtaining the quality score of every image block pair of each first-view video frame and its corresponding second-view video frame;
perform spatial averaging on the quality scores of the image block pairs of each first-view video frame and its corresponding second-view video frame, obtaining the quality score of each first-view video frame and its corresponding second-view video frame; and
perform temporal averaging on the quality scores of the first-view video frames and their corresponding second-view video frames, obtaining the quality evaluation result of the three-dimensional video.
Optionally, in another embodiment of the present invention, the video quality evaluation unit 110 is further configured to train the preset two-path deep convolutional neural network;
when training the preset two-path deep convolutional neural network, the video quality evaluation unit 110 is specifically configured to:
cyclically execute the following operations until the computed quality evaluation loss value is less than the set threshold:
input a first-view video frame of a preset three-dimensional video and the second-view video frame corresponding to the first-view video frame into the preset two-path deep convolutional neural network, obtaining a video frame quality score of the three-dimensional video;
calculate a quality evaluation loss value according to the video frame quality score and the standard quality score of the video frame; and
if the quality evaluation loss value is not less than the set threshold, perform a backward (back-propagation) parameter update on the preset two-path deep convolutional neural network using the quality evaluation loss value.
Optionally, in another embodiment of the present invention, when the video frame extraction unit 100 extracts from a three-dimensional video a set number of first-view video frames and a set number of second-view video frames, it is specifically configured to:
extract, from the first-view video sequence of the three-dimensional video, a set number of temporal key frames to obtain the set number of first-view video frames, and extract, from the second-view video sequence of the three-dimensional video, the second-view video frames corresponding to the set number of first-view video frames, obtaining the set number of second-view video frames.
Specifically, for the detailed operation of each unit in the above embodiments, refer to the corresponding content of the method embodiments above; details are not repeated here.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for evaluating video quality, characterized by comprising:
extracting from a three-dimensional video a set number of first-view video frames, and a set number of second-view video frames corresponding to the set number of first-view video frames;
performing quality evaluation on the set number of first-view video frames and the set number of second-view video frames using a trained preset two-path deep convolutional neural network, obtaining a quality evaluation result of the three-dimensional video.
2. The method according to claim 1, characterized in that, after extracting from the three-dimensional video the set number of first-view video frames and the set number of second-view video frames, the method further comprises:
performing image block division processing on the set number of first-view video frames and the set number of second-view video frames respectively, according to a preset image division method.
3. The method according to claim 2, characterized in that performing quality evaluation on the set number of first-view video frames and the set number of second-view video frames using the trained preset two-path deep convolutional neural network to obtain the quality evaluation result of the three-dimensional video comprises:
performing quality evaluation, using the trained preset two-path deep convolutional neural network, on every image block pair of each first-view video frame and its corresponding second-view video frame, obtaining the quality score of every image block pair of each first-view video frame and its corresponding second-view video frame;
performing spatial averaging on the quality scores of the image block pairs of each first-view video frame and its corresponding second-view video frame, obtaining the quality score of each first-view video frame and its corresponding second-view video frame; and
performing temporal averaging on the quality scores of the first-view video frames and their corresponding second-view video frames, obtaining the quality evaluation result of the three-dimensional video.
4. The method according to any one of claims 1 to 3, characterized in that the process of training the preset two-path deep convolutional neural network comprises:
cyclically executing the following operations until the computed quality evaluation loss value is less than a set threshold:
inputting a first-view video frame of a preset three-dimensional video and the second-view video frame corresponding to the first-view video frame into the preset two-path deep convolutional neural network, obtaining a video frame quality score of the three-dimensional video;
calculating a quality evaluation loss value according to the video frame quality score and a standard quality score of the video frame; and
if the quality evaluation loss value is not less than the set threshold, performing a backward parameter update on the preset two-path deep convolutional neural network using the quality evaluation loss value.
5. The method according to claim 1, characterized in that extracting from the three-dimensional video the set number of first-view video frames and the set number of second-view video frames comprises:
extracting, from a first-view video sequence of the three-dimensional video, a set number of temporal key frames to obtain the set number of first-view video frames, and extracting, from a second-view video sequence of the three-dimensional video, the second-view video frames corresponding to the set number of first-view video frames, obtaining the set number of second-view video frames.
6. A device for evaluating video quality, characterized by comprising:
a video frame extraction unit, configured to extract from a three-dimensional video a set number of first-view video frames, and a set number of second-view video frames corresponding to the set number of first-view video frames;
a video quality evaluation unit, configured to perform quality evaluation on the set number of first-view video frames and the set number of second-view video frames using a trained preset two-path deep convolutional neural network, obtaining a quality evaluation result of the three-dimensional video.
7. The device according to claim 6, characterized in that the device further comprises:
a video frame division unit, configured to perform image block division processing on the set number of first-view video frames and the set number of second-view video frames respectively, according to a preset image division method.
8. The device according to claim 7, characterized in that, when the video quality evaluation unit performs quality evaluation on the set number of first-view video frames and the set number of second-view video frames using the trained preset two-path deep convolutional neural network to obtain the quality evaluation result of the three-dimensional video, it is specifically configured to:
perform quality evaluation, using the trained preset two-path deep convolutional neural network, on every image block pair of each first-view video frame and its corresponding second-view video frame, obtaining the quality score of every image block pair of each first-view video frame and its corresponding second-view video frame;
perform spatial averaging on the quality scores of the image block pairs of each first-view video frame and its corresponding second-view video frame, obtaining the quality score of each first-view video frame and its corresponding second-view video frame; and
perform temporal averaging on the quality scores of the first-view video frames and their corresponding second-view video frames, obtaining the quality evaluation result of the three-dimensional video.
9. The device according to any one of claims 6 to 8, characterized in that the video quality evaluation unit is further configured to train the preset two-path deep convolutional neural network;
when training the preset two-path deep convolutional neural network, the video quality evaluation unit is specifically configured to:
cyclically execute the following operations until the computed quality evaluation loss value is less than a set threshold:
input a first-view video frame of a preset three-dimensional video and the second-view video frame corresponding to the first-view video frame into the preset two-path deep convolutional neural network, obtaining a video frame quality score of the three-dimensional video;
calculate a quality evaluation loss value according to the video frame quality score and a standard quality score of the video frame; and
if the quality evaluation loss value is not less than the set threshold, perform a backward parameter update on the preset two-path deep convolutional neural network using the quality evaluation loss value.
10. The device according to claim 6, characterized in that, when the video frame extraction unit extracts from a three-dimensional video a set number of first-view video frames and a set number of second-view video frames, it is specifically configured to:
extract, from a first-view video sequence of the three-dimensional video, a set number of temporal key frames to obtain the set number of first-view video frames, and extract, from a second-view video sequence of the three-dimensional video, the second-view video frames corresponding to the set number of first-view video frames, obtaining the set number of second-view video frames.
CN201810088362.9A 2018-01-30 2018-01-30 Method and device for evaluating video quality Pending CN108337504A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810088362.9A CN108337504A (en) 2018-01-30 2018-01-30 Method and device for evaluating video quality


Publications (1)

Publication Number Publication Date
CN108337504A true CN108337504A (en) 2018-07-27

Family

ID=62926123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810088362.9A Pending CN108337504A (en) Method and device for evaluating video quality

Country Status (1)

Country Link
CN (1) CN108337504A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105160678A * 2015-09-02 2015-12-16 Shandong University Convolutional-neural-network-based no-reference three-dimensional image quality evaluation method
CN107633513A * 2017-09-18 2018-01-26 Tianjin University Measurement method of 3D image quality based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QU, Chenfei et al.: "A No-Reference Stereoscopic Image Quality Assessment Algorithm Based on Convolutional Neural Networks", China Sciencepaper Online (中国科技论文在线) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109831664A * 2019-01-15 2019-05-31 Tianjin University Fast compressed three-dimensional video quality evaluation method based on deep learning
CN110138594A * 2019-04-11 2019-08-16 Fuzhou Rockchip Electronics Co., Ltd. Video quality evaluation method and server based on deep learning
CN110138594B * 2019-04-11 2022-04-19 Rockchip Electronics Co., Ltd. Video quality evaluation method based on deep learning and server
CN110365966A * 2019-06-11 2019-10-22 Beihang University Video quality evaluation method and device based on form
CN110278415A * 2019-07-02 2019-09-24 Zhejiang University Network camera video quality improvement method
CN113256620A * 2021-06-25 2021-08-13 Nanjing Sifeijie Software Technology Co., Ltd. Vehicle body welding quality information judging method based on differential convolutional neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180727