CN109302603A - Video speech quality assessment method and device - Google Patents
Video speech quality assessment method and device
- Publication number
- CN109302603A (application CN201710614327.1A)
- Authority
- CN
- China
- Prior art keywords
- video
- quality
- audio
- calling
- video calling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N17/00—Diagnosis, testing or measuring for television systems or their details
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/643—Communication protocols
- H04N21/6437—Real-time Transport Protocol [RTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
Abstract
The invention discloses a video speech quality assessment method and device. The method comprises: when a video call is detected, separately obtaining the video parameters and the audio parameters of the call; determining the video quality of the call according to the video parameters and the audio quality of the call according to the audio parameters; and determining the video speech quality of the call according to the video quality and the audio quality. With the method provided by embodiments of the present invention, video speech quality can be determined from the video call itself, without the involvement of a reference source, and the accuracy of the assessment result is also improved.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a video speech quality assessment method and device.
Background art
At present, methods for assessing the quality of VoLTE (Voice over Long-Term Evolution) video calls mainly fall into the following three categories. (1) Subjective evaluation methods, such as ITU-T P.910 and P.911, which respectively describe the assessment of the video and audiovisual quality of multimedia applications. P.910 specifies non-interactive subjective evaluation of one-way overall video quality for multimedia applications (such as video conferencing, storage and retrieval applications, video-medical applications, etc.) and describes four test methods: ACR (Absolute Category Rating), ACR-HR (Absolute Category Rating with Hidden Reference), DCR (Degradation Category Rating) and PC (Pair Comparison). (2) Full-reference methods, such as the recommended PEVQ (Perceptual Evaluation of Video Quality). (3) Assessment methods based on a packet-layer model, which assess quality using transport-layer packet information; in VoLTE this mainly means analyzing RTP (Real-time Transport Protocol) packets.
Each of the above three approaches has limitations in network-wide VoLTE testing, mainly as follows: (1) subjective evaluation requires human participation, so it can only be used for sampled inspections, cannot cover a large number of users, and cannot provide real-time assessment; (2) the full-reference PEVQ method requires a reference source to be set up, so it is neither suitable for the conversational model nor able to assess all users in the network; (3) methods based on a packet-layer model cannot automatically adjust their parameters for different services, have high computational complexity and a heavy computational load, and cannot perform full-volume calculation over massive numbers of video calls.
In summary, finding an efficient quality assessment method based on the video call itself, which can assess video speech quality without the involvement of a reference source while also improving the accuracy of the assessment result, is one of the technical problems that urgently need to be solved.
Summary of the invention
Embodiments of the present invention provide a video speech quality assessment method and device, so as to assess video speech quality without the involvement of a reference source and to improve the accuracy of the assessment result.
In a first aspect, an embodiment of the present invention provides a video speech quality assessment method, comprising:
when a video call is detected, separately obtaining the video parameters and the audio parameters of the call;
determining the video quality of the call according to the video parameters, and determining the audio quality of the call according to the audio parameters; and
determining the video speech quality of the call according to the video quality and the audio quality.
In a second aspect, an embodiment of the present invention provides a video speech quality assessment device, comprising:
an acquiring unit, configured to separately obtain the video parameters and the audio parameters of a video call when the call is detected;
a first determination unit, configured to determine the video quality of the call according to the video parameters and the audio quality of the call according to the audio parameters; and
a second determination unit, configured to determine the video speech quality of the call according to the video quality and the audio quality.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the video speech quality assessment method provided by the present application.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements any step of the video speech quality assessment method provided by the present application.
Beneficial effects of the present invention:
With the video speech quality assessment method and device provided by embodiments of the present invention, when a video call is detected, the video parameters and the audio parameters of the call are obtained separately; the video quality of the call is determined according to the video parameters and the audio quality according to the audio parameters; and the video speech quality of the call is determined according to the video quality and the audio quality. The method can thus determine video speech quality from the video call itself, without the involvement of a reference source, and also improves the accuracy of the assessment result.
Other features and advantages of the present invention will be set forth in the following description, will in part become apparent from the description, or will be understood through practice of the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and constitute a part of it; the illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1a is a flow diagram of the video speech quality assessment method provided by Embodiment 1 of the present invention;
Fig. 1b is a flow diagram of separately obtaining the video parameters and the audio parameters of the call, provided by Embodiment 1 of the present invention;
Fig. 1c is a schematic diagram of the configuration process for establishing the parameter configuration of a VoLTE video call using the H.264 protocol, provided by Embodiment 1 of the present invention;
Fig. 2 is a flow diagram of obtaining the video end-to-end delay, provided by Embodiment 1 of the present invention;
Fig. 3 is a flow diagram of obtaining the audio end-to-end delay, provided by Embodiment 1 of the present invention;
Fig. 4 is a flow diagram of determining the one-way delay, provided by Embodiment 1 of the present invention;
Fig. 5 is a flow diagram of determining the audio quality of the call, provided by Embodiment 1 of the present invention;
Fig. 6 is a flow diagram of determining the video speech quality of the call, provided by Embodiment 1 of the present invention;
Fig. 7 is a flow diagram of determining the correlation coefficients of the video speech quality, provided by Embodiment 1 of the present invention;
Fig. 8 is a schematic structural diagram of the video speech quality assessment device provided by Embodiment 2 of the present invention;
Fig. 9 is a schematic diagram of the hardware structure of the electronic device implementing the video speech quality assessment method, provided by Embodiment 4 of the present invention.
Detailed description of embodiments
An embodiment of the present invention provides a video speech quality assessment method and device: when a video call is detected, the video parameters and the audio parameters of the call are obtained separately; the video quality of the call is determined according to the video parameters and the audio quality according to the audio parameters; and the video speech quality of the call is determined according to the video quality and the audio quality. This not only assesses video speech quality without the involvement of a reference source, but also improves the accuracy of the assessment result.
The video speech quality assessment method provided by embodiments of the present invention can be applied to a VoLTE system to assess its video speech quality; for convenience, the following description takes applying the method to a VoLTE system as an example.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein are only used to illustrate and explain the present invention and are not intended to limit it; in the absence of conflict, the embodiments of the invention and the features in the embodiments may be combined with each other.
Embodiment 1
As shown in Fig. 1a, the video speech quality assessment method provided by Embodiment 1 of the present invention may include the following steps:
S11: when a video call is detected, separately obtain the video parameters and the audio parameters of the call.
In a specific implementation, the video parameters and the audio parameters of the call can be obtained separately according to the method shown in Fig. 1b:
S111: perform data collection on the video call using at least one interface, so as to collect the call's Real-time Transport Protocol (RTP) data packets.
In a specific implementation, the media plane and the control plane can jointly collect the data of the video call and thereby obtain the RTP data packets.
The interface of the media plane may include, but is not limited to, Mb; the interfaces of the control plane may include, but are not limited to, S1-MME, S6a, S11, Mw, Gx and Rx.
Each interface is arranged between two adjacent transmission nodes on the packet transmission link.
In a specific implementation, the above interfaces are set in advance on the transmission links traversed by the packets generated by the video call: for example, the S1-MME interface is set between the eNodeB base station and the MME (Mobility Management Entity), the S6a interface between the MME and the HSS, and the S11 interface between the MME and the S&P-GW (Serving & Packet Data Network Gateway).
By extracting the RTP data packets on these interfaces, the video and audio parameters of the call can be obtained based on the H.264 video coding standard.
S112: decode the collected video call RTP data packets according to the communication protocol of each interface to obtain decoded video call RTP data packets.
S113: from the decoded video call RTP data packets, separately obtain the video parameters and the audio parameters of the call using a preset video codec standard.
In a specific implementation, the video parameters and the audio parameters can be obtained from control-plane messages, for example from SIP/SDP messages, using the H.264 video codec standard.
Preferably, the video parameters and the audio parameters include coding parameters and transmission performance parameters, wherein the coding parameters include at least one of the following: codec type, end-to-end delay, bit rate, frame rate and maximum transmission bit rate, and the transmission performance parameters include at least one of the following: packet loss rate and transmission rate.
In a specific implementation, video and audio are each transmitted over their own bearer; during bearer establishment, the video and audio transmission rates are available through the bearer parameter (QCI). In addition, by correlating the video RTP stream with the audio RTP stream, the transmission rates of video and audio can be calculated in real time.
In a specific implementation, the transmission bit rate can be determined in the following two ways.
Mode one: determine the maximum transmission bit rate from the bearer parameters of the VoLTE session establishment.
Specifically, during VoLTE session establishment a corresponding bearer is created for both audio and video, and the bearer carries the parameter specifying its maximum transmission bit rate. Several signaling messages can carry the bearer parameters, for example the Initial Context Setup message, from which the maximum transmission bit rate of the audio or the video can be determined.
Mode two: determine the actual transmission bit rate from the RTP data packets.
Specifically, each RTP packet carries count information indicating its data volume; summing the data volume of all RTP packets of the session and dividing by the transmission duration yields the actual transmission bit rate.
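Mode two can be sketched as follows; the `RtpPacket` record and its field names are illustrative, not part of the patent:

```python
# Sketch of "mode two": estimating the actual transmitted bit rate of one
# RTP stream by summing the captured packet sizes and dividing by the
# capture duration.
from dataclasses import dataclass

@dataclass
class RtpPacket:
    arrival_time: float  # capture timestamp, seconds
    size_bytes: int      # packet data volume in bytes

def actual_bit_rate(packets: list[RtpPacket]) -> float:
    """Return the mean transmitted bit rate in bits per second."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to measure a duration")
    duration = packets[-1].arrival_time - packets[0].arrival_time
    total_bits = 8 * sum(p.size_bytes for p in packets)
    return total_bits / duration

# Six 1200-byte packets captured over 0.1 s.
stream = [RtpPacket(0.02 * i, 1200) for i in range(6)]
print(round(actual_bit_rate(stream)))  # 576000
```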
In a specific implementation, regarding the bit rate and frame rate among the audio and video parameters: when a video call is made in a VoLTE system, the session parameters are negotiated during the call's handshake procedure, the main parameters being the bit rate and the frame rate, so the bit rate and frame rate of the call can be obtained using the H.264/AVC protocol.
Specifically, the parameter configuration of the VoLTE video call can be established in advance using the H.264/AVC protocol; the configuration process can refer to Fig. 1c. After the configuration is successfully established, the physical-layer parameters throughout the VoLTE network can be looked up via the parameter profile-level.
In Fig. 1c, m=video indicates the media type; b=AS indicates the required bandwidth (kbps); b=RS and b=RR indicate the bandwidth allocation of the control channel; a=rtpmap indicates the media type and clock rate, which for H.264 is 90000; a=fmtp indicates the additional parameters of the media format, the most important of which is profile-level-id, from which the bit rate, frame rate and maximum video bit rate (Max video bit rate) of the video media can be derived; a=rtcp-fb indicates the feedback parameters of the control channel.
It should be noted that H.264 is a video codec standard; during video session establishment, the H.264 capabilities of the endpoints are negotiated, and the negotiation is identified by capability sets and their numbers. An H.264 capability set is a list containing one or more H.264 capabilities, and each capability includes the two mandatory parameters Profile and Level plus several optional parameters such as CustomMaxMBPS and CustomMaxFS. In H.264, the Profile defines the coding tools and algorithms used to generate the bit stream, while the Level places requirements on certain key parameters. By correlating with the H.264 codec Profile and Level tables, the bit rate and frame rate can be obtained.
Preferably, to determine the packet loss rate, the total number of packets and the total number of discarded packets on the media plane of the call in the VoLTE system are first determined from the RTP data packets, and the packet loss rate of the call is then obtained by dividing the total number of discarded packets by the total number of packets.
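This computation can be sketched as follows; counting discarded packets from gaps in the RTP sequence numbers (the RFC 3550 receiver convention, ignoring wraparound) is an illustrative assumption — the patent itself only specifies dividing discarded packets by total packets:

```python
# Illustrative per-call packet loss rate from RTP sequence numbers:
# expected = highest seq - first seq + 1, lost = expected - received.
def packet_loss_rate(seq_numbers: list[int]) -> float:
    expected = max(seq_numbers) - min(seq_numbers) + 1
    lost = expected - len(set(seq_numbers))
    return lost / expected

# Sequence 10..19 with 13 and 17 missing: 2 lost out of 10 expected.
print(packet_loss_rate([10, 11, 12, 14, 15, 16, 18, 19]))  # 0.2
```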
Preferably, to determine the audio codec type, the audio codec scheme can be extracted from the audio codec negotiation of the SIP/SDP protocol; likewise, to determine the video codec type, the video codec scheme can be extracted from the video codec negotiation of the SIP/SDP protocol.
Preferably, to determine the maximum transmission bit rate included in the audio parameters, the theoretical rate is obtained from the codec format, and the actual maximum transmission bit rate of the VoLTE audio bearer is then obtained by correlating the S1-MME and S11 interfaces.
Preferably, to determine the maximum transmission bit rate included in the video parameters, the theoretical rate is obtained from the codec format, and the actual maximum transmission bit rate of the VoLTE video bearer is then obtained by correlating the S1-MME and S11 interfaces.
Further, the end-to-end delay of the video call includes the audio end-to-end delay and the video end-to-end delay. Determining the audio end-to-end delay included in the audio parameters and the video end-to-end delay included in the video parameters is introduced separately below.
In a specific implementation, to determine the audio end-to-end delay included in the audio parameters, the RTP protocol is parsed on the media-plane Mb interface to obtain the inter-packet gaps and timestamps of the audio RTP packets; the SIP messages of the Mw interface and the channel information obtained from the S1-MME and S11 interfaces are then used to correlate the calling and called parties, from which the audio end-to-end delay is obtained.
In a specific implementation, to determine the video end-to-end delay included in the video parameters, the RTP protocol is parsed on the media-plane Mb interface to obtain the inter-packet gaps and timestamps of the video RTP packets; the SIP messages of the Mw interface and the channel information obtained from the S1-MME and S11 interfaces are then used to correlate the calling and called parties, from which the video end-to-end delay is obtained.
Specifically, a voice or video packet passes through a sequence of stages from sending to reception: from the handset to the eNodeB base station, to the evolved packet core (EPC), to the peer EPC, then to the peer eNodeB base station, and finally to the peer handset. At each stage of this sequence there are link and packet identifiers, such as the GTP tunnel number and the RTP sequence number. By matching each link segment, every packet can be tracked end to end; then, from the timestamps at sending and at reception, the end-to-end delay of the audio or video is obtained.
Preferably, the video end-to-end delay can be determined according to the method shown in Fig. 2, comprising the following steps:
S21: obtain the one-way delay of the video parameters.
In a specific implementation, the one-way delay of the video parameters can be determined according to steps S41 to S43: specifically, the timestamps of the RTP data packets are determined, and the one-way delay is then determined from the time difference between two consecutive RTP packets and the difference between their timestamps.
It should be noted that the timestamp field in the RTP header carries the synchronization information of the packet's media time and is the key to restoring the data in the correct temporal order. The value of the timestamp gives the sampling instant of the first byte of data in the packet; the sender's timestamp clock is required to be continuous and monotonically increasing, even when no data is being input or sent. During silence the sender need not send data, but the timestamp keeps growing; at the receiving end, since the sequence numbers of the received packets show no gaps, it is known that no data has been lost, and the output time interval can be determined simply by comparing the timestamp difference between earlier and later packets.
In addition, RTP stipulates that the initial timestamp of a session must be chosen randomly, but the protocol specifies neither the unit of the timestamp nor the exact interpretation of its value; instead, the granularity of the clock is determined by the payload type, so different kinds of applications can choose an output timing accuracy suited to their needs.
When RTP carries audio data, the timestamp rate is generally chosen equal to the sampling rate; when carrying video data, however, the timestamp clock must tick more than once per frame. If data are sampled at the same instant, the protocol standard also allows multiple packets to have the same timestamp value.
S22: determine the video end-to-end delay according to the one-way delay of the video parameters.
Since a VoLTE call is a bidirectional flow, when determining the one-way delay the media planes of the calling and called parties can be correlated, and the video end-to-end delay can then be determined from the one-way delay.
As shown in Fig. 3, the flow of obtaining the audio end-to-end delay provided by Embodiment 1 of the present invention comprises the following steps:
S31: obtain the one-way delay of the audio parameters.
S32: determine the audio end-to-end delay according to the one-way delay of the audio parameters.
In a specific implementation, reference can be made to the explanation of steps S21 to S22; overlapping content will not be repeated.
In a specific implementation, determining the video end-to-end delay or the audio end-to-end delay requires determining the one-way delay, which can be determined according to the method shown in Fig. 4, comprising the following steps:
S41: determine the Network Time Protocol (NTP) time difference between two adjacent video call RTP data packets.
In a specific implementation, taking adjacent packets RTP1 and RTP2 as an example: the NTP time corresponding to packet RTP1 is denoted NTP1 and the NTP time corresponding to packet RTP2 is denoted NTP2, so the NTP time difference between the two packets is NTP2 − NTP1.
S42: determine the timestamp difference between the two adjacent video call RTP data packets.
In a specific implementation, the timestamps of packets RTP1 and RTP2 are denoted RTP1 and RTP2 respectively, so their timestamp difference is RTP2 − RTP1.
S43: determine the one-way delay from the determined NTP difference and timestamp difference.
In a specific implementation, the one-way delay can be expressed as:
one-way delay = (NTP2 − NTP1) − (RTP2 − RTP1) / clock frequency
where dividing the timestamp difference, which is counted in clock ticks, by the clock frequency converts it into a time difference.
Specifically, the media sampling frequency, and hence the clock frequency, can be looked up from the PT (payload type) field in the RTP packet header.
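Steps S41 to S43 can be sketched as follows; the function and parameter names are illustrative:

```python
# Sketch of steps S41-S43: the one-way delay contribution between two
# adjacent RTP packets, computed as the difference between their NTP
# arrival-time gap and their RTP timestamp gap. RTP timestamps are in
# clock ticks, so the timestamp difference is divided by the clock rate
# (e.g. 90000 Hz for H.264 video, per the payload-type mapping).
def one_way_delay(ntp1: float, ntp2: float,
                  rtp_ts1: int, rtp_ts2: int,
                  clock_rate: int) -> float:
    """All NTP times in seconds; returns delay in seconds."""
    ntp_diff = ntp2 - ntp1                      # S41
    ts_diff = (rtp_ts2 - rtp_ts1) / clock_rate  # S42 (ticks -> seconds)
    return ntp_diff - ts_diff                   # S43

# Two video packets 40 ms apart in media time (3600 ticks at 90 kHz)
# observed 52 ms apart on the wire: 12 ms of added delay.
d = one_way_delay(ntp1=0.000, ntp2=0.052, rtp_ts1=0, rtp_ts2=3600,
                  clock_rate=90000)
print(round(d * 1000))  # 12
```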
It should be noted that the one-way delay referred to in Embodiment 1 of the present invention is the delay incurred by each call transmission from the originating end of the video call to the receiving end, or by each call transmission from the receiving end back to the originating end.
Preferably, the video call RTCP (Real-time Transport Control Protocol) data packets can also be obtained and used to determine the video parameters and the audio parameters, after which the methods of steps S21 to S22, S31 to S32 and S41 to S43 are used to determine the video end-to-end delay and the audio end-to-end delay.
Specifically, the video quality and the audio quality of the call can be determined using ITU-T G.1070, which is described in detail below.
S12: determine the video quality of the call according to the video parameters.
In a specific implementation, the video quality of the call can be determined from the video parameters according to formula (1):
Vq = 1 + Icoding · exp(−PplV / DPplV)    (1)
wherein Vq is the video quality of the call;
Icoding is the video quality under coding distortion;
PplV is the packet loss rate;
DPplV is the degree of robustness of the video quality against packet loss.
In a specific implementation, the packet loss rate PplV is determined by step S113 and can be substituted directly into formula (1). DPplV can be expressed as DPplV = v10 + v11 · exp(−FrV / v8) + v12 · exp(−BrV / v9), where FrV is the frame rate, which can be determined via H.264, BrV is the video bit rate at the encoder, and the coefficients v8, v9, v10, v11, v12 are constants determined by the codec type, the video format, the key frame interval and the video display size.
In a specific implementation, the video quality under coding distortion, Icoding, can be determined according to formula (2):
Icoding = IOfr · exp( −(ln(FrV) − ln(Ofr))² / (2 · DFrV²) )    (2)
wherein Ofr is the optimal frame rate at which the video quality is maximized under the current video bit rate;
IOfr is the maximum value of the video quality under the current video bit rate;
FrV is the current frame rate;
DFrV is the degree of robustness of the video quality against the frame rate.
In a specific implementation, the optimal frame rate in formula (2) can be expressed as Ofr = v1 + v2 · BrV, with 1 ≤ Ofr ≤ 30, where BrV is the video bit rate at the encoder; the maximum video quality under the current bit rate is IOfr = v3 − v3 / (1 + (BrV / v4)^v5); and DFrV in formula (2) is DFrV = v6 + v7 · BrV, with DFrV > 0. The coefficients v1, v2, v3, ..., v7 are constants determined by the codec type, the video format, the key frame interval and the video display size.
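Formulas (1) and (2) can be sketched as follows; the values chosen for the constants v1 to v12 are placeholders for illustration only — in the method they depend on the codec type, video format, key frame interval and display size:

```python
import math

# Minimal sketch of the G.1070-style video quality of formulas (1)-(2).
# The constants v1..v12 below are illustrative placeholders.
v = {1: 1.431, 2: 2.228e-2, 3: 3.759, 4: 184.1, 5: 1.161,
     6: 1.446, 7: 3.881e-4, 8: 2.116, 9: 467.4, 10: 2.736,
     11: 15.28, 12: 4.170}

def video_quality(br_v: float, fr_v: float, ppl_v: float) -> float:
    """br_v: video bit rate (kbit/s); fr_v: frame rate (fps);
    ppl_v: packet loss rate (%). Returns Vq on a 1..5 scale."""
    o_fr = min(max(v[1] + v[2] * br_v, 1.0), 30.0)       # optimal frame rate
    i_ofr = v[3] - v[3] / (1.0 + (br_v / v[4]) ** v[5])  # max quality at br_v
    d_frv = v[6] + v[7] * br_v                           # frame-rate robustness
    i_coding = i_ofr * math.exp(
        -((math.log(fr_v) - math.log(o_fr)) ** 2) / (2.0 * d_frv ** 2))  # (2)
    d_pplv = (v[10] + v[11] * math.exp(-fr_v / v[8])
              + v[12] * math.exp(-br_v / v[9]))          # loss robustness
    return 1.0 + i_coding * math.exp(-ppl_v / d_pplv)    # (1)

print(round(video_quality(br_v=512, fr_v=25, ppl_v=0.0), 2))
```

As expected, the score falls monotonically as the packet loss rate rises and as the frame rate moves away from the optimal frame rate.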
S13: determine the audio quality of the call according to the audio parameters.
In a specific implementation, when executing step S13, the audio quality of the call can be determined according to the method shown in Fig. 5:
S131: determine the quality index of the audio quality according to the audio parameters.
In a specific implementation, the quality index of the audio quality can be determined according to formula (3):
wherein Idte,WB is the degradation caused by talker echo in the call;
Ie-eff,WB is the degradation caused by speech coding and packet loss in the call;
Qx is the quality index of the audio quality.
Further, the degradation Idte,WB caused by talker echo in the video call can be determined according to formula (4):
wherein the expression for Re,WB is Re,WB = 80 + 2.5 · (TERV,WB − 14), and TERV,WB is derived from the talker echo loudness rating and the delay;
TELR is the talker echo loudness rating;
Ts is the audio end-to-end delay of the call.
Further, the degradation Ie-eff,WB caused by speech coding and packet loss in the video call can be determined according to formula (5):
Ie-eff,WB = IeS,WB + (95 − IeS,WB) · PplS / (PplS + BplS)    (5)
wherein IeS,WB is the speech coding distortion factor;
PplS is the speech packet loss rate of the call;
BplS is the robustness of the speech packets against packet loss in the call.
S132: determine the audio quality of the call from the quality index according to formula (6).
In a specific implementation, in formula (6):
Qx is the quality index, determined by formula (3);
Sq is the audio quality of the video call.
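The audio path can be sketched as follows under stated assumptions: the base rating of 93.2, supplying the echo term Idte,WB directly as an input, and the R-to-MOS mapping used for formula (6) are all borrowed from the narrowband E-model (ITU-T G.107), since this excerpt does not reproduce the wideband constants or the body of formulas (3), (4) and (6):

```python
# Minimal sketch of the audio-quality path (formulas (3), (5) and (6)).
# ASSUMPTIONS: r0 = 93.2 and the echo degradation idte_wb passed in as a
# ready-made value are E-model conventions, not values from the patent.
def ie_eff(ie_s: float, ppl_s: float, bpl_s: float) -> float:
    """Formula (5): degradation from speech coding plus packet loss (%)."""
    return ie_s + (95.0 - ie_s) * ppl_s / (ppl_s + bpl_s)

def audio_quality(ie_s: float, ppl_s: float, bpl_s: float,
                  idte_wb: float, r0: float = 93.2) -> float:
    """Return Sq, the MOS-like audio quality of formula (6)."""
    qx = r0 - idte_wb - ie_eff(ie_s, ppl_s, bpl_s)  # formula (3), assumed form
    qx = min(max(qx, 0.0), 100.0)
    # Assumed formula (6): the standard E-model rating-to-MOS mapping.
    return 1.0 + 0.035 * qx + qx * (qx - 60.0) * (100.0 - qx) * 7e-6

# AMR-WB-like example: Ie = 11, Bpl = 10, 1% loss, small echo impairment.
print(round(audio_quality(ie_s=11.0, ppl_s=1.0, bpl_s=10.0, idte_wb=2.0), 2))
```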
Specifically, the embodiment of the present invention does not limit the execution order of steps S12 and S13.
S14: determine the video speech quality of the call according to the video quality and the audio quality.
In a specific implementation, the video speech quality of the call can be determined according to the flow shown in Fig. 6, comprising the following steps:
S141: according to the video quality and the audio quality, respectively determine the audiovisual quality of the call and the degree of audiovisual quality degradation caused by audio-video delay and synchronization.
In a specific implementation, the audiovisual quality of the call can be determined from the video quality and the audio quality according to formula (7):
MMSV = m5 · Sq + m6 · Vq + m7 · Sq · Vq + m8    (7)
wherein MMSV is the audiovisual quality of the call;
Sq is the audio quality;
Vq is the video quality;
m5, m6, m7, m8 are correlation coefficients that depend on the video display size and the conversational task of the call.
Further, the degree of audiovisual quality decline caused by audio-video delay and synchronization in this video call can be determined from the video quality and the audio quality according to formula (8):
MMT = max{AD + MS, 1}    (8)
Wherein, AD = m9*(TS + TV) + m10;
TS is the audio end-to-end delay in this video call;
TV is the video end-to-end delay in this video call;
AD is the absolute audiovisual delay in this video call;
MS is the audio-visual media synchronization value in this video call;
MMT is the degree of audiovisual quality decline caused by audio-video delay and synchronization in this video call;
m9, m10, m11, m12, m13, m14 are correlation coefficients that depend on the video display size and the call task of this video call.
S142: determine the video speech quality of this video call from the audiovisual quality and the degree of audiovisual quality decline caused by audio-video delay and synchronization, according to formula (9):
MMq = m1*MMSV + m2*MMT + m3*MMSV*MMT + m4    (9)
Wherein, MMSV denotes the audiovisual quality;
MMT denotes the degree of audiovisual quality decline caused by audio-video delay and synchronization;
m1, m2, m3, m4 are correlation coefficients that depend on the video display size and the call task of this video call;
MMq denotes the video speech quality of this video call.
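Formulas (7)-(9) are given explicitly, so the combination can be sketched directly. Note that the expression for MS is not reproduced in this text (the coefficients m11..m14 it presumably uses are only listed), so MS is treated as a precomputed input here; all coefficient values in the usage are hypothetical:

```python
def audiovisual_quality(sq, vq, m5, m6, m7, m8):
    # Formula (7): MMSV = m5*Sq + m6*Vq + m7*Sq*Vq + m8
    return m5 * sq + m6 * vq + m7 * sq * vq + m8

def delay_sync_degradation(ts, tv, ms, m9, m10):
    # Formula (8): MMT = max{AD + MS, 1}, with AD = m9*(TS + TV) + m10.
    # The MS expression is not reproduced in the text, so MS is an input.
    ad = m9 * (ts + tv) + m10
    return max(ad + ms, 1.0)

def video_speech_quality(mmsv, mmt, m1, m2, m3, m4):
    # Formula (9): MMq = m1*MMSV + m2*MMT + m3*MMSV*MMT + m4
    return m1 * mmsv + m2 * mmt + m3 * mmsv * mmt + m4
```

For example, with assumed coefficients, Sq = 4.0 and Vq = 3.5 give an audiovisual quality which is then degraded by the delay/synchronization term to yield the overall MMq.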
Thus, the video speech quality assessment method provided by the embodiment of the present invention can assess the video speech quality of VoLTE, and on this basis assess video speech quality at scale across the whole network.
Specifically, the correlation coefficients m1, m2, m3, m4 in formula (9) can be determined from empirical values.
Preferably, in order to improve the accuracy of the determined video speech quality, when determining the video speech quality the present invention may first determine the correlation coefficients in formula (9) with a linear fitting algorithm, and then use the fitted correlation coefficients to determine the video speech quality. Reference may be made to the process shown in Fig. 7, which includes the following steps:
S51: when a video call is detected, extract several video call samples.
S52: determine the subjective scores of the video call samples.
After the video speech quality is obtained in the network by executing steps S11-S14, individual calls are extracted as test sequences, and the known test sequences are scored subjectively according to Recommendation P.800.
In specific implementation, the subjective scores of the video call samples can be determined with ITU-T P.800, which defines a method of subjective testing, namely the MOS (Mean Opinion Score) test. In this test method, the behavior of users watching video is surveyed and quantified: different surveyed users compare their subjective impressions of the original reference video and the degraded video after transmission over the wireless network, and then give scores. Scoring can follow the Absolute Category Rating (ACR) scheme, which generally uses 5 grades, as shown in Table 1:
Subjective feeling | Score value
Excellent | 5 |
Good | 4 |
Fair | 3 |
Poor | 2 |
Bad | 1 |
According to the correspondence between subjective feeling and score in Table 1, the subjective score of each video call sample can be obtained.
S53: based on the audiovisual quality and the degree of audiovisual quality decline caused by audio-video delay and synchronization, and combined with the subjective scores of the video call samples, obtain the correlation coefficients m1, m2, m3, m4 for evaluating the overall quality of this video call.
In specific implementation, the least squares method can be used to fit and estimate the correlation coefficients in formula (9), so that the video speech quality determined from the fitted correlation coefficients is more accurate.
Specifically, MMq, MMSV and MMT in formula (9) are regarded as linearly related to m1, m2, m3, m4: MMSV and MMT are taken as independent variables and MMq as the dependent variable. In the parameter fitting process, the subjective score obtained for each extracted sample is taken as the measured value of MMq, which yields several linear equations; fitting these according to the least squares principle then gives the optimal values of the correlation coefficients m1, m2, m3, m4.
For example, suppose 10 video call samples are extracted and 10 scores are obtained by subjectively scoring these 10 samples according to P.800; these 10 scores are then used as the values of the dependent variable MMq. Using formula (7) and formula (8) of the present invention, the MMSV and MMT values corresponding to these 10 samples can be determined respectively. Substituting these 10 groups of values together with the 10 corresponding scores into formula (9) gives 10 linear equations, from which an optimal group of correlation coefficient values can be obtained. Furthermore, a cross-validation algorithm can also be used to obtain the optimal correlation coefficient values.
After the correlation coefficients m1, m2, m3, m4 are determined, substituting them into formula (9) gives the video speech quality of this video call.
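The fitting step described above can be sketched with an ordinary least-squares solve. The design matrix has one column per coefficient of formula (9); the function name and the synthetic data in the test are this sketch's own:

```python
import numpy as np

def fit_coefficients(mmsv, mmt, mos_scores):
    """Fit m1..m4 of formula (9) by least squares: each subjectively
    scored call sample yields one linear equation
    MMq = m1*MMSV + m2*MMT + m3*MMSV*MMT + m4,
    with the P.800 score standing in for MMq."""
    mmsv = np.asarray(mmsv, dtype=float)
    mmt = np.asarray(mmt, dtype=float)
    y = np.asarray(mos_scores, dtype=float)
    # One column per coefficient; the last column of ones carries the intercept m4.
    a = np.column_stack([mmsv, mmt, mmsv * mmt, np.ones_like(mmsv)])
    coeffs, *_ = np.linalg.lstsq(a, y, rcond=None)
    return coeffs  # [m1, m2, m3, m4]
```

With 10 samples and 4 unknowns the system is overdetermined, which is exactly the situation the least-squares principle handles; cross-validation, as the text notes, can then guard against overfitting the coefficient values.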
With the video speech quality assessment method provided by Embodiment 1 of the present invention, when a video call is detected, the video parameter and the audio parameter of this video call are obtained respectively; the video quality of this video call is determined from the video parameter, and the audio quality of this video call is determined from the audio parameter; and the video speech quality of this video call is determined from the video quality and the audio quality. With the method provided by the embodiment of the present invention, the video speech quality can be determined from the video call itself, without requiring the participation of a reference source, which improves the accuracy of the video speech quality assessment result and makes the method suitable for assessing video speech quality at scale. In addition, when a video call is detected, several video call samples can be extracted and their subjective scores determined; parameter samples are then extracted from the video parameter and the audio parameter respectively and, combined with the subjective scores of the video call samples, processed with a preset algorithm to obtain a parameter set used to indicate the overall quality of this video call. After the parameter set is obtained, the relevant parameters of different services can be adjusted according to the parameter set, thereby improving the accuracy of video speech quality assessment.
Embodiment 2
Based on the same inventive concept, the embodiment of the present invention further provides a video speech quality assessment device. Since the principle by which this device solves the problem is similar to that of the video speech quality assessment method, the implementation of the device may refer to the implementation of the method, and repeated description is omitted.
Fig. 8 is a schematic structural diagram of the video speech quality assessment device provided by Embodiment 2 of the present invention, which includes an acquiring unit 81, a first determination unit 82 and a second determination unit 83, in which:
the acquiring unit 81 is configured to obtain the video parameter and the audio parameter of this video call respectively when a video call is detected;
the first determination unit 82 is configured to determine the video quality of this video call from the video parameter, and to determine the audio quality of this video call from the audio parameter;
the second determination unit 83 is configured to determine the video speech quality of this video call by processing the video quality and the audio quality.
In specific implementation, the acquiring unit 81 is specifically configured to: perform data collection on this video call through at least one interface to collect video call Real-time Transport Protocol (RTP) data packets, the interface being arranged between any two adjacent transmission nodes in the data packet transmission link; decode the collected video call RTP data packets according to the communication protocol of each interface to obtain decoded video call RTP data packets; and obtain the video parameter and the audio parameter of this video call respectively from the decoded video call RTP data packets using a preset video codec standard.
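The parameters the acquiring unit needs (payload type for the codec, timestamps for delay, sequence numbers for loss) live in the fixed RTP header defined by RFC 3550. A minimal parser for that header is sketched below; the dictionary keys are this sketch's own naming, not the patent's:

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (RFC 3550) of a collected
    video-call packet into its named fields."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # should be 2 for RTP
        "padding": (b0 >> 5) & 0x1,
        "extension": (b0 >> 4) & 0x1,
        "csrc_count": b0 & 0x0F,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,     # identifies the codec in use
        "sequence": seq,               # for packet-loss detection
        "timestamp": timestamp,        # media clock units (e.g. 90 kHz video)
        "ssrc": ssrc,                  # identifies the media stream
    }
```

Gaps in the `sequence` values of one `ssrc` yield the packet loss rate, and the `timestamp` field feeds the delay computations described below.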
Preferably, the video parameter and the audio parameter include coding parameters and transmission performance parameters, wherein the coding parameters include at least one of: codec type, end-to-end delay, bit rate, frame rate and maximum transmission bit rate, and the transmission performance parameters include at least one of: packet loss rate and transmission rate.
In specific implementation, the acquiring unit 81 is specifically configured to obtain the video end-to-end delay as follows: obtain the one-way delay of the video parameter, and determine the video end-to-end delay according to the obtained one-way delay of the video parameter.
Preferably, the acquiring unit 81 is further configured to obtain the audio end-to-end delay as follows: obtain the one-way delay of the audio parameter, and determine the audio end-to-end delay according to the obtained one-way delay of the audio parameter.
Specifically, the acquiring unit 81 is specifically configured to determine the one-way delay as follows: determine the Network Time Protocol (NTP) time difference between two adjacent video call RTP data packets; determine the timestamp difference between the two adjacent video call RTP data packets; and determine the one-way delay according to the determined NTP difference and timestamp difference.
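One plausible reading of this step: the NTP (wall-clock) gap between two adjacent packets of a stream, minus the gap implied by their media timestamps, measures the extra transit delay accumulated between them. The exact combination used by the embodiment is not spelled out, so the formula below is an assumption of this sketch:

```python
def one_way_delay(ntp_time_a, ntp_time_b, rtp_ts_a, rtp_ts_b, clock_rate_hz):
    """One-way delay sketch from two adjacent RTP packets of one stream:
    wall-clock arrival gap (NTP, seconds) minus the media timestamp gap
    (RTP ticks converted via the codec clock rate). A positive result
    indicates delay accumulated between the two packets."""
    ntp_diff = ntp_time_b - ntp_time_a                 # seconds
    ts_diff = (rtp_ts_b - rtp_ts_a) / clock_rate_hz    # seconds of media
    return ntp_diff - ts_diff
```

For 90 kHz video, two packets whose timestamps are 4500 ticks (50 ms of media) apart but which arrive 60 ms apart would show 10 ms of added one-way delay.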
In specific implementation, the first determination unit 82 is specifically configured to determine the video quality of this video call from the video parameter according to the following formula:
Wherein, Vq is the video quality of this video call;
Icoding is the video quality under coding distortion only;
PplV is the packet loss rate;
and the last parameter is the robustness degree of the video quality against packet loss.
Further, the first determination unit 82 is specifically configured to determine the video quality Icoding under coding distortion as follows:
Wherein, Ofr is the optimal frame rate at which the video quality reaches its maximum under the current video bit rate;
IOfr is the maximum value of the video quality under the current video bit rate;
FrV is the current frame rate;
DFrV is the robustness degree of the video quality against frame rate.
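The formula images for Vq and Icoding are not reproduced in this text, but the listed parameters (Ofr, IOfr, FrV, DFrV, PplV and a packet-loss robustness term) match the ITU-T G.1070 video-telephony model; the forms below follow that model and are an assumption of this sketch:

```python
import math

def coding_quality(i_ofr, fr_v, o_fr, d_frv):
    """Icoding sketch: video quality under coding distortion only,
    peaking at the optimal frame rate o_fr and falling off as a
    Gaussian in log frame rate (G.1070-style)."""
    return i_ofr * math.exp(-((math.log(fr_v) - math.log(o_fr)) ** 2)
                            / (2.0 * d_frv ** 2))

def video_quality(i_coding, ppl_v, d_pplv):
    """Vq sketch: coding quality attenuated exponentially by the packet
    loss rate ppl_v, with robustness d_pplv (G.1070-style, 1..5 scale)."""
    return 1.0 + i_coding * math.exp(-ppl_v / d_pplv)
```

At the optimal frame rate Icoding equals IOfr, and with zero packet loss Vq reduces to 1 + Icoding, consistent with the definitions above.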
In specific implementation, the first determination unit 82 is specifically configured to determine the quality index of the audio quality from the audio parameter, and
to determine the audio quality of this video call from the quality index according to the following formula:
Wherein, Qx is the quality index;
Sq is the audio quality of the video call.
Preferably, the first determination unit 82 is specifically configured to determine the quality index of the audio quality from the audio parameter according to the following formula:
Wherein, Idte,WB is the degradation caused by the caller's echo in this video call;
Ie-eff,WB is the degradation caused by speech coding and packet loss in this video call;
Qx is the quality index of the audio quality.
Further, second determination unit, specifically for determining Idte according to following formula:
Wherein, the expression formula of Re, WB are as follows: Re, WB=80+2.5* (TERV, WB-14);And the expression formula of TERV, WB
Are as follows:And
TELR is caller's echo loudness scale;
Ts is this audio frequency in video call end-to-end time delay.
Further, first determination unit 82, specifically for determining Ie-eff, WB according to following formula:
Wherein, IeS, WB is the voice coding distortion factor;
PplSFor voice packet loss in this video calling;
BplSFor the robustness of packets of voice packet loss in this video calling.
When it is implemented, second determination unit 83, is specifically used for according to the video quality and the audio quality,
The degree that the view quality of this video calling declines with audio-video delay and synchronous caused view quality is determined respectively;
Postpone the degree declined with synchronous caused view quality with the audio-video according to the view quality, under
State the video speech quality that formula determines this video calling:
MMq=m1*MMSV+m2*MMT+m3*MMSV*MMT+m4
Wherein, MMSVIndicate the view quality;
MMTIndicate the degree of the audio-video delay with synchronous caused view quality decline;
m1,m2,m3,m4For related coefficient, the video depending on this video calling shows size and call task;
MMqIndicate the video speech quality of this video calling.
Preferably, second determination unit 83, it is specifically used for according to the following equation according to the video quality and described
Audio quality determines the view quality of this video calling:
MMSV=m5*Sq+m6*Vq+m7*Sq*Vq+m8
Wherein, MMSVFor the view quality of this video calling;
SqFor the audio quality;
VqFor the video quality;
m5,m6,m7,m8For related coefficient, the video depending on this video calling shows size and call task.
Further, second determination unit 83 is specifically used for according to the following equation according to the video quality and sound
Frequency quality determines the degree that the audio-video delay of this video calling declines with synchronous caused view quality:
MMT=max { AD+MS, 1 }
Wherein, AD=m9*(TS+TV)+m10,
TSFor this audio frequency in video call end-to-end time delay;
TVFor video end-to-end time delay in this video calling;
AD is audiovisual time delay absolute in this video calling;
MS is audio-visual media synchronization value in this video calling;
MMTThe degree declined for the audio-video delay of this video calling with synchronous caused view quality;
m9,m10,m11,m12,m13,m14For related coefficient, the video depending on this video calling shows that size and call are appointed
Business.
Specifically, second determination unit, specifically for extracting several videos when detecting that video calling occurs
Call sample;And determine the subjective scoring of the video calling sample;And prolonged based on the view quality and the audio-video
The degree declined late with synchronous caused view quality, and in conjunction with the subjective scoring of the video calling sample, it obtains for commenting
The related coefficient m of this video calling total quality of valence1,m2,m3,m4。
Embodiment 3
Embodiment 3 of the present application provides a non-volatile computer storage medium storing computer-executable instructions, which can execute the video speech quality assessment method in any of the above method embodiments.
Embodiment 4
Fig. 9 is a schematic diagram of the hardware structure of an electronic device for implementing the video speech quality assessment method provided by Embodiment 4 of the present invention. As shown in Fig. 9, the electronic device includes:
one or more processors 910 and a memory 920, with one processor 910 taken as an example in Fig. 9.
The electronic device for executing the video speech quality assessment method may further include an input device 930 and an output device 940.
The processor 910, the memory 920, the input device 930 and the output device 940 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 9.
The memory 920, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules/units corresponding to the video speech quality assessment method in the embodiment of the present application (for example, the acquiring unit 81, the first determination unit 82 and the second determination unit 83 shown in Fig. 8). By running the non-volatile software programs, instructions and modules/units stored in the memory 920, the processor 910 executes the various functional applications and data processing of the server or intelligent terminal, i.e., implements the video speech quality assessment method of the above method embodiments.
The memory 920 may include a program storage area and a data storage area, wherein the program storage area may store the operating system and the application programs required by at least one function, and the data storage area may store data created according to the use of the video speech quality assessment device, etc. In addition, the memory 920 may include a high-speed random access memory, and may also include a non-volatile memory, for example at least one magnetic disk memory device, flash memory device or other non-volatile solid-state memory device. In some embodiments, the memory 920 optionally includes memories remotely located relative to the processor 910, and these remote memories may be connected to the video speech quality assessment device through a network. Examples of such a network include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The input device 930 can receive input numeric or character information and generate key signal inputs related to the user settings and function control of the video speech quality assessment device. The output device 940 may include a display device such as a display screen.
The one or more modules are stored in the memory 920 and, when executed by the one or more processors 910, execute the video speech quality assessment method in any of the above method embodiments.
The above product can execute the method provided by the embodiment of the present application and has the corresponding functional modules and beneficial effects for executing the method. For technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present application.
The electronic device of the embodiment of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication functions, with providing voice and data communication as the main goal. This type of terminal includes smart phones (e.g. iPhone), multimedia phones, functional phones and low-end phones, etc.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access characteristics. This type of terminal includes PDA, MID and UMPC devices, etc., such as iPad.
(3) Portable entertainment devices: such devices can display and play multimedia content. This type of device includes audio and video players (e.g. iPod), handheld devices, e-books, intelligent toys and portable vehicle-mounted navigation devices.
(4) Servers: devices providing computing services. A server consists of a processor, hard disk, memory, system bus, etc.; its architecture is similar to that of a general-purpose computer, but because highly reliable services must be provided, the requirements on processing capability, stability, reliability, security, scalability, manageability, etc. are higher.
(5) Other electronic devices with data interaction functions.
Embodiment 5
Embodiment 5 of the present application provides a computer program product, wherein the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to execute the video speech quality assessment method in any of the above method embodiments of the present application.
With the video speech quality assessment method and device provided by the embodiment of the present invention, when a video call is detected, the video parameter and the audio parameter of this video call are obtained respectively; the video quality of this video call is determined from the video parameter, and the audio quality of this video call is determined from the audio parameter; and the video speech quality of this video call is determined from the video quality and the audio quality. With the method provided by the embodiment of the present invention, the video speech quality can be determined from the video call itself, without requiring the participation of a reference source, which improves the accuracy of the video speech quality assessment result and makes the method suitable for assessing video speech quality at scale. In addition, when a video call is detected, several video call samples can be extracted and their subjective scores determined; parameter samples are then extracted from the video parameter and the audio parameter respectively and, combined with the subjective scores of the video call samples, processed with a preset algorithm to obtain a parameter set used to indicate the overall quality of this video call. After the parameter set is obtained, the relevant parameters of different services can be adjusted according to the parameter set, thereby improving the accuracy of video speech quality assessment.
The video speech quality assessment device provided by the embodiments of the present application can be implemented by a computer program. Those skilled in the art should understand that the above module division is only one of many possible module divisions; as long as the video speech quality assessment device has the above functions, other module divisions, or no module division at all, should fall within the protection scope of the present application.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system or a computer program product. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk memories, CD-ROMs, optical memories, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can guide a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device, which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and thus the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they know the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.
Claims (34)
1. A video speech quality assessment method, characterized by comprising:
when a video call is detected, obtaining the video parameter and the audio parameter of this video call respectively;
determining the video quality of this video call according to the video parameter, and determining the audio quality of this video call according to the audio parameter; and
determining the video speech quality of this video call according to the video quality and the audio quality.
2. The method according to claim 1, characterized in that obtaining the video parameter and the audio parameter of this video call respectively when a video call is detected specifically comprises:
performing data collection on this video call through at least one interface to collect video call Real-time Transport Protocol (RTP) data packets, the interface being arranged between any two adjacent transmission nodes in the data packet transmission link;
decoding the collected video call RTP data packets according to the communication protocol of each interface to obtain decoded video call RTP data packets;
obtaining the video parameter and the audio parameter of this video call respectively from the decoded video call RTP data packets using a preset video codec standard.
3. The method according to claim 2, characterized in that the video parameter and the audio parameter include coding parameters and transmission performance parameters, wherein the coding parameters include at least one of: codec type, end-to-end delay, bit rate, frame rate and maximum transmission bit rate, and the transmission performance parameters include at least one of: packet loss rate and transmission rate.
4. The method according to claim 3, characterized in that the video end-to-end delay is obtained as follows:
obtaining the one-way delay of the video parameter;
determining the video end-to-end delay according to the one-way delay of the video parameter.
5. The method according to claim 3, characterized in that the audio end-to-end delay is obtained as follows:
obtaining the one-way delay of the audio parameter;
determining the audio end-to-end delay according to the one-way delay of the audio parameter.
6. The method according to claim 4 or 5, characterized in that the one-way delay is determined as follows:
determining the Network Time Protocol (NTP) time difference between two adjacent video call RTP data packets; and
determining the timestamp difference between the two adjacent video call RTP data packets;
determining the one-way delay according to the determined NTP difference and timestamp difference.
7. The method according to claim 3, characterized in that the video quality of this video call is determined from the video parameter according to the following formula:
wherein, Vq is the video quality of this video call;
Icoding is the video quality under coding distortion only;
PplV is the packet loss rate;
and the last parameter is the robustness degree of the video quality against packet loss.
8. The method according to claim 7, characterized in that the video quality Icoding under coding distortion is determined as follows:
wherein, Ofr is the optimal frame rate at which the video quality reaches its maximum under the current video bit rate;
IOfr is the maximum value of the video quality under the current video bit rate;
FrV is the current frame rate;
DFrV is the robustness degree of the video quality against frame rate.
9. The method according to claim 3, characterized in that determining the audio quality of this video call according to the audio parameter specifically comprises:
determining the quality index of the audio quality according to the audio parameter; and
determining the audio quality of this video call from the quality index according to the following formula:
wherein, Qx is the quality index;
Sq is the audio quality of the video call.
10. The method according to claim 9, characterized in that the quality index of the audio quality is determined from the audio parameter according to the following formula:
wherein, Idte,WB is the degradation caused by the caller's echo in this video call;
Ie-eff,WB is the degradation caused by speech coding and packet loss in this video call;
Qx is the quality index of the audio quality.
11. method as claimed in claim 10, which is characterized in that determine Idte, WB according to following formula:
Wherein, the expression formula of Re, WB are as follows: Re, WB=80+2.5* (TERV, WB-14);And the expression formula of TERV, WB are as follows:And
TELR is caller's echo loudness scale;
This audio frequency in video call end-to-end time delay of Ts.
12. The method of claim 10, characterized in that Ie-eff,WB is determined according to the following formula:
wherein IeS,WB is the speech coding distortion factor;
PplS is the speech packet loss rate in this video call;
BplS is the robustness of the speech packets against packet loss in this video call.
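In the wideband E-model (ITU-T G.107/G.107.1), the effective equipment impairment combines the codec impairment with packet loss and the codec's loss-robustness factor. A sketch under that assumption, with the burst ratio taken as 1 (random loss):

```python
def ie_eff_wb(ie_s_wb, ppl_s, bpl_s):
    """Effective equipment impairment Ie-eff,WB (E-model style, assumed).

    ie_s_wb: wideband codec impairment at zero loss;
    ppl_s:   speech packet-loss percentage;
    bpl_s:   packet-loss robustness factor of the codec."""
    return ie_s_wb + (95.0 - ie_s_wb) * ppl_s / (ppl_s + bpl_s)
```

With zero loss the impairment reduces to the pure codec term; as loss grows it approaches the E-model ceiling of 95.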
13. The method of claim 7 or 9, characterized in that determining the video call quality of this video call according to the video quality and the audio quality specifically includes:
determining, according to the video quality and the audio quality, the audio-visual quality of this video call and the degree of audio-visual quality degradation caused by audio-video delay and synchronization, respectively; and
determining the video call quality of this video call from the audio-visual quality and the degree of degradation caused by audio-video delay and synchronization, according to the following formula:
MMq = m1*MMSV + m2*MMT + m3*MMSV*MMT + m4
wherein MMSV denotes the audio-visual quality;
MMT denotes the degree of audio-visual quality degradation caused by audio-video delay and synchronization;
m1, m2, m3, m4 are correlation coefficients that depend on the video display size and the call task of this video call;
MMq denotes the video call quality of this video call.
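The integration formula of claim 13 is given explicitly and is straightforward to compute; the coefficient values below are hypothetical, since the patent leaves m1..m4 to be fitted per display size and call task:

```python
def multimedia_quality(mm_sv, mm_t, m1, m2, m3, m4):
    """MMq = m1*MMSV + m2*MMT + m3*MMSV*MMT + m4 (claim 13's formula)."""
    return m1 * mm_sv + m2 * mm_t + m3 * mm_sv * mm_t + m4
```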
14. The method of claim 13, characterized in that the audio-visual quality of this video call is determined from the video quality and the audio quality according to the following formula:
MMSV = m5*Sq + m6*Vq + m7*Sq*Vq + m8
wherein MMSV is the audio-visual quality of this video call;
Sq is the audio quality;
Vq is the video quality;
m5, m6, m7, m8 are correlation coefficients that depend on the video display size and the call task of this video call.
15. The method of claim 13, characterized in that the degree of audio-visual quality degradation caused by the audio-video delay and synchronization of this video call is determined from the video quality and the audio quality according to the following formulas:
MMT = max{AD + MS, 1}
wherein AD = m9*(TS + TV) + m10;
TS is the audio end-to-end delay in this video call;
TV is the video end-to-end delay in this video call;
AD is the absolute audio-visual delay in this video call;
MS is the audio-visual media synchronization value in this video call;
MMT is the degree of audio-visual quality degradation caused by audio-video delay and synchronization in this video call;
m9, m10, m11, m12, m13, m14 are correlation coefficients that depend on the video display size and the call task of this video call.
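The two formulas claim 15 does give can be sketched directly; the MS expression (presumably involving m11..m14) is not reproduced in this extraction, so MS is taken as an input here:

```python
def delay_sync_degradation(t_s, t_v, ms, m9, m10):
    """MMT = max(AD + MS, 1), with AD = m9*(TS + TV) + m10 (claim 15).

    t_s, t_v: audio and video end-to-end delays; ms: media
    synchronization value (its own formula is omitted in the source)."""
    ad = m9 * (t_s + t_v) + m10
    return max(ad + ms, 1.0)
```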
16. The method of claim 13, characterized in that the correlation coefficients m1, m2, m3, m4 are determined by the following method:
when a video call is detected, extracting several video call samples;
determining subjective scores for the video call samples; and
obtaining the correlation coefficients m1, m2, m3, m4 for evaluating the overall quality of this video call, based on the audio-visual quality and the degree of audio-visual quality degradation caused by audio-video delay and synchronization, combined with the subjective scores of the video call samples.
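Claim 16 fits the coefficients against subjective scores without specifying the fitting procedure. A minimal sketch using ordinary least squares on the design matrix [MMSV, MMT, MMSV*MMT, 1] (numpy assumed; the patent does not name a regression method):

```python
import numpy as np

def fit_coefficients(mm_sv, mm_t, subjective_scores):
    """Fit m1..m4 so that m1*MMSV + m2*MMT + m3*MMSV*MMT + m4 best
    matches the subjective scores, via ordinary least squares."""
    mm_sv = np.asarray(mm_sv, float)
    mm_t = np.asarray(mm_t, float)
    X = np.column_stack([mm_sv, mm_t, mm_sv * mm_t,
                         np.ones_like(mm_sv)])
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(subjective_scores, float),
                                 rcond=None)
    return coeffs  # m1, m2, m3, m4
```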
17. A video call quality assessment device, characterized by comprising:
an acquiring unit, configured to obtain the video parameters and audio parameters of this video call, respectively, when a video call is detected;
a first determination unit, configured to determine the video quality of this video call according to the video parameters, and to determine the audio quality of this video call according to the audio parameters; and
a second determination unit, configured to determine the video call quality of this video call according to the video quality and the audio quality.
18. The device of claim 17, characterized in that
the acquiring unit is specifically configured to: collect Real-time Transport Protocol (RTP) data packets of this video call using at least one interface, each interface being arranged between any two adjacent transmission nodes in the data packet transmission link; decode the collected video call RTP data packets according to the communication protocol of each interface to obtain decoded video call RTP data packets; and obtain the video parameters and audio parameters of this video call, respectively, from the decoded video call RTP data packets using a preset video codec standard.
19. The device of claim 18, characterized in that the video parameters and the audio parameters include coding parameters and transmission performance parameters, wherein the coding parameters include at least one of: codec type, end-to-end delay, bit rate, frame rate, and maximum transmission bit rate; and the transmission performance parameters include at least one of: packet loss rate and transmission rate.
20. The device of claim 19, characterized in that
the acquiring unit is specifically configured to obtain the video end-to-end delay by the following method: obtaining the one-way delay of the video parameters; and determining the video end-to-end delay according to the one-way delay of the video parameters.
21. The device of claim 18, characterized in that
the acquiring unit is further configured to obtain the audio end-to-end delay by the following method: obtaining the one-way delay of the audio parameters; and determining the audio end-to-end delay according to the one-way delay of the audio parameters.
22. The device of claim 20 or 21, characterized in that
the acquiring unit is specifically configured to determine the one-way delay by the following method: determining the Network Time Protocol (NTP) time difference of two adjacent video call RTP data packets; determining the timestamp difference of the two adjacent video call RTP data packets; and determining the one-way delay according to the determined NTP time difference and timestamp difference.
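Claim 22 derives the one-way delay from the NTP time difference and the RTP timestamp difference of two adjacent packets. A sketch under one interpretation of that step (the function name is illustrative): the sender-clock spacing comes from the RTP timestamps, the receiver-observed spacing from the NTP capture times, and their difference is the change in one-way delay between the two packets:

```python
def one_way_delay_delta(ntp1, ntp2, rtp_ts1, rtp_ts2, clock_rate):
    """Change in one-way delay between two adjacent RTP packets.

    ntp1, ntp2:       NTP capture times at the probe, in seconds;
    rtp_ts1, rtp_ts2: RTP timestamps (sender media clock ticks);
    clock_rate:       RTP clock rate in Hz (e.g. 90000 for video).

    Receiver-side gap minus sender-side gap gives the delay variation;
    accumulating it tracks the one-way delay trend."""
    recv_gap = ntp2 - ntp1
    send_gap = (rtp_ts2 - rtp_ts1) / clock_rate
    return recv_gap - send_gap
```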
23. The device of claim 19, characterized in that
the first determination unit is specifically configured to determine the video quality of this video call from the video parameters according to the following formula:
wherein Vq is the video quality of this video call;
Icoding is the video quality under coding distortion;
PplV is the packet loss rate;
DPplV is the degree of robustness of the video quality against packet loss.
24. The device of claim 23, characterized in that
the first determination unit is specifically configured to determine the video quality under coding distortion, Icoding, by the following method:
wherein Ofr is the optimal frame rate at which the video quality is maximized at the current video bit rate;
IOfr is the maximum value of the video quality at the current video bit rate;
FrV is the current frame rate;
DFrV is the degree of robustness of the video quality against frame rate.
25. The device of claim 19, characterized in that
the first determination unit is specifically configured to determine a quality index of the audio quality according to the audio parameters, and to determine the audio quality of this video call from the quality index according to the following formula:
wherein Qx is the quality index;
Sq is the audio quality of the video call.
26. The device of claim 25, characterized in that
the first determination unit is specifically configured to determine the quality index of the audio quality from the audio parameters according to the following formula:
wherein Idte,WB is the degradation caused by talker echo in this video call;
Ie-eff,WB is the degradation caused by speech coding and packet loss in this video call;
Qx is the quality index of the audio quality.
27. The device of claim 26, characterized in that
the first determination unit is specifically configured to determine Idte,WB according to the following formula:
wherein Re,WB has the expression Re,WB = 80 + 2.5*(TERV,WB - 14), and TERV,WB has its own expression; and
TELR is the talker echo loudness rating;
Ts is the audio end-to-end delay in this video call.
28. The device of claim 26, characterized in that
the first determination unit is specifically configured to determine Ie-eff,WB according to the following formula:
wherein IeS,WB is the speech coding distortion factor;
PplS is the speech packet loss rate in this video call;
BplS is the robustness of the speech packets against packet loss in this video call.
29. The device of claim 23 or 25, characterized in that
the second determination unit is specifically configured to: determine, according to the video quality and the audio quality, the audio-visual quality of this video call and the degree of audio-visual quality degradation caused by audio-video delay and synchronization, respectively; and determine the video call quality of this video call from the audio-visual quality and the degree of degradation caused by audio-video delay and synchronization, according to the following formula:
MMq = m1*MMSV + m2*MMT + m3*MMSV*MMT + m4
wherein MMSV denotes the audio-visual quality;
MMT denotes the degree of audio-visual quality degradation caused by audio-video delay and synchronization;
m1, m2, m3, m4 are correlation coefficients that depend on the video display size and the call task of this video call;
MMq denotes the video call quality of this video call.
30. The device of claim 29, characterized in that
the second determination unit is specifically configured to determine the audio-visual quality of this video call from the video quality and the audio quality according to the following formula:
MMSV = m5*Sq + m6*Vq + m7*Sq*Vq + m8
wherein MMSV is the audio-visual quality of this video call;
Sq is the audio quality;
Vq is the video quality;
m5, m6, m7, m8 are correlation coefficients that depend on the video display size and the call task of this video call.
31. The device of claim 29, characterized in that
the second determination unit is specifically configured to determine the degree of audio-visual quality degradation caused by the audio-video delay and synchronization of this video call from the video quality and the audio quality according to the following formulas:
MMT = max{AD + MS, 1}
wherein AD = m9*(TS + TV) + m10;
TS is the audio end-to-end delay in this video call;
TV is the video end-to-end delay in this video call;
AD is the absolute audio-visual delay in this video call;
MS is the audio-visual media synchronization value in this video call;
MMT is the degree of audio-visual quality degradation caused by audio-video delay and synchronization in this video call;
m9, m10, m11, m12, m13, m14 are correlation coefficients that depend on the video display size and the call task of this video call.
32. The device of claim 29, characterized in that
the second determination unit is specifically configured to: extract several video call samples when a video call is detected; determine subjective scores for the video call samples; and obtain the correlation coefficients m1, m2, m3, m4 for evaluating the overall quality of this video call, based on the audio-visual quality and the degree of audio-visual quality degradation caused by audio-video delay and synchronization, combined with the subjective scores of the video call samples.
33. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; characterized in that the processor, when executing the program, implements the video call quality assessment method of any one of claims 1 to 16.
34. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the program implements the steps of the video call quality assessment method of any one of claims 1 to 16.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710614327.1A CN109302603A (en) | 2017-07-25 | 2017-07-25 | A kind of video speech quality appraisal procedure and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109302603A true CN109302603A (en) | 2019-02-01 |
Family
ID=65167398
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710614327.1A Pending CN109302603A (en) | 2017-07-25 | 2017-07-25 | A kind of video speech quality appraisal procedure and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109302603A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101911714A (en) * | 2008-01-08 | 2010-12-08 | 日本电信电话株式会社 | Image quality estimation device, method, and program |
CN101547259A (en) * | 2009-04-30 | 2009-09-30 | 华东师范大学 | VoIP test method based on analog data flow |
CN103379358A (en) * | 2012-04-23 | 2013-10-30 | 华为技术有限公司 | Method and device for assessing multimedia quality |
CN103634577A (en) * | 2012-08-22 | 2014-03-12 | 华为技术有限公司 | Multimedia quality monitoring method and apparatus |
CN104539943A (en) * | 2012-08-22 | 2015-04-22 | 华为技术有限公司 | Method and device for monitoring multimedia quality |
Non-Patent Citations (1)
Title |
---|
HAYASHI T等: "Multimedia quality integration function for videophone services", 《IEEE GLOBAL TELECOMMUNICATIONS CONFERENCE》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109714557A (en) * | 2017-10-25 | 2019-05-03 | 中国移动通信集团公司 | Method for evaluating quality, device, electronic equipment and the storage medium of video calling |
CN111479109A (en) * | 2020-03-12 | 2020-07-31 | 上海交通大学 | Video quality evaluation method, system and terminal based on audio-visual combined attention |
CN111479105B (en) * | 2020-03-12 | 2021-06-04 | 上海交通大学 | Video and audio joint quality evaluation method and device |
CN111479107B (en) * | 2020-03-12 | 2021-06-08 | 上海交通大学 | No-reference audio and video joint quality evaluation method based on natural audio and video statistics |
CN111479106B (en) * | 2020-03-12 | 2021-06-29 | 上海交通大学 | Two-dimensional quality descriptor fused audio and video joint quality evaluation method and terminal |
CN113840131A (en) * | 2020-06-08 | 2021-12-24 | 中国移动通信有限公司研究院 | Video call quality evaluation method and device, electronic equipment and readable storage medium |
CN112118442A (en) * | 2020-09-18 | 2020-12-22 | 平安科技(深圳)有限公司 | AI video call quality analysis method, device, computer equipment and storage medium |
WO2021174879A1 (en) * | 2020-09-18 | 2021-09-10 | 平安科技(深圳)有限公司 | Ai video call quality analysis method and apparatus, computer device, and storage medium |
CN114258069A (en) * | 2021-12-28 | 2022-03-29 | 北京东土拓明科技有限公司 | Voice call quality evaluation method and device, computing equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109302603A (en) | A kind of video speech quality appraisal procedure and device | |
Jelassi et al. | Quality of experience of VoIP service: A survey of assessment approaches and open issues | |
Chen et al. | A lightweight end-side user experience data collection system for quality evaluation of multimedia communications | |
JP4965659B2 (en) | How to determine video quality | |
CN103957216B (en) | Based on characteristic audio signal classification without reference audio quality evaluating method and system | |
US11748643B2 (en) | System and method for machine learning based QoE prediction of voice/video services in wireless networks | |
CN102057634B (en) | Audio quality estimation method and audio quality estimation device | |
CN102014126B (en) | Voice experience quality evaluation platform based on QoS (quality of service) and evaluation method | |
CN113067808B (en) | Data processing method, live broadcast method, authentication server and live broadcast data server | |
da Silva et al. | Quality assessment of interactive voice applications | |
US20100329360A1 (en) | Method and apparatus for svc video and aac audio synchronization using npt | |
CN109714557A (en) | Method for evaluating quality, device, electronic equipment and the storage medium of video calling | |
Goudarzi et al. | Audiovisual quality estimation for video calls in wireless applications | |
Jelassi et al. | A study of artificial speech quality assessors of VoIP calls subject to limited bursty packet losses | |
Wuttidittachotti et al. | Quality evaluation of mobile networks using VoIP applications: a case study with Skype and LINE based-on stationary tests in Bangkok | |
JP2006324865A (en) | Device, method and program for estimating network communication service satisfaction level | |
Osmanovic et al. | Impact of media-related SIFs on QoE for H. 265/HEVC video streaming | |
DE602004004577T2 (en) | Method and device for determining the language latency by a network element of a communication network | |
Daengsi et al. | Speech quality assessment of VoIP: G. 711 VS G. 722 based on interview tests with Thai users | |
Saidi et al. | Audiovisual quality study for videoconferencing on IP networks | |
Duque et al. | Quality assessment for video streaming P2P application over wireless mesh network | |
Rodriguez et al. | Assessment of quality-of-experience in telecommunication services | |
Kumar et al. | Comparison of popular video conferencing apps using client-side measurements on different backhaul networks | |
CN106993308A (en) | A kind of QoS of voice monitoring method, equipment and the system of VoLTE networks | |
Casas et al. | End-2-end evaluation of ip multimedia services, a user perceived quality of service approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190201 |