CN106782610B - Sound quality testing method for an audio conference - Google Patents

Sound quality testing method for an audio conference

Info

Publication number
CN106782610B
Authority
CN
China
Prior art keywords
audio
source
local
synthesis
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611004655.1A
Other languages
Chinese (zh)
Other versions
CN106782610A (en)
Inventor
洪剑平
陈锦辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Xingwang Intelligent Technology Co., Ltd
Original Assignee
Fujian Star-Net Wisdom Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Star-Net Wisdom Technology Co Ltd
Priority to CN201611004655.1A
Publication of CN106782610A
Application granted
Publication of CN106782610B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention provides a sound quality testing method for an audio conference. The method is as follows: obtain the source audio corresponding to each member participating in the audio conference; each member's audio system constructs, based on the source audio, a composite audio that approximates the target audio, that is, after deducting its own source audio, each member forms the composite audio by mixing the source audio of the other members, this composite audio being the one corresponding to that member; each member then applies ITU-T Recommendation P.862 to perform a PESQ calculation on its own composite audio and the target audio data sent by the audio conference server, obtains the current real-time sound quality, and records the sound quality values, thereby completing the sound quality test. The method is simple to carry out, and its test comparisons are accurate.

Description

Sound quality testing method for an audio conference
Technical field
The present invention relates to the technical field of communication equipment, and more particularly to a sound quality testing method for an audio conference.
Background technique
Among existing sound quality testing models, the most commonly used is PESQ (Perceptual Evaluation of Speech Quality), a method for assessing perceived speech quality. It is the objective MOS (Mean Opinion Score) evaluation method specified by ITU-T Recommendation P.862. To use the PESQ test method, both a source audio and a target audio are required: the PESQ score between the source audio and the target audio is computed to judge the difference in sound quality before and after transmission, from which the MOS value is obtained. The MOS value is the quantified result of the current sound quality.
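As an illustration of the PESQ computation referred to above, the following minimal sketch scores a reference signal against a degraded one. It assumes the open-source `pesq` Python package (an implementation of ITU-T P.862, not part of this patent) together with `soundfile` for WAV I/O; the file names are placeholders.

```python
# Minimal PESQ/MOS sketch, assuming `pip install pesq soundfile`.
import soundfile as sf
from pesq import pesq

ref, fs = sf.read("source.wav")        # reference (source) audio, 16 kHz mono assumed
deg, fs_deg = sf.read("degraded.wav")  # degraded (target) audio at the same rate
assert fs == fs_deg == 16000, "wideband PESQ expects 16 kHz input"

# mode="wb" selects wideband P.862.2; mode="nb" would select narrowband P.862.
mos = pesq(fs, ref, deg, mode="wb")
print(f"PESQ MOS-LQO: {mos:.2f}")
```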
In an audio conference, all members join the same meeting, and each member hears only audio other than its own; that is, what each member hears is the mixed audio of the remaining members. As a result, the audio heard by each member is particular to that member, and no two members hear the same thing.
Because the conference server synthesizes the audio of all participants, and what each participant hears is the mix of the other participants, the audio heard by each person is not the same. When testing the sound quality of an audio conference, the source audio of each participant is therefore easy to obtain, but the target audio is already mixed data that usually differs greatly from the source audio. It is consequently difficult to assess sound quality with a PESQ calculation based on ITU-T Recommendation P.862; if such an evaluation is carried out directly, the resulting quality score is bound to be poor.
In view of the above situation, the source audio of each participant is easy to obtain while the target audio is hard to obtain; or, looked at the other way, the target audio is easy to obtain but no source audio corresponds to it. It is therefore necessary to construct a source audio that matches the target audio.
Summary of the invention
The technical problem to be solved by the present invention is to provide a sound quality testing method for an audio conference that is simple to carry out and whose test comparisons are accurate.
The present invention is implemented as follows: a sound quality testing method for an audio conference, the method being as follows:
Obtain the source audio corresponding to each member participating in the audio conference;
Each member constructs, by means of the audio system and based on the source audio, a composite audio that approximates the target audio; that is, after deducting its own source audio, each member forms the composite audio by mixing the source audio of the other members, this composite audio being the one corresponding to that member;
Each member applies ITU-T Recommendation P.862 to perform a PESQ calculation on its own composite audio and the target audio data sent by the audio conference server, obtains the current real-time sound quality, and records the sound quality values, thereby completing the sound quality test.
Further, the method is specifically as follows: after the meeting starts, obtain the source audio corresponding to each member participating in the audio conference; the audio conference server needs to perform a time synchronization with the local synthesis of each member, informing the local synthesis from which time point it must start synthesizing; once informed, the local synthesis starts synthesizing locally according to the source audio of each member, while the server starts parsing the packets sent from the local source audio and performs its own synthesis operation; when, at some time point during synthesis, a member is subjected to an operation such as being muted, that member itself does not need to be notified or to perform any further action, but the local synthesis of the other members must be notified; upon receiving notice of the operation muting that member, the local synthesis of the other members must respond promptly and substitute that member's source data with mute audio.
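The time-synchronization step can be pictured with the small sketch below. The message layout and names (`SyncNotice`, `start_time_ms`) are hypothetical, since the patent does not define a wire format; the point is only that the agreed start time is converted into a sample offset so that the locally built composite lines up with the server-side mix.

```python
# Sketch of the start-time synchronization step (hypothetical message layout).
from dataclasses import dataclass

SAMPLE_RATE = 16000  # assumed sample rate

@dataclass
class SyncNotice:
    """Server -> local synthesizer: start mixing at this point in conference time."""
    start_time_ms: int

class LocalSynthesizer:
    def __init__(self, member_id: str):
        self.member_id = member_id
        self.start_sample = 0          # offset into every member's source audio

    def on_sync(self, notice: SyncNotice) -> None:
        # Convert the agreed start time into a sample offset.
        self.start_sample = notice.start_time_ms * SAMPLE_RATE // 1000

synth = LocalSynthesizer("A")
synth.on_sync(SyncNotice(start_time_ms=250))
print(synth.start_sample)              # 4000 samples into the source audio
```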
Further, since there is also network delay between the audio conference server and the local side, the notification data must additionally state during which period the member concerned was muted; the local synthesis module then applies the corresponding processing synchronously to that member's data for the muted period.
Further, after the audio conference server sends back the synthesized RTP messages, the local side receives the RTP message data returned by the audio conference server; this RTP message data is the target audio data sent by the audio conference server.
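The following sketch shows one way the local side could recover the target audio bytes from the returned RTP messages. It is a simplification under stated assumptions: RTP version 2, no header extension, and a raw audio payload (a real deployment would typically carry an encoded payload such as G.711 or Opus that must be decoded first).

```python
# Extract the audio payload from RTP packets returned by the conference server.
RTP_FIXED_HEADER = 12  # bytes: V/P/X/CC, M/PT, sequence number, timestamp, SSRC

def rtp_payload(packet: bytes) -> bytes:
    """Return the audio payload of a single RTP packet (no extensions assumed)."""
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version-2 packet")
    csrc_count = packet[0] & 0x0F
    header_len = RTP_FIXED_HEADER + 4 * csrc_count   # skip any CSRC identifiers
    return packet[header_len:]

# Rebuild the target audio byte stream in arrival order; packets would normally
# come from a UDP socket, an empty list stands in for them here.
target_audio = bytearray()
for pkt in []:
    target_audio += rtp_payload(pkt)
```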
The present invention has the following advantages: 1. it exploits the characteristics of an existing audio conference to construct a source audio suited to the target audio; 2. it uses the constructed source audio to calculate the sound quality of each participant on the basis of ITU-T Recommendation P.862; 3. it applies synthesis adjustments to the constructed source audio for detailed parameters such as network delay, achieving a more accurate sound quality test.
Detailed description of the invention
The present invention is further illustrated in conjunction with the embodiments with reference to the accompanying drawings.
Fig. 1 is a flow diagram of the method of the present invention.
Fig. 2 is a frame diagram of audio synthesis in an audio conference.
Fig. 3 is a frame diagram of constructing the source audio and carrying out the test assessment.
Fig. 4 is a timing diagram of the conference control operations.
Fig. 5 is a schematic diagram of the sound quality assessment flow for a single user.
Specific embodiment
Referring to Fig. 1, a sound quality testing method for an audio conference is as follows: obtain the source audio corresponding to each member participating in the audio conference;
Each member constructs, by means of the audio system and based on the source audio, a composite audio that approximates the target audio; that is, after deducting its own source audio, each member forms the composite audio by mixing the source audio of the other members, this composite audio being the one corresponding to that member;
Each member applies ITU-T Recommendation P.862 to perform a PESQ calculation on its own composite audio and the target audio data sent by the audio conference server, obtains the current real-time sound quality, and records the sound quality values, thereby completing the sound quality test.
After the audio conference server sends back the synthesized RTP messages, the local side receives the RTP message data returned by the audio conference server; this RTP message data is the target audio data sent by the audio conference server.
The method is specifically as follows: after the meeting starts, obtain the source audio corresponding to each member participating in the audio conference; the audio conference server needs to perform a time synchronization with the local synthesis of each member, informing the local synthesis from which time point it must start synthesizing; once informed, the local synthesis starts synthesizing locally according to the source audio of each member, while the server starts parsing the packets sent from the local source audio and performs its own synthesis operation; when, at some time point during synthesis, a member is subjected to an operation such as being muted, that member itself does not need to be notified or to perform any further action, but the local synthesis of the other members must be notified; upon receiving notice of the operation muting that member, the local synthesis of the other members must respond promptly and substitute that member's source data with mute audio. Since there is also network delay between the audio conference server and the local side, the notification data must additionally state during which period the member was muted; the local synthesis module then applies the corresponding processing synchronously to that member's data for the muted period.
The present invention is further described below with reference to a specific example:
1. The framework of audio synthesis in an audio conference
The synthesis framework of a mainstream audio conference is shown in Fig. 2 and briefly explained here: in the example of Fig. 2, a conference has four participants, each of whom speaks during the meeting. The audio heard by user A is the mix of the speech audio of users B, C and D; the audio heard by user B is the mix of the speech audio of users A, C and D; likewise, user C hears the mix of the speech audio of users A, B and D, and user D hears the mix of the speech audio of users A, B and C. When a sound quality test of the audio conference is carried out, fixed source audio is usually used: user A uses user A's source audio, user B uses user B's source audio, user C uses user C's source audio, and user D uses user D's source audio.
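The mixing rule of Fig. 2 can be expressed in a few lines. The sketch below is illustrative only, assuming equal-length float sample arrays in the range [-1, 1]; clipping keeps the sum in range.

```python
import numpy as np

def mix_for(listener: str, sources: dict[str, np.ndarray]) -> np.ndarray:
    """Mix of everyone except the listener, clipped to the [-1, 1] sample range."""
    others = [audio for member, audio in sources.items() if member != listener]
    return np.clip(np.sum(others, axis=0), -1.0, 1.0)

# One second of dummy audio per participant (16 kHz, float samples).
sources = {m: np.random.uniform(-0.2, 0.2, 16000) for m in "ABCD"}
heard_by_a = mix_for("A", sources)   # user A hears the mix of B, C and D only
```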
2. The framework for constructing the source audio and carrying out the test assessment
As described above, the source audio of the four users A, B, C and D can already be obtained. A core of the present patent application is to construct, based on the source audio, a composite audio that approximates the target audio; a PESQ calculation is then performed on the composite audio and the target audio, from which the sound quality of each participant can be obtained. As shown in Fig. 3, the synthesis under this framework is fairly simple, namely:
The composite audio of user A is the mixed audio of user B's source audio, user C's source audio and user D's source audio;
The composite audio of user B is the mixed audio of user A's source audio, user C's source audio and user D's source audio;
The composite audio of user C is the mixed audio of user A's source audio, user B's source audio and user D's source audio;
The composite audio of user D is the mixed audio of user A's source audio, user B's source audio and user C's source audio. However, network transmission delay must also be taken into account, as well as certain synthesis characteristics of the corresponding audio conference: for example, the gain of some members may be raised and the gain of other members lowered, among other special processing. During a test, the audio conference is therefore required to inform the local synthesis how the source audio must be constructed; a sketch of this construction is given below.
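The following sketch illustrates one possible form of this local construction: the listener's own source is deducted, and per-member gain and delay adjustments reported by the conference are applied before mixing. The gain and delay values shown are hypothetical examples, not parameters defined by the patent.

```python
import numpy as np

def build_composite(listener: str,
                    sources: dict[str, np.ndarray],
                    gains: dict[str, float],
                    delays: dict[str, int]) -> np.ndarray:
    """Composite audio for `listener`: own source deducted, others gain/delay adjusted."""
    length = min(len(a) for a in sources.values())
    mix = np.zeros(length)
    for member, audio in sources.items():
        if member == listener:
            continue                                   # deduct the listener's own source
        d = delays.get(member, 0)                      # delay in samples
        shifted = np.concatenate([np.zeros(d), audio])[:length]
        mix += gains.get(member, 1.0) * shifted
    return np.clip(mix, -1.0, 1.0)

sources = {m: np.random.uniform(-0.2, 0.2, 16000) for m in "ABCD"}
composite_a = build_composite("A", sources,
                              gains={"B": 1.2, "C": 0.8},   # example gain adjustments
                              delays={"D": 160})            # ~10 ms at 16 kHz
```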
3. Conference control timing
The timing diagram of Fig. 4 briefly illustrates a control situation that can occur in a meeting under the test framework. First, after the meeting starts, the audio conference server needs to perform a time synchronization with the local synthesis of each user, informing it from which time point it must start synthesizing. Once informed, each local synthesis starts synthesizing locally according to the source audio of each user, and at the same time the audio conference server starts parsing the packets sent from the local source audio and performing its own synthesis. When user C is muted at some time point during synthesis, user C cannot hear its own sound anyway, so C does not need to be notified or to perform any operation. However, the local synthesis of user A and user B must be notified: upon receiving notice of the operation muting user C, the local synthesis of users A and B must respond promptly and substitute user C's source data with mute audio. Furthermore, since there is network delay between the audio conference server and the local side, the notification data must also state during which period user C was muted; the local synthesis module then applies the corresponding processing synchronously to user C's data for the muted period.
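One way a local synthesizer could act on such a notification is sketched below. The `MuteNotice` fields are hypothetical; the essential point, as described above, is that the server reports the muted interval explicitly, so the substitution does not depend on when the network-delayed notice happens to arrive.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

SAMPLE_RATE = 16000

@dataclass
class MuteNotice:
    member: str              # who was muted, e.g. "C"
    start_ms: int            # conference time at which the mute took effect
    end_ms: Optional[int]    # None while the member is still muted

def apply_mute(source: np.ndarray, notice: MuteNotice) -> np.ndarray:
    """Replace the muted member's source data with silence for the reported period."""
    start = notice.start_ms * SAMPLE_RATE // 1000
    end = len(source) if notice.end_ms is None else notice.end_ms * SAMPLE_RATE // 1000
    muted = source.copy()
    muted[start:end] = 0.0
    return muted

c_source = np.random.uniform(-0.2, 0.2, 5 * SAMPLE_RATE)   # 5 s of dummy audio
c_for_mix = apply_mute(c_source, MuteNotice("C", start_ms=2000, end_ms=3500))
```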
4. Sound quality assessment of a single user
As shown in Fig. 5, this is the user-side processing of sound quality: the final assessment is calculated and evaluated on the user side (i.e., locally at the user). After the audio conference server sends back the synthesized RTP messages, the local side takes the RTP data returned by the audio conference server (that is, the target data sent back by the server) together with the locally synthesized source data and performs a PESQ calculation according to Recommendation P.862. The current real-time sound quality can thus be computed, and the sound quality values are recorded.
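A possible shape of this per-user assessment loop is sketched below: the locally built composite is compared against the target audio recovered from the server's RTP stream, window by window, and the MOS values are recorded. The `pesq` package, the 16 kHz rate and the 8-second window are assumptions made for illustration.

```python
import numpy as np
from pesq import pesq

SAMPLE_RATE = 16000
WINDOW = 8 * SAMPLE_RATE            # score 8-second windows (an arbitrary choice)

def quality_trace(composite: np.ndarray, target: np.ndarray) -> list[float]:
    """PESQ MOS per window between the local composite and the received target."""
    scores = []
    usable = min(len(composite), len(target))
    for start in range(0, usable - WINDOW + 1, WINDOW):
        ref = composite[start:start + WINDOW]
        deg = target[start:start + WINDOW]
        scores.append(pesq(SAMPLE_RATE, ref, deg, mode="wb"))
    return scores

# mos_values = quality_trace(composite_a, target_a)   # recorded for the test report
```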
Although specific embodiments of the present invention have been described above, those familiar with the art should understand that the described specific embodiments are merely exemplary and do not limit the scope of the present invention; equivalent modifications and variations made by those skilled in the field in accordance with the spirit of the present invention shall fall within the scope of protection claimed by the present invention.

Claims (3)

1. A sound quality testing method for an audio conference, characterized in that the method is as follows:
obtaining the source audio corresponding to each member participating in the audio conference;
each member constructing, by means of the audio system and based on the source audio, a composite audio that approximates the target audio, that is, after deducting its own source audio, each member forming the composite audio by mixing the source audio of the other members, this composite audio being the composite audio corresponding to the member itself;
each member applying ITU-T Recommendation P.862 to perform a PESQ calculation on its own composite audio and the target audio data sent by the audio conference server, obtaining the current real-time sound quality, and recording the sound quality values, thereby completing the sound quality test;
after the meeting starts, obtaining the source audio corresponding to each member participating in the audio conference; the audio conference server performing a time synchronization with the local synthesis of each member, informing the local synthesis from which time point it must start synthesizing; once informed, the local synthesis starting to synthesize locally according to the source audio of each member, while the server starts parsing the packets sent from the local source audio and performs its own synthesis operation; when, at some time point during synthesis, a member is subjected to an operation such as being muted, that member itself not needing to be notified or to perform any further action, but the local synthesis of the other members having to be notified; upon receiving notice of the operation muting that member, the local synthesis of the other members responding promptly and substituting that member's source data with mute audio.
2. The sound quality testing method for an audio conference according to claim 1, characterized in that: since there is also network delay between the audio conference server and the local side, the notification data additionally states during which period the member concerned was muted, and the local synthesis module then applies the corresponding processing synchronously to that member's data for the muted period.
3. The sound quality testing method for an audio conference according to claim 1, characterized in that: after the audio conference server sends back the synthesized RTP messages, the local side receives the RTP message data returned by the audio conference server, the RTP message data being the target audio data sent by the audio conference server.
CN201611004655.1A 2016-11-15 2016-11-15 Sound quality testing method for an audio conference Active CN106782610B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611004655.1A CN106782610B (en) 2016-11-15 2016-11-15 Sound quality testing method for an audio conference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611004655.1A CN106782610B (en) 2016-11-15 2016-11-15 Sound quality testing method for an audio conference

Publications (2)

Publication Number Publication Date
CN106782610A CN106782610A (en) 2017-05-31
CN106782610B (en) 2019-09-20

Family

ID=58968339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611004655.1A Active CN106782610B (en) Sound quality testing method for an audio conference

Country Status (1)

Country Link
CN (1) CN106782610B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111405122B (en) * 2020-03-18 2021-09-24 苏州科达科技股份有限公司 Audio call testing method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217759A (en) * 2007-12-28 2008-07-09 中国移动通信集团浙江有限公司 A ringtone quality detecting method of CRBT
CN103067217A (en) * 2012-12-14 2013-04-24 北京思特奇信息技术股份有限公司 Indicating system and method of communication network service quality
CN103581934A (en) * 2012-07-19 2014-02-12 中兴通讯股份有限公司 Terminal voice quality evaluation method and terminal
CN103841275A (en) * 2013-07-24 2014-06-04 同济大学 Interactive audio experience quality evaluation platform and method based on QoS
CN105788611A (en) * 2016-02-25 2016-07-20 成都普创通信技术股份有限公司 Audio quality online monitoring system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7165025B2 (en) * 2002-07-01 2007-01-16 Lucent Technologies Inc. Auditory-articulatory analysis for speech quality assessment
US8713440B2 (en) * 2008-02-13 2014-04-29 Microsoft Corporation Techniques to manage communications resources for a multimedia conference event
US9325838B2 (en) * 2014-07-22 2016-04-26 International Business Machines Corporation Monitoring voice over internet protocol (VoIP) quality during an ongoing call

Also Published As

Publication number Publication date
CN106782610A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
Jelassi et al. Quality of experience of VoIP service: A survey of assessment approaches and open issues
EP2901669B1 (en) Near-end indication that the end of speech is received by the far end in an audio or video conference
Takahashi et al. Perceptual QoS assessment technologies for VoIP
US7664231B2 (en) Method and device for quality evaluation of an audio signal and device and method for obtaining a quality evaluation result
TW200828867A (en) Communication system
EP2973559B1 (en) Audio transmission channel quality assessment
CN105791738A (en) Method and device for adjusting video window in video conference
Daengsi et al. QoE modeling for voice over IP: simplified E-model enhancement utilizing the subjective MOS prediction model: a case of G. 729 and Thai users
CN104575521A (en) Method for evaluating voice quality of LTE communication system
CN106782610B (en) Sound quality testing method for an audio conference
DE60132196T2 (en) TEST SIGNAL
DE60118922T2 (en) MEASURE THE TRUE LANGUAGE QUALITY DURING OPERATION BY MEASURING OBJECTIVE ERROR PARAMETER
KR101511795B1 (en) Notification of dropped audio in a teleconference call
EP2194525A1 (en) Conversational subjective quality test tool
Möller et al. Telephone speech quality prediction: towards network planning and monitoring models for modern network scenarios
DE60307737T2 (en) Notification for waiting calls
CN100516788C (en) Acoustic automatic gaining effect testing method and system
CN109246664A (en) A kind of tone testing method and device
Skowronek et al. Quality assessment of asymmetric multiparty telephone conferences: a systematic method from technical degradations to perceived impairments.
Daengsi VoIP quality measurement: recommendation of MOS and enhanced objective measurement method for standard Thai spoken language
Daengsi et al. Speech quality assessment of VoIP: G. 711 VS G. 722 based on interview tests with Thai users
Wuttidittachotti et al. Subjective MOS model and simplified E-model enhancement for Skype associated with packet loss effects: a case using conversation-like tests with Thai users
Gallardo et al. Variable voice likability affecting subjective speech quality assessments
DE602004004824T2 (en) Automatic treatment of conversation groups
CN108809915B (en) Session management method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Software Park Siming District of Xiamen city in Fujian province 361000 two sunrise Road No. 56 unit 1001

Patentee after: Fujian Xingwang Intelligent Technology Co., Ltd

Address before: Software Park Siming District of Xiamen city in Fujian province 350000 two sunrise Road No. 56 unit 1001

Patentee before: FUJIAN STAR-NET WISDOM TECHNOLOGY Co.,Ltd.
