CN113923395A

CN113923395A - Method, equipment and storage medium for improving conference quality

Info

Publication number: CN113923395A
Application number: CN202010644423.2A
Authority: CN
Inventors: 刘成芳
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2020-07-07
Filing date: 2020-07-07
Publication date: 2022-01-11

Abstract

The invention discloses a method, a device and a storage medium for improving the quality of a network conference call, and belongs to the technical field of communications. The method comprises the following steps: receiving state information sent by each terminal in a network conference in which a plurality of terminals participate; determining a main speaker terminal in the network conference according to the state information of each terminal; and performing conference quality improvement processing according to the determined main speaker terminal.

Description

Method, equipment and storage medium for improving conference quality

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a method, a device, and a storage medium for improving conference quality.

Background

In practice, many work matters can only be settled after discussion, but because of the epidemic most people have to work from home and can only meet by network conference call. In a network conference, several parties often speak at once, and problems such as weak signal, network traffic and overlapping speech all degrade the quality of the conference. Current conference phones either apply no special processing to the main speaker's speech, or mute the other speakers once the main speaker is confirmed. With the former scheme, the main speaker is easily interfered with when several people speak, so the main speaker's voice cannot be heard clearly.

Disclosure of Invention

The main purpose of the embodiments of the present invention is to provide a method, a device, and a storage medium for improving conference quality, so that the main speaker in a network conference can be clearly heard by others while the other speakers retain the right to speak without interfering with the main speaker, thereby ensuring conference quality.

In order to at least achieve the above object, an embodiment of the present invention provides a method for improving conference quality, where the method includes the following steps: receiving state information sent by each terminal in a network conference in which a plurality of terminals participate; determining a main speaker terminal in the network conference according to the state information of each terminal; and performing conference quality improvement processing according to the determined main speaker terminal.

In order to at least achieve the above object, an embodiment of the present invention further provides a method for improving conference quality, including: detecting state information of a terminal, wherein the state information comprises one or more of the following: the distance between the user and the microphone (MIC), the identity information of the user, and the voice timbre information of the user; sending the detected state information to a server so that the server determines a main speaker terminal in the network conference from the plurality of terminals participating in the network conference; and performing conference quality improvement processing according to the determined main speaker terminal.

To achieve at least the above object, an embodiment of the present invention further provides an apparatus for improving conference quality, where the apparatus includes a memory, a processor, a program stored in the memory and running on the processor, and a data bus for implementing connection communication between the processor and the memory, and the program implements the steps of the foregoing method when executed by the processor.

To achieve at least the above objects, the present invention provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the aforementioned method.

The method, the device and the storage medium for improving conference quality ensure that the main speaker can be clearly heard by others, allow the main speaker to be updated in real time (so that the main speaker is not fixed), and optimize the performance of the network conference call.

Drawings

Fig. 1 is a flowchart of a method for improving conference quality according to an embodiment of the present invention.

Fig. 2 is a flowchart of a method for improving conference quality according to another embodiment of the present invention.

Fig. 3 is a schematic diagram of detecting a voice energy signal change of a speaker during a period of time according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of processing the speech energy signal of fig. 3 according to an embodiment of the present invention.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module", "component" and "unit" may be used interchangeably.

Example one

As shown in fig. 1, a flowchart of a method for improving conference quality according to an embodiment of the present invention is provided, which may include the following steps:

step S110: receiving status information transmitted from each terminal in a network conference in which a plurality of terminals participate;

step S120: determining a main speaker terminal in the network conference according to the state information of each terminal;

step S130: performing conference quality improvement processing according to the determined main speaker terminal.

In an embodiment, the method for improving conference quality may run on a server, or may be installed and run on a terminal as an application. The terminal may be any suitable terminal, such as a conference terminal or a mobile phone, that can take part in a network conference, a telephone conference or another kind of conference. For ease of description, the method is described below as being performed by the server.

Further, the state information includes the distance between the user and the microphone (MIC) or the identity information of the user, and determining the main speaker terminal in the network conference according to the state information of each terminal includes: when the state information includes the distance between the user and the MIC, identifying, from the distances in the state information of all terminals, the terminal whose state information reports the smallest distance between the user and the MIC, and taking that terminal as the main speaker terminal; or, when the state information includes the identity information of the user and that identity information designates the user as the main speaker, taking the terminal corresponding to that identity information as the main speaker terminal.
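The two selection branches described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `TerminalStatus` type, its field names, the `"main_speaker"` role string and the choice to check identity before distance are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TerminalStatus:
    """State report from one terminal (field names are hypothetical)."""
    terminal_id: str
    mic_distance_m: Optional[float] = None  # distance between user and MIC
    role: Optional[str] = None              # registered identity, if any


def select_main_speaker(statuses: Sequence[TerminalStatus]) -> Optional[str]:
    """Return the id of the main speaker terminal, or None if undecidable."""
    # Identity branch: a user registered as the main speaker wins outright
    # (checking this branch first is an assumption of this sketch).
    for status in statuses:
        if status.role == "main_speaker":
            return status.terminal_id
    # Distance branch: pick the terminal whose user is closest to the MIC.
    with_distance = [s for s in statuses if s.mic_distance_m is not None]
    if not with_distance:
        return None
    return min(with_distance, key=lambda s: s.mic_distance_m).terminal_id
```

A server would call `select_main_speaker` each time a fresh batch of state reports arrives, so that the result can change during the conference.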

Alternatively, the state information includes the distance between the user and the microphone (MIC) and the voice timbre of the user, and determining the main speaker terminal in the network conference according to the state information of each terminal includes: determining, according to the distance between the user and the MIC in the state information of each terminal, at least one target terminal for which that distance is smaller than a preset distance, and obtaining the voice timbre of the user from the state information of the target terminal; querying a pre-stored user timbre feature library for that voice timbre; and, when the server finds the voice timbre in the pre-stored user timbre feature library, taking the target terminal as the main speaker terminal in the network conference.
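The distance filter followed by a timbre library lookup can be sketched as below. The text does not say how timbre features are represented or matched; representing them as numeric vectors and matching by cosine similarity against a threshold is an assumption of this example, as are all function names and default values.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def timbre_in_library(timbre: Vector, library: Dict[str, Vector],
                      min_similarity: float = 0.9) -> bool:
    """True if any stored timbre is similar enough to the reported one."""
    return any(cosine_similarity(timbre, ref) >= min_similarity
               for ref in library.values())


def select_by_distance_and_timbre(
    statuses: Sequence[Tuple[str, float, Vector]],
    library: Dict[str, Vector],
    preset_distance_m: float = 0.5,
) -> Optional[str]:
    """statuses holds (terminal_id, mic_distance_m, timbre_vector) tuples."""
    for terminal_id, distance, timbre in statuses:
        # Step 1: only terminals whose user is within the preset distance.
        if distance >= preset_distance_m:
            continue
        # Step 2: the reported timbre must be found in the stored library.
        if timbre_in_library(timbre, library):
            return terminal_id
    return None
```

Returning the first match keeps the sketch short; a real system would need a rule for several terminals that pass both checks.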

The method of the embodiment of the invention also comprises the following steps: determining the participant terminals in the network conference according to the state information of each terminal; and carrying out conference quality improvement processing according to the determined participant terminals.

Determining the participant terminals in the network conference includes: when the state information includes the distance between the user and the microphone (MIC), identifying, from the distances in the state information of all terminals, the one or more terminals whose state information reports a distance between the user and the MIC greater than a preset distance, and taking those terminals as participant terminals; or, when the state information includes the identity information of the user and that identity information designates the user as a participant, taking the terminal as a participant terminal in the network conference.
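As a small illustration of this classification, the sketch below collects participant terminals from either kind of state information. The dictionary inputs, the `"participant"` role string and the 0.5 m default are hypothetical choices for the example.

```python
from typing import Dict, Set


def select_participants(
    distances_m: Dict[str, float],
    roles: Dict[str, str],
    preset_distance_m: float = 0.5,
) -> Set[str]:
    """Return the ids of terminals to be treated as participant terminals."""
    participants: Set[str] = set()
    # Distance branch: the user is farther from the MIC than the preset distance.
    for terminal_id, distance in distances_m.items():
        if distance > preset_distance_m:
            participants.add(terminal_id)
    # Identity branch: the user is registered as a participant.
    for terminal_id, role in roles.items():
        if role == "participant":
            participants.add(terminal_id)
    return participants
```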

The embodiment of the invention also comprises the following steps: receiving state information sent by the main speaker terminal or a participant terminal; when it is confirmed that the state information of the main speaker terminal has changed, switching the main speaker terminal to a participant terminal; or, when it is confirmed that the state information of a participant terminal has changed, switching that participant terminal to the main speaker terminal.
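The role switch can be pictured as a two-state update driven by each new state report. In the sketch below the change in state information is taken to be the reported MIC distance crossing a preset distance; that trigger, the `Role` enum and the default threshold are assumptions, since the text only says that the state information has changed.

```python
from enum import Enum


class Role(Enum):
    MAIN_SPEAKER = "main_speaker"
    PARTICIPANT = "participant"


def update_role(current: Role, mic_distance_m: float,
                preset_distance_m: float = 0.5) -> Role:
    """Re-evaluate one terminal's role from its latest reported MIC distance."""
    if current is Role.MAIN_SPEAKER and mic_distance_m > preset_distance_m:
        # The main speaker moved away from the MIC: demote to participant.
        return Role.PARTICIPANT
    if current is Role.PARTICIPANT and mic_distance_m < preset_distance_m:
        # A participant moved close to the MIC: promote to main speaker.
        return Role.MAIN_SPEAKER
    return current
```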

Before a network conference composed of a plurality of terminals, the embodiment of the invention further comprises: receiving the user tone color characteristics or the identity information sent by each terminal, and storing the user tone color characteristics or the identity information; wherein the identity information comprises a main speaker and other participants.

The conference quality improvement processing performed according to the determined main speaker terminal may include one or more of the following: picking up the sound at the main speaker terminal through the MIC and automatically enhancing the volume picked up by that MIC; using a first sampling mode for the MIC of the main speaker terminal; and setting the network transmission of the main speaker terminal to the highest priority. In one embodiment, the enhancement may be performed by automatic loudness normalization: the sound is picked up through the MIC and its volume is automatically enhanced to raise the loudness. In one embodiment, a principle of optimal sound quality may also be applied: sound can be sampled at 48 kHz, 32 kHz, 16 kHz, 8 kHz and so on (the specific sampling mode may be chosen according to the actual situation), and once the main speaker is selected, the MIC used by the main speaker adopts the best sampling mode so that clarity is highest. In one embodiment, a principle of optimal network transmission may also be applied: in the signal transmission order, the main speaker's signal has the highest priority, or at least a higher priority than the participant terminals.
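The three measures for the main speaker terminal (volume enhancement, best sampling mode, highest transmission priority) can be combined into one settings object, as in the sketch below. The RMS-based normalization, the target level, the gain cap and the convention that priority 0 is highest are illustrative assumptions, not values from the text.

```python
import math
from dataclasses import dataclass
from typing import Sequence

SUPPORTED_RATES_HZ = (48_000, 32_000, 16_000, 8_000)


@dataclass(frozen=True)
class StreamSettings:
    gain: float          # linear factor applied to the picked-up samples
    sample_rate_hz: int  # sampling mode used by the MIC
    priority: int        # 0 is the highest priority in this sketch


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples in the range [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def main_speaker_settings(samples: Sequence[float], target_rms: float = 0.2,
                          max_gain: float = 4.0) -> StreamSettings:
    """Normalize loudness upward, use the best rate, and send first."""
    level = rms(samples)
    gain = min(max_gain, target_rms / level) if level > 0 else 1.0
    gain = max(1.0, gain)  # enhancement only: never attenuate the main speaker
    return StreamSettings(gain=gain,
                          sample_rate_hz=max(SUPPORTED_RATES_HZ),
                          priority=0)
```

For a block with RMS level 0.1 and the default target of 0.2, the computed gain is about 2.0, so the main speaker's loudness is roughly doubled.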

Likewise, the conference quality improvement processing performed according to the determined participant terminals may include one or more of the following: picking up the sound at the participant terminal through the MIC and automatically reducing the volume picked up by that MIC; using a second sampling mode for the MIC of the participant terminal; and setting the network transmission of the participant terminal to a lower priority. In one embodiment, the loudness of the participant terminal's sound may be automatically halved; for example, once a main speaker has been determined, the sound of the other speakers is attenuated but not eliminated, so that it remains audible (the loudness is detected automatically and, when voice is detected, the volume is raised somewhat if it is too low, but by no more than 50%). In one embodiment, a principle of automatically reducing sound quality may also be applied: for example, once the main speaker has been determined, the signal strength is measured to decide which sampling rate the other speakers may use; if the signal is strong, sampling at 48 kHz, 32 kHz or the like may be used (the specific sampling mode may be chosen according to the actual situation), and if the signal is below a certain threshold, the sampling rate is forcibly reduced to 8 kHz. In one embodiment, a principle of lowest network transmission priority may also be applied: in the signal transmission order, the other speakers' signals have the lowest priority, or at least a lower priority than the main speaker terminal.
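A sketch of the participant-side rules follows. It reads the text as: halve the loudness by default, and if the attenuated voice would be too quiet, raise the gain by at most 50% of the halved value. That reading, the audibility floor, and the use of a dBm threshold for signal strength are assumptions of the example.

```python
def participant_gain(level_rms: float, audible_floor_rms: float = 0.02,
                     base_gain: float = 0.5, max_boost: float = 0.5) -> float:
    """Linear gain for a participant: halved, with a capped boost if too quiet."""
    if level_rms <= 0:
        return base_gain
    if level_rms * base_gain >= audible_floor_rms:
        return base_gain  # still audible after halving
    # Gain that would just reach the floor, capped at a 50% increase.
    needed = audible_floor_rms / level_rms
    return min(needed, base_gain * (1.0 + max_boost))


def participant_sample_rate(signal_strength_dbm: float,
                            weak_threshold_dbm: float = -85.0) -> int:
    """Strong signal keeps a high rate; a weak one is forced down to 8 kHz."""
    return 8_000 if signal_strength_dbm < weak_threshold_dbm else 48_000
```

With these defaults a participant at RMS level 0.2 is simply halved, while one at 0.01 gets the capped gain of 0.75 rather than the 2.0 needed to reach the floor.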

Example two

Fig. 2 is a flowchart of a method for improving conference quality according to another embodiment of the present invention, and as shown in fig. 2, the method may include:

step S210: detecting state information of a terminal, wherein the state information comprises one or more of the following information: the distance between the user and the microphone MIC, the identity information of the user and the voice tone information of the user;

step S220: sending the detected state information to a server so that the server determines a main speaker terminal in the network conference from the plurality of terminals participating in the network conference;

step S230: performing conference quality improvement processing according to the determined main speaker terminal.
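On the terminal side, steps S210 and S220 amount to packaging whatever state was detected and sending it to the server. The sketch below shows only the packaging; the JSON layout, the field names and the function name are hypothetical, and the transport is left out because the text does not specify one.

```python
import json
from typing import Optional, Sequence


def build_status_message(
    terminal_id: str,
    mic_distance_m: Optional[float] = None,
    identity: Optional[str] = None,
    timbre: Optional[Sequence[float]] = None,
) -> str:
    """Serialize the detected state information (one or more fields) as JSON."""
    state = {
        "mic_distance_m": mic_distance_m,
        "identity": identity,
        "timbre": list(timbre) if timbre is not None else None,
    }
    # Keep only the fields that were actually detected.
    state = {key: value for key, value in state.items() if value is not None}
    if not state:
        raise ValueError("at least one state field must be provided")
    return json.dumps({"terminal_id": terminal_id, "state": state},
                      sort_keys=True)
```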

It should be noted that the method for improving the conference quality in the above embodiments may be executed on a terminal.

According to an embodiment of the present invention, an apparatus (not shown in the drawings) for improving conference quality is provided, and the apparatus may include a memory, a processor, a program stored in the memory and running on the processor, and a data bus for implementing connection communication between the processor and the memory, and the program is executed by the processor to implement the specific steps shown in fig. 1 or fig. 2.

According to an embodiment of the present invention, a computer-readable storage medium is provided, which may store one or more programs, which may be executed by one or more processors, to implement specific steps as may be shown in fig. 1 or fig. 2.

The method, the device and the computer storage medium provided by the embodiments of the invention can improve the quality of a network conference or a telephone conference: the main speaker terminal and the other terminals in the conference are identified dynamically, and conference quality is optimized in terms of volume, sound quality, network transmission and the like.

In an embodiment, the method for improving conference quality may further build a database of conference participants automatically by obtaining the timbre of each person who speaks through the conference application. A person using the conference application may record a fixed passage of speech to capture the user's timbre features, which are stored in the database (recording is not mandatory; it can be done in passing when the user checks whether the MIC and loudspeaker of the application work properly). Otherwise, when the user first uses the application, the application detects the user's speech through voice detection, automatically records a segment of the user's timbre features, and stores them in the database.

In an embodiment, the method for improving the quality of the conference may further automatically identify and store the related information: the roles of the speaker, participants or other meeting personnel are saved in a database.

In one embodiment, the participant may have at least two identity transformations: main speakers, participants, etc. In other embodiments, transformations of other identities may be arranged according to the identity of the actual conference.

In an embodiment, after the conference is created, the method for improving conference quality may further determine which terminal corresponds to the main speaker by detecting the speaker's timbre and the amplitude picked up by the MIC, and downgrade the other terminals to audience mode; for example, the other terminals may be treated as participant terminals and processed accordingly.

In an embodiment, the method for improving conference quality may further detect, in real time, changes in the distance between the main speaker and the MIC and in the signal energy picked up by the MIC; if the distance between the main speaker and the MIC changes significantly, the terminal corresponding to the main speaker is switched to audience mode.

Specifically, it is detected whether the speech energy signal of the speaker has changed significantly over a period of time, as shown in fig. 3. If Energy(T3-T4) < Energy(T2-T3) - N dB, where Energy(Ta-Tb) is the speech energy over the interval from Ta to Tb and N is a loudness difference large enough to distinguish a change in MIC distance, the MIC has moved away from the main speaker, which indicates that the main speaker has finished speaking or no longer needs to speak.
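The energy comparison between two consecutive time windows can be written down directly. In this sketch the energy of a window is the mean squared sample value expressed in dB, and N defaults to 6 dB; both the energy definition and that default are assumptions, since the text leaves them open.

```python
import math
from typing import Sequence


def energy_db(samples: Sequence[float]) -> float:
    """Mean-square energy of a window in dB (floored to avoid log of zero)."""
    if not samples:
        raise ValueError("window must contain at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def moved_away_from_mic(prev_window: Sequence[float],
                        curr_window: Sequence[float],
                        n_db: float = 6.0) -> bool:
    """True if Energy(curr) < Energy(prev) - N dB, as in the condition above."""
    return energy_db(curr_window) < energy_db(prev_window) - n_db
```

Here `prev_window` plays the role of the interval T2 to T3 and `curr_window` that of T3 to T4.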

In one embodiment, the voice timbre of the main speaker may be detected in real time against the stored database information; after the voice is separated, it may be confirmed whether the main speaker is in a speaking state, and the main speaker's voice may be processed to detect whether it is in a silent state, as shown in fig. 4.

In one embodiment, VAD (Voice Activity Detection) may be used to identify long silent periods within the voice signal stream; when a silent period exceeds a certain duration, the conference terminal sends state information to the conference processing center (server), and the conference processing center releases the main speaker's right by default according to that information. Of course, whether the speaker is silent may also be recognized by other means; for example, if it is determined that the audio device of the speaker's terminal has been switched to mute, it may be confirmed that the speaker has released the main speaker's right.
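The silence timeout can be sketched with a simple frame counter. The energy-threshold test below stands in for a real VAD and is a deliberate simplification; the 20 ms frame length, the 5 s timeout and the threshold value are hypothetical.

```python
from typing import Iterable, Sequence


def frame_is_speech(frame: Sequence[float],
                    energy_threshold: float = 1e-4) -> bool:
    """Crude stand-in for VAD: a frame counts as speech if its energy is high."""
    if not frame:
        return False
    return sum(s * s for s in frame) / len(frame) > energy_threshold


def should_release_speaker_right(frames: Iterable[Sequence[float]],
                                 frame_ms: int = 20,
                                 max_silence_ms: int = 5000) -> bool:
    """True once one continuous silent period exceeds max_silence_ms."""
    silent_ms = 0
    for frame in frames:
        if frame_is_speech(frame):
            silent_ms = 0  # any speech restarts the silence timer
        else:
            silent_ms += frame_ms
            if silent_ms > max_silence_ms:
                return True
    return False
```

When the function returns True, the terminal would send its state information to the conference processing center, which then releases the main speaker's right.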

In one embodiment, if the obtained data shows that the main speaker has not released the right and a person other than the main speaker is identified as speaking, the conference processing center (server) may add that person as a main speaker according to a voice recognition policy (the number of main speakers being at most 3). The other participants are set to audience mode by default when the conference begins. During the conference, the distance between each participant and the MIC and the change in signal energy picked up by the MIC are detected automatically; when an abrupt change in the signal lasts longer than 2 s and the signal is voice, the conference terminal sends state information to the conference processing center, which decides whether to promote the terminal's role depending on whether a main speaker currently exists, and, if the signal persists for more than a certain period, sets the terminal as a main speaker. If the main speaker has released the right and the MIC voice detection reported by the user terminals shows the conference processing center that two or more users are speaking, the conference processing center sets one of them as the main speaker, chosen at random or according to the volume picked up by the MIC (the others become alternate main speakers), monitors the MIC voice energy of the main speaker, and automatically releases the main speaker identity if silence or a drop in volume occurs.
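The promotion and release rules in this paragraph can be modelled as a small piece of server state. The sketch keeps the two numbers given in the text (at most 3 main speakers, voice lasting more than 2 s); the `ConferenceCenter` class, its method names and the choice of the loudest candidate are hypothetical.

```python
from typing import Dict, List, Optional

MAX_MAIN_SPEAKERS = 3    # upper bound stated in the text
MIN_VOICE_SECONDS = 2.0  # voice must last longer than this to count


class ConferenceCenter:
    """Tracks which terminals currently hold the main speaker role."""

    def __init__(self) -> None:
        self.main_speakers: List[str] = []

    def on_voice_report(self, terminal_id: str, voice_seconds: float) -> bool:
        """Handle a terminal's report of sustained voice; True if it is a main speaker."""
        if voice_seconds <= MIN_VOICE_SECONDS:
            return False
        if terminal_id in self.main_speakers:
            return True
        if len(self.main_speakers) >= MAX_MAIN_SPEAKERS:
            return False  # no free main speaker slot
        self.main_speakers.append(terminal_id)
        return True

    def release(self, terminal_id: str) -> None:
        """Release the main speaker right, e.g. after silence or a volume drop."""
        if terminal_id in self.main_speakers:
            self.main_speakers.remove(terminal_id)

    def pick_loudest(self, volumes: Dict[str, float]) -> Optional[str]:
        """After a release, choose one main speaker by MIC volume."""
        if not volumes:
            return None
        chosen = max(volumes, key=lambda tid: volumes[tid])
        self.main_speakers = [chosen]
        return chosen
```

The text also allows a random choice among simultaneous speakers; `pick_loudest` shows only the volume-based option.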

According to the method, the device and the storage medium for improving conference quality provided by the embodiments of the invention, a database of participants is established so that the voice features and identity features of the participants are automatically identified and stored when the conference starts. During the conference, the participants' voices are processed according to their identity and voice features: the clarity and loudness of the main speaker's voice are enhanced and those of the others are reduced. Meanwhile, the main speaker's identity is not fixed; it is updated automatically through voice detection, distance detection and similar methods. This ensures that the main speaker can be clearly heard by others in the network conference, while the other speakers retain the right to speak without interfering with the main speaker.

One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.

In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, and are not to be construed as limiting the scope of the invention. Any modifications, equivalents and improvements which may occur to those skilled in the art without departing from the scope and spirit of the present invention are intended to be within the scope of the claims.

Claims

1. A method for enhancing conference quality, the method comprising the steps of:

receiving status information transmitted from each terminal in a network conference in which a plurality of terminals participate;

determining a main speaker terminal in the network conference according to the state information of each terminal;

and performing conference quality improvement processing according to the determined main speaker terminal.

2. The method of claim 1, wherein the state information includes a distance between a user and a microphone (MIC) or identity information of the user, and wherein the determining the main speaker terminal in the network conference according to the state information of each terminal comprises:

when the state information comprises the distance between the user and the MIC, confirming, according to the distance between the user and the MIC in the state information of each terminal, the terminal corresponding to the state information with the minimum distance between the user and the MIC, and taking the terminal as the main speaker terminal; or

when the state information includes the identity information of the user and the identity information designates the user as the main speaker, taking the terminal corresponding to the identity information of the user as the main speaker terminal.

3. The method for improving conference quality as claimed in claim 1, wherein the state information comprises a distance between the user and a microphone (MIC) and a voice timbre of the user; wherein the determining the main speaker terminal in the network conference according to the state information of each terminal comprises:

determining, according to the distance between the user and the MIC in the state information of each terminal, the voice timbre of the user in the state information of at least one target terminal, wherein for the target terminal the distance between the user and the MIC is smaller than a preset distance;

querying a pre-stored user timbre feature library for the voice timbre of the user in the state information of the target terminal;

and when the server finds the voice timbre of the user in the state information of the target terminal in the pre-stored user timbre feature library, taking the target terminal as the main speaker terminal in the network conference.

4. The method of improving meeting quality as recited in claim 1, further comprising: determining the participant terminals in the network conference according to the state information of each terminal;

and carrying out conference quality improvement processing according to the determined participant terminals.

5. The method for improving conference quality as claimed in claim 4, wherein said determining the participant terminals in the network conference comprises:

when the state information comprises the distance between the user and the microphone MIC, confirming one or more terminals corresponding to the state information of which the distance between the user and the microphone MIC is greater than a preset distance according to the distance between the user and the microphone MIC in the state information of each terminal, and taking the one or more terminals as participant terminals; or

when the state information comprises the identity information of the user and the identity information designates the user as a participant, taking the terminal as a participant terminal in the network conference.

6. The method for improving conference quality as claimed in claim 4, further comprising:

receiving state information sent by the main speaker terminal or the participant terminal;

when it is confirmed that the state information of the main speaker terminal has changed, switching the main speaker terminal to a participant terminal; or

when it is confirmed that the state information of the participant terminal has changed, switching the changed participant terminal to the main speaker terminal.

7. The method for improving conference quality according to any one of claims 1-6, wherein before the network conference consisting of a plurality of terminals, further comprising:

receiving the user tone color characteristics or the identity information sent by each terminal, and storing the user tone color characteristics or the identity information;

wherein the identity information comprises a main speaker and other participants.

8. The method for improving conference quality according to claim 1, wherein the performing conference quality improvement processing according to the determined main speaker terminal comprises:

the server picks up the sound of the main speaker terminal through the MIC, automatically enhances the volume picked up by the MIC of the main speaker terminal, adopts a first sampling mode for the MIC used, and sets the network transmission of the main speaker terminal to the highest priority.

9. The method for improving conference quality according to claim 4, wherein the performing conference quality improvement processing according to the determined participant terminal comprises:

picking up the sound of the participant terminal through the MIC, automatically reducing the volume picked up by the MIC of the participant terminal, adopting a second sampling mode for the MIC used, and setting the network transmission of the participant terminal to a lower priority.

10. A method for enhancing conference quality, comprising:

detecting state information of a terminal, wherein the state information comprises one or more of the following information: the distance between the user and the microphone MIC, the identity information of the user and the voice tone information of the user;

sending the detected state information to a server so that the server determines a main speaker terminal in the network conference from a plurality of terminals participating in the network conference;

11. An apparatus for improving conference quality, the apparatus comprising a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for enabling connection communication between the processor and the memory, the program when executed by the processor implementing the steps of the method for improving conference quality as claimed in any one of claims 1-10.

12. A computer readable storage medium, characterized in that the storage medium stores one or more programs which are executable by one or more processors to implement the steps of the method of improving conference quality of any of claims 1 to 10.