WO2015172435A1

WO2015172435A1 - Method and server for ordered speaking in teleconference

Info

Publication number: WO2015172435A1
Application number: PCT/CN2014/083233
Authority: WO
Inventors: 周琦
Original assignee: 中兴通讯股份有限公司
Priority date: 2014-05-14
Filing date: 2014-07-29
Publication date: 2015-11-19
Also published as: CN105100521A

Abstract

Disclosed are a method and a server for ordered speaking in a teleconference. In the method, a server receives audio information that corresponds to a teleconference speaker and is transmitted by a terminal; searches a pre-stored voice sample database, performs voice identification on the audio information, and acquires the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, takes the audio information corresponding to the speaker having the highest speaking priority as the preferred audio information and transmits it to the terminal, so that the terminal plays the received preferred audio information. This solves the problem of mutual voice interference when several persons speak simultaneously in a teleconference, realizes ordered speaking in a teleconference, and improves conference efficiency and human-machine interactivity.

Description

TECHNICAL FIELD
The present invention relates to the field of communications, and in particular to a method and a server for implementing ordered speaking in a remote conference.
BACKGROUND
With the wide application of remote conferences (for example, conference calls and video conferences), the requirements on the quality, efficiency, and user experience of remote conferences keep rising, and how to make a remote conference achieve the same effect and user experience as a face-to-face conference has become an urgent problem. In existing remote conferences, such as video conferences, if several people speak at the same time, their voices interfere with each other, so that the other participants cannot hear the speakers clearly. When network performance is poor, the other participants hear little more than noise, which seriously affects the quality of the remote conference.
SUMMARY OF THE INVENTION
Embodiments of the present invention provide a method and a server for implementing ordered speaking in a remote conference, so as to at least solve the problem in the related art that voices interfere with each other when several people speak simultaneously in a remote conference.
An embodiment of the invention discloses a method for implementing ordered speaking in a remote conference, which includes the following steps: receiving audio information corresponding to a remote conference speaker sent by a terminal; searching a pre-stored sound sample database, performing voice recognition on the audio information, and obtaining the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sending the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.
Preferably, the step of sending, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information comprises: when the audio information corresponds to one speaker, sending the audio information to the terminal as the priority audio information; and when the audio information corresponds to at least two speakers, obtaining the speaking priority of each speaker and sending the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information.
Preferably, the step of searching the pre-stored sound sample database, performing voice recognition on the audio information, and obtaining the number of speakers corresponding to the audio information and the speaking priority of each speaker further includes: when the speaker corresponding to the audio information is a stranger, prohibiting the audio information from being sent to the terminal and treating the sound mapped by the audio information as noise. Here, a stranger is a speaker whose audio information maps to a sound sample that is not stored in the sound sample database.
Preferably, before the step of receiving the audio information corresponding to the remote conference speaker sent by the terminal, the method further comprises: receiving sound samples, sent by the terminal, that respectively correspond to participants with different speaking priorities, and creating the sound sample database according to the sound samples.
Preferably, the method for implementing ordered speaking in the remote conference further includes: receiving a new sound sample, sent by the terminal, that corresponds to a participant newly joining the remote conference, and adding the new sound sample to the sound sample database, where the new sound sample carries a corresponding speaking priority.

An embodiment of the invention further discloses a server for implementing ordered speaking in a remote conference, comprising: an information receiving module, configured to receive audio information corresponding to a remote conference speaker sent by a terminal; an information identifying module, configured to search a pre-stored sound sample database, perform voice recognition on the audio information, and obtain the speaking priority of the speaker corresponding to the audio information; and an information processing module, configured to send, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.
Preferably, the information processing module is further configured to: when the audio information corresponds to one speaker, send the audio information to the terminal as the priority audio information; and when the audio information corresponds to at least two speakers, obtain the speaking priority of each speaker and send the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information.
Preferably, the information processing module is further configured to: when the speaker corresponding to the audio information is a stranger, prohibit sending the audio information to the terminal and treat the sound mapped by the audio information as noise. Here, a stranger is a speaker whose audio information maps to a sound sample that is not stored in the sound sample database.
Preferably, the server for implementing ordered speaking in the remote conference further includes: a database establishing module, configured to receive sound samples, sent by the terminal, that respectively correspond to participants with different speaking priorities, and to create the sound sample database according to the sound samples.
Preferably, the database establishing module is further configured to: receive a new sound sample, sent by the terminal, that corresponds to a participant newly joining the remote conference, and add the new sound sample to the sound sample database, where the new sound sample carries a corresponding speaking priority.
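The behaviour attributed to the database establishing module can be illustrated with a minimal sketch. It assumes an in-memory store keyed by participant name; the names SoundSample, SoundSampleDatabase, and voiceprint, and the convention that 1 is the highest speaking priority, are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class SoundSample:
    participant: str
    priority: int      # assumption: 1 is the highest speaking priority
    voiceprint: tuple  # placeholder for whatever features a recognizer would store


@dataclass
class SoundSampleDatabase:
    samples: dict = field(default_factory=dict)

    @classmethod
    def create(cls, initial_samples):
        """Build the database from the samples recorded before the conference starts."""
        database = cls()
        for sample in initial_samples:
            database.add(sample)
        return database

    def add(self, sample):
        """Store a sample; also used for a participant who joins during the conference."""
        self.samples[sample.participant] = sample

    def priority_of(self, participant):
        """Return the stored speaking priority, or None if no sample is stored."""
        sample = self.samples.get(participant)
        return None if sample is None else sample.priority
```

Because each sample carries its own speaking priority, adding a late participant is the same operation as the initial registration.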

The server of the embodiment of the present invention receives the audio information corresponding to a remote conference speaker sent by the terminal; searches the pre-stored sound sample database, performs voice recognition on the audio information, and obtains the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sends the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information. This solves the problem of voices interfering with each other when several people speak at the same time in a remote conference, achieves ordered speaking in the remote conference, and improves conference efficiency and human-machine interactivity.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of a system architecture for implementing ordered speaking in a remote conference according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a first embodiment of the method for implementing ordered speaking in a remote conference according to the present invention;
FIG. 3 is a schematic flowchart of a second embodiment of the method for implementing ordered speaking in a remote conference according to the present invention;
FIG. 4 is a schematic flowchart of a third embodiment of the method for implementing ordered speaking in a remote conference according to the present invention;
FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the server for implementing ordered speaking in a remote conference according to the present invention;
FIG. 6 is a schematic diagram of the functional modules of a second embodiment of the server for implementing ordered speaking in a remote conference according to the present invention.

The implementation, functional features, and advantages of the embodiments of the present invention will be further described with reference to the accompanying drawings. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS The technical solutions of the present invention will be further described below in conjunction with the drawings and specific embodiments. It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In the operating environment of the method and the server for implementing ordered speaking in a remote conference of the present invention, the server can be deployed as a cloud server, and the terminal that interacts with the server can be deployed as a cloud terminal. The remote conference includes remote audio conferences and remote video conferences, such as conference calls and video conferences. As shown in FIG. 1, in this operating environment the server 100 exchanges data with a plurality of terminals 200 (FIG. 1 shows only two terminals as an example), so that participants who are not in the same geographical location hold a remote conference based on the server 100 and the terminals 200. In this embodiment, each terminal 200 establishes communication with the server 100 via the Internet to construct the implementation environment of the remote conference. After the implementation environment is deployed, the terminal 200 detects in real time whether a user has triggered a sound collection instruction. When the terminal 200 detects that the user has triggered the sound collection instruction, for example because the user speaks into the terminal microphone, the terminal 200 collects the audio information of the speaker and sends the collected audio information to the server 100. Since the server 100 exchanges data with a plurality of terminals 200, the server 100 may receive audio information sent by several terminals 200 at the same time.
When the server 100 receives audio information sent by several terminals 200 at the same time, it searches the sound sample database according to the received audio information and identifies the speaking priority of the speaker corresponding to each piece of received audio information. The server 100 takes the audio information corresponding to the speaker with the highest speaking priority as the priority audio information among the audio information collected this time, sends the priority audio information to each terminal 200, and shields the other received audio information. After receiving the priority audio information sent by the server 100, each terminal 200 plays it. This achieves ordered speaking in the remote conference and avoids the sound interference caused by several speakers talking simultaneously.
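The arbitration performed by the server 100 can be sketched as follows. This is a minimal illustration that assumes recognition has already attached a numeric speaking priority to each piece of audio; AudioFrame, select_priority_frame, and broadcast are hypothetical names, and the convention that a smaller number means a higher priority is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AudioFrame:
    terminal_id: str
    priority: int  # assumption: a smaller number means a higher speaking priority
    payload: bytes


def select_priority_frame(frames: List[AudioFrame]) -> Optional[AudioFrame]:
    """Pick the audio of the speaker with the highest speaking priority."""
    if not frames:
        return None
    return min(frames, key=lambda frame: frame.priority)


def broadcast(frames: List[AudioFrame], terminal_ids: List[str]) -> Dict[str, bytes]:
    """Map every terminal to the priority audio; all other frames are shielded."""
    chosen = select_priority_frame(frames)
    if chosen is None:
        return {}
    return {terminal_id: chosen.payload for terminal_id in terminal_ids}
```

Every terminal receives the same payload, including the terminals whose own audio was shielded, which matches the description that the priority audio information is sent to each terminal 200.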

The present invention also provides a first embodiment of a method for implementing ordered speaking in a remote conference. Based on the description of the embodiment shown in FIG. 1, and as shown in FIG. 2, the method for implementing ordered speaking in a remote conference of the present invention includes the following steps:
Step S01: Receive audio information corresponding to a remote conference speaker sent by the terminal.
In the remote conference operating environment, the terminal detects user-triggered operation instructions in real time. When the terminal detects that the user has triggered a sound collection instruction (for example, the user speaks into the microphone), or receives sound information from the user, the terminal collects the audio information corresponding to the user and sends it to the server, and the server receives the audio information corresponding to the remote conference speaker sent by the terminal. In the embodiment of the present invention, the audio information received by the server may not actually correspond to the speech of a participant in the current remote conference. The server nevertheless initially treats all audio information sent by the remote conference terminals as audio information corresponding to remote conference participants, and determines, when it subsequently recognizes the audio information, whether the audio information really corresponds to the voice of a remote conference participant.
Step S02: Search a pre-stored sound sample database, perform voice recognition on the audio information, and obtain the speaking priority of the speaker corresponding to the audio information.
When the server receives the audio information corresponding to the remote conference speaker sent by the terminal, it searches the pre-stored sound sample database to identify whether a sound sample corresponding to the audio information is stored there. In this embodiment, the sound sample database stores sound samples corresponding to all participants of the remote conference. The server performs voice recognition on the received audio information and looks up the sound sample corresponding to the audio information in the sound sample database, so as to obtain the speaking priority of the corresponding speaker from the sample that is found. For example, the server compares the received audio information with the sound sample database every 100 milliseconds. Those skilled in the art will understand that different speakers can be distinguished because different people have different voices; that is, once the server finds the sound sample corresponding to the audio information in the sound sample database, it can determine the speaker corresponding to that sample and then obtain the speaking priority of that speaker. In addition, while acquiring the speaking priority of the speaker corresponding to the audio information, the server can also acquire other related information, such as the number of speakers corresponding to the audio information.
Step S03: According to the speaking priority of the speaker, send the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information.
According to the obtained speaking priorities, the server finds the audio information corresponding to the speaker with the highest speaking priority and takes it as the priority audio information. The server sends the priority audio information to each terminal, and each terminal plays it, thereby avoiding the sound interference caused by several speakers talking simultaneously. In a preferred embodiment of the present invention, when the server recognizes that the received audio information corresponds to only one speaker, it sends the audio information directly to the terminal. When the server recognizes that the received audio information corresponds to several speakers, it further identifies the speaking priority of the sound sample corresponding to each piece of audio information, finds the highest of these speaking priorities, and sends the audio information corresponding to the highest speaking priority to the terminal. Further, in order to reduce noise interference during playback and improve the clarity of the sound played by the terminal, the server first applies noise reduction, such as noise filtering, to the priority audio information and then delivers it to each terminal.
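The description of step S02 leaves the matching technique open, so the following is only a sketch under stated assumptions: audio is cut into 100-millisecond windows, each window has already been reduced to a feature vector by some unspecified front end, and a window is matched to a stored sample by cosine similarity against a threshold. The function names, the feature representation, and the 0.8 threshold are all hypothetical.

```python
import math


def split_into_windows(samples, sample_rate, window_ms=100):
    """Cut a sample sequence into consecutive windows of window_ms milliseconds."""
    size = sample_rate * window_ms // 1000
    if size <= 0:
        raise ValueError("window is shorter than one sample")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(features, database, threshold=0.8):
    """Return (speaker, priority) of the best match, or None if nothing reaches the threshold.

    database maps speaker -> (priority, reference_features); this layout is an assumption.
    """
    best, best_score = None, threshold
    for speaker, (priority, reference) in database.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best, best_score = (speaker, priority), score
    return best
```

A result of None corresponds to the case in which no stored sound sample matches the audio information.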
The server of the embodiment of the present invention receives the audio information corresponding to a remote conference speaker sent by the terminal; searches the pre-stored sound sample database, performs voice recognition on the audio information, and obtains the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sends the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information. This solves the problem of voices interfering with each other when several people speak at the same time in a remote conference, achieves ordered speaking in the remote conference, and improves conference efficiency and human-machine interactivity.

The present invention further provides a second embodiment of the method for implementing ordered speaking in a remote conference. This embodiment differs from the embodiment shown in FIG. 2 in the action performed when the server identifies that the sound sample corresponding to the audio information sent by the terminal is not in the sound sample database. Based on the description of the embodiments shown in FIG. 1 and FIG. 2, and as shown in FIG. 3, after step S02 of the embodiment of FIG. 2 (searching the pre-stored sound sample database, performing voice recognition on the audio information, and acquiring the speaking priority of the speaker corresponding to the audio information), the method for implementing ordered speaking in a remote conference of the present invention further includes:
Step S13: When the speaker corresponding to the audio information is a stranger, prohibit sending the audio information to the terminal, and treat the sound corresponding to the audio information as noise.
In this embodiment, the server performs voice recognition on the audio information received from the terminal, searches the sound sample database, and identifies whether a sound sample corresponding to the audio information can be found. When the server cannot find a sound sample corresponding to the audio information in the sound sample database, it recognizes that the speaker corresponding to the audio information is a stranger, that is, the voice sample of that speaker is not stored in the sound sample database. When the server recognizes that the speaker corresponding to the audio information is a stranger, it prohibits sending the audio information to the terminal and treats the sound corresponding to the audio information as noise.
In this embodiment, a stranger can be understood as a speaker whose audio information maps to a sound sample that is not stored in the sound sample database; that is, the sound sample mapped by the stranger's audio information is not in the sound sample database. Those skilled in the art will understand that, because the sound samples of all participants of the remote conference are stored in the sound sample database, when the server cannot find a sound sample corresponding to the audio information there, it recognizes that the audio information comes from a stranger who is not a participant in the remote conference, and it treats the voice corresponding to the audio information as noise. In the embodiment of the present invention, when the server recognizes that the received audio information comes from a stranger, it directly treats the corresponding sound as noise; it also automatically masks the voice of speakers with a lower priority, thereby reducing noise interference in the remote conference and avoiding mixed voices.
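The stranger rule of this second embodiment amounts to a membership test against the sound sample database. The sketch below assumes recognition yields a speaker identifier, or None when no sample matches; partition_by_membership and the dictionary layout are hypothetical.

```python
def partition_by_membership(frames, database):
    """Split recognized audio into participant audio and stranger audio.

    frames: list of (speaker_id, payload); speaker_id is None when no sample matched.
    database: mapping of known speaker_id -> speaking priority (assumed layout).
    """
    participants, strangers = [], []
    for speaker_id, payload in frames:
        if speaker_id is not None and speaker_id in database:
            participants.append((speaker_id, payload))
        else:
            # Stranger audio is never forwarded; it is handed to noise handling.
            strangers.append(payload)
    return participants, strangers
```

Only the first list would go on to priority selection; the second is what the description calls sound treated as noise.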

The present invention also provides a third embodiment of the method for implementing ordered speaking in a remote conference. This embodiment differs from the embodiments shown in FIG. 2 and FIG. 3 in that, before the remote conference officially starts, the server creates the sound sample database according to sound samples sent by the terminal. This embodiment is described in terms of its differences from the embodiment of FIG. 2. Based on the description of the embodiments shown in FIG. 1, FIG. 2, and FIG. 3, and as shown in FIG. 4, before step S01 of the embodiment shown in FIG. 2 (receiving the audio information corresponding to the remote conference speaker sent by the terminal), the method for implementing ordered speaking in a remote conference of the present invention further includes:
Step S11: Receive sound samples, sent by the terminal, that respectively correspond to participants with different speaking priorities, and create the sound sample database according to the sound samples.
In this embodiment, after the hardware environment in which the remote conference runs has been set up and before the remote conference officially starts, the terminal, in response to a configuration instruction triggered by the user, records a sound sample of each participant of the remote conference according to that participant's speaking priority, and sends the recorded sound samples to the server. The server receives the sound samples that respectively correspond to the participants with different speaking priorities and establishes the sound sample database from them. The priority of each sound sample sent by the terminal is obtained by weighting the identity of the participant.
The terminal determines the weight of each participant's identity according to an operation instruction triggered by the user. Usually, the higher the identity of a participant, the higher that participant's speaking priority. Further, in the embodiment of the present invention, participants may be added at any time during the remote conference. When the terminal detects a configuration instruction triggered by the user, it responds by recording a new sound sample, with a given speaking priority, for the participant newly joining the remote conference, and sends the recorded new sound sample to the server; the recorded new sound sample carries the corresponding speaking priority. The server receives the new sound sample corresponding to the newly joined participant and adds it to the sound sample database; the new sound sample received by the server is recorded with its corresponding speaking priority. In a preferred embodiment of the present invention, in order to reduce the data storage pressure on the server, the sound samples stored by the server are valid only for the current remote conference. Once the server receives the operation instruction that ends the remote conference, it deletes the sound sample database corresponding to that remote conference.
A specific application scenario is now used to describe again how the server and the terminal exchange data and establish the sound sample database in the method for implementing ordered speaking in a remote conference of the present invention. The server sets the speaking priority corresponding to each role in the remote conference according to the user's setting instructions.
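The default role weighting used in this application scenario (leader, moderator, expert, ordinary participant, with leader 1 ahead of leader 2) can be encoded as a sortable key. This is a sketch only: ROLE_RANK, speaking_priority, order_speakers, and the participant names in the usage example are hypothetical, and treating a smaller tuple as a higher priority is an assumption.

```python
# Assumed encoding of the default roles: a lower rank speaks first.
ROLE_RANK = {"leader": 0, "moderator": 1, "expert": 2, "participant": 3}


def speaking_priority(role, index_within_role=1):
    """Return a sortable key: first the role rank, then the position inside the role."""
    return (ROLE_RANK[role], index_within_role)


def order_speakers(speakers):
    """speakers: list of (name, role, index_within_role); highest priority first."""
    ranked = sorted(speakers, key=lambda s: speaking_priority(s[1], s[2]))
    return [name for name, _role, _index in ranked]
```

For example, order_speakers([("Li", "leader", 2), ("Wang", "leader", 1), ("Zhao", "moderator", 1)]) places "Wang" before "Li" and both before "Zhao". A tuple key keeps the ordering inside one role separate from the ordering between roles, so a role can hold any number of people without renumbering the others.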
For example, by default, the server weights the identities of the participants. The weights are divided into leader identity, moderator identity, expert identity, and ordinary participant identity: the leader identity corresponds to the highest speaking priority, the moderator identity to the second, the expert identity to the third, and the ordinary participant identity to the lowest. Several persons can be set under one identity, such as leader 1 and leader 2, with leader 1 having a higher priority than leader 2, and so on. The server receives and stores the sound samples of the participants, such as voice data information. The terminal collects the sound samples as follows: the moderator of the remote conference selects the identity definition function and, for the leader, moderator, expert, and ordinary participant identities in turn, asks the persons concerned to greet the participants. While each person speaks, the terminal collects a sound sample of that person's voice (for example, through a sound collection device such as the terminal microphone), and the server subsequently uses these samples to establish the sound sample database for voice comparison and recognition. In this way, the speaking priority of each participant in this remote conference is defined, and the remote audio or video conference can officially start. By establishing a sound sample database corresponding to different speaking priorities, the server of the embodiment of the present invention has the beneficial effect of improving the clarity of sound transmission.

The present invention also provides a first embodiment of a server for implementing ordered speaking in a remote conference. As shown in FIG. 5, the server for implementing ordered speaking in a remote conference of the present invention includes: an information receiving module 01, an information identifying module 02, and an information processing module 03.
The information receiving module 01 is configured to receive audio information corresponding to a remote conference speaker sent by the terminal.
In the remote conference operating environment, the terminal detects user-triggered operation instructions in real time. When the terminal detects that the user has triggered a sound collection instruction (for example, the user speaks into the microphone), or receives sound information from the user, the terminal collects the audio information corresponding to the user and sends it to the server, and the information receiving module 01 of the server receives the audio information corresponding to the remote conference speaker sent by the terminal. In the embodiment of the present invention, the audio information received by the information receiving module 01 may not actually correspond to the speech of a participant in the current remote conference. The information receiving module 01 nevertheless treats all audio information sent by the remote conference terminals as audio information corresponding to remote conference participants; when the server subsequently recognizes the audio information, it determines whether the audio information really corresponds to the voice of a remote conference participant.
The information identifying module 02 is configured to search a pre-stored sound sample database, perform voice recognition on the audio information, and obtain the speaking priority of the speaker corresponding to the audio information.
When the information receiving module 01 receives the audio information corresponding to the remote conference speaker sent by the terminal, the information identifying module 02 searches the pre-stored sound sample database to identify whether a sound sample corresponding to the audio information is stored there. In this embodiment, the sound sample database stores sound samples corresponding to all participants of the remote conference. The information identifying module 02 performs voice recognition on the audio information received by the information receiving module 01 and looks up the sound sample corresponding to the audio information in the sound sample database, so as to obtain the speaking priority of the corresponding speaker from the sample that is found. For example, the information identifying module 02 compares the received audio information with the sound sample database every 100 milliseconds. Those skilled in the art will understand that, because different people have different timbres, different speakers can be distinguished by timbre; therefore, once the information identifying module 02 finds the sound sample corresponding to the audio information in the sound sample database, it can determine the speaker corresponding to that sample and then obtain the speaking priority of that speaker. In addition, while acquiring the speaking priority of the speaker corresponding to the audio information, the information identifying module 02 can also acquire other related information, such as the number of speakers corresponding to the audio information.
The information processing module 03 is configured to send, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information.
According to the speaking priorities acquired by the information identifying module 02, the information processing module 03 finds the audio information corresponding to the speaker with the highest speaking priority and takes it as the priority audio information. The information processing module 03 sends the priority audio information to each terminal, and each terminal plays it, thereby avoiding the sound interference caused by several speakers talking simultaneously. In a preferred embodiment of the present invention, when the information identifying module 02 recognizes that the received audio information corresponds to only one speaker, the information processing module 03 sends the audio information directly to the terminal. When the information identifying module 02 recognizes that the received audio information corresponds to several speakers, the information processing module 03 identifies the speaking priority of the sound sample corresponding to each piece of audio information, finds the highest of these speaking priorities, and sends the audio information corresponding to the highest speaking priority to the terminal. Further, in order to reduce noise interference during playback and improve the clarity of the sound played by the terminal, the information processing module 03 first applies noise reduction, such as noise filtering, to the priority audio information and then delivers it to each terminal.
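The single-speaker and multi-speaker branches, followed by noise reduction, can be sketched as below. The disclosure only says "noise reduction such as filtering noise", so the simple amplitude gate used here is a stand-in chosen for illustration; noise_gate, prepare_priority_audio, and the 0.02 threshold are hypothetical.

```python
def noise_gate(samples, threshold=0.02):
    """Zero out samples whose magnitude is below the threshold (illustrative stand-in)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def prepare_priority_audio(recognized):
    """recognized: list of (priority, samples), one entry per recognized speaker.

    Assumption: a smaller priority number means a higher speaking priority.
    """
    if not recognized:
        raise ValueError("no recognized speaker")
    if len(recognized) == 1:
        # One speaker: the audio is forwarded without any priority comparison.
        _, samples = recognized[0]
    else:
        # Several speakers: keep only the audio of the highest speaking priority.
        _, samples = min(recognized, key=lambda item: item[0])
    return noise_gate(samples)
```

Noise reduction runs after selection, so only the one stream that will actually be delivered is processed.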
The server of the embodiment of the present invention receives the audio information corresponding to a remote conference speaker sent by the terminal; searches the pre-stored sound sample database, performs voice recognition on the audio information, and obtains the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sends the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information. This solves the problem of voices interfering with each other when several people speak at the same time, realizes orderly speaking in remote conferences, and improves conference efficiency and human-machine interactivity.
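The selection step performed by the information processing module 03 can be illustrated with a minimal sketch. The `RecognizedAudio` structure, the function name `select_priority_audio`, and the convention that a smaller number means a higher speaking priority are assumptions made for illustration; the source does not specify a data format or a numeric encoding.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecognizedAudio:
    """One audio stream after voice recognition (hypothetical structure)."""
    speaker: str    # speaker identified from the sound sample database
    priority: int   # assumption: a smaller number means a higher speaking priority
    payload: bytes  # audio information received from the terminal


def select_priority_audio(
    streams: Sequence[RecognizedAudio],
) -> Optional[RecognizedAudio]:
    """Return the audio information to forward to every terminal."""
    if not streams:
        return None        # nobody is speaking
    if len(streams) == 1:
        return streams[0]  # single speaker: forward directly
    # Several simultaneous speakers: keep only the highest speaking priority.
    return min(streams, key=lambda s: s.priority)
```

With a leader (priority 1) and an expert (priority 3) speaking at the same time, only the leader's audio would be returned for delivery; noise reduction before delivery is outside the scope of this sketch.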

With reference to FIG. 5, in the server for implementing ordered speech in a remote conference of the present invention, the information processing module 03 is further configured to: when the speaker corresponding to the audio information is a stranger, prohibit sending the audio information to the terminal and treat the sound mapped by the audio information as noise; wherein the stranger is a speaker mapped by audio information whose corresponding sound sample is not stored in the sound sample database. In this embodiment, the information recognition module 02 performs voice recognition on the audio information sent by the terminal and received by the information receiving module 01, searches the sound sample database, and identifies whether a sound sample corresponding to the audio information can be found. If the information recognition module 02 cannot find a sound sample corresponding to the audio information in the sound sample database, it recognizes that the speaker corresponding to the audio information is a stranger, that is, the sound sample of that speaker is not stored in the sound sample database. When the information recognition module 02 recognizes that the speaker corresponding to the audio information is a stranger, the information processing module 03 prohibits the transmission of the audio information to the terminal and treats the sound corresponding to the audio information as noise. In this embodiment, the stranger can be understood as a speaker mapped by audio information whose corresponding sound sample is not stored in the sound sample database.
It can be understood by those skilled in the art that, since the sound samples corresponding to all participants of the remote conference are stored in the sound sample database, when the information recognition module 02 cannot find the sound sample corresponding to the audio information in the sound sample database, it recognizes that the audio information comes from a stranger who is not a participant in the current remote conference, and the information processing module 03 treats the sound corresponding to the audio information as noise. In the embodiment of the present invention, when the server recognizes that the received audio information comes from a stranger, it directly treats the corresponding sound as noise, and it automatically masks the voices of speakers with lower priority, thereby reducing noise interference in the remote conference and avoiding mixed sounds.
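The stranger handling described above can be sketched as follows. The source does not say how voice recognition is performed, so this example substitutes a simple cosine-similarity comparison of feature vectors; the feature vectors, the `MATCH_THRESHOLD` value, and all function names are hypothetical stand-ins rather than the method of the invention.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical database layout: name -> (enrolled feature vector, speaking priority).
SampleDB = Dict[str, Tuple[Sequence[float], int]]

MATCH_THRESHOLD = 0.9  # assumed similarity cut-off; not specified in the source


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(features: Sequence[float], db: SampleDB) -> Optional[str]:
    """Return the best-matching enrolled speaker, or None for a stranger."""
    best_name, best_score = None, MATCH_THRESHOLD
    for name, (sample, _priority) in db.items():
        score = cosine(features, sample)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def filter_strangers(
    incoming: Sequence[Tuple[Sequence[float], bytes]], db: SampleDB
) -> List[Tuple[str, int, bytes]]:
    """Keep audio from enrolled speakers only; stranger audio is dropped as noise."""
    kept = []
    for features, payload in incoming:
        name = identify(features, db)
        if name is None:
            continue  # stranger: never sent to the terminal
        kept.append((name, db[name][1], payload))
    return kept
```

The entries that survive this filter are the ones on which the priority comparison between simultaneous speakers would then operate.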

The embodiment of the present invention further provides a second embodiment of a server for implementing ordered speech in a remote conference. The difference between this embodiment and the embodiment shown in FIG. 5 is that the server creates the sound sample database from sound samples sent by the terminal before the remote conference officially starts. Based on the description of the embodiment shown in FIG. 5, as shown in FIG. 6, the server for implementing ordered speech in a remote conference of the present invention further includes: a database establishing module 04, configured to receive sound samples, sent by the terminal, that respectively correspond to participants with different speaking priorities, and to create the sound sample database based on those sound samples. In this embodiment, after the hardware environment for the remote conference has been set up and before the remote conference officially starts, the terminal, according to a configuration instruction triggered by the user, records a sound sample for each participant of the remote conference under that participant's speaking priority, and sends the recorded sound samples to the server. The database establishing module 04 receives the sound samples respectively corresponding to the participants with different speaking priorities sent by the terminal, and the server establishes the sound sample database according to the received sound samples. The priority of each sound sample sent by the terminal is obtained by weighting the identity of the participant. The terminal determines the weight of each participant's identity according to an operation instruction triggered by the user. Usually, the higher the identity of a participant, the higher that participant's speaking priority.
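As a rough illustration of what the database establishing module 04 does with the samples it receives, the sketch below builds an in-memory database keyed by participant. The `SoundSample` structure, the function name, and the choice to let a later recording replace an earlier one are assumptions; the source does not describe the storage layout.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class SoundSample:
    """A recorded sample as sent by the terminal (hypothetical structure)."""
    participant: str  # who was recorded
    priority: int     # speaking priority configured at recording time
    audio: bytes      # recorded voice data


def create_sample_database(samples: Iterable[SoundSample]) -> Dict[str, SoundSample]:
    """Build the sound sample database for one remote conference."""
    database: Dict[str, SoundSample] = {}
    for sample in samples:
        if not sample.audio:
            raise ValueError(f"empty sound sample for {sample.participant}")
        # Assumption: re-recording a participant replaces the earlier sample.
        database[sample.participant] = sample
    return database
```

Keying by participant keeps the later lookup from a recognized voice to its speaking priority a single dictionary access.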
Further, in the embodiment of the present invention, participants may be added to the remote conference at any time while it is in progress. When the terminal detects a configuration command triggered by the user, it responds to the command by recording a new sound sample, under a given speaking priority, for the participant newly joining the remote conference, and sends the recorded new sound sample to the server; the recorded new sound sample carries the corresponding speaking priority. The database establishing module 04 receives the new sound sample, sent by the terminal, corresponding to the participant newly joining the remote conference, and adds the new sound sample to the sound sample database; the speaking priority of the new sound sample received by the database establishing module 04 is configured at recording time. In a preferred embodiment of the present invention, in order to reduce the data storage pressure on the server, the sound samples stored by the database establishing module 04 are valid only for the current remote conference. Once the server receives the operation instruction indicating the end of the remote conference, the database establishing module 04 deletes the sound sample database corresponding to that remote conference.
A specific application scenario is taken as an example to describe again how the server and the terminal interact to establish the sound sample database in the method for implementing ordered speech in a remote conference of the present invention. The database establishing module 04 sets the speaking priority corresponding to each role in the remote conference according to the setting instruction of the user. For example, by default, the database establishing module 04 weights the identities of the participants, and the weights are divided into: the identity of the leader, the identity of the moderator, the identity of the expert, and the identity of the ordinary participant. The speaking priority corresponding to the identity of the leader is the highest, that of the moderator is second, that of the expert is third, and that of the ordinary participant comes last. Each identity can be assigned to multiple people, such as leader 1 and leader 2, with the speaking priority of leader 1 higher than that of leader 2, and so on. The database establishing module 04 receives and stores the sound samples of the participants, such as voice data. The terminal collects the sound samples as follows: the moderator of the remote conference selects the identity definition function and, for the identities of leader, moderator, expert, and ordinary participant in turn, asks the relevant person to greet the participants. The terminal collects a sound sample of each person's voice at that moment (for example, through a sound collection device on the terminal such as a microphone), so that the database establishing module 04 of the server can subsequently create the sound sample database from these sound samples and use it for sound comparison and identification. In this way, the priority of each participant in this remote conference is defined, and the remote audio or video conference can officially start. The server of the embodiment of the present invention establishes a sound sample database corresponding to different speaking priorities, which has the beneficial effect of improving the clarity of sound transmission.
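The default role weighting and the lifecycle of the sound sample database (adding a late joiner, deleting everything when the conference ends) can be sketched together. The role keys, the `(role rank, position)` tuple encoding, the placement of ordinary participants at the lowest rank, and the class and method names are all assumptions for illustration.

```python
from typing import Dict, Tuple

# Default role order from the description, highest priority first.
# Assumption: ordinary participants rank last, following the listing order.
ROLE_RANK = {"leader": 0, "moderator": 1, "expert": 2, "participant": 3}

Priority = Tuple[int, int]  # (role rank, position within role); smaller sorts first


def speaking_priority(role: str, index: int = 1) -> Priority:
    """Leader 1 outranks leader 2, and any leader outranks any moderator."""
    return (ROLE_RANK[role], index)


class ConferenceSamples:
    """Sound samples that are valid only for the current remote conference."""

    def __init__(self) -> None:
        self._samples: Dict[str, Tuple[Priority, bytes]] = {}

    def add(self, name: str, role: str, index: int, audio: bytes) -> None:
        # Also used mid-conference: a new sample carries its own priority.
        self._samples[name] = (speaking_priority(role, index), audio)

    def priority_of(self, name: str) -> Priority:
        return self._samples[name][0]

    def end_conference(self) -> None:
        # Delete the whole database once the conference is over.
        self._samples.clear()

    def __len__(self) -> int:
        return len(self._samples)
```

Encoding the priority as a tuple lets ordinary tuple comparison express both the ordering between roles and the ordering among several people holding the same role.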

It is to be understood that the terms "comprising", "including", or any other variants thereof are intended to encompass a non-exclusive inclusion, such that a process, method, article, or device that comprises a plurality of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. An element that is defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

The serial numbers of the embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by means of software plus a necessary general hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc), including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods described in the various embodiments of the present invention.

The above description is only a preferred embodiment of the present invention and is not intended to limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present invention. INDUSTRIAL APPLICABILITY As described above, the method and server for implementing ordered speech in a remote conference provided by the embodiments of the present invention have the following beneficial effects: orderly speaking in a remote conference is realized, and conference efficiency and human-machine interactivity are improved.

Claims

1. A method for implementing ordered speech in a remote conference, comprising the steps of: receiving audio information corresponding to a remote conference speaker sent by a terminal; searching a pre-stored sound sample database, performing voice recognition on the audio information, and acquiring a speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sending the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.

2. The method of claim 1, wherein the step of sending, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information comprises: when the audio information corresponds to one speaker, sending the audio information to the terminal as the priority audio information; and when the audio information corresponds to at least two speakers, obtaining the speaking priority corresponding to each of the speakers and sending the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information.

3. The method of claim 1, wherein the step of searching a pre-stored sound sample database, performing voice recognition on the audio information, and acquiring a speaking priority of the speaker corresponding to the audio information further comprises: when the speaker corresponding to the audio information is a stranger, prohibiting the audio information from being sent to the terminal and treating the sound mapped by the audio information as noise; wherein the stranger is a speaker mapped by audio information whose corresponding sound sample is not stored in the sound sample database.

4. The method according to any one of claims 1-3, wherein the step of receiving the audio information corresponding to the remote conference speaker sent by the terminal further comprises: receiving sound samples, sent by the terminal, respectively corresponding to participants with different speaking priorities, and creating the sound sample database according to the sound samples.

5. The method of claim 4, further comprising: receiving a new sound sample, sent by the terminal, corresponding to a participant newly joining the remote conference, and adding the new sound sample to the sound sample database; wherein the new sound sample carries the corresponding speaking priority.

6. A server for implementing ordered speech in a remote conference, comprising: an information receiving module, configured to receive audio information corresponding to a remote conference speaker sent by a terminal; an information recognition module, configured to search a pre-stored sound sample database, perform voice recognition on the audio information, and acquire a speaking priority of the speaker corresponding to the audio information; and an information processing module, configured to send, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.

7. The server according to claim 6, wherein the information processing module is further configured to: when the audio information corresponds to one speaker, send the audio information to the terminal as the priority audio information; and when the audio information corresponds to at least two speakers, obtain the speaking priority corresponding to each of the speakers and send the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information.

8. The server according to claim 6, wherein the information processing module is further configured to: when the speaker corresponding to the audio information is a stranger, prohibit sending the audio information to the terminal and treat the sound mapped by the audio information as noise; wherein the stranger is a speaker mapped by audio information whose corresponding sound sample is not stored in the sound sample database.

9. The server according to any one of claims 6 to 8, further comprising: a database establishing module, configured to receive sound samples, sent by the terminal, respectively corresponding to participants with different speaking priorities, and to create the sound sample database according to the sound samples.

10. The server of claim 9, wherein the database establishing module is further configured to: receive a new sound sample, sent by the terminal, corresponding to a participant newly joining the remote conference, and add the new sound sample to the sound sample database; wherein the new sound sample carries the corresponding speaking priority.