WO2015172435A1 - Method and server for ordered speaking in teleconference

Info

Publication number
WO2015172435A1
WO2015172435A1 PCT/CN2014/083233 CN2014083233W WO2015172435A1 WO 2015172435 A1 WO2015172435 A1 WO 2015172435A1 CN 2014083233 W CN2014083233 W CN 2014083233W WO 2015172435 A1 WO2015172435 A1 WO 2015172435A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio information
speaker
priority
terminal
speaking
Application number
PCT/CN2014/083233
Other languages
French (fr)
Chinese (zh)
Inventor
周琦
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/563 User guidance or feature selection
    • H04M3/566 User guidance or feature selection relating to a participants right to speak
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2201/00 Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/16 Sequence circuits

Definitions

  • The present invention relates to the field of communications, and in particular to a method and a server for implementing ordered speaking in a remote conference.
  • remote conferences for example, conference calls and video conferences
  • The requirements for the conference quality, efficiency, and user experience of remote conferences are becoming higher, and how to enable remote conferences to achieve the same effect and user experience as real conferences has become an urgent problem to be solved.
  • the existing teleconferences such as video conferences
  • Embodiments of the present invention provide a method and a server for implementing an ordered speech in a remote conference, so as to at least solve the problem of mutual interference of sound caused by simultaneous speaking by multiple people in a remote conference in the related art.
  • An embodiment of the invention discloses a method for implementing ordered speaking in a remote conference, which includes the following steps: receiving audio information corresponding to a remote conference speaker sent by the terminal; searching a pre-stored sound sample database, performing voice recognition on the audio information, and obtaining the speaking priority of the speaker corresponding to the audio information; and sending, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.
  • The step of sending, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information comprises: when the audio information corresponds to one speaker, sending the audio information to the terminal as the priority audio information; when the audio information corresponds to at least two speakers, obtaining the speaking priority of each speaker and sending the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information.
  • The step of searching the pre-stored sound sample database, performing voice recognition on the audio information, and obtaining the number of speakers corresponding to the audio information and the speaking priority of each speaker further includes: when the speaker corresponding to the audio information is a stranger, prohibiting the audio information from being sent to the terminal and treating the sound mapped by the audio information as noise; wherein a stranger is a speaker mapped by audio information whose corresponding sound sample is not stored in the sound sample database.
  • The step of receiving the audio information corresponding to the remote conference speaker sent by the terminal further comprises: receiving sound samples, sent by the terminal, that respectively correspond to participants with different speaking priorities, and creating the sound sample database according to the sound samples.
  • The method for implementing ordered speaking in the remote conference further includes: receiving a new sound sample, sent by the terminal, corresponding to a participant newly joining the remote conference, and adding the new sound sample to the sound sample database; wherein the new sound sample carries a corresponding speaking priority.
  • An embodiment of the invention further discloses a server for implementing ordered speaking in a remote conference, comprising: an information receiving module, configured to receive audio information corresponding to a remote conference speaker sent by the terminal; an information recognition module, configured to search a pre-stored sound sample database, perform voice recognition on the audio information, and obtain the speaking priority of the speaker corresponding to the audio information; and an information processing module, configured to send, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information.
  • The information processing module is further configured to: when the audio information corresponds to one speaker, send the audio information to the terminal as the priority audio information; when the audio information corresponds to at least two speakers, first obtain the speaking priority of each speaker, and then send the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information.
  • The information processing module is further configured to: when the speaker corresponding to the audio information is a stranger, prohibit sending the audio information to the terminal, and treat the sound of the audio information as noise; wherein a stranger is a speaker mapped by audio information whose corresponding sound sample is not stored in the sound sample database.
  • The server that implements ordered speaking in the remote conference further includes: a database establishing module, configured to receive sound samples, sent by the terminal, that respectively correspond to participants with different speaking priorities, and to create the sound sample database according to the sound samples.
  • The database establishing module is further configured to: receive a new sound sample, sent by the terminal, corresponding to a participant newly joining the remote conference, and add the new sound sample to the sound sample database; wherein the new sound sample carries the corresponding speaking priority.
  • The server of the embodiment of the present invention receives the audio information corresponding to the remote conference speaker sent by the terminal; searches the pre-stored sound sample database, performs voice recognition on the audio information, and obtains the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sends the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information.
  • FIG. 1 is a schematic diagram showing a system architecture for implementing an ordered speech in a remote conference according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a first embodiment of a method for implementing an ordered speech in a remote conference according to the present invention
  • FIG. 3 is a schematic flowchart of a second embodiment of a method for implementing an ordered speech in a remote conference according to the present invention
  • FIG. 5 is a schematic diagram of a functional module of a first embodiment of a server for implementing an ordered speech in a remote conference according to the present invention
  • FIG. 6 shows a server for implementing an ordered speech in a remote conference according to the present invention
  • In the operating environment of the server for ordered speaking in a remote conference of the present invention, the server can be deployed as a cloud server, and the terminal that interacts with the server can be deployed as a cloud terminal; the remote conference includes video conferences, teleconferences, and other remote audio conferences and remote video conferences.
  • The server 100 exchanges data with a plurality of terminals 200 (only two terminals are shown as examples in FIG. 1), so that participants who are not in the same geographical location can hold a remote conference based on the server 100 and the terminals 200.
  • the terminal 200 establishes communication with the server 100 via the Internet to construct an implementation environment for the remote conference.
  • the terminal 200 detects in real time whether a user has triggered a sound collection instruction.
  • When the terminal 200 detects that the user triggers the sound collection instruction (for example, the user speaks through the terminal microphone), the terminal 200 collects the audio information of the speaker and sends the collected audio information to the server 100. Since the server 100 exchanges data with the plurality of terminals 200, the server 100 may receive audio information transmitted by several terminals 200 at the same time.
  • The server 100 searches the sound sample database according to the received audio information and identifies the speaking priority of the speaker corresponding to the audio information received from each of the terminals 200. The audio information corresponding to the speaker with the highest speaking priority is used as the priority audio information among the audio information collected this time, and the priority audio information is sent to each terminal 200 while the other received audio information is shielded.
  • Each terminal 200 plays the received priority audio information, thus achieving orderly speaking in a remote conference and avoiding the sound interference caused by multiple speakers speaking simultaneously.
  • the present invention also provides a first embodiment of a method for implementing an ordered speech in a remote conference.
  • The method for implementing ordered speaking in the remote conference of the present invention includes the following steps:
  • Step S01: Receive audio information corresponding to the remote conference speaker sent by the terminal. In the remote conference running environment, the terminal detects operation instructions triggered by the user in real time.
  • When the terminal detects that the user triggers the sound collection instruction (for example, the user speaks through the microphone), or receives sound information sent by the user, the terminal collects the audio information corresponding to the user and sends the collected audio information to the server; the server receives the audio information corresponding to the remote conference speaker sent by the terminal.
  • The audio information sent by the terminal and received by the server may not correspond to the speech of a participant of the remote conference, but the server treats all audio information sent by the remote conference terminal as audio information corresponding to remote conference participants; when the audio information is subsequently identified, the server determines whether it actually corresponds to the voice of a remote conference participant.
  • Step S02: Search a pre-stored sound sample database, perform voice recognition on the audio information, and obtain the speaking priority of the speaker corresponding to the audio information. When the server receives the audio information corresponding to the remote conference speaker sent by the terminal, it searches the pre-stored sound sample database to identify whether the sound sample corresponding to the audio information is stored in the sound sample database.
  • The sound sample database stores sound samples corresponding to all participants of the remote conference.
  • The server performs voice recognition on the received audio information and finds the sound sample corresponding to the audio information in the sound sample database, so as to obtain the speaking priority of the speaker corresponding to the audio information according to the sound sample found. For example, the server compares the received audio information with the sound sample database every 100 milliseconds.
  • Since different people's voices differ, different speakers can be distinguished according to the timbre of the voice; that is, when the server finds the sound sample corresponding to the audio information in the sound sample database, the speaker corresponding to the sound sample can be determined, and then the speaking priority of the speaker corresponding to the audio information can be obtained.
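  • The lookup in step S02 can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify how voices are compared, so the feature vectors, the cosine-similarity measure, the `threshold` value, and the names `SAMPLE_DB`, `cosine`, and `identify` are all hypothetical, as is the convention that a smaller number means a higher speaking priority.

```python
import math

# Hypothetical enrolled database: speaker -> (voice feature vector, speaking priority).
# Assumed convention: a smaller number means a higher speaking priority.
SAMPLE_DB = {
    "leader": ([0.9, 0.1, 0.3], 1),
    "expert": ([0.2, 0.8, 0.5], 3),
}


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(features, db=SAMPLE_DB, threshold=0.95):
    """Return (speaker, priority) for the best-matching sample, or None if nothing is close enough."""
    best, best_score = None, threshold
    for speaker, (sample, priority) in db.items():
        score = cosine(features, sample)
        if score >= best_score:
            best, best_score = (speaker, priority), score
    return best
```

  • A `None` result corresponds to audio for which no sound sample is found in the database.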
  • Step S03: Send, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information.
  • According to the speaking priorities of the identified speakers, the server finds the audio information corresponding to the speaker with the highest speaking priority and uses it as the priority audio information; the server sends the priority audio information to each terminal, so that each terminal plays the priority audio information sent by the server, thereby avoiding the sound interference caused by multiple speakers speaking simultaneously.
  • When the server recognizes that the received audio information corresponds to only one speaker, the server directly sends the audio information to the terminal.
  • When the server recognizes that the received audio information corresponds to multiple speakers, it further identifies the speaking priority of the sound sample corresponding to each piece of audio information, finds the highest speaking priority among them, and sends the audio information corresponding to the highest speaking priority to the terminal.
  • The server performs noise reduction processing, such as filtering noise, on the priority audio information, and then delivers the audio information to each terminal.
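  • The selection rule of step S03 can be sketched as below. This is a minimal sketch, not the patent's implementation: the `Frame` type, the function name `select_priority_audio`, and the convention that priority 1 is the highest are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    speaker: str
    priority: int  # assumed convention: 1 is the highest speaking priority
    audio: bytes


def select_priority_audio(frames):
    """Pick the frame to forward to every terminal, or None if nothing was received."""
    if not frames:
        return None
    if len(frames) == 1:
        # Only one speaker: forward the audio directly.
        return frames[0]
    # Several simultaneous speakers: keep the highest speaking priority, shield the rest.
    return min(frames, key=lambda f: f.priority)
```

  • For example, if a leader (priority 1) and an expert (priority 3) speak in the same interval, only the leader's frame is returned for delivery to the terminals.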
  • The server of the embodiment of the present invention receives the audio information corresponding to the remote conference speaker sent by the terminal; searches the pre-stored sound sample database, performs voice recognition on the audio information, and obtains the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sends the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information.
  • The present invention further provides a second embodiment of a method for implementing ordered speaking in a remote conference.
  • The difference between this embodiment and the embodiment shown in FIG. 2 lies in the action performed when the server identifies that the sound sample corresponding to the audio information sent by the terminal is not in the sound sample database.
  • In the method for implementing ordered speaking in the remote conference of the present invention, step S02 of the first embodiment (searching the pre-stored sound sample database, performing voice recognition on the audio information, and obtaining the speaking priority of the speaker) further includes:
  • Step S13: When the speaker corresponding to the audio information is a stranger, prohibit sending the audio information to the terminal, and treat the sound corresponding to the audio information as noise.
  • The server performs voice recognition on the audio information received from the terminal, searches the sound sample database, and identifies whether the sound sample corresponding to the audio information can be found. When the server cannot find the sound sample corresponding to the audio information in the sound sample database, the server recognizes that the speaker corresponding to the audio information is a stranger, that is, the sound sample of the speaker corresponding to the audio information is not stored in the sound sample database.
  • When the server recognizes that the speaker corresponding to the audio information is a stranger, the server prohibits transmitting the audio information to the terminal, and processes the sound corresponding to the audio information as noise.
  • A stranger can be understood as a speaker mapped by audio information whose corresponding sound sample is not stored in the sound sample database; that is, the sound sample mapped by the audio information corresponding to the stranger is not in the sound sample database. It can be understood by those skilled in the art that, since the sound samples corresponding to all participants of the remote conference are stored in the sound sample database, when the server cannot find the sound sample corresponding to the audio information in the sound sample database, it recognizes that the audio information is from a stranger who is not a current remote conference attendee, and the server performs noise processing on the sound corresponding to the audio information.
  • When the server recognizes that the received audio information is from a stranger, it directly performs noise processing on the sound corresponding to the audio information, and it automatically masks the voice of speakers with a low priority, thereby reducing the noise interference of the remote conference and avoiding mixed sounds.
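  • The stranger handling described above can be sketched as a filter in front of the priority selection. The function name `split_strangers`, the `(speaker_id, audio)` tuple layout, and the use of `None` for an unrecognized voice are assumptions made for this illustration only.

```python
def split_strangers(recognized, enrolled_priorities):
    """Separate audio of enrolled participants from audio that must be treated as noise.

    recognized: list of (speaker_id or None, audio) pairs produced by voice recognition.
    enrolled_priorities: dict mapping enrolled speaker_id -> speaking priority.
    """
    forwardable, noise = [], []
    for speaker_id, audio in recognized:
        if speaker_id in enrolled_priorities:
            forwardable.append((speaker_id, enrolled_priorities[speaker_id], audio))
        else:
            # No matching sound sample: the speaker is a stranger, so the audio is never sent on.
            noise.append(audio)
    return forwardable, noise
```

  • Only the `forwardable` entries would then take part in the highest-priority selection; the `noise` entries are withheld from every terminal.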
  • The present invention also provides a third embodiment of a method for implementing ordered speaking in a remote conference. The difference between this embodiment and the embodiments shown in FIG. 2 and FIG. 3 is that, before the remote conference officially starts, the server creates the sound sample database from sound samples sent by the terminal.
  • This embodiment is described by taking the difference from the embodiment described in FIG. 2 as an example. Based on the description of the embodiments shown in FIG. 1, FIG. 2 and FIG. 3, as shown in FIG. 4, in the method for implementing ordered speaking in the remote conference of the present invention, step S01 of that embodiment (receiving the audio information corresponding to the remote conference speaker sent by the terminal) further includes:
  • Step S11: Receive sound samples, sent by the terminal, that respectively correspond to participants with different speaking priorities, and create the sound sample database according to the sound samples.
  • The priority of each sound sample sent by the terminal is obtained by weighting the identity of the participant.
  • The terminal determines the weight of each participant's identity according to the operation instruction triggered by the user. Usually, the higher the identity of a participant, the higher the priority of their speech. Further, in the embodiment of the present invention, participants may be added to the remote conference at any time while it is in progress.
  • When detecting a configuration command triggered by the user, the terminal responds to the configuration command, records a new sound sample, with a certain speaking priority, for the participant newly joining the remote conference, and sends the recorded new sound sample to the server; the recorded new sound sample carries the corresponding speaking priority.
  • The server receives the new sound sample corresponding to the participant newly joining the remote conference and adds the new sound sample to the sound sample database; the new sound sample received by the server carries the corresponding speaking priority.
  • The sound samples stored by the server are valid only in the current remote conference. Once the server receives the operation instruction indicating the end of the remote conference, the server deletes the sound sample database corresponding to that remote conference.
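  • The life cycle of the sound sample database (created before the conference, extended when a participant joins, deleted when the conference ends) can be sketched as follows. The class name `ConferenceSampleStore`, its method names, and the `conference_id` key are hypothetical; the source does not describe a storage layout.

```python
class ConferenceSampleStore:
    """Keeps one sound sample database per conference, valid only while that conference runs."""

    def __init__(self):
        self._by_conference = {}

    def create(self, conference_id, samples):
        # samples: dict mapping speaker -> (sound sample, speaking priority)
        self._by_conference[conference_id] = dict(samples)

    def add(self, conference_id, speaker, sample, priority):
        # A participant joining mid-conference brings a new sample that carries its priority.
        self._by_conference[conference_id][speaker] = (sample, priority)

    def lookup(self, conference_id, speaker):
        return self._by_conference.get(conference_id, {}).get(speaker)

    def end_conference(self, conference_id):
        # Samples are deleted as soon as the conference ends.
        self._by_conference.pop(conference_id, None)
```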
  • The server sets the speaking priority corresponding to each role in the remote conference according to the user's setting instructions. For example, by default, the server weights the participants by identity, and the identities are divided into leader, moderator, expert, and ordinary participant: the leader identity has the highest speaking priority, the moderator identity the second, the expert identity the third, and the ordinary participant identity the last. Within one identity, multiple people can be set up, such as leader 1 and leader 2, with leader 1 having a higher priority than leader 2, and so on.
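  • The default identity weighting can be sketched as a two-level sort key: first the role, then the position within the role. The role labels, the numeric ranks in `ROLE_RANK`, and the function name `speaking_order` are assumptions chosen for illustration.

```python
# Assumed ranks for the default identities; a smaller rank means a higher speaking priority.
ROLE_RANK = {"leader": 0, "moderator": 1, "expert": 2, "participant": 3}


def speaking_order(participants):
    """Return participant names from highest to lowest speaking priority.

    participants: list of (name, role, position_within_role) tuples,
    where position 1 outranks position 2 inside the same role.
    """
    ranked = sorted(participants, key=lambda p: (ROLE_RANK[p[1]], p[2]))
    return [name for name, _role, _position in ranked]
```

  • With this key, leader 2 still outranks every moderator, while leader 1 outranks leader 2.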
  • The server receives and stores the sound samples of the participants, such as voice data information. The terminal collects the sound samples as follows: the host of the remote conference selects the identity definition function and, for the leader, moderator, expert, and ordinary participant identities in turn, asks the relevant person to greet the participants.
  • The sound sample of each person's voice is collected separately by the terminal (for example, through a sound collection device on the terminal such as a microphone) and is later used by the server to establish the sound sample database, so that voices can be compared and identified. In this way, the priority of each participant in this remote conference is defined, and the remote audio or video conference can officially start.
  • the server of the embodiment of the present invention establishes a sound sample database corresponding to different speaking priorities, which has the beneficial effect of improving the clarity of the sound transmission.
  • the present invention also provides a first embodiment of a server for implementing an ordered speech in a remote conference; as shown in FIG. 5, the server for implementing an ordered speech in the remote conference of the present invention includes: an information receiving module 01, an information identifying module 02, and Information processing module 03.
  • the information receiving module 01 is configured to receive audio information corresponding to the remote conference speaker sent by the terminal.
  • the terminal detects the user-triggered operation instruction in real time.
  • When the terminal detects that the user triggers the sound collection instruction (for example, the user speaks through the microphone), or receives sound information sent by the user, the terminal collects the audio information corresponding to the user and sends the collected audio information to the server; the information receiving module 01 of the server receives the audio information corresponding to the remote conference speaker sent by the terminal.
  • The audio information sent by the terminal and received by the information receiving module 01 may not correspond to the speech of a participant of the current remote conference, but the information receiving module 01 treats all audio information sent by the remote conference terminal as audio information corresponding to remote conference participants; whether it actually is such audio information is determined when the audio information is subsequently identified.
  • The information recognition module 02 is configured to search a pre-stored sound sample database, perform voice recognition on the audio information, and obtain the speaking priority of the speaker corresponding to the audio information.
  • When the information receiving module 01 receives the audio information corresponding to the remote conference speaker sent by the terminal, the information recognition module 02 searches the pre-stored sound sample database to identify whether the sound sample corresponding to the audio information is stored in the sound sample database.
  • the sound sample database stores sound samples corresponding to all participants of the remote conference.
  • The information recognition module 02 performs voice recognition on the audio information received by the information receiving module 01 and finds the sound sample corresponding to the audio information in the sound sample database, so as to obtain the speaking priority of the speaker corresponding to the audio information according to the sound sample found. For example, the information recognition module 02 compares the received audio information with the sound sample database every 100 milliseconds. It will be understood by those skilled in the art that, since different people's voices differ, different speakers can be distinguished according to the timbre of the voice; therefore, when the information recognition module 02 finds the sound sample corresponding to the audio information in the sound sample database, the speaker corresponding to the sound sample can be determined, and then the speaking priority of the speaker corresponding to the audio information can be obtained.
  • While acquiring the speaking priority of the speaker corresponding to the audio information, the information recognition module 02 can also acquire other related information, such as the number of speakers corresponding to the audio information.
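  • The 100-millisecond comparison interval can be illustrated by cutting the incoming audio into fixed windows, each of which would then be compared with the sound sample database. The function name `windows`, the list-of-samples representation, and the sample rate in the usage note are assumptions; the source only states the 100-millisecond interval.

```python
def windows(samples, sample_rate_hz, window_ms=100):
    """Yield consecutive slices of `samples`, each covering `window_ms` of audio."""
    size = sample_rate_hz * window_ms // 1000
    if size <= 0:
        raise ValueError("sample rate too low for the requested window length")
    for start in range(0, len(samples), size):
        # The final slice may be shorter than a full window.
        yield samples[start:start + size]
```

  • At an assumed 8000 Hz sample rate, each window holds 800 samples.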
  • The information processing module 03 is configured to send, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information.
  • The information processing module 03 finds the audio information corresponding to the speaker with the highest speaking priority and uses it as the priority audio information.
  • The information processing module 03 sends the priority audio information to each terminal, so that each terminal plays the priority audio information sent by the server, thereby avoiding the sound interference caused by multiple speakers speaking simultaneously.
  • When the information recognition module 02 recognizes that the received audio information corresponds to only one speaker, the information processing module 03 directly transmits the audio information to the terminal.
  • When the received audio information corresponds to multiple speakers, the information processing module 03 identifies the speaking priority of the sound sample corresponding to each piece of audio information, finds the highest speaking priority among them, and sends the audio information corresponding to the highest speaking priority to the terminal.
  • the information processing module 03 performs noise reduction processing such as filtering noise on the priority audio information, and then delivers the information to each terminal.
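  • The cooperation of the three modules can be sketched as a small pipeline. The class and method names below mirror the module names in the text but are otherwise hypothetical; voice recognition is replaced by a plain dictionary lookup on an assumed voiceprint key, and priority 1 is assumed to be the highest.

```python
class InformationReceivingModule:
    def receive(self, payloads):
        # payloads: iterable of (voiceprint_key, audio) pairs sent by the terminals.
        return list(payloads)


class InformationRecognitionModule:
    def __init__(self, priority_by_voiceprint):
        self.priority_by_voiceprint = priority_by_voiceprint

    def recognize(self, frames):
        # Stand-in for voice recognition: keep only voices that have an enrolled sample.
        return [(self.priority_by_voiceprint[key], audio)
                for key, audio in frames if key in self.priority_by_voiceprint]


class InformationProcessingModule:
    def process(self, recognized):
        # Forward the audio with the highest speaking priority (smallest number), if any.
        if not recognized:
            return None
        return min(recognized, key=lambda item: item[0])[1]


class Server:
    def __init__(self, priority_by_voiceprint):
        self.receiver = InformationReceivingModule()
        self.recognizer = InformationRecognitionModule(priority_by_voiceprint)
        self.processor = InformationProcessingModule()

    def handle(self, payloads):
        frames = self.receiver.receive(payloads)
        return self.processor.process(self.recognizer.recognize(frames))
```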
  • The server of the embodiment of the present invention receives the audio information corresponding to the remote conference speaker sent by the terminal; searches the pre-stored sound sample database, performs voice recognition on the audio information, and obtains the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sends the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information. This solves the problem of many people speaking at the same time and interfering with each other, achieves orderly speaking in remote conferences, and improves meeting efficiency and human-machine interactivity.
  • The information processing module 03 is further configured to: when the speaker corresponding to the audio information is a stranger, prohibit sending the audio information to the terminal, and treat the sound mapped by the audio information as noise; wherein a stranger is a speaker mapped by audio information whose corresponding sound sample is not stored in the sound sample database.
  • The information recognition module 02 performs voice recognition on the audio information sent by the terminal and received by the information receiving module 01, searches the sound sample database, and identifies whether the sound sample corresponding to the audio information can be found. When the information recognition module 02 cannot find the sound sample corresponding to the audio information in the sound sample database, it recognizes that the speaker corresponding to the audio information is a stranger, that is, the sound sample of the speaker corresponding to the audio information is not stored in the sound sample database.
  • In that case, the information processing module 03 prohibits the transmission of the audio information to the terminal, and processes the sound corresponding to the audio information as noise.
  • A stranger can be understood as a speaker mapped by audio information whose corresponding sound sample is not stored in the sound sample database; that is, the sound sample mapped by the audio information corresponding to the stranger is not in the sound sample database.
  • If the information recognition module 02 cannot find the sound sample corresponding to the audio information in the sound sample database, it recognizes that the audio information is from a stranger who is not a current remote conference attendee, and the information processing module 03 performs noise processing on the sound corresponding to the audio information.
  • When the server recognizes that the received audio information is from a stranger, it directly performs noise processing on the sound corresponding to the audio information, and it automatically masks the voice of speakers with a low priority, thereby reducing the noise interference of the remote conference and avoiding mixed sounds.
  • An embodiment of the present invention further provides a second embodiment of a server for implementing ordered speaking in a remote conference.
  • This embodiment differs from the embodiment shown in FIG. 5 in that, before the remote conference formally starts, the server creates the sound sample database from the sound samples sent by the terminal.
  • In this embodiment, the server further includes a database establishing module 04, configured to receive from the terminal the sound samples of participants having different speaking priorities, and to create the sound sample database from those sound samples.
  • In this embodiment, after the hardware environment for the remote conference has been set up and before the remote conference formally starts, the terminal, in response to a configuration instruction triggered by the user, records a voice sample of each participant in the remote conference under that participant's speaking priority, and sends the recorded sound samples to the server.
  • The database establishing module 04 receives the sound samples of the participants with different speaking priorities sent by the terminal, and the server creates the sound sample database from the received sound samples.
  • The priority of each sound sample sent by the terminal is obtained by weighting the identity of the participant.
  • The terminal determines the identity weight of each participant according to an operation instruction triggered by the user. Usually, the higher a participant's identity weight, the higher that participant's speaking priority.
  • Further, participants may be added to the remote conference at any time while it is in progress.
  • When the terminal detects a configuration instruction triggered by the user, it responds by recording a new sound sample for the participant newly joining the remote conference under a given speaking priority, and sends the recorded new sound sample to the server; the new sound sample carries its corresponding speaking priority.
  • The database establishing module 04 receives the new sound sample, sent by the terminal, of the participant newly joining the remote conference, and adds it to the sound sample database; the new sound sample was configured with its speaking priority when it was recorded.
  • In a preferred embodiment, to reduce the data storage load on the server, the sound samples stored by the database establishing module 04 are valid only for the current remote conference: once the server receives the operation instruction ending the current remote conference, the database establishing module 04 deletes the sound sample database corresponding to that remote conference.
  • Taking a specific application scenario as an example, the following describes again how the server and the terminal exchange data to establish the sound sample database.
  • The database establishing module 04 sets the speaking priority of each role in the remote conference according to the user's setting instruction. For example, by default the database establishing module 04 weights participant identities, from highest to lowest weight, as: leader, moderator, expert, and ordinary participant. The speaking priority of the leader identity is therefore the highest, that of the moderator identity is second, that of the expert identity is third, and that of the ordinary participant identity is the lowest. Each identity may include several people, such as leader 1 and leader 2, with leader 1 having a higher speaking priority than leader 2, and so on.
  • The database establishing module 04 receives and stores the participants' sound samples (for example, voice data) sent by the terminal. The terminal collects the sound samples as follows: the moderator of the remote conference selects the identity definition function and, for the leader, moderator, expert, and ordinary participant identities in turn, asks the relevant people to greet the attendees. The terminal captures each person's voice (for example, through the sound collection device on the terminal microphone) to produce sound samples, from which the database establishing module 04 creates the sound sample database used for later voice comparison and identification. Once the speaking priority of every participant has been defined in this way, the remote audio or video conference can formally start.
  • By establishing a sound sample database corresponding to the different speaking priorities, the server of this embodiment improves the clarity of sound transmission.
  • The methods of the foregoing embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation.
  • Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, may be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
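The handling of the sound sample database described in the list above (role-weighted speaking priorities, samples recorded before the conference, late joiners, and deletion when the conference ends) might be sketched as follows. This is a minimal illustration rather than the patented implementation: the class names, the numeric role weights, and the use of raw bytes for a voice sample are all assumptions; the source fixes only the ordering leader > moderator > expert > ordinary participant.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical weights: only their relative order comes from the source.
ROLE_WEIGHT = {"leader": 4, "moderator": 3, "expert": 2, "participant": 1}


@dataclass
class SoundSample:
    name: str
    role: str
    rank: int     # position within the role: rank 1 outranks rank 2
    voice: bytes  # recorded greeting; the audio format is not specified in the source

    @property
    def priority(self) -> tuple[int, int]:
        # A larger tuple means a higher speaking priority.
        return (ROLE_WEIGHT[self.role], -self.rank)


@dataclass
class SoundSampleDatabase:
    conference_id: str
    samples: dict[str, SoundSample] = field(default_factory=dict)

    def add(self, sample: SoundSample) -> None:
        # Used for the initial samples and for participants who join later.
        self.samples[sample.name] = sample

    def speaking_order(self) -> list[str]:
        ordered = sorted(self.samples.values(), key=lambda s: s.priority, reverse=True)
        return [s.name for s in ordered]


class Server:
    def __init__(self) -> None:
        self.databases: dict[str, SoundSampleDatabase] = {}

    def create_database(self, conference_id: str, samples: list[SoundSample]) -> SoundSampleDatabase:
        db = SoundSampleDatabase(conference_id)
        for sample in samples:
            db.add(sample)
        self.databases[conference_id] = db
        return db

    def end_conference(self, conference_id: str) -> None:
        # Samples are valid only for the current conference, so drop the database.
        self.databases.pop(conference_id, None)
```

Keeping the priority as a `(role weight, -rank)` tuple lets leader 1 outrank leader 2 without a second lookup table; any other encoding that preserves the same ordering would serve equally well.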

Abstract

Disclosed are a method and a server for ordered speaking in a teleconference. The method comprises: receiving, by a server, audio information that corresponds to a teleconference speaker and is transmitted by a terminal; searching a pre-stored voice sample database and performing voice recognition on the audio information to acquire the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, transmitting the audio information of the speaker having the highest speaking priority to the terminal as the priority audio information, so that the terminal plays the received priority audio information. This solves the problem of mutual voice interference when several people speak simultaneously in a teleconference, achieves ordered speaking in the teleconference, and improves conference efficiency and human-machine interaction.

Description

Method and server for implementing ordered speaking in a remote conference

TECHNICAL FIELD

The present invention relates to the field of communications, and in particular to a method and a server for implementing ordered speaking in a remote conference.

BACKGROUND

With the wide application of remote conference systems (for example, telephone conferences and video conferences), higher demands are placed on the quality, efficiency, and user experience of remote conferences. How to make a remote conference achieve the same effect and user experience as a face-to-face conference has become a problem that urgently needs to be solved.

In an existing remote conference such as a video conference, if several people speak at the same time, the voices of the speakers interfere with one another, so that the other participants cannot clearly hear what a speaker says. Especially when network performance is poor, the other participants hear little more than noise, which seriously degrades the quality of the remote conference.

SUMMARY OF THE INVENTION

Embodiments of the present invention provide a method and a server for implementing ordered speaking in a remote conference, so as to at least solve the problem in the related art that voices interfere with one another when several people speak simultaneously in a remote conference.

An embodiment of the present invention discloses a method for implementing ordered speaking in a remote conference, including the following steps:

receiving audio information, sent by a terminal, corresponding to a remote conference speaker;

searching a pre-stored sound sample database, performing voice recognition on the audio information, and acquiring the speaking priority of the speaker corresponding to the audio information;

according to the speaking priority of the speaker, sending the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.

Preferably, the step of sending the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information includes:

when the audio information corresponds to one speaker, sending the audio information to the terminal as the priority audio information;

when the audio information corresponds to at least two speakers, acquiring the speaking priority of each speaker, and sending the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information.

Preferably, after the step of searching the pre-stored sound sample database, performing voice recognition on the audio information, and acquiring the number of speakers corresponding to the audio information and the speaking priority of each speaker, the method further includes:

when the speaker corresponding to the audio information is a stranger, prohibiting the audio information from being sent to the terminal, and treating the sound mapped by the audio information as noise;

wherein the stranger is a speaker whose audio information corresponds to no sound sample stored in the sound sample database.

Preferably, before the step of receiving the audio information, sent by the terminal, corresponding to the remote conference speaker, the method further includes:

receiving sound samples, sent by the terminal, respectively corresponding to participants with different speaking priorities, and creating the sound sample database from the sound samples.

Preferably, the method for implementing ordered speaking in a remote conference further includes:

receiving a new sound sample, sent by the terminal, corresponding to a participant newly joining the remote conference, and adding the new sound sample to the sound sample database; wherein the new sound sample carries a corresponding speaking priority.
An embodiment of the present invention further discloses a server for implementing ordered speaking in a remote conference, including:

an information receiving module, configured to receive audio information, sent by a terminal, corresponding to a remote conference speaker;

an information recognition module, configured to search a pre-stored sound sample database, perform voice recognition on the audio information, and acquire the speaking priority of the speaker corresponding to the audio information;

an information processing module, configured to send, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.

Preferably, the information processing module is further configured to:

when the audio information corresponds to one speaker, send the audio information to the terminal as the priority audio information;

when the audio information corresponds to at least two speakers, acquire the speaking priority of each speaker, and send the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information.

Preferably, the information processing module is further configured to:

when the speaker corresponding to the audio information is a stranger, prohibit the audio information from being sent to the terminal, and treat the sound mapped by the audio information as noise;

wherein the stranger is a speaker whose audio information corresponds to no sound sample stored in the sound sample database.

Preferably, the server for implementing ordered speaking in a remote conference further includes:

a database establishing module, configured to receive sound samples, sent by the terminal, respectively corresponding to participants with different speaking priorities, and to create the sound sample database from the sound samples.

Preferably, the database establishing module is further configured to:

receive a new sound sample, sent by the terminal, corresponding to a participant newly joining the remote conference, and add the new sound sample to the sound sample database; wherein the new sound sample carries a corresponding speaking priority.
In the embodiments of the present invention, the server receives audio information, sent by a terminal, corresponding to a remote conference speaker; searches a pre-stored sound sample database, performs voice recognition on the audio information, and acquires the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sends the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information. This solves the problem of voices interfering with one another when several people speak simultaneously in a remote conference, achieves ordered speaking in the remote conference, and improves conference efficiency and human-machine interaction.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the system architecture of an embodiment of the operating environment of the method and the server for implementing ordered speaking in a remote conference according to the present invention;
FIG. 2 is a schematic flowchart of a first embodiment of the method for implementing ordered speaking in a remote conference according to the present invention;
FIG. 3 is a schematic flowchart of a second embodiment of the method for implementing ordered speaking in a remote conference according to the present invention;
FIG. 4 is a schematic flowchart of a third embodiment of the method for implementing ordered speaking in a remote conference according to the present invention;
FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the server for implementing ordered speaking in a remote conference according to the present invention;
FIG. 6 is a schematic diagram of the functional modules of a second embodiment of the server for implementing ordered speaking in a remote conference according to the present invention.
The realization of the objects, the functional features, and the advantages of the embodiments of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The technical solutions of the present invention are further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely intended to explain the present invention and are not intended to limit it.
In the operating environment of the method and the server for implementing ordered speaking in a remote conference according to the present invention, the server may be deployed as a cloud server, and the terminals that interact with the server may be deployed as cloud terminals. The remote conference includes remote audio conferences and remote video conferences, such as video conferences and telephone conferences.

As shown in FIG. 1, in this operating environment a server 100 exchanges data with a plurality of terminals 200 (FIG. 1 shows only two terminals as an example), so that participants who are not at the same geographic location can hold a remote conference by means of the server 100 and the terminals 200. In this embodiment, the terminals 200 communicate with the server 100 over the Internet to form the implementation environment of the remote conference. After the environment has been deployed, each terminal 200 detects in real time whether a user has triggered a sound collection instruction. When a terminal 200 detects that the user has triggered a sound collection instruction, for example by speaking into the terminal microphone, the terminal 200 collects the speaker's audio information and sends it to the server 100. Because the server 100 exchanges data with a plurality of terminals 200, it may receive audio information from several terminals 200 at the same moment. In that case, the server 100 searches the sound sample database according to the received audio information and identifies the speaking priority of the speaker corresponding to each of those terminals 200; it takes the audio information of the speaker with the highest speaking priority as the priority audio information among the audio information collected this time, delivers the priority audio information to every terminal 200, and masks the other received audio information. After receiving the priority audio information delivered by the server 100, each terminal 200 plays it. This achieves ordered speaking in the remote conference and avoids the voice interference caused when several speakers speak at the same time.
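The arbitration performed by server 100 when several terminals 200 send audio at the same moment can be sketched as below. This is an illustrative sketch only: it assumes the speakers have already been identified, that a larger integer means a higher speaking priority, and that the names `IncomingAudio`, `select_priority_audio`, and `dispatch` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IncomingAudio:
    terminal_id: str  # which terminal 200 sent the audio
    speaker: str      # speaker already identified from the sound sample database
    priority: int     # assumed encoding: larger value = higher speaking priority
    payload: bytes    # the audio information itself


def select_priority_audio(batch: List[IncomingAudio]) -> Optional[IncomingAudio]:
    """Return the audio of the highest-priority speaker, or None for an empty batch."""
    if not batch:
        return None
    # The source does not say how ties are broken; max() keeps the first one seen.
    return max(batch, key=lambda audio: audio.priority)


def dispatch(batch: List[IncomingAudio], terminal_ids: List[str]) -> Dict[str, bytes]:
    """Map every terminal to the single payload it should play.

    All other audio received in the same batch is masked, i.e. never forwarded.
    """
    chosen = select_priority_audio(batch)
    if chosen is None:
        return {}
    return {terminal_id: chosen.payload for terminal_id in terminal_ids}
```

With a batch containing an expert (priority 2) and a leader (priority 4), `dispatch` returns the leader's payload for every terminal, including the terminal whose own audio was masked.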
The present invention also provides a first embodiment of a method for implementing ordered speaking in a remote conference. Based on the description of the embodiment of FIG. 1, as shown in FIG. 2, the method includes the following steps:

Step S01: receive audio information, sent by a terminal, corresponding to a remote conference speaker.

In the remote conference operating environment, the terminal detects user-triggered operation instructions in real time. When the terminal detects that the user has triggered a sound collection instruction (for example, the user speaks through the microphone), or receives sound information sent by the user, the terminal collects the audio information corresponding to the user and sends it to the server, and the server receives the audio information corresponding to the remote conference speaker.

In this embodiment, the audio information that the server receives from the terminal may not correspond to speech by a participant in the current remote conference. The server nevertheless initially treats all audio information sent by the remote conference terminals as audio information of conference participants, and determines only later, when it identifies the audio information, whether the audio information corresponds to a voice uttered by a participant in the current remote conference.

Step S02: search a pre-stored sound sample database, perform voice recognition on the audio information, and acquire the speaking priority of the speaker corresponding to the audio information.

On receiving the audio information corresponding to the remote conference speaker, the server searches the pre-stored sound sample database to determine whether it stores a sound sample corresponding to the audio information. In this embodiment, the sound sample database stores the sound samples of all participants in the current remote conference. The server performs voice recognition on the received audio information, finds the corresponding sound sample in the sound sample database, and from that sound sample obtains the speaking priority of the corresponding speaker. For example, the server compares the received audio information against the sound sample database once every 100 milliseconds.

Those skilled in the art will understand that different people have different voices, so speakers can be distinguished by timbre. Therefore, once the server finds the sound sample corresponding to the audio information in the sound sample database, it can determine the speaker to whom that sample belongs and thus obtain that speaker's speaking priority. While acquiring the speaking priority, the server can also obtain other related information, such as the number of speakers to whom the audio information corresponds.

Step S03: according to the speaking priority of the speaker, send the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.

Based on the acquired speaking priorities, the server finds the audio information corresponding to the speaker with the highest speaking priority, takes it as the priority audio information, and sends it to every terminal. Each terminal plays the priority audio information, which avoids the voice interference caused when several speakers speak at the same time.

In a preferred embodiment of the present invention, when the server recognizes that the received audio information corresponds to only one speaker, it sends that audio information directly to the terminal. When the server recognizes that the received audio information corresponds to several speakers, it identifies the speaking priority of the sound sample corresponding to each piece of audio information, finds the highest of those speaking priorities, and sends the corresponding audio information to the terminal.

Further, to reduce noise interference during playback and improve the clarity of the sound played by the terminal, the server applies noise reduction, such as noise filtering, to the priority audio information before delivering it to the terminals.

In this embodiment, the server receives audio information, sent by a terminal, corresponding to a remote conference speaker; searches a pre-stored sound sample database, performs voice recognition on the audio information, and acquires the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sends the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information. This solves the problem of voices interfering with one another when several people speak simultaneously in a remote conference, achieves ordered speaking in the remote conference, and improves conference efficiency and human-machine interaction.
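The periodic comparison mentioned in step S02 (once every 100 milliseconds in the example) could be organized as in the following sketch. The 16 kHz sample rate, the function names, and the `identify` callback are assumptions made for illustration; the source specifies neither an audio format nor a recognition algorithm.

```python
from typing import Callable, Iterator, List, Optional, Sequence

SAMPLE_RATE_HZ = 16_000  # assumed; the source does not specify an audio format
WINDOW_MS = 100          # comparison interval given as an example in the text


def windows(pcm: Sequence[int],
            sample_rate: int = SAMPLE_RATE_HZ,
            window_ms: int = WINDOW_MS) -> Iterator[Sequence[int]]:
    """Yield consecutive windows of window_ms; the last one may be shorter."""
    size = sample_rate * window_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def label_windows(pcm: Sequence[int],
                  identify: Callable[[Sequence[int]], Optional[str]]) -> List[Optional[str]]:
    """Run the hypothetical identify callback once per window.

    identify stands in for the database comparison of step S02 and returns a
    speaker name, or None when no sound sample matches.
    """
    return [identify(window) for window in windows(pcm)]
```

At the assumed 16 kHz rate each window holds 1600 samples, so 4000 samples split into windows of 1600, 1600, and 800 samples.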
The present invention also provides a second embodiment of the method for implementing ordered speaking in a remote conference. This embodiment differs from the embodiment of FIG. 2 in the operation performed when the server recognizes that the sound sample corresponding to the audio information sent by the terminal is not in the sound sample database.

Based on the description of the embodiments of FIG. 1 and FIG. 2, as shown in FIG. 3, after "step S02: search a pre-stored sound sample database, perform voice recognition on the audio information, and acquire the speaking priority of the speaker corresponding to the audio information" of the embodiment of FIG. 2, the method further includes:

Step S13: when the speaker corresponding to the audio information is a stranger, prohibit the audio information from being sent to the terminal, and treat the sound corresponding to the audio information as noise.

In this embodiment, the server performs voice recognition on the audio information received from the terminal and searches the sound sample database to determine whether a sound sample corresponding to the audio information can be found. When it cannot find one, the server recognizes the speaker corresponding to the audio information as a stranger, that is, a speaker whose voice sample is not stored in the sound sample database. The server then prohibits the audio information from being sent to the terminal and treats the corresponding sound as noise.

In this embodiment, a stranger is a speaker whose audio information maps to no sound sample stored in the sound sample database. Those skilled in the art will understand that, because the sound sample database stores the sound samples of all participants in the current remote conference, audio information for which the server finds no matching sound sample is recognized as coming from a stranger who is not a participant in the current remote conference, and the server treats the corresponding sound as noise.

In this embodiment, when the server recognizes that received audio information comes from a stranger, it directly treats the corresponding sound as noise; it also automatically masks the voices of speakers with lower speaking priority, which reduces noise interference in the remote conference and avoids mixed sounds.
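The stranger handling of step S13 can be illustrated with a toy matcher. The source does not say how a voice is matched against the stored samples, so everything here is an assumption: voices are represented as small feature vectors, matching uses cosine similarity, and `THRESHOLD`, `identify`, and `forward_or_drop` are hypothetical names.

```python
import math
from typing import Dict, Optional, Sequence

THRESHOLD = 0.8  # assumed cut-off; the source gives no matching criterion


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(features: Sequence[float],
             database: Dict[str, Sequence[float]],
             threshold: float = THRESHOLD) -> Optional[str]:
    """Return the best-matching enrolled speaker, or None for a stranger."""
    best_name, best_score = None, threshold
    for name, reference in database.items():
        score = cosine(features, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def forward_or_drop(features: Sequence[float], payload: bytes,
                    database: Dict[str, Sequence[float]]) -> Optional[bytes]:
    """Forward audio from known speakers; treat a stranger's audio as noise (drop it)."""
    return payload if identify(features, database) is not None else None
```

Audio whose features fall below the threshold for every enrolled sample is never forwarded, which corresponds to prohibiting its transmission to the terminals.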
The present invention also provides a third embodiment of the method for implementing ordered speaking in a remote conference. This embodiment differs from the embodiments shown in FIG. 2 and FIG. 3 in that, before the remote conference formally begins, the server creates the sound sample database from sound samples sent by the terminal. This embodiment is described in terms of its differences from the embodiment shown in FIG. 2.
Based on the description of the embodiments shown in FIG. 1, FIG. 2 and FIG. 3, and as shown in FIG. 4, in the method of the present invention for implementing ordered speaking in a remote conference, before "Step S01: receiving audio information corresponding to a remote conference speaker sent by the terminal" of the embodiment shown in FIG. 2, the method further includes:
Step S11: receiving sound samples sent by the terminal that respectively correspond to participants with different speaking priorities, and creating the sound sample database from the sound samples.
In this embodiment, after the hardware environment for the remote conference has been set up and before the remote conference formally begins, the terminal, in response to a configuration instruction triggered by the user, records a sound sample for each participant in this remote conference under that participant's speaking priority, and sends the recorded sound samples to the server. The server receives the sound samples sent by the terminal, which respectively correspond to participants with different speaking priorities, and creates the sound sample database from them. The priority of each sound sample sent by the terminal is obtained by weighting the identity of the participant. The terminal determines each participant's identity weight according to operation instructions triggered by the user. In general, the higher a participant's identity weight, the higher that participant's speaking priority.
Further, in this embodiment of the present invention, participants may be added at any time while the remote conference is in progress.
When the terminal detects a configuration instruction triggered by the user, it responds to the instruction by recording a new sound sample, under a given speaking priority, for the participant newly joining this remote conference, and sends the recorded new sound sample to the server; the recorded new sound sample carries its corresponding speaking priority. The server receives the new sound sample, sent by the terminal, corresponding to the participant newly joining the remote conference, and adds it to the sound sample database; the new sound sample received by the server was already assigned its speaking priority when it was recorded.
In a preferred embodiment of the present invention, to reduce the data storage load on the server, the sound samples stored by the server are valid only for the current remote conference. Once the server receives an operation instruction indicating that the remote conference has ended, the server deletes the sound sample database corresponding to that remote conference.
The following uses a specific application scenario as an example to describe again how, in the method of the present invention for implementing ordered speaking in a remote conference, the server and the terminal exchange data to build the sound sample database.
The server sets the speaking priority corresponding to each role in the remote conference according to the user's setting instruction. For example, by default the server weights participant identities, and the weights from high to low are: leader, host, expert, ordinary participant. The leader identity therefore has the highest speaking priority, the host identity the second, the expert identity the third, and the ordinary participant identity the fourth. Each identity may in turn contain several people, such as leader 1 and leader 2, with leader 1 having a higher speaking priority than leader 2, and so on.
The server receives and stores the participants' sound samples (for example, voice data) sent by the terminal. The terminal collects the sound samples as follows: the host of the remote conference selects the identity definition function and, for the leader, host, expert and ordinary participant identities in turn, asks each person concerned to greet the participants. While each person speaks, the terminal (for example, through the sound collection device on the terminal's microphone) captures that person's voice and produces a sound sample, from which the server subsequently builds the sound sample database used for voice comparison and identification. Once the speaking priority of every participant in the remote conference has been defined in this way, the remote audio or video conference can formally begin.
In this embodiment of the present invention, the server creates a sound sample database corresponding to the different speaking priorities, which has the beneficial effect of improving the clarity of sound transmission.
The present invention also provides a first embodiment of a server for implementing ordered speaking in a remote conference. As shown in FIG. 5, the server includes an information receiving module 01, an information recognition module 02 and an information processing module 03.
The information receiving module 01 is configured to receive audio information corresponding to a remote conference speaker sent by the terminal.
In the remote conference operating environment, the terminal detects user-triggered operation instructions in real time. When the terminal detects that the user has triggered a sound collection instruction (for example, the user speaks into the microphone), or receives sound information sent by the user, the terminal collects the corresponding audio information and sends it to the server, where the information receiving module 01 receives it.
In this embodiment of the present invention, the audio information received from the terminal may not in fact correspond to speech by a participant in this remote conference. Nevertheless, the information receiving module 01 initially treats all audio information sent by the remote conference terminal as audio information of remote conference participants; when the server subsequently recognizes the audio information, it determines whether the audio information actually corresponds to the voice of a participant in this remote conference.
The information recognition module 02 is configured to search a pre-stored sound sample database, perform voice recognition on the audio information, and obtain the speaking priority of the speaker corresponding to the audio information.
When the information receiving module 01 receives the audio information corresponding to a remote conference speaker sent by the terminal, the information recognition module 02 searches the pre-stored sound sample database to determine whether a sound sample corresponding to the audio information is stored there.
In this embodiment, the sound sample database stores the sound samples of all participants in the remote conference. The information recognition module 02 performs voice recognition on the audio information received by the information receiving module 01, finds the corresponding sound sample in the sound sample database, and from that sound sample obtains the speaking priority of the speaker corresponding to the audio information. For example, the information recognition module 02 compares the received audio information with the sound sample database once every 100 milliseconds.
Those skilled in the art will understand that different people have different voices, so different speakers can be distinguished by timbre. Therefore, when the information recognition module 02 finds the sound sample corresponding to the audio information in the sound sample database, it can determine the speaker corresponding to that sound sample and thereby obtain the speaking priority of the speaker corresponding to the audio information. In addition, while obtaining the speaking priority, the information recognition module 02 can also obtain other related information, such as the number of speakers corresponding to the audio information.
The information processing module 03 is configured to send, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.
According to the speaking priorities obtained by the information recognition module 02, the information processing module 03 finds the audio information corresponding to the speaker with the highest speaking priority and takes it as the priority audio information. The information processing module 03 sends the priority audio information to each terminal, and each terminal plays it, which avoids the sound interference caused by several speakers talking at once.
In a preferred embodiment of the present invention, when the information recognition module 02 recognizes that the received audio information corresponds to only one speaker, the information processing module 03 sends that audio information directly to the terminal. When the information recognition module 02 recognizes that the received audio information corresponds to multiple speakers, the information processing module 03 determines the speaking priority of the sound sample corresponding to each piece of audio information, finds the highest of these speaking priorities, and sends the audio information corresponding to it to the terminal.
Further, to reduce noise interference during playback and improve the clarity of the sound played by the terminal, the information processing module 03 applies noise reduction, such as noise filtering, to the priority audio information before delivering it to each terminal.
In this embodiment of the present invention, the server receives audio information corresponding to a remote conference speaker sent by the terminal; searches the pre-stored sound sample database, performs voice recognition on the audio information, and obtains the speaking priority of the speaker corresponding to the audio information; and, according to the speaking priority of the speaker, sends the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information. This solves the problem of voices interfering with one another when several people speak at once in a remote conference, achieves ordered speaking in the remote conference, and improves conference efficiency and human-machine interactivity.
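The selection rule applied by the information processing module 03 (forward a lone speaker directly, otherwise forward only the speaker with the highest speaking priority) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the function name `select_priority_audio` is hypothetical, speakers are assumed to have been identified already, and priorities are assumed to be integers where a smaller number means a higher speaking priority.

```python
from typing import Dict, List, Optional, Tuple


def select_priority_audio(segments: List[Tuple[str, bytes]],
                          priorities: Dict[str, int]) -> Optional[bytes]:
    """Pick the audio to forward for one comparison interval.

    segments: (speaker, audio) pairs for speakers already identified.
    priorities: speaker -> rank; assumption: smaller rank = higher priority.
    """
    if not segments:
        return None                      # nobody spoke in this interval
    if len(segments) == 1:
        return segments[0][1]            # single speaker: forward directly
    # Several speakers: forward only the highest-priority one.
    _, audio = min(segments, key=lambda seg: priorities[seg[0]])
    return audio


priorities = {"leader 1": 0, "host": 1, "expert": 2}
overlapping = [("expert", b"expert audio"), ("leader 1", b"leader audio")]
chosen = select_priority_audio(overlapping, priorities)
```

A server following the 100-millisecond comparison interval mentioned above could call such a function once per interval; noise reduction of the chosen audio would happen after this step and is not shown.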
Still referring to FIG. 5, in the server of the present invention for implementing ordered speaking in a remote conference, the information processing module 03 is further configured to:
when the speaker corresponding to the audio information is a stranger, prohibit sending the audio information to the terminal, and treat the sound mapped by the audio information as noise; where a stranger is a speaker whose audio information maps to a sound sample that is not stored in the sound sample database.
In this embodiment, the information recognition module 02 performs voice recognition on the audio information sent by the terminal and received by the information receiving module 01, and searches the sound sample database to determine whether a sound sample corresponding to the audio information can be found. If the information recognition module 02 cannot find a sound sample corresponding to the audio information in the sound sample database, it recognizes the speaker corresponding to the audio information as a stranger, that is, the speaker's sound sample is not stored in the sound sample database.
When the information recognition module 02 recognizes that the speaker corresponding to the audio information is a stranger, the information processing module 03 prohibits sending the audio information to the terminal and processes the sound corresponding to the audio information as noise.
In this embodiment, a stranger can be understood as a speaker whose audio information maps to a sound sample that is not stored in the sound sample database. Those skilled in the art will understand that, because the sound sample database stores the sound samples of all participants in the remote conference, when the information recognition module 02 cannot find a sound sample corresponding to the audio information in the sound sample database, it recognizes that the audio information comes from a stranger who is not a participant in this remote conference, and the information processing module 03 then processes the sound corresponding to the audio information as noise.
In this embodiment of the present invention, when the server recognizes that received audio information comes from a stranger, it directly processes the sound corresponding to that audio information as noise, and it automatically masks the voices of speakers with a low speaking priority, thereby reducing noise interference in the remote conference and avoiding mixed sound.
An embodiment of the present invention further provides a second embodiment of the server for implementing ordered speaking in a remote conference. This embodiment differs from the embodiment shown in FIG. 5 in that, before the remote conference formally begins, the server creates the sound sample database from sound samples sent by the terminal.
Based on the description of the embodiment shown in FIG. 5, and as shown in FIG. 6, the server of the present invention for implementing ordered speaking in a remote conference further includes:
a database establishing module 04, configured to receive sound samples sent by the terminal that respectively correspond to participants with different speaking priorities, and to create the sound sample database from the sound samples.
In this embodiment, after the hardware environment for the remote conference has been set up and before the remote conference formally begins, the terminal, in response to a configuration instruction triggered by the user, records a sound sample for each participant in this remote conference under that participant's speaking priority, and sends the recorded sound samples to the server. The database establishing module 04 receives the sound samples sent by the terminal, which respectively correspond to participants with different speaking priorities, and the server creates the sound sample database from them. The priority of each sound sample sent by the terminal is obtained by weighting the identity of the participant. The terminal determines each participant's identity weight according to operation instructions triggered by the user. In general, the higher a participant's identity weight, the higher that participant's speaking priority.
Further, in this embodiment of the present invention, participants may be added at any time while the remote conference is in progress. When the terminal detects a configuration instruction triggered by the user, it responds to the instruction by recording a new sound sample, under a given speaking priority, for the participant newly joining this remote conference, and sends the recorded new sound sample to the server; the recorded new sound sample carries its corresponding speaking priority.
The database establishing module 04 receives the new sound sample, sent by the terminal, corresponding to the participant newly joining the remote conference, and adds it to the sound sample database; the new sound sample received by the database establishing module 04 was already assigned its speaking priority when it was recorded. In a preferred embodiment of the present invention, to reduce the data storage load on the server, the sound samples stored by the database establishing module 04 are valid only for the current remote conference. Once the server receives an operation instruction indicating that the remote conference has ended, the database establishing module 04 deletes the sound sample database corresponding to that remote conference.
The following uses a specific application scenario as an example to describe again how, in the method of the present invention for implementing ordered speaking in a remote conference, the server and the terminal exchange data to build the sound sample database.
The database establishing module 04 sets the speaking priority corresponding to each role in the remote conference according to the user's setting instruction. For example, by default the database establishing module 04 weights participant identities, and the weights from high to low are: leader, host, expert, ordinary participant. The leader identity therefore has the highest speaking priority, the host identity the second, the expert identity the third, and the ordinary participant identity the fourth. Each identity may in turn contain several people, such as leader 1 and leader 2, with leader 1 having a higher speaking priority than leader 2, and so on.
The database establishing module 04 receives and stores the participants' sound samples (for example, voice data) sent by the terminal. The terminal collects the sound samples as follows: the host of the remote conference selects the identity definition function and, for the leader, host, expert and ordinary participant identities in turn, asks each person concerned to greet the participants. While each person speaks, the terminal (for example, through the sound collection device on the terminal's microphone) captures that person's voice and produces a sound sample, from which the database establishing module 04 of the server subsequently builds the sound sample database used for voice comparison and identification. Once the speaking priority of every participant in the remote conference has been defined in this way, the remote audio or video conference can formally begin.
In this embodiment of the present invention, the server creates a sound sample database corresponding to the different speaking priorities, which has the beneficial effect of improving the clarity of sound transmission.
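The life cycle of the sound sample database described above (create before the conference, add participants while it runs, delete when it ends) can be sketched in Python. Everything named here is a hypothetical illustration: the `SoundSampleDatabase` class, its methods, the `ROLE_ORDER` list, and the choice to encode a speaking priority as a `(role rank, position within role)` tuple where a smaller tuple means a higher priority. The samples are placeholder byte strings, not real recordings.

```python
from typing import Dict, Tuple

# Default role order from highest to lowest weight, as described in the text.
ROLE_ORDER = ["leader", "host", "expert", "participant"]


class SoundSampleDatabase:
    """Per-conference store of sound samples and their speaking priorities."""

    def __init__(self) -> None:
        # name -> ((role rank, position within role), raw sample bytes)
        self._samples: Dict[str, Tuple[Tuple[int, int], bytes]] = {}

    def add_sample(self, name: str, role: str, position: int,
                   sample: bytes) -> None:
        """Register a sample; also used when a participant joins mid-conference."""
        self._samples[name] = ((ROLE_ORDER.index(role), position), sample)

    def priority(self, name: str) -> Tuple[int, int]:
        """Return the rank tuple; a smaller tuple means a higher speaking priority."""
        return self._samples[name][0]

    def __len__(self) -> int:
        return len(self._samples)

    def clear(self) -> None:
        """Delete all samples once the conference has ended."""
        self._samples.clear()


db = SoundSampleDatabase()
db.add_sample("leader 1", "leader", 1, b"hello from leader 1")
db.add_sample("leader 2", "leader", 2, b"hello from leader 2")
db.add_sample("host", "host", 1, b"hello from the host")
db.add_sample("late expert", "expert", 1, b"hello, joining late")  # mid-conference
ranked = sorted(["host", "late expert", "leader 2", "leader 1"], key=db.priority)
db_size_before_end = len(db)
db.clear()  # conference ended
```

Using a tuple keeps the two levels of ordering from the text separate: the role decides first, and the position within the role (leader 1 before leader 2) breaks ties.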
It should be noted that, in this document, the terms "comprise", "include" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes that element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or alternatively by hardware, although in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored on a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Industrial Applicability
As described above, the method and server for implementing ordered speaking in a remote conference provided by the embodiments of the present invention have the following beneficial effects: ordered speaking in the remote conference is achieved, and conference efficiency and human-machine interactivity are improved.

Claims

权 利 要 求 书 、 一种远程会议中实现有序发言的方法, 包括以下步骤: 接收终端发送的远程会议发言人对应的音频信息; The method for claiming a book, a method for implementing an ordered speech in a remote conference, comprising the steps of: receiving audio information corresponding to a remote conference speaker sent by the terminal;
查找预先存储的声音样本数据库, 对所述音频信息进行语音识别, 获取所 述音频信息对应的发言人的发言优先级;  Searching a pre-stored sound sample database, performing voice recognition on the audio information, and acquiring a speaker priority level of the speaker corresponding to the audio information;
sending, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.

2. The method of claim 1, wherein the step of sending, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information comprises:
when the audio information corresponds to one speaker, sending the audio information to the terminal as the priority audio information;
when the audio information corresponds to at least two speakers, obtaining the speaking priority corresponding to each of the speakers, and sending the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information.

3. The method of claim 1, wherein, after the step of searching the pre-stored sound sample database, performing voice recognition on the audio information, and obtaining the speaking priority of the speaker corresponding to the audio information, the method further comprises:
when the speaker corresponding to the audio information is a stranger, prohibiting the audio information from being sent to the terminal, and treating the sound mapped by the audio information as noise;
wherein the stranger is a speaker mapped by audio information for which no corresponding sound sample is stored in the sound sample database.

4. The method of any one of claims 1 to 3, wherein, before the step of receiving the audio information corresponding to a remote conference speaker sent by the terminal, the method further comprises:
receiving sound samples, sent by the terminal, respectively corresponding to participants with different speaking priorities, and creating the sound sample database from the sound samples.

5. The method of claim 4, further comprising:
receiving a new sound sample, sent by the terminal, corresponding to a participant newly joining the remote conference, and adding the new sound sample to the sound sample database; wherein the new sound sample carries a corresponding speaking priority.

6. A server for implementing ordered speaking in a remote conference, comprising:
an information receiving module, configured to receive audio information corresponding to a remote conference speaker sent by a terminal;
an information recognition module, configured to search a pre-stored sound sample database, perform voice recognition on the audio information, and obtain the speaking priority of the speaker corresponding to the audio information;
an information processing module, configured to send, according to the speaking priority of the speaker, the audio information corresponding to the speaker with the highest speaking priority to the terminal as priority audio information, so that the terminal plays the received priority audio information.

7. The server of claim 6, wherein the information processing module is further configured to:
when the audio information corresponds to one speaker, send the audio information to the terminal as the priority audio information;
when the audio information corresponds to at least two speakers, obtain the speaking priority corresponding to each of the speakers, and send the audio information corresponding to the speaker with the highest speaking priority to the terminal as the priority audio information.

8. The server of claim 6, wherein the information processing module is further configured to:
when the speaker corresponding to the audio information is a stranger, prohibit the audio information from being sent to the terminal, and treat the sound mapped by the audio information as noise;
wherein the stranger is a speaker mapped by audio information for which no corresponding sound sample is stored in the sound sample database.

9. The server of any one of claims 6 to 8, further comprising:
a database establishing module, configured to receive sound samples, sent by the terminal, respectively corresponding to participants with different speaking priorities, and to create the sound sample database from the sound samples.

10. The server of claim 9, wherein the database establishing module is further configured to:
receive a new sound sample, sent by the terminal, corresponding to a participant newly joining the remote conference, and add the new sound sample to the sound sample database; wherein the new sound sample carries a corresponding speaking priority.
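
The flow recited in the claims (match each incoming audio stream against a stored sound sample database, discard strangers as noise, and forward only the audio of the highest-priority speaker) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names VoiceSampleDatabase and select_priority_audio are hypothetical, a plain string "voiceprint" key stands in for real voice recognition, and a larger integer is assumed to mean a higher speaking priority.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VoiceSample:
    speaker: str
    priority: int  # assumption: larger value = higher speaking priority


class VoiceSampleDatabase:
    """Hypothetical stand-in for the pre-stored sound sample database."""

    def __init__(self) -> None:
        self._samples: Dict[str, VoiceSample] = {}

    def add(self, voiceprint: str, speaker: str, priority: int) -> None:
        # Used both to build the database before the conference and to
        # register a participant who joins later; each sample carries
        # its speaking priority.
        self._samples[voiceprint] = VoiceSample(speaker, priority)

    def lookup(self, voiceprint: str) -> Optional[VoiceSample]:
        # Placeholder for voice recognition: an exact key match.
        return self._samples.get(voiceprint)


def select_priority_audio(
    streams: List[Tuple[str, bytes]], db: VoiceSampleDatabase
) -> Optional[bytes]:
    """Return the audio of the highest-priority known speaker, or None.

    `streams` holds (voiceprint, audio) pairs received at the same time.
    Speakers without a stored sample are strangers; their audio is
    dropped (treated as noise) and never forwarded.
    """
    known = []
    for voiceprint, audio in streams:
        sample = db.lookup(voiceprint)
        if sample is None:
            continue  # stranger: do not forward
        known.append((sample.priority, audio))
    if not known:
        return None
    # A single known speaker is forwarded directly; with two or more,
    # the one with the highest priority wins.
    return max(known, key=lambda item: item[0])[1]
```

With a chairperson registered at priority 3 and a member at priority 1, passing both streams to select_priority_audio returns the chairperson's audio, while a stream from an unregistered voice returns None. Exact-key lookup is used only to keep the sketch self-contained; a real system would replace lookup with a speaker-recognition step.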
PCT/CN2014/083233 2014-05-14 2014-07-29 Method and server for ordered speaking in teleconference WO2015172435A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410204396.1 2014-05-14
CN201410204396.1A CN105100521A (en) 2014-05-14 2014-05-14 Method and server for realizing ordered speech in teleconference

Publications (1)

Publication Number Publication Date
WO2015172435A1 (en) 2015-11-19

Family

ID=54479218

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/083233 WO2015172435A1 (en) 2014-05-14 2014-07-29 Method and server for ordered speaking in teleconference

Country Status (2)

Country Link
CN (1) CN105100521A (en)
WO (1) WO2015172435A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112468762A (en) * 2020-11-03 2021-03-09 视联动力信息技术股份有限公司 Method and device for switching speakers, terminal equipment and storage medium
CN112468760A (en) * 2020-09-29 2021-03-09 南京熊猫电子股份有限公司 Scheduling system and method for video consultation of high-definition mobile video equipment
CN112950424A (en) * 2021-03-04 2021-06-11 深圳市鹰硕技术有限公司 Online education interaction method and device
US11652857B2 (en) * 2020-12-10 2023-05-16 Verizon Patent And Licensing Inc. Computerized system and method for video conferencing priority and allocation using mobile edge computing

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106878230A (en) * 2015-12-10 2017-06-20 中国电信股份有限公司 Audio-frequency processing method, server and system in network telephone conference
CN105706442A (en) * 2016-01-19 2016-06-22 王晓光 Microphone control method and system for video web conference
WO2017124340A1 (en) * 2016-01-20 2017-07-27 王晓光 Figure recognition method and system for network video conference
KR102460105B1 (en) * 2016-05-03 2022-10-27 삼성에스디에스 주식회사 Method for providing conference service and apparatus thereof
CN106375283B (en) * 2016-08-29 2019-10-22 上海倍增智能科技有限公司 A kind of more conferencing datas quickly position and select system
CN106445654B (en) * 2016-08-31 2019-06-11 北京康力优蓝机器人科技有限公司 Determine the method and device of responsing control command priority
CN107580191A (en) * 2017-09-06 2018-01-12 合肥庆响网络科技有限公司 Tele-conferencing system
CN107749313B (en) * 2017-11-23 2019-03-01 郑州大学第一附属医院 A kind of method of automatic transcription and generation Telemedicine Consultation record
CN110099241A (en) * 2018-01-31 2019-08-06 北京视联动力国际信息技术有限公司 A kind of transmission method and device of audio/video flow
CN110324723B (en) * 2018-03-29 2022-03-08 华为技术有限公司 Subtitle generating method and terminal
CN108595645B (en) * 2018-04-26 2020-10-30 深圳市鹰硕技术有限公司 Conference speech management method and device
CN110636243B (en) * 2018-06-22 2022-03-01 中兴通讯股份有限公司 Conference control method and MCU
CN109302576B (en) * 2018-09-05 2020-08-25 视联动力信息技术股份有限公司 Conference processing method and device
CN110266996B (en) * 2019-06-17 2021-05-18 国家电网有限公司 Video conference control method and device and terminal equipment
CN112422879B (en) * 2019-08-20 2022-10-28 华为技术有限公司 Method and device for dynamically adjusting media capability
CN111753769A (en) * 2020-06-29 2020-10-09 歌尔科技有限公司 Terminal audio acquisition control method, electronic equipment and readable storage medium
CN112862461A (en) * 2021-03-03 2021-05-28 游密科技(深圳)有限公司 Conference process control method, device, server and storage medium
CN113596381A (en) * 2021-07-01 2021-11-02 海南视联通信技术有限公司 Audio data acquisition method and device
CN114222031A (en) * 2021-12-21 2022-03-22 瑞德电子(信丰)有限公司 Bidirectional audio data transmission method for network audio socket
CN116939150B (en) * 2023-09-14 2023-11-24 北京橙色风暴数字技术有限公司 Multimedia platform monitoring system and method based on machine vision

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070263821A1 (en) * 2006-03-31 2007-11-15 Shmuel Shaffer Method and apparatus for enhancing speaker selection
CN102036166A (en) * 2009-09-25 2011-04-27 普天信息技术研究院有限公司 Talk right management method in digital trunking communication system
US8290134B2 (en) * 2007-07-26 2012-10-16 International Business Machines Corporation Managing conference calls via a talk queue
US20140003595A1 (en) * 2012-06-29 2014-01-02 International Business Machines Corporation Managing voice collision in multi-party communications

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112468760A (en) * 2020-09-29 2021-03-09 南京熊猫电子股份有限公司 Scheduling system and method for video consultation of high-definition mobile video equipment
CN112468762A (en) * 2020-11-03 2021-03-09 视联动力信息技术股份有限公司 Method and device for switching speakers, terminal equipment and storage medium
CN112468762B (en) * 2020-11-03 2024-04-02 视联动力信息技术股份有限公司 Switching method and device of speaking parties, terminal equipment and storage medium
US11652857B2 (en) * 2020-12-10 2023-05-16 Verizon Patent And Licensing Inc. Computerized system and method for video conferencing priority and allocation using mobile edge computing
CN112950424A (en) * 2021-03-04 2021-06-11 深圳市鹰硕技术有限公司 Online education interaction method and device
CN112950424B (en) * 2021-03-04 2023-12-19 深圳市鹰硕技术有限公司 Online education interaction method and device

Also Published As

Publication number Publication date
CN105100521A (en) 2015-11-25

Similar Documents

Publication Publication Date Title
WO2015172435A1 (en) Method and server for ordered speaking in teleconference
US8606249B1 (en) Methods and systems for enhancing audio quality during teleconferencing
CN105814535B (en) Virtual assistant in calling
US8400489B2 (en) Method of controlling a video conference
JP5137376B2 (en) Two-way telephony trainer and exerciser
US10574827B1 (en) Method and apparatus of processing user data of a multi-speaker conference call
US11502863B2 (en) Automatic correction of erroneous audio setting
US20050271194A1 (en) Conference phone and network client
EP2868072B1 (en) Metric for meeting commencement in a voice conferencing system
US20090132256A1 (en) Command and control of devices and applications by voice using a communication base system
WO2014180371A1 (en) Conference control method and device, and conference system
US20120259924A1 (en) Method and apparatus for providing summary information in a live media session
CN111683183B (en) Multimedia conference non-participant conversation shielding processing method and system thereof
US10236016B1 (en) Peripheral-based selection of audio sources
US20210409882A1 (en) Centrally controlling communication at a venue
EP3049949A1 (en) Call handling
CN111199751B (en) Microphone shielding method and device and electronic equipment
US11924370B2 (en) Method for controlling a real-time conversation and real-time communication and collaboration platform
TW201947924A (en) Incoming call processing method and device, intelligent sound box, and storage medium
US9525979B2 (en) Remote control of separate audio streams with audio authentication
CN113612759A (en) High-performance high-concurrency intelligent broadcasting system based on SIP protocol and implementation method
CN113923395A (en) Method, equipment and storage medium for improving conference quality
CN112543302A (en) Intelligent noise reduction method and equipment in multi-person teleconference
TWM583589U (en) System for processing voice command
JP6392161B2 (en) Audio conference system, audio conference apparatus, method and program thereof

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14891754; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 EP: PCT application non-entry in European phase (Ref document number: 14891754; Country of ref document: EP; Kind code of ref document: A1)