WO2020244317A1 - Method, device and server for implementing conference control - Google Patents

Method, device and server for implementing conference control

Info

Publication number
WO2020244317A1
Authority
WO
WIPO (PCT)
Prior art keywords
conference
application server
chairman
audio
media server
Application number
PCT/CN2020/085482
Inventor
Wei Xuesong (魏学松)
Li Fu (李富)
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Priority to EP20818931.6A (EP3979560A4)
Publication of WO2020244317A1

Classifications

    • H04N7/15 Conference systems
    • G10L15/26 Speech to text systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H04L12/1822 Conducting the conference, e.g. admission, detection, selection or grouping of participants, correlating users to one or more conference sessions, prioritising transmission
    • H04L65/1063 Application servers providing network services
    • H04L65/1093 In-session procedures by adding participants; by removing participants
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences
    • H04L65/765 Media network packet handling intermediate
    • H04L12/1827 Network arrangements for conference optimisation or adaptation
    • H04L51/066 Format adaptation, e.g. format conversion or compression

Definitions

  • This disclosure relates to, but is not limited to, a method and device for implementing conference control, a conference application server, a media server, and a computer-readable storage medium.
  • The RCS service, i.e. Rich Communication Suite (converged communication service), integrates the three main entry points of a mobile phone, "calls", "messages", and "contacts", directly at the operator level, turning them into new calls, new messages, and new contacts.
  • Multi-party calling and multi-party video are important services within RCS.
  • A multi-party call service, especially one implemented in conference mode, generally includes conference control operations, which may be supported before or during the conference, and in particular during the conference.
  • These operations, including adding members, removing (kicking out) members, muting and unmuting members, and recording members, are all performed on the interface of an APP (mobile terminal application) or an H5 (HTML5, the fifth major revision of HyperText Markup Language) page.
  • Conference control operations are generally performed by the mobile phone APP or H5 client sending HTTP (HyperText Transfer Protocol) requests to the server.
  • The embodiments of the present application provide a method and device for implementing conference control, a conference application server, a media server, and a computer-readable storage medium.
  • An embodiment of the present application provides a method for implementing conference control, including: during a multi-party conference, the conference application server receives a conference control request from the chairperson terminal; the conference application server instructs the media server to perform voice recognition on the audio sent by the chairperson terminal; the conference application server obtains text information corresponding to the audio from the media server; and the conference application server performs conference control operations according to the text information.
  • An embodiment of the present application also provides a method for implementing conference control, including: during a multi-party conference, the media server receives an instruction from the conference application server to perform voice recognition on the audio of the chairperson terminal; the media server obtains the audio sent by the chairperson terminal and obtains the corresponding text information through automatic speech recognition; and the media server sends the text information to the conference application server, so that the conference application server performs conference control operations according to the text information.
  • An embodiment of the present application also provides a device for implementing conference control, including: a first receiving module, configured to receive a conference control request from the chairperson terminal during a multi-party conference; an instruction module, configured to instruct the media server to perform voice recognition on the audio sent by the chairperson terminal; a first obtaining module, configured to obtain the text information corresponding to the audio from the media server; and a control module, configured to perform conference control operations according to the text information.
  • An embodiment of the present application also provides a device for implementing conference control, including: a second receiving module, configured to receive, during a multi-party conference, an instruction from the conference application server to perform voice recognition on the audio of the chairperson terminal; a second obtaining module, configured to obtain the audio sent by the chairperson terminal and then obtain the corresponding text information through automatic speech recognition; and a sending module, configured to send the text information to the conference application server, so that the conference application server performs conference control operations according to the text information.
  • An embodiment of the present application also provides a conference application server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the above method for implementing conference control.
  • An embodiment of the present application further provides a media server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the above method for implementing conference control.
  • An embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions, which are used to execute the above method for implementing conference control.
  • FIG. 1 is a flowchart of a method for implementing conference control in an embodiment of the present application;
  • FIG. 2 is a flowchart of a method for implementing conference control in an embodiment of the present application (applied to a conference application server);
  • FIG. 3 is a flowchart of a method for implementing conference control in an embodiment of the present application (applied to a media server);
  • FIG. 4 is a flowchart of a method for implementing conference control in an application example of this application;
  • FIG. 5 is a flowchart of voice operation in an application example of this application;
  • FIG. 6 is a flowchart of voice matching and recognition in an application example of this application;
  • FIG. 7 is a schematic diagram of the composition of a device for implementing conference control in an embodiment of the present application (applied to a conference application server);
  • FIG. 8 is a schematic diagram of the composition of a device for implementing conference control in an embodiment of the present application (applied to a media server);
  • FIG. 9 is a schematic diagram of the composition of a conference application server according to an embodiment of the present application;
  • FIG. 10 is a schematic diagram of the composition of a media server according to an embodiment of the present application.
  • The embodiments of the present application provide a method for implementing conference control in which voice interaction is used in the mobile-terminal conference service to implement conference control functions; the method is particularly suitable for voice conference services.
  • With this method, the conference can be controlled by speaking.
  • It also improves the user experience in situations where conference control operations otherwise cannot be performed, such as VoLTE without a conference control interface, a disconnected service data network, or an interruption of conference control caused by call switching.
  • The method for implementing conference control in an embodiment of the present application includes the following steps.
  • Step 101: During a multi-party conference, the conference application server receives a conference control request from the chairperson terminal.
  • The conference control request is a preset key-press message.
  • The chairperson may press any key to enter voice conference control, or a specific key can be designated for this purpose; for example, pressing the "*" key means entering voice conference control.
  • The conference application server performs a play-and-collect-digits operation on the chairperson terminal.
  • The play-and-collect operation on the chairperson terminal works as follows:
  • The main purpose of the play-and-collect operation is to collect the user's key presses, so the announcement played is a silent segment (an empty tone) that the user cannot hear.
  • The digit-collection length can be set to 1 digit.
  • The timeout of the play-and-collect operation is very long and can be set to one day, so that digit collection stays active throughout the conference.
  • When the chairperson wants to perform voice conference control, he or she presses "*" to notify the conference application server to enter the voice conference control stage, and the system then converts and recognizes the chairperson's speech.
  • Recognizing voice conference control commands only after "*" is pressed is relatively safe for the user. The alternative, recording, forwarding, and recognizing the user's speech throughout the entire conference, amounts to monitoring the whole call and risks leaking the user's privacy. Speech after the key press consists only of conference control instructions, which do not involve the privacy of the user's conversation, so this approach is safer.
  • On receiving the preset key-press message, the conference application server may play a system tone, for example a "beep", indicating that the request has been received and that voice conference control is now collecting the user's speech.
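The key-press trigger and the privacy argument above can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions: the class names `PlayAndCollect` and `VoiceControlGate` are hypothetical, DTMF digits are assumed to arrive as one-character strings, and uplink audio is assumed to arrive as byte frames; it does not model any real media-server API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ONE_DAY_SECONDS = 24 * 60 * 60  # one-day timeout, as suggested in the description


@dataclass
class PlayAndCollect:
    """Hypothetical model of the silent play-and-collect session on the chair's leg."""
    max_digits: int = 1                 # collect a single digit per key press
    timeout_s: int = ONE_DAY_SECONDS    # long enough to last the whole conference
    trigger_key: Optional[str] = "*"    # None models "any key enters voice control"
    prompt: str = "silence"             # empty tone: nothing audible is played
    collected: List[str] = field(default_factory=list)

    def on_digit(self, digit: str) -> bool:
        """Store the digit and report whether it requests voice conference control."""
        self.collected = [digit][: self.max_digits]
        return self.trigger_key is None or digit == self.trigger_key


class VoiceControlGate:
    """Forwards the chair's uplink audio to recognition only after the trigger key."""

    def __init__(self, collector: PlayAndCollect) -> None:
        self.collector = collector
        self.active = False
        self.forwarded: List[bytes] = []

    def on_digit(self, digit: str) -> bool:
        """Feed a collected digit; return whether voice control is now active."""
        if self.collector.on_digit(digit):
            self.active = True
        return self.active

    def on_uplink_audio(self, frame: bytes) -> None:
        # Ordinary conversation is never forwarded (the privacy argument above).
        if self.active:
            self.forwarded.append(frame)

    def finish(self) -> None:
        """Leave voice control once the command has been handled."""
        self.active = False
```

Passing `trigger_key=None` models the "press any key" variant; the one-day timeout is simply 86400 seconds.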
  • Step 102: The conference application server instructs the media server to perform voice recognition on the audio sent by the chairperson terminal.
  • The conference application server can send an INFO message to the media server instructing it to perform voice recognition on the chairperson terminal.
  • Option 1: Perform voice recognition directly on the chairperson terminal.
  • Option 2: Isolate the chairperson terminal from the original conference site so that other members cannot hear the chairperson's voice conference control operations. Two methods are possible.
  • Method 1: Isolate the chairperson's voice, which amounts to logical isolation. First mute the chairperson terminal so that its uplink voice is not mixed into the conference and other members cannot hear it, then perform voice recognition on the chairperson terminal. This implements voice recognition for the chairperson without affecting the original conference site.
  • The conference application server instructs the media server to mute the chairperson terminal.
  • The media server mutes the chairperson terminal according to the instruction from the conference application server.
  • Method 2: Create a new sub-site, transfer the chairperson terminal from the original site to the sub-site, and perform voice recognition on the sub-site's audio. Any prompt playback can simply be done in the sub-site, whose only member is the chairperson. The two sites are thus physically separated, and the chairperson's voice conference control and the original conference do not interfere with each other.
  • The conference application server establishes a sub-site and transfers the chairperson terminal from the main site of the multi-party conference to the sub-site.
  • The conference application server instructs the media server to perform voice recognition on the audio of the sub-site.
  • The chairperson can then start voice conference control by speaking an instruction such as "Mute number 13770335897".
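The difference between the two isolation methods can be illustrated with a toy mixing model. In this hypothetical Python sketch, a `Site` mixes the uplink of every unmuted member into every other member's downlink; Method 1 mutes the chairperson inside the main site (recognition then runs on the chairperson's own leg), while Method 2 moves the chairperson into a one-member sub-site (recognition then runs on the sub-site's audio). Real media servers mix RTP streams, which this sketch does not attempt to model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Site:
    """Toy conference site: each listener hears every other unmuted member."""
    name: str
    members: set[str] = field(default_factory=set)
    muted: set[str] = field(default_factory=set)

    def heard_by(self, listener: str) -> set[str]:
        """Members whose uplink audio is mixed into `listener`'s downlink."""
        if listener not in self.members:
            return set()
        return {m for m in self.members if m != listener and m not in self.muted}


def isolate_by_mute(site: Site, chair: str) -> None:
    """Method 1 (logical isolation): keep the chair in the site but mute its uplink."""
    site.muted.add(chair)


def isolate_by_sub_site(main: Site, chair: str) -> Site:
    """Method 2 (physical isolation): move the chair into a one-member sub-site."""
    main.members.discard(chair)
    main.muted.discard(chair)
    return Site(name=f"{main.name}-sub", members={chair})
```

With Method 1 the chairperson still hears the conference while nobody hears the chairperson; with Method 2 the chairperson is cut off from the main mix entirely, so prompts can be played into the sub-site without reaching other members.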
  • Step 103: The media server obtains the audio sent by the chairperson terminal and obtains the corresponding text information through automatic speech recognition.
  • The media server obtains the audio sent by the chairperson terminal and forwards it to an automatic speech recognition device, from which it obtains the text information corresponding to the audio.
  • For example, the media server can forward the chairperson terminal's audio to the speech recognition interface of a public cloud, such as iFLYTEK's speech recognition system, to perform automatic speech recognition.
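Step 103 amounts to buffering the chairperson's uplink audio and handing it to a recognizer. The following Python sketch hides the ASR engine behind a hypothetical `SpeechRecognizer` protocol; it deliberately does not reproduce iFLYTEK's or any other vendor's real API, and `FakeRecognizer` exists only so the sketch can be exercised without a network service.

```python
from typing import Iterable, Protocol


class SpeechRecognizer(Protocol):
    """Hypothetical ASR interface; a real engine would wrap a vendor SDK or HTTP API."""

    def recognize(self, audio: bytes) -> str: ...


def transcribe_uplink(frames: Iterable[bytes], asr: SpeechRecognizer) -> str:
    """Media-server side of Step 103: join the chair's uplink frames and transcribe them."""
    audio = b"".join(frames)
    if not audio:                 # nothing was said after the prompt tone
        return ""
    return asr.recognize(audio).strip()


class FakeRecognizer:
    """Test double that treats the audio bytes as UTF-8 text."""

    def recognize(self, audio: bytes) -> str:
        return audio.decode("utf-8")
```

Keeping the recognizer behind an interface lets the media server switch between a public-cloud engine and a private one without changing the forwarding logic.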
  • Step 104: The media server sends the text information to the conference application server.
  • The media server may return the text information to the conference application server in an INFO event response.
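Steps 102 and 104 carry instructions and results in INFO messages, and the application example acknowledges them with 200 OK, which points to SIP signalling. A minimal sketch of building such a SIP INFO request follows; the JSON body, its `event`/`text` fields, the `application/json` content type, and the `example.invalid` URIs are all hypothetical, since the source does not specify the message body format.

```python
import json
from typing import Any, Dict


def build_info(sender: str, receiver: str, call_id: str, cseq: int,
               payload: Dict[str, Any]) -> str:
    """Build a minimal SIP INFO request whose body is a (hypothetical) JSON payload."""
    body = json.dumps(payload, ensure_ascii=False)
    lines = [
        f"INFO {receiver} SIP/2.0",
        "Via: SIP/2.0/UDP host.example.invalid;branch=z9hG4bK-sketch",
        "Max-Forwards: 70",
        f"From: <{sender}>;tag=sketch-from",
        f"To: <{receiver}>;tag=sketch-to",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INFO",
        "Content-Type: application/json",
        # Content-Length counts bytes, not characters, so encode first.
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(lines) + "\r\n\r\n" + body
```

In a real deployment the Via branch, tags, and CSeq would come from the SIP stack that owns the dialog between the two servers.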
  • Step 105: The conference application server performs conference control operations according to the text information.
  • The conference application server performs text matching and recognition on the text information and, when matching succeeds, performs a conference control operation on the members of the multi-party conference according to the recognized content.
  • For example, the application server can perform keyword recognition on text such as "Please join number 12313981988" or "Add number 12369856986 to the meeting".
  • The recognition keywords are as follows. Keywords for adding a member: join / pull in / invite, followed by the number (xxxxxx), with optional words such as "conference" and "enter". Other conference control operations, including kicking out members, muting members, unmuting, and recording, each have corresponding semantic matching rules. After a match, the conference control operation is performed. If the operation succeeds, a voice prompt such as "The request was successful" can be played; if recognition fails, "Unrecognized" can be played; and if the conference control operation fails, an announcement such as "Request failed" can be played.
  • Voice recognition can tolerate a certain degree of error.
  • Once the recognized keywords match those of a particular instruction, the process enters the conference control execution stage and performs the corresponding operation on the member.
  • The operation result tone is played to the chairperson, and the system waits for the chairperson to confirm. If the chairperson does not confirm, the request is processed according to the system default: either the chairperson's operation request is executed, or the request fails and the chairperson returns to the original conference site.
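The keyword matching in Step 105 can be sketched with regular expressions. This Python sketch rests on assumptions: the English keyword table is hypothetical (the source names only join / pull in / invite for adding members), any run of five or more digits is treated as a phone number, and the error tolerance mentioned above (fuzzy matching of misrecognized words) is not modelled. The announcement strings are the ones given in the description.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical keyword table; word boundaries keep "mute" from matching inside "unmute".
KEYWORDS = {
    "add": ("join", "pull in", "invite", "add"),
    "kick": ("kick", "remove"),
    "unmute": ("unmute",),
    "mute": ("mute",),
    "record": ("record",),
}
PHONE_NUMBER = re.compile(r"\d{5,}")  # assumption: 5+ consecutive digits


@dataclass(frozen=True)
class Command:
    action: str
    number: str


def match_command(text: str) -> Optional[Command]:
    """Return the recognized command, or None if the text is not understood."""
    numbers = PHONE_NUMBER.findall(text)
    if len(numbers) != 1:          # require exactly one target number
        return None
    lowered = text.lower()
    for action, words in KEYWORDS.items():
        if any(re.search(rf"\b{re.escape(w)}\b", lowered) for w in words):
            return Command(action, numbers[0])
    return None


def handle(text: str, execute: Callable[[Command], bool]) -> str:
    """Match the text, run the operation, and pick the announcement to play."""
    command = match_command(text)
    if command is None:
        return "Unrecognized"
    return "The request was successful" if execute(command) else "Request failed"
```

`execute` stands in for whatever actually mutes, kicks, or invites the member; keeping it as a parameter separates recognition from the conference control back end.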
  • IVR: Interactive Voice Response.
  • The conference application server may perform the text matching and recognition based on the chairperson's address book, acquired in advance.
  • Controlling members by telephone number may be cumbersome for the chairperson.
  • The chairperson terminal can therefore import the user's address book into the conference application server in advance, so that voice conference control can be performed directly by the names in the address book, which is more convenient than operating by number. In this case, during final text recognition, the chairperson's personal address book is searched and the corresponding number is extracted for the operation.
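Name-based control can be sketched as a lookup in the chairperson's imported address book before falling back to a spoken number. In this hypothetical Python sketch the address book is a plain name-to-number dictionary and names are matched as case-insensitive substrings, longest first; a production system would need proper tokenization and handling of homophones, which the source does not describe. The numbers in the usage line are placeholders.

```python
import re
from typing import Dict, Optional

PHONE_NUMBER = re.compile(r"\d{5,}")  # assumption: 5+ consecutive digits


def resolve_target(text: str, address_book: Dict[str, str]) -> Optional[str]:
    """Return the number of the contact named in `text`, else a spoken number."""
    lowered = text.lower()
    # Try longer names first so that "Li Fu" wins over a shorter "Li".
    for name in sorted(address_book, key=len, reverse=True):
        if name.lower() in lowered:
            return address_book[name]
    match = PHONE_NUMBER.search(text)
    return match.group(0) if match else None


book = {"Li": "10000000001", "Li Fu": "10000000002", "Wang": "10000000003"}
print(resolve_target("Mute Li Fu", book))  # prints 10000000002
```

Plain substring matching can misfire on short names embedded in other words, which is why a real implementation would tokenize the recognized text first.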
  • Implementing conference control operations by voice avoids situations where conference control cannot be performed, such as VoLTE without a conference control interface, a disconnected service data network, or an interruption of conference control caused by call switching. This improves the user experience and makes multi-party conferences more intelligent.
  • The method for implementing conference control in an embodiment of the present application, applied to a conference application server, includes the following steps.
  • Step 201: During a multi-party conference, the conference application server receives a conference control request from the chairperson terminal.
  • The conference control request is a preset key-press message.
  • The chairperson may press any key to enter voice conference control, or a specific key can be designated for this purpose; for example, pressing the "*" key means entering voice conference control.
  • The conference application server performs a play-and-collect-digits operation on the chairperson terminal.
  • The play-and-collect operation on the chairperson terminal works as follows:
  • The main purpose of the play-and-collect operation is to collect the user's key presses, so the announcement played is a silent segment (an empty tone) that the user cannot hear.
  • The digit-collection length can be set to 1 digit.
  • The timeout of the play-and-collect operation is very long and can be set to one day, so that digit collection stays active throughout the conference.
  • When the chairperson wants to perform voice conference control, he or she presses "*" to notify the conference application server to enter the voice conference control stage, and the system then converts and recognizes the chairperson's speech.
  • Recognizing voice conference control commands only after "*" is pressed is relatively safe for the user. The alternative, recording, forwarding, and recognizing the user's speech throughout the entire conference, amounts to monitoring the whole call and risks leaking the user's privacy. Speech after the key press consists only of conference control instructions, which do not involve the privacy of the user's conversation, so this approach is safer.
  • On receiving the preset key-press message, the conference application server may play a system tone, for example a "beep", indicating that the request has been received and that voice conference control is now collecting the user's speech.
  • Step 202: The conference application server instructs the media server to perform voice recognition on the audio sent by the chairperson terminal.
  • The conference application server can send an INFO message to the media server instructing it to perform voice recognition on the chairperson terminal.
  • Option 1: Perform voice recognition directly on the chairperson terminal.
  • Option 2: Isolate the chairperson terminal from the original conference site so that other members cannot hear the chairperson's voice conference control operations. Two methods are possible.
  • Method 1: Isolate the chairperson's voice, which amounts to logical isolation. First mute the chairperson terminal so that its uplink voice is not mixed into the conference and other members cannot hear it, then perform voice recognition on the chairperson terminal. This implements voice recognition for the chairperson without affecting the original conference site.
  • The conference application server instructs the media server to mute the chairperson terminal.
  • Method 2: Create a new sub-site, transfer the chairperson terminal from the original site to the sub-site, and perform voice recognition on the sub-site's audio. Any prompt playback can simply be done in the sub-site, whose only member is the chairperson. The two sites are thus physically separated, and the chairperson's voice conference control and the original conference do not interfere with each other.
  • The conference application server establishes a sub-site and transfers the chairperson terminal from the main site of the multi-party conference to the sub-site.
  • The conference application server instructs the media server to perform voice recognition on the audio of the sub-site.
  • The chairperson can then start voice conference control by speaking an instruction such as "Mute number 13770335897".
  • Step 203: The conference application server obtains the text information corresponding to the audio from the media server.
  • The conference application server receives an INFO event response from the media server carrying the text information.
  • Step 204: The conference application server performs conference control operations according to the text information.
  • The conference application server performs text matching and recognition on the text information and, when matching succeeds, performs a conference control operation on the members of the multi-party conference according to the recognized content.
  • For example, the application server can perform keyword recognition on text such as "Please join number 12313981988" or "Add number 12369856986 to the meeting".
  • The recognition keywords are as follows. Keywords for adding a member: join / pull in / invite, followed by the number (xxxxxx), with optional words such as "conference" and "enter". Other conference control operations, including kicking out members, muting members, unmuting, and recording, each have corresponding semantic matching rules. After a match, the conference control operation is performed. If the operation succeeds, a voice prompt such as "The request was successful" can be played; if recognition fails, "Unrecognized" can be played; and if the conference control operation fails, an announcement such as "Request failed" can be played.
  • Voice recognition can tolerate a certain degree of error.
  • Once the recognized keywords match those of a particular instruction, the process enters the conference control execution stage and performs the corresponding operation on the member.
  • The operation result tone is played to the chairperson, and the system waits for the chairperson to confirm. If the chairperson does not confirm, the request is processed according to the system default: either the chairperson's operation request is executed, or the request fails and the chairperson returns to the original conference site.
  • IVR: Interactive Voice Response.
  • The conference application server may perform the text matching and recognition based on the chairperson's address book, acquired in advance.
  • Controlling members by telephone number may be cumbersome for the chairperson.
  • The chairperson terminal can therefore import the user's address book into the conference application server in advance, so that voice conference control can be performed directly by the names in the address book, which is more convenient than operating by number. In this case, during final text recognition, the chairperson's personal address book is searched and the corresponding number is extracted for the operation.
  • The method for implementing conference control in an embodiment of the present application, applied to a media server, includes the following steps.
  • Step 301: During a multi-party conference, the media server receives an instruction from the conference application server to perform voice recognition on the audio of the chairperson terminal.
  • The media server may learn of the instruction from the INFO message sent by the conference application server.
  • Option 1: Perform voice recognition directly on the chairperson terminal.
  • Option 2: Isolate the chairperson terminal from the original conference site so that other members cannot hear the chairperson's voice conference control operations. Two methods are possible.
  • Method 1: Isolate the chairperson's voice, which amounts to logical isolation. First mute the chairperson terminal so that its uplink voice is not mixed into the conference and other members cannot hear it, then perform voice recognition on the chairperson terminal. This implements voice recognition for the chairperson without affecting the original conference site.
  • The media server mutes the chairperson terminal according to the instruction from the conference application server.
  • Method 2: Create a new sub-site, transfer the chairperson terminal from the original site to the sub-site, and perform voice recognition on the sub-site's audio. Any prompt playback can simply be done in the sub-site, whose only member is the chairperson. The two sites are thus physically separated, and the chairperson's voice conference control and the original conference do not interfere with each other.
  • The media server receives an instruction from the conference application server to perform voice recognition on the audio of a sub-site of the multi-party conference, where the chairperson terminal is located in that sub-site.
  • The chairperson can then start voice conference control by speaking an instruction such as "Mute number 13770335897".
  • Step 302: The media server obtains the audio sent by the chairperson terminal and obtains the corresponding text information through automatic speech recognition.
  • The media server obtains the audio sent by the chairperson terminal and forwards it to an automatic speech recognition device, from which it obtains the text information corresponding to the audio.
  • For example, the media server can forward the chairperson terminal's audio to the speech recognition interface of a public cloud, such as iFLYTEK's speech recognition system, to perform automatic speech recognition.
  • Step 303: The media server sends the text information to the conference application server, so that the conference application server performs conference control operations according to the text information.
  • The media server can return the text information to the conference application server in an INFO event response.
  • The embodiments of this application are mainly based on the conference service under the RCS service architecture. The following takes the voice conference service in the CS (Circuit Switched) domain as an example (VoLTE multi-party voice and video are also applicable), as shown in FIG. 4 and FIG. 5:
  • Step 401: The UE (User Equipment) receives the user's input through the mobile phone APP or H5 interface to initiate a conference call, and sends an HTTP conference creation request to the conference application server listing the members selected to participate. The conference application server receives the request and, once authentication passes, starts creating the conference.
  • The conference is set up in callback mode: the chairperson is called first, then each member. Once the chairperson's call is connected, if the chairperson has not subscribed to the VoLTE service, the chairperson's data connection is dropped, and the chairperson can no longer perform conference control operations through the APP or H5 interface on the phone.
  • Step 402: After the conference is created, the conference application server performs a P&C (play-and-collect digits) operation on the chairperson terminal. The main purpose is digit collection; the announcement played is an empty tone (the announcement number exists, but its audio file is a short silent clip), which the user neither hears nor perceives.
  • when the chairman presses a key, the digit is collected by the system.
  • if the chairman chooses to control the conference by voice, the chairman first presses the "*" key (this can also be defined as any key). The conference system receives the "*" key press and, having recognized it, starts the chairman speech recognition and conference control process.
  • Step 403 The conference application server sends an info message to the media server, instructing the media server to perform a mute operation on the chairman end.
  • Step 404 The media server returns a 200 OK confirmation message to the conference application server.
  • Step 405 The conference application server sends an info message to the media server, instructing the media server to perform voice recognition on the chairperson.
  • Step 406 The media server returns a 200 OK confirmation message to the conference application server.
  • Step 407 Play the voice conference control entry prompt to the chairman side: a "dong" chime indicating that the voice conference control process has been entered and the system is waiting for voice input. After the prompt is played, the speech recognition stage for the chairman side begins.
  • on entering the speech recognition stage, the system mainly records and recognizes the chairman terminal's uplink voice.
  • the first solution is simpler, but the experience is slightly worse.
  • the second and third solutions have much better experience.
  • in Solutions 2 and 3, the controlling terminal is isolated while it is recorded and recognized. The conference control speech is therefore not added to the original conference and does not disturb the original conference site.
  • Solution 1 Record and recognize the uplink voice of the chairman directly, and forward the uplink voice of the chairman to the automatic speech recognition system for recognition and conversion.
  • the disadvantage of this solution is that other members can hear the chairman's spoken conference control commands, which disturbs the original meeting and offers poor privacy.
  • Solution 2 The conference application server first sends the media server a conference control operation to mute the chairman side, so that the chairman's voice is not mixed into the conference bridge and is not heard by other members. It then instructs the media server to perform speech recognition on this terminal, so other member terminals do not hear the chairman's conference control operations. This approach is relatively convenient.
  • the scheme adopted in Figure 4 is scheme 2.
  • Solution 3 Use the two-conference technique and establish a sub-conference. After the sub-conference is established, first remove the chairman from the original conference, then add the chairman to the sub-conference, so that the sub-conference contains only the chairman. Recording and speech recognition are performed on the whole sub-conference. After the conference control operation is completed, remove the chairman from the sub-conference and rejoin the original main conference. In this way, the chairman's conference control operations are isolated from the original conference, which is not affected.
  • Step 408 The chairman sends the voice media stream to the media server.
  • the media server, after receiving the conference application server's request to recognize the chairman's uplink voice, processes the audio input of this channel, starts recording, and streams the recorded audio file to the speech recognition system over HTTP.
  • Step 410 The voice recognition system converts the voice into text, and after the recognition is completed, returns the result to the media server.
  • Step 411 The media server returns a status report to the conference application server, and the returned report includes the result of voice recognition and the recognized voice and text.
  • Step 412 After receiving the voice processing status report, the conference application server starts to recognize the text therein, plays the notification sound about to be operated after the recognition is completed, and executes the corresponding conference control operation.
  • matching and recognizing characters may include:
  • Step 501 Segment the text into characters and words in context.
  • Step 502 Identify the process keyword. Only conference control keywords need to be identified. For example, for the add-member process the identified keywords include add, join, bring in, add on, pull in, call in, and call up, which determine which conference control process is requested.
  • Step 503 Identify the phone numbers to operate on, for example in the add-member process.
  • the identified number is the number to be added. Several numbers can be added at once, separated by spaces or other delimiters. Other conference control processes are similar.
  • Step 504 After the conference control process and key information are identified, voice broadcast and conference control processing are performed.
  • the chairman terminal is then released from isolation and recording is cancelled. It rejoins the main conference site, is unmuted, has speech recognition recording cancelled, and returns to the conference flow.
  • the embodiment of the present application provides a mobile terminal multi-party call service that uses voice for conference control operations, including voice conference services in the CS domain, and multi-party video and voice services in the VoLTE service.
  • This embodiment is especially suitable for scenarios where the user has no conference control interface, or has one but finds it inconvenient to use.
  • the natural speech of the chairman terminal is used for conference control, implementing operations such as adding members, removing members, and muting.
  • the embodiment can improve the user's operating experience, make voice calls more intelligent, and largely avoid situations in which the conference cannot be controlled through an interface.
  • an embodiment of the present application also provides a device for implementing conference control, which is applied to a conference application server, and includes:
  • the first receiving module 601 is configured to receive a conference control request from the chairman during a multi-party conference;
  • the instruction module 602 is used to instruct the media server to perform voice recognition on the audio sent by the chairman terminal;
  • the first obtaining module 603 is configured to obtain text information corresponding to the audio from the media server;
  • the control module 604 is configured to perform conference control operations according to the text information.
  • an embodiment of the present application also provides a device for implementing conference control, which is applied to a media server, and includes:
  • the second receiving module 701 is configured to receive an instruction from the conference application server to perform voice recognition on the audio of the chairperson during a multi-party conference;
  • the second acquisition module 702 is configured to acquire the audio sent by the chairman terminal, and acquire text information corresponding to the audio of the chairman terminal through automatic speech recognition;
  • the sending module 703 is configured to send the text information to the conference application server, so that the conference application server performs a conference control operation according to the text information.
  • an embodiment of the present application further provides a conference application server, including: a memory 801, a processor 802, and a computer program 803 stored in the memory 801 and running on the processor 802.
  • the processor 802 implements the conference control implementation method when executing the program.
  • an embodiment of the present application also provides a media server, including: a memory 901, a processor 902, and a computer program 903 stored in the memory 901 and executable on the processor 902, where the processor 902 implements the conference control implementation method when executing the program.
  • An embodiment of the present application also provides a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are used to execute the method for implementing the conference control.
  • the foregoing storage medium may include, but is not limited to: U disk, Read-Only Memory (ROM), Random Access Memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk, etc.
  • Such software may be distributed on a computer-readable medium
  • the computer-readable medium may include a computer storage medium (or non-transitory medium) and a communication medium (or transitory medium).
  • the term computer storage medium includes volatile and non-volatile memory implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
  • Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassette, tape, magnetic disk storage or other magnetic storage device, or any other medium used to store desired information and that can be accessed by a computer.
  • communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as carrier waves or other transmission mechanisms, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

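A conference control implementation method and apparatus, a conference application server, a media server, and a computer-readable storage medium. The method includes the following. During a multi-party conference, a conference application server receives a conference control request from a chairman terminal. The conference application server instructs a media server to perform speech recognition on audio sent by the chairman terminal. The conference application server obtains, from the media server, text information corresponding to the audio. The conference application server then performs a conference control operation according to the text information.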

Description

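Conference Control Implementation Method, Apparatus and Server
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on, and claims priority to, Chinese patent application No. 201910479990.4 filed on June 4, 2019, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This document relates to, but is not limited to, a conference control implementation method and apparatus, a conference application server, a media server, and a computer-readable storage medium.
BACKGROUND
The RCS service (Rich Communication Suite, a converged communication service) integrates the three main entry points on a mobile phone, "calls", "messages", and "contacts", directly with the phone at the operator level, turning them into new calling, new messaging, and new contacts. Multi-party calling and multi-party video are important RCS services.
Multi-party calling services, especially those implemented as conferences, generally include conference control operations, which are supported both before and during a conference. In-conference operations include adding members, removing (kicking) members, muting members, unmuting members, and recording members. They are performed on an APP (mobile application) or H5 (HTML5, the fifth revision of the HyperText Markup Language) interface. Typically, the APP or H5 client on the phone sends an HTTP (HyperText Transfer Protocol) request to the server to perform the operation.
For non-VoLTE (Voice over LTE) mobile users, however, such conference control faces a problem. When the user is not connected to WiFi (wireless fidelity, wireless LAN technology based on the IEEE 802.11 standard), the data connection drops once the call is connected, so conference control operations cannot be performed. In addition, VoLTE multi-party voice and video services provide no APP user interface for conference control. Moreover, even if the user has a data connection, performing conference control on a graphical interface during a call requires switching from the voice call to the conference control interface. This makes the experience disjointed.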
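SUMMARY
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
Embodiments of the present application provide a conference control implementation method and apparatus, a conference application server, a media server, and a computer-readable storage medium.
An embodiment of the present application provides a conference control implementation method, including the following. During a multi-party conference, a conference application server receives a conference control request from a chairman terminal. The conference application server instructs a media server to perform speech recognition on audio sent by the chairman terminal. The conference application server obtains, from the media server, text information corresponding to the audio. The conference application server then performs a conference control operation according to the text information.
An embodiment of the present application further provides a conference control implementation method, including the following. During a multi-party conference, a media server receives an instruction from a conference application server to perform speech recognition on audio of a chairman terminal. The media server obtains the audio sent by the chairman terminal and obtains, through automatic speech recognition, text information corresponding to that audio. The media server sends the text information to the conference application server, so that the conference application server performs a conference control operation according to the text information.
An embodiment of the present application further provides a conference control implementation apparatus, including:
- a first receiving module, configured to receive, during a multi-party conference, a conference control request from a chairman terminal;
- an instruction module, configured to instruct a media server to perform speech recognition on audio sent by the chairman terminal;
- a first obtaining module, configured to obtain, from the media server, text information corresponding to the audio; and
- a control module, configured to perform a conference control operation according to the text information.
An embodiment of the present application further provides a conference control implementation apparatus, including:
- a second receiving module, configured to receive, during a multi-party conference, an instruction from a conference application server to perform speech recognition on audio of a chairman terminal;
- a second obtaining module, configured to obtain the audio sent by the chairman terminal and obtain, through automatic speech recognition, text information corresponding to that audio; and
- a sending module, configured to send the text information to the conference application server, so that the conference application server performs a conference control operation according to the text information.
An embodiment of the present application further provides a conference application server, including a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor implements the conference control implementation method when executing the program.
An embodiment of the present application further provides a media server, including a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor implements the conference control implementation method when executing the program.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the conference control implementation method.
Other aspects will be apparent upon reading and understanding the drawings and the detailed description.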
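BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flowchart of a conference control implementation method according to an embodiment of the present application;
FIG. 2 is a flowchart of a conference control implementation method according to an embodiment of the present application (applied to a conference application server);
FIG. 3 is a flowchart of a conference control implementation method according to an embodiment of the present application (applied to a media server);
FIG. 4 is a flowchart of a conference control implementation method according to an application example of the present application;
FIG. 5 is a flowchart of voice operation according to an application example of the present application;
FIG. 6 is a flowchart of voice matching and recognition according to an application example of the present application;
FIG. 7 is a schematic diagram of the composition of a conference control implementation apparatus according to an embodiment of the present application (applied to a conference application server);
FIG. 8 is a schematic diagram of the composition of a conference control implementation apparatus according to an embodiment of the present application (applied to a media server);
FIG. 9 is a schematic diagram of the composition of a conference application server according to an embodiment of the present application;
FIG. 10 is a schematic diagram of the composition of a media server according to an embodiment of the present application.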
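DETAILED DESCRIPTION
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
An embodiment of the present application provides a conference control implementation method that uses voice interaction to implement conference control in mobile conference services. It is particularly suitable for voice conference services, where conference control can be performed simply by speaking. This avoids three situations: VoLTE has no conference control interface, conference control is impossible because the service data network is disconnected, or switching to the conference control interface interrupts the call. It also improves the user experience.
As shown in FIG. 1, the conference control implementation method of an embodiment of the present application includes:
Step 101: during a multi-party conference, the conference application server receives a conference control request from the chairman terminal.
In a multi-party conference, only the chairman terminal can perform conference control operations.
In an embodiment, the conference control request is a preset key-press message.
For example, pressing any key on the chairman terminal during the multi-party conference may be specified to enter voice conference control. Alternatively, a designated key such as "*" may be required.
To receive this key-press message, in an embodiment, the conference application server performs a play-and-collect operation on the chairman terminal before step 101.
In an embodiment, the conference application server performing the play-and-collect operation on the chairman terminal includes:
playing a blank tone to the chairman terminal and recognizing the preset key-press message.
In this embodiment, the play-and-collect operation may be performed on the chairman terminal once the conference has been created successfully. Its main function is to collect the user's key presses, so the announcement played is a blank tone that the user cannot hear. The number of digits to collect may be set to 1. The timeout of the play-and-collect operation is very long, for example one day, so that digit collection continues throughout the conference. If the chairman wants to use voice conference control while speaking, the chairman presses "*". This notifies the conference application server to enter the voice conference control stage, and the system then converts and recognizes the speech. Recognizing voice conference control instructions only after "*" is pressed is relatively safe for the user. The alternative is recording, forwarding, and recognizing the user's speech throughout the conference. That would amount to monitoring the entire call and would risk leaking the user's privacy. By contrast, what is spoken after the key press consists only of conference control instructions, which do not involve the content of the user's conversation, so this is much safer.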
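The play-and-collect setup above can be sketched as a configuration record plus a trigger check. This is a minimal sketch under stated assumptions. `PlayCollectConfig`, its field names and `on_digit` are hypothetical and do not correspond to any particular media server API. An empty `trigger_keys` set models the "any key" variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayCollectConfig:
    """Hypothetical P&C parameters mirroring the text (not a real media server API)."""
    announcement: str = "blank_tone"            # silent clip the user cannot hear
    max_digits: int = 1                         # collect a single key press
    timeout_seconds: int = 24 * 60 * 60         # about one day: spans the whole conference
    trigger_keys: frozenset = frozenset({"*"})  # empty set means "any key"


def on_digit(config: PlayCollectConfig, digit: str, in_voice_control: bool) -> bool:
    """Return True if this key press should start voice conference control.

    Speech is only recorded and recognised after this returns True, so the
    rest of the conversation is never sent to speech recognition.
    """
    if in_voice_control:
        return False  # a spoken command is already being collected
    if not config.trigger_keys:
        return True   # "any key" variant
    return digit in config.trigger_keys
```

Gating recognition on the key press is what keeps ordinary conversation out of the speech recognition path, which is the privacy argument made in the text.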
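In an embodiment, upon receiving the preset key-press message, the conference application server may play a system tone, for example a "dong" chime. This indicates that the request has been received and the system is entering the stage of collecting the user's voice for conference control.
Step 102: the conference application server instructs the media server to perform speech recognition on the audio sent by the chairman terminal.
The conference application server may send an info message to the media server instructing it to perform speech recognition on the chairman terminal.
There are two schemes for performing speech recognition on the chairman terminal:
Scheme 1: perform speech recognition on the chairman terminal directly.
Performing speech recognition directly on the chairman terminal is technically simpler. However, because the chairman is still in the conference site, the other members hear the spoken conference control commands, which gives a poor user experience.
Scheme 2: isolate the chairman from the original conference site so that other members cannot hear the chairman's voice conference control operations.
This scheme protects the privacy of the chairman's operations and does not interfere with the original conference.
Scheme 2 can be implemented in two ways:
Mode 1: isolate the chairman's voice, which amounts to logical isolation. The chairman terminal is first muted so that the chairman's uplink audio is no longer mixed into the conference site and the other members cannot hear it. Speech recognition is then performed on the chairman. This achieves speech recognition for the chairman without affecting the original conference site.
In this mode, before step 102, the conference application server instructs the media server to mute the chairman terminal, and the media server mutes the chairman terminal as instructed.
Mode 2: establish a new sub-conference site, transfer the chairman from the original conference site to it, and perform speech recognition on the audio of the sub-conference site. Any announcements are also played in the sub-conference site. Because the chairman terminal is the only member of that site, the two sites are physically isolated, and the chairman's voice conference control and the original conference do not interfere with each other.
In this mode, before step 102, the conference application server establishes a sub-conference site and transfers the chairman terminal from the main conference site of the multi-party conference to the sub-conference site. In step 102, the conference application server instructs the media server to perform speech recognition on the audio of the sub-conference site.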
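The three isolation options above differ mainly in which commands the conference application server issues, and in what order, before recognition starts. The sketch below is illustrative only. The command strings (`mute`, `asr`, `create`, `leave`, `join`) and the `-sub` naming of the sub-conference site are hypothetical, since the source does not specify the message contents.

```python
from enum import Enum


class Isolation(Enum):
    NONE = 1      # scheme 1: recognise directly; other members hear the command
    MUTE = 2      # scheme 2, mode 1: logical isolation by muting
    SUB_SITE = 3  # scheme 2, mode 2: physical isolation in a sub-conference site


def isolation_commands(mode: Isolation, chair: str, main: str) -> list[str]:
    """Ordered, hypothetical commands issued before recognition starts."""
    if mode is Isolation.NONE:
        return [f"asr {chair}"]
    if mode is Isolation.MUTE:
        # Mute first so the spoken command never reaches the conference mix.
        return [f"mute {chair}", f"asr {chair}"]
    sub = f"{main}-sub"  # hypothetical sub-conference-site identifier
    return [
        f"create {sub}",
        f"leave {main} {chair}",
        f"join {sub} {chair}",
        f"asr {sub}",  # recognise the audio of the whole sub-site
    ]
```

In mode 2 the recognition target is the sub-site rather than the chairman leg, matching the text's statement that the media server recognizes the audio of the sub-conference site.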
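In this step, the chairman can begin voice conference control, for example by saying a command such as "Mute the number 13770335897".
Step 103: the media server obtains the audio sent by the chairman terminal and obtains, through automatic speech recognition, text information corresponding to that audio.
In an embodiment, the media server obtains the audio sent by the chairman terminal and forwards it to an automatic speech recognition device. It then obtains the text information corresponding to the audio from that device.
The media server may forward the audio sent by the chairman terminal by calling the speech recognition interface of a public cloud, such as the iFLYTEK speech recognition system, to perform automatic speech recognition.
Step 104: the media server sends the text information to the conference application server.
The media server may return the text information to the conference application server in an info response event.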
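Steps 103 and 104 on the media server side can be sketched as follows: collect the audio, hand it to a recogniser, and wrap the result in a response body for the conference application server. This is a sketch under assumptions. `recognise_and_report` and the JSON body layout are hypothetical, and `asr` is a stand-in callable for the ASR device or cloud interface, assumed to return `None` when nothing is recognised.

```python
import json
from typing import Callable, Iterable, Optional


def recognise_and_report(
    frames: Iterable[bytes],
    asr: Callable[[bytes], Optional[str]],
    call_id: str,
) -> str:
    """Media-server side of steps 103/104 (sketch).

    `asr` stands in for the ASR device / cloud interface and is assumed to
    return None when nothing could be recognised.
    """
    audio = b"".join(frames)  # chairman's uplink audio for this command
    text = asr(audio)
    body = {
        "call_id": call_id,
        "result": "success" if text else "failure",
        "text": text or "",
    }
    # Body carried back to the application server in the info response
    # (the format is an assumption; the source does not specify it).
    return json.dumps(body, ensure_ascii=False)
```

`ensure_ascii=False` keeps recognised Chinese text readable in the body rather than escaping it.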
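Step 105: the conference application server performs a conference control operation according to the text information.
In an embodiment, the conference application server performs text matching and recognition on the text information. If matching succeeds, it performs a conference control operation on a member of the multi-party conference according to the recognized content.
The application server may perform keyword recognition on the text information, for example "Please add number 12313981988" or "Add number 12369856986 to the conference". The keywords for adding a member are "join / pull in / invite" followed by the number xxxxxx, with the optional words {conference} and {enter}. Other conference control operations, including removing a member, muting a member, unmuting, and recording, each have corresponding semantic matching rules. Once a match is found, the conference control operation is performed, and if it succeeds, a voice prompt such as "Request succeeded" may be played. If recognition fails, "Unable to recognize" may be played; if the conference control operation fails, an announcement such as "Request failed" is played.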
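The keyword matching and outcome announcements described above can be sketched with a small keyword table and regular expressions. This is a simplified illustration, not the patent's matcher. The English keyword table, the 5 to 15 digit number pattern, and the names `match_command` and `outcome_prompt` are assumptions; the source matches Chinese keywords in the recognised text.

```python
import re
from typing import Callable, Optional

# Hypothetical English keyword table; the source matches Chinese keywords
# (e.g. "join / pull in / invite" for adding a member).
KEYWORDS = {
    "add": ("add", "join", "pull in", "invite", "call in"),
    "remove": ("remove", "kick"),
    "mute": ("mute",),
    "unmute": ("unmute",),
    "record": ("record",),
}
NUMBER = re.compile(r"\d{5,15}")  # assumed phone-number length range


def match_command(text: str) -> Optional[tuple[str, list[str]]]:
    """Return (operation, numbers) for the first matching keyword, else None."""
    lowered = text.lower()
    for op, words in KEYWORDS.items():
        # \b keeps "mute" from matching inside "unmute"
        if any(re.search(rf"\b{re.escape(w)}\b", lowered) for w in words):
            numbers = NUMBER.findall(text)
            return (op, numbers) if numbers else None
    return None


def outcome_prompt(text: str, execute: Callable[[str, list[str]], bool]) -> str:
    """Pick the announcement played to the chairman after a command."""
    command = match_command(text)
    if command is None:
        return "Unable to recognize"
    return "Request succeeded" if execute(*command) else "Request failed"
```

For example, `match_command("Please add number 12313981988")` yields `("add", ["12313981988"])`. The `execute` callable stands in for the actual conference control operation.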
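In addition, a certain tolerance for speech recognition errors is allowed. Once the recognized keywords satisfy the keywords of a particular instruction, the system enters the conference control instruction stage and performs the operation on the relevant member. When the operation is complete, an operation result tone is played to the chairman terminal and the system waits for the chairman to confirm. If the chairman does not confirm, the system applies its default handling: it executes the chairman's request or, if the request fails, returns the chairman to the original conference site.
After semantic recognition, the recognized keywords may also be played back in IVR (Interactive Voice Response) fashion, for example: "You requested to add 1235689785; press 1 to cancel." After the operation is complete, the chairman returns directly to the conference. For a video conference, the media server may also synthesize the recognized text and overlay it directly on the video picture the user sees, which gives a better user experience.
In an embodiment, the conference application server performs text matching and recognition on the text information based on a previously obtained address book of the chairman terminal.
Operating on members by phone number may be cumbersome for the chairman. To make voice operation easier, the chairman can import the user's address book to the conference application server in advance, so that voice conference control can refer directly to names in the address book, which is more convenient than using numbers. In this case, the final text recognition step searches the chairman's personal address book and extracts the corresponding number for the operation.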
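Resolving spoken names against the chairman's imported address book can be sketched as follows. The address book is assumed to be a plain name-to-number dictionary, and `resolve_targets`, the number pattern and the sample names are hypothetical. Longer names are matched first and then blanked out, so that a short contact name cannot match inside a longer one. Targets are returned in the order they were spoken.

```python
import re


def resolve_targets(text: str, address_book: dict[str, str]) -> list[str]:
    """Return phone numbers named in a command, given as digits or contact names."""
    lowered = text.lower()
    hits = [(m.start(), m.group()) for m in re.finditer(r"\d{5,15}", lowered)]
    # Longest names first so that "Li Ming" wins over a contact called "Li".
    for name in sorted(address_book, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(name.lower())}\b")
        for m in pattern.finditer(lowered):
            hits.append((m.start(), address_book[name]))
        # Blank out matched names (same length, so positions stay valid).
        lowered = pattern.sub(lambda m: " " * len(m.group()), lowered)
    # Preserve the order in which targets were spoken.
    return [number for _, number in sorted(hits)]
```

With a book containing "Li", "Li Ming" and "Zhang Wei", the command "add Li Ming and Zhang Wei" resolves to the numbers of Li Ming and Zhang Wei, in that order.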
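With the embodiments of the present application, conference control is performed by voice. This avoids situations where VoLTE has no conference control interface, conference control is impossible because the service data network is disconnected, or switching to conference control interrupts the call. It improves the user's operating experience and makes multi-party conferences more intelligent.
The conference application server and the media server are described separately below.
As shown in FIG. 2, the conference control implementation method of an embodiment of the present application, applied to a conference application server, includes:
Step 201: during a multi-party conference, the conference application server receives a conference control request from the chairman terminal.
In a multi-party conference, only the chairman terminal can perform conference control operations.
In an embodiment, the conference control request is a preset key-press message.
For example, pressing any key on the chairman terminal during the multi-party conference may be specified to enter voice conference control. Alternatively, a designated key such as "*" may be required.
To receive this key-press message, in an embodiment, the conference application server performs a play-and-collect operation on the chairman terminal before step 201.
In an embodiment, the conference application server performing the play-and-collect operation on the chairman terminal includes:
playing a blank tone to the chairman terminal and recognizing the preset key-press message.
In this embodiment, the play-and-collect operation may be performed on the chairman terminal once the conference has been created successfully. Its main function is to collect the user's key presses, so the announcement played is a blank tone that the user cannot hear. The number of digits to collect may be set to 1. The timeout of the play-and-collect operation is very long, for example one day, so that digit collection continues throughout the conference. If the chairman wants to use voice conference control while speaking, the chairman presses "*". This notifies the conference application server to enter the voice conference control stage, and the system then converts and recognizes the speech. Recognizing voice conference control instructions only after "*" is pressed is relatively safe for the user. The alternative is recording, forwarding, and recognizing the user's speech throughout the conference. That would amount to monitoring the entire call and would risk leaking the user's privacy. By contrast, what is spoken after the key press consists only of conference control instructions, which do not involve the content of the user's conversation, so this is much safer.
In an embodiment, upon receiving the preset key-press message, the conference application server may play a system tone, for example a "dong" chime. This indicates that the request has been received and the system is entering the stage of collecting the user's voice for conference control.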
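Step 202: the conference application server instructs the media server to perform speech recognition on the audio sent by the chairman terminal.
The conference application server may send an info message to the media server instructing it to perform speech recognition on the chairman terminal.
There are two schemes for performing speech recognition on the chairman terminal:
Scheme 1: perform speech recognition on the chairman terminal directly.
Performing speech recognition directly on the chairman terminal is technically simpler. However, because the chairman is still in the conference site, the other members hear the spoken conference control commands, which gives a poor user experience.
Scheme 2: isolate the chairman from the original conference site so that other members cannot hear the chairman's voice conference control operations.
This scheme protects the privacy of the chairman's operations and does not interfere with the original conference.
Scheme 2 can be implemented in two ways:
Mode 1: isolate the chairman's voice, which amounts to logical isolation. The chairman terminal is first muted so that the chairman's uplink audio is no longer mixed into the conference site and the other members cannot hear it. Speech recognition is then performed on the chairman. This achieves speech recognition for the chairman without affecting the original conference site.
In this mode, before step 202, the conference application server instructs the media server to mute the chairman terminal.
Mode 2: establish a new sub-conference site, transfer the chairman from the original conference site to it, and perform speech recognition on the audio of the sub-conference site. Any announcements are also played in the sub-conference site. Because the chairman terminal is the only member of that site, the two sites are physically isolated, and the chairman's voice conference control and the original conference do not interfere with each other.
In this mode, before step 202, the conference application server establishes a sub-conference site and transfers the chairman terminal from the main conference site of the multi-party conference to the sub-conference site. In step 202, the conference application server instructs the media server to perform speech recognition on the audio of the sub-conference site.
In this step, the chairman can begin voice conference control, for example by saying a command such as "Mute the number 13770335897".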
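Step 203: the conference application server obtains, from the media server, text information corresponding to the audio.
The conference application server receives an info response event from the media server, and the response carries the text information.
Step 204: the conference application server performs a conference control operation according to the text information.
In an embodiment, the conference application server performs text matching and recognition on the text information. If matching succeeds, it performs a conference control operation on a member of the multi-party conference according to the recognized content.
The application server may perform keyword recognition on the text information, for example "Please add number 12313981988" or "Add number 12369856986 to the conference". The keywords for adding a member are "join / pull in / invite" followed by the number xxxxxx, with the optional words {conference} and {enter}. Other conference control operations, including removing a member, muting a member, unmuting, and recording, each have corresponding semantic matching rules. Once a match is found, the conference control operation is performed, and if it succeeds, a voice prompt such as "Request succeeded" may be played. If recognition fails, "Unable to recognize" may be played; if the conference control operation fails, an announcement such as "Request failed" is played.
In addition, a certain tolerance for speech recognition errors is allowed. Once the recognized keywords satisfy the keywords of a particular instruction, the system enters the conference control instruction stage and performs the operation on the relevant member. When the operation is complete, an operation result tone is played to the chairman terminal and the system waits for the chairman to confirm. If the chairman does not confirm, the system applies its default handling: it executes the chairman's request or, if the request fails, returns the chairman to the original conference site.
After semantic recognition, the recognized keywords may also be played back in IVR (Interactive Voice Response) fashion, for example: "You requested to add 1235689785; press 1 to cancel." After the operation is complete, the chairman returns directly to the conference. For a video conference, the media server may also synthesize the recognized text and overlay it directly on the video picture the user sees, which gives a better user experience.
In an embodiment, the conference application server performs text matching and recognition on the text information based on a previously obtained address book of the chairman terminal.
Operating on members by phone number may be cumbersome for the chairman. To make voice operation easier, the chairman can import the user's address book to the conference application server in advance, so that voice conference control can refer directly to names in the address book, which is more convenient than using numbers. In this case, the final text recognition step searches the chairman's personal address book and extracts the corresponding number for the operation.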
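As shown in FIG. 3, the conference control implementation method of an embodiment of the present application, applied to a media server, includes:
Step 301: during a multi-party conference, the media server receives an instruction from the conference application server to perform speech recognition on the audio of the chairman terminal.
The media server may learn of this instruction from the info message sent by the conference application server.
There are two schemes for performing speech recognition on the chairman terminal:
Scheme 1: perform speech recognition on the chairman terminal directly.
Performing speech recognition directly on the chairman terminal is technically simpler. However, because the chairman is still in the conference site, the other members hear the spoken conference control commands, which gives a poor user experience.
Scheme 2: isolate the chairman from the original conference site so that other members cannot hear the chairman's voice conference control operations.
This scheme protects the privacy of the chairman's operations and does not interfere with the original conference.
Scheme 2 can be implemented in two ways:
Mode 1: isolate the chairman's voice, which amounts to logical isolation. The chairman terminal is first muted so that the chairman's uplink audio is no longer mixed into the conference site and the other members cannot hear it. Speech recognition is then performed on the chairman. This achieves speech recognition for the chairman without affecting the original conference site.
In this mode, before step 301, the media server mutes the chairman terminal as instructed by the conference application server.
Mode 2: establish a new sub-conference site, transfer the chairman from the original conference site to it, and perform speech recognition on the audio of the sub-conference site. Any announcements are also played in the sub-conference site. Because the chairman terminal is the only member of that site, the two sites are physically isolated, and the chairman's voice conference control and the original conference do not interfere with each other.
In this mode, in step 301, the media server receives an instruction from the conference application server to perform speech recognition on the audio of a sub-conference site of the multi-party conference, where the chairman terminal is located in the sub-conference site.
In this step, the chairman can begin voice conference control, for example by saying a command such as "Mute the number 13770335897".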
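Step 302: the media server obtains the audio sent by the chairman terminal and obtains, through automatic speech recognition, text information corresponding to that audio.
In an embodiment, the media server obtains the audio sent by the chairman terminal and forwards it to an automatic speech recognition device. It then obtains the text information corresponding to the audio from that device.
The media server may forward the audio sent by the chairman terminal by calling the speech recognition interface of a public cloud, such as the iFLYTEK speech recognition system, to perform automatic speech recognition.
Step 303: the media server sends the text information to the conference application server, so that the conference application server performs a conference control operation according to the text information.
The media server may return the text information to the conference application server in an info response event.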
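An application example is described below.
This embodiment is mainly based on the conference service under the RCS service architecture. It takes the voice conference service in the CS (Circuit Switched) domain as an example (VoLTE multi-party voice and video are also applicable), as shown in FIG. 4 and FIG. 5:
Step 401: the UE (User Equipment) receives the user's input through a mobile phone APP or H5 interface and initiates a conference call. According to the members selected to attend, it sends an HTTP conference creation request to the conference application server. On receiving the request and passing authentication, the conference application server starts to create the conference using the callback method: it first calls the chairman terminal and then calls each member. Once the chairman's call is connected, if the chairman has not subscribed to the VoLTE service, the chairman terminal's data connection drops. The chairman can then no longer use the phone to perform conference control operations on the APP or H5 interface.
Step 402: after the conference is created, the conference application server performs a P&C (play-and-collect) operation on the chairman terminal. Its main purpose is digit collection. The announcement played is a blank tone (the announcement exists, but its audio file is a very short silent clip), which the user neither hears nor notices. Any key pressed by the chairman is then collected by the system. If the chairman chooses to control the conference by voice, the chairman first presses "*" (this may also be defined as any key). The conference system receives the "*" key press and, having recognized it, starts the chairman speech recognition and conference control process.
Step 403: the conference application server sends an info message to the media server, instructing it to mute the chairman terminal.
Step 404: the media server returns a 200 OK acknowledgment to the conference application server.
Step 405: the conference application server sends an info message to the media server, instructing it to perform speech recognition on the chairman terminal.
Step 406: the media server returns a 200 OK acknowledgment to the conference application server.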
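Steps 403 to 406 form an acknowledged two-request sequence: mute, wait for 200 OK, then request recognition, wait for 200 OK. The sketch below models that ordering with an in-memory stand-in for the media server. `FakeMediaServer`, its `info` method, the action names and the integer status return are hypothetical. In a real deployment these would be SIP INFO requests whose body format the source does not specify.

```python
class FakeMediaServer:
    """Stand-in media server: logs each INFO request and answers 200 OK."""

    def __init__(self) -> None:
        self.log: list[tuple[str, str]] = []
        self.muted: set[str] = set()

    def info(self, action: str, leg: str) -> int:
        self.log.append((action, leg))
        if action == "mute":
            self.muted.add(leg)
        return 200  # 200 OK acknowledgement


def enter_voice_control(ms: FakeMediaServer, chair_leg: str) -> bool:
    """Steps 403-406: mute the chairman, then request recognition.

    Each request is sent only after the previous one was acknowledged, so
    recognition never starts while the chairman is still audible.
    """
    if ms.info("mute", chair_leg) != 200:          # steps 403/404
        return False
    return ms.info("start_asr", chair_leg) == 200  # steps 405/406
```

Waiting for the first acknowledgment before sending the second request is what guarantees that the chairman is already muted when recognition begins.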
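Step 407: the prompt for entering voice conference control is played to the chairman terminal. This is a "dong" chime indicating that the voice conference control process has been entered and the system is waiting for the user's voice input. After the prompt tone is played, the speech recognition stage for the chairman terminal begins.
On entering the speech recognition stage, the system mainly records and recognizes the uplink voice of the chairman terminal. There are several schemes. Scheme 1 is simpler but gives a somewhat worse experience. Schemes 2 and 3 give a much better experience, because the controlling terminal is isolated while its speech is recorded and recognized. The conference control speech is therefore not added to the original conference and does not disturb the original conference site. The main schemes are as follows:
Scheme 1: directly record and recognize the uplink voice of the chairman terminal, forwarding it to the automatic speech recognition system for recognition and conversion. The drawback is that other member terminals can hear the chairman's spoken conference control commands. This disturbs the original conference and offers poor privacy.
Scheme 2: the conference application server first sends the media server a conference control operation to mute the chairman terminal. The chairman's audio is then not mixed into the conference bridge and cannot be heard by other member terminals. The server then immediately instructs the media server to perform speech recognition on that terminal, so other member terminals do not hear the chairman's conference control operations. This approach is relatively convenient; the scheme used in FIG. 4 is Scheme 2.
Scheme 3: use the two-conference technique and establish a sub-conference. After the sub-conference is established, the chairman terminal is first removed from the original conference and then added to the sub-conference, so that the sub-conference contains only the chairman terminal. Recording and speech recognition are performed on the whole sub-conference. After the conference control operation is complete, the chairman is removed from the sub-conference and rejoined to the original main conference. This isolates the chairman's conference control operations from the original conference without affecting it.
Step 408: the chairman terminal sends the voice media stream to the media server.
Step 409: after receiving the conference application server's request to recognize the chairman terminal's uplink voice, the media server processes the audio input of this channel. It starts recording and streams the recorded audio file to the speech recognition system over HTTP.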
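The HTTP streaming in step 409 can be sketched with the standard library. The recording is split into fixed-size chunks and passed to `urllib.request.Request` as an iterable body. When such a request is opened without a Content-Length header, urllib sends it with chunked transfer encoding. The chunk size, the `audio/L16` content type, the function names and the ASR URL are all assumptions; the source does not name the recognition service's protocol details. The request is only built here, not sent.

```python
import io
import urllib.request
from typing import BinaryIO, Iterator

CHUNK_BYTES = 3200  # 100 ms of 16 kHz, 16-bit mono PCM (assumed audio format)


def audio_chunks(stream: BinaryIO, chunk_size: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield the recording in fixed-size pieces as it becomes available."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def build_asr_request(stream: BinaryIO, url: str) -> urllib.request.Request:
    """Prepare (but do not send) a streaming POST to a hypothetical ASR URL.

    With an iterable body and no Content-Length header, urllib uses chunked
    transfer encoding when the request is opened.
    """
    return urllib.request.Request(
        url,
        data=audio_chunks(stream),
        method="POST",
        headers={"Content-Type": "audio/L16; rate=16000"},
    )


# Example: 7000 bytes of recorded audio split into upload chunks.
sizes = [len(c) for c in audio_chunks(io.BytesIO(bytes(7000)))]
```

Sending `build_asr_request(...)` with `urllib.request.urlopen` would stream each chunk as it is read, so recognition can start before the recording is complete.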
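Step 410: the speech recognition system converts the speech into text and, when recognition is complete, returns the result to the media server.
Step 411: the media server returns a status report to the conference application server. The report includes the speech recognition result and the recognized text.
Step 412: after receiving the speech processing status report, the conference application server starts recognizing the text in it. When recognition is complete, it plays a notification tone announcing the upcoming operation and executes the corresponding conference control operation.
Referring to FIG. 6, matching and recognizing the text may include:
Step 501: segment the text into characters and words in context.
Step 502: identify the process keyword. Only conference control keywords need to be recognized. For example, for the add-member process, the recognized keywords include "add", "join", "bring in", "add on", "pull in", "call in", and "call up". These identify which conference control process is requested.
Step 503: identify the phone numbers to operate on. For example, in the add-member process, the recognized number is the number to be added. Several numbers may be added at once, separated by spaces or other delimiters. Other conference control processes are similar.
Step 504: once the conference control process and key information have been identified, a voice announcement is played and the conference control operation is performed.
For example, the confirmation "You requested to add: 1235689785; press 1 to cancel" is played to the controlling terminal. If the user does not cancel after it finishes, the add-member operation is issued. If the user presses 1 after hearing it, the operation is cancelled. If the user presses 1 before the request has completed, the operation can still be cancelled. For example, while a member is being added, the operation can be cancelled and the call terminated before the call is answered. If the operation has already succeeded, the cancel key has no effect.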
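The cancellation rules in the example above form a small state machine. Pressing 1 cancels the request while the confirmation is playing or while the new member's call is still ringing, and has no effect once the operation has succeeded. `PendingAdd`, `OpState` and the method names are hypothetical; only the prompt wording and the cancel key come from the text.

```python
from enum import Enum, auto


class OpState(Enum):
    ANNOUNCED = auto()  # confirmation prompt played, operation not yet issued
    DIALLING = auto()   # member being called, not yet answered
    SUCCEEDED = auto()  # member answered / operation complete
    CANCELLED = auto()


class PendingAdd:
    """Hypothetical tracker for one 'add member' request and its cancel key."""

    CANCEL_KEY = "1"

    def __init__(self, number: str) -> None:
        self.number = number
        self.state = OpState.ANNOUNCED

    def prompt(self) -> str:
        return f"You requested to add: {self.number}; press {self.CANCEL_KEY} to cancel"

    def start_dialling(self) -> None:
        if self.state is OpState.ANNOUNCED:
            self.state = OpState.DIALLING

    def answered(self) -> None:
        if self.state is OpState.DIALLING:
            self.state = OpState.SUCCEEDED

    def on_key(self, key: str) -> bool:
        """Return True if the key cancelled the request (terminating any call)."""
        if key != self.CANCEL_KEY or self.state in (OpState.SUCCEEDED, OpState.CANCELLED):
            return False  # after success the cancel key has no effect
        self.state = OpState.CANCELLED
        return True
```

Keeping the cancel check in one method makes the rule explicit: any state before success is cancellable, and success is final.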
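Finally, whether the operation was performed or cancelled, the chairman terminal is released from isolation and recording is stopped. It rejoins the main conference site, is unmuted, and has speech recognition recording cancelled, returning to the normal conference flow.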
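The requirement that the chairman is always restored, whatever the outcome, maps naturally onto a `try`/`finally` block. Here it is wrapped in a `contextlib.contextmanager`, shown for the muting mode of isolation; in the sub-conference mode the `finally` branch would move the chairman back to the main site instead. `RecordingMediaServer`, `isolated_chairman` and the action names are hypothetical.

```python
from contextlib import contextmanager
from typing import Iterator


class RecordingMediaServer:
    """Minimal stand-in that records the commands it receives."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def info(self, action: str, leg: str) -> int:
        self.log.append(f"{action} {leg}")
        return 200


@contextmanager
def isolated_chairman(ms: RecordingMediaServer, chair: str) -> Iterator[None]:
    """Mute and record the chairman, and always restore afterwards."""
    ms.info("mute", chair)
    ms.info("start_asr", chair)
    try:
        yield
    finally:
        # Runs whether the command succeeded, failed, or was cancelled.
        ms.info("stop_asr", chair)
        ms.info("unmute", chair)
```

Even if recognition or the control operation raises an error inside the `with` block, the chairman is unmuted and recording stops before the error propagates.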
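In summary, embodiments of the present application provide voice-based conference control for mobile multi-party call services. It can be used in CS-domain voice conference services as well as in VoLTE multi-party video and voice services. It is especially suitable where the user has no conference control interface, or has one that is inconvenient to use. After answering, the chairman controls the conference with natural speech, performing operations such as adding members, removing members, and muting. This embodiment improves the user's operating experience, makes voice calls more intelligent, and largely avoids situations in which conference control cannot be performed through an interface.
As shown in FIG. 7, an embodiment of the present application further provides a conference control implementation apparatus, applied to a conference application server, including:
a first receiving module 601, configured to receive, during a multi-party conference, a conference control request from the chairman terminal;
an instruction module 602, configured to instruct a media server to perform speech recognition on the audio sent by the chairman terminal;
a first obtaining module 603, configured to obtain, from the media server, text information corresponding to the audio;
a control module 604, configured to perform a conference control operation according to the text information.
As shown in FIG. 8, an embodiment of the present application further provides a conference control implementation apparatus, applied to a media server, including:
a second receiving module 701, configured to receive, during a multi-party conference, an instruction from the conference application server to perform speech recognition on the audio of the chairman terminal;
a second obtaining module 702, configured to obtain the audio sent by the chairman terminal and obtain, through automatic speech recognition, text information corresponding to that audio;
a sending module 703, configured to send the text information to the conference application server, so that the conference application server performs a conference control operation according to the text information.
As shown in FIG. 9, an embodiment of the present application further provides a conference application server, including a memory 801, a processor 802, and a computer program 803 stored in the memory 801 and executable on the processor 802. The processor 802 implements the conference control implementation method when executing the program.
As shown in FIG. 10, an embodiment of the present application further provides a media server, including a memory 901, a processor 902, and a computer program 903 stored in the memory 901 and executable on the processor 902. The processor 902 implements the conference control implementation method when executing the program.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the conference control implementation method.
In this embodiment, the storage medium may include, but is not limited to, any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
A person of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, or a suitable combination thereof. In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components. For example, one physical component may have multiple functions, or one function or step may be performed jointly by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person of ordinary skill in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to a person of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.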

Claims (15)

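  1. A conference control implementation method, comprising:
    during a multi-party conference, receiving, by a conference application server, a conference control request from a chairman terminal;
    instructing, by the conference application server, a media server to perform speech recognition on audio sent by the chairman terminal;
    obtaining, by the conference application server from the media server, text information corresponding to the audio;
    performing, by the conference application server, a conference control operation according to the text information.
  2. The method according to claim 1, wherein the conference control request is a preset key-press message, and before the conference application server receives the conference control request from the chairman terminal, the method further comprises:
    performing, by the conference application server, a play-and-collect operation on the chairman terminal.
  3. The method according to claim 2, wherein the performing, by the conference application server, a play-and-collect operation on the chairman terminal comprises:
    playing a blank tone to the chairman terminal and recognizing the preset key-press message.
  4. The method according to claim 1, wherein before the conference application server instructs the media server to perform speech recognition on the audio sent by the chairman terminal, the method further comprises:
    instructing, by the conference application server, the media server to mute the chairman terminal.
  5. The method according to claim 1, wherein before the conference application server instructs the media server to perform speech recognition on the audio sent by the chairman terminal, the method further comprises:
    establishing, by the conference application server, a sub-conference site, and transferring the chairman terminal from a main conference site of the multi-party conference to the sub-conference site.
  6. The method according to claim 5, wherein the instructing, by the conference application server, a media server to perform speech recognition on audio sent by the chairman terminal comprises:
    instructing, by the conference application server, the media server to perform speech recognition on audio of the sub-conference site.
  7. The method according to claim 1, wherein the performing, by the conference application server, a conference control operation according to the text information comprises:
    performing, by the conference application server, text matching and recognition on the text information, and when matching succeeds, performing the conference control operation on a member of the multi-party conference according to the recognized content.
  8. The method according to claim 7, wherein the performing, by the conference application server, text matching and recognition on the text information comprises:
    performing, by the conference application server, text matching and recognition on the text information based on a previously obtained address book of the chairman terminal.
  9. A conference control implementation method, comprising:
    during a multi-party conference, receiving, by a media server, an instruction from a conference application server to perform speech recognition on audio of a chairman terminal;
    obtaining, by the media server, the audio sent by the chairman terminal, and obtaining, through automatic speech recognition, text information corresponding to the audio of the chairman terminal;
    sending, by the media server, the text information to the conference application server, so that the conference application server performs a conference control operation according to the text information.
  10. The method according to claim 9, wherein the obtaining, by the media server, the audio sent by the chairman terminal, and obtaining, through automatic speech recognition, text information corresponding to the audio of the chairman terminal comprises:
    obtaining, by the media server, the audio sent by the chairman terminal, and forwarding the audio to an automatic speech recognition device;
    obtaining, by the media server from the automatic speech recognition device, the text information corresponding to the audio.
  11. The method according to claim 9, wherein before the media server receives the instruction from the conference application server to perform speech recognition on the audio of the chairman terminal, the method further comprises:
    muting, by the media server, the chairman terminal as instructed by the conference application server.
  12. The method according to claim 9, wherein the receiving, by a media server, an instruction from a conference application server to perform speech recognition on audio of the chairman terminal comprises:
    receiving, by the media server, an instruction from the conference application server to perform speech recognition on audio of a sub-conference site of the multi-party conference, wherein the chairman terminal is located in the sub-conference site.
  13. A conference application server, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the conference control implementation method according to any one of claims 1 to 8.
  14. A media server, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the conference control implementation method according to any one of claims 9 to 12.
  15. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the conference control implementation method according to any one of claims 1 to 12.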
PCT/CN2020/085482 2019-06-04 2020-04-18 Conference control implementation method, apparatus and server WO2020244317A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP20818931.6A EP3979560A4 (en) 2019-06-04 2020-04-18 CONFERENCE CONTROL METHOD AND APPARATUS, AND SERVER

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910479990.4 2019-06-04
CN201910479990.4A CN112040166A (zh) 2019-06-04 2019-06-04 Conference control implementation method, apparatus and server

Publications (1)

Publication Number Publication Date
WO2020244317A1 true WO2020244317A1 (zh) 2020-12-10

Family

ID=73575991

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085482 WO2020244317A1 (zh) 2019-06-04 2020-04-18 Conference control implementation method, apparatus and server

Country Status (3)

Country Link
EP (1) EP3979560A4 (zh)
CN (1) CN112040166A (zh)
WO (1) WO2020244317A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
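CN112565665B (zh) * 2020-12-08 2022-07-26 北京北信源软件股份有限公司 Video conference control method and apparatus, electronic device, and storage medium
CN112541495A (zh) * 2020-12-22 2021-03-23 厦门亿联网络技术股份有限公司 Conference message detection method and apparatus, server, and storage medium
CN112885350A (zh) * 2021-02-25 2021-06-01 北京百度网讯科技有限公司 Network conference control method and apparatus, electronic device, and storage medium
CN113225521B (zh) * 2021-05-08 2022-11-04 维沃移动通信(杭州)有限公司 Video conference control method and apparatus, and electronic device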

Citations (5)

Publication number Priority date Publication date Assignee Title
JPH09281991A (ja) * 1996-04-18 1997-10-31 Nec Software Ltd Voice tallying system and voice tallying method
US5916302A (en) * 1996-12-06 1999-06-29 International Business Machines Corporation Multimedia conferencing using parallel networks
CN101159788A (zh) * 2007-11-23 2008-04-09 华为技术有限公司 Method and apparatus for calling a user to join a conference
CN103824559A (zh) * 2012-11-19 2014-05-28 国际商业机器公司 Inserting voice commands for electronic conferences
CN105991964A (zh) * 2015-02-13 2016-10-05 中兴通讯股份有限公司 Method and apparatus for announcing dynamic information in a multimedia conference

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
JP2757444B2 (ja) * 1989-04-06 1998-05-25 日本電気株式会社 Multipoint teleconference system
US20070263824A1 (en) * 2006-04-18 2007-11-15 Cisco Technology, Inc. Network resource optimization in a video conference
CN101170613B (zh) * 2007-11-16 2010-06-16 中兴通讯股份有限公司 Method and system for conference control operations in a voice conference
CN101291373B (zh) * 2008-04-15 2010-12-08 中兴通讯股份有限公司 Method and system for implementing multi-party calls
CN101534411B (zh) * 2009-04-08 2012-12-12 华为终端有限公司 Image-based video conference control method, terminal, and system
CN101656863A (zh) * 2009-08-07 2010-02-24 深圳华为通信技术有限公司 Conference control method, apparatus, and system
CN101841422B (zh) * 2010-05-07 2012-06-20 无锡中星微电子有限公司 Method and system for private chat in a voice conference
CN102811205A (zh) * 2011-06-02 2012-12-05 中兴通讯股份有限公司 Method and system for implementing a sub-conference function using an application server
CN103516919B (zh) * 2012-06-27 2018-03-27 中兴通讯股份有限公司 Method, apparatus, and terminal for sending audio data
CN107995456A (zh) * 2017-11-16 2018-05-04 杭州好园科技有限公司 Smart park video conference system


Non-Patent Citations (1)

Title
See also references of EP3979560A4 *

Also Published As

Publication number Publication date
EP3979560A1 (en) 2022-04-06
EP3979560A4 (en) 2022-07-20
CN112040166A (zh) 2020-12-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20818931

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020818931

Country of ref document: EP

Effective date: 20211230