WO2016184118A1 - Method and device for realizing multimedia conference - Google Patents

Publication number
WO2016184118A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
participant
multimedia conference
information
client
Application number
PCT/CN2015/099559
Other languages
French (fr)
Chinese (zh)
Inventor
应益峰
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2016184118A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Definitions

  • the present invention relates to the field of multimedia conference technologies, and more particularly, to a method and apparatus for implementing a multimedia conference.
  • Multimedia conference is a kind of conference that integrates voice, video and data on the network.
  • the multimedia conference provides users with long-distance transmission of voice, video, data, instant messaging and other multimedia services through the broadband access network.
  • the web portal allows users to create multimedia conferences.
  • In practice, the conference speaker and the other participants often communicate poorly.
  • For example, if the speaker's mother tongue differs from that of the other participants, or the speaker speaks in a dialect, the other participants often cannot accurately understand what the speaker means. Likewise, if other participants are distracted during the multimedia conference and miss part of the speech, they cannot accurately understand it. Both cases greatly reduce the communication effect of the conference.
  • a method and an apparatus for implementing a multimedia conference are provided to solve the problem that a participant cannot accurately understand the content of a conference speaker in a multimedia conference in the prior art.
  • the present invention provides a method for implementing a multimedia conference, including:
  • the client obtains the speech voice information of the local participant, and sends the speech voice information to the multimedia conference server;
  • the client converts the speech voice information into speech text information
  • the other participant is a participant other than the local participant among the participants of the multimedia conference.
  • the converting, by the client, of the speech voice information into the speech text information includes:
  • the speech notification message carries the user identification information (ID) of the speaker
  • the speaker is determined by the multimedia conference server according to the energy of the voice information sent by the participants in the multimedia conference, as a preset number of participants taken in descending order of energy;
  • a speech recognition engine is used to convert the collected speech voice information into speech text information.
  • the acquiring, by the client, the voice information of the local participant includes:
  • the voice information of the local participant is collected by using the voice device.
  • the acquiring, by the client, the voice information of the local participant includes:
  • the client sends a speech request message to the multimedia conference server, where the speech request message carries the user ID of the local participant, so that the multimedia conference server sends the speech request message to the client corresponding to the host;
  • when the client receives the voice device open command sent by the multimedia conference server, the client uses the voice device to collect the speech voice information of the local participant;
  • the voice device open command is generated by the multimedia conference server after it receives the speech response message that the client corresponding to the host returns in response to the speech request message.
  • the present invention provides a method for implementing a multimedia conference, including:
  • the multimedia conference server acquires the speech voice information sent by the client and the speech text information corresponding to the speech voice information, where the speech text information is obtained by the client by converting the speech voice information with a speech recognition engine;
  • the multimedia conference server sends the speech voice information and the speech text information to the client corresponding to the other participant, so that the client corresponding to the other participant displays the speech voice information and the speech text information;
  • the other participant is a participant other than the participant who sends the speaking voice information and the speaking text information among the participants who participate in the multimedia conference.
  • the method further includes:
  • the multimedia conference server detects energy of voice information sent by the client
  • the multimedia conference server determines, according to the order of the energy from the largest to the smallest, a preset number of participants as a speaker;
  • the multimedia conference server sends a speech notification message to the client corresponding to the speaker, where the speech notification message carries the user identification information (ID) of the speaker, so that the client corresponding to the speaker obtains the speaker's speech voice information and converts the speech voice information into speech text information.
  • the method further includes:
  • the multimedia conference server receives a speech request message sent by the client, where the speech request message carries a user ID of the participant corresponding to the client;
  • the speech response message is generated when the client corresponding to the moderator determines that the participant who sent the speech request message has permission to speak.
  • the present invention provides a method for implementing a multimedia conference, including:
  • the client obtains the voice information of the local participant's speech
  • the client sends the speech voice information to the multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference.
  • the other participant is a participant other than the local participant among the participants of the multimedia conference.
  • the present invention provides a method for implementing a multimedia conference, including:
  • the multimedia conference server obtains the voice information sent by the client;
  • the multimedia conference server converts the speech voice information into speech text information
  • the multimedia conference server sends the speech voice information and the speech text information to a client corresponding to another participant, so that the client corresponding to the other participant displays the speech voice information and the speech text information.
  • the other participant is a participant other than the participant who sends the speaking voice information among the participants who participate in the multimedia conference.
  • the converting, by the multimedia conference server, the speaking voice information into the speaking text information includes:
  • the multimedia conference server detects the energy of the voice information sent by the client, and sequentially determines the preset number of participants as the speaker according to the order of the energy;
  • the voice recognition engine is used to convert the voice information sent by the client corresponding to the determined speaker into the voice text information.
  • the present invention provides a device for implementing a multimedia conference, which is used for a client, and includes:
  • An obtaining unit configured to obtain speaking voice information of a local participant
  • a converting unit configured to convert the spoken voice information into speech text information
  • a sending unit configured to send the speech voice information and the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference;
  • the other participant is a participant other than the local participant among the participants of the multimedia conference.
  • the acquiring unit includes:
  • a first determining subunit configured to determine whether the local participant has a speaking right
  • a first collecting subunit configured to: when the first determining subunit determines that the local participant has a speaking right, collect the speech voice information of the local participant by using a voice device.
  • the converting unit includes:
  • a first receiving subunit configured to receive a speech notification message sent by the multimedia conference server, where the speech notification message carries the user identification information (ID) of the speaker, and the speaker is determined by the multimedia conference server according to the energy of the voice information sent by the participants in the multimedia conference, as a preset number of participants taken in descending order of energy;
  • a second determining subunit configured to determine whether a user ID carried by the floor notification message is the same as a user ID of the local participant
  • a second collection subunit configured to: when the second determining subunit determines that the user ID carried by the speech notification message is the same as the user ID of the local participant, use a voice device to collect the speech voice information of the local participant.
  • the acquiring unit specifically includes:
  • a first sending subunit configured to send a speech request message to the multimedia conference server, where the speech request message carries a user ID of the local participant, so that the multimedia conference server transmits the speech request message to the client corresponding to the host;
  • a second receiving subunit configured to receive a voice device open command sent by the multimedia conference server
  • a third collection subunit configured to: when the second receiving subunit receives the voice device open command, use a voice device to collect the speech voice information of the local participant; the voice device open command is generated by the multimedia conference server after it receives the speech response message that the client corresponding to the host returns in response to the speech request message.
  • the present invention provides an apparatus for implementing a multimedia conference, which is used in a multimedia conference server, and includes:
  • An acquiring unit configured to obtain the speech voice information sent by the client and the speech text information corresponding to the speech voice information, where the speech text information is obtained by the client by converting the speech voice information with a speech recognition engine;
  • a first sending unit configured to send the speaking voice information and the speaking text information to a client corresponding to another participant, so that the client corresponding to the other participant displays the speaking voice information and the speaking text information ;
  • the other participant is a participant other than the participant who sends the speaking voice information and the speaking text information among the participants who participate in the multimedia conference.
  • the apparatus further includes:
  • a detecting unit configured to detect energy of voice information sent by the client
  • a determining unit configured to determine, according to the order of the energy from the largest to the smallest, a preset number of participants as a speaker
  • a second sending unit configured to send a speech notification message to the client corresponding to the speaker, where the speech notification message carries the user identification information (ID) of the speaker, so that the client corresponding to the speaker acquires the speaker's speech voice information and converts the speech voice information into speech text information.
  • the apparatus further includes:
  • a first receiving unit configured to receive a speech request message sent by the client, where the speech request message carries the user ID of the participant corresponding to the client;
  • a third sending unit configured to send the floor request message to the client corresponding to the moderator, so that the client corresponding to the moderator determines, according to the floor request message, whether the participant who sends the floor request message has the speaking right ;
  • a second receiving unit configured to receive a speech response message sent by the client corresponding to the host; the speech response message is generated when the client corresponding to the host determines that the participant who sent the speech request message has permission to speak;
  • a fourth sending unit configured to send, according to the utterance response message, a voice device open command to a client corresponding to the participant having the utterance authority.
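  • The request-and-approval exchange described by these units can be sketched as follows. This is a minimal illustration, not the patented implementation: the command name `OPEN_VOICE_DEVICE`, the function names, and the in-process call standing in for the network round trip to the host's client are all assumptions made for the example.

```python
from typing import Callable, Optional, Set

# Hypothetical command name; the source only calls it a "voice device open command".
OPEN_VOICE_DEVICE = "open_voice_device"


def make_host_policy(allowed: Set[str]) -> Callable[[str], bool]:
    """Host side: decide whether the requesting participant may speak."""
    def has_speaking_right(user_id: str) -> bool:
        return user_id in allowed
    return has_speaking_right


def handle_speech_request(user_id: str,
                          host_decides: Callable[[str], bool]) -> Optional[str]:
    """Server side: forward the request to the host and act on the response.

    The direct call stands in for sending the speech request message to the
    host's client and receiving a speech response message.
    """
    approved = host_decides(user_id)
    return OPEN_VOICE_DEVICE if approved else None


class Client:
    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.voice_device_open = False

    def request_to_speak(self, host_decides: Callable[[str], bool]) -> bool:
        """Send a speech request; open the voice device only on command."""
        command = handle_speech_request(self.user_id, host_decides)
        if command == OPEN_VOICE_DEVICE:
            self.voice_device_open = True
        return self.voice_device_open


host = make_host_policy({"A"})
print(Client("A").request_to_speak(host))
print(Client("B").request_to_speak(host))
```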
  • the present invention provides an apparatus for implementing a multimedia conference, which is applied to a client, and includes:
  • An obtaining unit configured to obtain speaking voice information of a local participant
  • a sending unit configured to send the speech voice information to the multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech text information to the clients of the other participants in the multimedia conference
  • the other participant is a participant other than the local participant among the participants of the multimedia conference.
  • the present invention provides an apparatus for implementing a multimedia conference, which is applied to a multimedia server, and includes:
  • the obtaining unit is configured to obtain the speaking voice information sent by the client;
  • a converting unit configured to convert the spoken voice information into speech text information
  • a sending unit configured to send the speaking voice information and the speaking text information to a client corresponding to another participant, so that the client corresponding to the other participant displays the speaking voice information and the speaking text information ;
  • the other participant is a participant other than the participant who sends the speaking voice information among the participants who participate in the multimedia conference.
  • the converting unit includes:
  • the detecting subunit is configured to detect the energy of the voice information sent by the client, and sequentially determine the preset number of participants as the speaker according to the order of the energy;
  • the conversion subunit is configured to convert the speech speech information sent by the determined speaker into the speech text information by using the speech recognition engine.
  • a ninth aspect provides a multimedia conference system, including: a client and a multimedia conference server;
  • the client is configured to obtain the speech information of the local participant and send the speech information to the multimedia conference server; and convert the speech information into speech text information, and send the speech text information to the multimedia conference server;
  • the multimedia conference server is configured to send the speech voice information and the speech text information to a client of another participant participating in the multimedia conference;
  • the other participant is a participant other than the local participant among the participants of the multimedia conference.
  • the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine a preset number of participants as speakers in descending order of energy, and send a speech notification message to the client corresponding to each speaker, where the speech notification message carries the user identification information (ID) of the speaker;
  • the client is configured to receive the speech notification message sent by the multimedia conference server and, when it determines according to the speech notification message that the local participant is a speaker, obtain the speech voice information of the local participant, send the speech voice information to the multimedia conference server, convert the speech voice information into speech text information, and send the speech text information to the multimedia conference server.
  • the present invention further provides a multimedia conference system, including: a client and a multimedia conference server;
  • the client is configured to obtain the voice information of the local participant and send the voice message to the multimedia conference server.
  • the multimedia conference server is configured to convert the speech voice information into speech text information, and send the speech voice information and the corresponding speech text information to a client corresponding to another participant, where the other participant is a participant other than the participant who sends the speech voice information among the participants in the multimedia conference;
  • the client corresponding to the other participant is further configured to display the speaking voice information and the speaking text information sent by the multimedia conference server to the user.
  • the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine a preset number of participants as speakers in descending order of energy, and, when the received speech voice information comes from a determined speaker, convert the speech voice information into speech text information.
  • The client of the speaker converts the speaker's speech voice information into speech text information, and the multimedia conference server forwards the speech text information to the clients corresponding to the participants other than the speaker, so that the other participants see the speaker's speech as text on their clients instead of receiving only the voice information.
  • FIG. 1 is a block diagram of a multimedia conference according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for implementing a multimedia conference according to an embodiment of the present invention
  • FIG. 3 is a flowchart of still another method for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of still another method for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of an apparatus for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of an acquiring unit according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a conversion unit according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of still another acquiring unit according to an embodiment of the present invention.
  • FIG. 13 is a schematic structural diagram of another implementation of a multimedia conference apparatus according to an embodiment of the present invention.
  • FIG. 14 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 15 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 16 is a schematic structural diagram of an apparatus for implementing a multimedia conference applied to a client according to an embodiment of the present invention.
  • FIG. 17 is a schematic structural diagram of an apparatus for implementing a multimedia conference applied to a multimedia conference server according to an embodiment of the present invention.
  • FIG. 18 is a schematic structural diagram of a client for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 19 is a schematic structural diagram of a multimedia conference server according to an embodiment of the present invention.
  • FIG. 20 is a schematic structural diagram of another multimedia conference server according to an embodiment of the present invention.
  • FIG. 21 is a schematic structural diagram of another client for implementing a multimedia conference according to an embodiment of the present invention.
  • FIG. 22 is a schematic structural diagram of another multimedia conference server according to an embodiment of the present invention.
  • The multimedia conference solution provided by the embodiments of the present invention addresses the problem described in the background: participants cannot accurately understand the speaker's speech, which reduces the communication effect of the conference.
  • the multimedia conferencing system includes a plurality of clients 1 and at least one multimedia conferencing server 2.
  • The client can be a personal computer (PC), a laptop, and the like.
  • The client obtains the media stream information (for example, the voice information) of the participant and uploads it to the multimedia conference server 2. The multimedia conference server 2 mixes the media streams sent by the clients and then sends the mixed media stream to each client, so that geographically dispersed users can communicate through graphics, sound, and so on.
  • FIG. 2 is a flowchart of a method for implementing a multimedia conference according to an embodiment of the present invention. The method is applied to the client shown in FIG. 1. As shown in FIG. 2, the method includes the following steps:
  • The client obtains the speech voice information of the local participant, and sends the speech voice information to the multimedia conference server.
  • a local participant is a participant who is in the same geographic space as the client. For example, participant A uses client a to participate in a multimedia conference. For client a, participant A is a local participant corresponding to client a.
  • the client can use the voice device to obtain the voice information of the local participant's speech.
  • the voice device can include voice information collection hardware integrated on the client and operating software that controls the voice information collection hardware.
  • The voice information collection hardware, for example a microphone (MIC), can implement functions such as voice collection, voice coding, and voice decoding.
  • the operating software can query the number and name of the local voice information collecting hardware, and can also turn on, turn off or mute the voice collecting hardware.
  • This embodiment is applicable to a discussion-type conference application scenario, and each participant can speak, so that each client can obtain the voice information of the participant's voice corresponding to itself. If the client obtains the voice information of the participant's voice through the voice device, the voice device corresponding to each participant is turned on.
  • the client converts the speech information into speech text information.
  • the client uses the speech recognition technology to convert the speech speech information of the obtained local speaker into the speech text information.
  • The voice signal that a client captures from its own local participant is relatively strong, so the speech text information converted by the client corresponding to the speaker is more accurate.
  • the client corresponding to other participants does not need to convert the speech information of the speaker to the speech text information, thereby saving the resources of the client corresponding to other participants.
  • the client corresponding to the speaker may further store the spoken text information, so as to generate the meeting minutes by using the spoken text information.
  • the client corresponding to other participants participating in the multimedia conference may also store the received speech text information to generate a meeting minutes based on the spoken text information.
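  • How stored speech text information might be turned into meeting minutes can be illustrated with a short sketch. The record layout (`Utterance` with a time offset, user ID, and text) and the output format are assumptions for the example; the source does not specify how minutes are generated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    seconds: int   # offset from the start of the conference (assumed field)
    user_id: str   # user ID of the speaker
    text: str      # speech text information produced by speech recognition


def render_minutes(utterances: List[Utterance]) -> str:
    """Order stored utterances by time and format one line per utterance."""
    lines = []
    for u in sorted(utterances, key=lambda item: item.seconds):
        minutes, seconds = divmod(u.seconds, 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {u.user_id}: {u.text}")
    return "\n".join(lines)


stored = [
    Utterance(65, "B", "Agreed."),
    Utterance(5, "A", "Let us begin."),
]
print(render_minutes(stored))
```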
  • the corresponding client of the speaker can also display the spoken text information, so that the speaker can view the content of his speech.
  • the client sends the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech speech information and the speech text information to the client corresponding to the other participants.
  • the other participants are participants other than the speaker among all the participants participating in the multimedia conference.
  • the multimedia conference server sends the received voice information and the spoken text information to the client corresponding to other participants participating in the multimedia conference.
  • The clients corresponding to the other participants display the received speech voice information and speech text information, which helps participants quickly understand the speaker's speech.
  • participants participating in the local multimedia conference include A, B, C, D, and E.
  • participant A is a speaker
  • participants B, C, D, and E are other participants.
  • the multimedia conference server sends the speech voice information and the speech text information of participant A to the clients of B, C, D, and E.
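  • The forwarding rule in this example (deliver to every participant except the one who spoke) can be sketched as below. The `ConferenceServer` class and its per-user inbox lists are hypothetical stand-ins for real network connections and are not part of the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (sender user ID, speech voice information, speech text information)
Delivery = Tuple[str, bytes, str]


@dataclass
class ConferenceServer:
    # One inbox per joined participant; a list stands in for a network link.
    inboxes: Dict[str, List[Delivery]] = field(default_factory=dict)

    def join(self, user_id: str) -> None:
        self.inboxes[user_id] = []

    def forward(self, sender_id: str, voice: bytes, text: str) -> None:
        """Send voice and text to every participant other than the sender."""
        for user_id, inbox in self.inboxes.items():
            if user_id != sender_id:
                inbox.append((sender_id, voice, text))


server = ConferenceServer()
for uid in ["A", "B", "C", "D", "E"]:
    server.join(uid)
server.forward("A", b"\x00\x01", "Hello, everyone.")
print({uid: len(inbox) for uid, inbox in server.inboxes.items()})
```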
  • the T.120 protocol standard can be integrated on both the client and the multimedia conference server to implement the function of transmitting and receiving voice messages and speaking text information between the client and the multimedia conference server.
  • The T.120 standard includes a series of protocols such as T.120-T.127, which provide reliable information transmission between clients and between a client and the multimedia conference server. It can also provide point-to-multipoint data distribution services and select the transmission channel with the best transmission efficiency to transmit data.
  • In this embodiment, the client obtains the speech voice information of the local participant and converts it into speech text information. The client then sends the speech voice information and the speech text information to the multimedia conference server, which forwards them to the clients corresponding to the other participants in the multimedia conference, and those clients display the received speech voice information and speech text information.
  • In this way, a participant can both hear the speaker's speech voice information and read the corresponding speech text information. Combining the two lets the participant accurately understand the content of the speech, which improves the communication effect of the multimedia conference.
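  • The client-side sequence of this embodiment (obtain voice, send voice, convert to text, send text) can be sketched as a small function. The three callables are placeholders chosen for the example: `capture` stands in for the voice device, `recognize` for a speech recognition engine, and `send` for the link to the multimedia conference server. None of them is a real library API.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, object]  # (message kind, payload); an assumed wire format


def run_client_step(
    capture: Callable[[], bytes],
    recognize: Callable[[bytes], str],
    send: Callable[[Message], None],
) -> None:
    """One pass of the client flow: voice out first, then its text form."""
    voice = capture()              # obtain the local participant's speech
    send(("voice", voice))         # send speech voice information
    text = recognize(voice)        # convert speech voice into speech text
    send(("text", text))           # send speech text information


outbox: List[Message] = []
run_client_step(
    capture=lambda: b"\x01\x02",
    # Fake recognizer: a real engine would return the transcribed words.
    recognize=lambda voice: f"<{len(voice)} bytes of speech>",
    send=outbox.append,
)
print(outbox)
```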
  • In some conferences, for example a discussion session, all participants are allowed to speak. However, if the voice information sent by every participant were converted into text, many sounds unrelated to the conference would also be converted and displayed to the participants, which would distract them.
  • a participant with a large voice energy can be determined as a speaker, and the voice information of the speaker's speech can be converted into a speech text information, and the voice content of other participants with less voice energy is ignored.
  • FIG. 3 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention.
  • the embodiment is applicable to an application scenario in which the number of participants is large and the participants can speak.
  • the method may include the following steps:
  • the multimedia conference server detects the energy of the voice information sent by the client.
  • the client participating in the multimedia conference sends the obtained voice information of the participant to the multimedia conference server, and the multimedia conference server detects the energy of the received voice information.
  • the energy of detecting the voice information may be implemented by a voice conference bridge in the multimedia conference server.
  • the voice conference bridge is used to provide a voice conference site on the server side, and the voices of the speakers are mixed and sent to each participant.
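  • As an illustration of the mixing step, the sketch below sums several speakers' audio sample by sample and clips the result. It assumes 16-bit signed PCM samples held in Python lists; the source does not state the audio format or the mixing method used by the voice conference bridge.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix(streams: List[List[int]]) -> List[int]:
    """Add the streams sample by sample and clip to the 16-bit range.

    Streams may differ in length; a shorter stream contributes silence
    after its last sample.
    """
    length = max((len(s) for s in streams), default=0)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


print(mix([[1000, 30000], [2000, 10000, -5]]))
```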
  • the multimedia conference server determines, according to the order of the energy of the voice information, that the preset number of participants is a speaker.
  • the multimedia conference server detects the energy of the voice information sent by the participants participating in the multimedia conference, sorts the energy from large to small, and sequentially determines the preset number of participants as the speaker.
  • The preset number may be one, in which case the participant whose voice information has the highest energy is determined as the speaker; or the preset number may be two, in which case the two participants whose voice information has the highest energy are determined as speakers.
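  • The selection rule can be sketched as follows. The energy measure used here (mean squared amplitude of one frame of samples) is an assumption for illustration, since the source does not define how energy is computed; the function names are likewise hypothetical.

```python
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (an assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_speakers(frames: Dict[str, Sequence[float]],
                    preset_number: int) -> List[str]:
    """Return the user IDs of the `preset_number` highest-energy participants."""
    energies = {uid: frame_energy(f) for uid, f in frames.items()}
    # Sort from largest to smallest energy; break ties by user ID so the
    # result is deterministic.
    ranked = sorted(energies, key=lambda uid: (-energies[uid], uid))
    return ranked[:preset_number]


frames = {
    "A": [0.9, -0.8, 0.7],
    "B": [0.1, -0.1, 0.05],
    "C": [0.4, -0.5, 0.3],
}
print(select_speakers(frames, 1))  # ['A']
print(select_speakers(frames, 2))  # ['A', 'C']
```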
  • the multimedia conference server may determine that the speaker may be different according to the energy of the voice information.
  • the multimedia conference server sends a speech notification message to the participant participating in the multimedia conference, where the speech notification message carries a user ID (Identification) of the speaker.
  • The multimedia conference server may broadcast the speech notification message to all clients participating in the multimedia conference, in which case each participant's client determines, according to the user ID in the speech notification message, whether the participant corresponding to that client is a speaker. Alternatively, the speech notification message may be sent one-to-one to the client of the participant corresponding to the user ID, and that client performs the determination according to the user ID.
  • The participant's client receives the speech notification message from the multimedia conference server. Because the speech notification message contains the user ID, the client can compare that user ID with its own user ID and thereby determine whether its participant is a speaker.
  • If the client determines that the user ID carried by the speech notification message is the same as its own user ID, the local participant is determined to be a speaker.
  • The client corresponding to the speaker obtains the speaker's speech voice information and sends it to the multimedia conference server.
  • the client corresponding to the speaker converts the spoken voice information into the spoken text information.
  • the client corresponding to the speaker sends the spoken text information to the multimedia conference server.
  • the multimedia conference server sends the speech voice information and the speech text information to a client corresponding to another participant.
  • the client corresponding to the other participant displays the speaking voice information and the speaking text information.
  • The multimedia conference server detects the energy of the voice information sent by each participant and, in descending order of energy, determines a preset number of participants as speakers; that is, only the speech content of the preset number of participants with the highest energy is converted into corresponding text information.
  • This avoids converting a large amount of speech unrelated to the conference into text and displaying it to the participants, which would interfere with them.
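  • The energy detection and speaker selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes that the voice information arrives as frames of integer PCM samples and that energy is the mean of the squared samples. The names `frame_energy` and `select_speakers` are hypothetical.

```python
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[int]) -> float:
    """Mean squared amplitude of one frame of PCM samples (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_speakers(frames: Dict[str, Sequence[int]], preset_number: int) -> List[str]:
    """Return the user IDs of the preset number of participants with the highest energy."""
    energies = {user_id: frame_energy(samples) for user_id, samples in frames.items()}
    # Sort from largest to smallest energy; ties are broken by user ID for determinism.
    ranked = sorted(energies, key=lambda uid: (-energies[uid], uid))
    return ranked[:preset_number]


# Hypothetical frames keyed by user ID.
frames = {
    "A": [1000, -1200, 900, -1100],  # loud participant
    "B": [10, -12, 8, -9],           # background noise
    "C": [400, -380, 420, -410],     # quieter participant
}
print(select_speakers(frames, 1))
print(select_speakers(frames, 2))
```

  • With a preset number of one, only the loudest participant is selected; with two, the two loudest are selected, and the background-noise participant is ignored in both cases.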
  • FIG. 4 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention.
  • the method includes the following steps:
  • the client determines whether the local participant has the speaking right; if the local participant has the speaking right, executes S320; otherwise, ends the current process.
  • Determining whether the participant has the speaking right may include determining whether the identity attribute of the participant has the presenter right or the moderator right.
  • S320: The client obtains the speech voice information of the local participant and sends the speech voice information to the multimedia conference server.
  • The client converts the speech voice information into speech text information.
  • The client can have a built-in speech recognition engine, which it uses to convert the speech voice information of the local participant into speech text information.
  • The client sends the speech text information to the multimedia conference server.
  • The client can send the speech voice information to the multimedia conference server immediately, so that the multimedia conference server can forward the speaker's speech voice information to other participants in time, which ensures real-time transmission of the voice information. Of course, if the time required to convert the speech voice information into speech text information is very short, generally on the order of milliseconds, the speech voice information and the speech text information can be sent to the multimedia conference server together, so that the voice played and the text displayed by the clients of the other participants are synchronized.
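  • The choice between sending the voice immediately and sending voice and text together can be sketched as a simple decision on the conversion time. This is only an illustration: the function name `plan_transmissions` and the 50 ms threshold are assumptions, since the text says only that the conversion time is generally at the millisecond level.

```python
from typing import List, Tuple


def plan_transmissions(conversion_ms: float,
                       bundle_threshold_ms: float = 50.0) -> List[Tuple[str, ...]]:
    """Return the ordered messages the client sends to the conference server.

    bundle_threshold_ms is a hypothetical tuning value, not taken from the source.
    """
    if conversion_ms <= bundle_threshold_ms:
        # Conversion is fast: send voice and text together so playback and display stay in sync.
        return [("voice", "text")]
    # Conversion is slow: send the voice first to keep it real-time, then the text.
    return [("voice",), ("text",)]


print(plan_transmissions(5.0))
print(plan_transmissions(400.0))
```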
  • the multimedia conference server sends the speech voice information and the speech text information to a client corresponding to another participant.
  • the client corresponding to the other participant displays the speaking voice information and the speaking text information.
  • The method for implementing a multimedia conference provided in this embodiment converts only the speech voice information of participants who have the speaking right into speech text information, instead of converting the speech content of all participants into corresponding text information.
  • This prevents conference-unrelated voice content sent by participants in the multimedia conference from being converted into text information and forwarded to other participants, so that the clients of the other participants do not display too much unimportant text information and interfere with them.
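  • The client-side check of the speaking right can be sketched as follows. This is a minimal sketch, assuming that the identity attribute is a plain role string; the role labels, the `Participant` class, and the `fake_recognize` stand-in for the built-in speech recognition engine are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Identity attributes assumed to grant the speaking right (presenter or moderator).
SPEAKING_ROLES = {"presenter", "moderator"}


@dataclass
class Participant:
    user_id: str
    role: str  # hypothetical labels, e.g. "presenter", "moderator", "attendee"


def has_speaking_right(participant: Participant) -> bool:
    """True if the participant's identity attribute carries the speaking right."""
    return participant.role in SPEAKING_ROLES


def process_speech(participant: Participant, voice: bytes,
                   recognize: Callable[[bytes], str]) -> Optional[Dict[str, object]]:
    """Return the voice/text pair to send to the server, or None if the process ends."""
    if not has_speaking_right(participant):
        return None
    return {"user_id": participant.user_id, "voice": voice, "text": recognize(voice)}


def fake_recognize(voice: bytes) -> str:
    """Placeholder recognizer: treats the bytes as UTF-8 text instead of real audio."""
    return voice.decode("utf-8")


print(process_speech(Participant("u1", "moderator"), b"welcome", fake_recognize))
print(process_speech(Participant("u2", "attendee"), b"side chat", fake_recognize))
```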
  • FIG. 5 is a flowchart of still another method for implementing a multimedia conference according to an embodiment of the present invention. The method is applied to an application scenario in which a moderator specifies a speaker, and the method includes the following steps:
  • The client sends a speech request message to the multimedia conference server, where the speech request message carries the user ID of the participant corresponding to the client.
  • The client corresponding to a participant who needs to speak sends the speech request message, which carries that participant's user ID, to the multimedia conference server.
  • The multimedia conference server forwards the speech request message to the client corresponding to the moderator.
  • The client corresponding to the moderator sends a speech response message to the multimedia conference server when it determines, according to the speech request message, that the participant is allowed to speak.
  • The client corresponding to the moderator determines whether to allow the participant to speak according to the user ID carried in the speech request message. If the participant is allowed to speak, it generates a speech response message and sends it to the multimedia conference server. The speech response message may also carry the participant's user ID so that the multimedia conference server can identify the participant.
  • The client corresponding to the moderator may determine whether to allow the participant to speak according to the preset identity attribute of the participant.
  • The multimedia conference server generates a voice device open command according to the speech response message and sends the voice device open command to the client corresponding to the speaker.
  • The voice device open command is used to turn on the voice device corresponding to the participant who is allowed to speak.
  • the client corresponding to the speaker converts the spoken voice information into the spoken text information.
  • the multimedia conference server sends the speech voice information and the speech text information to a client corresponding to another participant other than the speaker.
  • the client corresponding to the other participant displays the speaking voice information and the speaking text information.
  • In the method for implementing a multimedia conference provided by this embodiment, when a participant other than the moderator or the presenter needs to speak, the participant's client sends a speech request message, which is delivered to the moderator's client, and the moderator determines according to the speech request message whether to allow the participant to speak. If the participant is allowed to speak, a speech response message allowing the participant to speak is sent to the multimedia conference server, and the multimedia conference server generates a voice device open command according to the speech response message and controls the voice device corresponding to the participant to be turned on.
  • The speech voice information of the participant is then obtained by the voice device corresponding to the participant, and the client corresponding to the participant converts the speech voice information into speech text information.
  • This method is applicable to formal conferences or higher-level conference scenarios, and it expands the range of application of the multimedia conference implementation method.
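  • The message exchange of this embodiment (speech request, speech response, voice device open command) can be sketched as plain functions passing dictionaries. The message field names, the `allowed_ids` policy, and the function names are assumptions made for illustration; the source does not define a wire format.

```python
from typing import Dict, List, Optional, Set


def make_speech_request(user_id: str) -> Dict[str, str]:
    """Participant's client: build a speech request carrying its user ID."""
    return {"type": "speech_request", "user_id": user_id}


def moderator_decide(request: Dict[str, str], allowed_ids: Set[str]) -> Optional[Dict[str, str]]:
    """Moderator's client: return a speech response if the participant may speak."""
    if request["user_id"] in allowed_ids:
        return {"type": "speech_response", "user_id": request["user_id"]}
    return None


def server_handle_response(response: Dict[str, str]) -> Dict[str, str]:
    """Conference server: turn a speech response into a voice device open command."""
    return {"type": "voice_device_open", "user_id": response["user_id"]}


def run_flow(user_id: str, allowed_ids: Set[str]) -> List[str]:
    """Trace the message types exchanged for one speech request."""
    request = make_speech_request(user_id)
    trace = [request["type"]]
    response = moderator_decide(request, allowed_ids)  # server forwards request to moderator
    if response is None:
        return trace  # request not granted: no voice device is opened
    trace.append(response["type"])
    trace.append(server_handle_response(response)["type"])
    return trace


print(run_flow("u2", {"u2"}))
print(run_flow("u3", {"u2"}))
```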
  • FIG. 6 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention. As shown in FIG. 6, the method includes the following steps:
  • S510: The client obtains the speech voice information of the local participant and sends the speech voice information to the multimedia conference server.
  • the client uses the voice device to collect the voice information of the participant's speech.
  • the multimedia conference server converts the speech voice information into speech text information.
  • the multimedia conference server converts the received speech voice information into speech text information by using a voice recognition engine before mixing the voice information sent by each participant.
  • In one application scenario, all participants in the multimedia conference can speak freely, and the client of any participant can send the obtained speech voice information of its local participant to the multimedia conference server.
  • In this case, the multimedia conference server can convert the speech voice information of any participant into speech text information.
  • In another application scenario, only the moderator and the presenter can speak, so only the clients of the moderator and the presenter send the obtained speech voice information to the multimedia conference server.
  • In this case, the multimedia conference server converts the received speech voice information into speech text information.
  • the multimedia conference server sends the speech voice information and the corresponding speech text information to a client of another participant participating in the multimedia conference.
  • the other participant is a participant other than the local participant among the participants of the multimedia conference.
  • the client of the other participant displays the speaking voice information and the corresponding speaking text information.
  • In the method for implementing a multimedia conference provided by this embodiment, the client of a participant obtains the speech voice information and sends it to the multimedia conference server; the multimedia conference server converts the speech voice information into speech text information and then sends the speech voice information and the corresponding speech text information to the clients of the other participants in the multimedia conference.
  • The participants in the multimedia conference can thus not only hear the speaker's speech voice information but also see the corresponding speech text information, so they can accurately understand the content of the speech, which improves the communication effect of the multimedia conference.
  • In addition, because the multimedia conference server performs the conversion from speech voice information to speech text information, a speech recognition engine does not need to be integrated into each client, which reduces the production cost of the client.
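  • The server-side variant can be sketched as a single function that converts the received voice once and addresses the result to every participant except the sender. The function names and the `fake_recognize` stand-in for the server's voice recognition engine are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple


def convert_and_forward(sender_id: str, voice: bytes, participants: Iterable[str],
                        recognize: Callable[[bytes], str]) -> Dict[str, Tuple[bytes, str]]:
    """Convert the voice to text on the server and map each other participant to (voice, text)."""
    text = recognize(voice)  # conversion happens once, before forwarding
    return {uid: (voice, text) for uid in participants if uid != sender_id}


def fake_recognize(voice: bytes) -> str:
    """Placeholder recognizer: treats the bytes as UTF-8 text instead of real audio."""
    return voice.decode("utf-8")


deliveries = convert_and_forward("A", b"hello everyone", ["A", "B", "C", "D", "E"], fake_recognize)
print(sorted(deliveries))
```

  • Because recognition runs only on the server in this sketch, the clients need nothing beyond the ability to play the voice and display the text.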
  • FIG. 7 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention.
  • In this embodiment, the preset number of participants with the highest voice energy are determined as speakers, and the speech voice information of those speakers is converted into speech text information.
  • the method may include the following steps:
  • the multimedia conference server detects the energy of the voice information sent by the client.
  • The multimedia conference server determines, in descending order of the energy of the voice information, a preset number of participants as speakers.
  • The client obtains the speech voice information of the local participant and sends it to the multimedia conference server.
  • The multimedia conference server converts the speech voice information sent by the client corresponding to a determined speaker into speech text information.
  • The multimedia conference server sends the speech voice information sent by the speaker's client, together with the corresponding speech text information, to the clients of the other participants in the multimedia conference.
  • the other participant is a participant other than the local participant among the participants of the multimedia conference.
  • the client of the other participant displays the received speech voice information and the corresponding speech text information.
  • The multimedia conference server detects the energy of the voice information sent by each participant and, in descending order of energy, determines a preset number of participants as speakers.
  • The multimedia conference server converts only the speech content of the determined speakers into corresponding text information. This avoids converting a large amount of speech unrelated to the conference into text and displaying it to the participants, which would interfere with them.
  • FIG. 8 is a schematic structural diagram of an apparatus for implementing a multimedia conference according to an embodiment of the present invention.
  • The apparatus for implementing a multimedia conference is used for a client and includes an obtaining unit 110, a converting unit 120, and a sending unit 130.
  • The obtaining unit 110 is configured to obtain the speech voice information of the local participant.
  • a local participant refers to a participant in the same geographical space as the client. For example, participant A uses client a to participate in a multimedia conference. For client a, participant A is a local participant corresponding to client a.
  • the obtaining unit 110 may acquire the speech information of the local participant using the voice device.
  • the voice device can include voice information collection hardware integrated on the client and operating software that controls the voice information collection hardware.
  • the voice information collection hardware can implement functions such as voice collection, voice coding, and voice decoding.
  • the operating software can query the number and name of the local voice information collecting hardware, and can also turn on, turn off or mute the voice collecting hardware.
  • The apparatus for implementing a multimedia conference in this embodiment can be applied to a discussion-type conference scenario in which every participant can speak, so each client can obtain the speech voice information of its own participant. If the clients obtain the speech voice information through voice devices, the voice device corresponding to each participant is turned on.
  • the converting unit 120 is configured to convert the speech information into speech text information.
  • the converting unit 120 converts the voice information of the obtained local speaker into speech text information by using a voice recognition technology.
  • Because the voice signal of the local participant obtained by the client is relatively strong, the speech text information converted by the client corresponding to the speaker is more accurate. In addition, the clients corresponding to the other participants do not need to convert the speaker's speech voice information into speech text information, which saves resources on those clients.
  • The sending unit 130 is configured to send the speech voice information and the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to the clients corresponding to other participants.
  • the other participant is a participant other than the speaker among all the participants who participate in the multimedia conference.
  • The client sends the speech voice information and the speech text information to the multimedia conference server, which sends them to the clients corresponding to the other participants in the multimedia conference. The clients of the other participants thus receive and present both the speech voice information and the speech text information, which helps the participants quickly understand the speaker's speech.
  • For example, the participants in the current multimedia conference are A, B, C, D, and E. If participant A is the speaker, participants B, C, D, and E are the other participants, and the multimedia conference server sends participant A's speech voice information and speech text information to the clients of B, C, D, and E.
  • The T.120 protocol standard can be integrated on both the client and the multimedia conference server to implement the sending and receiving of speech voice information and speech text information between the client and the multimedia conference server.
  • The T.120 standard comprises a series of protocols such as T.120 to T.127, which provide reliable information transmission between clients and between a client and the multimedia conference server, and which can also provide a point-to-multipoint data distribution service and select the transmission channel with the best transmission efficiency for the data.
  • In the apparatus for implementing a multimedia conference shown in this embodiment, the obtaining unit obtains the speech voice information of the local participant, and the converting unit converts it into speech text information. The sending unit then sends the speech voice information and the speech text information to the multimedia conference server, which forwards them to the clients corresponding to the other participants in the multimedia conference, and those clients display the received speech voice information and speech text information.
  • The participants can thus both hear the speaker's speech voice information and see the corresponding speech text information, so that by combining the two they can accurately understand the content of the speech, which improves the communication effect of the multimedia conference.
  • FIG. 9 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention.
  • the apparatus may further include: a display unit 140 and a storage unit 150 on the basis of the embodiment shown in FIG. 8.
  • the display unit 140 is configured to display the spoken text information.
  • the storage unit 150 is configured to store the spoken text information.
  • The client corresponding to the speaker may also store the speech text information so that conference minutes can be generated from it.
  • The clients corresponding to the other participants in the multimedia conference may likewise store the received speech text information to generate conference minutes.
  • The client corresponding to the speaker can also display the speech text information, so that the speaker can review the content of his or her own speech.
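  • Generating conference minutes from the stored speech text information can be sketched as follows. The `SpeechTextStore` class is a hypothetical stand-in for the storage unit 150, and the "user ID: text" line format is an assumption; the source does not specify how the minutes are laid out.

```python
from typing import List, Tuple


class SpeechTextStore:
    """Hypothetical stand-in for the storage unit: keeps speech text in arrival order."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str]] = []

    def store(self, user_id: str, text: str) -> None:
        """Record one piece of speech text together with the speaker's user ID."""
        self._entries.append((user_id, text))

    def minutes(self) -> str:
        """Render the stored speech text as one 'user ID: text' line per entry."""
        return "\n".join(f"{user_id}: {text}" for user_id, text in self._entries)


store = SpeechTextStore()
store.store("A", "Let us review the schedule.")
store.store("B", "The release is planned for Friday.")
print(store.minutes())
```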
  • FIG. 10 is a schematic structural diagram of an obtaining unit 110 according to an embodiment of the present invention.
  • The obtaining unit 110 is applicable to an application scenario in which only the speech content of the moderator or the presenter is converted into corresponding text information and the speech content of other participants is ignored.
  • the obtaining unit 110 may include a first determining subunit 1101 and a first collecting subunit 1102:
  • the first determining sub-unit 1101 is configured to determine, when the participant corresponding to the local client needs to speak, whether the participant has the speaking right.
  • Determining whether the participant has the speaking right may include determining whether the identity attribute of the participant has the presenter right or the moderator right.
  • The first collecting sub-unit 1102 is configured to collect the speech voice information by using the voice device when the first determining sub-unit 1101 determines that the local participant has the speaking right, that is, the presenter right or the moderator right.
  • The apparatus for implementing a multimedia conference provided by this embodiment converts only the speech voice information of participants who have the speaking right into speech text information, instead of converting the speech content of all participants into corresponding text information.
  • This prevents conference-unrelated voice content sent by participants in the multimedia conference from being converted into text information and forwarded to other participants, so that the clients of the other participants do not display too much unimportant text information and interfere with them.
  • In some application scenarios, such as a discussion session, all participants are allowed to speak. However, if the voice information sent by every participant were converted into text information, much speech unrelated to the conference would be converted into text and displayed to the participants, interfering with them.
  • In such a scenario, a participant with high voice energy can be determined as a speaker and that speaker's speech voice information converted into speech text information, while the voice content of other participants with lower voice energy is ignored.
  • FIG. 11 is a schematic structural diagram of a conversion unit 120 according to an embodiment of the present invention.
  • the conversion unit 120 is applicable to an application scenario in which a large number of participants and participants can speak.
  • the converting unit 120 may include a first receiving subunit 1201, a second judging subunit 1202, and a second collecting subunit 1203:
  • The first receiving subunit 1201 is configured to receive a speech notification message sent by the multimedia conference server, where the speech notification message carries the user ID of the speaker. The speaker is one of a preset number of participants determined by the multimedia conference server, in descending order of energy, according to the energy of the voice information sent by the participants in the multimedia conference.
  • The participant's client can compare the user ID in the speech notification message with its own user ID to determine whether its own participant is a speaker.
  • the second determining sub-unit 1202 is configured to determine whether the user ID carried by the speech notification message is the same as the user ID of the local participant.
  • The second collecting sub-unit 1203 is configured to collect the speech voice information of the local participant by using a voice device when the second determining sub-unit 1202 determines that the user ID carried by the speech notification message is the same as the user ID of the local participant.
  • In the conversion unit 120 provided by this embodiment, the first receiving subunit 1201 receives the speech notification message sent by the multimedia conference server. The speech notification message carries the user ID of the speaker, and the speaker is one of a preset number of participants determined by the multimedia conference server in descending order of the energy of the voice information sent by the participants in the multimedia conference; that is, the client converts only the speech content of the preset number of participants with the highest energy into corresponding text information. This avoids converting a large amount of speech unrelated to the conference into text and displaying it to the participants, which would interfere with them.
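  • The comparison performed on the client when a speech notification message arrives can be sketched as follows. The message is modeled as a dictionary with a `speaker_ids` list, which is an assumption made so that a preset number greater than one can be represented; the source says only that the message carries the speaker's user ID.

```python
from typing import Dict, List


def is_local_speaker(notification: Dict[str, List[str]], local_user_id: str) -> bool:
    """True if the local participant's user ID appears in the speech notification message."""
    return local_user_id in notification.get("speaker_ids", [])


# The same notification is broadcast to every client; each client checks its own user ID.
notification = {"speaker_ids": ["A", "C"]}
for user_id in ["A", "B", "C"]:
    print(user_id, is_local_speaker(notification, user_id))
```

  • Only a client for which the check succeeds would go on to collect and convert its participant's speech; the others take no action.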
  • FIG. 12 is a schematic structural diagram of still another acquiring unit 110 according to an embodiment of the present invention.
  • the acquiring unit 110 is applied to an application scenario in which the moderator specifies a speaker.
  • the acquiring unit 110 includes: a first sending subunit 1103, a second receiving subunit 1104, and a third collecting subunit 1105.
  • The first sending subunit 1103 is configured to send a speech request message to the multimedia conference server, where the speech request message carries the user ID of the local participant, so that the multimedia conference server sends the speech request message to the moderator.
  • The client corresponding to the participant sends the speech request message, which carries the participant's user ID, to the multimedia conference server.
  • The second receiving subunit 1104 is configured to receive a voice device open command sent by the multimedia conference server, where the voice device open command is generated by the multimedia conference server after it receives the speech response message returned by the moderator's client in response to the speech request message.
  • Specifically, after receiving the speech request message, the client corresponding to the moderator determines, according to the user ID carried in the speech request message, whether the participant is allowed to speak. If the participant is allowed to speak, the client corresponding to the moderator generates a speech response message and sends it to the multimedia conference server.
  • The speech response message may also carry the participant's user ID so that the multimedia conference server can identify the participant.
  • the client corresponding to the moderator may determine whether to allow the participant to speak according to the identity attribute of the preset participant.
  • the third collection sub-unit 1105 is configured to collect the speech information of the local participant by using the voice device when the second receiving sub-unit 1104 receives the voice device opening instruction.
  • The multimedia conference server delivers the speech request message to the moderator's client, and the moderator determines according to the speech request message whether to allow the participant to speak. If the participant is allowed to speak, the moderator's client sends to the multimedia conference server a speech response message allowing the participant to speak, so that the multimedia conference server generates a voice device open command according to the speech response message and controls the voice device corresponding to the participant to be turned on.
  • The speech voice information of the participant is then obtained by the voice device corresponding to the participant, and the client corresponding to the participant converts the speech voice information into speech text information.
  • The apparatus is suitable for formal conferences or higher-level conference scenarios, and it expands the range of application of the multimedia conference implementation method.
  • FIG. 13 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention.
  • the apparatus for implementing a multimedia conference is used for a multimedia conference server.
  • The apparatus includes an obtaining unit 210 and a first sending unit 220.
  • the obtaining unit 210 is configured to obtain the speaking voice information and the speaking text information sent by the client.
  • The first sending unit 220 is configured to send the speech voice information and the speech text information to the clients corresponding to other participants, so that those clients display the speech voice information and the speech text information, where the other participants are the participants in the multimedia conference other than the participant who sent the speech voice information and the speech text information.
  • the multimedia conference server sends the received voice information and the spoken text information to the client corresponding to other participants participating in the multimedia conference.
  • the client corresponding to other participants displays the received speech information and the spoken text information, thereby facilitating the participants to quickly understand the speaker's speech.
  • With the apparatus for implementing a multimedia conference applied to the multimedia conference server in this embodiment, the client obtains the speech voice information of the local participant and sends it to the multimedia conference server; the multimedia conference server then forwards the speech voice information and the speech text information to the clients corresponding to the other participants in the multimedia conference, so that those clients display the received speech voice information and speech text information.
  • The participants can thus both hear the speaker's speech voice information and see the corresponding speech text information, so that by combining the two they can accurately understand the content of the speech, which improves the communication effect of the multimedia conference.
  • FIG. 14 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention.
  • the embodiment is applicable to an application scenario in which the number of participants is large and the participants can speak.
  • On the basis of the embodiment shown in FIG. 13, the apparatus may further include a detecting unit 230, a determining unit 240, and a second sending unit 250.
  • the detecting unit 230 is configured to detect energy of the voice information sent by the client.
  • the multimedia conference server will receive the voice information of the participant obtained by the client of the participant participating in the multimedia conference, and the multimedia conference server detects the energy of the received voice message.
  • Detection of the energy of the voice information may be implemented by a voice conference bridge in the multimedia conference server.
  • The voice conference bridge provides a voice conference venue on the server side: it mixes the voices of the speakers and sends the mixed voice to each participant.
  • The determining unit 240 is configured to determine, in descending order of energy, a preset number of participants as speakers.
  • the multimedia conference server detects the energy of the voice information sent by the participants participating in the multimedia conference, sorts according to the energy from large to small, and sequentially determines a preset number of participants as speakers.
  • the preset number may be one, that is, the participant with the highest energy of the voice information is determined as a speaker; or the preset number may be two, that is, two participants with the highest energy of the voice information are determined as spokesman.
  • the multimedia conference server may determine that the speaker may be different according to the energy of the voice information.
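The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how energy is measured, so mean squared amplitude is used here as one common choice, and the names `signal_energy` and `select_speakers` are hypothetical.

```python
from typing import Dict, List, Sequence


def signal_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of audio samples (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_speakers(energy_by_user: Dict[str, float], preset_number: int) -> List[str]:
    """Return the user IDs of the first `preset_number` participants, highest energy first."""
    ranked = sorted(energy_by_user, key=lambda uid: energy_by_user[uid], reverse=True)
    return ranked[:preset_number]


# Hypothetical energies for three participants.
energies = {
    "alice": signal_energy([0.5, -0.5]),
    "bob": signal_energy([0.1, 0.1]),
    "carol": signal_energy([0.9, -0.9]),
}
speakers = select_speakers(energies, preset_number=2)
```

With a preset number of one, the same function returns only the participant with the highest energy.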
  • the second sending unit 250 is configured to send a speech notification message to the client corresponding to the speaker, where the speech notification message carries the user identification information (ID) of the speaker, so that the client corresponding to the speaker obtains the speaker's speech voice information and converts it into speech text information.
  • the multimedia conference server may broadcast the speech notification message to all clients participating in the multimedia conference, in which case each participant's client determines, according to the user ID in the speech notification message, whether its own participant is a speaker. Alternatively, the speech notification message may be sent one-to-one to the client of the participant corresponding to the user ID, and that client determines from the user ID whether its participant is a speaker.
  • the participant's client receives the speech notification message from the multimedia conference server. Because the speech notification message contains a user ID, the client can compare that user ID with its own user ID and thereby determine whether its participant is a speaker.
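The two delivery modes and the client-side ID comparison can be sketched as below. The message layout and the names `SpeechNotification`, `Client`, `broadcast`, and `send_one_to_one` are assumptions made for illustration; the patent only states that the message carries the speaker's user ID.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SpeechNotification:
    speaker_user_id: str  # the only field the description requires


class Client:
    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.is_speaker = False

    def on_speech_notification(self, message: SpeechNotification) -> None:
        # Compare the carried user ID with the local participant's user ID.
        if message.speaker_user_id == self.user_id:
            self.is_speaker = True


def broadcast(message: SpeechNotification, clients: Iterable[Client]) -> None:
    """Send the notification to every client; each client decides for itself."""
    for client in clients:
        client.on_speech_notification(message)


def send_one_to_one(message: SpeechNotification, clients: Iterable[Client]) -> None:
    """Send the notification only to the client whose user ID matches."""
    for client in clients:
        if client.user_id == message.speaker_user_id:
            client.on_speech_notification(message)


alice, bob = Client("alice"), Client("bob")
broadcast(SpeechNotification("bob"), [alice, bob])
```

Either mode ends in the same client-side comparison, so the client logic does not depend on how the server chooses to deliver the message.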
  • the multimedia conference server detects the energy of the voice information sent by each participant and, in descending order of energy, determines the first preset number of participants as speakers; that is, only the speech of the preset number of participants with the highest energy is converted into corresponding text information.
  • the device thereby prevents conference-unrelated sounds picked up by many clients from being converted into text, which would otherwise display much unrelated text to the participants and distract them.
  • FIG. 15 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention.
  • the device for implementing the multimedia conference is applied to a scenario in which the moderator designates the speaker.
  • on the basis of the foregoing device embodiment, the device may further include: a first receiving unit 260, a third sending unit 270, a second receiving unit 280, and a fourth sending unit 290.
  • the first receiving unit 260 is configured to receive a floor request message sent by the client, where the floor request message carries a user ID of the participant corresponding to the client.
  • the third sending unit 270 is configured to send the floor request message to the client corresponding to the moderator, so that the client corresponding to the moderator determines, according to the floor request message, whether the participant who sent the floor request message has speaking permission.
  • the second receiving unit 280 is configured to receive a speech response message sent by the client corresponding to the moderator.
  • the client corresponding to the moderator determines, according to the user ID carried in the floor request message, whether the participant is allowed to speak. If the participant is allowed to speak, a speech response message is generated, and the multimedia conference server receives the speech response message for that participant.
  • the speech response message may also carry the participant's user ID so that the multimedia conference server can identify the participant.
  • the client corresponding to the moderator may determine whether to allow the participant to speak according to a preset identity attribute of the participant. For example, when the multimedia conference is established, the moderator may decide whether a participant can speak according to the identity with which the participant attends; for instance, a presenter of the conference is allowed to speak.
  • the fourth sending unit 290 is configured to send a voice device open command to the client corresponding to the participant having speaking permission, where the speech response message is generated when the client corresponding to the moderator determines that the participant who sent the floor request message has speaking permission.
  • the multimedia conference server generates the voice device open command according to the received speech response message, and the voice device open command is used to turn on the voice device of the participant who is allowed to speak.
  • the multimedia conference server forwards the floor request message of another participant to the moderator's client, and the moderator determines, according to the floor request message, whether the participant is allowed to speak. If the participant is allowed to speak, the multimedia conference server receives the speech response message, sent by the moderator's client, that allows the participant to speak; the multimedia conference server then generates a voice device open command according to the speech response message and controls the participant's voice device to turn on. Once turned on, the voice device captures the participant's speech voice information, and the participant's client converts the speech voice information into speech text information.
  • This method is applicable to formal conferences or higher-level conference scenarios, and expands the scope of application of multimedia conference implementation methods.
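The moderator-controlled flow in this embodiment (floor request, relay to the moderator, speech response, voice device open command) can be sketched as below. All class and method names are hypothetical, and the permission rule (a set of allowed identities) is an assumption based on the "identity attribute" example in the text.

```python
from typing import Dict, Iterable, Set


class ParticipantClient:
    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.voice_device_on = False  # off until the server sends an open command

    def on_voice_device_open(self) -> None:
        self.voice_device_on = True


class ModeratorClient:
    def __init__(self, identity_by_user: Dict[str, str], allowed: Set[str]) -> None:
        self.identity_by_user = identity_by_user
        self.allowed = allowed

    def handle_floor_request(self, user_id: str) -> bool:
        """Return True (a positive speech response) if the requester may speak."""
        return self.identity_by_user.get(user_id) in self.allowed


class ConferenceServer:
    def __init__(self, moderator: ModeratorClient,
                 clients: Iterable[ParticipantClient]) -> None:
        self.moderator = moderator
        self.clients = {c.user_id: c for c in clients}

    def on_floor_request(self, user_id: str) -> bool:
        # Relay the request to the moderator, then act on the response.
        granted = self.moderator.handle_floor_request(user_id)
        if granted:
            self.clients[user_id].on_voice_device_open()
        return granted


dave, erin = ParticipantClient("dave"), ParticipantClient("erin")
moderator = ModeratorClient({"dave": "presenter", "erin": "attendee"}, {"presenter"})
server = ConferenceServer(moderator, [dave, erin])
server.on_floor_request("dave")
server.on_floor_request("erin")
```

In this sketch the participant never opens its own voice device; only the server's command does, which mirrors the restriction described for formal conferences.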
  • the present invention further provides a corresponding device embodiment.
  • FIG. 16 is a schematic structural diagram of an apparatus for implementing a multimedia conference applied to a client according to an embodiment of the present invention.
  • the apparatus includes: an obtaining unit 310 and a sending unit 320.
  • the obtaining unit 310 is configured to obtain the speaking voice information of the local participant.
  • the sending unit 320 is configured to send the speech voice information to the multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants participating in the multimedia conference.
  • the other participant is a participant other than the local participant among the participants of the multimedia conference.
  • the participant's client obtains the speech voice information and sends it to the multimedia conference server; the multimedia conference server converts the speech voice information into speech text information and then sends the speech voice information and the corresponding speech text information to the clients corresponding to the other participants participating in the multimedia conference.
  • the participants who participate in the multimedia conference can not only hear the voice information of the speaker's speech, but also can see the corresponding speech text information, can accurately understand the speaker's speech content, and improve the communication effect of the multimedia conference.
  • the method converts the speech information into speech text information by the multimedia conference server, and does not need to integrate the speech recognition engine on each client, thereby reducing the production cost of the client.
  • FIG. 17 is a schematic structural diagram of an apparatus for implementing a multimedia conference applied to a multimedia conference server according to an embodiment of the present invention.
  • the apparatus includes: an obtaining unit 410, a converting unit 420, and a sending unit 430.
  • the obtaining unit 410 is configured to obtain the speaking voice information sent by the client.
  • the converting unit 420 is configured to convert the spoken voice information into speech text information.
  • the multimedia conference server determines, according to the energy of the participants' voice information, the preset number of participants with the highest energy as speakers, and converts the received speech voice information of the speakers into speech text information.
  • the conversion unit 420 may include a detection subunit and a conversion subunit.
  • the detecting subunit is configured to detect the energy of the voice information sent by the clients and to determine, in descending order of energy, the first preset number of participants as speakers; the conversion subunit is configured to use the speech recognition engine to convert the speech voice information sent by the determined speakers into speech text information.
  • the sending unit 430 is configured to send the speech voice information and the speech text information to a client corresponding to another participant, so that the client corresponding to the other participant displays the speech voice information and the speech text information.
  • the other participant is a participant other than the participant who sends the speaking voice information among the participants who participate in the multimedia conference.
  • the multimedia conference server detects the energy of the voice information sent by each participant and, in descending order of energy, determines the first preset number of participants as speakers.
  • the multimedia conference server converts only the speech of the determined speakers into corresponding text information. This avoids converting conference-unrelated sounds into text, which would otherwise display much unrelated text to the participants and distract them.
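The server-side variant above, in which the server converts only the determined speakers' audio and forwards voice plus text to everyone else, can be sketched as follows. The function `relay_speech` and the stand-in `fake_recognizer` are hypothetical; a real deployment would call an actual speech recognition engine, which the patent does not name. Dropping audio from non-speakers is an assumption of this sketch.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Recognizer = Callable[[bytes], str]


def relay_speech(sender_id: str, voice: bytes, participants: Iterable[str],
                 speakers: Set[str], recognize: Recognizer) -> Dict[str, Tuple[bytes, str]]:
    """Map each other participant to the (voice, text) pair it should receive."""
    if sender_id not in speakers:
        # Assumption: audio from participants who are not determined speakers
        # is neither converted nor forwarded.
        return {}
    text = recognize(voice)
    return {uid: (voice, text) for uid in participants if uid != sender_id}


def fake_recognizer(voice: bytes) -> str:
    """Stand-in for a speech recognition engine: treats the bytes as UTF-8 text."""
    return voice.decode("utf-8")


everyone = ["alice", "bob", "carol"]
deliveries = relay_speech("alice", b"hello", everyone, {"alice"}, fake_recognizer)
ignored = relay_speech("bob", b"noise", everyone, {"alice"}, fake_recognizer)
```

Keeping the recognizer as a parameter reflects the point made earlier in this embodiment: only the server needs a speech recognition engine, so clients stay simple.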
  • the embodiment of the present invention further provides a client for implementing the multimedia conference.
  • the client includes: a processor 1411, a transmitter 1412, and a memory 1413.
  • the memory 1413 stores operation instructions executable by the processor 1411, and the processor 1411 reads the operation instructions in the memory 1413 to implement the following functions: acquiring the speech voice information of the local participant, and converting the speech voice information into speech text information.
  • the audio signal of the participant may be collected by the voice device for corresponding processing and then provided to the processor 1411.
  • the voice device may be a MIC.
  • the processor 1411 is specifically configured to: determine whether the local participant has the speaking right; if the local participant has the speaking right, collect the speaking voice information of the local participant.
  • the transmitter 1412 is configured to send the speech voice information and the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to the clients corresponding to the other participants in the multimedia conference, where the other participants are all participants in the multimedia conference except the speaker.
  • the multimedia conference server determines, according to the energy of the participants' voice information, the preset number of participants with the highest energy as speakers, and the client then converts the speaker's speech voice information into speech text information.
  • the client that implements the multimedia conference may further include a receiver.
  • the receiver is configured to receive a speech notification message sent by the multimedia conference server, where the speech notification message carries the user identification information (ID) of a speaker, and the speaker is one of the first preset number of participants determined by the multimedia conference server, in descending order of energy, according to the energy of the voice information sent by the participants participating in the multimedia conference;
  • the processor 1411 is further configured to determine whether the user ID carried by the speech notification message is the same as the user ID of the local participant and, if they are the same, to determine that the local participant is a speaker and then obtain the speech voice information of the local participant.
  • only the moderator and the presenter can speak, other participants cannot speak, the voice devices of other participants are turned off, and the participants themselves cannot turn on the voice device.
  • the participant can request the moderator to turn on the participant's voice device.
  • the transmitter 1412 is further configured to send a floor request message to the multimedia conference server, where the floor request message carries the user ID of the local participant, so that the multimedia conference server sends the floor request message to the moderator.
  • the receiver is further configured to receive a voice device open command sent by the multimedia conference server and provide it to the voice device, so that the voice device collects the speech voice information of the local participant, where the voice device open command is generated by the multimedia conference server after it receives the speech response message that the client corresponding to the moderator returns according to the floor request message.
  • the client implementing the multimedia conference may further include a display.
  • the display is configured to display the spoken text information.
  • the memory is further configured to store the speech text information so that meeting minutes can be generated from it.
  • the client obtains the speech voice information of the local participant and converts it into speech text information. The speech voice information and the speech text information are then sent to the multimedia conference server, which forwards them to the clients corresponding to the other participants participating in the multimedia conference, and those clients display the received speech voice information and speech text information.
  • the participant can both hear the speaker's speech voice information and see the corresponding speech text information, so that, by combining the two, the participant can accurately understand the content of the speaker's speech, which improves the communication effect of the multimedia conference.
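The client behaviour described in this embodiment (check speaking permission, convert locally, send voice and text, display received text, and keep it for meeting minutes) can be sketched as below. The class `ConferenceClient`, its message layout, and the injected `recognize` callable are all assumptions for illustration; the patent does not specify a message format or a particular speech recognition engine.

```python
from typing import Callable, Dict, List, Optional, Union

Message = Dict[str, Union[str, bytes]]


class ConferenceClient:
    def __init__(self, user_id: str, recognize: Callable[[bytes], str],
                 has_floor: bool = False) -> None:
        self.user_id = user_id
        self.recognize = recognize      # stand-in for a speech recognition engine
        self.has_floor = has_floor      # speaking permission of the local participant
        self.minutes: List[str] = []    # stored speech text, usable as meeting minutes

    def speak(self, voice: bytes) -> Optional[Message]:
        """Build the message sent to the server, or None without speaking permission."""
        if not self.has_floor:
            return None
        return {"user_id": self.user_id, "voice": voice, "text": self.recognize(voice)}

    def on_speech(self, message: Message) -> str:
        """Handle a forwarded message: return the line to display and store it."""
        line = f"{message['user_id']}: {message['text']}"
        self.minutes.append(line)
        return line


def decode_as_text(voice: bytes) -> str:
    """Hypothetical recognizer that treats the audio bytes as UTF-8 text."""
    return voice.decode("utf-8")


alice = ConferenceClient("alice", decode_as_text, has_floor=True)
bob = ConferenceClient("bob", decode_as_text)
outgoing = alice.speak(b"hi all")
shown = bob.on_speech(outgoing) if outgoing is not None else ""
```

Because each received line is appended to `minutes`, the stored text can later be written out as a record of the conference.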
  • the embodiment of the present invention further provides a multimedia conference server corresponding to the device for implementing the multimedia conference applied to the multimedia conference server shown in FIG. 13 to FIG. 15 .
  • the multimedia conference server includes: a receiver 1511 and a transmitter 1512.
  • the receiver 1511 is configured to obtain speech voice information and speech text information sent by the client.
  • the transmitter 1512 is configured to send the speech voice information and the speech text information to a client corresponding to another participant, so that the client corresponding to the other participant displays the speech voice information and the speech text information.
  • the other participant is a participant other than the participant who sends the speaking voice information and the speaking text information among the participants who participate in the multimedia conference.
  • the multimedia conference server further includes a processor 1513.
  • the receiver 1511 is further configured to acquire the energy of the voice information sent by the client.
  • the processor 1513 is configured to determine, in descending order of the energy of the voice information, the first preset number of participants as speakers.
  • the transmitter 1512 is specifically configured to send a speech notification message to the participants participating in the multimedia conference, where the speech notification message carries the user ID of the speaker, so that the client corresponding to the speaker acquires the speaker's speech voice information.
  • the receiver 1511 is further configured to receive a floor request message sent by the client, where the floor request message carries a user ID of the participant corresponding to the client.
  • the transmitter 1512 is further configured to send the floor request message to the client corresponding to the moderator, so that the client corresponding to the moderator determines, according to the floor request message, whether the participant who sent the floor request message has speaking permission;
  • the receiver 1511 is further configured to receive a speech response message sent by the client corresponding to the moderator, after which a voice device open command is sent to the client corresponding to the participant having speaking permission, where the speech response message is generated when the client corresponding to the moderator determines that the participant who sent the floor request message has speaking permission.
  • the client obtains the speech voice information of the local participant and sends it, together with the corresponding speech text information, to the multimedia conference server. The multimedia conference server then forwards the speech voice information and the speech text information to the clients corresponding to the other participants in the multimedia conference, so that those clients display the received speech voice information and speech text information.
  • the participant can both hear the speaker's speech voice information and see the corresponding speech text information, so that, by combining the two, the participant can accurately understand the content of the speaker's speech, which improves the communication effect of the multimedia conference.
  • the present invention also provides a multimedia conference system, including the client shown in FIG. 18 and the multimedia conference server shown in FIGS. 19-20.
  • the client is configured to obtain the speech information of the local participant and send the speech information to the multimedia conference server; and convert the speech information into speech text information, and send the speech text information to the multimedia conference server;
  • the multimedia conference server is configured to send the speech voice information and the speech text information to a client of another participant participating in the multimedia conference;
  • the other participant is a participant other than the local participant among the participants of the multimedia conference.
  • the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine, in descending order of energy, the first preset number of participants as speakers, and send a speech notification message to the clients corresponding to the speakers, where the speech notification message carries the user identification information (ID) of the speaker.
  • the client is configured to receive the speech notification message sent by the multimedia conference server and, when it determines according to the speech notification message that the local participant is a speaker, to obtain the speech voice information of the local participant and send it to the multimedia conference server, and to convert the speech voice information into speech text information and send the speech text information to the multimedia conference server.
  • the present invention further provides a client for implementing a multimedia conference.
  • the client includes a processor 1610 and a transmitter 1620.
  • the processor 1610 is configured to obtain the speaking voice information of the local participant.
  • the transmitter 1620 is configured to send the speech voice information to the multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants participating in the multimedia conference.
  • the other participant is a participant other than the local participant among the participants of the multimedia conference.
  • the participant's client obtains the speech voice information and sends it to the multimedia conference server; the multimedia conference server converts the speech voice information into speech text information and then sends the speech voice information and the corresponding speech text information to the clients corresponding to the other participants participating in the multimedia conference.
  • the participants who participate in the multimedia conference can not only hear the voice information of the speaker's speech, but also can see the corresponding speech text information, can accurately understand the speaker's speech content, and improve the communication effect of the multimedia conference.
  • the method converts the speech information into speech text information by the multimedia conference server, and does not need to integrate the speech recognition engine on each client, thereby reducing the production cost of the client.
  • the present invention further provides a multimedia conference server, as shown in FIG. 22, which includes a processor 1710 and a transmitter 1720.
  • the processor 1710 is configured to obtain the speech information sent by the client, and convert the speech information into speech text information.
  • the transmitter 1720 is configured to send the speech voice information and the speech text information to a client corresponding to another participant, so that the client corresponding to the other participant displays the speech voice information and the speech text information.
  • the other participant is a participant other than the participant who sends the speaking voice information among the participants who participate in the multimedia conference.
  • the multimedia conference server detects the energy of the voice information sent by each participant and, in descending order of energy, determines the first preset number of participants as speakers.
  • the multimedia conference server converts only the speech of the determined speakers into corresponding text information. This avoids converting conference-unrelated sounds into text, which would otherwise display much unrelated text to the participants and distract them.
  • the present invention also provides another multimedia conference system, including the client shown in FIG. 21 and the multimedia conference server shown in FIG.
  • the client is configured to obtain the voice information of the local participant and send the voice message to the multimedia conference server.
  • the multimedia conference server is configured to convert the speech voice information into speech text information, and send the speech speech information and the speech text information corresponding to the speech speech information to a client corresponding to another participant;
  • the other participant is a participant other than the participant who sent the speaking voice information among the participants participating in the multimedia conference.
  • the client corresponding to the other participant is further configured to display the speaking voice information and the speaking text information sent by the multimedia conference server to the user.
  • the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine, in descending order of energy, the first preset number of participants as speakers, and, when the received speech voice information comes from a determined speaker, convert the speech voice information into speech text information.
  • the present invention can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware alone, but in many cases the former is the better implementation.
  • the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes various types of media that can store program codes, such as a read only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Abstract

Disclosed in embodiments of the present invention are a method and device for realizing a multimedia conference. The method comprises: acquiring, by a client, speech voice information from a local conference participant, and converting the speech voice information to speech text information; transmitting the speech voice information and the speech text information to a multimedia conference server, and forwarding, by the multimedia conference server, the same to clients corresponding to other conference participants attending the multimedia conference; and displaying, by the clients corresponding to the other conference participants, the received speech voice information and the speech text information. By utilizing the method for realizing multimedia conference of the present invention, a conference participant can hear speech voice information from a speaker and read corresponding speech text information. The combination of the speech text information and the speech voice information enables the conference participant to accurately understand the content delivered by the speaker, thereby improving the communication efficiency of the multimedia conference.

Description

Method and device for realizing multimedia conference
The present application claims priority to Chinese Patent Application No. 201510255577.1, filed with the Chinese Patent Office on May 19, 2015 and entitled "Method and Apparatus for Implementing a Multimedia Conference", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of multimedia conference technologies, and more particularly, to a method and apparatus for implementing a multimedia conference.
Background
A multimedia conference is a conference conducted over a network that integrates voice, video, and data. Using a broadband access network, a multimedia conference provides users with multimedia services for long-distance transmission of voice, video, data, instant messages, and the like. Users can create a multimedia conference through a unified Web portal.
However, in prior-art multimedia conferences, communication between the conference speaker and the other participants often breaks down. For example, when the speaker's native language differs from that of the other participants, or the speaker speaks with a dialect, the other participants often cannot accurately understand what the speaker means. As another example, if other participants become distracted during a multimedia conference and miss part of the speaker's speech, they cannot accurately understand its content, which greatly reduces the effectiveness of conference communication.
Summary of the Invention
Embodiments of the present invention provide a method and an apparatus for implementing a multimedia conference, to solve the problem in the prior art that participants in a multimedia conference cannot accurately understand the content of the conference speaker's speech.
To solve the above technical problem, the embodiments of the present invention disclose the following technical solutions:
In a first aspect, the present invention provides a method for implementing a multimedia conference, including:
a client acquiring speech voice information of a local participant and sending the speech voice information to a multimedia conference server;
the client converting the speech voice information into speech text information;
the client sending the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to the clients of the other participants of the multimedia conference;
where the other participants are the participants of the multimedia conference other than the local participant.
With reference to the first aspect, in a first possible implementation of the first aspect, the client converting the speech voice information into speech text information includes:
receiving a speech notification message sent by the multimedia conference server, where the speech notification message carries the user identification information (ID) of a speaker, and the speakers are the first preset number of participants determined by the multimedia conference server, in descending order of energy, according to the energy of the voice information sent by the participants participating in the multimedia conference;
determining whether the user ID carried in the speech notification message is the same as the user ID of the local participant;
if the user ID carried in the speech notification message is the same as the user ID of the local participant, converting the collected speech voice information into speech text information by using a speech recognition engine.
With reference to the first aspect, in a second possible implementation of the first aspect, the client acquiring the speech voice information of the local participant includes:
the client determining whether the local participant has speaking permission;
if the local participant has speaking permission, collecting the speech voice information of the local participant by using a voice device.
With reference to the first aspect, in a third possible implementation of the first aspect, the acquiring, by the client, of the speech voice information of the local participant includes:
sending, by the client, a speech request message to the multimedia conference server, where the speech request message carries the user ID of the local participant, so that the multimedia conference server sends the speech request message to the client of the moderator; and
when the client receives a voice device enabling instruction sent by the multimedia conference server, collecting the speech voice information of the local participant by using a voice device, where the voice device enabling instruction is generated by the multimedia conference server after it receives a speech response message that the moderator's client returns in response to the speech request message.
In a second aspect, the present invention provides a method for implementing a multimedia conference, including:
acquiring, by a multimedia conference server, speech voice information sent by a client and speech text information corresponding to the speech voice information, where the speech text information is obtained by the client by converting the acquired speech voice information using a speech recognition engine; and
sending, by the multimedia conference server, the speech voice information and the speech text information to the clients of the other participants, so that the clients of the other participants present the speech voice information and the speech text information;
where the other participants are the participants of the multimedia conference other than the participant who sent the speech voice information and the speech text information.
With reference to the second aspect, in a first possible implementation of the second aspect, the method further includes:
detecting, by the multimedia conference server, the energy of the voice information sent by the clients;
determining, by the multimedia conference server, the first preset number of participants in descending order of the energy as speakers; and
sending, by the multimedia conference server, a speech notification message to the client of each speaker, where the speech notification message carries the user identifier (ID) of the speaker, so that the client of the speaker acquires the speech voice information of the speaker and converts the speech voice information into speech text information.
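The speaker selection described above can be sketched as follows. The text does not specify how energy is measured, so mean squared amplitude over one audio frame is used here purely as an assumption; `frame_energy` and `select_speakers` are hypothetical names.

```python
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame (an assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_speakers(frames: Dict[str, Sequence[float]], preset_number: int) -> List[str]:
    """Rank participants by energy, largest first, and keep the first preset_number."""
    energies = {user_id: frame_energy(frame) for user_id, frame in frames.items()}
    ranked = sorted(energies, key=lambda user_id: energies[user_id], reverse=True)
    return ranked[:preset_number]
```

For instance, with frames for participants A, B, and C and a preset number of 2, the server would send a speech notification message carrying the user ID of each of the two highest-energy participants.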
With reference to the second aspect, in a second possible implementation of the second aspect, the method further includes:
receiving, by the multimedia conference server, a speech request message sent by a client, where the speech request message carries the user ID of the participant corresponding to the client;
sending, by the multimedia conference server, the speech request message to the client of the moderator, so that the moderator's client determines, according to the speech request message, whether the participant who sent the speech request message has speaking permission; and
receiving, by the multimedia conference server, a speech response message sent by the moderator's client, and sending, according to the speech response message, a voice device enabling instruction to the client of the participant who has speaking permission, so that speech voice information of that participant is collected;
where the speech response message is generated when the moderator's client determines that the participant who sent the speech request message has speaking permission.
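The request, response, and enabling-instruction exchange above can be sketched as a small in-memory model. Everything here is hypothetical: the `ConferenceServer` class, the message type strings, and the `moderator_approves` callable, which stands in for the decision made on the moderator's client. No network transport is modeled; sent messages are simply appended to `outbox`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ConferenceServer:
    moderator_id: str
    # Stand-in for the moderator client's permission decision (an assumption).
    moderator_approves: Callable[[str], bool]
    # Each entry is (recipient user ID, message type, requesting user ID).
    outbox: List[Tuple[str, str, str]] = field(default_factory=list)

    def on_speech_request(self, requester_id: str) -> None:
        # Forward the speech request message to the moderator's client.
        self.outbox.append((self.moderator_id, "speech_request", requester_id))
        # The moderator's client returns a speech response only when it
        # determines that the requester has speaking permission.
        if self.moderator_approves(requester_id):
            self.on_speech_response(requester_id)

    def on_speech_response(self, requester_id: str) -> None:
        # Instruct the requester's client to enable its voice device.
        self.outbox.append((requester_id, "voice_device_enable", requester_id))
```

In this sketch a denied request produces no speech response, so no enabling instruction is ever sent to that participant's client.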
In a third aspect, the present invention provides a method for implementing a multimedia conference, including:
acquiring, by a client, speech voice information of a local participant; and
sending, by the client, the speech voice information to a multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants of the multimedia conference;
where the other participants are the participants of the multimedia conference other than the local participant.
In a fourth aspect, the present invention provides a method for implementing a multimedia conference, including:
acquiring, by a multimedia conference server, speech voice information sent by a client;
converting, by the multimedia conference server, the speech voice information into speech text information; and
sending, by the multimedia conference server, the speech voice information and the speech text information to the clients of the other participants, so that the clients of the other participants present the speech voice information and the speech text information;
where the other participants are the participants of the multimedia conference other than the participant who sent the speech voice information.
With reference to the fourth aspect, in a first possible implementation of the fourth aspect, the converting, by the multimedia conference server, of the speech voice information into speech text information includes:
detecting, by the multimedia conference server, the energy of the voice information sent by the clients, and determining the first preset number of participants in descending order of the energy as speakers; and
converting, by using a speech recognition engine, the speech voice information sent by the clients of the determined speakers into speech text information.
In a fifth aspect, the present invention provides an apparatus for implementing a multimedia conference, applied to a client and including:
an acquiring unit, configured to acquire speech voice information of a local participant;
a converting unit, configured to convert the speech voice information into speech text information; and
a sending unit, configured to send the speech voice information and the speech text information to a multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to the clients of the other participants of the multimedia conference;
where the other participants are the participants of the multimedia conference other than the local participant.
With reference to the fifth aspect, in a first possible implementation of the fifth aspect, the acquiring unit includes:
a first determining subunit, configured to determine whether the local participant has speaking permission; and
a first collecting subunit, configured to collect the speech voice information of the local participant by using a voice device when the first determining subunit determines that the local participant has speaking permission.
With reference to the fifth aspect, in a second possible implementation of the fifth aspect, the converting unit includes:
a first receiving subunit, configured to receive a speech notification message sent by the multimedia conference server, where the speech notification message carries the user identifier (ID) of a speaker, and the speakers are the first preset number of participants determined by the multimedia conference server by ranking the participants of the multimedia conference in descending order of the energy of the voice information they send;
a second determining subunit, configured to determine whether the user ID carried in the speech notification message is the same as the user ID of the local participant; and
a second collecting subunit, configured to collect the speech voice information of the local participant by using a voice device when the second determining subunit determines that the user ID carried in the speech notification message is the same as the user ID of the local participant.
With reference to the fifth aspect, in a third possible implementation of the fifth aspect, the acquiring unit specifically includes:
a first sending subunit, configured to send a speech request message to the multimedia conference server, where the speech request message carries the user ID of the local participant, so that the multimedia conference server sends the speech request message to the client of the moderator;
a second receiving subunit, configured to receive a voice device enabling instruction sent by the multimedia conference server; and
a third collecting subunit, configured to collect the speech voice information of the local participant by using a voice device when the second receiving subunit receives the voice device enabling instruction, where the voice device enabling instruction is generated by the multimedia conference server after it receives a speech response message that the moderator's client returns in response to the speech request message.
In a sixth aspect, the present invention provides an apparatus for implementing a multimedia conference, applied to a multimedia conference server and including:
an acquiring unit, configured to acquire speech voice information sent by a client and speech text information corresponding to the speech voice information, where the speech text information is obtained by the client by converting the acquired speech voice information using a speech recognition engine; and
a first sending unit, configured to send the speech voice information and the speech text information to the clients of the other participants, so that the clients of the other participants present the speech voice information and the speech text information;
where the other participants are the participants of the multimedia conference other than the participant who sent the speech voice information and the speech text information.
With reference to the sixth aspect, in a first possible implementation of the sixth aspect, the apparatus further includes:
a detecting unit, configured to detect the energy of the voice information sent by the clients;
a determining unit, configured to determine the first preset number of participants in descending order of the energy as speakers; and
a second sending unit, configured to send a speech notification message to the client of each speaker, where the speech notification message carries the user identifier (ID) of the speaker, so that the client of the speaker acquires the speech voice information of the speaker and converts the speech voice information into speech text information.
With reference to the sixth aspect, in a second possible implementation of the sixth aspect, the apparatus further includes:
a first receiving unit, configured to receive a speech request message sent by a client, where the speech request message carries the user ID of the participant corresponding to the client;
a third sending unit, configured to send the speech request message to the client of the moderator, so that the moderator's client determines, according to the speech request message, whether the participant who sent the speech request message has speaking permission;
a second receiving unit, configured to receive a speech response message sent by the moderator's client, where the speech response message is generated when the moderator's client determines that the participant who sent the speech request message has speaking permission; and
a fourth sending unit, configured to send, according to the speech response message, a voice device enabling instruction to the client of the participant who has speaking permission.
In a seventh aspect, the present invention provides an apparatus for implementing a multimedia conference, applied to a client and including:
an acquiring unit, configured to acquire speech voice information of a local participant; and
a sending unit, configured to send the speech voice information to a multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants of the multimedia conference;
where the other participants are the participants of the multimedia conference other than the local participant.
In an eighth aspect, the present invention provides an apparatus for implementing a multimedia conference, applied to a multimedia conference server and including:
an acquiring unit, configured to acquire speech voice information sent by a client;
a converting unit, configured to convert the speech voice information into speech text information; and
a sending unit, configured to send the speech voice information and the speech text information to the clients of the other participants, so that the clients of the other participants present the speech voice information and the speech text information;
where the other participants are the participants of the multimedia conference other than the participant who sent the speech voice information.
With reference to the eighth aspect, in a first possible implementation of the eighth aspect, the converting unit includes:
a detecting subunit, configured to detect the energy of the voice information sent by the clients and determine the first preset number of participants in descending order of the energy as speakers; and
a converting subunit, configured to convert, by using a speech recognition engine, the speech voice information sent by the determined speakers into speech text information.
In a ninth aspect, a system for implementing a multimedia conference is provided, including a client and a multimedia conference server;
the client is configured to acquire speech voice information of a local participant and send it to the multimedia conference server, and to convert the speech voice information into speech text information and send the speech text information to the multimedia conference server;
the multimedia conference server is configured to send the speech voice information and the speech text information to the clients of the other participants of the multimedia conference;
where the other participants are the participants of the multimedia conference other than the local participant.
With reference to the ninth aspect, in a first possible implementation of the ninth aspect, the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine the first preset number of participants in descending order of the energy as speakers, and send a speech notification message to the client of each speaker, where the speech notification message carries the user identifier (ID) of the speaker;
the client is configured to receive the speech notification message sent by the multimedia conference server and, when it determines according to the speech notification message that the local participant is a speaker, to acquire the speech voice information of the local participant and send it to the multimedia conference server, and to convert the speech voice information into speech text information and send the speech text information to the multimedia conference server.
In a tenth aspect, the present invention further provides a system for implementing a multimedia conference, including a client and a multimedia conference server;
the client is configured to acquire speech voice information of a local participant and send it to the multimedia conference server;
the multimedia conference server is configured to convert the speech voice information into speech text information and send the speech voice information and the corresponding speech text information to the clients of the other participants, where the other participants are the participants of the multimedia conference other than the participant who sent the speech voice information;
the clients of the other participants are further configured to present to the user the speech voice information and the speech text information sent by the multimedia conference server.
With reference to the tenth aspect, in a first possible implementation of the tenth aspect, the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine the first preset number of participants in descending order of the energy as speakers, and, when received speech voice information comes from a determined speaker, convert that speech voice information into speech text information.
As can be seen from the foregoing technical solutions, in the solution for implementing a multimedia conference provided by the embodiments of the present invention, the speaker's client converts the speaker's speech voice information into speech text information, and the multimedia conference server forwards the speech text information to the clients of the participants other than the speaker, so that the speaker's speech is displayed on those clients. This avoids the situation in which participants can receive only the speech voice information and therefore cannot accurately understand what the speaker said, and thus improves the effectiveness of communication in the conference.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a block diagram of a multimedia conference system according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 3 is a flowchart of still another method for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 4 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 5 is a flowchart of yet another method for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 6 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 7 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an apparatus for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an acquiring unit according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a converting unit according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of yet another acquiring unit according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of yet another apparatus for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of still another apparatus for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of an apparatus for implementing a multimedia conference, applied to a client, according to an embodiment of the present invention;
FIG. 17 is a schematic structural diagram of an apparatus for implementing a multimedia conference, applied to a multimedia conference server, according to an embodiment of the present invention;
FIG. 18 is a schematic structural diagram of a client for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 19 is a schematic structural diagram of a multimedia conference server according to an embodiment of the present invention;
FIG. 20 is a schematic structural diagram of another multimedia conference server according to an embodiment of the present invention;
FIG. 21 is a schematic structural diagram of another client for implementing a multimedia conference according to an embodiment of the present invention;
FIG. 22 is a schematic structural diagram of another multimedia conference server according to an embodiment of the present invention.
Detailed Description
The solution for a multimedia conference provided by the embodiments of the present invention addresses the problem described in the background, namely that participants cannot accurately understand what the speaker said, which reduces the effectiveness of conference communication.
To help a person skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The foregoing is the core idea of the present invention. To help a person skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the accompanying drawings.
To help a person skilled in the art better understand the technical solutions in the embodiments of the present invention, and to make the foregoing objects, features, and advantages of the embodiments clearer, the technical solutions in the embodiments are described in further detail below with reference to the accompanying drawings.
FIG. 1 is a block diagram of a multimedia conference system. As shown in FIG. 1, the multimedia conference system includes a plurality of clients 1 and at least one multimedia conference server 2. A client may be a terminal such as a personal computer (PC) or a laptop.
A client acquires a participant's media stream information (for example, voice information) and uploads it to the multimedia conference server 2. The multimedia conference server 2 mixes the media streams sent by the clients and sends the mixed result to each terminal, so that geographically dispersed users can communicate through images, sound, and the like.
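The mixing step mentioned above can be illustrated with a minimal sketch. The text does not say how the server mixes the streams; sample-wise summation of 16-bit PCM samples with clipping is one common approach and is assumed here. The function name `mix_streams` is hypothetical.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_streams(streams: Sequence[Sequence[int]]) -> List[int]:
    """Mix 16-bit PCM frames by summing aligned samples and clipping the result."""
    if not streams:
        return []
    # Mix only as many samples as the shortest frame provides.
    length = min(len(stream) for stream in streams)
    mixed = []
    for i in range(length):
        total = sum(stream[i] for stream in streams)
        # Clip to the 16-bit range so the sum cannot overflow the sample format.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

A production mixer would typically also handle resampling, timing alignment, and gain control, none of which is modeled here.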
FIG. 2 is a flowchart of a method for implementing a multimedia conference according to an embodiment of the present invention. The method is applied to the client shown in FIG. 1. As shown in FIG. 2, the method includes the following steps:
S110: The client acquires the speech voice information of the local participant and sends the speech voice information to the multimedia conference server.
A local participant is a participant located in the same physical location as the client. For example, if participant A uses client a to join the multimedia conference, then for client a, participant A is the corresponding local participant.
The client may use a voice device to acquire the speech voice information of the local participant. The voice device may include voice collection hardware integrated into the client and operating software that controls the voice collection hardware. The voice collection hardware, for example a microphone (MIC), provides functions such as voice collection, voice encoding, and voice decoding. The operating software can query the number and names of the local voice collection hardware devices, and can also enable, disable, or mute the voice collection hardware.
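The role of the operating software described above (querying device count and names, and enabling, disabling, or muting a device) can be sketched as a small state holder. The `VoiceDeviceController` class, the `DeviceState` values, and the device names are all hypothetical; no real audio API is used.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class DeviceState(Enum):
    DISABLED = "disabled"
    ENABLED = "enabled"
    MUTED = "muted"


@dataclass
class VoiceDeviceController:
    """Hypothetical operating software for local voice collection hardware."""
    states: Dict[str, DeviceState] = field(default_factory=dict)

    def register(self, name: str) -> None:
        # Newly discovered hardware starts out disabled.
        self.states[name] = DeviceState.DISABLED

    def query(self) -> List[str]:
        # The number of devices is len(query()); the names are the entries.
        return list(self.states)

    def _set(self, name: str, state: DeviceState) -> None:
        if name not in self.states:
            raise KeyError(f"unknown voice device: {name}")
        self.states[name] = state

    def enable(self, name: str) -> None:
        self._set(name, DeviceState.ENABLED)

    def disable(self, name: str) -> None:
        self._set(name, DeviceState.DISABLED)

    def mute(self, name: str) -> None:
        self._set(name, DeviceState.MUTED)
```

A client could, under these assumptions, call `enable` on receipt of a voice device enabling instruction and `disable` when the participant stops speaking.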
This embodiment is suited to discussion-style conferences, in which every participant may speak, so each client can acquire the speech voice information of its own participant. If the clients acquire speech voice information through voice devices, the voice device of every participant is enabled.
S120: The client converts the speech voice information into speech text information.
The client uses speech recognition technology to convert the acquired speech voice information of the local speaker into speech text information. The speech voice information that a client acquires from its local participant has relatively high energy, so conversion by the speaker's own client yields relatively accurate text. In addition, the clients of the other participants do not need to convert the speaker's speech voice information into text, which saves resources on those clients.
Optionally, the speaker's client may also store the speech text information so that meeting minutes can be generated from it. Likewise, the clients of the other participants of the multimedia conference may store the received speech text information in order to generate meeting minutes from it. The speaker's client may also display the speech text information so that the speaker can review what he or she said.
S130. The client sends the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to the clients of the other participants.
The other participants are all participants in the multimedia conference except the speaker.
The multimedia conference server sends the received speech voice information and speech text information to the clients of the other participants in the multimedia conference. Those clients present the received speech voice information and speech text information, which helps the participants quickly understand what the speaker said.
For example, the participants in the multimedia conference are A, B, C, D, and E. If participant A is the speaker, participants B, C, D, and E are the other participants, and the multimedia conference server sends participant A's speech voice information and speech text information to B, C, D, and E.
The T.120 protocol standard may be integrated in both the client and the multimedia conference server so that speech voice information and speech text information can be sent and received between them. The T.120 standard comprises a series of protocols, including T.120 to T.127. It provides reliable information transmission between clients and between a client and the multimedia conference server, offers a point-to-multipoint data distribution service, and selects the most efficient transmission path for the data.
In the method for implementing a multimedia conference shown in this embodiment, the client obtains the speech voice information of the local participant and converts it into speech text information. The client then sends the speech voice information and the speech text information to the multimedia conference server, which forwards them to the clients of the other participants in the multimedia conference, and those clients present the received speech voice information and speech text information. With this method, a participant can both hear the speaker's speech voice information and see the corresponding speech text information. Combining the two lets the participant understand the speaker accurately, which improves communication in the multimedia conference.
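The forwarding rule in the A to E example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SpeechMessage` class, the `forward_targets` and `relay` helpers, and the field names are hypothetical and do not come from the patent or from the T.120 protocols.

```python
from dataclasses import dataclass


@dataclass
class SpeechMessage:
    # Hypothetical container; the patent does not define a message layout.
    speaker_id: str  # user ID of the speaker
    voice: bytes     # encoded speech voice information
    text: str        # speech text information produced by recognition


def forward_targets(participants, speaker_id):
    """Return every participant except the speaker, preserving order."""
    return [p for p in participants if p != speaker_id]


def relay(message, participants):
    """Server side: map each receiving participant to the message to deliver."""
    targets = forward_targets(participants, message.speaker_id)
    return {p: message for p in targets}


msg = SpeechMessage(speaker_id="A", voice=b"\x00\x01", text="hello")
deliveries = relay(msg, ["A", "B", "C", "D", "E"])
print(sorted(deliveries))  # ['B', 'C', 'D', 'E']
```

The speaker is excluded from the delivery map because the speaker's own client already holds both the voice and the text.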
In one application scenario, for example a discussion-type conference, all participants are allowed to speak. If the voice information of every participant were converted into text, however, a large amount of speech unrelated to the conference would be converted and displayed, which would distract the participants. For this scenario, participants whose voice energy is relatively high can be determined as speakers and their speech voice information converted into speech text information, while the voice content of participants with lower voice energy is ignored.
FIG. 3 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention. This embodiment applies to a scenario with a large number of participants, all of whom may speak. As shown in FIG. 3, the method may include the following steps:
S210. The multimedia conference server detects the energy of the voice information sent by the clients.
Each client participating in the multimedia conference sends the voice information obtained from its participant to the multimedia conference server, and the multimedia conference server detects the energy of the received voice information.
In this embodiment, the energy of the voice information may be detected by a voice conference bridge in the multimedia conference server. The voice conference bridge provides the server-side voice conference venue: it mixes the speakers' voices and sends the mix to each participant.
S220. The multimedia conference server ranks the participants by the energy of their voice information in descending order and determines the first preset number of participants as speakers.
The multimedia conference server detects the energy of the voice information sent by the participants in the multimedia conference, sorts the participants by energy from largest to smallest, and determines the first preset number of participants as speakers. For example, the preset number may be one, in which case the participant whose voice information has the highest energy is determined as the speaker. Alternatively, the preset number may be two, in which case the two participants whose voice information has the highest energy are determined as speakers.
Note that in this scenario, because a participant's voice energy varies over time, the speakers that the multimedia conference server determines from the energy of the voice information may differ from one moment to the next.
S230. The multimedia conference server sends a speech notification message to participants in the multimedia conference. The speech notification message carries the user ID (identification) of the speaker.
The multimedia conference server may broadcast the speech notification message to the clients of all participants in the multimedia conference, in which case each client uses the user ID in the message to determine whether its own participant is a speaker. Alternatively, the server may send the speech notification message one-to-one to the client of the participant identified by the user ID, and that client makes the determination based on the user ID.
A participant's client receives the speech notification message from the multimedia conference server. Because the message contains a user ID, the client can compare that user ID with its own user ID to determine whether its participant is a speaker.
S240. When the client determines that the user ID carried in the speech notification message is the same as its own user ID, the client determines that the local participant is a speaker.
S250. The speaker's client obtains the speaker's speech voice information and sends it to the multimedia conference server.
S260. The speaker's client converts the speech voice information into speech text information.
S270. The speaker's client sends the speech text information to the multimedia conference server.
S280. The multimedia conference server sends the speech voice information and the speech text information to the clients of the other participants.
S290. The clients of the other participants present the speech voice information and the speech text information.
In the method for implementing a multimedia conference provided by this embodiment, the multimedia conference server detects the energy of the voice information of each participant, ranks the participants by energy in descending order, and determines the first preset number of participants as speakers. Only the speech of that preset number of highest-energy participants is converted into text. This prevents a large amount of speech unrelated to the conference from being converted into text and displayed, which would distract the participants.
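The selection in S220 and the ID comparison in S240 can be illustrated with a short Python sketch. The patent does not say how energy is measured, so root-mean-square (RMS) energy over one frame of PCM samples is an assumption here, and the function names and sample values are hypothetical.

```python
import math


def rms_energy(samples):
    """RMS energy of one frame of PCM samples (assumed energy measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_speakers(frames_by_user, preset_count):
    """S220: rank users by energy, largest first, and keep the first preset_count."""
    ranked = sorted(
        frames_by_user,
        key=lambda uid: rms_energy(frames_by_user[uid]),
        reverse=True,
    )
    return ranked[:preset_count]


def is_local_speaker(notified_ids, own_id):
    """S240: a client compares the notified user IDs with its own user ID."""
    return own_id in notified_ids


frames = {"A": [900, -800, 850], "B": [10, -12, 9], "C": [300, -280, 310]}
speakers = select_speakers(frames, preset_count=2)
print(speakers)                         # ['A', 'C']
print(is_local_speaker(speakers, "B"))  # False
```

Because the ranking is recomputed for each frame, the selected speakers can change from moment to moment, which matches the note following S220.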
In another application scenario, only the speech of the moderator and the presenter needs to be converted into text, and the speech of the other participants is ignored.
FIG. 4 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention. In this embodiment, only the speech of speakers who have speaking permission is converted into text. As shown in FIG. 4, the method includes the following steps:
S310. The client determines whether the local participant has speaking permission. If the local participant has speaking permission, S320 is performed; otherwise, the process ends.
In a conference that has a moderator and a fixed presenter, the presenter and the moderator usually have speaking permission. Determining whether a participant has speaking permission may include determining whether the participant's identity attribute carries presenter permission or moderator permission.
S320. The client obtains the speech voice information of the local participant and sends it to the multimedia conference server.
S330. The client converts the speech voice information into speech text information.
The client may have a built-in speech recognition engine, which it uses to convert the local participant's speech voice information into speech text information.
S340. The client sends the speech text information to the multimedia conference server.
The client may send the speech voice information to the multimedia conference server immediately after obtaining it from the local participant, so that the server can forward the speaker's speech voice information to the other participants promptly and the voice is delivered in real time. Alternatively, if converting the speech voice information into speech text information takes very little time, typically on the order of milliseconds, the client may send the speech voice information and the speech text information to the multimedia conference server together, so that the clients of the other participants play the speech voice information and display the speech text information in sync.
S350. The multimedia conference server sends the speech voice information and the speech text information to the clients of the other participants.
S360. The clients of the other participants present the speech voice information and the speech text information.
The method for implementing a multimedia conference provided by this embodiment converts only the speech voice information of participants who have speaking permission into speech text information, rather than converting the speech of all participants. This prevents speech unrelated to the conference from being converted into text and forwarded to the other participants, whose clients would otherwise display too much unimportant text and distract them.
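A minimal Python sketch of the permission gate in S310 followed by S320 to S340 is given below. The role names, the dictionary layout, and the `fake_recognize` stand-in are assumptions made for illustration; a real client would call its built-in speech recognition engine instead.

```python
SPEAKING_ROLES = {"moderator", "presenter"}  # assumed role names


def has_speaking_permission(role):
    """S310: only the moderator and the presenter have speaking permission."""
    return role in SPEAKING_ROLES


def handle_local_speech(role, voice, recognize):
    """Run S320 to S340 for permitted roles; otherwise end the process."""
    if not has_speaking_permission(role):
        return None
    return {"voice": voice, "text": recognize(voice)}


def fake_recognize(voice):
    # Hypothetical stand-in for a real speech recognition engine.
    return f"<{len(voice)} bytes of speech>"


print(handle_local_speech("presenter", b"\x01\x02\x03", fake_recognize))
print(handle_local_speech("attendee", b"\x01\x02\x03", fake_recognize))
```

Passing the recognizer in as a parameter keeps the permission logic independent of any particular recognition engine.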
In yet another application scenario, only the moderator and the presenter can speak. The other participants cannot speak: their voice devices are turned off, and they cannot turn the devices on themselves. When such a participant needs to speak, the participant can ask the moderator to turn on his or her voice device.
FIG. 5 is a flowchart of yet another method for implementing a multimedia conference according to an embodiment of the present invention. The method applies to a scenario in which the moderator designates the speaker, and includes the following steps:
S410. The client sends a speech request message to the multimedia conference server. The speech request message carries the user ID of the participant corresponding to the client.
When a participant other than the moderator and the presenter needs to speak, that participant's client sends a speech request message to the multimedia conference server. The speech request message carries the participant's user ID.
S420. The multimedia conference server forwards the speech request message to the moderator's client.
S430. When the moderator's client determines, based on the speech request message, that the participant is allowed to speak, it sends a speech response message to the multimedia conference server.
After receiving the speech request message, the moderator's client uses the user ID carried in the message to determine whether the participant is allowed to speak. If the participant is allowed to speak, the moderator's client generates a speech response message and sends it to the multimedia conference server. The speech response message may also carry the participant's user ID so that the multimedia conference server can identify the participant.
The moderator's client may decide whether to allow a participant to speak based on the participant's preset identity attribute.
S440. Based on the speech response message, the multimedia conference server generates a voice device turn-on instruction and sends it to the speaker's client.
The multimedia conference server generates the voice device turn-on instruction based on the received speech response message. The instruction turns on the voice device of the participant whom the moderator has allowed to speak.
S450. When the speaker's client receives the voice device turn-on instruction, it uses the voice device to obtain the speaker's speech voice information and sends that information to the multimedia conference server.
S460. The speaker's client converts the speech voice information into speech text information.
S470. The speaker's client sends the speech text information to the multimedia conference server.
S480. The multimedia conference server sends the speech voice information and the speech text information to the clients of the participants other than the speaker.
S490. The clients of the other participants present the speech voice information and the speech text information.
In the method for implementing a multimedia conference provided by this embodiment, when a participant other than the moderator or the presenter needs to speak, the participant's client sends a speech request message to the moderator's client through the multimedia conference server. The moderator decides, based on the speech request message, whether to allow the participant to speak. If so, the moderator's client sends the multimedia conference server a speech response message allowing the participant to speak, and the server generates a voice device turn-on instruction that turns on the participant's voice device. The participant's voice device then obtains the participant's speech voice information, and the participant's client converts it into speech text information. This method suits formal or high-level conferences and broadens the range of scenarios in which the multimedia conference method can be used.
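The request and approval exchange in S410 to S440 can be sketched in Python as follows. The `ConferenceServer` class, its attribute names, and the way the moderator's decision is modeled as a callback are all hypothetical; the patent only specifies which messages are exchanged, not how they are represented.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class ConferenceServer:
    # Hypothetical model: the moderator's client is reduced to a decision callback.
    moderator_policy: Callable[[str], bool]
    enabled_devices: Set[str] = field(default_factory=set)

    def on_speech_request(self, user_id: str) -> bool:
        """Handle a speech request carrying the requester's user ID (S410)."""
        # S420/S430: forward the request to the moderator, who decides.
        allowed = self.moderator_policy(user_id)
        if allowed:
            # S440: issue a voice device turn-on instruction for this user.
            self.enabled_devices.add(user_id)
        return allowed


# Assumed policy: the moderator allows only participant "C" to speak.
server = ConferenceServer(moderator_policy=lambda uid: uid == "C")
print(server.on_speech_request("C"))  # True
print(server.on_speech_request("D"))  # False
print(server.enabled_devices)         # {'C'}
```

A rejected request leaves the requester's voice device off, so no voice or text from that participant reaches the other clients.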
FIG. 6 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention. As shown in FIG. 6, the method includes the following steps:
S510. The client obtains the speech voice information of the local participant and sends it to the multimedia conference server.
The client uses a voice device to capture the participant's speech voice information.
S520. The multimedia conference server converts the speech voice information into speech text information.
Before mixing the voice information sent by the participants, the multimedia conference server uses a speech recognition engine to convert the received speech voice information into speech text information.
In one embodiment of the present invention, all participants in the multimedia conference may speak freely, and the client of any participant can send the speech voice information obtained from its local participant to the multimedia conference server. Accordingly, the multimedia conference server can convert the speech voice information of any participant into speech text information.
In another embodiment of the present invention, only the moderator and the presenter may speak, and only their clients send the obtained speech voice information to the multimedia conference server. The multimedia conference server converts the received speech voice information into speech text information.
S530. The multimedia conference server sends the speech voice information and the corresponding speech text information to the clients of the other participants in the multimedia conference. The other participants are the participants in the multimedia conference other than the local participant.
S540. The clients of the other participants present the speech voice information and the corresponding speech text information.
In the method for implementing a multimedia conference provided by this embodiment, a participant's client obtains the speech voice information and sends it to the multimedia conference server. The server converts the speech voice information into speech text information and then sends both to the clients of the other participants in the multimedia conference. The participants can therefore hear the speaker's speech voice information and see the corresponding speech text information, which lets them understand the speaker accurately and improves communication in the multimedia conference. Because the multimedia conference server performs the conversion, no speech recognition engine needs to be integrated in each client, which lowers the production cost of the client.
FIG. 7 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention. In this embodiment, the preset number of participants whose voice information has the highest energy are determined as speakers, and the speakers' speech voice information is converted into speech text information. As shown in FIG. 7, the method may include the following steps:
S610. The multimedia conference server detects the energy of the voice information sent by the clients.
S620. The multimedia conference server ranks the participants by the energy of their voice information in descending order and determines the first preset number of participants as speakers.
S630. The client obtains the speech voice information of the local participant and sends it to the multimedia conference server.
S640. The multimedia conference server converts the speech voice information sent by the clients of the determined speakers into speech text information.
S650. The multimedia conference server sends the speech voice information sent by a speaker's client, together with the corresponding speech text information, to the clients of the other participants in the multimedia conference.
The other participants are the participants in the multimedia conference other than the local participant.
S660. The clients of the other participants present the received speech voice information and the corresponding speech text information.
In the method for implementing a multimedia conference provided by this embodiment, the multimedia conference server detects the energy of the voice information of each participant, ranks the participants by energy in descending order, and determines the first preset number of participants as speakers. The server converts only the speech of the determined speakers into text. This prevents a large amount of speech unrelated to the conference from being converted into text and displayed, which would distract the participants.
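Step S640 can be illustrated with a small Python sketch in which the server converts voice only for the determined speakers. The function name `convert_selected`, the sample payloads, and the `fake_recognize` stand-in are hypothetical; a real server would pass the voice data to its speech recognition engine.

```python
def convert_selected(voice_by_user, speakers, recognize):
    """S640: convert only the voice sent by clients of the determined speakers."""
    return {
        uid: recognize(voice)
        for uid, voice in voice_by_user.items()
        if uid in speakers
    }


def fake_recognize(voice):
    # Hypothetical stand-in for a server-side speech recognition engine.
    return voice.decode("ascii").upper()


received = {"A": b"budget update", "B": b"cough", "C": b"next steps"}
texts = convert_selected(received, speakers={"A", "C"}, recognize=fake_recognize)
print(texts)  # {'A': 'BUDGET UPDATE', 'C': 'NEXT STEPS'}
```

Participant B's audio is never passed to the recognizer, so it produces no text for the other clients to display.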
FIG. 8 is a schematic structural diagram of an apparatus for implementing a multimedia conference according to an embodiment of the present invention. As shown in FIG. 8, the apparatus is used in a client and includes an obtaining unit 110, a converting unit 120, and a sending unit 130.
The obtaining unit 110 is configured to obtain the speech voice information of the local participant.
A local participant is a participant located in the same physical space as the client. For example, if participant A uses client a to join the multimedia conference, then participant A is the local participant of client a.
The obtaining unit 110 may use a voice device to obtain the speech voice information of the local participant. The voice device may include voice-capture hardware integrated in the client and operating software that controls that hardware. The voice-capture hardware implements functions such as voice capture, voice encoding, and voice decoding. The operating software can query the number and names of the local voice-capture hardware devices and can turn the hardware on, turn it off, or mute it.
The apparatus for implementing a multimedia conference in this embodiment applies to a discussion-type conference scenario in which every participant may speak, so each client can obtain the speech voice information of its own participant. If the clients obtain speech voice information through voice devices, the voice device of every participant is turned on.
The converting unit 120 is configured to convert the speech voice information into speech text information.
The converting unit 120 uses speech recognition to convert the obtained speech voice information of the local speaker into speech text information.
Because the speech voice information that a client captures from its local participant has relatively high energy, conversion performed by the speaker's own client is more accurate. In addition, the clients of the other participants do not need to convert the speaker's speech voice information into text, which saves their resources.
The sending unit 130 is configured to send the speech voice information and the speech text information to the multimedia conference server, so that the multimedia conference server sends them to the clients of the other participants.
The other participants are all participants in the multimedia conference except the speaker.
The client sends the speech voice information and the speech text information to the multimedia conference server, which forwards them to the clients of the other participants in the multimedia conference. Those clients present the received speech voice information and speech text information, which helps the participants quickly understand what the speaker said.
For example, the participants in the multimedia conference are A, B, C, D, and E. If participant A is the speaker, participants B, C, D, and E are the other participants, and the multimedia conference server sends participant A's speech voice information and speech text information to B, C, D, and E.
The T.120 protocol standard may be integrated in both the client and the multimedia conference server so that speech voice information and speech text information can be sent and received between them. The T.120 standard comprises a series of protocols, including T.120 to T.127. It provides reliable information transmission between clients and between a client and the multimedia conference server, offers a point-to-multipoint data distribution service, and selects the most efficient transmission path for the data.
In the apparatus for implementing a multimedia conference shown in this embodiment, the obtaining unit acquires the speech voice information of the local participant, and the conversion unit converts it into speech text information. The sending unit then sends the speech voice information and the speech text information to the multimedia conference server, which forwards them to the clients of the other participants in the multimedia conference; those clients present the received speech voice information and speech text information. With the apparatus provided by the present invention, a participant can both hear the speaker's voice and see the corresponding text, and can combine the two to understand the speaker accurately, which improves communication in the multimedia conference.
FIG. 9 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention. On the basis of the embodiment shown in FIG. 8, the apparatus may further include a display unit 140 and a storage unit 150.
The display unit 140 is configured to display the speech text information.
The storage unit 150 is configured to store the speech text information.
Optionally, with the storage unit 150 added, the speaker's client can also store the speech text information so that meeting minutes can be generated from it. Likewise, the clients of the other participants in the multimedia conference can store the speech text information they receive and generate meeting minutes from it. In addition, the speaker's client can display the speech text information so that the speaker can review what he or she said.
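As a rough sketch of how stored speech text could be turned into minutes, the snippet below keeps entries in arrival order and joins them into one document. The class name `TranscriptStore` and the "speaker: text" line format are assumptions made for illustration; the source does not specify a minutes format.

```python
class TranscriptStore:
    """In-memory stand-in for storage unit 150 (hypothetical layout)."""

    def __init__(self):
        self._entries = []  # (speaker_id, text) tuples in arrival order

    def add(self, speaker_id, text):
        """Store one piece of speech text information."""
        self._entries.append((speaker_id, text))

    def minutes(self):
        """Render the stored text as simple line-per-utterance minutes."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self._entries)


if __name__ == "__main__":
    store = TranscriptStore()
    store.add("A", "Let us begin.")
    store.add("B", "I have one question.")
    print(store.minutes())
```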
In one application scenario, only what the moderator and the presenter say needs to be converted into text, and what the other participants say is ignored.
FIG. 10 is a schematic structural diagram of an obtaining unit 110 according to an embodiment of the present invention. This obtaining unit 110 is suited to the scenario in which only the speech of the moderator or the presenter needs to be converted into text and the speech of other participants is ignored. As shown in FIG. 10, the obtaining unit 110 may include a first determining subunit 1101 and a first collecting subunit 1102:
The first determining subunit 1101 is configured to determine, when the participant corresponding to the local client needs to speak, whether that participant has speaking permission.
In a scenario where the conference has a moderator and fixed presenters, usually only the presenters and the moderator have speaking permission. Determining whether a participant has speaking permission may include determining whether the participant's identity attribute carries presenter permission or moderator permission.
The first collecting subunit 1102 is configured to collect speech voice information with a voice device when the first determining subunit 1101 determines that the local participant has speaking permission, that is, presenter permission or moderator permission.
In the apparatus for implementing a multimedia conference provided by this embodiment, only the speech voice information of participants who have speaking permission is converted into speech text information, rather than the speech of all participants. This prevents conference-unrelated voice content uttered by participants from being converted into text and forwarded to other participants, and so prevents their clients from displaying a large amount of unimportant text that would distract them.
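The permission check performed by subunits 1101 and 1102 might look like the sketch below. The `Role` enum, the `SPEAKING_ROLES` set, and the `capture` callback are all illustrative assumptions; the source only states that the identity attribute is checked for presenter or moderator permission before voice is collected.

```python
from enum import Enum


class Role(Enum):
    MODERATOR = "moderator"
    PRESENTER = "presenter"
    ATTENDEE = "attendee"


# Assumption: only these identity attributes carry speaking permission.
SPEAKING_ROLES = {Role.MODERATOR, Role.PRESENTER}


def has_speaking_permission(role):
    """First determining subunit: does the identity attribute allow speaking?"""
    return role in SPEAKING_ROLES


def collect_if_permitted(role, capture):
    """First collecting subunit: call capture() only for permitted participants.

    `capture` is a hypothetical zero-argument callable standing in for the
    voice device; None is returned when the participant may not speak.
    """
    if has_speaking_permission(role):
        return capture()
    return None


if __name__ == "__main__":
    print(collect_if_permitted(Role.PRESENTER, lambda: b"pcm-bytes"))
    print(collect_if_permitted(Role.ATTENDEE, lambda: b"pcm-bytes"))
```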
In another application scenario, such as a discussion-style conference, all participants are allowed to speak. However, if the voice information of every participant were converted into text, a large amount of conference-unrelated speech would be converted and displayed to the participants, distracting them. For this scenario, the participants with higher voice energy can be determined as speakers and their speech voice information converted into speech text information, while the voice content of participants with lower voice energy is ignored.
FIG. 11 is a schematic structural diagram of a conversion unit 120 according to an embodiment of the present invention. This conversion unit 120 is suited to a scenario with many participants, all of whom can speak. As shown in FIG. 11, the conversion unit 120 may include a first receiving subunit 1201, a second determining subunit 1202, and a second collecting subunit 1203:
The first receiving subunit 1201 is configured to receive a speech notification message sent by the multimedia conference server. The speech notification message carries the user identifier (ID) of a speaker, where the speakers are the top preset number of participants determined by the multimedia conference server by ranking the energy of the voice information sent by the conference participants from largest to smallest. A participant's client can therefore compare that user ID with its own user ID to determine whether its participant is a speaker.
The second determining subunit 1202 is configured to determine whether the user ID carried in the speech notification message is the same as the user ID of the local participant.
The second collecting subunit 1203 is configured to collect the local participant's speech voice information with a voice device when the second determining subunit 1202 determines that the user ID carried in the speech notification message is the same as the user ID of the local participant.
In this embodiment, the first receiving subunit of the conversion unit 120 receives the speech notification message sent by the multimedia conference server. Because the message carries the speaker's user ID, and the speakers are the top preset number of participants ranked by the server by the energy of their voice information, the client converts only the speech of the preset number of participants with the highest energy into text. This avoids converting a large amount of conference-unrelated speech into text and displaying it to participants, which would distract them.
In yet another application scenario, only the moderator and the presenter can speak. The other participants cannot speak: their voice devices are turned off, and they cannot turn the devices on themselves. When such a participant needs to speak, the participant can ask the moderator to turn on his or her voice device.
FIG. 12 is a schematic structural diagram of yet another obtaining unit 110 according to an embodiment of the present invention. This obtaining unit 110 applies to the scenario in which the moderator designates the speaker. As shown in FIG. 12, the obtaining unit 110 includes a first sending subunit 1103, a second receiving subunit 1104, and a third collecting subunit 1105.
The first sending subunit 1103 is configured to send a speech request message to the multimedia conference server, where the speech request message carries the user ID of the local participant, so that the multimedia conference server sends the speech request message to the moderator.
When a participant other than the moderator and the presenter needs to speak, that participant's client sends a speech request message to the multimedia conference server. The speech request message carries the participant's user ID.
The second receiving subunit 1104 is configured to receive a voice device enable instruction sent by the multimedia conference server.
The voice device enable instruction is generated by the multimedia conference server after it receives the speech response message that the moderator's client returns in reply to the speech request message. Specifically, after receiving the speech request message, the moderator's client determines, based on the user ID carried in the message, whether to allow the participant to speak. If the participant is allowed to speak, the moderator's client generates a speech response message and sends it to the multimedia conference server. The speech response message may also carry the participant's user ID so that the multimedia conference server can identify the participant.
The moderator's client may decide whether to allow a participant to speak based on the participant's preset identity attribute.
The third collecting subunit 1105 is configured to collect the local participant's speech voice information with the voice device when the second receiving subunit 1104 receives the voice device enable instruction.
With the obtaining unit provided by this embodiment, when a participant other than the moderator or the presenter needs to speak, the speech request message is forwarded to the moderator's client through the multimedia conference server, and the moderator decides from the speech request message whether to allow the participant to speak. If the participant is allowed to speak, the moderator's client sends the multimedia conference server a speech response message permitting it, so that the server generates a voice device enable instruction from the speech response message and turns on the participant's voice device. The participant's voice device then captures the participant's speech voice information, and the participant's client converts it into speech text information. The apparatus suits formal or higher-level conferences and broadens the applicability of the multimedia conference method.
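The decision made on the moderator's client can be sketched as below. The message dictionaries, the `identity_attributes` lookup table, and the function name are hypothetical; the source states only that the request and the response carry the participant's user ID and that the decision may use preset identity attributes.

```python
def handle_speech_request(request, identity_attributes, allowed_attributes):
    """Moderator-side sketch: return a speech response message, or None.

    request:             {"type": "speech_request", "user_id": ...} (assumed shape)
    identity_attributes: mapping of user ID to a preset identity attribute
    allowed_attributes:  attributes the moderator has decided may speak
    """
    user_id = request["user_id"]
    if identity_attributes.get(user_id) in allowed_attributes:
        # The response echoes the user ID so the server can identify the participant.
        return {"type": "speech_response", "user_id": user_id}
    return None  # request declined, no response message is produced


if __name__ == "__main__":
    attrs = {"u1": "expert", "u2": "observer"}
    print(handle_speech_request({"type": "speech_request", "user_id": "u1"}, attrs, {"expert"}))
    print(handle_speech_request({"type": "speech_request", "user_id": "u2"}, attrs, {"expert"}))
```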
FIG. 13 is a schematic structural diagram of yet another apparatus for implementing a multimedia conference according to an embodiment of the present invention. The apparatus is used in a multimedia conference server and, as shown in FIG. 13, includes an obtaining unit 210 and a first sending unit 220.
The obtaining unit 210 is configured to obtain the speech voice information and the speech text information sent by a client.
The first sending unit 220 is configured to send the speech voice information and the speech text information to the clients of the other participants, so that those clients present the speech voice information and the speech text information. The other participants are the participants in the multimedia conference other than the participant who sent the speech voice information and the speech text information.
The multimedia conference server sends the received speech voice information and speech text information to the clients of the other participants in this multimedia conference. Those clients present the received speech voice information and speech text information, which helps the participants quickly understand what the speaker said.
With the server-side apparatus for implementing a multimedia conference shown in this embodiment, the client acquires the local participant's speech voice information and sends it to the multimedia conference server. The multimedia conference server then forwards the speech voice information and the speech text information to the clients of the other participants in the multimedia conference, so that those clients present them. With the apparatus provided by the present invention, a participant can both hear the speaker's voice and see the corresponding text, and can combine the two to understand the speaker accurately, which improves communication in the multimedia conference.
In one application scenario, such as a discussion-style conference, all participants are allowed to speak. However, if the multimedia conference server sent the voice information and text information of every participant to the other participants, a large amount of conference-unrelated speech would be converted into text and displayed to the participants, distracting them. For this scenario, the participants with higher voice energy can be determined as speakers and their speech voice information converted into speech text information, while the voice content of participants with lower voice energy is ignored.
FIG. 14 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention. This embodiment is suited to a scenario with many participants, all of whom can speak. On the basis of the embodiment shown in FIG. 13, the apparatus may further include a detecting unit 230, a determining unit 240, and a second sending unit 250.
The detecting unit 230 is configured to detect the energy of the voice information sent by clients.
The multimedia conference server receives each participant's voice information as captured by that participant's client, and detects the energy of the received voice information.
In this embodiment, the energy of the voice information may be detected by a voice conference bridge in the multimedia conference server. The voice conference bridge provides the server-side voice conference venue, mixing the speakers' voices and sending the mix to each participant.
The determining unit 240 is configured to determine, in descending order of energy, the top preset number of participants as speakers.
The multimedia conference server detects the energy of the voice information sent by the conference participants, sorts the participants by energy from largest to smallest, and takes the preset number of participants at the top as speakers. For example, the preset number may be one, in which case the participant whose voice information has the highest energy is determined as the speaker; or it may be two, in which case the two participants whose voice information has the highest energy are determined as speakers.
Note that in this scenario, because the energy of a participant's voice differs from moment to moment, the speakers that the multimedia conference server determines from voice energy may differ over time.
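The energy-based selection described above can be sketched as follows. The source does not define how energy is measured, so mean squared sample amplitude is used here purely as an assumption, and the names `frame_energy` and `select_speakers` are illustrative.

```python
def frame_energy(samples):
    """Mean squared amplitude of one audio frame (an assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_speakers(frames_by_user, preset_count):
    """Rank participants by frame energy, largest first, and keep the top N.

    frames_by_user maps a user ID to that user's most recent frame of samples.
    """
    ranked = sorted(
        frames_by_user,
        key=lambda user_id: frame_energy(frames_by_user[user_id]),
        reverse=True,
    )
    return ranked[:preset_count]


if __name__ == "__main__":
    frames = {"A": [100, -100], "B": [10, -10], "C": [50, 50]}
    print(select_speakers(frames, 1))
    print(select_speakers(frames, 2))
```

Because the ranking is recomputed per frame, the selected speakers can change from one moment to the next, which matches the note above.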
The second sending unit 250 is configured to send a speech notification message to the speaker's client, where the speech notification message carries the speaker's user ID, so that the speaker's client acquires the speaker's speech voice information and converts it into speech text information.
The multimedia conference server may broadcast the speech notification message to the clients of all participants in the multimedia conference, and each client determines from the user ID in the message whether its participant is a speaker. Alternatively, the server may send the speech notification message one-to-one to the client of the participant that the user ID identifies, and that client determines from the user ID whether it is the speaker.
A participant's client receives the speech notification message from the multimedia conference server. Because the message contains a user ID, the client can compare that user ID with its own to determine whether its participant is a speaker.
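The two delivery options and the client-side comparison can be illustrated with the short sketch below. The message shape and both function names are assumptions; the source specifies only that the notification carries the speaker's user ID and may be broadcast or sent one-to-one.

```python
def notification_targets(all_user_ids, speaker_id, broadcast):
    """Server side: which clients receive the speech notification message.

    With broadcast=True every client receives it; otherwise only the client
    whose user ID matches the speaker does.
    """
    if broadcast:
        return list(all_user_ids)
    return [uid for uid in all_user_ids if uid == speaker_id]


def is_local_speaker(notification, local_user_id):
    """Client side: compare the carried user ID with the client's own."""
    return notification["user_id"] == local_user_id


if __name__ == "__main__":
    note = {"type": "speech_notification", "user_id": "B"}
    for uid in notification_targets(["A", "B", "C"], "B", broadcast=True):
        print(uid, is_local_speaker(note, uid))
```

In both modes the client still performs the ID comparison, so the same client code works regardless of how the server delivers the message.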
With the apparatus for implementing a multimedia conference provided by this embodiment, the multimedia conference server detects the energy of the voice information from each participant and, in descending order of energy, determines the top preset number of participants as speakers, so that only the speech of the preset number of participants with the highest energy is converted into text. This avoids converting conference-unrelated speech produced at many clients into text and displaying it to participants, which would distract them.
FIG. 15 is a schematic structural diagram of yet another apparatus for implementing a multimedia conference according to an embodiment of the present invention. This apparatus applies to the scenario in which the moderator designates the speaker. On the basis of the embodiment shown in FIG. 13, the apparatus may further include a first receiving unit 260, a third sending unit 270, a second receiving unit 280, and a fourth sending unit 290.
The first receiving unit 260 is configured to receive a speech request message sent by a client, where the speech request message carries the user ID of the participant corresponding to that client.
The third sending unit 270 is configured to send the speech request message to the moderator's client, so that the moderator's client determines from the speech request message whether the participant who sent it has speaking permission.
The second receiving unit 280 is configured to receive the speech response message sent by the moderator's client.
After receiving the speech request message, the moderator's client determines, based on the user ID carried in the message, whether to allow the participant to speak. If the participant is allowed to speak, the moderator's client generates a speech response message, and the multimedia conference server receives the speech response message for that participant. The speech response message may also carry the participant's user ID so that the multimedia conference server can identify the participant.
The moderator's client may decide whether to allow a participant to speak based on the participant's preset identity attribute. For example, when the multimedia conference is set up, the moderator can decide from each participant's role in the conference whether that participant may speak; the conference presenter, for instance, is allowed to speak.
The fourth sending unit 290 is configured to send a voice device enable instruction to the client of the participant who has speaking permission, where the speech response message is generated when the moderator's client determines that the participant who sent the speech request message has speaking permission.
The multimedia conference server generates the voice device enable instruction from the received speech response message. The instruction turns on the voice device of the participant whom the moderator has allowed to speak.
With the apparatus for implementing a multimedia conference provided by this embodiment, when a participant other than the moderator or the presenter needs to speak, the multimedia conference server forwards that participant's speech request message to the moderator's client, and the moderator decides from the speech request message whether to allow the participant to speak. If the participant is allowed to speak, the multimedia conference server receives from the moderator's client a speech response message permitting it, generates a voice device enable instruction from that message, and turns on the participant's voice device. Once the device is on, it captures the participant's speech voice information, and the participant's client converts it into speech text information. This approach suits formal or higher-level conferences and broadens the applicability of the multimedia conference method.
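The server's relay role in this flow can be sketched as a small class. Everything named here (`ConferenceServer`, the `outbox` list standing in for network sends, the message dictionaries) is an assumption made for illustration; the source describes only the order of messages and that they carry the participant's user ID.

```python
class ConferenceServer:
    """Sketch of units 260 to 290: relay requests, turn responses into instructions."""

    def __init__(self, moderator_id):
        self.moderator_id = moderator_id
        self.outbox = []  # (recipient_id, message) pairs, a stand-in for real sends

    def on_speech_request(self, request):
        """Units 260 and 270: forward the request unchanged to the moderator's client."""
        self.outbox.append((self.moderator_id, request))

    def on_speech_response(self, response):
        """Units 280 and 290: generate an enable instruction for the approved participant."""
        user_id = response["user_id"]
        instruction = {"type": "enable_voice_device", "user_id": user_id}
        self.outbox.append((user_id, instruction))


if __name__ == "__main__":
    server = ConferenceServer(moderator_id="host")
    server.on_speech_request({"type": "speech_request", "user_id": "u7"})
    server.on_speech_response({"type": "speech_response", "user_id": "u7"})
    for recipient, message in server.outbox:
        print(recipient, message["type"])
```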
Corresponding to the method embodiments for implementing a multimedia conference shown in FIG. 6 and FIG. 7, the present invention further provides corresponding apparatus embodiments.
FIG. 16 is a schematic structural diagram of a client-side apparatus for implementing a multimedia conference according to an embodiment of the present invention. The apparatus includes an obtaining unit 310 and a sending unit 320.
The obtaining unit 310 is configured to acquire the speech voice information of the local participant.
The sending unit 320 is configured to send the speech voice information to the multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference.
The other participants are the participants in the multimedia conference other than the local participant.
With the apparatus for implementing a multimedia conference provided by this embodiment, a participant's client acquires the speech voice information and sends it to the multimedia conference server. The server converts the speech voice information into speech text information and then sends both to the clients of the other participants in the multimedia conference. Participants can thus hear the speaker's voice and see the corresponding text, understand the speaker accurately, and communicate more effectively. Because the multimedia conference server performs the conversion, no speech recognition engine needs to be integrated on each client, which lowers the production cost of the client.
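A minimal sketch of this server-side variant follows. The `recognize` parameter is a hypothetical callable standing in for whatever speech recognition engine the server integrates; the source names no specific engine or API, and the return layout is likewise an assumption.

```python
def handle_voice(sender_id, voice, participants, recognize):
    """Convert voice to text on the server, then address both to the other participants.

    recognize: any callable taking the voice payload and returning text
               (a placeholder for the server's speech recognition engine).
    Returns a mapping of recipient ID to (voice, text).
    """
    text = recognize(voice)
    return {p: (voice, text) for p in participants if p != sender_id}


if __name__ == "__main__":
    # A fake recognizer so the sketch runs without any speech library.
    fake_recognize = lambda payload: payload.decode("utf-8")
    deliveries = handle_voice("A", b"hello everyone", ["A", "B", "C"], fake_recognize)
    for recipient, (voice, text) in deliveries.items():
        print(recipient, text)
```

Keeping the recognizer behind a single callable reflects the stated benefit of this variant: clients only capture and send audio, and the recognition engine lives in one place.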
FIG. 17 is a schematic structural diagram of a server-side apparatus for implementing a multimedia conference according to an embodiment of the present invention. The apparatus includes an obtaining unit 410, a conversion unit 420, and a sending unit 430.
The obtaining unit 410 is configured to obtain the speech voice information sent by a client.
The conversion unit 420 is configured to convert the speech voice information into speech text information.
In one embodiment of the present invention, the multimedia conference server determines, based on the energy of the voice information sent by the participants, the preset number of participants with the highest energy as speakers, and converts the speech voice information received from those speakers into speech text information. The conversion unit 420 may include a detecting subunit and a converting subunit.
The detecting subunit is configured to detect the energy of the voice information sent by clients and to determine, in descending order of energy, the top preset number of participants as speakers. The converting subunit is configured to use a speech recognition engine to convert the speech voice information sent by the determined speakers into speech text information.
发送单元430,用于将所述发言语音信息及所述发言文字信息发送给其它与会者对应的客户端,以使所述其它与会者对应的客户端展示所述发言语音信息及所述发言文字信息。The sending unit 430 is configured to send the speaking voice information and the speaking text information to a client corresponding to another participant, so that the client corresponding to the other participant displays the speaking voice information and the speaking text. information.
其中,所述其它与会者是参加所述多媒体会议的与会者中除发送所述发言语音信息的与会者之外的与会者。The other participant is a participant other than the participant who sends the speaking voice information among the participants who participate in the multimedia conference.
In the apparatus for implementing a multimedia conference provided by this embodiment, the multimedia conference server detects the energy of the voice information sent by each participant and, in descending order of energy, determines the first preset number of participants as speakers. The multimedia conference server converts only the speech of the determined speakers into corresponding text information. This avoids converting a large amount of conference-unrelated speech into text, which would otherwise be displayed to the participants and distract them.
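As a rough illustration of the energy-based speaker selection described above, the following Python sketch ranks participants by the energy of their voice information and keeps the first preset number. The specification does not define how energy is computed; mean squared amplitude over one frame of PCM samples is assumed here, and the names `frame_energy`, `select_speakers`, and `preset_number` are hypothetical.

```python
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[int]) -> float:
    """Mean squared amplitude of one frame of PCM samples (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_speakers(frames: Dict[str, Sequence[int]], preset_number: int) -> List[str]:
    """Return the user IDs of the `preset_number` participants with the highest energy.

    `frames` maps a participant's user ID to that participant's latest audio frame.
    """
    ranked = sorted(frames, key=lambda uid: frame_energy(frames[uid]), reverse=True)
    return ranked[:preset_number]


if __name__ == "__main__":
    frames = {"alice": [1000, -1000], "bob": [10, -10], "carol": [500, 500]}
    print(select_speakers(frames, 2))
```

Only the voice information of the participants returned by `select_speakers` would then be passed to the speech recognition engine; quieter streams, such as background noise, are skipped.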
Corresponding to the apparatus for implementing a multimedia conference applied to a client shown in FIG. 8 to FIG. 12, an embodiment of the present invention further provides a client for implementing a multimedia conference. Referring to FIG. 18, the client includes a processor 1411, a transmitter 1412, and a memory 1413.
The memory 1413 stores operation instructions executable by the processor 1411. The processor 1411 reads the operation instructions in the memory 1413 to implement the following functions: obtaining the speech voice information of the local participant and converting the speech voice information into speech text information.
In embodiments of the present invention, a voice device may collect the participant's audio signal, process it accordingly, and provide it to the processor 1411. For example, the voice device may be a microphone (MIC).
In an embodiment of the present invention, the processor 1411 is specifically configured to determine whether the local participant has the speaking right and, if the local participant has the speaking right, collect the speech voice information of the local participant.
The transmitter 1412 is configured to send the speech voice information and the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to the clients corresponding to the other participants in the multimedia conference, where the other participants are all participants in the multimedia conference other than the speaker.
In an embodiment of the present invention, the multimedia conference server determines, according to the energy of the voice information sent by the participants, the preset number of participants with the highest energy as speakers, and the client then converts the speaker's speech voice information into speech text information. In this embodiment, the client for implementing a multimedia conference may further include a receiver.
The receiver is configured to receive a speech notification message sent by the multimedia conference server. The speech notification message carries the user identification information (ID) of a speaker, the speakers being the first preset number of participants determined by the multimedia conference server, in descending order of energy, according to the energy of the voice information sent by the participants in the multimedia conference.
The processor 1411 is further configured to determine whether the user ID carried in the speech notification message is the same as the user ID of the local participant and, if they are the same, determine that the local participant is a speaker and then obtain the speech voice information of the local participant.
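The user ID check performed by the client can be sketched as follows. This is a minimal Python illustration under stated assumptions: the message shape `SpeechNotification`, the function name, and the `capture_voice` and `recognize` callables (standing in for the voice device and the speech recognition engine) are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SpeechNotification:
    """Hypothetical message shape: carries the user ID of a determined speaker."""
    speaker_user_id: str


def handle_speech_notification(
    msg: SpeechNotification,
    local_user_id: str,
    capture_voice: Callable[[], bytes],
    recognize: Callable[[bytes], str],
) -> Optional[Tuple[bytes, str]]:
    """Return (voice, text) if the local participant is the notified speaker, else None."""
    if msg.speaker_user_id != local_user_id:
        # The notification names another participant; nothing is captured or converted.
        return None
    voice = capture_voice()          # stand-in for the voice device
    return voice, recognize(voice)   # stand-in for the speech recognition engine


if __name__ == "__main__":
    result = handle_speech_notification(
        SpeechNotification("u1"), "u1", lambda: b"pcm", lambda v: "hello"
    )
    print(result)
```

Passing the capture and recognition steps as callables keeps the sketch independent of any particular audio or recognition library.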
In yet another embodiment of the present invention, only the moderator and the presenter can speak; the other participants cannot. The voice devices of the other participants are turned off, and those participants cannot turn them on themselves. When a participant needs to speak, the participant can request the moderator to turn on that participant's voice device.
The transmitter 1412 is further configured to send a speech request message to the multimedia conference server. The speech request message carries the user ID of the local participant, so that the multimedia conference server sends the speech request message to the moderator.
The receiver is further configured to, upon receiving a voice device enabling instruction sent by the multimedia conference server, provide the instruction to the voice device so that the voice device collects the speech voice information of the local participant. The voice device enabling instruction is generated by the multimedia conference server after it receives the speech response message that the moderator's client returns in response to the speech request message.
In another embodiment of the present invention, the client for implementing a multimedia conference may further include a display configured to display the speech text information. The memory is further configured to store the speech text information so that meeting minutes can be generated from it.
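One way to picture the stored speech text being turned into meeting minutes is the small Python sketch below. The specification does not describe a storage layout or a minutes format; the `MinutesStore` class, its method names, and the "speaker: text" line format are assumptions made only for illustration.

```python
from typing import List, Tuple


class MinutesStore:
    """Hypothetical in-memory store for speech text information."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str]] = []

    def store(self, speaker_id: str, text: str) -> None:
        """Record one piece of speech text together with the speaker's user ID."""
        self._entries.append((speaker_id, text))

    def generate_minutes(self) -> str:
        """Render the stored text in arrival order, one 'speaker: text' line per entry."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self._entries)


if __name__ == "__main__":
    store = MinutesStore()
    store.store("alice", "Let us begin.")
    store.store("bob", "Agreed.")
    print(store.generate_minutes())
```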
In the client provided by this embodiment, the client obtains the speech voice information of the local participant and converts it into speech text information. The client then sends the speech voice information and the speech text information to the multimedia conference server, which forwards them to the clients corresponding to the other participants in the multimedia conference; those clients present the received speech voice information and speech text information. With the method for implementing a multimedia conference provided by the present invention, participants can both hear the speaker's speech voice information and see the corresponding speech text information, so they can combine the two to understand the speaker's content accurately, which improves communication in the multimedia conference.
Corresponding to the apparatus for implementing a multimedia conference applied to a multimedia conference server shown in FIG. 13 to FIG. 15, an embodiment of the present invention further provides a multimedia conference server. Referring to FIG. 19, the multimedia conference server includes a receiver 1511 and a transmitter 1512.
The receiver 1511 is configured to obtain the speech voice information and the speech text information sent by a client.
The transmitter 1512 is configured to send the speech voice information and the speech text information to the clients corresponding to the other participants, so that those clients present the speech voice information and the speech text information, where the other participants are the participants in the multimedia conference other than the participant who sent the speech voice information and the speech text information.
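The forwarding rule for the server of FIG. 19, which relays the speech to every participant except the sender, can be sketched in Python as follows. The function name `relay_speech` and the dictionary-based "delivery" are hypothetical simplifications; a real server would write to network connections instead.

```python
from typing import Dict, Iterable, Tuple

Payload = Tuple[bytes, str]  # (speech voice information, speech text information)


def relay_speech(
    participants: Iterable[str], sender_id: str, voice: bytes, text: str
) -> Dict[str, Payload]:
    """Map each *other* participant's user ID to the payload that participant should receive."""
    return {uid: (voice, text) for uid in participants if uid != sender_id}


if __name__ == "__main__":
    deliveries = relay_speech(["alice", "bob", "carol"], "bob", b"pcm", "Hello all")
    print(sorted(deliveries))
```

The sender is excluded because that participant already has both the voice and the text locally.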
In a specific embodiment of the present invention, as shown in FIG. 20, the multimedia conference server further includes a processor 1513.
The receiver 1511 is further configured to obtain the energy of the voice information sent by the clients.
The processor 1513 is configured to determine, in descending order of the energy of the voice information, a preset number of participants as speakers.
The transmitter 1512 is specifically configured to send a speech notification message to the participants in the multimedia conference, where the speech notification message carries the user ID of a speaker, so that the participant's client obtains the speech voice information sent by the multimedia conference server.
In another embodiment of the present invention, the receiver 1511 is further configured to receive a speech request message sent by a client, where the speech request message carries the user ID of the participant corresponding to that client.
The transmitter 1512 is further configured to send the speech request message to the client corresponding to the moderator, so that the moderator's client determines, according to the speech request message, whether the participant who sent the speech request message has the speaking right.
The receiver 1511 is further configured to receive a speech response message sent by the client corresponding to the moderator, after which a voice device enabling instruction is sent to the client corresponding to the participant who has the speaking right. The speech response message is generated when the moderator's client determines that the participant who sent the speech request message has the speaking right.
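The request and response exchange handled by the server can be sketched as a small message relay. This Python illustration is a sketch under assumptions: the class name `SpeechRequestRelay`, the message dictionaries, and the `outbox` list that stands in for the transmitter are all hypothetical. Consistent with the text above, a speech response is treated as existing only when the moderator has granted the speaking right.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


class SpeechRequestRelay:
    """Hypothetical server-side handling of speech request and speech response messages."""

    def __init__(self, moderator_id: str) -> None:
        self.moderator_id = moderator_id
        # (recipient user ID, message) pairs; stands in for the transmitter.
        self.outbox: List[Tuple[str, Message]] = []

    def on_speech_request(self, requester_id: str) -> None:
        """Forward a participant's speech request to the moderator's client."""
        request = {"type": "speech_request", "user_id": requester_id}
        self.outbox.append((self.moderator_id, request))

    def on_speech_response(self, granted_user_id: str) -> None:
        """On the moderator's response, tell the granted participant's client to enable its voice device."""
        self.outbox.append((granted_user_id, {"type": "enable_voice_device"}))


if __name__ == "__main__":
    relay = SpeechRequestRelay(moderator_id="host")
    relay.on_speech_request("u7")
    relay.on_speech_response("u7")
    for recipient, message in relay.outbox:
        print(recipient, message)
```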
The multimedia conference server provided by this embodiment receives the speech voice information and the speech text information that a client obtains for the local participant and sends to the multimedia conference server. The multimedia conference server then forwards the speech voice information and the speech text information to the clients corresponding to the other participants in the multimedia conference, so that those clients present the received speech voice information and speech text information. With the apparatus for implementing a multimedia conference provided by the present invention, participants can both hear the speaker's speech voice information and see the corresponding speech text information, so they can combine the two to understand the speaker's content accurately, which improves communication in the multimedia conference.
The present invention further provides a multimedia conference system, including the client shown in FIG. 18 and the multimedia conference server shown in FIG. 19 and FIG. 20.
The client is configured to obtain the speech voice information of the local participant and send it to the multimedia conference server, and to convert the speech voice information into speech text information and send the speech text information to the multimedia conference server.
The multimedia conference server is configured to send the speech voice information and the speech text information to the clients of the other participants in the multimedia conference.
The other participants are the participants in the multimedia conference other than the local participant.
In an embodiment of the present invention, the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine, in descending order of energy, the first preset number of participants as speakers, and send a speech notification message to the clients corresponding to the speakers, where the speech notification message carries the user identification information (ID) of the speaker.
The client is configured to receive the speech notification message sent by the multimedia conference server and, when it determines from the speech notification message that the local participant is a speaker, obtain the speech voice information of the local participant and send it to the multimedia conference server, and convert the speech voice information into speech text information and send the speech text information to the multimedia conference server.
Corresponding to the apparatus for implementing a multimedia conference applied to a client shown in FIG. 16, the present invention further provides a client for implementing a multimedia conference. As shown in FIG. 21, the client includes a processor 1610 and a transmitter 1620.
The processor 1610 is configured to obtain the speech voice information of the local participant.
The transmitter 1620 is configured to send the speech voice information to the multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference.
The other participants are the participants in the multimedia conference other than the local participant.
In the client for implementing a multimedia conference provided by this embodiment, the participant's client obtains speech voice information and sends it to the multimedia conference server. The multimedia conference server converts the speech voice information into speech text information and then sends the speech voice information and the corresponding speech text information to the clients corresponding to the other participants in the multimedia conference. Participants can therefore both hear the speaker's speech voice information and see the corresponding speech text information, which lets them understand the speaker's content accurately and improves communication in the multimedia conference. Because the multimedia conference server performs the conversion, no speech recognition engine needs to be integrated on each client, which reduces the production cost of the client.
Corresponding to the apparatus for implementing a multimedia conference applied to a multimedia conference server shown in FIG. 17, the present invention further provides a multimedia conference server. As shown in FIG. 22, the multimedia conference server includes a processor 1710 and a transmitter 1720.
The processor 1710 is configured to obtain the speech voice information sent by a client and convert the speech voice information into speech text information.
The transmitter 1720 is configured to send the speech voice information and the speech text information to the clients corresponding to the other participants, so that those clients present the speech voice information and the speech text information.
The other participants are the participants in the multimedia conference other than the participant who sent the speech voice information.
In the apparatus for implementing a multimedia conference provided by this embodiment, the multimedia conference server detects the energy of the voice information sent by each participant and, in descending order of energy, determines the first preset number of participants as speakers. The multimedia conference server converts only the speech of the determined speakers into corresponding text information. This avoids converting a large amount of conference-unrelated speech into text, which would otherwise be displayed to the participants and distract them.
The present invention further provides another multimedia conference system, including the client shown in FIG. 21 and the multimedia conference server shown in FIG. 22.
The client is configured to obtain the speech voice information of the local participant and send it to the multimedia conference server.
The multimedia conference server is configured to convert the speech voice information into speech text information and send the speech voice information and the corresponding speech text information to the clients corresponding to the other participants, where the other participants are the participants in the multimedia conference other than the participant who sent the speech voice information.
The clients corresponding to the other participants are further configured to present to the user the speech voice information and the speech text information sent by the multimedia conference server.
In an embodiment of the present invention, the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine, in descending order of energy, the first preset number of participants as speakers, and convert received speech voice information into speech text information when that speech voice information comes from one of the determined speakers.
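The server-side gating described above, in which only voice from a determined speaker is converted, can be sketched as a single check in front of the recognition step. In this Python illustration the function name `convert_if_speaker` and the `recognize` callable, which stands in for the speech recognition engine, are hypothetical.

```python
from typing import Callable, Optional, Set


def convert_if_speaker(
    sender_id: str,
    voice: bytes,
    speakers: Set[str],
    recognize: Callable[[bytes], str],
) -> Optional[str]:
    """Return speech text only when `sender_id` is among the determined speakers."""
    if sender_id not in speakers:
        # Voice from non-speakers is not converted, so no unrelated text is produced.
        return None
    return recognize(voice)


if __name__ == "__main__":
    speakers = {"alice", "carol"}
    print(convert_if_speaker("alice", b"pcm", speakers, lambda v: "good morning"))
    print(convert_if_speaker("bob", b"pcm", speakers, lambda v: "background chatter"))
```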
From the description of the foregoing method embodiments, a person skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be read with reference to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus and system embodiments are substantially similar to the method embodiments, they are described briefly; for related details, refer to the corresponding description of the method embodiments. The apparatus and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.
The foregoing describes only specific embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims (24)

  1. A method for implementing a multimedia conference, comprising:
    obtaining, by a client, speech voice information of a local participant, and sending the speech voice information to a multimedia conference server;
    converting, by the client, the speech voice information into speech text information;
    sending, by the client, the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to clients of other participants in the multimedia conference;
    wherein the other participants are the participants in the multimedia conference other than the local participant.
  2. The method according to claim 1, wherein the converting, by the client, the speech voice information into speech text information comprises:
    receiving a speech notification message sent by the multimedia conference server, wherein the speech notification message carries user identification information (ID) of a speaker, and the speakers are the first preset number of participants determined by the multimedia conference server, in descending order of energy, according to the energy of voice information sent by participants in the multimedia conference;
    determining whether the user ID carried in the speech notification message is the same as the user ID of the local participant;
    if the user ID carried in the speech notification message is the same as the user ID of the local participant, converting the collected speech voice information into speech text information by using a speech recognition engine.
  3. The method according to claim 1, wherein the obtaining, by the client, speech voice information of the local participant comprises:
    determining, by the client, whether the local participant has a speaking right;
    if the local participant has the speaking right, collecting the speech voice information of the local participant by using a voice device.
  4. The method according to claim 1, wherein the obtaining, by the client, speech voice information of the local participant comprises:
    sending, by the client, a speech request message to the multimedia conference server, wherein the speech request message carries the user ID of the local participant, so that the multimedia conference server sends the speech request message to a client corresponding to a moderator;
    when the client receives a voice device enabling instruction sent by the multimedia conference server, collecting the speech voice information of the local participant by using a voice device, wherein the voice device enabling instruction is generated by the multimedia conference server after it receives a speech response message that the client corresponding to the moderator returns in response to the speech request message.
  5. A method for implementing a multimedia conference, comprising:
    obtaining, by a multimedia conference server, speech voice information sent by a client and speech text information corresponding to the speech voice information, wherein the speech text information is obtained by the client by converting the obtained speech voice information using a speech recognition engine;
    sending, by the multimedia conference server, the speech voice information and the speech text information to clients corresponding to other participants, so that the clients corresponding to the other participants present the speech voice information and the speech text information;
    wherein the other participants are the participants in the multimedia conference other than the participant who sent the speech voice information and the speech text information.
  6. The method according to claim 5, further comprising:
    detecting, by the multimedia conference server, the energy of voice information sent by clients;
    determining, by the multimedia conference server, in descending order of the energy, the first preset number of participants as speakers;
    sending, by the multimedia conference server, a speech notification message to the client corresponding to the speaker, wherein the speech notification message carries user identification information (ID) of the speaker, so that the client corresponding to the speaker obtains the speech voice information of the speaker and converts the speech voice information into speech text information.
  7. The method according to claim 5, further comprising:
    receiving, by the multimedia conference server, a speech request message sent by a client, wherein the speech request message carries the user ID of the participant corresponding to the client;
    sending, by the multimedia conference server, the speech request message to a client corresponding to a moderator, so that the client corresponding to the moderator determines, according to the speech request message, whether the participant who sent the speech request message has a speaking right;
    receiving, by the multimedia conference server, a speech response message sent by the client corresponding to the moderator, and sending, according to the speech response message, a voice device enabling instruction to the client corresponding to the participant who has the speaking right, so that the speech voice information of the participant who has the speaking right is collected;
    wherein the speech response message is generated when the client corresponding to the moderator determines that the participant who sent the speech request message has the speaking right.
  8. A method for implementing a multimedia conference, comprising:
    obtaining, by a client, speech voice information of a local participant;
    sending, by the client, the speech voice information to a multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to clients of other participants in the multimedia conference;
    wherein the other participants are the participants in the multimedia conference other than the local participant.
  9. A method for implementing a multimedia conference, comprising:
    obtaining, by a multimedia conference server, speech voice information sent by a client;
    converting, by the multimedia conference server, the speech voice information into speech text information;
    sending, by the multimedia conference server, the speech voice information and the speech text information to the clients corresponding to other participants, so that those clients present the speech voice information and the speech text information;
    wherein the other participants are the participants in the multimedia conference other than the participant who sent the speech voice information.
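The server-side steps of claim 9 can be sketched as below. The function names `recipients` and `relay_speech`, the message dictionary, and the `recognize` callback standing in for a speech recognition engine are assumptions made for illustration; the claim does not prescribe any of them.

```python
def recipients(participant_ids, sender_id):
    """Every participant in the conference except the one who sent the audio."""
    return [pid for pid in participant_ids if pid != sender_id]


def relay_speech(participant_ids, sender_id, voice, recognize):
    """Convert the audio to text, then address both to all other participants.

    `recognize` is a placeholder callable for whatever speech recognition
    engine is used; it takes the audio and returns a string.
    """
    text = recognize(voice)
    message = {"sender": sender_id, "voice": voice, "text": text}
    # Map each receiving participant to the message its client should present.
    return {pid: message for pid in recipients(participant_ids, sender_id)}
```

Returning a mapping rather than performing network sends keeps the sketch self-contained; a real server would transmit each message over its own transport.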
  10. The method according to claim 9, wherein converting, by the multimedia conference server, the speech voice information into speech text information comprises:
    detecting, by the multimedia conference server, the energy of the voice information sent by the clients, and determining, in descending order of the energy, the first preset number of participants as speakers;
    converting, by using a speech recognition engine, the speech voice information sent by the clients corresponding to the determined speakers into speech text information.
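The energy-based speaker selection in claim 10 can be illustrated with a short Python sketch. The claim does not define how energy is measured, so the mean squared sample amplitude used in `frame_energy` is an assumption, as are the function names and the `recognize` callback that stands in for the speech recognition engine.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of PCM samples (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_speakers(frames_by_user, preset_count):
    """Return the user IDs of the `preset_count` participants with the highest energy."""
    energies = {uid: frame_energy(frame) for uid, frame in frames_by_user.items()}
    # Sort user IDs by energy, largest first, and keep the first preset_count.
    ranked = sorted(energies, key=lambda uid: energies[uid], reverse=True)
    return ranked[:preset_count]


def transcribe_speakers(frames_by_user, preset_count, recognize):
    """Run the recognizer only on audio from the selected speakers."""
    speakers = select_speakers(frames_by_user, preset_count)
    return {uid: recognize(frames_by_user[uid]) for uid in speakers}
```

Restricting recognition to the selected speakers means audio from quieter participants is never passed to the recognizer, which is the effect the two steps of the claim describe.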
  11. An apparatus for implementing a multimedia conference, applied to a client, comprising:
    an obtaining unit, configured to obtain speech voice information of a local participant;
    a converting unit, configured to convert the speech voice information into speech text information;
    a sending unit, configured to send the speech voice information and the speech text information to a multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference;
    wherein the other participants are the participants of the multimedia conference other than the local participant.
  12. The apparatus according to claim 11, wherein the obtaining unit comprises:
    a first determining subunit, configured to determine whether the local participant has speaking permission;
    a first collecting subunit, configured to collect, by using a voice device, the speech voice information of the local participant when the first determining subunit determines that the local participant has speaking permission.
  13. The apparatus according to claim 11, wherein the converting unit comprises:
    a first receiving subunit, configured to receive a speech notification message sent by the multimedia conference server, where the speech notification message carries the user identification (ID) of a speaker, and the speakers are the first preset number of participants determined by the multimedia conference server in descending order of the energy of the voice information sent by the participants in the multimedia conference;
    a second determining subunit, configured to determine whether the user ID carried in the speech notification message is the same as the user ID of the local participant;
    a second collecting subunit, configured to collect, by using a voice device, the speech voice information of the local participant when the second determining subunit determines that the user ID carried in the speech notification message is the same as the user ID of the local participant.
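The client-side check in claim 13 amounts to comparing the user ID in the notification with the local user ID before capturing audio. A minimal sketch follows; the function name, the `speaker_user_id` key, and the `capture` and `recognize` callbacks are hypothetical, and the transcription step is included only to show how the result would feed the converting unit of claim 11.

```python
def handle_speech_notification(notification, local_user_id, capture, recognize):
    """Capture and transcribe local speech only if the notification names the local user.

    `capture` stands in for reading from the voice device and `recognize`
    for the speech recognition engine; both are placeholder callables.
    """
    # Ignore notifications that designate some other participant as speaker.
    if notification.get("speaker_user_id") != local_user_id:
        return None
    voice = capture()
    return {"user_id": local_user_id, "voice": voice, "text": recognize(voice)}
```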
  14. The apparatus according to claim 11, wherein the obtaining unit specifically comprises:
    a first sending subunit, configured to send a speech request message to the multimedia conference server, where the speech request message carries the user ID of the local participant, so that the multimedia conference server sends the speech request message to the client corresponding to the moderator;
    a second receiving subunit, configured to receive a voice device enable instruction sent by the multimedia conference server;
    a third collecting subunit, configured to collect, by using a voice device, the speech voice information of the local participant when the second receiving subunit receives the voice device enable instruction, where the voice device enable instruction is generated by the multimedia conference server upon receiving the speech response message returned by the moderator's client in response to the speech request message.
  15. An apparatus for implementing a multimedia conference, applied to a multimedia conference server, comprising:
    an obtaining unit, configured to obtain speech voice information sent by a client and speech text information corresponding to the speech voice information, where the speech text information is obtained by the client by converting the obtained speech voice information using a speech recognition engine;
    a first sending unit, configured to send the speech voice information and the speech text information to the clients corresponding to other participants, so that those clients present the speech voice information and the speech text information;
    wherein the other participants are the participants in the multimedia conference other than the participant who sent the speech voice information and the speech text information.
  16. The apparatus according to claim 15, further comprising:
    a detecting unit, configured to detect the energy of the voice information sent by the clients;
    a determining unit, configured to determine, in descending order of the energy, the first preset number of participants as speakers;
    a second sending unit, configured to send a speech notification message to the client corresponding to each speaker, where the speech notification message carries the user identification (ID) of the speaker, so that the client corresponding to the speaker obtains the speech voice information of the speaker and converts the speech voice information into speech text information.
  17. The apparatus according to claim 15, further comprising:
    a first receiving unit, configured to receive a speech request message sent by a client, where the speech request message carries the user ID of the participant corresponding to the client;
    a third sending unit, configured to send the speech request message to the client corresponding to the moderator, so that the moderator's client determines, according to the speech request message, whether the participant who sent the speech request message has speaking permission;
    a second receiving unit, configured to receive a speech response message sent by the moderator's client, where the speech response message is generated when the moderator's client determines that the participant who sent the speech request message has speaking permission;
    a fourth sending unit, configured to send, according to the speech response message, a voice device enable instruction to the client corresponding to the participant who has speaking permission.
  18. An apparatus for implementing a multimedia conference, applied to a client, comprising:
    an obtaining unit, configured to obtain speech voice information of a local participant;
    a sending unit, configured to send the speech voice information to a multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference;
    wherein the other participants are the participants of the multimedia conference other than the local participant.
  19. An apparatus for implementing a multimedia conference, applied to a multimedia server, comprising:
    an obtaining unit, configured to obtain speech voice information sent by a client;
    a converting unit, configured to convert the speech voice information into speech text information;
    a sending unit, configured to send the speech voice information and the speech text information to the clients corresponding to other participants, so that those clients present the speech voice information and the speech text information;
    wherein the other participants are the participants in the multimedia conference other than the participant who sent the speech voice information.
  20. The apparatus according to claim 19, wherein the converting unit comprises:
    a detecting subunit, configured to detect the energy of the voice information sent by the clients, and determine, in descending order of the energy, the first preset number of participants as speakers;
    a converting subunit, configured to convert, by using a speech recognition engine, the speech voice information sent by the determined speakers into speech text information.
  21. A system for implementing a multimedia conference, comprising: a client and a multimedia conference server;
    the client is configured to obtain speech voice information of a local participant and send it to the multimedia conference server, and to convert the speech voice information into speech text information and send the speech text information to the multimedia conference server;
    the multimedia conference server is configured to send the speech voice information and the speech text information to the clients of the other participants in the multimedia conference;
    wherein the other participants are the participants of the multimedia conference other than the local participant.
  22. The multimedia conference system according to claim 21, wherein:
    the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine, in descending order of the energy, the first preset number of participants as speakers, and send a speech notification message to the client corresponding to each speaker, where the speech notification message carries the user identification (ID) of the speaker;
    the client is configured to receive the speech notification message sent by the multimedia conference server and, upon determining according to the speech notification message that the local participant is a speaker, obtain the speech voice information of the local participant and send it to the multimedia conference server, and convert the speech voice information into speech text information and send the speech text information to the multimedia conference server.
  23. A system for implementing a multimedia conference, comprising: a client and a multimedia conference server;
    the client is configured to obtain speech voice information of a local participant and send it to the multimedia conference server;
    the multimedia conference server is configured to convert the speech voice information into speech text information, and send the speech voice information and the speech text information corresponding to the speech voice information to the clients corresponding to other participants, where the other participants are the participants in the multimedia conference other than the participant who sent the speech voice information;
    the clients corresponding to the other participants are further configured to present to the user the speech voice information and the speech text information sent by the multimedia conference server.
  24. The multimedia conference system according to claim 23, wherein:
    the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine, in descending order of the energy, the first preset number of participants as speakers, and, when received speech voice information comes from a determined speaker, convert the speech voice information into speech text information.
PCT/CN2015/099559 2015-05-19 2015-12-29 Method and device for realizing multimedia conference WO2016184118A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510255577.1 2015-05-19
CN201510255577.1A CN106301811A (en) 2015-05-19 2015-05-19 Realize the method and device of multimedia conferencing

Publications (1)

Publication Number Publication Date
WO2016184118A1 (en) 2016-11-24

Family

ID=57319318

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/099559 WO2016184118A1 (en) 2015-05-19 2015-12-29 Method and device for realizing multimedia conference

Country Status (2)

Country Link
CN (1) CN106301811A (en)
WO (1) WO2016184118A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291502A (en) * 2020-02-24 2021-01-29 北京字节跳动网络技术有限公司 Information interaction method, device and system and electronic equipment
CN114567747A (en) * 2020-11-27 2022-05-31 北京新媒传信科技有限公司 Conference data transmission method and conference system

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108234274A (en) * 2016-12-12 2018-06-29 苏州乐聚堂电子科技有限公司 A kind of display methods of speech message
CN107566340B (en) * 2017-07-27 2020-12-08 杭州迅宜通信技术有限公司 Conference auxiliary communication method and storage medium and device thereof
CN107993665B (en) * 2017-12-14 2021-04-30 科大讯飞股份有限公司 Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system
CN110557596B (en) * 2018-06-04 2021-09-21 杭州海康威视数字技术股份有限公司 Conference system
CN109003608A (en) * 2018-08-07 2018-12-14 北京东土科技股份有限公司 Court's trial control method, system, computer equipment and storage medium
CN111354356A (en) * 2018-12-24 2020-06-30 北京搜狗科技发展有限公司 Voice data processing method and device
CN109802968B (en) * 2019-01-28 2021-06-22 深圳市飞图视讯有限公司 Conference speaking system
CN112420047A (en) * 2019-08-23 2021-02-26 珠海金山办公软件有限公司 Communication method and device for network conference, user terminal and storage medium
CN110491384B (en) * 2019-08-29 2022-04-22 联想(北京)有限公司 Voice data processing method and device
CN110648665A (en) * 2019-09-09 2020-01-03 北京左医科技有限公司 Session process recording system and method
CN110600035A (en) * 2019-09-17 2019-12-20 深圳市天道日新科技有限公司 Display system based on real-time voice transcription
CN112564926B (en) * 2021-02-19 2021-05-11 全时云商务服务股份有限公司 Method and system for processing network conference
CN113128221A (en) * 2021-05-08 2021-07-16 聚好看科技股份有限公司 Method for storing speaking content, display device and server

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040267527A1 (en) * 2003-06-25 2004-12-30 International Business Machines Corporation Voice-to-text reduction for real time IM/chat/SMS
US20070143103A1 (en) * 2005-12-21 2007-06-21 Cisco Technology, Inc. Conference captioning
CN101309390A (en) * 2007-05-17 2008-11-19 华为技术有限公司 Visual communication system, apparatus and subtitle displaying method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291502A (en) * 2020-02-24 2021-01-29 北京字节跳动网络技术有限公司 Information interaction method, device and system and electronic equipment
CN112291502B (en) * 2020-02-24 2023-05-26 北京字节跳动网络技术有限公司 Information interaction method, device and system and electronic equipment
CN114567747A (en) * 2020-11-27 2022-05-31 北京新媒传信科技有限公司 Conference data transmission method and conference system

Also Published As

Publication number Publication date
CN106301811A (en) 2017-01-04

Similar Documents

Publication Publication Date Title
WO2016184118A1 (en) Method and device for realizing multimedia conference
EP3282669B1 (en) Private communications in virtual meetings
JP5967684B2 (en) Method and system for managing real-time audio broadcasts between groups of users
US11695875B2 (en) Multiple device conferencing with improved destination playback
CN111935443B (en) Method and device for sharing instant messaging tool in real-time live broadcast of video conference
US9094524B2 (en) Enhancing conferencing user experience via components
US11500530B2 (en) Simplified sharing of content among computing devices
CN110730952A (en) Method and system for processing audio communication on network
KR102085383B1 (en) Termial using group chatting service and operating method thereof
WO2016127691A1 (en) Method and apparatus for broadcasting dynamic information in multimedia conference
US11115444B2 (en) Private communications in virtual meetings
JP2011125006A5 (en)
US20120259924A1 (en) Method and apparatus for providing summary information in a live media session
US10187432B2 (en) Replaying content of a virtual meeting
EP3772851A1 (en) Method and system for adapted modality conferencing
US20230239406A1 (en) Communication system
CN103391481A (en) Data interaction method, device and system based on real-time messaging protocol
JP4531013B2 (en) Audiovisual conference system and terminal device
US9591037B2 (en) Distributed audio mixing and forwarding
WO2024032111A1 (en) Data processing method and apparatus for online conference, and device, medium and product
US20230421620A1 (en) Method and system for handling a teleconference
KR101488430B1 (en) System and method for video communication

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application Ref document number: 15892492; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE
122 Ep: pct application non-entry in european phase Ref document number: 15892492; Country of ref document: EP; Kind code of ref document: A1