WO2016184118A1

WO2016184118A1 - Method and device for realizing multimedia conference

Info

Publication number: WO2016184118A1
Application number: PCT/CN2015/099559
Authority: WO
Inventors: Ying Yifeng (应益峰)
Original assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date: 2015-05-19
Filing date: 2015-12-29
Publication date: 2016-11-24
Also published as: CN106301811A

Abstract

Disclosed in embodiments of the present invention are a method and device for realizing a multimedia conference. The method comprises: acquiring, by a client, speech voice information from a local conference participant, and converting the speech voice information to speech text information; transmitting the speech voice information and the speech text information to a multimedia conference server, and forwarding, by the multimedia conference server, the same to clients corresponding to other conference participants attending the multimedia conference; and displaying, by the clients corresponding to the other conference participants, the received speech voice information and the speech text information. By utilizing the method for realizing multimedia conference of the present invention, a conference participant can hear speech voice information from a speaker and read corresponding speech text information. The combination of the speech text information and the speech voice information enables the conference participant to accurately understand the content delivered by the speaker, thereby improving the communication efficiency of the multimedia conference.

Description

Method and device for realizing multimedia conference

The present application claims priority to Chinese Patent Application No. 201510255577.1, entitled "Method and Apparatus for Implementing Multimedia Conferences", which is incorporated herein by reference in its entirety.

Technical field

The present invention relates to the field of multimedia conference technologies, and more particularly, to a method and apparatus for implementing a multimedia conference.

Background

A multimedia conference is a conference that integrates voice, video, and data over a network. It provides users with long-distance voice, video, data, instant messaging, and other multimedia services over a broadband access network, and users can create multimedia conferences through a web portal.

However, in prior-art multimedia conferences, communication between the speaker and the other participants is often poor. For example, when the speaker's native language differs from that of the other participants, or the speaker has a strong regional accent, the other participants often cannot accurately understand what the speaker means. Likewise, if a participant is distracted and misses part of the speech, that participant cannot accurately understand the speaker. Both situations greatly reduce the effectiveness of communication in the conference.

Summary of the invention

The embodiments of the present invention provide a method and an apparatus for implementing a multimedia conference, to solve the prior-art problem that participants in a multimedia conference cannot accurately understand what the speaker says.

In order to solve the above technical problem, the embodiment of the present invention discloses the following technical solutions:

In a first aspect, the present invention provides a method for implementing a multimedia conference, including:

The client obtains the speech voice information of a local participant and sends the speech voice information to a multimedia conference server;

The client converts the speech voice information into speech text information;

The client sends the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference;

The other participant is a participant other than the local participant among the participants of the multimedia conference.

With reference to the first aspect, in a first possible implementation manner of the first aspect, the converting, by the client, of the speech voice information into speech text information includes:

Receiving a speech notification message sent by the multimedia conference server, where the speech notification message carries the user identification information (ID) of a speaker, and the speaker is one of a preset number of participants that the multimedia conference server determines, in descending order of energy, from the energy of the voice information sent by the participants in the multimedia conference;

Determining whether the user ID carried in the speech notification message is the same as the user ID of the local participant;

If the user ID carried in the speech notification message is the same as the user ID of the local participant, converting the collected speech voice information into speech text information by using a speech recognition engine.
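The client-side check in this implementation can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: the `SpeechNotification` message type, the `handle_speech_notification` function, and the `placeholder_recognizer` callback are hypothetical names, and the recognizer merely stands in for whatever speech recognition engine a real client would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpeechNotification:
    """Hypothetical speech notification message; only the speaker's user ID is modeled."""
    speaker_user_id: str


def handle_speech_notification(
    msg: SpeechNotification,
    local_user_id: str,
    captured_audio: bytes,
    recognize: Callable[[bytes], str],
) -> Optional[str]:
    """Return speech text if the local participant is the notified speaker, else None."""
    if msg.speaker_user_id != local_user_id:
        # The local participant is not the speaker, so no conversion is done here.
        return None
    # The user IDs match: convert the collected speech voice information to text.
    return recognize(captured_audio)


def placeholder_recognizer(audio: bytes) -> str:
    """Placeholder for a real speech recognition engine (assumption, not a real API)."""
    return f"<transcript of {len(audio)} bytes>"
```

With a notification naming participant "A", the client of "A" returns a transcript string, while the client of any other participant returns `None` and skips recognition entirely.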

With reference to the first aspect, in a second possible implementation manner of the first aspect, the obtaining, by the client, of the speech voice information of the local participant includes:

Determining, by the client, whether the local participant has the speaking right;

If the local participant has the speaking right, collecting the speech voice information of the local participant by using a voice device.

With reference to the first aspect, in a third possible implementation manner of the first aspect, the obtaining, by the client, of the speech voice information of the local participant includes:

The client sends a speech request message carrying the user ID of the local participant to the multimedia conference server, so that the multimedia conference server forwards the speech request message to the client corresponding to the moderator;

When the client receives a voice device open command sent by the multimedia conference server, the client collects the speech voice information of the local participant by using the voice device; the multimedia conference server sends the voice device open command after receiving the speech response message that the moderator's client generates and returns in response to the speech request message.

In a second aspect, the present invention provides a method for implementing a multimedia conference, including:

The multimedia conference server obtains the speech voice information sent by a client and the speech text information corresponding to it, where the speech text information is obtained by the client converting the speech voice information with a speech recognition engine;

The multimedia conference server sends the speech voice information and the speech text information to the client corresponding to the other participant, so that the client corresponding to the other participant displays the speech voice information and the speech text information;

The other participant is a participant other than the participant who sends the speaking voice information and the speaking text information among the participants who participate in the multimedia conference.

With reference to the second aspect, in a first possible implementation manner of the second aspect, the method further includes:

The multimedia conference server detects energy of voice information sent by the client;

The multimedia conference server determines a preset number of participants as speakers in descending order of energy;

The multimedia conference server sends a speech notification message carrying the user identification information (ID) of the speaker to the client corresponding to the speaker, so that the client corresponding to the speaker obtains the speaker's speech voice information and converts it into speech text information.
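The document does not define how "energy" is measured. A minimal Python sketch, assuming energy is the mean square of one frame of 16-bit PCM samples per participant, shows how the server could pick the preset number of speakers in descending order of energy and build one speech notification per speaker. The names `frame_energy`, `select_speakers`, `build_notifications`, and the dictionary message layout are all hypothetical.

```python
import heapq
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[int]) -> float:
    """Mean-square energy of one frame of PCM samples (an assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_speakers(frames: Dict[str, Sequence[int]], preset_number: int) -> List[str]:
    """User IDs of the `preset_number` loudest participants, largest energy first."""
    energies = {user_id: frame_energy(f) for user_id, f in frames.items()}
    return heapq.nlargest(preset_number, energies, key=energies.__getitem__)


def build_notifications(speakers: List[str]) -> List[Dict[str, str]]:
    """One hypothetical speech notification per speaker, carrying the speaker's user ID."""
    return [{"type": "speech_notification", "speaker_user_id": uid} for uid in speakers]
```

For frames `{"A": [1000, -1000], "B": [10, -10], "C": [500, -500]}` and a preset number of 2, `select_speakers` returns `["A", "C"]`. A production server would likely smooth energy over several frames so that the set of speakers does not flicker on short noises.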

With reference to the second aspect, in a second possible implementation manner of the second aspect, the method further includes:

The multimedia conference server receives a speech request message sent by the client, where the speech request message carries a user ID of the participant corresponding to the client;

The multimedia conference server sends the speech request message to the client corresponding to the moderator, so that the moderator's client determines, according to the speech request message, whether the participant who sent the speech request message has the speaking right;

The multimedia conference server receives a speech response message sent by the moderator's client and, according to the speech response message, sends a voice device open command to the client of the participant that has the speaking right, so that this participant's speech voice information is collected;

The speech response message is generated when the moderator's client determines that the participant who sent the speech request message has the speaking right.
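The server-side speech request flow can be sketched as a small state holder. This is a minimal Python sketch, assuming a synchronous `moderator_grants` callback in place of a real round trip to the moderator's client; the class `FloorControlServer` and the message strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class FloorControlServer:
    """Sketch of the server-side speech request flow; message names are hypothetical."""

    # Stands in for the moderator's client deciding whether to grant the speaking right.
    moderator_grants: Callable[[str], bool]
    # Messages the server would send, as (recipient, message) pairs.
    outbox: List[Tuple[str, str]] = field(default_factory=list)
    granted: Set[str] = field(default_factory=set)

    def on_speech_request(self, user_id: str) -> None:
        # Forward the participant's speech request to the moderator's client.
        self.outbox.append(("moderator", f"speech_request:{user_id}"))
        # A speech response is generated only when the moderator grants the right.
        if self.moderator_grants(user_id):
            self.on_speech_response(user_id)

    def on_speech_response(self, user_id: str) -> None:
        # Tell the requesting participant's client to open its voice device.
        self.granted.add(user_id)
        self.outbox.append((user_id, "voice_device_open"))
```

With `moderator_grants=lambda uid: uid != "E"`, a request from "B" produces a forwarded request followed by a voice device open command to "B", while a request from "E" is forwarded but never answered with an open command. In a real deployment the speech response would arrive asynchronously as a separate message rather than through a direct call.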

In a third aspect, the present invention provides a method for implementing a multimedia conference, including:

The client obtains the speech voice information of a local participant;

The client sends the speech voice information to the multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference.

In a fourth aspect, the present invention provides a method for implementing a multimedia conference, including:

The multimedia conference server obtains the speech voice information sent by a client;

The multimedia conference server converts the speech voice information into speech text information;

The multimedia conference server sends the speech voice information and the speech text information to the clients corresponding to the other participants, so that those clients display the speech voice information and the speech text information;

The other participant is a participant other than the participant who sends the speaking voice information among the participants who participate in the multimedia conference.

With reference to the fourth aspect, in a first possible implementation manner of the fourth aspect, the converting, by the multimedia conference server, of the speech voice information into speech text information includes:

The multimedia conference server detects the energy of the voice information sent by the clients and determines a preset number of participants as speakers in descending order of energy;

The speech voice information sent by the clients of the determined speakers is converted into speech text information by using a speech recognition engine.
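When the server performs the conversion itself, speech recognition can be restricted to the determined speakers. The minimal Python sketch below assumes the same mean-square energy measure as a stand-in for the unspecified "energy" and takes a caller-supplied `recognize` function in place of a real speech recognition engine; `mean_square` and `transcribe_speakers` are hypothetical names.

```python
from typing import Callable, Dict, Sequence


def mean_square(samples: Sequence[int]) -> float:
    """Assumed energy measure: mean square of one frame of PCM samples."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def transcribe_speakers(
    frames: Dict[str, Sequence[int]],
    preset_number: int,
    recognize: Callable[[Sequence[int]], str],
) -> Dict[str, str]:
    """Run speech recognition only on the audio of the top-energy speakers."""
    # Rank participants from largest to smallest energy.
    ranked = sorted(frames, key=lambda uid: mean_square(frames[uid]), reverse=True)
    # Convert only the streams of the determined speakers; others are not transcribed.
    return {uid: recognize(frames[uid]) for uid in ranked[:preset_number]}
```

Limiting recognition to a few streams keeps the server's speech recognition load roughly proportional to the preset number of speakers rather than to the total number of participants, which is presumably one motivation for this implementation manner.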

In a fifth aspect, the present invention provides a device for implementing a multimedia conference, which is used for a client, and includes:

An obtaining unit, configured to obtain speech voice information of a local participant;

a converting unit, configured to convert the speech voice information into speech text information;

a sending unit, configured to send the speech voice information and the speech text information to the multimedia conference server, so that the multimedia conference server sends them to the clients of the other participants in the multimedia conference;

With reference to the fifth aspect, in a first possible implementation manner of the fifth aspect, the acquiring unit includes:

a first determining subunit, configured to determine whether the local participant has a speaking right;

The first collecting subunit is configured to: when the first determining subunit determines that the local participant has the speaking right, collect the speech voice information of the local participant by using a voice device.

With reference to the fifth aspect, in a second possible implementation manner of the fifth aspect, the converting unit includes:

a first receiving subunit, configured to receive a speech notification message sent by the multimedia conference server, where the speech notification message carries the user identification information (ID) of a speaker, and the speaker is one of a preset number of participants that the multimedia conference server determines, in descending order of energy, from the energy of the voice information sent by the participants in the multimedia conference;

a second determining subunit, configured to determine whether the user ID carried in the speech notification message is the same as the user ID of the local participant;

a second collection subunit, configured to: when the second determining subunit determines that the user ID carried in the speech notification message is the same as the user ID of the local participant, collect the speech voice information of the local participant by using a voice device.

With reference to the fifth aspect, in a third possible implementation manner of the fifth aspect, the acquiring unit specifically includes:

a first sending subunit, configured to send a speech request message carrying the user ID of the local participant to the multimedia conference server, so that the multimedia conference server forwards the speech request message to the client corresponding to the moderator;

a second receiving subunit, configured to receive a voice device open command sent by the multimedia conference server;

a third collection subunit, configured to: when the second receiving subunit receives the voice device open command, collect the speech voice information of the local participant by using a voice device; the multimedia conference server sends the voice device open command after receiving the speech response message that the moderator's client returns in response to the speech request message.

In a sixth aspect, the present invention provides an apparatus for implementing a multimedia conference, which is used in a multimedia conference server, and includes:

An acquiring unit, configured to obtain speech voice information sent by a client and speech text information corresponding to the speech voice information, where the speech text information is obtained by the client converting the speech voice information with a speech recognition engine;

a first sending unit, configured to send the speech voice information and the speech text information to the clients corresponding to the other participants, so that those clients display the speech voice information and the speech text information;

In conjunction with the sixth aspect, in a first possible implementation manner of the sixth aspect, the apparatus further includes:

a detecting unit, configured to detect energy of voice information sent by the client;

a determining unit, configured to determine a preset number of participants as speakers in descending order of energy;

a second sending unit, configured to send a speech notification message carrying the user identification information (ID) of the speaker to the client corresponding to the speaker, so that the client corresponding to the speaker obtains the speaker's speech voice information and converts it into speech text information.

With reference to the sixth aspect, in a second possible implementation manner of the sixth aspect, the apparatus further includes:

a first receiving unit, configured to receive a speech request message sent by a client, where the speech request message carries the user ID of the participant corresponding to the client;

a third sending unit, configured to send the speech request message to the client corresponding to the moderator, so that the moderator's client determines, according to the speech request message, whether the participant who sent the speech request message has the speaking right;

a second receiving unit, configured to receive a speech response message sent by the moderator's client, where the speech response message is generated when the moderator's client determines that the participant who sent the speech request message has the speaking right;

and a fourth sending unit, configured to send, according to the speech response message, a voice device open command to the client corresponding to the participant that has the speaking right.

In a seventh aspect, the present invention provides an apparatus for implementing a multimedia conference, which is applied to a client, and includes:

a sending unit, configured to send the speech voice information to the multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech text information to the clients of the other participants in the multimedia conference;

In an eighth aspect, the present invention provides an apparatus for implementing a multimedia conference, which is applied to a multimedia conference server, and includes:

The obtaining unit is configured to obtain the speech voice information sent by a client;

a sending unit, configured to send the speech voice information and the speech text information to the clients corresponding to the other participants, so that those clients display the speech voice information and the speech text information;

In conjunction with the eighth aspect, in a first possible implementation manner of the eighth aspect, the converting unit includes:

The detecting subunit is configured to detect the energy of the voice information sent by the clients and determine a preset number of participants as speakers in descending order of energy;

The conversion subunit is configured to convert the speech voice information sent by the determined speakers into speech text information by using a speech recognition engine.

A ninth aspect provides a multimedia conference system, including: a client and a multimedia conference server;

The client is configured to obtain the speech voice information of a local participant and send it to the multimedia conference server, and to convert the speech voice information into speech text information and send the speech text information to the multimedia conference server;

The multimedia conference server is configured to send the speech voice information and the speech text information to a client of another participant participating in the multimedia conference;

With reference to the ninth aspect, in a first possible implementation manner of the ninth aspect, the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine a preset number of participants as speakers in descending order of energy, and send a speech notification message carrying the user identification information (ID) of the speaker to the client corresponding to the speaker;

The client is configured to receive the speech notification message sent by the multimedia conference server and, when it determines from the speech notification message that the local participant is a speaker, obtain the speech voice information of the local participant and send it to the multimedia conference server, convert the speech voice information into speech text information, and send the speech text information to the multimedia conference server.

In a tenth aspect, the present invention further provides a multimedia conference system, including: a client and a multimedia conference server;

The client is configured to obtain the speech voice information of a local participant and send it to the multimedia conference server;

The multimedia conference server is configured to convert the speech voice information into speech text information, and send the speech voice information and the corresponding speech text information to the clients corresponding to the other participants, where the other participants are the participants in the multimedia conference other than the participant who sends the speech voice information;

The client corresponding to the other participant is further configured to display the speaking voice information and the speaking text information sent by the multimedia conference server to the user.

With reference to the tenth aspect, in a first possible implementation manner of the tenth aspect, the multimedia conference server is further configured to detect the energy of the voice information sent by the clients participating in the multimedia conference, determine a preset number of participants as speakers in descending order of energy, and, when received speech voice information comes from a determined speaker, convert that speech voice information into speech text information.

It can be seen from the foregoing technical solutions that, in the multimedia conference solution provided by the embodiments of the present invention, the speaker's client converts the speaker's speech voice information into speech text information and forwards it, through the multimedia conference server, to the clients of the participants other than the speaker, so that the speech text information is displayed on those clients. This avoids the problem that participants who receive only the voice information cannot accurately understand what the speaker says, and thus improves the effectiveness of communication in the conference.

DRAWINGS

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Apparently, a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.

FIG. 1 is a block diagram of a multimedia conference system according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 3 is a flowchart of still another method for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 4 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 5 is a flowchart of still another method for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 6 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 7 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an apparatus for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of an acquiring unit according to an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a conversion unit according to an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of still another acquiring unit according to an embodiment of the present invention;

FIG. 13 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 14 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 15 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 16 is a schematic structural diagram of an apparatus for implementing a multimedia conference applied to a client according to an embodiment of the present invention;

FIG. 17 is a schematic structural diagram of an apparatus for implementing a multimedia conference applied to a multimedia conference server according to an embodiment of the present invention;

FIG. 18 is a schematic structural diagram of a client for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 19 is a schematic structural diagram of a multimedia conference server according to an embodiment of the present invention;

FIG. 20 is a schematic structural diagram of another multimedia conference server according to an embodiment of the present invention;

FIG. 21 is a schematic structural diagram of another client for implementing a multimedia conference according to an embodiment of the present invention;

FIG. 22 is a schematic structural diagram of another multimedia conference server according to an embodiment of the present invention.

Detailed description

The multimedia conference solution provided by the embodiments of the present invention solves the problem described in the background, namely that participants cannot accurately understand the speaker's speech, which reduces the effectiveness of communication in the conference.

To help a person skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

The above is the core idea of the present invention. In order to enable those skilled in the art to better understand the present invention, the present invention will be further described in detail below with reference to the accompanying drawings.

To make the above objectives, features, and advantages of the embodiments of the present invention clearer and easier to understand, the embodiments are described in further detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a multimedia conferencing system. As shown in FIG. 1, the multimedia conferencing system includes a plurality of clients 1 and at least one multimedia conferencing server 2. A client may be, for example, a personal computer (PC) or a laptop.

Each client obtains the media stream information (for example, voice information) of its participant and uploads it to the multimedia conference server 2. The multimedia conference server 2 mixes the media streams sent by the clients and then sends the result to each terminal, so that geographically dispersed users can communicate through graphics, sound, and so on.
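The document states only that the server mixes the uploaded streams before sending them back out. A minimal Python sketch of one common approach follows, assuming raw 16-bit PCM frames and an "N-1" policy in which each terminal receives the sum of every stream except its own, with simple clipping. Both the policy and the names `mix` and `mix_for_each_terminal` are illustrative assumptions; real mixers usually decode codecs first and apply gain control rather than hard clipping.

```python
from typing import Dict, List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix(frames: List[Sequence[int]]) -> List[int]:
    """Sum 16-bit PCM frames sample by sample and clip to the int16 range.

    zip() stops at the shortest frame; an empty input yields an empty frame.
    """
    return [max(INT16_MIN, min(INT16_MAX, sum(column))) for column in zip(*frames)]


def mix_for_each_terminal(streams: Dict[str, Sequence[int]]) -> Dict[str, List[int]]:
    """Give every terminal a mix of all other streams (assumed N-1 mixing policy)."""
    return {
        uid: mix([s for other, s in streams.items() if other != uid])
        for uid in streams
    }
```

Excluding a terminal's own stream from its mix keeps participants from hearing an echo of themselves; for streams `{"A": [100, 30000], "B": [200, 10000], "C": [-50, 0]}`, terminal "C" receives `[300, 32767]` because the second sample sum of 40000 is clipped.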

FIG. 2 is a flowchart of a method for implementing a multimedia conference according to an embodiment of the present invention. The method is applied to the client shown in FIG. 1. As shown in FIG. 2, the method includes the following steps:

S110. The client obtains the speech voice information of the local participant and sends it to the multimedia conference server.

A local participant is a participant who is in the same geographic space as the client. For example, participant A uses client a to participate in a multimedia conference. For client a, participant A is a local participant corresponding to client a.

The client can obtain the speech voice information of the local participant by using a voice device. The voice device may include voice collection hardware integrated in the client and operating software that controls that hardware. The voice collection hardware, for example a microphone (MIC), can perform functions such as voice capture, voice encoding, and voice decoding. The operating software can query the number and names of the local voice collection hardware devices, and can also turn the hardware on or off or mute it.
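The operating software described above can be modeled as a small device registry. This minimal Python sketch is hypothetical throughout: `VoiceDeviceManager`, `DeviceState`, and the device names are illustrative stand-ins, not an API of any real audio library, and actual capture would go through the platform's audio subsystem.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class DeviceState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    MUTED = "muted"


@dataclass
class VoiceDeviceManager:
    """Hypothetical model of the operating software that controls voice capture hardware."""

    devices: Dict[str, DeviceState] = field(default_factory=dict)

    def count(self) -> int:
        # Query the number of local voice collection devices.
        return len(self.devices)

    def names(self) -> List[str]:
        # Query the device names, sorted for a stable order.
        return sorted(self.devices)

    def open(self, name: str) -> None:
        self._set(name, DeviceState.OPEN)

    def close(self, name: str) -> None:
        self._set(name, DeviceState.CLOSED)

    def mute(self, name: str) -> None:
        self._set(name, DeviceState.MUTED)

    def is_capturing(self, name: str) -> bool:
        # Only an open, unmuted device delivers speech voice information.
        return self.devices[name] is DeviceState.OPEN

    def _set(self, name: str, state: DeviceState) -> None:
        if name not in self.devices:
            raise KeyError(f"unknown voice device: {name}")
        self.devices[name] = state
```

For example, a manager created with two closed devices `"MIC-1"` and `"MIC-2"` reports a count of 2; after `open("MIC-1")` that device is capturing, and after `mute("MIC-1")` it no longer is.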

This embodiment applies to a discussion-type conference in which every participant can speak, so each client can obtain the speech voice information of its own participant. If the clients obtain the speech voice information through voice devices, the voice device of every participant is turned on.

S120. The client converts the speech information into speech text information.

The client uses speech recognition technology to convert the obtained speech voice information of the local speaker into speech text information. Because the client captures the local participant's voice directly, the signal it obtains is relatively strong, so the speech text information produced by the speaker's client is more accurate. In addition, the clients of the other participants do not need to convert the speaker's speech voice information into speech text information themselves, which saves their resources.

Optionally, the speaker's client may also store the speech text information so that meeting minutes can be generated from it. Similarly, the clients of the other participants in the multimedia conference may store the received speech text information to generate meeting minutes. In addition, the speaker's client may display the speech text information so that the speaker can check the content of his or her speech.
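The speaker-side steps S110 to S130, together with the optional minutes storage, can be sketched as one handler. This is a minimal Python sketch under stated assumptions: `SpeakerClient`, its `recognize` and `send` callbacks, and the dictionary message layout are hypothetical, with `recognize` standing in for a speech recognition engine and `send` for the link to the multimedia conference server.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SpeakerClient:
    """Sketch of the speaker-side steps S110-S130 plus optional minutes storage."""

    user_id: str
    recognize: Callable[[bytes], str]  # stand-in for a speech recognition engine
    send: Callable[[dict], None]  # stand-in for the link to the conference server
    minutes: List[Tuple[str, str]] = field(default_factory=list)

    def on_captured_audio(self, audio: bytes) -> str:
        # S110: send the speech voice information to the server.
        self.send({"type": "voice", "from": self.user_id, "payload": audio})
        # S120: convert speech to text locally, where the captured signal is strongest.
        text = self.recognize(audio)
        # Optional: keep the text so that meeting minutes can be generated later.
        self.minutes.append((self.user_id, text))
        # S130: send the speech text information to the server.
        self.send({"type": "text", "from": self.user_id, "payload": text})
        return text

    def render_minutes(self) -> str:
        # One "speaker: text" line per stored utterance.
        return "\n".join(f"{uid}: {text}" for uid, text in self.minutes)
```

Sending the voice before running recognition keeps audio latency independent of recognition time, which matches the order of S110 and S120; the text message follows once recognition completes.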

S130. The client sends the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech speech information and the speech text information to the client corresponding to the other participants.

The other participants are participants other than the speaker among all the participants participating in the multimedia conference.

The multimedia conference server sends the received voice information and the spoken text information to the client corresponding to other participants participating in the multimedia conference. Client display corresponding to other participants Received voice messages and text messages, which help participants quickly understand the speaker's speech.

For example, suppose the participants in the multimedia conference are A, B, C, D and E, where participant A is the speaker and participants B, C, D and E are the other participants. The multimedia conference server sends the speech voice information and the speech text information of participant A to B, C, D and E.
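The forwarding rule in this example (every participant except the speaker) can be sketched as below. The per-client outbox lists are a stand-in, assumed for illustration, for whatever transport actually carries the messages; `forward_speech` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Each outbox collects (speaker_id, voice, text) tuples destined for one client.
Outbox = List[Tuple[str, bytes, str]]


def forward_speech(speaker_id: str, voice: bytes, text: str,
                   outboxes: Dict[str, Outbox]) -> List[str]:
    """Send the speaker's voice and text to every other participant's client."""
    recipients = []
    for user_id, outbox in outboxes.items():
        if user_id == speaker_id:
            continue  # the speaker does not receive its own speech back
        outbox.append((speaker_id, voice, text))
        recipients.append(user_id)
    return recipients


outboxes = {uid: [] for uid in ["A", "B", "C", "D", "E"]}
recipients = forward_speech("A", b"<pcm>", "Let's begin.", outboxes)
print(recipients)  # ['B', 'C', 'D', 'E']
```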

The T.120 protocol standard can be integrated into both the clients and the multimedia conference server to implement the sending and receiving of speech voice information and speech text information between them. The T.120 standard comprises a series of protocols, T.120 to T.127, which provide reliable information transmission between clients and between a client and the multimedia conference server, provide a point-to-multipoint data distribution service, and select the most efficient transmission channel for the data.

In the method for implementing a multimedia conference shown in this embodiment, the client obtains the speech voice information of the local participant and converts it into speech text information. The client then sends the speech voice information and the speech text information to the multimedia conference server, which forwards them to the clients of the other participants in the multimedia conference, and those clients present the received speech voice information and speech text information. With this method, a participant can both hear the speaker's speech voice information and read the corresponding speech text information. Combining the two allows the participant to understand the content of the speech accurately, which improves the communication effect of the multimedia conference.

In some application scenarios, such as a discussion session, all participants are allowed to speak. However, if the voice information sent by every participant were converted into text, much speech unrelated to the conference would be converted, and the resulting irrelevant text would be displayed to the participants and distract them. For such scenarios, the participants with the highest voice energy can be determined as speakers: only their speech voice information is converted into speech text information, and the voice content of the other, lower-energy participants is ignored.

FIG. 3 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention. This embodiment is applicable to a scenario with many participants, any of whom may speak. As shown in FIG. 3, the method may include the following steps:

S210. The multimedia conference server detects the energy of the voice information sent by the client.

Each client in the multimedia conference sends the obtained voice information of its participant to the multimedia conference server, and the multimedia conference server detects the energy of the received voice information.

In this embodiment, the energy of the voice information may be detected by a voice conference bridge in the multimedia conference server. The voice conference bridge provides a voice conference venue on the server side: it mixes the voices of the speakers and sends the mixed audio to each participant.
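The description does not define how energy is measured. One common measure, assumed for this sketch, is the mean-square energy of a frame of 16-bit little-endian mono PCM samples; the frame format and the name `frame_energy` are assumptions.

```python
import struct


def frame_energy(pcm: bytes) -> float:
    """Mean-square energy of one frame of 16-bit little-endian mono PCM."""
    n = len(pcm) // 2  # number of complete 16-bit samples
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm[: 2 * n])
    return sum(s * s for s in samples) / n


loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
quiet = struct.pack("<4h", 10, -10, 10, -10)
print(frame_energy(loud), frame_energy(quiet))  # 1000000.0 100.0
```

In practice a bridge would likely smooth such per-frame values over a short window before comparing participants, so that a single loud frame does not change the speaker.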

S220. The multimedia conference server determines a preset number of participants as speakers, in descending order of the energy of their voice information.

The multimedia conference server detects the energy of the voice information sent by the participants in the multimedia conference, sorts the energies from largest to smallest, and determines the first preset number of participants as speakers. For example, if the preset number is one, the participant whose voice information has the highest energy is determined as the speaker; if the preset number is two, the two participants with the highest voice information energy are determined as speakers.
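Selecting the preset number of highest-energy participants is a top-N selection. A minimal sketch, assuming energies have already been computed per user ID (the function name `select_speakers` is hypothetical):

```python
import heapq
from typing import Dict, List


def select_speakers(energies: Dict[str, float], preset_number: int) -> List[str]:
    """Return the user IDs of the `preset_number` participants with the highest
    voice energy, in descending order of energy."""
    top = heapq.nlargest(preset_number, energies.items(), key=lambda item: item[1])
    return [user_id for user_id, _ in top]


energies = {"A": 120.0, "B": 9800.0, "C": 4300.0, "D": 15.0}
print(select_speakers(energies, 1))  # ['B']
print(select_speakers(energies, 2))  # ['B', 'C']
```

`heapq.nlargest` avoids sorting every participant, which matters little for small conferences but keeps the selection cheap when many participants are connected.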

It should be noted that, in this application scenario, participants may speak with different voice energy at different times, so the multimedia conference server may determine different participants as speakers at different times.

S230. The multimedia conference server sends a speech notification message to the participants in the multimedia conference, where the speech notification message carries the user ID (identification) of the speaker.

The multimedia conference server may broadcast the speech notification message to the clients of all participants in the multimedia conference, and each client determines, according to the user ID in the message, whether its own participant is a speaker. Alternatively, the server may send the speech notification message one-to-one to the client of the participant corresponding to the user ID, and that client makes the same determination according to the user ID.

When a participant's client receives the speech notification message from the multimedia conference server, it compares the user ID carried in the message with its own user ID to determine whether its participant is a speaker.
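The client-side comparison can be sketched as follows. Because the preset number may be greater than one, this sketch assumes the notification may carry several speaker IDs; the message class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SpeechNotification:
    """Speech notification message; may name several speakers when the preset number > 1."""
    speaker_ids: Tuple[str, ...]


def is_local_speaker(notification: SpeechNotification, local_user_id: str) -> bool:
    """The local participant is a speaker if its user ID is carried in the message.
    The same check works whether the message was broadcast or sent one-to-one."""
    return local_user_id in notification.speaker_ids


msg = SpeechNotification(speaker_ids=("B", "C"))
print(is_local_speaker(msg, "B"), is_local_speaker(msg, "D"))  # True False
```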

S240. When the client determines that the user ID carried in the speech notification message is the same as its own user ID, it determines that the local participant is a speaker.

S250. The client corresponding to the speaker obtains the speaker's speech voice information and sends it to the multimedia conference server.

S260. The client corresponding to the speaker converts the speech voice information into speech text information.

S270. The client corresponding to the speaker sends the speech text information to the multimedia conference server.

S280. The multimedia conference server sends the speech voice information and the speech text information to a client corresponding to another participant.

S290. The clients corresponding to the other participants play the speech voice information and display the speech text information.

In the method for implementing a multimedia conference provided by this embodiment, the multimedia conference server detects the energy of the voice information sent by each participant and determines a preset number of participants as speakers in order of energy; that is, only the speech of the preset number of participants with the highest energy is converted into text. This avoids converting large amounts of conference-unrelated speech into text and displaying that irrelevant text to the participants, which would distract them.

In another application scenario, only the speech of the moderator and the presenter is converted into text information, and the speech of the other participants is ignored.

FIG. 4 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention. In this embodiment, only the speech of participants who have the speaking right is converted into text information. As shown in FIG. 4, the method includes the following steps:

S310. The client determines whether the local participant has the speaking right. If so, S320 is executed; otherwise, the current process ends.

In a scenario where the conference has a moderator and a fixed presenter, usually only the presenter and the moderator have the speaking right. Determining whether a participant has the speaking right may therefore include determining whether the participant's identity attribute carries presenter rights or moderator rights.
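The check in S310 can be sketched as a lookup on the participant's identity attribute. The role names and the set of roles holding the speaking right are assumptions for illustration.

```python
from enum import Enum


class Role(Enum):
    MODERATOR = "moderator"
    PRESENTER = "presenter"
    ATTENDEE = "attendee"


# Identity attributes assumed to carry the speaking right in this scenario.
SPEAKING_ROLES = frozenset({Role.MODERATOR, Role.PRESENTER})


def has_speaking_right(role: Role) -> bool:
    """S310: continue to S320 only if the local participant may speak."""
    return role in SPEAKING_ROLES


for role in Role:
    print(role.value, has_speaking_right(role))
```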

S320. The client obtains the speech voice information of the local participant and sends it to the multimedia conference server.

S330. The client converts the speech voice information into speech text information.

The client may have a built-in speech recognition engine, which it uses to convert the speech voice information of the local participant into speech text information.

S340. The client sends the speech text information to the multimedia conference server.

After obtaining the speech voice information of the local participant, the client may send it to the multimedia conference server immediately, so that the server can forward the speaker's speech voice information to the other participants without delay and voice transmission remains real-time. Alternatively, if converting the speech voice information into speech text information takes very little time, typically on the order of milliseconds, the client may send the speech voice information and the speech text information to the multimedia conference server together, so that the other participants' clients play the speech voice information and display the speech text information in synchronization.
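The two sending strategies can be sketched as a single function that picks one based on an expected recognition latency. The latency estimate, the 50 ms threshold, the message dictionaries and the callables `recognize` and `send` are all hypothetical; the description only says that millisecond-level conversion allows joint sending.

```python
from typing import Callable, Dict, List


def send_speech(voice: bytes,
                recognize: Callable[[bytes], str],
                send: Callable[[Dict[str, object]], None],
                expected_latency_ms: float,
                sync_threshold_ms: float = 50.0) -> None:
    """Choose between sending voice and text together or voice first."""
    if expected_latency_ms <= sync_threshold_ms:
        # Recognition is fast: send voice and text together so playback and display stay in sync.
        send({"voice": voice, "text": recognize(voice)})
    else:
        # Recognition is slow: forward the voice at once to keep audio real-time,
        # then send the text as soon as it is ready.
        send({"voice": voice})
        send({"text": recognize(voice)})


sent: List[Dict[str, object]] = []
send_speech(b"<pcm>", lambda v: "hello", sent.append, expected_latency_ms=5)
send_speech(b"<pcm>", lambda v: "hello", sent.append, expected_latency_ms=400)
print(len(sent))  # 3
```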

S350. The multimedia conference server sends the speech voice information and the speech text information to a client corresponding to another participant.

S360. The clients corresponding to the other participants play the speech voice information and display the speech text information.

The method provided in this embodiment converts only the speech voice information of participants who have the speaking right into speech text information, rather than converting the speech of every participant. This prevents conference-unrelated speech from participants in the multimedia conference from being converted into text and forwarded to the other participants, so the other participants' clients do not display large amounts of unimportant text that would distract them.

In another application scenario, only the moderator and the presenter may speak. The voice devices of the other participants are turned off, and those participants cannot turn them on themselves. When such a participant needs to speak, the participant may ask the moderator to turn on his or her voice device.

FIG. 5 is a flowchart of still another method for implementing a multimedia conference according to an embodiment of the present invention. The method is applied to an application scenario in which a moderator specifies a speaker, and the method includes the following steps:

S410. The client sends a speech request message to the multimedia conference server, where the speech request message carries the user ID of the participant corresponding to the client.

When a participant other than the moderator and the presenter needs to speak, the participant's client sends a speech request message carrying the participant's user ID to the multimedia conference server.

S420. The multimedia conference server forwards the speech request message to the client corresponding to the moderator.

S430. When the client corresponding to the moderator determines, according to the speech request message, that the participant is allowed to speak, it sends a speech response message to the multimedia conference server.

After receiving the speech request message, the moderator's client determines, according to the user ID carried in it, whether to allow the participant to speak. If the participant is allowed to speak, the client generates a speech response message and sends it to the multimedia conference server. The speech response message may also carry the participant's user ID so that the multimedia conference server can identify the participant.

The moderator's client may determine whether to allow the participant to speak according to preset identity attributes of the participants.

S440. The multimedia conference server generates a voice device open command according to the speech response message and sends the voice device open command to the client corresponding to the speaker.

The multimedia conference server generates the voice device open command from the received speech response message; the command is used to turn on the voice device of the participant who is allowed to speak.
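Steps S410 to S440 form a small request/response exchange. The sketch below models the three messages as immutable records and runs the exchange in-process; the message class names, the preset permitted set standing in for "identity attributes", and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class SpeechRequest:           # S410: carries the requesting participant's user ID
    user_id: str


@dataclass(frozen=True)
class SpeechResponse:          # S430: sent only when speaking is allowed
    user_id: str


@dataclass(frozen=True)
class VoiceDeviceOpenCommand:  # S440: addressed to the permitted participant's client
    user_id: str


def moderator_decide(request: SpeechRequest, permitted_ids: Set[str]) -> Optional[SpeechResponse]:
    """Moderator's client: allow the request if the user ID is in a preset permitted set."""
    if request.user_id in permitted_ids:
        return SpeechResponse(request.user_id)
    return None


def server_on_response(response: SpeechResponse) -> VoiceDeviceOpenCommand:
    """Server: turn the speech response into a voice device open command."""
    return VoiceDeviceOpenCommand(response.user_id)


def floor_control(user_id: str, permitted_ids: Set[str]) -> Optional[VoiceDeviceOpenCommand]:
    request = SpeechRequest(user_id)                     # S410
    response = moderator_decide(request, permitted_ids)  # S420 forward + S430 decision
    if response is None:
        return None                                      # request refused, device stays off
    return server_on_response(response)                  # S440


print(floor_control("D", {"D"}))  # VoiceDeviceOpenCommand(user_id='D')
print(floor_control("E", {"D"}))  # None
```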

S450. When the client corresponding to the speaker receives the voice device open command, it uses the voice device to obtain the speaker's speech voice information and sends it to the multimedia conference server.

S460. The client corresponding to the speaker converts the speech voice information into speech text information.

S470. The client corresponding to the speaker sends the speech text information to the multimedia conference server.

S480. The multimedia conference server sends the speech voice information and the speech text information to a client corresponding to another participant other than the speaker.

S490. The clients corresponding to the other participants play the speech voice information and display the speech text information.

In the method provided by this embodiment, when a participant other than the moderator or the presenter needs to speak, a speech request message is sent to the moderator's client, and the moderator decides, according to the speech request message, whether to allow the participant to speak. If so, a speech response message permitting the participant to speak is sent to the multimedia conference server, which generates a voice device open command from it and turns on the participant's voice device. The participant's speech voice information is then obtained through that voice device, and the participant's client converts it into speech text information. This method suits formal or higher-level conferences and broadens the range of scenarios to which the multimedia conference method applies.

FIG. 6 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention. As shown in FIG. 6, the method includes the following steps:

S510. The client obtains the speech voice information of the local participant and sends it to the multimedia conference server.

The client uses the voice device to collect the participant's speech voice information.

S520. The multimedia conference server converts the speech voice information into speech text information.

Before mixing the voice information sent by the participants, the multimedia conference server uses a speech recognition engine to convert the received speech voice information into speech text information.
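The ordering matters because a mixed stream no longer separates the individual speakers. The sketch below recognizes each stream first and then applies a simple sum-and-clip mix; the 16-bit little-endian PCM format, the `recognize` callable and the function names are assumptions, and a production bridge would typically use a more careful mixing scheme.

```python
import struct
from typing import Callable, Dict, List, Tuple


def mix_pcm16(frames: List[bytes]) -> bytes:
    """Sum 16-bit little-endian mono frames sample by sample, clipping to the int16 range."""
    if not frames:
        return b""
    n = min(len(f) for f in frames) // 2
    mixed = [0] * n
    for frame in frames:
        for i, sample in enumerate(struct.unpack(f"<{n}h", frame[: 2 * n])):
            mixed[i] += sample
    return struct.pack(f"<{n}h", *(max(-32768, min(32767, s)) for s in mixed))


def process_round(streams: Dict[str, bytes],
                  recognize: Callable[[bytes], str]) -> Tuple[Dict[str, str], bytes]:
    # Recognize each participant's stream before mixing, because once the
    # streams are summed the individual speakers can no longer be separated.
    texts = {user_id: recognize(pcm) for user_id, pcm in streams.items()}
    return texts, mix_pcm16(list(streams.values()))


streams = {"A": struct.pack("<2h", 30000, 100), "B": struct.pack("<2h", 10000, -50)}
texts, mixed = process_round(streams, lambda pcm: f"{len(pcm)} bytes")
print(texts, struct.unpack("<2h", mixed))  # {'A': '4 bytes', 'B': '4 bytes'} (32767, 50)
```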

In one embodiment of the present invention, all participants in the multimedia conference may speak freely, and any participant's client may send the obtained speech voice information of its local participant to the multimedia conference server. Correspondingly, the multimedia conference server may convert the speech voice information of any participant into speech text information.

In another embodiment of the present invention, only the moderator and the presenter may speak, so only their clients send the obtained speech voice information to the multimedia conference server, which converts it into speech text information.

S530. The multimedia conference server sends the speech voice information and the corresponding speech text information to a client of another participant participating in the multimedia conference. The other participant is a participant other than the local participant among the participants of the multimedia conference.

S540. The clients of the other participants play the speech voice information and display the corresponding speech text information.

In the method provided by this embodiment, the participant's client sends the obtained speech voice information to the multimedia conference server, which converts it into speech text information and then sends both the speech voice information and the corresponding speech text information to the clients of the other participants in the multimedia conference. The participants can thus both hear the speaker's speech voice information and read the corresponding speech text information, understand the speech accurately, and communicate more effectively. Because the conversion is performed by the multimedia conference server, no speech recognition engine needs to be integrated into each client, which reduces the production cost of the client.

FIG. 7 is a flowchart of another method for implementing a multimedia conference according to an embodiment of the present invention. In this embodiment, the preset number of participants whose voice information has the highest energy are determined as speakers, and the speech voice information of those speakers is converted into speech text information. As shown in FIG. 7, the method may include the following steps:

S610. The multimedia conference server detects the energy of the voice information sent by the client.

S620. The multimedia conference server determines a preset number of participants as speakers, in descending order of the energy of their voice information.

S630. The client obtains the speech voice information of the local participant and sends it to the multimedia conference server.

S640. The multimedia conference server converts the speech voice information sent by the client of each determined speaker into speech text information.

S650. The multimedia conference server sends the speech voice information from the speaker's client, together with the corresponding speech text information, to the clients of the other participants in the multimedia conference.

S660. The clients of the other participants play the received speech voice information and display the corresponding speech text information.

In the method provided by this embodiment, the multimedia conference server detects the energy of the voice information sent by each participant, determines a preset number of participants as speakers in order of energy, and converts only the speech of the determined speakers into text. This avoids converting large amounts of conference-unrelated speech into text and displaying that irrelevant text to the participants, which would distract them.

FIG. 8 is a schematic structural diagram of an apparatus for implementing a multimedia conference according to an embodiment of the present invention. As shown in FIG. 8, the apparatus is used in a client and includes an obtaining unit 110, a converting unit 120, and a sending unit 130.

The obtaining unit 110 is configured to obtain the speech voice information of the local participant.

A local participant refers to a participant in the same geographical space as the client. For example, participant A uses client a to participate in a multimedia conference. For client a, participant A is a local participant corresponding to client a.

The obtaining unit 110 may use a voice device to acquire the speech voice information of the local participant. The voice device may include voice collection hardware integrated in the client and control software for that hardware. The voice collection hardware can perform functions such as voice capture, voice encoding, and voice decoding. The control software can query the number and names of the local voice collection devices and can turn them on, turn them off, or mute them.
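The control software's operations (query the number and names of devices, turn on, turn off, mute) can be modeled as a small state-holding controller. This is a hypothetical stand-in, not an operating-system audio API; real capture-device control depends on the platform.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class DeviceState(Enum):
    OFF = "off"
    ON = "on"
    MUTED = "muted"


@dataclass
class CaptureHardware:
    name: str
    state: DeviceState = DeviceState.OFF


@dataclass
class VoiceDeviceController:
    """Stand-in for the voice device's control software (hypothetical API)."""
    hardware: List[CaptureHardware] = field(default_factory=list)

    def query(self) -> Tuple[int, List[str]]:
        """Return the number and names of the local capture devices."""
        return len(self.hardware), [h.name for h in self.hardware]

    def _set_state(self, name: str, state: DeviceState) -> None:
        for h in self.hardware:
            if h.name == name:
                h.state = state
                return
        raise KeyError(f"no capture device named {name!r}")

    def turn_on(self, name: str) -> None:
        self._set_state(name, DeviceState.ON)

    def turn_off(self, name: str) -> None:
        self._set_state(name, DeviceState.OFF)

    def mute(self, name: str) -> None:
        self._set_state(name, DeviceState.MUTED)


ctrl = VoiceDeviceController([CaptureHardware("built-in mic"), CaptureHardware("headset")])
ctrl.turn_on("headset")
print(ctrl.query(), ctrl.hardware[1].state)  # (2, ['built-in mic', 'headset']) DeviceState.ON
```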

The apparatus for implementing a multimedia conference in this embodiment is applicable to a discussion-type conference in which every participant may speak, so each client can obtain the speech voice information of its own participant. If the clients obtain this information through voice devices, the voice device of every participant is turned on.

The converting unit 120 is configured to convert the speech voice information into speech text information.

The converting unit 120 uses speech recognition technology to convert the obtained speech voice information of the local speaker into speech text information.

Because the client captures the voice of its local participant at a relatively high signal strength, the text produced by the speaker's own client is more accurate. This arrangement also means that the clients of the other participants do not need to convert the speaker's speech voice information into speech text information themselves, which saves their resources.

The sending unit 130 is configured to send the speech voice information and the speech text information to the multimedia conference server, so that the multimedia conference server sends them to the clients corresponding to the other participants.

The other participant is a participant other than the speaker among all the participants who participate in the multimedia conference.

The client sends the speech voice information and the speech text information to the multimedia conference server, which sends them to the clients of the other participants in the multimedia conference. Those clients then present the received speech voice information and speech text information, which helps the participants quickly understand what the speaker is saying.
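The division of the client apparatus into obtaining, converting and sending units can be sketched as a composition of three pluggable callables. The class name `ClientApparatus` and the callable signatures are assumptions; the units could equally be separate hardware or software modules.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ClientApparatus:
    """Client-side apparatus of FIG. 8 wired from three pluggable callables."""
    obtain: Callable[[], bytes]         # obtaining unit 110
    convert: Callable[[bytes], str]     # converting unit 120
    send: Callable[[bytes, str], None]  # sending unit 130

    def handle_utterance(self) -> str:
        voice = self.obtain()       # speech voice information of the local participant
        text = self.convert(voice)  # speech text information
        self.send(voice, text)      # both go to the multimedia conference server
        return text


outgoing: List[Tuple[bytes, str]] = []
apparatus = ClientApparatus(
    obtain=lambda: b"<pcm>",
    convert=lambda voice: "hello everyone",
    send=lambda voice, text: outgoing.append((voice, text)),
)
print(apparatus.handle_utterance(), outgoing)
```

Injecting the units as callables lets a test or a different platform swap, for example, the speech recognition engine without touching the flow.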

The apparatus for implementing a multimedia conference shown in this embodiment acquires the speech voice information of the local participant through the obtaining unit and converts it into speech text information through the converting unit. The sending unit then sends the speech voice information and the speech text information to the multimedia conference server, which forwards them to the clients of the other participants in the multimedia conference, and those clients present the received speech voice information and speech text information. With this apparatus, a participant can both hear the speaker's speech voice information and read the corresponding speech text information; combining the two allows the participant to understand the speech accurately, which improves the communication effect of the multimedia conference.

FIG. 9 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention. The apparatus may further include: a display unit 140 and a storage unit 150 on the basis of the embodiment shown in FIG. 8.

The display unit 140 is configured to display the speech text information.

The storage unit 150 is configured to store the speech text information.

Optionally, with the storage unit 150, the client corresponding to the speaker may store the speech text information so that meeting minutes can be generated from it. Similarly, the clients of the other participants in the multimedia conference may store the received speech text information to generate meeting minutes. In addition, with the display unit 140, the speaker's client may display the speech text information so that the speaker can review the content of his or her own speech.

In one application scenario, only the speech of the moderator and the presenter is converted into text information, and the speech of the other participants is ignored.

FIG. 10 is a schematic structural diagram of an obtaining unit 110 according to an embodiment of the present invention. This obtaining unit 110 is applicable to a scenario in which only the speech of the moderator or the presenter is converted into text information and the speech of the other participants is ignored. As shown in FIG. 10, the obtaining unit 110 may include a first determining sub-unit 1101 and a first collecting sub-unit 1102:

The first determining sub-unit 1101 is configured to determine, when the participant corresponding to the local client needs to speak, whether the participant has the speaking right.

In the application scenario where the conference has a moderator and a fixed presenter, usually only the presenter and the moderator have the right to speak. Determining whether the participant has the speaking right may include determining whether the identity attribute of the participant has the presenter right or the moderator right.

The first collecting sub-unit 1102 is configured to collect the speech voice information by using the voice device when the first determining sub-unit 1101 determines that the local participant has the speaking right, that is, presenter rights or moderator rights.

The apparatus provided by this embodiment converts only the speech voice information of participants who have the speaking right into speech text information, rather than converting the speech of every participant. This prevents conference-unrelated speech from participants in the multimedia conference from being converted into text and forwarded to the other participants, so the other participants' clients do not display large amounts of unimportant text that would distract them.

In another application scenario, such as a discussion session, all participants are allowed to speak. However, if the voice information sent by every participant were converted into text, much speech unrelated to the conference would be converted, and the resulting irrelevant text would be displayed to the participants and distract them. For such scenarios, the participants with the highest voice energy can be determined as speakers: only their speech voice information is converted into speech text information, and the voice content of the other, lower-energy participants is ignored.

FIG. 11 is a schematic structural diagram of a converting unit 120 according to an embodiment of the present invention. The converting unit 120 is applicable to a scenario with many participants, any of whom may speak. As shown in FIG. 11, the converting unit 120 may include a first receiving subunit 1201, a second determining subunit 1202, and a second collecting subunit 1203:

The first receiving subunit 1201 is configured to receive the speech notification message sent by the multimedia conference server, where the speech notification message carries the user ID of the speaker. The multimedia conference server determines the speakers as a preset number of participants selected in descending order of the energy of the voice information sent by the participants in the multimedia conference. The participant's client can compare the user ID in the message with its own user ID to determine whether its own participant is a speaker.

The second determining sub-unit 1202 is configured to determine whether the user ID carried by the speech notification message is the same as the user ID of the local participant.

The second collecting sub-unit 1203 is configured to collect the speech voice information of the local participant by using the voice device when the second determining sub-unit 1202 determines that the user ID carried in the speech notification message is the same as the user ID of the local participant.

In this embodiment, the first receiving subunit 1201 of the converting unit 120 receives the speech notification message sent by the multimedia conference server. The message carries the user ID of the speaker, whom the multimedia conference server has determined as one of a preset number of participants selected in descending order of the energy of the voice information sent by the participants. The client therefore converts only the speech of the preset number of participants with the highest energy into text, which avoids converting large amounts of conference-unrelated speech into text and displaying it to the participants as a distraction.

In another application scenario, only the moderator and the presenter may speak. The voice devices of the other participants are turned off, and those participants cannot turn them on themselves. When such a participant needs to speak, the participant may ask the moderator to turn on his or her voice device.

FIG. 12 is a schematic structural diagram of still another acquiring unit 110 according to an embodiment of the present invention. The acquiring unit 110 is applied to an application scenario in which the moderator specifies a speaker. As shown in FIG. 12, the acquiring unit 110 includes: a first sending subunit 1103, a second receiving subunit 1104, and a third collecting subunit 1105.

The first sending subunit 1103 is configured to send a speech request message carrying the user ID of the local participant to the multimedia conference server, so that the multimedia conference server forwards the speech request message to the moderator.

When a participant other than the moderator and the presenter needs to speak, the participant's client sends a speech request message carrying the participant's user ID to the multimedia conference server.

A second receiving subunit 1104, configured to receive a voice device open command sent by the multimedia conference server, where the voice device open command is generated by the multimedia conference server after it receives the speech response message that the moderator's client returns in response to the floor request message.

Specifically, after receiving the floor request message, the moderator's client determines, according to the user ID carried in the floor request message, whether the participant is allowed to speak. If so, the moderator's client generates a speech response message and sends it to the multimedia conference server. The speech response message may also carry the participant's user ID so that the multimedia conference server can identify the participant.

A third collecting subunit 1105, configured to collect the speech voice information of the local participant through the voice device when the second receiving subunit 1104 receives the voice device open command.

In the acquiring unit provided in this embodiment, when a participant other than the moderator or the presenter needs to speak, the multimedia conference server forwards the participant's floor request message to the moderator's client, and the moderator decides from the message whether to let the participant speak. If the participant is allowed to speak, the moderator's client sends a speech response message to the multimedia conference server. The server generates a voice device open command from the speech response message, which turns on the participant's voice device. That voice device then captures the participant's speech voice information, and the participant's client converts it into speech text information. This arrangement suits formal or high-level conferences and broadens the range of scenarios in which the multimedia conference method can be used.
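The client-side flow of the acquiring unit 110 above can be illustrated with a minimal Python sketch. All names here are hypothetical: `FloorRequest`, `VoiceDeviceOpenCommand`, `Microphone`, `ParticipantClient`, and the `send` callable that stands in for the transport to the conference server. The patent does not specify message formats or a programming interface. The sketch only shows the ordering: the voice device starts closed, the client sends a floor request carrying its user ID, and speech is collected only after a voice device open command addressed to that user ID arrives.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FloorRequest:
    user_id: str  # user ID of the participant asking to speak


@dataclass
class VoiceDeviceOpenCommand:
    user_id: str  # participant whose voice device the server allows to be opened


class Microphone:
    """Hypothetical stand-in for the participant's voice device."""

    def __init__(self) -> None:
        self.enabled = False  # closed by default; the participant cannot open it
        self._frames: List[bytes] = []

    def feed(self, frame: bytes) -> None:
        # Audio is captured only while the device is enabled; otherwise it is dropped.
        if self.enabled:
            self._frames.append(frame)

    def collect(self) -> bytes:
        # Return everything captured so far and reset the buffer.
        data = b"".join(self._frames)
        self._frames.clear()
        return data


class ParticipantClient:
    def __init__(self, user_id: str, send: Callable[[object], None]) -> None:
        self.user_id = user_id
        self.send = send  # hypothetical transport to the multimedia conference server
        self.mic = Microphone()

    def request_floor(self) -> None:
        # First sending subunit: the server relays this request to the moderator.
        self.send(FloorRequest(self.user_id))

    def on_command(self, cmd: VoiceDeviceOpenCommand) -> None:
        # Second receiving subunit: open the voice device only for a command addressed to us.
        if cmd.user_id == self.user_id:
            self.mic.enabled = True

    def collect_speech(self) -> Optional[bytes]:
        # Third collecting subunit: speech is collected only through an opened device.
        return self.mic.collect() if self.mic.enabled else None
```

Passing `list.append` as `send` is a convenient way to inspect what the client would transmit without any networking.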

FIG. 13 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention. The apparatus is applied to a multimedia conference server and, as shown in FIG. 13, includes an obtaining unit 210 and a first sending unit 220.

The obtaining unit 210 is configured to obtain the speaking voice information and the speaking text information sent by the client.

The first sending unit 220 is configured to send the speaking voice information and the speaking text information to the client corresponding to the other participant, so that the client corresponding to the other participant displays the speaking voice information and the speaking text. Information; wherein the other participant is a participant other than the participant who sent the speaking voice information and the speaking text information among the participants participating in the multimedia conference.

The multimedia conference server sends the received speech voice information and speech text information to the clients of the other participants in the multimedia conference. Those clients present the received speech voice information and speech text information, which helps the participants quickly understand what the speaker said.
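As a rough illustration of the forwarding rule, the following Python sketch computes the recipients of one speech payload. `SpeechPayload` and `forward` are hypothetical names, and participants are assumed to be identified by user ID strings. Actual network delivery is out of scope.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeechPayload:
    sender_id: str  # user ID of the participant who produced the speech
    voice: bytes    # speech voice information
    text: str       # speech text information


def forward(payload: SpeechPayload, participants: List[str]) -> Dict[str, SpeechPayload]:
    """Map each recipient's user ID to the payload it should receive.

    Every participant except the sender gets the same voice and text pair.
    """
    return {uid: payload for uid in participants if uid != payload.sender_id}
```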

In the apparatus for implementing a multimedia conference applied to the multimedia conference server in this embodiment, the client obtains the speech voice information of the local participant, converts it into speech text information, and sends both to the multimedia conference server. The multimedia conference server then forwards the speech voice information and speech text information to the clients of the other participants in the multimedia conference, which display them. With the apparatus provided by the present invention, a participant can both hear the speaker's speech voice information and read the corresponding speech text information. Combining the two lets the participant accurately understand what the speaker said, which improves the communication effect of the multimedia conference.

In one application scenario, such as a discussion session, all participants are allowed to speak. However, if the multimedia conference server forwarded the voice information and text information of every participant to the other participants, a large amount of conference-irrelevant speech would be converted into text and displayed, distracting the participants. For this scenario, the participants with the largest voice energy can be designated as speakers. Only the speakers' speech voice information is converted into speech text information, and the speech of the other participants, whose voice energy is lower, is ignored.

FIG. 14 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention. This embodiment applies to scenarios in which there are many participants and all of them can speak. Building on the embodiment shown in FIG. 13, the apparatus may further include a detecting unit 230, a determining unit 240, and a second sending unit 250.

The detecting unit 230 is configured to detect energy of the voice information sent by the client.

The multimedia conference server receives the voice information that each participant's client captures from that participant, and it detects the energy of each piece of received voice information.

The determining unit 240 is configured to rank the participants by energy from largest to smallest and determine the first preset number of participants as speakers.

The multimedia conference server detects the energy of the voice information sent by the participants in the multimedia conference, sorts the participants by energy from largest to smallest, and determines the first preset number of participants as speakers. For example, if the preset number is one, the participant whose voice information has the highest energy is the speaker. If the preset number is two, the two participants whose voice information has the highest energy are the speakers.
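The ranking performed by the determining unit 240 can be sketched as follows. The patent does not say how energy is measured, so this sketch assumes the mean of squared PCM sample values for each participant over the same time window. `frame_energy`, `select_speakers`, and `preset_number` are illustrative names only.

```python
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    # Assumed energy measure: mean of squared sample values (0.0 for an empty window).
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_speakers(frames: Dict[str, Sequence[float]], preset_number: int) -> List[str]:
    """Rank participants by voice energy, largest first, and keep the first preset_number."""
    ranked = sorted(frames, key=lambda uid: frame_energy(frames[uid]), reverse=True)
    return ranked[:preset_number]
```

With `preset_number = 1`, this reproduces the first example above: only the participant with the highest energy becomes a speaker.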

The second sending unit 250 is configured to send a speech notification message to the speaker's client, where the speech notification message carries the user identity information (ID) of the speaker, so that the speaker's client obtains the speaker's speech voice information and converts it into speech text information.

The multimedia conference server may broadcast the speech notification message to all clients in the multimedia conference, and each client uses the user ID in the message to determine whether its participant is a speaker. Alternatively, the server may send the speech notification message one-to-one to the client of the participant identified by the user ID, and that client uses the user ID to determine whether its participant is a speaker.

The participant's client receives the speech notification message from the multimedia conference server. Because the speech notification message contains a user ID, the client can compare that ID with its own user ID to determine whether its participant is a speaker.
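The client-side check can be written as a small predicate. This is a hedged sketch. `SpeechNotification`, `is_local_speaker`, and `maybe_transcribe` are hypothetical. The notification is assumed to carry a set of speaker IDs: a broadcast may name several speakers when the preset number exceeds one, while a one-to-one message names only the recipient. `recognize` stands in for whatever speech recognition engine the client integrates.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class SpeechNotification:
    speaker_ids: FrozenSet[str]  # user IDs of the speakers chosen by the server


def is_local_speaker(note: SpeechNotification, local_user_id: str) -> bool:
    # Compare the IDs in the notification with this client's own user ID.
    return local_user_id in note.speaker_ids


def maybe_transcribe(
    note: SpeechNotification,
    local_user_id: str,
    voice: bytes,
    recognize: Callable[[bytes], str],
) -> Optional[str]:
    # Only a client whose participant is a speaker runs speech recognition.
    if is_local_speaker(note, local_user_id):
        return recognize(voice)
    return None
```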

In the apparatus for implementing a multimedia conference provided in this embodiment, the multimedia conference server detects the energy of the voice information sent by each participant and, in descending order of energy, determines the first preset number of participants as speakers. In other words, only the speech of the preset number of participants with the largest energy is converted into text. This keeps the many conference-irrelevant utterances picked up by numerous clients from being converted into text and displayed to the participants, where they would be distracting.

FIG. 15 is a schematic structural diagram of another apparatus for implementing a multimedia conference according to an embodiment of the present invention. The apparatus is applied to the scenario in which the moderator designates the speaker. Based on the embodiment shown in FIG., the apparatus may further include a first receiving unit 260, a third sending unit 270, a second receiving unit 280, and a fourth sending unit 290.

The first receiving unit 260 is configured to receive a floor request message sent by the client, where the floor request message carries a user ID of the participant corresponding to the client.

The third sending unit 270 is configured to send the floor request message to the moderator's client, so that the moderator's client determines, according to the floor request message, whether the participant who sent it has the speaking right.

The second receiving unit 280 is configured to receive a speech response message sent by the client corresponding to the moderator.

After receiving the floor request message, the moderator's client determines, according to the user ID carried in the floor request message, whether the participant is allowed to speak. If so, the moderator's client generates a speech response message, which the multimedia conference server then receives. The speech response message may also carry the participant's user ID so that the multimedia conference server can identify the participant.

The moderator's client may decide whether to allow the participant to speak based on a preset identity attribute of the participant. For example, when the multimedia conference is set up, the moderator may decide from each participant's role in the conference whether that participant may speak; a presenter of the conference, for instance, is allowed to speak.
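One way a moderator's client might apply a preset identity attribute is shown below. The role names in `ROLES_GRANTED_FLOOR` and the `decide_floor` function are assumptions made for illustration; the patent leaves the decision rule to the moderator. Returning `None` models the case in which no speech response message is sent.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

# Hypothetical policy: preset roles whose floor requests are granted.
ROLES_GRANTED_FLOOR: FrozenSet[str] = frozenset({"presenter", "guest_speaker"})


@dataclass
class SpeechResponse:
    user_id: str  # echoes the requester's ID so the server can identify the participant


def decide_floor(requester_id: str, roles: Dict[str, str]) -> Optional[SpeechResponse]:
    """Return a speech response if the requester's preset role permits speaking."""
    role = roles.get(requester_id)  # None for participants with no preset role
    if role in ROLES_GRANTED_FLOOR:
        return SpeechResponse(requester_id)
    return None
```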

The fourth sending unit 290 is configured to send a voice device open command to the client of the participant who has the speaking right. The speech response message is generated when the moderator's client determines that the participant who sent the floor request message has the speaking right.

In the apparatus for implementing a multimedia conference provided in this embodiment, when a participant other than the moderator or the presenter needs to speak, the multimedia conference server forwards that participant's floor request message to the moderator's client, and the moderator decides from the floor request message whether to let the participant speak. If the participant is allowed to speak, the multimedia conference server receives the speech response message sent by the moderator's client and generates a voice device open command from it, which turns on the participant's voice device. Once enabled, the voice device captures the participant's speech voice information, and the participant's client converts it into speech text information. This approach suits formal or high-level conferences and broadens the range of scenarios in which the multimedia conference method can be used.
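The relay performed by the first receiving unit 260 through the fourth sending unit 290 can be sketched as a small server object. `FloorControlServer`, its `outbox` of `(recipient, message_type, user_id)` tuples, and the message-type strings are hypothetical. The check that a speech response matches a pending floor request is an added safeguard assumed for this sketch, not something the patent states.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class FloorControlServer:
    moderator_id: str
    # Messages the server would send: (recipient user ID, message type, participant user ID).
    outbox: List[Tuple[str, str, str]] = field(default_factory=list)
    pending: Set[str] = field(default_factory=set)  # floor requests awaiting a decision

    def on_floor_request(self, user_id: str) -> None:
        # First receiving unit + third sending unit: relay the request to the moderator.
        self.pending.add(user_id)
        self.outbox.append((self.moderator_id, "floor_request", user_id))

    def on_speech_response(self, user_id: str) -> bool:
        # Second receiving unit + fourth sending unit: turn the moderator's approval
        # into a voice device open command for the approved participant.
        if user_id not in self.pending:
            return False  # assumed safeguard: ignore responses with no matching request
        self.pending.discard(user_id)
        self.outbox.append((user_id, "voice_device_open", user_id))
        return True
```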

Corresponding to the method embodiment for implementing the multimedia conference shown in FIG. 6 to FIG. 7 above, the present invention further provides a corresponding device embodiment.

FIG. 16 is a schematic structural diagram of an apparatus for implementing a multimedia conference applied to a client according to an embodiment of the present invention. The apparatus includes: an obtaining unit 310 and a sending unit 320.

The obtaining unit 310 is configured to obtain the speaking voice information of the local participant.

The sending unit 320 is configured to send the speech voice information to the multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference.

In the apparatus for implementing a multimedia conference provided in this embodiment, the participant's client obtains the speech voice information and sends it to the multimedia conference server. The server converts it into speech text information and then sends the speech voice information together with the corresponding speech text information to the clients of the other participants in the multimedia conference. The participants can therefore both hear the speaker's speech and see the corresponding text. This helps them understand the speaker accurately and improves the communication effect of the multimedia conference. Because the multimedia conference server performs the conversion, no speech recognition engine needs to be integrated into each client, which reduces the production cost of the clients.

FIG. 17 is a schematic structural diagram of an apparatus for implementing a multimedia conference applied to a multimedia conference server according to an embodiment of the present invention. The apparatus includes: an obtaining unit 410, a converting unit 420, and a sending unit 430.

The obtaining unit 410 is configured to obtain the speaking voice information sent by the client.

The converting unit 420 is configured to convert the spoken voice information into speech text information.

In an embodiment of the present invention, the multimedia conference server uses the energy of the participants' voice information to determine the preset number of participants with the largest energy as speakers, and converts the received speech voice information of those speakers into speech text information. The converting unit 420 may include a detecting subunit and a converting subunit.

The detecting subunit is configured to detect the energy of the voice information sent by the clients and, in descending order of energy, determine the first preset number of participants as speakers. The converting subunit is configured to use a speech recognition engine to convert the speech voice information sent by the determined speakers into speech text information.

The sending unit 430 is configured to send the speaking voice information and the speaking text information to a client corresponding to another participant, so that the client corresponding to the other participant displays the speaking voice information and the speaking text. information.

In the apparatus for implementing a multimedia conference provided in this embodiment, the multimedia conference server detects the energy of the voice information sent by each participant and, in descending order of energy, determines the first preset number of participants as speakers. The server converts only the speech of the determined speakers into text. This avoids converting a large amount of conference-irrelevant speech into text and displaying it to the participants, where it would distract them.
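A minimal sketch of the detecting and converting subunits of the converting unit 420 follows. It assumes energy is measured as the mean of squared sample values over a common time window, and a pluggable `recognize` callable replaces a real speech recognition engine. `ServerTranscriber`, `update_speakers`, and `convert` are illustrative names, not an interface defined in the patent.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set


def mean_square(samples: Sequence[float]) -> float:
    # Assumed energy measure; 0.0 for an empty window.
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


class ServerTranscriber:
    def __init__(self, preset_number: int, recognize: Callable[[Sequence[float]], str]) -> None:
        self.preset_number = preset_number
        self.recognize = recognize  # stand-in for a real speech recognition engine
        self.speakers: Set[str] = set()

    def update_speakers(self, windows: Dict[str, Sequence[float]]) -> List[str]:
        # Detecting subunit: rank participants by energy, largest first, keep the top N.
        ranked = sorted(windows, key=lambda uid: mean_square(windows[uid]), reverse=True)
        chosen = ranked[: self.preset_number]
        self.speakers = set(chosen)
        return chosen

    def convert(self, sender_id: str, samples: Sequence[float]) -> Optional[str]:
        # Converting subunit: only speech from a determined speaker is transcribed.
        if sender_id in self.speakers:
            return self.recognize(samples)
        return None
```

Keeping the recognizer behind a callable mirrors the point made for FIG. 16 and FIG. 17: the engine lives once on the server rather than in every client.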

Corresponding to the apparatus for implementing a multimedia conference applied to the client shown in FIG. 8 to FIG. 12, an embodiment of the present invention further provides a client for implementing a multimedia conference. Referring to FIG. 18, the client includes a processor 1411, a transmitter 1412, and a memory 1413.

The memory 1413 stores operation instructions executable by the processor 1411, and the processor 1411 reads the operation instructions in the memory 1413 to implement the following functions: obtaining the speech voice information of the local participant, and converting the speech voice information into speech text information.

In an embodiment of the present invention, the participant's audio signal may be collected by a voice device, processed accordingly, and then provided to the processor 1411. For example, the voice device may be a microphone (MIC).

In an embodiment of the present invention, the processor 1411 is specifically configured to: determine whether the local participant has the speaking right; if the local participant has the speaking right, collect the speaking voice information of the local participant.

The transmitter 1412 is configured to send the speech voice information and the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference, where the other participants are all participants in the multimedia conference other than the speaker.

In an embodiment of the present invention, the multimedia conference server uses the energy of the participants' voice information to determine the preset number of participants with the largest energy as speakers, and the client then converts the speaker's speech voice information into speech text information. In this embodiment, the client for implementing the multimedia conference may further include a receiver.

The receiver is configured to receive a speech notification message sent by the multimedia conference server. The speech notification message carries the user identity information (ID) of the speaker. The multimedia conference server determines the speaker from the energy of the voice information sent by the participants in the multimedia conference: the speaker is one of the first preset number of participants when they are ranked in descending order of energy.

The processor 1411 is further configured to determine whether the user ID carried in the speech notification message is the same as the user ID of the local participant. If it is, the processor determines that the local participant is a speaker and then obtains the speech voice information of the local participant.

In still another embodiment of the present invention, only the moderator and the presenter can speak, other participants cannot speak, the voice devices of other participants are turned off, and the participants themselves cannot turn on the voice device. When a participant needs to speak, the participant can request the moderator to turn on the participant's voice device.

The transmitter 1412 is further configured to send a floor request message to the multimedia conference server, where the floor request message carries the user ID of the local participant, so that the multimedia conference server forwards the floor request message to the moderator.

The receiver is further configured to receive a voice device open command sent by the multimedia conference server and provide it to the voice device, so that the voice device collects the speech voice information of the local participant. The voice device open command is generated by the multimedia conference server after it receives the speech response message that the moderator's client returns in response to the floor request message.

In another embodiment of the present invention, the client for implementing the multimedia conference may further include a display configured to display the speech text information. The memory is further configured to store the speech text information so that meeting minutes can be generated from it.
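Generating minutes from the stored speech text information could be as simple as ordering the entries by time. The `TextEntry` record and the `HH:MM:SS speaker: text` line format are assumptions for this sketch; the patent does not define a minutes format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TextEntry:
    timestamp: datetime  # when the speech text information was produced
    speaker: str         # display name or user ID of the speaker
    text: str            # speech text information


def build_minutes(entries: List[TextEntry]) -> str:
    # Sort the stored entries chronologically and render one line per utterance.
    ordered = sorted(entries, key=lambda e: e.timestamp)
    return "\n".join(f"{e.timestamp:%H:%M:%S} {e.speaker}: {e.text}" for e in ordered)
```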

In the client provided in this embodiment, the client obtains the speech voice information of the local participant and converts it into speech text information. It then sends the speech voice information and the speech text information to the multimedia conference server, which forwards them to the clients of the other participants in the multimedia conference. Those clients display the received speech voice information and speech text information. With the method provided by the present invention, a participant can both hear the speaker's speech voice information and read the corresponding speech text information. Combining the two lets the participant accurately understand what the speaker said, which improves the communication effect of the multimedia conference.

An embodiment of the present invention further provides a multimedia conference server corresponding to the apparatus for implementing a multimedia conference applied to the multimedia conference server shown in FIG. 13 to FIG. 15. Referring to FIG. 19, the multimedia conference server includes a receiver 1511 and a transmitter 1512.

The receiver 1511 is configured to obtain speech voice information and speech text information sent by the client.

The transmitter 1512 is configured to send the speech voice information and the speech text information to the clients of the other participants, so that those clients display the speech voice information and the speech text information. The other participants are the participants in the multimedia conference other than the one who sent the speech voice information and speech text information.

In a specific embodiment of the present invention, as shown in FIG. 20, the multimedia conference server further includes a processor 1513.

The receiver 1511 is further configured to obtain the energy of the voice information sent by the clients.

The processor 1513 is configured to determine, in descending order of voice information energy, the first preset number of participants as speakers.

The transmitter 1512 is specifically configured to send a speech notification message to the participants in the multimedia conference, where the speech notification message carries the user ID of the speaker, so that the speaker's client obtains the speaker's speech voice information.

In another embodiment of the present invention, the receiver 1511 is further configured to receive a floor request message sent by the client, where the floor request message carries a user ID of the participant corresponding to the client.

The transmitter 1512 is further configured to send the floor request message to the moderator's client, so that the moderator's client determines, according to the floor request message, whether the participant who sent it has the speaking right.

The receiver 1511 is further configured to receive a speech response message sent by the moderator's client, after which a voice device open command is sent to the client of the participant who has the speaking right. The speech response message is generated when the moderator's client determines that the participant who sent the floor request message has the speaking right.

In the multimedia conference server provided in this embodiment, the client obtains the speech voice information of the local participant, converts it into speech text information, and sends both to the multimedia conference server. The multimedia conference server then forwards the speech voice information and speech text information to the clients of the other participants in the multimedia conference, which display them. With the multimedia conference server provided by the present invention, a participant can both hear the speaker's speech voice information and read the corresponding speech text information. Combining the two lets the participant accurately understand what the speaker said, which improves the communication effect of the multimedia conference.

The present invention also provides a multimedia conference system, including the client shown in FIG. 18 and the multimedia conference server shown in FIGS. 19-20.

In an embodiment of the present invention, the multimedia conference server is further configured to detect the energy of the voice information sent by the clients in the multimedia conference, determine the first preset number of participants in descending order of energy as speakers, and send a speech notification message carrying the speaker's user identity information (ID) to the speakers' clients.

Corresponding to the apparatus for implementing a multimedia conference applied to a client shown in FIG. 16, the present invention further provides a client for implementing a multimedia conference. As shown in FIG. 21, the client includes a processor 1610 and a transmitter 1620.

The processor 1610 is configured to obtain the speaking voice information of the local participant.

The transmitter 1620 is configured to send the speech voice information to the multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to the clients of the other participants in the multimedia conference.

In the client for implementing a multimedia conference provided in this embodiment, the participant's client obtains the speech voice information and sends it to the multimedia conference server. The server converts it into speech text information and then sends the speech voice information together with the corresponding speech text information to the clients of the other participants in the multimedia conference. The participants can therefore both hear the speaker's speech and see the corresponding text. This helps them understand the speaker accurately and improves the communication effect of the multimedia conference. Because the multimedia conference server performs the conversion, no speech recognition engine needs to be integrated into each client, which reduces the production cost of the clients.

The present invention further provides a multimedia conference server, as shown in FIG. 22, which includes a processor 1710 and a transmitter 1720.

The processor 1710 is configured to obtain the speech information sent by the client, and convert the speech information into speech text information.

The transmitter 1720 is configured to send the speech voice information and the speech text information to the clients of the other participants, so that those clients display the speech voice information and the speech text information.

The present invention also provides another multimedia conference system, including the client shown in FIG. 21 and the multimedia conference server shown in FIG. 22.

The multimedia conference server is configured to convert the speech voice information into speech text information, and send the speech speech information and the speech text information corresponding to the speech speech information to a client corresponding to another participant; The other participant is a participant other than the participant who sent the speaking voice information among the participants participating in the multimedia conference.

In an embodiment of the present invention, the multimedia conference server is further configured to detect the energy of the voice information sent by the client participating in the multimedia conference, and determine the preset number according to the order of the energy from large to small. The participant is a speaker, and when the received speech voice information comes from the determined speaker, the speech voice information is converted into speech text information.

From the description of the above method embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present invention, in essence or in the part that contributes to the prior art, may be embodied as a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The embodiments in this specification are described progressively. Identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are substantially similar to the method embodiments and are therefore described briefly; for related details, refer to the description of the method embodiments. The device and system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the embodiments. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

The foregoing descriptions are merely specific embodiments of the present invention. It should be noted that those skilled in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the scope of protection of the present invention.

Claims

1. A method for implementing a multimedia conference, comprising:

acquiring, by a client, speech voice information of a local participant, and sending the speech voice information to a multimedia conference server;

converting, by the client, the speech voice information into speech text information; and

sending, by the client, the speech text information to the multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to clients of other participants of the multimedia conference;

wherein the other participants are the participants of the multimedia conference other than the local participant.
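The relay step above amounts to a fan-out that excludes the sender. The following is a minimal sketch under stated assumptions: the `SpeechPacket` and `ConferenceServer` names are hypothetical, and client connections are modeled as in-memory inboxes rather than real network sessions. It only illustrates that the speech voice information and speech text information travel together and reach every participant except the one who spoke.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpeechPacket:
    sender_id: str   # user ID of the participant who spoke
    voice: bytes     # speech voice information (encoded audio)
    text: str        # speech text information produced by the sender's client


@dataclass
class ConferenceServer:
    # Hypothetical in-memory stand-in for client connections: user ID -> received packets
    inboxes: Dict[str, List[SpeechPacket]] = field(default_factory=dict)

    def join(self, user_id: str) -> None:
        self.inboxes[user_id] = []

    def relay(self, packet: SpeechPacket) -> List[str]:
        """Forward voice and text to every participant except the sender."""
        recipients = [uid for uid in self.inboxes if uid != packet.sender_id]
        for uid in recipients:
            self.inboxes[uid].append(packet)
        return recipients


server = ConferenceServer()
for uid in ("alice", "bob", "carol"):
    server.join(uid)
recipients = server.relay(SpeechPacket("alice", b"\x00\x01", "Let's begin."))
print(recipients)  # ['bob', 'carol']
```

Keeping voice and text in one packet is a design choice of this sketch; a real system could equally send them as separate streams correlated by sender ID.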
2. The method according to claim 1, wherein the converting, by the client, the speech voice information into speech text information comprises:

receiving a speech notification message sent by the multimedia conference server, wherein the speech notification message carries a user identity (ID) of a speaker, and the speaker is one of a preset number of participants that the multimedia conference server selects, in descending order of energy, according to the energy of the voice information sent by the participants of the multimedia conference;

determining whether the user ID carried in the speech notification message is the same as the user ID of the local participant; and

if the user ID carried in the speech notification message is the same as the user ID of the local participant, converting, by a speech recognition engine, the collected speech voice information into speech text information.
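The client-side check above gates the speech recognition engine on a user-ID comparison. A minimal sketch, assuming the notification has already been parsed into a plain user-ID string; `handle_speech_notification` and `fake_recognizer` are hypothetical names, and the recognizer is a stand-in for whatever speech recognition engine the client actually uses.

```python
from typing import Callable, Optional


def handle_speech_notification(
    notified_user_id: str,
    local_user_id: str,
    collected_voice: bytes,
    recognize: Callable[[bytes], str],
) -> Optional[str]:
    """Run speech recognition only when the notification names the local participant."""
    if notified_user_id != local_user_id:
        # Another participant was selected as speaker: skip recognition entirely
        return None
    return recognize(collected_voice)


# Stand-in recognizer for illustration only: treats the "audio" bytes as UTF-8 text
def fake_recognizer(audio: bytes) -> str:
    return audio.decode("utf-8")


print(handle_speech_notification("u42", "u42", b"good morning", fake_recognizer))  # good morning
print(handle_speech_notification("u7", "u42", b"good morning", fake_recognizer))   # None
```

Passing the recognizer in as a callable keeps the gating logic independent of any particular speech recognition engine.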
3. The method according to claim 1, wherein the acquiring, by the client, speech voice information of the local participant comprises:

determining, by the client, whether the local participant has a speaking right; and

if the local participant has the speaking right, collecting the speech voice information of the local participant by using a voice device.
4. The method according to claim 1, wherein the acquiring, by the client, speech voice information of the local participant comprises:

sending, by the client, a speech request message to the multimedia conference server, wherein the speech request message carries the user ID of the local participant, so that the multimedia conference server sends the speech request message to a client corresponding to a host; and

when the client receives a voice device open command sent by the multimedia conference server, collecting the speech voice information of the local participant by using a voice device, wherein the voice device open command is generated by the multimedia conference server according to a speech response message returned by the client corresponding to the host in response to the speech request message.
5. A method for implementing a multimedia conference, comprising:

acquiring, by a multimedia conference server, speech voice information sent by a client and speech text information corresponding to the speech voice information, wherein the speech text information is obtained by the client by converting the speech voice information by using a speech recognition engine; and

sending, by the multimedia conference server, the speech voice information and the speech text information to clients corresponding to other participants, so that the clients corresponding to the other participants display the speech voice information and the speech text information;

wherein the other participants are the participants of the multimedia conference other than the participant who sends the speech voice information and the speech text information.
6. The method according to claim 5, further comprising:

detecting, by the multimedia conference server, energy of the voice information sent by the clients;

determining, by the multimedia conference server, a preset number of participants as speakers in descending order of the energy; and

sending, by the multimedia conference server, a speech notification message to the client corresponding to each speaker, wherein the speech notification message carries the user ID of the speaker, so that the client corresponding to the speaker acquires the speech voice information of the speaker and converts the speech voice information into speech text information.
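The speaker selection above ranks participants by voice energy and keeps a preset number of them. The claims do not fix an energy measure, so this minimal sketch assumes mean-square energy over one frame of integer PCM samples per participant; the function names and the dictionary shape of the notification are hypothetical.

```python
from typing import Dict, List, Sequence


def frame_energy(samples: Sequence[int]) -> float:
    """Mean-square energy of one frame of PCM samples (an assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_speakers(frames: Dict[str, Sequence[int]], preset_number: int) -> List[str]:
    """Return up to preset_number user IDs in descending order of frame energy."""
    ranked = sorted(frames, key=lambda uid: frame_energy(frames[uid]), reverse=True)
    return ranked[:preset_number]


def speech_notifications(speakers: List[str]) -> List[dict]:
    """Build one speech notification per selected speaker, carrying that speaker's user ID."""
    return [{"type": "speech_notification", "user_id": uid} for uid in speakers]


frames = {
    "alice": [100, -100, 100],
    "bob": [2000, -1800, 1900],
    "carol": [0, 0, 0],
}
speakers = select_speakers(frames, preset_number=2)
print(speakers)  # ['bob', 'alice']
print(speech_notifications(speakers))
```

In practice a server would likely smooth energy over several frames before ranking, to avoid the speaker set flickering on short noises; that refinement is outside what the claims describe.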
7. The method according to claim 5, further comprising:

receiving, by the multimedia conference server, a speech request message sent by a client, wherein the speech request message carries the user ID of the participant corresponding to the client;

sending, by the multimedia conference server, the speech request message to the client corresponding to a host, so that the client corresponding to the host determines, according to the speech request message, whether the participant who sends the speech request message has the speaking right; and

receiving, by the multimedia conference server, a speech response message sent by the client corresponding to the host, and sending, according to the speech response message, a voice device open command to the client corresponding to the participant having the speaking right, so that the speech voice information of the participant having the speaking right is collected;

wherein the speech response message is generated when the client corresponding to the host determines that the participant who sends the speech request message has the speaking right.
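The floor-request exchange above involves three parties: the requesting client, the server, and the host's client. A minimal sketch that records the messages as a trace, assuming the host's decision can be modeled as a predicate on the requester's user ID; `request_floor`, the `"server"` label, and the message-kind strings are hypothetical.

```python
from typing import Callable, List, Tuple

# Each entry is (sender, receiver, message kind)
Trace = List[Tuple[str, str, str]]


def request_floor(user_id: str, host_id: str, host_grants: Callable[[str], bool]) -> Trace:
    """Simulate one floor-request exchange and return the messages exchanged."""
    trace: Trace = [
        (user_id, "server", "speech_request"),  # carries the requester's user ID
        ("server", host_id, "speech_request"),  # server relays the request to the host's client
    ]
    if host_grants(user_id):
        # The host's client returns a speech response only when it grants the speaking right
        trace.append((host_id, "server", "speech_response"))
        # On that response, the server tells the requester's client to open its voice device
        trace.append(("server", user_id, "voice_device_open"))
    return trace


allowed = {"bob"}
for step in request_floor("bob", "host", lambda uid: uid in allowed):
    print(step)
```

Because a refusal produces no response message, a real client would probably need a timeout or an explicit denial message to stop waiting; the claims leave that case open.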
8. A method for implementing a multimedia conference, comprising:

acquiring, by a client, speech voice information of a local participant; and

sending, by the client, the speech voice information to a multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech voice information and the speech text information to clients of other participants of the multimedia conference;

wherein the other participants are the participants of the multimedia conference other than the local participant.
9. A method for implementing a multimedia conference, comprising:

acquiring, by a multimedia conference server, speech voice information sent by a client;

converting, by the multimedia conference server, the speech voice information into speech text information; and

sending, by the multimedia conference server, the speech voice information and the speech text information to clients corresponding to other participants, so that the clients corresponding to the other participants display the speech voice information and the speech text information;

wherein the other participants are the participants of the multimedia conference other than the participant who sends the speech voice information.
10. The method according to claim 9, wherein the converting, by the multimedia conference server, the speech voice information into speech text information comprises:

detecting, by the multimedia conference server, energy of the voice information sent by the clients, and determining a preset number of participants as speakers in descending order of the energy; and

converting, by using a speech recognition engine, the speech voice information sent by the clients corresponding to the determined speakers into speech text information.
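In the server-side variant above, the benefit of speaker selection is that the speech recognition engine runs only for the selected speakers rather than for every incoming stream. A minimal sketch, assuming mean-square frame energy as the (unspecified) energy measure; `transcribe_selected_speakers` and `stub_recognizer` are hypothetical, and the stub records its calls so the gating is visible.

```python
from typing import Callable, Dict, List, Sequence


def transcribe_selected_speakers(
    frames: Dict[str, Sequence[int]],
    preset_number: int,
    recognize: Callable[[Sequence[int]], str],
) -> Dict[str, str]:
    """Rank senders by mean-square frame energy; run recognition only for the top ones."""

    def energy(samples: Sequence[int]) -> float:
        return sum(s * s for s in samples) / len(samples) if samples else 0.0

    speakers = sorted(frames, key=lambda uid: energy(frames[uid]), reverse=True)
    return {uid: recognize(frames[uid]) for uid in speakers[:preset_number]}


# Stand-in recognizer that records each invocation, so the gating can be observed
transcribed: List[int] = []


def stub_recognizer(samples: Sequence[int]) -> str:
    transcribed.append(len(samples))
    return f"<transcript of {len(samples)} samples>"


frames = {"alice": [50, -50], "bob": [3000, -2800], "carol": [400, -420], "dave": [0, 0]}
texts = transcribe_selected_speakers(frames, preset_number=2, recognize=stub_recognizer)
print(sorted(texts))     # ['bob', 'carol']
print(len(transcribed))  # 2
```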
11. An apparatus for implementing a multimedia conference, applied to a client and comprising:

an obtaining unit, configured to obtain speech voice information of a local participant;

a converting unit, configured to convert the speech voice information into speech text information; and

a sending unit, configured to send the speech voice information and the speech text information to a multimedia conference server, so that the multimedia conference server sends the speech voice information and the speech text information to clients of other participants of the multimedia conference;

wherein the other participants are the participants of the multimedia conference other than the local participant.
12. The apparatus according to claim 11, wherein the obtaining unit comprises:

a first determining subunit, configured to determine whether the local participant has a speaking right; and

a first collecting subunit, configured to collect the speech voice information of the local participant by using a voice device when the first determining subunit determines that the local participant has the speaking right.
13. The apparatus according to claim 11, wherein the converting unit comprises:

a first receiving subunit, configured to receive a speech notification message sent by the multimedia conference server, wherein the speech notification message carries a user ID of a speaker, and the speaker is one of a preset number of participants that the multimedia conference server selects, in descending order of energy, according to the energy of the voice information sent by the participants of the multimedia conference;

a second determining subunit, configured to determine whether the user ID carried in the speech notification message is the same as the user ID of the local participant; and

a second collecting subunit, configured to collect the speech voice information of the local participant by using a voice device when the second determining subunit determines that the user ID carried in the speech notification message is the same as the user ID of the local participant.
14. The apparatus according to claim 11, wherein the obtaining unit specifically comprises:

a first sending subunit, configured to send a speech request message to the multimedia conference server, wherein the speech request message carries the user ID of the local participant, so that the multimedia conference server sends the speech request message to a client corresponding to a host;

a second receiving subunit, configured to receive a voice device open command sent by the multimedia conference server; and

a third collecting subunit, configured to collect the speech voice information of the local participant by using a voice device when the second receiving subunit receives the voice device open command, wherein the voice device open command is generated by the multimedia conference server after receiving a speech response message returned by the client corresponding to the host in response to the speech request message.
15. An apparatus for implementing a multimedia conference, applied to a multimedia conference server and comprising:

an acquiring unit, configured to acquire speech voice information sent by a client and speech text information corresponding to the speech voice information, wherein the speech text information is obtained by the client by converting the speech voice information by using a speech recognition engine; and

a first sending unit, configured to send the speech voice information and the speech text information to clients corresponding to other participants, so that the clients corresponding to the other participants display the speech voice information and the speech text information;

wherein the other participants are the participants of the multimedia conference other than the participant who sends the speech voice information and the speech text information.
16. The apparatus according to claim 15, further comprising:

a detecting unit, configured to detect energy of the voice information sent by the clients;

a determining unit, configured to determine a preset number of participants as speakers in descending order of the energy; and

a second sending unit, configured to send a speech notification message to the client corresponding to each speaker, wherein the speech notification message carries the user ID of the speaker, so that the client corresponding to the speaker acquires the speech voice information of the speaker and converts the speech voice information into speech text information.
17. The apparatus according to claim 15, further comprising:

a first receiving unit, configured to receive a speech request message sent by a client, wherein the speech request message carries the user ID of the participant corresponding to the client;

a third sending unit, configured to send the speech request message to the client corresponding to a host, so that the client corresponding to the host determines, according to the speech request message, whether the participant who sends the speech request message has the speaking right;

a second receiving unit, configured to receive a speech response message sent by the client corresponding to the host, wherein the speech response message is generated when the client corresponding to the host determines that the participant who sends the speech request message has the speaking right; and

a fourth sending unit, configured to send, according to the speech response message, a voice device open command to the client corresponding to the participant having the speaking right.
18. An apparatus for implementing a multimedia conference, applied to a client and comprising:

an obtaining unit, configured to obtain speech voice information of a local participant; and

a sending unit, configured to send the speech voice information to a multimedia conference server, so that the multimedia conference server converts the speech voice information into speech text information and sends the speech text information to clients of other participants of the multimedia conference;

wherein the other participants are the participants of the multimedia conference other than the local participant.
19. An apparatus for implementing a multimedia conference, applied to a multimedia conference server and comprising:

an obtaining unit, configured to obtain speech voice information sent by a client;

a converting unit, configured to convert the speech voice information into speech text information; and

a sending unit, configured to send the speech voice information and the speech text information to clients corresponding to other participants, so that the clients corresponding to the other participants display the speech voice information and the speech text information;

wherein the other participants are the participants of the multimedia conference other than the participant who sends the speech voice information.
20. The apparatus according to claim 19, wherein the converting unit comprises:

a detecting subunit, configured to detect energy of the voice information sent by the clients and to determine a preset number of participants as speakers in descending order of the energy; and

a conversion subunit, configured to convert, by using a speech recognition engine, the speech voice information sent by the determined speakers into speech text information.
21. A multimedia conference system, comprising a client and a multimedia conference server, wherein:

the client is configured to obtain speech voice information of a local participant and send the speech voice information to the multimedia conference server, and to convert the speech voice information into speech text information and send the speech text information to the multimedia conference server;

the multimedia conference server is configured to send the speech voice information and the speech text information to clients of other participants of the multimedia conference; and

the other participants are the participants of the multimedia conference other than the local participant.
22. The multimedia conference system according to claim 21, wherein:

the multimedia conference server is further configured to detect energy of the voice information sent by the clients participating in the multimedia conference, determine a preset number of participants as speakers in descending order of the energy, and send a speech notification message to the client corresponding to each speaker, wherein the speech notification message carries the user ID of the speaker; and

the client is configured to receive the speech notification message sent by the multimedia conference server and, upon determining according to the speech notification message that the local participant is a speaker, obtain the speech voice information of the local participant, send it to the multimedia conference server, convert the speech voice information into speech text information, and send the speech text information to the multimedia conference server.
23. A multimedia conference system, comprising a client and a multimedia conference server, wherein:

the client is configured to obtain speech voice information of a local participant and send it to the multimedia conference server;

the multimedia conference server is configured to convert the speech voice information into speech text information, and send the speech voice information and the corresponding speech text information to clients corresponding to other participants, the other participants being the participants of the multimedia conference other than the participant who sends the speech voice information; and

the clients corresponding to the other participants are further configured to display, to their users, the speech voice information and the speech text information sent by the multimedia conference server.
24. The multimedia conference system according to claim 23, wherein:

the multimedia conference server is further configured to detect energy of the voice information sent by the clients participating in the multimedia conference, determine a preset number of participants as speakers in descending order of the energy, and, when received speech voice information comes from a determined speaker, convert that speech voice information into speech text information.