CN107911646B

CN107911646B - Method and device for sharing conference and generating conference record

Info

Publication number: CN107911646B
Application number: CN201610875451.9A
Authority: CN
Inventors: 初敏; 鄢志杰; 陈一宁
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2016-09-30
Filing date: 2016-09-30
Publication date: 2020-09-18
Anticipated expiration: 2036-09-30
Also published as: CN112399133B; CN107911646A; CN112399133A

Abstract

The application discloses a method and a device for conference sharing and conference record generation. In the method, a cloud server automatically converts the voice information of participating users into corresponding text information in real time and generates a conference record from that text. Compared with the prior art, the conference record no longer has to be produced by a dedicated conference recorder: the cloud server takes over this work, which greatly reduces the recorder's burden. Moreover, because the conference record is generated from the participating users' own voice information (in other words, the cloud server records every sentence the participating users speak), the resulting record reflects what was actually said in the conference.

Description

Method and device for sharing conference and generating conference record

Technical Field

The application relates to the field of computer technology, and in particular to a method and a device for conference sharing and conference record generation.

Background

With the rapid development of computer and network technology, the way conferences are held has changed considerably. Participants no longer need to gather in a single conference room; they can meet across regions through new modes such as teleconferencing and videoconferencing, which greatly enriches how people hold meetings and makes meeting more convenient.

To take part in these new conference modes, participating users usually need terminal devices that support them. For example, if a conference is to be held by video, every participating user needs a terminal device with video playback and image capture functions. A user who was expected to attend but has no such terminal device cannot participate, which is inconvenient for that user.

Furthermore, in the prior art, both teleconferences and videoconferences often require a conference recorder who records the content of the conference, usually by taking notes by hand or typing. In practice, however, conferences often involve a large amount of content, and recording it manually places a heavy burden on the recorder. Moreover, what the recorder writes down is often not the actual content of the conference but the recorder's own understanding of it. Because of the recorder's subjective judgment, the recorded content may deviate from what was actually said, which causes inconvenience for anyone who later consults the conference record.

Disclosure of Invention

An embodiment of the application provides a conference sharing method, which solves the prior-art problem that a participating user cannot join a conference because the terminal device on the user's side lacks the functions required by the conference mode.

An embodiment of the application provides a conference sharing device, which solves the prior-art problem that a participating user cannot join a conference because the terminal device on the user's side lacks the functions required by the conference mode.

The embodiments of the application adopt the following technical solutions:

An embodiment of the application provides a conference sharing method, comprising the following steps:

collecting the voice information of each participating user;

processing each piece of voice information to obtain corresponding text information; and

sharing the text information with the other participating users.
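The three sharing steps can be pictured as a small pipeline. This is a minimal sketch, not the patented implementation: `Utterance`, `share_conference_text`, and the `transcribe` callback are hypothetical names, and the real speech-to-text step (a pre-trained recognition model in the text) is replaced by whatever function the caller supplies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Utterance:
    speaker: str   # account of the participating user who spoke (hypothetical field)
    audio: bytes   # voice information collected from that user's terminal


def share_conference_text(
    utterances: List[Utterance],
    participants: List[str],
    transcribe: Callable[[bytes], str],
) -> Dict[str, List[str]]:
    """Convert each utterance to text and deliver it to every other participant."""
    inbox: Dict[str, List[str]] = {p: [] for p in participants}
    for utt in utterances:                 # S101: voice information already collected
        text = transcribe(utt.audio)       # S102: voice information -> text information
        for p in participants:             # S103: share with the other participating users
            if p != utt.speaker:
                inbox[p].append(f"{utt.speaker}: {text}")
    return inbox
```

For a quick check, a stand-in transcriber such as `lambda audio: audio.decode("utf-8")` can be passed in; in a real deployment `transcribe` would wrap an actual speech recognizer.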

An embodiment of the application provides a conference sharing device, comprising:

a voice acquisition module, configured to acquire the voice information of each participating user;

a voice processing module, configured to process each piece of voice information to obtain corresponding text information; and

a text sharing module, configured to share the text information with the other participating users.

An embodiment of the application provides a method for generating a conference record, which solves the prior-art problems that recording conference content manually is burdensome for the conference recorder and that the recorded content may deviate from the actual conference content.

An embodiment of the application provides a device for generating a conference record, which solves the prior-art problems that recording conference content manually is burdensome for the conference recorder and that the recorded content may deviate from the actual conference content.

The embodiments of the application adopt the following technical solutions:

An embodiment of the application provides a method for generating a conference record, comprising the following steps:

determining each participating user who has accessed the conference;

for each participating user, collecting the voice information of that user and sending it to the other participating users;

converting the voice information collected from each participating user into corresponding text information; and

generating a conference record from the text information.

An embodiment of the application provides a device for generating a conference record, comprising:

a determining module, configured to determine each participating user who has accessed the conference;

a collecting and sending module, configured to collect the voice information of each participating user and send it to the other participating users;

a conversion module, configured to convert the voice information collected from each participating user into corresponding text information; and

a generating module, configured to generate a conference record from the text information.

At least one of the technical solutions adopted in the embodiments of the application achieves the following beneficial effects:

In the embodiments of the application, the cloud server converts the voice information of participating users into corresponding text information in real time and shares it with the other participating users. Even if some participating users have no terminal device that supports the specified conference mode, the cloud server can deliver the conference content to them as plain text, so they can still take part in the conference.

In addition, because the cloud server automatically converts the participating users' voice information into text in real time and generates the conference record from that text, the record no longer has to be produced by a dedicated conference recorder. The cloud server takes over this work, which greatly reduces the recorder's burden. Furthermore, because the record is derived from the participating users' own voice information (the cloud server captures every sentence they speak), it does not deviate from the actual content of the conference the way a manually written record can, which also benefits users who consult the record later.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

Fig. 1 shows a conference sharing process according to an embodiment of the present application;

Fig. 2 shows a process of generating a conference record according to an embodiment of the present application;

Fig. 3 is a schematic diagram of voice information collection and transmission according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a conference record according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a video conference with subtitles according to an embodiment of the present application;

Fig. 6 is a schematic diagram of another video conference with subtitles according to an embodiment of the present application;

Fig. 7 is a schematic diagram of a conference sharing device according to an embodiment of the present application;

Fig. 8 is a schematic diagram of a device for generating a conference record according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

Fig. 1 shows a conference sharing process according to an embodiment of the present application, which includes the following steps:

S101: Collect the voice information of each participating user.

In general, a user taking part in a teleconference or videoconference needs a terminal device that supports the conference in order to participate smoothly. In practice, however, the user may not have a terminal device that supports the specified conference mode. For example, the camera, microphone, or receiver of the user's tablet or smartphone may be broken, or the user may carry only an e-reader such as a Kindle, which can connect to the internet but cannot receive video or voice information. In such situations the user cannot take part in the specified conference.

For example, suppose a user needs to join a videoconference right now but has no computer, tablet, or other terminal device that supports videoconferencing. The user then cannot join on time, which is highly inconvenient.

To solve this problem, in the embodiments of the present application the cloud server converts the voice information of each participating user into corresponding text information and shares the resulting text with every participating user. Because the bar for receiving text is low (almost any terminal device can receive text information), a participating user without a terminal device that supports the designated conference mode can still receive the conference text shared by the cloud server on whatever terminal device the user has, and thus take part in the conference.

To receive the text shared by the cloud server, each participating user logs in to the cloud server from the terminal device the user is using and joins the designated conference. Correspondingly, so that every piece of text produced later can be shared with all participating users, the cloud server must first determine which participating users have accessed the conference, for example by the account each participating user used to join it.

In the embodiments of the present application, the cloud server collects the voice information that each participating user sends through the user's conference terminal, and later converts each participating user's voice information into corresponding text information.

It should be noted that in a videoconference, the cloud server can collect, in addition to each participating user's voice information, the video information each participating user sends through the user's terminal, and then forward that video information to the participating users, so that participating users whose terminals can play video can see the video sent by the other participating users.

S102: Process each piece of voice information to obtain corresponding text information.

To let a participating user whose terminal device supports only text reception take part in the conference, in this embodiment, after acquiring the voice information of each participating user, the cloud server processes each piece of voice information to obtain the corresponding text information. Specifically, the cloud server may input the voice information into a pre-trained semantic recognition model, which outputs the text information corresponding to that voice information.

S103: Share the text information with the other participating users.

During the conference, the cloud server collects the voice information each participating user sends through the user's conference terminal, converts it into corresponding text information in real time, and shares the text with the other participating users. Even participating users whose terminal devices cannot receive video or voice information can therefore take part by receiving the text forwarded by the cloud server. The conference here may be a discussion among several people or a lecture-style conference.

For example, in a videoconference where every participating user speaks (that is, a discussion), once the cloud server has determined the participating users, it collects in real time the voice information each of them sends through the user's terminal device during the conference, converts each piece into corresponding text information, and shares the text with every participating user of the videoconference. Participating users without a terminal device that supports videoconferencing can still join the discussion through a terminal device that can receive text, which is a great convenience for them.

As another example, several participating users join a videoconference in which one of them is the speaker and the others do not speak; that is, one participating user presents the conference content and the others only listen (similar to a lecture). In this case, the cloud server converts the speaker's voice information into text and shares it with the listening participating users.

Furthermore, after converting the collected voice information into text in real time, the cloud server can forward the text together with its corresponding voice information to the participating users synchronously. Participating users whose terminal devices can receive voice then get both the text of the conference content and the synchronized voice, and can follow the conference better by combining the two.

Moreover, current videoconferences usually transmit only the video picture (i.e., video information) and the voice information. Throughout the conference, participating users' speech is usually not accompanied by subtitles rendered into the video picture, so participating users may not fully understand the conference content.

In the embodiments of the application, since the cloud server can convert each participating user's voice information into corresponding text information, after it acquires the video and voice information of each participating user in the videoconference, it can convert the voice information into text and forward that text to the other participating users as subtitles for the acquired video. Participating users whose terminals can play video then see, through the cloud server, both the other participating users' video and the corresponding subtitles, which further improves their understanding of the conference content.

With this method, the cloud server converts the voice information of each participating user collected during the conference into corresponding text information and shares it with the other participating users. Even participating users without terminal devices that support a specified conference mode, such as a teleconference or videoconference, can take part using a terminal device that receives text. This greatly lowers the bar for participation and is a great convenience to every participating user.

It should be noted that in this embodiment, a participating user who joins the conference through a terminal device that supports text reception can not only receive, via the cloud server, the text corresponding to other participating users' voice information, but can also type what the user wants to say in the conference as text on the terminal and send it to the cloud server. The cloud server can share this text with the other participating users; it can also convert the text into corresponding voice information by speech synthesis (producing synthetic speech) and send the voice and text to the other participating users synchronously.

The cloud server can share the text converted from the voice information (i.e., the voice information of each participating user collected during the conference) with each participating user in real time. It can also, after the conference ends, arrange the converted text in chronological order and share the arranged text with each participating user, so that each participating user can review the conference in more depth.

It should further be noted that, in the embodiments of the present application, the cloud server can share the text converted from the voice information collected during the conference not only with the participating users but also with users who did not take part. Specifically, after determining the participating users who have accessed the conference, the cloud server can send each of them an option asking whether to allow the conference content to be disclosed. If, based on the options selected, the cloud server determines that more than half of the participating users allow disclosure, it converts the voice information collected during the conference into corresponding text and shares that text with a chat group the participating users have in common. Group members who did not attend can then learn the conference content from the shared text, which fosters further discussion of the topics raised in the conference.
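The disclosure rule above (share only if more than half of the participating users allow it) is a strict-majority test. A minimal sketch, assuming each user's choice is stored as a boolean keyed by a hypothetical user identifier, and assuming that a participating user who has not answered counts as not allowing disclosure:

```python
from typing import Dict, List


def may_disclose(participants: List[str], choices: Dict[str, bool]) -> bool:
    """True only if strictly more than half of the participants allowed disclosure.

    A participant missing from `choices` (no answer) is counted as not allowing.
    """
    allowed = sum(1 for p in participants if choices.get(p, False))
    # Integer comparison avoids float rounding: allowed > n / 2  <=>  2 * allowed > n
    return 2 * allowed > len(participants)
```

With four participants, two approvals are exactly half and therefore not enough; three are required.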

In the prior art, the conference record produced after a teleconference or videoconference is usually written by a designated conference recorder, that is, it is done manually. Participating users speak at different speeds, and some speak too quickly during the conference, which makes recording very difficult for the recorder. Moreover, because the record is made manually and is subject to the recorder's subjective judgment, its content often differs from what was actually said, which causes inconvenience for anyone who later consults it.

To avoid this, in the present method the cloud server converts the collected voice information of each participating user into corresponding text information and generates the conference record from that text, which greatly simplifies the conference recorder's work. The procedure is described in detail below.

Fig. 2 shows a process of generating a conference record according to an embodiment of the present application, which includes the following steps:

S201: Determine each participating user who has accessed the conference.

In practice, a user joining a conference by teleconference, videoconference, or similar means usually logs in to a conference system through a terminal device with call functions, and during the conference transmits the user's voice, video, and other information to the other participating users through that terminal and the conference system. Because the embodiments of the present application generate the conference record through the cloud server, a participating user logs in to the cloud server through a terminal when joining the conference, and the cloud server can thereby determine each participating user who has accessed the conference. To ensure that the final conference record is accurate, the cloud server can also determine the user identifier of each participating user, so that when it later collects voice information it knows which participating user each piece came from.
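S201 and the user-identifier bookkeeping can be pictured as a map from each logged-in terminal connection to a user identifier. In this sketch, `ConferenceSession`, `connection_id`, and the method names are hypothetical; the text only says that users log in to the cloud server and that the server determines their identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ConferenceSession:
    """Hypothetical bookkeeping for one conference on the cloud server."""
    conference_id: str
    users_by_connection: Dict[str, str] = field(default_factory=dict)

    def join(self, connection_id: str, user_id: str) -> None:
        # S201: a user logs in from a terminal and accesses the conference
        self.users_by_connection[connection_id] = user_id

    def participants(self) -> Set[str]:
        # Every participating user who has accessed the conference
        return set(self.users_by_connection.values())

    def attribute(self, connection_id: str) -> Optional[str]:
        # Which participating user a piece of voice information arrived from
        return self.users_by_connection.get(connection_id)
```

Keying by connection rather than by user lets one user join from several terminals while every audio chunk is still attributed to the right identifier.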

S202: For each participating user, collect the voice information of that user and send it to the other participating users.

So that the final conference record does not deviate from the actual content of the conference, the cloud server generates the record from the participating users' voice information. Therefore, in the embodiments of the present application, after determining each participating user who has accessed the conference, the cloud server collects each participating user's voice information and sends it to the other participating users, as shown in Fig. 3.

Fig. 3 is a schematic diagram of voice information collection and transmission provided in the embodiment of the present application.

In Fig. 3, during the conference each participating user sends the user's voice information to the cloud server through a terminal device. The cloud server collects the voice information transmitted by every terminal device, both to generate the conference record later and to forward the collected voice information to the other participating users. To ensure call quality throughout the conference, the cloud server can apply some noise reduction to the collected voice information and send the processed voice to the other participating users.
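The text leaves the noise reduction unspecified. As a stand-in only, the sketch below uses a simple energy-based noise gate, a much cruder technique than the spectral methods production systems typically use: it splits the samples into fixed-length frames and silences any frame whose RMS energy falls below a threshold. `noise_gate`, the frame length, and the threshold are all assumptions.

```python
import math
from typing import List


def noise_gate(samples: List[float], frame_len: int = 160, threshold: float = 0.01) -> List[float]:
    """Zero out frames whose RMS energy is below `threshold`.

    frame_len = 160 samples is 10 ms at an assumed 16 kHz sampling rate.
    """
    out: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]   # last frame may be shorter
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        out.extend(frame if rms >= threshold else [0.0] * len(frame))
    return out
```

A gate like this removes low-level background hiss between utterances but does nothing for noise that overlaps speech.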

It should be noted that in this embodiment the cloud server can store the collected voice information for later verification of the generated conference record. For example, after generating the record, the cloud server can play back the stored voice information corresponding to the record in order of collection time, convert each played piece of voice information into text again, and compare that text with the generated record to verify it. The cloud server can of course verify the record several times against the stored voice information to further ensure its accuracy.
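The verification step compares freshly re-transcribed text with the stored record, but the text does not say how the comparison is scored. One common choice, assumed here, is word error rate (WER): the word-level edit distance between the two texts divided by the number of reference words. `word_error_rate` is a hypothetical helper.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    # prev[j] = edits to turn the previous reference prefix into hyp[:j];
    # initially the prefix is empty, so it costs j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)           # hyp prefix empty: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,          # delete ref[i-1]
                cur[j - 1] + 1,       # insert hyp[j-1]
                prev[j - 1] + cost,   # substitute, or match at no cost
            )
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

Chinese transcripts are not whitespace-delimited, so the same dynamic program run over characters instead of words (a character error rate) would be the natural variant there. Any threshold above which a record is flagged for review would be a deployment choice.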

S203: Convert the voice information collected from each participating user into corresponding text information.

Because a conference record is generally in text form, the cloud server converts each piece of voice information it acquires into corresponding text information. In the embodiments of the application, the cloud server can perform this conversion with a preset Bidirectional Long Short-Term Memory (BLSTM) neural network. Specifically, for each piece of voice information, the cloud server extracts certain specified parameters from the voice information and feeds them as input to the preset BLSTM model, which outputs the text information corresponding to that voice information.
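To show what "bidirectional" means in the BLSTM described above, the sketch below runs one LSTM over the feature frames forwards and a second LSTM backwards, then concatenates the two hidden states at each frame, so every output sees both past and future context. It is a forward pass only, in pure Python with randomly initialised weights. The text does not specify the input parameters, layer sizes, or how outputs are mapped to text (a softmax over characters decoded with CTC is one common approach, not stated in the source), so `LSTMCell` and `blstm` are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """One LSTM layer with hidden size H, written with plain Python lists."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # 4*H weight rows: input gate, forget gate, candidate, output gate
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                        for _ in range(4 * hidden_size)]
        self.bias = [0.0] * (4 * hidden_size)

    def step(self, x: Sequence[float], h: List[float],
             c: List[float]) -> Tuple[List[float], List[float]]:
        z = list(x) + h                                  # [x_t ; h_{t-1}]
        a = [sum(w * v for w, v in zip(row, z)) + b
             for row, b in zip(self.weights, self.bias)]
        H = self.hidden_size
        i = [_sigmoid(v) for v in a[0:H]]
        f = [_sigmoid(v) for v in a[H:2 * H]]
        g = [math.tanh(v) for v in a[2 * H:3 * H]]
        o = [_sigmoid(v) for v in a[3 * H:4 * H]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, frames: Sequence[Sequence[float]]) -> List[List[float]]:
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in frames:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def blstm(frames: List[List[float]], fwd: LSTMCell,
          bwd: LSTMCell) -> List[List[float]]:
    """Per frame, concatenate the forward state (past) and backward state (future)."""
    forward = fwd.run(frames)
    backward = bwd.run(frames[::-1])[::-1]   # read right-to-left, then re-align
    return [hf + hb for hf, hb in zip(forward, backward)]
```

Because the backward layer must see the whole utterance before producing its first output, a BLSTM recognizer typically works on complete segments rather than sample by sample, which fits the sentence-level conversion described above.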

A speech recognition model other than the BLSTM model may also be used to convert voice information into text information. The BLSTM model usually needs to be trained on a large number of samples before it can convert voice information, so in this embodiment the cloud server can train it in advance. For example, the cloud server collects a large amount of voice sample information together with the text sample information corresponding to each voice sample. For each voice sample, the cloud server inputs it into the preset BLSTM model, obtains the corresponding text output, compares that output with the text sample for the voice sample, and adjusts the BLSTM model according to the comparison result. A BLSTM trained on a large amount of voice sample information can convert voice information into corresponding text information accurately, which supports the subsequent generation of conference records.

S204: Generate a conference record from the text information.

After converting the voice information into text information, the cloud server can generate the conference record from the text. In practice, a conference record usually needs to record not only what each participating user said but also which participating user each sentence came from; in other words, every sentence in the record should be attributed to a participating user. Therefore, in this embodiment, before generating the conference record, the cloud server can determine the user identifier corresponding to each piece of voice information, that is, which participating user it came from. The user identifier may be the participating user's account, real name, nickname, or the like. Then, for each piece of voice information, the cloud server assigns its user identifier to the corresponding text information, that is, determines which participating user the text came from. Finally, the cloud server integrates the text information according to these user identifiers to generate the conference record.

For example, suppose the cloud server determines that the pieces of voice information in a conference come from participating users A, B, and C. Correspondingly, the cloud server can determine that the text information converted from that voice information also comes from these three participating users. When generating the conference record, the cloud server can classify the converted text information, integrate the text information belonging to the same participating user to obtain a record for each participating user, and then integrate these per-user records into the conference record of the whole conference.
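A minimal sketch of the per-user integration described above, assuming each utterance has already been attributed to a user identifier. The data layout and the function names `group_by_user` and `integrate` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (user_identifier, text) pairs in the order they were recognized; values are hypothetical.
utterances: List[Tuple[str, str]] = [
    ("A", "Let's begin."),
    ("B", "Agreed."),
    ("A", "First item: the budget."),
    ("C", "I have the numbers."),
]


def group_by_user(items: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Classify text by user identifier, keeping each user's sentences in order."""
    per_user: Dict[str, List[str]] = defaultdict(list)
    for user_id, text in items:
        per_user[user_id].append(text)
    return dict(per_user)  # dicts keep the order in which users first appeared


def integrate(per_user: Dict[str, List[str]]) -> str:
    """Merge the per-user records into one conference record, one section per user."""
    sections = [
        f"{user_id}:\n" + "\n".join(f"  {t}" for t in texts)
        for user_id, texts in per_user.items()
    ]
    return "\n".join(sections)


record = integrate(group_by_user(utterances))
print(record)
```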

It should be noted that the above way of integrating text information is only one option; other integration methods may also be used and are not described in detail here.

In practical applications, the sentences in a conference record are usually arranged in chronological order. Therefore, in this embodiment of the application, before generating the conference record, the cloud server may also determine the acquisition time of each piece of voice information, that is, when the participating user spoke it. Then, for each acquisition time and the piece of voice information corresponding to it, the cloud server may use the acquisition time as the generation time of the text information converted from that voice information, that is, the time at which the participating user spoke that text. After determining the generation time of each piece of text information, the cloud server can integrate the text information according to these generation times to generate the conference record.

For example, suppose the cloud server acquires 7 pieces of voice information in a conference and converts each of them into the corresponding text information. Before generating the conference record, the cloud server may also determine the acquisition time of each of the 7 pieces of voice information, as shown in Table 1.

Voice information	Acquisition time
Voice information A	13:02:13
Voice information B	13:00:02
Voice information C	13:01:08
Voice information D	13:03:24
Voice information E	13:01:45
Voice information F	13:04:21
Voice information G	13:03:08

TABLE 1

After determining the acquisition time of each piece of voice information shown in Table 1, the cloud server may use each acquisition time as the generation time of the corresponding text information, as shown in Table 2.

Text information	Generation time
Text information A	13:02:13
Text information B	13:00:02
Text information C	13:01:08
Text information D	13:03:24
Text information E	13:01:45
Text information F	13:04:21
Text information G	13:03:08

TABLE 2

After determining the generation times shown in Table 2, the cloud server may sort and integrate the 7 pieces of text information in chronological order of their generation times to obtain the conference record. In the resulting record, the text information appears in the order B, C, E, A, G, D, F.
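The ordering in this example can be reproduced with a short sketch that parses the generation times from Table 2 and sorts by them. It assumes all times fall on the same day; the function name is hypothetical.

```python
from datetime import datetime

# Table 2: generation time of each piece of text information (label -> HH:MM:SS).
generation_times = {
    "A": "13:02:13", "B": "13:00:02", "C": "13:01:08", "D": "13:03:24",
    "E": "13:01:45", "F": "13:04:21", "G": "13:03:08",
}


def order_by_time(times: dict) -> list:
    """Return the labels sorted by generation time, earliest first."""
    return sorted(times, key=lambda label: datetime.strptime(times[label], "%H:%M:%S"))


order = order_by_time(generation_times)
print(order)  # ['B', 'C', 'E', 'A', 'G', 'D', 'F']
```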

It should be noted that generating the conference record from the generation times, as in the above example, is only one possible approach; other approaches are not described here.

In addition, to make the generated conference record more accurate and clearer, in this embodiment of the application the cloud server may determine both the user identifier and the generation time of each piece of text information from the user identifier and the acquisition time of the corresponding voice information, and then integrate the text information using both to generate the conference record. Specifically, after determining the user identifier and generation time of each piece of text information as described above, the cloud server may arrange the text information in chronological order of generation time and then label each piece in turn with its user identifier, thereby obtaining the conference record.

For example, continuing the above example, assume the 7 pieces of text information actually come from 4 participating users, as shown in Table 3.

Participating user	Text information
Participating user a	Text information B and text information F
Participating user b	Text information C and text information G
Participating user c	Text information A
Participating user d	Text information D and text information E

TABLE 3

When generating the conference record, the cloud server may sort the 7 pieces of text information by the generation times shown in Table 2, and then label and integrate them in turn using the determined user identifiers, finally producing the conference record shown in Fig. 4.
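Combining Table 2 and Table 3, the following sketch sorts by generation time and then labels each line with its user identifier. The output line format and the function name are assumptions for illustration; the actual layout of the record in Fig. 4 is not reproduced.

```python
from datetime import datetime

# Table 2: generation time of each piece of text information.
generation_times = {
    "A": "13:02:13", "B": "13:00:02", "C": "13:01:08", "D": "13:03:24",
    "E": "13:01:45", "F": "13:04:21", "G": "13:03:08",
}
# Table 3: participating user each piece of text information came from.
speaker_of = {"B": "a", "F": "a", "C": "b", "G": "b", "A": "c", "D": "d", "E": "d"}


def build_record(times: dict, speakers: dict) -> list:
    """Sort text by generation time, then label each entry with its user identifier."""
    ordered = sorted(times, key=lambda k: datetime.strptime(times[k], "%H:%M:%S"))
    return [f"{times[k]} Participating user {speakers[k]}: text information {k}" for k in ordered]


record = build_record(generation_times, speaker_of)
for line in record:
    print(line)
```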

Fig. 4 is a schematic diagram of a conference record provided in an embodiment of the present application.

In Fig. 4, the time in the upper right corner of the conference record may be generated by the cloud server from the acquisition time of the first voice information and that of the last voice information in the conference. The conference time may of course be expressed in other ways: for example, the acquisition times of all the voice information may be averaged and the result used as the time of the conference record, or the acquisition time of only the first, or only the last, voice information in the conference may be used. The way the time is expressed is not specifically limited here.
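The alternatives for the conference time can be computed from the acquisition times in Table 1. This sketch assumes all acquisition times fall on the same day and truncates the mean to whole seconds; the helper names are hypothetical.

```python
from datetime import datetime

# Table 1: acquisition times of the 7 pieces of voice information.
acquisition_times = ["13:02:13", "13:00:02", "13:01:08", "13:03:24",
                     "13:01:45", "13:04:21", "13:03:08"]


def to_seconds(hms: str) -> int:
    """Seconds since midnight for an HH:MM:SS string."""
    t = datetime.strptime(hms, "%H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second


def to_hms(seconds: int) -> str:
    """Format seconds since midnight as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def conference_time_options(times: list) -> dict:
    """Candidate values for the time shown on the conference record."""
    secs = [to_seconds(t) for t in times]
    first, last = min(secs), max(secs)
    return {
        "span": f"{to_hms(first)}-{to_hms(last)}",  # first to last voice information
        "mean": to_hms(sum(secs) // len(secs)),     # average, truncated to seconds
        "first": to_hms(first),
        "last": to_hms(last),
    }


options = conference_time_options(acquisition_times)
print(options)
```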

The date in the lower right corner of the conference record in Fig. 4 can be determined from the system time of the cloud server. The conference theme in Fig. 4 can be sent to the cloud server by a participating user through a terminal device, and the cloud server can then use the theme drafted by that participating user as the theme of the conference record.

After generating the conference record, the cloud server can store it and send it to the participating users' terminals by e-mail, in-site message, or the like. In addition, participating users can log in to the cloud server through their terminals and download the conference record.

With the above method, the cloud server can collect the voice information that participating users send through their terminal devices during the conference, convert it into the corresponding text information in real time, and generate the conference record from that text information. The cloud server can therefore take the place of a conference recorder, recording the conference content in real time during the conference and generating the conference record once the conference ends, which greatly reduces the recorder's workload. Moreover, because the conference record is generated from the voice information of each participating user during the conference, the final record is consistent with the actual conference content, so users who later consult it are not misled by discrepancies between the record and what was actually said.

Video conferencing is currently developing rapidly and is gradually replacing the traditional conference mode. In current video conferences, however, participating users see the video and hear the voice of the other participating users but see no corresponding subtitles in the video picture. In practical applications, the network on which a video conference runs may sometimes be unstable, and this instability can disrupt the transmission of voice information, so that a participating user may be unable to hear what other participating users say.

To avoid this problem, in this embodiment of the application, after converting the collected voice information of the participating users into the corresponding text information, the cloud server may also send the text information to the other participating users as subtitles of the video information, so that they can still follow the actual content of the conference through the subtitles.

Specifically, in the video conference case, in step S102 the cloud server may collect both the voice information of each participating user and the video information carried in the video frames transmitted by that user's terminal device, and send the voice information and the video information to the other participating users at the same time.

After converting the collected voice information into the corresponding text information, the cloud server can send the text information to the other participating users as subtitles of the video information, so that they see video pictures with subtitles while hearing the voice information. The subtitles may take the form shown in Fig. 5.

Fig. 5 is a schematic view of a video conference with subtitles according to an embodiment of the present application.

In Fig. 5, the terminal screen of each participating user's terminal device displays the video information of each of the other participating users. For the video picture of each participating user on the screen, the cloud server, after acquiring that user's voice information and video information, converts the voice information into the corresponding text information and inserts it into the video information as a subtitle; that is, it fuses the text information into the video information to obtain video information with subtitles. The cloud server then sends the voice information and the subtitled video information synchronously to the terminal devices of the other participating users, and each participating user's terminal device displays the subtitled video pictures and plays the voice information of the other participating users for that user.
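The text describes fusing the subtitle into the video information itself. A lighter-weight alternative, swapped in here purely for illustration, is to send a timed text track alongside the stream, for example in WebVTT format, whose `<v>` voice span can carry a speaker label. The segment layout, names, and timing values are assumptions.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, user_identifier, text); hypothetical segments.
Segment = Tuple[float, float, str, str]


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp hh:mm:ss.ttt."""
    millis = int(round(seconds * 1000))
    h, rem = divmod(millis, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_webvtt(segments: List[Segment]) -> str:
    """Build a WebVTT document with one cue per recognized utterance."""
    lines = ["WEBVTT", ""]
    for start, end, user, text in segments:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(f"<v {user}>{text}")  # voice span labels the speaker
        lines.append("")                   # blank line ends the cue
    return "\n".join(lines)


vtt = to_webvtt([(0.0, 2.5, "Participant A", "Good afternoon, everyone."),
                 (2.8, 5.0, "Participant B", "Shall we start?")])
print(vtt)
```

Burning the text into the frames, as the embodiment describes, works on any player; a separate track keeps the video untouched but requires player support.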

In practical applications, several participating users may appear in one video picture. To distinguish the voice information of different participating users, in this embodiment of the application the cloud server may, before collecting voice information, acquire the voiceprint information and facial feature information of each participating user and store them together with that user's determined user identifier (user name). When voice information is later collected, the cloud server can perform voiceprint analysis on it, or analyze the facial information in the video picture, to determine which participating user it belongs to. After converting the voice information into text information, the cloud server can then pair the text information with the user identifier (user name) and display both together as a subtitle of the video information, as shown in Fig. 6.
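Voiceprint matching is often implemented by comparing fixed-length speaker embeddings with cosine similarity. The sketch below assumes such embeddings have already been extracted from the enrolled audio and from the newly collected voice information (the extraction model is out of scope); the 0.7 threshold, the vectors, and the user names are hypothetical.

```python
import math
from typing import Dict, List, Optional

Embedding = List[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify_speaker(query: Embedding, enrolled: Dict[str, Embedding],
                     threshold: float = 0.7) -> Optional[str]:
    """Return the enrolled user whose voiceprint is most similar to the query,
    or None if no similarity reaches the (assumed) threshold."""
    best_user, best_score = None, threshold
    for user_id, voiceprint in enrolled.items():
        score = cosine(query, voiceprint)
        if score >= best_score:
            best_user, best_score = user_id, score
    return best_user


# Hypothetical enrolled voiceprints: user identifier -> embedding.
enrolled = {"Alice": [0.9, 0.1, 0.0], "Bob": [0.1, 0.9, 0.2]}
print(identify_speaker([0.85, 0.15, 0.05], enrolled))
print(identify_speaker([0.0, 0.0, 1.0], enrolled))
```

Returning None for low similarity lets the server fall back to the facial analysis described in the text rather than attributing the speech to the wrong user.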

Fig. 6 is a schematic view of another video conference with subtitles according to an embodiment of the present application.

In Fig. 6, after acquiring the voice information of each participating user in the video picture, the cloud server performs voiceprint analysis on each piece of voice information using the previously acquired voiceprint information and the determined user identifiers, to determine which participating user each piece comes from. Alternatively, the cloud server may determine the attribution of each piece of voice information using the previously acquired facial feature information: for each piece of voice information, the cloud server determines which participating user in the video information is making a speaking action at the moment the voice information and video information are acquired, acquires that user's facial information, and then uses the previously acquired facial feature information to determine which participating user the facial information belongs to. In the subsequent process, the cloud server matches the text information converted from the voice information with that participating user's user identifier and displays both as that participating user's subtitle in the video information. When adding subtitles to the video information, the cloud server may insert them with a visual effect; for example, in Fig. 6 the subtitles are displayed as dialog boxes above the heads of the participating users in the video picture.

It should be noted that when the cloud server collects several pieces of voice information at the same time, it may determine the attribution of each piece using the previously collected voiceprint information and facial feature information, and display the text information converted from each piece as a subtitle in the video information together with the corresponding user identifier.

It should further be noted that the above method can be used not only in video conferencing but also in live webcasting. The webcaster's voice information is collected and converted into the corresponding text information, from which a record of the live broadcast can be generated and subtitles can be added to the live broadcast. When adding subtitles to the live broadcast picture, the broadcast delay can be exploited: within the delay time, the subtitles generated from the text information are added to the live broadcast picture, and the subtitled picture is then presented to viewers.
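The role of the broadcast delay can be made concrete with a small check: a frame captured at time t is broadcast at t plus the delay, so a subtitle can be added to it only if the recognized text is ready by then. The `Caption` fields, including the `ready` time that models recognition latency, and all values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Caption:
    start: float  # when the speech began (stream time, seconds)
    end: float    # when the speech ended
    ready: float  # when the recognized text became available (assumed latency)
    text: str


def caption_for_frame(frame_ts: float, captions: List[Caption], delay: float) -> Optional[str]:
    """Subtitle to add to a frame that is broadcast at frame_ts + delay.

    A caption is usable only if it covers the frame's timestamp and its text
    was ready before the delayed frame goes out.
    """
    release_time = frame_ts + delay
    for c in captions:
        if c.start <= frame_ts < c.end and c.ready <= release_time:
            return c.text
    return None


captions = [Caption(start=0.0, end=3.0, ready=4.2, text="Welcome to the stream.")]
print(caption_for_frame(1.0, captions, delay=5.0))  # text ready in time
print(caption_for_frame(1.0, captions, delay=2.0))  # delay too short
```

The delay therefore has to exceed the recognition latency for every frame of an utterance to carry its subtitle.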

Based on the same idea, embodiments of the present application further provide a conference sharing apparatus and an apparatus for generating a conference record, shown in Fig. 7 and Fig. 8 respectively.

Fig. 7 is a schematic view of a conference sharing apparatus provided in an embodiment of the present application, which specifically includes:

a voice collecting module 701, configured to collect the voice information of each conference participant user;

a voice processing module 702, configured to process each piece of voice information to obtain the corresponding text information;

and a text sharing module 703, configured to share the text information with the other conference participant users.

The text sharing module 703 is configured to synchronously share the text information and the voice information corresponding to the text information to other conference participant users.

The voice collecting module 701 collects each piece of voice information of each conference participant user and the video information corresponding to each piece of voice information;

the text sharing module 703 shares the text information as a subtitle of the video information to other conference participant users.
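The claims further tie the form of sharing to what each receiving terminal supports: text only, voice, or video playback. A sketch of that selection follows; the capability flags, payload keys, and the precedence of video over voice over text are hypothetical, since the claims list the three cases separately.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Terminal:
    # Hypothetical capability flags of a receiving participant's device.
    supports_voice: bool = False
    supports_video: bool = False


@dataclass
class Segment:
    text: str                      # text information from recognition
    voice: Optional[bytes] = None  # voice information of the same utterance
    video: Optional[bytes] = None  # video information of the same utterance


def build_payload(segment: Segment, terminal: Terminal) -> Dict[str, object]:
    """Choose what to send to one terminal for one recognized utterance."""
    if terminal.supports_video and segment.video is not None:
        # Text travels as the subtitle of the video information.
        return {"video": segment.video, "subtitle": segment.text, "voice": segment.voice}
    if terminal.supports_voice and segment.voice is not None:
        # Text is shared synchronously with the matching voice information.
        return {"voice": segment.voice, "text": segment.text}
    # Text-only terminal: send the text information alone.
    return {"text": segment.text}


seg = Segment("Budget approved.", voice=b"audio-bytes", video=b"video-bytes")
print(build_payload(seg, Terminal()))
print(build_payload(seg, Terminal(supports_voice=True)))
```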

Fig. 8 is a schematic diagram of an apparatus for generating a conference record according to an embodiment of the present application, which specifically includes:

a determining module 801, configured to determine each participant user accessing a conference;

the acquisition and sending module 802 is used for acquiring the voice information of each participating user and sending the voice information to other participating users;

the conversion module 803 converts the voice information collected by each participating user into corresponding text information;

and a generating module 804 for generating a conference record according to the text information.

The conversion module 803 inputs the voice information collected from each participating user into a preset bidirectional long short-term memory (BLSTM) neural network model to obtain the text information corresponding to each piece of voice information.

The generating module 804 determines the user identifier corresponding to each piece of voice information; for each user identifier and each piece of voice information corresponding to it, assigns that user identifier to the text information corresponding to the voice information; and integrates the text information according to the user identifiers to generate a conference record; and/or

determines the acquisition time corresponding to each piece of voice information; for each acquisition time and each piece of voice information corresponding to it, uses that acquisition time as the generation time of the text information corresponding to the voice information; and integrates the text information according to the generation times to generate a conference record.

The device further comprises:

a sending module 805, which sends the conference record to each participating user when it detects that the conference has ended.

The acquisition and sending module 802 collects the voice information and the video information of each participating user;

the acquisition and sending module 802 sends the voice information and the video information to other participating users;

the conversion module 803 converts the voice information collected by each participating user into corresponding text information in real time, and sends the text information to other participating users as the caption of the video information.

The embodiments of the application provide a method and an apparatus for conference sharing and conference record generation. The cloud server can convert the voice information of participating users into the corresponding text information in real time and share it with other participating users, so that even participating users whose terminal devices do not support a specified conference mode can receive the conference content as plain text and still take part in the conference. Moreover, the cloud server can automatically convert the voice information of the participating users into text information in real time and generate the conference record from it. Compared with the prior art, the conference record no longer has to be produced by a dedicated conference recorder; the cloud server takes over this work, which greatly reduces the recorder's burden. Furthermore, because the cloud server generates the conference record from the participating users' voice information, in effect recording every sentence they speak, the resulting record does not deviate from the actual content of the conference, which benefits users who later consult it.

It should be noted that the steps of the method provided in Embodiment 1 may all be executed by the same device, or by different devices. For example, steps S201 and S202 may be executed by the cloud server and step S203 by a conversion unit in the cloud server; as another example, step S201 may be executed by a terminal device and steps S202 and S203 by the cloud server; and so on.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.


The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A conference sharing method, applied to a server, comprising:

in the video conference process, collecting each voice information of each conference participant;

sharing the text information with other conference participant users, comprising:

if the terminal device of another conference participant user supports only a text receiving function, synchronously sending the text information to that terminal device;

if the terminal device of another conference participant user supports a voice receiving function, synchronously sharing the text information and the voice information corresponding to the text information with the terminal device supporting the voice receiving function;

and if the terminal device of another conference participant user supports a video playing function, synchronously sharing the text information, as the subtitle of the video information, with the terminal device supporting the video playing function.

2. The method of claim 1, wherein collecting voice information of each conference participant comprises:

the cloud server collects each piece of voice information of each conference participant user and the video information corresponding to each piece of voice information.

3. A method for generating a conference record, applied to a server, comprising:

determining each participant user accessing the conference;

if the terminal device of another conference participant user supports a video playing function, synchronously sharing the text information, as the subtitle of the video information, with the terminal device supporting the video playing function;

wherein sending the voice information to other participating users further comprises: if the terminal device of another conference participant user supports a voice receiving function, synchronously sharing the text information and the voice information corresponding to the text information with the terminal device supporting the voice receiving function;

and generating a conference record according to the text information.

4. The method of claim 3, wherein converting the voice information collected for each participating user into corresponding text information specifically comprises:

inputting the voice information collected from each participating user into a preset bidirectional long short-term memory (BLSTM) neural network to obtain the text information corresponding to each piece of voice information.

5. The method of claim 3, wherein generating a conference record according to the text information specifically comprises:

determining the user identifier corresponding to each piece of voice information; for each user identifier and each piece of voice information corresponding to it, determining that user identifier as the user identifier of the text information corresponding to the voice information; and integrating the text information according to the user identifiers corresponding to the text information to generate a conference record; and/or

6. The method of claim 3, wherein the method further comprises:

when it is detected that the conference has ended, sending the conference record to each participating user.

7. The method of claim 3, wherein collecting the voice information of each participating user comprises:

for each participating user, collecting the voice information and the video information of that participating user.

8. A conference sharing apparatus, applied to a server, comprising:

a voice acquisition module, configured to acquire the voice information of each conference participant user during a video conference;

a text sharing module, configured to share the text information with other conference participant users, comprising: if the terminal device of another conference participant user supports only a text receiving function, synchronously sending the text information to that terminal device;

9. The apparatus of claim 8, wherein the voice collecting module collects voice information of each conference participant and video information corresponding to each voice information.

10. An apparatus for generating a conference record, applied to a server, comprising:

a conversion module, configured to convert the voice information collected from each conference participant user into the corresponding text information and, if the terminal device of another conference participant user supports only a text receiving function, to synchronously send the text information to that terminal device;

if the terminal device of another conference participant user supports a video playing function, the text information is synchronously shared, as the subtitle of the video information, with the terminal device supporting the video playing function, wherein sending the voice information to other conference participant users further comprises: if the terminal device of another conference participant user supports a voice receiving function, synchronously sharing the text information and the voice information corresponding to the text information with the terminal device supporting the voice receiving function;

11. The apparatus of claim 10, wherein the conversion module inputs the voice information collected from each participating user into a preset bidirectional long short-term memory (BLSTM) neural network model to obtain the text information corresponding to each piece of voice information.

12. The apparatus of claim 10, wherein the generating module determines the user identifier corresponding to each piece of voice information; for each user identifier and each piece of voice information corresponding to it, determines that user identifier as the user identifier of the text information corresponding to the voice information; and integrates the text information according to the user identifiers corresponding to the text information to generate the conference record; and/or

13. The apparatus of claim 10, wherein the apparatus further comprises:

a sending module, configured to send the conference record to each participating user when the end of the conference is detected.

14. The apparatus of claim 10, wherein the acquisition and sending module collects, for each participating user, the voice information and the video information of that participating user;

and the acquisition and sending module sends the voice information and the video information to the other participating users.