CN106657865B

CN106657865B - Conference summary generation method and device and video conference system

Info

Publication number: CN106657865B
Application number: CN201611170590.8A
Authority: CN
Inventors: 张雅; 辛玉军
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2016-12-16
Filing date: 2016-12-16
Publication date: 2020-08-25
Anticipated expiration: 2036-12-16
Also published as: CN106657865A

Abstract

The invention discloses a method and a device for generating a conference summary and a conference system, wherein the method comprises the following steps: obtaining biological characteristic information of a speaker; the biological characteristic information comprises voiceprint information and image information; and identifying the identity of the speaker according to the biological characteristic information of the speaker, and displaying the identity information of the speaker on a display interface. According to the technical scheme of the embodiment of the invention, when a video conference is carried out, voice data of a speaker can be converted into text data, identity information of the speaker is identified, and the text data and the identity information are combined to form a conference summary, so that participants can know not only the identity information of the speaker but also the speech content of each participant, and the efficiency of the video conference is improved.

Description

Conference summary generation method and device and video conference system

Technical Field

The invention relates to the technical field of computers, in particular to a method and a device for generating a conference summary and a video conference system.

Background

A video conference system is a system in which individuals or groups at two or more different locations transmit audio, video and file data to one another over transmission lines and multimedia devices, so as to communicate interactively in real time and hold a conference.

However, in a video conference system, when a video conference is held among multiple parties, and especially when several people in one conference room take part and speak, the conference picture cannot be focused on the actual speaker. The other parties cannot clearly see the behavior and expression of the speaker in real time and cannot know the identity of the speaker, so that communication among the parties is hampered and the effect of the video conference is degraded.

Disclosure of Invention

In view of the above, the present invention provides a method and an apparatus for generating a conference summary, which can assist in identifying the identity of a speaker according to the voice of the speaker, and a video conference system.

In order to achieve the above object, an embodiment of the present invention provides a method for generating a conference summary, including:

acquiring voice data of a speaker;

converting the voice data of the speaker into text data;

and combining the text data with the identity information of the speaker to form a conference summary.

The invention also provides a device for generating the conference summary, which comprises:

the acquisition module is configured to acquire voice data of a speaker;

the conversion module is configured to convert the voice data of the speaker into text data;

a processing module configured to form a conference summary in conjunction with the identity information of the speaker.

The invention also provides a video conference system comprising the device.

As can be seen from the technical scheme provided by the embodiment of the invention, when a video conference is carried out, the voice data of the speaker can be converted into text data, the identity information of the speaker can be identified, and the text data and the identity information are combined to form a conference summary, so that the participants can know both the identity information of the speaker and the speech content of each participant, and the efficiency of the video conference is improved.

Drawings

FIG. 1 is a flow chart of a method of generating a conference summary of one embodiment of the present invention;

FIG. 2 is a schematic diagram of the formation of a conference summary in conjunction with speaker identity information in accordance with one embodiment of the present invention;

FIG. 3 is a flow chart of a method for generating a conference summary according to one embodiment of the present invention;

FIG. 4 is a diagram illustrating identification of a speaker based on the speaker's voiceprint in accordance with one embodiment of the present invention;

FIG. 5 is a diagram illustrating identification of a speaker based on an image of the speaker according to one embodiment of the present invention;

FIG. 6 is a flow chart of a method of generating a conference summary of another embodiment of the present invention;

FIG. 7 is a flow chart of a method of generating a conference summary of yet another embodiment of the present invention;

FIG. 8 is a flow chart of a method of generating a conference summary of yet another embodiment of the present invention;

FIG. 9 is a schematic diagram of a conference summary generation apparatus according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a conference summary generation apparatus according to one embodiment of the present invention;

FIG. 11 is a schematic diagram of a conference summary generating apparatus according to another embodiment of the present invention;

FIG. 12 is a schematic diagram of a conference summary generating apparatus according to still another embodiment of the present invention;

FIG. 13 is a schematic diagram of a conference summary generation apparatus according to still another embodiment of the present invention.

Detailed Description

Various aspects and features of the disclosure are described herein with reference to the drawings.

It will be understood that various modifications may be made to the embodiments disclosed herein. Accordingly, the foregoing description should not be construed as limiting, but merely as exemplifications of embodiments. Other modifications will occur to those skilled in the art within the scope and spirit of the disclosure.

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the disclosure and, together with a general description of the disclosure given above, and the detailed description of the embodiments given below, serve to explain the principles of the disclosure.

These and other characteristics of the invention will become apparent from the following description of a preferred form of embodiment, given as a non-limiting example, with reference to the accompanying drawings.

It should also be understood that, although the invention has been described with reference to some specific examples, a person of skill in the art shall certainly be able to achieve many other equivalent forms of the invention, having the characteristics as set forth in the claims and hence all coming within the field of protection defined thereby.

The above and other aspects, features and advantages of the present disclosure will become more apparent in view of the following detailed description when taken in conjunction with the accompanying drawings.

Specific embodiments of the present disclosure are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely examples of the disclosure that may be embodied in various forms. Well-known and/or repeated functions and structures have not been described in detail so as not to obscure the present disclosure with unnecessary or redundant detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present disclosure in virtually any appropriately detailed structure.

The specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," which may each refer to one or more of the same or different embodiments in accordance with the disclosure.

Fig. 1 is a flowchart of a method for generating a conference summary according to an embodiment of the present invention, and as shown in fig. 1, the method for generating a conference summary according to the embodiment may specifically include:

Voice data of a speaker is acquired.

The embodiment is applied to a video conference system. The voice data of the speaker is acquired while the video conference is in progress, for example through a microphone built into the video conference system or through an external high-sensitivity microphone.

The voice data of the speaker is converted into text data.

To make the voice data of the speaker easier to organize, it can be converted into text data. In particular, the conversion may be performed by speech-to-text software.

A conference summary is formed in combination with the identity information of the speaker.

Specifically, the conference summary displays not only the content of each utterance but also the identity information, such as the name, of the speaker who made it. Therefore, in this step, the identity of the speaker needs to be identified and then combined with the corresponding utterance content to form the conference summary. For example, as shown in fig. 2, the participants include A, B and C. A speaks first and says "Good morning, everyone"; B then says "Good morning, moderator"; C then says "start now … …". As each of them speaks, their identity information is recognized, for example from biometric information. When A speaks, the name is recognized as A, so A is displayed at the speaker position of the conference summary and "Good morning, everyone" is displayed at the utterance position corresponding to A. Likewise, B is displayed at the speaker position according to the recognized identity information of B, with the utterance of B at the corresponding utterance position, and the identity information and utterance content of C are displayed at their corresponding positions in the same way.

According to the technical scheme of the embodiment of the invention, when a video conference is carried out, voice data of a speaker can be converted into text data, identity information of the speaker is identified, and the text data and the identity information are combined to form a conference summary, so that participants can know not only the identity information of the speaker but also the speech content of each participant, and the efficiency of the video conference is improved.
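By way of non-limiting illustration, the combination of identity information and text data into a conference summary may be sketched as follows. This is a minimal sketch and not the implementation of the embodiment; the names `Utterance` and `form_summary` and the one-line-per-utterance layout are assumptions made for the example, and the speech-to-text and identification steps are taken as already done.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One utterance: who spoke and the text converted from the voice data."""
    speaker: str  # identity information, reduced here to a name (assumption)
    text: str     # text data produced by the conversion step


def form_summary(utterances):
    """Pair each speaker with the utterance content, one line per utterance."""
    return "\n".join(f"{u.speaker}: {u.text}" for u in utterances)


# Participants A, B and C speak in turn, as in the example of fig. 2.
summary = form_summary([
    Utterance("A", "Good morning, everyone"),
    Utterance("B", "Good morning, moderator"),
    Utterance("C", "start now"),
])
print(summary)
```

Keeping speaker and text together in one record means the later display step never has to re-associate an utterance with its speaker.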

Fig. 3 is a flowchart of a method for generating a conference summary according to an embodiment of the present invention, and the method for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 1. As shown in fig. 3, the method for generating a conference summary in this embodiment may specifically include:

Voice data of a speaker is acquired.

The voice data of the speaker is converted into text data.

The identity information of the speaker is identified according to the biometric information of the speaker.

Further, identifying the identity information of the speaker according to the biometric information of the speaker includes: A, obtaining the biometric information of the speaker; and B, identifying the identity of the speaker according to the biometric information of the speaker, and displaying the identity information of the speaker on a display interface.

Specifically, because interference such as noise is present at the conference site, identification of the speaker often falls short of the ideal; that is, the recognition rate achieved when identifying the speaker does not necessarily reach 100%. In practice, the recognition rate may be, for example, 60%, 70% or 80% (where the recognition rate represents the probability of recognition rather than the accuracy of recognition). The recognition rate of the speaker identity information may therefore be distinguished in some way, to alert the participants that the identified identity is not necessarily 100% accurate. For example, one method is to distinguish recognition rates by color: for a recognition rate below 60%, the name of the speaker is displayed in a red font; for a recognition rate between 60% and 80%, in a yellow font; and for a recognition rate between 80% and 100%, in a blue font. Alternatively, for speakers with a recognition rate below 60%, a "?" may be displayed after the name to indicate that the recognition rate is low and the name may be inaccurate.
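The color scheme described above may be sketched as follows. This is an illustrative sketch only: the function name `display_style` is hypothetical, the recognition rate is assumed to be given as a fraction between 0 and 1, a rate of exactly 80% is assumed to fall in the blue band (the text leaves the boundary open), and the red font and the trailing "?" are applied together here although the text presents them as alternatives.

```python
def display_style(name, rate):
    """Return (label, font color) for a speaker name and a recognition rate in [0, 1]."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if rate < 0.60:
        # Low rate: red font, plus a trailing "?" to flag a possibly wrong name.
        return f"{name}?", "red"
    if rate < 0.80:
        return name, "yellow"
    # Assumption: exactly 80% is treated as part of the 80%-100% band.
    return name, "blue"


print(display_style("A", 0.75))
```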

In addition, when the identity of the speaker is identified, the result may contain several candidates. For example, the recognition rate for identifying the current speaker as A is 75% and the recognition rate for B is also 75%, so the machine cannot determine whether the speaker is A or B. In this case, the identity information of all identified candidates may be listed. In a further preferred embodiment, the candidate identities may be listed in a drop-down box, from which a participant can select the actual identity of the speaker when the conference summary is formed.

In a further preferred embodiment, if one speaker is identified more accurately after a period of recognition, the identity information of the other candidates may be deleted. For example, at the beginning of the video conference, if the recognition rates of candidates A and B for the current speaker are both 75%, A and B may be displayed at the same time. If, after a period of recognition, the recognition rate of A reaches 95% while that of B remains 75%, the two rates differ greatly (for example, a difference of more than 10% is considered large), so the current speaker is taken to be A; A is retained and B is deleted.
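The retention and deletion of candidates may be sketched as follows. The function name `prune_candidates` is hypothetical, and recognition rates are assumed to be whole percentages so that the 10% difference can be compared exactly; the embodiment does not prescribe a particular data structure.

```python
def prune_candidates(rates, gap=10):
    """Keep every candidate within `gap` percentage points of the best one.

    `rates` maps a candidate name to a recognition rate in whole percent.
    A candidate is deleted only when the best rate exceeds its rate by more
    than `gap`, matching the "difference of more than 10%" rule.
    """
    if not rates:
        return []
    best = max(rates.values())
    return sorted(name for name, rate in rates.items() if best - rate <= gap)


print(prune_candidates({"A": 75, "B": 75}))  # both candidates stay listed
print(prune_candidates({"A": 95, "B": 75}))  # only A is retained
```

When more than one name is returned, the list corresponds to the candidates offered to the participants, for example in a drop-down box.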

Further, the biometric information may include voiceprint information.

If the biometric information is voiceprint information,

accordingly, the method for acquiring the biological characteristic information of the speaker comprises the following steps:

acquiring voiceprint information of a speaker;

identifying the identity of the speaker according to the biometric information of the speaker, comprising:

and identifying the identity of the speaker according to the voiceprint information of the speaker.

Further, identifying the identity of the speaker according to the voiceprint information of the speaker includes: C, comparing the voiceprint of the speaker with the voiceprints in the voiceprint library one by one; and D, if the comparison is consistent, outputting the identity of the speaker corresponding to the matching voiceprint in the voiceprint library.
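The one-by-one comparison of steps C and D may be sketched as follows. The embodiment does not state how two voiceprints are compared; representing a voiceprint as a numeric feature vector, scoring with cosine similarity, and the `match_threshold` of 0.9 are all assumptions made for this example, as are the function names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_by_voiceprint(voiceprint, library, match_threshold=0.9):
    """Compare against each library entry in turn.

    Returns (identity, score) for the best entry when its score reaches
    `match_threshold`, otherwise (None, best score).
    """
    best_identity, best_score = None, 0.0
    for identity, reference in library.items():
        score = cosine_similarity(voiceprint, reference)
        if score > best_score:
            best_identity, best_score = identity, score
    if best_score >= match_threshold:
        return best_identity, best_score
    return None, best_score


# Hypothetical three-dimensional voiceprints, for illustration only.
library = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
print(identify_by_voiceprint([0.9, 0.1, 0.0], library)[0])
```

The returned score can serve as the recognition rate that is later checked against the preset threshold.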

It is judged whether the recognition rate for identifying the identity information of the speaker exceeds a preset threshold.

If so, the identity information of the speaker and the corresponding text data are processed to form the conference summary;

otherwise, the identity information of the speaker is represented by a mark, and the mark and the corresponding text data are processed to form the conference summary.

Specifically, for example, the preset threshold may be set to 80%. If the recognition rate exceeds 80%, the identity information of the speaker is relatively accurate; if the recognition rate does not exceed 80%, the identity information of the speaker is in doubt.
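The threshold judgment may be sketched as follows. The function name `summary_entry` and the representation of a summary entry as a (speaker label, text) pair are assumptions for illustration; the 80% default follows the example given above.

```python
def summary_entry(identity, rate, text, mark, threshold=0.80):
    """Return one (speaker label, text) entry of the conference summary.

    The recognized identity is used only when the recognition rate exceeds
    the preset threshold; otherwise the speaker is represented by `mark`.
    """
    if identity is not None and rate > threshold:
        return identity, text
    return mark, text


print(summary_entry("Zhang San", 0.90, "start now", mark="A"))
print(summary_entry("Zhang San", 0.70, "start now", mark="A"))
```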

Specifically, when the identity information of the speaker is obtained, the name of the speaker is first determined from the list of participants, and the name is then looked up in a related database, such as the employee database of a department or branch of a company, to obtain further identity information of the speaker. The identity information therefore includes information such as the name, title, affiliated unit, or address of the speaker. The identity information can then be displayed in detail in the conference summary formed later: for example, the name and title of the speaker, or the affiliated unit together with the name and title, are displayed at the speaker position, and the corresponding utterance is displayed at the utterance position of the conference summary.
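The lookup described above may be sketched as follows, with an in-memory dictionary standing in for the employee database. The record fields, the function names and every data value are hypothetical and serve only to illustrate the two-stage lookup (participant list first, database second).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    name: str
    title: str
    unit: str  # affiliated unit


def lookup_identity(name, participants, staff_db):
    """Confirm the name is on the participant list, then query the database."""
    if name not in participants:
        return None
    return staff_db.get(name)


def format_speaker(identity):
    """Label shown at the speaker position of the conference summary."""
    return f"{identity.name} ({identity.title}, {identity.unit})"


# Hypothetical data for illustration only.
participants = ["Zhang San", "Li Si"]
staff_db = {"Zhang San": Identity("Zhang San", "Engineer", "R&D Department")}

found = lookup_identity("Zhang San", participants, staff_db)
print(format_speaker(found))
```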

Since there may be noise at the video conference site, the audio data of the speaker may be acquired with noise interference, or information may be lost when audio data is transmitted for a remote video conference, so deviations may arise when the voiceprint information of the speaker is extracted and compared with the voiceprint library. Therefore, there are cases in which the identity information of the speaker cannot be recognized. In such a case, the speaker may be given a temporary mark, for example the letter A, B or C in place of the name, so that speaker A and the corresponding utterances are displayed in the conference summary. This avoids confusing the utterances of different speakers when several speakers cannot be identified.

As shown in fig. 4, in a specific implementation, when a speaker speaks during a video conference, audio information of the speaker is collected and the voiceprint in the audio information is extracted. A voiceprint identification module identifies the voiceprint to determine the identity information of the speaker, and the identity information is sent to the video conference system. A voiceprint identification feedback module then verifies whether the voiceprint corresponds to the identity information of the speaker; specifically, the voiceprint can be compared with the voiceprints stored in a voiceprint library. Finally, the identity information of the speaker is displayed on a display device connected to the video conference system.

According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.

The flowchart of the third embodiment of the method for generating a conference summary of the present invention is consistent with fig. 3, please continue to refer to fig. 3, and the method for generating a conference summary of the present embodiment further introduces the technical solution of the present invention in more detail on the basis of the embodiment shown in fig. 1. As shown in fig. 3, the method for generating a conference summary in this embodiment may specifically include:

acquiring voice data of a speaker;

converting the voice data of the speaker into text data;

identifying the identity information of the speaker according to the biological characteristic information of the speaker;

The biometric information comprises image information;

if the biometric information is image information,

acquiring image information of a speaker;

and E, identifying the identity of the speaker according to the image information of the speaker.

Further, identifying the identity of the speaker according to the image information of the speaker includes: F, comparing the image of the speaker with the images in the image library one by one; and G, if the comparison is consistent, outputting the identity of the speaker corresponding to the matching image in the image library.

Judging whether the recognition rate for recognizing the identity information of the speaker exceeds a preset threshold value or not;

This embodiment can also use the image information of the speaker to obtain the identity information of the speaker. Of course, voiceprint recognition and image recognition can be implemented at the same time, so that the identity information obtained for the speaker is more accurate.

As shown in fig. 5, in a specific implementation of this embodiment, when the identity information of the speaker is identified through the voiceprint information of the speaker, the image information of the speaker may also be collected by an image collection module, which may be a camera. The image information of the speaker is compared one by one with the images in the image library to obtain the identity information of the speaker corresponding to the matching image. The identity information of the speaker, for example the name, is displayed on a display interface, and an image of the speaker is displayed at the same time; the image may include the head portrait of the speaker as well as an image of the body of the speaker, such as a gesture of the speaker.

In other embodiments of the present invention, the speaker may also be located according to the image information of the speaker, and the image information of the speaker is displayed. In a specific implementation, as the body movements or posture of the speaker change during speaking, the position of the speaker on the display interface may shift; the speaker is therefore tracked and located by rotating the pan-tilt head of the camera, so that the other remote participants see a more realistic conference picture.

According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the image information of the speaker, and the identity information and the image information of the speaker are displayed on the display interface, so that the scene sense of the video conference is enhanced, the communication of participants is facilitated, and the efficiency of the video conference is improved.

Fig. 6 is a flowchart of a method for generating a conference summary according to another embodiment of the present invention, and the method for generating a conference summary according to this embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 3. As shown in fig. 6, the method for generating a conference summary in this embodiment may specifically include:

voice data of a speaker is acquired.

And converting the voice data of the speaker into text data.

The biometric information may include voiceprint information and image information. For specific implementations, refer to the second embodiment and the third embodiment shown in fig. 3.

Further, the identity information of each different speaker is identified by a unique mark.

For example, when there are several speakers whose identity information cannot be recognized, each of them is given a different unique mark so that the speakers can be distinguished.

In a specific implementation, in a conference process, if the identity information of the speaker is identified, the mark in the conference summary is replaced by the identity information.

Since there may be noise at the video conference site, the audio data of the speaker may be acquired with noise interference, or information may be lost when audio data is transmitted for a remote video conference, so deviations may arise when the voiceprint information of the speaker is extracted and compared with the voiceprint library. Therefore, there are cases in which the identity information of the speaker cannot be recognized. In such a case, the speaker may be given a temporary mark, for example the letter A, B or C in place of the name, so that speaker A and the corresponding utterances are displayed in the conference summary. In the subsequent course of the video conference, the audio data of speaker A continue to be collected, and the voiceprint information and image information of speaker A continue to be recognized. If the identity information of speaker A is finally recognized after a period of recognition, for example as Zhang San, the mark A is replaced by the name of the speaker; that is, A is replaced by Zhang San.
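The assignment of unique marks and their later replacement may be sketched as follows. The class `MarkRegistry`, the function `replace_mark` and the internal key `voice_id` (some identifier by which the system tells unidentified voices apart) are hypothetical names introduced for this example.

```python
import string


class MarkRegistry:
    """Hands out the marks A, B, C, ... one per unidentified voice."""

    def __init__(self):
        self._marks = {}

    def mark_for(self, voice_id):
        """Return the existing mark for `voice_id`, or allocate the next letter."""
        if voice_id not in self._marks:
            index = len(self._marks)
            if index >= len(string.ascii_uppercase):
                raise RuntimeError("ran out of single-letter marks")
            self._marks[voice_id] = string.ascii_uppercase[index]
        return self._marks[voice_id]


def replace_mark(summary, mark, name):
    """Return a new summary in which every entry labelled `mark` carries `name`."""
    return [(name if speaker == mark else speaker, text) for speaker, text in summary]


registry = MarkRegistry()
draft = [
    (registry.mark_for("voice-1"), "Good morning, everyone"),
    (registry.mark_for("voice-2"), "Good morning, moderator"),
    (registry.mark_for("voice-1"), "start now"),
]
# Later in the conference, the speaker marked A is recognized as Zhang San.
final = replace_mark(draft, "A", "Zhang San")
print(final)
```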

In other embodiments of the present invention, an exclusion method may also be adopted. For example, the complete list of participants, the voiceprint library and the image library are obtained first. If, after the identity information of the participants has been obtained using the above embodiments, only one participant remains unidentified, the identified participants can be excluded from the list of participants; the one remaining participant is the speaker who could not be identified, so the identity information of that speaker is obtained. In still another embodiment of the present invention, the exclusion method is again adopted: the complete list of participants, the voiceprint library and the image library are obtained first, and if only a few participants remain unidentified after the above embodiments have been applied, the identity of the unidentified speaker can be matched to one of the remaining participants. Because few participants remain unidentified, the accuracy of this method is relatively high.
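The exclusion method may be sketched as follows. The function names are hypothetical; participants are assumed to be distinguishable by name.

```python
def remaining_candidates(participants, identified):
    """Participants on the full list who have not yet been identified."""
    return sorted(set(participants) - set(identified))


def resolve_by_exclusion(participants, identified):
    """Return the single remaining participant, or None if the result is ambiguous."""
    remaining = remaining_candidates(participants, identified)
    return remaining[0] if len(remaining) == 1 else None


everyone = ["Li Si", "Wang Wu", "Zhang San"]  # hypothetical participant list
print(resolve_by_exclusion(everyone, ["Li Si", "Wang Wu"]))
print(remaining_candidates(everyone, ["Li Si"]))
```

When several participants remain, the shorter candidate list still narrows the choice, which corresponds to the second variant described above.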

The identity information includes information such as the name, the title, the affiliated unit or the address location of the speaker. In this embodiment, the name of the speaker may be used, that is, the name of the speaker and the corresponding utterance are displayed in the conference summary.

Fig. 7 is a flowchart of a method for generating a conference summary according to still another embodiment of the present invention, and the method for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 1. As shown in fig. 7, the method for generating a conference summary in this embodiment may specifically include:

voice data of a speaker is acquired.

And converting the voice data of the speaker into text data.

Specifically, the speaker is identified and combined with the text data; that is, the speaker is matched with the text data converted from the voice data of that speaker, and a conference summary is formed. The participants can thus know not only the identity information of the speakers but also the speech content of each participant.

The conference summary is pushed to each participant.

After the conference is finished, the participants can rely only on memory and may forget the contents of the conference. Therefore, once the conference summary is formed, it can be transmitted to each participant over the network; for example, the conference summary can be sent to the mailbox of each participant or pushed to the mobile phone of each participant.
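Sending the conference summary to the mailbox of each participant may be sketched as follows, using the `EmailMessage` class of the Python standard library. Only the construction of the messages is shown; the actual delivery (for example over SMTP) is omitted. The function name, the subject line and the addresses are hypothetical.

```python
from email.message import EmailMessage


def build_summary_messages(summary_text, mailboxes):
    """Build one e-mail per participant carrying the conference summary.

    `mailboxes` maps a participant name to an e-mail address.
    """
    messages = []
    for name, address in mailboxes.items():
        msg = EmailMessage()
        msg["Subject"] = "Conference summary"
        msg["To"] = address
        msg.set_content(f"Dear {name},\n\n{summary_text}\n")
        messages.append(msg)
    return messages


# Hypothetical addresses for illustration only.
messages = build_summary_messages(
    "A: Good morning, everyone",
    {"A": "a@example.com", "B": "b@example.com"},
)
print(len(messages), messages[0]["To"])
```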

According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time, communication between participants is facilitated, and the efficiency of the video conference is improved.

Fig. 8 is a flowchart of a method for generating a conference summary according to another embodiment of the present invention, and the method for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 1. As shown in fig. 8, the method for generating a conference summary in this embodiment may specifically include:

voice data of a speaker is acquired.

And converting the voice data of the speaker into text data.

The position of each participant is located.

If a participant is located outside the conference room, the current conference content is recorded to form a conference summary until the participant is located in the conference room again; the recording may be an audio recording, or the speech of the current speaker may be converted into text and stored;

the conference summary is sent to the participant.

In a specific implementation, the conference summary need not be formed for the whole conference; it may be formed only in special cases. For example, a participant who leaves the room midway may miss important content. The tracking and positioning function of the image acquisition module can therefore locate each participant; when a participant is found to have left, the voice data of the current speaker is converted into text data, or the voice data of the current speaker is compiled directly into a conference summary, which is sent to the participant who left so that no important information is missed.
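Recording only what an absent participant misses may be sketched as follows. The class `AbsenceRecorder` and its methods are hypothetical; the positioning of participants and the speech-to-text conversion are assumed to be provided by other modules, which call `update_position` and `on_utterance` respectively.

```python
class AbsenceRecorder:
    """Collects, per absent participant, the utterances made while they are away."""

    def __init__(self):
        self._missed = {}  # participant -> list of (speaker, text)

    def update_position(self, participant, in_room):
        """Start recording when a participant leaves the room.

        When the participant returns, the missed entries are handed back so
        that they can be sent to that participant; otherwise None is returned.
        """
        if not in_room:
            self._missed.setdefault(participant, [])
            return None
        return self._missed.pop(participant, None)

    def on_utterance(self, speaker, text):
        """Append the utterance to the record of every absent participant."""
        for entries in self._missed.values():
            entries.append((speaker, text))


recorder = AbsenceRecorder()
recorder.on_utterance("A", "first point")    # C is present, nothing recorded
recorder.update_position("C", in_room=False)
recorder.on_utterance("A", "second point")
recorder.on_utterance("B", "third point")
missed = recorder.update_position("C", in_room=True)
print(missed)
```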

Fig. 9 is a schematic view of a device for generating a conference summary according to an embodiment of the present invention, and as shown in fig. 9, the device for generating a conference summary according to the embodiment may specifically include an obtaining module, a converting module, and a processing module.

The acquisition module is configured to acquire voice data of a speaker;

Fig. 10 is a schematic diagram of a device for generating a conference summary according to one embodiment of the present invention, and the device for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 9.

As shown in fig. 10, the processing module includes:

the identification submodule is configured to identify the identity information of the speaker according to the biological characteristic information of the speaker;

the judgment submodule is configured to judge whether the recognition rate for recognizing the identity information of the speaker exceeds a preset threshold value;

the processing submodule is configured to process the identity information of the speaker and the corresponding text data to form the conference summary when the identity information of the speaker is successfully identified; or,

when the recognition rate for recognizing the identity information of the speaker does not exceed the preset threshold value, to identify the identity information of the speaker as a mark and process the corresponding text data to form the conference summary.
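The interplay of the judgment submodule and the processing submodule can be illustrated with a minimal sketch. The `Recognition` structure, the function name `summary_line`, the default threshold of 0.8, and the "label: text" layout are all assumptions made for illustration; the embodiment does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recognition:
    """Result of one identity-recognition attempt (hypothetical structure)."""
    name: Optional[str]  # best-matching identity, or None if nothing matched
    rate: float          # recognition rate in the range [0, 1]


def summary_line(rec: Recognition, text: str, mark: str,
                 threshold: float = 0.8) -> str:
    """Format one conference-summary entry.

    The recognized name is used only when the recognition rate exceeds the
    preset threshold; otherwise the entry is attributed to the mark.
    The threshold value 0.8 is an assumed example, not taken from the source.
    """
    if rec.name is not None and rec.rate > threshold:
        label = rec.name
    else:
        label = mark
    return f"{label}: {text}"
```

In such a sketch, a confident recognition yields an entry attributed to the recognized name, while a low recognition rate yields the same text attributed to a mark such as "Speaker 1".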

Further, the identification submodule is specifically configured to:

obtaining biological characteristic information of a speaker; the biological characteristic information comprises voiceprint information and image information;

and identifying the identity of the speaker according to the biological characteristic information of the speaker, and displaying the identity information of the speaker on a display interface.

Further, the identification submodule is further specifically configured to:

comparing the voiceprints of the speaker with the voiceprints in the voiceprint library one by one;

and if the comparison is consistent, outputting the identity of the speaker corresponding to the voiceprint in the voiceprint library.
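The one-by-one comparison against the voiceprint library might look like the following sketch. It assumes that each voiceprint has already been reduced to a numeric feature vector and that "consistent" means a cosine similarity at or above a chosen value; the embodiment fixes neither the feature representation nor the matching criterion, so the names `cosine_similarity`, `identify_speaker`, and `min_similarity` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)


def identify_speaker(voiceprint: Sequence[float],
                     library: Dict[str, Sequence[float]],
                     min_similarity: float = 0.95) -> Optional[str]:
    """Compare the speaker's voiceprint with each library entry in turn.

    Returns the identity of the first entry judged consistent, or None when
    no entry matches. The 0.95 cut-off is an assumed example value.
    """
    for identity, stored in library.items():
        if cosine_similarity(voiceprint, stored) >= min_similarity:
            return identity
    return None
```

The same loop would apply unchanged to an image library if images were likewise reduced to feature vectors. Returning `None` leaves the caller free to fall back to a mark.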

Fig. 10 is a schematic view of a third embodiment of the device for generating a conference summary of the present invention; refer to fig. 10.

With continued reference to fig. 10, the processing module includes:

the processing submodule is configured to process the identity information of the speaker and the corresponding text data to form the conference summary; or,

when the identity information of the speaker cannot be identified according to the biological characteristic information of the speaker, to identify the identity information of the speaker as a mark and process the corresponding text data to form the conference summary.

The identification submodule is specifically configured to:

when the biological characteristic information is image information, acquiring image information of a speaker; and identifying the identity of the speaker according to the image information of the speaker.

The identification submodule is further specifically configured to:

comparing the images of the speakers with the images in the image library one by one;

and if the comparison is consistent, outputting the identity of the speaker corresponding to the image in the image library.

Fig. 11 is a schematic view of a device for generating a conference summary according to another embodiment of the present invention. Building on the embodiment shown in fig. 9, this embodiment describes the technical solution of the present invention in more detail.

As shown in fig. 11, the apparatus for generating a conference summary according to this embodiment further includes:

the first positioning module is configured to position the speaker according to the image information of the speaker;

a display module configured to display image information of the speaker.

Fig. 12 is a schematic diagram of a device for generating a conference summary according to still another embodiment of the present invention. Building on the embodiment shown in fig. 9, this embodiment describes the technical solution of the present invention in more detail.

As shown in fig. 12, the apparatus for generating a conference summary according to this embodiment may further include:

the processing module further comprises:

and the marking submodule is configured to identify the identity information of different speakers as a unique mark.

The processing module further comprises:

and the replacing submodule is configured to replace the mark in the conference summary with the identity information when the identity information of the speaker is identified in the conference process.

A first pushing module configured to push the meeting summary to each participant.
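The marking and replacing submodules can be sketched as a small data structure that stores each entry under a unique mark and substitutes the identity once it becomes known. The class name `ConferenceSummary`, the "Speaker N" mark format, and the `speaker_key` argument (for example, an internal identifier of an unrecognized voiceprint) are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


class ConferenceSummary:
    """Conference summary whose unidentified speakers carry unique marks."""

    def __init__(self) -> None:
        self._marks: Dict[str, str] = {}           # speaker key -> unique mark
        self._names: Dict[str, str] = {}           # mark -> identified name
        self._entries: List[Tuple[str, str]] = []  # (mark, text) in order

    def mark_for(self, speaker_key: str) -> str:
        """Marking step: assign one unique mark per distinct speaker."""
        if speaker_key not in self._marks:
            self._marks[speaker_key] = f"Speaker {len(self._marks) + 1}"
        return self._marks[speaker_key]

    def add(self, speaker_key: str, text: str) -> None:
        """Record one utterance under the speaker's mark."""
        self._entries.append((self.mark_for(speaker_key), text))

    def resolve(self, speaker_key: str, name: str) -> None:
        """Replacing step: bind the speaker's mark to the identified name."""
        self._names[self.mark_for(speaker_key)] = name

    def render(self) -> List[str]:
        """Return the summary lines, with marks replaced where possible."""
        return [f"{self._names.get(mark, mark)}: {text}"
                for mark, text in self._entries]
```

Because entries are stored by mark rather than by name, identifying a speaker later in the conference updates every earlier entry of that speaker without rewriting the stored text.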

Fig. 13 is a schematic diagram of a device for generating a conference summary according to still another embodiment of the present invention. Building on the embodiment shown in fig. 9, this embodiment describes the technical solution of the present invention in more detail.

The generation apparatus of a conference summary of the present embodiment further includes:

the second positioning module is configured to position the position of each participant;

the recording module is configured to record the current conference content to form a conference summary when a participant is located outside the conference room, until that participant is again located in the conference room; the recording may be an audio recording, or the speech of the current speaker may be converted into text and stored;

a second push module configured to send the meeting summary to the attendees.
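The cooperation of the second positioning module, the recording module, and the second push module can be sketched as follows, taking the text-conversion variant of recording. The class `AbsenceRecorder` and its method names are hypothetical; position updates and transcribed speech are assumed to arrive from the positioning and conversion modules respectively.

```python
from typing import Dict, List


class AbsenceRecorder:
    """Collects what each absent participant missed (illustrative sketch)."""

    def __init__(self) -> None:
        # participant -> summary lines recorded while that participant is out
        self._missed: Dict[str, List[str]] = {}

    def update_position(self, participant: str, in_room: bool) -> List[str]:
        """Handle a position update from the positioning module.

        Starts recording when the participant is outside the conference room.
        When the participant is back in the room, recording stops and the
        collected summary is returned so that it can be pushed to them.
        """
        if not in_room:
            self._missed.setdefault(participant, [])
            return []
        return self._missed.pop(participant, [])

    def on_speech(self, speaker: str, text: str) -> None:
        """Append converted speech to the summary of every absent participant."""
        for lines in self._missed.values():
            lines.append(f"{speaker}: {text}")
```

Speech uttered before a participant leaves is not recorded for that participant, and a second "in room" update returns an empty summary, so nothing is pushed twice.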

An embodiment of the present invention further provides a video conference system, including the apparatus shown in any one of fig. 9 to 13.

The above embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and the scope of the present invention is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present invention, and such modifications and equivalents should also be considered as falling within the scope of the present invention.

Claims

1. A method of generating a conference summary, comprising:

acquiring voice data of a speaker;

converting the voice data of the speaker into text data;

forming a conference summary in combination with the identity information of the speaker; wherein,

forming a conference summary in combination with the identity information of the speaker, comprising:

otherwise, identifying the identity information of the speaker as a mark and processing the corresponding text data to form the conference summary;

wherein identifying the identity information of the speaker as a mark and processing the corresponding text data to form the conference summary comprises:

the identity information of different speakers is marked as a unique mark; distinguishing the recognition rate for recognizing the identity information of the speaker to prompt the participants about the recognition condition of the identity information of the speaker;

and if the identity information of the speaker is identified, replacing the unique mark of the speaker in the conference summary with the identity information.

2. The method for generating a conference summary according to claim 1, wherein the identifying information of the speaker according to the biometric information of the speaker comprises:

3. The method of generating a conference summary according to claim 2, said method further comprising:

positioning the speaker according to the image information of the speaker;

and displaying the image information of the speaker.

4. The method of generating a conference summary according to claim 1, said method further comprising:

pushing the meeting summary to each participant.

5. The method for generating a conference summary according to claim 1, forming a conference summary in combination with identity information of the speaker, comprising:

positioning the position of each participant;

sending the meeting summary to the attendees.

6. An apparatus for generating a conference summary, comprising:

the acquisition module is configured to acquire voice data of a speaker;

a processing module configured to form a conference summary in combination with the identity information of the speaker; wherein,

the processing module comprises:

when the recognition rate for recognizing the identity information of the speaker does not exceed a preset threshold value, identifying the identity information of the speaker as a mark and processing corresponding text data to form the conference summary;

the processing module further comprises:

the marking submodule is configured to identify the identity information of different speakers as a unique mark; distinguishing the recognition rate for recognizing the identity information of the speaker to prompt the participants about the recognition condition of the identity information of the speaker;

and the replacing submodule is configured to replace the unique mark of the speaker in the conference summary with the identity information if the identity information of the speaker is identified.

7. A video conferencing system comprising the apparatus of claim 6.