CN106657865B - Conference summary generation method and device and video conference system - Google Patents

Publication number
CN106657865B
Authority
CN
China
Prior art keywords
speaker
identity information
information
conference
conference summary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611170590.8A
Other languages
Chinese (zh)
Other versions
CN106657865A (en
Inventor
张雅
辛玉军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN201611170590.8A
Publication of CN106657865A
Application granted
Publication of CN106657865B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Alarm Systems (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a method and a device for generating a conference summary and a conference system, wherein the method comprises the following steps: obtaining biological characteristic information of a speaker; the biological characteristic information comprises voiceprint information and image information; and identifying the identity of the speaker according to the biological characteristic information of the speaker, and displaying the identity information of the speaker on a display interface. According to the technical scheme of the embodiment of the invention, when a video conference is carried out, voice data of a speaker can be converted into text data, identity information of the speaker is identified, and the text data and the identity information are combined to form a conference summary, so that participants can know not only the identity information of the speaker but also the speech content of each participant, and the efficiency of the video conference is improved.

Description

Conference summary generation method and device and video conference system
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for generating a conference summary and a video conference system.
Background
A video conference system is a system that lets individuals or groups at two or more different locations transmit audio, video and file data to one another over transmission lines and multimedia devices, enabling real-time, interactive communication for the purpose of holding a conference.
However, when a video conference is held among multiple parties, and especially when several people in one conference room take part and speak, the conference picture cannot be focused on the person actually speaking. The other parties cannot clearly see the speaker's behavior and expression in real time and cannot know the speaker's identity, which hampers communication among the parties and degrades the effect of the video conference.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for generating a conference summary, which can assist in identifying the identity of a speaker according to the voice of the speaker, and a video conference system.
In order to achieve the above object, an embodiment of the present invention provides a method for generating a conference summary, including:
acquiring voice data of a speaker;
converting the voice data of the speaker into text data;
and combining the text data with the identity information of the speaker to form a conference summary.
The invention also provides a device for generating the conference summary, which comprises:
the acquisition module is configured to acquire voice data of a speaker;
the conversion module is configured to convert the voice data of the speaker into text data;
a processing module configured to form a conference summary in conjunction with the identity information of the speaker.
The invention also provides a video conference system comprising the device.
As can be seen from the technical scheme provided by the embodiments of the invention, during a video conference the voice data of a speaker can be converted into text data, the identity information of the speaker can be identified, and the text data and the identity information are combined to form a conference summary. Participants can thus know both the identity information of the speaker and the speech content of each participant, which improves the efficiency of the video conference.
Drawings
FIG. 1 is a flow chart of a method of generating a conference summary of one embodiment of the present invention;
FIG. 2 is a schematic diagram of the formation of a conference summary in conjunction with speaker identity information in accordance with one embodiment of the present invention;
FIG. 3 is a flow chart of a method for generating a conference summary according to one embodiment of the present invention;
FIG. 4 is a diagram illustrating identification of a speaker based on the speaker's voiceprint in accordance with one embodiment of the present invention;
FIG. 5 is a diagram illustrating identification of a speaker based on an image of the speaker according to one embodiment of the present invention;
FIG. 6 is a flow chart of a method of generating a conference summary of another embodiment of the present invention;
FIG. 7 is a flow chart of a method of generating a conference summary of yet another embodiment of the present invention;
FIG. 8 is a flow chart of a method of generating a conference summary of yet another embodiment of the present invention;
FIG. 9 is a schematic diagram of a conference summary generation apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a conference summary generation apparatus according to one embodiment of the present invention;
FIG. 11 is a schematic diagram of a conference summary generation apparatus according to another embodiment of the present invention;
FIG. 12 is a schematic diagram of a conference summary generation apparatus according to still another embodiment of the present invention;
FIG. 13 is a schematic diagram of a conference summary generation apparatus according to still another embodiment of the present invention.
Detailed Description
Various aspects and features of the disclosure are described herein with reference to the drawings.
It will be understood that various modifications may be made to the embodiments disclosed herein. Accordingly, the foregoing description should not be construed as limiting, but merely as exemplifications of embodiments. Other modifications will occur to those skilled in the art within the scope and spirit of the disclosure.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the disclosure and, together with a general description of the disclosure given above, and the detailed description of the embodiments given below, serve to explain the principles of the disclosure.
These and other characteristics of the invention will become apparent from the following description of a preferred form of embodiment, given as a non-limiting example, with reference to the accompanying drawings.
It should also be understood that, although the invention has been described with reference to some specific examples, a person of skill in the art shall certainly be able to achieve many other equivalent forms of the invention, having the characteristics as set forth in the claims and hence all coming within the field of protection defined thereby.
The above and other aspects, features and advantages of the present disclosure will become more apparent in view of the following detailed description when taken in conjunction with the accompanying drawings.
Specific embodiments of the present disclosure are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely examples of the disclosure, which may be embodied in various forms. Well-known and/or repeated functions and structures have not been described in detail so as not to obscure the present disclosure with unnecessary or redundant detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present disclosure in virtually any appropriately detailed structure.
The specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," which may each refer to one or more of the same or different embodiments in accordance with the disclosure.
Fig. 1 is a flowchart of a method for generating a conference summary according to an embodiment of the present invention, and as shown in fig. 1, the method for generating a conference summary according to the embodiment may specifically include:
Voice data of a speaker is acquired.
This embodiment is applied to a video conference system. The voice data of the speaker is acquired during the video conference, for example through a microphone built into the video conference system or through an external high-sensitivity microphone.
The voice data of the speaker is converted into text data.
To make the speaker's voice data easier to organize, it can be converted into text data. Specifically, the conversion may be performed by speech-to-text software.
The text data is combined with the identity information of the speaker to form a conference summary.
Specifically, the conference summary displays not only what each speaker said but also the identity information, such as the name, of the speaker who said it. In this step, therefore, the identity of the speaker is recognized first and then combined with the corresponding utterance to form the conference summary. For example, as shown in FIG. 2, the participants are A, B and C. A speaks first and says "Good morning, everyone"; B then says "Good morning, moderator"; and C then says "Let's start now ...". While each participant speaks, his or her identity information is recognized, for example from biometric information. When A speaks, the name is recognized as A, so A is displayed at the speaker position of the conference summary and "Good morning, everyone" is displayed at the utterance position corresponding to A. Likewise, once B is recognized, B's identity information is displayed at the speaker position and B's utterance at the corresponding utterance position, and the identity information and utterance of C are displayed in the same way.
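The pairing of speaker and utterance described above can be sketched as follows. This is only an illustrative sketch: the `Utterance` class and `format_summary` function are hypothetical names that do not appear in the patent, and the one-line-per-utterance layout is an assumption about how the summary of FIG. 2 might be rendered.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # identity information, here just a name
    text: str     # text data converted from the speaker's voice data


def format_summary(utterances):
    """Render one 'speaker: text' line per utterance, in speaking order."""
    return "\n".join(f"{u.speaker}: {u.text}" for u in utterances)


# The three-participant example from the description.
summary = format_summary([
    Utterance("A", "Good morning, everyone"),
    Utterance("B", "Good morning, moderator"),
    Utterance("C", "Let's start now"),
])
print(summary)
```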
According to the technical scheme of the embodiment of the invention, when a video conference is carried out, voice data of a speaker can be converted into text data, identity information of the speaker is identified, and the text data and the identity information are combined to form a conference summary, so that participants can know not only the identity information of the speaker but also the speech content of each participant, and the efficiency of the video conference is improved.
Fig. 3 is a flowchart of a method for generating a conference summary according to an embodiment of the present invention, and the method for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 1. As shown in fig. 3, the method for generating a conference summary in this embodiment may specifically include:
Voice data of a speaker is acquired.
This embodiment is applied to a video conference system. The voice data of the speaker is acquired during the video conference, for example through a microphone built into the video conference system or through an external high-sensitivity microphone.
The voice data of the speaker is converted into text data.
To make the speaker's voice data easier to organize, it can be converted into text data. Specifically, the conversion may be performed by speech-to-text software.
The identity information of the speaker is identified according to the biometric information of the speaker.
Further, identifying the identity information of the speaker according to the biometric information of the speaker includes: A, obtaining biometric information of the speaker; and B, identifying the identity of the speaker according to the biometric information of the speaker, and displaying the identity information of the speaker on a display interface.
Specifically, because of interference such as noise at the conference site, speaker identification often falls short of the ideal case; that is, the recognition rate does not necessarily reach 100%. In practice, the recognition rate may be, for example, 60%, 70% or 80% (here the recognition rate represents the probability of recognition rather than the accuracy of recognition). The recognition rate of the speaker's identity information can therefore be indicated in some way, to remind the participants that the recognized identity is not necessarily 100% accurate. For example, recognition rates can be distinguished by color: when the recognition rate is lower than 60%, the name of the speaker is displayed in a red font; when it is between 60% and 80%, in a yellow font; and when it is between 80% and 100%, in a blue font. Alternatively, for a speaker whose recognition rate is lower than 60%, the name can be displayed followed by a "?" to indicate that the recognition rate is low and the name may be inaccurate.
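A minimal sketch of the color coding and the "?" marker follows. The function names are hypothetical, and the recognition rate is assumed to be expressed as a fraction between 0 and 1. The description does not say which color applies at exactly 80%; this sketch assumes that 80% falls in the blue band.

```python
def name_color(rate):
    """Map a recognition rate in [0, 1] to a font color for the speaker name."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if rate < 0.60:
        return "red"
    if rate < 0.80:   # assumption: exactly 80% is treated as blue
        return "yellow"
    return "blue"


def display_name(name, rate):
    """Append '?' to a name recognized at a rate below 60%."""
    return f"{name}?" if rate < 0.60 else name
```

For instance, under these assumptions a speaker recognized at 70% would be shown in yellow without a question mark.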
In addition, recognition may yield more than one candidate identity for the speaker. For example, the recognition rate for the current speaker being A is 75% and the recognition rate for B is also 75%, so the machine cannot determine whether the speaker is A or B. In this case, the identity information of all recognized candidates may be listed. In a further preferred embodiment, the candidates may be listed in a drop-down box, from which a participant can select the actual identity of the speaker when the conference summary is formed.
In a further preferred embodiment, if one candidate is eventually recognized with greater certainty after a period of recognition, the identity information of the other candidates may be deleted. For example, at the beginning of the video conference the recognition rates of A and B as the current speaker are both 75%, so A and B are displayed at the same time. After a period of recognition, the recognition rate of A rises to 95% while that of B remains 75%. The two rates now differ greatly (for example, a difference of more than 10% is considered large), so the current speaker is taken to be A; A is retained and B is deleted.
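The candidate list and its later pruning might be sketched as below. The function name `prune_candidates` and the dictionary input are hypothetical; the 10% margin is the example figure from the description, interpreted here as a difference of 0.10 between rates expressed as fractions.

```python
def prune_candidates(rates, margin=0.10):
    """Keep only the best candidate once it leads the runner-up by more than `margin`.

    `rates` maps candidate name -> recognition rate in [0, 1].
    Returns the candidate names that should still be displayed,
    ordered from highest to lowest rate.
    """
    ranked = sorted(rates, key=rates.get, reverse=True)
    if len(ranked) < 2:
        return ranked
    best, runner_up = ranked[0], ranked[1]
    if rates[best] - rates[runner_up] > margin:
        return [best]
    return ranked


early = prune_candidates({"A": 0.75, "B": 0.75})  # both still shown
later = prune_candidates({"A": 0.95, "B": 0.75})  # only A retained
```

While the rates are tied, both names stay in the list (for example in a drop-down box); once A clearly leads, only A is kept.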
Further, the biometric information may include voiceprint information.
If the biometric information is voiceprint information, then, accordingly:
obtaining the biometric information of the speaker comprises acquiring voiceprint information of the speaker; and
identifying the identity of the speaker according to the biometric information of the speaker comprises identifying the identity of the speaker according to the voiceprint information of the speaker.
Further, identifying the identity of the speaker according to the voiceprint information of the speaker includes: C, comparing the voiceprint of the speaker with the voiceprints in the voiceprint library one by one; and D, if the comparison is consistent, outputting the identity of the speaker corresponding to the matching voiceprint in the voiceprint library.
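Steps C and D can be sketched as a one-by-one comparison against a library. The patent does not specify how voiceprints are represented or compared; this sketch assumes, purely for illustration, that each voiceprint is a numeric feature vector, that cosine similarity is the comparison, and that a score of at least 0.9 counts as "consistent". All names and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(voiceprint, library, min_score=0.9):
    """Compare `voiceprint` against each library entry in turn (step C).

    Returns (identity, score) for the best entry scoring at least `min_score`
    (step D), or (None, best_score) when no entry is consistent.
    """
    best_identity, best_score = None, 0.0
    for identity, reference in library.items():
        score = cosine_similarity(voiceprint, reference)
        if score > best_score:
            best_identity, best_score = identity, score
    if best_score >= min_score:
        return best_identity, best_score
    return None, best_score


# Hypothetical three-dimensional voiceprints for two enrolled participants.
library = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
identity, score = identify_speaker([0.9, 0.1, 0.0], library)
```

The same loop would apply to the image library of the image-based embodiment, with an image comparison in place of `cosine_similarity`.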
It is then judged whether the recognition rate with which the identity information of the speaker was recognized exceeds a preset threshold.
If so, the identity information of the speaker and the corresponding text data are processed to form the conference summary;
otherwise, the speaker is identified by a mark in place of the identity information, and the mark and the corresponding text data are processed to form the conference summary.
Specifically, the threshold may be set to 80%, for example. If the recognition rate exceeds 80%, the identity information of the speaker is considered relatively accurate; if it does not exceed 80%, the identity information of the speaker is in doubt.
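The threshold judgment can be sketched as a single function. The name `summary_entry` and its parameters are hypothetical; the 80% default is the example value from the description, and "exceeds" is read as a strict comparison.

```python
def summary_entry(identity, rate, text, mark, threshold=0.80):
    """Return the (label, text) pair to record in the conference summary.

    The recognized identity is used only when the recognition rate exceeds
    the threshold; otherwise the placeholder mark is used instead.
    """
    trusted = identity is not None and rate > threshold
    return (identity if trusted else mark, text)


confident = summary_entry("Zhang San", 0.90, "Let's start now", "Speaker A")
doubtful = summary_entry("Zhang San", 0.80, "Let's start now", "Speaker A")
```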
Specifically, when the identity information of the speaker is obtained, the name of the speaker is first determined from the list of participants, and that name is then looked up in a related database, such as the employee database of a department or of a branch of the company, to obtain further identity information of the speaker. The identity information therefore includes information such as the name, title, affiliated unit or address of the speaker. The identity information can then be displayed in detail in the conference summary formed later: for example, the title and name of the speaker, or the affiliated unit together with the title and name, are displayed, and the corresponding utterance is displayed at the utterance position of the conference summary.
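The two-stage lookup (participant list first, then a staff database) might look like the sketch below. The `Identity` record, the function names, the in-memory dictionary standing in for the database, and the sample record are all hypothetical illustrations, not details from the patent.

```python
from dataclasses import dataclass


@dataclass
class Identity:
    name: str
    title: str
    unit: str  # affiliated unit, e.g. a department or branch


def lookup_identity(name, participants, staff_db):
    """Return the full identity record for `name`, or None if it cannot be resolved."""
    if name not in participants:
        return None
    return staff_db.get(name)


def display_label(identity):
    """Format an identity for the speaker position of the conference summary."""
    return f"{identity.unit}, {identity.title} {identity.name}"


# Hypothetical sample data.
participants = ["Zhang San"]
staff_db = {"Zhang San": Identity("Zhang San", "Engineer", "R&D Department")}
record = lookup_identity("Zhang San", participants, staff_db)
label = display_label(record)
```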
Because there may be noise at the video conference site, the speaker's audio data may be acquired with noise interference, and information may be lost when audio data is transmitted for a remote video conference. As a result, the speaker's extracted voiceprint may deviate when compared with the voiceprint library, and the identity information of the speaker sometimes cannot be recognized. In that case, the speaker may be identified temporarily, for example by labeling the speaker with the letter A, B or C in place of a name, so that speaker A and the corresponding utterance are displayed in the conference summary. This keeps the utterances of different speakers from being mixed up when several speakers cannot be identified.
As shown in FIG. 4, in a specific implementation, when a speaker speaks during a video conference, audio information of the speaker is collected and the voiceprint is extracted from it. A voiceprint identification module identifies the voiceprint to determine the identity information of the speaker, and the identity information is sent to the video conference system. A voiceprint identification feedback module then verifies whether the voiceprint corresponds to the identity information of the speaker, specifically by comparing the voiceprint with the voiceprints stored in the voiceprint library. Finally, the identity information of the speaker is displayed on a display device connected to the video conference system.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
The flowchart of the third embodiment of the method for generating a conference summary is the same as FIG. 3, so reference is again made to FIG. 3. The method of this embodiment describes the technical solution of the present invention in more detail on the basis of the embodiment shown in FIG. 1. As shown in FIG. 3, the method for generating a conference summary in this embodiment may specifically include:
acquiring voice data of a speaker;
converting the voice data of the speaker into text data;
identifying the identity information of the speaker according to the biometric information of the speaker;
further, identifying the identity information of the speaker according to the biometric information of the speaker includes: A, obtaining biometric information of the speaker; and B, identifying the identity of the speaker according to the biometric information of the speaker, and displaying the identity information of the speaker on a display interface.
The biometric information may include image information.
If the biometric information is image information, then, accordingly:
obtaining the biometric information of the speaker comprises acquiring image information of the speaker; and
identifying the identity of the speaker according to the biometric information of the speaker comprises E, identifying the identity of the speaker according to the image information of the speaker.
Further, identifying the identity of the speaker according to the image information of the speaker includes: F, comparing the image of the speaker with the images in the image library one by one; and G, if the comparison is consistent, outputting the identity of the speaker corresponding to the matching image in the image library.
It is then judged whether the recognition rate with which the identity information of the speaker was recognized exceeds a preset threshold;
if so, the identity information of the speaker and the corresponding text data are processed to form the conference summary;
otherwise, the speaker is identified by a mark in place of the identity information, and the mark and the corresponding text data are processed to form the conference summary.
This embodiment can also use the image information of the speaker to obtain the identity information of the speaker. Voiceprint recognition and image recognition can of course be used at the same time, so that the identity information obtained is more accurate.
As shown in FIG. 5, in a specific implementation of this embodiment, while the identity information of the speaker is being identified from the speaker's voiceprint information, image information of the speaker may also be collected by an image collection module, for which a camera may be used. The image information of the speaker is compared with the images in the image library one by one to obtain the identity information of the speaker corresponding to the matching image. The identity information of the speaker, for example the name, is displayed on a display interface, and an image of the speaker is displayed at the same time. The image may include the speaker's head portrait as well as an image of the speaker's body, such as the speaker's gestures.
In other embodiments of the present invention, the speaker may also be located according to the image information of the speaker, and the image information of the speaker is displayed. In a specific implementation, the speaker may move while speaking, for example with body movements or changes of posture, and so change position on the display interface; the speaker is then tracked and kept in frame by rotating the pan-tilt head of the camera, so that remote participants see a more realistic conference picture.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the image information of the speaker, and the identity information and the image information of the speaker are displayed on the display interface, so that the scene sense of the video conference is enhanced, the communication of participants is facilitated, and the efficiency of the video conference is improved.
Fig. 6 is a flowchart of a method for generating a conference summary according to another embodiment of the present invention, and the method for generating a conference summary according to this embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 3. As shown in fig. 6, the method for generating a conference summary in this embodiment may specifically include:
Voice data of a speaker is acquired.
This embodiment is applied to a video conference system. The voice data of the speaker is acquired during the video conference, for example through a microphone built into the video conference system or through an external high-sensitivity microphone.
The voice data of the speaker is converted into text data.
To make the speaker's voice data easier to organize, it can be converted into text data. Specifically, the conversion may be performed by speech-to-text software.
The identity information of the speaker is identified according to the biometric information of the speaker.
Further, identifying the identity information of the speaker according to the biometric information of the speaker includes: A, obtaining biometric information of the speaker; and B, identifying the identity of the speaker according to the biometric information of the speaker, and displaying the identity information of the speaker on a display interface.
The biometric information may include voiceprint information and image information. For specific implementations, refer to the second and third embodiments shown in FIG. 3.
And judging whether the recognition rate for recognizing the identity information of the speaker exceeds a preset threshold value.
If so, processing the identity information of the speaker and corresponding text data to form the conference summary;
otherwise, the speaker is identified by a mark in place of the identity information, and the mark and the corresponding text data are processed to form the conference summary.
Further, different speakers are identified by different, unique marks.
For example, when there are several speakers whose identity information cannot be recognized, each is given a different unique mark so that the speakers can be told apart.
In a specific implementation, in a conference process, if the identity information of the speaker is identified, the mark in the conference summary is replaced by the identity information.
Because there may be noise at the video conference site, the speaker's audio data may be acquired with noise interference, and information may be lost when audio data is transmitted for a remote video conference. As a result, the speaker's extracted voiceprint may deviate when compared with the voiceprint library, and the identity information of the speaker sometimes cannot be recognized. In that case, the speaker may be identified temporarily, for example by labeling the speaker with the letter A, B or C in place of a name, so that speaker A and the corresponding utterance are displayed in the conference summary. As the video conference continues, more audio data of speaker A is collected, and the voiceprint information and image information of speaker A are recognized further. If the identity information of speaker A, for example the name Zhang San, is finally recognized after a period of recognition, the identifier A is replaced by the speaker's name; that is, A is replaced by Zhang San.
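Assigning unique temporary marks and later replacing them with a recognized name can be sketched as follows. The `ConferenceSummary` class, its methods, and the string keys used to tell unidentified speakers apart are hypothetical; the letter-based marks follow the A, B, C example in the description, with a numeric fallback assumed after the 26th speaker.

```python
import string


class ConferenceSummary:
    """Conference summary whose speaker labels can be corrected after the fact."""

    def __init__(self):
        self.entries = []  # each entry is a [label, text] pair
        self._marks = {}   # unidentified-speaker key -> temporary mark
        self._count = 0    # marks handed out so far, so none is ever reused

    def mark_for(self, speaker_key):
        """Return the unique mark for an unidentified speaker, creating one if needed."""
        if speaker_key not in self._marks:
            n = self._count
            self._count += 1
            suffix = string.ascii_uppercase[n] if n < 26 else str(n + 1)
            self._marks[speaker_key] = f"Speaker {suffix}"
        return self._marks[speaker_key]

    def add(self, label, text):
        self.entries.append([label, text])

    def resolve(self, speaker_key, identity):
        """Replace the speaker's mark with the recognized identity in every entry."""
        mark = self._marks.pop(speaker_key)
        for entry in self.entries:
            if entry[0] == mark:
                entry[0] = identity


summary = ConferenceSummary()
first = summary.mark_for("voice-1")
summary.add(first, "Good morning, everyone")
second = summary.mark_for("voice-2")
summary.add(second, "Good morning, moderator")
summary.resolve("voice-1", "Zhang San")  # identity recognized later in the meeting
```

Keeping a running counter rather than reusing freed letters is a design choice made here so that a mark already shown to participants never comes to refer to a different person.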
In other embodiments of the present invention, an exclusion method may also be adopted. For example, the full list of participants, the voiceprint library and the image library are obtained first. If, after the identity information of the participants has been obtained as in the above embodiments, only one participant remains unidentified, the identified participants can be struck from the participant list; the one remaining participant is the speaker who could not be identified, and that speaker's identity information is thereby obtained. In still another embodiment of the present invention, the exclusion method is also used: the full list of participants, the voiceprint library and the image library are obtained first, and after the identity information of the participants has been obtained as in the above embodiments, only a few participants remain unidentified. The identity of an unidentified speaker must then be one of the few remaining entries on the list, and because the number of unidentified participants is small, the accuracy of this method is higher.
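The exclusion method reduces to a set difference over the participant list, as in this sketch. Both function names are hypothetical.

```python
def remaining_candidates(participants, identified):
    """Participants on the list who have not yet been matched to a speaker."""
    return [p for p in participants if p not in identified]


def identify_by_exclusion(participants, identified):
    """Return the single remaining participant, or None if zero or several remain."""
    remaining = remaining_candidates(participants, identified)
    return remaining[0] if len(remaining) == 1 else None


everyone = ["A", "B", "C", "D"]
last_one = identify_by_exclusion(everyone, {"A", "B", "C"})
shortlist = remaining_candidates(everyone, {"A", "B"})
```

When several participants remain, `remaining_candidates` gives the short list from which the unidentified speaker's identity must come, corresponding to the second variant described above.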
The identity information includes information such as the name, the title, the affiliated unit or the address location of the speaker. In this embodiment, the name of the speaker may be used, that is, the name of the speaker and the corresponding utterance are displayed in the conference summary.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
Fig. 7 is a flowchart of a method for generating a conference summary according to still another embodiment of the present invention, and the method for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 1. As shown in fig. 7, the method for generating a conference summary in this embodiment may specifically include:
Voice data of a speaker is acquired.
The embodiment is applied to a video conference system. During the video conference, the voice data of the speaker may be acquired, for example, by a microphone built into the video conference system or by an external high-sensitivity microphone.
The voice data of the speaker is converted into text data.
To make the speaker's voice data easier to organize, it can be converted into text data. In particular, the conversion may be performed by speech-to-text software.
A conference summary is formed in combination with the identity information of the speaker.
Specifically, the identity information of the speaker is combined with the text data, that is, each speaker is associated with the text converted from that speaker's voice data, to form a conference summary. The participants can therefore learn not only the identity information of each speaker but also the speech content of each participant.
Pushing the meeting summary to each participant.
After the conference ends, the participants may remember only part of the conference content and may forget the rest. Therefore, once the conference summary is formed, it can be transmitted to each participant over the network; for example, it can be sent to each participant's mailbox or pushed to each participant's mobile phone.
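The steps of fig. 7 can be sketched as a small pipeline. The speech-to-text step, the identity recognition step and the delivery channel are passed in as callables because the embodiment does not name specific software; `build_summary`, `push_summary` and their parameters are hypothetical names used only for this sketch.

```python
def build_summary(segments, transcribe, identify):
    """Form a conference summary from acquired voice segments.

    transcribe(segment) -> str  stands in for any speech-to-text software.
    identify(segment)   -> str  stands in for the identity recognition step.
    """
    lines = []
    for segment in segments:
        text = transcribe(segment)   # convert voice data into text data
        name = identify(segment)     # obtain the speaker's identity
        lines.append(f"{name}: {text}")
    return "\n".join(lines)


def push_summary(summary, recipients, send):
    """Push the summary to every participant through send(recipient, summary)."""
    for recipient in recipients:
        send(recipient, summary)
```

In practice `send` could wrap an e-mail client or a mobile push service; the sketch leaves that choice open.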
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time, communication between participants is facilitated, and the efficiency of the video conference is improved.
Fig. 8 is a flowchart of a method for generating a conference summary according to another embodiment of the present invention, and the method for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 1. As shown in fig. 8, the method for generating a conference summary in this embodiment may specifically include:
Voice data of a speaker is acquired.
The embodiment is applied to a video conference system. During the video conference, the voice data of the speaker may be acquired, for example, by a microphone built into the video conference system or by an external high-sensitivity microphone.
The voice data of the speaker is converted into text data.
To make the speaker's voice data easier to organize, it can be converted into text data. In particular, the conversion may be performed by speech-to-text software.
The position of each participant is located.
If a participant is located outside the conference room, the current conference content is recorded to form a conference summary until the participant is located in the conference room again; the content may be recorded as audio, or the speech of the current speaker may be converted into text and stored.
The conference summary is sent to the participant.
In a specific implementation, the conference summary need not be formed for the whole conference; it may be formed only in special cases. For example, a participant who leaves midway may miss important content. The tracking and positioning function of the image acquisition module can therefore locate the participant, and when the participant is found to have left, the voice data of the current speaker is converted into text data, or is directly organized into a conference summary, and sent to the participant who left, so that the participant does not miss important information.
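A minimal sketch of this location-triggered recording follows, assuming the positioning step yields, for each utterance, the set of participants currently outside the conference room. The timeline format and the function name are assumptions of the sketch.

```python
from collections import defaultdict


def summaries_for_absent(timeline):
    """Collect, per participant, the utterances made while they were outside.

    timeline is an iterable of (speaker, text, outside), where outside is the
    set of participants located outside the conference room at that moment.
    """
    missed = defaultdict(list)
    for speaker, text, outside in timeline:
        for person in outside:
            # Recording continues only for as long as the person is outside.
            missed[person].append(f"{speaker}: {text}")
    return dict(missed)
```

Each value in the returned dictionary is the summary to send to the participant who stepped out.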
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
Fig. 9 is a schematic view of a device for generating a conference summary according to an embodiment of the present invention, and as shown in fig. 9, the device for generating a conference summary according to the embodiment may specifically include an obtaining module, a converting module, and a processing module.
The acquisition module is configured to acquire voice data of a speaker;
the conversion module is configured to convert the voice data of the speaker into text data;
a processing module configured to form a conference summary in conjunction with the identity information of the speaker.
According to the technical scheme of the embodiment of the invention, when a video conference is carried out, voice data of a speaker can be converted into text data, identity information of the speaker is identified, and the text data and the identity information are combined to form a conference summary, so that participants can know not only the identity information of the speaker but also the speech content of each participant, and the efficiency of the video conference is improved.
Fig. 10 is a schematic diagram of a device for generating a conference summary according to one embodiment of the present invention, and the device for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 9.
As shown in fig. 10, the processing module includes:
the identification submodule is configured to identify the identity information of the speaker according to the biological characteristic information of the speaker;
the judgment submodule is configured to judge whether the recognition rate for recognizing the identity information of the speaker exceeds a preset threshold value;
the processing submodule is configured to process the identity information of the speaker and corresponding text data when the identity information of the speaker is successfully identified, so as to form the conference summary; or,
when the recognition rate for recognizing the identity information of the speaker does not exceed the preset threshold value, to identify the identity information of the speaker as a mark and process the corresponding text data to form the conference summary.
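The decision made by the judgment and processing submodules can be sketched as below. The threshold value of 0.8, the function name and the representation of the recognition rate as a number between 0 and 1 are assumptions; the embodiment only states that a preset threshold is used.

```python
def attribute(candidate_name, recognition_rate, mark, threshold=0.8):
    """Choose what to display for a speaker in the conference summary.

    The recognized name is used only when the recognition rate exceeds the
    preset threshold; otherwise the temporary mark (for example "A") is used.
    The default threshold is a hypothetical value.
    """
    if recognition_rate > threshold:
        return candidate_name
    return mark
```

A rate exactly equal to the threshold does not exceed it, so the mark is kept in that case.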
Further, the identification submodule is specifically configured to:
obtaining biological characteristic information of a speaker; the biological characteristic information comprises voiceprint information and image information;
and identifying the identity of the speaker according to the biological characteristic information of the speaker, and displaying the identity information of the speaker on a display interface.
Further, the identification submodule is further specifically configured to:
comparing the voiceprints of the speaker with the voiceprints in the voiceprint library one by one;
and if the comparison is consistent, outputting the identity of the speaker corresponding to the voiceprint in the voiceprint library.
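The one-by-one comparison against the voiceprint library might look like the following sketch. Representing a voiceprint as a numeric feature vector, scoring with cosine similarity, and treating a score of at least 0.9 as "consistent" are all assumptions; the embodiment does not specify the comparison measure.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_voiceprint(probe, library, threshold=0.9):
    """Compare the probe with each library voiceprint in turn.

    library maps an identity to its stored voiceprint vector. The identity
    with the highest score at or above the threshold is returned, or None
    if no stored voiceprint is consistent with the probe.
    """
    best_name, best_score = None, threshold
    for name, reference in library.items():
        score = cosine(probe, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

A result of `None` corresponds to the case in which the speaker cannot be identified and a temporary mark is used instead.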
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
Fig. 10 is also a schematic view of a third embodiment of the device for generating a conference summary of the present invention; refer to fig. 10.
With continued reference to fig. 10, the processing module includes:
the identification submodule is configured to identify the identity information of the speaker according to the biological characteristic information of the speaker;
the processing submodule is configured to process the identity information of the speaker and the corresponding text data to form the conference summary; or,
when the identity information of the speaker cannot be identified according to the biological characteristic information of the speaker, to identify the identity information of the speaker as a mark and process the corresponding text data to form the conference summary.
The identification submodule is specifically configured to:
when the biological characteristic information is image information, acquiring image information of a speaker; and identifying the identity of the speaker according to the image information of the speaker.
The identification submodule is further specifically configured to:
comparing the images of the speakers with the images in the image library one by one;
and if the comparison is consistent, outputting the identity of the speaker corresponding to the image in the image library.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the image information of the speaker, and the identity information and the image information of the speaker are displayed on the display interface, so that the scene sense of the video conference is enhanced, the communication of participants is facilitated, and the efficiency of the video conference is improved.
Fig. 11 is a schematic view of a device for generating a conference summary according to another embodiment of the present invention, and the device for generating a conference summary according to this embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 9.
As shown in fig. 11, the apparatus for generating a conference summary according to this embodiment further includes:
the first positioning module is configured to position the speaker according to the image information of the speaker;
a display module configured to display image information of the speaker.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
Fig. 12 is a schematic diagram of a device for generating a conference summary according to still another embodiment of the present invention, and the device for generating a conference summary according to the present embodiment further introduces the technical solution of the present invention in more detail on the basis of the first embodiment shown in fig. 9.
As shown in fig. 12, in the apparatus for generating a conference summary according to this embodiment, the processing module further comprises:
the marking submodule, configured to identify the identity information of different speakers as a unique mark; and
the replacing submodule, configured to replace the mark in the conference summary with the identity information when the identity information of the speaker is identified in the conference process.
The apparatus may further include a first pushing module configured to push the conference summary to each participant.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
Fig. 13 is a schematic diagram of a device for generating a conference summary according to still another embodiment of the present invention, and the device for generating a conference summary according to this embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 9.
The generation apparatus of a conference summary of the present embodiment further includes:
the second positioning module is configured to position the position of each participant;
the recording module is configured to record the current conference content when a participant is located outside the conference room, to form a conference summary, until the participant is located in the conference room again; the content may be recorded as audio, or the speech of the current speaker may be converted into text and stored;
a second push module configured to send the meeting summary to the attendees.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
An embodiment of the present invention further provides a video conference system, including the apparatus shown in any one of fig. 9 to 13.
The above embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and the scope of the present invention is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present invention, and such modifications and equivalents should also be considered as falling within the scope of the present invention.

Claims (7)

1. A method of generating a conference summary, comprising:
acquiring voice data of a speaker;
converting the voice data of the speaker into text data;
forming a conference summary in combination with the identity information of the speaker; wherein,
forming a conference summary in combination with the identity information of the speaker, comprising:
identifying the identity information of the speaker according to the biological characteristic information of the speaker;
judging whether the recognition rate for recognizing the identity information of the speaker exceeds a preset threshold value or not;
if so, processing the identity information of the speaker and corresponding text data to form the conference summary;
otherwise, identifying the identity information of the speaker as a mark and processing the corresponding text data to form the conference summary;
wherein identifying the identity information of the speaker as a mark and processing the corresponding text data to form the conference summary comprises:
the identity information of different speakers is marked as a unique mark; distinguishing the recognition rate for recognizing the identity information of the speaker to prompt the participants about the recognition condition of the identity information of the speaker;
and if the identity information of the speaker is identified, replacing the unique mark of the speaker in the conference summary with the identity information.
2. The method for generating a conference summary according to claim 1, wherein the identifying information of the speaker according to the biometric information of the speaker comprises:
obtaining biological characteristic information of a speaker; the biological characteristic information comprises voiceprint information and image information;
and identifying the identity of the speaker according to the biological characteristic information of the speaker, and displaying the identity information of the speaker on a display interface.
3. The method of generating a conference summary according to claim 2, said method further comprising:
positioning the speaker according to the image information of the speaker;
and displaying the image information of the speaker.
4. The method of generating a conference summary according to claim 1, said method further comprising:
pushing the meeting summary to each participant.
5. The method for generating a conference summary according to claim 1, forming a conference summary in combination with identity information of the speaker, comprising:
positioning the position of each participant;
if a participant is located outside the conference room, recording the current conference content to form a conference summary until the participant is located in the conference room again; the content may be recorded as audio, or the speech of the current speaker may be converted into text and stored;
sending the meeting summary to the attendees.
6. An apparatus for generating a conference summary, comprising:
the acquisition module is configured to acquire voice data of a speaker;
the conversion module is configured to convert the voice data of the speaker into text data;
a processing module configured to form a conference summary in conjunction with the identity information of the speaker; wherein,
the processing module comprises:
the identification submodule is configured to identify the identity information of the speaker according to the biological characteristic information of the speaker;
the judgment submodule is configured to judge whether the recognition rate for recognizing the identity information of the speaker exceeds a preset threshold value;
the processing submodule is configured to process the identity information of the speaker and corresponding text data when the identity information of the speaker is successfully identified, so as to form the conference summary; or,
when the recognition rate for recognizing the identity information of the speaker does not exceed a preset threshold value, identifying the identity information of the speaker as a mark and processing corresponding text data to form the conference summary;
the processing module further comprises:
the marking submodule is configured to identify the identity information of different speakers as a unique mark; distinguishing the recognition rate for recognizing the identity information of the speaker to prompt the participants about the recognition condition of the identity information of the speaker;
and the replacing submodule is configured to replace the unique mark of the speaker in the conference summary with the identity information if the identity information of the speaker is identified.
7. A video conferencing system comprising the apparatus of claim 6.
CN201611170590.8A 2016-12-16 2016-12-16 Conference summary generation method and device and video conference system Active CN106657865B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611170590.8A CN106657865B (en) 2016-12-16 2016-12-16 Conference summary generation method and device and video conference system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611170590.8A CN106657865B (en) 2016-12-16 2016-12-16 Conference summary generation method and device and video conference system

Publications (2)

Publication Number Publication Date
CN106657865A CN106657865A (en) 2017-05-10
CN106657865B true CN106657865B (en) 2020-08-25

Family

ID=58822170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611170590.8A Active CN106657865B (en) 2016-12-16 2016-12-16 Conference summary generation method and device and video conference system

Country Status (1)

Country Link
CN (1) CN106657865B (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107527623B (en) * 2017-08-07 2021-02-09 广州视源电子科技股份有限公司 Screen transmission method and device, electronic equipment and computer readable storage medium
CN107257448A (en) * 2017-08-09 2017-10-17 成都全云科技有限公司 A kind of video conferencing system exchanged with font
CN107609045B (en) * 2017-08-17 2020-09-29 深圳壹秘科技有限公司 Conference record generating device and method thereof
CN107689225B (en) * 2017-09-29 2019-11-19 福建实达电脑设备有限公司 A method of automatically generating minutes
CN107749313B (en) * 2017-11-23 2019-03-01 郑州大学第一附属医院 A kind of method of automatic transcription and generation Telemedicine Consultation record
CN108022584A (en) * 2017-11-29 2018-05-11 芜湖星途机器人科技有限公司 Office Voice identifies optimization method
CN109920428A (en) * 2017-12-12 2019-06-21 杭州海康威视数字技术股份有限公司 A kind of notes input method, device, electronic equipment and storage medium
CN108074576B (en) * 2017-12-14 2022-04-08 讯飞智元信息科技有限公司 Speaker role separation method and system under interrogation scene
CN107993665B (en) * 2017-12-14 2021-04-30 科大讯飞股份有限公司 Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system
CN107978317A (en) * 2017-12-18 2018-05-01 北京百度网讯科技有限公司 Meeting summary synthetic method, system and terminal device
CN108231064A (en) * 2018-01-02 2018-06-29 联想(北京)有限公司 A kind of data processing method and system
CN110022454B (en) * 2018-01-10 2021-02-23 华为技术有限公司 Method for identifying identity in video conference and related equipment
CN108399923B (en) * 2018-02-01 2019-06-28 深圳市鹰硕技术有限公司 More human hairs call the turn spokesman's recognition methods and device
CN108417218B (en) * 2018-03-09 2020-12-22 福州米鱼信息科技有限公司 Memorandum reminding method and terminal based on voiceprint
KR102562227B1 (en) * 2018-06-12 2023-08-02 현대자동차주식회사 Dialogue system, Vehicle and method for controlling the vehicle
CN110661923A (en) * 2018-06-28 2020-01-07 视联动力信息技术股份有限公司 Method and device for recording speech information in conference
CN108986826A (en) * 2018-08-14 2018-12-11 中国平安人寿保险股份有限公司 Automatically generate method, electronic device and the readable storage medium storing program for executing of minutes
CN109525800A (en) * 2018-11-08 2019-03-26 江西国泰利民信息科技有限公司 A kind of teleconference voice recognition data transmission method
CN111193890B (en) * 2018-11-14 2022-06-17 株式会社理光 Conference record analyzing device and method and conference record playing system
CN109741754A (en) * 2018-12-10 2019-05-10 上海思创华信信息技术有限公司 A kind of conference voice recognition methods and system, storage medium and terminal
CN111354356B (en) * 2018-12-24 2024-04-30 北京搜狗科技发展有限公司 Voice data processing method and device
CN111385185A (en) * 2018-12-28 2020-07-07 中兴通讯股份有限公司 Information processing method, computer device, and computer-readable storage medium
CN109361527B (en) * 2018-12-28 2021-02-05 苏州思必驰信息科技有限公司 Voice conference recording method and system
CN109783642A (en) * 2019-01-09 2019-05-21 上海极链网络科技有限公司 Structured content processing method, device, equipment and the medium of multi-person conference scene
CN109816722A (en) * 2019-01-18 2019-05-28 深圳市沃特沃德股份有限公司 Position method, apparatus, storage medium and the computer equipment of spokesman position
CN109887508A (en) * 2019-01-25 2019-06-14 广州富港万嘉智能科技有限公司 A kind of meeting automatic record method, electronic equipment and storage medium based on vocal print
CN112466306B (en) * 2019-08-19 2023-07-04 中国科学院自动化研究所 Conference summary generation method, device, computer equipment and storage medium
CN110580907B (en) * 2019-08-28 2021-09-24 云知声智能科技股份有限公司 Voice recognition method and system for multi-person speaking scene
CN110677614A (en) * 2019-10-15 2020-01-10 广州国音智能科技有限公司 Information processing method, device and computer readable storage medium
CN112750247A (en) * 2019-10-30 2021-05-04 京东方科技集团股份有限公司 Participant identification method, identification system, computer device, and medium
CN110827853A (en) * 2019-11-11 2020-02-21 广州国音智能科技有限公司 Voice feature information extraction method, terminal and readable storage medium
KR102178175B1 (en) * 2019-12-09 2020-11-12 김경철 User device and method of controlling thereof
CN111048095A (en) * 2019-12-24 2020-04-21 苏州思必驰信息科技有限公司 Voice transcription method, equipment and computer readable storage medium
CN113014854B (en) * 2020-04-30 2022-11-11 北京字节跳动网络技术有限公司 Method, device, equipment and medium for generating interactive record
CN111818294A (en) * 2020-08-03 2020-10-23 上海依图信息技术有限公司 Method, medium and electronic device for multi-person conference real-time display combined with audio and video
CN112037791B (en) * 2020-08-12 2023-01-13 广东电力信息科技有限公司 Conference summary transcription method, apparatus and storage medium
CN114333853A (en) * 2020-09-25 2022-04-12 华为技术有限公司 Audio data processing method, equipment and system
CN114792522A (en) * 2021-01-26 2022-07-26 阿里巴巴集团控股有限公司 Audio signal processing method, conference recording and presenting method, apparatus, system and medium
CN113113022A (en) * 2021-04-15 2021-07-13 吉林大学 Method for automatically identifying identity based on voiceprint information of speaker
CN114240342A (en) * 2021-11-30 2022-03-25 珠海大横琴科技发展有限公司 Conference control method and device
CN115623132B (en) * 2022-11-18 2023-04-04 北京中电慧声科技有限公司 Intelligent conference system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000023130A (en) * 1998-06-30 2000-01-21 Toshiba Corp Video conference system
CN102256098A (en) * 2010-05-18 2011-11-23 宝利通公司 Videoconferencing endpoint having multiple voice-tracking cameras
CN102572372A (en) * 2011-12-28 2012-07-11 中兴通讯股份有限公司 Extraction method and device for conference summary

Also Published As

Publication number Publication date
CN106657865A (en) 2017-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant