CN106657865B - Conference summary generation method and device and video conference system - Google Patents
- Publication number
- CN106657865B CN201611170590.8A
- Authority
- CN
- China
- Prior art keywords
- speaker
- identity information
- information
- conference
- conference summary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Alarm Systems (AREA)
- Telephonic Communication Services (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention discloses a method and a device for generating a conference summary and a conference system, wherein the method comprises the following steps: obtaining biological characteristic information of a speaker; the biological characteristic information comprises voiceprint information and image information; and identifying the identity of the speaker according to the biological characteristic information of the speaker, and displaying the identity information of the speaker on a display interface. According to the technical scheme of the embodiment of the invention, when a video conference is carried out, voice data of a speaker can be converted into text data, identity information of the speaker is identified, and the text data and the identity information are combined to form a conference summary, so that participants can know not only the identity information of the speaker but also the speech content of each participant, and the efficiency of the video conference is improved.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for generating a conference summary and a video conference system.
Background
A video conference system is a system that lets individuals or groups in two or more different places exchange audio, video and file data over transmission lines and multimedia devices, enabling real-time, interactive communication for the purpose of holding a conference.
However, in the video conference system, when a video conference is performed among multiple parties, especially when multiple persons participate in the conference in one conference room and speak, an actual conference picture cannot be focused on an actual speaker, and other parties participating in the conference cannot see the behavior and expression of the speaker clearly in real time and cannot know the identity of the speaker, so that communication among the parties participating in the conference is affected, and the effect of the video conference is affected.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for generating a conference summary, which can assist in identifying the identity of a speaker according to the voice of the speaker, and a video conference system.
In order to achieve the above object, an embodiment of the present invention provides a method for generating a conference summary, including:
acquiring voice data of a speaker;
converting the voice data of the speaker into text data;
and combining the text data with the identity information of the speaker to form a conference summary.
The invention also provides a device for generating the conference summary, which comprises:
the acquisition module is configured to acquire voice data of a speaker;
the conversion module is configured to convert the voice data of the speaker into text data;
a processing module configured to form a conference summary in conjunction with the identity information of the speaker.
The invention also provides a video conference system comprising the device.
As can be seen from the technical solution provided by the embodiments of the present invention, when a video conference is held, the voice data of a speaker can be converted into text data, the identity information of the speaker can be identified, and the text data and the identity information are combined to form a conference summary, so that participants can know both the identity of each speaker and the content of each participant's speech, which improves the efficiency of the video conference.
Drawings
FIG. 1 is a flow chart of a method of generating a conference summary of one embodiment of the present invention;
FIG. 2 is a schematic diagram of the formation of a conference summary in conjunction with speaker identity information in accordance with one embodiment of the present invention;
FIG. 3 is a flow chart of a method for generating a conference summary according to one embodiment of the present invention;
FIG. 4 is a diagram illustrating identification of a speaker based on the speaker's voiceprint in accordance with one embodiment of the present invention;
FIG. 5 is a diagram illustrating identification of a speaker based on an image of the speaker according to one embodiment of the present invention;
FIG. 6 is a flow chart of a method of generating a conference summary of another embodiment of the present invention;
FIG. 7 is a flow chart of a method of generating a conference summary of yet another embodiment of the present invention;
FIG. 8 is a flow chart of a method of generating a conference summary of yet another embodiment of the present invention;
FIG. 9 is a schematic diagram of a conference summary generation apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a conference summary generation apparatus according to one embodiment of the present invention;
FIG. 11 is a schematic diagram of a conference summary generation apparatus according to another embodiment of the present invention;
FIG. 12 is a schematic diagram of a conference summary generation apparatus according to still another embodiment of the present invention;
FIG. 13 is a schematic diagram of a conference summary generation apparatus according to still another embodiment of the present invention.
Detailed Description
Various aspects and features of the disclosure are described herein with reference to the drawings.
It will be understood that various modifications may be made to the embodiments disclosed herein. Accordingly, the foregoing description should not be construed as limiting, but merely as exemplifications of embodiments. Other modifications will occur to those skilled in the art within the scope and spirit of the disclosure.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the disclosure and, together with a general description of the disclosure given above, and the detailed description of the embodiments given below, serve to explain the principles of the disclosure.
These and other characteristics of the invention will become apparent from the following description of a preferred form of embodiment, given as a non-limiting example, with reference to the accompanying drawings.
It should also be understood that, although the invention has been described with reference to some specific examples, a person of skill in the art shall certainly be able to achieve many other equivalent forms of the invention, having the characteristics as set forth in the claims and hence all coming within the field of protection defined thereby.
The above and other aspects, features and advantages of the present disclosure will become more apparent in view of the following detailed description when taken in conjunction with the accompanying drawings.
Specific embodiments of the present disclosure are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely examples of the disclosure, which may be embodied in various forms. Well-known and/or repeated functions and structures have not been described in detail so as not to obscure the present disclosure with unnecessary or redundant detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present disclosure in virtually any appropriately detailed structure.
The specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," which may each refer to one or more of the same or different embodiments in accordance with the disclosure.
FIG. 1 is a flowchart of a method for generating a conference summary according to an embodiment of the present invention. As shown in FIG. 1, the method of this embodiment may specifically include:
Acquiring voice data of a speaker.
This embodiment is applied to a video conference system. During the video conference, the voice data of the speaker may be acquired, for example, by a microphone built into the video conference system or by an external high-sensitivity microphone.
Converting the voice data of the speaker into text data.
To make the speaker's voice data easier to organize, it can be converted into text data. Specifically, the conversion may be performed by speech-to-text software.
Combining the text data with the identity information of the speaker to form a conference summary.
Specifically, the conference summary displays not only the content of each utterance but also the identity information, such as the name, of the speaker who made it. In this step, therefore, the identity of the speaker needs to be identified and then combined with the corresponding utterance to form the conference summary. For example, as shown in FIG. 2, the participants include A, B and C. A speaks first and says "Good morning, everyone"; B then says "Good morning, moderator"; and C then says "Let's start now ...". As each participant speaks, his or her identity information is recognized, for example from biometric information. When A speaks, the name is recognized as A, so "A" is displayed at the speaker position of the conference summary and "Good morning, everyone" is displayed at the utterance-content position corresponding to A. Likewise, B's recognized identity information is displayed at the speaker position and B's utterance at the corresponding utterance-content position, and the identity information and utterance of C are displayed at their corresponding positions in the same way.
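The pairing of speaker and utterance described for FIG. 2 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Utterance` class, the `format_summary` function and the "speaker: text" layout are hypothetical, and the utterance strings are English renderings of the example.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # recognized identity information, e.g. a name (hypothetical field)
    text: str     # text converted from the speaker's voice data


def format_summary(utterances):
    """Return one summary line per utterance, pairing speaker and content."""
    return [f"{u.speaker}: {u.text}" for u in utterances]


# Participants A, B and C speak in turn, as in the FIG. 2 example.
summary = format_summary([
    Utterance("A", "Good morning, everyone"),
    Utterance("B", "Good morning, moderator"),
    Utterance("C", "Let's start now ..."),
])
print("\n".join(summary))
```

Each line keeps the speaker position and the utterance-content position side by side, in speaking order.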
According to the technical scheme of the embodiment of the invention, when a video conference is carried out, voice data of a speaker can be converted into text data, identity information of the speaker is identified, and the text data and the identity information are combined to form a conference summary, so that participants can know not only the identity information of the speaker but also the speech content of each participant, and the efficiency of the video conference is improved.
FIG. 3 is a flowchart of a method for generating a conference summary according to an embodiment of the present invention. This embodiment describes the technical solution of the present invention in more detail on the basis of the embodiment shown in FIG. 1. As shown in FIG. 3, the method of this embodiment may specifically include:
Acquiring voice data of a speaker.
This embodiment is applied to a video conference system. During the video conference, the voice data of the speaker may be acquired, for example, by a microphone built into the video conference system or by an external high-sensitivity microphone.
Converting the voice data of the speaker into text data.
To make the speaker's voice data easier to organize, it can be converted into text data. Specifically, the conversion may be performed by speech-to-text software.
Identifying the identity information of the speaker according to the biometric information of the speaker.
Further, identifying the identity information of the speaker according to the biometric information of the speaker includes: A. obtaining the biometric information of the speaker; B. identifying the identity of the speaker according to the biometric information of the speaker, and displaying the identity information of the speaker on a display interface.
Specifically, because of interference such as noise at the conference site, identification of the speaker often falls short of the ideal; that is, the recognition rate does not necessarily reach 100%. In practice, the recognition rate may be, for example, 60%, 70% or 80% (here the recognition rate denotes the probability of recognition rather than the accuracy of recognition). The recognition rate of the speaker's identity information can therefore be indicated in some way to remind participants that the identification is not necessarily 100% accurate. For example, recognition rates can be distinguished by color: when the recognition rate is below 60%, the speaker's name is displayed in a red font; between 60% and 80%, in a yellow font; and between 80% and 100%, in a blue font. Alternatively, for a speaker whose recognition rate is below 60%, a "?" can be appended after the displayed name to indicate that the recognition rate is low and the name may be inaccurate.
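The two marking schemes above can be sketched as a pair of small functions. The function names are hypothetical, and the handling of the boundary values is an assumption, since the text does not say which band exactly 60% or 80% falls into; here each boundary is assigned to the higher band.

```python
def name_color(recognition_rate):
    """Map a recognition rate in [0, 1] to the font color of the speaker's name."""
    if not 0.0 <= recognition_rate <= 1.0:
        raise ValueError("recognition_rate must be between 0 and 1")
    if recognition_rate < 0.60:
        return "red"
    if recognition_rate < 0.80:  # assumption: exactly 80% counts as blue
        return "yellow"
    return "blue"


def display_name(name, recognition_rate):
    """Alternative scheme: append '?' to names recognized below 60%."""
    return f"{name}?" if recognition_rate < 0.60 else name


print(name_color(0.75), display_name("A", 0.55))
```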
In addition, identification may yield more than one candidate identity for the speaker. For example, if the current speaker is recognized as A with a recognition rate of 75% and as B with a recognition rate of 75% as well, the machine cannot determine whether the speaker is A or B. In this case, the identity information of all the identified candidates may be listed. In a further preferred embodiment, the candidates may be listed in a drop-down box, from which a participant can select the speaker's actual identity information when the conference summary is formed.
In a further preferred embodiment, if one speaker is identified with greater certainty after a period of identification, the identity information of the other candidates may be deleted. For example, at the beginning of the video conference, if the current speaker is recognized as A and as B with recognition rates of 75% each, both A and B may be displayed. If, after a period of identification, the recognition rate of A reaches 95% while that of B remains 75%, the two recognition rates differ greatly (for example, a difference of more than 10% is considered large); the current speaker is then taken to be A, so A is retained and B is deleted.
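The candidate handling in the two preceding paragraphs can be sketched as follows. The function name `prune_candidates` and the representation of recognition rates as a dictionary are assumptions for illustration; the 10% margin comes from the example in the text and is read here as percentage points.

```python
def prune_candidates(rates, margin=0.10):
    """Keep every candidate whose rate is within `margin` of the best rate.

    `rates` maps a candidate identity to its current recognition rate.
    Candidates trailing the best one by more than `margin` are dropped.
    """
    if not rates:
        return []
    best = max(rates.values())
    return sorted(name for name, rate in rates.items() if best - rate <= margin)


# Start of the conference: A and B are equally likely, so both are listed.
print(prune_candidates({"A": 0.75, "B": 0.75}))
# Later: A has pulled ahead by more than the margin, so only A is kept.
print(prune_candidates({"A": 0.95, "B": 0.75}))
```

When more than one candidate survives, the returned list is what a drop-down box would offer to the participants.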
Further, the biometric information may include voiceprint information.
If the biometric information is voiceprint information,
accordingly, the method for acquiring the biological characteristic information of the speaker comprises the following steps:
acquiring voiceprint information of a speaker;
identifying the identity of the speaker according to the biometric information of the speaker, comprising:
and identifying the identity of the speaker according to the voiceprint information of the speaker.
Further, identifying the identity of the speaker according to the voiceprint information of the speaker includes: C. comparing the voiceprint of the speaker with the voiceprints in a voiceprint library one by one; and D. if the comparison is consistent, outputting the identity of the speaker corresponding to the matching voiceprint in the voiceprint library.
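Steps C and D can be sketched as a one-by-one comparison loop. The text does not specify how two voiceprints are compared, so this sketch assumes that each voiceprint is a numeric feature vector and that cosine similarity above a hypothetical `min_similarity` value counts as "consistent"; all names here are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(voiceprint, library, min_similarity=0.9):
    """Compare a voiceprint with each library entry in turn.

    Returns (identity, score) for the best match if it is consistent
    enough, otherwise (None, score).
    """
    best_name, best_score = None, 0.0
    for name, enrolled in library.items():
        score = cosine_similarity(voiceprint, enrolled)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= min_similarity:
        return best_name, best_score
    return None, best_score


library = {"A": [1.0, 0.0], "B": [0.0, 1.0]}  # toy enrolled voiceprints
name, score = identify([0.9, 0.1], library)
print(name, round(score, 3))
```

The returned score could serve as the recognition rate that is later checked against the preset threshold, although the text does not state that the two are the same quantity.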
Judging whether the recognition rate for identifying the identity information of the speaker exceeds a preset threshold.
If so, processing the identity information of the speaker and the corresponding text data to form the conference summary;
otherwise, representing the identity information of the speaker with a mark, and processing the mark and the corresponding text data to form the conference summary.
Specifically, the preset threshold may be set to, for example, 80%. If the recognition rate exceeds 80%, the identity information of the speaker is considered relatively accurate; if it does not exceed 80%, the identity information of the speaker is in doubt.
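The threshold decision can be sketched as a single function. The function name, its parameters and the "label: text" output format are hypothetical; the 80% default follows the example in the text, and "exceeds" is implemented as a strict comparison.

```python
def summary_entry(identity, recognition_rate, text, mark, threshold=0.80):
    """Build one summary line, using the identity only above the threshold.

    If the recognition rate does not exceed the threshold, the temporary
    `mark` stands in for the speaker's identity information.
    """
    label = identity if recognition_rate > threshold else mark
    return f"{label}: {text}"


print(summary_entry("B", 0.90, "Good morning, moderator", mark="Speaker 1"))
print(summary_entry("B", 0.70, "Good morning, moderator", mark="Speaker 1"))
```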
Specifically, when obtaining the identity information of the speaker, the name of the speaker is first determined from the list of participants, and that name is then looked up in a related database, such as the employee database of a department or branch of a company, to obtain further identity information. The identity information therefore includes information such as the speaker's name, title, affiliated unit or address. The identity information of the speaker can then be displayed in detail in the conference summary formed later; for example, the speaker's name and title, or the speaker's affiliated unit, name and title, may be displayed, with the corresponding utterance shown at the position in the conference summary where utterance content is displayed.
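The two-stage lookup (participant list first, then a related database) can be sketched as follows. The dictionary-based database, the field names `title` and `unit`, the sample names and the assumption that names are unique keys are all hypothetical.

```python
def lookup_identity(name, participants, employee_db):
    """Return identity information for a listed participant, else None."""
    if name not in participants:
        return None
    record = employee_db.get(name, {})
    return {"name": name, **record}


def display_label(identity):
    """Join affiliated unit, name and title, skipping missing fields."""
    parts = [identity.get("unit"), identity["name"], identity.get("title")]
    return " ".join(p for p in parts if p)


participants = ["Zhang San", "Li Si"]
employee_db = {"Zhang San": {"title": "Manager", "unit": "Sales Department"}}

info = lookup_identity("Zhang San", participants, employee_db)
print(display_label(info))
```

A participant who is on the list but missing from the database is still shown by name alone.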
Because there may be noise at the video conference site, noise may interfere with the acquisition of the speaker's audio data, and information may be lost when audio data is transmitted in a remote video conference; deviations may therefore arise when the speaker's voiceprint information is extracted and compared with the voiceprint library. As a result, the identity information of the speaker sometimes cannot be recognized. In that case, the speaker may be given a temporary identifier, for example by using the letter A, B or C in place of the speaker's name, so that "speaker A" and the corresponding utterance are displayed in the conference summary. This prevents the utterances of several unrecognized speakers from being mixed up when they are displayed.
As shown in FIG. 4, in a specific implementation, when a speaker speaks during a video conference, the speaker's audio information is collected and the voiceprint is extracted from it. A voiceprint identification module identifies the voiceprint to determine the identity information of the speaker, and the identity information is sent to the video conference system. A voiceprint identification feedback module then verifies whether the voiceprint corresponds to the identity information of the speaker; specifically, the voiceprint can be compared with the voiceprints stored in the voiceprint library. Finally, the identity information of the speaker is displayed on a display device connected to the video conference system.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
The flowchart of the third embodiment of the method for generating a conference summary is the same as FIG. 3; refer again to FIG. 3. This embodiment describes the technical solution of the present invention in more detail on the basis of the embodiment shown in FIG. 1. As shown in FIG. 3, the method of this embodiment may specifically include:
acquiring voice data of a speaker;
converting the voice data of the speaker into text data;
identifying the identity information of the speaker according to the biometric information of the speaker;
further, identifying the identity information of the speaker according to the biometric information of the speaker includes: A. obtaining the biometric information of the speaker; B. identifying the identity of the speaker according to the biometric information of the speaker, and displaying the identity information of the speaker on a display interface.
The biometric information comprises image information;
if the biometric information is image information,
accordingly, the method for acquiring the biological characteristic information of the speaker comprises the following steps:
acquiring image information of a speaker;
identifying the identity of the speaker according to the biometric information of the speaker, comprising:
E. identifying the identity of the speaker according to the image information of the speaker.
Further, identifying the identity of the speaker according to the image information of the speaker includes: F. comparing the image of the speaker with the images in an image library one by one; and G. if the comparison is consistent, outputting the identity of the speaker corresponding to the matching image in the image library.
Judging whether the recognition rate for recognizing the identity information of the speaker exceeds a preset threshold value or not;
if so, processing the identity information of the speaker and corresponding text data to form the conference summary;
otherwise, representing the identity information of the speaker with a mark, and processing the mark and the corresponding text data to form the conference summary.
This embodiment can also use the image information of the speaker to obtain the identity information of the speaker. Of course, voiceprint recognition and image recognition can be performed at the same time, so that the identity information obtained is more accurate.
As shown in FIG. 5, in a specific implementation of this embodiment, while the identity information of the speaker is being identified from the speaker's voiceprint information, the image information of the speaker may also be collected by an image collection module, which may be a camera. The image information of the speaker is compared with the images in the image library one by one to obtain the identity information of the speaker corresponding to the matching image. The identity information of the speaker, for example the speaker's name, is displayed on the display interface together with an image of the speaker, which may include the speaker's head portrait as well as an image of the speaker's body, such as the speaker's gestures.
In other embodiments of the present invention, the speaker may also be located according to the image information of the speaker, and the image information of the speaker is displayed. In a specific implementation, the speaker's position on the display interface may shift as the speaker moves or changes posture while speaking; the speaker is tracked and positioned by rotating the pan-tilt head of the camera, so that remote participants see a more realistic conference picture.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the image information of the speaker, and the identity information and the image information of the speaker are displayed on the display interface, so that the scene sense of the video conference is enhanced, the communication of participants is facilitated, and the efficiency of the video conference is improved.
FIG. 6 is a flowchart of a method for generating a conference summary according to another embodiment of the present invention. This embodiment describes the technical solution of the present invention in more detail on the basis of the embodiment shown in FIG. 3. As shown in FIG. 6, the method of this embodiment may specifically include:
Acquiring voice data of a speaker.
This embodiment is applied to a video conference system. During the video conference, the voice data of the speaker may be acquired, for example, by a microphone built into the video conference system or by an external high-sensitivity microphone.
Converting the voice data of the speaker into text data.
To make the speaker's voice data easier to organize, it can be converted into text data. Specifically, the conversion may be performed by speech-to-text software.
Identifying the identity information of the speaker according to the biometric information of the speaker.
Further, identifying the identity information of the speaker according to the biometric information of the speaker includes: A. obtaining the biometric information of the speaker; B. identifying the identity of the speaker according to the biometric information of the speaker, and displaying the identity information of the speaker on a display interface.
The biometric information may include voiceprint information and image information. For a specific embodiment, refer to the second embodiment and the third embodiment shown in fig. 2.
Judging whether the recognition rate for identifying the identity information of the speaker exceeds a preset threshold.
If so, processing the identity information of the speaker and the corresponding text data to form the conference summary;
otherwise, representing the identity information of the speaker with a mark, and processing the mark and the corresponding text data to form the conference summary.
Further, the identity information of each different speaker is represented by a unique mark.
For example, when there are several speakers whose identity information cannot be recognized, each is given a different unique mark so that the speakers can be told apart.
In a specific implementation, if the identity information of the speaker is recognized during the conference, the mark in the conference summary is replaced with the identity information.
Because there may be noise at the video conference site, noise may interfere with the acquisition of the speaker's audio data, and information may be lost when audio data is transmitted in a remote video conference; deviations may therefore arise when the speaker's voiceprint information is extracted and compared with the voiceprint library. As a result, the identity information of the speaker sometimes cannot be recognized. In that case, the speaker may be given a temporary identifier, for example by using the letter A, B or C in place of the speaker's name, so that "speaker A" and the corresponding utterance are displayed in the conference summary. As the video conference continues, more audio data of speaker A is collected, and the voiceprint information and image information of speaker A are analyzed further. If the identity information of speaker A is eventually recognized after a period of identification, for example as Zhang San, the identifier A is replaced with the speaker's name; in this example, A is replaced with Zhang San.
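The temporary marks and their later replacement can be sketched as a small class. This is an illustration under stated assumptions: the class `ConferenceSummary`, the per-speaker keys such as `"voice-1"` (standing for whatever internal handle groups one unrecognized speaker's utterances) and the fallback label format beyond 26 speakers are all hypothetical.

```python
from string import ascii_uppercase


class ConferenceSummary:
    def __init__(self):
        self.entries = []  # [label, text] pairs in speaking order
        self._labels = {}  # hypothetical speaker key -> current label

    def add(self, speaker_key, text):
        """Record an utterance, assigning a unique letter mark on first use."""
        if speaker_key not in self._labels:
            n = len(self._labels)
            mark = ascii_uppercase[n] if n < 26 else f"S{n + 1}"
            self._labels[speaker_key] = mark
        self.entries.append([self._labels[speaker_key], text])

    def resolve(self, speaker_key, name):
        """Replace a speaker's mark with the recognized name everywhere."""
        old = self._labels[speaker_key]
        self._labels[speaker_key] = name
        for entry in self.entries:
            if entry[0] == old:
                entry[0] = name


s = ConferenceSummary()
s.add("voice-1", "Hello")
s.add("voice-2", "Hi")
s.resolve("voice-1", "Zhang San")  # identity recognized later in the conference
s.add("voice-1", "Next item")
print(s.entries)
```

Earlier entries are rewritten in place and later utterances from the same speaker use the recognized name directly.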
In other embodiments of the present invention, an exclusion method may also be adopted. For example, the complete list of participants, the voiceprint library and the image library are obtained first. If, after the identity information of the participants has been obtained using the above embodiments, only one participant remains unidentified, the identified participants can be excluded from the list of participants; the one remaining participant is then the speaker who could not be identified, and that speaker's identity information is thereby obtained. In still another embodiment of the present invention, the exclusion method is likewise adopted: the complete list of participants, the voiceprint library and the image library are obtained first, and if, after the identity information of the participants has been obtained using the above embodiments, only a few participants remain unidentified, the identity of an unrecognized speaker can be matched to one of the remaining participants. Because few participants remain unidentified, the accuracy of this method is relatively high.
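The exclusion method amounts to a set difference between the participant list and the participants already identified. The function name and the sample names below are hypothetical; the sketch only narrows the candidates and does not choose among several remaining ones.

```python
def remaining_candidates(participants, identified):
    """Return the participants not yet identified, in sorted order.

    A single remaining name is the unrecognized speaker; several names
    form a short candidate list for that speaker.
    """
    return sorted(set(participants) - set(identified))


participants = ["Li Si", "Wang Wu", "Zhang San"]
print(remaining_candidates(participants, ["Li Si", "Wang Wu"]))
print(remaining_candidates(participants, ["Li Si"]))
```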
The identity information includes information such as the name, the title, the affiliated unit or the address location of the speaker. In this embodiment, the name of the speaker may be used, that is, the name of the speaker and the corresponding utterance are displayed in the conference summary.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
Fig. 7 is a flowchart of a method for generating a conference summary according to still another embodiment of the present invention, and the method for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 1. As shown in fig. 7, the method for generating a conference summary in this embodiment may specifically include:
voice data of a speaker is acquired.
The embodiment is applied to a video conference system. The voice data of the speaker is acquired during the video conference, for example through a microphone built into the video conference system or through an external high-sensitivity microphone.
And converting the voice data of the speaker into text data.
To make the speaker's voice data easier to organize, it can be converted into text data. In particular, the conversion may be performed by speech-to-text software.
Combining the identity information of the speaker to form a conference summary.
Specifically, the speaker's identity information is combined with the text data, that is, each speaker is matched with the text converted from that speaker's voice data, and a conference summary is formed. The participants can therefore learn not only the identity information of the speakers but also the speech content of each participant.
Pushing the conference summary to each participant.
After the conference ends, the participants have only their memory of the conference content and may forget parts of it. Therefore, once the conference summary is formed, it can be transmitted to each participant over the network, for example by sending it to each participant's mailbox or pushing it to each participant's mobile phone.
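The four steps of this embodiment (acquire voice data, convert it to text, combine it with the speaker's identity, push the summary) can be sketched as follows. This is an illustrative outline only: `transcribe`, `identify` and `send` are hypothetical callables standing in for the speech-to-text software, the identity recognition and the network delivery, none of which are specified here, and the audio segments and addresses are placeholder values.

```python
from typing import Callable, Iterable


def build_summary(
    segments: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    identify: Callable[[bytes], str],
) -> list:
    """Pair each audio segment's speaker with the text converted from it."""
    lines = []
    for segment in segments:
        text = transcribe(segment)    # step 2: voice data -> text data
        speaker = identify(segment)   # identity information of the speaker
        lines.append(f"{speaker}: {text}")  # step 3: combine into the summary
    return lines


def push_summary(lines: list, recipients: Iterable[str],
                 send: Callable[[str, str], None]) -> None:
    """Step 4: deliver the same summary text to every participant."""
    body = "\n".join(lines)
    for recipient in recipients:
        send(recipient, body)


# Stand-ins for real audio capture, recognition and delivery.
fake_text = {b"seg-1": "Let us begin.", b"seg-2": "Agreed."}
fake_speaker = {b"seg-1": "Zhang San", b"seg-2": "Li Si"}
outbox = {}


def fake_send(recipient: str, body: str) -> None:
    outbox[recipient] = body


lines = build_summary([b"seg-1", b"seg-2"], fake_text.get, fake_speaker.get)
push_summary(lines, ["zhang@example.com", "li@example.com"], fake_send)
```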
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time, communication between participants is facilitated, and the efficiency of the video conference is improved.
Fig. 8 is a flowchart of a method for generating a conference summary according to another embodiment of the present invention, and the method for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 1. As shown in fig. 8, the method for generating a conference summary in this embodiment may specifically include:
voice data of a speaker is acquired.
The embodiment is applied to a video conference system. The voice data of the speaker is acquired during the video conference, for example through a microphone built into the video conference system or through an external high-sensitivity microphone.
And converting the voice data of the speaker into text data.
To make the speaker's voice data easier to organize, it can be converted into text data. In particular, the conversion may be performed by speech-to-text software.
The position of each participant is located.
If a participant is located outside the conference room, the current conference content is recorded to form a conference summary until the participant is located inside the conference room again; the content may be kept as an audio recording, or the speech of the current speaker may be converted into text and stored;
sending the conference summary to the participant.
In a specific implementation, the conference summary need not be formed for the entire conference, but only for special cases. For example, a participant who leaves midway may miss important content. The tracking and positioning function of the image acquisition module can therefore locate the participant; when the participant is found to have left, the voice data of the current speaker is converted into text data, or is organized directly into a conference summary, and sent to the participant who left, so that the participant does not miss important information.
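The recording of content for a participant who is outside the conference room can be sketched as a small state tracker. This is a non-limiting illustration: the class `AbsenceRecorder`, its method names and the boolean `in_room` signal (assumed to come from the positioning function) are hypothetical, and only the text-storage variant of the recording is shown.

```python
from collections import defaultdict
from typing import Optional


class AbsenceRecorder:
    """Collects utterances for each participant while that participant is out."""

    def __init__(self) -> None:
        self.outside = set()             # participants currently out of the room
        self.missed = defaultdict(list)  # participant -> lines they missed

    def on_utterance(self, speaker: str, text: str) -> None:
        # Record the current content once per absent participant.
        for participant in self.outside:
            self.missed[participant].append(f"{speaker}: {text}")

    def update_position(self, participant: str, in_room: bool) -> Optional[list]:
        """Return the missed-content summary when a participant comes back."""
        if not in_room:
            self.outside.add(participant)
            return None
        if participant in self.outside:
            self.outside.discard(participant)
            return self.missed.pop(participant, [])
        return None


recorder = AbsenceRecorder()
recorder.on_utterance("Li Si", "Opening remarks.")   # everyone present
recorder.update_position("Zhang San", in_room=False)
recorder.on_utterance("Li Si", "The budget is approved.")
recorder.on_utterance("Wang Wu", "The deadline is Friday.")
missed_summary = recorder.update_position("Zhang San", in_room=True)
```

The returned list would then be sent to the participant who left, for example through the same push channel used for the full conference summary.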
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
Fig. 9 is a schematic view of a device for generating a conference summary according to an embodiment of the present invention, and as shown in fig. 9, the device for generating a conference summary according to the embodiment may specifically include an obtaining module, a converting module, and a processing module.
The acquisition module is configured to acquire voice data of a speaker;
the conversion module is configured to convert the voice data of the speaker into text data;
a processing module configured to form a conference summary in conjunction with the identity information of the speaker.
According to the technical scheme of the embodiment of the invention, when a video conference is carried out, voice data of a speaker can be converted into text data, identity information of the speaker is identified, and the text data and the identity information are combined to form a conference summary, so that participants can know not only the identity information of the speaker but also the speech content of each participant, and the efficiency of the video conference is improved.
Fig. 10 is a schematic diagram of a device for generating a conference summary according to one embodiment of the present invention, and the device for generating a conference summary according to the embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 9.
As shown in fig. 10, the processing module includes:
the identification submodule is configured to identify the identity information of the speaker according to the biological characteristic information of the speaker;
the judgment submodule is configured to judge whether the recognition rate for recognizing the identity information of the speaker exceeds a preset threshold value;
the processing submodule is configured to process the identity information of the speaker and corresponding text data when the identity information of the speaker is successfully identified, so as to form the conference summary; or,
when the recognition rate for recognizing the identity information of the speaker does not exceed a preset threshold value, to identify the identity information of the speaker as a mark and process the corresponding text data to form the conference summary.
Further, the identification submodule is specifically configured to:
obtaining biological characteristic information of a speaker; the biological characteristic information comprises voiceprint information and image information;
and identifying the identity of the speaker according to the biological characteristic information of the speaker, and displaying the identity information of the speaker on a display interface.
Further, the identification submodule is further specifically configured to:
comparing the voiceprints of the speaker with the voiceprints in the voiceprint library one by one;
and if the comparison is consistent, outputting the identity of the speaker corresponding to the voiceprint in the voiceprint library.
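The one-by-one voiceprint comparison and the threshold judgment can be sketched as below. The embodiment does not specify how voiceprints are represented or compared; the sketch assumes, purely for illustration, that each voiceprint is a numeric feature vector, that cosine similarity serves as the recognition rate, and that the preset threshold is 0.8. The function names and sample vectors are hypothetical.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(voiceprint: Sequence[float], library: dict,
                     threshold: float = 0.8) -> tuple:
    """Compare against every enrolled voiceprint; accept only above threshold."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name, enrolled in library.items():   # one-by-one comparison
        score = cosine_similarity(voiceprint, enrolled)
        if score > best_score:
            best_name, best_score = name, score
    if best_score > threshold:
        return best_name, best_score
    return None, best_score                  # caller falls back to a mark


library = {"Zhang San": [1.0, 0.0], "Li Si": [0.0, 1.0]}
name, score = identify_speaker([0.9, 0.1], library)
unknown, low_score = identify_speaker([1.0, 1.0], library)
```

A result of `None` corresponds to the case in which the recognition rate does not exceed the preset threshold, so the speaker would be entered in the conference summary under a mark instead of a name.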
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
Fig. 10 is a schematic view of a third embodiment of the device for generating a conference summary of the present invention, and please refer to fig. 10.
With continued reference to fig. 10, the processing module includes:
the identification submodule is configured to identify the identity information of the speaker according to the biological characteristic information of the speaker;
the processing submodule is configured to process the identity information of the speaker and the corresponding text data to form the conference summary; or,
when the identity information of the speaker cannot be identified according to the biological characteristic information of the speaker, to identify the identity information of the speaker as a mark and process the corresponding text data to form the conference summary.
The identification submodule is specifically configured to:
when the biological characteristic information is image information, acquiring image information of a speaker; and identifying the identity of the speaker according to the image information of the speaker.
The identification submodule is further specifically configured to:
comparing the images of the speakers with the images in the image library one by one;
and if the comparison is consistent, outputting the identity of the speaker corresponding to the image in the image library.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the image information of the speaker, and the identity information and the image information of the speaker are displayed on the display interface, so that the scene sense of the video conference is enhanced, the communication of participants is facilitated, and the efficiency of the video conference is improved.
Fig. 11 is a schematic view of a device for generating a conference summary according to another embodiment of the present invention, and the device for generating a conference summary according to this embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 9.
As shown in fig. 11, the apparatus for generating a conference summary according to this embodiment further includes:
the first positioning module is configured to position the speaker according to the image information of the speaker;
a display module configured to display image information of the speaker.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
Fig. 12 is a schematic diagram of a device for generating a conference summary according to still another embodiment of the present invention, and the device for generating a conference summary according to the present embodiment further introduces the technical solution of the present invention in more detail on the basis of the first embodiment shown in fig. 9.
As shown in fig. 12, in the apparatus for generating a conference summary according to this embodiment, the processing module further comprises:
a marking submodule configured to identify the identity information of different speakers as a unique mark; and
a replacing submodule configured to replace the mark in the conference summary with the identity information when the identity information of the speaker is identified during the conference.
The apparatus further comprises a first pushing module configured to push the conference summary to each participant.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
Fig. 13 is a schematic diagram of a device for generating a conference summary according to still another embodiment of the present invention, and the device for generating a conference summary according to this embodiment further introduces the technical solution of the present invention in more detail based on the embodiment shown in fig. 9.
The generation apparatus of a conference summary of the present embodiment further includes:
the second positioning module is configured to position the position of each participant;
the recording module is configured to record the current conference content to form a conference summary when a participant is located outside the conference room, until the participant is located inside the conference room again; the content may be kept as an audio recording, or the speech of the current speaker may be converted into text and stored;
a second push module configured to send the conference summary to the participant.
According to the technical scheme of the embodiment of the invention, when the video conference is carried out, the identity of the speaker can be identified according to the voiceprint information of the speaker, and the identity information of the speaker is displayed on the display interface, so that other participants can know the identity of the speaker in real time and can communicate with parties participating in the video conference, and the efficiency of the video conference is improved.
An embodiment of the present invention further provides a video conference system, including the apparatus shown in any one of fig. 9 to 13.
The above embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and the scope of the present invention is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present invention, and such modifications and equivalents should also be considered as falling within the scope of the present invention.
Claims (7)
1. A method of generating a conference summary, comprising:
acquiring voice data of a speaker;
converting the voice data of the speaker into text data;
forming a conference summary by combining the identity information of the speaker; wherein,
forming a conference summary in combination with the identity information of the speaker, comprising:
identifying the identity information of the speaker according to the biological characteristic information of the speaker;
judging whether the recognition rate for recognizing the identity information of the speaker exceeds a preset threshold value or not;
if so, processing the identity information of the speaker and corresponding text data to form the conference summary;
otherwise, identifying the identity information of the speaker as a mark and processing the corresponding text data to form the conference summary;
wherein identifying the identity information of the speaker as a mark and processing the corresponding text data to form the conference summary comprises:
the identity information of different speakers is marked as a unique mark; distinguishing the recognition rate for recognizing the identity information of the speaker to prompt the participants about the recognition condition of the identity information of the speaker;
and if the identity information of the speaker is identified, replacing the unique mark of the speaker in the conference summary with the identity information.
2. The method for generating a conference summary according to claim 1, wherein the identifying information of the speaker according to the biometric information of the speaker comprises:
obtaining biological characteristic information of a speaker; the biological characteristic information comprises voiceprint information and image information;
and identifying the identity of the speaker according to the biological characteristic information of the speaker, and displaying the identity information of the speaker on a display interface.
3. The method of generating a conference summary according to claim 2, said method further comprising:
positioning the speaker according to the image information of the speaker;
and displaying the image information of the speaker.
4. The method of generating a conference summary according to claim 1, said method further comprising:
pushing the meeting summary to each participant.
5. The method for generating a conference summary according to claim 1, forming a conference summary in combination with identity information of the speaker, comprising:
positioning the position of each participant;
if the positions of the participants are outside the conference room, recording the current conference content to form a conference summary until the positions of the participants are positioned in the conference room; the recording mode can be a recording mode or a mode of converting the speech of the current speaker into characters and storing the characters;
sending the meeting summary to the attendees.
6. An apparatus for generating a conference summary, comprising:
the acquisition module is configured to acquire voice data of a speaker;
the conversion module is configured to convert the voice data of the speaker into text data;
a processing module configured to form a conference summary in conjunction with the identity information of the speaker; wherein,
the processing module comprises:
the identification submodule is configured to identify the identity information of the speaker according to the biological characteristic information of the speaker;
the judgment submodule is configured to judge whether the recognition rate for recognizing the identity information of the speaker exceeds a preset threshold value;
the processing submodule is configured to process the identity information of the speaker and corresponding text data when the identity information of the speaker is successfully identified, so as to form the conference summary; or,
when the recognition rate for recognizing the identity information of the speaker does not exceed a preset threshold value, identifying the identity information of the speaker as a mark and processing corresponding text data to form the conference summary;
the processing module further comprises:
the marking submodule is configured to identify the identity information of different speakers as a unique mark; distinguishing the recognition rate for recognizing the identity information of the speaker to prompt the participants about the recognition condition of the identity information of the speaker;
and the replacing submodule is configured to replace the unique mark of the speaker in the conference summary with the identity information if the identity information of the speaker is identified.
7. A video conferencing system comprising the apparatus of claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611170590.8A CN106657865B (en) | 2016-12-16 | 2016-12-16 | Conference summary generation method and device and video conference system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611170590.8A CN106657865B (en) | 2016-12-16 | 2016-12-16 | Conference summary generation method and device and video conference system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106657865A CN106657865A (en) | 2017-05-10 |
CN106657865B CN106657865B (en) | 2020-08-25 |
Family
ID=58822170
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611170590.8A Active CN106657865B (en) | 2016-12-16 | 2016-12-16 | Conference summary generation method and device and video conference system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106657865B (en) |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107527623B (en) * | 2017-08-07 | 2021-02-09 | 广州视源电子科技股份有限公司 | Screen transmission method and device, electronic equipment and computer readable storage medium |
CN107257448A (en) * | 2017-08-09 | 2017-10-17 | 成都全云科技有限公司 | A kind of video conferencing system exchanged with font |
CN107609045B (en) * | 2017-08-17 | 2020-09-29 | 深圳壹秘科技有限公司 | Conference record generating device and method thereof |
CN107689225B (en) * | 2017-09-29 | 2019-11-19 | 福建实达电脑设备有限公司 | A method of automatically generating minutes |
CN107749313B (en) * | 2017-11-23 | 2019-03-01 | 郑州大学第一附属医院 | A kind of method of automatic transcription and generation Telemedicine Consultation record |
CN108022584A (en) * | 2017-11-29 | 2018-05-11 | 芜湖星途机器人科技有限公司 | Office Voice identifies optimization method |
CN109920428A (en) * | 2017-12-12 | 2019-06-21 | 杭州海康威视数字技术股份有限公司 | A kind of notes input method, device, electronic equipment and storage medium |
CN108074576B (en) * | 2017-12-14 | 2022-04-08 | 讯飞智元信息科技有限公司 | Speaker role separation method and system under interrogation scene |
CN107993665B (en) * | 2017-12-14 | 2021-04-30 | 科大讯飞股份有限公司 | Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system |
CN107978317A (en) * | 2017-12-18 | 2018-05-01 | 北京百度网讯科技有限公司 | Meeting summary synthetic method, system and terminal device |
CN108231064A (en) * | 2018-01-02 | 2018-06-29 | 联想(北京)有限公司 | A kind of data processing method and system |
CN110022454B (en) * | 2018-01-10 | 2021-02-23 | 华为技术有限公司 | Method for identifying identity in video conference and related equipment |
CN108399923B (en) * | 2018-02-01 | 2019-06-28 | 深圳市鹰硕技术有限公司 | More human hairs call the turn spokesman's recognition methods and device |
CN108417218B (en) * | 2018-03-09 | 2020-12-22 | 福州米鱼信息科技有限公司 | Memorandum reminding method and terminal based on voiceprint |
KR102562227B1 (en) * | 2018-06-12 | 2023-08-02 | 현대자동차주식회사 | Dialogue system, Vehicle and method for controlling the vehicle |
CN110661923A (en) * | 2018-06-28 | 2020-01-07 | 视联动力信息技术股份有限公司 | Method and device for recording speech information in conference |
CN108986826A (en) * | 2018-08-14 | 2018-12-11 | 中国平安人寿保险股份有限公司 | Automatically generate method, electronic device and the readable storage medium storing program for executing of minutes |
CN109525800A (en) * | 2018-11-08 | 2019-03-26 | 江西国泰利民信息科技有限公司 | A kind of teleconference voice recognition data transmission method |
CN111193890B (en) * | 2018-11-14 | 2022-06-17 | 株式会社理光 | Conference record analyzing device and method and conference record playing system |
CN109741754A (en) * | 2018-12-10 | 2019-05-10 | 上海思创华信信息技术有限公司 | A kind of conference voice recognition methods and system, storage medium and terminal |
CN111354356B (en) * | 2018-12-24 | 2024-04-30 | 北京搜狗科技发展有限公司 | Voice data processing method and device |
CN111385185A (en) * | 2018-12-28 | 2020-07-07 | 中兴通讯股份有限公司 | Information processing method, computer device, and computer-readable storage medium |
CN109361527B (en) * | 2018-12-28 | 2021-02-05 | 苏州思必驰信息科技有限公司 | Voice conference recording method and system |
CN109783642A (en) * | 2019-01-09 | 2019-05-21 | 上海极链网络科技有限公司 | Structured content processing method, device, equipment and the medium of multi-person conference scene |
CN109816722A (en) * | 2019-01-18 | 2019-05-28 | 深圳市沃特沃德股份有限公司 | Position method, apparatus, storage medium and the computer equipment of spokesman position |
CN109887508A (en) * | 2019-01-25 | 2019-06-14 | 广州富港万嘉智能科技有限公司 | A kind of meeting automatic record method, electronic equipment and storage medium based on vocal print |
CN112466306B (en) * | 2019-08-19 | 2023-07-04 | 中国科学院自动化研究所 | Conference summary generation method, device, computer equipment and storage medium |
CN110580907B (en) * | 2019-08-28 | 2021-09-24 | 云知声智能科技股份有限公司 | Voice recognition method and system for multi-person speaking scene |
CN110677614A (en) * | 2019-10-15 | 2020-01-10 | 广州国音智能科技有限公司 | Information processing method, device and computer readable storage medium |
CN112750247A (en) * | 2019-10-30 | 2021-05-04 | 京东方科技集团股份有限公司 | Participant identification method, identification system, computer device, and medium |
CN110827853A (en) * | 2019-11-11 | 2020-02-21 | 广州国音智能科技有限公司 | Voice feature information extraction method, terminal and readable storage medium |
KR102178175B1 (en) * | 2019-12-09 | 2020-11-12 | 김경철 | User device and method of controlling thereof |
CN111048095A (en) * | 2019-12-24 | 2020-04-21 | 苏州思必驰信息科技有限公司 | Voice transcription method, equipment and computer readable storage medium |
CN113014854B (en) * | 2020-04-30 | 2022-11-11 | 北京字节跳动网络技术有限公司 | Method, device, equipment and medium for generating interactive record |
CN111818294A (en) * | 2020-08-03 | 2020-10-23 | 上海依图信息技术有限公司 | Method, medium and electronic device for multi-person conference real-time display combined with audio and video |
CN112037791B (en) * | 2020-08-12 | 2023-01-13 | 广东电力信息科技有限公司 | Conference summary transcription method, apparatus and storage medium |
CN114333853A (en) * | 2020-09-25 | 2022-04-12 | 华为技术有限公司 | Audio data processing method, equipment and system |
CN114792522A (en) * | 2021-01-26 | 2022-07-26 | 阿里巴巴集团控股有限公司 | Audio signal processing method, conference recording and presenting method, apparatus, system and medium |
CN113113022A (en) * | 2021-04-15 | 2021-07-13 | 吉林大学 | Method for automatically identifying identity based on voiceprint information of speaker |
CN114240342A (en) * | 2021-11-30 | 2022-03-25 | 珠海大横琴科技发展有限公司 | Conference control method and device |
CN115623132B (en) * | 2022-11-18 | 2023-04-04 | 北京中电慧声科技有限公司 | Intelligent conference system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000023130A (en) * | 1998-06-30 | 2000-01-21 | Toshiba Corp | Video conference system |
CN102256098A (en) * | 2010-05-18 | 2011-11-23 | 宝利通公司 | Videoconferencing endpoint having multiple voice-tracking cameras |
CN102572372A (en) * | 2011-12-28 | 2012-07-11 | 中兴通讯股份有限公司 | Extraction method and device for conference summary |
Also Published As
Publication number | Publication date |
---|---|
CN106657865A (en) | 2017-05-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||