CN112468665A - Method, device, equipment and storage medium for generating conference summary - Google Patents


Info

Publication number
CN112468665A
Authority
CN
China
Prior art keywords
conference
preset
target
text
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011224273.6A
Other languages
Chinese (zh)
Inventor
曹乐
李琪
宋育芳
李孔仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202011224273.6A
Publication of CN112468665A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities; audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Abstract

The invention discloses a method, a device, equipment and a storage medium for generating a conference summary. The method comprises the following steps: determining a target conference theme of a target conference in response to a conference event; determining voice information corresponding to each participant in the target conference; determining, in a preset text library, a target preset text matched with the target conference theme, wherein the preset text library comprises preset texts associated with conference themes, and the conference themes comprise the target conference theme; recognizing the voice information according to the target preset text to obtain a voice recognition text; and generating a conference summary according to the voice recognition text and a preset conference summary format. The method can generate the conference summary accurately and in real time from the voice information in a multi-person conference, avoids the tedious and time-consuming manual collation of conference voice text required in the prior art, improves the efficiency of generating the conference summary, and greatly improves the accuracy and real-time performance of conference voice recognition.

Description

Method, device, equipment and storage medium for generating conference summary
Technical Field
The invention relates to the technical field of voice recognition, in particular to a method, a device, equipment and a storage medium for generating a conference summary.
Background
In the prior art, a conference is recorded with a voice recorder, and voice recognition software then processes the recording: based on the voice characteristics of the different speakers in the voice stream, the voice information corresponding to each speaker is determined, yielding multiple pieces of voice information, which are then transcribed to generate the conference summary. In a multi-person conference, this approach often cannot accurately extract the voice information of the different speakers, the voice recognition system cannot accurately recognize what the speakers actually mean, and real-time transcription is not possible.
Disclosure of Invention
The invention aims to provide a method, a device, equipment and a storage medium for generating a conference summary so as to improve the accuracy and the real-time performance of conference voice recognition and save labor cost.
The invention is realized by the following technical scheme:
In a first aspect, the present invention provides a method for generating a conference summary, including:
determining a target conference theme of a target conference in response to a conference event;
determining voice information corresponding to each participant in the target conference;
determining, in a preset text library, a target preset text matched with the target conference theme, wherein the preset text library comprises preset texts associated with conference themes, and the conference themes comprise the target conference theme;
recognizing the voice information according to the target preset text to obtain a voice recognition text; and
generating a conference summary according to the voice recognition text and a preset conference summary format.
In a second aspect, the present invention provides a device for generating a conference summary, including:
a first acquisition module, used for determining a target conference theme of a target conference in response to a conference event;
a second acquisition module, used for determining the voice information corresponding to each participant in the target conference;
a target preset text determining module, used for determining, in a preset text library, a target preset text matched with the target conference theme, wherein the preset text library comprises preset texts associated with conference themes, and the conference themes comprise the target conference theme;
a voice recognition module, used for recognizing the voice information according to the target preset text to obtain a voice recognition text; and
a conference summary generation module, used for generating a conference summary according to the voice recognition text and a preset conference summary format.
In a third aspect, the present invention provides an apparatus comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method for generating a conference summary described above.
In a fourth aspect, the present invention provides a computer-readable storage medium, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the method for generating a conference summary described above.
The technical scheme of the invention has the following beneficial effects:
By means of the preset text library and the preset words associated with the conference theme, the invention solves the problem that existing voice recognition systems recognize specialized words inaccurately. It can generate the conference summary accurately and in real time from the voice information in a multi-person conference, avoids the tedious and time-consuming manual collation of conference voice text required in the prior art, improves the efficiency of generating the conference summary, helps save labor cost, and greatly improves the accuracy and real-time performance of conference voice recognition.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a method for generating a conference summary according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the microphone identification range according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of recognizing voice information according to a target preset text to obtain a voice recognition text according to an embodiment of the present invention;
Fig. 4 is a schematic flow chart of extracting key words from a conference summary according to an embodiment of the present invention;
Fig. 5 is a schematic flow chart of modifying a preset text library according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of creating a preset text library according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of a device for generating a conference summary according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the following examples. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," "third," and the like in the description and in the claims, and in the drawings, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Examples
The generation of a conference summary is generally achieved in three steps: first, the voice information of the participants is acquired through a voice acquisition device; then, the acquired voice information is sent over a network to a voice recognition system for voice recognition; finally, the text content recognized by the voice recognition system is received at the system interface, and a conference summary is generated according to a preset conference summary format.
In the prior art, a conference is mainly recorded by capturing the users' voices with a voice recorder and then transcribing the content with a voice recognition tool. In a multi-person conference, where several participants join the discussion, this approach often cannot accurately separate the voices of different users, and the transcript must be collated manually after the conference so that each piece of voice information is attributed to the right speaker; the approach is therefore neither real-time nor efficient. Moreover, a conference generally has a specific theme, and the specialized words associated with that theme are highly specific, so the semantic recognition function may fail to recognize what the user actually means. Where the conference summary contains inaccurately recognized text, the user has to correct it manually after voice recognition, which increases labor cost.
Therefore, the present specification provides a technical solution that can simultaneously achieve the accuracy and the real-time performance of the conference voice recognition; specifically, the method comprises the following steps:
An embodiment of the present invention provides a method for generating a conference summary, as shown in the flowchart of fig. 1. The present specification provides the method operation steps described in the embodiment or the flowchart, but more or fewer operation steps may be included on the basis of conventional or non-creative labor. The order of steps recited in the embodiment is only one of many possible execution orders and does not represent the only order of execution. In practice, a system or server product may execute the steps sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the embodiments or the methods shown in the figures. Specifically, as shown in fig. 1, the method may include:
S101: determining a target conference theme of the target conference in response to a conference event. A conference generally has a specific theme, and determining that theme is the basis for establishing a preset text library organized by conference theme.
S102: determining the voice information corresponding to each participant in the target conference.
In the prior art, the users' voices are captured with a voice recorder and the content is then transcribed with a voice recognition tool. Because this approach cannot accurately separate the voices of different users, the result must be collated manually after the conference, so it is neither real-time nor efficient. When several people attend, several voice recorders are needed, and the chronological order of the text recognized from the separate voice streams has to be confirmed manually, which consumes further labor. With the popularization of computers and the rapid development of internet technology, people expect increasingly intelligent experiences. For example, in a multi-person conference discussion, users want the participants' speech to be recognized in real time and the conference summary to be organized by timeline and by role.
In a specific embodiment, determining the voice information corresponding to each participant in the target conference specifically includes: receiving target voice information within a preset angle of a target microphone; determining the target participant corresponding to the target microphone based on the correspondence between microphones and participants; and taking the target voice information as the voice information of the target participant, so as to obtain the voice information corresponding to each participant.
As shown in fig. 2, the number of microphones equals the number of participants, and the microphones correspond to the participants one to one. For example, for N participants, N microphones are provided (a first microphone, a second microphone, a third microphone, and so on), each corresponding to one participant. Each microphone is configured to receive voice information within a preset angle, where the preset angle is the maximum angular range of sound sources that the microphone can receive, taking the position of the microphone as the origin and the line connecting the microphone and its corresponding participant as the center line. In fig. 2, microphones 1, 2 and 3 each correspond to one participant; the solid line indicates that the participant is aligned with the corresponding microphone, the dotted lines indicate the maximum range within which the microphone can receive voice, and the microphone cannot receive voice from outside the dotted lines. The embodiment of the invention takes into account the complexity of the environment when multiple people communicate: for the special scenario of a multi-person conference, the voice acquisition focuses on capturing each participant's voice accurately and clearly, and on eliminating common interference such as several sound sources speaking at once and ambient noise.
Specifically, each microphone is aligned with the position of its participant, and the voice receiving angle of each microphone may be 45°, that is, the preset angle is 45°, which ensures that only the voice of the corresponding participant is received and sound from other sources is shielded. After a microphone collects the voice information of its participant, the voice information is transmitted in real time over a wireless network to a back-end system for voice analysis and processing, and the semantic recognition stage begins.
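The geometry of the preset angle and the microphone-to-participant correspondence can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the 2D coordinates, and the reading of 45° as the full width of the pickup sector (22.5° on each side of the center line) are assumptions made for illustration only.

```python
import math

def within_pickup_angle(mic_xy, participant_xy, source_xy, preset_angle_deg=45.0):
    """Return True if a sound source lies inside the microphone's pickup sector.

    Assumption: preset_angle_deg is the full sector width, centered on the
    line from the microphone to its assigned participant.
    """
    # Vector of the center line (microphone -> assigned participant).
    cx, cy = participant_xy[0] - mic_xy[0], participant_xy[1] - mic_xy[1]
    # Vector from the microphone to the sound source.
    sx, sy = source_xy[0] - mic_xy[0], source_xy[1] - mic_xy[1]
    if (cx == 0 and cy == 0) or (sx == 0 and sy == 0):
        return False  # direction undefined
    # Smallest absolute angle between the two directions, in [0, pi].
    diff = math.atan2(sy, sx) - math.atan2(cy, cx)
    diff = abs(math.atan2(math.sin(diff), math.cos(diff)))
    return math.degrees(diff) <= preset_angle_deg / 2.0

def attribute_voice(mic_id, audio, mic_to_participant):
    """Label captured audio with the participant assigned to that microphone."""
    return {"participant": mic_to_participant[mic_id], "audio": audio}

# Hypothetical one-to-one correspondence between microphones and participants.
MIC_TO_PARTICIPANT = {"mic-1": "Participant A", "mic-2": "Participant B"}
```

In a real deployment the angular selectivity would come from the microphone hardware itself; the check above only makes the stated geometry concrete.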
In practical applications, the voice receiving angle of the microphone may be adjusted arbitrarily according to the actual performance of the microphone and the arrangement of the conference hall, which is not limited in the embodiment of the present invention.
It is understood that other speech capture methods may be used by those skilled in the art to achieve accurate and clear capture of the speech of each participant.
S103: determining, in a preset text library, a target preset text matched with the target conference theme, wherein the preset text library comprises preset texts associated with conference themes, and the conference themes comprise the target conference theme.
The key to the semantic recognition function is that the system accurately recognizes what the speaker actually means. A conference generally has a specific theme, and specialized words are associated with that theme. In conferences with different themes, the same sound spoken by a speaker may correspond to different transcribed words, that is, homophones occur. Homophones are a group of words with identical pronunciation but unrelated meanings, for example (pairs that are homophones in the original Chinese) "lodging / complaint", "simple / quarantine", "fighter / war", and "self-describing / word number". According to statistics on the modern Chinese vocabulary, homophones account for about one tenth of all words, which makes voice recognition considerably harder. In addition, some specialized words or abbreviations are used only in particular settings; for example, in a bank working conference, the "credit loan process management system" is often called "old CP" for short. Because the specialized words used in conferences with different themes are highly specific, a voice recognition system often cannot accurately recognize what the user means, and the user then has to correct the recognized conference summary manually, which increases labor cost.
The embodiment of the present invention solves the above problem of inaccurate speech recognition by presetting a text library, and as a specific implementation manner, the method further includes: acquiring a preset conference theme and preset words corresponding to the preset conference theme; establishing a corresponding relation between a preset conference theme and preset words; and generating a preset text corresponding to the preset conference theme in a preset text library based on the corresponding relation.
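One way to picture the correspondence between preset conference themes and preset words is a small in-memory structure. The following Python sketch is illustrative only: the class name `PresetTextLibrary` and its methods are hypothetical and do not come from the patent, which does not specify a storage format.

```python
from collections import defaultdict

class PresetTextLibrary:
    """Hypothetical in-memory library mapping a conference theme to preset words."""

    def __init__(self):
        self._words_by_theme = defaultdict(set)

    def add(self, theme, words):
        """Record the correspondence between a theme and its preset words."""
        cleaned = (w.strip() for w in words)
        self._words_by_theme[theme].update(w for w in cleaned if w)

    def target_preset_text(self, theme):
        """Return the preset words for a theme, or an empty list if none exist."""
        return sorted(self._words_by_theme.get(theme, set()))

library = PresetTextLibrary()
library.add("credit business", ["old CP", " loan ledger ", ""])
```

Looking up `library.target_preset_text("credit business")` then corresponds to determining the target preset text for the target conference theme in step S103.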
According to the conference theme, preset words related to the conference theme can be obtained, the preset words can comprise homophones, special words used in specific themes and contexts, short words and the like, and the preset words can be input as preset texts in a manual input mode. Specifically, obtaining a preset conference theme and preset words corresponding to the preset conference theme includes:
responding to a creation request of a preset text, and displaying a text creation interface, wherein the text creation interface comprises a theme input area and a word input area; and responding to the creation confirmation operation, and acquiring the input contents in the theme input area and the word input area to obtain a preset conference theme and preset words corresponding to the preset conference theme.
When the preset words are input into the preset text base in the manual mode, a user can judge whether the words are easily confused words such as homophones and the like according to actual conditions, so that the words can be input according to specific needs; for special words and short words used in specific environment and context, the user can also enter the words according to actual conditions so as to avoid the increase of labor cost caused by unnecessary entering.
In another specific embodiment, the obtaining of the preset conference theme and the preset words corresponding to the preset conference theme may further include:
responding to a creation request of a preset text, and acquiring an imported target document; determining a conference theme of a target document to obtain a preset conference theme; and extracting key words in the target document, wherein the key words are used as preset words corresponding to preset conference subjects.
Specifically, the key terms in the target document are extracted, where the key terms may be high-frequency terms, and the high-frequency terms may be extracted specifically according to whether the number of times that the terms appear in the target document is greater than a preset threshold.
It should be noted that, in the step of importing the high-frequency words into the preset text library, in order to accurately identify the real semantics, the user may further determine whether to import the high-frequency words into the preset text library according to actual conditions and specific needs. For example, although some words appear more times and belong to high-frequency words, the words are not related to conference subjects and belong to common words, and errors generally do not occur in speech recognition of the words, so that a user can choose not to import the high-frequency words into a preset text library.
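The threshold-based extraction of high-frequency words, followed by the user's decision about which of them to import, might look like the Python sketch below. The function name, the simple alphanumeric tokenizer, and the `rejected_by_user` parameter are assumptions for illustration; Chinese text would need a proper word segmenter, which the patent does not name.

```python
import re
from collections import Counter

def candidate_preset_words(text, threshold, rejected_by_user=()):
    """Return words occurring more than `threshold` times, minus user rejections."""
    # Naive tokenizer (assumption): lowercase runs of letters and digits.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    rejected = {w.lower() for w in rejected_by_user}
    return sorted(
        word for word, n in counts.items()
        if n > threshold and word not in rejected
    )

document = (
    "The loan review covers the loan ledger and the loan limit; "
    "the review is due."
)
```

Here the common word "the" passes the frequency threshold but is unrelated to the conference theme, so a user would typically reject it before the import.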
S104: recognizing the voice information according to the target preset text to obtain a voice recognition text.
When the system recognizes voice, it checks the preset texts in the preset text library first. In a specific embodiment, as shown in fig. 3, recognizing the voice information according to the target preset text to obtain a voice recognition text specifically includes:
S301: performing semantic recognition on the voice information to obtain a primary recognition text;
S302: searching the preset text library for a target preset text matched with the primary recognition text;
S303: if a target preset text matched with the primary recognition text exists in the preset text library, using the target preset text as the voice recognition text; if no such target preset text exists in the preset text library, obtaining the voice recognition text by automatic judgment.
In the embodiment, after the voice information is acquired, the preset text in the preset text library is preferentially searched during voice recognition, and because the preset text library is associated with the conference theme, the semantic recognition is more accurate, so that the accuracy of the voice recognition can be improved.
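Steps S301 to S303 can be sketched in Python as a lookup with a fallback. The patent does not say how a primary recognition text is matched against a preset text, nor what the automatic judgment consists of, so both are assumptions here: each preset term is stored with hypothetical variants that a recognizer might produce, and the fallback simply returns the primary text unchanged.

```python
def _normalize(text):
    """Lowercase and collapse whitespace so lookups are tolerant of spacing."""
    return " ".join(text.lower().split())

def build_variant_index(target_preset_text):
    """Map every normalized variant (and the term itself) to its preset term."""
    index = {}
    for term, variants in target_preset_text.items():
        index[_normalize(term)] = term
        for variant in variants:
            index[_normalize(variant)] = term
    return index

def recognize(primary_text, target_preset_text, automatic_judgment=lambda t: t):
    """S302/S303: prefer a matching preset text, otherwise fall back."""
    match = build_variant_index(target_preset_text).get(_normalize(primary_text))
    if match is not None:
        return match  # a matching target preset text exists
    return automatic_judgment(primary_text)  # placeholder for automatic judgment

# Hypothetical preset text for a bank conference: term -> likely misrecognitions.
PRESETS = {"old CP": ["old seepy", "old sea pea"]}
```

A production system would more plausibly bias the recognizer itself toward the preset words or match on pronunciation; the dictionary lookup only makes the control flow of fig. 3 concrete.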
S105: generating a conference summary according to the voice recognition text and a preset conference summary format.
A conference summary generally has a preset format. Once the voice information of each participant and the corresponding voice recognition texts have been obtained as described above, the conference summary is produced by presenting the participants' voice recognition texts in chronological order in the preset conference summary format.
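Ordering the recognized texts by time and rendering them in a preset format can be sketched as follows. The `Utterance` record, the header line, and the `[mm:ss] participant: text` line format are hypothetical choices for illustration; the patent does not define the preset conference summary format.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    start_seconds: float  # when the participant started speaking
    participant: str
    text: str             # voice recognition text for this utterance

def generate_summary(theme, utterances,
                     line_format="[{time}] {participant}: {text}"):
    """Render utterances in chronological order using a preset line format."""
    lines = [f"Conference theme: {theme}"]
    for u in sorted(utterances, key=lambda u: u.start_seconds):
        minutes, seconds = divmod(int(u.start_seconds), 60)
        lines.append(line_format.format(
            time=f"{minutes:02d}:{seconds:02d}",
            participant=u.participant,
            text=u.text,
        ))
    return "\n".join(lines)

summary = generate_summary("credit business", [
    Utterance(65, "Participant B", "Agreed."),
    Utterance(5, "Participant A", "Open the old CP review."),
])
```

Because each utterance already carries its participant (from the microphone correspondence) and a start time, no manual ordering of separate voice streams is needed.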
According to the embodiment of the invention, the preset text library and the preset words associated with the conference theme solve the problem that existing voice recognition systems recognize specialized words inaccurately. The conference summary can be generated accurately and in real time from the voice information in a multi-person conference, which avoids the tedious and time-consuming manual collation of conference voice text required in the prior art, improves the efficiency of generating the conference summary, saves labor cost, and greatly improves the accuracy and real-time performance of conference voice recognition.
In a specific embodiment, as shown in fig. 4, after the conference summary is generated, the method further comprises:
S401: extracting key words from the conference summary;
S402: adding the key words to the preset text library as preset texts.
Specifically, a key word in the conference summary is extracted, the key word may be a high-frequency word, for example, and the high-frequency word may be extracted specifically according to whether the number of times of occurrence of the word in the conference summary is greater than a preset threshold.
It should be noted that, in the step of adding the high-frequency words as the preset text to the preset text library, in order to accurately identify the real semantics, the user may further determine whether to import the high-frequency words into the preset text library according to actual conditions and specific needs. For example, although some words appear more times and belong to high-frequency words, the words are not related to conference subjects and belong to common words, and errors generally do not occur in speech recognition of the words, so that a user can choose not to import the high-frequency words into a preset text library as preset texts.
In a specific embodiment, as shown in fig. 5, after the conference summary is generated, the method further comprises:
S501: in response to a modification instruction for the conference summary, determining a target word to be modified in the conference summary;
S502: determining, in the target preset text, the preset word to be replaced that corresponds to the target word to be modified;
S503: replacing the preset word to be replaced with the modified text corresponding to the target word to be modified.
In the embodiment of the invention, in order to further improve the accuracy of the conference voice recognition, the conference summary can be modified after the conference summary is generated. Further, based on the modified conference summary, the preset text base can be modified to obtain a more accurate preset text base.
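The correction flow of steps S501 to S503 can be sketched in Python as one function that updates both the summary and the preset words. The function name and the list representation of the target preset text are hypothetical. Appending the modified text when no matching preset word exists is also an assumption, motivated by the description of importing modified text into the preset text library.

```python
def apply_correction(summary, preset_words, target_word, modified_text):
    """Correct the summary and mirror the correction in the preset words.

    Returns a (new_summary, new_preset_words) pair; inputs are not mutated.
    """
    # S501: the target word to be modified is replaced in the summary.
    new_summary = summary.replace(target_word, modified_text)
    # S502/S503: replace the corresponding preset word with the modified text.
    new_presets = [modified_text if w == target_word else w for w in preset_words]
    # Assumption: if no preset word matched, import the modified text instead.
    if modified_text not in new_presets:
        new_presets.append(modified_text)
    return new_summary, new_presets

corrected, presets = apply_correction(
    "Review the old GP backlog.", ["old GP", "loan ledger"], "old GP", "old CP"
)
```

After the call, later conferences on the same theme would be checked against the corrected preset word rather than the misrecognized one.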
As shown in fig. 6, in the embodiment of the present invention, the preset text library is generated mainly in three ways: pre-importing, automatic addition, and manual modification. Pre-importing may include manually entering preset words related to a conference theme and importing key words from a target document as preset texts. Automatic addition refers to extracting key words from the conference summary and adding them to the preset text library. Manual modification refers to modifying the conference summary and importing the modified text into the preset text library, so as to obtain more accurate preset texts and further improve the accuracy of conference voice recognition.
According to the technical scheme provided by the embodiment of the specification, the preset text library and the preset words associated with the conference theme solve the problem that existing voice recognition systems recognize specialized words inaccurately, and greatly improve the accuracy of voice transcription. The scheme uses multiple microphones and sets the angle within which each microphone receives voice, ensuring that each microphone receives only the voice information of its corresponding participant. The conference summary can therefore be generated accurately and in real time from the voice information in a multi-person conference, which avoids the tedious and time-consuming manual collation of conference voice text required in the prior art, improves the efficiency of generating the conference summary, saves labor cost, greatly improves the accuracy and real-time performance of conference voice recognition, and improves the user experience.
An embodiment of the present invention further provides a device for generating a conference summary. As shown in fig. 7, the device in this embodiment includes: a first acquisition module, used for determining a target conference theme of a target conference in response to a conference event; a second acquisition module, used for determining the voice information corresponding to each participant in the target conference; a target preset text determining module, used for determining, in a preset text library, a target preset text matched with the target conference theme, wherein the preset text library comprises preset texts associated with conference themes, and the conference themes comprise the target conference theme; a voice recognition module, used for recognizing the voice information according to the target preset text to obtain a voice recognition text; and a conference summary generation module, used for generating a conference summary according to the voice recognition text and a preset conference summary format.
It should be noted that the device embodiment and the method embodiment are based on the same inventive concept. For details, refer to the method embodiment; they are not repeated here.
An embodiment of the present invention provides an electronic device, where the electronic device includes a processor and a memory, where the memory stores at least one instruction or at least one program, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the method for generating a conference summary provided in the above method embodiment.
An embodiment of the present invention further provides a storage medium. The storage medium may be disposed in an electronic device and stores at least one instruction or at least one program for implementing the method for generating a conference summary in the method embodiment; the at least one instruction or the at least one program is loaded and executed by a processor to implement the method for generating a conference summary provided in the method embodiment.
Optionally, in this embodiment, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
It should be noted that the order of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. Specific embodiments of this specification have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the device and electronic apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, any person skilled in the art can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they shall be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (11)

1. A method for generating a conference summary, comprising:
determining a target conference theme of a target conference in response to a conference event;
determining voice information corresponding to each participant in the target conference;
determining a target preset text matched with the target conference theme in a preset text library; the preset text library comprises preset texts associated with conference themes, and the conference themes comprise the target conference theme;
recognizing the voice information according to the target preset text to obtain a voice recognition text;
and generating a conference summary according to the voice recognition text and a preset conference summary format.
2. The method of generating a conference summary according to claim 1, further comprising:
acquiring a preset conference theme and preset words corresponding to the preset conference theme;
establishing a corresponding relation between the preset conference theme and the preset words;
and generating a preset text corresponding to the preset conference theme in a preset text library based on the corresponding relation.
3. The method for generating a conference summary according to claim 2, wherein acquiring the preset conference theme and the preset words corresponding to the preset conference theme comprises:
responding to a creation request of a preset text, and displaying a text creation interface, wherein the text creation interface comprises a theme input area and a word input area;
and responding to a creation confirmation operation, acquiring the input contents in the theme input area and the word input area, and obtaining the preset conference theme and preset words corresponding to the preset conference theme.
4. The method for generating a conference summary according to claim 2, wherein acquiring the preset conference theme and the preset words corresponding to the preset conference theme comprises:
responding to a creation request of a preset text, and acquiring an imported target document;
determining a conference theme of the target document to obtain the preset conference theme;
and extracting key terms in the target document, wherein the key terms are used as preset terms corresponding to the preset conference theme.
5. The method for generating a conference summary according to claim 1, wherein the recognizing the voice information according to the target preset text to obtain a voice recognition text specifically comprises:
performing semantic recognition on the voice information to obtain a preliminary recognition text;
determining whether a target preset text matched with the preliminary recognition text exists in the preset text library;
if a target preset text matched with the preliminary recognition text exists in the preset text library, using the target preset text as the voice recognition text; and if no target preset text matched with the preliminary recognition text exists in the preset text library, obtaining the voice recognition text in an automatic judgment mode.
6. The method for generating a conference summary according to claim 1, further comprising, after generating the conference summary according to the voice recognition text and the preset conference summary format:
extracting key words in the conference summary;
and adding the key words as preset texts to the preset text library.
7. The method for generating a conference summary according to claim 1, further comprising, after generating the conference summary according to the voice recognition text and the preset conference summary format:
responding to a modification instruction of a conference summary, and determining a target word to be modified in the conference summary;
determining a preset word to be replaced corresponding to the target word to be modified in the target preset text;
and replacing the preset words to be replaced according to the modified texts corresponding to the target words to be modified.
8. The method for generating a conference summary according to claim 1, wherein determining the voice information corresponding to each participant in the target conference specifically comprises:
receiving target voice information picked up by a target microphone within a preset angle;
determining a target participant corresponding to the target microphone based on the corresponding relation between the microphones and the participants;
and taking the target voice information as the voice information of the target participant to obtain the voice information corresponding to each participant.
9. An apparatus for generating a conference summary, comprising:
the first acquisition module is used for responding to the conference event and determining a target conference theme of the target conference;
the second acquisition module is used for determining the voice information corresponding to each participant in the target conference;
the target preset text determining module is used for determining a target preset text matched with the target conference theme in a preset text library; the preset text library comprises preset texts associated with conference themes, and the conference themes comprise the target conference theme;
the voice recognition module is used for recognizing the voice information according to the target preset text to obtain a voice recognition text;
and the conference summary generation module is used for generating a conference summary according to the voice recognition text and a preset conference summary format.
10. An apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, the at least one instruction, the at least one program, set of codes, or set of instructions being loaded and executed by the processor to implement a method of generating a conference summary according to any one of claims 1 to 8.
11. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of generating a conference summary according to any one of claims 1 to 8.
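The matching step of claim 5 and the correction step of claim 7 can be sketched as follows. The claims do not specify how a match is determined or what the "automatic judgment mode" does, so this sketch makes two labeled assumptions: matching uses approximate string similarity from Python's standard `difflib` module, and the fallback simply keeps the preliminary recognition text. The function names and the `cutoff` value are hypothetical.

```python
import difflib


def resolve_text(preliminary, preset_words, cutoff=0.8):
    """Return the matching preset word if one exists, else keep the preliminary text.

    Assumption: "matching" is approximated by difflib's similarity ratio;
    keeping the preliminary text stands in for the unspecified automatic judgment.
    """
    matches = difflib.get_close_matches(preliminary, preset_words, n=1, cutoff=cutoff)
    return matches[0] if matches else preliminary


def apply_correction(preset_words, target_word, corrected_text):
    """Return a new preset word list with target_word replaced by corrected_text."""
    return [corrected_text if word == target_word else word for word in preset_words]
```

A near miss such as "provision coverage ration" would resolve to the preset word "provision coverage ratio" under these assumptions, while unrelated text is left unchanged. `apply_correction` returns a new list rather than mutating the library, so a caller can decide when to persist the change.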
CN202011224273.6A 2020-11-05 2020-11-05 Method, device, equipment and storage medium for generating conference summary Pending CN112468665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011224273.6A CN112468665A (en) 2020-11-05 2020-11-05 Method, device, equipment and storage medium for generating conference summary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011224273.6A CN112468665A (en) 2020-11-05 2020-11-05 Method, device, equipment and storage medium for generating conference summary

Publications (1)

Publication Number Publication Date
CN112468665A true CN112468665A (en) 2021-03-09

Family

ID=74826369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011224273.6A Pending CN112468665A (en) 2020-11-05 2020-11-05 Method, device, equipment and storage medium for generating conference summary

Country Status (1)

Country Link
CN (1) CN112468665A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112802480A (en) * 2021-04-15 2021-05-14 广东际洲科技股份有限公司 Voice data text conversion method based on multi-party communication
CN113434663A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Conference summary generation method based on edge calculation and related equipment
CN114757155A (en) * 2022-06-14 2022-07-15 深圳乐播科技有限公司 Method and device for generating conference document
CN115037739A (en) * 2022-06-13 2022-09-09 深圳乐播科技有限公司 File transmission method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750949A (en) * 2012-07-16 2012-10-24 深圳市车音网科技有限公司 Voice recognition method and device
CN108538286A (en) * 2017-03-02 2018-09-14 腾讯科技(深圳)有限公司 A kind of method and computer of speech recognition
CN108847241A (en) * 2018-06-07 2018-11-20 平安科技(深圳)有限公司 It is method, electronic equipment and the storage medium of text by meeting speech recognition
CN111564157A (en) * 2020-03-18 2020-08-21 浙江省北大信息技术高等研究院 Conference record optimization method, device, equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112802480A (en) * 2021-04-15 2021-05-14 广东际洲科技股份有限公司 Voice data text conversion method based on multi-party communication
CN112802480B (en) * 2021-04-15 2021-07-13 广东际洲科技股份有限公司 Voice data text conversion method based on multi-party communication
CN113434663A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Conference summary generation method based on edge calculation and related equipment
CN115037739A (en) * 2022-06-13 2022-09-09 深圳乐播科技有限公司 File transmission method and device, electronic equipment and storage medium
CN115037739B (en) * 2022-06-13 2024-02-23 深圳乐播科技有限公司 File transmission method and device, electronic equipment and storage medium
CN114757155A (en) * 2022-06-14 2022-07-15 深圳乐播科技有限公司 Method and device for generating conference document

Similar Documents

Publication Publication Date Title
CN112468665A (en) Method, device, equipment and storage medium for generating conference summary
Anguera et al. Speaker diarization: A review of recent research
CN205647778U (en) Intelligent conference system
CN110517689B (en) Voice data processing method, device and storage medium
US20040064322A1 (en) Automatic consolidation of voice enabled multi-user meeting minutes
CN107562760B (en) Voice data processing method and device
CN103165131A (en) Voice processing system and voice processing method
CN104252464B (en) Information processing method and device
US20210232776A1 (en) Method for recording and outputting conversion between multiple parties using speech recognition technology, and device therefor
CN111128223A (en) Text information-based auxiliary speaker separation method and related device
US20130253932A1 (en) Conversation supporting device, conversation supporting method and conversation supporting program
CN102339193A (en) Voice control conference speed method and system
US8126715B2 (en) Facilitating multimodal interaction with grammar-based speech applications
CN106713111B (en) Processing method for adding friends, terminal and server
CN101867742A (en) Television system based on sound control
EP2682931B1 (en) Method and apparatus for recording and playing user voice in mobile terminal
CN111063355A (en) Conference record generation method and recording terminal
US20220093103A1 (en) Method, system, and computer-readable recording medium for managing text transcript and memo for audio file
US8788621B2 (en) Method, device, and computer product for managing communication situation
CN111627446A (en) Communication conference system based on intelligent voice recognition technology
CN111223487B (en) Information processing method and electronic equipment
CN113779208A (en) Method and device for man-machine conversation
CN111626061A (en) Conference record generation method, device, equipment and readable storage medium
CN112581965A (en) Transcription method, device, recording pen and storage medium
CN113782026A (en) Information processing method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210309