CN112687272A

CN112687272A - Conference summary recording method and device and electronic equipment

Info

Publication number: CN112687272A
Application number: CN202011511134.1A
Authority: CN
Inventors: 王思远
Original assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Current assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date: 2020-12-18
Filing date: 2020-12-18
Publication date: 2021-04-20
Anticipated expiration: 2040-12-18
Also published as: CN112687272B

Abstract

The present disclosure provides a method, an apparatus and an electronic device for recording a conference summary, and relates to the technical field of computers. The method comprises the following steps: acquiring a first keyword added in advance; detecting voice information according to the pre-added first keyword to obtain a first target voice sentence; and converting the first target voice sentence into a text sentence to obtain a conference summary. The conference summary obtained by the method represents the key contents of the conference briefly and intuitively, which helps participants read and understand the conference summary more efficiently.

Description

Conference summary recording method and device and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for recording a conference summary, and an electronic device.

Background

During an online meeting, the meeting contents need to be recorded to help participants complete their work. Speech recognition technology is already applied to online meetings, where it converts speech during the meeting into a conference summary in text form. Online meetings are generally long and their contents are varied and complex, so the meeting notes generated through speech recognition are long and tedious. When a user later reviews the conference summary, it is hard to find the key contents of the meeting, which hinders understanding of the meeting and reduces the efficiency of reading the conference summary.

Disclosure of Invention

In order to solve the above technical problem, or at least partially solve it, the present disclosure provides a method and an apparatus for recording a conference summary, and an electronic device, so as to make the conference summary more concise and highlight its key points.

The present disclosure provides a recording method of a conference summary, including: acquiring a first keyword added in advance; detecting voice information according to the pre-added first keyword to obtain a first target voice statement; and converting the first target voice sentence into a text sentence to obtain a conference summary.

Further, the step of detecting the voice information according to the pre-added first keyword to obtain a first target voice statement includes: detecting voice information recorded in a conference process according to the first keyword added in advance to obtain a first voice statement containing the first keyword; extracting a first voice segment positioned in front of the first voice statement and a second voice segment positioned behind the first voice statement from the voice information; and taking the first voice statement, the first voice segment and the second voice segment as a first target voice statement.

Further, the step of extracting a first speech segment located before the first speech sentence and a second speech segment located after the first speech sentence from the speech information includes: and taking the first voice sentence as a starting point, extracting sentences in a first preset range before the first voice sentence to be used as a first voice segment, and extracting sentences in a second preset range after the first voice sentence to be used as a second voice segment.

Further, the method further comprises: judging whether an adding operation aiming at the keyword is detected or not in the process of detecting the voice information according to the first keyword; when the adding operation is detected, acquiring a second keyword corresponding to the adding operation, and determining the time when the adding operation is detected as the starting time; detecting the voice information recorded after the starting time according to the second keyword to obtain a second voice statement; detecting the voice information recorded before the starting time according to the second keyword to obtain a third voice statement; and determining a second target voice statement based on the second voice statement and the third voice statement.

Further, the method further comprises: when a deletion operation for the first keyword is detected, determining the time when the deletion operation is detected as an end time; and stopping detecting the first keyword in the voice information recorded after the end time.

Further, the step of obtaining the first keyword added in advance includes: receiving an adding operation aiming at a keyword; the adding operation is initiated after the participant logs in an account before voice detection; and acquiring a first keyword associated with the account according to the adding operation.

Further, after the obtaining of the meeting summary, the method further comprises: generating an association relation between the account and the conference summary according to the first keyword used in the conference summary process and the account associated with the first keyword; and sending the conference summary to the account with the association relationship.

The present disclosure also provides a recording apparatus for a conference summary, including: the acquisition module is used for acquiring a first keyword which is added in advance; the detection module is used for detecting the voice information according to the pre-added first keyword to obtain a first target voice statement; and the text conversion module is used for converting the first target voice sentence into a text sentence to obtain a conference summary.

The present disclosure also provides an electronic device, including: a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method described above.

The present disclosure also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method.

Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:

The embodiments of the disclosure provide a recording method, a recording apparatus and an electronic device for a conference summary. The method first acquires a first keyword added in advance; then detects voice information according to the pre-added first keyword to obtain a first target voice sentence; and finally converts the first target voice sentence into a text sentence to obtain a conference summary. Because the first keyword is added in advance, it reflects what the participant intends to focus on in the subsequent conference, and the first target voice sentence obtained by detecting the voice information according to that keyword consists mainly of the voice sentences, recorded after the keyword was added, that contain the keyword. The first target voice sentence therefore embodies the important contents of the conference without comprising all of the voice information of the conference; that is, it is brief voice information embodying the important contents of the conference. Consequently, the conference summary obtained from the first target voice sentence represents the important contents of the conference briefly and intuitively, which improves the efficiency with which participants read and understand the conference summary.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

Fig. 1 is a flowchart of a recording method for a conference summary according to an embodiment of the present disclosure;

Fig. 2 is a schematic diagram of a first target voice sentence according to an embodiment of the present disclosure;

Fig. 3 is a flowchart of another recording method for a conference summary according to an embodiment of the present disclosure;

Fig. 4 is a block diagram of a structure of a recording apparatus for a conference summary according to an embodiment of the present disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.

Conference summaries currently recorded with speech recognition are long and complicated, fail to highlight the key content of the conference, and cannot be understood quickly. In view of this, the method, apparatus and electronic device for recording a conference summary provided by the embodiments of the present disclosure make the conference summary more concise and highlight its key points, which in turn helps participants read and understand the conference summary more efficiently. The technology can be applied wherever speech recognition is needed, such as conferences, live webcasts, customer service robots and voice calls. To facilitate understanding, the recording method of a conference summary disclosed in the embodiments is described in detail first.

Embodiment one:

Referring to the flowchart of a recording method for a conference summary shown in Fig. 1, the method includes the following steps:

Step S102, acquiring a first keyword added in advance.

This embodiment provides a way of acquiring the first keyword in a remote conference scenario: before the conference starts or during the conference, a participant adds at least one first keyword in advance through a local terminal, according to the conference contents the participant wants to focus on. When there are multiple first keywords, the different keywords may be separated by preset delimiters (e.g., a space, a comma, a pause mark, or "and"). In practice, the local terminal may acquire the first keyword directly and perform the subsequent detection and conversion steps; and/or the local terminal may send the first keyword to a remote terminal hosting the conference, which acquires the first keyword and executes the subsequent steps.
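As an illustration of separating several first keywords by preset delimiters, the following is a minimal sketch rather than the disclosed implementation. The function name `parse_keywords` is hypothetical, and the delimiter set is an assumption: whitespace, ASCII and full-width commas, the Chinese enumeration comma (one possible reading of the "pause mark"), and the standalone word "and".

```python
import re

# Assumed delimiter set: commas (ASCII and full-width), the Chinese
# enumeration comma, the standalone word "and", or plain whitespace.
_DELIMITERS = re.compile(r"\s*(?:[,，、]|\band\b)\s*|\s+")


def parse_keywords(raw: str) -> list[str]:
    """Split a raw keyword entry into distinct keywords, preserving order."""
    seen: set[str] = set()
    keywords: list[str] = []
    for token in _DELIMITERS.split(raw):
        token = token.strip()
        if token and token not in seen:  # drop empty pieces and duplicates
            seen.add(token)
            keywords.append(token)
    return keywords


print(parse_keywords("budget, deadline and hiring"))
# ['budget', 'deadline', 'hiring']
```

Because a space is itself treated as a delimiter here, a multi-word keyword would be split into separate keywords; a real terminal might restrict the delimiter set to avoid that.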

Step S104, detecting voice information according to the pre-added first keyword to obtain a first target voice sentence.

During the conference, the voice information recorded after the first keyword was added is detected according to the pre-added first keyword, so as to obtain a first target voice sentence containing the first keyword. It can be understood that the recorded voice information usually contains at least one sentence that includes the first keyword, so the first target voice sentence obtained by detecting the first keyword comprises at least one sentence.

The manner in which the speech information is detected based on the first keyword may be various and will be described in detail below.

Step S106, converting the first target voice sentence into a text sentence to obtain a conference summary.

In this embodiment, the first target voice sentence may be converted into a text sentence based on a speech recognition algorithm to obtain the conference summary. For example, in one possible implementation, the first target voice sentence is input to a trained speech recognition model (e.g., an N-Gram model), which identifies the original phonemes of the first target voice sentence; the original phonemes are then matched against phonemes prestored in a database to determine the target phonemes that match successfully; finally, the text sentence corresponding to the target phonemes is acquired, and the conference summary is obtained based on that text sentence.

In the recording method for the conference summary provided by the embodiment of the disclosure, the acquired first keyword is added in advance, so it reflects what the participants intend to focus on in the subsequent conference. The first target voice sentence obtained by detecting the voice information according to the pre-added first keyword consists mainly of the voice sentences, recorded after the keyword was added, that contain the first keyword. The first target voice sentence therefore embodies the key content of the conference without comprising all of the voice information of the conference; that is, it is brief voice information embodying the important contents of the conference. Consequently, the conference summary obtained from the first target voice sentence represents the important contents of the conference briefly and intuitively, which improves the efficiency with which participants read and understand the conference summary.
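The flow of steps S102 to S106 can be sketched as follows. This is an illustrative outline under stated assumptions, not the disclosed implementation: `Utterance`, `record_summary` and the injected `transcribe` callable are hypothetical names, the speech recognizer is replaced by a stand-in, and keyword detection is done on the transcribed text by simple substring matching.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Utterance:
    """One recorded voice sentence; `audio` stands in for the raw samples."""
    start_s: float
    audio: bytes


def record_summary(
    keywords: Iterable[str],
    utterances: Iterable[Utterance],
    transcribe: Callable[[bytes], str],
) -> list[str]:
    """Keep only the sentences that mention a pre-added keyword."""
    wanted = [k.lower() for k in keywords]           # S102: pre-added keywords
    summary: list[str] = []
    for utt in utterances:
        text = transcribe(utt.audio)                 # recognizer is injected
        if any(k in text.lower() for k in wanted):   # S104: keyword detection
            summary.append(text)                     # S106: text sentence
    return summary


# Stand-in recognizer for the demo: the "audio" is just UTF-8 text.
fake_asr = lambda audio: audio.decode("utf-8")
meeting = [
    Utterance(0.0, b"Welcome everyone."),
    Utterance(4.2, b"The budget is due on Friday."),
    Utterance(9.8, b"Lunch is at noon."),
]
print(record_summary(["budget"], meeting, fake_asr))
# ['The budget is due on Friday.']
```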

This embodiment provides an implementation of detecting the voice information according to the first keyword, as described in the following steps 1 to 3:

Step 1, detecting the voice information recorded during the conference according to the pre-added first keyword to obtain a first voice sentence containing the first keyword.

There are various ways of detecting the speech information according to the first keyword to obtain the first speech statement, and the following two specific implementations are provided as examples.

The first method is as follows: (1) extracting the voice information recorded during the conference and decoding it into PCM (Pulse Code Modulation) voice information; (2) performing spectrum analysis on the PCM voice information, constructing a filter, and filtering out noise to obtain preprocessed voice information; (3) converting the preprocessed voice information into a text string according to a speech recognition algorithm; (4) splitting the text string into a plurality of phrases using a semantic analysis technique, calculating the string length of each phrase, and comparing each phrase with the first keywords that have the same string length; (5) when a phrase matches any first keyword, determining the voice information corresponding to the matched phrase as a first voice sentence containing the first keyword.
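Sub-steps (4) and (5) of the first method, comparing each phrase only against keywords of the same string length, can be sketched as below. The helper names are hypothetical, and a plain whitespace split stands in for the semantic analysis that the text uses to obtain phrases.

```python
from collections import defaultdict


def bucket_by_length(keywords: list[str]) -> dict[int, set[str]]:
    """Group keywords by string length so that each phrase is compared
    only against keywords of the same length."""
    buckets: dict[int, set[str]] = defaultdict(set)
    for keyword in keywords:
        buckets[len(keyword)].add(keyword)
    return buckets


def matched_keywords(phrases: list[str], keywords: list[str]) -> set[str]:
    """Return the keywords that occur as one of the phrases."""
    buckets = bucket_by_length(keywords)
    hits: set[str] = set()
    for phrase in phrases:
        candidates = buckets.get(len(phrase), set())  # same-length keywords only
        if phrase in candidates:
            hits.add(phrase)
    return hits


# Assumption: a whitespace split replaces the semantic phrase splitting.
phrases = "the budget review moves to friday".split()
print(sorted(matched_keywords(phrases, ["budget", "friday", "hiring"])))
# ['budget', 'friday']
```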

The second method is as follows: (1) mapping each first keyword to a corresponding first phoneme; (2) inputting the voice information recorded during the conference into a speech recognition model (such as an N-Gram model), which identifies a second phoneme of the voice information; (3) calculating, from the first phoneme and the second phoneme, a score for the voice information matching any first keyword; (4) determining the voice information whose score is higher than a preset score as a first voice sentence containing the first keyword.
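The text does not specify how the score in the second method is computed. The sketch below swaps in one simple possibility as an assumption: the best similarity ratio, from Python's `difflib.SequenceMatcher`, between the keyword's phoneme sequence and any equally long window of the recognized phonemes. The lexicon, the phoneme symbols and the threshold of 0.8 are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical pronunciation lexicon (ARPAbet-like symbols), illustration only.
LEXICON = {
    "budget": ["B", "AH", "JH", "AH", "T"],
    "deadline": ["D", "EH", "D", "L", "AY", "N"],
}


def keyword_score(first: list[str], second: list[str]) -> float:
    """Best similarity between the keyword phonemes (`first`) and any
    equally long window of the recognized phonemes (`second`)."""
    n = len(first)
    if n == 0 or len(second) < n:
        return 0.0
    return max(
        SequenceMatcher(None, first, second[i:i + n]).ratio()
        for i in range(len(second) - n + 1)
    )


def detect(second: list[str], keywords: list[str], preset_score: float = 0.8) -> list[str]:
    """Return the keywords whose score is higher than the preset score."""
    return [
        k for k in keywords
        if k in LEXICON and keyword_score(LEXICON[k], second) > preset_score
    ]


# Recognized phonemes for a sentence such as "the budget is ...".
recognized = ["DH", "AH", "B", "AH", "JH", "AH", "T", "IH", "Z"]
print(detect(recognized, ["budget", "deadline"]))
# ['budget']
```

A production keyword spotter would more likely score acoustic likelihoods than symbol overlap; the windowed ratio is used here only because it is easy to follow.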

Step 2, extracting from the voice information a first voice segment located before the first voice sentence and a second voice segment located after the first voice sentence. In a specific implementation, the first voice sentence may be used as the extraction starting point; a first voice segment contiguous with the first voice sentence is extracted before the starting point, and a second voice segment contiguous with the first voice sentence is extracted after the starting point.

Step 3, taking the first voice sentence, the first voice segment and the second voice segment together as a first target voice sentence.

Since a first voice sentence is generally strongly associated with its context, for any first voice sentence, the voice segments before and after it can be extracted from the voice information, and the first voice sentence, the first voice segment before it and the second voice segment after it can jointly be used as the first target voice sentence. The first target voice sentence obtained in this way is a stretch of speech containing the keyword whose context and meaning are expressed more completely and clearly; the conference summary converted from it is therefore more complete, clear and coherent, which makes it easier for participants to read and understand.

For step 2 in the foregoing embodiment, this embodiment provides a specific implementation as follows:

Taking the first voice sentence as a starting point, the sentences within a first preset range before the first voice sentence are extracted as the first voice segment, and the sentences within a second preset range after the first voice sentence are extracted as the second voice segment.

The first preset range and the second preset range are ranges preset by the participant in terms of time and/or number of sentences, and the two ranges may be the same or different. Taking a time-based range as an example, the first voice segment may be the sentences within three minutes before the first voice sentence, and the second voice segment may be the sentences within two minutes after the first voice sentence. Taking a range based on the number of sentences as an example, the first voice segment may be the three sentences before the first voice sentence, and the second voice segment may be the five sentences after the first voice sentence. Alternatively, combining time and number of sentences, the first voice segment may be the sentences within a time t before the first voice sentence, and the second voice segment may be the m sentences after the first voice sentence.
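A time-based preset range can be sketched as follows, using the three-minute and two-minute figures from the example above. The `Sentence` type and `context_by_time` function are hypothetical, and the sketch assumes that each sentence carries a start time in seconds and that the list is sorted by that time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    start_s: float  # start time in seconds from the beginning of the recording
    text: str


def context_by_time(
    sentences: list[Sentence], hit_index: int, before_s: float, after_s: float
) -> tuple[list[Sentence], list[Sentence]]:
    """Return (first segment, second segment) around sentences[hit_index].

    Assumes `sentences` is sorted by start time.
    """
    hit = sentences[hit_index]
    first = [s for s in sentences[:hit_index]
             if hit.start_s - s.start_s <= before_s]
    second = [s for s in sentences[hit_index + 1:]
              if s.start_s - hit.start_s <= after_s]
    return first, second


timeline = [
    Sentence(0, "a"), Sentence(100, "b"), Sentence(250, "hit"),
    Sentence(300, "c"), Sentence(400, "d"),
]
# Three minutes before the hit, two minutes after it.
first, second = context_by_time(timeline, 2, before_s=180, after_s=120)
print([s.text for s in first], [s.text for s in second])
# ['b'] ['c']
```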

In practical applications, the first voice sentence may be a sentence at the beginning or the end of the conference, in which case the speech available before or after the starting point is less than the first or second preset range; for example, the number a of sentences before the starting point is smaller than the first preset range (m1 sentences), and the number b of sentences after the starting point is smaller than the second preset range (m2 sentences). In this case, the sentences actually available within the first preset range may be extracted as the first voice segment, and the sentences actually available within the second preset range may be extracted as the second voice segment. For example, the a sentences within the first preset range are extracted to obtain the first voice segment, and the b sentences within the second preset range are extracted to obtain the second voice segment.
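The boundary case above, where fewer than m1 or m2 sentences are available, can be illustrated with a count-based sketch. The function names are hypothetical; sentences are represented as plain strings for brevity.

```python
def context_by_count(
    sentences: list[str], hit_index: int, m1: int, m2: int
) -> tuple[list[str], list[str]]:
    """Return up to m1 sentences before and up to m2 sentences after the hit."""
    start = max(0, hit_index - m1)  # clamp at the beginning of the recording
    first = sentences[start:hit_index]
    # Slicing past the end simply returns the sentences that exist.
    second = sentences[hit_index + 1:hit_index + 1 + m2]
    return first, second


def first_target(sentences: list[str], hit_index: int, m1: int, m2: int) -> list[str]:
    """First segment + first voice sentence + second segment, in order."""
    first, second = context_by_count(sentences, hit_index, m1, m2)
    return first + [sentences[hit_index]] + second


recording = ["s0", "s1", "s2", "s3", "s4", "s5"]
# The hit is s1: only a = 1 sentence precedes it (m1 = 3) and only
# b = 4 sentences follow it (m2 = 5).
print(context_by_count(recording, 1, m1=3, m2=5))
# (['s0'], ['s2', 's3', 's4', 's5'])
```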

Referring to the schematic diagram of the first target voice sentence shown in Fig. 2, after the first voice segment and the second voice segment before and after the first voice sentence are extracted in the above manner, the first voice sentence, the first voice segment and the second voice segment are taken together as the first target voice sentence. It can be understood that, for any one first keyword, there may be i (i ≥ 1) corresponding first voice sentences, and thus there are also i corresponding first voice segments, second voice segments and first target voice sentences. In addition, the second voice segment i shown in Fig. 2 is the aforementioned second voice segment that is smaller than the second preset range.

In practical application, the first keyword added by the participator may not be accurate, or the keyword does not need to be detected and recorded in the subsequent conference process any more, so that the participator can delete the first keyword in order to flexibly adapt to the actual content of the conference. Based on this, an embodiment of deleting a keyword may be provided, including the following steps 1) and 2):

Step 1), when a deletion operation for the first keyword is detected, determining the time when the deletion operation is detected as the end time (denoted, for example, as T_e1). When the participant performs the deletion operation on the first keyword on the graphical user interface of the terminal, using a finger, a stylus, a mouse or the like, the terminal detects the deletion operation applied to the first keyword; the terminal then determines the time when the deletion operation is detected as the end time T_e1. The end time T_e1 can be understood as the time at which detection of the first keyword stops.

Step 2), stopping detection of the first keyword in the voice information recorded after the end time T_e1. That is, the voice information recorded after the end time T_e1 is no longer checked for the first keyword.

The first keyword is detected in the voice information recorded from the time the first keyword was added during the conference until the end time T_e1, so as to obtain the first target voice sentence. The conference summary obtained from this first target voice sentence corresponds only to the conference content between the time the first keyword was added and the end time, which makes the conference summary still more concise while retaining the important conference content the participants care about.
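The effect of the end time T_e1 can be sketched as a detection window per keyword. `KeywordWindow` is a hypothetical name, times are seconds from the start of the recording, and treating a sentence recorded exactly at T_e1 as outside the window is an assumption, since the text does not settle that boundary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeywordWindow:
    """Time span during which one keyword is detected."""
    keyword: str
    added_at: float                     # time the keyword was added
    deleted_at: Optional[float] = None  # end time T_e1, unset until deletion

    def delete(self, now: float) -> None:
        """Record the time the deletion operation was detected."""
        if self.deleted_at is None:
            self.deleted_at = now

    def covers(self, t: float) -> bool:
        """True if speech recorded at time t should be checked for the keyword."""
        if t < self.added_at:
            return False
        # Assumption: speech recorded exactly at T_e1 is not checked.
        return self.deleted_at is None or t < self.deleted_at


window = KeywordWindow("budget", added_at=60.0)
window.delete(now=120.0)
print(window.covers(90.0), window.covers(150.0))
# True False
```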

Of course, as the conference progresses, the participants may also add new keywords so that the conference summary can better reflect the conference contents of the important attention. Based on this, an embodiment of adding new keywords can be provided, as shown below: acquiring a second keyword newly added by a participant in the conference process; detecting the voice information recorded after the second keyword is added according to the second keyword to obtain a second target voice statement; and converting the second target voice sentence into a text sentence to obtain a conference summary.

In a specific implementation process, referring to the following steps a to f, a conference summary is obtained according to the newly added second keyword:

Step a, judging, while the voice information is being detected according to the first keyword, whether an adding operation for a keyword is detected. While the voice information is being checked for the first keyword, the terminal can simultaneously check whether a new keyword is being added. Specifically, when the participant performs an operation of adding a new keyword on the graphical user interface of the terminal, the terminal detects that adding operation.

Step b, when the adding operation is detected, acquiring the second keyword corresponding to the adding operation, and determining the time when the adding operation is detected as the start time (denoted, for example, as T_s2). The start time T_s2 can be understood as the time at which detection of the second keyword starts.

Considering that the second keyword is added later, as the conference progresses, step c may be followed to detect the second keyword only in the voice information recorded after the start time T_s2. Alternatively, to avoid missing conference content from before the second keyword was added, steps c and d may both be followed, so that the second keyword is detected in the voice information recorded both before and after the start time T_s2.

Step c, detecting the voice information recorded after the start time T_s2 according to the second keyword to obtain a second voice sentence.

Step d, detecting the voice information recorded before the start time T_s2 according to the second keyword to obtain a third voice sentence.

The implementation manners of step c and step d may refer to the implementation manner of the first speech sentence obtained according to the first keyword in the above embodiment, and are not described herein.

Step e, determining a second target voice sentence based on the second voice sentence and the third voice sentence. In a specific implementation, following the foregoing embodiment, the second voice sentence and the third voice sentence are each used as a starting point, the voice segments within a preset range before and after each starting point are extracted, and the second target voice sentence is determined based on the second voice sentence, the third voice sentence and the extracted voice segments. For brevity, the manner of obtaining the second target voice sentence is not described in further detail here.

The second target voice sentence obtained through the second voice sentence and the third voice sentence not only contains the conference content after the second keyword is added, but also can supplement the conference content before the second keyword is added, so that the completeness of the conference record can be improved.

Step f, converting the second target voice sentence into a text sentence to obtain a conference summary. The conversion of the second target voice sentence into a text sentence can follow the implementation, described in the above embodiment, of obtaining the conference summary from the first target voice sentence, and is not described again here. It can be understood that, in an actual conference, if the voice information is detected according to both the first keyword and the second keyword, the resulting conference summary includes the text sentences converted from the first target voice sentence and the text sentences converted from the second target voice sentence.
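Steps c to e can be sketched as follows, under simplifying assumptions: sentences are already available as (start time, text) pairs, detection is a case-insensitive substring match, and the context extraction around each hit is omitted. The function names are hypothetical.

```python
Hit = tuple[float, str]  # (start time in seconds, sentence text)


def split_at_start_time(
    sentences: list[Hit], keyword: str, t_s2: float
) -> tuple[list[Hit], list[Hit]]:
    """Return (second voice sentences, third voice sentences) for a keyword
    that was added at time t_s2."""
    needle = keyword.lower()
    hits = [(t, text) for t, text in sentences if needle in text.lower()]
    second = [h for h in hits if h[0] >= t_s2]  # step c: recorded after T_s2
    third = [h for h in hits if h[0] < t_s2]    # step d: recorded before T_s2
    return second, third


def second_target(sentences: list[Hit], keyword: str, t_s2: float) -> list[Hit]:
    """Step e, simplified: merge both groups in chronological order."""
    second, third = split_at_start_time(sentences, keyword, t_s2)
    return sorted(third + second)


meeting = [
    (10.0, "Hiring is frozen this quarter."),
    (50.0, "Next topic."),
    (90.0, "We reopen hiring in May."),
]
print(split_at_start_time(meeting, "hiring", t_s2=60.0))
# ([(90.0, 'We reopen hiring in May.')], [(10.0, 'Hiring is frozen this quarter.')])
```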

Compared with the second target voice sentence before the start time T_s2, the second target voice sentence after the start time T_s2 may better match what the participant wants to focus on. To let the user quickly locate the conference summary after the start time T_s2, this embodiment may further include:

Establishing a time identifier for the second target voice sentence, where the time identifier is used to distinguish whether the second target voice sentence was determined based on the second voice sentence or based on the third voice sentence. That is, the time identifier indicates whether the second target voice sentence is a voice sentence before or after the start time T_s2. The second target voice sentence with the time identifier is then converted into a text sentence to obtain a conference summary carrying the time identifier.
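One way to carry the time identifier into the text summary is to prefix each line with a tag, as in the sketch below. The tag wording and the function name `render_with_time_identifier` are assumptions, and the input is taken to be already-transcribed (start time, text) pairs.

```python
def render_with_time_identifier(hits: list[tuple[float, str]], t_s2: float) -> list[str]:
    """Prefix each summary line with a tag saying whether the sentence was
    recorded before or after the start time T_s2."""
    lines = []
    for start_s, text in sorted(hits):  # chronological order
        tag = "after T_s2" if start_s >= t_s2 else "before T_s2"
        lines.append(f"[{tag}] {text}")
    return lines


hits = [(90.0, "We reopen hiring in May."), (10.0, "Hiring is frozen this quarter.")]
for line in render_with_time_identifier(hits, t_s2=60.0):
    print(line)
# [before T_s2] Hiring is frozen this quarter.
# [after T_s2] We reopen hiring in May.
```

With such a tag in place, a reader could filter the summary to the lines recorded after T_s2 without re-running detection.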

As with the deletion operation for the first keyword, the participant may also delete the second keyword. Accordingly, when a deletion operation for the second keyword is detected, the time at which the deletion operation is detected is determined as the end time (denoted, for example, as T_e2), and detection of the second keyword stops for the voice information recorded after the end time T_e2. In this case, the second keyword is detected in the voice information recorded between the time the second keyword was added during the conference (the start time T_s2) and the time it was deleted (the end time T_e2), so as to obtain the second target voice sentence.

In an actual conference scene, there are usually a plurality of participants, and the important contents of the conference concerned by different participants may be different, but the prior art cannot obtain the conference records for different participants. In order to improve this problem, this embodiment may also provide another recording method of the conference summary, and referring to fig. 3, the method includes the following steps:

Step S302, receiving an adding operation for a keyword, where the adding operation is initiated after the participant logs in to an account and before voice detection; different accounts may represent participants with different identities in the conference.

Step S304, acquiring a first keyword associated with the account according to the adding operation. Because the participants participate in the teleconference by logging in the account, the association relationship between the account of the participants and the first keyword can be recorded by the first keyword added after the account is logged in. It should be noted that, in this embodiment, only the first keyword is taken as an example, and similarly, after the account is logged in, the second keyword obtained according to the new adding operation is the second keyword associated with the account.

Step S306, detecting the voice information according to the first keyword added in advance to obtain a first target voice statement.

Step S308, converting the first target voice sentence into a text sentence to obtain a conference summary.

Step S310, generating an association relation between the account and the conference summary according to the first keyword used in the process of obtaining the conference summary and the account associated with the first keyword.

In step S312, the conference summary is sent to the account with the association relationship.

In this way, a conference summary tailored to each participant can be generated and sent accurately to the associated account, which makes conference summary generation more intelligent and improves the participants' user experience.
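Steps S310 and S312 can be sketched as a mapping from accounts to the summary lines produced by their own keywords. This is an illustration under assumptions: the summary is represented as (keyword, text sentence) pairs, `associate` and `send_all` are hypothetical names, and delivery is delegated to a caller-supplied `send` function rather than any particular messaging API.

```python
from collections import defaultdict
from typing import Callable


def associate(
    account_keywords: dict[str, set[str]],
    summary: list[tuple[str, str]],
) -> dict[str, list[str]]:
    """S310: link each account to the summary lines produced by its keywords.

    `summary` items are (keyword that produced the sentence, text sentence).
    """
    per_account: dict[str, list[str]] = defaultdict(list)
    for keyword, text in summary:
        for account, keywords in account_keywords.items():
            if keyword in keywords:
                per_account[account].append(text)
    return dict(per_account)


def send_all(
    per_account: dict[str, list[str]],
    send: Callable[[str, list[str]], None],
) -> None:
    """S312: deliver each account's summary through the supplied function."""
    for account, lines in per_account.items():
        send(account, lines)


account_keywords = {"alice": {"budget"}, "bob": {"hiring", "budget"}}
summary = [("budget", "The budget is due on Friday."),
           ("hiring", "We reopen hiring in May.")]
send_all(associate(account_keywords, summary),
         lambda account, lines: print(account, lines))
```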

In summary, in the recording method of the conference summary provided in the above embodiment, the first keyword reflects what the participants want to focus on in the conference. Obtaining the first target voice sentence with the first keyword and converting it into the conference summary therefore both highlights the conference content the participants care about and markedly reduces the length of the summary, making the online conference more intelligent. In addition, by taking into account the association between a participant's account and the first keyword, the conference record can be matched to the specific participant, which improves the efficiency with which that participant reads and understands the conference summary.

Example two:

the present embodiment provides a recording apparatus of a conference summary, which is used to implement the recording method of the conference summary provided in the foregoing embodiment. With reference to fig. 4, the apparatus includes:

an obtaining module 402, configured to obtain a first keyword added in advance;

the detection module 404 is configured to detect voice information according to a first keyword added in advance, and obtain a first target voice statement;

and the text conversion module 406 is configured to convert the first target speech sentence into a text sentence, so as to obtain a conference summary.

According to the recording device for the conference summary provided by the embodiment of the disclosure, the acquired first keyword is added in advance and can therefore fully reflect what the participants focus on in the subsequent conference. The voice information is then detected according to the first keyword to obtain a first target voice sentence, which mainly consists of the voice sentences that contain the first keyword in the voice information recorded after the first keyword is added. The first target voice sentence thus embodies the important content of the conference without including all of the voice information from the conference; that is, it is brief voice information embodying the important content of the conference. Furthermore, the conference summary obtained from the first target voice sentence can briefly and intuitively represent the important content of the conference, which improves the efficiency with which participants read and understand the conference summary.
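The division into the three modules can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `SummaryRecorder`, the `VoiceStatement` type, and the injected `spot` (keyword spotting on audio) and `transcribe` (speech-to-text) callables are hypothetical and do not correspond to any specific speech library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoiceStatement:
    audio: bytes  # audio for one spoken sentence (assumed pre-segmented)


class SummaryRecorder:
    def __init__(self,
                 spot: Callable[[bytes, str], bool],
                 transcribe: Callable[[bytes], str]) -> None:
        self.keywords: List[str] = []
        self.spot = spot              # hypothetical keyword spotter
        self.transcribe = transcribe  # hypothetical speech recognizer

    def obtain_keyword(self, keyword: str) -> None:
        """Role of the obtaining module 402: store a pre-added keyword."""
        self.keywords.append(keyword)

    def detect(self, statements: List[VoiceStatement]) -> List[VoiceStatement]:
        """Role of the detection module 404: keep statements with a keyword."""
        return [s for s in statements
                if any(self.spot(s.audio, k) for k in self.keywords)]

    def to_text(self, targets: List[VoiceStatement]) -> List[str]:
        """Role of the text conversion module 406: convert targets to text."""
        return [self.transcribe(s.audio) for s in targets]
```

Only the detected statements reach `to_text`, which mirrors the order described above: detection first, text conversion second.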

In one implementation, the detection module 404 is further configured to: detect voice information recorded in a conference process according to a first keyword added in advance to obtain a first voice sentence containing the first keyword; extract, from the voice information, a first voice segment positioned before the first voice sentence and a second voice segment positioned after the first voice sentence; and take the first voice sentence, the first voice segment and the second voice segment as a first target voice sentence.

In one implementation, the detection module 404 is further configured to: take the first voice sentence as a starting point, extract sentences within a first preset range before the first voice sentence as the first voice segment, and extract sentences within a second preset range after the first voice sentence as the second voice segment.
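The context extraction can be sketched as follows. The sketch assumes that the "preset range" is a number of sentences (the description does not say whether it is a sentence count or a duration), that sentences are already available as text, and that overlapping windows are merged; the function name `extract_target` and its parameters are hypothetical.

```python
def extract_target(sentences, keyword, before=1, after=1):
    """Return each keyword sentence with its surrounding context.

    before: assumed first preset range, in sentences before a match.
    after:  assumed second preset range, in sentences after a match.
    """
    keep = set()
    for i, sentence in enumerate(sentences):
        if keyword in sentence:
            start = max(0, i - before)                 # clamp at the beginning
            end = min(len(sentences), i + after + 1)   # clamp at the end
            keep.update(range(start, end))
    # Sorting the kept indices preserves meeting order and merges overlaps.
    return [sentences[i] for i in sorted(keep)]
```

For example, with `before=1` and `after=1`, a match in the third of five sentences returns the second, third and fourth sentences.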

In one implementation, the recording apparatus of the conference summary further includes an adding module (not shown in the figure), and the adding module is configured to: judging whether an adding operation aiming at a keyword is detected or not in the process of detecting the voice information according to the first keyword; when the adding operation is detected, acquiring a second keyword corresponding to the adding operation, and determining the time when the adding operation is detected as the starting time; detecting the voice information recorded after the starting time according to the second keyword to obtain a second voice statement; detecting the voice information recorded before the starting time according to the second keyword to obtain a third voice statement; and determining a second target voice statement based on the second voice statement and the third voice statement.
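The handling of a keyword added during the conference can be sketched as follows. The sketch assumes timestamped, already-transcribed sentences and substring matching; it also assumes that a sentence recorded exactly at the starting time counts as "after", and that the second target voice statement is the chronological merge of the two results, since the description does not fix how they are combined. The name `detect_second_keyword` is hypothetical.

```python
def detect_second_keyword(timed_sentences, keyword, start_time):
    """Search for a newly added keyword on both sides of its starting time.

    timed_sentences: list of (timestamp_seconds, text) tuples.
    Returns (second, third, target), where `second` holds matches recorded
    at or after start_time and `third` holds matches recorded before it.
    """
    second = [(t, s) for t, s in timed_sentences
              if t >= start_time and keyword in s]
    third = [(t, s) for t, s in timed_sentences
             if t < start_time and keyword in s]
    # Assumed combination rule: merge both lists in chronological order.
    target = sorted(third + second, key=lambda item: item[0])
    return second, third, target
```

Searching the earlier recording as well means a participant who adds a keyword late still receives the relevant statements made before the addition.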

In one implementation, the recording apparatus of the conference summary further includes a deleting module (not shown in the figure), and the deleting module is configured to: when a deletion operation for the first keyword is detected, determine the time when the deletion operation is detected as an end time; and stop detecting the first keyword in the voice information recorded after the end time.
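The effect of the deletion operation can be sketched as a cutoff on detection. The sketch assumes timestamped, already-transcribed sentences and substring matching, and treats a sentence recorded exactly at the end time as still detectable; the name `detect_until_deleted` is hypothetical.

```python
def detect_until_deleted(timed_sentences, keyword, end_time=None):
    """Return keyword matches, ignoring speech recorded after end_time.

    timed_sentences: list of (timestamp_seconds, text) tuples.
    end_time: time at which the keyword was deleted, or None if the
        keyword was never deleted.
    """
    return [(t, s) for t, s in timed_sentences
            if keyword in s and (end_time is None or t <= end_time)]
```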

In an implementation manner, the obtaining module 402 is further configured to: receiving an adding operation aiming at a keyword; wherein, the adding operation is initiated after the participant logs in an account before voice detection; and acquiring a first keyword associated with the account according to the adding operation.

In one implementation, the recording apparatus of the conference summary further includes an association module (not shown in the figure), configured to: generate an association relation between an account and a conference summary according to a first keyword used in the process of obtaining the conference summary and the account associated with the first keyword; and send the conference summary to the account with the association relationship.

The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the method embodiments. For the sake of brevity, for anything not mentioned in the device embodiment, reference may be made to the corresponding contents in the method embodiments.

Example three:

an embodiment of the present invention provides an electronic device, including: a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of recording a conference summary as described in embodiment one.

The embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the method for recording a conference summary in the first embodiment.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of recording a meeting summary, comprising:

acquiring a first keyword added in advance;

detecting voice information according to the pre-added first keyword to obtain a first target voice statement;

and converting the first target voice sentence into a text sentence to obtain a conference summary.

2. The method according to claim 1, wherein the step of detecting the speech information according to the pre-added first keyword to obtain the first target speech sentence comprises:

detecting voice information recorded in a conference process according to the first keyword added in advance to obtain a first voice statement containing the first keyword;

extracting a first voice segment positioned in front of the first voice statement and a second voice segment positioned behind the first voice statement from the voice information;

and taking the first voice statement, the first voice segment and the second voice segment as a first target voice statement.

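3. The method of claim 2, wherein the step of extracting a first speech segment preceding the first speech sentence and a second speech segment following the first speech sentence from the speech information comprises:

taking the first voice sentence as a starting point, extracting sentences in a first preset range before the first voice sentence to be used as the first voice segment, and extracting sentences in a second preset range after the first voice sentence to be used as the second voice segment.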

4. The method according to claim 1 or 2, characterized in that the method further comprises:

judging whether an adding operation aiming at the keyword is detected or not in the process of detecting the voice information according to the first keyword;

when the adding operation is detected, acquiring a second keyword corresponding to the adding operation, and determining the time when the adding operation is detected as the starting time;

detecting the voice information recorded after the starting time according to the second keyword to obtain a second voice statement;

detecting the voice information recorded before the starting time according to the second keyword to obtain a third voice statement;

and determining a second target voice statement based on the second voice statement and the third voice statement.

5. The method according to claim 1 or 2, characterized in that the method further comprises:

when a deletion operation for the first keyword is detected, determining the time when the deletion operation is detected as an end time;

and stopping detecting the first keyword in the voice information recorded after the end time.

6. The method according to claim 1, wherein the step of obtaining the first keyword added in advance comprises:

receiving an adding operation aiming at a keyword; the adding operation is initiated after the participant logs in an account before voice detection;

and acquiring a first keyword associated with the account according to the adding operation.

7. The method of claim 6, wherein after the obtaining of the meeting summary, the method further comprises:

generating an association relation between the account and the conference summary according to the first keyword used in the process of obtaining the conference summary and the account associated with the first keyword;

and sending the conference summary to the account with the association relationship.

8. A recording apparatus of a conference summary, comprising:

the acquisition module is used for acquiring a first keyword which is added in advance;

the detection module is used for detecting the voice information according to the pre-added first keyword to obtain a first target voice statement;

and the text conversion module is used for converting the first target voice sentence into a text sentence to obtain a conference summary.

9. An electronic device, comprising: a processor and a storage device;

the storage device has stored thereon a computer program which, when executed by the processor, performs the method of any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of the claims 1 to 7.