CN111415128B

CN111415128B - Method, system, device, equipment and medium for controlling conference

Info

Publication number: CN111415128B
Application number: CN201910013104.9A
Authority: CN
Inventors: 孙辉; 王思杰; 李胜; 张泽旋
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2019-01-07
Filing date: 2019-01-07
Publication date: 2024-06-07
Anticipated expiration: 2039-01-07
Also published as: CN111415128A

Abstract

The invention discloses a method, a system, a device, equipment and a medium for controlling a conference, wherein the method comprises the following steps: analyzing the received meeting offer to obtain a meeting flow, wherein the meeting flow comprises meeting flow keywords of each meeting stage; converting the received audio information into text information, and identifying audio keywords in the text information; when the audio keywords are successfully matched with the conference flow keywords, playing conference topics corresponding to the conference flow keywords which are successfully matched; a meeting summary is generated based on the text information. According to the method provided by the embodiment of the invention, the conference working efficiency can be improved.

Description

Method, system, device, equipment and medium for controlling conference

Technical Field

The present invention relates to the field of computers, and in particular, to a method, system, apparatus, device, and medium for controlling a conference.

Background

A conference is an organized, led, and purposeful deliberation activity conducted at a specified time and place according to a set procedure. Currently, conferences may take a variety of forms, such as teleconferences and web conferences. In particular, web conferences may include voice conferences and video conferences.

In each of the above kinds of conference, the number of participants is large, and the notification of the conference, the announcement of its topics, and the writing of its summary are all done manually, so there is a technical problem of low conference work efficiency.

Disclosure of Invention

The embodiment of the invention provides a method, a system, a device, equipment and a medium for controlling a conference, which can improve the working efficiency of the conference.

In a first aspect, an embodiment of the present invention provides a method for controlling a conference, including:

Analyzing the received meeting offer to obtain a meeting flow, wherein the meeting flow comprises meeting flow keywords of each meeting stage; converting the received audio information into text information, and identifying audio keywords in the text information; when the audio keywords are successfully matched with the conference flow keywords, playing conference topics corresponding to the conference flow keywords which are successfully matched; a meeting summary is generated based on the text information.

In a second aspect, an embodiment of the present invention provides a speech processing system, including: a sound sensor and a voice processing device, the sound sensor being coupled to the voice processing device;

a sound sensor for receiving audio information;

The voice processing device is used for analyzing the received meeting offer to obtain a meeting flow, wherein the meeting flow comprises meeting flow keywords of each meeting stage; converting the received audio information into text information, and identifying audio keywords in the text information; when the audio keywords are successfully matched with the meeting flow keywords, playing the conference topics corresponding to the successfully matched meeting flow keywords; and generating a meeting summary based on the text information.

In a third aspect, an embodiment of the present invention provides an apparatus for controlling a conference, including:

The analysis module is used for analyzing the meeting offer to obtain a meeting flow, and the meeting flow comprises meeting flow keywords of each meeting stage; the recognition module is used for converting the received audio information into text information and recognizing audio keywords in the text information; the control module is used for playing the conference topics corresponding to the successfully matched conference flow keywords when the audio keywords and the conference flow keywords are successfully matched; and the generation module is used for generating a meeting summary based on the text information.

In a fourth aspect, an embodiment of the present invention provides an apparatus for controlling a conference, including a memory and a processor; wherein, the memory is used for storing programs; a processor for executing a program stored in the memory to perform the method of controlling a conference described above in connection with the first aspect.

In a fifth aspect, embodiments of the present invention provide a computer readable storage medium having instructions stored therein which, when executed on a computer, cause the computer to perform the method of controlling a conference described above in connection with the first aspect.

In a sixth aspect, an embodiment of the present invention provides a method for controlling a conference, including:

Analyzing the meeting offer to obtain a meeting flow, wherein the meeting flow comprises meeting flow keywords of each meeting stage; converting the received audio information into text information, and identifying audio keywords in the text information; and when the audio keywords are successfully matched with the conference flow keywords, playing conference topics corresponding to the conference flow keywords which are successfully matched.

In a seventh aspect, an embodiment of the present invention provides a speech processing system, including:

a sound sensor and a voice processing device, the sound sensor being coupled to the voice processing device; the sound sensor is used for receiving audio information; the voice processing device is used for analyzing the meeting offer to obtain a meeting flow, wherein the meeting flow comprises meeting flow keywords of each meeting stage, converting the received audio information into text information, identifying the audio keywords in the text information, and, when the audio keywords are successfully matched with the meeting flow keywords, playing the conference topics corresponding to the successfully matched meeting flow keywords.

In an eighth aspect, an embodiment of the present invention provides an apparatus for controlling a conference, including:

The analysis module is used for analyzing the meeting offer to obtain a meeting flow, and the meeting flow comprises meeting flow keywords of each meeting stage; the recognition module is used for converting the received audio information into text information and recognizing audio keywords in the text information; and the control module is used for playing the conference subjects corresponding to the successfully matched conference flow keywords when the audio keywords are successfully matched with the conference flow keywords.

In a ninth aspect, an embodiment of the present invention provides an apparatus for controlling a conference, including:

A memory for storing a program; and a processor for executing a program stored in the memory to perform the method of controlling a conference described above in connection with the sixth aspect.

In a tenth aspect, embodiments of the present invention provide a computer readable storage medium having instructions stored therein which, when executed on a computer, cause the computer to perform the method of controlling a conference described above in connection with the sixth aspect.

According to the above technical scheme, the received meeting offer is first analyzed to obtain a meeting flow, and the audio keywords are then identified. When the audio keywords are successfully matched with the conference flow keywords, the conference topics corresponding to the successfully matched conference flow keywords can be played. Controlling the conference automatically in this way can improve conference work efficiency.

Drawings

The invention will be better understood from the following description of specific embodiments thereof taken in conjunction with the accompanying drawings in which like or similar reference characters designate like or similar features.

FIG. 1 is a schematic diagram illustrating an email in an exemplary embodiment in accordance with the present invention;

FIG. 2 is a flow diagram illustrating a method of controlling a conference in accordance with one embodiment of the present invention;

FIG. 3 is a schematic diagram illustrating a birthday party template in an exemplary embodiment according to the present invention;

FIG. 4 is a schematic diagram illustrating a working conference template in an exemplary embodiment according to the present invention;

FIG. 5 is a schematic diagram illustrating the architecture of a speech processing system in accordance with one embodiment of the present invention;

FIG. 6 is a schematic view showing the structure of an apparatus for controlling a conference according to an embodiment of the present invention;

FIG. 7 is a flow diagram illustrating a method of controlling a conference according to another embodiment of the present invention;

FIG. 8 is a schematic view showing the structure of an apparatus for controlling a conference according to another embodiment of the present invention;

FIG. 9 is a schematic diagram illustrating the architecture of a speech processing system in accordance with another embodiment of the present invention;

FIG. 10 is a block diagram of an exemplary hardware architecture of a computing device of the method and apparatus of controlling a conference of embodiments of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and the specific embodiments thereof, in order to make the objects, technical solutions and advantages of the present invention more apparent.

Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely configured to illustrate the application and are not configured to limit the application. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the application by showing examples of the application.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

Conferences take a variety of forms, such as teleconferences and web conferences. Typically, multiple people participate in the same meeting. After the time and place of the meeting are determined, the participants need to be notified, which can be done in a number of ways. As one example, the participants may be notified by email.

Referring to fig. 1, fig. 1 is a schematic diagram of an email in an embodiment of the invention. It should be noted that, the email in fig. 1 may be understood as an implementation of meeting offer in the embodiment of the present invention.

The email gives notice of the meeting time, meeting place, meeting flow, and participants. The conference proceeds according to the conference flow, which means the conference host has to announce each transition to the next conference topic. The conference topics comprise four items: the leader's speech, the attendance system, canteen suggestions, and the summary. Moreover, after the conference ends, the conference summary has to be compiled manually. Therefore, there is a technical problem of low conference work efficiency.

Referring to fig. 2, fig. 2 is a flowchart of a method for controlling a conference according to an embodiment of the present invention.

In fig. 2, the meeting is initiated by a meeting offer. In particular, the meeting offer may be initiated by a staff member, or may be initiated by a meeting trigger condition, that is, a condition for initiating a meeting. As one example, the meeting trigger condition may be a trigger point in time, a trigger event, or the like.

As shown in fig. 2, the method 200 for controlling a conference specifically includes the following steps:

Step S201, analyzing the received meeting offer to obtain a meeting flow, wherein the meeting flow comprises meeting flow keywords of each meeting stage.

Meeting offers are requests to invite specific personnel to attend a meeting at a specified time and at a specified location. The specific form of the meeting offer can be an email meeting offer, or can be a meeting offer in office software.

As one example, when a meeting needs to be initiated, an email meeting offer may be sent to the participants. As another example, a meeting request may be initiated in office software, through which each participant receives the corresponding meeting offer.

In one embodiment of the invention, meeting offer templates may be pre-set to account for the different scenarios involved in the meeting. Based on the meeting offer template, part of the meeting flow is filled in, so that the time for initiating the meeting can be saved, and the efficiency for controlling the meeting can be improved.

As one example, meeting offer templates may include a birthday meeting template and a work meeting template.

Referring to fig. 3, fig. 3 is a schematic diagram of a birthday party template according to an embodiment of the present invention. The flow of every birthday party is the same; only the meeting time, meeting location, and participants differ. Based on the birthday meeting template, only the meeting time, meeting place, and participants need to be filled in.

Referring to fig. 4, fig. 4 is a schematic diagram of a working conference template in an embodiment of the present invention. Work meetings differ from one another in their topics. Based on the work meeting template, only the topics to be discussed during the meeting need to be filled in.

The meeting offer can be quickly constructed by utilizing the meeting offer template, so that the working efficiency of controlling the meeting is improved.

After receiving the meeting offer, the received meeting offer may be parsed. With continued reference to fig. 1, the meeting offer in fig. 1 includes a meeting flow. That is, the received meeting offer is parsed, and the meeting flow can be obtained directly.

In one embodiment of the present invention, the conference flow includes a plurality of conference topics, from which the main content of the conference discussion can be learned. Conference flow keywords may then be extracted from the conference topics. The process of extracting conference flow keywords is similar to that of identifying audio keywords in text. First, the conference topic may be segmented into one or more tokens, and the part of speech of each token is tagged. Then, based on the part-of-speech-tagged tokens, the conference flow keywords of the conference topic are identified.

The meeting flow keywords may be extracted at the time the meeting offer is received, that is, in real time; alternatively, where the meeting offer has been stored, the meeting flow keywords may be extracted at some point after the meeting offer is received.

In the above step S201, the received meeting offer is an offer set based on a meeting offer template, which is a template set in advance based on a meeting scenario.

In one embodiment, the step of parsing the received meeting offer to obtain the meeting flow in step S201 may specifically include:

analyzing the received meeting offer to obtain the conference topics of each meeting stage, and extracting the meeting flow keywords from the conference topics.
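
As a non-limiting illustration, the parsing step may be sketched as follows. The sketch assumes a plain-text email body in which the meeting flow is a numbered list under a "Meeting flow:" heading; the body layout, the `parse_meeting_flow` helper, and the stop-word list are hypothetical and are not taken from the embodiments. Keyword extraction is reduced here to picking the first non-stop word of each topic, as a stand-in for word segmentation and part-of-speech tagging.

```python
import re

# Hypothetical email body; the layout is an assumption made for illustration.
OFFER = """\
Meeting time: 2019-01-07 10:00
Meeting place: Room 301
Meeting flow:
1. Leader speech
2. Attendance system
3. Canteen suggestions
4. Summary
Participants: a@example.com, b@example.com
"""

STOP_WORDS = {"the", "a", "an", "of"}


def parse_meeting_flow(offer_text):
    """Return a list of (keyword, topic) pairs, one per meeting stage."""
    stages = []
    in_flow = False
    for line in offer_text.splitlines():
        if line.strip().lower().startswith("meeting flow"):
            in_flow = True
            continue
        match = re.match(r"\s*(\d+)\.\s+(.*\S)", line)
        if in_flow and match:
            topic = match.group(2)
            words = [w.lower() for w in re.findall(r"[A-Za-z]+", topic)]
            # Stand-in for real keyword extraction: first non-stop word.
            keyword = next((w for w in words if w not in STOP_WORDS), None)
            stages.append((keyword, topic))
        elif in_flow:
            in_flow = False  # the numbered list has ended
    return stages


flow = parse_meeting_flow(OFFER)
```

Each returned pair associates one meeting flow keyword with the conference topic of its stage, which is the structure the later matching step relies on.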

In one embodiment, the received meeting offer comprises an email meeting offer.

Step S202, converting the received audio information into text information, and identifying audio keywords in the text information.

For the audio information of a user, i.e., the received audio information, the voice signal may be converted into a digital signal, in view of the great advantages of digital signals in storage, transmission, and processing. The received audio information is buffered for further processing. To ensure the usability of the digital signal, the digital signal may also be filtered. Further, the voice signal of the audio information may be divided into a plurality of speech frames. Acoustic features are extracted from each speech frame, i.e., the waveform of each speech frame is transformed into a multi-dimensional vector. Finally, the multi-dimensional vectors are converted into text information, i.e., a text, by using an acoustic model.
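
As a non-limiting illustration, the framing and feature extraction steps may be sketched as follows. The frame length, hop length, and the two features used (log energy and zero-crossing rate) are assumptions chosen for brevity; they stand in for the richer acoustic features a practical recognizer would use, and the acoustic model that converts the vectors into text is outside the scope of the sketch.

```python
import math


def split_into_frames(samples, frame_len, hop_len):
    """Split a sample sequence into overlapping fixed-length frames."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


def frame_features(frame):
    """Map one frame to a small vector: [log energy, zero-crossing rate]."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev < 0) != (cur < 0)
    )
    return [math.log(energy + 1e-10), crossings / (len(frame) - 1)]


# Synthetic 440 Hz tone sampled at 16 kHz for 0.1 s (illustrative input only).
SAMPLE_RATE = 16000
signal = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(1600)]

# 400 samples = 25 ms per frame, 160 samples = 10 ms between frame starts.
frames = split_into_frames(signal, frame_len=400, hop_len=160)
features = [frame_features(f) for f in frames]
```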

After converting the audio information into text information, it is difficult to determine audio keywords in the text because the text includes a plurality of words. In the embodiment of the invention, the audio keywords can be understood as words which can embody the main semantics of the corresponding text.

In one embodiment of the invention, text may first be segmented into tokens. In particular, word segmentation may be based on a lexicon or based on statistical word segmentation.

In lexicon-based word segmentation, candidate strings from the text are matched, according to a certain strategy, against the words in a pre-built lexicon; if a candidate string is found in the lexicon, the match succeeds and the string is recognized as a token. The strategies may be classified as follows: by scanning direction, lexicon-based segmentation is divided into forward matching and reverse matching; by which length is matched preferentially, it is divided into longest matching and shortest matching.
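
As a non-limiting illustration, forward longest matching may be sketched as follows. The lexicon contents, the `max_len` limit, and the fallback of emitting a single character when nothing matches are assumptions made for the example.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment text by scanning left to right and taking the longest lexicon word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy lexicon: "meeting", "summary", "meeting summary", "generate", "automatic".
LEXICON = {"会议", "纪要", "会议纪要", "生成", "自动"}
tokens = forward_max_match("自动生成会议纪要", LEXICON)
# tokens == ["自动", "生成", "会议纪要"]
```

Reverse matching would scan from the end of the text instead, and shortest matching would try the smallest candidate first.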

Text may also be segmented into tokens by statistical word segmentation. In statistical word segmentation, given a large amount of already-segmented text, a statistical machine learning model is used to learn word segmentation rules, which are then applied to segment new text. The main statistical models are the N-gram model, the hidden Markov model (HMM), the maximum entropy model (ME), the conditional random field model (CRF), and the like.
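
As a non-limiting illustration, statistical segmentation may be sketched with the simplest N-gram model, a unigram model (N = 1): word probabilities are estimated from already-segmented text, and dynamic programming then selects the segmentation with the highest total log-probability. The toy corpus, the `max_len` limit, and the penalty for unseen single characters are assumptions made for the example.

```python
import math
from collections import Counter


def train_unigram(segmented_sentences):
    """Estimate word probabilities from already-segmented training text."""
    counts = Counter(w for sent in segmented_sentences for w in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def segment(text, probs, max_len=4, unknown_logp=-20.0):
    """Return the segmentation with the highest total log-probability."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for text[:i]
    back = [0] * (n + 1)            # back[i]: start of the last word in text[:i]
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in probs:
                logp = math.log(probs[word])
            elif len(word) == 1:
                logp = unknown_logp  # fallback for unseen single characters
            else:
                continue
            if best[start] + logp > best[end]:
                best[end] = best[start] + logp
                back[end] = start
    tokens = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


# Toy pre-segmented corpus (illustrative only).
CORPUS = [["会议", "纪要"], ["会议", "流程"], ["生成", "会议", "纪要"]]
probs = train_unigram(CORPUS)
tokens = segment("生成会议流程", probs)
# tokens == ["生成", "会议", "流程"]
```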

In addition, lexicon-based segmentation and statistical segmentation can be combined, which exploits both the high speed and efficiency of lexicon-based segmentation and the ability of statistical segmentation to use context to recognize new words and resolve ambiguity automatically.

After the text is segmented into tokens, the audio keywords need to be identified among the one or more tokens.

In one embodiment of the invention, the part of speech of each token may first be tagged. Part of speech is a basic grammatical attribute of a word. Part-of-speech tagging refers to labeling each token with its correct part of speech, i.e., determining whether each token is a noun, verb, adjective, or other part of speech.

After the part of speech of each token is determined, named entity recognition is performed. In the e-commerce field, for example, named entity recognition is used to recognize brands, products, models, and the like; it also covers the recognition of common general-domain entities such as personal names, place names, organization names, and times and dates. As one example, named entity recognition may be based on one of three methods: rule-based methods, statistics-based methods, and hybrid methods that combine rules and statistics.

Generally, the word segment to which the named entity corresponds is not an audio keyword. Therefore, after the named entity is identified, the audio keywords can be extracted from the word segments corresponding to the non-named entity.

The audio keywords may be extracted from the tokens corresponding to non-named entities by any of the following methods: term frequency-inverse document frequency (TF-IDF), a weighting technique commonly used in information retrieval and data mining; topic model methods; and the rapid automatic keyword extraction (RAKE) method.
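
As a non-limiting illustration, TF-IDF keyword extraction may be sketched as follows. The toy corpus, the smoothed form of the inverse document frequency, and the `tfidf_keywords` helper are assumptions made for the example; the tokens are presumed to have already been segmented, with named entities removed.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, top_k=2):
    """Rank the tokens of one document by TF-IDF against a small corpus."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1  # smoothed IDF
        scores[term] = (count / len(doc_tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


# Toy tokenized utterances (illustrative only).
CORPUS = [
    ["next", "we", "discuss", "the", "canteen", "menu", "canteen", "prices"],
    ["we", "review", "the", "attendance", "system"],
    ["the", "leader", "will", "give", "a", "speech"],
]
keywords = tfidf_keywords(CORPUS[0], CORPUS, top_k=1)
# keywords == ["canteen"]
```

A term scores highly when it is frequent in the current utterance but rare across the other utterances, which is why "canteen" outranks "the" here.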

In one embodiment, the step of converting the received audio information into text information in step S202 may specifically include:

When the received audio information is the audio information including the tag, the audio information including the tag is converted into text information including the tag.

In this embodiment, the step of generating the meeting summary based on the text information may specifically include:

and generating a meeting summary corresponding to the tag according to the text information comprising the tag.

In one embodiment, the tag includes a speaker tag for identifying a speaker to which the audio information corresponds; or the tag comprises a conference subject tag for identifying a conference subject to which the audio information belongs.

In one embodiment, the tags include a conference subject tag for identifying a conference subject to which the audio information pertains and a speaker tag for identifying a speaker in the conference subject to which the audio information pertains.

In one embodiment, the step of identifying the audio keyword in the text information in step S202 may specifically include:

Step S2021, segmenting the text into one or more tokens and tagging the part of speech of each token.

Step S2022, identifying the audio keywords in the text information based on the part-of-speech-tagged tokens.

In step S203, when the audio keyword is successfully matched with the conference flow keyword, the conference topic corresponding to the successfully matched conference flow keyword is played.

In an embodiment of the invention, the sound sensor may receive speech from different users at different times. Thus, there is usually more than one audio keyword; that is, multiple audio keywords arise during the meeting.

In one embodiment, one conference flow corresponds to one conference flow keyword, and a plurality of conference flows respectively correspond to a plurality of conference flow keywords. And if the audio keywords are the same as any conference flow keywords, determining that the audio keywords are successfully matched with the conference flow keywords.

The following is a schematic illustration with reference to fig. 1. With continued reference to fig. 1, four conference flows are included in fig. 1.

The conference flow keyword of the first conference flow is: leader; the conference flow keyword of the second conference flow is: attendance; the conference flow keyword of the third conference flow is: canteen; the conference flow keyword of the fourth conference flow is: summary.

Suppose the audio keyword is: leader. The audio keyword is then successfully matched with a conference flow keyword, and the conference topic corresponding to the successfully matched conference flow keyword can be played by the voice playing device, i.e., the device plays: leader's speech.

Correspondingly, if another audio keyword is successfully matched with any conference flow keyword, the conference topic corresponding to the successfully matched conference flow keyword can be played by the voice playing device.

In one embodiment, the determining that the audio keyword matches the conference flow keyword successfully in step S203 may specifically include:

And when the audio keywords are the same as any conference flow keywords, determining that the audio keywords are successfully matched with the conference flow keywords.
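
As a non-limiting illustration, the exact-match rule and the playing of the matched topic may be sketched as follows. The keyword-to-topic table and the function names are hypothetical; the `play` callback stands in for a voice playing device.

```python
# Illustrative keyword-to-topic table; in the method it would be obtained
# by analyzing the meeting offer.
FLOW = {
    "leader": "Leader's speech",
    "attendance": "Attendance system",
    "canteen": "Canteen suggestions",
    "summary": "Summary",
}


def match_topic(audio_keywords, flow):
    """Return the topic for the first audio keyword identical to a flow keyword."""
    for keyword in audio_keywords:
        if keyword in flow:
            return flow[keyword]
    return None


def on_audio_keywords(audio_keywords, flow, play):
    """Play the matched topic, if any; `play` stands in for a voice playing device."""
    topic = match_topic(audio_keywords, flow)
    if topic is not None:
        play(topic)
    return topic


played = []
on_audio_keywords(["budget", "canteen"], FLOW, played.append)  # match
on_audio_keywords(["weather"], FLOW, played.append)            # no match
# played == ["Canteen suggestions"]
```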

Step S204, generating a meeting summary based on the text information.

Text is a document obtained based on the conversion of audio information received by a sound sensor from a user. The content corresponding to the audio information of the user includes specific content related to the conference subjects, so that a conference summary can be generated based on the text information converted from the audio information.

Specifically, a meeting summary corresponding to the text information may be generated based on a text summarization algorithm. As one example, the text summarization algorithm includes at least one of: word frequency algorithm, clue word algorithm, location algorithm, title algorithm, vocabulary chain algorithm and associated network algorithm.
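
As a non-limiting illustration, a word frequency algorithm may be sketched as follows: each sentence is scored by the average frequency of its non-stop words in the whole text, and the highest-scoring sentences are kept in their original order. The stop-word list, the sentence splitter, and the sample transcript are assumptions made for the example.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "for", "of", "to", "was", "is", "more"}


def content_words(text):
    """Lowercase the text and return its words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)


TRANSCRIPT = (
    "The canteen menu needs more variety. "
    "Several people asked for longer canteen hours. "
    "The weather was nice today."
)
summary = summarize(TRANSCRIPT, max_sentences=1)
# summary == "The canteen menu needs more variety."
```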

In one embodiment of the present invention, the method of controlling a conference may further include: sending the meeting summary to the participants in the meeting offer. In this embodiment, the meeting offer may include the contact of the participant, and then the meeting summary generated from the text information may be automatically sent according to the contact of the participant in the meeting offer.

As one example, the meeting offer is an email meeting offer. In the email meeting offer, the email boxes of each participant are recorded. Automatically sending meeting summary generated according to the text information to the email boxes of each participant. It can be seen that each participant can receive automatically generated meeting summary.
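
As a non-limiting illustration, composing the summary message for the participants recorded in an email meeting offer may be sketched with the Python standard library as follows. The addresses, the subject format, and the `build_summary_email` helper are hypothetical. Actually transmitting the message would additionally require a mail server (for instance via `smtplib.SMTP.send_message`), which is omitted here.

```python
from email.message import EmailMessage


def build_summary_email(sender, participants, topic, summary):
    """Compose one message addressed to every participant in the meeting offer."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(participants)
    msg["Subject"] = f"Meeting summary: {topic}"
    msg.set_content(summary)
    return msg


# Hypothetical values; in the method they would come from the parsed meeting
# offer and from the generated meeting summary.
msg = build_summary_email(
    sender="assistant@example.com",
    participants=["a@example.com", "b@example.com"],
    topic="Weekly sync",
    summary="The canteen menu needs more variety.",
)
```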

In the embodiment of the invention, the conference flow is obtained by analyzing the meeting offer; the audio information of the user is converted into text, and audio keywords are identified in the text; when an audio keyword is successfully matched with a conference flow keyword, the conference topic corresponding to that conference flow keyword is played, which realizes automatic hosting of the conference and control of its progress; and a conference summary is generated based on the text. Because the conference is hosted automatically, its progress is controlled automatically, and the conference summary is generated automatically, conference work efficiency can be improved.

In one embodiment of the invention, the utterances of the same speaker need to be collected during the conference. Then a tag may be inserted in the audio information of the speaker, which tag may distinguish between the different speakers. Thus, the received audio information may be speech including a tag. Further, the voice including the same speaker tag can be converted into the text including the tag, and the text record is generated according to the text information including the same speaker tag. Wherein different speakers may be identified based on their voiceprints.

In one embodiment of the invention, the utterances belonging to the same conference topic need to be collected during the conference. Collection of the utterances of a conference topic may begin when that conference topic is announced, i.e., a tag is inserted into the audio information of each utterance. Collection of the utterances of the conference topic stops, i.e., insertion of the tag into the audio information of each utterance stops, when the next conference topic is announced or after the conference ends.

Thus, the received audio information may be speech including a tag. The speech carrying the same conference subject tag can then be converted into text including the tag, and finally a text record is generated from the text information carrying the same conference subject tag.

In one embodiment of the invention, a text record may also be generated based on both the speaker tag and the conference subject tag. Collection of the utterances of a conference topic may begin when the conference topic is announced, i.e., tags are inserted into the audio information of each speaker. Collection of the utterances of the conference topic stops, i.e., insertion of tags into the audio information of each speaker stops, when the next conference topic is announced or after the conference ends. The text record may then include both the speaker tag and the conference subject tag.
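
As a non-limiting illustration, generating a text record from tagged text may be sketched as follows. The `Utterance` structure and the sample data are hypothetical; the sketch presumes that each piece of converted text already carries a conference subject tag and a speaker tag.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    topic: str    # conference subject tag
    speaker: str  # speaker tag
    text: str     # text converted from the tagged audio information


def build_text_record(utterances):
    """Group converted text first by conference subject tag, then by speaker tag."""
    record = defaultdict(lambda: defaultdict(list))
    for u in utterances:
        record[u.topic][u.speaker].append(u.text)
    return {topic: dict(by_speaker) for topic, by_speaker in record.items()}


utterances = [
    Utterance("Canteen suggestions", "Speaker A", "The menu needs more variety."),
    Utterance("Canteen suggestions", "Speaker B", "Longer opening hours would help."),
    Utterance("Attendance system", "Speaker A", "The new badge readers work well."),
    Utterance("Canteen suggestions", "Speaker A", "Prices are reasonable."),
]
record = build_text_record(utterances)
```

Grouping by only one of the two tags gives the speaker-only or topic-only records of the other embodiments.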

Referring to fig. 5, fig. 5 is a schematic diagram of a speech processing system according to an embodiment of the invention. In an embodiment of the present invention, the speech processing system 500 can include a sound sensor 510 and a speech processing device 520, the sound sensor 510 being coupled to the speech processing device 520.

The sound sensor in the embodiment of the invention may be arranged separately from the voice processing device, i.e., the sound sensor and the voice processing device are independent devices; as an example, the sound sensor is arranged at a local end and the voice processing device is arranged in the cloud. The sound sensor may also be provided in the same apparatus as the voice processing device; as an example, both the sound sensor and the voice processing device are provided in a conference device.

A sound sensor is a sensor that can sense an acoustic quantity and convert it into an output signal. Examples of sound sensors include a sound pressure sensor, a noise sensor, an ultrasonic sensor, and a microphone.

The sound sensor may collect the sound of the user. The voice processing device receives the audio information of the user from the sound sensor. As one example, the voice processing device may play the conference topic corresponding to the successfully matched meeting flow keyword, so that the user can speak on the played conference topic and thereby provide audio information to the sound sensor.

It should be noted that the speech processing system in fig. 5 may perform the method for controlling a conference in the embodiment of the present invention described above with reference to fig. 1 to fig. 4. For convenience and brevity of description, detailed descriptions of known methods are omitted herein, and specific method steps for controlling the conference may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

Referring to fig. 6, fig. 6 is a schematic structural diagram of a device for controlling a conference according to an embodiment of the present invention, where the device for controlling a conference corresponds to a method for controlling a conference, and the device 600 for controlling a conference specifically includes:

the parsing module 601 is configured to parse the meeting offer to obtain a meeting flow, where the meeting flow includes meeting flow keywords of each meeting stage.

The identifying module 602 is configured to convert the received audio information into text information, and identify audio keywords in the text information.

The control module 603 is configured to play the meeting topic corresponding to the successfully matched meeting flow keyword when the audio keyword and the meeting flow keyword are successfully matched.

The generating module 604 is configured to generate a meeting summary based on the text information.

In one embodiment, the received meeting offer is an offer set based on a meeting offer template, which is a template preset based on the meeting scenario.

In one embodiment, the parsing module 601 is specifically configured to:

In one embodiment, the received meeting offer comprises an email meeting offer.
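As a non-limiting illustration of parsing an email meeting offer into a meeting flow, the following minimal sketch uses the Python standard `email` package. The template format ("Stage <n>: <topic>"), the stop-word list, the sample addresses and the name `parse_meeting_offer` are assumptions introduced for illustration only; the embodiments do not specify them.

```python
from email import message_from_string

# Hypothetical template: each agenda line reads "Stage <n>: <topic>".
SAMPLE_OFFER = """\
From: host@example.com
To: alice@example.com, bob@example.com
Subject: Weekly sync

Stage 1: Budget review
Stage 2: Hiring plan
"""

# Assumed stop words that are never treated as meeting flow keywords.
STOPWORDS = {"the", "a", "of", "and"}


def parse_meeting_offer(raw):
    """Return (participants, flow) parsed from a plain-text email offer."""
    msg = message_from_string(raw)
    participants = [a.strip() for a in msg.get("To", "").split(",") if a.strip()]
    flow = []
    for line in msg.get_payload().splitlines():
        # Only lines that follow the assumed template are agenda items.
        if not line.startswith("Stage") or ":" not in line:
            continue
        _, topic = line.split(":", 1)
        topic = topic.strip()
        # Keywords are extracted from the topic of each meeting stage.
        keywords = [w.lower() for w in topic.split() if w.lower() not in STOPWORDS]
        flow.append({"topic": topic, "keywords": keywords})
    return participants, flow
```

Calling `parse_meeting_offer(SAMPLE_OFFER)` yields the participant list from the `To` header and one flow entry per templated agenda line, each carrying its topic and the keywords extracted from it.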

In one embodiment, the recognition module 602, when specifically used to recognize audio keywords in text information, is specifically used to:

segmenting the text information into one or more segmented words, and labeling the part of speech of each segmented word;

identifying audio keywords in the text information based on the segmented words labeled with parts of speech.
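The segmentation and part-of-speech steps above can be sketched as follows. This is only an illustrative sketch: the tiny lexicon `POS_LEXICON` stands in for a real part-of-speech tagger, and the choice to keep only nouns as audio keywords is an assumption, not something the embodiment prescribes.

```python
import re

# Toy part-of-speech lexicon (assumption); a real system would use a tagger.
POS_LEXICON = {
    "budget": "NOUN", "review": "NOUN", "start": "VERB",
    "let": "VERB", "us": "PRON", "the": "DET", "now": "ADV",
}


def segment_and_tag(text):
    """Split text into lowercase words and label each with a part of speech."""
    words = re.findall(r"[a-z]+", text.lower())
    # Words missing from the lexicon get the placeholder tag "X".
    return [(w, POS_LEXICON.get(w, "X")) for w in words]


def identify_audio_keywords(text, keep=("NOUN",)):
    """Keep only the segmented words whose part of speech is in `keep`."""
    return [w for w, pos in segment_and_tag(text) if pos in keep]
```

For the utterance "Let us start the budget review now", this sketch keeps the two nouns "budget" and "review" as audio keywords.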

In one embodiment, the control module 603 may also be specifically configured to:

when an audio keyword is identical to any meeting flow keyword, determining that the audio keyword is successfully matched with the meeting flow keyword, and playing the meeting topic corresponding to the successfully matched meeting flow keyword.
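The exact-match rule can be illustrated with a small sketch. The mapping `FLOW_KEYWORDS`, the function names and the `play` callback (which stands in for whatever audio or display output the device uses) are hypothetical.

```python
# Hypothetical mapping from meeting flow keyword to meeting topic.
FLOW_KEYWORDS = {"budget": "Budget review", "hiring": "Hiring plan"}


def match_topic(audio_keywords, flow_keywords):
    """Return the topic of the first audio keyword equal to a flow keyword."""
    for keyword in audio_keywords:
        if keyword in flow_keywords:  # exact equality, no fuzzy matching
            return flow_keywords[keyword]
    return None


def control(audio_keywords, flow_keywords, play):
    """Play the matched topic, if any, and return it."""
    topic = match_topic(audio_keywords, flow_keywords)
    if topic is not None:
        play(topic)
    return topic
```

With `play=print`, `control(["start", "budget"], FLOW_KEYWORDS, print)` announces "Budget review", while a keyword list with no exact match plays nothing and returns `None`.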

In one embodiment, the generating module 604 may be specifically configured to:

Generating a meeting summary corresponding to the text information based on a text summarization algorithm, wherein the text summarization algorithm comprises at least one of the following: word frequency algorithm, clue word algorithm, location algorithm, title algorithm, vocabulary chain algorithm and associated network algorithm.
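Of the listed algorithms, the word frequency approach is the simplest to illustrate. The following is a minimal extractive sketch under stated assumptions (English text, sentences split on terminal punctuation, no stop-word removal); the function name and the scoring rule are illustrative and are not taken from the embodiment.

```python
import re
from collections import Counter


def summarize_by_word_frequency(text, max_sentences=1):
    """Pick the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        # A sentence scores the sum of the corpus frequencies of its words.
        return sum(freq[w] for w in re.findall(r"[a-z]+", sentence.lower()))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Emit the selected sentences in their original order.
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Because the score is an unnormalized sum, this sketch favors longer sentences; dividing by sentence length or filtering stop words are common refinements.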

In one embodiment, the recognition module 602, when specifically configured to convert received audio information to text information, is specifically configured to:

In this embodiment, the generating module 604 may be further specifically configured to generate a meeting summary corresponding to the tag according to the text information including the tag.

In one embodiment, the apparatus 600 for controlling a conference may further include:

a sending module (not shown in fig. 6) for sending the meeting summary to the participants in the meeting offer.

It should be clear that the invention is not limited to the specific arrangements and processes described in the foregoing embodiments and shown in the drawings. For convenience and brevity of description, detailed descriptions of known methods are omitted herein, and specific working processes of the above-described systems, modules and units may refer to corresponding processes in the above-described method embodiments, which are not repeated herein.

Fig. 7 shows a flow diagram of a method of controlling a conference according to another embodiment of the invention. As shown in fig. 7, in one embodiment, a method 700 of controlling a conference may include:

Step S710, analyzing the meeting offer to obtain a meeting flow, wherein the meeting flow comprises meeting flow keywords of each meeting stage;

Step S720, converting the received audio information into text information, and identifying audio keywords in the text information;

Step S730, when the audio keyword is successfully matched with the conference flow keyword, playing the conference topic corresponding to the successfully matched conference flow keyword.

In one embodiment, the meeting offer is an offer set based on a meeting offer template, which is a template preset according to the meeting scenario.

In one embodiment, the method 700 of controlling a conference may further include:

step S740, generating a meeting summary based on the text information.

step S750, sending a meeting summary to the participants in the meeting offer.
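Steps S710 to S750 can be read as a single pipeline. The sketch below wires the steps together with caller-supplied functions; every parameter name (`parse`, `transcribe`, `find_keywords`, `play`, `summarize`, `send`) is a hypothetical placeholder for a component the embodiment leaves unspecified, and the meeting flow is assumed to be a mapping from keyword to topic.

```python
def control_conference(offer, audio_chunks, *, parse, transcribe,
                       find_keywords, play, summarize, send):
    """Run steps S710-S750 over a sequence of audio chunks."""
    flow = parse(offer)                        # S710: keyword -> topic mapping
    transcript = []
    for chunk in audio_chunks:
        text = transcribe(chunk)               # S720: audio -> text
        transcript.append(text)
        for keyword in find_keywords(text):    # S720: audio keywords
            if keyword in flow:                # S730: successful match
                play(flow[keyword])
    summary = summarize(" ".join(transcript))  # S740: meeting summary
    send(summary)                              # S750: deliver to participants
    return summary
```

Passing the steps in as functions keeps the control flow independent of any particular speech recognizer, summarizer or delivery channel.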

In one embodiment, the step of converting the received audio information into text information in step S720 may specifically include:

step S721-01, obtaining received audio information, wherein the audio information comprises a speaker tag, and the speaker tag is used for identifying a speaker corresponding to the audio information;

Step S722-01, converting the audio information into text information including the speaker tag.

In this embodiment, the method 700 of controlling a conference may further include:

Step S723-01, generating a meeting summary corresponding to the speaker tag according to the text information including the speaker tag.

step S724-01, transmitting the meeting summary corresponding to the speaker tag to the participant in the meeting offer.
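Steps S721-01 to S723-01 amount to carrying the speaker tag through transcription and then grouping text by that tag. A minimal sketch follows; the `TaggedAudio` type, the `transcribe` callback and the use of plain concatenation as the "summary" are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaggedAudio:
    speaker: str    # speaker tag identifying who produced the audio
    samples: bytes  # raw audio, treated as opaque in this sketch


def to_tagged_text(clips, transcribe):
    """Convert each clip to text while keeping its speaker tag."""
    return [(clip.speaker, transcribe(clip.samples)) for clip in clips]


def summary_per_speaker(tagged_text):
    """Group text by speaker tag; concatenation stands in for summarization."""
    grouped = defaultdict(list)
    for speaker, text in tagged_text:
        grouped[speaker].append(text)
    return {speaker: " ".join(parts) for speaker, parts in grouped.items()}
```

A real implementation would replace the concatenation with a text summarization step applied to each speaker's text.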

Step S721-02, obtaining received audio information, wherein the audio information comprises a conference subject tag, and the conference subject tag is used for identifying a conference subject to which the audio information belongs;

Step S722-02, converting the audio information into text information including the conference subject tag.

Step S723-02, generating a meeting summary corresponding to the meeting topic label according to the text information comprising the meeting topic label.

step S724-02, sending the meeting summary corresponding to the meeting topic label to the participants in the meeting offer.

Step S721-03, acquiring received audio information, wherein the audio information comprises a conference subject label and a speaker label, the conference subject label is used for identifying a conference subject to which the audio information belongs, and the speaker label is used for identifying a speaker in the conference subject to which the audio information belongs;

Step S722-03, converting the audio information into text information including the conference subject tag and the speaker tag.

step S723-03, generating a meeting summary corresponding to the meeting topic tag and the speaker tag according to the text information including the meeting topic tag and the speaker tag.

Step S724-03, transmitting the meeting summary corresponding to the meeting topic label and the speaker label to the participants in the meeting offer.
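When both tags are present, the text can be organized first by conference subject and then by speaker. The sketch below assumes each record is a `(subject, speaker, text)` triple that already carries both tags; the function names and the plain-text layout of the rendered summary are illustrative only.

```python
def group_by_subject_and_speaker(records):
    """Build {subject: {speaker: [text, ...]}} from tagged text records."""
    minutes = {}
    for subject, speaker, text in records:
        minutes.setdefault(subject, {}).setdefault(speaker, []).append(text)
    return minutes


def render(minutes):
    """Render the grouped text as an indented plain-text summary."""
    lines = []
    for subject, speakers in minutes.items():
        lines.append(subject)
        for speaker, texts in speakers.items():
            lines.append(f"  {speaker}: {' '.join(texts)}")
    return "\n".join(lines)
```

Subjects and speakers appear in the order in which they were first heard, since Python dictionaries preserve insertion order.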

According to the method for controlling the conference, provided by the embodiment of the invention, the received conference offer is analyzed to obtain the conference flow, the audio keyword is identified, and under the condition that the audio keyword is successfully matched with the conference flow keyword, the conference topic corresponding to the successfully matched conference flow keyword can be played. By the automatic conference control process, conference work efficiency can be improved.

Fig. 8 is a schematic structural view showing an apparatus for controlling a conference according to another embodiment of the present invention. As shown in fig. 8, in one embodiment, an apparatus 800 for controlling a conference may include:

The parsing module 810 is configured to parse the meeting offer to obtain a meeting flow, where the meeting flow includes meeting flow keywords of each meeting stage;

An identification module 820 for converting the received audio information into text information and identifying audio keywords in the text information;

The control module 830 is configured to play the meeting topic corresponding to the successfully matched meeting flow keyword when the audio keyword and the meeting flow keyword are successfully matched.

In one embodiment, the apparatus 800 for controlling a conference may further include:

A generation module, configured to generate a meeting summary based on the text information.

In this embodiment, the apparatus 800 for controlling a conference may further include:

A sending module, configured to send the meeting summary to the participants in the meeting offer.

In one embodiment, the recognition module 820, when specifically configured to convert received audio information into text information, may be specifically configured to:

acquiring received audio information, wherein the audio information comprises a speaker tag, and the speaker tag is used for identifying a speaker corresponding to the audio information; the audio information is converted into text information including a speaker tag.

And the generation module is used for generating a meeting summary corresponding to the speaker tag according to the text information comprising the speaker tag.

and the sending module is used for sending the meeting summary corresponding to the speaker tag to the participants in the meeting offer.

Acquiring received audio information, wherein the audio information comprises a conference subject tag, and the conference subject tag is used for identifying a conference subject to which the audio information belongs; the audio information is converted to text information including a conference subject label.

The generation module is used for generating a meeting summary corresponding to the meeting topic label according to the text information comprising the meeting topic label.

and the sending module is used for sending the meeting summary corresponding to the meeting theme label to the participants in the meeting offer.

acquiring received audio information, wherein the audio information comprises a conference subject tag and a speaker tag, the conference subject tag is used for identifying a conference subject to which the audio information belongs, and the speaker tag is used for identifying a speaker in the conference subject to which the audio information belongs; the audio information is converted into text information including the conference subject tag and the speaker tag.

the generating module is used for generating meeting summary corresponding to the meeting topic label and the speaker label according to the text information comprising the meeting topic label and the speaker label.

and the sending module is used for sending the meeting summary corresponding to the meeting topic label and the speaker label to the participants in the meeting offer.

It should be clear that the invention is not limited to the specific arrangements and processes described in the foregoing embodiments and shown in the drawings. For convenience and brevity of description, detailed descriptions of known methods are omitted herein, and specific working processes of the above-described systems, modules and units may refer to corresponding processes in the method embodiment described above in connection with fig. 7, which are not repeated herein.

FIG. 9 is a schematic diagram of a speech processing system according to another embodiment of the present invention. As shown in fig. 9, the speech processing system 900 can include a sound sensor 910 and a speech processing device 920. The sound sensor is coupled to the speech processing device.

In one embodiment, a sound sensor 910 is used to receive audio information;

The voice processing device 920 is configured to: parse the meeting offer to obtain a meeting flow, where the meeting flow includes meeting flow keywords of each meeting stage; convert the received audio information into text information and identify the audio keywords in the text information; and, when an audio keyword is successfully matched with a meeting flow keyword, play the meeting topic corresponding to the successfully matched meeting flow keyword.

It should be noted that the speech processing system in fig. 9 may execute the method for controlling a conference in the embodiment of the present invention described above in connection with fig. 7. For convenience and brevity of description, detailed descriptions of known methods are omitted herein, and specific method steps for controlling the conference may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

Fig. 10 is a block diagram illustrating an exemplary hardware architecture of a computing device capable of implementing methods and apparatus for controlling conferences in accordance with embodiments of the present invention.

As shown in fig. 10, the computing device 1000 includes an input device 1001, an input interface 1002, a central processor 1003, a memory 1004, an output interface 1005, and an output device 1006. The input interface 1002, the central processor 1003, the memory 1004, and the output interface 1005 are connected to each other via the bus 710, and the input device 1001 and the output device 1006 are connected to the bus 710 via the input interface 1002 and the output interface 1005, respectively, and further connected to other components of the computing device 1000.

Specifically, the input device 1001 receives input information from the outside, and transmits the input information to the central processor 1003 through the input interface 1002; the central processor 1003 processes the input information based on computer executable instructions stored in the memory 1004 to generate output information, temporarily or permanently stores the output information in the memory 1004, and then transmits the output information to the output device 1006 through the output interface 1005; output device 1006 outputs output information to the outside of computing device 1000 for use by a user.

That is, in one embodiment, the computing device shown in FIG. 10 may also be implemented to include: a memory storing computer-executable instructions; and a processor that when executing the computer-executable instructions may implement the method of controlling a conference described in connection with fig. 1-6.

In one embodiment, the computing device shown in FIG. 10 may also be implemented to include: a memory storing computer-executable instructions; and a processor that when executing the computer-executable instructions can implement the method of controlling a conference described in connection with fig. 7.

The processes described above with reference to flowcharts may be implemented as computer software programs according to embodiments of the present invention. For example, embodiments of the present invention include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network, and/or installed from a removable storage medium.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method of controlling a conference, comprising:

Analyzing the received meeting offers to obtain meeting flows, wherein the meeting flows comprise a plurality of meeting issues, the meeting issues comprise main contents of meeting discussions, and the meeting flows comprise meeting flow keywords of each meeting stage; the analyzing the received meeting offer to obtain a meeting flow includes: analyzing the received meeting offer to obtain meeting issues of each meeting stage, and extracting the meeting flow keywords from the meeting issues;

When the received audio information is the audio including the tag, converting the audio information including the tag into text information including the tag, and identifying audio keywords in the text information of the tag;

when the audio keywords are successfully matched with the conference flow keywords, playing conference issues corresponding to the conference flow keywords which are successfully matched, starting to collect the utterances of the conference issues, inserting a tag into the audio information of each utterance to obtain the audio information comprising the tag, and converting the audio information comprising the tag into text information comprising the tag;

and generating a meeting summary corresponding to the tag based on the text information comprising the tag.

2. The method of controlling a conference according to claim 1, wherein,

The received meeting offer is an offer set based on a meeting offer template, which is a template preset based on a meeting scene.

3. The method of controlling a meeting of claim 1, wherein the received meeting offer comprises an email meeting offer.

4. The method of controlling a conference according to claim 1, wherein said identifying audio keywords in said text information comprises:

segmenting the text into one or more segmentation words, and marking the part of speech of each segmentation word;

and identifying the audio keywords in the text information based on the word segmentation marked with the part of speech.

5. The method for controlling a conference according to claim 1, wherein when the audio keyword is successfully matched with the conference flow keyword, playing a conference topic corresponding to the conference flow keyword successfully matched, comprising:

And when the audio keywords are the same as any conference flow keywords, determining that the audio keywords are successfully matched with the conference flow keywords, and playing conference issues corresponding to the successfully matched conference flow keywords.

6. The method of controlling a meeting of claim 1, wherein the generating a meeting summary based on the text information comprises:

7. The method of controlling a conference according to claim 1, wherein,

The tag comprises a speaker tag, and the speaker tag is used for identifying a speaker corresponding to the audio information; or

The tag comprises a conference subject tag, and the conference subject tag is used for identifying the conference subject to which the audio information belongs.

8. The method of controlling a conference according to claim 1, wherein,

The tag comprises a conference theme tag and a speaker tag, wherein the conference theme tag is used for identifying the conference topic to which the audio information belongs, and the speaker tag is used for identifying a speaker in the conference topic to which the audio information belongs.

9. The method of controlling a conference of claim 1, further comprising:

sending the meeting summary to the participants in the meeting offer.

10. A method of controlling a conference, comprising:

Analyzing a meeting offer to obtain a meeting flow, wherein the meeting flow comprises a plurality of meeting issues, the meeting issues comprise main contents of meeting discussion, and the meeting flow comprises meeting flow keywords of each meeting stage; the step of analyzing the meeting offer to obtain a meeting flow comprises the following steps: analyzing meeting offers to obtain meeting issues of each meeting stage, and extracting keywords of the meeting flow from the meeting issues;

When the received audio information is audio comprising a conference theme tag, converting the audio information comprising the conference theme tag into text information comprising the conference theme tag, and identifying audio keywords in the text information of the conference theme tag, wherein the conference theme tag is used for identifying a conference theme to which the audio information belongs;

When the audio keywords are successfully matched with the conference flow keywords, playing conference subjects corresponding to the conference flow keywords which are successfully matched, starting to collect the utterances of the conference subjects, inserting conference subject labels into the audio information of each utterance to obtain audio information comprising the conference subject labels, and converting the audio information comprising the conference subject labels into text information comprising the conference subject labels.

11. The method of controlling a conference according to claim 10, wherein,

The meeting offer is an offer set based on a meeting offer template, and the meeting offer template is a template preset according to a meeting scene.

12. The method of controlling a conference according to claim 10, wherein said converting the received audio information into text information comprises:

acquiring received audio information, wherein the audio information comprises a speaker tag, and the speaker tag is used for identifying a speaker corresponding to the audio information;

the audio information is converted into text information including the speaker tag.

13. The method of controlling a conference of claim 12, further comprising:

and generating a meeting summary corresponding to the speaker tag according to the text information comprising the speaker tag.

14. The method of controlling a conference of claim 13, further comprising:

and sending the conference summary corresponding to the speaker tag to the participants in the conference offer.

15. The method of controlling a conference of claim 10, further comprising:

and generating a meeting summary corresponding to the meeting topic label according to the text information comprising the meeting topic label.

16. The method of controlling a conference of claim 15, further comprising:

and sending the meeting summary corresponding to the meeting theme label to the participants in the meeting offer.

17. The method of controlling a conference according to claim 10, wherein said converting the received audio information into text information comprises:

acquiring received audio information, wherein the audio information comprises a conference subject tag and a speaker tag, the conference subject tag is used for identifying a conference subject to which the audio information belongs, and the speaker tag is used for identifying a speaker in the conference subject to which the audio information belongs;

And converting the audio information into text information comprising the conference theme tag and the speaker tag.

18. The method of controlling a conference of claim 17, further comprising:

and generating meeting summary corresponding to the meeting topic label and the speaker label according to the text information comprising the meeting topic label and the speaker label.

19. The method of controlling a conference of claim 18, further comprising:

and sending the conference summary corresponding to the conference theme label and the speaker label to the participants in the conference offer.

20. A speech processing system comprising: a sound sensor and a speech processing device, the sound sensor coupled with the speech processing device;

The sound sensor is used for receiving audio information;

The voice processing device is configured to parse a received meeting offer to obtain a meeting flow, where the meeting flow includes a plurality of meeting topics, the meeting topics include main contents of meeting discussion, the meeting flow includes meeting flow keywords of each meeting stage, and the parsing the received meeting offer to obtain the meeting flow includes: analyzing the received meeting offer to obtain meeting issues of each meeting stage, and extracting the meeting flow keywords from the meeting issues; when the received audio information is the audio including the tag, converting the audio information including the tag into the text information including the tag, identifying the audio keyword in the text information of the tag, playing the conference subjects corresponding to the conference flow keywords successfully matched when the audio keyword is successfully matched with the conference flow keywords, starting to collect the comments of the conference subjects, inserting the tag into the audio information of each comment to obtain the audio information including the tag, converting the audio information including the tag into the text information including the tag, and generating a meeting summary corresponding to the tag based on the text information including the tag.

21. A speech processing system comprising: a sound sensor and a speech processing device, the sound sensor coupled with the speech processing device;

The sound sensor is used for receiving audio information;

The voice processing device is configured to parse a meeting offer to obtain a meeting flow, where the meeting flow includes a plurality of meeting issues, the meeting issues include main contents of a meeting discussion, and the meeting flow includes meeting flow keywords of each meeting stage, where parsing the meeting offer to obtain the meeting flow includes: analyzing meeting offers to obtain meeting issues of each meeting stage, and extracting keywords of the meeting flow from the meeting issues; when the received audio information is audio including a conference theme tag, converting the audio information including the conference theme tag into text information including the conference theme tag, identifying an audio keyword in the text information of the conference theme tag, wherein the conference theme tag is used for identifying a conference theme to which the audio information belongs, when the audio keyword is successfully matched with the conference flow keyword, playing a conference topic corresponding to the conference flow keyword which is successfully matched, starting to collect speaking of the conference topic, inserting the conference theme tag into the audio information of each speaking to obtain the audio information including the conference theme tag, and converting the audio information including the conference theme tag into the text information including the conference theme tag.

22. An apparatus for controlling a conference, comprising:

the analysis module is used for analyzing the meeting offer to obtain a meeting flow, wherein the meeting flow comprises a plurality of meeting issues, the meeting issues comprise main contents of meeting discussion, and the meeting flow comprises meeting flow keywords of each meeting stage; the analyzing the received meeting offer to obtain a meeting flow includes: analyzing the received meeting offer to obtain meeting issues of each meeting stage, and extracting the meeting flow keywords from the meeting issues;

the identification module is used for converting the audio information comprising the tag into text information comprising the tag when the received audio information is the audio comprising the tag, and identifying audio keywords in the text information of the tag;

The control module is used for playing conference subjects corresponding to the conference flow keywords which are successfully matched when the audio keywords are successfully matched with the conference flow keywords, starting to collect the speech of the conference subjects, inserting a tag into the audio information of each speech to obtain the audio information comprising the tag, and converting the audio information comprising the tag into text information comprising the tag;

and the generation module is used for generating a meeting summary corresponding to the tag based on the text information comprising the tag.

23. An apparatus for controlling a conference, comprising:

The analysis module is used for analyzing the meeting offer to obtain a meeting flow, wherein the meeting flow comprises a plurality of meeting issues, the meeting issues comprise main contents of meeting discussion, and the meeting flow comprises meeting flow keywords of each meeting stage; the step of analyzing the meeting offer to obtain a meeting flow comprises the following steps: analyzing meeting offers to obtain meeting issues of each meeting stage, and extracting keywords of the meeting flow from the meeting issues;

the recognition module is used for converting the audio information comprising the conference theme tag into text information comprising the conference theme tag when the received audio information is audio comprising the conference theme tag, and recognizing audio keywords in the text information of the conference theme tag, wherein the conference theme tag is used for identifying a conference theme to which the audio information belongs;

And the control module is used for playing conference subjects corresponding to the conference flow keywords which are successfully matched when the audio keywords are successfully matched with the conference flow keywords, starting to collect the comments of the conference subjects, inserting conference subject labels into the audio information of each speech to obtain the audio information comprising the conference subject labels, and converting the audio information comprising the conference subject labels into text information comprising the conference subject labels.

24. An apparatus for controlling a conference, comprising a memory and a processor;

the memory is used for storing executable program codes;

The processor for reading executable program code stored in the memory to perform the method of controlling a conference of any of claims 1-9.

25. An apparatus for controlling a conference, comprising a memory and a processor;

the memory is used for storing executable program codes;

the processor for reading executable program code stored in the memory to perform the method of controlling a conference of any of claims 10-19.

26. A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method of controlling a conference as claimed in any one of claims 1-9.

27. A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method of controlling a conference as claimed in any one of claims 10-19.