CN113139392B

CN113139392B - Conference summary generation method, device and storage medium

Info

Publication number: CN113139392B
Application number: CN202010052589.5A
Authority: CN
Inventors: 李晓林; 曾小光; 毕驰
Original assignee: Qingdao Hisense Commercial Display Co Ltd
Current assignee: Qingdao Hisense Commercial Display Co Ltd
Priority date: 2020-01-17
Filing date: 2020-01-17
Publication date: 2023-08-15
Anticipated expiration: 2040-01-17
Also published as: CN113139392A

Abstract

The embodiments of the application provide a method, a device and a storage medium for generating a meeting summary. The method includes the following steps: a speaking terminal acquires voice data of a speaker and sends the voice data to a server; the server acquires the voice data of the speaker through the speaking terminal and converts the voice data into text; the server associates the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary; and the server sends the conference summary to the speaking terminal. Because the meeting summary is generated according to a preset format, this technical solution addresses the problems that manually compiled meeting summaries are error-prone, time-consuming and labor-intensive, and improves the efficiency of meeting summary generation.

Description

Conference summary generation method, device and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a method and apparatus for generating a meeting summary, and a storage medium.

Background

In daily office work, attending meetings has become an indispensable task, and a meeting summary is usually required to record and organize the remarks of the participants and to track follow-up tasks.

In practice, the conference summary is usually compiled by dedicated staff, who must record the important remarks of each participant in detail.

However, because of the large number of attendees and the language differences among them, recording the meeting summary manually is error-prone, may miss important content, and is time-consuming and labor-intensive.

Disclosure of Invention

The application provides a method, a device and a storage medium for generating a meeting summary, which are used to solve the problems that compiling meeting summaries in the existing manner is time-consuming, labor-intensive and error-prone.

In a first aspect, an embodiment of the present application provides a method for generating a meeting summary, including:

acquiring voice data of a speaker through a speaking terminal, and converting the voice data into a text;

associating the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary;

and sending the conference summary to the speaking terminal.

Further, the speaking terminals include a first speaking terminal and a second speaking terminal, and the device information of the first speaking terminal is stored in the server;

When the speaking terminal is the first speaking terminal and is a single-microphone device, or the speaking terminal is the second speaking terminal, converting the voice data into text includes:

converting the voice data into text through a voice transcription service.

Further, when the speaking terminal is the first speaking terminal and is a microphone array device, converting the voice data into text includes:

determining, according to the device information, whether a camera is provided on the speaking terminal;

if so, acquiring the number of speakers through the camera;

converting the voice data into text through a session listening and recording service or the voice transcription service according to the number of speakers;

if not, converting the voice data into text through the session listening and recording service.

Further, converting the voice data into text through the session listening and recording service or the voice transcription service according to the number of speakers includes:

if the number of speakers is greater than or equal to 2, converting the voice data into text through the session listening and recording service;

and if the number of speakers is equal to 1, converting the voice data into text through the voice transcription service.

Further, converting the voice data into text through the session listening and recording service includes:

acquiring, through the session listening and recording service, the identifier of each target speaker and the target voice data of that speaker from the voice data, the speakers including the target speakers;

for each target speaker, converting the target voice data into target text, the text including the target text;

correspondingly, associating the identifier of the speaker, the voice data and the text to generate a conference summary includes:

and for each target speaker, associating the identifier of the target speaker, the target voice data and the target text to generate the conference summary.

Further, if the speaking terminal is provided with a camera, the method further includes:

during the conference, receiving an update message from the speaking terminal, the update message including the changed number of speakers;

converting the changed voice data into changed text through the session listening and recording service or the voice transcription service according to the changed number of speakers;

and associating the identifier of the speaker, the voice data and the text, and associating the changed identifier of the speaker, the changed voice data and the changed text, to generate the conference summary.
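A minimal sketch of the server-side handling of such an update message follows. It assumes that transcription itself has already happened and is stubbed out as a list of (speaker identifier, voice reference, text) tuples; the class names, the `voice_ref` field and the service labels are hypothetical illustrations, not part of the described system. Entries recorded before the change are kept and entries for the changed speakers are appended, so the summary associates both.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical service labels; the text names the services but not any API.
SESSION_LISTENING = "session_listening_and_recording"
VOICE_TRANSCRIPTION = "voice_transcription"


@dataclass
class SummaryEntry:
    speaker_id: str
    voice_ref: str  # hypothetical reference to the stored audio segment
    text: str


@dataclass
class ConferenceSummary:
    entries: List[SummaryEntry] = field(default_factory=list)


def service_for(speaker_count: int) -> str:
    """2 or more speakers -> session listening and recording; exactly 1 -> voice transcription."""
    if speaker_count < 1:
        raise ValueError("speaker_count must be at least 1")
    return SESSION_LISTENING if speaker_count >= 2 else VOICE_TRANSCRIPTION


def on_update_message(
    summary: ConferenceSummary,
    changed_count: int,
    changed_segments: List[Tuple[str, str, str]],
) -> str:
    """Handle an update message carrying the changed number of speakers.

    Returns the service chosen for the changed voice data. changed_segments holds
    (speaker_id, voice_ref, text) tuples already produced by that service (stubbed).
    Earlier entries are kept, so the summary associates both old and changed data.
    """
    service = service_for(changed_count)
    for speaker_id, voice_ref, text in changed_segments:
        summary.entries.append(SummaryEntry(speaker_id, voice_ref, text))
    return service
```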

Further, the method further comprises:

translating the text into translated texts corresponding to a plurality of preset languages;

and associating the identifier of the speaker, the voice data, the text and the translated texts to generate the meeting summary.

Further, if the speaking terminal is the first speaking terminal, before the voice data of the speaker is acquired through the speaking terminal, the method includes:

receiving a registration message from the speaking terminal, the registration message including a conference number, an identifier of the speaking terminal and a device type, the device type being either a single-microphone device or a microphone array device;

verifying the speaking terminal according to the conference number;

correspondingly, acquiring the voice data of the speaker through the speaking terminal includes:

if the verification passes, receiving the voice data of the speaker from the speaking terminal, and acquiring the voice data of the speaker through a voice transcription service.

In a second aspect, an embodiment of the present application provides a method for generating a meeting summary, including:

acquiring voice data of a speaker, and sending the voice data to a server;

and receiving a meeting summary from the server.

Further, if the speaking terminal is provided with a camera, the method further includes: determining, through the camera, whether the number of speakers has changed during the conference;

if so, sending an update message to the server, the update message including the changed number of speakers.
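On the terminal side, the change check can be sketched as a simple poller that remembers the last count and reports only when it differs. The camera-based counter `count_people` and the transport `send_update` are hypothetical callables standing in for whatever the terminal actually uses, and the message fields are assumptions.

```python
from typing import Callable, Dict, Optional


class SpeakerCountMonitor:
    """Report the number of speakers to the server only when it changes.

    count_people and send_update are hypothetical callables: the first stands in
    for camera-based people counting, the second for the link to the server.
    """

    def __init__(self, count_people: Callable[[], int], send_update: Callable[[Dict], None]):
        self._count_people = count_people
        self._send_update = send_update
        self._last: Optional[int] = None

    def poll(self) -> bool:
        """Take one reading; return True if an update message was sent."""
        current = self._count_people()
        if self._last is None:
            # The first reading only establishes the baseline.
            self._last = current
            return False
        if current == self._last:
            return False
        self._last = current
        self._send_update({"type": "update", "speaker_count": current})
        return True
```

A real terminal would call `poll()` periodically while the conference is in progress.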

Further, if the speaking terminal is the first speaking terminal, before the voice data of the speaker is acquired, the method further includes:

sending a registration message to the server, the registration message including a conference number, an identifier of the speaking terminal and a device type, the device type being either a single-microphone device or a microphone array device.

In a third aspect, an embodiment of the present application provides a device for generating a meeting summary, including:

an acquisition module, configured to acquire voice data of a speaker through a speaking terminal and convert the voice data into text;

a processing module, configured to associate the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary;

and a sending module, configured to send the conference summary to the speaking terminal.

In a fourth aspect, an embodiment of the present application provides a device for generating a meeting summary, including:

an acquisition module, configured to acquire voice data of a speaker and send the voice data to the server;

and a receiving module, configured to receive the meeting summary from the server.

In a fifth aspect, an embodiment of the present application provides a server comprising a processor, a memory, a transceiver, and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the program.

In a sixth aspect, an embodiment of the present application provides a speaking terminal comprising a processor, a memory, a transceiver and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the second aspect when executing the program.

In a seventh aspect, embodiments of the present application provide a storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method according to the first aspect and any of the various possible implementations of the first aspect.

In the method, device and storage medium for generating a meeting summary provided by the embodiments of the application, the speaking terminal acquires the voice data of the speaker and sends the voice data to the server; the server acquires the voice data of the speaker through the speaking terminal and converts it into text; the server associates the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary; and the server sends the conference summary to the speaking terminal. Because the meeting summary is generated according to a preset format, this technical solution addresses the problems that manually compiled meeting summaries are error-prone, time-consuming and labor-intensive, and improves the efficiency of meeting summary generation.

Drawings

Fig. 1 is a schematic diagram of a teleconferencing system according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a reservation interface according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a conference setting interface according to an embodiment of the present application;

Fig. 4 is a first flowchart of a method for generating a meeting summary according to an embodiment of the present application;

Fig. 5 is a second flowchart of a method for generating a meeting summary according to an embodiment of the present application;

Fig. 6 is a third flowchart of a method for generating a meeting summary according to an embodiment of the present application;

Fig. 7 is a first schematic structural diagram of a device for generating a meeting summary according to an embodiment of the present application;

Fig. 8 is a second schematic structural diagram of a device for generating a meeting summary according to an embodiment of the present application;

Fig. 9 is a schematic diagram of the hardware structure of a server according to an embodiment of the present application;

Fig. 10 is a schematic diagram of the hardware structure of a speaking terminal according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

To address the problems that manually compiled meeting summaries are error-prone, time-consuming and labor-intensive, the embodiments of the application provide a method for generating a meeting summary. By generating the meeting summary according to a preset format, the method solves these problems and improves the efficiency of meeting summary generation.

Fig. 1 is a schematic diagram of a teleconference system according to an embodiment of the present application. As shown in Fig. 1, the teleconference system includes speaking terminals 101 and a server 102. The speaking terminals 101 include a first speaking terminal 1011 and a second speaking terminal 1012. The device information of the first speaking terminal 1011, such as the device model, hardware configuration information and software configuration information, is stored in the server 102; that is, the first speaking terminal 1011 and the server 102 may be devices adapted to the teleconference system, while the second speaking terminal 1012 is a third-party device. There are at least two speaking terminals 101.

The first speaking terminal 1011 may join the teleconference through conference software or a browser, and may correspond to a conference room that contains at least one speaker.

The conference system may further include a receiving terminal. The first speaking terminal 1011 and the second speaking terminal 1012 are terminals used for speaking, whereas the receiving terminal receives voice data but does not speak; the first speaking terminal 1011, the second speaking terminal 1012 and the receiving terminal are each connected to the server 102 via a network. Because the conference summary in this embodiment is generated from the speakers' speech and the receiving terminal does not speak, only the process by which the first speaking terminal 1011 and the second speaking terminal 1012 access the conference system is described.

Generally, before a teleconference formally starts, a pre-meeting reservation is required. When making the reservation, meeting information such as the time, date, name and documents of the meeting needs to be set; when reserving conference rooms, the name of each participating conference room, whether the terminal in that room is a speaking terminal, the identifier of the speaking terminal, the number of participants, the participant information in the room, the languages permitted in the conference, and the like need to be set. The participant information may be the user name and/or e-mail address of a participant. The identifier of a first speaking terminal may be an identifier preset for it, such as "Conference Room No. 1" or "Conference Room A301", and the identifiers of the first speaking terminals in the current conference may be displayed in the conference software while the conference is in progress. All speaking terminals involved in the pre-meeting reservation are first speaking terminals.

As an example, Fig. 2 is a schematic diagram of a reservation interface according to an embodiment of the present application. As shown in Fig. 2, the meeting information to be set during the pre-meeting reservation includes the meeting date, meeting time, duration, meeting theme, meeting content and meeting attachments; when reserving a conference room, the name of the conference room, whether its terminal is a speaking terminal, the identifier of the speaking terminal and the participant information in the room need to be filled in.

After the reservation is submitted, the user further configures the teleconference. Fig. 3 is a schematic diagram of a conference setting interface according to an embodiment of the present application. As shown in Fig. 3, the conference setting options include voice transcription, voice translation and session listening and recording, each of which can be enabled or disabled according to actual service requirements. Session listening and recording and voice translation cannot be enabled unless voice transcription is enabled. After the conference settings are submitted, the server may send the conference number to the speaking terminal of each conference room.
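The dependency rule among the setting options (session listening and recording and voice translation both require voice transcription) could be enforced when the settings are submitted, along the lines of the following sketch; the field names and messages are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConferenceSettings:
    voice_transcription: bool = False
    voice_translation: bool = False
    session_listening: bool = False


def validate_settings(s: ConferenceSettings) -> List[str]:
    """Return the rule violations; an empty list means the settings may be submitted."""
    problems: List[str] = []
    if not s.voice_transcription:
        if s.session_listening:
            problems.append("session listening and recording requires voice transcription")
        if s.voice_translation:
            problems.append("voice translation requires voice transcription")
    return problems
```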

In practice, the conference number may also be sent to a second speaking terminal as required by the business. In one possible application scenario, two first speaking terminals of a company join the conference through conference software or a browser, and a partner also needs to join the conference with the conference number through conference software or a browser. The terminal used by the partner is a second speaking terminal, and the server does not know its device information.

The technical scheme of the application is described in detail through specific examples. It should be noted that the following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

Fig. 4 is a flowchart of a method for generating a meeting summary according to an embodiment of the present application. As shown in Fig. 4, the method includes the following steps:

S101, the speaking terminal acquires voice data of a speaker and sends the voice data to the server.

S102, the server acquires the voice data of the speaker through the speaking terminal and converts the voice data into text.

Generally, when the scheduled time of a reserved teleconference arrives, the server may start a listening process to listen for registration messages from speaking terminals. A speaking terminal may join the conference through conference software or a browser. The speaking terminals include first speaking terminals and second speaking terminals; a first speaking terminal and the server may be devices adapted to the conference system, while a second speaking terminal is a third-party device. There are at least two speaking terminals; one speaking terminal is taken as an example below.

If the speaking terminal is a first speaking terminal, before the speaking terminal acquires the voice data of the speaker, the method further includes:

the speaking terminal sends a registration message to the server, the registration message including a conference number, an identifier of the speaking terminal and a device type, the device type being either a single-microphone device or a microphone array device. Correspondingly, the server receives the registration message from the speaking terminal and verifies the speaking terminal according to the conference number.

Specifically, after receiving the registration message, the server verifies the speaking terminal according to the conference number in the message: if the conference number sent by the speaking terminal is the same as that of the current conference, the speaking terminal passes verification; otherwise, verification fails.
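The verification step, together with the way the server tells first and second speaking terminals apart (a first speaking terminal's registration message carries a device type, a second speaking terminal's does not), can be sketched as follows. The message fields mirror the text; the string values used for the device types are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical encodings of the two device types named in the text.
SINGLE_MIC = "single_microphone"
MIC_ARRAY = "microphone_array"


@dataclass(frozen=True)
class RegistrationMessage:
    conference_number: str
    terminal_id: str
    device_type: Optional[str] = None  # only first speaking terminals send a device type


def verify_registration(msg: RegistrationMessage, current_conference_number: str) -> bool:
    """Verification passes only if the terminal reports the current conference number."""
    return msg.conference_number == current_conference_number


def is_first_speaking_terminal(msg: RegistrationMessage) -> bool:
    """A registration message with a device type comes from a first speaking terminal."""
    return msg.device_type is not None
```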

If the verification passes, the server may also respond to the registration message by sending the meeting information set during the reservation to the speaking terminal, so that the speaking terminal knows the outline of the conference it has joined; the speaking terminal can then carry out voice interaction with the other terminals in the conference. Specifically, the speaking terminal acquires the voice data of the speaker and sends it to the server; correspondingly, the server acquires the voice data of the speaker through the speaking terminal.

After acquiring the voice data of the speaker through the speaking terminal, the server may forward it to the other terminals in the conference. Because converting speech into text takes some time, the voice data can be recorded and stored through the voice transcription service when the voice transcription function is enabled. Accordingly, the server acquiring the voice data of the speaker through the speaking terminal includes:

if the verification passes, the server receives the voice data of the speaker from the speaking terminal and acquires the voice data of the speaker through the voice transcription service.

If the speaking terminal is a second speaking terminal, before the speaking terminal acquires the voice data of the speaker, the method further includes:

the speaking terminal sends a registration message to the server, the registration message including the conference number and the identifier of the speaking terminal. Correspondingly, the server receives the registration message and verifies the speaking terminal according to the conference number.

When the speaking terminal is a second speaking terminal, its identifier may be one that indicates the identity of the device, such as its serial number or identification code, so that the identifier can be displayed on the conference interface when the terminal joins the conference. The rest of the access procedure of the second speaking terminal is similar to that of the first speaking terminal and is not repeated here. The second speaking terminal may then acquire the voice data of the speaker and send it to the server.

The server may then convert the voice data into text; a specific implementation is described in the embodiment of Fig. 5.

S103, the server associates the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary.

S104, the server sends the conference summary to the speaking terminal.

The preset meeting summary format may be a table format, a text format, etc., and the speaker identifier may be a text user name, a pinyin user name, etc., which is not limited in this embodiment.
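As an illustration of associating the speaker identifier, the text and the voice data under a preset format, the sketch below renders the same records either as a table (columns: speaker identifier, text, voice data) or as plain text. The Markdown-style table, the `voice_ref` field and the format names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Utterance:
    speaker_id: str
    text: str
    voice_ref: str  # hypothetical link or path to the stored audio clip


def render_summary(utterances: Sequence[Utterance], fmt: str = "table") -> str:
    """Associate speaker identifier, text and voice data in one of two preset formats."""
    if fmt == "table":
        lines = ["| Speaker | Text | Voice data |", "| --- | --- | --- |"]
        lines += [f"| {u.speaker_id} | {u.text} | {u.voice_ref} |" for u in utterances]
    elif fmt == "text":
        lines = [f"{u.speaker_id}: {u.text} [{u.voice_ref}]" for u in utterances]
    else:
        raise ValueError(f"unknown summary format: {fmt}")
    return "\n".join(lines)
```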

In this embodiment, the server associates the identifier of the speaker, the voice data and the text according to the preset conference summary format to generate a conference summary and sends it to the speaking terminal, which receives the conference summary from the server. The server may also send the conference summary to all terminals that have joined the current conference.

In one possible implementation, the method further comprises:

the server translates the text into translated texts corresponding to a plurality of preset languages.

In this embodiment, the voice data of a speaker is usually converted into Chinese text. Because the languages permitted in the conference, for example Chinese, English and Japanese, are set during the pre-meeting reservation, the text is translated into translated texts for these preset languages; for example, the Chinese text is translated into English and Japanese.

Correspondingly, step S103 specifically includes:

associating the identifier of the speaker, the voice data, the text and the translated texts to generate the meeting summary.

The conference summary thus includes the identifier of the speaker, the voice data, the text and the translated texts.
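The translation step can be sketched as below, assuming a pluggable translator function; no particular translation service is implied, and the test uses a fake translator. Preset languages equal to the source language are skipped because the original text already covers them. `SummaryRow` and `voice_ref` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# (text, source_language, target_language) -> translated text; any real service is an assumption.
Translator = Callable[[str, str, str], str]


@dataclass
class SummaryRow:
    speaker_id: str
    voice_ref: str  # hypothetical reference to the stored audio
    text: str
    source_lang: str
    translations: Dict[str, str] = field(default_factory=dict)


def add_translations(row: SummaryRow, preset_langs: List[str], translate: Translator) -> SummaryRow:
    """Attach a translated text for every preset language other than the source language."""
    for lang in preset_langs:
        if lang != row.source_lang:
            row.translations[lang] = translate(row.text, row.source_lang, lang)
    return row
```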

Table 1 is a first example of a conference summary provided in an embodiment of the present application. As shown in Table 1, the conference summary is divided into three columns: the first column is the identifier of the speaker, the second column is the text of the speaker, and the third column is the voice data of the speaker. The preset languages are Chinese, English and Japanese, so the text of each speaker includes the Chinese text and the translated texts in English and Japanese. As can be seen from Table 1, there are three speaking terminals.

TABLE 1

In one possible implementation, the speaking terminal may also display the conference summary, for example in the interface of the conference software, so that users can view the summary of the current conference in real time, which improves the user experience.

The method for generating a meeting summary provided by this embodiment includes the following steps: the speaking terminal acquires the voice data of the speaker and sends it to the server; the server acquires the voice data of the speaker through the speaking terminal and converts it into text; the server associates the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary; and the server sends the conference summary to the speaking terminal. Because the meeting summary is generated according to a preset format, this technical solution addresses the problems that manually compiled meeting summaries are error-prone, time-consuming and labor-intensive, and improves the efficiency of meeting summary generation.

On the basis of the foregoing embodiment, Fig. 5 is a second flowchart of a method for generating a conference summary according to an embodiment of the present application. As shown in Fig. 5, the step in which the server converts the voice data into text specifically includes:

S201, when the speaking terminal is a first speaking terminal and is a single-microphone device, or the speaking terminal is a second speaking terminal, the server converts the voice data into text through the voice transcription service.

From the registration message received from a speaking terminal, the server can determine whether that terminal is a first or a second speaking terminal: the registration message sent by a first speaking terminal includes a device type, whereas the registration message sent by a second speaking terminal does not.

In this embodiment, when the server determines from the registration message that the speaking terminal is a first speaking terminal whose device type is single-microphone device, the number of speakers corresponding to that terminal is 1, and the server converts the voice data into text through the voice transcription service.

When the number of speakers is 1, the identity (identifier) of the speaker is already determined, because the participant set for each first speaking terminal during the pre-meeting reservation is the speaker corresponding to that terminal.

When the server determines from the registration message that the speaking terminal is a second speaking terminal, that is, a third-party device, the number of speakers corresponding to it is regarded as 1, and the voice data is converted into text through the voice transcription service.

S202, when the speaking terminal is a first speaking terminal and is a microphone array device, the server determines, according to the device information, whether a camera is provided on the speaking terminal.

Because the session listening and recording service is a billed service, selecting it only when needed saves cost. In this embodiment, when the server determines from the received registration message that the speaking terminal is a first speaking terminal and a microphone array device, meaning that the terminal corresponds to at least one speaker, the server determines, according to the stored device information of the speaking terminal, whether a camera is provided on it. The session listening and recording service provides identity recognition functions such as voiceprint recognition.

If so, steps S203 and S204 are performed; if not, step S205 is performed.

S203, the server acquires the number of the speakers through the camera.

S204, the server converts the voice data into text through the session listening and recording service or the voice transcription service according to the number of speakers.

When the first speaking terminal is a microphone array device, it corresponds to at least one speaker. If the speaking terminal is provided with a camera, the server issues a photographing instruction to the speaking terminal; in response, the speaking terminal takes a photograph through the camera and uploads it to the server, and the server determines the number of speakers from the photograph.

Converting the voice data into text through the session listening and recording service or the voice transcription service according to the number of speakers specifically includes:

if the number of speakers is greater than or equal to 2, converting the voice data into text through the session listening and recording service;

if the number of speakers is equal to 1, converting the voice data into text through the voice transcription service.

In this embodiment, although the participants corresponding to each first speaking terminal are set during the pre-meeting reservation, when there are 2 or more speakers (that is, participants) the voice data of the individual speakers cannot otherwise be told apart, so the voice data is converted into text through the session listening and recording service; when there is only 1 speaker, the voice data is converted into text through the voice transcription service.

S205, the server converts the voice data into text through the session listening and recording service.
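Steps S201 to S205 amount to a small decision procedure, sketched below under stated assumptions: a missing device type marks a second speaking terminal, the camera flag comes from the stored device information, and `count_speakers` is a hypothetical callable standing in for the photograph-based count of step S203. It is only invoked when the camera branch is reached, so the photograph is taken only when it can change the outcome.

```python
from dataclasses import dataclass
from typing import Callable, Optional

VOICE_TRANSCRIPTION = "voice_transcription"
SESSION_LISTENING = "session_listening_and_recording"


@dataclass(frozen=True)
class Terminal:
    device_type: Optional[str]  # None: second (third-party) speaking terminal
    has_camera: bool = False    # taken from the stored device information


def choose_service(term: Terminal, count_speakers: Callable[[], int]) -> str:
    # S201: second speaking terminal, or first speaking terminal with a single microphone.
    if term.device_type is None or term.device_type == "single_microphone":
        return VOICE_TRANSCRIPTION
    if term.device_type != "microphone_array":
        raise ValueError(f"unknown device type: {term.device_type}")
    # S202: microphone array device; go to S205 if no camera is provided.
    if not term.has_camera:
        return SESSION_LISTENING
    # S203/S204: count the speakers via the camera, then choose by count.
    return SESSION_LISTENING if count_speakers() >= 2 else VOICE_TRANSCRIPTION
```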

In this embodiment, a speaking terminal is taken as an example, and in steps S204 to S205, voice data is converted into text through a session listening service, which specifically includes:

obtaining, from the voice data through the conversation listening and recording service, the identification and the target voice data of each target speaker, where the speakers include the target speakers;

for each target speaker, converting the target voice data into target text, where the text includes the target text.

The conversation listening and recording service can convert the sound signal into an electrical signal, which is then recognized by a computer to determine the identity of the speaker. The speakers include at least one target speaker, the target voice data is the voice data of a target speaker, and the correspondence between the target voice data and the target speaker can be determined through the conversation listening and recording service.

Before the meeting starts, the server may store the correspondence between the voiceprint features of all speakers and their identifications. In this embodiment, the conversation listening and recording service determines, from the voice data, the target voice data and the voiceprint feature of each target speaker; the identification of each target speaker is then found by comparing these voiceprint features against the pre-stored correspondence. This yields the correspondence between the identification of each target speaker and that speaker's target voice data.
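The voiceprint comparison can be illustrated with a simple best-match search. This sketch assumes, beyond what the embodiment states, that each voiceprint feature is a fixed-length numeric vector, that similarity is measured by cosine similarity, and that a match must reach a hypothetical threshold of 0.8; the function names and the identifications `spk-01` and `spk-02` are illustrative only.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    feature: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the identification whose pre-stored voiceprint best matches
    `feature`, or None if no stored voiceprint reaches the threshold."""
    best_id: Optional[str] = None
    best_score = threshold
    for speaker_id, stored in enrolled.items():
        score = cosine_similarity(feature, stored)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id


# Pre-stored correspondence: identification -> voiceprint feature (toy vectors).
enrolled = {"spk-01": [1.0, 0.0, 0.0], "spk-02": [0.0, 1.0, 0.0]}
print(identify_speaker([0.9, 0.1, 0.0], enrolled))  # spk-01
```

Returning None for weak matches leaves room for an unknown-speaker entry instead of forcing a wrong identification.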

The conversation listening and recording service may then convert the target voice data of each target speaker into target text; the text comprises these target texts.

Correspondingly, associating the identification of the speaker, the voice data and the text to generate a meeting summary comprises:

for each target speaker, associating the identification of the target speaker, the target voice data and the target text to generate the conference summary.
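The association step can be pictured as building one record per target speaker and laying the records out in the preset format. In this hedged sketch, the record type, the function names and the use of a file name to stand in for the stored voice data are assumptions; the column order follows Table 2 (identification, text, voice data).

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SummaryEntry:
    """One row of the conference summary: who spoke, what was said, and the audio."""
    speaker_id: str
    text: str
    voice_data_ref: str  # assumption: a reference (e.g. file name) to the stored audio


def build_summary(segments: Iterable[Tuple[str, str, str]]) -> List[SummaryEntry]:
    """Associate (identification, target voice data, target text) per target speaker,
    keeping the order in which the segments were produced."""
    return [SummaryEntry(speaker_id, text, voice_ref)
            for speaker_id, voice_ref, text in segments]


def render_table(entries: List[SummaryEntry]) -> str:
    """Lay the entries out in the three-column format of Table 2, tab-separated."""
    header = "Speaker identification\tText\tVoice data"
    rows = [f"{e.speaker_id}\t{e.text}\t{e.voice_data_ref}" for e in entries]
    return "\n".join([header, *rows])


entries = build_summary([
    ("Target speaker 1", "audio_1.wav", "Target text 1"),
    ("Target speaker 2", "audio_2.wav", "Target text 2"),
])
print(render_table(entries))
```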

Table 2 is a second example of a conference summary provided in an embodiment of the present application. As shown in Table 2, the conference summary uses a table format with three columns: the first column is the identification of the speaker, the second column is the text of the speaker, and the third column is the voice data of the speaker. In this example there are 3 speaking terminals, each corresponding to two target speakers. The target speakers are target speaker 1, target speaker 2 … target speaker 6; the corresponding texts are target text 1, target text 2 … target text 6; and the corresponding voice data are target voice data 1, target voice data 2 … target voice data 6.

TABLE 2

Speaker identification	Text	Voice data
Target speaker 1	Target text 1	Target voice data 1
Target speaker 2	Target text 2	Target voice data 2
Target speaker 3	Target text 3	Target voice data 3
Target speaker 4	Target text 4	Target voice data 4
Target speaker 5	Target text 5	Target voice data 5
Target speaker 6	Target text 6	Target voice data 6

The method for generating a meeting summary provided by this embodiment works as follows. When the speaking terminal is a first speaking terminal and a single microphone device, or is a second speaking terminal, the server converts the voice data into text through the voice transcription service. When the speaking terminal is a first speaking terminal and a microphone array device, the server determines from the device information whether a camera is provided on the speaking terminal. If so, the server acquires the number of speakers through the camera and converts the voice data into text through the conversation listening and recording service or the voice transcription service according to that number; if not, the server converts the voice data into text through the conversation listening and recording service. Choosing between the two services based on the number of speakers at the first speaking terminal saves cost while still converting the voice data into text.
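The complete server-side decision described in this embodiment can be condensed into one function. This is a sketch under assumed inputs: the terminal kind, device type, camera flag and camera-derived count are passed in directly, and the device-type strings `single_microphone` and `microphone_array` are hypothetical labels for the device types carried in the registration message.

```python
from typing import Optional

CONVERSATION_LISTENING = "conversation listening and recording service"
VOICE_TRANSCRIPTION = "voice transcription service"


def choose_service(is_first_terminal: bool,
                   device_type: str,
                   has_camera: bool,
                   camera_speaker_count: Optional[int] = None) -> str:
    """Return the service that converts a terminal's voice data into text.

    device_type uses hypothetical labels: "single_microphone" or "microphone_array".
    """
    # Second speaking terminals, and first terminals that are single microphone
    # devices, use the voice transcription service directly.
    if not is_first_terminal or device_type == "single_microphone":
        return VOICE_TRANSCRIPTION
    if device_type != "microphone_array":
        raise ValueError(f"unknown device type: {device_type}")
    # Microphone array without a camera: speakers cannot be counted,
    # so the conversation listening and recording service is used.
    if not has_camera:
        return CONVERSATION_LISTENING
    if camera_speaker_count is None:
        raise ValueError("camera present but no speaker count supplied")
    # Microphone array with a camera: decide by the counted number of speakers.
    return CONVERSATION_LISTENING if camera_speaker_count >= 2 else VOICE_TRANSCRIPTION
```

For example, `choose_service(True, "microphone_array", True, 1)` selects the voice transcription service, while the same terminal without a camera falls back to the conversation listening and recording service.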

On the basis of the foregoing embodiment, Fig. 6 is a flowchart of a method for generating a conference summary according to an embodiment of the present application. As shown in Fig. 6, where the speaking terminal is a first speaking terminal provided with a camera, the method includes:

S301, during the conference, the speaking terminal determines through the camera whether the number of speakers has changed.

S302, if so, the speaking terminal sends an update message to the server, where the update message includes the changed number of speakers.

If the speaking terminal is a first speaking terminal provided with a camera, it can determine through the camera during the conference whether the number of speakers has changed. If so, it sends the server an update message containing the changed number of speakers, and the server correspondingly receives the update message from the speaking terminal.
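On the terminal side, S301 and S302 reduce to comparing each new camera count with the last reported one and emitting an update message only on a change. In this minimal sketch, `SpeakerCountMonitor` and `UpdateMessage` are hypothetical names, the camera's people counting is abstracted away as an integer input, and the `terminal_id` field is an assumption (the embodiment only requires the changed number of speakers).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UpdateMessage:
    """Update message of S302; terminal_id is an assumed extra field."""
    terminal_id: str
    num_speakers: int  # the changed number of speakers


class SpeakerCountMonitor:
    """Terminal-side check of S301: report only when the camera count changes."""

    def __init__(self, terminal_id: str, initial_count: int) -> None:
        self.terminal_id = terminal_id
        self.current = initial_count  # last count known to the server

    def observe(self, counted: int) -> Optional[UpdateMessage]:
        """Feed one speaker count from the camera; return a message on change."""
        if counted == self.current:
            return None  # unchanged: nothing is sent to the server
        self.current = counted
        return UpdateMessage(self.terminal_id, counted)


monitor = SpeakerCountMonitor("terminal-01", initial_count=2)
print(monitor.observe(2))  # None
print(monitor.observe(1))  # UpdateMessage(terminal_id='terminal-01', num_speakers=1)
```

Sending only on change keeps traffic to the server proportional to actual changes in the room rather than to the camera's sampling rate.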

S303, the server converts the post-change voice data into post-change text through the conversation listening and recording service or the voice transcription service according to the changed number of speakers.

Specifically, if the changed number of speakers is greater than or equal to 2, the server acquires the post-change voice data through the speaking terminal and converts it into post-change text through the conversation listening and recording service; if the changed number of speakers is equal to 1, the server converts the post-change voice data into post-change text through the voice transcription service. Thus, when the number of speakers drops to 1 during the meeting, the server switches from the conversation listening and recording service to the voice transcription service, which saves cost.
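Server-side handling of the update message in S303 can be sketched as a per-terminal record of the active service that is re-evaluated whenever a new count arrives. The class name, the in-memory dictionary and the switch log are assumptions made for illustration; actually starting and stopping the underlying services is outside this sketch.

```python
from typing import Dict, List, Tuple

CONVERSATION_LISTENING = "conversation listening and recording service"
VOICE_TRANSCRIPTION = "voice transcription service"


class ServiceSwitcher:
    """Keep one active conversion service per speaking terminal (hypothetical)."""

    def __init__(self) -> None:
        self.active: Dict[str, str] = {}
        self.switch_log: List[Tuple[str, str, str]] = []  # (terminal, old, new)

    @staticmethod
    def service_for(num_speakers: int) -> str:
        # Same threshold as S204: 2 or more speakers need speaker separation.
        return CONVERSATION_LISTENING if num_speakers >= 2 else VOICE_TRANSCRIPTION

    def on_update(self, terminal_id: str, num_speakers: int) -> str:
        """Apply an update message; post-change voice data uses the returned service."""
        new = self.service_for(num_speakers)
        old = self.active.get(terminal_id)
        if old is not None and old != new:
            self.switch_log.append((terminal_id, old, new))
        self.active[terminal_id] = new
        return new


switcher = ServiceSwitcher()
switcher.on_update("terminal-01", 3)        # conversation listening and recording
print(switcher.on_update("terminal-01", 1))  # voice transcription service
```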

Correspondingly, according to the preset conference summary format, the server associates the speaker identification, the voice data and the text, and associates the post-change speaker identification, the post-change voice data and the post-change text, to generate the meeting summary.

The conference summary therefore includes the speaker identification, voice data and text, as well as the post-change speaker identification, voice data and text.
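A conference summary that covers both periods can be assembled by merging the two sets of entries in time order. This sketch assumes each entry carries a start offset in seconds, which the embodiment does not specify, and flags post-change entries with a hypothetical `after_change` field.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Entry:
    start_s: float        # assumed: utterance offset from meeting start, in seconds
    speaker_id: str
    text: str
    voice_data_ref: str   # assumed: reference to the stored audio clip
    after_change: bool = False


def merge_summary(before: List[Entry], after: List[Entry]) -> List[Entry]:
    """Combine pre-change and post-change entries into one summary in time order,
    flagging the post-change entries."""
    tagged = [replace(e, after_change=True) for e in after]
    return sorted(before + tagged, key=lambda e: e.start_s)


summary = merge_summary(
    [Entry(0.0, "spk-01", "Opening remarks", "a1.wav"),
     Entry(30.0, "spk-02", "Status update", "a2.wav")],
    [Entry(95.0, "spk-01", "Closing remarks", "a3.wav")],
)
for e in summary:
    print(e.speaker_id, e.text, e.after_change)
```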

In the method for generating a meeting summary provided by this embodiment, the speaking terminal determines through the camera during the conference whether the number of speakers has changed. If so, it sends the server an update message containing the changed number of speakers, and the server converts the post-change voice data into post-change text through the conversation listening and recording service or the voice transcription service according to that number. Tracking the number of speakers throughout the conference determines which service to run, saving cost while the voice data is converted into text.

The following are apparatus embodiments of the present application, which may be used to perform the method embodiments of the present application. For details not disclosed in the apparatus embodiments, refer to the method embodiments.

Fig. 7 is a first schematic structural diagram of an apparatus for generating a meeting summary according to an embodiment of the present application. In this embodiment, the apparatus may be integrated in a server. As shown in Fig. 7, the apparatus may include:

an acquisition module 71, configured to acquire voice data of a speaker through a speaking terminal, and a transmission module 72, configured to convert the voice data into text;

a processing module 73, configured to correlate the speaker identifier, the voice data, and the text according to a preset conference summary format, to generate a conference summary;

and the sending module 72 is further configured to send the conference summary to the speaking terminal.

When the speaking terminal is a first speaking terminal and a single microphone device, or is a second speaking terminal, the processing module 73 is specifically configured to:

convert the voice data into text through the voice transcription service.

Further, when the speaking terminal is a first speaking terminal and a microphone array device, the processing module 73 is specifically configured to:

determine, according to the device information, whether a camera is provided on the speaking terminal;

if so, acquire the number of speakers through the camera;

Further, the processing module 73 is specifically configured to:

a receiving module 74, configured to receive an update message from the speaking terminal during the conference, where the update message includes the changed number of speakers;

the processing module 73 is further configured to convert the post-change voice data into post-change text through the conversation listening and recording service or the voice transcription service according to the changed number of speakers;

the processing module 73 is specifically configured to:

Further, the processing module 73 is further configured to:

the processing module 73 is specifically configured to:

Further, if the speaking terminal is the first speaking terminal, the receiving module 74 is further configured to:

the processing module 73 is further configured to:

verify the speaking terminal according to the conference number;

the obtaining module 71 is specifically configured to:

The apparatus for generating a conference summary in this embodiment may execute the technical solution of the server-side method; for its specific implementation process and technical principles, refer to the related description of the method above, which is not repeated here.

Fig. 8 is a second schematic structural diagram of an apparatus for generating a meeting summary according to an embodiment of the present application. In this embodiment, the apparatus may be integrated in the speaking terminal. As shown in Fig. 8, the apparatus may include:

an obtaining module 81, configured to obtain voice data of a speaker and send the voice data to a server;

a receiving module 82, configured to receive the meeting summary from the server;

a processing module 83, configured to determine through the camera, during the conference, whether the number of speakers has changed;

a sending module 84, configured to send, if so, an update message to the server, where the update message includes the changed number of speakers.

Further, if the speaking terminal is the first speaking terminal, the sending module 84 is further configured to:

The apparatus for generating a conference summary in this embodiment may execute the technical solution of the method on the speaking terminal side; for its specific implementation process and technical principles, refer to the related description of the method above, which is not repeated here.

Fig. 9 is a schematic diagram of the hardware structure of a server according to an embodiment of the present application. As shown in Fig. 9, the server in this embodiment may include a processor, a memory and a transceiver.

A memory for storing a computer program (e.g., an application program, a functional module, etc. for implementing the above method), computer instructions, etc.;

The computer programs, computer instructions and the like may be stored in partitions across one or more memories, and the computer programs, computer instructions, data and the like may be invoked by the processor.

A processor for executing the computer program stored in the memory to implement the steps in the method according to the above embodiment.

Reference may be made in particular to the description of the embodiments of the method described above.

The processor, the memory and the transceiver may be separate structures or may be integrated together. When they are separate structures, they may be coupled through a bus.

The server in this embodiment may execute the technical solution of the server-side method; for its specific implementation process and technical principles, refer to the related description of the method, which is not repeated here.

Fig. 10 is a schematic diagram of the hardware structure of a speaking terminal provided in an embodiment of the present application. As shown in Fig. 10, the speaking terminal in this embodiment may include a processor, a memory and a transceiver.

The speaking terminal in this embodiment may execute the technical solution of the method on the speaking terminal side; for its specific implementation process and technical principles, refer to the related description of the method, which is not repeated here.

In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions; when at least one processor of a user device executes the computer-executable instructions, the user device performs the methods described above.

Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a user device. The processor and the storage medium may also reside as discrete components in a communication device.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

The present application also provides a program product comprising a computer program stored in a readable storage medium, from which the computer program can be read by at least one processor of a server, the at least one processor executing the computer program causing the server to implement the method of any one of the embodiments of the present application described above.

Claims

1. A method for generating a conference summary, applied to a server, characterized by comprising the following steps:

acquiring voice data of a speaker through a speaking terminal; and, when it is determined by receiving a registration message from the speaking terminal that the speaking terminal is a first speaking terminal and a microphone array device, determining according to device information whether a camera is provided on the speaking terminal, wherein the registration message comprises a conference number, an identification of the speaking terminal and a device type;

if so, acquiring the number of speakers through the camera; if the number of speakers is greater than or equal to 2, converting the voice data into text through a conversation listening and recording service; if the number of speakers is equal to 1, converting the voice data into text through a voice transcription service, wherein the conversation listening and recording service has an identity recognition function;

if not, converting the voice data into text through the conversation listening and recording service;

sending the conference summary to the speaking terminal;

if the speaking terminal is provided with a camera, receiving, during the conference, an update message from the speaking terminal, wherein the update message comprises the changed number of speakers;

correspondingly, according to a preset conference summary format, associating the identification of the speaker, the voice data and the text, and associating the post-change speaker identification, the post-change voice data and the post-change text, to generate the conference summary.

2. The method of claim 1, wherein the speaking terminals include a first speaking terminal and a second speaking terminal, and wherein device information of the first speaking terminal is stored in the server;

the voice data is converted to text by a voice transcription service.

3. The method of claim 1, wherein converting the voice data into text through the conversation listening and recording service comprises:

4. The method according to claim 1, wherein the method further comprises:

5. The method according to claim 2, wherein, if the speaking terminal is a first speaking terminal, acquiring the voice data of the speaker through the speaking terminal comprises:

verifying the speaking terminal according to the conference number;

6. A method for generating a conference summary, applied to a speaking terminal, characterized by comprising the following steps:

acquiring voice data of a speaker and sending it to a server, so that when the server determines, by receiving a registration message from the speaking terminal, that the speaking terminal is a first speaking terminal and a microphone array device, the server determines according to device information whether a camera is provided on the speaking terminal, wherein the registration message comprises a conference number, an identification of the speaking terminal and a device type; if so, the server acquires the number of speakers through the camera; if the number of speakers is greater than or equal to 2, the server converts the voice data into text through a conversation listening and recording service; if the number of speakers is equal to 1, the server converts the voice data into text through a voice transcription service, wherein the conversation listening and recording service has an identity recognition function; if not, the server converts the voice data into text through the conversation listening and recording service;

receiving a meeting summary from the server;

if the speaking terminal is provided with a camera, determining through the camera, during the conference, whether the number of speakers has changed;

if so, sending an update message to the server, so that the server converts the post-change voice data into post-change text through the conversation listening and recording service or the voice transcription service according to the changed number of speakers; correspondingly, the server associates, according to a preset conference summary format, the identification of the speaker, the voice data and the text, and associates the post-change speaker identification, the post-change voice data and the post-change text, to generate the conference summary; the update message includes the changed number of speakers.

7. The method of claim 6, wherein the device types include a single microphone device and a microphone array device.

8. A server comprising a processor, a memory, a transceiver and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 5 when executing the program.

9. A speaking terminal comprising a processor, a memory, a transceiver and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of claim 6 or 7 when executing the program.