CN113139392A - Method and device for generating conference summary and storage medium

Info

Publication number: CN113139392A
Application number: CN202010052589.5A
Authority: CN
Inventors: 李晓林; 曾小光; 毕驰
Original assignee: Qingdao Hisense Commercial Display Co Ltd
Current assignee: Qingdao Hisense Commercial Display Co Ltd
Priority date: 2020-01-17
Filing date: 2020-01-17
Publication date: 2021-07-20
Anticipated expiration: 2040-01-17
Also published as: CN113139392B

Abstract

The embodiment of the application provides a method, a device and a storage medium for generating a conference summary. In the method, a speaking terminal obtains voice data of a speaker and sends the voice data to a server; the server obtains the voice data of the speaker through the speaking terminal and converts the voice data into text; the server associates the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary; and the server sends the conference summary to the speaking terminal. Because the conference summary is generated according to the preset conference summary format, this technical scheme solves the problems that manual conference summary preparation is error-prone, time-consuming and labor-intensive, and improves the efficiency of generating the conference summary.

Description

Method and device for generating conference summary and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for generating a conference summary, and a storage medium.

Background

In daily office work, attending conferences has become indispensable, and a conference summary is generally required during a conference to collect and organize the participants' speeches and to follow up on subsequent work.

In practice, dedicated staff usually collect and organize the speech of each participant to produce the conference summary, and they need to record each participant's important remarks in detail.

However, because there are many participants and they may speak different languages, manually recording the conference summary is error-prone, tends to miss important content, and is time-consuming and labor-intensive.

Disclosure of Invention

The application provides a method and a device for generating a conference summary, and a storage medium, which are used to solve the problems that existing conference summary preparation is time-consuming, labor-intensive and error-prone.

In a first aspect, an embodiment of the present application provides a method for generating a conference summary, including:

acquiring voice data of a speaker through a speaking terminal, and converting the voice data into a text;

according to a preset conference summary format, associating the identifier of the speaker, the voice data and the text to generate a conference summary;

and sending the conference summary to the speaking terminal.

Furthermore, the speaking terminal comprises a first speaking terminal and a second speaking terminal, and the equipment information of the first speaking terminal is stored in the server;

when the speaking terminal is a first speaking terminal and is a single-microphone device, or when the speaking terminal is a second speaking terminal, the converting the voice data into a text includes:

converting the voice data into text through a voice transcription service.

Further, when the speaking terminal is a first speaking terminal and is a microphone array device, the converting the speech data into text includes:

determining whether a camera is arranged on the speaking terminal or not according to the equipment information;

if yes, acquiring the number of the speakers through the camera;

converting the voice data into a text through a session listening and recording service or a voice transcription service according to the number of the speakers;

if not, the voice data is converted into a text through the session listening and recording service.

Further, the converting the voice data into text through a session listening and recording service or a voice transcription service according to the number of speakers includes:

if the number of the speakers is more than or equal to 2, converting the voice data into a text through a session listening and recording service;

and if the number of the speakers is equal to 1, converting the voice data into a text through a voice transcription service.

Further, the converting the voice data into text through the session listening and recording service includes:

acquiring the identification of each target speaker and target voice data from the voice data through session listening and recording service, wherein the speakers comprise the target speakers;

for each target speaker, converting the target speech data into a target text, the text comprising the target text;

correspondingly, the associating the identifier of the speaker, the voice data, and the text to generate a conference summary includes:

and for each target speaker, associating the identifier of the target speaker, the target voice data and the target text to generate the conference summary.

Further, if a camera is disposed on the speaking terminal, the method further includes:

receiving an update message from the speaking terminal in a conference process, wherein the update message comprises the number of changed speakers;

converting the changed voice data into a changed text through a session listening and recording service or a voice transcription service according to the number of the changed speakers;

and associating the identifier of the speaker, the voice data and the text, and associating the changed identifier of the speaker, the changed voice data and the changed text to generate the conference summary.

Further, the method further comprises:

translating the text into a translation text corresponding to a plurality of preset languages;

and associating the identifier of the speaker, the voice data, the text and the translation text to generate the conference summary.

Further, if the speaking terminal is the first speaking terminal, before the speech data of the speaker is acquired by the speaking terminal, the method includes:

receiving a registration message from the speaking terminal, wherein the registration message comprises a conference number, an identifier of the speaking terminal and a device type, and the device type comprises a single microphone device and a microphone array device;

verifying the speaking terminal according to the conference number;

correspondingly, the acquiring voice data of the speaker through the speaking terminal includes:

and if the verification is passed, receiving voice data of the speaker from the speaking terminal, and acquiring the voice data of the speaker through a voice transcription service.

In a second aspect, an embodiment of the present application provides a method for generating a conference summary, including:

acquiring voice data of a speaker, and sending the voice data to a server;

receiving a conference summary from the server.

during the conference, determining through the camera whether the number of speakers has changed;

and if so, sending an update message to the server, wherein the update message comprises the number of the changed speakers.

Further, if the speaking terminal is the first speaking terminal, before the acquiring the voice data of the speaker, the method further includes:

and sending a registration message to the server, wherein the registration message comprises a conference number, the identification of the speaking terminal and the equipment type, and the equipment type comprises single-microphone equipment and microphone array equipment.

In a third aspect, an embodiment of the present application provides a device for generating a conference summary, including:

the system comprises an acquisition module, a sending module and a processing module, wherein the acquisition module is used for acquiring voice data of a speaker through a speaking terminal, and the sending module is used for converting the voice data into a text;

the processing module is used for associating the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary;

the sending module is further configured to send the conference summary to the speaking terminal.

In a fourth aspect, an embodiment of the present application provides an apparatus for generating a conference summary, including:

an acquisition module, configured to acquire voice data of a speaker and send the voice data to a server;

a receiving module for receiving the conference summary from the server.

In a fifth aspect, embodiments of the present application provide a server, including a processor, a memory, a transceiver, and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the program.

In a sixth aspect, embodiments of the present application provide a speaking terminal comprising a processor, a memory, a transceiver, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to the second aspect.

In a seventh aspect, an embodiment of the present application provides a storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method according to any one of the foregoing first aspect and various possible implementations of the first aspect.

The embodiment of the application provides a method, a device and a storage medium for generating a conference summary. In the method, a speaking terminal obtains voice data of a speaker and sends the voice data to a server; the server obtains the voice data of the speaker through the speaking terminal and converts the voice data into text; the server associates the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary; and the server sends the conference summary to the speaking terminal. Because the conference summary is generated according to the preset conference summary format, this technical scheme solves the problems that manual conference summary preparation is error-prone, time-consuming and labor-intensive, and improves the efficiency of generating the conference summary.

Drawings

Fig. 1 is a schematic diagram of a teleconference system provided in an embodiment of the present application;

Fig. 2 is a schematic diagram of a reservation interface provided in an embodiment of the present application;

Fig. 3 is a schematic diagram of a conference setting interface provided in an embodiment of the present application;

Fig. 4 is a first schematic flowchart of a method for generating a conference summary provided in an embodiment of the present application;

Fig. 5 is a second schematic flowchart of a method for generating a conference summary provided in an embodiment of the present application;

Fig. 6 is a third schematic flowchart of a method for generating a conference summary provided in an embodiment of the present application;

Fig. 7 is a first schematic structural diagram of a device for generating a conference summary provided in an embodiment of the present application;

Fig. 8 is a second schematic structural diagram of a device for generating a conference summary provided in an embodiment of the present application;

Fig. 9 is a schematic hardware structure diagram of a server provided in an embodiment of the present application;

Fig. 10 is a schematic hardware structure diagram of a speaking terminal provided in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Aiming at the problems of easy error, time consumption and labor consumption in manual conference summary arrangement in the current stage, the embodiment of the application provides a generation method of the conference summary, the conference summary is generated according to a preset conference summary format, the problems of easy error, time consumption and labor consumption in manual conference summary arrangement are solved, and the generation efficiency of the conference summary is improved.

Fig. 1 is a schematic diagram of a teleconference system according to an embodiment of the present application. As shown in fig. 1, the teleconference system includes a speaking terminal 101 and a server 102. The speaking terminal 101 includes a first speaking terminal 1011 and a second speaking terminal 1012. Device information of the first speaking terminal 1011, including a device model, hardware configuration information, software configuration information, and the like, is stored in the server 102; that is, the first speaking terminal 1011 and the server 102 may be devices adapted to the teleconference system, whereas the second speaking terminal 1012 is a third-party device. The number of speaking terminals 101 is at least two.

The first speaking terminal 1011 may join the teleconference through conference software or a browser, and the first speaking terminal 1011 may correspond to a conference room, which may contain at least one speaker.

The conference system may further include receiving terminals. The first speaking terminal 1011 and the second speaking terminal 1012 are terminals that speak, and a receiving terminal is a terminal that receives voice data but does not speak. The first speaking terminal 1011, the second speaking terminal 1012, and the receiving terminals are each connected to the server 102 through a network. Because this embodiment concerns generating the conference summary and the receiving terminals do not speak in the conference, only the process by which the first speaking terminal 1011 and the second speaking terminal 1012 access the conference system is described.

Generally, a pre-meeting reservation needs to be made before a teleconference formally starts. During the reservation, meeting information needs to be set, including the meeting time, date, meeting name, file materials and the like. When a conference room is reserved, the following need to be set: the name of the participating conference room, whether the terminal in that conference room is a speaking terminal, the identifier of the speaking terminal, the number of participants, information of the participants in the conference room, and the languages allowed in the conference. The information of the participants may be the user names and/or mailbox information of the participants. The identifier of the first speaking terminal may be an identification code preset for the first speaking terminal, such as conference room No. 1, conference room A301, and the like, and the identifier of the first speaking terminal of the current conference may be displayed in the conference software while the conference is in progress. It should be noted that all speaking terminals involved in the pre-meeting reservation are first speaking terminals.

As an example, fig. 2 is a schematic diagram of a reservation interface provided in the embodiment of the present application. As shown in fig. 2, when a reservation is made before a meeting, meeting information needs to be set, including a meeting date, a meeting time, a duration, a meeting subject, meeting content, and meeting attachments; when a conference room is reserved, the name of the conference room, whether the terminal of the conference room is a speaking terminal, the identifier of the speaking terminal and the information of participants in the conference room need to be filled in.

After the user clicks submit, the teleconference is further configured. Fig. 3 is a schematic view of a conference setting interface provided in the embodiment of the present application. As shown in fig. 3, the conference setting options include voice transcription, voice translation, and session listening and recording, and each option can be enabled or disabled according to actual service requirements. Session listening and recording cannot be enabled unless voice transcription is enabled, and voice translation cannot be enabled unless voice transcription is enabled. After the conference settings are submitted, the server may send the conference number to the speaking terminal of each conference room.
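The dependency between the setting options can be sketched as a small validation routine. This is only an illustrative sketch: the `ConferenceSettings` class, its field names and the `validate_settings` function are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConferenceSettings:
    """Hypothetical container for the three options of the setting interface."""
    voice_transcription: bool = False
    voice_translation: bool = False
    session_recording: bool = False  # session listening and recording


def validate_settings(settings: ConferenceSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    # Both dependent options require voice transcription to be enabled first.
    if settings.voice_translation and not settings.voice_transcription:
        problems.append("voice translation requires voice transcription")
    if settings.session_recording and not settings.voice_transcription:
        problems.append("session listening and recording requires voice transcription")
    return problems
```

A settings form could call `validate_settings` before submission and refuse to submit while the returned list is non-empty.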

In practice, the conference number can also be sent to a second speaking terminal because of business requirements. In one possible application scenario, two first speaking terminals of a company access the conference through conference software or a browser, and a partner also needs to participate in the conference. The partner can access the conference with the conference number through conference software or a browser. The terminal the partner uses to access the conference is the second speaking terminal, and the server does not know the device information of that terminal.

The technical solution of the present application will be described in detail by specific examples. It should be noted that the following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

Fig. 4 is a first schematic flow chart of a method for generating a conference summary provided in an embodiment of the present application, and as shown in fig. 4, the method includes the following steps:

S101, the speaking terminal acquires voice data of a speaker and sends the voice data to a server.

S102, the server acquires voice data of a speaker through the speaking terminal and converts the voice data into a text.

In general, when the scheduled time of the reserved teleconference arrives, the server may start a monitoring process to listen for registration messages from speaking terminals. A speaking terminal may access the conference through conference software or a browser. The speaking terminals include a first speaking terminal and a second speaking terminal; the first speaking terminal and the server may be devices adapted to the conference system, and the second speaking terminal is a third-party device. The number of speaking terminals is at least two, and one speaking terminal is described as an example below.

If the speaking terminal is the first speaking terminal, before the speaking terminal acquires the voice data of the speaker, the method further includes:

the speaking terminal sends a registration message to the server, wherein the registration message comprises a conference number, an identification of the speaking terminal and a device type, and the device type comprises a single-microphone device and a microphone array device. Correspondingly, the server receives the registration message from the speaking terminal, and the server verifies the speaking terminal according to the conference number.

Specifically, the server receives a registration message from the speaking terminal, verifies the speaking terminal according to the conference number in the registration message, and if the conference number sent by the speaking terminal is the same as the conference number of the current conference, it indicates that the speaking terminal passes verification, otherwise, verification fails.
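The verification step described above amounts to comparing conference numbers. A minimal sketch follows; the `RegistrationMessage` class, its field names and the device-type strings are assumptions made for illustration, not a message format defined by the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegistrationMessage:
    """Hypothetical shape of a registration message."""
    conference_number: str
    terminal_id: str
    # "single_mic" or "mic_array" (assumed labels); None when no device type is sent.
    device_type: Optional[str] = None


def verify_registration(message: RegistrationMessage, current_conference_number: str) -> bool:
    """Verification passes only if the terminal sent the current conference number."""
    return message.conference_number == current_conference_number
```

For example, `verify_registration(RegistrationMessage("8841", "A301", "mic_array"), "8841")` passes, while any other conference number fails verification.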

If the verification is passed, the server can also respond to the registration message by sending the conference information set during reservation to the speaking terminal, so that the speaking terminal can obtain an overview of the conference it has accessed. The speaking terminal can then exchange voice with the other terminals in the conference. Specifically, the speaking terminal acquires voice data of a speaker and sends the voice data to the server, and correspondingly, the server acquires the voice data of the speaker through the speaking terminal.

When the server acquires the voice data of the speaker through the speaking terminal, it also sends the voice data to the other terminals in the conference. Because converting voice into text takes some time, the server can record and store the voice data through the voice transcription service when the voice transcription function is enabled. Accordingly, the server acquiring the voice data of the speaker through the speaking terminal includes:

if the verification is passed, the server receives the voice data of the speaker from the speaking terminal and acquires the voice data of the speaker through the voice transcription service.

If the speaking terminal is the second speaking terminal, before the speaking terminal acquires the voice data of the speaker, the method further includes:

the speaking terminal sends a registration message to the server, wherein the registration message comprises the conference number and the identifier of the speaking terminal. Correspondingly, the server receives the registration message from the speaking terminal and verifies the speaking terminal according to the conference number.

If the speaking terminal is the second speaking terminal, the identifier of the speaking terminal may be an identifier that indicates the identity of the terminal, such as its serial number or identification code, so that when the speaking terminal accesses the conference, its identifier may be displayed in the conference interface. The access procedure of the second speaking terminal is similar to that of the first speaking terminal and is not described again here. The second speaking terminal may then obtain the speaker's voice data and send the voice data to the server.

The server may then convert the voice data to text, and the specific implementation of the server converting the voice data to text is described in the embodiment of fig. 5.

S103, the server associates the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary.

S104, the server sends the conference summary to the speaking terminal.

The preset conference summary format may be a table format, a text format, and the like, and the identifier of the speaker may be a literal user name, a pinyin user name, and the like of the speaker, which is not limited in this embodiment.

In this embodiment, the server associates the identifier of the speaker, the voice data, and the text according to a preset conference summary format to generate a conference summary, and sends the conference summary to the speaking terminal, and correspondingly, the speaking terminal receives the conference summary from the server. The server can also send the conference summary to all terminals accessed by the current conference.
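The association step can be illustrated by rendering the same records in either a table format or a text format. The sketch below is an assumption-laden example: the `Utterance` class, the `audio_ref` field (a reference to stored voice data) and the concrete layouts are hypothetical, since the application does not fix the details of the preset format.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Utterance:
    """One associated record: speaker identifier, text, and voice data reference."""
    speaker_id: str
    text: str
    audio_ref: str  # hypothetical pointer to the stored voice data


def render_summary(utterances: Iterable[Utterance], fmt: str = "table") -> str:
    """Render the associated records according to a preset summary format."""
    if fmt == "table":
        lines = ["speaker | text | voice data"]
        lines += [f"{u.speaker_id} | {u.text} | {u.audio_ref}" for u in utterances]
    elif fmt == "text":
        lines = [f"{u.speaker_id}: {u.text} ({u.audio_ref})" for u in utterances]
    else:
        raise ValueError(f"unsupported summary format: {fmt}")
    return "\n".join(lines)
```

Keeping the records separate from the rendering lets the same associated data be sent to terminals in whichever preset format the conference uses.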

In one possible implementation, the method further comprises:

the server translates the text into a translation text corresponding to a plurality of preset languages.

In this embodiment, the voice data of the speaker is usually converted into Chinese text. Because the languages allowed in the conference are set during the pre-meeting reservation, the text is translated into translation text corresponding to each of the preset languages; for example, the Chinese text is translated into English and Japanese.

Correspondingly, step S103 specifically includes:

and associating the identifier of the speaker, the voice data, the text and the translation text to generate a conference summary.

Wherein the conference summary includes an identification of a speaker, voice data, text, and translated text.
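A summary entry that carries translations could be assembled as sketched below. The translation backend is deliberately left as a caller-supplied function, because the application does not name one; `build_summary_entry`, `tagging_translator` and the dictionary keys are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Iterable


def build_summary_entry(
    speaker_id: str,
    voice_data_ref: str,
    text: str,
    target_languages: Iterable[str],
    translate: Callable[[str, str], str],
) -> Dict[str, object]:
    """Associate speaker, voice data, text and one translation per preset language."""
    translations = {lang: translate(text, lang) for lang in target_languages}
    return {
        "speaker": speaker_id,
        "voice_data": voice_data_ref,
        "text": text,
        "translations": translations,
    }


def tagging_translator(text: str, language: str) -> str:
    """Stand-in for a real translation service: it only tags the text."""
    return f"[{language}] {text}"
```

In a real deployment `tagging_translator` would be replaced by a call to whatever translation service the server uses.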

Table 1 is a schematic illustration of the conference summary provided in the embodiment of the present application. As shown in table 1, the conference summary is divided into three columns, where the first column is the identifier of a speaker, the second column is the text of the speaker, and the third column is the voice data of the speaker. The preset languages include Chinese, English and Japanese, and the text of each speaker comprises the Chinese text, the translation text corresponding to English and the translation text corresponding to Japanese. As can be seen from table 1, the number of speaking terminals is 3.

TABLE 1

In a possible implementation, the speaking terminal may further display the conference summary, for example, in an interface of conference software, so as to view the conference summary of the current conference in real time, thereby improving user experience.

In the method for generating the conference summary provided by this embodiment, a speaking terminal obtains voice data of a speaker and sends the voice data to a server; the server obtains the voice data of the speaker through the speaking terminal and converts the voice data into text; the server associates the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary; and the server sends the conference summary to the speaking terminal. Because the conference summary is generated according to the preset conference summary format, this technical scheme solves the problems that manual conference summary preparation is error-prone, time-consuming and labor-intensive, and improves the efficiency of generating the conference summary.

On the basis of the foregoing embodiment, fig. 5 is a flowchart illustrating a second method for generating a conference summary according to the embodiment of the present application, and as shown in fig. 5, the converting, by a server, voice data into a text specifically includes:

S201, when the speaking terminal is a first speaking terminal and is a single-microphone device, or the speaking terminal is a second speaking terminal, the server converts the voice data into text through a voice transcription service.

The server receives a registration message of the speaking terminal, and can determine that the speaking terminal is a first speaking terminal or a second speaking terminal according to the registration message, wherein the registration message sent by the first speaking terminal comprises a device type, and the registration message sent by the second speaking terminal does not comprise the device type.

In this embodiment, when the server determines that the speaking terminal is the first speaking terminal and the device type is the single-microphone device according to the registration message, it indicates that the number of speakers corresponding to the first speaking terminal is 1, and converts the voice data into text through the voice transcription service.

It should be noted that when the number of speakers is 1, the identity (identifier) of the speaker is already determined, because the participant corresponding to each first speaking terminal is set in the pre-meeting reservation, and that participant is the speaker corresponding to the first speaking terminal.

When the server determines according to the registration message that the speaking terminal is the second speaking terminal, the second speaking terminal is a third-party device and the number of speakers corresponding to it is considered to be 1, so the voice data is converted into text through the voice transcription service.

S202, when the speaking terminal is a first speaking terminal and is a microphone array device, the server determines whether a camera is arranged on the speaking terminal according to the device information.

The session listening and recording service provides identity recognition functions such as voiceprint recognition, but it is a paid service, so it is invoked only when necessary in order to save cost. In this embodiment, when the server determines according to the received registration message that the speaking terminal is the first speaking terminal and is a microphone array device, the number of speakers corresponding to the first speaking terminal is at least one, and the server then determines whether a camera is provided on the speaking terminal according to the stored device information of the speaking terminal.

If yes, executing steps S203-S204; if not, go to step S205.

S203, the server acquires the number of speakers through the camera.

S204, the server converts the voice data into text through a session listening and recording service or a voice transcription service according to the number of speakers.

When the first speaking terminal is a microphone array device, the number of speakers corresponding to it is at least one. To further determine the number of speakers when a camera is provided on the speaking terminal, the server issues a photographing instruction to the speaking terminal; the speaking terminal responds by taking a photograph with the camera and uploading it to the server, and the server determines the number of speakers from the photograph.

Converting the voice data into a text through a session listening and recording service or a voice transcription service according to the number of speakers, which specifically includes:

if the number of speakers is greater than or equal to 2, voice data is converted into a text through the session listening and recording service;

if the number of speakers is equal to 1, voice data is converted into text through the voice transcription service.

Although the participants corresponding to each first speaking terminal are set during the pre-meeting reservation, if the number of speakers (i.e., participants) is greater than or equal to 2, the voice data of the individual speakers cannot be distinguished directly. In this embodiment, therefore, the voice data is converted into text through the session listening and recording service in that case, and through the voice transcription service if the number of speakers is equal to 1.

S205, the server converts the voice data into text through the session listening and recording service.

In this embodiment, taking one speaking terminal as an example, in steps S204 to S205, converting the voice data into text through the session listening and recording service specifically includes:

for each target speaker, converting the target voice data into a target text, where the text includes the target text.

The session listening and recording service can convert the voice signal into an electrical signal, which is then recognized by a computer to determine the identity of the speaker. The speakers include at least one target speaker, the target voice data is the voice data of a target speaker, and the correspondence between the target voice data and the target speaker can be determined through the session listening and recording service.

Before the conference starts, the server may store the correspondence between the voiceprint features of all speakers and the identifiers of the speakers. In this embodiment, the target voice data and the voiceprint features of each target speaker may be determined from the voice data through the session listening and recording service, and the identifier of each target speaker is then determined from the pre-stored correspondence by comparing voiceprint features. The correspondence between the identifier of the target speaker and the target voice data is thereby obtained.
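The lookup of a speaker identifier from a pre-stored voiceprint correspondence can be sketched as follows. This is only an illustrative sketch: the application does not specify how voiceprint features are represented or compared, so the feature vectors, the cosine-similarity comparison, the threshold of 0.8 and the names `cosine_similarity`, `identify_speaker` and `enrolled` are all assumptions introduced here.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    voiceprint: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the identifier whose stored voiceprint matches best, or None."""
    best_id: Optional[str] = None
    best_score = threshold
    for speaker_id, reference in enrolled.items():
        score = cosine_similarity(voiceprint, reference)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id


# Correspondence stored before the conference (made-up feature values).
enrolled = {
    "target speaker 1": [1.0, 0.0, 0.0],
    "target speaker 2": [0.0, 1.0, 0.0],
}
match = identify_speaker([0.9, 0.1, 0.0], enrolled)
no_match = identify_speaker([0.0, 0.0, 1.0], enrolled)
```

Returning `None` when no stored voiceprint is close enough keeps an unknown voice from being attributed to an enrolled speaker; how such a case should be handled is not stated in the application.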

Then, for each target speaker, the target voice data can be converted into a target text through the session listening and recording service, where the text includes the target text.

Correspondingly, the step of associating the identifier of the speaker, the voice data and the text to generate a conference summary comprises the following steps:

and for each target speaker, associating the identification of the target speaker, the target voice data and the target text to generate a conference summary.

Table 2 is a second schematic illustration of the conference summary provided in the embodiment of the present application. As shown in Table 2, the conference summary is in a table format and is divided into three columns: the first column is the identifier of the speaker, the second column is the text of the speaker, and the third column is the voice data of the speaker. The number of speaking terminals is 3, and each speaking terminal corresponds to two target speakers, namely target speaker 1, target speaker 2, …, target speaker 6. The corresponding texts are target text 1, target text 2, …, target text 6, and the corresponding voice data are target speech data 1, target speech data 2, …, target speech data 6.

TABLE 2

Target speaker 1	Target text 1	Target speech data 1
Target speaker 2	Target text 2	Target speech data 2
Target speaker 3	Target text 3	Target speech data 3
Target speaker 4	Target text 4	Target speech data 4
Target speaker 5	Target text 5	Target speech data 5
Target speaker 6	Target text 6	Target speech data 6
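The association of identifier, text and voice data into the three-column layout of Table 2 can be illustrated with a short sketch. The names `SummaryRow`, `build_summary` and `render` are hypothetical, and representing the voice data by a string reference (for example a file name) is an assumption; the application only states that the three items are associated according to a preset format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SummaryRow:
    speaker_id: str  # identifier of the target speaker
    text: str        # target text
    voice_ref: str   # reference to the target voice data (assumed to be a string)


def build_summary(texts: Dict[str, str], voice_refs: Dict[str, str]) -> List[SummaryRow]:
    """Associate identifier, text and voice data for each target speaker."""
    return [
        SummaryRow(speaker_id, text, voice_refs[speaker_id])
        for speaker_id, text in texts.items()
    ]


def render(rows: List[SummaryRow]) -> str:
    """Render the rows as three tab-separated columns, as in Table 2."""
    return "\n".join(f"{r.speaker_id}\t{r.text}\t{r.voice_ref}" for r in rows)


texts = {
    "Target speaker 1": "Target text 1",
    "Target speaker 2": "Target text 2",
}
voice_refs = {
    "Target speaker 1": "Target speech data 1",
    "Target speaker 2": "Target speech data 2",
}
summary = render(build_summary(texts, voice_refs))
```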

The method for generating a conference summary provided by this embodiment includes the following. When the speaking terminal is a first speaking terminal and is a single-microphone device, or the speaking terminal is a second speaking terminal, the server converts the voice data into text through the voice transcription service. When the speaking terminal is a first speaking terminal and is a microphone array device, the server determines, according to the device information, whether a camera is provided on the speaking terminal. If yes, the server acquires the number of speakers through the camera and converts the voice data into text through the session listening and recording service or the voice transcription service according to the number of speakers; if not, the server converts the voice data into text through the session listening and recording service. By determining the number of speakers corresponding to the first speaking terminal and then deciding whether to start the voice transcription service or the session listening and recording service, the voice data is converted into text while cost is saved.
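The service selection summarized above can be written as a small decision function. This is a minimal sketch under stated assumptions: the names `Service` and `select_service` and the callback `count_speakers` (standing in for counting speakers from the uploaded photograph) are hypothetical, and the handling of a count below 1 is not covered by the application, so the sketch simply rejects it.

```python
from enum import Enum
from typing import Callable, Optional


class Service(Enum):
    TRANSCRIPTION = "voice transcription service"
    SESSION = "session listening and recording service"


def select_service(
    is_first_terminal: bool,
    is_microphone_array: bool,
    has_camera: bool,
    count_speakers: Optional[Callable[[], int]] = None,
) -> Service:
    """Choose the conversion service for one speaking terminal."""
    # Second speaking terminal, or first speaking terminal with a single microphone.
    if not is_first_terminal or not is_microphone_array:
        return Service.TRANSCRIPTION
    # Microphone array without a camera: the number of speakers is unknown.
    # (Treating a missing counting callback the same way is an assumption.)
    if not has_camera or count_speakers is None:
        return Service.SESSION
    count = count_speakers()
    if count >= 2:
        return Service.SESSION
    if count == 1:
        return Service.TRANSCRIPTION
    # Assumption: the application does not describe a count below 1.
    raise ValueError("number of speakers must be at least 1")


single_mic = select_service(True, False, False)
array_no_camera = select_service(True, True, False)
array_one_speaker = select_service(True, True, True, lambda: 1)
array_three_speakers = select_service(True, True, True, lambda: 3)
```

Only the last branch depends on the camera, which reflects the cost argument in the text: the paid service is used unless the server can confirm that a single person is speaking.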

On the basis of the foregoing embodiment, fig. 6 is a third schematic flowchart of the method for generating a conference summary according to an embodiment of the present application. As shown in fig. 6, when the speaking terminal is a first speaking terminal and a camera is provided on the speaking terminal, the method includes:

S301, during the conference, the speaking terminal determines through the camera whether the number of speakers has changed.

S302, if so, the speaking terminal sends an update message to the server, where the update message includes the changed number of speakers.

If the speaking terminal is a first speaking terminal and is provided with a camera, it can determine through the camera, during the conference, whether the number of speakers has changed. If so, it sends an update message including the changed number of speakers to the server, and the server correspondingly receives the update message from the speaking terminal.
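On the speaking terminal side, steps S301 to S302 amount to sending an update only when the observed count differs from the last known count. The sketch below illustrates this; the class name `SpeakerCountMonitor`, the message fields `type` and `speaker_count`, and the `send` callback (standing in for the transport to the server) are hypothetical and not taken from the application.

```python
from typing import Callable, Dict, List


class SpeakerCountMonitor:
    """Send an update message only when the observed speaker count changes."""

    def __init__(self, initial_count: int, send: Callable[[Dict[str, object]], None]):
        self._last_count = initial_count
        self._send = send

    def observe(self, count: int) -> bool:
        """Report one camera observation; return True if an update was sent."""
        if count == self._last_count:
            return False
        self._last_count = count
        # Hypothetical message layout carrying the changed number of speakers.
        self._send({"type": "update", "speaker_count": count})
        return True


# Collect messages in a list instead of sending them over a network.
sent: List[Dict[str, object]] = []
monitor = SpeakerCountMonitor(initial_count=2, send=sent.append)
for observed in [2, 2, 1, 1, 3]:
    monitor.observe(observed)
```

With the observations above, two update messages are collected, one for the change to 1 speaker and one for the change to 3 speakers; repeated identical observations produce no traffic.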

S303, the server converts the changed voice data into changed text through a session listening and recording service or a voice transcription service according to the changed number of speakers.

Specifically, if the changed number of speakers is greater than or equal to 2, the changed voice data is obtained through the speaking terminal and converted into changed text through the session listening and recording service; if the changed number of speakers is equal to 1, the changed voice data is converted into changed text through the voice transcription service. Therefore, if speakers leave during the conference and the number of speakers becomes 1, the server switches from the session listening and recording service to the voice transcription service, which saves cost.
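The server-side switching in step S303 can be sketched as a small piece of state that is updated whenever an update message arrives. The names `service_for` and `ConversionState`, and the `history` list used to show when a switch happened, are assumptions made for illustration; a count below 1 is rejected because the application does not describe that case.

```python
from dataclasses import dataclass, field
from typing import List

SESSION = "session listening and recording service"
TRANSCRIPTION = "voice transcription service"


def service_for(count: int) -> str:
    """Map a speaker count to the conversion service described in the text."""
    if count >= 2:
        return SESSION
    if count == 1:
        return TRANSCRIPTION
    # Assumption: counts below 1 are not covered by the application.
    raise ValueError("number of speakers must be at least 1")


@dataclass
class ConversionState:
    speaker_count: int
    history: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The service chosen at the start of the conference.
        self.history.append(service_for(self.speaker_count))

    @property
    def current(self) -> str:
        return self.history[-1]

    def on_update(self, changed_count: int) -> str:
        """Apply an update message and switch service only if needed."""
        self.speaker_count = changed_count
        service = service_for(changed_count)
        if service != self.current:
            self.history.append(service)
        return service


state = ConversionState(speaker_count=2)
after_leave = state.on_update(1)   # a speaker leaves, one remains
after_repeat = state.on_update(1)  # no further change
```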

The server associates the identifier of the speaker, the voice data and the text, and associates the identifier of the changed speaker, the changed voice data and the changed text, to generate a conference summary.

The conference summary includes the identifier of the speaker, the voice data and the text, as well as the identifier of the changed speaker, the changed voice data and the changed text.

The method for generating a conference summary provided by this embodiment includes: during the conference, the speaking terminal determines through the camera whether the number of speakers has changed; if so, the speaking terminal sends an update message including the changed number of speakers to the server; and the server converts the changed voice data into changed text through the session listening and recording service or the voice transcription service according to the changed number of speakers. Because the number of speakers is determined during the conference, the server can decide whether to start the voice transcription service or the session listening and recording service, so that the voice data is converted into text while cost is saved.

The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.

Fig. 7 is a first schematic structural diagram of a device for generating a conference summary provided in an embodiment of the present application. In this embodiment, the device for generating a conference summary may be integrated in the server. As shown in fig. 7, the apparatus may include:

an acquiring module 71, configured to acquire voice data of a speaker through a speaking terminal and convert the voice data into a text;

the processing module 73 is configured to associate the identifier of the speaker, the voice data, and the text according to a preset conference summary format, and generate a conference summary;

a sending module 72, configured to send the conference summary to the speaking terminal.

Further, when the speaking terminal is a first speaking terminal and is a single-microphone device, or the speaking terminal is a second speaking terminal, the processing module 73 is specifically configured to:

convert the voice data into text through a voice transcription service.

Further, when the speaking terminal is a first speaking terminal and is a microphone array device, the processing module 73 is specifically configured to:

if yes, acquiring the number of the speakers through the camera;

Further, the processing module 73 is specifically configured to:

Further, if the speaking terminal is provided with a camera, the apparatus further includes:

a receiving module 74, configured to receive, during a conference, an update message from the speaking terminal, where the update message includes the number of changed speakers;

the processing module 73 is further configured to convert the changed voice data into a changed text through a session listening and recording service or a voice transcription service according to the number of the changed speakers;

the processing module 73 is specifically configured to:

Further, the processing module 73 is further configured to:

the processing module 73 is specifically configured to:

Further, if the speaking terminal is the first speaking terminal, the receiving module 74 is further configured to:

the processing module 73 is further configured to:

verifying the speaking terminal according to the conference number;

the acquiring module 71 is specifically configured to:

The device for generating a conference summary of this embodiment may execute the technical solution in the method on the server side, and for the specific implementation process and the technical principle, reference is made to the related description in the method described above, and details are not described here again.

Fig. 8 is a second schematic structural diagram of a device for generating a conference summary provided in the embodiment of the present application. In this embodiment, the device for generating a conference summary may be integrated in the speaking terminal. As shown in fig. 8, the apparatus may include:

an obtaining module 81, configured to obtain voice data of a speaker, and send the voice data to a server;

a receiving module 82, configured to receive the conference summary from the server.

a processing module 83, configured to determine through the camera, during the conference, whether the number of speakers has changed;

a sending module 84, configured to send, if the number of speakers has changed, an update message to the server, where the update message includes the changed number of speakers.

Further, if the speaking terminal is the first speaking terminal, the sending module 84 is further configured to:

The apparatus for generating a conference summary of this embodiment may execute the technical solution in the method at the speaking terminal side, and for the specific implementation process and the technical principle, reference is made to the related description in the method shown above, and details are not repeated here.

Fig. 9 is a schematic diagram of a hardware structure of a server provided in an embodiment of the present application, and as shown in fig. 9, the server of the present embodiment may include: processor, memory, transceiver.

A memory for storing a computer program (e.g., an application program, a functional module, etc. implementing the above-described method), computer instructions, etc.;

the computer programs, computer instructions, etc. described above may be stored in partitions in one or more memories, and the computer programs, computer instructions, data, etc. described above may be invoked by a processor.

A processor for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments.

Reference may be made in particular to the description relating to the preceding method embodiment.

The processor, memory, and transceiver may be separate structures or may be integrated together. When they are separate structures, the processor, the memory, and the transceiver may be coupled by a bus.

The server of this embodiment may be used to execute the technical solution of the server-side method described above. For the specific implementation process and technical principle, refer to the related description in the method; details are not repeated here.

Fig. 10 is a schematic diagram of a hardware structure of a speaking terminal according to an embodiment of the present application, and as shown in fig. 10, the speaking terminal according to the embodiment may include: processor, memory, transceiver.

The speaking terminal of this embodiment may be used to execute the technical solution of the method on the speaking terminal side described above. For the specific implementation process and technical principle, refer to the related description in the method; details are not repeated here.

In addition, embodiments of the present application further provide a computer-readable storage medium, in which computer-executable instructions are stored, and when at least one processor of the user equipment executes the computer-executable instructions, the user equipment performs the above-mentioned various possible methods.

Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may also reside as discrete components in a communication device.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

The present application further provides a program product comprising a computer program stored in a readable storage medium, from which the computer program can be read by at least one processor of a server, the computer program being executable by the at least one processor to cause the server to carry out the method of any of the embodiments of the present application described above.

Claims

1. A method for generating a conference summary, applied to a server, comprising:

and sending the conference summary to the speaking terminal.

2. The method according to claim 1, wherein the speaking terminal includes a first speaking terminal and a second speaking terminal, and device information of the first speaking terminal is stored in the server;

converting the voice data into text through a voice transcription service.

3. The method of claim 2, wherein converting the speech data to text when the speaking terminal is a first speaking terminal and is a microphone array device comprises:

if yes, acquiring the number of the speakers through the camera;

4. The method of claim 3, wherein converting the voice data into text through a session listening and recording service or a voice transcription service according to the number of speakers comprises:

5. The method of claim 4, wherein converting the voice data into text through the session listening and recording service comprises:

6. The method of claim 3, wherein if a camera is provided on the speaking terminal, the method further comprises:

7. The method of claim 1, further comprising:

8. The method of claim 2, wherein if the speaking terminal is the first speaking terminal, before the acquiring of the voice data of the speaker through the speaking terminal, the method further comprises:

verifying the speaking terminal according to the conference number;

9. A method for generating a conference summary, applied to a speaking terminal, comprising:

acquiring voice data of a speaker, and sending the voice data to a server;

receiving a conference summary from the server.

10. The method of claim 9, wherein if a camera is provided on the speaking terminal, the method further comprises:

11. The method of claim 9, wherein if the speaking terminal is the first speaking terminal, before the acquiring of the voice data of the speaker, the method further comprises:

12. A server comprising a processor, a memory, a transceiver, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-8 when executing the program.

13. A speaking terminal comprising a processor, a memory, a transceiver, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the method of any one of claims 9 to 11.