CN113139392A - Method and device for generating conference summary and storage medium - Google Patents

Method and device for generating conference summary and storage medium

Info

Publication number
CN113139392A
Authority
CN
China
Prior art keywords
voice data
text
speaker
terminal
conference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010052589.5A
Other languages
Chinese (zh)
Other versions
CN113139392B (en)
Inventor
李晓林
曾小光
毕驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Hisense Commercial Display Co Ltd
Original Assignee
Qingdao Hisense Commercial Display Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Hisense Commercial Display Co Ltd
Priority to CN202010052589.5A
Publication of CN113139392A
Application granted
Publication of CN113139392B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Embodiments of the present application provide a method, a device and a storage medium for generating a conference summary. The method includes: a speaking terminal acquires voice data of a speaker and sends the voice data to a server; the server acquires the voice data of the speaker through the speaking terminal and converts the voice data into a text; the server associates an identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary; and the server sends the conference summary to the speaking terminal. Because the conference summary is generated according to the preset conference summary format, this technical solution avoids the errors and the time and labor cost of compiling a conference summary manually, and improves the efficiency of generating the conference summary.

Description

Method and device for generating conference summary and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for generating a conference summary, and a storage medium.
Background
In daily office work, attending conferences has become indispensable. During a conference, a conference summary is generally needed to collect and organize the speeches of the participants and to follow up on subsequent work.
In practice, the conference summary is usually compiled by dedicated staff who collect and organize the speech of each participant, and who need to record each participant's important remarks in detail.
However, because there are many participants and they may speak different languages, manually recording a conference summary is error-prone, tends to miss important content, and is time-consuming and labor-intensive.
Disclosure of Invention
The present application provides a method and a device for generating a conference summary, and a storage medium, to solve the problem that compiling a conference summary in the existing manner is time-consuming, labor-intensive and error-prone.
In a first aspect, an embodiment of the present application provides a method for generating a conference summary, including:
acquiring voice data of a speaker through a speaking terminal, and converting the voice data into a text;
according to a preset conference summary format, associating the identifier of the speaker, the voice data and the text to generate a conference summary;
and sending the conference summary to the speaking terminal.
Furthermore, the speaking terminal comprises a first speaking terminal and a second speaking terminal, and the equipment information of the first speaking terminal is stored in the server;
when the speaking terminal is a first speaking terminal and is a single-microphone device, or when the speaking terminal is a second speaking terminal, the converting the voice data into a text includes:
converting the voice data into text through a voice transcription service.
Further, when the speaking terminal is a first speaking terminal and is a microphone array device, the converting the speech data into text includes:
determining whether a camera is arranged on the speaking terminal or not according to the equipment information;
if yes, acquiring the number of the speakers through the camera;
converting the voice data into a text through a session listening and recording service or a voice transcription service according to the number of the speakers;
if not, the voice data is converted into a text through the session listening and recording service.
Further, the converting the voice data into text through a session listening and recording service or a voice transcription service according to the number of speakers includes:
if the number of the speakers is more than or equal to 2, converting the voice data into a text through a session listening and recording service;
and if the number of the speakers is equal to 1, converting the voice data into a text through a voice transcription service.
Further, the converting the voice data into text through the session listening and recording service includes:
acquiring the identification of each target speaker and target voice data from the voice data through session listening and recording service, wherein the speakers comprise the target speakers;
for each target speaker, converting the target speech data into a target text, the text comprising the target text;
correspondingly, the associating the identifier of the speaker, the voice data, and the text to generate a conference summary includes:
and for each target speaker, associating the identifier of the target speaker, the target voice data and the target text to generate the conference summary.
Further, if a camera is disposed on the speaking terminal, the method further includes:
receiving an update message from the speaking terminal in a conference process, wherein the update message comprises the number of changed speakers;
converting the changed voice data into a changed text through a session listening and recording service or a voice transcription service according to the number of the changed speakers;
correspondingly, the associating the identifier of the speaker, the voice data, and the text to generate a conference summary includes:
and associating the identifier of the speaker, the voice data and the text, and associating the changed identifier of the speaker, the changed voice data and the changed text to generate the conference summary.
Further, the method further comprises:
translating the text into a translation text corresponding to a plurality of preset languages;
correspondingly, the associating the identifier of the speaker, the voice data, and the text to generate a conference summary includes:
and associating the identifier of the speaker, the voice data, the text and the translation text to generate the conference summary.
Further, if the speaking terminal is the first speaking terminal, before the speech data of the speaker is acquired by the speaking terminal, the method includes:
receiving a registration message from the speaking terminal, wherein the registration message comprises a conference number, an identifier of the speaking terminal and a device type, and the device type comprises a single microphone device and a microphone array device;
verifying the speaking terminal according to the conference number;
correspondingly, the acquiring voice data of the speaker through the speaking terminal includes:
and if the verification is passed, receiving voice data of the speaker from the speaking terminal, and acquiring the voice data of the speaker through a voice transcription service.
In a second aspect, an embodiment of the present application provides a method for generating a conference summary, including:
acquiring voice data of a speaker, and sending the voice data to a server;
receiving a conference summary from the server.
Further, if a camera is disposed on the speaking terminal, the method further includes:
during the conference, determining through the camera whether the number of speakers has changed;
and if so, sending an update message to the server, wherein the update message comprises the number of the changed speakers.
Further, if the speaking terminal is the first speaking terminal, before the acquiring the voice data of the speaker, the method further includes:
and sending a registration message to the server, wherein the registration message comprises a conference number, the identification of the speaking terminal and the equipment type, and the equipment type comprises single-microphone equipment and microphone array equipment.
In a third aspect, an embodiment of the present application provides a device for generating a conference summary, including:
an acquisition module, configured to acquire voice data of a speaker through a speaking terminal and to convert the voice data into a text;
a processing module, configured to associate the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary;
a sending module, configured to send the conference summary to the speaking terminal.
In a fourth aspect, an embodiment of the present application provides an apparatus for generating a conference summary, including:
an acquisition module, configured to acquire voice data of a speaker and to send the voice data to a server;
a receiving module, configured to receive a conference summary from the server.
In a fifth aspect, embodiments of the present application provide a server, including a processor, a memory, a transceiver, and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the program.
In a sixth aspect, embodiments of the present application provide a speaking terminal comprising a processor, a memory, a transceiver, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to the second aspect.
In a seventh aspect, an embodiment of the present application provides a storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method according to any one of the foregoing first aspect and various possible implementations of the first aspect.
Embodiments of the present application provide a method, a device and a storage medium for generating a conference summary. In the method, a speaking terminal acquires voice data of a speaker and sends the voice data to a server; the server acquires the voice data of the speaker through the speaking terminal and converts the voice data into a text; the server associates an identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary; and the server sends the conference summary to the speaking terminal. Because the conference summary is generated according to the preset conference summary format, this technical solution avoids the errors and the time and labor cost of compiling a conference summary manually, and improves the efficiency of generating the conference summary.
Drawings
Fig. 1 is a schematic diagram of a teleconference system provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of a reservation interface provided in an embodiment of the present application;
Fig. 3 is a schematic diagram of a conference setting interface provided in an embodiment of the present application;
Fig. 4 is a first schematic flowchart of a method for generating a conference summary provided in an embodiment of the present application;
Fig. 5 is a second schematic flowchart of a method for generating a conference summary provided in an embodiment of the present application;
Fig. 6 is a third schematic flowchart of a method for generating a conference summary provided in an embodiment of the present application;
Fig. 7 is a first schematic structural diagram of a device for generating a conference summary provided in an embodiment of the present application;
Fig. 8 is a second schematic structural diagram of a device for generating a conference summary provided in an embodiment of the present application;
Fig. 9 is a schematic hardware structure diagram of a server provided in an embodiment of the present application;
Fig. 10 is a schematic hardware structure diagram of a speaking terminal provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To address the problem that compiling a conference summary manually is error-prone, time-consuming and labor-intensive, the embodiments of the present application provide a method for generating a conference summary in which the conference summary is generated according to a preset conference summary format. This avoids those problems and improves the efficiency of generating the conference summary.
Fig. 1 is a schematic diagram of a teleconference system according to an embodiment of the present application. As shown in Fig. 1, the teleconference system includes a speaking terminal 101 and a server 102. The speaking terminal 101 includes a first speaking terminal 1011 and a second speaking terminal 1012. Device information of the first speaking terminal 1011 is stored in the server 102, and the device information includes a device model, hardware configuration information, software configuration information, and the like. That is, the first speaking terminal 1011 and the server 102 may be devices adapted to the teleconference system, whereas the second speaking terminal 1012 is a third-party device. The number of speaking terminals 101 is at least two.
The first speaking terminal 1011 may join the teleconference through conference software or a browser. The first speaking terminal 1011 may correspond to a conference room, and the conference room may include at least one speaker.
The conference system may further include receiving terminals. The first speaking terminal 1011 and the second speaking terminal 1012 are terminals that speak in the conference, whereas a receiving terminal receives voice data but does not speak. The first speaking terminal 1011, the second speaking terminal 1012 and the receiving terminals are each connected to the server 102 through a network. Because this embodiment concerns the generation of the conference summary, and a receiving terminal does not speak in the conference, only the procedure by which the first speaking terminal 1011 and the second speaking terminal 1012 access the conference system is described.
Generally, a pre-meeting reservation needs to be made before a teleconference formally starts. During the reservation, conference information needs to be set, including the conference time, date, conference name, file materials and the like. When a conference room is reserved, the following need to be set: the name of the participating conference room, whether the terminal in that conference room is a speaking terminal, the identifier of the speaking terminal, the number of participants, information about the participants in the conference room, and the languages allowed in the conference. The information about the participants can be their user names and/or mailbox information. The identifier of the first speaking terminal may be an identification code preset for it, such as conference room No. 1 or conference room A301, and the identifier of the first speaking terminal of the current conference may be displayed in the conference software while the conference is in progress. It should be noted that all speaking terminals involved in the pre-meeting reservation are first speaking terminals.
As an example, Fig. 2 is a schematic diagram of a reservation interface provided in the embodiment of the present application. As shown in Fig. 2, when a pre-meeting reservation is made, conference information needs to be set, including the conference date, conference time, duration, conference subject, conference content and conference attachments. When a conference room is reserved, the name of the conference room, whether the terminal of the conference room is a speaking terminal, the identifier of the speaking terminal and the information of the participants in the conference room need to be filled in.
Then, after the user clicks submit, the teleconference is further configured. Fig. 3 is a schematic diagram of a conference setting interface provided in the embodiment of the present application. As shown in Fig. 3, the conference setting options include voice transcription, voice translation, and session listening and recording, and each option can be enabled or disabled according to actual service requirements. When voice transcription is not enabled, session listening and recording is not allowed to be enabled, and voice translation is not allowed to be enabled either. After the conference settings are submitted, the server may send the conference number to the speaking terminal of each conference room.
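The dependency among the three conference setting options can be illustrated with a minimal Python sketch. The class name `ConferenceSettings`, its field names and the function `validate_settings` are hypothetical and are not taken from the application; the sketch only assumes that voice translation and session listening and recording both require voice transcription to be enabled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConferenceSettings:
    # Hypothetical field names mirroring the three options on the setting interface.
    voice_transcription: bool = False
    voice_translation: bool = False
    session_listening: bool = False


def validate_settings(settings: ConferenceSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    if not settings.voice_transcription:
        # Both dependent options are rejected when transcription is disabled.
        if settings.voice_translation:
            problems.append("voice translation requires voice transcription")
        if settings.session_listening:
            problems.append("session listening and recording requires voice transcription")
    return problems
```

A settings form could call `validate_settings` before submission and refuse to submit while the returned list is non-empty.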
In practical applications, the conference number can also be sent to a second speaking terminal when business requires it. In one possible application scenario, two first speaking terminals of a company access the conference through conference software or a browser, and a partner also needs to participate. The partner can access the conference with the conference number through the conference software or a browser. The terminal that the partner uses to access the conference is the second speaking terminal, and the server does not know the device information of that terminal.
The technical solution of the present application will be described in detail by specific examples. It should be noted that the following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 4 is a first schematic flowchart of a method for generating a conference summary provided in an embodiment of the present application. As shown in Fig. 4, the method includes the following steps:
S101, the speaking terminal acquires voice data of a speaker and sends the voice data to a server.
S102, the server acquires voice data of a speaker through the speaking terminal and converts the voice data into a text.
In general, when the scheduled time of the reserved teleconference arrives, the server may start a monitoring process to listen for registration messages from speaking terminals. A speaking terminal may access the conference through conference software or a browser. The speaking terminals include a first speaking terminal and a second speaking terminal; the first speaking terminal and the server may be devices adapted to the conference system, and the second speaking terminal is a third-party device. The number of speaking terminals is at least two; one speaking terminal is taken as an example below.
If the speaking terminal is the first speaking terminal, before the speaking terminal acquires the voice data of the speaker, the method further includes:
the speaking terminal sends a registration message to the server, wherein the registration message comprises a conference number, an identification of the speaking terminal and a device type, and the device type comprises a single-microphone device and a microphone array device. Correspondingly, the server receives the registration message from the speaking terminal, and the server verifies the speaking terminal according to the conference number.
Specifically, the server receives a registration message from the speaking terminal, verifies the speaking terminal according to the conference number in the registration message, and if the conference number sent by the speaking terminal is the same as the conference number of the current conference, it indicates that the speaking terminal passes verification, otherwise, verification fails.
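The verification step can be sketched as follows. This is a minimal illustration only: the `RegistrationMessage` class, its field names, the string values used for the device type and the function `verify_terminal` are assumptions made for the example, and the only rule taken from the description is that verification passes when the conference number in the message equals that of the current conference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistrationMessage:
    # Hypothetical representation of the registration message fields.
    conference_number: str
    terminal_id: str
    # "single_mic" or "mic_array" are illustrative values; None means not supplied.
    device_type: Optional[str] = None


def verify_terminal(message: RegistrationMessage, current_conference_number: str) -> bool:
    """Pass verification only if the terminal quotes the current conference number."""
    return message.conference_number == current_conference_number
```

For example, `verify_terminal(RegistrationMessage("8841", "A301", "mic_array"), "8841")` passes, while the same message checked against a different conference number fails.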
If the verification is passed, the server can also respond to the registration message by sending the conference information set during the reservation to the speaking terminal, so that the speaking terminal can obtain an overview of the conference it has joined; the speaking terminal can then exchange speech with the other terminals in the conference. Specifically, the speaking terminal acquires voice data of a speaker and sends the voice data to the server, and correspondingly, the server acquires the voice data of the speaker through the speaking terminal.
When the server acquires the voice data of the speaker through the speaking terminal, the server sends the voice data to the other terminals in the conference. Because converting voice into text takes a certain amount of time, the server can record and store the voice data through the voice transcription service when the voice transcription function is enabled. Accordingly, the server acquiring the voice data of the speaker through the speaking terminal includes:
if the verification is passed, the server receives the voice data of the speaker from the speaking terminal and acquires the voice data of the speaker through the voice transcription service.
If the speaking terminal is the second speaking terminal, before the speaking terminal acquires the voice data of the speaker, the method further includes:
the speaking terminal sends a registration message to the server, wherein the registration message comprises the conference number and the identifier of the speaking terminal. Correspondingly, the server receives the registration message from the speaking terminal and verifies the speaking terminal according to the conference number.
If the speaking terminal is the second speaking terminal, its identifier may be any identifier that indicates the identity of the terminal, such as its serial number or identification code, so that when the terminal accesses the conference, the identifier can be displayed in the conference interface. The specific access procedure of the second speaking terminal is similar to that of the first speaking terminal and is not described again here. The second speaking terminal may then acquire the speaker's voice data and send the voice data to the server.
The server may then convert the voice data to text, and the specific implementation of the server converting the voice data to text is described in the embodiment of fig. 5.
S103, the server associates the identifier of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary.
S104, the server sends the conference summary to the speaking terminal.
The preset conference summary format may be a table format, a text format, and the like, and the identifier of the speaker may be a literal user name, a pinyin user name, and the like of the speaker, which is not limited in this embodiment.
In this embodiment, the server associates the identifier of the speaker, the voice data, and the text according to a preset conference summary format to generate a conference summary, and sends the conference summary to the speaking terminal, and correspondingly, the speaking terminal receives the conference summary from the server. The server can also send the conference summary to all terminals accessed by the current conference.
In one possible implementation, the method further comprises:
the server translates the text into a translation text corresponding to a plurality of preset languages.
In this embodiment, the voice data of the speaker is usually converted into Chinese text. Because the languages allowed in the conference are set during the pre-meeting reservation, the text is then translated into translation texts corresponding to the plurality of preset languages; for example, the Chinese text is translated into English and Japanese.
Correspondingly, step S103 specifically includes:
and associating the identifier of the speaker, the voice data, the text and the translation text to generate a conference summary.
Wherein the conference summary includes an identification of a speaker, voice data, text, and translated text.
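As a rough illustration of the association step, the following Python sketch builds a three-column, table-style summary from per-speaker entries. All names (`SummaryEntry`, `render_summary_table`, `voice_data_ref`) and the pipe-separated layout are assumptions for the example; in particular, representing the voice data by a file reference is a simplification, and the application does not prescribe a concrete format beyond "table format, text format, and the like".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SummaryEntry:
    speaker_id: str          # identifier of the speaker, e.g. a user name
    text: str                # transcribed text
    voice_data_ref: str      # hypothetical: reference to the stored voice data
    translations: Dict[str, str] = field(default_factory=dict)  # language -> translated text


def render_summary_table(entries: List[SummaryEntry]) -> str:
    """Associate identifier, text (with translations) and voice data, one row per entry."""
    lines = ["Speaker | Text | Voice data"]
    for entry in entries:
        texts = [entry.text] + [f"{lang}: {t}" for lang, t in entry.translations.items()]
        lines.append(f"{entry.speaker_id} | {' / '.join(texts)} | {entry.voice_data_ref}")
    return "\n".join(lines)
```

Each row keeps the speaker identifier, the text together with its translations, and the voice data reference side by side, so that a reader of the summary can trace any line of text back to who said it and to the recording.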
Table 1 is a conference summary provided in the embodiment of the present application. As shown in Table 1, the conference summary is divided into three columns: the first column is the identifier of the speaker, the second column is the text of the speaker, and the third column is the voice data of the speaker. The preset languages include Chinese, English and Japanese, so the text of each speaker comprises the Chinese text and the translation texts corresponding to English and Japanese. As can be seen from Table 1, the number of speaking terminals is 3.
TABLE 1
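[Table 1 is an image in the original publication and is not reproduced here; its columns are the speaker identifier, the speaker's text, and the speaker's voice data.]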
In a possible implementation, the speaking terminal may further display the conference summary, for example, in an interface of conference software, so as to view the conference summary of the current conference in real time, thereby improving user experience.
The method for generating the conference summary provided by the embodiment includes: the method comprises the steps that a speaking terminal obtains voice data of a speaker and sends the voice data to a server, the server obtains the voice data of the speaker through the speaking terminal and converts the voice data into a text, the server associates identification of the speaker, the voice data and the text according to a preset conference summary format to generate a conference summary, and the server sends the conference summary to the speaking terminal. According to the technical scheme, the conference summary is generated according to the preset format of the conference summary, the problems that manual conference summary arrangement is prone to making mistakes and consuming time and labor are solved, and the generation efficiency of the conference summary is improved.
On the basis of the foregoing embodiment, Fig. 5 is a second schematic flowchart of a method for generating a conference summary according to the embodiment of the present application. As shown in Fig. 5, the server converting the voice data into a text specifically includes:
S201, when the speaking terminal is a first speaking terminal and is a single-microphone device, or the speaking terminal is a second speaking terminal, the server converts the voice data into text through a voice transcription service.
The server receives a registration message from the speaking terminal and can determine from it whether the speaking terminal is a first speaking terminal or a second speaking terminal: the registration message sent by a first speaking terminal includes a device type, whereas the registration message sent by a second speaking terminal does not.
In this embodiment, when the server determines from the registration message that the speaking terminal is the first speaking terminal and the device type is a single-microphone device, the number of speakers corresponding to the first speaking terminal is 1, and the server converts the voice data into text through the voice transcription service.
It should be noted that when the number of speakers is 1, the identity (identifier) of the speaker is already determined: the participant corresponding to each first speaking terminal is set during the pre-meeting reservation, so that participant is the speaker corresponding to the first speaking terminal.
When the server determines from the registration message that the speaking terminal is the second speaking terminal, the second speaking terminal is a third-party device and the number of speakers corresponding to it is taken to be 1, so the voice data is converted into text through the voice transcription service.
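The way the server tells the two kinds of terminal apart can be sketched in a few lines of Python. The dictionary key `device_type` and the function name `classify_terminal` are hypothetical; the sketch assumes only that a first speaking terminal's registration message carries a device type and a second speaking terminal's does not.

```python
from typing import Mapping


def classify_terminal(registration: Mapping[str, str]) -> str:
    """Return "first" if the message carries a device type, otherwise "second"."""
    # The key name "device_type" is an assumption made for this example.
    return "first" if "device_type" in registration else "second"
```

Under this assumption, a message such as `{"conference_number": "8841", "terminal_id": "A301", "device_type": "single_mic"}` is classified as coming from a first speaking terminal, and the same message without the `device_type` key as coming from a second speaking terminal.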
S202, when the speaking terminal is a first speaking terminal and is a microphone array device, the server determines whether a camera is arranged on the speaking terminal according to the device information.
The session listening and recording service is a paid service, so it is used only when needed in order to save cost. In this embodiment, when the server determines from the received registration message that the speaking terminal is a first speaking terminal and is a microphone array device, the number of speakers corresponding to the first speaking terminal is at least one, and the server then determines, according to the stored device information of the speaking terminal, whether a camera is provided on the speaking terminal. The session listening and recording service has identity recognition functions such as voiceprint recognition.
If yes, executing steps S203-S204; if not, go to step S205.
S203, the server acquires the number of speakers through the camera.
S204: The server converts the voice data into text through a session listening and recording service or a voice transcription service according to the number of speakers.
When the first speaking terminal is a microphone array device, the number of speakers corresponding to the first speaking terminal is at least one. To further determine the number of speakers, if a camera is arranged on the speaking terminal, the server issues a photographing instruction to the speaking terminal. The speaking terminal responds to the photographing instruction by taking a photograph with the camera and uploading it to the server, and the server determines the number of speakers from the photograph.
Converting the voice data into a text through a session listening and recording service or a voice transcription service according to the number of speakers, which specifically includes:
if the number of speakers is greater than or equal to 2, voice data is converted into a text through the session listening and recording service;
if the number of speakers is equal to 1, voice data is converted into text through the voice transcription service.
Although the participant corresponding to each first speaking terminal is set when the conference is reserved, if the number of speakers (i.e., participants) is greater than or equal to 2, the voice data corresponding to each speaker cannot otherwise be distinguished. In this embodiment, the voice data is therefore converted into text through the session listening and recording service in that case, and through the voice transcription service if the number of speakers is equal to 1.
S205, the server converts the voice data into text through the session listening and recording service.
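Steps S201 to S205 amount to a small decision procedure, sketched below under stated assumptions. The function name, the constant names, and the `count_speakers` callable (standing in for the photograph-based count of step S203) are hypothetical. The application only specifies the cases of exactly 1 speaker and of 2 or more speakers; treating any count below 2 as a single speaker is an assumption of this sketch.

```python
from typing import Callable, Optional

TRANSCRIPTION = "voice_transcription"
LISTEN_AND_RECORD = "session_listening_and_recording"


def select_service(
    is_first_terminal: bool,
    device_type: Optional[str],
    has_camera: bool,
    count_speakers: Optional[Callable[[], int]] = None,
) -> str:
    """Pick the conversion service for one speaking terminal."""
    # S201: second (third-party) terminal, or first terminal with a single microphone.
    if not is_first_terminal or device_type == "single_mic":
        return TRANSCRIPTION
    # S202: microphone array device; check for a camera.
    if has_camera and count_speakers is not None:
        number = count_speakers()  # S203: count obtained through the camera
        # S204: two or more speakers need speaker separation.
        return LISTEN_AND_RECORD if number >= 2 else TRANSCRIPTION
    # S205: no camera, so the number of speakers is unknown.
    return LISTEN_AND_RECORD
```

The paid service is chosen only when more than one speaker is present or when the count cannot be established, which is the cost-saving point of this embodiment.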
In this embodiment, taking one speaking terminal as an example, converting the voice data into a text through the session listening and recording service in steps S204 to S205 specifically includes:
acquiring the identification of each target speaker and target voice data from the voice data through session listening and recording service, wherein the speakers comprise the target speakers;
for each target speaker, converting the target voice data into a target text, the text including the target text.
The session listening and recording service can convert the voice signal into an electrical signal, which is then analyzed by a computer to determine the identity of the speaker. The speakers comprise at least one target speaker, the target voice data is the voice data of that target speaker, and the correspondence between the target voice data and the target speaker can be determined through the session listening and recording service.
Before the conference starts, the server may store correspondence between voiceprint features of all speakers and identifiers of the speakers, and in this embodiment, the target voice data and the voiceprint features of each target speaker may be determined from the voice data through the session listening and recording service, and then, by comparing the voiceprint features, the identifier of each target speaker is determined from the correspondence stored in advance. Thereby obtaining the corresponding relation between the identification of the target speaker and the target voice data.
Then, for each target speaker, the target voice data can be converted into a target text through the session listening and recording service, and the text comprises the target text.
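The voiceprint comparison described above can be illustrated as follows. The application does not state how voiceprint features are represented or compared; this sketch assumes fixed-length numeric feature vectors and cosine similarity, and all names are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_speaker(
    feature: Sequence[float], enrolled: Dict[str, Sequence[float]]
) -> str:
    """Return the stored speaker identifier whose voiceprint is most similar."""
    return max(enrolled, key=lambda sid: cosine_similarity(feature, enrolled[sid]))


# Correspondence stored before the conference starts (illustrative values).
enrolled = {"speaker_1": [1.0, 0.0], "speaker_2": [0.0, 1.0]}
matched = identify_speaker([0.9, 0.1], enrolled)
```

A practical system would likely also apply a minimum similarity threshold so that an unknown voice is not forced onto the nearest enrolled speaker; that refinement is outside what the application describes.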
Correspondingly, the step of associating the identifier of the speaker, the voice data and the text to generate a conference summary comprises the following steps:
and for each target speaker, associating the identification of the target speaker, the target voice data and the target text to generate a conference summary.
Table 2 is a second schematic illustration of the conference summary provided in the embodiment of the present application. As shown in Table 2, the conference summary is in a table format with three columns: the first column is the identifier of the speaker, the second column is the text of the speaker, and the third column is the voice data of the speaker. Here the number of speaking terminals is 3 and each speaking terminal corresponds to two target speakers, namely target speaker 1, target speaker 2 … target speaker 6. The corresponding texts are target text 1, target text 2 … target text 6, and the corresponding voice data are target voice data 1, target voice data 2 … target voice data 6.
TABLE 2
Speaker identifier | Text | Voice data
Target speaker 1 | Target text 1 | Target voice data 1
Target speaker 2 | Target text 2 | Target voice data 2
Target speaker 3 | Target text 3 | Target voice data 3
Target speaker 4 | Target text 4 | Target voice data 4
Target speaker 5 | Target text 5 | Target voice data 5
Target speaker 6 | Target text 6 | Target voice data 6
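The three-column association of Table 2 can be sketched as a simple record type. The names `SummaryRow`, `build_summary`, and `render_summary` are hypothetical, and the plain-text rendering is only one possible form of the preset conference summary format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SummaryRow:
    speaker_id: str    # identifier of the target speaker
    text: str          # target text
    voice_data: bytes  # target voice data


def build_summary(segments: Iterable[Tuple[str, str, bytes]]) -> List[SummaryRow]:
    """Associate identifier, text, and voice data for each target speaker."""
    return [SummaryRow(sid, text, voice) for sid, text, voice in segments]


def render_summary(rows: List[SummaryRow]) -> str:
    """Render the rows as a three-column plain-text table."""
    lines = ["Speaker identifier | Text | Voice data"]
    for row in rows:
        lines.append(f"{row.speaker_id} | {row.text} | <{len(row.voice_data)} bytes>")
    return "\n".join(lines)


rows = build_summary([
    ("Target speaker 1", "Target text 1", b"\x00\x01"),
    ("Target speaker 2", "Target text 2", b"\x02"),
])
```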
The method for generating the conference summary provided by this embodiment includes: when the speaking terminal is a first speaking terminal and is a single-microphone device, or the speaking terminal is a second speaking terminal, the server converts the voice data into text through the voice transcription service; when the speaking terminal is a first speaking terminal and is a microphone array device, the server determines whether a camera is arranged on the speaking terminal according to the device information; if yes, the server acquires the number of speakers through the camera and converts the voice data into text through the session listening and recording service or the voice transcription service according to the number of speakers; if not, the server converts the voice data into text through the session listening and recording service. By determining the number of speakers corresponding to the first speaking terminal and choosing accordingly between the voice transcription service and the session listening and recording service, the voice data is converted into text while cost is saved.
On the basis of the foregoing embodiment, fig. 6 is a flowchart illustrating a third method for generating a conference summary according to an embodiment of the present application. As shown in fig. 6, when the speaking terminal is a first speaking terminal and a camera is disposed on the speaking terminal, the method includes:
S301: During the conference, the speaking terminal determines through the camera whether the number of speakers has changed.
S302: If so, the speaking terminal sends an update message to the server, wherein the update message comprises the changed number of speakers.
If the speaking terminal is a first speaking terminal and is provided with a camera, it can determine through the camera, during the conference, whether the number of speakers has changed. If so, it sends an update message comprising the changed number of speakers to the server, and the server correspondingly receives the update message from the speaking terminal.
S303: The server converts the changed voice data into a changed text through a session listening and recording service or a voice transcription service according to the changed number of speakers.
Specifically, if the changed number of speakers is greater than or equal to 2, the changed voice data is obtained through the speaking terminal and converted into a changed text through the session listening and recording service; if the changed number of speakers is equal to 1, the changed voice data is converted into a changed text through the voice transcription service. Therefore, if speakers leave during the conference so that the number of speakers becomes 1, the server switches from the session listening and recording service to the voice transcription service, which saves cost.
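The server-side handling of an update message in step S303 can be sketched as follows. The class name, the message layout (a dictionary with a `speaker_count` key), and the service constants are assumptions made for illustration; as above, a count below 2 is treated as a single speaker.

```python
TRANSCRIPTION = "voice_transcription"
LISTEN_AND_RECORD = "session_listening_and_recording"


def service_for(speaker_count: int) -> str:
    """Two or more speakers use the session listening and recording service."""
    return LISTEN_AND_RECORD if speaker_count >= 2 else TRANSCRIPTION


class TerminalSession:
    """Tracks which service is active for one camera-equipped first speaking terminal."""

    def __init__(self, speaker_count: int) -> None:
        self.speaker_count = speaker_count
        self.service = service_for(speaker_count)

    def handle_update(self, update: dict) -> bool:
        """Apply an update message; return True if the active service switched."""
        self.speaker_count = update["speaker_count"]
        new_service = service_for(self.speaker_count)
        switched = new_service != self.service
        self.service = new_service
        return switched


session = TerminalSession(speaker_count=3)
switched = session.handle_update({"speaker_count": 1})  # speakers left the room
```

After the update in this example, voice data from the terminal would be routed to the voice transcription service until another update raises the count to 2 or more.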
Correspondingly, the step of associating the identifier of the speaker, the voice data and the text to generate a conference summary comprises the following steps:
and associating the identifier of the speaker, the voice data and the text, and associating the changed identifier of the speaker, the changed voice data and the changed text to generate a conference summary.
The conference summary comprises the identifier of the speaker, the voice data, and the text, as well as the identifier of the changed speaker, the changed voice data, and the changed text.
The method for generating the conference summary provided by this embodiment includes: during the conference, the speaking terminal determines through the camera whether the number of speakers has changed; if so, the speaking terminal sends an update message comprising the changed number of speakers to the server, and the server converts the changed voice data into a changed text through a session listening and recording service or a voice transcription service according to the changed number of speakers. Because the number of speakers is tracked during the conference, the server can choose between the voice transcription service and the session listening and recording service as the number changes, so that the voice data is converted into text while cost is saved.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 7 is a first schematic structural diagram of a device for generating a conference summary provided in an embodiment of the present application. In this embodiment, the device for generating a conference summary may be integrated in the server. As shown in fig. 7, the apparatus may include:
an acquiring module 71, configured to acquire voice data of a speaker through a speaking terminal;
a processing module 73, configured to convert the voice data into a text, and to associate the identifier of the speaker, the voice data, and the text according to a preset conference summary format to generate a conference summary;
a sending module 72, configured to send the conference summary to the speaking terminal.
Furthermore, the speaking terminal comprises a first speaking terminal and a second speaking terminal, and the equipment information of the first speaking terminal is stored in the server;
when the speaking terminal is a first speaking terminal and is a single-microphone device, or the speaking terminal is a second speaking terminal, the processing module 73 is specifically configured to:
converting the voice data into text through a voice transcription service.
Further, when the speaking terminal is a first speaking terminal and is a microphone array device, the processing module 73 is specifically configured to:
determining whether a camera is arranged on the speaking terminal or not according to the equipment information;
if yes, acquiring the number of the speakers through the camera;
converting the voice data into a text through a session listening and recording service or a voice transcription service according to the number of the speakers;
if not, the voice data is converted into a text through the session listening and recording service.
Further, the processing module 73 is specifically configured to:
if the number of the speakers is more than or equal to 2, converting the voice data into a text through a session listening and recording service;
and if the number of the speakers is equal to 1, converting the voice data into a text through a voice transcription service.
Further, the processing module 73 is specifically configured to:
acquiring the identification of each target speaker and target voice data from the voice data through session listening and recording service, wherein the speakers comprise the target speakers;
for each target speaker, converting the target speech data into a target text, the text comprising the target text;
and for each target speaker, associating the identifier of the target speaker, the target voice data and the target text to generate the conference summary.
Further, if the speaking terminal is provided with a camera, the apparatus further comprises:
a receiving module 74, configured to receive, during a conference, an update message from the speaking terminal, where the update message includes the number of changed speakers;
the processing module 73 is further configured to convert the changed voice data into a changed text through a session listening and recording service or a voice transcription service according to the number of the changed speakers;
the processing module 73 is specifically configured to:
and associating the identifier of the speaker, the voice data and the text, and associating the changed identifier of the speaker, the changed voice data and the changed text to generate the conference summary.
Further, the processing module 73 is further configured to:
translating the text into a translation text corresponding to a plurality of preset languages;
the processing module 73 is specifically configured to:
and associating the identifier of the speaker, the voice data, the text and the translation text to generate the conference summary.
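Associating translation texts with a summary entry can be sketched as below. The application does not name a translation service, so `translate` is a hypothetical callable supplied by the caller, and the dictionary layout of an entry is likewise an assumption.

```python
from typing import Callable, Dict, Iterable


def attach_translations(
    entry: Dict[str, object],
    languages: Iterable[str],
    translate: Callable[[str, str], str],
) -> Dict[str, object]:
    """Return a copy of the entry with one translation text per preset language."""
    result = dict(entry)
    source_text = str(entry["text"])
    result["translations"] = {
        lang: translate(source_text, lang) for lang in languages
    }
    return result


def fake_translate(text: str, lang: str) -> str:
    # Stand-in for a real translation backend, used only to make the sketch runnable.
    return f"[{lang}] {text}"


entry = {"speaker_id": "speaker_1", "text": "hello", "voice_data": b"\x00"}
translated = attach_translations(entry, ["en", "ja"], fake_translate)
```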
Further, if the speaking terminal is the first speaking terminal, the receiving module 74 is further configured to:
receiving a registration message from the speaking terminal, wherein the registration message comprises a conference number, an identifier of the speaking terminal and a device type, and the device type comprises a single microphone device and a microphone array device;
the processing module 73 is further configured to:
verifying the speaking terminal according to the conference number;
the acquiring module 71 is specifically configured to:
and if the verification is passed, receiving voice data of the speaker from the speaker terminal, and acquiring the voice data of the speaker through a voice transcription service.
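The registration and verification sequence can be sketched as follows. The application only states that the speaking terminal is verified according to the conference number; checking the number against a set of reserved conferences is an assumption of this sketch, as are the class, method, and key names.

```python
from typing import Dict, Set


class ConferenceServer:
    """Accepts voice data only from terminals whose registration was verified."""

    def __init__(self, reserved_conference_numbers: Set[str]) -> None:
        self.reserved = set(reserved_conference_numbers)
        self.verified: Dict[str, dict] = {}

    def register(self, message: dict) -> bool:
        """Verify the terminal according to the conference number in its registration message."""
        if message.get("conference_number") not in self.reserved:
            return False
        self.verified[message["terminal_id"]] = message
        return True

    def receive_voice(self, terminal_id: str, voice_data: bytes) -> bytes:
        """Return the voice data for further processing, or reject unverified terminals."""
        if terminal_id not in self.verified:
            raise PermissionError(f"terminal {terminal_id} is not verified")
        return voice_data


server = ConferenceServer({"conf-001"})
accepted = server.register(
    {"conference_number": "conf-001", "terminal_id": "t1", "device_type": "mic_array"}
)
rejected = server.register(
    {"conference_number": "conf-999", "terminal_id": "t2", "device_type": "single_mic"}
)
```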
The device for generating a conference summary of this embodiment may execute the technical solution in the method on the server side, and for the specific implementation process and the technical principle, reference is made to the related description in the method described above, and details are not described here again.
Fig. 8 is a second schematic structural diagram of a device for generating a conference summary provided in the embodiment of the present application. In this embodiment, the device for generating a conference summary may be integrated in the speaking terminal. As shown in fig. 8, the apparatus may include:
an obtaining module 81, configured to obtain voice data of a speaker, and send the voice data to a server;
a receiving module 82, configured to receive the conference summary from the server.
Further, if the speaking terminal is provided with a camera, the apparatus further comprises:
a processing module 83, configured to determine, through the camera, whether the number of speakers changes during the conference;
a sending module 84, configured to send an update message to the server if the number of speakers has changed, where the update message includes the changed number of speakers.
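On the speaking terminal side, the change detection and update message can be sketched as follows. How the camera yields a speaker count is not specified here, so the counts are passed in as plain integers; the message layout and function names are hypothetical.

```python
from typing import Iterable, List, Optional


def build_update_message(previous_count: int, current_count: int) -> Optional[dict]:
    """Return an update message if the speaker count changed, otherwise None."""
    if current_count == previous_count:
        return None
    return {"type": "update", "speaker_count": current_count}


def updates_for(initial_count: int, observed_counts: Iterable[int]) -> List[dict]:
    """Collect the update messages a terminal would send for a series of camera observations."""
    messages = []
    previous = initial_count
    for current in observed_counts:
        message = build_update_message(previous, current)
        if message is not None:
            messages.append(message)
            previous = current
    return messages


messages = updates_for(2, [2, 2, 1, 1, 3])
```

Only observations that differ from the last reported count produce a message, so a stable room generates no traffic to the server.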
Further, if the speaking terminal is the first speaking terminal, the sending module 84 is further configured to:
and sending a registration message to the server, wherein the registration message comprises a conference number, the identification of the speaking terminal and the equipment type, and the equipment type comprises single-microphone equipment and microphone array equipment.
The apparatus for generating a conference summary of this embodiment may execute the technical solution in the method at the speaking terminal side, and for the specific implementation process and the technical principle, reference is made to the related description in the method shown above, and details are not repeated here.
Fig. 9 is a schematic diagram of a hardware structure of a server provided in an embodiment of the present application, and as shown in fig. 9, the server of the present embodiment may include: processor, memory, transceiver.
A memory for storing a computer program (e.g., an application program, a functional module, etc. implementing the above-described method), computer instructions, etc.;
the computer programs, computer instructions, etc. described above may be stored in one or more memories in a partitioned manner. And the computer programs, computer instructions, data, etc. described above may be invoked by a processor.
A processor for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments.
Reference may be made in particular to the description relating to the preceding method embodiment.
The processor, memory, and transceiver may be separate structures or integrated structures integrated together. When the processor, the memory, and the transceiver are independent structures, the processor, the memory, and the transceiver may be coupled by a bus.
In the technical solution of this embodiment, for implementing the technical scheme in the server-side method, specific implementation processes and technical principles thereof refer to relevant descriptions in the method, and are not described herein again.
Fig. 10 is a schematic diagram of a hardware structure of a speaking terminal according to an embodiment of the present application, and as shown in fig. 10, the speaking terminal according to the embodiment may include: processor, memory, transceiver.
A memory for storing a computer program (e.g., an application program, a functional module, etc. implementing the above-described method), computer instructions, etc.;
the computer programs, computer instructions, etc. described above may be stored in one or more memories in a partitioned manner. And the computer programs, computer instructions, data, etc. described above may be invoked by a processor.
A processor for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments.
Reference may be made in particular to the description relating to the preceding method embodiment.
The processor, memory, and transceiver may be separate structures or integrated structures integrated together. When the processor, the memory, and the transceiver are independent structures, the processor, the memory, and the transceiver may be coupled by a bus.
The speaking terminal of this embodiment may implement the technical solution of the above-mentioned method on the speaking terminal side; for the specific implementation process and technical principle, refer to the relevant description in the above-mentioned method, which is not repeated here.
In addition, embodiments of the present application further provide a computer-readable storage medium, in which computer-executable instructions are stored, and when at least one processor of the user equipment executes the computer-executable instructions, the user equipment performs the above-mentioned various possible methods.
Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may also reside as discrete components in a communication device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The present application further provides a program product comprising a computer program stored in a readable storage medium, from which the computer program can be read by at least one processor of a server, the computer program being executable by the at least one processor to cause the server to carry out the method of any of the embodiments of the present application described above.

Claims (13)

1. A method for generating a conference summary, applied to a server, comprising:
acquiring voice data of a speaker through a speaking terminal, and converting the voice data into a text;
according to a preset conference summary format, associating the identifier of the speaker, the voice data and the text to generate a conference summary;
and sending the conference summary to the speaking terminal.
2. The method according to claim 1, wherein the speaking terminal includes a first speaking terminal and a second speaking terminal, and device information of the first speaking terminal is stored in the server;
when the speaking terminal is a first speaking terminal and is a single-microphone device, or when the speaking terminal is a second speaking terminal, the converting the voice data into a text includes:
converting the voice data into text through a voice transcription service.
3. The method of claim 2, wherein converting the speech data to text when the speaking terminal is a first speaking terminal and is a microphone array device comprises:
determining whether a camera is arranged on the speaking terminal or not according to the equipment information;
if yes, acquiring the number of the speakers through the camera;
converting the voice data into a text through a session listening and recording service or a voice transcription service according to the number of the speakers;
if not, the voice data is converted into a text through the session listening and recording service.
4. The method of claim 3, wherein converting the voice data into text through a session listening and recording service or a voice transcription service according to the number of speakers comprises:
if the number of the speakers is more than or equal to 2, converting the voice data into a text through a session listening and recording service;
and if the number of the speakers is equal to 1, converting the voice data into a text through a voice transcription service.
5. The method of claim 4, wherein converting the voice data into text through a session listening and recording service comprises:
acquiring the identification of each target speaker and target voice data from the voice data through session listening and recording service, wherein the speakers comprise the target speakers;
for each target speaker, converting the target speech data into a target text, the text comprising the target text;
correspondingly, the associating the identifier of the speaker, the voice data, and the text to generate a conference summary includes:
and for each target speaker, associating the identifier of the target speaker, the target voice data and the target text to generate the conference summary.
6. The method of claim 3, wherein if a camera is provided on the speaking terminal, the method further comprises:
receiving an update message from the speaking terminal in a conference process, wherein the update message comprises the number of changed speakers;
converting the changed voice data into a changed text through a session listening and recording service or a voice transcription service according to the number of the changed speakers;
correspondingly, the associating the identifier of the speaker, the voice data, and the text to generate a conference summary includes:
and associating the identifier of the speaker, the voice data and the text, and associating the changed identifier of the speaker, the changed voice data and the changed text to generate the conference summary.
7. The method of claim 1, further comprising:
translating the text into a translation text corresponding to a plurality of preset languages;
correspondingly, the associating the identifier of the speaker, the voice data, and the text to generate a conference summary includes:
and associating the identifier of the speaker, the voice data, the text and the translation text to generate the conference summary.
8. The method of claim 2, wherein if the speaking terminal is the first speaking terminal, before the acquiring of the voice data of the speaker through the speaking terminal, the method comprises:
receiving a registration message from the speaking terminal, wherein the registration message comprises a conference number, an identifier of the speaking terminal and a device type, and the device type comprises a single microphone device and a microphone array device;
verifying the speaking terminal according to the conference number;
correspondingly, the acquiring voice data of the speaker through the speaking terminal includes:
and if the verification is passed, receiving voice data of the speaker from the speaker terminal, and acquiring the voice data of the speaker through a voice transcription service.
9. A method for generating a conference summary, applied to a speaking terminal, includes:
acquiring voice data of a speaker, and sending the voice data to a server;
receiving a conference summary from the server.
10. The method of claim 9, wherein if a camera is provided on the speaking terminal, the method further comprises:
in the conference process, whether the number of speakers is changed or not is judged through the camera;
and if so, sending an update message to the server, wherein the update message comprises the number of the changed speakers.
11. The method of claim 9, wherein if the speaking terminal is the first speaking terminal, before the acquiring of the voice data of the speaker, the method further comprises:
and sending a registration message to the server, wherein the registration message comprises a conference number, the identification of the speaking terminal and the equipment type, and the equipment type comprises single-microphone equipment and microphone array equipment.
12. A server comprising a processor, a memory, a transceiver, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-8 when executing the program.
13. A speaking terminal comprising a processor, a memory, a transceiver, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the method of any one of claims 9 to 11.
CN202010052589.5A 2020-01-17 2020-01-17 Conference summary generation method, device and storage medium Active CN113139392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010052589.5A CN113139392B (en) 2020-01-17 2020-01-17 Conference summary generation method, device and storage medium

Publications (2)

Publication Number Publication Date
CN113139392A true CN113139392A (en) 2021-07-20
CN113139392B CN113139392B (en) 2023-08-15

Family

ID=76808284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010052589.5A Active CN113139392B (en) 2020-01-17 2020-01-17 Conference summary generation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113139392B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968991A (en) * 2012-11-29 2013-03-13 华为技术有限公司 Method, device and system for sorting voice conference minutes
CN104427292A (en) * 2013-08-22 2015-03-18 中兴通讯股份有限公司 Method and device for extracting a conference summary
CN107609045A (en) * 2017-08-17 2018-01-19 深圳壹秘科技有限公司 A kind of minutes generating means and its method
CN108922538A (en) * 2018-05-29 2018-11-30 平安科技(深圳)有限公司 Conferencing information recording method, device, computer equipment and storage medium
CN109741754A (en) * 2018-12-10 2019-05-10 上海思创华信信息技术有限公司 A kind of conference voice recognition methods and system, storage medium and terminal
CN110049270A (en) * 2019-03-12 2019-07-23 平安科技(深圳)有限公司 Multi-person conference speech transcription method, apparatus, system, equipment and storage medium

Also Published As

Publication number Publication date
CN113139392B (en) 2023-08-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant