WO2021134284A1 - Voice information processing method, hub device, control terminal and storage medium - Google Patents

Voice information processing method, hub device, control terminal and storage medium

Info

Publication number
WO2021134284A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
interview
record
speaker
voice data
Prior art date
Application number
PCT/CN2019/130075
Other languages
French (fr)
Chinese (zh)
Inventor
郝杰 (Hao Jie)
Original Assignee
Shenzhen Huantai Technology Co., Ltd. (深圳市欢太科技有限公司)
Oppo Guangdong Mobile Telecommunications Co., Ltd. (Oppo广东移动通信有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huantai Technology Co., Ltd. (深圳市欢太科技有限公司) and Oppo Guangdong Mobile Telecommunications Co., Ltd. (Oppo广东移动通信有限公司)
Priority to CN201980101053.3A priority Critical patent/CN114503117A/en
Priority to PCT/CN2019/130075 priority patent/WO2021134284A1/en
Publication of WO2021134284A1 publication Critical patent/WO2021134284A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Definitions

  • the embodiments of the present application relate to the field of voice processing technology, and in particular to a voice information processing method, a hub device, a control terminal, and a storage medium.
  • the embodiments of the present application aim to provide a voice information processing method, a hub device, a control terminal, and a storage medium, which can generate interview records and improve the generation speed and processing efficiency of interview records for voice interviews.
  • the embodiment of the present application provides a voice information processing method, including:
  • after a voice interview starts, the speaker's voice data to be simultaneously interpreted transmitted by the collection terminal is received, and the collection time at which the voice data started to be collected is obtained;
  • the identity information of the speaker is determined based on the voice data to be simultaneously interpreted and a preset mapping relationship, and the voice data to be simultaneously interpreted is translated into the listener's target language in real time to obtain translation information;
  • the preset mapping relationship is the correspondence between the participants' identity information, target languages, and voiceprint information; the listener is any participant other than the speaker;
  • the collection time, the speaker identity information, and the translation information corresponding to the voice data to be simultaneously interpreted are recorded to obtain one piece of record fragment information, so that at least one piece of record fragment information is obtained at the end of the voice interview;
  • an interview record is generated based on the at least one piece of record fragment information.
  • in some embodiments, determining the identity information of the speaker based on the voice data to be simultaneously interpreted and the preset mapping relationship, and translating the voice data to be simultaneously interpreted into the listener's target language in real time to obtain the translation information, includes:
  • determining, from the participants' voiceprint information in the preset mapping relationship, the target voiceprint information matching the voiceprint of the voice data to be simultaneously interpreted, and thereby determining the speaker identity information corresponding to that voiceprint;
  • determining the identity information of the listener, and obtaining the listener's target language based on the correspondence between the participants' identity information and the target languages in the preset mapping relationship.
  • in some embodiments, the recording of the collection time, the speaker identity information, and the translation information corresponding to the voice data to be simultaneously interpreted to obtain one piece of record fragment information, and then the obtaining of at least one piece of record fragment information at the end of the voice interview, includes:
  • in some embodiments, after the collection time, the speaker identity information, and the translation information corresponding to the voice data to be simultaneously interpreted are recorded to obtain one piece of record fragment information, and at least one piece of record fragment information is obtained at the end of the voice interview, the method further includes:
  • summary extraction is performed on the at least one piece of record fragment information to extract full-text summary information;
  • the generating of an interview record based on the at least one piece of record fragment information includes:
  • generating the interview record based on the at least one piece of record fragment information and the full-text summary information.
  • in some embodiments, the method further includes:
  • obtaining the at least one piece of record fragment information and at least one piece of speaker summary information.
  • the generating of an interview record based on the at least one piece of record fragment information includes:
  • generating the interview record based on the at least one piece of record fragment information and the at least one piece of speaker summary information.
  • the generating an interview record based on the information of the at least one record segment includes:
  • the method further includes:
  • the embodiment of the present application also provides a voice information processing method, including:
  • at the end of the interview, an interview trigger instruction is received, and an interview generation instruction is generated in response to the interview trigger instruction; the interview generation instruction is sent to the hub device;
  • the interview record fed back by the hub device is received, the interview record being generated by the hub device in response to the interview generation instruction, based on the preset mapping relationship and the voice data to be simultaneously interpreted received in real time.
  • the method further includes:
  • the interview record is displayed in time-axis order.
  • each interview record segment in the interview record includes: speaker identity information, collection time, the voice data to be simultaneously interpreted, and translation information.
  • the display of the interview records in the order of time axis includes:
  • in some embodiments, each interview record segment in the interview record further includes speaker summary information; after the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged interview record segment are displayed, the method further includes:
  • the speaker summary information is displayed.
  • in some embodiments, the interview record further includes full-text summary information; after the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged interview record segment are displayed, the method further includes:
  • the full-text summary information is displayed before the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged interview record segment.
  • the method further includes:
  • the interview record is edited, and the final interview record is obtained and displayed.
  • the method further includes:
  • the embodiment of the application provides a hub device, including:
  • the first receiving unit is configured to receive the speaker's voice data to be simultaneously interpreted transmitted by the collection terminal after the voice interview starts, and to obtain the collection time at which the voice data started to be collected;
  • the determining unit is configured to determine the identity information of the speaker based on the voice data to be simultaneously interpreted and the preset mapping relationship;
  • the translation unit is configured to translate the voice data to be simultaneously interpreted into the listener's target language in real time to obtain translation information;
  • the preset mapping relationship is the correspondence between the participants' identity information, target languages, and voiceprint information; the listener is any participant other than the speaker;
  • the recording unit is configured to record the collection time, the speaker identity information, and the translation information corresponding to the voice data to be simultaneously interpreted to obtain one piece of record fragment information, so that at least one piece of record fragment information is obtained at the end of the voice interview;
  • the first generating unit is configured to generate an interview record based on the at least one piece of record fragment information.
  • an embodiment of the present application also provides a control terminal, including:
  • the second receiving unit is configured to receive the participants' identity information, target languages, and voiceprint information;
  • the mapping unit is configured to send the preset mapping relationship formed from the participants' identity information, target languages, and voiceprint information to the hub device;
  • the second receiving unit is further configured to receive an interview trigger instruction at the end of the interview;
  • the second generation unit is configured to generate an interview generation instruction in response to the interview trigger instruction;
  • the second sending unit is configured to send the interview generation instruction to the hub device;
  • the second receiving unit is further configured to receive the interview record fed back by the hub device in response to the interview generation instruction, the interview record being generated by the hub device, in response to the interview generation instruction, based on the preset mapping relationship and the voice data to be simultaneously interpreted received in real time.
  • the embodiment of the present application also provides a hub device, including:
  • a first processor and a first memory;
  • the first processor is configured to execute the simultaneous interpretation program stored in the first memory to implement the voice information processing method on the central device side.
  • an embodiment of the present application also provides a control terminal, including: a second processor and a second memory;
  • the second processor is configured to execute the simultaneous interpretation program stored in the second memory to implement the voice information processing method on the control terminal side.
  • an embodiment of the present application provides a storage medium on which a simultaneous interpretation program is stored; when the simultaneous interpretation program is executed by a first processor, the voice information processing method on the hub device side is implemented; or, when the simultaneous interpretation program is executed by a second processor, the voice information processing method on the control terminal side is implemented.
  • the embodiments of the present application provide a voice information processing method, a hub device, a control terminal, and a storage medium, the method including: after a voice interview starts, receiving the speaker's voice data to be simultaneously interpreted transmitted by the collection terminal, and obtaining the collection time at which the voice data started to be collected; determining the identity information of the speaker based on the voice data to be simultaneously interpreted and a preset mapping relationship, and translating the voice data into the listener's target language in real time to obtain translation information, where the preset mapping relationship is the correspondence between the participants' identity information, target languages, and voiceprint information, and the listener is any participant other than the speaker; recording the collection time, the speaker identity information, and the translation information corresponding to the voice data to obtain one piece of record fragment information, so that at least one piece of record fragment information is obtained at the end of the voice interview; and generating an interview record based on the at least one piece of record fragment information.
  • in this way, the hub device can determine the speaker's identity information from the speaker's voice data in a voice interview scene and obtain translation information in the language each listener requires, and at the end of the interview generate the interview record from this information; thus, while performing real-time simultaneous interpretation of the voice data, the hub device also records the identified speaker identity information, translation information, and other structured data as record fragment information.
  • at the end of the interview, the accumulated record fragment information is used to generate the interview record of the voice interview; this improves the efficiency of data collation in voice interviews, that is, the generation speed and processing efficiency of interview records.
  • FIG. 1 is an architecture diagram of a voice information processing system provided by an embodiment of the application
  • FIG. 2 is a first schematic flowchart of a voice information processing method provided by an embodiment of this application
  • FIG. 3 is a second schematic flowchart of a voice information processing method provided by an embodiment of this application.
  • FIG. 4 is a third schematic flowchart of a voice information processing method provided by an embodiment of this application.
  • FIG. 5 is a first schematic flowchart of a voice information processing method provided by an embodiment of this application.
  • FIG. 6 is a second schematic flowchart of a voice information processing method provided by an embodiment of this application.
  • FIG. 7 is a schematic diagram 1 of an exemplary display interface of interview records provided by an embodiment of the application.
  • Fig. 8 is a second schematic diagram of an exemplary display interface for interview records provided by an embodiment of the application.
  • FIG. 9 is a third schematic diagram of an exemplary display interface for interview records provided by an embodiment of the application.
  • FIG. 10 is a fourth schematic diagram of a display interface of an exemplary interview record provided by an embodiment of the application.
  • FIG. 11 is an interaction diagram of a voice information processing method provided by an embodiment of this application.
  • FIG. 12 is a schematic diagram 1 of the composition structure of a hub device provided by an embodiment of the application.
  • FIG. 13 is a second schematic diagram of the composition structure of the hub device provided by an embodiment of the application.
  • FIG. 14 is a schematic diagram 1 of the composition structure of a control terminal provided by an embodiment of the application.
  • FIG. 15 is a second schematic diagram of the composition structure of a control terminal provided by an embodiment of the application.
  • the embodiment of the application provides a voice information processing method, which is implemented by a voice information processing device.
  • the voice information processing device provided in the embodiment of the application may include a central device, a control terminal, and a transceiver integrated terminal (including a collection terminal and a receiving terminal).
  • FIG. 1 is a schematic structural diagram of a voice information processing system to which the voice information processing method is applied; as shown in FIG. 1, the voice information processing system may include: a hub device 1, a control terminal 2, and multiple transceiver integrated terminals 3 (including a collection terminal 3-1 and a receiving terminal 3-2).
  • the speaker can use the integrated transceiver terminal 3 (ie collection terminal 3-1) that he wears to give conference lectures.
  • the collection terminal 3-1 collects the speaker's voice data (that is, the voice data to be simultaneously interpreted) and transmits it to the hub device 1 in real time.
  • after the hub device 1 obtains the voice data to be simultaneously interpreted, it acquires the time at which it started receiving that voice data as the collection time; it then determines, based on the voice data and the preset mapping relationship, the identity information of the speaker who is currently speaking, and at the same time translates the voice data into each listener's target language in real time to obtain the translation information (specifically, the translation result in the language each listener requires); the translation information is transmitted in real time to the integrated terminals of the listeners attending the meeting (i.e., the receiving terminals 3-2), and the voice data to be simultaneously interpreted, collection time, speaker identity information, and translation information corresponding to each speaker are recorded as the record fragment information corresponding to that speaker.
  • the hub device 1 can receive the interview generation instruction sent by the control terminal 2 and, according to that instruction, generate the interview record from all the record fragment information obtained in the meeting, and finally send the interview record to the control terminal 2.
  • the control terminal 2 can display the interview record, or share the interview record with the user terminal owned by the participant participating in the meeting through the control terminal 2, so that the participant can access or browse the meeting record.
  • FIG. 2 is a first schematic flowchart of a voice information processing method provided by an embodiment of the application. As shown in Figure 2, when applied to a hub device, the voice information processing method includes the following steps:
  • the voice information processing method provided by the embodiment of this application can be applied to international conferences, international interviews, or various scenarios that require simultaneous interpretation and translation, and the embodiments of this application are not limited.
  • application scenarios can also be divided into large-scale international conferences, small-scale work conferences, public service venues, public social venues, social applications, and general scenarios.
  • public service places may be waiting halls, government office halls, etc.
  • public social places may be coffee shops, concert halls, etc.
  • the actual application scenario corresponding to the voice data to be simultaneously interpreted is the specific application scenario in which that voice data is collected.
  • the specific actual application scenario is not limited in the embodiment of this application.
  • the hub device communicates with the integrated terminal and the control terminal.
  • the integrated terminal is a transceiver integrated terminal worn by participants participating in a voice interview, such as a headset/microphone integrated terminal.
  • the integrated terminal worn by each speaker can also be called a collection terminal, and the integrated terminals worn by the other listeners at that time can be called receiving terminals.
  • the communication mode may be wireless communication technology, wired communication technology, near field communication technology, etc., for example, Bluetooth or Wi-Fi, etc., which is not limited in the embodiment of the present application.
  • the integrated terminal has both a headset and a microphone.
  • the microphone is used to collect voice data for simultaneous interpretation when speaking, and the headset is used to play translation information when listening. Therefore, each participant can be either a speaker or a listener, and the specific decision is based on actual conditions, and the embodiment of the present application does not limit it.
  • the speaker uses the collection terminal to collect the voice data to be simultaneously interpreted and transmits it to the hub device in real time.
  • the hub device also obtains the time at which it starts receiving the voice data to be simultaneously interpreted, that is, the collection time.
  • each speaker's voice data to be simultaneously interpreted is transmitted in real time, but the hub device records only the time at which each speaker starts to speak as the collection time.
  • the voice data to be simultaneously translated may be any voice that requires voice translation, for example, voice collected in real time in an application scenario.
  • the voice data to be interpreted can be voices in any type of language.
  • the specific voice data to be simultaneously transmitted is not limited in this embodiment of the application.
  • the speaker in this application is whichever participant is speaking at a given time, which is not limited in the embodiments of this application.
  • S102: Determine the identity information of the speaker based on the voice data to be simultaneously interpreted and the preset mapping relationship, and translate the voice data into the listener's target language in real time to obtain the translation information; the preset mapping relationship is the correspondence between the participants' identity information, target languages, and voiceprint information; the listener is any participant other than the speaker.
  • the preset mapping relationship is stored in the hub device before the voice data to be simultaneously interpreted is obtained; the preset mapping relationship is the correspondence between the participants' identity information, target languages, and voiceprint information.
  • the hub device can first find the target voiceprint information matching the voice data to be simultaneously interpreted among the participants' voiceprint information stored in the preset mapping relationship; it then looks up the participant identity information corresponding to that target voiceprint information (that is, the speaker's identity information) from the correspondence between participants' identity information and voiceprint information, confirms the listeners' identity information, determines each listener's target language from the correspondence between participants' identity information and target languages, and finally translates the voice data into each listener's target language to obtain the translation information, so that every listener can hear the speaker's speech in a familiar language through the receiving terminal.
  • the listener is a person other than the speaker among the participants.
  • the speaker identity information may be the name or the unique identifier of the speaker, which is not limited in the embodiment of the present application.
  • the preset mapping relationship is the correspondence between the participants' identity information, target languages, and voiceprint information; it can be expressed as comprising the correspondence between identity information and target language, the correspondence between identity information and voiceprint information, the relationship between target language and voiceprint information, and databases of the participants' voiceprint information, identity information, and target languages.
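As a hedged illustration (not taken from the patent text), the preset mapping relationship described above could be represented as one entry per participant linking identity information, target language, and a registered voiceprint vector; all names, fields, and values below are hypothetical:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ParticipantMapping:
    # One row of the preset mapping relationship: identity information,
    # target language, and registered voiceprint information.
    identity: str
    target_language: str
    voiceprint: List[float]

# Hypothetical participants registered before the voice interview starts.
PRESET_MAPPING = [
    ParticipantMapping("Alice", "en", [0.12, 0.80, 0.05]),
    ParticipantMapping("Bob",   "zh", [0.90, 0.10, 0.33]),
    ParticipantMapping("Chloe", "fr", [0.40, 0.42, 0.77]),
]

def target_language_of(identity: str) -> str:
    """Resolve a participant's target language from the preset mapping."""
    for entry in PRESET_MAPPING:
        if entry.identity == identity:
            return entry.target_language
    raise KeyError(identity)
```

In practice the voiceprint would be a speaker embedding produced by a voiceprint model during registration, not a short hand-written vector.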
  • the hub device determines, from the participants' voiceprint information in the preset mapping relationship, the target voiceprint information matching the voiceprint of the voice data to be simultaneously interpreted; based on the target voiceprint information and the correspondence between participants' identity information and voiceprint information, it determines the speaker identity information corresponding to that voiceprint, and, based on the correspondence between participants' identity information and target languages, obtains each listener's target language; finally, the voice data is translated into the listener's target language in real time to obtain the translation information.
  • since the hub device has learned the correspondence between target languages and voiceprint information, and the participants other than the speaker are the listeners, once the target voiceprint information corresponding to the speaker is determined, the target languages corresponding to the remaining unmatched voiceprint information are the listeners' target languages; in this way, the target language corresponding to each listener can be determined.
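The matching step above can be sketched as a nearest-voiceprint lookup: the best-matching registered voiceprint identifies the speaker, and everyone else becomes a listener whose target language is collected for translation. This is a minimal sketch assuming cosine similarity over voiceprint vectors; the patent does not specify the matching metric:

```python
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Standard cosine similarity between two voiceprint vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def identify_speaker(utterance_voiceprint: List[float],
                     mapping: List[dict]) -> Tuple[str, Dict[str, str]]:
    """Pick the participant whose registered voiceprint best matches the
    utterance as the speaker; every other participant is a listener, and
    their target languages are collected for real-time translation."""
    speaker = max(mapping,
                  key=lambda m: cosine_similarity(utterance_voiceprint,
                                                  m["voiceprint"]))
    listener_langs = {m["identity"]: m["target_language"]
                      for m in mapping if m is not speaker}
    return speaker["identity"], listener_langs
```

A production system would threshold the similarity score to reject voices that match no registered participant.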
  • the hub device has built-in functions such as speech recognition (ASR, Automatic Speech Recognition), speech synthesis (TTS, Text-To-Speech), voiceprint recognition, translation, and recording (supporting online or offline mode); it has networking and communication functions and can interact with the control terminal and the integrated terminals.
  • the hub device combines voiceprint recognition, ASR technology, machine translation technology, and TTS technology to build a simultaneous interpretation system for interview scenarios, removing communication barriers between different languages.
  • the hub device uses voiceprint recognition technology to identify, from the participants' voiceprint information in the preset mapping relationship, the target voiceprint information matching the voice data to be simultaneously interpreted, and adopts machine translation technology to translate that voice data into each listener's target language in real time, obtaining translation information that each listener can understand.
  • after the hub device translates the translation information required by each listener in real time, it can send the translation information in real time to the receiving terminals of the listeners participating in the voice interview, so that each listener hears the speech in a familiar language through his or her receiving terminal, that is, as translated voice data.
  • when the hub device communicates with each integrated terminal, the correspondence between each integrated terminal and its participant is stored, so the hub device can accurately send the data intended for a participant to the corresponding integrated terminal. In this way, the hub device can send the translation information obtained for each listener's target language to that listener's receiving terminal.
  • the translation information includes translated text information and translated voice data.
  • that is, after translating the voice data to be simultaneously interpreted into translated text information, the hub device uses TTS technology to convert the translated text information into translated voice data. In this way, the hub device can send both the translated voice data and the translated text information in real time to the receiving terminals of the listeners participating in the voice interview.
  • the translated text information may also be displayed on the display for the listener to watch.
  • the specific implementation of the embodiment of the present application is not limited.
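The chain of engines described above (ASR, machine translation per listener, TTS) can be sketched end to end. The four engine functions below are placeholder stubs, since the patent does not name concrete models; only the orchestration logic is the point:

```python
from typing import Dict, Tuple

def asr(audio: bytes) -> str:
    # Stub for speech recognition: voice data -> source text information.
    return "hello everyone"

def translate(text: str, target_lang: str) -> str:
    # Stub for machine translation into one listener's target language.
    return f"[{target_lang}] {text}"

def tts(text: str) -> bytes:
    # Stub for speech synthesis: translated text -> translated voice data.
    return text.encode("utf-8")

def interpret_for_listeners(audio: bytes,
                            listener_langs: Dict[str, str]
                            ) -> Dict[str, Tuple[str, bytes]]:
    """Produce (translated text, translated voice data) per listener, the
    two forms of translation information the hub device sends to each
    receiving terminal."""
    source_text = asr(audio)
    results = {}
    for listener, lang in listener_langs.items():
        translated_text = translate(source_text, lang)
        results[listener] = (translated_text, tts(translated_text))
    return results
```

Running ASR once and translating per target language, as above, avoids recognizing the same utterance repeatedly for every listener.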
  • S103 Record the collection time, speaker identity information, and translation information corresponding to the voice data to be simultaneously translated to obtain one piece of recorded information, and then at the end of the voice interview, obtain at least one piece of recorded information.
  • after the hub device obtains the translation information corresponding to each listener, it records the collection time, speaker identity information, and translation information corresponding to the voice data to be simultaneously interpreted to obtain one piece of record fragment information for that speaker, and then continues to obtain the record fragment information of the next speaker; thus, at the end of the voice interview, the hub device has obtained at least one piece of record fragment information corresponding to the various speakers.
  • a piece of record fragment information may include the voice data to be simultaneously interpreted, speaker identity information, translation information, and collection time.
  • Each record fragment can use fields for data recording.
  • the hub device can perform text recognition on the voice data to be simultaneously interpreted to obtain source text information, and record the collection time, speaker identity information, translation information, and source text information corresponding to the voice data until the target voiceprint information changes, at which point one piece of record fragment information is obtained; at the end of the voice interview, at least one piece of record fragment information has thus been obtained.
  • a piece of record segment information may also include: source text information.
  • the hub device adopts ASR technology to convert the voice data to be simultaneously interpreted into source text information.
  • the hub device recognizes the speaker's identity from the real-time voice data using voiceprint recognition technology; when the identity of the speaker changes at some moment, it means the previous speaker has finished speaking and the next speaker has begun, so the flow returns to S101 and the information corresponding to the previous speaker is recorded as one piece of record fragment information; in this way, at the end of the interview, the hub device has obtained at least one piece of record fragment information.
  • At least one recorded segment information may include recorded segment information of the same speaker at different moments, which is based on actual recording conditions and is not limited in the embodiment of the present application.
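The speaker-change rule above (close the current record fragment whenever the identified speaker changes, so the same speaker may yield multiple fragments at different moments) can be sketched as follows; the frame format is an assumption made for illustration:

```python
from typing import List, Tuple

def segment_by_speaker(frames: List[Tuple[float, str, str]]) -> List[dict]:
    """Group a stream of (timestamp, identified speaker, recognized text)
    frames into record fragments, closing the current fragment whenever
    the identified speaker changes."""
    fragments: List[dict] = []
    for timestamp, speaker, text in frames:
        if fragments and fragments[-1]["speaker"] == speaker:
            # Same speaker as the open fragment: extend it.
            fragments[-1]["source_text"] += " " + text
        else:
            # Speaker changed: start a new record fragment whose
            # collection time is the first timestamp of the utterance.
            fragments.append({"speaker": speaker,
                              "collection_time": timestamp,
                              "source_text": text})
    return fragments
```

If the same speaker talks again after someone else, a new fragment is opened, which matches the observation that one speaker can produce record fragments at different moments.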
  • the speaker's name is the speaker's identity information
  • the time stamp is the collection time
  • the audio is the voice data to be interpreted simultaneously
  • the speech text is the source text information
  • the translated text is the translation information.
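The field layout of one record fragment listed above (name, time stamp, audio, speech text, translated text) can be sketched as a simple structure. The field names below are illustrative, not taken from the patent's implementation:

```python
from dataclasses import dataclass

@dataclass
class RecordFragment:
    speaker_name: str     # the speaker's identity information
    timestamp: str        # the collection time, e.g. "2019.08.31 22:10:15"
    audio_path: str       # the voice data to be interpreted
    source_text: str      # the speech text (ASR output)
    translated_text: str  # the translation information
```

One such object would be created each time the target voiceprint changes, and the list of all objects at the end of the interview is the "at least one piece of record fragment information".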
  • S104 Generate an interview record based on the information of at least one record segment.
  • the central device has recorded at least one piece of record fragment information, so that it can generate an interview record for the interview based on the at least one piece of record fragment information.
  • the hub device can communicate with the control terminal, which is used to receive some conventional input settings, such as the target language, the number of participants, and the listening language of each integrated terminal.
  • the control terminal can also control functions of the hub device, for example the function of generating interview records.
  • the hub device may receive an interview generation instruction sent by the control terminal; in response to the interview generation instruction, generate interview records for at least one piece of record information in the order of the time axis; the hub device sends the interview records to the control terminal.
  • control terminal side can be provided with an input device.
  • the user can generate an interview generation instruction through the input device and send it to the central device; since the central device has recorded at least one piece of record fragment information for the voice interview, it generates an interview record from the at least one piece of record fragment information and then sends the interview record to the control terminal for presentation.
  • in the voice interview scene, the central device can determine the speaker identity information from the speaker's voice data and obtain translation information in the language each listener needs, and generate the interview record for the interview based on this information. The central device can thus perform real-time simultaneous interpretation of the voice data while also recording the identified speaker identity information, translation information, and other structured data as record fragment information; at the end of the interview, the multiple pieces of record fragment information are used to generate the interview record of the voice interview. This improves the efficiency of data collation in the voice interview, that is, the generation speed and processing efficiency of the interview record.
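Generating the interview record from the accumulated fragments amounts to sorting them along the time axis and rendering each one. A minimal sketch, assuming each fragment is a plain dict with illustrative keys `time`, `speaker`, `source_text`, and `translated_text`:

```python
def generate_interview_record(fragments):
    """Arrange record fragments in time-axis order and render a plain-text
    interview record. Field names are illustrative."""
    lines = []
    for f in sorted(fragments, key=lambda f: f["time"]):
        lines.append(f"{f['speaker']} ({f['time']})")
        lines.append(f"  original: {f['source_text']}")
        lines.append(f"  translation: {f['translated_text']}")
    return "\n".join(lines)
```

The sort on collection time is what produces the time-axis ordering described for the interview record.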
  • an embodiment of the present application further provides a voice information processing method, including S105-S106, as follows:
  • S105: Use abstract extraction technology to perform summary extraction on the at least one piece of record fragment information to obtain full-text summary information.
  • S106: Generate the interview record based on the at least one piece of record fragment information and the full-text summary information.
  • the hub device recorded at least one piece of record information.
  • the hub device can also use abstract extraction technology (such as the TextRank algorithm) to extract at least one piece of record information to extract the full text summary information.
  • the summary information represents the summary of the main content discussed by the speakers in this voice interview, that is, the full-text summary information is the full-text summary extracted after summarizing all the speeches of each speaker.
  • the central device can generate an interview record for this interview based on at least one piece of record information and full-text summary information.
  • the full text summary information can be placed at the beginning of the interview record to facilitate the user to roughly understand the content of the interview and decide whether to continue reading the interview record.
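The document names TextRank as one possible abstract extraction technology. As a stand-in, the sketch below uses a much simpler word-frequency heuristic (not TextRank itself, and not the patent's implementation): it scores each sentence by the corpus frequency of its words and keeps the top-k sentences in their original order.

```python
import re
from collections import Counter

def extract_summary(sentences, k=1):
    """Simplified extractive summarizer standing in for TextRank: score each
    sentence by the total frequency of the words it contains and return the
    top-k sentences in original order."""
    words = Counter(w for s in sentences for w in re.findall(r"\w+", s.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(words[w] for w in re.findall(r"\w+", sentences[i].lower())),
    )
    keep = sorted(ranked[:k])
    return [sentences[i] for i in keep]
```

Applied to all speeches it yields the full-text summary information; applied to a single fragment it yields the speaker summary information discussed below.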
  • after the recording in S103 of the collection time, speaker identity information, and translation information corresponding to the voice data to be interpreted to obtain one piece of record fragment information, the method further includes S107-S109, as follows:
  • the hub device can use abstract extraction technology to perform summary extraction on one piece of record fragment information to extract speaker summary information; that is, after each record fragment is generated, its speaker summary information can be extracted at the same time. In this way, at the end of the voice interview, the hub device has obtained at least one piece of record fragment information and at least one piece of speaker summary information; finally, the hub device can generate the interview record based on the at least one piece of record fragment information and the at least one piece of speaker summary information.
  • the hub device may also perform summary extraction on each recorded segment information after acquiring at least one recorded segment information to obtain at least one speaker summary information, which is not limited in the embodiment of the present application.
  • the central device can summarize the central idea of each recorded piece of information into speaker summary information, so that the reading efficiency of each spoken information in the generated interview record can be improved.
  • the generation of interview records can also be based on full-text summary information, at least one speaker summary information, and at least one piece of record information at the same time.
  • the generated interview record then includes not only the content summary of the full text but also the content summary of each piece of speech information, which greatly improves the presentation of the main ideas in the interview record and thereby increases its richness and diversity.
  • the hub device may also be provided with a loudspeaker or connected to a loudspeaker.
  • the control terminal can set the playback language of the loudspeaker, and a participant's speech can be converted into that playback language and played through the loudspeaker.
  • FIG. 5 is a first schematic flowchart of a voice information processing method provided by an embodiment of this application. As shown in Figure 5, when applied to a control terminal, the voice information processing method includes the following steps:
  • S201 Receive participant identity information, target language, and participant's voiceprint information.
  • S202 Send the preset mapping relationship formed by the participant's identity information, the target language, and the participant's voiceprint information to the central device.
  • the control terminal may be a smart device installed with a designated application (such as a simultaneous interpretation application that implements the voice processing method provided in this application), such as a smart phone, a tablet computer, or a computer, which is not limited in the embodiment of this application.
  • the control terminal can communicate with the central device.
  • the control terminal side can be provided with an input device through which the user can input some general settings, such as the target language, the number of participants, and the listening language of each earphone/microphone terminal.
  • before the voice interview is conducted, the user can enter the relevant information of each participant through the control terminal, for example the participant identity information and the target language; at the same time, the participant's voiceprint can be collected by recording through the integrated terminal.
  • after the participant's voiceprint information is sent to the control terminal through the central device, the control terminal associates the participant identity information, the participant's voiceprint information, and the target language to form the preset mapping relationship.
  • the control terminal can then send the preset mapping relationship to the central device for its use.
  • alternatively, the control terminal may send the participant identity information and the target language to the central device while the integrated terminal sends the participant's voiceprint information to the central device, and the central device associates the participant identity information, the participant's voiceprint information, and the target language to form the preset mapping relationship. The process of obtaining the preset mapping relationship can therefore be selected according to actual conditions and is not limited in the embodiments of the present application.
  • the participant's identity information may be the participant's name, or a unique identification, etc., which is not limited in the embodiment of the present application.
  • the preset mapping relationship is the correspondence between the participant identity information, the target language, and the participant's voiceprint information; it may be expressed as a correspondence between participant identity information and target language, a correspondence between participant identity information and participant voiceprint information, a correspondence between target language and participant voiceprint information, or as a participant voiceprint information library, a participant identity information library, and a target language library, etc.
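One simple way to hold the three-way correspondence is a per-participant table, so that the target language or voiceprint can be looked up from the identity information. This layout and its field names are illustrative assumptions, not the patent's data model:

```python
# Hypothetical preset mapping relationship: identity -> {target language, voiceprint}.
preset_mapping = {
    "Speaker A": {"target_language": "zh", "voiceprint": [0.12, 0.80, 0.31]},
    "Speaker B": {"target_language": "en", "voiceprint": [0.77, 0.05, 0.44]},
}

def target_language_of(name):
    """Look up a listener's target language from their identity information."""
    return preset_mapping[name]["target_language"]
```

Equivalently, the same table could be split into the separate correspondence libraries the text mentions; the lookup logic is unchanged.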
  • at the end of the interview, an interview trigger instruction is received, and an interview generation instruction is generated in response to the interview trigger instruction and sent to the central device.
  • S205: Receive the interview record fed back by the central device in response to the interview generation instruction.
  • the interview record is generated by the central device in response to the interview generation instruction, based on the preset mapping relationship and real-time received voice data for simultaneous interpretation.
  • the control terminal can control the central device to generate interview records.
  • the user can trigger the interview record generation function through the control terminal, which generates an interview trigger instruction and, in response, an interview generation instruction that is sent to the central device, so that the central device can record the determined speaker identity information, translation information, and other structured data as record fragment information while performing real-time simultaneous interpretation of the voice data to be interpreted.
  • the central device generates the interview record from the at least one piece of record fragment information and then sends the interview record to the control terminal for presentation.
  • the interview record is generated by the central device in response to the interview generation instruction, based on the preset mapping relationship and the voice data to be interpreted received in real time.
  • control terminal can obtain the interview record from the central device.
  • the interview record is the information that records the content of the voice interview. In this way, the user can obtain or watch the interview record through the control terminal, which is convenient and fast.
  • in this way, the intelligence of the control terminal is improved.
  • after S205, the method further includes: S206; or S207-S208; or S209-S211, as follows:
  • after the control terminal obtains the interview record, because the interview record contains information related to each spokesperson's speech in the voice interview, and the order of the speeches is temporal, the content of the interview record is also temporal; the control terminal can therefore display the interview record in the order of the time axis.
  • each segment of the interview record in the interview record may include: speaker identity information, collection time, voice data to be simultaneously translated, and translation information.
  • the control terminal can arrange each segment of the interview record in the order of the time axis according to the collection time, and display the speaker identity information, the voice data to be interpreted, and the translation information corresponding to each arranged segment.
  • the voice data and translation information corresponding to each arranged segment of the interview record can be arranged, via function buttons, in the area corresponding to the speaker's identity information.
  • when a function button is triggered, the content corresponding to that function button is displayed.
  • the voice data to be interpreted corresponds to text information, that is, the source text information; the translation information may also include translated text information and translated voice data.
  • for convenience of display, a function button may be set so that one button corresponds to one type of content, or two buttons may be triggered jointly to combine multiple types of content.
  • source text information corresponds to a function button 1
  • translated text information corresponds to a function button 2
  • voice data corresponds to a function button 3.
  • when button 3 is triggered, the voice data to be interpreted is played; when buttons 2 and 3 are triggered together, the translated voice data is played; when button 1 is triggered, the source text information is displayed; when button 2 is triggered, the translated text information is displayed.
  • a comparison button can also be set to display the source text information and the translated text information simultaneously, etc.
  • the embodiments of this application do not limit the setting method and arrangement of the buttons based on the content displayed in the interview record.
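The button-to-content dispatch described above can be sketched as follows. The set-based interface and the fragment field names are assumptions for illustration; the mapping of button combinations follows the example in the text (1 = source text, 2 = translated text, 3 = audio):

```python
def on_buttons(pressed, fragment):
    """Dispatch a set of triggered function-button ids to an action.
    `fragment` is a dict with illustrative keys for one interview-record segment."""
    if pressed == {3}:
        return ("play", fragment["audio"])             # play voice data to be interpreted
    if pressed == {2, 3}:
        return ("play", fragment["translated_audio"])  # play translated voice data
    if pressed == {1}:
        return ("show", fragment["source_text"])       # display source text information
    if pressed == {2}:
        return ("show", fragment["translated_text"])   # display translated text information
    if pressed == {1, 2}:
        # a "contrast" view: source and translation side by side
        return ("show", (fragment["source_text"], fragment["translated_text"]))
    raise ValueError("unsupported button combination")
```

A dedicated comparison button could equally map to the `{1, 2}` case.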
  • the speaker’s language is English and the translation language is Chinese.
  • the control terminal will obtain Xxx interview records.
  • the display interface is equipped with four function buttons: “original”, “translation”, “sound”, and “contrast”.
  • the speakers include: Speaker A, Speaker B, and Speaker C. There are four function buttons, "original", "translation", "sound", and "contrast", in the corresponding area 1 behind each speaker's identity information.
  • the collection time is set (for example, Speaker A: 2019.08.31 22:10:15; Speaker B: 2019.08.31 22:12:10; Speaker C: 2019.08.31 22:13:08), that is, the start time of each speech, and the segments of the interview record are arranged according to the speech time for display.
  • the translated text information is displayed in area 3: "I think China will be far ahead in the 5G competition! In the next one to two years, 5G will begin to be applied and achieve explosive growth".
  • the source text information is displayed in area 4: "I agree with you very much, I think 5G will bring a lot of new opportunities”.
  • the translated voice data is played in area 5.
  • the source text information is displayed in area 6: "I believe that China will lead in the 5G competition! In the next one to two years, 5G will begin to apply and achieve explosive growth” Contrast with the translated text message: "I think China will be far ahead in the 5G race! In the next one to two years, 5G will begin to be applied and achieve explosive growth”.
  • the interview record of each speaker's speech has a time stamp, and you can choose the original text, translation, audio, and comparison options.
  • each segment of the interview record may also include speaker summary information; after arranging and displaying the speaker identity information, the voice data to be interpreted, and the translation information corresponding to each segment, the control terminal may also display the speaker summary information in a first preset area within the display area of each segment.
  • if a speaker's speech fragment is relatively long (for example, more than 70 words), the central device generates a speech fragment summary (that is, speaker summary information) and carries the speaker summary information in the interview record sent to the control terminal, so that readers can read the speaker summary information on the control terminal and decide whether to read the specific speech information.
  • the central device can summarize the central idea of each recorded piece of information into speaker summary information, so that the speaker summary information displayed by the control terminal can allow readers to quickly understand the main content of each interview record, thereby Improve the reading efficiency of each statement in the generated interview records.
  • the translated text information is: "I think China will be far ahead in the 5G competition! In the next one to two years, 5G will begin to be applied and achieve explosive growth. In the previous 2G/3G/4G eras, China was always in a passive position, which made foreigners think that China could not develop 5G first. Unexpectedly, China has become a leader in the 5G era."
  • the extracted speaker summary was "A believes that China's 5G is in a leading position and has achieved large-scale growth", so the control terminal can display the speaker summary in area 11 of the interview record display interface.
  • the speaker summary of each speaker can be displayed together with each segment of the interview record, or it can be displayed in area E through user manipulation on the display interface of the interview record.
  • the specific implementation is not limited in the embodiments of the present application.
  • the interview record also includes full-text summary information; when displaying the speaker identity information, the voice data to be interpreted, and the translation information corresponding to each arranged segment of the interview record, the control terminal can also display the full-text summary information in front of them.
  • the central device can use abstract extraction technology (such as the TextRank algorithm) to abstract at least one recorded segment information to extract full-text summary information, where the full-text summary information represents the main content of the speakers in the voice interview Summary, that is, the full-text summary information is the full-text summary extracted after summarizing all the speeches of each speaker.
  • the central device can generate an interview record for this interview based on at least one recorded fragment information and full-text summary information, and send it to the control terminal.
  • the full text summary information can be placed at the beginning of the interview record to facilitate the user to roughly understand the content of the interview and decide whether to continue reading the interview record.
  • after the control terminal receives the interview record, the machine-processed interview record may still contain errors or need manual additions and polishing, so the control terminal also provides an editing function.
  • the control terminal receives an editing instruction; in the display interface of the interview record it responds to the editing instruction, obtains the editing information, edits the interview record according to the obtained editing information, and obtains and displays the final interview record.
  • the editing function improves the content of the interview record and eliminates some errors, so the final interview record is more accurate and complete.
  • after the control terminal receives the interview record, there may be a need to share it, so the control terminal also provides a sharing function. When the user triggers the sharing function, the control terminal receives an export instruction; in the display interface of the interview record it responds to the export instruction, obtains the export format, processes the interview record into the preset format according to the obtained export format to obtain an export file, and finally shares the export file.
  • the export format may include: HTML format, txt format, PDF format, etc., any text format or web page format, etc., and the embodiment of the application does not limit it.
  • when the control terminal exports the interview record, it can export different formats for different purposes and platforms, such as exporting plain text in txt format, exporting to PDF format for archiving, or exporting to HTML format for sharing.
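The format dispatch for exporting can be sketched as below. Only the txt and a minimal HTML wrapper are shown; a real PDF export would typically go through an external library, so it is left out of this illustrative sketch:

```python
def export_record(record_text, fmt):
    """Render the interview record text in an export format ("txt" or "html").
    The HTML wrapper here is deliberately minimal."""
    if fmt == "txt":
        return record_text
    if fmt == "html":
        body = record_text.replace("\n", "<br>\n")
        return f"<html><body>{body}</body></html>"
    raise ValueError(f"unsupported export format: {fmt}")
```

The exported file could then carry an attached link to the stored voice data, as described next, so that text and audio are shared together.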
  • the voice data related to the interview can also be stored in the central device or in the cloud, so that when the control terminal exports the interview record it can also attach a link for sharing the voice, combining the interview record with the voice.
  • the link can be shared with others, so that others can obtain both the text information and the voice information of the interview record.
  • the sharing function provides the function of sharing interview records, which improves the intelligence and diversity in the voice interview scene.
  • the voice information processing method provided in this application solves the problem of interview record generation, greatly improves the efficiency of post-processing reports, and provides convenient functions such as original text, translation, comparison, audio, and abstract.
  • the embodiment of the present application provides a voice information processing method, as shown in FIG. 11, including:
  • the control terminal receives the participant's identity information, the target language, and the participant's voiceprint information.
  • the control terminal sends the preset mapping relationship formed by the participant's identity information, the target language, and the participant's voiceprint information to the central device.
  • the central device receives, from the collection terminal, the speaker's voice data to be interpreted, and obtains the collection time of the voice data to be interpreted.
  • S304: Determine the speaker identity information based on the voice data to be interpreted and the preset mapping relationship, and translate the voice data to be interpreted into the listener's target language in real time to obtain the translation information;
  • the preset mapping relationship is the correspondence between the participant identity information, the target language, and the participant's voiceprint information; the listener is a participant other than the speaker.
  • the central device sends the translation information to the receiving terminal in real time, so that the receiving terminal can play the translation information.
  • the hub device records the collection time, speaker identity information, and translation information corresponding to the voice data to be simultaneously translated to obtain a piece of recorded information, and then at the end of the voice interview, obtain at least one piece of recorded information.
  • the central device receives the interview generation instruction sent by the control terminal; in response to the interview generation instruction, generates an interview record for at least one piece of record information in the order of the time axis.
  • the central device sends the interview record to the control terminal.
  • the control terminal displays the interview record.
  • the participant's name (participant identity information) and language (target language) are entered through the control terminal, the participant's voiceprint information is collected through the headset/microphone integrated terminal, and the central device records the participant identity information and target language,
  • so that the preset mapping relationship is obtained. The simultaneous interpretation module in the central device starts working at the beginning of the interview; the speaker's voice data to be interpreted is recorded through the headset/microphone integrated terminal and sent to the central device in real time, and the central device completes transcription, translation, recording, and translated-voice generation according to the preset mapping relationship, obtains the record fragment information, and then sends the translated voice to the headset/microphone integrated terminals of the other participants (the listeners), who can hear the corresponding translated audio.
  • at the end of the interview, the central device generates the interview record from the multiple pieces of record fragment information, and the interview record can finally be edited through the control terminal to obtain and display the final interview record.
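The per-utterance loop of the flow above (identify the speaker, transcribe, translate, deliver to listeners, accumulate fragments) can be sketched end to end. The component functions are passed in as parameters because the real voiceprint, ASR, and translation engines are outside the scope of this illustrative sketch:

```python
def run_interview(audio_stream, identify, transcribe, translate, deliver):
    """End-to-end sketch of the interpretation loop: for each (time, audio)
    utterance, identify the speaker, transcribe and translate the speech,
    deliver the translation to listeners, and accumulate record fragments."""
    fragments = []
    for t, audio in audio_stream:
        speaker = identify(audio)        # voiceprint matching
        text = transcribe(audio)         # ASR -> source text information
        translation = translate(text)    # machine translation
        deliver(speaker, translation)    # listeners other than the speaker hear it
        fragments.append({"time": t, "speaker": speaker,
                          "source_text": text, "translated_text": translation})
    return fragments
```

The returned fragment list is the input from which the interview record is generated at the end of the interview.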
  • an embodiment of the present application provides a hub device 1, and the hub device 1 may include:
  • the first receiving unit 10 is configured to receive the voice data for simultaneous interpretation of the speaker transmitted by the collection terminal after the voice interview starts, and obtain the collection time when the voice data for simultaneous interpretation starts to be collected;
  • the determining unit 11 is configured to determine the identity information of the speaker based on the voice data to be simultaneously transmitted and the preset mapping relationship;
  • the translation unit 12 is used for real-time translation of the voice data to be simultaneously interpreted into the target language of the listener to obtain translation information;
  • the preset mapping relationship is among the identity information of the participant, the target language, and the voiceprint information of the participant Correspondence between; wherein, the listener is a person other than the speaker among the participants;
  • the recording unit 13 is used to record the collection time, the speaker identity information, and the translation information corresponding to the voice data to be interpreted to obtain one piece of record fragment information, and then obtain at least one piece of record fragment information at the end of the voice interview;
  • the first generating unit 14 is configured to generate an interview record based on the at least one record segment information.
  • the determining unit 11 is further configured to determine, from the participant voiceprint information in the preset mapping relationship, the target voiceprint information that matches the voiceprint of the voice data to be interpreted; to determine, based on the target voiceprint information and the correspondence between participant identity information and participant voiceprint information in the preset mapping relationship, the speaker identity information corresponding to the voice data to be interpreted; and to obtain, based on the correspondence between participant identity information and target language in the preset mapping relationship, the listener's target language corresponding to each listener;
  • the translation unit 12 is also used to translate the voice data to be interpreted into the listener's target language in real time to obtain the translation information.
  • the recording unit 13 is further configured to perform text recognition on the voice data to be interpreted to obtain source text information, and to record the collection time, the speaker identity information, the translation information, and the source text information corresponding to the voice data to be interpreted until the target voiceprint information changes, thereby obtaining one piece of record fragment information, and then obtaining the at least one piece of record fragment information at the end of the voice interview.
  • the hub device 1 further includes: an extracting unit 15;
  • the extraction unit 15 is configured to, after the collection time, speaker identity information, and translation information corresponding to the voice data to be interpreted have been recorded to obtain one piece of record fragment information and at least one piece of record fragment information has been obtained at the end of the voice interview, use abstract extraction technology to perform summary extraction on the at least one piece of record fragment information to extract the full-text summary information;
  • the first generating unit 14 is further configured to generate the interview record based on the at least one record fragment information and the full-text summary information.
  • the hub device 1 further includes: an extracting unit 15;
  • the extraction unit 15 is configured to, after the collection time, speaker identity information, and translation information corresponding to the voice data to be interpreted have been recorded to obtain one piece of record fragment information, use abstract extraction technology to perform summary extraction on the one piece of record fragment information to extract speaker summary information, and to obtain, at the end of the voice interview, the at least one piece of record fragment information and the at least one piece of speaker summary information.
  • the first generating unit 14 is further configured to generate the interview record based on the at least one record fragment information and the at least one speaker summary information.
  • the hub device 1 further includes: a first sending unit 16;
  • the first receiving unit 10 is also configured to receive an interview generation instruction sent by the control terminal;
  • the first generating unit 14 is further configured to generate the interview record for the at least one piece of record information in the order of the time axis in response to the interview generating instruction;
  • the first sending unit 16 is configured to send the interview record to the control terminal.
  • according to the speaker's voice data to be interpreted in the voice interview scene, the central device can determine the speaker identity information and obtain translation information in the language each listener needs, and at the end of the interview it can generate the interview record for the interview based on this information. The central device can thus perform real-time simultaneous interpretation of the voice data while also recording the identified speaker identity information, translation information, and other structured data as record fragment information; at the end of the interview, the multiple pieces of record fragment information are used to generate the interview record of the voice interview. This improves the efficiency of data collation in the voice interview, that is, the generation speed and processing efficiency of the interview record.
  • control terminal 2 may include:
  • the second receiving unit 20 is configured to receive participant identity information, target language, and participant's voiceprint information
  • the mapping unit 21 is configured to send the preset mapping relationship formed by the participant's identity information, the target language, and the participant's voiceprint information to the central device;
  • the second receiving unit 20 is further configured to receive an interview trigger instruction at the end of the interview;
  • the second generating unit 22 is configured to generate an interview generation instruction in response to the interview trigger instruction;
  • the second sending unit 23 is configured to send the interview generation instruction to the central device;
  • the second receiving unit 20 is configured to receive the interview record fed back by the central device in response to the interview generation instruction, the interview record being generated by the central device, in response to the interview generation instruction, based on the preset mapping relationship and the voice data to be simultaneously interpreted received in real time.
  • control terminal 2 further includes: a display unit 24;
  • the display unit 24 is configured to display the interview record in the order of the time axis after the interview record fed back by the central device in response to the interview generation instruction is received.
  • each segment of the interview record in the interview record includes: speaker identity information, collection time, voice data to be simultaneously translated, and translation information.
  • the display unit 24 is also used to arrange each segment of the interview record in the interview record in the order of the time axis according to the collection time; and to arrange each segment of the interview record corresponding to the The identity information of the speaker, the voice data to be simultaneously translated, and the translation information are displayed.
  • each segment of the interview record in the interview record further includes: speaker summary information
  • the display unit 24 is also used to, after displaying the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged segment of the interview record, display the speaker summary information in a first preset area of the display area of each segment of the interview record.
  • the interview record further includes: full-text summary information
  • the display unit 24 is also used to, when displaying the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged segment of the interview record, display the full-text summary information in front of the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged segment of the interview record.
  • control terminal 2 further includes: an editing unit 25 and a display unit 24;
  • the second receiving unit 20 is further configured to receive an editing instruction after receiving the interview record fed back by the central device in response to the interview generation instruction;
  • the editing unit 25 is configured to edit the interview record in response to the editing instruction to obtain the final interview record
  • the display unit 24 is used to display the final interview record.
  • control terminal 2 further includes: an export unit 26 and a sharing unit 27;
  • the second receiving unit 20 is configured to receive an export instruction after receiving the interview record fed back by the central device in response to the interview generation instruction;
  • the export unit 26 is configured to respond to the export instruction and process the interview record in a preset format to obtain an export file;
  • the sharing unit 27 is configured to share the exported file.
  • control terminal can obtain the interview record from the central device.
  • the interview record is the information that records the content of the voice interview. In this way, the user can obtain or watch the interview record through the control terminal, which is convenient and fast.
  • In this way, a control terminal is provided, which improves the intelligence of processing voice interview records.
  • an embodiment of the present application provides a hub device, including:
  • the first processor 17 is configured to execute the simultaneous interpretation program stored in the first memory 18 to implement the voice information processing method on the central device side.
  • an embodiment of the present application provides a control terminal, including:
  • the second processor 28 is configured to execute the simultaneous interpretation program stored in the second memory 29 to implement the voice information processing method on the control terminal side.
  • the above-mentioned first processor 17 or second processor 28 may be at least one of: an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a CPU, a controller, a microcontroller, or a microprocessor.
  • the electronic device used to implement the functions of the first processor 17 or the second processor 28 may also be other, which is not limited in the embodiment of the present disclosure.
  • the hub device further includes a first memory 18, and the control terminal further includes a second memory 29.
  • the first memory 18 can be connected to the first processor 17, and the second memory 29 can be connected to the second processor 28.
  • the first memory 18 or the second memory 29 may include a high-speed RAM memory, or may also include a non-volatile memory, for example, at least two disk memories.
  • the first memory 18 or the second memory 29 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above, and provides instructions and data to the first processor 17 or the second processor 28.
  • the functional modules in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software function module.
  • if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of this embodiment, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method in this embodiment.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disk, or other media that can store program code.
  • an embodiment of the present application also provides a computer-readable storage medium on which a simultaneous interpretation program is stored; when the program is executed by one or more first processors, the voice information processing method on the central device side is implemented.
  • an embodiment of the present application also provides a computer-readable storage medium on which a simultaneous interpretation program is stored; when the program is executed by one or more second processors, the voice information processing method on the control terminal side is implemented.
  • the computer-readable storage medium may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); it may also be a device including one or any combination of the above-mentioned memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant.
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of hardware embodiments, software embodiments, or embodiments combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which implements the functions specified in one or more processes in the schematic flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; thus, the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes in the schematic flowchart and/or one or more blocks in the block diagram.
  • In summary, in a voice interview scene, the central device can determine the speaker identity information from the voice data of the speaker to be simultaneously interpreted and obtain translation information in the languages that the listeners need.
  • At the end of the interview, an interview record for the interview can be generated based on the above information, so that the central device, while performing real-time simultaneous interpretation of the voice data, also records the identified speaker identity information and translation information as structured record fragment information.
  • At the end of the interview, the multiple pieces of record fragment information obtained are used to generate the interview record of the voice interview. Therefore, the efficiency of data collation in the voice interview is improved, that is, the generation speed and processing efficiency of the interview record of the voice interview are improved.
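The pipeline summarized above — record fragments carrying collection time, speaker identity, source text, translation, and summaries, assembled into a time-ordered interview record — can be sketched as a minimal data model. All class and field names here are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RecordFragment:
    # One "record fragment information" item, as described in the text.
    collection_time: str       # time at which collection of this utterance began
    speaker_identity: str      # speaker identity information
    source_text: str           # text recognized from the original speech
    translation: str           # simultaneous-interpretation output
    speaker_summary: str = ""  # optional per-speaker summary information

@dataclass
class InterviewRecord:
    fragments: List[RecordFragment] = field(default_factory=list)
    full_text_summary: str = ""  # optional full-text summary information

    def ordered(self):
        # Arrange fragments on the time axis by collection time.
        return sorted(self.fragments, key=lambda f: f.collection_time)

rec = InterviewRecord([RecordFragment("09:01", "Bob", "hi", "你好"),
                       RecordFragment("09:00", "Alice", "hello", "大家好")])
print([f.speaker_identity for f in rec.ordered()])
```

Sorting on the collection-time string works here because the timestamps share a fixed zero-padded format; a real implementation would more likely store `datetime` values.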

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A voice information processing method, a hub device (1), a control terminal (2) and a storage medium. The method comprises: after a voice interview begins, receiving voice data, to be interpreted simultaneously, of a speaker transmitted by an acquisition terminal, and acquiring the acquisition time of starting to acquire said voice data (S101); determining identity information of the speaker on the basis of said voice data and a preset mapping relationship, and simultaneously interpreting said voice data into a target language of a listener in real time, so as to obtain interpretation information, the preset mapping relationship being a correlation between identity information of participants, target languages and voiceprint information of the participants, and the listener being a person other than the speaker among the participants (S102); recording said acquisition time, the identity information of the speaker and the interpretation information, so as to obtain one piece of record fragment information, and then acquiring at least one piece of record fragment information when the voice interview ends (S103); and generating, on the basis of the at least one piece of record fragment information, an interview record (S104).

Description

Voice information processing method, central device, control terminal and storage medium
Technical field
The embodiments of the present application relate to the field of voice processing technology, and in particular to a voice information processing method, a central device, a control terminal, and a storage medium.
Background
As the trend of economic globalization continues to deepen, exchanges between different countries and cultures have become more and more frequent.
In multi-person interview scenarios or meetings, participants may come from different countries and regions, so there may be barriers to communication between them. In addition, after the interview, translating and organizing the interview records consumes considerable manpower, which is inefficient and not conducive to the rapid release and dissemination of the interview content.
Summary of the invention
The embodiments of the present application provide a voice information processing method, a central device, a control terminal, and a storage medium, which can generate an interview record and improve the generation speed and processing efficiency of the interview record of a voice interview.
The technical solutions of the embodiments of the present application are implemented as follows:
An embodiment of the present application provides a voice information processing method, including:
after a voice interview starts, receiving the voice data to be simultaneously interpreted of a speaker transmitted by a collection terminal, and acquiring the collection time at which collection of the voice data to be simultaneously interpreted starts;
determining speaker identity information based on the voice data to be simultaneously interpreted and a preset mapping relationship, and simultaneously interpreting the voice data into the listener's target language in real time to obtain translation information; the preset mapping relationship is the correspondence between participant identity information, target languages, and voiceprint information of the participants; the listener is a person among the participants other than the speaker;
recording the collection time, the speaker identity information, and the translation information corresponding to the voice data to be simultaneously interpreted to obtain one piece of record fragment information, and then obtaining at least one piece of record fragment information when the voice interview ends;
generating an interview record based on the at least one piece of record fragment information.
In the above solution, the determining speaker identity information based on the voice data to be simultaneously interpreted and the preset mapping relationship, and simultaneously interpreting the voice data into the listener's target language in real time to obtain translation information, includes:
from the voiceprint information of the participants in the preset mapping relationship, determining target voiceprint information that matches the voiceprint of the voice data to be simultaneously interpreted;
based on the target voiceprint information and the correspondence between the participant identity information and the voiceprint information of the participants in the preset mapping relationship, determining the speaker identity information corresponding to the voiceprint of the voice data to be simultaneously interpreted, and based on the correspondence between the participant identity information and the target languages in the preset mapping relationship, obtaining the listener's target language corresponding to the listener;
translating the voice data to be simultaneously interpreted into the listener's target language in real time to obtain the translation information.
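The lookup chain above (voiceprint match → speaker identity → listeners' target languages) can be illustrated with a minimal sketch. The squared-distance voiceprint matching and all names are assumptions for illustration; the disclosure does not specify a voiceprint algorithm.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    identity: str          # participant identity information
    target_language: str   # language this participant listens in
    voiceprint: tuple      # simplified voiceprint feature vector (assumed)

def match_speaker(mapping, sample_print):
    """Pick the participant whose stored voiceprint is closest to the sample."""
    def distance(p):
        return sum((a - b) ** 2 for a, b in zip(p.voiceprint, sample_print))
    return min(mapping, key=distance)

def listener_languages(mapping, speaker):
    """Target languages of every participant except the speaker."""
    return {p.target_language for p in mapping if p.identity != speaker.identity}

# Preset mapping relationship: identity ↔ target language ↔ voiceprint.
mapping = [
    Participant("Alice", "en", (0.9, 0.1)),
    Participant("Bob", "zh", (0.1, 0.8)),
]
speaker = match_speaker(mapping, (0.85, 0.15))
print(speaker.identity)                      # closest voiceprint → Alice
print(listener_languages(mapping, speaker))  # {'zh'}
```

The actual translation step would then target each language in `listener_languages`; a production system would use a trained speaker-verification model rather than raw vector distance.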
In the above solution, the recording of the collection time, the speaker identity information, and the translation information corresponding to the voice data to be simultaneously interpreted to obtain one piece of record fragment information, and then obtaining at least one piece of record fragment information when the voice interview ends, includes:
performing text recognition on the voice data to be simultaneously interpreted to obtain source text information;
recording the collection time, the speaker identity information, the translation information, and the source text information corresponding to the voice data to be simultaneously interpreted until the target voiceprint information changes, to obtain one piece of record fragment information, and then obtaining the at least one piece of record fragment information when the voice interview ends.
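The segmentation rule above — accumulate recognized text into one record fragment until the target voiceprint (i.e., the identified speaker) changes — can be sketched as follows. The utterance tuples and field names are illustrative assumptions.

```python
from itertools import groupby

# Each tuple: (collection time, identified speaker, recognized source text).
utterances = [
    ("09:00:01", "Alice", "Hello everyone"),
    ("09:00:05", "Alice", "Let's begin"),
    ("09:00:12", "Bob", "Thank you"),
]

def to_fragments(utts):
    """Merge consecutive same-speaker utterances into record fragments."""
    fragments = []
    for speaker, group in groupby(utts, key=lambda u: u[1]):
        group = list(group)
        fragments.append({
            "speaker": speaker,
            "collection_time": group[0][0],  # time the fragment started
            "source_text": " ".join(u[2] for u in group),
        })
    return fragments

fragments = to_fragments(utterances)
print(len(fragments))  # 2 fragments: Alice's turn, then Bob's
```

`itertools.groupby` groups only consecutive equal keys, which matches the "until the voiceprint changes" boundary: if Alice speaks again later, that starts a new fragment.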
In the above solution, after the recording of the collection time, the speaker identity information, and the translation information corresponding to the voice data to be simultaneously interpreted obtains one piece of record fragment information, and then at least one piece of record fragment information is obtained when the voice interview ends, the method further includes:
performing summary extraction on the at least one piece of record fragment information using an abstract extraction technology to extract full-text summary information;
the generating an interview record based on the at least one piece of record fragment information includes:
generating the interview record based on the at least one piece of record fragment information and the full-text summary information.
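The disclosure names only an unspecified "abstract extraction technology". As one plausible stand-in, a minimal word-frequency extractive summarizer might look like this; it is purely illustrative and not the patented method.

```python
import re
from collections import Counter

def extract_summary(text, max_sentences=1):
    """Return the highest-scoring sentences by summed word frequency."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w.lower() for w in re.findall(r"\w+", text))
    def score(sentence):
        return sum(freq[w.lower()] for w in re.findall(r"\w+", sentence))
    # Rank sentences and keep the top max_sentences as the summary.
    return sorted(sentences, key=score, reverse=True)[:max_sentences]

text = ("The product launch is in March. The launch team will travel. "
        "Budgets were approved.")
print(extract_summary(text))
```

Applied per fragment this yields the speaker summary information; applied over all fragments' text it yields the full-text summary information.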
In the above solution, after the recording of the collection time, the speaker identity information, and the translation information corresponding to the voice data to be simultaneously interpreted obtains one piece of record fragment information, the method further includes:
performing summary extraction on the piece of record fragment information using an abstract extraction technology to extract speaker summary information;
obtaining the at least one piece of record fragment information and at least one piece of speaker summary information when the voice interview ends.
In the above solution, the generating an interview record based on the at least one piece of record fragment information includes:
generating the interview record based on the at least one piece of record fragment information and the at least one piece of speaker summary information.
In the above solution, the generating an interview record based on the at least one piece of record fragment information includes:
receiving an interview generation instruction sent by a control terminal;
in response to the interview generation instruction, generating the interview record from the at least one piece of record fragment information in the order of the time axis;
after the generating an interview record based on the at least one piece of record fragment information, the method further includes:
sending the interview record to the control terminal.
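Generating the interview record "in the order of the time axis" amounts to a sort over the fragments' collection times followed by rendering; the field names and line format below are illustrative assumptions.

```python
# Fragments may arrive or be stored out of order; sort before rendering.
fragments = [
    {"collection_time": "09:00:12", "speaker": "Bob", "translation": "谢谢"},
    {"collection_time": "09:00:01", "speaker": "Alice", "translation": "大家好"},
]

def generate_interview_record(fragments):
    """Render fragments as time-ordered lines of the interview record."""
    ordered = sorted(fragments, key=lambda f: f["collection_time"])
    return [f"[{f['collection_time']}] {f['speaker']}: {f['translation']}"
            for f in ordered]

record = generate_interview_record(fragments)
print(record[0])  # [09:00:01] Alice: 大家好
```

The control terminal would receive this rendered record and display it segment by segment, as described below.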
An embodiment of the present application also provides a voice information processing method, including:
receiving participant identity information, target languages, and voiceprint information of the participants;
sending a preset mapping relationship formed by the participant identity information, the target languages, and the voiceprint information of the participants to a central device;
at the end of the interview, receiving an interview trigger instruction, and generating an interview generation instruction in response to the interview trigger instruction;
sending the interview generation instruction to the central device;
receiving an interview record fed back by the central device in response to the interview generation instruction, the interview record being generated by the central device, in response to the interview generation instruction, based on the preset mapping relationship and the voice data to be simultaneously interpreted received in real time.
In the above solution, after the receiving the interview record fed back by the central device in response to the interview generation instruction, the method further includes:
displaying the interview record in the order of the time axis.
In the above solution, each segment of the interview record includes: speaker identity information, collection time, voice data to be simultaneously interpreted, and translation information.
In the above solution, the displaying the interview record in the order of the time axis includes:
arranging each segment of the interview record in the order of the time axis according to the collection time;
displaying the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged segment of the interview record.
In the above solution, each segment of the interview record further includes: speaker summary information; after the displaying of the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged segment of the interview record, the method further includes:
displaying the speaker summary information in a first preset area of the display area of each segment of the interview record.
In the above solution, the interview record further includes: full-text summary information; when displaying the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged segment of the interview record, the method further includes:
displaying the full-text summary information in front of the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged segment of the interview record.
In the above solution, after the receiving the interview record fed back by the central device in response to the interview generation instruction, the method further includes:
receiving an editing instruction;
in response to the editing instruction, editing the interview record to obtain and display a final interview record.
In the above solution, after the receiving the interview record fed back by the central device in response to the interview generation instruction, the method further includes:
receiving an export instruction;
in response to the export instruction, processing the interview record in a preset format to obtain an export file;
sharing the export file.
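As a hedged sketch of the export step, the "preset format processing" could, for example, serialize the interview record to a JSON export file before sharing; the format and all names are assumptions, since the disclosure does not fix a format.

```python
import json
import os
import tempfile

def export_record(record, path):
    """Write the interview record to `path` as a JSON export file."""
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-Latin translations readable in the file.
        json.dump({"interview_record": record}, fh, ensure_ascii=False, indent=2)
    return path

path = os.path.join(tempfile.mkdtemp(), "interview.json")
export_record(["[09:00:01] Alice: Hello"], path)
print(os.path.exists(path))  # True once the export file is written
```

The resulting file path is what the sharing step would then hand to whatever channel (e-mail, messaging, cloud link) the control terminal supports.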
An embodiment of the present application provides a central device, including:
a first receiving unit, configured to receive the voice data to be simultaneously interpreted of a speaker transmitted by a collection terminal after a voice interview starts, and acquire the collection time at which collection of the voice data to be simultaneously interpreted starts;
a determining unit, configured to determine speaker identity information based on the voice data to be simultaneously interpreted and a preset mapping relationship;
a translation unit, configured to simultaneously interpret the voice data to be simultaneously interpreted into the listener's target language in real time to obtain translation information; the preset mapping relationship is the correspondence between participant identity information, target languages, and voiceprint information of the participants; the listener is a person among the participants other than the speaker;
a recording unit, configured to record the collection time, the speaker identity information, and the translation information corresponding to the voice data to be simultaneously interpreted to obtain one piece of record fragment information, and then obtain at least one piece of record fragment information when the voice interview ends;
a first generating unit, configured to generate an interview record based on the at least one piece of record fragment information.
An embodiment of the present application provides a control terminal, including:
a second receiving unit, configured to receive participant identity information, target languages, and voiceprint information of the participants;
a mapping unit, configured to send a preset mapping relationship formed by the participant identity information, the target languages, and the voiceprint information of the participants to a central device;
the second receiving unit is further configured to receive an interview trigger instruction at the end of the interview;
a second generating unit, configured to generate an interview generation instruction in response to the interview trigger instruction;
a second sending unit, configured to send the interview generation instruction to the central device;
the second receiving unit is configured to receive an interview record fed back by the central device in response to the interview generation instruction, the interview record being generated by the central device, in response to the interview generation instruction, based on the preset mapping relationship and the voice data to be simultaneously interpreted received in real time.
An embodiment of the present application also provides a central device, including:
a first processor and a first memory;
the first processor is configured to execute a simultaneous interpretation program stored in the first memory to implement the voice information processing method on the central device side.
An embodiment of the present application also provides a control terminal, including:
a second processor and a second memory;
the second processor is configured to execute a simultaneous interpretation program stored in the second memory to implement the voice information processing method on the control terminal side.
An embodiment of the present application provides a storage medium on which a simultaneous interpretation program is stored; when the simultaneous interpretation program is executed by a first processor, the voice information processing method on the central device side is implemented; or, when the simultaneous interpretation program is executed by a second processor, the voice information processing method on the control terminal side is implemented.
The embodiments of the present application provide a voice information processing method, a central device, a control terminal, and a storage medium, including: after a voice interview starts, receiving the voice data to be simultaneously interpreted of a speaker transmitted by a collection terminal, and acquiring the collection time at which collection of the voice data starts; determining speaker identity information based on the voice data and a preset mapping relationship, and simultaneously interpreting the voice data into the listener's target language in real time to obtain translation information, the preset mapping relationship being the correspondence between participant identity information, target languages, and voiceprint information of the participants, and the listener being a person among the participants other than the speaker; recording the collection time, the speaker identity information, and the translation information corresponding to the voice data to obtain one piece of record fragment information, and obtaining at least one piece of record fragment information when the voice interview ends; and generating an interview record based on the at least one piece of record fragment information.
With the above technical solution, the central device can, in a voice interview scene, determine the speaker identity information from the speaker's voice data to be simultaneously interpreted and obtain translation information in the languages that the listeners need; at the end of the interview, an interview record for the interview can be generated based on the above information. In this way, while performing real-time simultaneous interpretation of the voice data, the central device also records the identified speaker identity information, translation information, and other structured data as record fragment information; at the end of the interview, the multiple pieces of record fragment information obtained are used to generate the interview record of the voice interview. Therefore, the efficiency of data collation in the voice interview is improved, that is, the generation speed and processing efficiency of the interview record are improved.
Description of the Drawings
FIG. 1 is an architecture diagram of a voice information processing system provided by an embodiment of this application;
FIG. 2 is a first schematic flowchart of a voice information processing method provided by an embodiment of this application;
FIG. 3 is a second schematic flowchart of a voice information processing method provided by an embodiment of this application;
FIG. 4 is a third schematic flowchart of a voice information processing method provided by an embodiment of this application;
FIG. 5 is a first schematic flowchart of another voice information processing method provided by an embodiment of this application;
FIG. 6 is a second schematic flowchart of another voice information processing method provided by an embodiment of this application;
FIG. 7 is a first schematic diagram of an exemplary display interface for an interview record provided by an embodiment of this application;
FIG. 8 is a second schematic diagram of an exemplary display interface for an interview record provided by an embodiment of this application;
FIG. 9 is a third schematic diagram of an exemplary display interface for an interview record provided by an embodiment of this application;
FIG. 10 is a fourth schematic diagram of an exemplary display interface for an interview record provided by an embodiment of this application;
FIG. 11 is an interaction diagram of a voice information processing method provided by an embodiment of this application;
FIG. 12 is a first schematic diagram of the composition structure of a hub device provided by an embodiment of this application;
FIG. 13 is a second schematic diagram of the composition structure of a hub device provided by an embodiment of this application;
FIG. 14 is a first schematic diagram of the composition structure of a control terminal provided by an embodiment of this application;
FIG. 15 is a second schematic diagram of the composition structure of a control terminal provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only intended to explain the related application, not to limit it. It should also be noted that, for ease of description, only the parts related to the application are shown in the drawings.
An embodiment of the present application provides a voice information processing method implemented by a voice information processing apparatus. The voice information processing apparatus provided in the embodiment may include a hub device, a control terminal, and transceiver-integrated terminals (serving as collection terminals and receiving terminals).
FIG. 1 is a schematic architecture diagram of a voice information processing system to which the voice information processing method is applied. As shown in FIG. 1, the voice information processing system may include: a hub device 1, a control terminal 2, and multiple transceiver-integrated terminals 3 (including a collection terminal 3-1 and a receiving terminal 3-2).
In scenarios such as a multi-person interview or a multi-person conference, a speaker can give a presentation through the transceiver-integrated terminal 3 he or she wears (i.e., the collection terminal 3-1). During the presentation, the collection terminal 3-1 collects the speaker's voice data (i.e., the voice data to be interpreted) and transmits it to the hub device 1 in real time. When obtaining the voice data to be interpreted, the hub device 1 takes the time at which it starts receiving the voice data as the collection time, determines the identity of the speaker who is currently speaking based on the voice data to be interpreted, and at the same time simultaneously interprets the voice data into each listener's target language in real time to obtain translation information (each listener receives a translation result in the specific language he or she requires). The hub device transmits the translation information in real time to the integrated terminals (i.e., the receiving terminals 3-2) of the listeners attending the conference, and records, for each speaker, the voice data to be interpreted, the collection time, the speaker identity information, the translation information, and so on as record segment information corresponding to that speaker. In addition, at the end of the conference, the hub device 1 can receive an interview generation instruction sent by the control terminal 2, generate an interview record from all the record segment information obtained in the conference according to the instruction, and finally send the interview record to the control terminal 2, so that the control terminal 2 can display the interview record or share it with user terminals owned by the participants, allowing the participants to access or browse the conference record.
FIG. 2 is a first schematic flowchart of a voice information processing method provided by an embodiment of the application. As shown in FIG. 2, the voice information processing method, applied to a hub device, includes the following steps:
S101: After the voice interview starts, receive the speaker's voice data to be interpreted transmitted by the collection terminal, and obtain the collection time at which collection of the voice data to be interpreted started.
The voice information processing method provided by the embodiments of this application can be applied to international conferences, international interviews, or any scenario requiring simultaneous interpretation; the embodiments of this application impose no limitation.
It should be noted that in the embodiments of the present application, application scenarios can further be divided into large international conferences, small working meetings, public service venues, public social venues, social applications, general scenarios, and so on. Public service venues may be waiting halls, government office halls, etc.; public social venues may be coffee shops, concert halls, etc. The actual application scenario corresponding to the voice data to be interpreted is simply the scenario in which that voice data is collected; the specific scenario is not limited in the embodiments of this application.
In the embodiments of this application, the hub device communicates with the integrated terminals and the control terminal. An integrated terminal is a transceiver-integrated terminal worn by a participant in the voice interview, for example a combined earphone/microphone terminal. The integrated terminal worn by the current speaker may be called the collection terminal, while the integrated terminals worn by the other listeners may be called receiving terminals.
The communication mode may be a wireless communication technology, a wired communication technology, a near field communication technology, and so on, for example Bluetooth or Wi-Fi; the embodiments of this application impose no limitation.
It should be noted that an integrated terminal has both an earphone and a microphone: the microphone collects the voice data to be interpreted when the wearer speaks, and the earphone plays the translation information when the wearer listens. Therefore, each participant can be either a speaker or a listener, depending on the actual situation; the embodiments of this application impose no limitation.
After the voice interview starts, the speaker's collection terminal picks up the sound, collects the voice data to be interpreted, and transmits it to the hub device in real time; the hub device also records the time at which it starts receiving the voice data to be interpreted, i.e., the collection time.
It should be noted that each speaker's voice data to be interpreted is transmitted in real time, but the hub device only records the time at which each speaker starts speaking, i.e., the collection time. In the embodiments of this application, the voice data to be interpreted can be any speech that requires translation, for example speech collected in real time in an application scenario, and it can be speech in any language; the specific voice data to be interpreted is not limited in the embodiments of this application.
In the embodiments of this application, multiple speakers may speak after the interview starts. A speaker in this application is whichever participant is speaking at a given time; the embodiments of this application impose no limitation.
S102: Determine the speaker identity information based on the voice data to be interpreted and a preset mapping relationship, and simultaneously interpret the voice data into each listener's target language in real time to obtain translation information. The preset mapping relationship is the correspondence among participant identity information, target languages, and participants' voiceprint information; a listener is any participant other than the speaker.
After the hub device obtains the voice data to be interpreted, it can use the preset mapping relationship it stores, namely the correspondence among participant identity information, target languages, and participants' voiceprint information. Based on the participants' voiceprint information stored in the preset mapping relationship, the hub device first finds the target voiceprint information matching the voice data to be interpreted; then, based on the correspondence between participant identity information and voiceprint information, it finds the participant identity information corresponding to the target voiceprint information, that is, the speaker identity information, which at the same time determines the listeners' identity information; next, it determines each listener's target language from the correspondence between participant identity information and target languages; and finally it translates the voice data to be interpreted in real time into translation information in each listener's target language, so that each listener can hear the speaker's speech in a familiar language through the receiving terminal. A listener is any participant other than the speaker.
In the embodiments of this application, the speaker identity information may be the speaker's name or a unique identifier of the speaker; the embodiments of this application impose no limitation.
It should be noted that the preset mapping relationship, being the correspondence among participant identity information, target languages, and participants' voiceprint information, may include: the correspondence between participant identity information and target languages, the correspondence between participant identity information and participants' voiceprint information, the correspondence between target languages and participants' voiceprint information, as well as a participant voiceprint library, a participant identity library, and a target language library.
That is to say, in the embodiments of this application, the hub device determines, from the participants' voiceprint information in the preset mapping relationship, the target voiceprint information matching the voiceprint of the voice data to be interpreted; determines the speaker identity information corresponding to that voiceprint based on the target voiceprint information and the correspondence between participant identity information and voiceprint information in the preset mapping relationship; obtains each listener's target language based on the correspondence between participant identity information and target languages in the preset mapping relationship; and finally translates the voice data to be interpreted into each listener's target language in real time to obtain the translation information.
In some embodiments of this application, once the hub device knows the correspondence between target languages and participants' voiceprint information, then after the target voiceprint information is determined, every participant other than the speaker is a listener; the target languages corresponding to the non-matching voiceprint information are therefore the listeners' target languages, and each listener's target language can thus be determined.
For example, suppose the participants are A, B, and C, and A is speaking. Once A is determined to be the speaker, B and C are the listeners, so the target languages corresponding to listeners B and C can be found among the target languages of A, B, and C.
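The role resolution described above can be sketched as follows. This is a minimal illustration with hypothetical names and a plain dictionary standing in for the preset mapping relationship; the embodiment does not mandate any particular data structure.

```python
# Preset mapping relationship (illustrative): participant identity ->
# (voiceprint identifier, target language). In a real system the voiceprint
# entry would be an embedding matched by a voiceprint recognition model.
PRESET_MAPPING = {
    "A": {"voiceprint": "vp_a", "target_language": "zh"},
    "B": {"voiceprint": "vp_b", "target_language": "en"},
    "C": {"voiceprint": "vp_c", "target_language": "fr"},
}

def resolve_roles(matched_voiceprint):
    """Given the target voiceprint matched against the incoming audio, return
    (speaker identity, {listener identity: listener target language})."""
    speaker = next(
        name for name, info in PRESET_MAPPING.items()
        if info["voiceprint"] == matched_voiceprint
    )
    # Every participant other than the speaker is a listener.
    listeners = {
        name: info["target_language"]
        for name, info in PRESET_MAPPING.items()
        if name != speaker
    }
    return speaker, listeners
```

With the A/B/C example above, matching A's voiceprint yields speaker "A" and listener target languages for B and C.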
In some embodiments of this application, the hub device has built-in functions such as automatic speech recognition (ASR), text-to-speech synthesis (TTS), voiceprint recognition, translation, and recording (supporting both online and offline modes), and has networking and communication functions so that it can exchange data with the control terminal and the integrated terminals.
It can be understood that, by combining voiceprint recognition, ASR, machine translation, and TTS, the hub device builds a simultaneous interpretation system for interview scenarios, removing the communication barriers between different languages.
In the embodiments of this application, the hub device uses voiceprint recognition technology to identify, among the participants' voiceprint information in the preset mapping relationship, the target voiceprint information matching the voice data to be interpreted, and uses machine translation technology to simultaneously interpret the voice data into each listener's target language in real time, obtaining translation information that each listener can understand.
In some embodiments of this application, once the hub device has translated in real time the translation information each listener requires, it can send the translation information in real time to the receiving terminals of the listeners attending the voice interview, so that each listener can hear the speech in a familiar language through his or her receiving terminal, i.e., as translated voice data.
It should be noted that, when communicating with the integrated terminals, the hub device stores the correspondence between each integrated terminal and its participant, so the hub device can accurately deliver data intended for a participant to that participant's integrated terminal. In this way, the hub device can send the translation information obtained for each listener's target language to that listener's receiving terminal.
In some embodiments of this application, the translation information includes translated text information and translated voice data: the hub device first translates the voice data to be interpreted into translated text information, and then uses TTS technology to convert the translated text information into translated voice data. The hub device can thus send both the translated voice data and the translated text information in real time to the receiving terminals of the listeners attending the voice interview.
In the embodiments of this application, if an integrated terminal is provided with a display device, the translated text information can also be shown on the display for the listener to read; the specific implementation is not limited in the embodiments of this application.
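The per-utterance processing chain described above (ASR, then machine translation per target language, then TTS) can be sketched as follows. The `asr`, `translate`, and `tts` functions here are hypothetical stand-ins that return tagged strings; they are not a real speech or translation API, and only the orchestration is illustrated.

```python
def asr(audio):
    # Stand-in for a real speech recognizer producing source text.
    return f"text({audio})"

def translate(text, target_lang):
    # Stand-in for a machine translation engine.
    return f"{target_lang}:{text}"

def tts(text):
    # Stand-in for text-to-speech synthesis of the translated text.
    return f"audio[{text}]"

def interpret_utterance(audio, listener_langs):
    """Return translation information per target language: each entry holds
    the translated text and the synthesized translated voice data, as in the
    embodiment where both are sent to the receiving terminals."""
    source_text = asr(audio)
    out = {}
    for lang in set(listener_langs):  # translate once per distinct language
        translated = translate(source_text, lang)
        out[lang] = {"text": translated, "audio": tts(translated)}
    return out
```

Note that listeners sharing a target language share one translation, which mirrors the idea that translation is driven by the required languages rather than by individual listeners.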
S103: Record the collection time, speaker identity information, and translation information corresponding to the voice data to be interpreted to obtain one piece of record segment information, and thereby obtain at least one piece of record segment information by the end of the voice interview.
After obtaining the translation information for each listener, the hub device records the collection time, speaker identity information, and translation information corresponding to the voice data to be interpreted to obtain one piece of record segment information for the current speaker, and then proceeds to obtain record segment information for the next speaker. In this way, by the end of the voice interview the hub device has obtained at least one piece of record segment information corresponding to the various speakers.
It should be noted that a piece of record segment information may include the voice data to be interpreted, the speaker identity information, the translation information, and the collection time. Each record segment can use fields for data recording.
In some embodiments of this application, the hub device may perform text recognition on the voice data to be interpreted to obtain source text information, and record the collection time, speaker identity information, translation information, and source text information corresponding to the voice data to be interpreted until the target voiceprint information changes, obtaining one piece of record segment information; at the end of the voice interview, at least one piece of record segment information has thus been obtained.
Further, in the embodiments of this application, a piece of record segment information may also include the source text information.
It should be noted that the hub device uses ASR technology to convert the voice data to be interpreted into the source text information.
In the embodiments of this application, in a multi-person interview scenario, multiple participants generally take turns speaking, and a speaker may also be interrupted while speaking; an interruption can be regarded as a switch of the speaking participant, and record segments are delimited by speaker switches or by the end of speech. That is, for the voice data to be interpreted transmitted in real time, the hub device identifies the speaker based on voiceprint recognition; when the identified speaker changes at some moment, this indicates that one speaker has finished and the next has begun, so the process returns to S101, and the information corresponding to the previous speaker is recorded as one piece of record segment information. At the end of the interview, the hub device has thus obtained at least one piece of record segment information.
It should be noted that the at least one piece of record segment information may include record segment information for the same speaker at different times, depending on the actual recording; the embodiments of this application impose no limitation.
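The segmentation rule above, where a new record segment starts whenever the identified speaker changes, can be sketched as follows. This is an illustrative, stdlib-only sketch: the input is assumed to be a time-ordered stream of already speaker-attributed text snippets, and the frame format is hypothetical.

```python
def segment_by_speaker(frames):
    """frames: time-ordered list of (timestamp, speaker, text) tuples.
    Returns one segment per uninterrupted run of the same speaker, so an
    interruption by another speaker closes the current segment."""
    segments = []
    for ts, speaker, text in frames:
        if segments and segments[-1]["speaker"] == speaker:
            # Same speaker is still talking: extend the current segment.
            segments[-1]["text"] += " " + text
        else:
            # Speaker switched (or first frame): open a new record segment,
            # keeping the timestamp of its first frame as the collection time.
            segments.append({"speaker": speaker, "start": ts, "text": text})
    return segments
```

Note that the same speaker can legitimately appear in several segments, for example when speaker A is interrupted by B and then resumes.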
For example, before the interview starts, the voiceprint and name of each participant are first registered. After the interview starts, every sentence spoken by each person is saved, in units of complete segments. A piece of record segment information can be recorded as shown in Table 1:
Table 1
- Speaker name: the name of the person speaking (obtained by voiceprint recognition)
- Timestamp: the timestamp at which the speech starts
- Audio: the source-language audio of the speech (captured by the microphone)
- Speech text: the text corresponding to the source-language audio (obtained by ASR)
- Translated text: the speech text translated into the target language (obtained by machine translation)
Here, the speaker name is the speaker identity information, the timestamp is the collection time, the audio is the voice data to be interpreted, the speech text is the source text information, and the translated text is the translation information.
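The fields of Table 1 map naturally onto a small record type. The field names below are illustrative choices, not names mandated by the embodiment.

```python
from dataclasses import dataclass

@dataclass
class RecordSegment:
    """One piece of record segment information, following Table 1."""
    speaker_name: str     # speaker identity information, from voiceprint recognition
    timestamp: float      # collection time: timestamp at which the speech starts
    audio: bytes          # source-language audio captured by the microphone
    speech_text: str      # source text information, from ASR
    translated_text: str  # translation information in the target language
```

Accumulating one such record per speaking turn yields the "at least one piece of record segment information" from which the interview record is later generated.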
S104: Generate an interview record based on the at least one piece of record segment information.
At the end of the interview the hub device has recorded at least one piece of record segment information, so it can generate an interview record for this interview based on that information.
In the embodiments of this application, the hub device can communicate with the control terminal, and the control terminal is used to receive routine settings entered by the user, such as the target languages, the number of participants, and the listening language of each integrated terminal. In addition, the control terminal can also control functions of the hub device, for example the interview record generation function.
In some embodiments of this application, the hub device may receive an interview generation instruction sent by the control terminal; in response to the instruction, generate the interview record from the at least one piece of record segment information in timeline order; and send the interview record to the control terminal.
It should be noted that the control terminal side may be provided with an input device. After the voice interview ends, the user can generate an interview generation instruction through the input device and send it to the hub device. Since the hub device has already recorded at least one piece of record segment information about the voice interview, it generates the interview record from that information and sends the interview record to the control terminal for presentation.
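Assembling the interview record in timeline order, as S104 describes, can be sketched as follows. The segment fields follow Table 1, and the rendered line format is purely illustrative; the embodiment does not fix an output format.

```python
def generate_interview_record(segments):
    """Sort the accumulated record segments along the time axis and render
    one line per segment: [collection time] speaker: translated text."""
    ordered = sorted(segments, key=lambda s: s["timestamp"])
    lines = [
        f'[{s["timestamp"]}] {s["speaker_name"]}: {s["translated_text"]}'
        for s in ordered
    ]
    return "\n".join(lines)
```

Sorting by the stored collection time means segments can be recorded out of order (for example when several are finalized concurrently) and still produce a chronologically correct record.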
It can be understood that, since the hub device can determine the speaker identity information from the speaker's voice data to be interpreted in a voice interview scenario, and obtain translation information in the languages the listeners require, it can generate an interview record for the interview from this information at the end of the interview. While performing real-time simultaneous interpretation of the voice data, the hub device also records the determined speaker identity information, translation information, and other structured data as record segment information, and finally, when the interview ends, generates the interview record of the voice interview from the multiple pieces of record segment information. This improves the efficiency of data collation in the voice interview, that is, the generation speed and processing efficiency of the interview record.
In some embodiments of this application, as shown in FIG. 3, after S103, an embodiment of this application further provides a voice information processing method including S105-S106, as follows:
S105: Use summary extraction technology to perform summary extraction on the at least one piece of record segment information, extracting full-text summary information.
S106: Generate the interview record based on the at least one piece of record segment information and the full-text summary information.
中枢设备在访谈结束时,记录了至少一个记录片段信息,此外,中枢设备还可以采用摘要提取技术(比如TextRank算法),对至少一个记录片段信息进行摘要提取,提取出全文摘要信息,其中,全文摘要信息表征此次语音访谈中发言者们谈论的主要内容的总结,即,全文摘要信息是在汇总各个发言人的全部发言之后,提取出来的全文总结。这样中枢设备就可以基于至少一个记录片段信息和全文摘要信息,生成针对此次访谈的访谈记录了。At the end of the interview, the hub device recorded at least one piece of record information. In addition, the hub device can also use abstract extraction technology (such as the TextRank algorithm) to extract at least one piece of record information to extract the full text summary information. The summary information represents the summary of the main content discussed by the speakers in this voice interview, that is, the full-text summary information is the full-text summary extracted after summarizing all the speeches of each speaker. In this way, the central device can generate an interview record for this interview based on at least one piece of record information and full-text summary information.
In the embodiments of the present application, the full-text summary information can be placed at the beginning of the interview record, so that a user can get a general idea of the interview content and decide whether to continue reading the record.
It should be noted that in the era of information explosion, users need to obtain information quickly. When a speaker's statement is long, summary extraction can be applied to extract a core summary, improving the efficiency with which users read the interview record.
In some embodiments of the present application, as shown in FIG. 4, after recording the collection time, speaker identity information, and translation information corresponding to the voice data to be simultaneously interpreted in S103 to obtain a piece of record segment information, the method further includes S107-S109, as follows:
S107. Using a summary extraction technique, perform summary extraction on a piece of record segment information to extract speaker summary information.
S108. At the end of the voice interview, obtain at least one piece of record segment information and at least one piece of speaker summary information.
S109. Generate an interview record based on the at least one piece of record segment information and the at least one piece of speaker summary information.
In the embodiment of the present application, after the hub device records the collection time, speaker identity information, and translation information corresponding to the voice data to be simultaneously interpreted to obtain a piece of record segment information, the hub device can apply a summary extraction technique to that piece of record segment information to extract speaker summary information. That is, each time a piece of record segment information is produced, the speaker summary information for that piece can be extracted at the same time. In this way, at least one piece of record segment information and at least one piece of speaker summary information are obtained by the end of the voice interview; finally, the hub device can generate the interview record based on the at least one piece of record segment information and the at least one piece of speaker summary information.
In some embodiments of the present application, the hub device may instead perform summary extraction on each piece of record segment information after all of the record segment information has been obtained, yielding at least one piece of speaker summary information; the embodiments of the present application impose no limitation on this.
It can be appreciated that the hub device can condense the central idea of each piece of record segment information into speaker summary information, which improves the reading efficiency of each statement in the generated interview record.
In some embodiments of the present application, the interview record can also be generated based on the full-text summary information, the at least one piece of speaker summary information, and the at least one piece of record segment information together. An interview record generated this way contains not only a summary of the full text but also a summary of each statement, better surfacing the main ideas of the interview and thereby increasing the richness and diversity of the interview record.
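The structured data described above can be sketched as follows. This is one possible representation, assembling record segments, per-statement speaker summaries, and the full-text summary into an interview record; the field names are illustrative assumptions, not identifiers from this application.

```python
from dataclasses import dataclass

@dataclass
class RecordSegment:
    collect_time: str          # time at which capture of the statement began
    speaker_id: str            # determined speaker identity information
    source_text: str           # transcript of the voice data
    translation: str           # translation into the listener's target language
    speaker_summary: str = ""  # per-statement summary, if extracted

def build_interview_record(segments, full_text_summary=""):
    """Order segments by collection time and prepend the full-text summary."""
    lines = []
    if full_text_summary:
        lines.append(f"Full-text summary: {full_text_summary}")
    for seg in sorted(segments, key=lambda s: s.collect_time):
        lines.append(f"[{seg.collect_time}] {seg.speaker_id}: {seg.translation}")
        if seg.speaker_summary:
            lines.append(f"  Summary: {seg.speaker_summary}")
    return "\n".join(lines)
```

Sorting by collection time gives the timeline ordering the record is later displayed in, and the full-text summary occupies the beginning of the record as described above.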
In some embodiments of the present application, the hub device may also be provided with, or connected to, a loudspeaker. The control terminal can set the loudspeaker's playback language, and the speaker's statement can be converted into the playback language and played through the loudspeaker at the interview venue.
FIG. 5 is a first schematic flowchart of a voice information processing method provided by an embodiment of the present application. As shown in FIG. 5, applied to a control terminal, the voice information processing method includes the following steps:
S201. Receive participant identity information, target languages, and participants' voiceprint information.
S202. Send the preset mapping relationship formed by the participant identity information, the target languages, and the participants' voiceprint information to the hub device.
In the voice interview scenario, the control terminal may be a smart device such as a smartphone, tablet computer, or computer on which a designated application is installed (for example, a simultaneous interpretation application implementing the voice processing method provided by this application); the embodiments of this application impose no limitation. The control terminal can communicate with the hub device. The control terminal side can be provided with an input device through which general settings can be entered, such as the target language, the number of participants, and the listening language of each headset/microphone terminal.
In the embodiment of the present application, before the voice interview begins, a user can first enter the participants' relevant information through the control terminal, for example, each participant's identity information and target language. Meanwhile, each participant's voiceprint information is collected by recording through the integrated terminal; after the participant voiceprint information is sent to the control terminal via the hub device, the control terminal associates the participant identity information, participant voiceprint information, and target language to form the preset mapping relationship. Finally, the control terminal can send the preset mapping relationship to the hub device for use.
In some embodiments of the present application, the control terminal may instead send the participant identity information and target language to the hub device, and the integrated terminal may send the participant voiceprint information to the hub device, with the hub device associating the participant identity information, participant voiceprint information, and target language to form the preset mapping relationship. The process of obtaining the preset mapping relationship can therefore be chosen according to the actual situation, and the embodiments of this application impose no limitation.
The participant identity information may be the participant's name, a unique identifier, or the like, which is not limited by the embodiments of this application.
It should be noted that in the embodiments of this application, the preset mapping relationship is the correspondence among participant identity information, target language, and participant voiceprint information. It may comprise the correspondence between participant identity information and target language, the correspondence between participant identity information and participant voiceprint information, the relationship between target language and participant voiceprint information, a participant voiceprint information library, a participant identity information library, a target language library, and so on.
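A minimal sketch of how the preset mapping relationship described above could be represented follows. Voiceprints are stood in for by fixed-length feature vectors, and the field names are assumptions made for illustration only.

```python
# Each enrolled participant carries identity, target language, and voiceprint.
participants = [
    {"identity": "Speaker A", "target_language": "zh", "voiceprint": [0.9, 0.1, 0.0]},
    {"identity": "Speaker B", "target_language": "en", "voiceprint": [0.1, 0.8, 0.2]},
]

# Derived lookup tables, mirroring the correspondences listed in the text.
identity_to_language = {p["identity"]: p["target_language"] for p in participants}
identity_to_voiceprint = {p["identity"]: p["voiceprint"] for p in participants}

def listener_languages(speaker_identity):
    """Target languages of everyone except the current speaker (the listeners)."""
    return {p["identity"]: p["target_language"]
            for p in participants if p["identity"] != speaker_identity}
```

The `listener_languages` helper reflects the definition used throughout the text: a listener is any participant other than the current speaker, and each listener's target language decides which translation they receive.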
S203. At the end of the interview, receive an interview trigger instruction and, in response to the interview trigger instruction, generate an interview generation instruction.
S204. Send the interview generation instruction to the hub device.
S205. Receive the interview record that the hub device returns in response to the interview generation instruction, the interview record being generated by the hub device, in response to the interview generation instruction, based on the preset mapping relationship and the voice data to be simultaneously interpreted received in real time.
In the embodiment of this application, the control terminal can direct the hub device to generate the interview record. At the end of the interview, the user can trigger the interview-record generation function through the control terminal, producing an interview trigger instruction; the control terminal then generates an interview generation instruction and sends it to the hub device. While performing real-time simultaneous interpretation of the voice data to be simultaneously interpreted, the hub device also records the determined speaker identity information, translation information, and other structured data as record segment information, i.e., at least one piece of record segment information. The hub device then generates the interview record from the at least one piece of record segment information and sends the interview record to the control terminal for presentation.
The interview record is generated by the hub device in response to the interview generation instruction, based on the preset mapping relationship and the voice data to be simultaneously interpreted received in real time.
It can be appreciated that the control terminal can obtain the interview record, which captures the content of the voice interview, from the hub device. The user can thus obtain or view the interview record through the control terminal conveniently and quickly, which improves the intelligence of the control terminal.
In some embodiments of the present application, as shown in FIG. 6, after S205 the method further includes S206; or S207-S208; or S209-S211, as follows:
S206. Display the interview record in timeline order.
After the control terminal obtains the interview record, since the record captures information related to each speaker's statements in the voice interview and the statements occur in temporal order, the content of the interview record is likewise temporal. The control terminal can therefore display the interview record in timeline order.
In the embodiment of this application, where each statement corresponds to one segment of the interview record, each segment may include: speaker identity information, collection time, the voice data to be simultaneously interpreted, and translation information. The control terminal can then arrange the segments of the interview record in timeline order by collection time, and display the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged segment.
It should be noted that the voice data to be simultaneously interpreted and the translation information corresponding to each arranged segment can be laid out, via function buttons, in the area corresponding to the speaker identity information; when a function button is triggered, the corresponding content is displayed. The text transcribed from the voice data to be simultaneously interpreted, i.e., the source text information, can also be transmitted from the hub device to the control terminal, and the translation information may include translated text information and translated voice data. For ease of display, the function buttons can be set up one per content item, or several content items can be addressed by triggering two buttons jointly. For example, source text information corresponds to function button 1, translated text information to function button 2, and voice data to function button 3: when 1 and 3 are triggered together, the voice data to be simultaneously interpreted is played; when 2 and 3 are triggered together, the translated voice data is played; when 1 alone is triggered, the source text information is displayed; when 2 alone is triggered, the translated text information is displayed.
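The button combinations described above can be sketched as a small dispatch function. The button numbers come from the example in the text; the action names are illustrative assumptions.

```python
def resolve_action(pressed):
    """Map a set of pressed function buttons to a display/playback action.

    Per the example: 1 = source text, 2 = translated text, 3 = audio modifier.
    """
    pressed = set(pressed)
    if pressed == {1, 3}:
        return "play_source_audio"       # play the voice data to be interpreted
    if pressed == {2, 3}:
        return "play_translated_audio"   # play the translated voice data
    if pressed == {1}:
        return "show_source_text"        # display the source text information
    if pressed == {2}:
        return "show_translated_text"    # display the translated text information
    return "no_op"
```

Joint triggering of button 3 with either text button selects the audio form of that content, so one extra button doubles the number of reachable content items.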
In some embodiments of the present application, a comparison button can also be provided for displaying the source text information and the translated text information side by side; the embodiments of this application impose no limitation on how the buttons for the content displayed in the interview record are set up and arranged.
It should be noted that, since different listeners have different target languages, the translation information in the generated interview record will differ between listeners. A separate interview record can therefore be generated for each listener, or all statements can be translated into a common language; the embodiments of this application impose no limitation.
As an example, take English as the speaker's language and Chinese as the translation language. FIGS. 7 and 8 show a display interface for an interview record: the control terminal presents the obtained Xxx interview record, and the display interface provides four function buttons, "原" (source), "译" (translation), "音" (audio), and "对照" (compare). The speakers include Speaker A, Speaker B, and Speaker C. Area 1, following each speaker's identity information, holds the four function buttons; area 2, below each speaker's identity information, shows the collection time (for example, Speaker A: 2019.08.31 22:10:15; Speaker B: 2019.08.31 22:12:10; Speaker C: 2019.08.31 22:13:08), i.e., the time each statement began, and the segments of the interview record are displayed in order of statement time. When "译" for Speaker A is triggered, area 3 shows the translated text: "我认为，中国将在5G竞赛中遥遥领先！在未来一到两年内，5G将开始落地应用，并取得爆发式增长" ("I believe that China will lead in the 5G competition! In the next one to two years, 5G will begin to be applied and achieve explosive growth"). When "原" for Speaker B is triggered, area 4 shows the source text: "I agree with you very much, I think 5G will bring a lot of new opportunities". When "音" for Speaker C is triggered, the translated voice data is played in area 5. When "对照" for Speaker A is triggered, area 6 shows the source text "I believe that China will lead in the 5G competition! In the next one to two years, 5G will begin to apply and achieve explosive growth" side by side with the translated text "我认为，中国将在5G竞赛中遥遥领先！在未来一到两年内，5G将开始落地应用，并取得爆发式增长". As the figures show, each segment of the interview record carries a timestamp for the speaker's statement, and the source, translation, audio, and comparison options can be selected.
In some embodiments of the present application, each segment of the interview record further includes speaker summary information. After the control terminal displays the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged segment, it can also display the speaker summary information in a first preset area within each segment's display area.
It should be noted that, in this embodiment of the application, if a speaker's statement is relatively long (for example, more than 70 characters), the hub device generates a statement summary (i.e., speaker summary information) and carries it in the interview record sent to the control terminal, so that readers can browse the speaker summary information on the control terminal and use it to select which speaker's full statement to read.
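The length gate described above can be sketched as follows. The 70-character threshold comes from the example in the text; `summarize` is a stand-in for whatever summary extraction technique the hub device applies.

```python
SUMMARY_THRESHOLD = 70  # characters, per the example in the text

def maybe_summarize(statement_text, summarize):
    """Attach a speaker summary only when the statement is long enough to need one."""
    if len(statement_text) > SUMMARY_THRESHOLD:
        return summarize(statement_text)
    return ""  # short statements carry no summary
```

Short statements are left unsummarized, which avoids padding the interview record with summaries that would be as long as the statements themselves.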
It can be appreciated that the hub device can condense the central idea of each piece of record segment information into speaker summary information; the speaker summary information displayed by the control terminal then lets readers quickly grasp the main content of each segment of the interview record, improving the reading efficiency of each statement in the generated interview record.
As an example, as shown in FIG. 9, for Speaker A's statement the translated text is: "我认为，中国将在5G竞赛中遥遥领先！在未来一到两年内，5G将开始落地应用，并取得爆发式增长。在之前2/3/4G时代，中国一直处于被动状态，这让外国人认为中国没有实力率先研发出5G。万万没想到，中国成了5G时代的领跑者" (roughly: "I believe China will lead in the 5G competition and achieve explosive growth within one to two years; in the earlier 2/3/4G eras China was in a passive position, which led foreigners to think China lacked the strength to develop 5G first, yet China has become the front-runner of the 5G era"), and the extracted speaker summary is "A认为中国5G处于领先地位，并取得大规模增长" ("A believes China's 5G is in a leading position and has achieved large-scale growth"). The control terminal can then display the speaker summary in area 11 of the interview record's display interface.
It should be noted that each speaker's summary can be displayed together with the corresponding segment of the interview record, or it can be displayed in area E upon user interaction with the interview record's display interface; the specific implementation is not limited by the embodiments of this application.
In some embodiments of the present application, the interview record further includes full-text summary information. When the control terminal displays the speaker identity information, the voice data to be simultaneously interpreted, and the translation information corresponding to each arranged segment, it can also display the full-text summary information ahead of them.
The hub device can apply a summary extraction technique (such as the TextRank algorithm) to the at least one piece of record segment information to extract full-text summary information, which summarizes the main content discussed by the speakers in the voice interview; that is, it is the full-text summary extracted after aggregating all statements of every speaker. The hub device can then generate the interview record for the interview based on the at least one piece of record segment information and the full-text summary information, and send it to the control terminal.
In the embodiments of the present application, the full-text summary information can be placed at the beginning of the interview record, so that a user can get a general idea of the interview content and decide whether to continue reading the record.
It should be noted that in the era of information explosion, users need to obtain information quickly. When a speaker's statement is long, summary extraction can be applied to extract a core summary, improving the efficiency with which users read the interview record.
As an example, as shown in FIG. 10, for the Xxx interview record, the frontmost area F of the full record, i.e., of the multi-segment interview record display, shows the full-text summary information of the voice interview: "全文摘要：5G新时代，ABC发表了重要展望。A认为中国5G遥遥领先，并将大规模商业化进而带来无限机会，BC对此表示认同" ("Full-text summary: In the new 5G era, ABC offered an important outlook. A believes China's 5G is far ahead and that large-scale commercialization will bring boundless opportunities; B and C agree").
S207. Receive an editing instruction.
S208. In response to the editing instruction, edit the interview record to obtain and display a final interview record.
After the control terminal receives the interview record, the machine-generated record may still contain errors or need manual additions and polishing. The control terminal therefore also provides an editing function: when the user turns it on, the control terminal receives an editing instruction; in the interview record's display interface it responds to the editing instruction, obtains the editing input, edits the interview record accordingly, and obtains and displays the final interview record.
It can be appreciated that the editing function refines the content of the interview record and eliminates errors, making the final interview record more accurate and complete.
S209. Receive an export instruction.
S210. In response to the export instruction, process the interview record into a preset format to obtain an export file.
S211. Share the export file.
After receiving the interview record, the control terminal may need to share it. The control terminal therefore also provides a sharing function: when the user turns it on, the control terminal receives an export instruction; in the interview record's display interface it responds to the export instruction, obtains the export format, processes the interview record into that preset format to obtain an export file, and finally shares the export file.
In the embodiments of this application, the export format may include HTML, txt, PDF, and the like; any text or web-page format is acceptable, and the embodiments of this application impose no limitation.
It should be noted that when exporting the interview record, the control terminal can export different formats for different purposes and platforms, for example exporting txt for plain text, PDF for archiving, and HTML for sharing.
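The format dispatch described above can be sketched as follows. The format names come from the text; the rendering itself is a stand-in (real PDF generation would require a dedicated library).

```python
def export_record(record_text, fmt):
    """Render the interview record text into the requested export format."""
    fmt = fmt.lower()
    if fmt == "txt":
        return record_text                            # plain text
    if fmt == "html":
        body = record_text.replace("\n", "<br>\n")
        return f"<html><body>{body}</body></html>"    # for sharing
    if fmt == "pdf":
        # Placeholder: archive-oriented output would be produced by a PDF
        # library; here we only tag the content.
        return "%PDF-placeholder\n" + record_text
    raise ValueError(f"unsupported export format: {fmt}")
```

Dispatching on the requested format keeps the record's content identical across exports, with only the presentation layer changing per purpose or platform.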
In some embodiments of the present application, the voice data related to the interview can also be stored on the hub device or in the cloud. When exporting the interview record, the control terminal can then attach a link for sharing the audio, so that both the interview record and the audio link can be shared with others, giving them access to the interview record's text and voice information.
It can be appreciated that the sharing function makes it possible to share interview records, improving the intelligence and versatility of the voice interview scenario.
It can be appreciated that the voice information processing method provided by this application solves the problem of interview record generation, greatly improves the efficiency of compiling reports afterwards, and provides convenient features such as source text, translation, comparison, audio, and summaries.
An embodiment of the present application provides a voice information processing method, as shown in FIG. 11, including:
S301. The control terminal receives participant identity information, target languages, and participants' voiceprint information.
S302. The control terminal sends the preset mapping relationship formed by the participant identity information, the target languages, and the participants' voiceprint information to the hub device.
S303. After the voice interview starts, the hub device receives from the collection terminal the speaker's voice data to be simultaneously interpreted, and obtains the collection time at which capture of the voice data began.
S304. Based on the voice data to be simultaneously interpreted and the preset mapping relationship, determine the speaker identity information, and in real time simultaneously interpret the voice data into each listener's target language to obtain translation information. The preset mapping relationship is the correspondence among participant identity information, target language, and participant voiceprint information, where a listener is any participant other than the speaker.
S305. The hub device sends the translation information to the receiving terminal in real time for the receiving terminal to play.
S306. The hub device records the collection time, speaker identity information, and translation information corresponding to the voice data to be simultaneously interpreted to obtain a piece of record segment information, thereby obtaining at least one piece of record segment information by the end of the voice interview.
S307. The hub device receives the interview generation instruction sent by the control terminal and, in response, generates the interview record from the at least one piece of record segment information in timeline order.
S308. The hub device sends the interview record to the control terminal.
S309. The control terminal displays the interview record.
As an example: the participants' names (participant identity information) and languages (target languages) are entered through the control terminal, and each participant's voiceprint information is captured by recording through the headset/microphone integrated terminal; the hub device associates the participant identity information, target languages, and participant voiceprint information to obtain the preset mapping relationship. When the interview starts, the simultaneous interpretation module in the hub device begins working: the speaker's voice data to be simultaneously interpreted is recorded through the headset/microphone integrated terminal and sent to the hub device in real time, where transcription, translation, recording, and translated-speech generation are completed according to the preset mapping relationship, yielding record segment information; the translated speech is then sent to the other participants' (listeners') headset/microphone integrated terminals, so each listener hears the corresponding translated audio. When the interview ends, under the control of the control terminal, the hub device generates the interview record from the multiple pieces of record segment information; finally, the interview record is edited through the control terminal to obtain and display the final interview record.
As shown in FIG. 12, an embodiment of the present application provides a hub device 1, which may include:
a first receiving unit 10, configured to receive, after a voice interview starts, the speaker's voice data to be simultaneously interpreted transmitted by a collection terminal, and to obtain the collection time at which collection of the voice data began;
a determining unit 11, configured to determine speaker identity information based on the voice data to be interpreted and a preset mapping relationship;
a translation unit 12, configured to interpret the voice data to be interpreted into a listener's target language in real time to obtain translation information, wherein the preset mapping relationship is a correspondence among participant identity information, target languages, and participants' voiceprint information, and a listener is any participant other than the speaker;
a recording unit 13, configured to record the collection time, the speaker identity information, and the translation information corresponding to the voice data to be interpreted to obtain one piece of record segment information, such that at least one piece of record segment information is obtained by the end of the voice interview; and
a first generating unit 14, configured to generate an interview record based on the at least one piece of record segment information.
In some embodiments of the present application, the determining unit 11 is further configured to determine, from the participants' voiceprint information in the preset mapping relationship, target voiceprint information that matches the voiceprint of the voice data to be interpreted; to determine, based on the target voiceprint information and the correspondence between participant identity information and participants' voiceprint information in the preset mapping relationship, the speaker identity information corresponding to that voiceprint; and to obtain each listener's target language based on the correspondence between participant identity information and target language in the preset mapping relationship.
The translation unit 12 is further configured to translate the voice data to be interpreted into the listener's target language in real time to obtain the translation information.
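One common way to realize the matching step performed by the determining unit 11 is nearest-neighbor search over enrolled voiceprint embeddings, for example by cosine similarity. The application does not specify a matching algorithm, so the embedding vectors and the threshold below are illustrative assumptions only.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_speaker(frame: List[float],
                  enrolled: Dict[str, List[float]],
                  threshold: float = 0.7) -> str:
    """Return the enrolled identity whose voiceprint best matches the
    incoming audio features, or 'unknown' if nothing clears the threshold."""
    best_name, best_score = "unknown", threshold
    for name, voiceprint in enrolled.items():
        score = cosine(frame, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

In a real system the vectors would come from a speaker-embedding model; the dictionary lookup here plays the role of the preset mapping relationship.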
In some embodiments of the present application, the recording unit 13 is further configured to perform text recognition on the voice data to be interpreted to obtain source text information, and to record the collection time, the speaker identity information, the translation information, and the source text information corresponding to the voice data to be interpreted until the target voiceprint information changes, thereby obtaining one piece of record segment information; the at least one piece of record segment information is thus obtained by the end of the voice interview.
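The rule "record until the target voiceprint information changes" amounts to grouping consecutive recognition results by matched speaker. A sketch, under assumed `(time, speaker, text)` frame tuples and dict-shaped segments:

```python
def segment_by_voiceprint(frames):
    """Group consecutive (time, speaker, text) frames into record segments,
    closing the current segment whenever the matched speaker changes."""
    segments = []
    for time, speaker, text in frames:
        if not segments or segments[-1]["speaker"] != speaker:
            # a new speaker opens a new record segment
            segments.append({"capture_time": time, "speaker": speaker, "texts": []})
        segments[-1]["texts"].append(text)
    return segments
```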
In some embodiments of the present application, the hub device 1 further includes an extraction unit 15.
The extraction unit 15 is configured, after the at least one piece of record segment information has been obtained at the end of the voice interview, to perform summary extraction on the at least one piece of record segment information by using a summary extraction technique, thereby extracting full-text summary information.
The first generating unit 14 is further configured to generate the interview record based on the at least one piece of record segment information and the full-text summary information.
In some embodiments of the present application, the hub device 1 further includes an extraction unit 15.
The extraction unit 15 is configured, after one piece of record segment information has been obtained, to perform summary extraction on that piece by using a summary extraction technique, thereby extracting speaker summary information; at the end of the voice interview, the at least one piece of record segment information and at least one piece of speaker summary information are obtained.
In some embodiments of the present application, the first generating unit 14 is further configured to generate the interview record based on the at least one piece of record segment information and the at least one piece of speaker summary information.
In some embodiments of the present application, the hub device 1 further includes a first sending unit 16.
The first receiving unit 10 is further configured to receive an interview generation instruction sent by the control terminal.
The first generating unit 14 is further configured, in response to the interview generation instruction, to generate the interview record from the at least one piece of record segment information in chronological order.
The first sending unit 16 is configured to send the interview record to the control terminal.
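Generating the interview record "in the order of the time axis" is, at its core, a stable sort of the record segments by collection time. A sketch (the dict-shaped segments are an assumption):

```python
def generate_interview_record(segments):
    """Arrange record segments chronologically by their collection time."""
    return sorted(segments, key=lambda s: s["capture_time"])
```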
It can be understood that, in a voice interview scenario, the hub device can determine the speaker's identity from the voice data to be interpreted and obtain translation information in the languages the listeners require, and at the end of the interview an interview record can be generated based on this information. While performing real-time simultaneous interpretation of the voice data, the hub device also records the determined speaker identity information, translation information, and other structured data as record segment information; when the interview ends, the multiple pieces of record segment information are used to generate the interview record of the voice interview. This improves the efficiency of data collation in a voice interview, that is, the generation speed and processing efficiency of the interview record.
As shown in FIG. 13, an embodiment of the present application provides a control terminal 2, which may include:
a second receiving unit 20, configured to receive participant identity information, target languages, and participants' voiceprint information;
a mapping unit 21, configured to send the preset mapping relationship formed from the participant identity information, the target languages, and the participants' voiceprint information to the hub device;
the second receiving unit 20 being further configured to receive an interview trigger instruction at the end of the interview;
a second generating unit 22, configured to generate an interview generation instruction in response to the interview trigger instruction;
a second sending unit 23, configured to send the interview generation instruction to the hub device; and
the second receiving unit 20 being further configured to receive the interview record fed back by the hub device in response to the interview generation instruction, the interview record being generated by the hub device, in response to that instruction, based on the preset mapping relationship and the voice data to be interpreted that is received in real time.
In some embodiments of the present application, the control terminal 2 further includes a display unit 24.
The display unit 24 is configured, after the interview record fed back by the hub device in response to the interview generation instruction is received, to display the interview record in chronological order.
In some embodiments of the present application, each segment of the interview record includes speaker identity information, a collection time, the voice data to be interpreted, and translation information.
In some embodiments of the present application, the display unit 24 is further configured to arrange the segments of the interview record chronologically by collection time, and to display the speaker identity information, the voice data to be interpreted, and the translation information corresponding to each arranged segment.
In some embodiments of the present application, each segment of the interview record further includes speaker summary information.
The display unit 24 is further configured, after displaying the speaker identity information, the voice data to be interpreted, and the translation information corresponding to each arranged segment, to display the speaker summary information in a first preset area within each segment's display area.
In some embodiments of the present application, the interview record further includes full-text summary information.
The display unit 24 is further configured, when displaying the speaker identity information, the voice data to be interpreted, and the translation information corresponding to each arranged segment, to display the full-text summary information ahead of the arranged segments.
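The display rules above, with the full-text summary shown first and each chronologically arranged segment showing its speaker, time, source text, translation, and per-segment summary, can be sketched as a plain-text renderer. The field names are assumptions for illustration.

```python
def render_record(full_summary, segments):
    """Render the interview record: full-text summary ahead of the
    chronologically ordered segments, each with its own summary area."""
    lines = ["Summary: " + full_summary]
    for seg in sorted(segments, key=lambda s: s["capture_time"]):
        lines.append(f"[{seg['capture_time']}] {seg['speaker']}: "
                     f"{seg['source_text']} / {seg['translation']}")
        if seg.get("speaker_summary"):
            # the first preset area within this segment's display area
            lines.append("  > " + seg["speaker_summary"])
    return "\n".join(lines)
```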
In some embodiments of the present application, the control terminal 2 further includes an editing unit 25 and a display unit 24.
The second receiving unit 20 is further configured to receive an editing instruction after the interview record fed back by the hub device in response to the interview generation instruction is received.
The editing unit 25 is configured to edit the interview record in response to the editing instruction, to obtain a final interview record.
The display unit 24 is configured to display the final interview record.
In some embodiments of the present application, the control terminal 2 further includes an export unit 26 and a sharing unit 27.
The second receiving unit 20 is further configured to receive an export instruction after the interview record fed back by the hub device in response to the interview generation instruction is received.
The export unit 26 is configured, in response to the export instruction, to process the interview record into a preset format to obtain an export file.
The sharing unit 27 is configured to share the export file.
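The export step ("process the interview record into a preset format") could, for example, serialize the record to a file. The application does not name a format, so the choice of JSON below is purely an assumption.

```python
import json

def export_record(record, path):
    """Write the interview record to `path` in an assumed preset format (JSON)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    return path
```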
It can be understood that the control terminal can obtain the interview record, which captures the content of the voice interview, from the hub device. A user can thus obtain or view the interview record through the control terminal conveniently and quickly, which improves the intelligence of the control terminal.
As shown in FIG. 14, an embodiment of the present application provides a hub device, including:
a first processor 17 and a first memory 18;
the first processor 17 being configured to execute a simultaneous interpretation program stored in the first memory 18, so as to implement the voice information processing method on the hub device side.
As shown in FIG. 15, an embodiment of the present application provides a control terminal, including:
a second processor 28 and a second memory 29;
the second processor 28 being configured to execute a simultaneous interpretation program stored in the second memory 29, so as to implement the voice information processing method on the control terminal side.
In the embodiments of the present disclosure, the first processor 17 or the second processor 28 may be at least one of an Application-Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field-Programmable Gate Array (FPGA), a CPU, a controller, a microcontroller, or a microprocessor. It can be understood that, for different devices, other electronic components may be used to implement the functions of the first processor 17 or the second processor 28, which is not limited in the embodiments of the present disclosure. The hub device further includes the first memory 18, which may be connected to the first processor 17, and the control terminal further includes the second memory 29, which may be connected to the second processor 28. The first memory 18 or the second memory 29 may include a high-speed RAM, and may further include a non-volatile memory, for example, at least two disk memories.
In practical applications, the first memory 18 or the second memory 29 may be a volatile memory, such as a Random-Access Memory (RAM); or a non-volatile memory, such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memory, and provides instructions and data to the first processor 17 or the second processor 28.
In addition, the functional modules in this embodiment may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module.
If the integrated unit is implemented as a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the method of this embodiment. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random-access memory, a magnetic disk, or an optical disc.
Accordingly, an embodiment of the present application further provides a computer-readable storage medium on which a simultaneous interpretation program is stored; when the program is executed by one or more first processors, the voice information processing method on the hub device side is implemented.
An embodiment of the present application further provides a computer-readable storage medium on which a simultaneous interpretation program is stored; when the program is executed by one or more second processors, the voice information processing method on the control terminal side is implemented.
The computer-readable storage medium may be a volatile memory, such as a RAM, or a non-volatile memory, such as a ROM, a flash memory, an HDD, or an SSD; it may also be a device including one or any combination of the above memories, such as a mobile phone, a computer, a tablet device, or a personal digital assistant.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above is only a specific implementation of the present application, and the protection scope of the present application is not limited thereto. Any variation readily conceived by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application.
Industrial applicability
In the voice information processing method provided by the embodiments of the present application, the hub device can, in a voice interview scenario, determine speaker identity information from the speaker's voice data to be interpreted and obtain translation information in the languages the listeners require; at the end of the interview, an interview record can be generated based on this information. While performing real-time simultaneous interpretation of the voice data, the hub device also records the determined speaker identity information, translation information, and other structured data as record segment information, and when the interview ends it generates the interview record of the voice interview from the multiple pieces of record segment information. This improves the efficiency of data collation in a voice interview, that is, the generation speed and processing efficiency of the interview record.

Claims (20)

  1. A voice information processing method, comprising:
    after a voice interview starts, receiving a speaker's voice data to be simultaneously interpreted transmitted by a collection terminal, and obtaining a collection time at which collection of the voice data began;
    determining speaker identity information based on the voice data to be interpreted and a preset mapping relationship, and interpreting the voice data to be interpreted into a listener's target language in real time to obtain translation information, wherein the preset mapping relationship is a correspondence among participant identity information, target languages, and participants' voiceprint information, and the listener is a participant other than the speaker;
    recording the collection time, the speaker identity information, and the translation information corresponding to the voice data to be interpreted to obtain one piece of record segment information, and thereby obtaining at least one piece of record segment information at the end of the voice interview; and
    generating an interview record based on the at least one piece of record segment information.
  2. The method according to claim 1, wherein determining the speaker identity information based on the voice data to be interpreted and the preset mapping relationship, and interpreting the voice data to be interpreted into the listener's target language in real time to obtain the translation information, comprises:
    determining, from the participants' voiceprint information in the preset mapping relationship, target voiceprint information that matches the voiceprint of the voice data to be interpreted;
    determining, based on the target voiceprint information and the correspondence between participant identity information and participants' voiceprint information in the preset mapping relationship, the speaker identity information corresponding to the voiceprint of the voice data to be interpreted, and obtaining the listener's target language based on the correspondence between participant identity information and target language in the preset mapping relationship; and
    translating the voice data to be interpreted into the listener's target language in real time to obtain the translation information.
  3. The method according to claim 1, wherein recording the collection time, the speaker identity information, and the translation information corresponding to the voice data to be interpreted to obtain one piece of record segment information, and thereby obtaining at least one piece of record segment information at the end of the voice interview, comprises:
    performing text recognition on the voice data to be interpreted to obtain source text information; and
    recording the collection time, the speaker identity information, the translation information, and the source text information corresponding to the voice data to be interpreted until the target voiceprint information changes, to obtain one piece of record segment information, and thereby obtaining the at least one piece of record segment information at the end of the voice interview.
  4. The method according to claim 1, wherein after recording the collection time, the speaker identity information, and the translation information corresponding to the voice data to be interpreted to obtain one piece of record segment information, and thereby obtaining at least one piece of record segment information at the end of the voice interview, the method further comprises:
    performing summary extraction on the at least one piece of record segment information by using a summary extraction technique, to extract full-text summary information;
    wherein generating the interview record based on the at least one piece of record segment information comprises:
    generating the interview record based on the at least one piece of record segment information and the full-text summary information.
  5. The method according to claim 1 or 4, wherein after recording the collection time, the speaker identity information, and the translation information corresponding to the voice data to be interpreted to obtain one piece of record segment information, the method further comprises:
    performing summary extraction on the piece of record segment information by using a summary extraction technique, to extract speaker summary information; and
    obtaining, at the end of the voice interview, the at least one piece of record segment information and at least one piece of speaker summary information.
  6. The method according to claim 5, wherein generating the interview record based on the at least one piece of record segment information comprises:
    generating the interview record based on the at least one piece of record segment information and the at least one piece of speaker summary information.
  7. The method according to claim 1, wherein the generating an interview record based on the at least one piece of record segment information comprises:
    receiving an interview generation instruction sent by a control terminal;
    in response to the interview generation instruction, generating the interview record from the at least one piece of record segment information in chronological order along a time axis;
    after the generating an interview record based on the at least one piece of record segment information, the method further comprises:
    sending the interview record to the control terminal.
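Claim 7's time-axis ordering can be illustrated with a short sketch. This is not part of the claims or specification; names such as `RecordSegment` and `generate_interview_record` are hypothetical, and the record-segment fields follow the collection time, speaker identity and translation information recited in claim 1:

```python
from dataclasses import dataclass

@dataclass
class RecordSegment:
    collection_time: float   # seconds since the voice interview started
    speaker_identity: str
    translation_info: str

def generate_interview_record(segments):
    """Arrange record segments along the time axis and join them into one interview record."""
    ordered = sorted(segments, key=lambda s: s.collection_time)
    return [
        f"[{s.collection_time:.0f}s] {s.speaker_identity}: {s.translation_info}"
        for s in ordered
    ]

segments = [
    RecordSegment(30.0, "Speaker B", "Thank you for the question."),
    RecordSegment(5.0, "Speaker A", "Welcome to the interview."),
]
record = generate_interview_record(segments)
# Speaker A's segment comes first because its collection time is earlier
```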
  8. A voice information processing method, comprising:
    receiving participant identity information, target languages and voiceprint information of participants;
    sending a preset mapping relationship formed by the participant identity information, the target languages and the voiceprint information of the participants to a hub device;
    receiving an interview trigger instruction at the end of an interview, and generating an interview generation instruction in response to the interview trigger instruction;
    sending the interview generation instruction to the hub device;
    receiving an interview record fed back by the hub device in response to the interview generation instruction, wherein the interview record is generated by the hub device, in response to the interview generation instruction, based on the preset mapping relationship and voice data to be simultaneously interpreted that is received in real time.
  9. The method according to claim 8, wherein after the receiving an interview record fed back by the hub device in response to the interview generation instruction, the method further comprises:
    displaying the interview record in chronological order along a time axis.
  10. The method according to claim 9, wherein
    each interview record segment in the interview record comprises: speaker identity information, a collection time, voice data to be simultaneously interpreted, and translation information.
  11. The method according to claim 10, wherein the displaying the interview record in chronological order along a time axis comprises:
    arranging the interview record segments in the interview record in chronological order according to their collection times;
    displaying the speaker identity information, the voice data to be simultaneously interpreted and the translation information corresponding to each arranged interview record segment.
  12. The method according to claim 11, wherein each interview record segment in the interview record further comprises speaker summary information; and after the displaying the speaker identity information, the voice data to be simultaneously interpreted and the translation information corresponding to each arranged interview record segment, the method further comprises:
    displaying the speaker summary information in a first preset area within the display area of each interview record segment.
  13. The method according to claim 11, wherein the interview record further comprises full-text summary information; and when the speaker identity information, the voice data to be simultaneously interpreted and the translation information corresponding to each arranged interview record segment are displayed, the method further comprises:
    displaying the full-text summary information before the speaker identity information, the voice data to be simultaneously interpreted and the translation information corresponding to the arranged interview record segments.
  14. The method according to claim 8, wherein after the receiving an interview record fed back by the hub device in response to the interview generation instruction, the method further comprises:
    receiving an editing instruction;
    editing the interview record in response to the editing instruction, to obtain and display a final interview record.
  15. The method according to claim 8, wherein after the receiving an interview record fed back by the hub device in response to the interview generation instruction, the method further comprises:
    receiving an export instruction;
    processing the interview record into a preset format in response to the export instruction, to obtain an export file;
    sharing the export file.
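The "preset format" processing of claim 15 can be sketched as a minimal export helper. This is illustrative only and not part of the specification; the function name `export_interview_record`, the format names and the record structure are all assumptions:

```python
import json

def export_interview_record(record, fmt="json"):
    """Process an interview record into a preset format and return export-file bytes."""
    if fmt == "json":
        # ensure_ascii=False keeps any non-ASCII translation text readable
        return json.dumps(record, ensure_ascii=False, indent=2).encode("utf-8")
    if fmt == "txt":
        lines = [f"{seg['speaker']}: {seg['translation']}" for seg in record]
        return "\n".join(lines).encode("utf-8")
    raise ValueError(f"unsupported preset format: {fmt}")

record = [{"speaker": "Speaker A", "translation": "Welcome to the interview."}]
data = export_interview_record(record, fmt="txt")
```

The returned bytes could then be written to a file and shared through whatever channel the control terminal supports.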
  16. A hub device, comprising:
    a first receiving unit, configured to receive, after a voice interview starts, voice data to be simultaneously interpreted of a speaker transmitted by a collection terminal, and obtain a collection time at which collection of the voice data to be simultaneously interpreted starts;
    a determining unit, configured to determine speaker identity information based on the voice data to be simultaneously interpreted and a preset mapping relationship;
    a translation unit, configured to simultaneously interpret, in real time, the voice data to be simultaneously interpreted into a target language of a listener, to obtain translation information, wherein the preset mapping relationship is a correspondence among participant identity information, target languages and voiceprint information of participants, and the listener is a person among the participants other than the speaker;
    a recording unit, configured to record the collection time, the speaker identity information and the translation information corresponding to the voice data to be simultaneously interpreted, to obtain a piece of record segment information, and thereby obtain at least one piece of record segment information at the end of the voice interview;
    a first generating unit, configured to generate an interview record based on the at least one piece of record segment information.
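The determining unit's use of the preset mapping relationship can be sketched as a nearest-voiceprint lookup. This is a toy sketch, not the claimed implementation: the similarity measure is a simple negative squared distance (a real hub device would use a trained voiceprint model), and the mapping structure and the name `match_speaker` are assumptions:

```python
def match_speaker(voiceprint, preset_mapping):
    """Return (identity, target_language) of the participant whose stored
    voiceprint best matches the incoming one.

    preset_mapping: {identity: {"voiceprint": [floats], "target_language": str}}
    """
    def score(a, b):
        # Higher is better: negative squared Euclidean distance
        return -sum((x - y) ** 2 for x, y in zip(a, b))

    identity, info = max(
        preset_mapping.items(),
        key=lambda kv: score(voiceprint, kv[1]["voiceprint"]),
    )
    return identity, info["target_language"]

mapping = {
    "Alice": {"voiceprint": [0.1, 0.9], "target_language": "en"},
    "Bob":   {"voiceprint": [0.8, 0.2], "target_language": "zh"},
}
speaker, lang = match_speaker([0.75, 0.25], mapping)
# The incoming voiceprint is closest to Bob's stored one
```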
  17. A control terminal, comprising:
    a second receiving unit, configured to receive participant identity information, target languages and voiceprint information of participants;
    a mapping unit, configured to send a preset mapping relationship formed by the participant identity information, the target languages and the voiceprint information of the participants to a hub device;
    the second receiving unit being further configured to receive an interview trigger instruction at the end of an interview;
    a second generating unit, configured to generate an interview generation instruction in response to the interview trigger instruction;
    a second sending unit, configured to send the interview generation instruction to the hub device;
    the second receiving unit being further configured to receive an interview record fed back by the hub device in response to the interview generation instruction, wherein the interview record is generated by the hub device, in response to the interview generation instruction, based on the preset mapping relationship and voice data to be simultaneously interpreted that is received in real time.
  18. A hub device, comprising:
    a first processor and a first memory;
    wherein the first processor is configured to execute a simultaneous interpretation program stored in the first memory, to implement the voice information processing method according to any one of claims 1 to 7.
  19. A control terminal, comprising:
    a second processor and a second memory;
    wherein the second processor is configured to execute a simultaneous interpretation program stored in the second memory, to implement the voice information processing method according to any one of claims 8 to 15.
  20. A storage medium, on which a simultaneous interpretation program is stored, wherein when the simultaneous interpretation program is executed by a first processor, the voice information processing method according to any one of claims 1 to 7 is implemented; or, when the simultaneous interpretation program is executed by a second processor, the voice information processing method according to any one of claims 8 to 15 is implemented.
PCT/CN2019/130075 2019-12-30 2019-12-30 Voice information processing method, hub device, control terminal and storage medium WO2021134284A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980101053.3A CN114503117A (en) 2019-12-30 2019-12-30 Voice information processing method, center device, control terminal and storage medium
PCT/CN2019/130075 WO2021134284A1 (en) 2019-12-30 2019-12-30 Voice information processing method, hub device, control terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/130075 WO2021134284A1 (en) 2019-12-30 2019-12-30 Voice information processing method, hub device, control terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2021134284A1 true WO2021134284A1 (en) 2021-07-08

Family

ID=76686162

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/130075 WO2021134284A1 (en) 2019-12-30 2019-12-30 Voice information processing method, hub device, control terminal and storage medium

Country Status (2)

Country Link
CN (1) CN114503117A (en)
WO (1) WO2021134284A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383432A (en) * 2023-04-20 2023-07-04 中关村科学城城市大脑股份有限公司 Audio data screening method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001014314A (en) * 1999-06-28 2001-01-19 Sony Corp Simultaneous translation system
US20170075882A1 (en) * 2006-10-26 2017-03-16 Facebook, Inc. Simultaneous Translation of Open Domain Lectures and Speeches
CN108305632A (en) * 2018-02-02 2018-07-20 深圳市鹰硕技术有限公司 A kind of the voice abstract forming method and system of meeting
CN108766414A (en) * 2018-06-29 2018-11-06 北京百度网讯科技有限公司 Method, apparatus, equipment and computer readable storage medium for voiced translation
CN108922538A (en) * 2018-05-29 2018-11-30 平安科技(深圳)有限公司 Conferencing information recording method, device, computer equipment and storage medium
CN109741754A (en) * 2018-12-10 2019-05-10 上海思创华信信息技术有限公司 A kind of conference voice recognition methods and system, storage medium and terminal
CN110491385A (en) * 2019-07-24 2019-11-22 深圳市合言信息科技有限公司 Simultaneous interpretation method, apparatus, electronic device and computer readable storage medium
CN110516265A (en) * 2019-08-31 2019-11-29 青岛谷力互联科技有限公司 A kind of single identification real-time translation system based on intelligent sound

Also Published As

Publication number Publication date
CN114503117A (en) 2022-05-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19958579

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.12.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19958579

Country of ref document: EP

Kind code of ref document: A1