WO2019214359A1 - Data processing method based on simultaneous interpretation, computer device and storage medium - Google Patents

Data processing method based on simultaneous interpretation, computer device and storage medium

Info

Publication number
WO2019214359A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
simultaneous interpretation
user terminal
modified
server
Prior art date
Application number
PCT/CN2019/080027
Other languages
English (en)
French (fr)
Inventor
白晶亮
欧阳才晟
刘海康
陈联武
陈祺
张宇露
罗敏
苏丹
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Priority to EP19799122.7A (EP3792916B1)
Publication of WO2019214359A1
Priority to US16/941,503 (US20200357389A1)

Classifications

    • G10L15/06 - Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G10L15/183 - Speech classification or search using natural language modelling with context dependencies, e.g. language models
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 - Speech to text systems
    • G10L15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L15/32 - Multiple recognisers used in sequence or in parallel; score combination systems therefor, e.g. voting systems
    • G10L21/0208 - Noise filtering (speech enhancement, e.g. noise reduction or echo cancellation)
    • G10L21/0232 - Noise filtering with processing in the frequency domain
    • G10L25/21 - Speech or voice analysis in which the extracted parameters are power information
    • G10L25/48 - Speech or voice analysis specially adapted for particular use
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L25/84 - Detection of presence or absence of voice signals for discriminating voice from noise
    • G10L2015/0635 - Training: updating or merging of old and new templates; mean values; weighting
    • G10L2015/0636 - Threshold criteria for the updating
    • G10L2025/783 - Detection of presence or absence of voice signals based on threshold decision
    • G06F40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • The present application relates to the field of simultaneous interpretation (SI) technology, and in particular to a data processing method based on simultaneous interpretation, a computer device, and a storage medium.
  • Simultaneous interpretation is a mode of translation in which the content of a speech is continuously rendered to the audience without interrupting the speaker. At present, about 95% of international conferences worldwide use simultaneous interpretation.
  • In the related art, the commonly used simultaneous interpretation method is as follows: the simultaneous interpretation device collects the audio produced by the speaker and uploads the collected audio to the server.
  • The server processes the received audio to obtain the corresponding text and displays the text on the display of the simultaneous interpretation system.
  • The embodiments of the present application provide a data processing method based on simultaneous interpretation, a computer device, and a storage medium, which can address the low accuracy of text content in related simultaneous interpretation technology.
  • the embodiment of the present application provides a data processing method based on simultaneous interpretation, the method is applied to a server in a simultaneous interpretation system, and the simultaneous interpretation system further includes a simultaneous interpretation device and a user terminal, including:
  • the embodiment of the present application provides a data processing device based on simultaneous interpretation, the device is applied to a server in a simultaneous interpretation system, and the simultaneous interpretation system further includes a simultaneous interpretation device and a user terminal, including:
  • an acquiring module configured to acquire audio sent by the simultaneous interpretation device;
  • a processing module configured to process the audio through a simultaneous interpretation model to obtain initial text;
  • a sending module configured to send the initial text to the user terminal;
  • a receiving module configured to receive the modified text fed back by the user terminal, where the modified text is obtained by the user terminal modifying the initial text;
  • The embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above data processing method based on simultaneous interpretation.
  • An embodiment of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the above data processing method based on simultaneous interpretation.
  • With the above data processing method, device, and storage medium based on simultaneous interpretation, the server can obtain the modified text fed back by the user terminal after the initial text is modified, so that when the initial text obtained by processing the audio contains errors, corresponding feedback can be obtained in time.
  • the simultaneous interpretation model is updated according to the initial text and the modified text, and the subsequent audio is processed by the updated simultaneous interpretation model, thereby improving the accuracy of the text obtained by processing the audio.
  • the embodiment of the present application provides a data processing method based on simultaneous interpretation, which is applied to a user terminal in a simultaneous interpretation system, and the simultaneous interpretation system further includes a simultaneous interpretation device and a server, including:
  • the initial text is obtained by the server processing the audio sent by the simultaneous interpretation device by using a simultaneous interpretation model
  • the modified text is used to instruct the server to update the simultaneous interpretation model according to the initial text and the modified text.
  • the embodiment of the present application provides a data processing device based on simultaneous interpretation, the device is applied to a user terminal in a simultaneous interpretation system, and the simultaneous interpretation system further includes a simultaneous interpretation device and a server, including:
  • a first display module for displaying a simultaneous interpretation auxiliary page
  • a receiving module configured to receive initial text sent by the server; the initial text is obtained by the server processing the audio sent by the simultaneous interpretation device by using a simultaneous interpretation model;
  • a second display module configured to display the initial text in the simultaneous interpretation assistance page
  • An obtaining module configured to acquire a modified text corresponding to the initial text when a modification instruction is detected
  • a sending module configured to send the modified text to the server, and the modified text is used to instruct the server to update the simultaneous interpretation model according to the initial text and the modified text.
  • The embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above data processing method based on simultaneous interpretation.
  • An embodiment of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the above data processing method based on simultaneous interpretation.
  • When the initial text contains errors, the user terminal can perform the corresponding modification and synchronize the resulting modified text to the server, instructing the server to update the simultaneous interpretation model according to the initial text and the modified text, thereby improving the accuracy of the text obtained by processing the audio.
  • FIG. 1 is an application environment diagram of a data processing method based on simultaneous interpretation in an embodiment;
  • FIG. 2 is a schematic flowchart of a data processing method based on simultaneous interpretation in an embodiment;
  • FIG. 3 is a schematic flowchart of the steps of audio processing and speech recognition in an embodiment;
  • FIG. 4 is a schematic flowchart of the steps of merging text and video in an embodiment and transmitting the merged content to a user terminal for display;
  • FIG. 5 is a schematic flowchart of the step of synchronizing a conference number to a user terminal in an embodiment;
  • FIG. 6 is a schematic flowchart of the steps of updating stored text in an embodiment and feeding back the updated text to the user terminal;
  • FIG. 7 is a schematic flowchart of the step of adjusting the weight corresponding to a user terminal identifier in an embodiment;
  • FIG. 8 is a schematic flowchart of a data processing method based on simultaneous interpretation in another embodiment;
  • FIG. 9 is a schematic diagram of a simultaneous interpretation auxiliary page in an embodiment;
  • FIG. 10 is a schematic flowchart of the steps of constructing a simultaneous interpretation auxiliary page in an embodiment;
  • FIG. 11 is a schematic flowchart of the steps of updating locally stored text in an embodiment;
  • FIG. 12 is a schematic flowchart of a data processing method based on simultaneous interpretation in another embodiment;
  • FIG. 13 is a schematic flowchart of a data processing method based on simultaneous interpretation in still another embodiment;
  • FIG. 14 is a schematic structural diagram of a conventional simultaneous interpretation system in an embodiment;
  • FIG. 15 is a schematic structural diagram of a simultaneous interpretation system to which a data processing method based on simultaneous interpretation is applied in an embodiment;
  • FIG. 16 is a timing diagram of a data processing method based on simultaneous interpretation in an embodiment;
  • FIG. 17 is a structural block diagram of a data processing apparatus based on simultaneous interpretation in an embodiment;
  • FIG. 19 is a structural block diagram of a data processing apparatus based on simultaneous interpretation in another embodiment;
  • FIG. 20 is a structural block diagram of a data processing apparatus based on simultaneous interpretation in another embodiment;
  • FIG. 21 is a structural block diagram of a computer device in an embodiment;
  • FIG. 22 is a structural block diagram of a computer device in another embodiment.
  • FIG. 1 is an application environment diagram of a data processing method based on simultaneous interpretation in an embodiment.
  • the data processing method based on simultaneous interpretation is applied to a simultaneous interpretation system.
  • the simultaneous interpretation system includes a user terminal 110, a server 120, and a simultaneous interpretation device 130.
  • the user terminal 110 and the simultaneous interpretation device 130 are connected to the server 120 via a network.
  • the user terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like.
  • the server 120 may be implemented by a separate server or a server cluster composed of multiple servers, which is not specifically limited in this embodiment of the present application. As an example, when the server 120 is a plurality of servers, a voice server and a translation server may be included.
  • the simultaneous interpretation device 130 may be a terminal having an audio collection function, such as a notebook computer, a desktop computer carrying a microphone, and the like.
  • a data processing method based on simultaneous interpretation is provided. This embodiment is mainly illustrated by the method being applied to the server 120 in FIG. 1 described above. Referring to FIG. 2, the data processing method based on simultaneous interpretation includes the following steps:
  • S202: The server acquires audio sent by the simultaneous interpretation device.
  • audio refers to the audio that the speaker sends during the simultaneous interpretation process.
  • In one embodiment, before S202, the method further includes: when receiving a connection request carrying a user identifier sent by the user terminal, the server determines whether the user identifier has the right to access the simultaneous interpretation conference. If it is determined that the user identifier has the right to access the simultaneous interpretation conference, the server establishes a communication connection with the user terminal; otherwise, the server refuses to establish a communication connection with the user terminal.
  • The communication connection may be a TCP (Transmission Control Protocol) connection, a UDP (User Datagram Protocol) connection, or a WebSocket connection; the embodiment of the present application does not specifically limit this.
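The access check described above can be sketched as follows. This is an illustrative sketch only; `AUTHORIZED_USERS` and `handle_connection_request` are assumed names, not identifiers from this application, and a real server would perform this check while accepting the TCP/UDP/WebSocket connection.

```python
# Illustrative sketch of the server-side access check; AUTHORIZED_USERS and
# handle_connection_request are assumed names, not taken from the patent.

AUTHORIZED_USERS = {"user-001", "user-002"}  # identifiers admitted to the conference


def handle_connection_request(user_id: str) -> bool:
    """Return True (establish the communication connection) if the user
    identifier has the right to access the simultaneous interpretation
    conference; otherwise return False (refuse the connection)."""
    return user_id in AUTHORIZED_USERS
```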
  • The method of acquiring audio may include: the simultaneous interpretation device collects audio from the environment and sends the collected audio to the server, so that the server acquires the audio.
  • In one embodiment, the simultaneous interpretation device first performs noise reduction on the collected audio, then power-amplifies the noise-reduced audio, performs voice activity detection on the amplified audio, filters out the non-speech portion, and sends the filtered audio to the server, so that the server acquires the audio.
  • S204: The server processes the acquired audio through a simultaneous interpretation model to obtain initial text.
  • the simultaneous interpretation model is used to process the acquired audio, such as speech recognition and translation of the result of the recognition.
  • the simultaneous interpretation model can include a speech model and a translation model.
  • the speech model includes a general speech model and an auxiliary speech model.
  • the universal speech model is used for speech recognition of audio to obtain corresponding recognized text.
  • The auxiliary speech model is used to correct the recognized text; that is, when the recognized text contains the same error as a previous one, the error is corrected.
  • the translation model is used to translate the recognized text to obtain translated text.
  • The initial text refers to the recognized text and the translated text; that is, the initial text obtained by processing the audio includes the recognized text and the translated text.
  • the recognized text is text obtained by performing speech recognition on the audio.
  • the translated text is the text of the target language obtained by translating the recognized text, and may also be referred to as a translation.
  • The initial text may further include recognition-updated text obtained by updating the recognized text; the recognition-updated text may also be referred to as updated recognized text.
  • the server performs speech recognition on the acquired audio through the simultaneous interpretation model, and obtains the recognized text after the speech recognition.
  • the server translates the recognized text through the simultaneous interpretation model, obtains the translated text of the target language, and determines the recognized text and the translated text as the initial text obtained by processing the audio.
  • In one embodiment, the server processes a received complete speech segment to obtain the initial text.
  • A complete speech segment can be speech of a preset duration, or the speech between two pauses in the speaker's statement.
  • For example, the speaker says: "Gentlemen, ladies, good evening everyone." If the speaker pauses after saying "good evening everyone", then the complete speech segment can be "Gentlemen, ladies, good evening everyone."
  • S206: The server sends the initial text to the user terminal.
  • The server sends the text to the user terminal; the sent text is used to instruct the user terminal to display the received text on the simultaneous interpretation auxiliary page, so that the audience in the simultaneous interpretation conference can view the simultaneously interpreted text content through the user terminal.
  • In one embodiment, each time the server processes a piece of audio to obtain the corresponding text, the server sends the resulting text to the user terminal.
  • The above piece of audio may be a segment of the speaker's speech whose duration is within a certain range, such as one minute or half a minute.
  • In another embodiment, each time the server processes a piece of audio to obtain the corresponding text, if it is determined that the word count of the text reaches a preset word-count threshold, the server sends the text to the user terminal.
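The word-count-threshold sending strategy can be illustrated with a small sketch; `TextBuffer`, `WORD_COUNT_THRESHOLD`, and the threshold value are assumptions made for illustration, not part of this application:

```python
# Sketch of send-on-word-count: buffer recognized text and flush it to the
# user terminals once the buffered word count reaches a preset threshold.

WORD_COUNT_THRESHOLD = 5  # assumed value; the application does not fix one


class TextBuffer:
    def __init__(self, threshold: int = WORD_COUNT_THRESHOLD):
        self.threshold = threshold
        self.words = []
        self.sent = []  # stands in for messages pushed to user terminals

    def append(self, text: str) -> None:
        self.words.extend(text.split())
        if len(self.words) >= self.threshold:
            # Threshold reached: send the buffered text as one message.
            self.sent.append(" ".join(self.words))
            self.words = []
```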
  • S208: The server receives the modified text fed back by the user terminal, where the modified text is obtained by the user terminal modifying the initial text.
  • Since the text sent by the server may include both the recognized text and the translated text, the modified text may be text obtained by modifying the recognized text or text obtained by modifying the translated text. It should be noted that the modification may apply to a word in the text, a phrase, a sentence, or the text as a whole.
  • In one embodiment, the server receives modified text obtained by the user terminal modifying the recognized text.
  • In one embodiment, the server receives modified text obtained by the user terminal modifying the translated text.
  • S210: The server updates the simultaneous interpretation model according to the initial text and the modified text.
  • the server may update the speech model based on the recognized text and the modified text when the weighted cumulative value reaches a threshold and the modified text is obtained based on the recognized text modification.
  • the server may update the translation model based on the translated text and the modified text.
  • Here, the weighted cumulative value reaching the threshold means that the weighted cumulative value is greater than or equal to the threshold.
  • In one embodiment, the server compares the weighted cumulative value with the preset threshold.
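A minimal sketch of the weighted-accumulation trigger, assuming each submitted modification carries the weight of the terminal that submitted it; the names and the threshold value are illustrative assumptions, not the application's actual scheme:

```python
# Sketch: the speech model is only updated once the weighted cumulative
# value of submitted modifications reaches a preset threshold.

UPDATE_THRESHOLD = 1.0  # assumed value for illustration


def should_update(modifications):
    """modifications: list of (terminal_id, weight) pairs submitted for one
    (initial text, modified text) pair. Returns True when the weighted
    cumulative value is greater than or equal to the threshold."""
    weighted_cumulative = sum(weight for _, weight in modifications)
    return weighted_cumulative >= UPDATE_THRESHOLD
```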
  • the server obtains the modified text obtained by modifying the initial text fed back by the user terminal, so that when the initial text is modified, the corresponding feedback can be obtained in time.
  • the simultaneous interpretation model is updated according to the initial text and the modified text
  • the subsequent audio is processed by the updated simultaneous interpretation model, thereby improving the accuracy of the text obtained by processing the audio.
  • S204 can include:
  • S302: The server performs noise reduction processing on the acquired audio.
  • The server performs noise reduction processing on the acquired audio using a noise reduction algorithm; the noise reduction algorithm may include a Wiener filtering noise reduction algorithm, basic spectral subtraction, or an LMS adaptive notch filtering algorithm.
  • the server may further perform power amplification processing on the noise-reduced audio.
  • S304: The server acquires the speech portion included in the noise-reduced audio.
  • the audio may include a voice part and a non-speech part.
  • the server may also perform voice activity detection on the noise-reduced audio or on the noise-reduced and power-amplified audio to determine whether there is a non-speech portion in the audio. When it is determined that there is a non-speech portion in the audio, the non-speech portion is deleted, thereby acquiring the voice portion in the audio.
  • S306: The server acquires, from the obtained speech portion, the audio portion whose energy value is greater than or equal to an energy threshold.
  • While the speaker is speaking, others may also be speaking. Then, in the audio from which the non-speech portion has been deleted, the audio of others may be included in addition to the audio of the presenter. The energy of the others' audio is less than the energy of the speaker's. Therefore, energy detection can be performed on the acquired speech portion, and the audio portion whose energy value is greater than or equal to the energy threshold is obtained from it.
  • S308: The server processes the audio portion through the simultaneous interpretation model to obtain initial text.
  • the server performs speech recognition on the audio portion obtained in step s306 by a speech recognition algorithm to obtain an initial text.
  • In the above embodiment, the server performs noise reduction on the obtained audio, which helps improve the accuracy of speech recognition.
  • Obtaining the speech portion of the noise-reduced audio avoids encoding and decoding the entire audio during speech recognition, thereby improving the computing efficiency of the server.
  • Obtaining the audio portion whose energy is greater than or equal to the energy threshold from the speech portion avoids interference from others' voices with the speaker's voice during speech recognition, and thus avoids obtaining text corresponding to a non-presenter's voice.
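The two-stage filtering described above (voice activity detection, then energy thresholding to isolate the presenter) can be sketched with per-frame energies. This is a simplified illustration, not the application's actual algorithm: real systems use spectral features and trained detectors rather than raw energy alone, and all names and threshold values here are assumptions.

```python
# Sketch: split samples into frames, drop low-energy frames (non-speech),
# then keep only frames at or above a higher energy threshold (presenter).

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def extract_presenter_audio(samples, frame_len=160,
                            vad_threshold=0.01, energy_threshold=0.1):
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    # Voice activity detection: discard frames below the VAD threshold.
    speech = [f for f in frames if frame_energy(f) >= vad_threshold]
    # Energy detection: keep only frames loud enough to be the presenter.
    loud = [f for f in speech if frame_energy(f) >= energy_threshold]
    return [s for f in loud for s in f]
```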
  • the simultaneous interpretation model includes a universal speech model and an auxiliary speech model; the initial text includes at least one of identifying text and identifying updated text.
  • The recognized text is obtained by performing speech recognition on the acquired audio through the universal speech model; the recognition-updated text is obtained by updating the recognized text through the auxiliary speech model.
  • The step of processing the audio through the simultaneous interpretation model to obtain the initial text includes: performing speech recognition on the audio through the universal speech model to obtain the recognized text; and updating the recognized text through the auxiliary speech model to obtain the recognition-updated text.
  • S210 can include updating the auxiliary speech model according to the initial text and the modified text.
  • the universal speech model is used for speech recognition of the acquired audio to obtain recognized text.
  • The auxiliary speech model is used to update the recognized text. For example, after the server updates the auxiliary speech model according to the initial text and the modified text, when the auxiliary speech model detects that the recognized text contains an error for which corresponding modified text exists, the server updates the erroneous recognized text through the auxiliary speech model, that is, replaces the erroneous recognized text with the modified text; otherwise, the server does not update the recognized text.
  • In one embodiment, the method further comprises: the server inputs newly acquired audio into the universal speech model, which recognizes the input audio as the corresponding recognized text.
  • The server then inputs the recognized text into the auxiliary speech model, which detects whether the recognized text includes content corresponding to the modified text; if so, the corresponding content is updated to the modified text.
  • The server updates the auxiliary speech model according to the initial text and the modified text so that subsequent text is updated through the updated auxiliary speech model; that is, if subsequent text contains content corresponding to the modified text, the corresponding content is replaced with the modified text, avoiding a recurrence of the pre-update error and thereby improving the accuracy of the text obtained in simultaneous interpretation.
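One simple way to realize the auxiliary-model behaviour described above is a learned substitution table mapping each erroneous recognized string to its modified text. This is an assumption made for illustration, not the application's actual model, and all names are hypothetical:

```python
# Sketch of an auxiliary corrector: remember each (erroneous recognized
# text -> modified text) pair and replace any recurrence of the error in
# later recognized text.

class AuxiliaryCorrector:
    def __init__(self):
        self.corrections = {}  # erroneous string -> user-modified string

    def learn(self, initial: str, modified: str) -> None:
        """Record one (initial text, modified text) pair fed back by a terminal."""
        self.corrections[initial] = modified

    def apply(self, recognized: str) -> str:
        """Replace every learned error occurring in newly recognized text."""
        for wrong, right in self.corrections.items():
            recognized = recognized.replace(wrong, right)
        return recognized
```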
  • the simultaneous interpretation model includes a translation model; the initial text includes translated text; the modified text includes modified translated text; and S210 can include: updating the translation model based on the translated text and the modified translated text.
  • In one embodiment, the method further comprises: the server inputs the recognized text or the recognition-updated text into the translation model; when the translation model detects that the recognized text or the recognition-updated text contains content corresponding to the modified translated text, the corresponding content is updated to the modified translated text.
  • the translation model may include a general translation model and an auxiliary translation model; the step of updating the translation model according to the translated text and the modified translated text may include: updating the auxiliary translation model according to the translated text and the modified translated text.
  • the server inputs the recognized text or the recognized updated text into the universal translation model, and translates the recognized text or the recognized updated text into the translated text through the universal translation model.
  • the server inputs the translated text into the auxiliary translation model, which detects whether the translated text contains content matching the modified translated text; if it does, the matched content is updated to the modified translated text to obtain the final translated text.
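The two-stage flow just described, a universal translation model producing a first pass and an auxiliary table post-editing it, can be sketched as follows. Both `universal_translate` and the correction table are toy stand-ins, not real model APIs.

```python
def universal_translate(text: str) -> str:
    # Toy stand-in for the universal translation model.
    demo = {"你好": "hello there"}
    return demo.get(text, text)

# Auxiliary translation corrections learned from (translated, modified translated) pairs.
auxiliary_corrections = {"hello there": "hi"}

def translate(text: str) -> str:
    # First pass through the universal model, then auxiliary post-editing.
    translated = universal_translate(text)
    for old, new in auxiliary_corrections.items():
        if old in translated:
            translated = translated.replace(old, new)
    return translated

print(translate("你好"))  # hi
```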
  • the server updates the translation model according to the translated text and the modified translated text, so that subsequent text is translated through the updated translation model; errors occurring before the update are thus avoided, thereby improving the accuracy of the text obtained in simultaneous interpretation.
  • the method further includes:
  • the server receives the video matched by the audio sent by the simultaneous interpretation device.
  • the video can be the speaker's video or the speaker's PPT (Power Point, presentation).
  • the simultaneous interpretation device captures a video that matches the acquired audio and sends the captured video to the server.
  • the server receives the video captured by the simultaneous interpretation device.
  • the server embeds the initial text into the video.
  • the server may embed the initial text into the video according to the appearance time, in the video, of the text obtained by processing the audio.
  • the appearance time refers to the time at which the text appears in the video, in the form of subtitles, when the user terminal plays the video.
  • the server can embed the initial text in the bottom, middle, or top of the video.
  • the server can also set the number of lines in which the initial text is embedded in the video, such as greater than or equal to two lines.
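One concrete way to attach text to a video at an appearance time is a subtitle cue; the sketch below uses the SRT timestamp style purely as an illustration (the patent does not specify a subtitle format), and both function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    # Convert seconds to an SRT-style HH:MM:SS,mmm timestamp.
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def make_cue(index: int, start: float, end: float, text: str) -> str:
    # One subtitle cue: the text appears at `start` and disappears at `end`.
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(make_cue(1, 3.5, 6.0, "Welcome to the conference"))
```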
  • the server sends the video embedded with the initial text to the user terminal.
  • the server transmits the video of the embedded text to the user terminal through a connection channel established with the user terminal.
  • the connection channel may be a TCP connection channel or a UDP connection channel.
  • the server embeds the text obtained in the simultaneous interpretation process into the video and transmits the video with the embedded text to the user terminal.
  • on the one hand, the combination of text and video helps the viewer understand the text; on the other hand, in addition to viewing the text of the simultaneous interpretation, viewers can also watch the video content, enriching the content displayed by the user terminal.
  • the audio acquired by the server corresponds to the group identification; as an example, the group identification refers to the conference number.
  • S206 can include:
  • the server sends the initial text to the user terminal accessed through the conference number.
  • the conference number refers to the number in the simultaneous interpretation conference.
  • multiple simultaneous interpretation conferences can be supported at the same time, and different simultaneous interpretation conferences have different conference numbers.
  • after the user terminal scans the two-dimensional code or barcode in the conference room, the server establishes a communication connection with the user terminal and sends the simultaneous interpretation list to the user terminal, so that the viewer holding the user terminal can select a conference number in the simultaneous interpretation list and enter the corresponding simultaneous interpretation conference.
  • the server receives an access request that the user terminal carries the conference number and the user identifier, and determines, according to the user identifier, whether the user has the right to access the simultaneous interpretation conference corresponding to the conference number.
  • the server allows access by the user terminal if the user has the right to access the simultaneous interpretation conference corresponding to the conference number; if the user has no such right, the server rejects the access of the user terminal.
  • the user identifier may be a mobile phone number or a social account.
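The access check above can be sketched as a lookup keyed by conference number and user identifier. The in-memory permission set and the identifiers are illustrative assumptions, not the server's real storage.

```python
# Hypothetical permission store: (conference number, user identifier) pairs.
permissions = {("888001", "user_a"), ("888001", "user_b")}

def handle_access(conference_no: str, user_id: str) -> bool:
    # Allow access only when the user has rights to this simultaneous
    # interpretation conference; otherwise reject the access request.
    return (conference_no, user_id) in permissions

print(handle_access("888001", "user_a"))  # True
print(handle_access("888002", "user_a"))  # False
```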
  • the server receives the comment information fed back by the user terminal.
  • the comment information refers to the comments initiated by the viewer during the viewing of the speaker's speech content.
  • the speech includes text processed by the audio and corresponding translated text.
  • the server synchronizes the comment information between the user terminals accessed through the conference number.
  • the server determines the connected user terminals according to the conference number and synchronizes the received comment information to all the determined user terminals, instructing each user terminal to display the received comment information in the form of a barrage on the simultaneous interpretation auxiliary page.
  • the server determines the user terminal that receives the comment information according to the conference number.
  • on the one hand, the viewer can initiate a comment through the user terminal, improving the interaction between the user and the simultaneous interpretation system; on the other hand, synchronizing by conference number effectively avoids sending comment information to user terminals of other simultaneous interpretation conferences.
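The conference-scoped synchronization can be sketched with a registry of terminals per conference number. The registry and the recorded "sends" are stand-ins for the server's real connection state and network pushes.

```python
from collections import defaultdict

# Hypothetical registry: conference number -> ids of accessed user terminals.
terminals_by_conference = defaultdict(list)

def join(conference_no, terminal_id):
    terminals_by_conference[conference_no].append(terminal_id)

def synchronize_comment(conference_no, comment):
    # Forward the comment only to terminals that joined through the
    # same conference number; other conferences never see it.
    sent = []
    for terminal in terminals_by_conference[conference_no]:
        sent.append((terminal, comment))  # stand-in for a network push
    return sent

join("888001", "t1"); join("888001", "t2"); join("888002", "t3")
print(synchronize_comment("888001", "great talk"))
# [('t1', 'great talk'), ('t2', 'great talk')]  -- t3 is in another conference
```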
  • the audio acquired by the server corresponds to the group identification; as an example, the group identification refers to the conference number.
  • S206 can include: sending the initial text to the user terminal accessed by the conference number; as shown in FIG. 6, the method further includes:
  • the server stores the initial text corresponding to the conference number.
  • when the server starts processing the audio of a certain speaker to obtain the corresponding text, a document in the target format is created, the initial text is added to the document, a mapping relationship between the document and the conference number is established, and the document and conference number with the established mapping relationship are stored.
  • when the server processes newly acquired audio to obtain the corresponding text, that text is appended directly to the created document.
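A minimal sketch of this storage scheme, assuming the "document" is simply an ordered list of text segments keyed by conference number; the position of a segment also serves as the sorting sequence number used later when adopting modified text.

```python
documents = {}  # conference number -> list of text segments (the "document")

def store_text(conference_no, text):
    doc = documents.setdefault(conference_no, [])  # create document on first use
    doc.append(text)                               # append newly obtained text
    return len(doc) - 1  # position doubles as a sorting sequence number

def update_text(conference_no, seq_no, modified_text):
    # Replace the stored segment once modified text is adopted.
    documents[conference_no][seq_no] = modified_text

seq = store_text("888001", "helo world")
store_text("888001", "second sentence")
update_text("888001", seq, "hello world")
print(documents["888001"])  # ['hello world', 'second sentence']
```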
  • the server updates the text stored corresponding to the conference number to the modified text.
  • when receiving a synchronization request sent by a user terminal accessed through the conference number, the server feeds back the updated text corresponding to the conference number to the user terminal that initiated the synchronization request.
  • the server updates the stored text according to the modified text, thereby ensuring that the text with the original error can be corrected in time.
  • when the server receives a synchronization request from the user terminal, the updated text can be sent to the user terminal, ensuring that the text obtained by the user terminal is the updated, correct text and improving the accuracy of the text.
  • the method further includes:
  • the server collects the number of times of text modification corresponding to each user terminal identifier.
  • the number of times of text modification refers to the number of times the user carrying the user terminal modifies the viewed text.
  • the text viewed may be text obtained by the server processing different audios.
  • the viewed text includes identifying text, identifying updated text, and translating text.
  • the user terminal identifier is used to indicate the user who carries the user terminal.
  • the server determines the number of modified texts belonging to the same user terminal identifier according to the received modified text and the corresponding user terminal identifier, and the number is used as the number of text modification times corresponding to the same user terminal identifier.
  • the server detects a text modification correct rate corresponding to each user terminal identifier.
  • the text modification correct rate refers to the rate at which the user terminal corresponding to the user terminal identifier modifies the corresponding text correctly, that is, the correct rate of the modified text obtained within a preset time.
  • the server detects the modified text corresponding to the user terminal identifier, determines whether the modified text is correct, and calculates the correct rate, thereby obtaining the text modification correct rate corresponding to the user terminal identifier.
  • the weight refers to the modification weight of each user terminal for modifying the received text.
  • Different levels of user terminal identifiers may have different weights.
  • the user terminal ID of the common user level has a corresponding weight.
  • the user terminal identifier with the administrator authority user level has a corresponding weight.
  • the server adjusts the weight corresponding to the user terminal identification according to the number of text modification times and the text modification accuracy rate.
  • the adjusting the weight corresponding to the identifier of the user terminal includes: when the number of times of text modification is less than the threshold of the number of modification times, and the correct rate of the text modification is less than the threshold of the text modification correctness, the weight corresponding to the identifier of the user terminal is lowered.
  • when the number of text modifications reaches the modification-count threshold and the text modification correct rate reaches the correct-rate threshold, the weight corresponding to the identifier of the user terminal is increased.
  • when the server determines that the number of text modifications and the text modification correct rate reach a preset condition, the weight corresponding to the user terminal identifier is increased, giving greater weight to users who contribute more to text modification, which helps improve the accuracy of the text.
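The weight-adjustment rule can be sketched as below. The thresholds and step size are assumptions for illustration; the patent only specifies the lowering and raising conditions, not the magnitudes.

```python
COUNT_THRESHOLD = 10          # assumed modification-count threshold
CORRECT_RATE_THRESHOLD = 0.8  # assumed correct-rate threshold
STEP = 0.1                    # assumed adjustment step

def adjust_weight(weight, modification_count, correct_rate):
    # Lower the weight when both metrics fall below their thresholds,
    # raise it when both reach them, otherwise leave it unchanged.
    if modification_count < COUNT_THRESHOLD and correct_rate < CORRECT_RATE_THRESHOLD:
        return max(0.0, weight - STEP)
    if modification_count >= COUNT_THRESHOLD and correct_rate >= CORRECT_RATE_THRESHOLD:
        return weight + STEP
    return weight

print(adjust_weight(1.0, 20, 0.95))  # 1.1
print(adjust_weight(1.0, 3, 0.5))    # 0.9
```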
  • S210 may include: determining a weighted cumulative value of the modified text according to a weight corresponding to each user terminal identifier; and updating the simultaneous interpretation model according to the initial text and the modified text when the weighted cumulative value reaches the threshold.
  • the weighted cumulative value is obtained by accumulating or multiply-accumulating the weights.
  • accumulating refers to adding each weight.
  • multiply-accumulating refers to: when a user terminal makes multiple modifications to the same text, the number of modifications is multiplied by the corresponding weight, and the result is then added to the other weights.
  • for example, suppose user terminal A, user terminal B, and user terminal C modify a certain text, the weights corresponding to their user terminal identifiers are respectively q1, q2, and q3, and user terminal A modifies the text twice; the weighted cumulative value is then 2*q1 + q2 + q3.
  • when receiving the modified text sent by a user terminal, the server determines the level corresponding to that user terminal's identifier and obtains the corresponding weight according to the determined level; the server then accumulates or multiply-accumulates the obtained weights and takes the result as the weighted cumulative value of the modified text.
  • when the server receives a plurality of modified texts sent by a certain user terminal, and the plurality of modified texts are modifications of the same text, the server uses the last modified text as that user terminal's final version.
  • in the above data processing method based on simultaneous interpretation, the weighted cumulative value of the modified text fed back by the user terminals is calculated, and when the weighted cumulative value reaches the threshold, the simultaneous interpretation model is updated according to the initial text and the modified text, and subsequent audio is processed through the updated simultaneous interpretation model, improving the accuracy of the text obtained by processing the audio.
  • because the model is updated only when the weighted cumulative value reaches the threshold, invalid modifications are effectively prevented from affecting the simultaneous interpretation model, further ensuring the accuracy of the text obtained by processing the audio.
  • a data processing method based on simultaneous interpretation is provided. This embodiment is mainly illustrated by the method being applied to the user terminal 110 in FIG. 1 described above.
  • the data processing method based on simultaneous interpretation includes the following steps:
  • the user terminal displays the simultaneous interpretation auxiliary page.
  • the simultaneous interpretation auxiliary page can be used to display text or to display a video embedded with text.
  • the simultaneous interpretation auxiliary page can also display a simultaneous interpretation list.
  • the user terminal scans the barcode or the two-dimensional code in the simultaneous interpretation conference through the social application, and enters the applet in the social application according to the link address in the barcode or the two-dimensional code.
  • the user terminal displays the simultaneous interpretation auxiliary page in the applet, and displays the simultaneous interpretation list in the simultaneous interpretation auxiliary page, and the simultaneous interpretation list includes simultaneous interpretation conferences with different conference numbers. According to the input selection instruction, the corresponding simultaneous interpretation conference in the simultaneous interpretation list is entered.
  • the step of displaying the simultaneous interpretation list in the simultaneous interpretation auxiliary page may include: the user terminal sends an acquisition request carrying the mobile phone number or the social account to the server, and receives the simultaneous interpretation list, for which the user has access rights, sent by the server.
  • Figure 9 shows a schematic diagram of entering and displaying a simultaneous interpretation assistance page.
  • when entering the simultaneous interpretation auxiliary page for the first time, the user terminal first displays the simultaneous interpretation list and then displays the corresponding simultaneous interpretation conference according to the selection instruction; when the page is not being entered for the first time, the user terminal enters the simultaneous interpretation conference directly.
  • the user terminal receives the initial text sent by the server; the initial text is obtained by the server processing the audio sent by the simultaneous interpretation device by using the simultaneous interpretation model.
  • the user terminal displays the initial text in the simultaneous interpretation auxiliary page.
  • when the text is displayed in the simultaneous interpretation auxiliary page, the user terminal synthesizes speech in the corresponding language according to the displayed text and broadcasts the speech.
  • the text shown in the Simultaneous Interpretation Assistant page is also shown in FIG.
  • the user terminal can switch between different languages to selectively display text, and can also synthesize speech from the text using different timbres.
  • the user terminal acquires the modified text corresponding to the initial text.
  • the user terminal detects the input modification instruction for the initial text in real time, and obtains the modified text corresponding to the initial text according to the modification instruction.
  • S810 The user terminal sends the modified text to the server; and the modified text is used to instruct the server to update the simultaneous interpretation model according to the initial text and the modified text.
  • the user terminal also sends the local user terminal identifier to the server.
  • the modified text is used to instruct the server to determine the weighted cumulative value of the modified text according to the weight corresponding to the user terminal identifier;
  • the simultaneous interpretation model is updated based on the initial text and the modified text.
  • the user terminal displays the text obtained by the server processing the audio on the simultaneous interpretation auxiliary page, and when a modification instruction is detected, the user terminal obtains the corresponding modified text, so that when an error occurs in the text obtained by processing the audio, the user terminal can make the corresponding modification.
  • the user terminal also synchronizes the obtained modified text to the server, instructing the server to update the simultaneous interpretation model according to the text obtained by processing the audio and the modified text when the weighted cumulative value of the modified text reaches the threshold, thereby improving the accuracy of the text.
  • the display simultaneous interpretation assistance page includes:
  • S1002 The user terminal acquires the sub-application identifier by using the parent application.
  • the parent application is an application that hosts the child application and provides an environment for running the child application.
  • the parent application is a native application that runs directly on the operating system.
  • the parent application can include a social application or a live-streaming application.
  • a sub-application is an application that can run in the environment provided by the parent application. As an example, the sub-application can be a simultaneous interpretation applet.
  • the user terminal may display a sub-application list through the parent application, receive a selection instruction for an option in the sub-application list, and determine the selected option in the sub-application list according to the selection instruction, thereby obtaining the sub-application identifier corresponding to the selected option.
  • S1004 The user terminal acquires a corresponding simultaneous interpretation auxiliary page configuration file according to the sub-application identifier.
  • the user terminal can obtain the simultaneous interpretation auxiliary page configuration file corresponding to the sub-application identifier, from the local device or from the server, through the parent application. Further, the user terminal may determine the corresponding folder, locally or on the server, according to the sub-application identifier, and then obtain the simultaneous interpretation auxiliary page configuration file from the folder. Alternatively, the user terminal may obtain the simultaneous interpretation auxiliary page configuration file corresponding to the sub-application identifier according to the correspondence between the sub-application identifier and a page identifier.
  • the page identifier is used to uniquely identify a simultaneous interpretation auxiliary page included in a sub-application, and different sub-applications may use the same page identifier.
  • the simultaneous interpretation auxiliary page configuration file is a file for configuring a page presented by the sub-application.
  • the configuration file can be source code or a file obtained by compiling the source code.
  • the page rendered by the sub-application is referred to as a simultaneous interpretation auxiliary page, and the sub-application may include one or more simultaneous interpretation auxiliary pages.
  • S1006 The user terminal obtains a common component identifier from the simultaneous interpretation auxiliary page configuration file.
  • the user terminal can parse the simultaneous interpretation auxiliary page configuration file to obtain the common component identification from the simultaneous interpretation auxiliary page configuration file.
  • the common component identifier is used to uniquely identify the corresponding common component.
  • a public component is a component provided by a parent application that can be shared by different child applications.
  • the public component has a visual form and is a component of the simultaneous interpretation auxiliary page.
  • the common component can also be packaged with logic code that is used to handle trigger events for the common component.
  • different sub-applications can share common components; that is, they may call the same common component at the same time or at different times.
  • the common component can also be shared by the parent application and the child application.
  • S1008 The user terminal selects a common component corresponding to the common component identifier in a common component library provided by the parent application.
  • the common component library is a collection of common components provided by the parent application. Each common component in the common component library has a unique common component identity.
  • the common component library can be downloaded from the server to the local by the parent application at runtime, or can be decompressed from the corresponding application installation package when the parent application is installed.
  • the user terminal constructs a simultaneous interpretation auxiliary page according to the selected common component.
  • the user terminal can obtain the default component style data corresponding to the selected common component, organize the selected common component according to the default component style data, and render it to form the simultaneous interpretation auxiliary page.
  • the default component style data is data that describes the default display form of the public component.
  • the default component style data can include attributes such as location, size, color, font, and font size of the common component by default in the simultaneous interpretation helper page.
  • the user terminal can construct a simultaneous interpretation auxiliary page through the browser control integrated by the parent application and according to the selected common component.
  • the user terminal runs the parent application, which provides the common component library; the sub-application identifier can be obtained through the parent application, from which the corresponding simultaneous interpretation auxiliary page configuration file is obtained, so that common components are selected from the common component library according to the configuration file to construct the simultaneous interpretation auxiliary page.
  • the sub-application identifier can identify different sub-applications, and the parent application can run different sub-applications according to the simultaneous interpretation auxiliary page configuration files corresponding to different sub-application identifiers.
  • the common components provided by the parent application can be used to quickly construct the simultaneous interpretation auxiliary page, which shortens application installation time and improves application efficiency.
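The page-construction flow above can be sketched as resolving component identifiers from the configuration file against the parent application's component library. The component names and string placeholders are illustrative, not the real framework API.

```python
# Hypothetical common component library provided by the parent application:
# common component identifier -> renderable component (placeholder strings here).
common_component_library = {
    "text_panel": "<TextPanel>",
    "language_switch": "<LanguageSwitch>",
    "comment_bar": "<CommentBar>",
}

def build_page(page_config):
    # page_config: ordered list of common component identifiers taken from the
    # simultaneous interpretation auxiliary page configuration file.
    return [common_component_library[cid] for cid in page_config]

print(build_page(["text_panel", "language_switch"]))
# ['<TextPanel>', '<LanguageSwitch>']
```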
  • the method further includes:
  • the user terminal receives the modified text synchronized by the server and the corresponding sorting sequence number; the received modified text and the corresponding text before the modification share the sorting sequence number.
  • the sort number is used to indicate the position of a certain text in the document, or to indicate the storage location of a certain text in a certain storage area.
  • a document is a text file used to save and edit text, including TEXT documents, WORD documents, and XML documents.
  • when the server determines that the weighted cumulative value reaches the threshold, it synchronizes the modified text and the corresponding sorting sequence number to the user terminal.
  • the user terminal executes S1104 after receiving the modified text and the corresponding sorting sequence number.
  • S1104 The user terminal locally searches for text corresponding to the received sorting sequence number.
  • the user terminal looks up the text in the storage area corresponding to the received sorting number.
  • the user terminal can also locate, according to the conference number, the document in which the text is saved, and then search that document for the corresponding text according to the received sorting sequence number.
  • S1106 The user terminal replaces the locally found text with the received modified text.
  • the user terminal searches for the corresponding text according to the received sorting sequence number and replaces the found text with the received modified text, ensuring that when a text error occurs, every user terminal can be updated simultaneously, improving the accuracy of the text obtained.
  • a data processing method based on simultaneous interpretation is provided. This embodiment is mainly illustrated by the method being applied to the server 120 in FIG. 1 described above. Referring to FIG. 12, the data processing method based on simultaneous interpretation includes the following steps:
  • the server acquires audio.
  • S1204 The server performs noise reduction processing on the acquired audio.
  • the server obtains a voice part in the audio after the noise reduction process.
  • the server obtains, from the voice part, an audio part whose energy value is greater than or equal to the energy threshold.
  • the server processes the audio portion through a simultaneous interpretation model to obtain initial text.
  • the server sends the initial text to the user terminal.
  • the server receives a video that matches the acquired audio.
  • the server embeds the initial text into the video.
  • the server sends the video embedded with the initial text to the user terminal.
  • the server may also send the initial text to the user terminal accessed via the conference number.
  • S1220 The server receives the modified text fed back by the user terminal, where the modified text is obtained by modifying the initial text by the user terminal.
  • S1222 The server determines a weighted cumulative value of the modified text according to a weight corresponding to the identifier of the user terminal.
  • the server updates the simultaneous interpretation model according to the initial text and the modified text.
  • the server receives the comment information fed back by the user terminal.
  • S1228 The server synchronizes the comment information between the user terminals accessed through the conference number.
  • the server stores the initial text corresponding to the conference number.
  • another data processing method based on simultaneous interpretation is provided. This embodiment is mainly illustrated by the method being applied to the user terminal 110 in FIG. 1 described above.
  • the data processing method based on simultaneous interpretation includes the following steps:
  • the user terminal displays the simultaneous interpretation auxiliary page.
  • S1304 The user terminal receives the initial text sent by the server; the text is obtained by the server processing the audio sent by the simultaneous interpretation device by using the simultaneous interpretation model.
  • the user terminal displays the initial text in the simultaneous interpretation auxiliary page.
  • the user terminal acquires the modified text corresponding to the initial text.
  • the user terminal sends the local user terminal identifier and the modified text to the server, and the modified text is used to instruct the server to determine the weighted cumulative value of the modified text according to the weight corresponding to the user terminal identifier; when the weighted cumulative value reaches the threshold, according to The initial text and the modified text update the simultaneous interpretation model.
  • S1312 The user terminal receives the modified text synchronized by the server and the corresponding sorting sequence number; the received modified text and the corresponding text before the modification share the sorting sequence number.
  • S1314 The user terminal locally searches for text corresponding to the received sorting sequence number.
  • the user terminal replaces the locally found text with the received modified text.
  • the simultaneous interpretation device collects audio, performs the corresponding processing, and uploads the processed audio to the voice server for voice recognition; after recognition, the voice server sends the recognized text to the translation server, the translation server translates the recognized text into the target language and returns the translated text to the simultaneous interpretation client, and finally the simultaneous interpretation device presents the returned result on the display screen.
  • a typical simultaneous interpretation system is shown in Figure 14.
  • in the simultaneous interpretation system of the related art, the following two ways of displaying text are mainly used: the first is split-screen display, in which the presenter's image or the PPT occupies part of the screen and the simultaneous interpretation text occupies another part of the screen.
  • the second is subtitle display, in which the speaker's image or PPT fills the screen and the simultaneous interpretation text is displayed as subtitles at the bottom of the screen.
  • the above two display methods have the following problems: 1) Poor visibility: in conferences with a large number of participants, viewers seated far back or with a poor viewing angle cannot see the text displayed on the conference screen, and viewers who are unable to attend the meeting for any reason cannot obtain the conference content at all. 2) No interaction: the audience can only passively receive the simultaneous interpretation results. 3) The simultaneous interpretation model cannot be optimized: viewers cannot modify the recognized text and/or translated text in real time, so the speech model and translation model used in simultaneous interpretation cannot be optimized.
  • the simultaneous interpretation system includes a server, a simultaneous interpretation device, a microphone, a user terminal, and a display screen.
  • the foregoing server may be composed of a server cluster, and may include, for example, a voice server and a translation server.
  • yet another data processing method based on simultaneous interpretation includes the following steps:
  • the microphone outputs the collected audio to the simultaneous interpretation device.
  • the simultaneous interpretation device performs noise reduction, gain, and voice activity detection processing on the received audio.
  • the simultaneous interpretation device performs noise reduction, gain and voice activity detection processing on the audio collected by the microphone through a front-end processing algorithm.
  • the front-end processing algorithm may adopt the "DNN (Deep Neural Network) + energy" dual detection method.
  • DNN can be used to suppress noise.
  • Energy detection can be used to filter out portions of the audio that have energy less than a threshold.
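The energy half of the "DNN + energy" dual detection can be sketched as frame-level filtering: frames whose energy falls below a threshold are discarded. The frame length and threshold are illustrative choices, not values from the patent.

```python
def frame_energy(samples):
    # Mean squared amplitude of one frame.
    return sum(s * s for s in samples) / max(len(samples), 1)

def keep_voiced_frames(audio, frame_len=4, threshold=0.01):
    # Split the audio into frames and keep only those whose energy
    # reaches the threshold, filtering out low-energy (silent) portions.
    frames = [audio[i:i + frame_len] for i in range(0, len(audio), frame_len)]
    return [f for f in frames if frame_energy(f) >= threshold]

audio = [0.0, 0.0, 0.0, 0.0, 0.5, -0.4, 0.3, -0.2]  # silence then speech-like
print(keep_voiced_frames(audio))  # [[0.5, -0.4, 0.3, -0.2]]
```

In the patent's design this energy gate runs alongside a DNN that suppresses noise; the sketch shows only the thresholding step.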
  • the simultaneous interpretation device sends the audio to the voice server.
  • the simultaneous interpretation device sends the received video to the voice server.
  • in addition to collecting voice as an input source, a video is also acquired as an input source.
  • the video can be either the speaker's PPT or the speaker's own video.
  • the simultaneous interpretation client uniquely identifies the simultaneous interpretation conference and the corresponding speech content (including the recognized text and the translated text) by uploading fields such as "meeting number".
  • the voice server identifies the audio through the universal voice model, obtains the recognized text, and detects and updates the recognized text through the auxiliary voice model to obtain the updated recognized text.
  • the voice server sends the recognized text to the translation server.
  • the translation server translates the received recognized text to obtain translated text of the target language.
  • the translation server sends the translated text to the voice server.
  • the voice server combines the recognized text and the translated text, and sends the combined text to the simultaneous interpretation device.
  • the voice server combines the recognized text, the translated text, and the video, and sends the combined text and video to the user terminal.
  • the voice server pushes the merged text and video to all activated user terminals.
  • the simultaneous interpretation device sends the combined text and video to the display.
  • the simultaneous interpretation device sends the recognition text, translated text and video to the display screen of the simultaneous interpretation conference for display.
  • S1624 The user terminal modifies the recognized text, and sends the obtained modified text to the voice server.
  • the user can scan a QR code through a social application, or click a corresponding link, to enter a webpage or applet; the user terminal obtains, by the user's mobile phone number or WeChat ID, the list of simultaneous interpretation sessions the user has access rights to, and the user clicks an entry to go to the simultaneous interpretation assistance page.
  • after entering the simultaneous interpretation assistance page, the user terminal is activated.
  • the simultaneous interpretation assistance page of the user terminal displays the text currently being spoken by default.
  • the user terminal can also switch between texts in different languages, synthesize speech in the corresponding voice according to the displayed text, and play it back.
  • a one-tap save function key is provided in the simultaneous interpretation assistance page.
  • the user terminal saves the received identification text and the translated text to form a simultaneous interpretation full text.
  • the user can modify the recognized text and the translated text at the user terminal, and the modified text can be uploaded to the server.
  • the voice server updates the auxiliary voice model according to the recognized text and the modified text.
  • S1628 The user terminal modifies the translated text, and sends the obtained modified text to the translation server through the voice server.
  • the translation server updates the translation model according to the translated text and the modified text.
  • the speech model includes a universal language model and an auxiliary language model.
  • the universal language model is loaded once when the program starts running.
  • the auxiliary language model is updated and hot-reloaded, enabling seamless switching throughout the process. It should be noted that the auxiliary speech model can be hot-loaded multiple times while the program runs: each time the auxiliary speech model is updated, it is hot-loaded once.
  • hot loading refers to reloading classes at runtime based on bytecode changes, without restarting the service (e.g. Tomcat) or repackaging, so the updated model takes effect without interrupting the running program.
  • the server inputs the audio symbol sequence of the audio into a common language model for speech recognition to obtain the recognized text.
  • the recognized text is then entered into the auxiliary language model, and the previously erroneous text is replaced with the modified text by the auxiliary speech model.
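The correction-and-hot-reload flow above can be sketched as a replacement table that is swapped in atomically while recognition keeps running. The real auxiliary model is statistical; a plain phrase-substitution table and the class name below are illustrative simplifications, not the specification's implementation.

```python
import threading

class AuxiliaryLanguageModel:
    """Sketch of the auxiliary-model role: a table of
    (erroneous phrase -> corrected phrase) pairs that can be
    hot-reloaded without stopping the recognition pipeline."""

    def __init__(self):
        self._lock = threading.Lock()
        self._corrections = {}

    def hot_reload(self, corrections):
        # Swap in the updated table atomically; requests already in
        # flight keep using the old table until the swap completes.
        with self._lock:
            self._corrections = dict(corrections)

    def update_text(self, recognized):
        # Replace previously erroneous phrases with their corrections.
        with self._lock:
            table = dict(self._corrections)
        for wrong, right in table.items():
            recognized = recognized.replace(wrong, right)
        return recognized

aux = AuxiliaryLanguageModel()
aux.hot_reload({"neural nets": "neural networks"})
print(aux.update_text("deep neural nets"))  # deep neural networks
```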
  • the server performs plausibility checking on the modified text, and only modified text that passes the check is used to update the speech model and/or the translation model. For example, if an erroneous translation is found and multiple people modify it, the server determines the weighted cumulative value of the modified text according to the weight of each user carrying a user terminal; when the weighted cumulative value reaches the threshold, the server optimizes the translation model.
  • the server determines the contribution degree of a user's modifications according to the number of text modifications and the text modification accuracy, and adaptively adjusts the corresponding weight.
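The adaptive weight adjustment described above can be sketched as follows. The thresholds, step size, and function name are illustrative assumptions, not values given in the specification.

```python
def adjust_weight(weight, n_mods, accuracy,
                  n_threshold=10, acc_threshold=0.8, step=0.1):
    """Raise the modification weight of a user whose edits are both
    frequent and mostly correct; lower it when both contribution
    signals fall short; otherwise leave it unchanged.

    All numeric defaults are illustrative assumptions.
    """
    if n_mods >= n_threshold and accuracy >= acc_threshold:
        return weight + step          # high contribution: increase weight
    if n_mods < n_threshold and accuracy < acc_threshold:
        return max(0.0, weight - step)  # low contribution: decrease weight
    return weight                     # mixed signals: keep weight as-is

print(adjust_weight(1.0, 12, 0.9))  # 1.1
```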
  • the viewer comments on the speaker or the content of the speech through the user terminal.
  • the user terminal sends the comment information to the server, transfers to the conference display screen and each activated user terminal through the server, and the comment information is displayed on the display screen and the user terminal in the form of a barrage.
  • the user can enter the simultaneous interpretation auxiliary page through the user terminal, and the viewer can comment and submit the speaker or the speech content, and the submitted comments will be sent by the server to the conference display screen and each user terminal.
  • the simultaneous interpretation model (including the speech model and the translation model) can be updated in real time. The recognized text and/or the translated text can be modified through the user terminal; if a large number of users modify a given text or word, or a user with administrator authority modifies it, the server updates the speech model and/or the translation model, and the updated models are used for subsequent speech recognition and translation to avoid recurring errors.
  • the target language can be switched at any time.
  • the user can set the language of the translation and select the voice corresponding to the personalized voice synthesis.
  • FIGS. 2 and 8 are schematic flowcharts of a data processing method based on simultaneous interpretation in an embodiment. It should be understood that although the various steps in the flowcharts of FIGS. 2 and 8 are displayed sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Except as explicitly stated herein, the execution order of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in FIGS. 2 and 8 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
  • a data processing apparatus based on simultaneous interpretation includes: an obtaining module 1702, a processing module 1704, a sending module 1706, a receiving module 1708, a determining module 1712, and an updating module 1710; wherein:
  • the obtaining module 1702 is configured to obtain audio sent by the simultaneous interpretation device
  • the processing module 1704 is configured to process audio by using a simultaneous interpretation model to obtain an initial text.
  • a sending module 1706 configured to send initial text to a user terminal
  • the receiving module 1708 is configured to receive a modified text fed back by the user terminal, where the modified text is obtained by modifying the initial text by the user terminal;
  • the update module 1710 is configured to update the simultaneous interpretation model based on the initial text and the modified text.
  • the data processing device based on the simultaneous interpretation receives the modified text obtained by modifying the initial text fed back by the user terminal, so that when the initial text is modified, the corresponding feedback can be obtained in time.
  • the simultaneous interpretation model is updated according to the initial text and the modified text, and the subsequent audio is processed by the updated simultaneous interpretation model, thereby improving the accuracy of the text obtained by processing the audio.
  • the apparatus may further include: a determining module 1712;
  • the determining module 1712 is configured to determine, according to the weight corresponding to each user terminal identifier, a weighted cumulative value of the modified text.
  • the update module 1710 is further configured to update the simultaneous interpretation model according to the initial text and the modified text when the weighted cumulative value reaches the threshold.
  • the processing module 1704 is further configured to perform noise reduction processing on the audio; acquire the voice portion included in the noise-reduced audio; acquire, from the voice portion, the audio portion whose energy value is greater than or equal to the energy threshold; and process the audio portion through the simultaneous interpretation model to obtain the initial text.
  • the simultaneous interpretation model includes a universal speech model and an auxiliary speech model
  • the processing module 1704 is further configured to perform voice recognition on the audio through the universal voice model to obtain the recognized text; update the recognized text by using the auxiliary voice model to obtain the recognized updated text; wherein the initial text includes at least the recognized text and the recognized updated text.
  • the update module 1710 is further for updating the auxiliary speech model based on the initial text and the modified text.
  • the simultaneous interpretation model includes a translation model; the initial text includes translated text; and the modified text includes modified translated text;
  • the update module 1710 is further for updating the translation model based on the translated text and the modified translated text.
  • the apparatus further includes: an embedding module 1714; wherein
  • the receiving module 1708 is further configured to receive a video that is matched by the audio sent by the simultaneous interpretation device;
  • An embedding module 1714 configured to embed initial text into the video
  • the sending module 1706 is further configured to send the video that has been embedded with the initial text to the user terminal.
  • the apparatus further includes: a synchronization module 1716;
  • the audio corresponds to the group identifier
  • the sending module 1706 is further configured to send the initial text to the user terminal accessed through the group identifier;
  • the receiving module 1708 is further configured to receive the comment information fed back by the user terminal;
  • the synchronization module 1716 is configured to synchronize the comment information between the user terminals accessed through the group identity.
  • the device further includes: a storage module 1718 and a feedback module 1720; wherein the audio corresponds to the group identifier;
  • the sending module 1706 is further configured to send the initial text to the user terminal accessed through the group identifier;
  • a storage module 1718 configured to store initial text and a group identifier
  • the updating module 1710 is further configured to: when the weighted cumulative value of the modified text reaches the threshold, update the text stored corresponding to the group identifier to the modified text;
  • the feedback module 1720 is configured to: when receiving the synchronization request sent by the user terminal accessed by the group identifier, feed back the updated text corresponding to the group identifier to the user terminal that initiates the synchronization request.
  • the device further includes: a statistics module 1722, a detection module 1724, and an adjustment module 1726; wherein
  • the statistics module 1722 is configured to count the number of text modification corresponding to each user terminal identifier
  • the detecting module 1724 is configured to detect a correct rate of text modification corresponding to each user terminal identifier
  • the adjustment module 1726 is configured to: if the number of times of text modification reaches the threshold of the modification number and the text modification correct rate reaches the threshold of the text modification correct rate, the weight corresponding to the identifier of the user terminal is increased.
  • a data processing apparatus based on simultaneous interpretation includes: a first display module 1902, a receiving module 1904, a second display module 1906, an acquisition module 1908, and a transmission module 1910; wherein:
  • a first display module 1902 configured to display a simultaneous interpretation auxiliary page
  • the receiving module 1904 is configured to receive initial text sent by the server; the initial text is obtained by the server processing the audio sent by the simultaneous interpretation device by using the simultaneous interpretation model;
  • a second display module 1906 configured to display initial text in the simultaneous interpretation auxiliary page
  • the obtaining module 1908 is configured to acquire, when the modification instruction is detected, the modified text corresponding to the initial text;
  • the sending module 1910 is configured to send the modified text to the server, and modify the text to instruct the server to update the simultaneous interpretation model according to the initial text and the modified text.
  • the above-mentioned data processing apparatus based on simultaneous interpretation displays, in the simultaneous interpretation assistance page, the initial text obtained by the server processing the audio, and obtains the corresponding modified text when a modification instruction is detected. Thus, when the text obtained by the server from processing the audio contains an error, the user terminal can make a corresponding modification and synchronize the modified text to the server, instructing the server to update the simultaneous interpretation model according to the initial text and the modified text, thereby improving the accuracy of the text obtained by processing the audio.
  • when the simultaneous interpretation assistance page is displayed, the first display module 1902 is further configured to obtain a sub-application identifier through the parent application; obtain the corresponding simultaneous interpretation assistance page configuration file according to the sub-application identifier; obtain common component identifiers from the simultaneous interpretation assistance page configuration file; select, from the common component library provided by the parent application, the common components corresponding to the common component identifiers; and construct the simultaneous interpretation assistance page from the selected common components.
  • the device further includes: a searching module 1912 and a replacement module 1914; wherein
  • the receiving module 1904 is further configured to receive the modified text synchronized by the server and the corresponding sorting sequence number; the received modified text shares the sorting sequence number with the corresponding pre-modification text;
  • the searching module 1912 is configured to locally search for text corresponding to the sorting sequence number
  • the replacement module 1914 is configured to replace the locally found text with the received modified text.
  • Figure 21 is a diagram showing the internal structure of a computer device in one embodiment.
  • the computer device can be the server 120 of FIG. 1.
  • the computer device includes the computer device including a processor, a memory, and a network interface connected by a system bus.
  • the memory comprises a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system, and can also store a computer program, which when executed by the processor, can cause the processor to implement a data processing method based on simultaneous interpretation.
  • the internal memory can also store a computer program that, when executed by the processor, causes the processor to perform a data processing method based on simultaneous interpretation.
  • FIG. 21 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. The computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • the simultaneous interpretation based data processing apparatus can be implemented in the form of a computer program executable on a computer device as shown in FIG. 21.
  • the program modules constituting the simultaneous interpretation-based data processing device may be stored in a memory of the computer device, such as the acquisition module 1702, the processing module 1704, the transmission module 1706, the receiving module 1708, the determination module 1712, and the update module 1710 shown in FIG. 17.
  • the computer program of the various program modules causes the processor to perform the steps in the simultaneous interpretation based data processing method of the various embodiments of the present application described in this specification.
  • the computer device shown in FIG. 21 can execute S202 by the acquisition module 1702 in the simultaneous interpretation-based data processing apparatus shown in FIG. 17.
  • the computer device can execute S204 through the processing module 1704.
  • the computer device can execute S206 via the transmitting module 1706.
  • the computer device can perform S208 via the receiving module 1708.
  • the computer device can execute S210 through the update module 1710.
  • a computer apparatus includes a memory and a processor, the memory storing a computer program that, when executed by the processor of the computer device, enables the processor to perform the aforementioned data processing method based on simultaneous interpretation executed by the server 120 of FIG. 1.
  • a computer readable storage medium stores a computer program that, when executed by a processor of the computer device, enables the processor to perform the aforementioned data processing method based on simultaneous interpretation executed by the server 120 of FIG. 1.
  • Figure 22 is a diagram showing the internal structure of a computer device in one embodiment.
  • the computer device can be the user terminal 110 of FIG. 1.
  • the computer device includes the computer device including a processor, a memory, a network interface, an input device, and a display screen connected by a system bus.
  • the memory comprises a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system, and can also store a computer program, which when executed by the processor, can cause the processor to implement a data processing method based on simultaneous interpretation.
  • the internal memory can also store a computer program that, when executed by the processor, causes the processor to perform a data processing method based on simultaneous interpretation.
  • the display screen of the computer device may be a liquid crystal display or an electronic ink display screen
  • the input device of the computer device may be a touch layer covered on the display screen, or a button, a trackball or a touchpad provided on the computer device casing, and It can be an external keyboard, trackpad or mouse.
  • FIG. 22 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. The computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • the simultaneous interpretation based data processing apparatus may be implemented in the form of a computer program executable on a computer device as shown in FIG. 22.
  • each program module constituting the simultaneous interpretation-based data processing device may be stored in a memory of the computer device, such as the first display module 1902, the receiving module 1904, the second display module 1906, the acquisition module 1908, and the transmission module 1910 shown in FIG. 19.
  • the computer program of the various program modules causes the processor to perform the steps in the simultaneous interpretation based data processing method of the various embodiments of the present application described in this specification.
  • the computer device shown in FIG. 22 can execute S802 through the first display module 1902 in the simultaneous interpretation based data processing apparatus shown in FIG. 19.
  • the computer device can execute S804 through the receiving module 1904.
  • the computer device can execute S806 via the second display module 1906.
  • the computer device can execute S808 via the acquisition module 1908.
  • the computer device can execute S810 through the transmitting module 1910.
  • a computer apparatus includes a memory and a processor, the memory storing a computer program that, when executed by the processor of the computer device, enables the processor to perform the aforementioned data processing method based on simultaneous interpretation executed by the user terminal 110 of FIG. 1.
  • a computer readable storage medium stores a computer program that, when executed by a processor of the computer device, enables the processor to perform the aforementioned data processing method based on simultaneous interpretation executed by the user terminal 110 of FIG. 1.
  • Non-volatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

The present application relates to a data processing method based on simultaneous interpretation, a computer device, and a storage medium. The method is applied to a server in a simultaneous interpretation system, the system further including a simultaneous interpretation device and a user terminal, and includes: acquiring audio sent by the simultaneous interpretation device; processing the audio through a simultaneous interpretation model to obtain initial text; sending the initial text to the user terminal; receiving modified text fed back by the user terminal, the modified text being obtained after the user terminal modifies the initial text; and updating the simultaneous interpretation model according to the initial text and the modified text. The solution provided by the present application can improve the accuracy of text obtained by processing audio.

Description

Data processing method based on simultaneous interpretation, computer device, and storage medium
This application claims priority to Chinese Patent Application No. 201810443090.X, entitled "Data processing method, apparatus and storage medium based on simultaneous interpretation", filed on May 10, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of simultaneous interpretation, and in particular to a data processing method based on simultaneous interpretation, a computer device, and a storage medium.
Background
SI (Simultaneous Interpretation), "simultaneous interpreting" for short, refers to a translation method in which the content of a speech is translated to the audience continuously without interrupting the speaker. At present, 95% of international conferences in the world use simultaneous interpretation.
In related-art simultaneous interpretation solutions, a commonly used method is: the simultaneous interpretation device collects the audio produced by the speaker and uploads the collected audio to the server. The server processes the received audio to obtain corresponding text and displays the text on the display screen of the simultaneous interpretation system.
However, in related-art simultaneous interpretation solutions, the text obtained by the server from processing the audio is likely to contain errors, which seriously affects the accuracy of the text content in simultaneous interpretation.
Summary
Embodiments of the present application provide a data processing method based on simultaneous interpretation, a computer device, and a storage medium, which can solve the related-art problem of low accuracy of text content in simultaneous interpretation.
An embodiment of the present application provides a data processing method based on simultaneous interpretation, applied to a server in a simultaneous interpretation system, the system further including a simultaneous interpretation device and a user terminal, the method including:
acquiring audio sent by the simultaneous interpretation device;
processing the audio through a simultaneous interpretation model to obtain initial text;
sending the initial text to the user terminal;
receiving modified text fed back by the user terminal, the modified text being obtained after the user terminal modifies the initial text;
updating the simultaneous interpretation model according to the initial text and the modified text.
An embodiment of the present application provides a data processing apparatus based on simultaneous interpretation, applied to a server in a simultaneous interpretation system, the system further including a simultaneous interpretation device and a user terminal, the apparatus including:
an obtaining module, configured to acquire audio sent by the simultaneous interpretation device;
a processing module, configured to process the audio through a simultaneous interpretation model to obtain initial text;
a sending module, configured to send the initial text to the user terminal;
a receiving module, configured to receive modified text fed back by the user terminal, the modified text being obtained after the user terminal modifies the initial text;
an updating module, configured to update the simultaneous interpretation model according to the initial text and the modified text.
An embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above data processing method based on simultaneous interpretation.
An embodiment of the present application provides a computer device including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above data processing method based on simultaneous interpretation.
With the above data processing method, apparatus, and storage medium based on simultaneous interpretation, by receiving the modified text obtained after the user terminal modifies the initial text, corresponding feedback can be obtained in time when the initial text obtained by processing audio is modified. In addition, the simultaneous interpretation model is updated according to the initial text and the modified text, and subsequent audio is processed through the updated simultaneous interpretation model, thereby improving the accuracy of the text obtained by processing audio.
An embodiment of the present application provides a data processing method based on simultaneous interpretation, applied to a user terminal in a simultaneous interpretation system, the system further including a simultaneous interpretation device and a server, the method including:
displaying a simultaneous interpretation assistance page;
receiving initial text sent by the server, the initial text being obtained by the server processing, through a simultaneous interpretation model, the audio sent by the simultaneous interpretation device;
displaying the initial text in the simultaneous interpretation assistance page;
when a modification instruction is detected, acquiring modified text corresponding to the initial text;
sending the modified text to the server, the modified text being used to instruct the server to update the simultaneous interpretation model according to the initial text and the modified text.
An embodiment of the present application provides a data processing apparatus based on simultaneous interpretation, applied to a user terminal in a simultaneous interpretation system, the system further including a simultaneous interpretation device and a server, the apparatus including:
a first display module, configured to display a simultaneous interpretation assistance page;
a receiving module, configured to receive initial text sent by the server, the initial text being obtained by the server processing, through a simultaneous interpretation model, the audio sent by the simultaneous interpretation device;
a second display module, configured to display the initial text in the simultaneous interpretation assistance page;
an obtaining module, configured to acquire, when a modification instruction is detected, modified text corresponding to the initial text;
a sending module, configured to send the modified text to the server, the modified text being used to instruct the server to update the simultaneous interpretation model according to the initial text and the modified text.
An embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above data processing method based on simultaneous interpretation.
An embodiment of the present application provides a computer device including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above data processing method based on simultaneous interpretation.
With the above data processing method, apparatus, and storage medium based on simultaneous interpretation, the initial text obtained by the server processing the audio is displayed in the simultaneous interpretation assistance page, and when a modification instruction is detected, the corresponding modified text is obtained. Thus, when the text obtained by the server from processing the audio is wrong, the user terminal can make a corresponding modification and synchronize the obtained modified text to the server, instructing the server to update the simultaneous interpretation model according to the initial text and the modified text, thereby improving the accuracy of the text obtained by processing audio.
Brief Description of the Drawings
Figure 1 is a diagram of an application environment of a data processing method based on simultaneous interpretation in an embodiment;
Figure 2 is a schematic flowchart of a data processing method based on simultaneous interpretation in an embodiment;
Figure 3 is a schematic flowchart of the steps of audio processing and speech recognition in an embodiment;
Figure 4 is a schematic flowchart of the steps of merging text with video and sending the merged content to a user terminal for display in an embodiment;
Figure 5 is a schematic flowchart of the steps of synchronizing a conference number to a user terminal in an embodiment;
Figure 6 is a schematic flowchart of the steps of updating stored text and feeding back the updated text to a user terminal in an embodiment;
Figure 7 is a schematic flowchart of the steps of adjusting the weight corresponding to a user terminal identifier in an embodiment;
Figure 8 is a schematic flowchart of a data processing method based on simultaneous interpretation in another embodiment;
Figure 9 is a schematic diagram of a simultaneous interpretation assistance page in an embodiment;
Figure 10 is a schematic flowchart of the steps of constructing a simultaneous interpretation assistance page in an embodiment;
Figure 11 is a schematic flowchart of the steps of updating locally stored text in an embodiment;
Figure 12 is a schematic flowchart of a data processing method based on simultaneous interpretation in another embodiment;
Figure 13 is a schematic flowchart of a data processing method based on simultaneous interpretation in yet another embodiment;
Figure 14 is a schematic structural diagram of a traditional simultaneous interpretation system in an embodiment;
Figure 15 is a schematic structural diagram of a simultaneous interpretation system applied to the data processing method based on simultaneous interpretation in an embodiment;
Figure 16 is a sequence diagram of a data processing method based on simultaneous interpretation in an embodiment;
Figure 17 is a structural block diagram of a data processing apparatus based on simultaneous interpretation in an embodiment;
Figure 18 is a structural block diagram of a data processing apparatus based on simultaneous interpretation in another embodiment;
Figure 19 is a structural block diagram of a data processing apparatus based on simultaneous interpretation in another embodiment;
Figure 20 is a structural block diagram of a data processing apparatus based on simultaneous interpretation in another embodiment;
Figure 21 is a structural block diagram of a computer device in an embodiment;
Figure 22 is a structural block diagram of a computer device in another embodiment.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Figure 1 is a diagram of an application environment of a data processing method based on simultaneous interpretation in an embodiment. Referring to Figure 1, the data processing method based on simultaneous interpretation is applied to a simultaneous interpretation system. The system includes a user terminal 110, a server 120, and a simultaneous interpretation device 130. The user terminal 110 and the simultaneous interpretation device 130 are connected to the server 120 through a network.
The user terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as an independent server or a server cluster composed of multiple servers, which is not specifically limited in the embodiments of the present application. As an example, when the server 120 consists of multiple servers, it may include a voice server and a translation server. The simultaneous interpretation device 130 may be a terminal with an audio collection function, such as a notebook computer or a desktop computer equipped with a microphone.
As shown in Figure 2, in an embodiment, a data processing method based on simultaneous interpretation is provided. This embodiment is mainly illustrated by applying the method to the server 120 in Figure 1. Referring to Figure 2, the data processing method based on simultaneous interpretation includes the following steps:
S202: the server acquires audio sent by the simultaneous interpretation device.
The audio refers to the audio produced by the speaker during a speech in the simultaneous interpretation process.
In an embodiment, before S202, the method further includes: when receiving a connection request carrying a user identifier sent by the user terminal, the server determines whether the user identifier has permission to access the simultaneous interpretation conference. If it determines that the user identifier has access permission, the server establishes a communication connection with the user terminal. If it determines that the user identifier does not have access permission, the server refuses to establish the communication connection with the user terminal.
The communication connection may be a TCP (Transmission Control Protocol) connection, a UDP (User Datagram Protocol) connection, a websocket connection, or the like, which is not specifically limited in the embodiments of the present application.
In an embodiment, the method of acquiring the audio may include: the simultaneous interpretation device collects external audio and sends the collected audio to the server, so that the server obtains the audio. Alternatively, when the simultaneous interpretation device collects external audio, it performs noise reduction on the audio, then amplifies the power of the noise-reduced audio, performs voice activity detection on the amplified audio, filters out the non-speech portion, and sends the filtered audio to the server, so that the server obtains the audio.
S204: the server processes the acquired audio through a simultaneous interpretation model to obtain initial text.
The simultaneous interpretation model is used to process the acquired audio, for example, to perform speech recognition and to translate the recognition result. As an example, the simultaneous interpretation model may include a speech model and a translation model. The speech model includes a universal speech model and an auxiliary speech model.
The universal speech model is used to perform speech recognition on the audio to obtain the corresponding recognized text. The auxiliary speech model is used to correct the recognized text, that is, when the recognized text contains the same error as before, the erroneous part is corrected. The translation model is used to translate the recognized text to obtain the translated text.
As an example, the initial text refers to the recognized text and the translated text. That is, the initial text obtained by processing the audio includes: the recognized text, obtained by performing speech recognition on the audio, and the translated text, obtained by translating the recognized text into the target language (also called the translation).
In addition, the initial text may also include recognition-updated text obtained by modifying the recognized text; the recognition-updated text may also be called the updated recognized text.
In an embodiment, the server performs speech recognition on the acquired audio through the simultaneous interpretation model to obtain the recognized text, translates the recognized text through the simultaneous interpretation model to obtain the translated text in the target language, and determines the recognized text and the translated text as the initial text obtained by processing the audio.
In an embodiment, the server processes a complete segment of received speech to obtain the initial text. A complete segment of speech may be speech of a preset duration, or the speech from the start of the speaker's utterance to a pause. For example, the speaker says: "Ladies and gentlemen, good evening everyone. ...". In this speech, the speaker pauses after "good evening everyone", so the complete segment may be "Ladies and gentlemen, good evening everyone".
S206: the server sends the initial text to the user terminal.
As an example, the server sends the text to the user terminal, and the sent text is used to instruct the user terminal to display the received text on the simultaneous interpretation assistance page, so that the audience in the simultaneous interpretation conference can view the interpreted text through the user terminal.
In an embodiment, each time the server finishes processing a segment of audio and obtains the corresponding text, it sends the obtained text to the user terminal. The segment of audio may be a passage of the speaker's speech whose duration falls within a certain range, such as one minute or half a minute.
In an embodiment, each time the server finishes processing a segment of audio and obtains the corresponding text, if it determines that the number of words in the text reaches a preset word-count threshold, the server sends the text to the user terminal.
S208: the server receives modified text fed back by the user terminal, the modified text being obtained after the user terminal modifies the initial text.
Since the text sent by the server may include both the recognized text and the translated text, the modified text may be obtained by modifying either the recognized text or the translated text. It should be noted that modifying the text may mean modifying a character, a word, a sentence, or the text as a whole.
That is, the server receives the modified text, fed back by the user terminal, obtained by modifying the recognized text; or the server receives the modified text, fed back by the user terminal, obtained by modifying the translated text.
S210: the server updates the simultaneous interpretation model according to the initial text and the modified text.
In an embodiment, since the simultaneous interpretation model includes a speech model and a translation model, when the weighted cumulative value reaches the threshold and the modified text is obtained by modifying the recognized text, the server may update the speech model according to the recognized text and the modified text. When the weighted cumulative value reaches the threshold and the modified text is obtained by modifying the translated text, the server may update the translation model according to the translated text and the modified text.
The weighted cumulative value reaching the threshold means that the weighted cumulative value is greater than or equal to the threshold. In an embodiment, after the server determines the weighted cumulative value of the modified text according to the weight corresponding to each user terminal identifier, the server compares the weighted cumulative value with the preset threshold.
With the above data processing method based on simultaneous interpretation, by receiving the modified text obtained after the user terminal modifies the initial text, the server obtains corresponding feedback in time when the initial text is modified. In addition, after the simultaneous interpretation model is updated according to the initial text and the modified text, subsequent audio is processed through the updated simultaneous interpretation model, thereby improving the accuracy of the text obtained by processing audio.
In an embodiment, S204 may include:
S302: the server performs noise reduction processing on the acquired audio.
In an embodiment, the server performs noise reduction on the acquired audio through a noise reduction algorithm, such as a Wiener filter noise reduction algorithm, basic spectral subtraction, or an LMS adaptive notch algorithm.
In an embodiment, after performing noise reduction on the acquired audio, the server may further perform power amplification on the noise-reduced audio.
S304: the server acquires the speech portion included in the noise-reduced audio.
The audio may contain a speech portion and a non-speech portion.
In an embodiment, the server may also perform voice activity detection on the noise-reduced audio, or on the noise-reduced and amplified audio, to determine whether a non-speech portion exists in the audio. When it is determined that a non-speech portion exists, the non-speech portion is deleted, so that the speech portion of the audio is obtained.
S306: the server acquires, from the obtained speech portion, the audio portion whose energy value is greater than or equal to the energy threshold.
While the speaker is giving the speech, other people may talk. Thus, in addition to the speaker's audio, the audio from which the non-speech portion has been removed may also contain other people's audio, whose energy is smaller than the speaker's. Therefore, energy detection can be performed on the obtained speech portion to acquire the audio portion whose energy value is greater than or equal to the energy threshold.
S308: the server processes this audio portion through the simultaneous interpretation model to obtain the initial text.
In an embodiment, the server performs speech recognition, through a speech recognition algorithm, on the audio portion obtained in S306 to obtain the initial text.
With the above data processing method based on simultaneous interpretation, the server performs noise reduction on the obtained audio, which helps improve the accuracy of speech recognition. In addition, acquiring the speech portion of the noise-reduced audio avoids encoding and decoding the entire audio during speech recognition, improving the server's computational efficiency. In addition, acquiring, from the obtained speech portion, the audio portion whose energy is greater than or equal to the energy threshold prevents other people's speech from interfering with the speaker's speech during recognition, thus avoiding obtaining text corresponding to non-speaker speech.
In an embodiment, the simultaneous interpretation model includes a universal speech model and an auxiliary speech model; the initial text includes at least one of the recognized text and the recognition-updated text.
The recognized text is obtained by performing speech recognition on the acquired audio through the universal speech model; the recognition-updated text is obtained by updating the recognized text through the auxiliary speech model. In other words, processing the audio through the simultaneous interpretation model to obtain the initial text includes: performing speech recognition on the audio through the universal speech model to obtain the recognized text; and updating the recognized text through the auxiliary speech model to obtain the recognition-updated text. S210 may include: updating the auxiliary speech model according to the initial text and the modified text.
The universal speech model is used to perform speech recognition on the acquired audio to obtain the recognized text. The auxiliary speech model is used to update the recognized text. For example, after the server updates the auxiliary speech model according to the initial text and the modified text, when the auxiliary speech model detects an error in the recognized text and the error has corresponding modified text, the server updates the erroneous recognized text through the auxiliary speech model, i.e., replaces the erroneous recognized text with the modified text. When the auxiliary speech model detects no error in the recognized text, the server does not update the recognized text.
In an embodiment, after the auxiliary speech model is updated according to the initial text and the modified text, the method further includes: the server inputs newly obtained audio into the universal speech model, which recognizes the input audio as the corresponding recognized text. The server inputs the recognized text into the auxiliary speech model, which detects whether the recognized text contains content corresponding to the modified text; if so, the corresponding content is updated to the modified text.
With the above data processing method based on simultaneous interpretation, the server updates the auxiliary speech model according to the initial text and the modified text, so that subsequent text is updated through the updated auxiliary speech model: if subsequent text contains content corresponding to the modified text, the content is replaced with the modified text, avoiding the recurrence of earlier errors and thereby improving the accuracy of the text obtained in simultaneous interpretation.
In an embodiment, the simultaneous interpretation model includes a translation model; the initial text includes translated text; the modified text includes modified translated text; S210 may include: updating the translation model according to the translated text and the modified translated text.
In an embodiment, after the translation model is updated according to the translated text and the modified translated text, the method further includes: the server inputs the recognized text or the recognition-updated text into the translation model; when the translation model detects that the recognized text or the recognition-updated text contains content corresponding to the modified translated text, the corresponding content is updated to the modified translated text.
In an embodiment, the translation model may include a universal translation model and an auxiliary translation model; the step of updating the translation model according to the translated text and the modified translated text may include: updating the auxiliary translation model according to the translated text and the modified translated text. After the auxiliary translation model is updated, the server inputs the recognized text or the recognition-updated text into the universal translation model, which translates it into translated text. The server then inputs the translated text into the auxiliary translation model, which detects whether the translated text contains content matching the modified translated text; if so, the matched content is updated to the modified translated text to obtain the final translated text.
With the above data processing method based on simultaneous interpretation, the server updates the translation model according to the translated text and the modified translated text, so that subsequent text is translated through the updated translation model, avoiding errors that occurred before the update and thereby improving the accuracy of the text obtained in simultaneous interpretation.
In an embodiment, as shown in Figure 4, the method further includes:
S402: the server receives the video, matching the audio, sent by the simultaneous interpretation device.
The video may be a video of the speaker, or the speaker's PPT (PowerPoint presentation).
In an embodiment, the simultaneous interpretation device collects the video matching the acquired audio and sends the collected video to the server. The server receives the video collected by the simultaneous interpretation device.
S404: the server embeds the initial text into the video.
In an embodiment, the server may embed the text obtained by processing the audio into the video according to the text's appearance time in the video, that is, the time at which the text appears in the video as subtitles when the user terminal plays the video.
In an embodiment, the server may embed the initial text at the bottom, middle, or top of the video. The server may also set the number of lines the embedded text occupies in the video, e.g., two or more lines.
S406: the server sends the video with the embedded initial text to the user terminal.
In an embodiment, the server sends the video with the embedded text to the user terminal through the connection channel established with the user terminal, which may be a TCP connection channel or a UDP connection channel.
It should be noted that after the text is embedded into the video and the user terminal displays the video, the user can modify the embedded text through the user terminal.
With the above data processing method based on simultaneous interpretation, the server embeds the text obtained during simultaneous interpretation into the video and sends the video with the embedded text to the user terminal. On the one hand, combining text with video helps improve the audience's understanding of the text; on the other hand, besides the simultaneous interpretation text, the audience can also watch the video content, enriching what the user terminal displays.
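The alignment of text with its appearance time described in S404 can be sketched as generating a subtitle cue per text segment. Burning text into video frames would require a video-processing library; emitting an SRT-style cue keyed to each segment's appearance time is an illustrative stand-in for the same text-to-video alignment, and the function name is an assumption.

```python
def srt_entry(index, start_s, end_s, text):
    """Format one SRT-style subtitle cue: a sequence number, a time
    range 'HH:MM:SS,mmm --> HH:MM:SS,mmm', and the text itself."""
    def ts(sec):
        # Convert seconds to the SRT timestamp format.
        h, rem = divmod(int(sec), 3600)
        m, s = divmod(rem, 60)
        return f"{h:02d}:{m:02d}:{s:02d},{int(sec * 1000) % 1000:03d}"
    return f"{index}\n{ts(start_s)} --> {ts(end_s)}\n{text}\n"

print(srt_entry(1, 0.0, 2.5, "Ladies and gentlemen, good evening."))
```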
In an embodiment, the audio acquired by the server corresponds to a group identifier; as an example, the group identifier refers to a conference number. As shown in Figure 5, S206 may include:
S502: the server sends the initial text to the user terminal accessed through the conference number.
The conference number refers to the number of the simultaneous interpretation conference. The simultaneous interpretation software system can support multiple simultaneous interpretation conferences at the same time, and different conferences have different conference numbers.
In an embodiment, after the user terminal scans the QR code or barcode in the conference room, the server establishes a communication connection with the user terminal and sends the simultaneous interpretation list to the user terminal, so that the audience member holding the user terminal can select a conference number in the list and enter the corresponding simultaneous interpretation conference.
In an embodiment, the server receives an access request carrying a conference number and a user identifier from the user terminal, and determines according to the user identifier whether the user has permission to access the simultaneous interpretation conference corresponding to the conference number. If the user has this permission, the server allows the user terminal's access; otherwise, the server refuses the access. The user identifier may be a mobile phone number or a social account.
S504: the server receives comment information fed back by the user terminal.
Comment information refers to comments made by the audience while watching the speaker's speech content. The speech content includes the text obtained by processing the audio and the corresponding translated text.
S506: the server synchronizes the comment information among the user terminals accessed through the conference number.
In an embodiment, the server determines the connected user terminals according to the conference number and synchronizes the received comment information to all the determined user terminals, instructing the user terminals to display the received comment information in the form of a barrage on the simultaneous interpretation assistance page.
With the above data processing method based on simultaneous interpretation, the server determines the user terminals that receive the comment information according to the conference number. On the one hand, the audience can post comments through the user terminal, improving the interactivity between users and the simultaneous interpretation system; on the other hand, this effectively avoids sending comment information to user terminals of other simultaneous interpretation conferences.
In one embodiment, the audio obtained by the server corresponds to a group identifier; as an example, the group identifier refers to a conference number. S206 may include: sending the initial text to the user terminals connected through the conference number. As shown in FIG. 6, the method further includes:
S602: The server stores the initial text in correspondence with the conference number.
In one embodiment, once the server begins processing a speaker's audio and obtains the corresponding text, it creates a document in a target format, adds the initial text to the document, establishes a mapping between the document and the conference number, and stores the mapped document and conference number. Afterwards, whenever the server processes newly captured audio and obtains the corresponding text, it adds that text directly to the created document.
S604: When the weighted cumulative value of the modified text reaches a threshold, the server updates the text stored in correspondence with the conference number to the modified text.
S606: When receiving a synchronization request sent by a user terminal connected through the conference number, the server feeds back the updated text corresponding to the conference number to the user terminal that initiated the request.
For the simultaneous interpretation audience, this means the correct version of the text content can be downloaded promptly.
In the above data processing method based on simultaneous interpretation, the server updates the stored text according to the modified text, ensuring that text that previously contained errors is corrected in time. When the server receives a synchronization request from a user terminal, it can send the updated text to that terminal, ensuring that the user terminal obtains the correct, updated text and improving text accuracy.
In one embodiment, as shown in FIG. 7, the method further includes:
S702: The server counts the text modification count corresponding to each user terminal identifier.
The text modification count is the number of times the user carrying the user terminal modified the text being viewed. The viewed text may be text obtained by the server from processing different audio; as an example, it includes the recognized text, the recognition-updated text, and the translated text. The user terminal identifier represents the user carrying that user terminal.
In one embodiment, the server determines, from the received modified texts and the corresponding user terminal identifiers, the number of modified texts attributed to the same user terminal identifier, and uses that number as the text modification count for that identifier.
S704: The server detects the text modification accuracy corresponding to each user terminal identifier.
The text modification accuracy is the accuracy with which the user terminal corresponding to a user terminal identifier modified the relevant text within a preset period, that is, the accuracy of the modified texts obtained within that period.
Since modified text obtained by editing may itself contain errors, in one embodiment the server inspects the modified texts for a user terminal identifier, judges whether each modification is correct, and computes the accuracy, thereby obtaining the text modification accuracy for that identifier.
S706: For any user terminal identifier, when the text modification count reaches a modification count threshold and the text modification accuracy reaches a text modification accuracy threshold, the server raises the weight corresponding to that user terminal identifier.
The weight is the modification weight each user terminal has when modifying received text. User terminal identifiers of different levels may have different weights; for example, an identifier at the ordinary user level has a smaller weight, while an identifier at the administrator level has a larger weight.
In one embodiment, the server adjusts the weight corresponding to a user terminal identifier according to the text modification count and the text modification accuracy: when the count is below the modification count threshold and the accuracy is below the accuracy threshold, the server lowers the weight; when the count reaches the count threshold and the accuracy reaches the accuracy threshold, the server raises the weight.
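The weight-adjustment rule above can be sketched as follows; the thresholds, step size, and function name are illustrative assumptions, not values from the source.

```python
def adjust_weight(weight, modify_count, accuracy,
                  count_threshold=10, accuracy_threshold=0.9, step=0.1):
    """Raise the weight when both the modification count and accuracy
    reach their thresholds; lower it when both fall below their
    thresholds; otherwise leave it unchanged."""
    if modify_count >= count_threshold and accuracy >= accuracy_threshold:
        return weight + step
    if modify_count < count_threshold and accuracy < accuracy_threshold:
        return max(0.0, weight - step)
    return weight


print(adjust_weight(1.0, 12, 0.95))  # both thresholds met -> weight raised
print(adjust_weight(1.0, 3, 0.5))    # both below threshold -> weight lowered
print(adjust_weight(1.0, 12, 0.5))   # mixed case -> weight unchanged
```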
In the above data processing method based on simultaneous interpretation, the server raises the weight corresponding to a user terminal identifier when the text modification count and accuracy meet the preset conditions, giving a larger weight to users who contribute more to text correction and helping improve text accuracy.
In one embodiment, S210 may include: determining the weighted cumulative value of the modified text according to the weights corresponding to the user terminal identifiers, and updating the simultaneous interpretation model according to the initial text and the modified text when the weighted cumulative value reaches a threshold.
The weighted cumulative value is obtained by summing or accumulating the weights. Summing means adding the individual weights together; accumulating means that when a user terminal has modified the same text multiple times, its modification count is multiplied by its weight before being added to the other weights.
For example, suppose user terminals A, B, and C modified a text, their user terminal identifiers have weights q1, q2, and q3, terminal A modified it twice, and terminals B and C each modified it once. Then the weighted cumulative value is S = 2×q1 + q2 + q3.
In one embodiment, when the server receives modified text sent by a user terminal, it determines the level of that terminal's user terminal identifier and obtains the corresponding weight from that level. The server then sums or accumulates the obtained weights and takes the result as the weighted cumulative value of the modified text.
In one embodiment, when the server receives multiple modified texts from a user terminal that were all derived from modifying the same text, the server treats the last modified text received as that terminal's final version.
In the above data processing method based on simultaneous interpretation, the weighted cumulative value of the user terminals' modifications is computed from the fed-back modified text; when the weighted cumulative value reaches the threshold, the simultaneous interpretation model is updated according to the initial text and the modified text, and the updated model is used to process subsequent audio, improving the accuracy of the text obtained from the audio. Moreover, because the model is updated only when the weighted cumulative value reaches the threshold, invalid modifications are effectively prevented from affecting the model, further ensuring the accuracy of the text obtained from the audio.
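The weighted cumulative value above (S = 2×q1 + q2 + q3 in the example) can be computed as a simple sum of count-weight products; the concrete weights and threshold below are illustrative assumptions.

```python
def weighted_cumulative_value(edits):
    """Sum modification-count x weight over all terminals that edited
    the same text, e.g. S = 2*q1 + q2 + q3 in the example above."""
    return sum(count * weight for count, weight in edits)


# Terminal A edited twice (weight q1=0.5); B and C once (q2=0.3, q3=0.2).
S = weighted_cumulative_value([(2, 0.5), (1, 0.3), (1, 0.2)])
print(S)  # 1.5

THRESHOLD = 1.0
# When S reaches the threshold, the simultaneous interpretation model is updated.
print(S >= THRESHOLD)  # True
```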
As shown in FIG. 8, in one embodiment, a data processing method based on simultaneous interpretation is provided. This embodiment is mainly illustrated with the method applied to the user terminal 110 in FIG. 1 above. Referring to FIG. 8, the method includes the following steps:
S802: The user terminal displays a simultaneous interpretation auxiliary page.
The simultaneous interpretation auxiliary page may be used to display text, or to display video with embedded text. It may also display a simultaneous interpretation list.
In one embodiment, the user terminal scans a barcode or QR code of a simultaneous interpretation conference through a social application and enters a mini program in the social application according to the link address in the code. The user terminal displays the simultaneous interpretation auxiliary page in the mini program and shows the simultaneous interpretation list on that page; the list contains simultaneous interpretation conferences with different conference numbers. According to an input selection instruction, the terminal enters the corresponding conference in the list.
In one embodiment, displaying the simultaneous interpretation list on the auxiliary page may include: the user terminal sends the server an acquisition request carrying a mobile phone number or social account, and receives from the server the simultaneous interpretation list for which it has access permission.
In one embodiment, FIG. 9 is a schematic diagram of entering and displaying the simultaneous interpretation auxiliary page. In FIG. 9, on first entering the page, the user terminal first displays the simultaneous interpretation list and then shows the selected conference according to a selection instruction. If it is not the first entry, the terminal enters the conference directly.
S804: The user terminal receives initial text sent by the server; the initial text is obtained by the server by processing, through the simultaneous interpretation model, the audio sent by the simultaneous interpretation device.
S806: The user terminal displays the initial text on the simultaneous interpretation auxiliary page.
In one embodiment, when displaying text on the auxiliary page, the user terminal synthesizes speech in the corresponding language from the displayed text and plays it aloud.
FIG. 9 also shows text displayed on the auxiliary page. In addition, the user terminal can switch among languages to selectively display the text, and can synthesize and play the text in different voice timbres.
S808: When a modification instruction is detected, the user terminal obtains modified text corresponding to the initial text.
In one embodiment, the user terminal detects input modification instructions for the initial text in real time, and obtains the modified text corresponding to the initial text according to the modification instruction.
S810: The user terminal sends the modified text to the server; the modified text instructs the server to update the simultaneous interpretation model according to the initial text and the modified text.
As an example, the user terminal also sends its local user terminal identifier to the server; accordingly, the modified text instructs the server to determine the weighted cumulative value of the modified text according to the weight corresponding to that identifier, and to update the simultaneous interpretation model according to the initial text and the modified text when the weighted cumulative value reaches the threshold.
In the above data processing method based on simultaneous interpretation, the user terminal displays, on the simultaneous interpretation auxiliary page, the text obtained by the server from processing the audio; when a modification instruction is detected, the user terminal obtains the corresponding modified text, so that errors in the audio-derived text can be corrected at the terminal. The user terminal also synchronizes the modified text to the server, instructing the server to update the simultaneous interpretation model according to the audio-derived text and the modified text when the weighted cumulative value of the modified text reaches the threshold, thereby improving text accuracy.
In one embodiment, as shown in FIG. 10, displaying the simultaneous interpretation auxiliary page includes:
S1002: The user terminal obtains a child application identifier through a parent application.
The parent application is an application that hosts child applications and provides the environment in which they run. It is a native application that runs directly on the operating system, and may include a social application or a live-streaming application. A child application is an application that runs within the environment provided by the parent application; as an example, the child application may be a simultaneous interpretation mini program.
In one embodiment, the user terminal may display a child application list through the parent application, receive a selection instruction for an option in the list, determine the selected option according to the instruction, and thereby obtain the child application identifier corresponding to the selected option.
S1004: The user terminal obtains the corresponding simultaneous interpretation auxiliary page configuration file according to the child application identifier.
Through the parent application, the user terminal may obtain the configuration file corresponding to the child application identifier from local storage or from the server. Further, the user terminal may locate the corresponding folder locally or on the server according to the child application identifier and obtain the configuration file from that folder; alternatively, it may obtain the configuration file corresponding to the child application identifier according to a correspondence between child application identifiers and page identifiers.
A page identifier uniquely identifies one simultaneous interpretation auxiliary page within a child application; different child applications may use the same page identifier.
The simultaneous interpretation auxiliary page configuration file configures the pages presented by the child application. The configuration file may be source code or a file compiled from source code. A page presented by the child application is called a simultaneous interpretation auxiliary page, and a child application may include one or more such pages.
S1006: The user terminal obtains a common component identifier from the simultaneous interpretation auxiliary page configuration file.
The user terminal may parse the configuration file to obtain the common component identifier from it.
A common component identifier uniquely identifies the corresponding common component. A common component is a component provided by the parent application that different child applications can share. It has a visual form and is a building block of the simultaneous interpretation auxiliary page. A common component may also encapsulate logic code for handling trigger events on that component. Sharing a common component among child applications may mean calling the same component at the same time or at different times. In one embodiment, a common component may also be shared by the parent application and child applications.
S1008: The user terminal selects, from a common component library provided by the parent application, the common component corresponding to the common component identifier.
The common component library is the set of common components provided by the parent application; each component in the library has a unique identifier. The library may be downloaded from the server by the parent application at runtime, or decompressed from the application installation package when the parent application is installed.
S1010: The user terminal builds the simultaneous interpretation auxiliary page from the selected common components.
The user terminal may obtain the default component style data carried by the selected components, organize the selected components according to that style data, and render them to form the simultaneous interpretation auxiliary page.
Default component style data describes a common component's default presentation, and may include attributes such as the component's default position on the page, size, color, font, and font size. The user terminal may build the simultaneous interpretation auxiliary page from the selected components through a browser control integrated in the parent application.
In the above data processing method based on simultaneous interpretation, the user terminal runs the parent application, which provides the common component library. Through the parent application, the child application identifier can be obtained and the corresponding auxiliary page configuration file retrieved, so that common components are selected from the library according to that file to build the simultaneous interpretation auxiliary page. Child application identifiers distinguish different child applications, and the parent application can implement different child applications according to the configuration files corresponding to different identifiers. While the parent application is running, the auxiliary page can be built quickly with the common components the parent application provides, shortening application installation time and improving application usage efficiency.
In one embodiment, as shown in FIG. 11, the method further includes:
S1102: The user terminal receives modified text synchronized by the server together with the corresponding sort sequence number; the received modified text shares the sort sequence number with the corresponding pre-modification text.
A sort sequence number indicates a text's position in a document, or its storage position in a storage area. A document is a text file for saving and editing text, such as a TEXT, WORD, or XML document.
In one embodiment, when the server determines that the weighted cumulative value has reached the threshold, it synchronizes the modified text and the corresponding sort sequence number to the user terminal. After receiving them, the user terminal performs S1104.
S1104: The user terminal looks up locally the text corresponding to the received sort sequence number.
In one embodiment, the user terminal looks up the text corresponding to the received sort sequence number in its storage area. Alternatively, since texts may be saved in a document that is mapped to a conference number and stored on the user terminal, the terminal may locate the document by the conference number and then find the corresponding text in the document by the received sort sequence number.
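The lookup-and-replace step keyed by the shared sort sequence number can be sketched with a plain dictionary standing in for the terminal's local store; the store contents and the name `sync_modified_text` are hypothetical.

```python
# Hypothetical local store on the terminal: sort sequence number -> text.
local_texts = {
    1: "Good morning",
    2: "Welcom to the conference",  # contains a typo to be corrected
    3: "Thank you",
}


def sync_modified_text(store, seq_no, modified_text):
    """Find the local text sharing the sort sequence number and replace
    it with the modified text synchronized by the server."""
    if seq_no in store:
        store[seq_no] = modified_text
    return store


sync_modified_text(local_texts, 2, "Welcome to the conference")
print(local_texts[2])
```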
S1106: The user terminal replaces the locally found text with the received modified text.
In the above data processing method based on simultaneous interpretation, the user terminal finds the corresponding text by the received sort sequence number and replaces it with the received modified text, ensuring that when a text contains an error, every user terminal can apply the modification in sync, improving the accuracy of the obtained text.
As shown in FIG. 12, in one embodiment, a data processing method based on simultaneous interpretation is provided. This embodiment is mainly illustrated with the method applied to the server 120 in FIG. 1 above. Referring to FIG. 12, the method includes the following steps:
S1202: The server obtains audio.
S1204: The server performs noise reduction on the obtained audio.
S1206: The server obtains the speech portion of the noise-reduced audio.
S1208: The server obtains, from the speech portion, the audio portion whose energy value is greater than or equal to an energy threshold.
S1210: The server processes that audio portion through the simultaneous interpretation model to obtain the initial text.
S1212: The server sends the initial text to the user terminal.
S1214: The server receives video matching the obtained audio.
S1216: The server embeds the initial text into the video.
S1218: The server sends the video with the embedded initial text to the user terminal.
In one embodiment, the server may also send the initial text to the user terminals connected through the conference number.
S1220: The server receives modified text fed back by the user terminal, the modified text being obtained by the user terminal modifying the initial text.
S1222: The server determines the weighted cumulative value of the modified text according to the weight corresponding to the user terminal identifier.
S1224: When the weighted cumulative value reaches a threshold, the server updates the simultaneous interpretation model according to the initial text and the modified text.
S1226: The server receives comment information fed back by the user terminal.
S1228: The server synchronizes the comment information among the user terminals connected through the conference number.
S1230: The server stores the initial text in correspondence with the conference number.
S1232: When the weighted cumulative value reaches the threshold, the server updates the text stored in correspondence with the conference number to the modified text.
S1234: When receiving a synchronization request sent by a user terminal connected through the conference number, the server feeds back the updated text corresponding to the conference number to the user terminal that initiated the request.
As shown in FIG. 13, in one embodiment, another data processing method based on simultaneous interpretation is provided. This embodiment is mainly illustrated with the method applied to the user terminal 110 in FIG. 1 above. Referring to FIG. 13, the method includes the following steps:
S1302: The user terminal displays a simultaneous interpretation auxiliary page.
S1304: The user terminal receives initial text sent by the server; the text is obtained by the server by processing, through the simultaneous interpretation model, the audio sent by the simultaneous interpretation device.
S1306: The user terminal displays the initial text on the simultaneous interpretation auxiliary page.
S1308: When a modification instruction is detected, the user terminal obtains modified text corresponding to the initial text.
S1310: The user terminal sends its local user terminal identifier and the modified text to the server; the modified text instructs the server to determine the weighted cumulative value of the modified text according to the weight corresponding to the user terminal identifier, and to update the simultaneous interpretation model according to the initial text and the modified text when the weighted cumulative value reaches the threshold.
S1312: The user terminal receives modified text synchronized by the server together with the corresponding sort sequence number; the received modified text shares the sort sequence number with the corresponding pre-modification text.
S1314: The user terminal looks up locally the text corresponding to the received sort sequence number.
S1316: The user terminal replaces the locally found text with the received modified text.
In related-art simultaneous interpretation schemes, a simultaneous interpretation device captures audio and processes it, then uploads the processed audio to a speech server for speech recognition; after recognition, the speech server sends the recognized text to a translation server, which translates it into the target language and returns the translated text to the simultaneous interpretation client; finally, the simultaneous interpretation device displays the returned result on a display screen. A typical conference simultaneous interpretation system is shown in FIG. 14.
Related-art simultaneous interpretation systems mainly display text in two ways. One is split-screen display: the speaker's image or slides (PPT) occupy part of the screen, and the simultaneous interpretation text occupies the rest. The other is subtitle display: the speaker's image or slides fill the screen, and the simultaneous interpretation text is shown as subtitles at the bottom.
Both display modes have the following problems: 1) Poor visibility: in conferences with many participants, audience members in the back rows or with poor viewing angles cannot see the text on the conference display clearly, and those unable to attend cannot obtain the conference content at all. 2) No interaction: the audience can only passively receive the simultaneous interpretation text. 3) No model optimization: the audience cannot immediately modify the recognized and/or translated text, so the speech model and translation model used in simultaneous interpretation cannot be optimized.
For these problems, the embodiments of this application propose a solution, with the simultaneous interpretation system shown in FIG. 15. As shown in FIG. 15, the system includes a server, a simultaneous interpretation device, a microphone, user terminals, and a display screen. The server may be a server cluster, for example including a speech server and a translation server.
As shown in FIG. 16, in one embodiment, yet another data processing method based on simultaneous interpretation is provided. Referring to FIG. 16, the method includes the following steps:
S1602: The microphone outputs the captured audio to the simultaneous interpretation device.
S1604: The simultaneous interpretation device performs noise reduction, gain, and voice activity detection on the received audio.
Using a front-end processing algorithm, the simultaneous interpretation device performs noise reduction, gain, and voice activity detection on the audio captured by the microphone. As an example, the front-end processing algorithm may use dual "DNN (Deep Neural Network) + energy" detection, where the DNN suppresses noise and the energy detection filters out the parts of the audio whose energy is below a threshold.
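The energy half of the "DNN + energy" detection above can be sketched as frame-level thresholding; the frame data, threshold value, and function names are illustrative assumptions rather than the device's actual algorithm.

```python
def energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(x * x for x in frame) / len(frame)


def filter_low_energy(frames, threshold=0.01):
    """Keep only frames whose energy reaches the threshold, dropping
    near-silent parts of the audio before recognition."""
    return [f for f in frames if energy(f) >= threshold]


frames = [
    [0.001, -0.002, 0.001],  # near silence -> filtered out
    [0.5, -0.4, 0.3],        # speech-like energy -> kept
]
print(len(filter_low_energy(frames)))  # 1
```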
S1606: The simultaneous interpretation device sends the audio to the speech server.
S1608: The simultaneous interpretation device sends the received video to the speech server.
In the embodiments of this application, besides capturing speech as an input source, video is also obtained as an input source; the video may be the speaker's slides (PPT) or a video of the speaker.
The simultaneous interpretation client uploads fields such as the "conference number" to uniquely identify the current simultaneous interpretation conference and the corresponding presentation content (including the recognized text and the translated text).
S1610: The speech server recognizes the audio through the general speech model to obtain recognized text, and detects and updates the recognized text through the auxiliary speech model to obtain updated recognized text.
S1612: The speech server sends the recognized text to the translation server.
S1614: The translation server translates the received recognized text to obtain translated text in the target language.
S1616: The translation server sends the translated text to the speech server.
S1618: The speech server merges the recognized text and the translated text, and sends the merged text to the simultaneous interpretation device.
S1620: The speech server merges the recognized text, the translated text, and the video, and sends the merged text and video to the user terminal.
The speech server pushes the merged text and video to all activated user terminals.
S1622: The simultaneous interpretation device sends the merged text and video to the display screen for presentation.
Here, the simultaneous interpretation device sends the recognized text, translated text, and video to the display screen of the simultaneous interpretation conference for presentation.
S1624: The user terminal modifies the recognized text and sends the resulting modified text to the speech server.
During simultaneous interpretation, the user may scan a QR code through a social application or click a corresponding link to enter a web page or mini program. The user terminal selects, by mobile phone number or WeChat account, the simultaneous interpretation list for which it has access permission, and the user taps an entry to enter the simultaneous interpretation auxiliary page. After entry, the user terminal is activated. By default, the auxiliary page shows the text currently being spoken. The user terminal can also switch among languages to display the text, synthesize speech in different corresponding timbres from the displayed text, and play it aloud.
As an example, the simultaneous interpretation auxiliary page provides a one-tap save button; when the button is triggered, the user terminal saves the received recognized text and translated text, forming a full transcript of the simultaneous interpretation. In addition, the user can modify the recognized text and translated text on the user terminal, and the modified text can be uploaded to the server.
S1626: The speech server updates the auxiliary speech model according to the recognized text and the modified text.
S1628: The user terminal modifies the translated text and sends the resulting modified text to the translation server through the speech server.
S1630: The translation server updates the translation model according to the translated text and the modified text.
When the speech server or translation server receives modified text, it uses the modified text to update the speech model and translation model in real time through the corresponding algorithm; the updated models are used for the remainder of the current simultaneous interpretation session. For real-time updating of the speech model, the speech model includes a general language model and an auxiliary language model. The general language model is loaded once when the program starts running. When a user modification instruction is received, the auxiliary language model is updated and hot-reloaded, achieving seamless switching throughout the process. It should be noted that the auxiliary speech model can be hot-reloaded multiple times while the program runs: after each update of the auxiliary speech model, it is hot-reloaded once.
Here, hot loading refers to reloading classes at runtime based on bytecode changes without releasing memory; it is usable in development environments but not in production, and it requires neither restarting Tomcat nor repackaging.
When decoding the audio's acoustic symbol sequence, the server inputs the sequence into the general language model for speech recognition to obtain recognized text, then inputs the recognized text into the auxiliary language model, which replaces previously erroneous text with the modified text.
The server performs a plausibility check on the modified text, and modified text that passes the check is used to update the speech model and/or translation model. For example, if an erroneous translation is found and multiple people modify it, the server determines the weighted cumulative value of the modified text according to the weights of the users carrying the user terminals; when the weighted cumulative value reaches the threshold, the server optimizes the translation model.
The server determines each user's modification contribution from the text modification count and text modification accuracy, and adaptively adjusts the corresponding weight.
Audience members comment on the speaker or the presentation content through their user terminals. The user terminal sends the comment information to the server, which relays it to the conference display and every activated user terminal; the comment information is displayed as scrolling bullet comments on the display screen and the user terminals.
Implementing the above data processing method based on simultaneous interpretation yields the following beneficial effects:
1) The speech-recognized text and the translated text can be viewed on the user terminal, avoiding the visibility problems caused by back-row seating or poor viewing angles.
2) Interaction is supported: by entering the simultaneous interpretation auxiliary page on a user terminal, audience members can comment on the speaker or the presentation content and submit the comments, which the server distributes to the conference display and all user terminals.
3) The simultaneous interpretation model (including the speech model and the translation model) can be updated in real time: users can modify the recognized and/or translated text on their terminals. If many users simultaneously modify a text or a word in it, or a user with administrator rights does, the server updates the speech model and/or translation model, and the updated models are used for subsequent recognition and translation, preventing the error from recurring.
4) The target language can be switched at any time: in the social application's mini program, users can set the translation language and choose a personalized timbre for synthesized speech.
5) The full simultaneous interpretation transcript is easy to obtain: the auxiliary page provides a one-tap function for saving the conference presentation record.
FIG. 2 and FIG. 8 are schematic flowcharts of a data processing method based on simultaneous interpretation in one embodiment. It should be understood that although the steps in the flowcharts of FIG. 2 and FIG. 8 are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict order restriction on these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 2 and FIG. 8 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
As shown in FIG. 17, in one embodiment, a data processing apparatus based on simultaneous interpretation is provided. The apparatus 1700 includes: an obtaining module 1702, a processing module 1704, a sending module 1706, a receiving module 1708, a determining module 1712, and an updating module 1710, where:
the obtaining module 1702 is configured to obtain the audio sent by the simultaneous interpretation device;
the processing module 1704 is configured to process the audio through the simultaneous interpretation model to obtain initial text;
the sending module 1706 is configured to send the initial text to the user terminal;
the receiving module 1708 is configured to receive modified text fed back by the user terminal, the modified text being obtained by the user terminal modifying the initial text; and
the updating module 1710 is configured to update the simultaneous interpretation model according to the initial text and the modified text.
The above data processing apparatus based on simultaneous interpretation receives from the user terminal the modified text obtained by modifying the initial text, so that timely feedback is obtained whenever the initial text is modified. It also updates the simultaneous interpretation model according to the initial text and the modified text, and processes subsequent audio through the updated model, improving the accuracy of the text obtained from the audio.
In one embodiment, as shown in FIG. 18, the apparatus may further include a determining module 1712,
where the determining module 1712 is configured to determine the weighted cumulative value of the modified text according to the weights corresponding to the user terminal identifiers;
and the updating module 1710 is further configured to update the simultaneous interpretation model according to the initial text and the modified text when the weighted cumulative value reaches a threshold.
In one embodiment, the processing module 1704 is further configured to perform noise reduction on the audio, obtain the speech portion included in the noise-reduced audio, obtain from the speech portion the audio portion whose energy value is greater than or equal to an energy threshold, and process that audio portion through the simultaneous interpretation model to obtain the initial text.
In one embodiment, the simultaneous interpretation model includes a general speech model and an auxiliary speech model;
the processing module 1704 is further configured to perform speech recognition on the audio through the general speech model to obtain recognized text, and to update the recognized text through the auxiliary speech model to obtain recognition-updated text, where the initial text includes at least one of the recognized text and the recognition-updated text;
and the updating module 1710 is further configured to update the auxiliary speech model according to the initial text and the modified text.
In one embodiment, the simultaneous interpretation model includes a translation model; the initial text includes translated text; the modified text includes modified translated text;
and the updating module 1710 is further configured to update the translation model according to the translated text and the modified translated text.
In one embodiment, as shown in FIG. 18, the apparatus further includes an embedding module 1714, where:
the receiving module 1708 is further configured to receive, from the simultaneous interpretation device, video matching the audio;
the embedding module 1714 is configured to embed the initial text into the video;
and the sending module 1706 is further configured to send the video with the embedded initial text to the user terminal.
In one embodiment, the apparatus further includes a synchronization module 1716,
where the audio corresponds to a group identifier;
the sending module 1706 is further configured to send the initial text to the user terminals connected through the group identifier;
the receiving module 1708 is further configured to receive comment information fed back by the user terminal;
and the synchronization module 1716 is configured to synchronize the comment information among the user terminals connected through the group identifier.
In one embodiment, as shown in FIG. 18, the apparatus further includes a storage module 1718 and a feedback module 1720, where the audio corresponds to a group identifier;
the sending module 1706 is further configured to send the initial text to the user terminals connected through the group identifier;
the storage module 1718 is configured to store the initial text in correspondence with the group identifier;
the updating module 1710 is further configured to update the text stored in correspondence with the group identifier to the modified text when the weighted cumulative value of the modified text reaches a threshold;
and the feedback module 1720 is configured to, upon receiving a synchronization request sent by a user terminal connected through the group identifier, feed back to that user terminal the updated text corresponding to the group identifier.
In one embodiment, as shown in FIG. 18, the apparatus further includes a statistics module 1722, a detection module 1724, and an adjustment module 1726, where:
the statistics module 1722 is configured to count the text modification count corresponding to each user terminal identifier;
the detection module 1724 is configured to detect the text modification accuracy corresponding to each user terminal identifier;
and the adjustment module 1726 is configured to, for any user terminal identifier, raise the weight corresponding to that identifier when the text modification count reaches a modification count threshold and the text modification accuracy reaches a text modification accuracy threshold.
All of the above optional technical solutions may be combined in any manner to form optional embodiments of this application.
As shown in FIG. 19, in one embodiment, a data processing apparatus based on simultaneous interpretation is provided. The apparatus 1900 includes: a first display module 1902, a receiving module 1904, a second display module 1906, an obtaining module 1908, and a sending module 1910, where:
the first display module 1902 is configured to display a simultaneous interpretation auxiliary page;
the receiving module 1904 is configured to receive initial text sent by the server, the initial text being obtained by the server by processing, through the simultaneous interpretation model, the audio sent by the simultaneous interpretation device;
the second display module 1906 is configured to display the initial text on the simultaneous interpretation auxiliary page;
the obtaining module 1908 is configured to obtain, when a modification instruction is detected, modified text corresponding to the initial text;
and the sending module 1910 is configured to send the modified text to the server, the modified text instructing the server to update the simultaneous interpretation model according to the initial text and the modified text.
The above data processing apparatus based on simultaneous interpretation displays, on the simultaneous interpretation auxiliary page, the initial text obtained by the server from processing the audio; when a modification instruction is detected, it obtains the corresponding modified text, so that errors in the server's audio-derived text can be corrected at the user terminal. The obtained modified text is synchronized to the server to instruct the server to update the simultaneous interpretation model according to the initial text and the modified text, improving the accuracy of the text obtained from the audio.
In one embodiment, to display the simultaneous interpretation auxiliary page, the first display module 1902 is further configured to: obtain a child application identifier through a parent application; obtain the corresponding simultaneous interpretation auxiliary page configuration file according to the child application identifier; obtain a common component identifier from the configuration file; select, from the common component library provided by the parent application, the common component corresponding to that identifier; and build the simultaneous interpretation auxiliary page from the selected common components.
In one embodiment, as shown in FIG. 20, the apparatus further includes a lookup module 1912 and a replacement module 1914, where:
the receiving module 1904 is further configured to receive modified text synchronized by the server together with the corresponding sort sequence number, the received modified text sharing the sort sequence number with the corresponding pre-modification text;
the lookup module 1912 is configured to look up locally the text corresponding to the sort sequence number;
and the replacement module 1914 is configured to replace the locally found text with the received modified text.
All of the above optional technical solutions may be combined in any manner to form optional embodiments of this application.
FIG. 21 is a diagram of the internal structure of a computer device in one embodiment. The computer device may be the server 120 in FIG. 1. As shown in FIG. 21, the computer device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the data processing method based on simultaneous interpretation. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the data processing method based on simultaneous interpretation.
Those skilled in the art can understand that the structure shown in FIG. 21 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different component arrangement.
In one embodiment, the data processing apparatus based on simultaneous interpretation provided by this application may be implemented in the form of a computer program that can run on the computer device shown in FIG. 21. The memory of the computer device may store the program modules that make up the apparatus, such as the obtaining module 1702, processing module 1704, sending module 1706, receiving module 1708, determining module 1712, and updating module 1710 shown in FIG. 17. The computer program composed of these program modules causes the processor to perform the steps of the data processing methods based on simultaneous interpretation of the embodiments of this application described in this specification.
For example, the computer device shown in FIG. 21 may perform S202 through the obtaining module 1702 of the apparatus shown in FIG. 17, S204 through the processing module 1704, S206 through the sending module 1706, S208 through the receiving module 1708, and S210 through the updating module 1710.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor of the computer device, enables the processor to perform the aforementioned data processing method based on simultaneous interpretation performed by the server 120 in FIG. 1.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by the processor of a computer device, enables the processor to perform the aforementioned data processing method based on simultaneous interpretation performed by the server 120 in FIG. 1.
FIG. 22 is a diagram of the internal structure of a computer device in one embodiment. The computer device may be the user terminal 110 in FIG. 1. As shown in FIG. 22, the computer device includes a processor, a memory, a network interface, an input apparatus, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the data processing method based on simultaneous interpretation. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the data processing method based on simultaneous interpretation. The display screen of the computer device may be a liquid crystal display or an e-ink display; the input apparatus may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the device housing, or an external keyboard, touchpad, or mouse.
Those skilled in the art can understand that the structure shown in FIG. 22 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different component arrangement.
In one embodiment, the data processing apparatus based on simultaneous interpretation provided by this application may be implemented in the form of a computer program that can run on the computer device shown in FIG. 22. The memory of the computer device may store the program modules that make up the apparatus, such as the first display module 1902, receiving module 1904, second display module 1906, obtaining module 1908, and sending module 1910 shown in FIG. 19. The computer program composed of these program modules causes the processor to perform the steps of the data processing methods based on simultaneous interpretation of the embodiments of this application described in this specification.
For example, the computer device shown in FIG. 22 may perform S802 through the first display module 1902 of the apparatus shown in FIG. 19, S804 through the receiving module 1904, S806 through the second display module 1906, S808 through the obtaining module 1908, and S810 through the sending module 1910.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor of the computer device, enables the processor to perform the aforementioned data processing method based on simultaneous interpretation performed by the user terminal 110 in FIG. 1.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by the processor of a computer device, enables the processor to perform the aforementioned data processing method based on simultaneous interpretation performed by the user terminal 110 in FIG. 1.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be accomplished by a computer program instructing related hardware. The program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as the combinations of these technical features contain no contradiction, they shall all be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they shall not therefore be construed as limiting the patent scope of this application. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

  1. A data processing method based on simultaneous interpretation, the method being applied to a server in a simultaneous interpretation system, the simultaneous interpretation system further comprising a simultaneous interpretation device and a user terminal, the method comprising:
    obtaining audio sent by the simultaneous interpretation device;
    processing the audio through a simultaneous interpretation model to obtain initial text;
    sending the initial text to the user terminal;
    receiving modified text fed back by the user terminal, the modified text being obtained by the user terminal modifying the initial text; and
    updating the simultaneous interpretation model according to the initial text and the modified text.
  2. The method according to claim 1, wherein the processing the audio through a simultaneous interpretation model to obtain initial text comprises:
    performing noise reduction on the audio;
    obtaining a speech portion included in the noise-reduced audio;
    obtaining, from the speech portion, an audio portion whose energy value is greater than or equal to an energy threshold; and
    processing the audio portion through the simultaneous interpretation model to obtain the initial text.
  3. The method according to claim 1, wherein the simultaneous interpretation model comprises a general speech model and an auxiliary speech model;
    the processing the audio through a simultaneous interpretation model to obtain initial text comprises:
    performing speech recognition on the audio through the general speech model to obtain recognized text; and
    updating the recognized text through the auxiliary speech model to obtain recognition-updated text;
    wherein the initial text comprises at least one of the recognized text and the recognition-updated text; and
    the updating the simultaneous interpretation model according to the initial text and the modified text comprises:
    updating the auxiliary speech model according to the initial text and the modified text.
  4. The method according to claim 1, wherein the simultaneous interpretation model comprises a translation model, the initial text comprises translated text, and the modified text comprises modified translated text; and
    the updating the simultaneous interpretation model according to the initial text and the modified text comprises:
    updating the translation model according to the translated text and the modified translated text.
  5. The method according to claim 1, further comprising:
    receiving, from the simultaneous interpretation device, video matching the audio; and
    embedding the initial text into the video;
    wherein the sending the initial text to the user terminal comprises:
    sending the video with the embedded initial text to the user terminal.
  6. The method according to claim 1, wherein the audio corresponds to a group identifier;
    the sending the initial text to the user terminal comprises: sending the initial text to the user terminal connected through the group identifier; and
    the method further comprises:
    receiving comment information fed back by the user terminal; and
    synchronizing the comment information among the user terminals connected through the group identifier.
  7. The method according to claim 1, wherein the audio corresponds to a group identifier;
    the sending the initial text to the user terminal comprises: sending the initial text to the user terminal connected through the group identifier; and
    the method further comprises:
    storing the initial text in correspondence with the group identifier;
    updating, when a weighted cumulative value of the modified text reaches a threshold, the text stored in correspondence with the group identifier to the modified text; and
    feeding back, upon receiving a synchronization request sent by a user terminal connected through the group identifier, the updated text corresponding to the group identifier to the user terminal that initiated the synchronization request.
  8. The method according to any one of claims 1 to 7, further comprising:
    counting a text modification count corresponding to each user terminal identifier;
    detecting a text modification accuracy corresponding to each user terminal identifier; and
    raising, for any user terminal identifier, the weight corresponding to that user terminal identifier when the text modification count reaches a modification count threshold and the text modification accuracy reaches a text modification accuracy threshold.
  9. The method according to any one of claims 1 to 7, wherein the updating the simultaneous interpretation model according to the initial text and the modified text comprises:
    determining a weighted cumulative value of the modified text according to weights corresponding to the user terminal identifiers; and
    updating the simultaneous interpretation model according to the initial text and the modified text when the weighted cumulative value reaches a threshold.
  10. A data processing method based on simultaneous interpretation, the method being applied to a user terminal in a simultaneous interpretation system, the simultaneous interpretation system further comprising a simultaneous interpretation device and a server, the method comprising:
    displaying a simultaneous interpretation auxiliary page;
    receiving initial text sent by the server, the initial text being obtained by the server by processing, through a simultaneous interpretation model, audio sent by the simultaneous interpretation device;
    displaying the initial text on the simultaneous interpretation auxiliary page;
    obtaining, when a modification instruction is detected, modified text corresponding to the initial text; and
    sending the modified text to the server, the modified text instructing the server to update the simultaneous interpretation model according to the initial text and the modified text.
  11. The method according to claim 10, wherein the displaying a simultaneous interpretation auxiliary page comprises:
    obtaining a child application identifier through a parent application;
    obtaining a corresponding simultaneous interpretation auxiliary page configuration file according to the child application identifier;
    obtaining a common component identifier from the simultaneous interpretation auxiliary page configuration file;
    selecting, from a common component library provided by the parent application, a common component corresponding to the common component identifier; and
    building the simultaneous interpretation auxiliary page from the selected common component.
  12. The method according to claim 10 or 11, further comprising:
    receiving modified text synchronized by the server and a corresponding sort sequence number, the received modified text sharing the sort sequence number with the corresponding pre-modification text;
    looking up locally the text corresponding to the sort sequence number; and
    replacing the locally found text with the received modified text.
  13. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
    obtaining audio sent by a simultaneous interpretation device;
    processing the audio through a simultaneous interpretation model to obtain initial text;
    sending the initial text to a user terminal;
    receiving modified text fed back by the user terminal, the modified text being obtained by the user terminal modifying the initial text; and
    updating the simultaneous interpretation model according to the initial text and the modified text.
  14. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
    displaying a simultaneous interpretation auxiliary page;
    receiving initial text sent by a server, the initial text being obtained by the server by processing, through a simultaneous interpretation model, audio sent by a simultaneous interpretation device;
    displaying the initial text on the simultaneous interpretation auxiliary page;
    obtaining, when a modification instruction is detected, modified text corresponding to the initial text; and
    sending the modified text to the server, the modified text instructing the server to update the simultaneous interpretation model according to the initial text and the modified text.
  15. A storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 10, or causes the processor to perform the steps of the method according to claim 11 or 12.
PCT/CN2019/080027 2018-05-10 2019-03-28 基于同声传译的数据处理方法、计算机设备和存储介质 WO2019214359A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19799122.7A EP3792916B1 (en) 2018-05-10 2019-03-28 Data processing method based on simultaneous interpretation, computer device, and storage medium
US16/941,503 US20200357389A1 (en) 2018-05-10 2020-07-28 Data processing method based on simultaneous interpretation, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810443090.X 2018-05-10
CN201810443090.XA CN108615527B (zh) 2018-05-10 2018-05-10 基于同声传译的数据处理方法、装置和存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/941,503 Continuation US20200357389A1 (en) 2018-05-10 2020-07-28 Data processing method based on simultaneous interpretation, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2019214359A1 true WO2019214359A1 (zh) 2019-11-14

Family

ID=63662720

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/080027 WO2019214359A1 (zh) 2018-05-10 2019-03-28 基于同声传译的数据处理方法、计算机设备和存储介质

Country Status (4)

Country Link
US (1) US20200357389A1 (zh)
EP (1) EP3792916B1 (zh)
CN (3) CN110444196B (zh)
WO (1) WO2019214359A1 (zh)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108196930B (zh) * 2018-01-18 2020-04-03 腾讯科技(深圳)有限公司 应用程序处理方法、装置、存储介质和计算机设备
CN110444196B (zh) * 2018-05-10 2023-04-07 腾讯科技(北京)有限公司 基于同声传译的数据处理方法、装置、系统和存储介质
CN111107380B (zh) * 2018-10-10 2023-08-15 北京默契破冰科技有限公司 一种用于管理音频数据的方法、设备和计算机存储介质
CN111031329B (zh) * 2018-10-10 2023-08-15 北京默契破冰科技有限公司 一种用于管理音频数据的方法、设备和计算机存储介质
CN111083421A (zh) * 2018-10-19 2020-04-28 珠海金山办公软件有限公司 一种表格文档展示方法及装置
CN109561081B (zh) * 2018-11-13 2023-04-07 平安科技(深圳)有限公司 移动终端视频会议方法、装置及存储介质、服务器
CN110381388B (zh) * 2018-11-14 2021-04-13 腾讯科技(深圳)有限公司 一种基于人工智能的字幕生成方法和装置
CN111506278A (zh) * 2019-01-30 2020-08-07 阿里巴巴集团控股有限公司 数据同传的方法、音频翻译的方法、装置和系统
CN111508484B (zh) * 2019-01-31 2024-04-19 阿里巴巴集团控股有限公司 语音数据的处理方法及装置
CN110047488B (zh) * 2019-03-01 2022-04-12 北京彩云环太平洋科技有限公司 语音翻译方法、装置、设备及控制设备
CN110085256B (zh) * 2019-03-21 2021-11-19 视联动力信息技术股份有限公司 信息处理方法和装置
CN110059313B (zh) * 2019-04-03 2021-02-12 百度在线网络技术(北京)有限公司 翻译处理方法和装置
CN110401889A (zh) * 2019-08-05 2019-11-01 深圳市小瑞科技股份有限公司 基于usb控制的多路蓝牙麦克风系统和使用方法
CN114223029A (zh) * 2019-08-13 2022-03-22 三星电子株式会社 支持装置进行语音识别的服务器及服务器的操作方法
CN111177353B (zh) * 2019-12-27 2023-06-09 赣州得辉达科技有限公司 文本记录生成方法、装置、计算机设备及存储介质
CN111526133B (zh) * 2020-04-10 2022-02-25 阿卡都(北京)科技有限公司 远程同传系统中展示译员信息的方法
CN113628626A (zh) * 2020-05-09 2021-11-09 阿里巴巴集团控股有限公司 语音识别方法、装置和系统以及翻译方法和系统
CN111639503B (zh) * 2020-05-22 2021-10-26 腾讯科技(深圳)有限公司 会议数据处理方法、装置、存储介质及设备
US11818373B1 (en) * 2020-09-08 2023-11-14 Block, Inc. Machine-learning based data compression for streaming media
CN114338643A (zh) * 2020-09-25 2022-04-12 北京有竹居网络技术有限公司 一种数据处理方法、装置、客户端、服务端及存储介质
CN112241632A (zh) * 2020-10-14 2021-01-19 国家电网有限公司 一种基于语音ai智能会议系统及其实现方法
CN112232092A (zh) * 2020-10-15 2021-01-15 安徽听见科技有限公司 具备机器与人工协同模式的同声传译方法以及系统
CN112164392A (zh) * 2020-11-13 2021-01-01 北京百度网讯科技有限公司 确定显示的识别文本的方法、装置、设备以及存储介质
CN112599130B (zh) * 2020-12-03 2022-08-19 安徽宝信信息科技有限公司 一种基于智慧屏的智能会议系统
CN112601102A (zh) * 2020-12-11 2021-04-02 北京有竹居网络技术有限公司 同声传译字幕的确定方法、装置、电子设备及存储介质
CN112601101B (zh) * 2020-12-11 2023-02-24 北京有竹居网络技术有限公司 一种字幕显示方法、装置、电子设备及存储介质
CN112580371A (zh) * 2020-12-25 2021-03-30 江苏鑫盛通讯科技有限公司 一种基于人工智能的人机耦合客服系统及方法
CN113689862B (zh) * 2021-08-23 2024-03-22 南京优飞保科信息技术有限公司 一种客服坐席语音数据的质检方法和系统
CN113891168B (zh) * 2021-10-19 2023-12-19 北京有竹居网络技术有限公司 字幕处理方法、装置、电子设备和存储介质
CN116384418B (zh) * 2023-05-24 2023-08-15 深圳市微克科技有限公司 一种应用智能手表进行翻译的数据处理方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159870A (zh) * 2015-06-26 2015-12-16 徐信 一种精准完成连续自然语音文本化的处理系统及方法
CN106486125A (zh) * 2016-09-29 2017-03-08 安徽声讯信息技术有限公司 一种基于语音识别技术的同声传译系统
CN107660303A (zh) * 2015-06-26 2018-02-02 英特尔公司 使用远程源对本地语音识别系统的语言模型修改
CN107678561A (zh) * 2017-09-29 2018-02-09 百度在线网络技术(北京)有限公司 基于人工智能的语音输入纠错方法及装置
CN108615527A (zh) * 2018-05-10 2018-10-02 腾讯科技(深圳)有限公司 基于同声传译的数据处理方法、装置和存储介质

Family Cites Families (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19910236A1 (de) * 1999-03-09 2000-09-21 Philips Corp Intellectual Pty Verfahren zur Spracherkennung
US6529866B1 (en) * 1999-11-24 2003-03-04 The United States Of America As Represented By The Secretary Of The Navy Speech recognition system and associated methods
US9818136B1 (en) * 2003-02-05 2017-11-14 Steven M. Hoffberg System and method for determining contingent relevance
US8204884B2 (en) * 2004-07-14 2012-06-19 Nice Systems Ltd. Method, apparatus and system for capturing and analyzing interaction based content
US8249854B2 (en) * 2005-05-26 2012-08-21 Microsoft Corporation Integrated native language translation
CN2884704Y (zh) * 2005-11-30 2007-03-28 刘永权 实时通讯翻译装置
US8407052B2 (en) * 2006-04-17 2013-03-26 Vovision, Llc Methods and systems for correcting transcribed audio files
US8972268B2 (en) * 2008-04-15 2015-03-03 Facebook, Inc. Enhanced speech-to-speech translation system and methods for adding a new word
CN101458681A (zh) * 2007-12-10 2009-06-17 株式会社东芝 语音翻译方法和语音翻译装置
CN101697581B (zh) * 2009-10-26 2012-11-21 华为终端有限公司 支持同声传译视讯会议的方法、装置及系统
CN102360347A (zh) * 2011-09-30 2012-02-22 宇龙计算机通信科技(深圳)有限公司 一种语音翻译方法、系统及语音翻译服务器
CN103885783A (zh) * 2014-04-03 2014-06-25 深圳市三脚蛙科技有限公司 一种应用程序的语音控制方法及装置
CN103929666B (zh) * 2014-04-14 2017-11-03 深圳情景智能有限公司 一种连续语音交互方法及装置
US20160026730A1 (en) * 2014-07-23 2016-01-28 Russell Hasan Html5-based document format with parts architecture
CN104462186A (zh) * 2014-10-17 2015-03-25 百度在线网络技术(北京)有限公司 一种语音搜索方法及装置
CN105589850A (zh) * 2014-10-21 2016-05-18 青岛鑫益发工贸有限公司 阅读翻译器
US9697201B2 (en) * 2014-11-24 2017-07-04 Microsoft Technology Licensing, Llc Adapting machine translation data using damaging channel model
KR20160081244A (ko) * 2014-12-31 2016-07-08 한국전자통신연구원 자동 통역 시스템 및 이의 동작 방법
US9953073B2 (en) * 2015-05-18 2018-04-24 Oath Inc. System and method for editing dynamically aggregated data
KR102195627B1 (ko) * 2015-11-17 2020-12-28 삼성전자주식회사 통역 모델 생성 장치 및 방법과, 자동 통역 장치 및 방법
CN105512113B (zh) * 2015-12-04 2019-03-19 青岛冠义科技有限公司 交流式语音翻译系统及翻译方法
CN105551488A (zh) * 2015-12-15 2016-05-04 深圳Tcl数字技术有限公司 语音控制方法及系统
US10418026B2 (en) * 2016-07-15 2019-09-17 Comcast Cable Communications, Llc Dynamic language and command recognition
RU2626657C1 (ru) * 2016-11-01 2017-07-31 Общество с ограниченной ответственностью "Аби Девелопмент" Определение последовательности команд вывода текста в pdf документах
CN107046523A (zh) * 2016-11-22 2017-08-15 深圳大学 一种基于个人移动终端的同声传译方法及客户端
US10498898B2 (en) * 2017-12-13 2019-12-03 Genesys Telecommunications Laboratories, Inc. Systems and methods for chatbot generation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159870A (zh) * 2015-06-26 2015-12-16 徐信 一种精准完成连续自然语音文本化的处理系统及方法
CN107660303A (zh) * 2015-06-26 2018-02-02 英特尔公司 使用远程源对本地语音识别系统的语言模型修改
CN106486125A (zh) * 2016-09-29 2017-03-08 安徽声讯信息技术有限公司 一种基于语音识别技术的同声传译系统
CN107678561A (zh) * 2017-09-29 2018-02-09 百度在线网络技术(北京)有限公司 基于人工智能的语音输入纠错方法及装置
CN108615527A (zh) * 2018-05-10 2018-10-02 腾讯科技(深圳)有限公司 基于同声传译的数据处理方法、装置和存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3792916A4

Also Published As

Publication number Publication date
CN110444196A (zh) 2019-11-12
CN108615527B (zh) 2021-10-15
EP3792916A4 (en) 2021-06-30
CN110444196B (zh) 2023-04-07
EP3792916C0 (en) 2023-07-12
EP3792916A1 (en) 2021-03-17
CN108615527A (zh) 2018-10-02
US20200357389A1 (en) 2020-11-12
EP3792916B1 (en) 2023-07-12
CN110444197A (zh) 2019-11-12
CN110444197B (zh) 2023-01-03

Similar Documents

Publication Publication Date Title
WO2019214359A1 (zh) 基于同声传译的数据处理方法、计算机设备和存储介质
US11315546B2 (en) Computerized system and method for formatted transcription of multimedia content
US9530415B2 (en) System and method of providing speech processing in user interface
US11917344B2 (en) Interactive information processing method, device and medium
KR101027548B1 (ko) 통신 시스템용 보이스 브라우저 다이얼로그 인에이블러
US20100100371A1 (en) Method, System, and Apparatus for Message Generation
EP2455936B1 (en) Speech translation system, dictionary server, and program
JP2017084366A (ja) メッセージ提供方法、メッセージ提供装置、表示制御方法、表示制御装置及びコンピュータプログラム
KR100451260B1 (ko) 제한된 처리 능력을 갖는 장치들에 대체 입력 장치로서 연속 스피치 인식을 제공하는 방법, 장치 및 제조품
US11869508B2 (en) Systems and methods for capturing, processing, and rendering one or more context-aware moment-associating elements
CN109782997B (zh) 一种数据处理方法、装置及存储介质
TWI399739B (zh) 語音留言與傳達之系統與方法
CN114064943A (zh) 会议管理方法、装置、存储介质及电子设备
US20230326369A1 (en) Method and apparatus for generating sign language video, computer device, and storage medium
WO2021057957A1 (zh) 视频通话方法、装置、计算机设备和存储介质
KR101351264B1 (ko) 음성인식 기반의 메시징 통역서비스 제공 시스템 및 그 방법
JPWO2018043137A1 (ja) 情報処理装置及び情報処理方法
CN110992960A (zh) 控制方法、装置、电子设备和存储介质
CN111968630B (zh) 信息处理方法、装置和电子设备
KR102127909B1 (ko) 채팅 서비스 제공 시스템, 이를 위한 장치 및 방법
JP2016024378A (ja) 情報処理装置、その制御方法及びプログラム
US11830120B2 (en) Speech image providing method and computing device for performing the same
KR102248701B1 (ko) 다국어 음성 자동 통역 채팅시 통역의 시점과 종점과 소정 정보 제공을 소정의 음성으로 제어하는 방법
US20240087574A1 (en) Systems and methods for capturing, processing, and rendering one or more context-aware moment-associating elements
US20240135949A1 (en) Joint Acoustic Echo Cancellation (AEC) and Personalized Noise Suppression (PNS)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19799122

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2019799122

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2019799122

Country of ref document: EP

Effective date: 20201210