CN113312928A - Text translation method and device, electronic equipment and storage medium

Text translation method and device, electronic equipment and storage medium

Info

Publication number
CN113312928A
Authority
CN
China
Prior art keywords
translation
voice data
hotword
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110610695.5A
Other languages
Chinese (zh)
Inventor
徐文铭 (Xu Wenming)
韩晓 (Han Xiao)
杜春赛 (Du Chunsai)
陈可蓉 (Chen Kerong)
杨晶生 (Yang Jingsheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202110610695.5A
Publication of CN113312928A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The disclosure provides a text translation method, a text translation device, an electronic device, and a storage medium. One embodiment of the method comprises: acquiring a target recognition text and a hotword corresponding to continuous voice data, wherein the hotword is associated with the continuous voice data; determining a hotword translation of the hotword according to scene information of the continuous voice data; and translating the target recognition text based on the hotword translation. By incorporating the scene information of the continuous voice data, the method can reasonably and accurately determine the correct translation of the target recognition text, which helps improve the accuracy of the translation result.

Description

Text translation method and device, electronic equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to the field of translation technology, and in particular to a text translation method and device, an electronic device, and a storage medium.
Background
Machine translation, also known as automatic translation, is the process of using a computer to convert one natural language (the source language) into another (the target language). The field has evolved from early dictionary matching, through rule-based translation combining dictionaries with expert linguistic knowledge, to corpus-based statistical machine translation. With the growth of computing power and the explosive increase in multilingual information, machine translation technology has begun to provide convenient, real-time translation services to ordinary users.
However, with existing machine translation technology, the translation error rate for certain words (such as person names and scene-specific entity words) is high. For example, if a person's name is "X Climb", a translation engine performing Chinese-to-English translation may render it literally as "X Climbing", causing a translation error.
Therefore, it is necessary to provide a new technical solution for performing machine translation.
Disclosure of Invention
The embodiment of the disclosure provides a text translation method and device, electronic equipment and a storage medium.
In a first aspect, the present disclosure provides a text translation method, including:
acquiring a target recognition text and a hotword corresponding to continuous voice data, wherein the hotword is associated with the continuous voice data;
determining a hotword translation of the hotword according to the scene information of the continuous voice data;
and translating the target recognition text based on the hotword translation.
In some optional embodiments, the continuous voice data is continuous voice data of a speaking user in an audio-video conference.
In some optional embodiments, the continuous voice data is obtained by a single invocation of an automatic speech recognition service.
In some optional embodiments, the hotwords corresponding to the continuous voice data include person names and/or high-frequency words of the audio-video conference.
In some optional embodiments, the scene information of the continuous voice data includes conference description information, conference participant information, and/or conference transcription information of the audio-video conference.
In some optional embodiments, the determining a hotword translation of the hotword according to the scene information of the continuous speech data includes:
determining a hotword translation of a person name in the audio-video conference according to the conference participant information of the audio-video conference; and/or
determining a hotword translation of a high-frequency word of the audio-video conference according to the conference description information or the conference transcription information of the audio-video conference.
In some optional embodiments, the translating the target recognition text based on the hotword translation includes:
for each hotword corresponding to the continuous voice data, searching the target recognition text for a target word consistent with the hotword;
and in response to finding the target word, determining the hotword translation corresponding to the hotword as the target translation corresponding to the target word.
In some optional embodiments, the hotword translation is stored in memory.
In a second aspect, the present disclosure provides a text translation apparatus, including:
an acquisition unit, configured to acquire a target recognition text and a hotword corresponding to continuous voice data, wherein the hotword is associated with the continuous voice data;
a determining unit, configured to determine a hotword translation of the hotword according to scene information of the continuous voice data;
and a translation unit, configured to translate the target recognition text based on the hotword translation.
In some optional embodiments, the continuous voice data is continuous voice data of a speaking user in an audio-video conference.
In some optional embodiments, the continuous voice data is obtained by a single invocation of an automatic speech recognition service.
In some optional embodiments, the hotwords corresponding to the continuous voice data include person names and/or high-frequency words of the audio-video conference.
In some optional embodiments, the scene information of the continuous voice data includes conference description information, conference participant information, and/or conference transcription information of the audio-video conference.
In some optional embodiments, the determining unit is further configured to:
determining a hotword translation of a person name in the audio-video conference according to the conference participant information of the audio-video conference; and/or
determining a hotword translation of a high-frequency word of the audio-video conference according to the conference description information or the conference transcription information of the audio-video conference.
In some optional embodiments, the translation unit is further configured to:
for each hotword corresponding to the continuous voice data, searching the target recognition text for a target word consistent with the hotword;
and in response to finding the target word, determining the hotword translation corresponding to the hotword as the target translation corresponding to the target word.
In some optional embodiments, the hotword translation is stored in memory.
In a third aspect, the present disclosure provides an electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any embodiment of the first aspect of the disclosure.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by one or more processors, implements the method as described in any one of the embodiments of the first aspect of the present disclosure.
With the text translation method and device, electronic device, and storage medium provided by the embodiments of the present disclosure, the hotword translation of a hotword is determined according to the scene information of the continuous voice data, and the target recognition text is translated on that basis. By incorporating the scene information of the continuous voice data, the correct translation of the target recognition text can be determined reasonably and accurately, which helps improve the accuracy of the translation result.
Drawings
Other features, objects, and advantages of the disclosure will become apparent from a reading of the following detailed description of non-limiting embodiments which proceeds with reference to the accompanying drawings. The drawings are only for purposes of illustrating the particular embodiments and are not to be construed as limiting the invention. In the drawings:
FIG. 1 is a system architecture diagram of one embodiment of a translation system according to the present disclosure;
FIG. 2 is a flow diagram for one embodiment of a text translation method according to the present disclosure;
FIG. 3 is a schematic diagram of one example of a text translation method according to the present disclosure;
FIG. 4 is a schematic block diagram of one embodiment of a text translation device according to the present disclosure;
FIG. 5 is a schematic block diagram of a computer system suitable for implementing an electronic device of embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the text translation methods, apparatus, terminal devices, and storage media of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a voice interaction application, a video conference application, a short video social application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a microphone and a speaker, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), portable computers, desktop computers, and the like. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide translation services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, such as a background server providing processing services for audio signals captured on the terminal devices 101, 102, 103. The background server can perform corresponding processing on the received audio signals and the like.
In some cases, the text translation method provided by the present disclosure may be executed jointly by the terminal devices 101, 102, 103 and the server 105; for example, the step of "determining a hotword translation of a hotword according to scene information of continuous voice data" may be executed by the server 105, and the step of "translating a target recognition text based on the hotword translation" may be executed by the terminal devices 101, 102, 103. The present disclosure is not limited thereto. Accordingly, parts of the text translation apparatus may be disposed in the terminal devices 101, 102, and 103 and in the server 105, respectively.
In some cases, the text translation method provided by the present disclosure may be executed by the terminal devices 101, 102, and 103, and accordingly, the text translation apparatus may also be disposed in the terminal devices 101, 102, and 103, in this case, the system architecture 100 may not include the server 105.
In some cases, the text translation method provided by the present disclosure may be executed by the server 105, and accordingly, the text translation apparatus may also be disposed in the server 105, and in this case, the system architecture 100 may also not include the terminal devices 101, 102, and 103.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a text translation method according to the present disclosure is shown, applied to a terminal device or the server in fig. 1. The flow 200 includes the following steps:
step 201, a target recognition text and a hotword corresponding to a continuous voice data are obtained, wherein the hotword is associated with the continuous voice data.
In this embodiment, the continuous voice data may be a single continuous speech passage, or a whole formed by multiple speech passages separated by short intervals. For example, several speech passages spaced less than 5 seconds apart from each other may form one piece of continuous voice data.
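As an illustration only (the disclosure does not prescribe any data structures), the grouping rule above can be sketched in Python; the SpeechSegment type, its timestamp fields, and the 5-second threshold default are assumptions for this sketch:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class SpeechSegment:          # hypothetical segment type
        start: float              # segment start time, in seconds
        end: float                # segment end time, in seconds

    def group_continuous_voice_data(segments: List[SpeechSegment],
                                    max_gap: float = 5.0) -> List[List[SpeechSegment]]:
        # Sort by start time, then close the current group whenever the
        # silence between consecutive segments reaches max_gap seconds.
        groups, current = [], []
        for seg in sorted(segments, key=lambda s: s.start):
            if current and seg.start - current[-1].end >= max_gap:
                groups.append(current)
                current = []
            current.append(seg)
        if current:
            groups.append(current)
        return groups

Each returned group then corresponds to one piece of continuous voice data.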
In one example, the continuous voice data may be continuous voice data of a speaking user in an audio-video conference. Here, the speaking user's continuous voice data may be a single continuous speech passage formed when the user speaks alone, or a whole composed of multiple closely spaced speech passages formed when the user converses with other users.
In this embodiment, the target recognition text is the text to be translated. It can be obtained by recognizing the continuous voice data through speech recognition technology. Speech recognition technology can apply language pattern recognition and autonomous learning to perform centralized analysis of the sound signals generated by various services, thereby providing an efficient speech-to-text transcription service. A speech recognition system can comprise three basic parts, feature extraction, pattern matching, and a reference model library, and operates in two stages, training and recognition: feature parameters of the recognition content are first trained to obtain reference templates, and a test template is then matched against the existing reference templates through a recognition decision to find the best-matching reference template, thereby forming the speech recognition result.
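As a toy illustration of the template-matching stage just described (real recognizers are far more involved), the following sketch assumes feature extraction has already produced fixed-shape feature vectors and that the reference model library is a label-to-template mapping:

    import numpy as np

    def match_reference_template(test_features: np.ndarray,
                                 reference_templates: dict) -> str:
        # Recognition decision: pick the reference template closest to the
        # test template; the label of the best match forms the result.
        best_label, best_dist = None, float("inf")
        for label, template in reference_templates.items():
            dist = float(np.linalg.norm(test_features - template))
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label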
In one example, speech recognition may be performed on the continuous voice data of an audio-video conference to obtain a speech recognition text, which is used as the target recognition text. Here, the target recognition text may be the subtitle text of the conference or the conference record text.
In this embodiment, a hotword is a word related to the source scene of the target recognition text. For example, when the source scene of the target recognition text is a conference, the hotwords corresponding to the target recognition text may be participant names, words in the conference title, words in the conference subtitles, words in the conference record, and the like. As another example, when the source scene of the target recognition text is a lecture, the hotwords may be the speaker's name, words in the lecture title, and the like.
In this embodiment, a hotword may be in any language, such as Chinese or English; the present disclosure does not limit this.
In one example, the hotwords corresponding to the continuous voice data may include person names and/or high-frequency words of the audio-video conference. Person names of the audio-video conference include names of the participants, names in the participants' electronic address books, and the like. High-frequency words of the audio-video conference may be words that actually occur frequently in the conference, words that are likely to occur in the conference, or words from the conference title, subtitles, or record, such as "investment" or "VC" (Video Conference).
In the example of an audio-video conference, one conference may correspond to multiple pieces of continuous voice data, each having its own corresponding set of hotwords.
In one example, voice data may be converted into recognized text by calling a preset automatic speech recognition (ASR) service. The executing body of the text translation method in this embodiment may establish a session with the ASR service engine. While voice data is continuously generated and transmitted, the session between the executing body and the ASR service engine is maintained; if the voice data is interrupted for more than a certain period of time, the session is disconnected. In other words, each session established between the executing body and the ASR service engine corresponds to a single invocation of the ASR service, during which one piece of continuous voice data is formed and the corresponding target recognition text is obtained.
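The session lifecycle described above might be sketched as follows; the AsrSessionManager class and the engine methods open_session(), send(), result(), and close() are hypothetical names for illustration, not an actual ASR service API:

    import time

    class AsrSessionManager:
        def __init__(self, asr_engine, idle_timeout=5.0):
            self.asr_engine = asr_engine
            self.idle_timeout = idle_timeout  # seconds of silence before disconnect
            self.session = None
            self.last_audio_at = 0.0

        def feed_audio(self, chunk):
            now = time.monotonic()
            # Disconnect if speech was interrupted longer than the timeout.
            if self.session and now - self.last_audio_at >= self.idle_timeout:
                self.session.close()
                self.session = None
            if self.session is None:
                # A new session is one invocation of the ASR service, and
                # thus one piece of continuous voice data.
                self.session = self.asr_engine.open_session()
            self.session.send(chunk)
            self.last_audio_at = now

        def finish(self):
            # Close the session and return the target recognition text.
            if self.session is None:
                return ""
            text = self.session.result()
            self.session.close()
            self.session = None
            return text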
The executing body may also obtain the hotwords corresponding to the continuous voice data by calling a preset hotword service.
Step 202, determining a hotword translation of the hotword according to the scene information of the continuous voice data.
In this embodiment, the scene information is information related to the source scene of the continuous voice data. When the continuous voice data comes from an audio-video conference, the relevant scene information may include conference description information (e.g., the conference title or a conference summary), conference participant information (e.g., user information of the participants), conference transcription information (e.g., the conference record or subtitles of the current conference), and the like.
In this embodiment, a hotword translation of a hotword may be determined according to scene information of continuous speech data.
According to the conference participant information of the audio-video conference, the hotword translation of a participant's name can be determined. For example, the hotword "X Climb" may be judged to be a participant's name based on the participant information. If an English name such as "Leo" exists in that participant's user information, the hotword translation of "X Climb" may be determined to be "Leo"; if no English name exists in the related user information, the corresponding hotword translation can be determined according to name translation rules to be "Panding X" rather than "X Climbing".
A hotword translation of a high-frequency word of the audio-video conference can be determined according to the conference description information or the conference transcription information. For the hotword "VC", for example, it may be determined from the conference description information (e.g., "Video Conference" appears in the conference title) that it should be translated as "Video Conference" rather than as "Venture Capital", its usual translation.
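Putting the two rules together, a minimal sketch of hotword-translation determination might look like the following; the scene dictionary keys, romanize(), and generic_lookup() are illustrative assumptions, not interfaces defined by the disclosure:

    def romanize(name: str) -> str:
        # Placeholder for a name-translation rule (e.g., pinyin romanization).
        return name

    def generic_lookup(word: str) -> str:
        # Placeholder for an ordinary bilingual-dictionary lookup.
        return word

    def determine_hotword_translations(hotwords, scene):
        # scene: assumed dict with 'participants' (list of dicts that may hold
        # 'name' and 'english_name') and 'title' (conference description).
        names = {p.get("name"): p for p in scene.get("participants", [])}
        vocabulary = {}
        for word in hotwords:
            if word in names:
                # Person name: prefer an English name from the user profile,
                # otherwise fall back to the name-translation rule.
                vocabulary[word] = names[word].get("english_name") or romanize(word)
            elif word == "VC" and "video conference" in scene.get("title", "").lower():
                # Disambiguate via the conference description, as in the
                # example above: "Video Conference", not "Venture Capital".
                vocabulary[word] = "Video Conference"
            else:
                vocabulary[word] = generic_lookup(word)
        return vocabulary

The returned dictionary plays the role of the translation vocabulary discussed below.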
In one example, the hotword translation corresponding to each hotword may be stored in memory. For example, the hotwords and their corresponding hotword translations may be assembled into a translation vocabulary, and the translation vocabulary stored in memory. This reduces the local storage footprint of the executing body on the one hand, and on the other hand improves the access speed of hotword translations and thus the translation speed.
In one example, for each piece of continuous voice data formed by a single invocation of the automatic speech recognition service, a corresponding translation vocabulary may be formed and used to translate the corresponding target recognition text. Over the course of an audio-video conference, the executing body invokes the automatic speech recognition service multiple times, and each invocation forms its own translation vocabulary for translating the corresponding target recognition text. The translation vocabulary can change as the audio-video conference proceeds; that is, the translation vocabulary (including the hotwords and hotword translations) is real-time and dynamic. Continuous voice data formed by one invocation of the automatic speech recognition service can be translated using the translation vocabulary formed for that invocation. Compared with a fixed translation vocabulary, this keeps the information used for translation targeted, which helps improve the accuracy of the translation result.
For example, suppose a first speaking user speaks about video conferencing, forming first continuous voice data through a single invocation of the automatic speech recognition service, while a second speaking user speaks about venture capital, forming second continuous voice data through another single invocation. Suppose the hotwords corresponding to both pieces of continuous voice data include "VC". Because the two users' speech content differs, the hotword translation will differ: for the first continuous voice data, the hotword translation of "VC" may be determined to be "video conference", while for the second it may be determined to be "venture capital". Translating the first and second continuous voice data with their respective translation vocabularies yields translation results consistent with the specific content being translated. With a fixed translation vocabulary, by contrast, the same hotword would receive the same translation across different pieces of continuous voice data, and the above effect could not be achieved.
Step 203, the target recognition text is translated based on the hotword translation.
In one example, step 203 may be implemented as follows: for each hotword corresponding to the continuous voice data, search the target recognition text for a target word consistent with the hotword; in response to finding the target word, determine the hotword translation corresponding to the hotword as the target translation corresponding to the target word.
In the text translation method provided by this embodiment of the disclosure, the hotword translation of a hotword is determined according to the scene information of the continuous voice data, and the target recognition text is translated on that basis. By incorporating the scene information of the continuous voice data, the correct translation of the target recognition text can be determined reasonably and accurately, which helps improve the accuracy of the translation result.
In one example, the text translation method may further include the following step: determining, according to a preset translation algorithm, corresponding translations for the content of the target recognition text other than the target words. As shown in fig. 3, for a target word in the target recognition text, the hotword translation corresponding to the hotword may be determined as the target translation of that target word; for the remaining content, translations are determined by the preset translation algorithm. In this way, the text translation method provided by the present disclosure can be combined with existing text translation methods, handling general-purpose translation tasks while improving the accuracy of scene-related word translation, thereby compensating for the shortcomings of existing methods.
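A minimal sketch of this combination follows: target words matching hotwords take their stored hotword translations, and everything else goes through a general-purpose translator; machine_translate stands in for whatever preset translation algorithm is used and is not specified by the disclosure:

    import re

    def translate_text(target_text, vocabulary, machine_translate):
        if not vocabulary:
            return machine_translate(target_text)
        # Match longer hotwords first so overlapping hotwords resolve cleanly.
        words = sorted(vocabulary, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(w) for w in words))
        pieces, last = [], 0
        for match in pattern.finditer(target_text):
            if match.start() > last:
                # Non-hotword content is handled by the preset algorithm.
                pieces.append(machine_translate(target_text[last:match.start()]))
            # A target word was found: use the stored hotword translation.
            pieces.append(vocabulary[match.group(0)])
            last = match.end()
        if last < len(target_text):
            pieces.append(machine_translate(target_text[last:]))
        return "".join(pieces)

For instance, translate_text("VC starts at 3 pm", {"VC": "Video Conference"}, some_translator) substitutes "Video Conference" for the hotword and delegates the remaining text to some_translator.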
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a text translation apparatus, which corresponds to the method embodiment shown in fig. 2, and which can be specifically applied to various terminal devices.
As shown in fig. 4, the text translation apparatus 400 of this embodiment includes an acquisition unit 401, a determining unit 402, and a translation unit 403. The acquisition unit 401 is configured to acquire a target recognition text and a hotword corresponding to continuous voice data, where the hotword is associated with the continuous voice data. The determining unit 402 is configured to determine a hotword translation of the hotword according to the scene information of the continuous voice data. The translation unit 403 is configured to translate the target recognition text based on the hotword translation.
In this embodiment, for the specific processing of the acquisition unit 401, the determining unit 402, and the translation unit 403 of the text translation apparatus 400 and the technical effects thereof, reference may be made to the descriptions of steps 201, 202, and 203 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional embodiments, the continuous voice data may be continuous voice data of a speaking user in an audio-video conference.
In some optional embodiments, the continuous voice data may be obtained by a single invocation of an automatic speech recognition service.
In some optional embodiments, the hotwords corresponding to the continuous voice data may include person names and/or high-frequency words of the audio-video conference.
In some optional embodiments, the scene information of the continuous voice data may include conference description information, conference participant information, and/or conference transcription information of the audio-video conference.
In some optional embodiments, the determining unit 402 may be further configured to: determine a hotword translation of a person name in the audio-video conference according to the conference participant information of the audio-video conference; and/or determine a hotword translation of a high-frequency word of the audio-video conference according to the conference description information or the conference transcription information of the audio-video conference.
In some optional embodiments, the translation unit 403 may be further configured to: for each hotword corresponding to the continuous voice data, search the target recognition text for a target word consistent with the hotword; and in response to finding the target word, determine the hotword translation corresponding to the hotword as the target translation corresponding to the target word.
In some optional embodiments, the hotword translation may be stored in memory.
It should be noted that, for details of implementation and technical effects of each unit in the text translation apparatus provided in the embodiments of the present disclosure, reference may be made to descriptions of other embodiments in the present disclosure, and details are not described herein again.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing the terminal devices of the present disclosure is shown. The computer system 500 shown in fig. 5 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 5, computer system 500 may include a processing device (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage device 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the computer system 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, and the like; output devices 507 including, for example, a liquid crystal display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disks, and the like; and a communication device 509. The communication device 509 may allow the computer system 500 to communicate with other devices, wirelessly or by wire, to exchange data. While fig. 5 illustrates a computer system 500 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the text translation method as shown in the embodiment shown in fig. 2 and its alternative embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, the acquisition unit may also be described as "a unit for acquiring a target recognition text and a hotword corresponding to continuous voice data".
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to the particular combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Claims (11)

1. A method of text translation, comprising:
acquiring a target recognition text and a hotword corresponding to continuous voice data, wherein the hotword is associated with the continuous voice data;
determining a hotword translation of the hotword according to the scene information of the continuous voice data;
and translating the target recognition text based on the hotword translation.
2. The method of claim 1, wherein the continuous voice data is continuous voice data of a speaking user in an audio-video conference.
3. The method of claim 1, wherein the continuous speech data is obtained by a single invocation of an automatic speech recognition service.
4. The method according to claim 2, wherein the hotwords corresponding to the continuous voice data comprise person names and/or high-frequency words of the audio-video conference.
5. The method according to claim 4, wherein the scene information of the continuous voice data comprises conference description information, conference participant information, and/or conference transcription information of the audio-video conference.
6. The method of claim 5, wherein the determining a hotword translation of the hotword according to the scene information of the continuous voice data comprises:
determining a hotword translation of a person name in the audio-video conference according to the conference participant information of the audio-video conference; and/or
determining a hotword translation of a high-frequency word of the audio-video conference according to the conference description information or the conference transcription information of the audio-video conference.
7. The method of any of claims 1-6, wherein the translating the target recognition text based on the hotword translation comprises:
for each hotword corresponding to the continuous voice data, searching the target recognition text for a target word consistent with the hotword;
and in response to finding the target word, determining the hotword translation corresponding to the hotword as the target translation corresponding to the target word.
8. The method of any of claims 1-6, wherein the hotword translation is stored in memory.
9. A text translation apparatus comprising:
an acquisition unit, configured to acquire a target recognition text and a hotword corresponding to continuous voice data, wherein the hotword is associated with the continuous voice data;
a determining unit, configured to determine a hotword translation of the hotword according to the scene information of the continuous voice data;
and a translation unit, configured to translate the target recognition text based on the hotword translation.
10. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by one or more processors, implements the method of any one of claims 1-8.
CN202110610695.5A 2021-06-01 2021-06-01 Text translation method and device, electronic equipment and storage medium Pending CN113312928A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110610695.5A CN113312928A (en) 2021-06-01 2021-06-01 Text translation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110610695.5A CN113312928A (en) 2021-06-01 2021-06-01 Text translation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113312928A 2021-08-27

Family

ID=77376857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110610695.5A Pending CN113312928A (en) 2021-06-01 2021-06-01 Text translation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113312928A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115113787A (en) * 2022-07-05 2022-09-27 北京字跳网络技术有限公司 Message processing method, device, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986820A (en) * 2018-06-29 2018-12-11 北京百度网讯科技有限公司 For the method, apparatus of voiced translation, electronic equipment and storage medium
CN110633475A (en) * 2019-09-27 2019-12-31 安徽咪鼠科技有限公司 Natural language understanding method, device and system based on computer scene and storage medium
CN110941965A (en) * 2018-09-06 2020-03-31 重庆好德译信息技术有限公司 Instant translation system based on professional language

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986820A (en) * 2018-06-29 2018-12-11 北京百度网讯科技有限公司 For the method, apparatus of voiced translation, electronic equipment and storage medium
CN110941965A (en) * 2018-09-06 2020-03-31 重庆好德译信息技术有限公司 Instant translation system based on professional language
CN110633475A (en) * 2019-09-27 2019-12-31 安徽咪鼠科技有限公司 Natural language understanding method, device and system based on computer scene and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115113787A (en) * 2022-07-05 2022-09-27 北京字跳网络技术有限公司 Message processing method, device, equipment and medium
CN115113787B (en) * 2022-07-05 2024-04-19 北京字跳网络技术有限公司 Message processing method, device, equipment and medium


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210827)