CN109658919A - Translation method and device for multimedia files, and translation playback equipment - Google Patents
- Publication number
- CN109658919A (application CN201811543822.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/12—Formatting, e.g. arrangement of data block or words on the record carriers
- G11B20/1217—Formatting, e.g. arrangement of data block or words on the record carriers on discs
- G11B20/1251—Formatting, e.g. arrangement of data block or words on the record carriers on discs for continuous data, e.g. digitised analog information signals, pulse code modulated [PCM] data
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/011—Files or data streams containing coded musical information, e.g. for transmission
- G10H2240/046—File format, i.e. specific or non-standard musical file format used in or adapted for electrophonic musical instruments, e.g. in wavetables
- G10H2240/071—Wave, i.e. Waveform Audio File Format, coding, e.g. uncompressed PCM audio according to the RIFF bitstream format method
Abstract
The present invention discloses a method and device for translating a multimedia file, and a translation playback device. The method includes: obtaining the original speech file in a multimedia file; translating the original speech file to obtain a new speech file whose language is a specified language; and configuring the loading attributes of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played. This allows the original speech file in a multimedia file to be converted automatically into a file in another language.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for translating a multimedia file, and to a translation playback device.
Background art
With the rapid development of computer technology, more and more users play multimedia files with media players. When a multimedia file is played, prompt information corresponding to it usually needs to be displayed. For example, a user playing a song may want the lyrics shown at the same time, and a user watching a film may want subtitles shown at the same time. Because the prompt information may be text in different languages, problems arise: for a user whose native language is Chinese and whose English is poor, an English song provides limited information even if the music player displays the English lyrics.
Multimedia audio and video data currently on the market is first translated into other languages manually, and the translated subtitles are then superimposed onto the video pictures by subtitle overlay; likewise, the speech is first translated manually and then synchronized with the video pictures. This means that when a user watches multimedia content in another language that has not been translated manually in advance, only the subtitles and speech in that other language can be played, and the user has difficulty understanding the content.
Summary of the invention
The main object of the present invention is to provide a method and device for translating a multimedia file, and a translation playback device, so as to solve the problem that a user cannot understand or recognize the content of video or audio in another language in a multimedia file.
To achieve the above object, the present invention proposes a method for translating a multimedia file, comprising:
Obtaining the original speech file in a multimedia file;
Translating the original speech file to obtain a new speech file, the language of the new speech file being a specified language;
Configuring the loading attributes of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played.
Further, the original speech file includes an original audio file, and the step of obtaining the original speech file in the multimedia file comprises:
Detecting the start point and end point of each person's speech in the multimedia file;
Taking the speech segment from the start point to the end point of each person's speech as an original audio file, the original audio file being the first original speech file.
Further, the original speech file also includes an original speech text file, and the step of obtaining the original speech file in the multimedia file comprises:
Converting the original speech file into the original speech text file, the original speech text file being the second original speech file.
Further, after the step of taking the speech segment from the start point to the end point of each person's speech as an original audio file, the method comprises:
Detecting the format of the original audio file;
Determining whether the format of the original audio file is PCM;
If not, converting the original audio file to PCM format.
Further, the step of translating the original speech file to obtain a new speech file in the specified language comprises:
Translating the original speech text file to obtain the translated new speech text file, the new speech text file being the first new speech file.
Further, the step of translating the original speech file to obtain a new speech file in the specified language comprises:
Performing speech synthesis on each new speech text file to obtain a new audio file, the new audio file being the second new speech file.
Further, after the step of configuring the loading attributes of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played, the method comprises:
Receiving a playback selection signal, which selects one or more of the new speech text file and the original speech text file for playback and selects one of the original audio file and the new audio file for playback, at least one of the new speech text file and the new audio file being played;
Playing according to the playback selection signal.
Further, after the step of translating the original speech file to obtain a new speech file in the specified language, the method comprises:
Receiving search information, the search information being a word or sentence in any one of the original audio file, the original speech text file, the new speech text file, and the new audio file;
Playing the corresponding search result according to the search signal.
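Lookup by word or sentence can be sketched as a case-insensitive search over timed text tracks. This is an illustration under assumptions: audio files are taken to be searchable through their transcripts (the original and new speech text files), which the text above does not specify, and `Cue` and `search_cues` are hypothetical names. Each hit carries the start time from which playback of the result could begin.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cue:
    start: float  # seconds from the start of the multimedia file
    end: float
    text: str


def search_cues(query: str, tracks: Dict[str, List[Cue]]) -> List[Tuple[str, float, str]]:
    """Return (track name, start time, text) for every cue containing `query`,
    ordered by start time so results can be played back in sequence."""
    q = query.casefold()
    hits = []
    for name, cues in tracks.items():
        for cue in cues:
            if q in cue.text.casefold():
                hits.append((name, cue.start, cue.text))
    hits.sort(key=lambda hit: hit[1])  # stable sort keeps track order on ties
    return hits
```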
The present invention also provides a device for translating a multimedia file, comprising:
An obtaining module, for obtaining the original speech file in a multimedia file;
A translation module, for translating the original speech file to obtain a new speech file, the language of the new speech file being a specified language;
A configuration module, for configuring the loading attributes of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played.
A translation playback device comprises a memory, a processor, and an application program, the application program being stored in the memory and configured to be executed by the processor, and the application program being configured to perform the method according to any one of the above.
With the method, device, and translation playback device for translating a multimedia file according to the embodiments of the present invention, the original speech file in a multimedia file is obtained and translated into a new speech file in a specified language, and the loading attributes of the new speech text file are configured so that the new speech file is loaded synchronously when the multimedia file is played. Without manual translation, the original speech file is converted automatically into a speech file in another language, which helps users understand and recognize the content of the audio and video in multimedia files better and more promptly.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for translating a multimedia file according to an embodiment of the invention;
Fig. 2 is a schematic block diagram of part of the structure of the device for translating a multimedia file according to an embodiment of the invention;
Fig. 3 is a schematic block diagram of the structure of the obtaining module according to an embodiment of the invention;
Fig. 4 is a schematic block diagram of the structure of the obtaining module according to an embodiment of the invention;
Fig. 5 is a schematic block diagram of part of the structure of the device for translating a multimedia file according to an embodiment of the invention;
Fig. 6 is a schematic block diagram of part of the structure of the device for translating a multimedia file according to an embodiment of the invention;
Fig. 7 is a schematic block diagram of the structure of the translation module according to an embodiment of the invention;
Fig. 8 is a schematic block diagram of part of the structure of the device for translating a multimedia file according to an embodiment of the invention;
Fig. 9 is a schematic diagram of an audio file with the detected speech segments labeled, according to an embodiment of the invention.
The realization of the object, the functions, and the advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and should not be construed as limiting the claims.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having meanings consistent with their meaning in the context of the prior art and, unless specifically defined as here, will not be interpreted in an idealized or overly formal sense.
Referring to Fig. 1, an embodiment of the present invention provides a method for translating a multimedia file, comprising the steps of:
S10, obtaining the original speech file in the multimedia file;
S20, translating the original speech file to obtain a new speech file, the language of the new speech file being a specified language;
S30, configuring the loading attributes of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played.
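Steps S10 to S30 can be sketched as a small pipeline. This is a minimal illustration, not the patented implementation: it assumes the original speech has already been extracted as timed text cues, uses a caller-supplied `translate` function in place of a real translation engine, and models the "loading attribute" as a hypothetical boolean flag. `Cue`, `MultimediaFile`, and `translate_multimedia` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Cue:
    start: float  # seconds from the start of the multimedia file
    end: float
    text: str


@dataclass
class MultimediaFile:
    original_cues: List[Cue]                        # original speech as timed text
    new_cues: List[Cue] = field(default_factory=list)
    load_in_sync: bool = False                      # stand-in for the loading attribute


def translate_multimedia(media: MultimediaFile,
                         translate: Callable[[str], str]) -> MultimediaFile:
    # S10: obtain the original speech (assumed already extracted as cues).
    original = media.original_cues
    # S20: translate each cue into the specified language, keeping its timing.
    media.new_cues = [Cue(c.start, c.end, translate(c.text)) for c in original]
    # S30: configure the new speech to be loaded in sync with playback.
    media.load_in_sync = True
    return media
```

Keeping each translated cue on the same start and end times as its source cue is what lets a player show it in step with the video without further alignment.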
The above method is applied to a translation playback device, generally an intelligent translation playback device such as a video translation player or an audio translation player. This embodiment takes a video translation player as an example; it has functions such as playing video files and audio files and displaying subtitles. As described in step S10, the multimedia file includes the original speech file, a video file, a header file, and so on. The original speech file includes at least one of the first original speech file (i.e., the original audio file) and the second original speech file (i.e., the original speech text file). For clarity, this embodiment is described for the case where both kinds of file are present. Note that the original speech text file is the subtitle file of the multimedia file, and the original audio file is the sound file.
As described in step S20, the original speech file may be in a language other than the user's native language. The new speech file need not be in the user's native language either; it is a file in the specified language that the user wants to watch. The new speech file includes the second new speech file (the new audio file) and the first new speech file (the new speech text file).
As described in step S30, after the new speech file is obtained, it must be loaded synchronously during playback so that the user can understand the content of the multimedia file.
In this embodiment, the new speech file includes the new audio file and the new speech text file, and the original speech file includes the original audio file and the original speech text file. These can be combined in several display modes, further improving the user's learning and understanding of the video file. For example, displaying the new speech text file helps the user understand the multimedia file; playing the new speech text file together with the original speech text file further helps the user learn and recognize the language and pronunciation in the multimedia file. Playing the new audio file helps the user understand the video file, and playing the new speech text file, the original speech text file, and the original audio file helps the user learn and recognize the pronunciation in the multimedia file.
The original speech file includes the original audio file, and step S10 of obtaining the original speech file in the multimedia file comprises:
Detecting the start point and end point of each person's speech in the original audio file;
Taking the speech segment from the start point to the end point of each person's speech as an original audio file, the original audio file being an original speech file.
In this embodiment, the original speech file contains several audio objects, such as background noise, human voices, or sounds made by animals. When the original speech is analyzed, only human speech signals are detected; background noise, gunshots, and sounds made by animals and plants are not. Voice activity detection (VAD) is used to detect the endpoints of human speech in the audio file. Because a person's speech is not continuous and does not sound throughout an original audio file, the detected start point and end point of a speech signal delimit a continuous stretch of audio in the original speech file, and that stretch constitutes one original audio file (i.e., one first original speech file). Original audio files include continuous segments detected while individual persons speak alone, where one person's continuous utterance forms one original audio file and the next person's utterance forms another, as well as continuous segments formed by the voices of several people speaking at the same time. In this embodiment, the speech segments of different persons preferably do not overlap, i.e., each original audio file consists of speech by one person alone. Because the timbre and pitch of a single speaker vary little, such speech is easier to detect, the detected original audio files are more accurate, and the labeled start and end points of the speech are less prone to error.
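As a rough illustration of endpoint detection, the sketch below uses a simple frame-energy threshold in place of a real VAD model. It is an assumption for illustration only: unlike the detector described above, an energy threshold cannot tell a human voice from background noise or animal sounds. The function name and its parameters (`frame_ms`, `threshold`, `min_gap_frames`) are hypothetical.

```python
from typing import List, Tuple


def detect_speech_segments(samples: List[float], sample_rate: int,
                           frame_ms: int = 20, threshold: float = 0.01,
                           min_gap_frames: int = 3) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of regions whose frame energy
    reaches `threshold`. Pauses shorter than `min_gap_frames` are bridged."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    # Mark each frame as voiced if its mean energy reaches the threshold.
    voiced = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        voiced.append(energy >= threshold)

    def to_seconds(frame_index: int) -> float:
        # Clamp so a partial final frame does not extend past the signal.
        return min(frame_index * frame_len, len(samples)) / sample_rate

    segments = []
    seg_start = None   # index of the first voiced frame of the open segment
    silence_run = 0    # consecutive silent frames since the last voiced frame
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if seg_start is None:
                seg_start = i
            silence_run = 0
        elif seg_start is not None:
            silence_run += 1
            if silence_run >= min_gap_frames:
                # The segment ends right after its last voiced frame.
                end = i - silence_run + 1
                segments.append((to_seconds(seg_start), to_seconds(end)))
                seg_start, silence_run = None, 0
    if seg_start is not None:
        segments.append((to_seconds(seg_start), to_seconds(len(voiced) - silence_run)))
    return segments
```

Each returned (start, end) pair would correspond to one original audio file. Bridging short pauses keeps one utterance from being split into several segments.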
The original speech file also includes the original speech text file, and step S10 of obtaining the original speech file in the multimedia file comprises:
Converting the original audio file into the original speech text file, the original speech text file being an original speech file.
As described in the above step, the original speech text file is the subtitle file in the original speech file. The original speech file contains one or both of the original speech text file and the original audio file. In this embodiment, when only the original audio file is present, it can be converted into an original speech text file by the above step. This addresses cases where the speech in the original speech file is too fast or contains non-standard pronunciation, so that the user has difficulty understanding it by listening alone; the user can then gain a preliminary understanding from the original speech text file, which further improves understanding of the video file. If both files are present, the original speech text file can be obtained directly, which saves time.
After the above step of taking the speech segment from the start point to the end point of each person's speech as an original audio file, the method comprises:
Detecting the format of the original audio file;
Determining whether the format of the original audio file is PCM;
If not, converting the original audio file to PCM format.
As described in the above steps, when the detected format of the original audio file is not PCM, the video translation player preferably converts the detected original audio file into a PCM-format speech file. PCM (pulse-code modulation) converts an analog signal such as sound into a coded pulse train and then records it. A PCM signal is a digital signal composed of symbols such as 1 and 0. Compared with an analog signal, it is less susceptible to noise and distortion in the transmission system, and it achieves good sound quality over a wide dynamic range. A PCM track is also separate from the video track and can be used for post-recording (dubbing). In addition, a PCM audio file is the binary sequence produced directly from the analog audio signal by analog-to-digital (A/D) conversion, so the video translation player can decode it accurately. The original audio file may initially be in any of several formats, such as PCM, WMV, MP4, DAT, or RM; in this embodiment the audio format used for parsing is preferably PCM.
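The format check can be sketched for the WAV container, where PCM is signalled by format tag 1 in the `fmt ` chunk. The sketch also shows the quantization behind PCM: float samples in [-1, 1] are mapped to 16-bit integers and written with Python's standard `wave` module. This is an illustration under assumptions: converting compressed formats such as MP4 or RM to PCM would need a decoder (for example FFmpeg) and is not shown, and `is_pcm_wav` and `to_pcm16_wav` are hypothetical names.

```python
import io
import struct
import wave

WAVE_FORMAT_PCM = 0x0001  # format tag for integer PCM in the WAVE "fmt " chunk


def is_pcm_wav(data: bytes) -> bool:
    """Return True if `data` is a RIFF/WAVE file whose fmt chunk declares PCM."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        return False
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        if chunk_id == b"fmt ":
            if chunk_size < 2 or pos + 10 > len(data):
                return False
            (format_tag,) = struct.unpack("<H", data[pos + 8:pos + 10])
            return format_tag == WAVE_FORMAT_PCM
        # Chunks are word-aligned: an odd size is followed by one pad byte.
        pos += 8 + chunk_size + (chunk_size & 1)
    return False


def to_pcm16_wav(samples, sample_rate: int = 16000) -> bytes:
    """Quantize float samples in [-1, 1] to 16-bit PCM and wrap them in a WAV file."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return buf.getvalue()
```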
Step S20 of translating the original speech file to obtain a new speech file in the specified language comprises:
Translating the original speech text file to obtain the translated new speech text file, the new speech text file being the first new speech file.
In this embodiment, the first new speech file is the new speech text file, i.e., the translated subtitle file. The user can understand the content of the video file through the translated subtitles, which aids comprehension.
Step S20 of translating the original speech file to obtain a new speech file in the specified language comprises:
Performing speech synthesis on each new speech text file to obtain a new audio file, the new audio file being the second new speech file.
In this embodiment, the new audio file (speech) can be obtained by converting the new speech text file (subtitles), and multiple segments of new audio file can be combined into a complete new speech file. During playback of the multimedia file, each new audio file is played according to the playback time of the corresponding original audio file. During playback, the new speech file and the new speech text file may completely replace the original speech file and the original speech text file, or some of the new audio files in the new speech file may replace the corresponding original audio files in the original speech file; this is not described in detail here. When the user watches the video, playing the new audio file, the new speech text file, and the original speech text file further helps the user understand the video content fully. When the video translation player plays the original audio file, the new speech text file, and the original speech text file, the user can learn the new language by watching the video together with the original speech text file (original subtitles), the new speech text file (translated subtitles), and the synchronously displayed picture (the speaker's mouth movements).
After step S30 of configuring the loading attributes of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played, the method comprises:
Receiving a playback selection signal, which selects one or more of the new speech text file and the original speech text file for playback and selects one of the original audio file and the new audio file for playback, at least one of the new speech text file and the new audio file being played;
Playing according to the playback selection signal.
In this embodiment, the playback selection signal may be issued by the user's selection or automatically by the video player. According to their command of the language in the multimedia file or their own preferences, users choose to play one or more files, which improves the user experience. For example, a user with a good command of the language in the original audio file can play the original audio file and the new speech text file to improve listening skills in that language while watching. A user with a weaker command of the language can play the original audio file, the original speech text file, and the new speech text file, and learn the pronunciation, sentences, and meaning of the language from the original audio file, the original speech text file (original subtitles), the new speech text file (translated subtitles), and the synchronously displayed picture (the speaker's mouth movements). A user who does not want to learn the language and only wants to understand the content can play the new audio file (translated speech) and the new speech text file, and thus fully understand the content of the multimedia file.
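The constraints on the playback selection signal can be expressed as a small validation routine. This is a sketch under assumptions: tracks are named "original" and "new", the presets mirror the three usage scenarios described above, and `PlaybackSelection`, `TRACKS`, and `PRESETS` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet

TRACKS = {"original", "new"}


@dataclass(frozen=True)
class PlaybackSelection:
    audio: str                 # exactly one audio track: "original" or "new"
    subtitles: FrozenSet[str]  # one or more text tracks from TRACKS

    def validate(self) -> None:
        if self.audio not in TRACKS:
            raise ValueError("select exactly one audio track: original or new")
        if not self.subtitles or not self.subtitles <= TRACKS:
            raise ValueError("select one or more known subtitle tracks")
        # At least one of the new speech text file and the new audio file must play.
        if self.audio != "new" and "new" not in self.subtitles:
            raise ValueError("play the new audio, the new subtitles, or both")


PRESETS = {
    "listening practice": PlaybackSelection("original", frozenset({"new"})),
    "language study": PlaybackSelection("original", frozenset({"original", "new"})),
    "content only": PlaybackSelection("new", frozenset({"new"})),
}
```

A selection of only the original audio and original subtitles is rejected, because it would play nothing that has been translated.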
Step S30 of configuring the loading attributes of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played further comprises:
Obtaining the playback duration of each original audio file and the playback duration of each corresponding new audio file;
Determining whether the playback duration of each original audio file is greater than that of the corresponding new audio file;
If it is greater, selecting to play the corresponding new audio file;
If it is less, selecting to play the corresponding original audio file.
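The duration rule in the steps above reduces to one comparison per pair of segments. A minimal sketch, assuming durations in seconds and a hypothetical function name. The equal-length outcome follows the description of the equal case, where the two tracks stay aligned and the choice can be left to the user. The comments on why each branch is chosen are an interpretation, not stated in the text.

```python
def choose_audio(original_duration: float, new_duration: float) -> str:
    """Pick the audio track for one segment from its two durations (seconds)."""
    if original_duration > new_duration:
        return "new"        # translated speech fits inside the original slot
    if original_duration < new_duration:
        return "original"   # translated speech would overrun the slot
    return "either"         # equal lengths stay aligned; the user may choose


# Applied segment by segment:
pairs = [(3.2, 2.5), (1.0, 1.8), (2.0, 2.0)]
choices = [choose_audio(o, n) for o, n in pairs]
```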
In this embodiment, note the following cases. In one case, the new speech text file (subtitles in the target language) is displayed from the start time to the end time of the corresponding original speech file; the video translation player can then play the original audio file, the original speech text file, and the new speech text file, and it can also accept a user's selection among these playable files and play only one or some of them. In another case, the duration of the original audio file is greater than that of the voice segment synthesized from the translation, i.e., the start time of each translated new audio segment is aligned with the start time of the corresponding original audio file; the video player then automatically selects to output the new audio file and not the original audio file. The new speech text file (subtitles in the target language) is displayed from the start time of the corresponding original speech text file (original subtitles) to the end time of the corresponding new audio segment, and the video translation player can select to play the new audio file, the new speech text file, and the original speech text file. In this case, after the player has made its selection, it can still accept a user's selection among the playable files (the new audio file, the new speech text file, and the original speech text file) and play only one or some of them. In a third case, the duration of the original audio file equals that of the synthesized voice segment, i.e., the start and end times of the original audio file coincide with those of the new audio file; the player can then play any one or more of the new audio file, the new speech text file, the original speech file, and the original speech text file, and can accept a user's selection of one or more of them. This applies, for example, when the multimedia file is a GIF file.
Note that in one embodiment, the loading attribute is used to parse the original speech file and the new audio file and to load their playback timing. Specifically, a multimedia file comprises several parts, including the original speech files, the video files, and a header file, so the header file may be played before the video file. Each file has a synchronization time relative to the multimedia file during playback, i.e., relative to the time at which the header file is played. A multimedia file generally contains K original speech files and M video files. An original audio file contains multiple original audio segments separated by intervals, and the playback start time Ts11 and end time Te11 of each segment are labeled (see Fig. 9: on a time axis, starting from its origin, the segments are, in order, the first segment from Ts11 to Te11, the second segment from Ts12 to Te12, ..., and the n-th segment from Ts1n to Te1n). Adding to the start time Ts11 and end time Te11 of each segment the synchronization time Toffset1 of the file relative to multimedia playback, obtained by parsing the file, gives the playback times, relative to the system, of the n first original speech files in one original speech file: Toffset1+Ts11, Toffset1+Ts12, ..., Toffset1+Ts1n. After the first original audio file has been processed, the remaining K-1 original audio files are processed in turn, giving the information of all first original speech files in the K audio files and their playback times relative to the multimedia file, as follows:
Timing of the original audio segments of the 1st speech file: Toffset1+Ts11, Toffset1+Ts12, ..., Toffset1+Ts1n;
Timing of the original audio segments of the 2nd speech file: Toffset2+Ts21, Toffset2+Ts22, ..., Toffset2+Ts2n;
...
Timing of the original audio segments of the k-th speech file: Toffsetk+Tsk1, Toffsetk+Tsk2, ..., Toffsetk+Tskl,
where Toffsetk is the playback time of the k-th speech file relative to the system, Tsk1 is the start time of the first speech segment of the k-th original speech file, and Tskl is the start time of the last (l-th) speech segment of the k-th original speech file. A multimedia file can thus be regarded as containing Y audio files, and the start and end times of each of the Y audio files relative to the system playback time are recorded.
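The conversion from segment-relative times to system playback times is a per-file shift by Toffsetk. A minimal sketch, assuming each speech file is given as its offset plus a list of (Ts, Te) pairs in seconds. `Segment` and `to_system_time` are hypothetical names.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (Ts, Te) in seconds, relative to its own speech file


def to_system_time(files: List[Tuple[float, List[Segment]]]) -> List[List[Segment]]:
    """files[k] = (Toffset_k, [(Ts_k1, Te_k1), ...]); returns every segment
    shifted by its file's offset, i.e. (Toffset_k + Ts_kj, Toffset_k + Te_kj)."""
    return [[(offset + ts, offset + te) for ts, te in segments]
            for offset, segments in files]


# Two speech files: one starting 10 s into playback, one starting at 60 s.
timeline = to_system_time([(10.0, [(0.5, 2.0), (3.0, 4.5)]),
                           (60.0, [(1.0, 2.0)])])
```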
Each audio file is converted into an original speech text file, and each original speech text file is tagged with timing information relative to system playback so that it corresponds one-to-one with its audio file, giving Y original speech text files. The Y original speech text files are translated to obtain Y new speech text files. Speech synthesis on these gives Y new audio files, and the duration Tr of each new audio file is obtained, where r is a positive integer and 0 < r < Y+1. The Y new audio files replace the original audio files one by one, with each new audio file aligned to the start time of its original audio file. The display of the new speech text file is synchronized with the new speech file and the video frames. When the duration of an original audio file differs from the output duration of its new audio file, there are two cases:
A) Only the original audio file is output and the new audio file is not. The new speech text file is displayed from the start time to the end time of the corresponding original audio file. If the Z-th new audio file of the N-th new speech file has start time ToffsetN+TSZ and end time ToffsetN+TEZ, the starting point of the corresponding video frames is ToffsetN, and the subtitle stays on screen for (TEZ-TSZ) × frame rate frames. The video frame rate is determined by the codec format of the multimedia file, for example 30 frames/second.
B) The new audio file is output and the original audio file is not. The new speech text file is displayed from the start time of the corresponding original speech text file until the end time of the corresponding voice segment synthesized in the target language. If the Z-th voice segment of the N-th audio file starts at ToffsetN+TSZ and the new audio file corresponding to this original audio file has duration Tr, the starting point of the corresponding video frames is ToffsetN, and the subtitle stays on screen for Tr × frame rate frames (r = Z). The video frame rate is again determined by the codec format of the multimedia file, for example 30 frames/second.
When the duration of the original audio file equals the output duration of the new audio file, as with a GIF animation, the new speech file and the new speech text file can completely or partially replace the original voice file and the original speech text file. Which files are replaced can be selected by the user or chosen automatically by the system during playback, improving the user experience.
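The two cases above can be sketched as a single function that returns when a translated subtitle is shown and for how many video frames. This is an illustrative Python sketch, not the patent's implementation: the name `subtitle_window`, the use of seconds, and rounding to the nearest whole frame are assumptions. It takes the subtitle display start as ToffsetN+TSZ, as stated for both cases.

```python
from typing import Tuple


def subtitle_window(
    toffset: float, ts: float, te: float, tr: float, play_new_audio: bool, fps: float
) -> Tuple[float, float, int]:
    """Return (display start, display end, frame count) for one subtitle.

    toffset: ToffsetN of the N-th file; ts/te: TSZ/TEZ of the Z-th segment;
    tr: duration Tr of the synthesized new audio segment, all in seconds.
    """
    start = toffset + ts
    if play_new_audio:
        # Case B: the subtitle lasts as long as the synthesized segment.
        duration = tr
    else:
        # Case A: the subtitle follows the original segment (TEZ - TSZ).
        duration = te - ts
    # Rounding to a whole number of frames is an assumption of this sketch.
    return start, start + duration, round(duration * fps)


print(subtitle_window(5.0, 10.0, 12.5, 3.0, False, 30))  # (15.0, 17.5, 75)
print(subtitle_window(5.0, 10.0, 12.5, 3.0, True, 30))   # (15.0, 18.0, 90)
```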
After step S20, in which the original voice file is translated to obtain a new speech file whose speech is in the specified language, the method further includes:
receiving lookup information, the lookup information being text or a sentence in any one of the original audio file, the original speech text file, the new speech text file and the new audio file;
playing the lookup result corresponding to the lookup signal.
In this embodiment, the user inputs a key sentence or a piece of text that occurs in the speech of the original audio file to be searched, in the speech of the new audio file, in the original speech text file, or in the new speech text file. Character-by-character matching is then performed over all original speech text files and original audio files, or over the translated new speech text files and new audio files, one audio file at a time. This yields the speech text file and voice segment file (i.e. the original audio file or the new audio file) corresponding to the key sentence or word, together with the position of the matching video frames in the multimedia file, so that the multimedia segment containing the key sentence or word can be played.
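The lookup described above can be sketched as a case-insensitive substring search over timed text tracks. The sketch assumes that every track, including the original and new audio files, is represented by timed text cues (for audio, a transcript); the `Cue` type, the track names and `find_cues` are hypothetical. Substring matching stands in here for whatever character matching a real player uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Cue:
    start: float  # seconds relative to system playback
    end: float
    text: str


def find_cues(tracks: Dict[str, List[Cue]], query: str) -> List[Tuple[str, int, float, float]]:
    """Return (track name, cue index, start, end) for every cue containing query."""
    needle = query.casefold()
    hits = [
        (name, i, cue.start, cue.end)
        for name, cues in tracks.items()
        for i, cue in enumerate(cues)
        if needle in cue.text.casefold()
    ]
    # Sort by start time so results can be played in timeline order.
    return sorted(hits, key=lambda hit: hit[2])


tracks = {
    # Audio tracks are assumed to be searchable through their transcripts.
    "original_text": [Cue(1.0, 2.0, "Hello there"), Cue(5.0, 6.0, "Good night")],
    "new_text": [Cue(1.0, 2.0, "Bonjour"), Cue(5.0, 6.0, "Bonne nuit")],
}
print(find_cues(tracks, "night"))  # [('original_text', 1, 5.0, 6.0)]
```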
The multimedia file translation method of this embodiment of the present invention obtains the original voice file in a multimedia file, translates it into a new speech file in a specified language, and configures the loading attribute of the new speech text file so that the new speech file is loaded synchronously when the multimedia file is played. The original voice file is thereby converted automatically into a voice file in another language without human translation, helping the user understand and identify the audio and video content of the multimedia file better and in a timely manner.
Referring to Fig. 2, this embodiment provides a multimedia file translation device, comprising:
an acquisition module 10, configured to obtain the original voice file in a multimedia file;
a translation module 20, configured to translate the original voice file to obtain a new speech file whose language is the specified language;
a configuration module 30, configured to configure the loading attribute of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played.
The device is applied to a translation playback device, typically an intelligent translation playback device such as a video translation player or an audio translation player. This embodiment uses a video translation player as the example; it can play video and audio files and display subtitles. For the acquisition module 10, the multimedia file includes an original voice file, a video file, a header file and so on. The acquisition module 10 obtains the original voice file, which includes at least one of a first original voice file (the original audio file) and a second original voice file (the original speech text file). For clarity, this embodiment describes the case where both voice files are present. Note that the original speech text file is the subtitle file in the multimedia file, and the original audio file is the sound file.
For the translation module 20, the original voice file may be in a language other than the user's native language, and the new speech file need not be in the user's native language either: the translation module 20 produces a file in whichever specified language the user wants to watch. The new speech file includes a second new speech file (the new audio file) and a first new speech file (the new speech text file).
For the configuration module 30: once the new speech file has been obtained, the configuration module 30 loads it synchronously during playback so that the user can understand the content of the multimedia file.
In this embodiment, the new speech file includes a new audio file and a new speech text file, and the original voice file includes an original audio file and an original speech text file. These can be displayed and combined in several ways, further improving the user's study and understanding of the video file. For example, displaying the new speech text file helps the user understand the multimedia file, and playing the new speech text file together with the original speech text file further helps the user learn and recognize the language and pronunciation in the multimedia file. Playing the new audio file helps the user understand the video file, while playing the new speech text file, the original speech text file and the original audio file helps the user learn and recognize the pronunciation in the multimedia file.
Referring to Fig. 3, in this embodiment the acquisition module 10 includes:
a first detection unit 101, configured to detect the start point and end point of the speech of each person in the multimedia file;
a determination unit 102, configured to take the voice segment between the start point and end point of each person's speech as an original audio file, the original audio file being the first original voice file.
For the first detection unit 101: the original voice file contains multiple audio objects, such as ambient noise, human speech, and sounds made by animals and plants. When the original speech is analyzed, only human speech signals are detected; ambient noise, gunshots and sounds from animals and plants are not. Voice activity detection (VAD) is used to detect the endpoints of human speech in the audio file. Because nobody speaks continuously throughout an original audio file, the span from a detected speech start point to the corresponding end point is one continuous stretch of audio within the original voice file, and it constitutes one original audio file (i.e. a first original voice file).
An original audio file may be a continuous stretch detected while one of several persons speaks alone, i.e. one person's continuous utterance is one original audio file and the next person's utterance is another; it may also be a continuous stretch in which several persons speak at the same time. In this embodiment, the voice segments of different persons preferably do not overlap, i.e. speech by a single person forms one original audio file. Because a single speaker's timbre and pitch vary little, such segments are easier to detect, the detected original audio files are more accurate, and the marked start and end points of the speech are less likely to be wrong.
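The patent relies on voice activity detection (VAD) without specifying an algorithm. As a simplified stand-in, the sketch below marks regions where short-time RMS energy exceeds a threshold, which yields the start and end points used to cut original audio files. Plain energy detection cannot tell speech from gunshots or other loud noise, so it does not achieve the speech-only detection described above; the function name, frame length, threshold and silence count are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def detect_speech_segments(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,
    threshold: float = 0.02,
    min_silence_frames: int = 3,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of regions whose frame RMS exceeds threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first loud frame of the open segment
    quiet = 0     # consecutive quiet frames seen since the last loud frame

    def to_seconds(frame_index: int) -> float:
        return frame_index * frame_len / sample_rate

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = (sum(x * x for x in frame) / frame_len) ** 0.5
        if rms >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence_frames:
                # Close the segment at the first quiet frame after the speech.
                segments.append((to_seconds(start), to_seconds(i - quiet + 1)))
                start, quiet = None, 0
    if start is not None:
        # Speech runs to the end of the analysed audio.
        segments.append((to_seconds(start), to_seconds(n_frames - quiet)))
    return segments


# 0.1 s of silence, 0.2 s of "speech", 0.2 s of silence at 1 kHz.
samples = [0.0] * 100 + [0.5] * 200 + [0.0] * 200
print(detect_speech_segments(samples, 1000))  # [(0.1, 0.3)]
```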
Referring to Fig. 3, the acquisition module 10 further includes:
a first converting unit 103, configured to convert the original audio file into the original speech text file, the original speech text file being the second original voice file.
For the first converting unit 103: the original speech text file is the subtitle file within the original voice file. An original file contains one or both of the original speech text file and the original audio file. In this embodiment, when only the original audio file is present, the first converting unit 103 converts it into an original speech text file. This helps when the speech in the original voice file is too fast or contains clipped, nonstandard pronunciation that the user finds hard to follow by ear alone: the user can first get a rough understanding from the original speech text file, which further improves understanding of the video file. If both files are present, the original speech text file can be obtained directly, saving acquisition time.
Referring to Fig. 4, the acquisition module 10 further includes:
a second detection unit 104, configured to detect the format of the original audio file;
a first judging unit 105, configured to judge whether the format of the original audio file is PCM;
a second converting unit 106, configured to convert the original audio file into PCM format when the judgment is negative.
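The detect-and-judge steps of units 104 and 105 can be sketched for WAV input, assuming the audio has already been extracted into a RIFF/WAVE container: the function walks the RIFF chunks and checks whether the `fmt ` chunk declares format tag 1 (PCM). Other containers (MP4, RM and so on) would need a real demuxer, and the conversion performed by unit 106 would need an external decoder; neither is shown. `is_pcm_wav` is a hypothetical name, and files using the WAVE_FORMAT_EXTENSIBLE tag are conservatively reported as non-PCM.

```python
import io
import struct
import wave

WAVE_FORMAT_PCM = 1


def is_pcm_wav(data: bytes) -> bool:
    """Return True if data is a RIFF/WAVE file whose fmt chunk declares PCM."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return False
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        if chunk_id == b"fmt ":
            if chunk_size < 2 or pos + 10 > len(data):
                return False  # truncated fmt chunk
            (format_tag,) = struct.unpack("<H", data[pos + 8:pos + 10])
            return format_tag == WAVE_FORMAT_PCM
        pos += 8 + chunk_size + (chunk_size & 1)  # chunks are padded to even length
    return False


# Python's wave module writes plain PCM, so this in-memory file should pass.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 160)
print(is_pcm_wav(buf.getvalue()))  # True
```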
In this embodiment, the first judging unit 105 judges whether the format of the original audio file detected by the second detection unit 104 is PCM; preferably, the video translation player uses the second converting unit 106 to convert the detected original audio file into a PCM voice file. PCM (Pulse Code Modulation) recording turns an analog signal such as sound into a train of coded pulses and records it. A PCM signal is a digital signal made up of symbols such as 1 and 0. Compared with an analog signal, it is less susceptible to noise and distortion in the transmission system, has a wide dynamic range, and gives fairly good sound quality. A PCM track is also distinct from the video track and can be used for post-dubbing. In addition, a PCM audio file is the binary sequence produced directly from the analog audio signal by analog-to-digital (A/D) conversion, so the video translation player can decode it accurately. The original audio file may initially be in any of several formats, such as PCM, WMV, MP4, DAT or RM; the audio format parsed in this embodiment is preferably PCM.
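To make the description of PCM concrete, the following sketch samples a sine wave at a fixed rate and quantizes each sample to a signed 16-bit integer, producing the kind of raw binary sequence described above. The 16-bit little-endian layout is one common PCM layout chosen for illustration, and `sine_to_pcm16` is a hypothetical helper.

```python
import math
import struct


def sine_to_pcm16(freq_hz: float, sample_rate: int, n_samples: int, amplitude: float = 0.5) -> bytes:
    """Sample a sine wave and quantize it to signed 16-bit little-endian PCM."""
    samples = []
    for n in range(n_samples):
        # Value of the "analog" signal at sample instant n, in [-1, 1].
        x = amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        q = max(-32768, min(32767, round(x * 32767)))  # quantize and clamp to int16
        samples.append(q)
    return struct.pack(f"<{n_samples}h", *samples)


pcm = sine_to_pcm16(1000, 8000, 8)
print(len(pcm))  # 16: eight samples of two bytes each
```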
Referring to Fig. 7, the translation module 20 includes:
a translation unit 201, configured to translate the original speech text file to obtain a translated new speech text file, the new speech text file being the first new speech file.
As described above, the first new speech file is the new speech text file, i.e. the subtitle file produced by the translation unit 201. The translated subtitles help the user understand the content of the video file.
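A minimal sketch of the translation unit's contract, assuming subtitles are timed cues: only the text of each cue is translated, and its start and end times are carried over unchanged, so the new speech text file stays aligned with the original. The `translate` callable stands in for any machine translation service, which the patent does not name; `Cue`, `translate_cues` and the toy dictionary translator are purely illustrative.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Cue:
    start: float  # seconds relative to system playback
    end: float
    text: str


def translate_cues(cues: List[Cue], translate: Callable[[str], str]) -> List[Cue]:
    """Translate each subtitle cue's text while keeping its timing unchanged."""
    return [replace(cue, text=translate(cue.text)) for cue in cues]


toy = {"Hello": "Bonjour", "Good night": "Bonne nuit"}
new = translate_cues(
    [Cue(1.0, 2.0, "Hello"), Cue(5.0, 6.5, "Good night")],
    lambda s: toy.get(s, s),  # stand-in for a real translation service
)
print(new)
```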
The translation module 20 further includes:
a synthesis unit 202, configured to perform speech synthesis on each new speech text file to obtain a new audio file, the new audio file being the second new speech file.
The new audio file (speech) is obtained by having the internal synthesis unit 202 convert the new speech text file (subtitles), and the multiple new audio segments are combined into a complete new speech file. During playback of the multimedia file, each new audio file is played according to the play time of the corresponding original audio file. When the multimedia file is played, the original voice file and original speech text file can be replaced entirely by the new speech file and new speech text file, or only some of the new audio files in the new speech file can replace the corresponding original audio files; this is not described in further detail here. When the user watches a video, playing the new audio file, the new speech text file and the original speech text file further helps the user understand the video content fully. When the video translation player plays the original audio file, the new speech text file and the original speech text file, the user can learn a new language by watching the video together with the original speech text file (original subtitles), the new speech text file (translated subtitles) and the synchronously displayed picture (the speaker's mouth shapes).
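Combining the synthesized segments into one new speech track, each played at the play time of its original audio file, can be sketched as mixing sample lists into a silent timeline. This assumes mono 16-bit integer samples that a text-to-speech engine (not shown) has already produced; `place_segments` is a hypothetical name, and overlapping segments are simply summed and clipped.

```python
from typing import List, Sequence, Tuple


def place_segments(
    segments: Sequence[Tuple[float, Sequence[int]]], sample_rate: int, total_seconds: float
) -> List[int]:
    """Mix (start time, samples) segments into one timeline of int16 samples."""
    track = [0] * int(round(total_seconds * sample_rate))
    for start, samples in segments:
        offset = int(round(start * sample_rate))  # align to the original start time
        for i, s in enumerate(samples):
            j = offset + i
            if j >= len(track):
                break  # drop audio that would run past the end of the file
            track[j] = max(-32768, min(32767, track[j] + s))  # sum and clip
    return track


print(place_segments([(0.001, [100, 200]), (0.003, [5])], 1000, 0.005))
# [0, 100, 200, 5, 0]
```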
Referring to Fig. 5, the multimedia file translation device further includes:
a first receiving module 40, configured to receive a play selection signal, which selects one or more of the new speech text file and the original speech text file and selects one of the original audio file and the new audio file, such that at least one of the new speech text file and the new audio file is played;
a first playing module 50, configured to play according to the play selection signal.
The play selection signal may be issued by the user or automatically by the video player. The first playing module 50 plays according to the play selection signal received by the first receiving module 40, and the user can choose one or more files to play according to his or her command of the language in the multimedia file or personal interests, improving the user experience. For example, a user with a good grasp of the language in the original audio file can choose to play the original audio file and the new speech text file, improving his or her listening ability in that language while watching. A user with a weaker grasp can choose to play the original audio file, the original speech text file and the new speech text file, and learn the pronunciation, sentences and meaning of the language from the original audio file, the original speech text file (original subtitles), the new speech text file (translated subtitles) and the synchronously displayed video (the speaker's mouth shapes). A user who does not want to learn the language but only to understand the content of the multimedia file can choose to play the new audio file (translated speech) and the new speech text file, and fully understand the content.
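The constraints carried by the play selection signal (one or more subtitle files, exactly one audio track, and at least one translated file) can be checked with a small predicate. The string constants and `is_valid_selection` are illustrative assumptions; the three user scenarios described above are all accepted by it.

```python
from typing import AbstractSet

ORIGINAL_AUDIO = "original_audio"
NEW_AUDIO = "new_audio"
ORIGINAL_TEXT = "original_text"
NEW_TEXT = "new_text"


def is_valid_selection(selected: AbstractSet[str]) -> bool:
    """Check a play selection against the rules of the play selection signal."""
    subtitles = selected & {ORIGINAL_TEXT, NEW_TEXT}   # one or more subtitle files
    audio = selected & {ORIGINAL_AUDIO, NEW_AUDIO}     # exactly one audio track
    translated = selected & {NEW_TEXT, NEW_AUDIO}      # at least one translated file
    return len(subtitles) >= 1 and len(audio) == 1 and len(translated) >= 1


# The three scenarios described above:
print(is_valid_selection({ORIGINAL_AUDIO, NEW_TEXT}))                 # True
print(is_valid_selection({ORIGINAL_AUDIO, ORIGINAL_TEXT, NEW_TEXT}))  # True
print(is_valid_selection({NEW_AUDIO, NEW_TEXT}))                      # True
```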
Referring to Fig. 8 and Fig. 9, the multimedia file translation device further includes:
a time acquisition module 80, configured to obtain the play duration of each original audio file and the play duration of each corresponding new audio file;
a judgment module 90, configured to judge whether the play duration of each original audio file is greater than that of the corresponding new audio file;
a selecting module 100, configured to select the corresponding new audio file for playback if it is greater, and the corresponding original audio file if it is less.
In one embodiment, the new speech text file (subtitles in the translated language) is displayed from the start time to the end time of the corresponding original voice file. The video translation player can then play the original audio file, the original speech text file and the new speech text file, and can also accept a user selection to play only one or some of these files. Alternatively, when the duration of the original audio file is greater than the duration of the voice segment synthesized from the translation, i.e. each translated new audio segment can be aligned to the start time of the corresponding original audio file, the video player can automatically choose to output the new audio file instead of the original audio file. The new speech text file (subtitles in the translated language) is then displayed from the start time of the corresponding original speech text file (original subtitles) to the end time of the corresponding new speech text file, and the video translation player can choose to play the new audio file, the new speech text file and the original speech text file. In this case, after the video translation player has made its own choice for playing the new multimedia file, it can still accept a user selection to play only one or some of the files it can play (the new audio file, the new speech text file and the original speech text file). Finally, when the duration of the original audio file equals that of the voice segment synthesized from the translation, i.e. the start and end times of the original audio file coincide with those of the new audio file, the video translation player can play one of the new audio file and the original audio file together with one or more of the new speech text file and the original speech text file, and can accept a user selection of one or more files to play. The multimedia file can, for example, be a GIF file.
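The rule applied by modules 80, 90 and 100 can be sketched per segment: if the original segment is longer, the synthesized speech fits and is played aligned to the original start time; if it is shorter, the original audio is kept. The sketch also returns the subtitle end time following the embodiment above. Returning `"user_choice"` for equal durations reflects the equal-length case, where either track stays in sync; the function name and return values are assumptions.

```python
from typing import Tuple


def plan_segment(start: float, original_duration: float, new_duration: float) -> Tuple[str, float]:
    """Pick the audio track for one segment and return it with the subtitle end time."""
    if original_duration > new_duration:
        # Synthesized speech fits in the original slot: play it from the original start.
        return "new_audio", start + new_duration
    if original_duration < new_duration:
        # Synthesized speech would overrun the slot: keep the original audio.
        return "original_audio", start + original_duration
    # Equal lengths: either track stays in sync, so the user may choose.
    return "user_choice", start + original_duration


print(plan_segment(10.0, 3.0, 2.0))  # ('new_audio', 12.0)
```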
Note that, in one embodiment, the loading attribute parses the original voice file and the new audio file and loads their play time information. Specifically, a multimedia file mostly comprises an original voice file, a video file, a header file and so on, so the header file may be played before the video file; when played, the video file has a synchronization time relative to the multimedia file, namely the time taken to play the header file. A multimedia file generally contains K original voice files and M video files, and one original audio file contains multiple original audio segments with intervals between them. For each segment, the playback start time and end time are marked (see Fig. 9: on a single time axis, starting from its origin, the first voice segment runs from Ts11 to Te11, the second from Ts12 to Te12, ..., and the n-th from Ts1n to Te1n). Adding to each segment's start time (Ts11) and end time (Te11) the synchronization time Toffset1, which is obtained by correctly parsing the file and is measured relative to playback of the multimedia file, gives the play times, relative to the system, of the n first original voice files in that original voice file: Toffset1+Ts11, Toffset1+Ts12, ..., Toffset1+Ts1n. After the first original audio file has been processed, the remaining K-1 original audio files are processed in turn, yielding the information of all first original voice files in the K audio files together with their play times relative to the multimedia file:

Time information of the original audio segments of the first voice file: Toffset1+Ts11, Toffset1+Ts12, ..., Toffset1+Ts1n;
Time information of the original audio segments of the second voice file: Toffset2+Ts21, Toffset2+Ts22, ..., Toffset2+Ts2n;
Time information of the original audio segments of the k-th voice file: Toffsetk+Tsk1, Toffsetk+Tsk2, ..., Toffsetk+Tskl,

where Toffsetk is the play time of the k-th voice file relative to the system, Tsk1 is the start time of the first voice segment of the k-th original voice file, and Tskl is the start time of its last (l-th) voice segment. A multimedia file can thus be regarded as containing Y audio files, and the start and end times of each of the Y audio files relative to the system play time are recorded.
Each audio file is converted into an original speech text file, and each original speech text file is tagged with the time information of its playback relative to the system so that it corresponds one-to-one with its audio file. This yields Y original speech text files, which are translated into Y new speech text files. The Y new speech text files are synthesized into Y new audio files, and the duration Tr of each of the Y new audio files is obtained, where r is a positive integer and 0 < r < Y+1. The Y new audio files replace the original audio files one by one, each new audio file being aligned with the start time of the original audio file it replaces, and the display of the new speech text files is synchronized with the new speech file and the video frames. When the duration of an original audio file differs from the output duration of its new audio file, there are two cases:
A) Only the original audio file is output and the new audio file is not. The new speech text file is displayed from the start time to the end time of the corresponding original audio file. If the Z-th new audio file of the N-th new speech file has start time ToffsetN+TSZ and end time ToffsetN+TEZ, the starting point of the corresponding video frames is ToffsetN, and the subtitle stays on screen for (TEZ-TSZ) × frame rate frames. The video frame rate is determined by the codec format of the multimedia file, for example 30 frames/second.
B) The new audio file is output and the original audio file is not. The new speech text file is displayed from the start time of the corresponding original speech text file until the end time of the corresponding voice segment synthesized in the target language. If the Z-th voice segment of the N-th audio file starts at ToffsetN+TSZ and the new audio file corresponding to this original audio file has duration Tr, the starting point of the corresponding video frames is ToffsetN, and the subtitle stays on screen for Tr × frame rate frames (r = Z). The video frame rate is again determined by the codec format of the multimedia file, for example 30 frames/second.
When the duration of the original audio file equals the output duration of the new audio file, as with a GIF animation, the new speech file and the new speech text file can completely or partially replace the original voice file and the original speech text file. Which files are replaced can be selected by the user or chosen automatically by the system during playback, improving the user experience.
Referring to Fig. 6, the multimedia file translation device further includes:
a second receiving module 60, configured to receive lookup information, the lookup information being text or a sentence in any one of the original audio file, the original speech text file, the new speech text file and the new audio file;
a second playing module 70, configured to play the lookup result corresponding to the lookup signal.
In this embodiment, the second receiving module 60 receives a key sentence or piece of text that occurs in the speech of the original audio file to be searched, in the speech of the new audio file, in the original speech text file, or in the new speech text file. The second playing module 70 then performs character-by-character matching over all original speech text files and original audio files, or over the translated new speech text files and new audio files, one audio file at a time. It obtains the speech text file and voice segment file (i.e. the original audio file or the new audio file) corresponding to the key sentence or word, together with the position of the video frames in the multimedia file, and plays the multimedia segment containing the key sentence or word. For example, if the original speech text file is in English and the user inputs "I", the retrieval module retrieves the one or more segments of the original speech text file containing "I" and the corresponding video files; the user can select and play the video he or she wants to watch, or choose to play the translated version. This allows accurate lookup when, after watching a video, the user wants to rewatch a highlight.
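For the "I" example above, a plain substring search would also hit words such as "It". A sketch using whole-word regular-expression matching avoids that and converts each hit's start time into a video frame index for seeking. It assumes cues are (start, end, text) tuples in seconds and a constant frame rate; `find_word` is hypothetical, and for languages written without spaces between words a substring match would be needed instead, since `\b` relies on word characters.

```python
import math
import re
from typing import List, Sequence, Tuple


def find_word(
    cues: Sequence[Tuple[float, float, str]], word: str, fps: float
) -> List[Tuple[int, float, int]]:
    """Return (cue index, start time, first video frame) for cues containing word as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for i, (start, _end, text) in enumerate(cues):
        if pattern.search(text):
            # Frame index to seek to, assuming a constant frame rate.
            hits.append((i, start, math.floor(start * fps)))
    return hits


cues = [(0.0, 1.5, "It is late."), (2.0, 3.0, "I think so."), (4.0, 5.0, "You and I agree.")]
print(find_word(cues, "I", 30))  # [(1, 2.0, 60), (2, 4.0, 120)]
```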
The multimedia file translation device of this embodiment of the present invention obtains the original voice file in a multimedia file, translates it into a new speech file in a specified language, and configures the loading attribute of the new speech text file so that the new speech file is loaded synchronously when the multimedia file is played. The original voice file is thereby converted automatically into a voice file in another language without human translation, helping the user understand and identify the audio and video content of the multimedia file better and in a timely manner.
In one embodiment, a translation playback device is also provided, comprising a memory, a processor and an application program, the application program being stored in the memory and configured to be executed by the processor to perform the method of any of the above embodiments. Translation playback devices include intelligent translation playback devices such as video translation players and language learning devices.
The above description is only a preferred embodiment of the present invention and does not limit its scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A multimedia file translation method, characterized by comprising:
obtaining the original voice file in a multimedia file;
translating the original voice file to obtain a new speech file, the language in the new speech file being a specified language;
configuring the loading attribute of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played.
2. The multimedia file translation method according to claim 1, characterized in that the original voice file includes an original audio file, and the step of obtaining the original voice file in the multimedia file comprises:
detecting the start point and end point of the speech of each person in the multimedia file;
taking the voice segment between the start point and end point of each person's speech as an original audio file, the original audio file being a first original voice file.
3. The multimedia file translation method according to claim 2, characterized in that the original voice file further includes an original speech text file, and the step of obtaining the original voice file in the multimedia file comprises:
converting the original audio file into the original speech text file, the original speech text file being a second original voice file.
4. The multimedia file translation method according to claim 2, characterized in that, after the step of taking the voice segment between the start point and end point of each person's speech as an original audio file, the method comprises:
detecting the format of the original audio file;
judging whether the format of the original audio file is PCM;
if not, changing the format of the original audio file to PCM.
5. The multimedia file translation method according to claim 3, characterized in that the step of translating the original voice file to obtain a new speech file, the language in the new speech file being a specified language, comprises:
translating the original speech text file to obtain a translated new speech text file, the new speech text file being a first new speech file.
6. The multimedia file translation method according to claim 5, characterized in that the step of translating the original voice file to obtain a new speech file, the language in the new speech file being a specified language, comprises:
performing speech synthesis on each new speech text file to obtain a new audio file, the new audio file being a second new speech file.
7. The multimedia file translation method according to claim 6, characterized in that, after the step of configuring the loading attribute of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played, the method comprises:
receiving a play selection signal, the play selection signal being used to select one or more of the new speech text file and the original speech text file and to select one of the original audio file and the new audio file, at least one of the new speech text file and the new audio file being played;
playing according to the play selection signal.
8. The multimedia file translation method according to claim 6, characterized in that, after the step of translating the original voice file to obtain a new speech file, the language in the new speech file being a specified language, the method comprises:
receiving lookup information, the lookup information being text or a sentence in any one of the original audio file, the original speech text file, the new speech text file and the new audio file;
playing the lookup result corresponding to the lookup signal.
9. A multimedia file translation device, characterized by comprising:
an acquisition module, configured to obtain the original voice file in a multimedia file;
a translation module, configured to translate the original voice file to obtain a new speech file, the language in the new speech file being a specified language;
a configuration module, configured to configure the loading attribute of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played.
10. A translation playback device, comprising a memory, a processor and an application program, the application program being stored in the memory and configured to be executed by the processor, characterized in that the application program is configured to perform the method according to any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811543822.9A CN109658919A (en) | 2018-12-17 | 2018-12-17 | Interpretation method, device and the translation playback equipment of multimedia file |
PCT/CN2019/073767 WO2020124754A1 (en) | 2018-12-17 | 2019-01-29 | Multimedia file translation method and apparatus, and translation playback device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811543822.9A CN109658919A (en) | 2018-12-17 | 2018-12-17 | Interpretation method, device and the translation playback equipment of multimedia file |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109658919A true CN109658919A (en) | 2019-04-19 |
Family ID: 66114701
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811543822.9A Pending CN109658919A (en) | 2018-12-17 | 2018-12-17 | Interpretation method, device and the translation playback equipment of multimedia file |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109658919A (en) |
WO (1) | WO2020124754A1 (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1196636A (en) * | 1998-01-15 | 1998-10-21 | 英业达股份有限公司 | Interactive image sychronization captions display device and display method |
CN1201215A (en) * | 1998-01-15 | 1998-12-09 | 英业达股份有限公司 | Display device for interactive image synchronous captions and displaying method therefor |
KR20140121516A (en) * | 2013-04-05 | 2014-10-16 | 이현철 | System and method for offering real-time translated subtitles |
US20150373428A1 (en) * | 2014-06-20 | 2015-12-24 | Google Inc. | Clarifying Audible Verbal Information in Video Content |
CN105704579A (en) * | 2014-11-27 | 2016-06-22 | 南京苏宁软件技术有限公司 | Real-time automatic caption translation method during media playing and system |
CN105848004A (en) * | 2016-05-16 | 2016-08-10 | 乐视控股(北京)有限公司 | Caption playing method and caption playing device |
CN106303695A (en) * | 2016-08-09 | 2017-01-04 | 北京东方嘉禾文化发展股份有限公司 | Audio translation multiple language characters processing method and system |
CN207302623U (en) * | 2017-07-26 | 2018-05-01 | 安徽听见科技有限公司 | A kind of remote speech processing system |
CN108780643A (en) * | 2016-11-21 | 2018-11-09 | 微软技术许可有限责任公司 | Automatic dubbing method and apparatus |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8515749B2 (en) * | 2009-05-20 | 2013-08-20 | Raytheon Bbn Technologies Corp. | Speech-to-speech translation |
CN103226947B (en) * | 2013-03-27 | 2016-08-17 | 广东欧珀移动通信有限公司 | A kind of audio-frequency processing method based on mobile terminal and device |
CN104683873A (en) * | 2013-11-27 | 2015-06-03 | 英业达科技有限公司 | Multimedia play system and multimedia play method |
CN104244081B (en) * | 2014-09-26 | 2018-10-16 | 可牛网络技术(北京)有限公司 | The providing method and device of video |
CN104681049B (en) * | 2015-02-09 | 2017-12-22 | 广州酷狗计算机科技有限公司 | The display methods and device of prompt message |
CN108289244B (en) * | 2017-12-28 | 2021-05-25 | 努比亚技术有限公司 | Video subtitle processing method, mobile terminal and computer readable storage medium |
- 2018-12-17: CN201811543822.9A (CN109658919A), active, Pending
- 2019-01-29: PCT/CN2019/073767 (WO2020124754A1), active, Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110121097A (en) * | 2019-05-13 | 2019-08-13 | 深圳市亿联智能有限公司 | Multimedia playing apparatus and method with accessible function |
CN110335610A (en) * | 2019-07-19 | 2019-10-15 | 北京硬壳科技有限公司 | The control method and display of multimedia translation |
CN110471659A (en) * | 2019-08-16 | 2019-11-19 | 珠海格力电器股份有限公司 | Multilingual method and system, human-machine interface configuration software end and equipment end |
CN110471659B (en) * | 2019-08-16 | 2023-07-21 | 珠海格力电器股份有限公司 | Multilingual implementation method and system, man-machine interface configuration software end and equipment end |
CN115066908A (en) * | 2019-12-09 | 2022-09-16 | 金京喆 | User terminal and control method thereof |
CN114007116A (en) * | 2022-01-05 | 2022-02-01 | 凯新创达(深圳)科技发展有限公司 | Video processing method and video processing device |
Also Published As
Publication number | Publication date |
---|---|
WO2020124754A1 (en) | 2020-06-25 |
Similar Documents
Publication | Title |
---|---|
CN109658919A (en) | Interpretation method, device and the translation playback equipment of multimedia file |
US6314398B1 (en) | Apparatus and method using speech understanding for automatic channel selection in interactive television |
US8497939B2 (en) | Method and process for text-based assistive program descriptions for television |
US20080195386A1 (en) | Method and a Device For Performing an Automatic Dubbing on a Multimedia Signal |
CN108133632B (en) | The training method and system of English Listening Comprehension |
WO2014161282A1 (en) | Method and device for adjusting playback progress of video file |
CN111462553B (en) | Language learning method and system based on video dubbing and sound correction training |
KR102044689B1 (en) | System and method for creating broadcast subtitle |
CN101753915A (en) | Data processing device, data processing method, and program |
CN1722803A (en) | Method and apparatus for navigating through subtitles of an audio video data stream |
CN109036372A (en) | A kind of voice broadcast method, apparatus and system |
KR20060087144A (en) | A multimedia player and the multimedia-data search way using the player |
US20110103768A1 (en) | Information processing apparatus, scene search method, and program |
JP2008299032A (en) | Linguistic training aid, and character data regenerator |
JP2006337490A (en) | Content distribution system |
KR101618777B1 (en) | A server and method for extracting text after uploading a file to synchronize between video and audio |
KR102232642B1 (en) | Media play device and voice recognition server for providing sound effect of story contents |
CN109065018A (en) | A kind of narration data processing method and system towards intelligent robot |
JP2002056006A (en) | Video/voice retrieving device |
CN113808593A (en) | Voice interaction system, related method, device and equipment |
US20110165541A1 (en) | Reviewing a word in the playback of audio data |
JP2006074514A (en) | Image editing device, image reproducing device, file database, file distributing server, image editing method, image editing program, image reproducing method, image reproducing program, and computer readable storage medium |
KR101103329B1 (en) | Play method of language learning system |
JP2005038014A (en) | Information presentation device and method |
CN109033099A (en) | A kind of multi-media management method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190419 |