CN109658919A

CN109658919A - Translation method and device for multimedia files, and translation playback equipment

Info

Publication number: CN109658919A
Application number: CN201811543822.9A
Authority: CN
Inventors: 郑勇; 孙俊; 王文祺; 杨汉丹; 杜志华; 温平; 王辉
Original assignee: Shenzhen Water World Co Ltd
Current assignee: Shenzhen Water World Co Ltd
Priority date: 2018-12-17
Filing date: 2018-12-17
Publication date: 2019-04-19
Also published as: WO2020124754A1

Abstract

The present invention discloses a translation method and device for multimedia files, and a translation playback equipment. The method includes: obtaining the original voice file in a multimedia file; translating the original voice file to obtain a new voice file whose language is a specified language; and configuring the loading attribute of the new voice file so that the new voice file is loaded synchronously when the multimedia file is played. The invention thereby converts the original voice file in a multimedia file automatically into a file in another language.

Description

Translation method and device for multimedia files, and translation playback equipment

Technical field

The present invention relates to the field of computer technology, and in particular to a translation method and device for multimedia files and to a translation playback equipment.

Background art

With the rapid development of computer technology, more and more users play multimedia files with media players. When a multimedia file is played, prompt information corresponding to it usually needs to be displayed. For example, a user playing a song may want the corresponding lyrics shown at the same time, and a user watching a film may want the corresponding subtitles shown. Because the prompt information may consist of characters in different languages, problems arise: for a user whose native language is Chinese and whose English is poor, listening to an English song, the music player can display the English lyrics, but those lyrics convey only limited information to that user.

Multimedia audio and video content currently on the market is first translated into different languages manually. The translated subtitles are then superimposed onto the video pictures by subtitle overlay, and the audio is likewise translated by hand and synchronized with the video pictures. This means that when a user watches multimedia content in another language that has not first been translated manually, the content can only be played with subtitles and speech in that other language, and the user has difficulty understanding it.

Summary of the invention

The main object of the present invention is to provide a translation method and device for multimedia files and a translation playback equipment that solve the problem that a user cannot understand or identify the content of video or audio in another language in a multimedia file.

To achieve the above object, the present invention proposes a translation method for multimedia files, comprising:

obtaining the original voice file in a multimedia file;

translating the original voice file to obtain a new voice file, the language of the new voice file being a specified language;

configuring the loading attribute of the new voice file so that the new voice file is loaded synchronously when the multimedia file is played.

Further, the original voice file includes an original audio file, and the step of obtaining the original voice file in the multimedia file comprises:

detecting the start point and end point of the voice of each person in the multimedia file;

taking the voice segment between the start point and end point of each person's voice as an original audio file, the original audio file being the first original voice file.

Further, the original voice file also includes an original speech text file, and the step of obtaining the original voice file in the multimedia file comprises:

converting the original audio file into the original speech text file, the original speech text file being the second original voice file.

Further, after the step of taking the voice segment between the start point and end point of each person's voice as an original audio file, the method comprises:

detecting the format of the original audio file;

judging whether the format of the original audio file is PCM;

if not, converting the original audio file to PCM format.

Further, the step of translating the original voice file to obtain a new voice file in the specified language comprises:

translating the original speech text file to obtain a translated new speech text file, the new speech text file being the first new voice file.

Further, the step of translating the original voice file to obtain a new voice file in the specified language also comprises:

performing speech synthesis on each new speech text file to obtain a new audio file, the new audio file being the second new voice file.

Further, after the step of configuring the loading attribute of the new voice file so that the new voice file is loaded synchronously when the multimedia file is played, the method comprises:

receiving a play selection signal, which selects one or more of the new speech text file and the original speech text file for playing and selects one of the original audio file and the new audio file for playing, with at least one of the new speech text file and the new audio file being played;

playing according to the play selection signal.

Further, after the step of translating the original voice file to obtain a new voice file in the specified language, the method comprises:

receiving lookup information, the lookup information being a word or sentence in any one of the original audio file, the original speech text file, the new speech text file and the new audio file;

playing the corresponding lookup result according to the lookup information.

The present invention also provides a translation device for multimedia files, comprising:

an obtaining module, configured to obtain the original voice file in a multimedia file;

a translation module, configured to translate the original voice file to obtain a new voice file, the language of the new voice file being a specified language;

a configuration module, configured to configure the loading attribute of the new voice file so that the new voice file is loaded synchronously when the multimedia file is played.

A translation playback equipment, comprising a memory, a processor and an application program, the application program being stored in the memory and configured to be executed by the processor, and being configured to perform any of the methods described above.

With the translation method and device for multimedia files and the translation playback equipment of the embodiments of the present invention, the original voice file in a multimedia file is obtained and translated into a new voice file in the specified language, and the loading attribute of the new voice file is configured so that the new voice file is loaded synchronously when the multimedia file is played. The original voice file is thus converted automatically, without human translation, into a voice file in another language, which helps the user understand and identify the audio and video content of the multimedia file better and in time.

Brief description of the drawings

Fig. 1 is a flow diagram of the translation method for multimedia files according to an embodiment of the invention;

Fig. 2 is a schematic block diagram of part of the structure of the translation device for multimedia files according to an embodiment of the invention;

Fig. 3 is a schematic block diagram of the structure of the obtaining module according to an embodiment of the invention;

Fig. 4 is a schematic block diagram of the structure of the obtaining module according to an embodiment of the invention;

Fig. 5 is a schematic block diagram of part of the structure of the translation device for multimedia files according to an embodiment of the invention;

Fig. 6 is a schematic block diagram of part of the structure of the translation device for multimedia files according to an embodiment of the invention;

Fig. 7 is a schematic block diagram of the structure of the translation module according to an embodiment of the invention;

Fig. 8 is a schematic block diagram of part of the structure of the translation device for multimedia files according to an embodiment of the invention;

Fig. 9 is a schematic diagram of an audio file with detected and labelled voice segments according to an embodiment of the invention.

The realization of the object, the functional features and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Detailed description of the embodiments

It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting the claims.

Those skilled in the art will appreciate that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having meanings consistent with their meanings in the context of the prior art and, unless specifically defined as here, are not to be interpreted in an idealized or overly formal sense.

Referring to Fig. 1, an embodiment of the present invention provides a translation method for multimedia files, comprising the steps of:

S10, obtaining the original voice file in a multimedia file;

S20, translating the original voice file to obtain a new voice file, the language of the new voice file being a specified language;

S30, configuring the loading attribute of the new voice file so that the new voice file is loaded synchronously when the multimedia file is played.
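Steps S10 to S30 can be pictured as a per-segment pipeline. The sketch below is a minimal illustration only, assuming that speech recognition, machine translation and speech synthesis are available as plain callables; `transcribe`, `translate` and `synthesize` are hypothetical placeholders, not a real API, and the data class names are illustrative. Each translated segment keeps the start and end times of its original segment, which is what lets the player load it in sync in step S30.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    start: float      # start of the voice segment, seconds
    end: float        # end of the voice segment, seconds
    audio: bytes      # original audio segment (PCM)
    text: str = ""    # original subtitle text, empty if the file has none


@dataclass
class TranslatedSegment:
    start: float
    end: float
    original_text: str
    new_text: str     # first new voice file: translated subtitle
    new_audio: bytes  # second new voice file: synthesized speech


def translate_segments(
    segments: List[Segment],
    transcribe: Callable[[bytes], str],       # hypothetical ASR hook
    translate: Callable[[str, str], str],     # hypothetical MT hook: (text, lang)
    synthesize: Callable[[str, str], bytes],  # hypothetical TTS hook: (text, lang)
    target_lang: str,
) -> List[TranslatedSegment]:
    result = []
    for seg in segments:
        # S10: reuse existing subtitles, otherwise recognize the speech
        text = seg.text or transcribe(seg.audio)
        # S20: translate the text, then synthesize speech in the target language
        new_text = translate(text, target_lang)
        new_audio = synthesize(new_text, target_lang)
        # S30 input: keep the original timing so the player can load it in sync
        result.append(TranslatedSegment(seg.start, seg.end, text, new_text, new_audio))
    return result
```

Passing the three engines in as callables keeps the sketch independent of any particular recognition, translation or synthesis service.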

The above method is applied to a translation playback equipment, typically an intelligent translation playback device such as a video translation player or an audio translation player. This embodiment is explained using a video translation player as an example; such a player can play video files and audio files, display subtitles, and so on. As described in step S10, the multimedia file includes the original voice file, video files, a header file and so on. The original voice file includes at least one of a first original voice file (the original audio file) and a second original voice file (the original speech text file). For clarity, this embodiment is described for the case in which both are present. Note that the original speech text file is the subtitle file in the multimedia file, and the original audio file is the sound file.

As described in step S20, the original voice file may be in a language other than the user's native language, and the new voice file is in the specified language that the user wants to watch, which need not be the user's native language. The new voice file includes a second new voice file (the new audio file) and a first new voice file (the new speech text file).

As described in step S30, after the new voice file is obtained, it needs to be loaded synchronously during playback so that the user can understand the content of the multimedia file.

In this embodiment, the new voice file includes the new audio file and the new speech text file, and the original voice file includes the original audio file and the original speech text file. These can be displayed and used in various combinations, which further improves the user's learning and understanding of the video file. For example, displaying the new speech text file helps the user understand the multimedia file, and playing the new speech text file together with the original speech text file further helps the user learn and identify the language and pronunciation in the multimedia file. Playing the new audio file helps the user understand the video file, and playing the new speech text file, the original speech text file and the original audio file together helps the user learn and identify the pronunciation in the multimedia file.

The original voice file in the multimedia file includes an original audio file, and step S10 of obtaining the original voice file in the multimedia file comprises:

detecting the start point and end point of the voice of each person in the original audio file;

taking the voice segment between the start point and end point of each person's voice as an original audio file, the original audio file being the first original voice file.

In this embodiment, the original voice file contains multiple audio objects, such as ambient noise, human voices and sounds made by animals. When the original speech is analysed, only human voice signals are detected; ambient noise, gunshots and sounds made by animals are not. The endpoints of human speech in the audio file are detected by voice activity detection (VAD). Within one original audio file a person does not speak continuously from beginning to end, so the detected start point and end point of a voice signal delimit one continuous segment of the original voice file, which constitutes one original audio file (i.e., a first original voice file). Original audio files include segments detected when persons speak individually, in which case one continuous utterance by one person is one original audio file and the next person's utterance is another; they also include segments formed by several persons speaking at the same time. In this embodiment, preferably, the voice segments of different persons do not overlap, that is, each original audio file is formed by one person speaking alone. Because the timbre and pitch of a single speaker vary little, such segments are easier to detect, the detected original audio files are more accurate, and the labelled start and end points of the speech are less likely to be wrong.
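The embodiment names voice activity detection (VAD) but does not say which detector is used. As a rough illustration only, the sketch below substitutes a simple short-time energy threshold: it marks 20 ms frames whose mean energy exceeds a threshold as speech and closes a segment after a run of silent frames. The function name, frame length, threshold and gap length are all assumptions, and the sketch does not separate speakers, which the per-person segmentation described above would additionally require.

```python
from typing import List, Sequence, Tuple


def detect_voice_segments(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,        # assumed frame length
    threshold: float = 0.01,   # assumed energy threshold for "speech"
    min_gap_frames: int = 5,   # silent frames needed to end a segment
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) pairs of stretches whose frame energy exceeds threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len

    # Short-time energy per frame, reduced to a voiced/unvoiced flag
    active = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        active.append(energy > threshold)

    segments = []
    start = None
    silence = 0
    for i, is_voice in enumerate(active):
        if is_voice:
            if start is None:
                start = i          # start point of a new voice segment
            silence = 0
        elif start is not None:
            silence += 1
            if silence > min_gap_frames:
                # Silence lasted long enough: end point is after the last voiced frame
                end = i - silence + 1
                segments.append((start * frame_len / sample_rate,
                                  end * frame_len / sample_rate))
                start = None
                silence = 0
    if start is not None:
        end = n_frames - silence
        segments.append((start * frame_len / sample_rate,
                         end * frame_len / sample_rate))
    return segments
```

Each returned pair corresponds to one candidate original audio file; a production system would typically use a trained VAD model instead of a fixed energy threshold.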

The original voice file further includes an original speech text file, and step S10 of obtaining the original voice file in the multimedia file comprises:

converting the original audio file into the original speech text file, the original speech text file being the second original voice file.

As described above, the original speech text file is the subtitle file in the original voice file. The original voice file contains one or both of the original speech text file and the original audio file. In this embodiment, when only the original audio file is present, it can be converted into the original speech text file by the above step. This addresses cases where the speech in the original voice file is too fast or contains non-standard pronunciation that the user can hardly follow by listening alone; the original speech text file then gives the user a first level of understanding and further improves comprehension of the video file. If both are present, the original speech text file can be obtained directly, which saves time.

After the step of taking the voice segment between the start point and end point of each person's voice as an original audio file, the method comprises:

detecting the format of the original audio file;

judging whether the format of the original audio file is PCM;

if not, converting the original audio file to PCM format.

As described in the above steps, when the detected format of the original audio file is not PCM, the video translation player preferably converts the detected original audio file into a voice file in PCM format. PCM (Pulse Code Modulation) turns an analog signal such as sound into a train of coded pulses, which are then recorded. A PCM signal is a digital signal made up of the symbols 1 and 0. Compared with an analog signal, it is less susceptible to noise and distortion in the transmission system, and it achieves good sound quality over a wide dynamic range. In addition, a PCM track, unlike a video track, can be used for post-recording (dubbing). A PCM audio file is the binary sequence obtained directly by analog-to-digital (A/D) conversion of the analog audio signal, so the video translation player can decode it accurately. The original audio file may initially be in any of several formats, such as PCM, WMV, MP4, DAT or RM; the audio format parsed in this embodiment is preferably PCM.
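The format check and the A/D step can be illustrated for the common case of WAV input. The sketch below assumes the original audio has already been extracted into a RIFF/WAVE byte string; it reads the format code in the `fmt ` chunk (code 1 is integer PCM) and shows how sample values in [-1.0, 1.0] map to signed 16-bit PCM codes. The function names are hypothetical. Files using the WAVE_FORMAT_EXTENSIBLE tag, or compressed containers such as MP4 or RM, would need a real decoder (for example FFmpeg) for the actual conversion, which is not shown.

```python
import struct
from typing import Sequence

WAVE_FORMAT_PCM = 0x0001


def wav_format_tag(data: bytes) -> int:
    """Return the audio format code from the 'fmt ' chunk of a RIFF/WAVE byte string."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        if chunk_id == b"fmt ":
            (fmt_tag,) = struct.unpack("<H", data[pos + 8:pos + 10])
            return fmt_tag
        # RIFF chunks are word-aligned: an odd-sized chunk carries one pad byte
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'fmt ' chunk found")


def is_pcm(data: bytes) -> bool:
    """Judge whether the original audio file is plain PCM."""
    return wav_format_tag(data) == WAVE_FORMAT_PCM


def quantize_pcm16(samples: Sequence[float]) -> bytes:
    """A/D quantization: map samples in [-1.0, 1.0] to signed 16-bit little-endian PCM."""
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))           # clip to full scale
        codes.append(int(round(x * 32767)))  # nearest 16-bit code
    return struct.pack("<%dh" % len(codes), *codes)
```

Samples produced by `quantize_pcm16` can be written with Python's standard `wave` module, which itself only writes integer PCM.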

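Step S20 of translating the original voice file to obtain a new voice file in the specified language comprises:

translating the original speech text file to obtain the translated new speech text file, the new speech text file being the first new voice file;

performing speech synthesis on each new speech text file to obtain a new audio file, the new audio file being the second new voice file.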

In this embodiment, the first new voice file is the new speech text file, i.e., the translated subtitle file, through which the user can understand the content of the video file more easily.

In this embodiment, the new audio file (speech) can be obtained by converting the new speech text file (subtitles), and multiple new audio segments can be synthesized into a complete new voice file. During playback of the multimedia file, each new audio file is played according to the play time of the corresponding original audio file. During playback, the new voice file and the new speech text file may entirely replace the original voice file and the original speech text file, or some of the new audio files may replace the corresponding original audio files in the original voice file; this is not described in detail here. When the user watches a video, playing the new audio file, the new speech text file and the original speech text file together further helps the user fully understand the video content. When the video translation player plays the original audio file, the new speech text file and the original speech text file, watching the video together with the original speech text file (original subtitles), the new speech text file (translated subtitles) and the synchronously displayed picture (the speaker's mouth shapes) helps the user learn the new language.

After step S30 of configuring the loading attribute of the new voice file so that the new voice file is loaded synchronously when the multimedia file is played, the method comprises:

receiving a play selection signal;

playing according to the play selection signal.

In this embodiment, the play selection signal may be issued by the user or selected automatically by the video player. Users choose one or more files to play according to their own command of the language in the multimedia file or their preferences, which improves the user experience. For example, a user with a good command of the language in the original audio file can choose to play the original audio file and the new speech text file, improving their listening ability in that language while watching. A user with a weaker command of the language can choose to play the original audio file, the original speech text file and the new speech text file, and learn the pronunciation, sentences and meaning of the language from the original audio file, the original speech text file (original subtitles), the new speech text file (translated subtitles) and the synchronously displayed picture (the speaker's mouth shapes). A user who does not want to learn the language and only wants to understand the content can choose to play the new audio file (translated speech) and the new speech text file, and thereby fully understand the content of the multimedia file.
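The constraints on the play selection signal stated in the summary (one or more subtitle tracks, exactly one audio track, and at least one of the new subtitles or the new audio) can be checked in a few lines. This is a hypothetical encoding of the signal as three fields; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaySelection:
    audio: str                # "original" or "new": exactly one audio track
    show_original_text: bool  # original subtitles (original speech text file)
    show_new_text: bool       # translated subtitles (new speech text file)


def is_valid_selection(sel: PlaySelection) -> bool:
    # Exactly one of the original audio file and the new audio file
    if sel.audio not in ("original", "new"):
        return False
    # One or more of the original and new speech text files
    if not (sel.show_original_text or sel.show_new_text):
        return False
    # At least one of the new speech text file and the new audio file
    return sel.audio == "new" or sel.show_new_text
```

The three viewing styles described above (listening practice, language learning, content only) correspond to `PlaySelection("original", False, True)`, `PlaySelection("original", True, True)` and `PlaySelection("new", False, True)`, all of which pass the check.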

Step S30 of configuring the loading attribute of the new voice file so that the new voice file is loaded synchronously when the multimedia file is played further comprises:

obtaining the play duration of each original audio file and the play duration of each corresponding new audio file;

judging whether the play duration of each original audio file is greater than that of the corresponding new audio file;

if greater, selecting to play the corresponding new audio file;

if less, selecting to play the corresponding original audio file.
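The duration rule in the steps above reduces to one comparison per segment. A minimal sketch with a hypothetical function name and durations in seconds; the equal-duration branch reflects the case described in this embodiment where start and end times coincide and either track may be used.

```python
def choose_audio(original_duration: float, new_duration: float) -> str:
    """Pick which audio segment to play, given the two durations in seconds."""
    if original_duration > new_duration:
        return "new"       # translated speech fits inside the original time slot
    if original_duration < new_duration:
        return "original"  # translated speech would overrun; keep the original
    return "either"        # equal durations: the player or the user decides
```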

In this embodiment, the following cases should be noted. When the original audio file is played, the new speech text file (subtitles in the translated language) starts and stops displaying at the start and end times of the corresponding original voice file. The video translation player can then play the original audio file, the original speech text file and the new speech text file, and can also accept the user's choice to play only one or several of these files. When the duration of the original audio file is greater than that of the voice segment output by synthesizing the translated new audio file, the start time of each translated new audio segment is aligned with the start time of the corresponding original audio file, and the video player can automatically choose to output the new audio file rather than the original audio file. The new speech text file (subtitles in the translated language) then starts displaying at the start time of the corresponding original speech text file (original subtitles) and stops at the end time of the corresponding synthesized voice segment. The video translation player can then play the new audio file, the new speech text file and the original speech text file; after the player has made its selection for the new multimedia file, it can also accept the user's choice to play only one or several of these files. When the duration of the original audio file equals that of the voice segment output by synthesizing the new audio file, i.e., the start and end times of the original audio file coincide with those of the new audio file, the player can play the new audio file or the new speech text file, or one or more of the original voice file and the original speech text file, and can accept the user's choice of one or more files to play. For example, the multimedia file may be a GIF file.

Note that in one embodiment, the loading attribute is obtained by parsing the original voice file and the new audio file and loading the playback timing information. Specifically, a multimedia file mostly consists of the original voice file, video files, a header file and so on, so the header file can be played before the video file. When played, the video file has a synchronization time relative to the multimedia file, namely the time taken to play the header file. A multimedia file generally contains K original voice files and M video files. An original audio file contains multiple original audio segments with intervals between them, and the start time Ts11 and end time Te11 of each segment are labelled (see Fig. 9: on a time axis, starting from its origin, the first voice segment runs from Ts11 to Te11, the second from Ts12 to Te12, ..., and the n-th from Ts1n to Te1n). Adding the synchronization time Toffset1 of the file relative to multimedia playback to the start time Ts11 and end time Te11 of each original audio segment gives the play times, relative to the system, of the n first original voice files in one original voice file: Toffset1 + Ts11, Toffset1 + Ts12, ..., Toffset1 + Ts1n. After the first original audio file has been processed, the other K - 1 original audio files are processed in turn, giving the information of all first original voice files in the K audio files and their play times relative to the multimedia file, as follows:

Time information of the original audio file of the 1st voice file: Toffset1 + Ts11, Toffset1 + Ts12, ..., Toffset1 + Ts1n;

Time information of the original audio file of the 2nd voice file: Toffset2 + Ts21, Toffset2 + Ts22, ..., Toffset2 + Ts2n;

Time information of the original audio file of the K-th voice file: Toffsetk + Tsk1, Toffsetk + Tsk2, ..., Toffsetk + Tskl, where Toffsetk is the play time of the K-th voice file relative to the system, Tsk1 is the start time of the first voice segment of the K-th original voice file, and Tskl is the start time of the last voice segment l of the K-th original voice file. A multimedia file can thus be regarded as containing Y audio files, and the start and end times of each of the Y audio files relative to the system play time are recorded.
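The offset arithmetic above is a per-file shift: every segment time Tsk1, Tsk2, ... of file k is moved by that file's offset Toffsetk. A sketch assuming each original audio file is given as its offset plus a list of (start, end) pairs in seconds; the function name and data layout are illustrative.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) of one voice segment, seconds


def absolute_segment_times(files: List[Tuple[float, List[Span]]]) -> List[List[Span]]:
    """files[k] = (Toffset_k, [(Ts_k1, Te_k1), ...]); return times relative to the system."""
    return [[(offset + ts, offset + te) for ts, te in spans] for offset, spans in files]
```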

Each audio file is converted into an original speech text file, and each original speech text file is tagged with its time information relative to system playback, in one-to-one correspondence with the audio files, giving Y original speech text files. The Y original speech text files are translated into Y new speech text files, which are synthesized into Y new audio files, and the duration Tr of each new audio file is obtained, where r is an integer with 1 ≤ r ≤ Y. The Y new audio files replace the original audio files one by one, with the start time of each new audio file aligned to that of the corresponding original audio file, and the display of the new speech text file is synchronized with the new voice file and the video frames. When the duration of an original audio file differs from the output duration of the corresponding new audio file, there are two cases:

A) Only the original audio file is output, not the new audio file. The new speech text file is displayed from the start time to the end time of the corresponding original audio file. If the Z-th new audio file of the N-th new voice file has start time ToffsetN + TSZ and end time ToffsetN + TEZ, the corresponding video frames start at ToffsetN, and the subtitle stays on screen for (TEZ - TSZ) × video frame rate frames. The video frame rate is determined by the codec format of the multimedia file, for example 30 frames per second.

B) The new audio file is output, not the original audio file. The new speech text file is displayed from the start time of the corresponding original speech text file until the end time of the corresponding synthesized voice segment in the target language. If the Z-th voice segment of the N-th audio file starts at ToffsetN + TSZ and the new audio file corresponding to that original audio file has duration Tr, the corresponding video frames start at ToffsetN, and the subtitle stays on screen for Tr × video frame rate frames (with r = Z). The video frame rate is determined by the codec format of the multimedia file, for example 30 frames per second.
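Cases A and B differ only in which duration sets how long the subtitle stays on screen. The sketch below computes the frame counts, assuming times in seconds and rounding to the nearest whole frame (the rounding rule is not specified in the text). The helper that turns the subtitle start time ToffsetN + TSZ into a frame index is an added assumption, and all names are hypothetical.

```python
import math


def subtitle_frames_case_a(ts: float, te: float, fps: float) -> int:
    """Case A: original audio is played; subtitle lasts (TE - TS) * fps frames."""
    return round((te - ts) * fps)


def subtitle_frames_case_b(tr: float, fps: float) -> int:
    """Case B: new audio is played; subtitle lasts Tr * fps frames."""
    return round(tr * fps)


def subtitle_first_frame(toffset: float, ts: float, fps: float) -> int:
    """Assumed helper: index of the video frame on screen at time Toffset + TS."""
    return math.floor((toffset + ts) * fps)
```

At 30 frames per second, a segment from 2.0 s to 4.5 s gives 75 subtitle frames in case A, and a synthesized segment of 3.2 s gives 96 frames in case B.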

When the duration of the original audio file equals the output duration of the new audio file, as in an animated GIF file, the new voice file and the new speech text file can completely or partially replace the original voice file and the original speech text file. The choice can be made by the user or automatically by the playback system, which improves the user experience.

After step S20 of translating the original voice file to obtain a new voice file in the specified language, the method comprises:

receiving lookup information;

playing the corresponding lookup result according to the lookup information.

In this embodiment, the user inputs a key sentence or word to be retrieved: speech that occurred in the original audio file or in the new audio file, or a key sentence or word that occurred in the original speech text file or the new speech text file. All original speech text files and original audio files, and all translated new speech text files and new audio files, are then matched against it character by character and audio file by audio file, to obtain the speech text file corresponding to the key sentence or word, the corresponding voice segment file (original audio file or new audio file), and the position of the corresponding video frames in the multimedia file. The multimedia segment corresponding to the key sentence or word can then be played.
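The lookup step can be sketched as a substring search over timed subtitle cues. This sketch assumes every audio segment is indexed through its transcript (the original audio through the original speech text file, the new audio through the new speech text file, with shared timing), so a text match also locates the corresponding audio and video frames; how the audio itself is matched is not specified above. `Cue` and `find_cues` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cue:
    start: float  # seconds, relative to the multimedia file
    end: float
    text: str


def find_cues(tracks: Dict[str, List[Cue]], query: str) -> List[Tuple[str, float, float]]:
    """Return (track_name, start, end) for every cue whose text contains the query."""
    q = query.casefold()
    hits = []
    for name, cues in tracks.items():
        for cue in cues:
            if q in cue.text.casefold():
                hits.append((name, cue.start, cue.end))
    # Order by position in the file so the player can jump to the earliest match
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

The returned start time can be used to seek the player to the matching multimedia segment.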

With the translation method for multimedia files of the embodiments of the present invention, the original voice file in a multimedia file is obtained and translated into a new voice file in the specified language, and the loading attribute of the new voice file is configured so that the new voice file is loaded synchronously when the multimedia file is played. The original voice file is thus converted automatically, without human translation, into a voice file in another language, which helps the user understand and identify the audio and video content of the multimedia file better and in time.

Referring to Fig. 2, this embodiment provides a translation device for multimedia files, comprising:

an obtaining module 10, configured to obtain the original voice file in a multimedia file;

a translation module 20, configured to translate the original voice file to obtain a new voice file, the language of the new voice file being a specified language;

a configuration module 30, configured to configure the loading attribute of the new voice file so that the new voice file is loaded synchronously when the multimedia file is played.

The above device is applied to a translation playback equipment, typically an intelligent translation playback device such as a video translation player or an audio translation player. This embodiment is explained using a video translation player as an example; such a player can play video files and audio files, display subtitles, and so on. Regarding the obtaining module 10, the multimedia file includes the original voice file, video files, a header file and so on. The obtaining module 10 obtains the original voice file, which includes at least one of a first original voice file (the original audio file) and a second original voice file (the original speech text file). For clarity, this embodiment is described for the case in which both are present. Note that the original speech text file is the subtitle file in the multimedia file, and the original audio file is the sound file.

For the translation module 20: the original voice file may be in a language other than the user's native language, and the new speech file need not be in the user's native language either, because the translation module 20 can produce a file in whichever designated language the user wants to watch. The new speech file includes a second new speech file (the new audio file) and a first new speech file (the new speech text file).

For the configuration module 30: once the new speech file has been obtained, the configuration module 30 configures it to be loaded synchronously during playback, so that the user can understand the content of the multimedia file.
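The patent does not define the format of the "load attribute". Under that caveat, the minimal Python sketch below chains the three modules of Fig. 2: the original voice file is modeled as timed text entries, `translate` is any caller-supplied function (for example, a wrapper around a machine-translation service), and the load attribute is assumed to be a flag plus a time offset that tells the player to load the new track together with the media. All names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Entry = Tuple[float, float, str]  # (start_s, end_s, text)


@dataclass
class MultimediaFile:
    original_voice: List[Entry]                      # what acquisition module 10 reads
    extra_tracks: List[Dict] = field(default_factory=list)


def acquire(media: MultimediaFile) -> List[Entry]:
    """Acquisition module 10: return a copy of the original voice entries."""
    return list(media.original_voice)


def translate_entries(entries: List[Entry], translate: Callable[[str], str]) -> List[Entry]:
    """Translation module 20: keep each entry's timing, translate its text."""
    return [(start, end, translate(text)) for start, end, text in entries]


def configure_load(media: MultimediaFile, entries: List[Entry], language: str) -> Dict:
    """Configuration module 30: attach the new track with an assumed load attribute."""
    track = {
        "language": language,
        "entries": entries,
        # Hypothetical reading of "load attribute": load with the media, no time shift.
        "load": {"with_media": True, "offset_s": 0.0},
    }
    media.extra_tracks.append(track)
    return track


def translate_multimedia(media: MultimediaFile, language: str,
                         translate: Callable[[str], str]) -> Dict:
    """Run the three modules in order and return the new track."""
    return configure_load(media, translate_entries(acquire(media), translate), language)
```

Keeping the original timestamps on every translated entry is what lets the player load the new track in step with the original one.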

Referring to Fig. 3, in this embodiment, the acquisition module 10 includes:

a first detection unit 101, configured to detect the start point and end point of the speech of each person in the multimedia file; and

a determination unit 102, configured to take the speech segment between the start point and end point of each person's speech as an original audio file, the original audio file being the first original voice file.

For the first detection unit 101: the original voice file contains multiple audio objects, such as background noise, human speech, and sounds made by animals. When the original speech is analyzed, only the speech signals of people are detected; sounds such as background noise, gunshots, or animal noises are not. The endpoints of human speech in the audio file are detected by voice activity detection (VAD). Because the speech in an original audio file is not continuous from beginning to end, each detected start point and its corresponding end point delimit one continuous stretch of audio within the original voice file, and that stretch constitutes one original audio file (the first original voice file).
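The patent names VAD but not a specific algorithm. The minimal sketch below, assuming mono float samples in [-1, 1], uses a short-time energy threshold as a stand-in for VAD and reports each start and end point as a pair of times in seconds. The parameter names and defaults (`frame_ms`, `energy_threshold`, `min_silence_frames`) are illustrative assumptions. A plain energy detector cannot tell speech from gunshots or other loud noise, which the embodiment requires, so a real system would substitute a trained speech/non-speech classifier.

```python
from typing import List, Optional, Sequence, Tuple


def detect_speech_segments(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,
    energy_threshold: float = 0.01,
    min_silence_frames: int = 5,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each run of frames whose mean energy meets the threshold.

    A segment is closed once more than `min_silence_frames` quiet frames follow
    its last voiced frame, so short pauses inside an utterance do not split it.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None        # first voiced frame of the open segment
    last_voiced: Optional[int] = None  # most recent voiced frame

    def to_seconds(frame_index: int) -> float:
        return frame_index * frame_len / sample_rate

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= energy_threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced > min_silence_frames:
            # Silence has lasted long enough: close the segment at the last voiced frame.
            segments.append((to_seconds(start), to_seconds(last_voiced + 1)))
            start = None

    if start is not None:  # speech runs to the end of the signal
        segments.append((to_seconds(start), to_seconds(last_voiced + 1)))
    return segments
```

Each returned pair corresponds to one candidate original audio file in the sense of the determination unit 102.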

An original audio file may be one continuous utterance detected while a single person speaks alone: one person's continuous utterance forms one original audio file, and the next person's utterance forms another. An original audio file may also be a continuous stretch in which several people speak at the same time. In this embodiment, the speech segments of different people preferably do not overlap, that is, each person's solo speech forms one original audio file. When one person speaks alone, the timbre and pitch vary little, so the speech is easier to detect, the detected original audio file is more accurate, and the marked start and end points are less prone to error.

Referring to Fig. 3, the acquisition module 10 further includes:

a first conversion unit 103, configured to convert the original audio file into the original voice text file, the original voice text file being the second original voice file.

For the first conversion unit 103: the original voice text file is the subtitle file within the original voice file, and the original voice file may contain one or both of the original voice text file and the original audio file. In this embodiment, when only the original audio file is present, the first conversion unit 103 converts it into the original voice text file. This helps when the speakers talk too fast or use non-standard pronunciation and the user finds the speech hard to follow by ear alone: the user can first get a rough understanding from the original voice text file, which further improves comprehension of the video. If both files are present, the original voice text file can be obtained directly, saving acquisition time.
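The decision described above can be sketched as follows. The speech-recognition step is a hypothetical caller-supplied `recognize` function (the patent does not name an engine), and subtitles and audio segments are assumed to be lists of timed entries.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Entry = Tuple[float, float, str]          # (start_s, end_s, text)
AudioSegment = Tuple[float, float, bytes]  # (start_s, end_s, raw audio)


def get_original_text(
    subtitle_entries: Optional[List[Entry]],
    audio_segments: Sequence[AudioSegment],
    recognize: Callable[[bytes], str],
) -> List[Entry]:
    """Return the original voice text file as timed entries.

    If the multimedia file already carries subtitles (not None and not empty),
    reuse them and skip recognition to save time; otherwise transcribe each
    original audio segment with the supplied recognizer, keeping its timing.
    """
    if subtitle_entries:
        return list(subtitle_entries)
    return [(start, end, recognize(audio)) for start, end, audio in audio_segments]
```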

Referring to Fig. 4, the acquisition module 10 further includes:

a second detection unit 104, configured to detect the format of the original audio file;

a first judging unit 105, configured to judge whether the format of the original audio file is PCM; and

a second conversion unit 106, configured to convert the original audio file to PCM format when the judgment is negative.

In this embodiment, the first judging unit 105 judges whether the format of the original audio file detected by the second detection unit 104 is PCM; if it is not, the video translation player preferably uses the second conversion unit 106 to convert the detected original audio file into a PCM-format voice file. PCM (pulse-code modulation) turns an analog signal such as sound into a train of coded pulses, which is then recorded. A PCM signal is a digital signal made up of 1s and 0s. Compared with an analog signal, it is less susceptible to noise and distortion in the transmission system, has a wide dynamic range, and gives good sound quality. A PCM track is also separate from the video track and can be used for post-recording. Moreover, a PCM audio file is the binary sequence produced directly by analog-to-digital (A/D) conversion of the analog audio signal, so the video translation player can decode it accurately. The original audio file may initially be in any of several formats, such as PCM, WMV, MP4, DAT, or RM; the audio format parsed in this embodiment is preferably PCM.
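As a minimal sketch of the detect-and-judge step, assuming the audio is held in a RIFF/WAVE container, the format code can be read from the `fmt ` chunk: code 1 is integer PCM and code 3 is IEEE float. The conversion shown handles only the float-to-16-bit-PCM case; decoding compressed formats such as MP4 or RM would need a real decoder (for example FFmpeg) and is not shown. Function names are illustrative.

```python
import struct
from typing import Sequence

WAVE_FORMAT_PCM = 1
WAVE_FORMAT_IEEE_FLOAT = 3


def wav_format_tag(data: bytes) -> int:
    """Return the audio-format code stored in the 'fmt ' chunk of a RIFF/WAVE byte string."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return tag
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    raise ValueError("no fmt chunk found")


def is_pcm(data: bytes) -> bool:
    """Judging step: True when the WAVE data is already integer PCM."""
    return wav_format_tag(data) == WAVE_FORMAT_PCM


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM, clipping out-of-range values."""
    ints = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)
```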

Referring to Fig. 7, the translation module 20 includes:

a translation unit 201, configured to translate the original voice text file to obtain a translated new speech text file, the new speech text file being the first new speech file.

As described above, the first new speech file is the new speech text file, that is, the subtitle file produced by the translation unit 201. The translated subtitles let the user follow the content of the video file more easily.
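A minimal sketch of the translation unit's output, assuming the subtitle file is a list of timed entries and that the result is written in the common SubRip (.srt) layout; the patent specifies neither. `translate` is a hypothetical caller-supplied function. Each entry keeps the timing of the original subtitle, so the new speech text file stays aligned with the video.

```python
from typing import Callable, List, Tuple

Entry = Tuple[float, float, str]  # (start_s, end_s, text)


def translate_subtitles(entries: List[Entry], translate: Callable[[str], str]) -> List[Entry]:
    """Translate the text of each entry while keeping its start and end times."""
    return [(start, end, translate(text)) for start, end, text in entries]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(entries: List[Entry]) -> str:
    """Render entries as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        for i, (start, end, text) in enumerate(entries, start=1)
    ]
    return "\n".join(blocks)
```

For example, `to_srt(translate_subtitles([(1.0, 2.5, "Hello")], {"Hello": "Bonjour"}.get))` yields a single block numbered 1, timed `00:00:01,000 --> 00:00:02,500`, with the text `Bonjour`.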

The translation module 20 further includes:

a synthesis unit 202, configured to perform speech synthesis on each new speech text file to obtain a new audio file, the new audio file being the second new speech file.

The new audio file (speech) can be obtained by converting the new speech text file (subtitles) in the internal synthesis unit 202, and the resulting new audio segments are combined into a complete new speech file. During playback of the multimedia file, each new audio file is played at the playing time of its corresponding original audio file. When the multimedia file is played, the original voice file and the original voice text file may both be replaced by the new speech file and the new speech text file; alternatively, only some of the new audio files in the new speech file may replace the corresponding original audio files in the original voice file. This is not described in further detail here. When watching the video, the user can play the new audio file, the new speech text file, and the original voice text file, which further helps the user fully understand the video content. Alternatively, the video translation player plays the original audio file, the new speech text file, and the original voice text file; by watching the video, the original voice text file (original subtitles), the new speech text file (translated subtitles), and the synchronously displayed video (the speaker's mouth movements), the user can learn the new language.
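Playing each new audio file at the playing time of its original audio file can be sketched by mixing the synthesized segments into a silent buffer at their original start times. This is a minimal illustration assuming mono float samples at one sample rate; the speech synthesizer itself is out of scope, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def assemble_track(
    segments: Sequence[Tuple[float, Sequence[float]]],
    sample_rate: int,
    total_seconds: float,
) -> List[float]:
    """Place each (start_s, samples) segment into a silent track of the given length.

    Overlapping segments are summed, and samples that would fall past the end
    of the track are dropped.
    """
    track = [0.0] * int(round(total_seconds * sample_rate))
    for start_s, samples in segments:
        offset = int(round(start_s * sample_rate))  # original start time in samples
        for i, s in enumerate(samples):
            j = offset + i
            if 0 <= j < len(track):
                track[j] += s
    return track
```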

Referring to Fig. 5, the translation device for a multimedia file further includes:

a first receiving module 40, configured to receive a play selection signal, the play selection signal being used to select playback of one or more of the new speech text file and the original voice text file and of one of the original audio file and the new audio file, such that at least one of the new speech text file and the new audio file is played; and

a first playing module 50, configured to play according to the play selection signal.

The play selection signal may be issued by the user or automatically by the video player, and the first playing module 50 plays according to the play selection signal received by the first receiving module 40. The user can choose which file or files to play according to his or her command of the language spoken in the multimedia file or personal preference, which improves the user experience. For example, a user with a good command of the language of the original audio file can choose to play the original audio file and the new speech text file, improving listening skills in that language while watching. A user with a weaker command of the language can choose to play the original audio file, the original voice text file, and the new speech text file, and learn the pronunciation, sentence structure, and meaning of the language from the original audio file, the original voice text file (original subtitles), the new speech text file (translated subtitles), and the synchronously displayed video (the speaker's mouth movements). A user who does not want to learn the language and only wants to understand the content of the multimedia file can choose to play the new audio file (translated speech) and the new speech text file, and thereby fully understand the content of the multimedia file.
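The constraints on the play selection signal can be checked with a small validator. This is a sketch of one reading of the first receiving module 40: exactly one audio track, at least one subtitle track, and at least one translated track. The `PlaySelection` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaySelection:
    original_audio: bool
    new_audio: bool
    original_text: bool
    new_text: bool


def is_valid_selection(sel: PlaySelection) -> bool:
    """Return True if the selection satisfies the three playback constraints."""
    one_audio = sel.original_audio != sel.new_audio   # exactly one of the two audio tracks
    some_text = sel.original_text or sel.new_text     # one or more subtitle tracks
    some_translation = sel.new_text or sel.new_audio  # at least one translated track
    return one_audio and some_text and some_translation
```

All three example choices in the paragraph above pass this check, while selecting both audio tracks, or only the original audio with the original subtitles, is rejected.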

Referring to Figs. 8 and 9, the translation device for a multimedia file further includes:

a time acquisition module 80, configured to obtain the playing duration of each original audio file and the playing duration of each corresponding new audio file;

a judgment module 90, configured to judge whether the playing duration of each original audio file is greater than that of the corresponding new audio file; and

a selection module 100, configured to select playback of the corresponding new audio file if the duration of the original audio file is greater, and to select playback of the corresponding original audio file if it is less.

In one embodiment, three cases arise. In the first case, the display start and end times of the new speech text file (subtitles in the target language) are the start and end times of the corresponding original voice file; the video translation player can then play the original audio file, the original voice text file, and the new speech text file, and can also accept a user selection to play only one or some of these files. In the second case, the duration of the original audio file is greater than that of the speech segment synthesized for the translated new audio file, so the start time of each translated new audio segment can be aligned with the start time of the corresponding original audio file. The video player can then automatically choose to output the new audio file instead of the original audio file. The display start time of the new speech text file (target-language subtitles) is the start time of the corresponding original voice text file (original subtitles), and its end time is the end time of the corresponding new speech text file. In this case the video translation player can choose to play the new audio file, the new speech text file, and the original voice text file; after the player has made this choice for the new multimedia file, it can also accept a user selection to play only one or some of those files. In the third case, the duration of the original audio file equals that of the translated new audio segment, that is, the start and end times of the original audio file coincide with those of the new audio file. The video translation player can then play one of the new audio file and the new speech text file together with one or more of the original audio file and the original voice text file, and can accept a user selection of one or more files to play. For example, the multimedia file may be a GIF file.
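The rule applied by the time acquisition module 80, judgment module 90, and selection module 100 can be sketched per segment as below. The equal-duration case is not covered by module 100 itself; following the third case above, this sketch returns `"either"` and leaves the choice to the user. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def choose_audio(original_s: float, new_s: float) -> str:
    """Pick the audio to play for one segment from the two playing durations."""
    if original_s > new_s:
        return "new"       # the translated speech is shorter than the original slot
    if original_s < new_s:
        return "original"  # the translated speech would run past the original slot
    return "either"        # equal durations: start and end times coincide


def choose_all(durations: Sequence[Tuple[float, float]]) -> List[str]:
    """Apply choose_audio to each (original_s, new_s) pair."""
    return [choose_audio(original_s, new_s) for original_s, new_s in durations]
```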

Referring to Fig. 6, the translation device for a multimedia file further includes:

a second receiving module 60, configured to receive search information, the search information being a word or sentence in any one of the original audio file, the original voice text file, the new speech text file, and the new audio file; and

a second playing module 70, configured to play the corresponding search result according to the search information.

In this embodiment, the second receiving module 60 receives a key sentence or word that occurs in the speech of the original audio file to be searched, in the speech of the new audio file, in the original voice text file, or in the new speech text file. The second playing module 70 then matches it, character by character and audio file by audio file, against all original voice text files, original audio files, and translated new speech text files and new audio files. It obtains the speech text file and the speech segment (original audio file or new audio file) containing the key sentence or word, together with the position of the matching video frames in the multimedia file, and plays the multimedia segment containing that key sentence or word. For example, if the original voice text file is in English and the user enters "I", the search retrieves one or more passages of the original voice text file containing "I" together with the corresponding video; the user can choose among them the video to watch, or choose to play the translated version. This lets a user who has finished watching a video locate a favorite passage precisely when wanting to watch it again.
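A minimal sketch of the text side of this search, assuming every track (subtitles, and audio already transcribed to timed text) is available as timed entries keyed by track name. Whole-word matching keeps a query such as "I" from matching "It" or "is"; it suits space-delimited languages, and `whole_word=False` falls back to plain substring matching, which is what Chinese text would need. Names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Entry = Tuple[float, float, str]  # (start_s, end_s, text)


def search_tracks(
    tracks: Dict[str, List[Entry]],
    query: str,
    whole_word: bool = True,
) -> List[Tuple[str, float, float, str]]:
    """Return (track_name, start_s, end_s, text) for each entry containing the query.

    Matching is case-insensitive; hits are ordered by start time, and the
    stable sort keeps the track order for hits that start at the same time.
    """
    escaped = re.escape(query)
    pattern = re.compile(rf"\b{escaped}\b" if whole_word else escaped, re.IGNORECASE)
    hits = [
        (name, start, end, text)
        for name, entries in tracks.items()
        for start, end, text in entries
        if pattern.search(text)
    ]
    hits.sort(key=lambda hit: hit[1])
    return hits
```

The returned start and end times are what a player would use to seek to the matching video frames.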

In the translation device for a multimedia file according to this embodiment of the present invention, the original voice file in the multimedia file is obtained and translated into a new speech file in a designated language, and the load attribute of the new speech file is configured so that the new speech file is loaded synchronously when the multimedia file is played. The original voice file is thus converted automatically, without human translation, into a voice file in another language. This helps the user understand and identify the audio and video content of the multimedia file more readily and promptly.

In one embodiment, a translation playback device is further provided, including a memory, a processor, and an application program. The application program is stored in the memory and configured to be executed by the processor to perform the method of any of the above embodiments. Translation playback devices include intelligent translation playback devices such as video translation players and language learning devices.

The above description is only a preferred embodiment of the present invention and is not intended to limit its scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. A translation method for a multimedia file, characterized by comprising:

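obtaining an original voice file in the multimedia file;

translating the original voice file to obtain a new speech file, the language of the new speech file being a designated language; and

configuring a load attribute of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played.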

2. The translation method for a multimedia file according to claim 1, characterized in that the original voice file includes an original audio file, and the step of obtaining the original voice file in the multimedia file comprises:

detecting a start point and an end point of the speech of each person in the multimedia file; and

taking the speech segment between the start point and end point of each person's speech as an original audio file, the original audio file being a first original voice file.

3. The translation method for a multimedia file according to claim 2, characterized in that the original voice file further includes an original voice text file, and the step of obtaining the original voice file in the multimedia file comprises:

converting the original audio file into the original voice text file, the original voice text file being a second original voice file.

4. The translation method for a multimedia file according to claim 2, characterized in that, after the step of taking the speech segment between the start point and end point of each person's speech as an original audio file, the method comprises:

detecting the format of the original audio file;

judging whether the format of the original audio file is PCM; and

if not, converting the original audio file to PCM format.

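5. The translation method for a multimedia file according to claim 3, characterized in that the step of translating the original voice file to obtain a new speech file, the language of the new speech file being a designated language, comprises:

translating the original voice text file to obtain a translated new speech text file, the new speech text file being a first new speech file.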

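6. The translation method for a multimedia file according to claim 5, characterized in that the step of translating the original voice file to obtain a new speech file, the language of the new speech file being a designated language, comprises:

performing speech synthesis on each new speech text file to obtain a new audio file, the new audio file being a second new speech file.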

7. The translation method for a multimedia file according to claim 6, characterized in that, after the step of configuring the load attribute of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played, the method comprises:

receiving a play selection signal, the play selection signal being used to select playback of one or more of the new speech text file and the original voice text file and of one of the original audio file and the new audio file, such that at least one of the new speech text file and the new audio file is played; and

playing according to the play selection signal.

8. The translation method for a multimedia file according to claim 6, characterized in that, after the step of translating the original voice file to obtain a new speech file, the language of the new speech file being a designated language, the method comprises:

receiving search information, the search information being a word or sentence in any one of the original audio file, the original voice text file, the new speech text file, and the new audio file; and

playing the corresponding search result according to the search information.

9. A translation device for a multimedia file, characterized by comprising:

an acquisition module, configured to obtain an original voice file in the multimedia file;

a translation module, configured to translate the original voice file to obtain a new speech file, the language of the new speech file being a designated language; and

a configuration module, configured to configure a load attribute of the new speech file so that the new speech file is loaded synchronously when the multimedia file is played.

10. A translation playback device, including a memory, a processor, and an application program, the application program being stored in the memory and configured to be executed by the processor, characterized in that the application program is configured to perform the method according to any one of claims 1 to 8.