CN107172449A

CN107172449A - Multimedia playback method and device, and multimedia storage method

Info

Publication number: CN107172449A
Application number: CN201710462699.7A
Authority: CN
Inventors: 陈凌奇
Original assignee: Whaley Technology Co Ltd
Current assignee: Whaley Technology Co Ltd
Priority date: 2017-06-19
Filing date: 2017-06-19
Publication date: 2017-09-15

Abstract

The invention discloses a multimedia playback method. The method comprises: S1, obtaining the file information of a multimedia item and the dubbing configuration determined by the user, where the file information includes the storage information of the item's video, background audio, and dubbing text, and the dubbing configuration includes the voiceprint feature of each role; S2, obtaining the video, background audio, and dubbing text according to the file information; S3, generating dubbed audio from the dubbing text and the dubbing configuration, such that the voiceprint feature of each role in the dubbed audio matches that role's voiceprint feature in the dubbing configuration; S4, synthesizing the dubbed audio and the background audio into the audio of the multimedia item; S5, playing the video and the audio of the multimedia item in synchronization. The invention also discloses a multimedia playback device and a multimedia storage method. The invention can greatly reduce the storage resources occupied by multimedia and can also adjust each role's dubbing as the user requires, thereby meeting users' individual viewing preferences.

Description

Multimedia playback method and device, and multimedia storage method

Technical field

The present invention relates to a multimedia playback method and device.

Background art

Multimedia on-demand systems (Demand Multimedia System) are a common form of multimedia network application. The main applications include video on demand (VOD), movie on demand (MOD), and news on demand (NOD). With the rapid development of networking, computing, and audio and video processing technologies, multimedia on-demand services have come into wide use.

Multimedia service systems mostly use the client/server (C/S) model. Indeed, it is characteristics of multimedia such as its large data volume (which calls for large storage capacity or high throughput) that drove the adoption of the client/server model; a multimedia server is thus a computer system that provides multimedia services to other systems (multimedia clients). As shown in Fig. 1, existing multimedia service systems typically store the video and the audio of an item such as a film or television drama as separate files and, when a user requests the item, play the video and audio in synchronization in real time. A film or musical work usually has several audio versions (most commonly in several languages), so several copies of the audio data must be stored. On the one hand, this occupies a large amount of storage; on the other hand, the user can only hear the dubbing that was originally recorded, which does not necessarily suit every user, so users' individual viewing preferences are hard to meet.

Summary of the invention

The technical problem to be solved by the invention is to overcome the deficiencies of the prior art by providing a multimedia playback method and device that can greatly reduce the storage resources occupied by multimedia and can also adjust each role's dubbing as the user requires, thereby meeting users' individual viewing preferences.

The invention adopts the following technical solution to solve the above technical problem:

A multimedia playback method, comprising the following steps:

S1, obtaining the file information of a multimedia item and the dubbing configuration determined by the user, the file information including the storage information of the item's video, background audio, and dubbing text, and the dubbing configuration including the voiceprint feature of each role;

S2, obtaining the video, background audio, and dubbing text of the multimedia item according to the file information;

S3, generating dubbed audio from the dubbing text and the dubbing configuration, the voiceprint feature of each role in the dubbed audio matching the voiceprint feature of that role in the dubbing configuration;

S4, synthesizing the dubbed audio and the background audio into the audio of the multimedia item;

S5, playing the video and the audio of the multimedia item in synchronization.

Further, the dubbing configuration also includes the language used for dubbing. Further still, the dubbing configuration also includes the dialect used for dubbing.

Preferably, steps S1 to S4 are performed by a remote server and step S5 is performed by a local intelligent terminal, the server and the intelligent terminal being able to exchange information.

A multimedia playback device, comprising:

an information acquisition module, configured to obtain the file information of a multimedia item and the dubbing configuration determined by the user, the file information including the storage information of the item's video, background audio, and dubbing text, and the dubbing configuration including the voiceprint feature of each role;

a file acquisition module, configured to obtain the video, background audio, and dubbing text of the multimedia item according to the file information;

a dubbed-audio generation module, configured to generate dubbed audio from the dubbing text and the dubbing configuration, the voiceprint feature of each role in the dubbed audio matching the voiceprint feature of that role in the dubbing configuration;

an audio synthesis module, configured to synthesize the dubbed audio and the background audio into the audio of the multimedia item;

a playback module, configured to play the video and the audio of the multimedia item in synchronization.

Preferably, the information acquisition module, the file acquisition module, the dubbed-audio generation module, and the audio synthesis module are located in a remote server, and the playback module is located in a local intelligent terminal, the server and the intelligent terminal being able to exchange information.

The following technical solution can also be obtained from the same inventive concept:

A multimedia storage method: first, extract the video and the audio of the original multimedia file; then separate the extracted audio into background audio and dubbed audio; convert the separated dubbed audio into dubbing text; and store the video, the background audio, and the dubbing text separately.

Further, the method also comprises the following step: extracting the voiceprint feature of each role from the separated dubbed audio, and adding text information recording each role's voiceprint feature to the dubbing text.

Compared with the prior art, the invention has the following beneficial effects:

The invention stores the video, the background audio, and the dubbing text of a multimedia item separately and synthesizes them in real time at playback. Because text data occupies far less storage than audio data, the storage consumed by a large multimedia library can be greatly reduced. In addition, when synthesizing the dubbed audio, the invention can select the voiceprint feature used for each role according to the user's preference, which meets users' individual viewing preferences and improves the user experience.

Brief description of the drawings

Fig. 1 is a schematic diagram of the existing multimedia storage scheme;

Fig. 2 is a schematic diagram of the multimedia storage scheme of the invention;

Fig. 3 is a schematic diagram of the structure and principle of a specific embodiment of the multimedia playback device of the invention;

Fig. 4 is an example user interface for determining the dubbing configuration;

Fig. 5 is a flow diagram of audio synthesis by the audio server.

Detailed description of the embodiments

To address the shortcomings of the prior art, namely high storage consumption and the inability to meet users' individual viewing preferences, the idea of the invention is to store the video, the background audio, and the dubbing text of a multimedia item separately and to synthesize them in real time at playback. Because text data occupies far less storage than audio data, the storage consumed by a large multimedia library can be greatly reduced. In addition, when synthesizing the dubbed audio, the invention can select the voiceprint feature used for each role according to the user's preference, which meets users' individual viewing preferences and improves the user experience.
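To give a rough sense of the claimed saving, the sketch below compares one compressed dubbed track with a dubbing-text file for the same title. Every figure in it (a 2-hour running time, a 128 kbit/s audio bitrate, 1,500 dialogue lines of about 80 bytes each) is an illustrative assumption and does not come from the patent.

```python
# Back-of-the-envelope comparison; all numbers are assumed, illustrative values.
DURATION_S = 2 * 60 * 60        # assumed running time: 2 hours
AUDIO_BITRATE_BPS = 128_000     # assumed bitrate of one compressed dubbed track
DIALOGUE_LINES = 1_500          # assumed number of timed dialogue lines
BYTES_PER_LINE = 80             # assumed average size of one dubbing-text line


def audio_track_bytes(duration_s: int, bitrate_bps: int) -> int:
    """Size of one audio track in bytes (the bitrate is in bits per second)."""
    return duration_s * bitrate_bps // 8


def dub_text_bytes(lines: int, bytes_per_line: int) -> int:
    """Approximate size of the dubbing-text file in bytes."""
    return lines * bytes_per_line


audio = audio_track_bytes(DURATION_S, AUDIO_BITRATE_BPS)
text = dub_text_bytes(DIALOGUE_LINES, BYTES_PER_LINE)
print(f"one dubbed track: {audio / 1e6:.1f} MB, dubbing text: {text / 1e3:.1f} KB")
print(f"ratio: about {audio // text}x")
```

Under these assumptions, the background audio is still stored once; the saving applies to each dubbed track (for example, each language version) that no longer needs to be kept as audio.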

A voiceprint is the sound-wave spectrum, carrying speech information, that is displayed by an electro-acoustic instrument. Modern scientific research shows that a voiceprint is both specific to a speaker and relatively stable. In real life, everyone's speech has its own characteristics, and people who know each other well can recognize one another by voice alone; this is how speech differs from person to person. Slight differences in the human vocal organs change the airflow of speech and so produce differences in voice quality and timbre. In addition, people's speaking habits differ in speed and in force, which produces differences in intensity and duration. Pitch, intensity, duration, and timbre are known in linguistics as the "four elements" of speech, and these can be further decomposed into more than 90 features. These features reflect the wavelength, frequency, intensity, and rhythm of different sounds. Variations in the sound wave can be converted into variations in the intensity, wavelength, frequency, and rhythm of an electrical signal, and an instrument plots these variations as a spectrogram, which is the voiceprint. Characteristic parameters that characterize the individual speaker can be extracted from the voiceprint signal, for example linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC); these are the voiceprint features. Thanks to speech processing technology (in particular its four major branches: speech recognition, speech synthesis, speech coding, and voiceprint recognition) and to the rapid development of computer and network technology, real-time online dubbing of multimedia has become possible.

The invention first stores the multimedia in the manner shown in Fig. 2. The storage method is as follows:

Step 1, extract the video and the audio of the original multimedia file;

Multimedia is a combination of media, generally including text, sound, and images. In computer systems, multimedia refers to a human-computer interactive medium for exchanging and disseminating information that combines two or more media. Commonly used media include text, pictures, photographs, sound, animation, and film, together with the interactive functions provided by programs. Depending on the encoding and the application, original multimedia files are usually stored in formats such as MVO, AVI, MP3, MP4, WMV, MPG, RAM, RA, and DVD. The video data and the audio data of the original multimedia file are extracted separately; the extraction methods are mature existing technology and are not described further here.

Step 2, separate the extracted audio into background audio and dubbed audio;

Many existing techniques can do this; for example, currently available software such as Pazera Free Audio Extractor or Adobe Audition can be used directly. Alternatively, the background audio can be supplied by the film production company, because film companies usually produce the background audio and the dubbing separately when making a film.

Step 3, convert the separated dubbed audio into dubbing text;

The conversion can be done manually or automatically using speech recognition. The format of the dubbing text can be defined freely. Because the original dubbing of a film or television drama is often what most viewers choose, it must be kept as an option for the user (usually the default option). To this end, the invention extracts the voiceprint feature of each role from the separated dubbed audio and adds text information recording each role's voiceprint feature to the dubbing text. The following is an example of the dubbing text of the invention:

<film_info>
  <language>Chinese</language>
</film_info>
<role_labels>
  <lead_1>
    <personality>bold and forthright</personality>
    <default_voiceprint>voiceprint of actor Lu Shuming</default_voiceprint>
  </lead_1>
  ……
</role_labels>
<text>
  00:00:01-00:00:07 Guan Yu (arrogant | medium speed | medium): As I see it, Yan Liang is like a man who has put a price tag on his own head ...
  ……
</text>
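The timed lines inside the text element of the example above lend themselves to simple parsing. The following is a minimal sketch, assuming each line has the layout `HH:MM:SS-HH:MM:SS role (attribute | attribute | ...): spoken text` as inferred from the single example; the patent states that the format can be defined freely, so `DubLine` and `parse_dub_line` are hypothetical names and the layout is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed layout, inferred from the example line:
#   HH:MM:SS-HH:MM:SS <role> (<attr> | <attr> | ...): <spoken text>
# Both the ASCII colon and the full-width colon are accepted after the attributes.
LINE_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2})-(\d{2}):(\d{2}):(\d{2})\s+"
    r"(?P<role>.+?)\s*\((?P<attrs>[^)]*)\)\s*[:：]\s*(?P<text>.*)$"
)


@dataclass
class DubLine:
    start_s: int        # start time in seconds
    end_s: int          # end time in seconds
    role: str           # speaking role
    attrs: List[str]    # delivery attributes, kept as free-form strings
    text: str           # words to be spoken


def parse_dub_line(line: str) -> DubLine:
    """Parse one timed dialogue line; raise ValueError if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a dubbing-text line: {line!r}")
    h1, min1, s1, h2, min2, s2 = (int(g) for g in match.groups()[:6])
    attrs = [a.strip() for a in match.group("attrs").split("|")]
    return DubLine(
        start_s=h1 * 3600 + min1 * 60 + s1,
        end_s=h2 * 3600 + min2 * 60 + s2,
        role=match.group("role"),
        attrs=attrs,
        text=match.group("text"),
    )


parsed = parse_dub_line("00:00:01-00:00:07 Guan Yu (arrogant | medium speed | medium): ...")
print(parsed.role, parsed.start_s, parsed.end_s, parsed.attrs)
```

The start and end times are what a later step would use to place the synthesized speech against the background audio.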

Step 4, store the video, the background audio, and the dubbing text separately;

The video, the background audio, and the dubbing text can be stored locally, or stored in the same cloud database or server, or in different cloud databases or servers.
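The four storage steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the patent leaves the extraction, separation, and transcription tools open, so they are passed in as callables, and `store_title`, `StoredTitle`, the toy stand-ins, and the in-memory `stores` dictionary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical component signatures; the concrete tools are not fixed by the patent.
Demux = Callable[[bytes], Tuple[bytes, bytes]]      # file -> (video, audio)
Separate = Callable[[bytes], Tuple[bytes, bytes]]   # audio -> (background, dubbed audio)
Transcribe = Callable[[bytes], str]                 # dubbed audio -> dubbing text


@dataclass
class StoredTitle:
    video_key: str
    background_key: str
    dub_text_key: str


def store_title(title_id: str, source: bytes, demux: Demux, separate: Separate,
                transcribe: Transcribe, stores: Dict[str, Dict[str, bytes]]) -> StoredTitle:
    video, audio = demux(source)              # step 1: extract video and audio
    background, dubbed = separate(audio)      # step 2: split background from dubbing
    dub_text = transcribe(dubbed)             # step 3: dubbed audio -> dubbing text
    # Step 4: each component goes to its own store; the dubbed audio is not kept.
    stores["video"][title_id] = video
    stores["background"][title_id] = background
    stores["dub_text"][title_id] = dub_text.encode("utf-8")
    return StoredTitle(f"video/{title_id}", f"background/{title_id}", f"dub_text/{title_id}")


# Toy stand-ins so the sketch runs; a real system would call a demuxer,
# an audio-separation tool, and a speech recognizer (or a human transcriber).
def toy_demux(src: bytes) -> Tuple[bytes, bytes]:
    video, _, audio = src.partition(b"|")
    return video, audio


def toy_separate(audio: bytes) -> Tuple[bytes, bytes]:
    background, _, dubbed = audio.partition(b"+")
    return background, dubbed


def toy_transcribe(dubbed: bytes) -> str:
    return dubbed.decode("utf-8")


stores: Dict[str, Dict[str, bytes]] = {"video": {}, "background": {}, "dub_text": {}}
record = store_title("demo", b"VIDEO|MUSIC+hello", toy_demux, toy_separate, toy_transcribe, stores)
print(record)
```

The three entries in `stores` stand in for the local storage or cloud databases mentioned in step 4; whether they are one store or three does not change the pipeline.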

Fig. 3 shows the structure and principle of a specific embodiment of the multimedia playback device of the invention, which is in essence a multimedia on-demand system. As shown in Fig. 3, the device comprises four cloud servers (an on-demand server, a dubbing text server, an audio server, and a video server) and three cloud databases that store the video, the background audio, and the dubbing text respectively. The device provides the multimedia on-demand service as follows:

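S1, obtaining the file information of the multimedia item and the dubbing configuration determined by the user, the file information including the storage information of the item's video, background audio, and dubbing text, and the dubbing configuration including the voiceprint feature of each role;

The on-demand server obtains the user's request through information exchange with the intelligent terminal and, according to the request, looks up the file information of the requested film or television drama in the multimedia file index that it stores. The file information includes storage information such as the storage addresses and file sizes of the item's video, background audio, and dubbing text, and may also include information such as the duration and the roles of the film or television drama.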
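One possible shape for an entry in the multimedia file index is sketched below. The field names, the addresses, and the sizes are all hypothetical; the text only says that the file information holds storage addresses and file sizes for the three components and may also hold the duration and the roles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StorageInfo:
    address: str       # where the component is stored
    size_bytes: int    # file size


@dataclass
class FileInfo:
    video: StorageInfo
    background_audio: StorageInfo
    dub_text: StorageInfo
    duration_s: Optional[int] = None                 # optional extra information
    roles: List[str] = field(default_factory=list)   # optional extra information


# Hypothetical index with made-up addresses and sizes.
INDEX: Dict[str, FileInfo] = {
    "title-001": FileInfo(
        video=StorageInfo("video-db/title-001", 900_000_000),
        background_audio=StorageInfo("audio-db/title-001", 60_000_000),
        dub_text=StorageInfo("text-db/title-001", 120_000),
        duration_s=2700,
        roles=["Guan Yu"],
    )
}


def find_file_info(title_id: str) -> FileInfo:
    """Return the file information for a requested title, or raise KeyError."""
    try:
        return INDEX[title_id]
    except KeyError:
        raise KeyError(f"no such title in the index: {title_id}") from None


info = find_file_info("title-001")
print(info.dub_text.address, info.roles)
```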

The on-demand server also obtains, through information exchange with the intelligent terminal, the dubbing configuration determined by the user; the dubbing configuration includes the voiceprint feature of each role. Fig. 4 shows an example user interface for determining the dubbing configuration: by clicking the corresponding buttons, the user can choose a preferred voiceprint feature for each role. If the user does not click, the default voiceprint feature (usually that of the original dubbing) is used. After the user clicks to replace a voiceprint, the next level of options pops up:

A. Local voiceprint library B. Network voiceprint library

If the local voiceprint library is selected, a list of local voiceprint features pops up for the user to choose from. If the network voiceprint library is selected, an input box pops up in which the user enters the name of a voiceprint feature. Voiceprint features can be named after widely known figures and characters, such as "Liu Dehua" (Andy Lau), "Donald Duck", or "Zhao Benshan", or each voiceprint feature can be provided with a short sample audio clip for the user to audition. Options for the dubbing language, such as Chinese, English, or French, can also be added to the dubbing configuration, and even dialect options such as Cantonese, Hokkien (Minnan), or Sichuanese.
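The defaulting rule described above (a role keeps its default voiceprint unless the user picks a replacement) can be sketched as follows. `DubConfig` and `resolve_dub_config` are hypothetical names, and voiceprint features are represented here simply by their names, which is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DubConfig:
    voiceprints: Dict[str, str]       # role -> name of the voiceprint feature to use
    language: Optional[str] = None    # None: keep the language of the dubbing text
    dialect: Optional[str] = None     # None: no dialect selected


def resolve_dub_config(defaults: Dict[str, str],
                       choices: Dict[str, str],
                       language: Optional[str] = None,
                       dialect: Optional[str] = None) -> DubConfig:
    """Combine per-role defaults with the user's selections.

    A role with no selection (or an empty one) keeps its default voiceprint.
    Selections for roles that are not in `defaults` are ignored.
    """
    voiceprints = {role: choices.get(role) or default
                   for role, default in defaults.items()}
    return DubConfig(voiceprints, language, dialect)


defaults = {"Guan Yu": "original (Guan Yu)", "Role B": "original (Role B)"}
config = resolve_dub_config(defaults, {"Guan Yu": "Donald Duck"}, language="Chinese")
print(config.voiceprints)
```

In this sketch the default names would come from the default-voiceprint entries recorded in the dubbing text, and the selections from the interface of Fig. 4.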

S2, obtaining the video, background audio, and dubbing text of the multimedia item according to the file information;

The on-demand server sends the corresponding file information to the dubbing text server, the audio server, and the video server respectively, and sends the dubbing configuration determined by the user to the dubbing text server. The dubbing text server, the audio server, and the video server each retrieve the corresponding dubbing text, background audio, and video from their databases. The dubbing text server then sends the dubbing text, together with the user's dubbing configuration, to the audio server.

S3, generating dubbed audio from the dubbing text and the dubbing configuration, the voiceprint feature of each role in the dubbed audio matching the voiceprint feature of that role in the dubbing configuration;

The audio server uses speech synthesis to convert the dubbing text into the corresponding dubbed audio and, according to the dubbing configuration, gives each role's dubbed audio the corresponding voiceprint feature, so that the voiceprint feature of each role in the dubbed audio matches the voiceprint feature of that role in the dubbing configuration. The speech synthesis can use any of various existing techniques, such as those disclosed in Chinese invention patents CN104485099A, CN105023570A, and CN102117614B. A translation engine can also be used in combination to convert between languages.
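The per-role dispatch in S3 can be sketched without committing to any particular speech synthesis technique. In the sketch below the synthesizer is an injected callable with the assumed signature `(text, voiceprint) -> samples`; `Line`, `Segment`, `generate_dub_segments`, and the toy synthesizer are hypothetical and produce no real speech.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence


class Line(NamedTuple):
    start_s: float   # when the line starts, in seconds
    role: str        # speaking role
    text: str        # words to be spoken


class Segment(NamedTuple):
    start_s: float
    samples: List[float]


# Assumed synthesizer interface: (text, voiceprint name) -> audio samples.
Synth = Callable[[str, str], List[float]]


def generate_dub_segments(lines: Sequence[Line],
                          voiceprints: Dict[str, str],
                          synth: Synth) -> List[Segment]:
    """Synthesize each line with the voiceprint configured for its role."""
    segments = []
    for line in lines:
        voiceprint = voiceprints[line.role]   # KeyError if the role is not configured
        segments.append(Segment(line.start_s, synth(line.text, voiceprint)))
    return segments


# Toy synthesizer: one constant sample per word, for demonstration only.
def toy_synth(text: str, voiceprint: str) -> List[float]:
    return [0.5] * len(text.split())


segments = generate_dub_segments(
    [Line(1.0, "Guan Yu", "as I see it")],
    {"Guan Yu": "Donald Duck"},
    toy_synth,
)
print(segments)
```

Keeping the start time with each synthesized segment is what allows the later synthesis step to align the speech with the background audio.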

S4, synthesizing the dubbed audio and the background audio into the audio of the multimedia item;

The audio server combines the generated dubbed audio with the background audio, using means such as timestamps, to obtain the personalized audio of the multimedia item the user requested. Fig. 5 shows the basic flow of audio synthesis by the audio server in this embodiment.
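A minimal sketch of timestamp-based mixing follows, assuming mono audio held as lists of float samples in the range -1.0 to 1.0 and a shared sample rate. The function name `mix_by_timestamp`, the additive mix, and the hard clipping are choices made for illustration; the patent only says that timestamps are one of the means used.

```python
from typing import List, Sequence, Tuple


def mix_by_timestamp(background: Sequence[float],
                     segments: Sequence[Tuple[float, Sequence[float]]],
                     sample_rate: int) -> List[float]:
    """Add each dubbed segment onto the background at its start time.

    `segments` holds (start time in seconds, samples) pairs. The result is
    extended with silence if a segment runs past the end of the background,
    and each mixed sample is clipped to the range [-1.0, 1.0].
    """
    out = list(background)
    for start_s, samples in segments:
        offset = int(round(start_s * sample_rate))
        end = offset + len(samples)
        if end > len(out):
            out.extend([0.0] * (end - len(out)))
        for i, sample in enumerate(samples):
            mixed = out[offset + i] + sample
            out[offset + i] = max(-1.0, min(1.0, mixed))
    return out


# Toy example at 4 samples per second: one segment starting at t = 1 s.
mixed = mix_by_timestamp([0.25] * 8, [(1.0, [0.5, 0.5])], sample_rate=4)
print(mixed)
```

A production mixer would typically also handle resampling, multiple channels, and gain control, none of which are shown here.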

S5, playing the video and the audio of the multimedia item in synchronization;

The video server and the audio server transmit the video and the audio in synchronization to the intelligent terminal for playback.

The above is only one specific embodiment of the invention. In practice, the on-demand server, the dubbing text server, the audio server, and the video server can be the same server, and the corresponding databases can be the same database. With further development of storage, computing, and related technologies, the above multimedia playback method can also be implemented entirely on a local intelligent terminal.

Claims

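1. A multimedia playback method, characterized by comprising the following steps:

S1, obtaining the file information of a multimedia item and the dubbing configuration determined by the user, the file information including the storage information of the item's video, background audio, and dubbing text, and the dubbing configuration including the voiceprint feature of each role;

S2, obtaining the video, background audio, and dubbing text of the multimedia item according to the file information;

S3, generating dubbed audio from the dubbing text and the dubbing configuration, the voiceprint feature of each role in the dubbed audio matching the voiceprint feature of that role in the dubbing configuration;

S4, synthesizing the dubbed audio and the background audio into the audio of the multimedia item;

S5, playing the video and the audio of the multimedia item in synchronization.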

2. The method of claim 1, characterized in that the dubbing configuration also includes the language used for dubbing.

3. The method of claim 2, characterized in that the dubbing configuration also includes the dialect used for dubbing.

4. The method of claim 1, characterized in that steps S1 to S4 are performed by a remote server and step S5 is performed by a local intelligent terminal, the server and the intelligent terminal being able to exchange information.

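5. A multimedia playback device, characterized by comprising:

an information acquisition module, configured to obtain the file information of a multimedia item and the dubbing configuration determined by the user, the file information including the storage information of the item's video, background audio, and dubbing text, and the dubbing configuration including the voiceprint feature of each role;

a file acquisition module, configured to obtain the video, background audio, and dubbing text of the multimedia item according to the file information;

a dubbed-audio generation module, configured to generate dubbed audio from the dubbing text and the dubbing configuration, the voiceprint feature of each role in the dubbed audio matching the voiceprint feature of that role in the dubbing configuration;

an audio synthesis module, configured to synthesize the dubbed audio and the background audio into the audio of the multimedia item;

a playback module, configured to play the video and the audio of the multimedia item in synchronization.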

6. The device of claim 5, characterized in that the dubbing configuration also includes the language used for dubbing.

7. The device of claim 6, characterized in that the dubbing configuration also includes the dialect used for dubbing.

8. The device of claim 5, characterized in that the information acquisition module, the file acquisition module, the dubbed-audio generation module, and the audio synthesis module are located in a remote server, and the playback module is located in a local intelligent terminal, the server and the intelligent terminal being able to exchange information.

9. A multimedia storage method, characterized by: first extracting the video and the audio of the original multimedia file; then separating the extracted audio into background audio and dubbed audio; converting the separated dubbed audio into dubbing text; and storing the video, the background audio, and the dubbing text separately.

10. The method of claim 9, characterized in that the method also comprises the following step: extracting the voiceprint feature of each role from the separated dubbed audio, and adding text information recording each role's voiceprint feature to the dubbing text.