CN101174448B

CN101174448B - Talking picture playing method and device, method for generating index file of talking picture

Info

Publication number: CN101174448B
Application number: CN2007101949964A
Authority: CN
Inventors: 谢知非
Original assignee: BEIJING JULI NORTH MICROELECTRONICS Co Ltd
Current assignee: Beijing Juli North Microelectronics Co.,Ltd.
Priority date: 2007-12-10
Filing date: 2007-12-10
Publication date: 2010-09-15
Anticipated expiration: 2027-12-10
Also published as: CN101174448A

Abstract

The invention discloses a method and a device for playing a talking picture, and a method for generating a talking picture index file. A correspondence between a picture and a text file is established, and the correspondence, together with the attribute information of the picture and of the text file, is stored in a talking picture index file. When a talking picture is to be played, the index file corresponding to that talking picture is obtained and parsed to obtain the correspondence between the picture and the text file; the picture and the text file are then retrieved according to the correspondence and the attribute information; and the text file is converted into audio and played in synchronization with the picture. The scheme provided by the embodiments of the invention effectively reduces the storage space occupied by stored talking pictures.

Description

Talking picture playing method and device, and method for generating a talking picture index file

Technical field

The present invention relates to the field of digital electronics, and in particular to a talking picture playing method, a talking picture playing device, and a method for generating a talking picture index file.

Background art

With the development of digital electronics, talking pictures have appeared and are gradually gaining acceptance. Many digital photo frames on the market offer a talking picture function. They combine pictures with audio mainly by importing a picture together with an associated audio file and then playing the associated audio file while the picture is displayed.

Existing talking pictures are mostly made by storing audio files such as recordings or music and playing each audio file together with its associated picture. Because audio files generally occupy a large amount of storage space, talking pictures produced this way also occupy a large amount of storage space. In addition, recording the relevant speech or music is a fairly complicated operation, which makes producing talking pictures cumbersome.

The prior art also includes networked digital photo frames, in which the content (pictures and associated audio files) is stored on a network server. When the client photo frame needs to play a talking picture, it obtains the picture and the associated audio file over the network. Because of limits on network transfer speed, pictures or audio files may transfer too slowly, which degrades playback of the talking picture and inconveniences the user.

Talking pictures in the prior art therefore occupy a large amount of storage space and are neither flexible nor convenient to use, which greatly limits their adoption and application.

Summary of the invention

The embodiments of the invention provide a talking picture playing method, a talking picture playing device and an index file generation method, in order to solve the prior-art problems that talking pictures occupy a large amount of storage space and are neither flexible nor convenient to use.

The talking picture playing method provided by the embodiments of the invention comprises:

obtaining the talking picture index file corresponding to the talking picture to be played, the index file comprising the correspondence between a picture and a text file, the attribute information of the picture, and the attribute information of the text file;

parsing the talking picture index file to obtain the picture and the text file;

converting the text file into audio and playing it in synchronization with the picture.

The talking picture index file further comprises the correspondence between the picture and an audio file and the attribute information of the audio file;

after the talking picture index file is parsed, the audio file is also obtained;

and the picture, the text file and the audio file are played in synchronization.

When the text file is played, it is first converted into an audio file by a speech synthesis engine and then played.

The method further comprises the following steps:

establishing a correspondence between a picture and a text file;

storing the correspondence, the attribute information of the picture and the attribute information of the text file as a talking picture index file.

The method for generating a talking picture index file provided by the embodiments of the invention further comprises: establishing a correspondence between the picture and an audio file;

and also storing, in the talking picture index file, the correspondence between the picture and the audio file and the attribute information of the audio file.

Establishing the correspondence between the picture and the text file comprises:

establishing a one-to-one relationship between pictures and text files; or establishing a many-to-one relationship between pictures and text files; or establishing a many-to-many relationship between pictures and text files.

Establishing the correspondence between the picture and the audio file comprises:

establishing a one-to-one relationship between pictures and audio files; or establishing a many-to-one relationship between pictures and audio files; or establishing a many-to-many relationship between pictures and audio files.

The attribute information of the picture comprises at least the storage location information of the picture;

the attribute information of the text file comprises at least the storage location information of the text file;

the attribute information of the audio file comprises at least the storage location information of the audio file.

The attribute information of the picture further comprises any one or any combination of the storage format, picture size and encoding of the picture;

the attribute information of the text file further comprises any one or any combination of the storage format, text size and encoding of the text file;

the attribute information of the audio file further comprises any one or any combination of the storage format, audio file size and encoding of the audio file.

The talking picture playing device provided by the embodiments of the invention comprises an acquiring unit, a preprocessing unit and a playing unit, wherein:

the acquiring unit is configured to obtain the talking picture index file corresponding to the talking picture to be played, the index file comprising the correspondence between a picture and a text file, the attribute information of the picture, and the attribute information of the text file;

the preprocessing unit is configured to parse the talking picture index file, obtain the picture and the text file, and send each of them to the playing unit;

the playing unit is configured to convert the text file into audio and play it in synchronization with the picture.

The talking picture playing device provided by the embodiments of the invention further comprises an index storage unit, configured to store the talking picture index file;

the acquiring unit obtains the talking picture index file from the index storage unit.

The talking picture playing device provided by the embodiments of the invention further comprises an information storage unit, configured to store the picture and the text file;

the preprocessing unit obtains the picture and the text file from the information storage unit and sends each of them to the playing unit.

The information storage unit also stores an audio file; the preprocessing unit also obtains the audio file from the information storage unit and sends it to the playing unit.

The playing unit further comprises a picture display subunit, a speech synthesis engine subunit and a voice playing subunit, wherein:

the picture display subunit is configured to receive and display the picture;

the speech synthesis engine subunit is configured to receive the text file, convert it into an audio file, and send the audio file to the voice playing subunit;

the voice playing subunit is configured to receive and play the audio file sent by the speech synthesis engine subunit, or to receive and play the audio files sent by the preprocessing unit and the speech synthesis engine subunit.

The talking picture playing device provided by the embodiments of the invention further comprises a control subunit, configured to control the picture display subunit and the voice playing subunit so that the picture and the audio are played in synchronization.

The embodiments of the invention establish a correspondence between a picture and a text file, and store the correspondence, the attribute information of the picture and the attribute information of the text file as a talking picture index file. When a talking picture is played, the index file is parsed to obtain the correspondence between the picture and the text file; the picture and the text file are looked up according to the correspondence; the picture and the text file are retrieved according to their attribute information; and the picture and the text file are played in synchronization. With the scheme provided by the embodiments of the invention, the user only needs to create one talking picture index file for each talking picture in advance; the picture and the text file can then be retrieved through that index file and played in synchronization, which makes talking pictures convenient to produce and use. Moreover, because text files occupy little storage space, the storage space required for talking pictures is greatly reduced.

Description of drawings

Fig. 1 is the main flowchart of the method for generating a talking picture index file provided by an embodiment of the invention;

Fig. 2 is a schematic diagram of the talking picture storage structure provided by an embodiment of the invention;

Fig. 3 is the main flowchart of the talking picture playing method provided by an embodiment of the invention;

Fig. 4 is the first schematic diagram of the functional structure of the talking picture playing device provided by an embodiment of the invention;

Fig. 5 is the second schematic diagram of the functional structure of the talking picture playing device provided by an embodiment of the invention;

Fig. 6 is a schematic structural diagram of the playing unit in the talking picture playing device provided by an embodiment of the invention;

Fig. 7 is a schematic diagram of the specific structure of a talking picture playing device provided by an embodiment of the invention;

Fig. 8 is a schematic diagram of the hardware design of a talking picture playing device provided by an embodiment of the invention.

Embodiment

The main implementation principles, specific embodiments and achievable beneficial effects of the technical scheme of the embodiments of the invention are explained in detail below with reference to the accompanying drawings.

As shown in Fig. 1, an embodiment of the invention first provides a method for generating a talking picture index file. Its main flow is as follows:

Step 11: establish a correspondence between a picture and its associated synchronized file.

A talking picture comprises a picture and an associated synchronized file. The synchronized file in the embodiments of the invention covers the following two cases:

1. the synchronized file is a text file only;

2. the synchronized file comprises a text file and also an audio file.

A synchronized file mentioned below may be either of these two cases.

To remain compatible with existing file systems, picture files and synchronized files are still stored separately when a talking picture is stored in memory. One picture file may be associated with several synchronized files, several picture files may be associated with one synchronized file, and several picture files may be associated with several synchronized files. This improves the reuse of picture files and synchronized files and effectively saves the storage space they require.

The text file associated with a picture is played by means of text-to-speech (TTS) technology: a speech synthesis engine converts the text in the file into speech, which realizes talking picture playback.

When a talking picture is stored, the correspondence between the picture and its associated synchronized file is established according to the association information that has been set between them.

Step 12: store the correspondence between the picture and its associated synchronized file, the attribute information of the picture, and the attribute information of the associated synchronized file as a talking picture index file.

After the correspondence between the picture and its associated synchronized file has been established, index information must be created for the talking picture. The index information is usually stored as a talking picture index file, which must contain not only the correspondence between the picture and its associated synchronized file but also the attribute information of the picture file and of the associated synchronized file.

The attribute information of the picture file includes, for example, its storage format, storage location, size and encoding; the attribute information of the associated synchronized file likewise includes, for example, its storage format, storage location, size and encoding.

The talking picture index file may be created manually by the user, or generated automatically when the talking picture is created.

The talking picture index file logically merges the picture of a talking picture with its associated synchronized file: the association between them exists in the form of the index file. In practice, as shown in Fig. 2, the picture file and the associated synchronized file are still stored separately as ordinary files and have no direct link to each other. The attribute information of the picture file and of the associated synchronized file is stored, in a corresponding format, as the talking picture index file. Because the index file contains both the attribute information and the correspondence between the picture and its associated synchronized file, the picture and the synchronized file belonging to a talking picture can be obtained through the index file, and the talking picture can then be played.
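Steps 11 and 12 can be illustrated with a minimal Python sketch. The patent does not specify an on-disk format for the index file, so the JSON layout, the field names (`relations`, `files`, `kind`, `location`, `format`, `size`, `encoding`) and the helper names below are assumptions made purely for illustration. The demo shows two pictures sharing one text file, that is, a many-to-one relationship.

```python
import json
import os
import tempfile


def describe(path, kind, encoding):
    """Collect the attribute information listed in the description for one file."""
    return {
        "kind": kind,                                              # "picture", "text" or "audio"
        "location": path,                                          # storage location
        "format": os.path.splitext(path)[1].lstrip(".").lower(),   # storage format
        "size": os.path.getsize(path),                             # size in bytes
        "encoding": encoding,                                      # encoding
    }


def build_index(relations, attributes):
    """Step 11 + 12: combine (picture_id, sync_file_id) pairs with attribute info."""
    used = {file_id for pair in relations for file_id in pair}
    missing = used - set(attributes)
    if missing:
        raise ValueError(f"no attribute information for: {sorted(missing)}")
    return {"relations": [list(pair) for pair in relations], "files": attributes}


def save_index(index, index_path):
    """Store the index as a standalone file next to the unchanged media files."""
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False, indent=2)


# Demo data: hypothetical file names and contents, written to a temp directory.
workdir = tempfile.mkdtemp()
paths = {}
for name, data in [("a.jpg", b"\xff\xd8demo"), ("b.jpg", b"\xff\xd8demo2"),
                   ("caption.txt", "Hello from the seaside.".encode("utf-8"))]:
    paths[name] = os.path.join(workdir, name)
    with open(paths[name], "wb") as f:
        f.write(data)

attributes = {
    "pic_a": describe(paths["a.jpg"], "picture", "jpeg"),
    "pic_b": describe(paths["b.jpg"], "picture", "jpeg"),
    "txt_1": describe(paths["caption.txt"], "text", "utf-8"),
}
index = build_index([("pic_a", "txt_1"), ("pic_b", "txt_1")], attributes)
index_path = os.path.join(workdir, "talking_picture.idx.json")
save_index(index, index_path)
```

Keeping the relations separate from the per-file attributes means the shared text file is described once, however many pictures refer to it.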

Preferably, multiple talking picture index files can be collected into an index information library; by querying the library, the user can conveniently select the talking picture to be played.
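How the library is queried is not described, so the following is only one possible sketch: it assumes the library is a mapping from a user-visible name to an index file, and that the user searches by a substring of the name. Both the structure and the `search_library` helper are hypothetical.

```python
def search_library(library, keyword):
    """Return the names of index files whose name contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return sorted(name for name in library if needle in name.lower())


# Hypothetical library: name -> parsed index file (contents omitted here).
library = {
    "holiday_beach": {"relations": [], "files": {}},
    "holiday_mountain": {"relations": [], "files": {}},
    "birthday": {"relations": [], "files": {}},
}
matches = search_library(library, "Holiday")
```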

Correspondingly, an embodiment of the invention also provides a talking picture playing method. As shown in Fig. 3, the method is as follows:

Step 21: parse the talking picture index file corresponding to the talking picture to be played, and obtain the correspondence between the picture and its associated synchronized file.

When a talking picture is to be played, the corresponding talking picture index file must first be obtained. The correspondence between the picture and its associated synchronized file is then obtained from the content of the index file.

The talking picture index file can be obtained by querying the index information library.

Step 22: look up the picture and its associated synchronized file according to the correspondence.

Using the correspondence obtained between the picture and its associated synchronized file, the corresponding picture and synchronized file are looked up.

Here, one picture file may be associated with several synchronized files, several picture files may be associated with one synchronized file, and several picture files may be associated with several synchronized files.

Step 23: retrieve the picture and its associated synchronized file according to their attribute information.

Once the specific picture and associated synchronized file have been identified, they must be retrieved. Retrieval is based on the attribute information of the picture and of the associated synchronized file stored in the talking picture index file.

In the attribute information stored in the talking picture index file, the attribute information of the picture may include the storage format, storage location, size and encoding of the picture file; the attribute information of the associated synchronized file may include the storage format, storage location, size and encoding of that file.

Step 24: play the picture and its associated synchronized file in synchronization.

The attribute information of the picture and of the associated synchronized file not only makes it easy to retrieve both from their storage locations, but also gives the file size, storage format and encoding of each. Based on this attribute information, a suitable playback program can be used to play the talking picture in synchronization.

For example, the attribute information of the synchronized file shows whether it consists of an audio file and a text file or of a text file only. If the synchronized file includes an audio file, that audio file can be played directly by an audio player. A text file in the synchronized file must first be converted into an audio file by a speech synthesis engine using TTS technology and is then played by the audio player.
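The lookup and dispatch in steps 21 to 24 can be sketched as follows. The index layout (`relations`, `files`, `kind`, `location`) is the same illustrative assumption as before, not a format given in the patent, and the action labels stand in for calls to a real speech synthesis engine and audio player.

```python
def plan_playback(index, picture_id):
    """Return (picture_location, actions) for one picture of a talking picture."""
    files = index["files"]
    # Step 22: find every synchronized file related to this picture.
    sync_ids = [s for p, s in index["relations"] if p == picture_id]
    actions = []
    for sync_id in sync_ids:
        info = files[sync_id]  # Step 23: attribute information gives kind and location.
        if info["kind"] == "text":
            # Text must pass through the speech synthesis engine first.
            actions.append(("synthesize_then_play", info["location"]))
        elif info["kind"] == "audio":
            # Audio files go straight to the audio player.
            actions.append(("play_directly", info["location"]))
        else:
            raise ValueError(f"unexpected synchronized file kind: {info['kind']}")
    return files[picture_id]["location"], actions


# Hypothetical parsed index file (step 21 would load this from storage).
index = {
    "relations": [["pic_a", "txt_1"], ["pic_a", "snd_1"], ["pic_b", "txt_1"]],
    "files": {
        "pic_a": {"kind": "picture", "location": "pictures/a.jpg"},
        "pic_b": {"kind": "picture", "location": "pictures/b.jpg"},
        "txt_1": {"kind": "text", "location": "texts/caption.txt"},
        "snd_1": {"kind": "audio", "location": "audio/waves.mp3"},
    },
}
picture, actions = plan_playback(index, "pic_a")
```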

When the synchronized file associated with the picture in a talking picture is a text file, a speech synthesis engine using TTS technology must be called to convert the text file into an audio file. Depending on the language of the text itself and the languages the speech synthesis engine supports, the engine can convert the text file into speech in various languages, which is then played by the audio player. For example, according to the user's needs, the text in the text file can be converted into any language supported by the speech synthesis engine, such as English, Russian or French, and played; the text can of course also be converted into a dialect (such as Sichuanese or Cantonese) and played.

Preferably, the user can customize what is played when a talking picture is played. That is, the user can select the type and content of the synchronized files associated with the picture in the talking picture, and so define the playback form of the talking picture according to the user's own needs. For example: play only the audio file, play only the text file, or play the audio file and the text file at the same time.
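The three playback choices can be expressed as a simple filter over the synchronized files of a picture. The mode names and the `kind` field are assumptions introduced for this sketch.

```python
# Hypothetical mode names mapped to the kinds of synchronized file they allow.
ALLOWED_KINDS = {
    "audio_only": {"audio"},
    "text_only": {"text"},
    "both": {"audio", "text"},
}


def select_sync_files(sync_files, mode):
    """Keep only the synchronized files the user asked to hear."""
    if mode not in ALLOWED_KINDS:
        raise ValueError(f"unknown playback mode: {mode!r}")
    return [f for f in sync_files if f["kind"] in ALLOWED_KINDS[mode]]


sync_files = [
    {"kind": "text", "location": "texts/caption.txt"},
    {"kind": "audio", "location": "audio/waves.mp3"},
]
text_only = select_sync_files(sync_files, "text_only")
```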

Preferably, when the synchronized file associated with the picture in a talking picture is a text file, one specific flow for creating and playing the talking picture is as follows:

1. To store a talking picture, first save the picture in memory, then enter the text associated with that picture and save it.

2. After preprocessing, a talking picture index file is generated for the saved picture and text; the result is kept in memory or uploaded over the network to a network storage server. Preprocessing means creating the talking picture index file for the talking picture. The index file contains the attribute information of the picture and the attribute information of the associated text files, which may be text 1, text 2, ..., text n.

3. When the talking picture is to be played, the talking picture index file is fetched from the index information library. Specifically, the index file created in the previous step is retrieved, and the files are downloaded from memory or from the network storage server according to the information in the index file. The picture file and the associated text file are obtained in this way.

4. The picture file is decoded and output to the display device.

5. At the same time, the text is fed to the TTS speech synthesis engine, which converts it into audio that is played by the audio player.

Preferably, steps 4 and 5 use direct memory access (DMA) technology, so that picture display and speech playback proceed in synchronization while CPU resources are saved.
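DMA is a hardware mechanism and cannot be shown directly in a high-level sketch. As a loose software analogy only, the following Python example replaces DMA with two threads released by a barrier, so that the display path (step 4) and the speech path (step 5) start together and run concurrently. The `show_picture` and `speak_text` callables are placeholders, not real display or TTS calls.

```python
import threading


def play_synchronously(show_picture, speak_text):
    """Run the display path and the speech path concurrently, starting together."""
    start = threading.Barrier(2)  # both workers wait here, then proceed at once

    def run(task):
        start.wait()
        task()

    workers = [threading.Thread(target=run, args=(task,))
               for task in (show_picture, speak_text)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()  # return only when both paths have finished


# Placeholder tasks that just record that they ran.
events = []
events_lock = threading.Lock()


def show_picture():
    with events_lock:
        events.append("display")


def speak_text():
    with events_lock:
        events.append("speech")


play_synchronously(show_picture, speak_text)
```

The order in which the two entries are recorded is not deterministic; the sketch only guarantees that both paths start after the barrier and both complete.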

Because the embodiments of the invention introduce TTS technology, stored text files can be converted into speech and played. The synchronized file associated with a picture in a talking picture can therefore include a text file in addition to an ordinary audio file (a recording or music). When the talking picture is played, a speech synthesis engine using TTS technology converts the text file associated with the picture into speech and plays it, so voice output only requires associating a suitable text file with the picture. This not only avoids recording a large number of audio files but also greatly reduces the storage space that talking pictures occupy. In addition, because the text file can be written in any language the system supports, and the speech synthesis engine can play the text in any language the system supports, talking pictures become very flexible to use.
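The size of the saving can be illustrated with a small calculation. The figures are assumptions chosen for the example, not values from the patent: a one-sentence caption stored as UTF-8 text, compared with a 5-second uncompressed mono recording at 44.1 kHz and 16 bits per sample. Compressed audio formats would narrow the gap, but text would normally remain far smaller.

```python
def pcm_bytes(seconds, sample_rate=44_100, bytes_per_sample=2, channels=1):
    """Size of uncompressed PCM audio: duration x rate x sample width x channels."""
    return seconds * sample_rate * bytes_per_sample * channels


caption = "This photo was taken at the seaside last summer."
text_bytes = len(caption.encode("utf-8"))   # storage for the text file content
audio_bytes = pcm_bytes(5)                  # assumed 5-second spoken recording
ratio = audio_bytes / text_bytes            # how many times larger the audio is
```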

Correspondingly, an embodiment of the invention also provides a talking picture playing device whose functional structure is shown in Fig. 4. The device comprises an acquiring unit 31, a preprocessing unit 32 and a playing unit 33, as follows:

The acquiring unit 31 obtains the talking picture index file corresponding to the talking picture to be played.

The talking picture index file comprises the correspondence between the picture and its associated synchronized file, the attribute information of the picture, and the attribute information of the associated synchronized file. The associated synchronized file comprises a text file, or a text file and an audio file.

The preprocessing unit 32 is configured to parse the talking picture index file, obtain the picture and the associated synchronized file, and send each of them to the playing unit.

The playing unit 33 is configured to convert a received text file into audio and play it in synchronization with the picture; a received audio file is played in synchronization directly.

Preferably, as shown in Fig. 5, the talking picture playing device further comprises an index storage unit 34 and an information storage unit 35, as follows:

The index storage unit 34 is configured to store the talking picture index file.

The information storage unit 35 is configured to store the picture and the associated synchronized file.

The acquiring unit 31 obtains the talking picture index file from the index storage unit 34.

The preprocessing unit 32 obtains the picture and the associated synchronized file from the information storage unit 35.

Preferably, as shown in Fig. 6, the playing unit 33 in the talking picture playing device further comprises a picture display subunit 331, a speech synthesis engine subunit 332 and a voice playing subunit 333, as follows:

The picture display subunit 331 is configured to receive and display the picture.

The speech synthesis engine subunit 332 is configured to receive the text file, convert the text in it into audio, generate the corresponding audio file, and send it to the voice playing subunit 333.

The voice playing subunit 333 is configured to receive and play the audio files sent by the preprocessing unit 32 and/or the speech synthesis engine subunit 332.

Preferably, the playing unit may also comprise a control subunit (not shown in Fig. 6), configured to control the picture display subunit 331 and the voice playing subunit 333 so that the picture and the audio are played in synchronization.

Preferably, starting from the device shown in Fig. 4, the additional units shown in Fig. 5 and Fig. 6 can be combined with one another to obtain a talking picture playing device with more complete functionality.
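The division of work between the units of Figs. 4 to 6 can be sketched as three small Python classes. This is only an illustration of the data flow: the class and method names, the dictionary-based stores and the `fake_synthesize` stand-in for a speech synthesis engine are all hypothetical.

```python
class AcquiringUnit:
    """Unit 31: fetches an index file from the index storage unit (a dict here)."""
    def __init__(self, index_store):
        self.index_store = index_store

    def get_index(self, name):
        return self.index_store[name]


class PreprocessingUnit:
    """Unit 32: parses the index and fetches content from the information storage unit."""
    def __init__(self, info_store):
        self.info_store = info_store

    def fetch(self, index):
        content = {"picture": [], "text": [], "audio": []}
        for info in index["files"].values():
            content[info["kind"]].append(self.info_store[info["location"]])
        return content


class PlayingUnit:
    """Unit 33: display, speech synthesis and voice playing subunits in one class."""
    def __init__(self, synthesize):
        self.synthesize = synthesize   # stand-in for speech synthesis engine subunit 332
        self.shown, self.played = [], []

    def play(self, content):
        self.shown.extend(content["picture"])              # picture display subunit 331
        audio = list(content["audio"]) + [self.synthesize(t) for t in content["text"]]
        self.played.extend(audio)                          # voice playing subunit 333


def fake_synthesize(text):
    return f"<speech: {text}>"


index_store = {"beach": {"files": {
    "pic_a": {"kind": "picture", "location": "pictures/a.jpg"},
    "txt_1": {"kind": "text", "location": "texts/caption.txt"},
    "snd_1": {"kind": "audio", "location": "audio/waves.pcm"},
}}}
info_store = {
    "pictures/a.jpg": "<picture a>",
    "texts/caption.txt": "Hello from the seaside.",
    "audio/waves.pcm": "<recorded waves>",
}

acquiring = AcquiringUnit(index_store)
preprocessing = PreprocessingUnit(info_store)
playing = PlayingUnit(fake_synthesize)
playing.play(preprocessing.fetch(acquiring.get_index("beach")))
```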

As shown in Fig. 7, one preferred specific implementation structure of the talking picture playing device is as follows:

The memory provides the functions of the index storage unit 34 and the information storage unit 35 described above and stores the talking picture, which comprises the picture and the associated synchronized file.

The acquisition/preprocessing module provides the functions of the acquiring unit 31 and the preprocessing unit 32 described above. It creates the talking picture index file for a talking picture, logically combining the picture, the text file and the audio file; the text file and the audio file here are the synchronized files associated with the picture in the talking picture. The preprocessing module is also responsible for retrieving the talking picture index file, finding the synchronized files associated with the picture, and separating out the relevant picture, text file and audio file. It sends the picture to the picture driver module in the playing unit, sends the text file to the speech synthesis engine driver module in the playing unit, and sends the audio file directly to the audio conversion driver module in the playing unit.

The playing unit comprises a picture driver module, a speech synthesis engine driver module and an audio conversion driver module, wherein:

The picture driver module provides the function of the picture display subunit 331 described above and displays the picture.

The speech synthesis engine driver module provides the function of the speech synthesis engine subunit 332 described above: it converts the text in the text file into audio, generates the corresponding audio file, and sends it to the audio conversion driver module.

The audio conversion driver module provides the function of the voice playing subunit 333 described above: it performs digital-to-analog conversion on the audio in the audio files sent by the acquisition/preprocessing module and the speech synthesis engine driver module, and plays it in synchronization with the picture.

As shown in Fig. 8, the hardware design principle of a preferred talking picture playing device is as follows:

The memory interface obtains the talking picture data from memory. The CPU separates the talking picture data by format into image, text and audio files, and calls the corresponding routines for each: picture processing and the display driver for the image; text analysis and the speech synthesis driver, which performs speech synthesis and encoding, for the text; and decoding and digital-to-analog conversion for the audio file. Finally the image data and the audio data are transferred to the display interface and to the audio codec interface respectively, to be displayed and played. Further, DMA technology is used to keep the whole process well synchronized.

The memory interface must support various storage devices, for example various kinds of flash memory, various memory cards, hard disks and portable hard disks. The CPU performs system control, analysis and decoding of the image data and the text, audio synthesis and similar functions. The display interface receives the image data and displays it. The audio codec interface performs digital-to-analog conversion on the raw audio data and plays it. The DMA interface is the interface that must be added to keep the picture and the audio smoothly synchronized while also saving CPU resources.

In summary, the scheme provided by the embodiments of the invention reduces the storage space occupied by stored talking pictures and meets the user's need for flexible use.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.

Claims

1. A talking picture playing method, characterized by comprising:

2. The talking picture playing method according to claim 1, characterized in that the talking picture index file further comprises the correspondence between the picture and an audio file and the attribute information of the audio file;

and the picture, the audio file and the audio converted from the text file are played in synchronization.

3. The talking picture playing method according to claim 1 or 2, characterized in that, when the text file is played, it is converted into an audio file by a speech synthesis engine and then played.

4. A method for generating the talking picture index file used in the talking picture playing method according to claim 1, characterized in that the method comprises the following steps:

establishing a correspondence between a picture and a text file;

5. The method for generating a talking picture index file according to claim 4, characterized by further comprising: establishing a correspondence between the picture and an audio file;

6. The method for generating a talking picture index file according to claim 4, characterized in that establishing the correspondence between the picture and the text file comprises:

establishing a one-to-one relationship between pictures and text files; or establishing a many-to-one relationship between pictures and text files; or establishing a many-to-many relationship between pictures and text files.

7. The method for generating a talking picture index file according to claim 5, characterized in that establishing the correspondence between the picture and the audio file comprises:

8. The method for generating a talking picture index file according to claim 4, characterized in that the attribute information of the picture comprises at least the storage location information of the picture;

the attribute information of the text file comprises at least the storage location information of the text file.

9. The method for generating a talking picture index file according to claim 5, characterized in that the attribute information of the picture comprises at least the storage location information of the picture;

10. The method for generating a talking picture index file according to claim 8, characterized in that the attribute information of the picture further comprises any one or any combination of the storage format, picture size and encoding of the picture;

the attribute information of the text file further comprises any one or any combination of the storage format, text size and encoding of the text file.

11. The method for generating a talking picture index file according to claim 9, characterized in that the attribute information of the audio file further comprises any one or any combination of the storage format, audio file size and encoding of the audio file.

12. A talking picture playing device, characterized in that the device comprises an acquiring unit, a preprocessing unit and a playing unit, wherein,

13. The talking picture playing device according to claim 12, characterized by further comprising: an index storage unit, configured to store the talking picture index file;

14. The talking picture playing device according to claim 13, characterized by further comprising: an information storage unit, configured to store the picture and the text file;

15. The talking picture playing device according to claim 14, characterized in that the information storage unit also stores an audio file;

the preprocessing unit also obtains the audio file from the information storage unit and sends it to the playing unit.

16. The talking picture playing device according to claim 14 or 15, characterized in that the playing unit further comprises a picture display subunit, a speech synthesis engine subunit and a voice playing subunit, wherein,

17. The talking picture playing device according to claim 16, characterized by further comprising: a control subunit, configured to control the picture display subunit and the voice playing subunit so that the picture and the audio are played in synchronization.