CN107172485A

CN107172485A - Method and apparatus for generating short videos

Info

Publication number: CN107172485A
Application number: CN201710278060.3A
Authority: CN
Inventors: 门文
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2017-04-25
Filing date: 2017-04-25
Publication date: 2017-09-15
Anticipated expiration: 2037-04-25
Also published as: CN107172485B

Abstract

An object of the present invention is to provide a method and apparatus for generating a short video. The invention obtains one or more pictures and one or more voice messages of a user for the one or more pictures; determines, according to the content of the voice message, subtitle information corresponding to the voice message; determines, according to the voice features and/or semantic features of the voice message, a display effect corresponding to the picture and/or the subtitle information; and generates a short video from the picture and the voice message according to the subtitle information and the display effect. Compared with the prior art, the invention allows emotions to be expressed more creatively, increases the diversity of input behavior, and makes pictures more engaging and intelligent, so that the user's forms of expression are richer and more attractive and the user experience is improved.

Description

Method and apparatus for generating short videos

Technical field

The present invention relates to the field of computer technology, and in particular to a technique for generating short videos.

Background art

In the prior art, emoticons offered by input methods exist mainly as pictures, including emoji, kaomoji, emoticon images, and GIFs; in user-generated-content settings such as forums and microblogs, emoticons likewise exist mainly as static or animated pictures. Such an emoticon is a fixed single picture or GIF, and the user simply selects one and sends it.

However, because the content of an emoticon image is fixed, a user who wants to add to it, or who wants to express the corresponding emotion vividly in both voice and feeling, has to enter text or voice separately to reinforce the emotional expression. A user can also produce a short video to serve as a new emoticon image, but doing so generally requires shooting a short video and converting it into an emoticon image, or combining multiple pictures, videos, audio clips, or subtitles with image-processing software. This is too complicated for the user and costs more effort than sending an emoticon image directly. Users therefore still tend to express content with separate emoticon images, voice, and text, which makes expression inefficient and limited in form.

Summary of the invention

An object of the present invention is to provide a method and apparatus for generating a short video.

According to one aspect of the present invention, a method for generating a short video is provided, the method comprising the following steps:

a. obtaining one or more pictures and one or more voice messages of a user for the one or more pictures;

b. determining, according to the content of the voice message, subtitle information corresponding to the voice message;

c. determining, according to voice features and/or semantic features of the voice message, a display effect corresponding to the picture and/or the subtitle information;

d. generating a short video from the picture and the voice message according to the subtitle information and the display effect.

Optionally, step c comprises:

- determining, according to the voice features and/or semantic features of the voice message in combination with picture features of the picture, the display effect corresponding to the picture and/or the subtitle information.

Optionally, step c comprises:

- determining, according to the voice features and/or semantic features of the voice message in combination with the voice length of the voice message, the display effect corresponding to the picture and/or the subtitle information.

Optionally, the method further comprises:

x. determining, according to the voice message and the picture, one or more related pictures associated with the picture;

wherein the method further comprises:

- generating one or more related short videos from the picture, the related pictures, and the voice message according to the subtitle information and the display effect.

Optionally, step x comprises:

x1. determining the number of related pictures associated with the picture;

- determining, according to the voice message, the picture, and the number of related pictures, the one or more related pictures associated with the picture.

Optionally, step x1 comprises at least one of the following:

- determining the number of related pictures associated with the picture according to the voice length of the voice message;

- determining the number of related pictures associated with the picture according to the voice features of the voice message;

- determining the number of related pictures associated with the picture according to the semantic features of the voice message.

Optionally, the method further comprises:

- obtaining one or more historical voice messages of the user and determining a user voice feature library corresponding to the user;

wherein step c comprises:

- determining, according to the user voice feature library, the voice features corresponding to the voice message;

- determining, according to the voice features and/or the semantic features of the voice message, the display effect corresponding to the picture and/or the subtitle information.

Optionally, the method further comprises:

- converting and saving the short video in one or more application-usable formats according to the relevant configuration information of the application to which the short video corresponds;

- adding the short video to the application in the application-usable format.

Optionally, the display effect comprises one or more dynamic effects.

According to another aspect of the present invention, a generating apparatus for generating a short video is also provided, the generating apparatus comprising:

an acquisition device for obtaining one or more pictures and one or more voice messages of a user for the one or more pictures;

a subtitle determining device for determining, according to the content of the voice message, subtitle information corresponding to the voice message;

an effect determining device for determining, according to the voice features and/or semantic features of the voice message, a display effect corresponding to the picture and/or the subtitle information;

a video generating device for generating a short video from the picture and the voice message according to the subtitle information and the display effect.

Optionally, the effect determining device is configured to:

Optionally, the generating apparatus further comprises:

a related picture determining device for determining, according to the voice message and the picture, one or more related pictures associated with the picture;

wherein the generating apparatus further comprises:

a related video generating device for generating one or more related short videos from the picture, the related pictures, and the voice message according to the subtitle information and the display effect.

Optionally, the related picture determining device comprises:

a quantity determining unit for determining the number of related pictures associated with the picture;

an association determining unit for determining, according to the voice message, the picture, and the number of related pictures, the one or more related pictures associated with the picture.

Optionally, the quantity determining unit is configured for at least one of the following:

Optionally, the generating apparatus further comprises:

a history acquisition device for obtaining one or more historical voice messages of the user and determining a user voice feature library corresponding to the user;

wherein the effect determining device is configured to:

Optionally, the generating apparatus further comprises:

a conversion device for converting and saving the short video in one or more application-usable formats according to the relevant configuration information of the application to which the short video corresponds;

an adding device for adding the short video to the application in the application-usable format.

According to yet another aspect of the present invention, an input device is also provided, comprising the generating apparatus described in any of the above.

Compared with the prior art, the present invention obtains one or more pictures and one or more voice messages of a user for the one or more pictures; determines, according to the content of the voice message, subtitle information corresponding to the voice message; determines, according to the voice features and/or semantic features of the voice message, a display effect corresponding to the picture and/or the subtitle information; and generates a short video from the picture and the voice message according to the subtitle information and the display effect. Thus, by determining the display effect from the voice features and/or semantic features of the voice message and converting the picture into a short video, the invention allows emotions to be expressed more creatively, increases the diversity of input behavior, and makes pictures more engaging and intelligent, so that the user's forms of expression are richer and more attractive and the user experience is improved.

Moreover, the present invention can determine the display effect corresponding to the picture and/or the subtitle information according to the voice features and/or semantic features of the voice message in combination with picture features of the picture, or in combination with the voice length of the voice message. The display effect so determined fits the picture and the voice more closely, which improves expressiveness and visual quality, further increases appeal, and improves the user experience.

Moreover, the present invention can determine one or more related pictures associated with the picture according to the voice message and the picture, and generate one or more related short videos from the picture, the related pictures, and the voice message according to the subtitle information and the display effect. The invention can thus generate several related short videos for the user, reducing the effort the user spends looking for pictures, giving the user more choices, and making it more efficient to obtain information, which further increases appeal and improves the user experience.

Moreover, the present invention can obtain one or more historical voice messages of the user and determine a user voice feature library corresponding to the user; determine the voice features corresponding to the voice message according to the user voice feature library; and determine the display effect corresponding to the picture and/or the subtitle information according to those voice features and/or the semantic features of the voice message. The extracted voice features are therefore more accurate, and the display effect determined from them better meets the user's needs.

Moreover, the present invention can convert and save the short video in one or more application-usable formats according to the relevant configuration information of the application to which the short video corresponds, and add the short video to the application in the application-usable format. The invention thus enriches the ways information can be expressed within the application, making the user's expression richer and more attractive. For example, the short video can be added as an emoticon, so that a user who sees the emoticon image also hears the voice and, together with the display effect, understands the other party's emotional expression; an animated emoticon with a real human voice is richer and more attractive.

Brief description of the drawings

Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:

Fig. 1 is a schematic diagram of a generating apparatus for generating a short video according to one aspect of the present invention;

Fig. 2 is a schematic diagram of a generating apparatus for generating a short video according to a preferred embodiment of the present invention;

Fig. 3 is a flowchart of a method for generating a short video according to another aspect of the present invention;

Fig. 4 is a flowchart of a method for generating a short video according to a preferred embodiment of the present invention.

The same or similar reference numerals in the drawings denote the same or similar components.

Detailed description of the embodiments

Before the exemplary embodiments are discussed in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. A process may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram, and so on.

The "generating apparatus" referred to in this context is a "computer device", also called a "computer": an intelligent electronic device that can carry out predetermined processing, such as numerical and/or logical computation, by running predetermined programs or instructions. It may include a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined processing; or the predetermined processing may be carried out by hardware such as an ASIC, FPGA, or DSP; or by a combination of the two.

The computer device includes user equipment and/or network equipment. The user equipment includes, but is not limited to, computers, smartphones, and PDAs. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing that consists of a large number of computers or network servers, where cloud computing is a form of distributed computing: a super virtual computer made up of a group of loosely coupled computers. The computer device may implement the present invention by operating alone, or by accessing a network and interacting with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, and VPNs.

Those skilled in the art will understand that the "generating apparatus" described herein may be user equipment alone, in which case the user equipment performs the corresponding operations; or it may be formed by integrating user equipment with network equipment or a server, in which case the user equipment and the network equipment cooperate to perform the corresponding operations.

It should be noted that the user equipment, network equipment, and networks above are only examples; other existing or future computer devices or networks, where applicable to the present invention, are also included within its scope of protection and are incorporated herein by reference.

It should be noted that, preferably, the "generating apparatus" of the present invention may be included in various devices (such as an input device), in various applications (such as an input method), or in a device within such an application (such as a device included in an input method). The generating apparatus may be preinstalled on the computer device by its manufacturer or seller, or loaded onto the computer device from a server. Those skilled in the art will understand that any device capable of realizing the functions of the present invention, whether or not it is loaded on a computer device, falls within the scope of protection of the present invention.

Here, those skilled in the art will understand that the present invention can be applied to both mobile and non-mobile terminals; for example, when a user uses a mobile phone or a PC, the method or apparatus of the present invention can be used for provision and presentation.

The specific structural and functional details disclosed herein are merely representative and serve to describe exemplary embodiments of the present invention. The invention may, however, be embodied in many alternative forms and should not be construed as limited to the embodiments set forth herein.

It should be understood that although the terms "first", "second", etc. may be used herein to describe units, these units should not be limited by these terms. The terms are used only to distinguish one unit from another. For example, without departing from the scope of the exemplary embodiments, a first unit could be termed a second unit, and similarly a second unit could be termed a first unit. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

The terminology used herein is for describing particular embodiments only and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" as used herein are intended to include the plural as well. It should also be understood that the terms "comprising" and/or "including" as used herein specify the presence of stated features, integers, steps, operations, units, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, units, components, and/or combinations thereof.

It should further be mentioned that in some alternative implementations, the functions/actions noted may occur out of the order indicated in the drawings. For example, depending on the functions/actions involved, two figures shown in succession may in fact be executed substantially simultaneously, or may sometimes be executed in the reverse order.

The present invention is described in further detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of a generating apparatus for generating a short video according to one aspect of the present invention; the generating apparatus includes an acquisition device 1, a subtitle determining device 2, an effect determining device 3, and a video generating device 4.

Specifically, the acquisition device 1 obtains one or more pictures and one or more voice messages of a user for the one or more pictures; the subtitle determining device 2 determines, according to the content of the voice message, subtitle information corresponding to the voice message; the effect determining device 3 determines, according to the voice features and/or semantic features of the voice message, a display effect corresponding to the picture and/or the subtitle information; and the video generating device 4 generates a short video from the picture and the voice message according to the subtitle information and the display effect.
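The division of labor among the four devices can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions, not the implementation disclosed here: the `ShortVideo` record, the `generate_short_video` function, and the two callables passed to it are hypothetical names, and speech recognition and effect selection are replaced by stand-in functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ShortVideo:
    """Hypothetical container for the pieces combined by the video generating device."""
    pictures: List[str]
    voice: bytes
    subtitle: str
    effects: List[str]


def generate_short_video(
    pictures: List[str],
    voice: bytes,
    recognize: Callable[[bytes], str],                   # stands in for subtitle determining device 2
    choose_effects: Callable[[bytes, str], List[str]],   # stands in for effect determining device 3
) -> ShortVideo:
    # Acquisition device 1: pictures and voice are assumed to be supplied by the caller.
    subtitle = recognize(voice)
    effects = choose_effects(voice, subtitle)
    # Video generating device 4: bundle everything; real video encoding is out of scope here.
    return ShortVideo(pictures, voice, subtitle, effects)


# Stand-in components, for illustration only.
video = generate_short_video(
    ["cat.gif"],
    b"\x00\x01",
    recognize=lambda v: "good night",
    choose_effects=lambda v, s: ["fade_to_black"] if "good night" in s else [],
)
print(video.subtitle, video.effects)
```

Passing the recognizer and the effect chooser in as callables mirrors the way each device is described separately, and keeps the sketch independent of any particular speech-recognition library.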

The acquisition device 1 obtains one or more pictures and one or more voice messages of the user for the one or more pictures.

Specifically, the acquisition device 1 obtains one or more pictures in one or more of the following ways: calling built-in default pictures, obtaining pictures that the user provides by uploading or shooting, finding pictures on the network by searching, or downloading pictures. The pictures may be obtained in real time or in advance. The pictures include static pictures (for example, in jpg or bmp format) and/or animated pictures (for example, in gif format).

The acquisition device 1 obtains one or more voice messages of the user for the one or more pictures by recording in real time, by calling up past recordings, or by similar means. Here, those skilled in the art will understand that one picture may correspond to one or more voice messages, and one voice message may correspond to one or more pictures. The association between pictures and voice messages can be determined according to the user's settings.

For example, the user selects a built-in picture, presses the record button, and records a voice message; that voice message then corresponds to the picture. If the user then records another voice message, both voice messages correspond to the picture.

Alternatively, for example, the user downloads two pictures from the Internet, selects both at once, and associates them with another voice message; that voice message is then associated with both pictures.
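The many-to-many relationship between pictures and voice messages in the two examples above could be recorded with a pair of mappings. This is an illustrative sketch only; the `Associations` class and the identifiers are hypothetical and do not come from the original text.

```python
from collections import defaultdict


class Associations:
    """Hypothetical many-to-many index between picture ids and voice-message ids."""

    def __init__(self):
        self.voices_of = defaultdict(list)    # picture id -> voice-message ids
        self.pictures_of = defaultdict(list)  # voice-message id -> picture ids

    def link(self, picture_ids, voice_id):
        # Associate one voice message with every currently selected picture.
        for pid in picture_ids:
            self.voices_of[pid].append(voice_id)
            self.pictures_of[voice_id].append(pid)


assoc = Associations()
assoc.link(["builtin_1"], "voice_a")        # first recording for a built-in picture
assoc.link(["builtin_1"], "voice_b")        # second recording for the same picture
assoc.link(["web_1", "web_2"], "voice_c")   # one recording for two downloaded pictures
print(assoc.voices_of["builtin_1"])   # ['voice_a', 'voice_b']
print(assoc.pictures_of["voice_c"])   # ['web_1', 'web_2']
```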

The subtitle determining device 2 determines, according to the content of the voice message, the subtitle information corresponding to the voice message.

Specifically, the subtitle determining device 2 uses speech recognition to identify the spoken content of the voice message and then determines the text corresponding to the recognized content as the subtitle information corresponding to the voice message.

Preferably, the subtitle determining device 2 may also take the length of the voice message into account to decide whether to break the subtitle information into lines; it may determine punctuation, line breaks, and the like in the subtitle information from the spoken content together with voice features of the voice message such as tone and rhythm; and it may interact with the user, providing a proofreading input function so that the user can proofread the subtitle information.

Preferably, the subtitle determining device 2 may analyze the voice features, voice length, and so on by itself; it may also interact with the effect determining device 3 and iteratively adjust the line breaks, punctuation, and the like of the subtitle information according to that device's feedback on voice features, semantic features, display effects, and so on.
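One way rhythm could drive line breaks is to start a new subtitle line wherever the speaker pauses. The sketch below assumes a recognizer that returns each word with start and end times; that output format, the `split_subtitle` function, and the 0.4-second threshold are all assumptions made for illustration, not details given in the text.

```python
from typing import List, Tuple

# (text, start time in s, end time in s); hypothetical recognizer output
Word = Tuple[str, float, float]


def split_subtitle(words: List[Word], pause_threshold: float = 0.4) -> List[str]:
    """Start a new subtitle line wherever the speaker pauses longer than the threshold."""
    lines: List[str] = []
    current: List[str] = []
    previous_end = None
    for text, start, end in words:
        if previous_end is not None and start - previous_end > pause_threshold:
            lines.append(" ".join(current))
            current = []
        current.append(text)
        previous_end = end
    if current:
        lines.append(" ".join(current))
    return lines


words = [("good", 0.0, 0.3), ("night", 0.35, 0.7), ("sleep", 1.4, 1.8), ("well", 1.85, 2.2)]
print(split_subtitle(words))  # ['good night', 'sleep well']
```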

The effect determining device 3 determines, according to the voice features and/or semantic features of the voice message, the display effect corresponding to the picture and/or the subtitle information.

Specifically, the effect determining device 3 analyzes the voice message to determine its voice features and/or semantic features.

The voice features include, but are not limited to, tone, rhythm, and timbre. For example, analyzing the waveform of the voice message reveals how its volume rises and falls and/or its rhythm; analyzing its frequency spectrum and/or spectrogram reveals its timbre, such as rough, shrill, deep, childlike, or crisp. Because the tone, rhythm, and timbre of a voice message change continually, changes in the user's tone, such as a sudden increase or decrease in volume, can also be determined from these variations.
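The waveform analysis of volume changes can be illustrated with a frame-wise loudness envelope. This is a minimal sketch assuming the voice message is already available as a list of samples; the function names, the frame size, and the "factor of two" rule for a sudden change are illustrative assumptions rather than anything specified in the text.

```python
import math
from typing import List


def rms_envelope(samples: List[float], frame: int) -> List[float]:
    """Root-mean-square loudness of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame]) / frame)
        for i in range(0, len(samples) - frame + 1, frame)
    ]


def sudden_volume_changes(envelope: List[float], ratio: float = 2.0) -> List[int]:
    """Indices of frames whose loudness jumps or drops by at least `ratio` versus the previous frame."""
    changes = []
    for i in range(1, len(envelope)):
        prev, cur = envelope[i - 1], envelope[i]
        if prev > 0 and cur > 0 and (cur / prev >= ratio or prev / cur >= ratio):
            changes.append(i)
    return changes


# Synthetic signal: a quiet tone, then a loud tone, then a quiet tone again.
tone = [math.sin(2 * math.pi * k / 20) for k in range(200)]
signal = [0.1 * s for s in tone] + [0.8 * s for s in tone] + [0.1 * s for s in tone]
env = rms_envelope(signal, frame=100)
print(sudden_volume_changes(env))  # [2, 4]
```

Frame 2 is where the loud passage begins and frame 4 is where it ends, which corresponds to the "sudden increase or decrease in volume" mentioned above.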

The semantic features are the meaning of the user's speech. For example, "I am very happy" expresses a positive emotion, while "this is bad" expresses a negative one.

Then, according to the voice features and/or semantic features of the voice message, the effect determining device 3 selects one or more preset effects as the display effect corresponding to the picture and/or the subtitle information, or obtains that display effect by interacting with a server or another third-party device.

The display effect may act on the picture, on the subtitle information, or on both at once. Display effects include, but are not limited to, static effects and/or dynamic effects. Static effects acting on the subtitle information include font and color; static effects acting on the picture include adding decorative images, adding decorative text, adding texture to the picture, and changing the picture's colors. Dynamic effects include, but are not limited to, gradients, floating, and flashing.

For example, if the user's voice features indicate that the volume of the user's speech keeps rising and falling, the displayed subtitles can grow and shrink continually as the user speaks; if the voice features indicate a childlike timbre, the subtitles are displayed in the Huakang Wawa (doll-style) font; and so on.

As another example, the user's semantic features are analyzed: if the user says "love you", a flashing heart appears on the picture or the subtitles; if the user says "good night", a gradient mask is added to the emoticon image so that it gradually turns black, creating the effect of the lights being turned off.
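Selecting from preset effects can be sketched as a pair of lookup tables, one keyed by voice-feature labels and one by phrases found in the subtitle. The labels, effect names, and simple substring matching below are hypothetical simplifications of the examples above; a real system would presumably use a proper semantic analysis.

```python
from typing import Dict, List

# Hypothetical preset tables: feature label or phrase -> display effects.
VOICE_RULES: Dict[str, List[str]] = {
    "volume_fluctuating": ["subtitle_size_follows_volume"],
    "childlike_timbre": ["subtitle_font_wawa"],
}
SEMANTIC_RULES: Dict[str, List[str]] = {
    "love you": ["flashing_heart"],
    "good night": ["fade_to_black"],
}


def select_effects(voice_features: List[str], subtitle: str) -> List[str]:
    """Collect preset effects triggered by voice-feature labels or by phrases in the subtitle."""
    effects: List[str] = []
    for feature in voice_features:
        effects.extend(VOICE_RULES.get(feature, []))
    lowered = subtitle.lower()
    for phrase, phrase_effects in SEMANTIC_RULES.items():
        if phrase in lowered:
            effects.extend(phrase_effects)
    return effects


print(select_effects(["childlike_timbre"], "Good night"))
# ['subtitle_font_wawa', 'fade_to_black']
```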

Preferably, the effect determining device 3 determines the display effect corresponding to the picture and/or the subtitle information according to the voice features and/or semantic features of the voice message in combination with picture features of the picture.

Specifically, the effect determining device 3 may also analyze the picture to determine its picture features, which include, but are not limited to, the picture name, picture title, picture colors, motion information (for example, for an animated gif), and picture content.

Then the effect determining device 3 weighs these factors together, namely the voice features and/or semantic features of the voice message and the picture features of the picture, to determine the display effect corresponding to the picture and/or the subtitle information.

For example, if the picture features indicate that the picture is dark in color and the user's voice features indicate a lively tone, the display effect is determined as follows: the subtitles are shown bouncing, with a lighter-colored outline added to them.

As another example, if the picture features indicate that the picture already contains a heart pattern, then when the user says "love you", the display effect is no longer "a flashing heart appears on the picture or the subtitles"; instead, roses are added to the picture.
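The two examples above amount to revising a candidate effect list once the picture features are known. A minimal sketch follows; the function name, the feature labels, and the effect names are hypothetical, and only the two rules from the examples are encoded.

```python
from typing import List, Set


def adjust_for_picture(effects: List[str], picture_features: Set[str]) -> List[str]:
    """Revise candidate effects so they suit the picture (rules here are illustrative only)."""
    adjusted: List[str] = []
    for effect in effects:
        if effect == "flashing_heart" and "contains_heart" in picture_features:
            adjusted.append("add_roses")  # avoid duplicating a heart already in the picture
        else:
            adjusted.append(effect)
    if "dark_colors" in picture_features and "bouncing_subtitle" in adjusted:
        adjusted.append("light_subtitle_outline")  # keep the subtitle legible on a dark picture
    return adjusted


print(adjust_for_picture(["flashing_heart"], {"contains_heart"}))
# ['add_roses']
print(adjust_for_picture(["bouncing_subtitle"], {"dark_colors"}))
# ['bouncing_subtitle', 'light_subtitle_outline']
```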

Preferably, the effect determining device 3 determines the display effect corresponding to the picture and/or the subtitle information according to the voice features and/or semantic features of the voice message in combination with the voice length of the voice message.

Specifically, the voice length is the duration of the voice message. The effect determining device 3 may take the voice length into account when determining a display effect that matches the voice features and/or semantic features.

For example, if a voice message is 3 seconds long and one cycle of a certain dynamic display effect takes 5 seconds, that display effect is not used; conversely, if one cycle of a dynamic display effect takes 3 seconds, the voice features and/or semantic features can be consulted to decide whether to select it.
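The length check in this example reduces to a filter over candidate effects. In the sketch below, the function name, the effect names, and the cycle durations are made up for illustration; only the rule "one cycle must fit within the voice length" is taken from the text.

```python
from typing import Dict, List


def effects_that_fit(cycle_seconds: Dict[str, float], voice_seconds: float) -> List[str]:
    """Keep only dynamic effects whose single cycle fits within the voice message."""
    return [name for name, cycle in cycle_seconds.items() if cycle <= voice_seconds]


candidates = {"slow_fade": 5.0, "heart_flash": 3.0, "quick_bounce": 1.0}
print(effects_that_fit(candidates, voice_seconds=3.0))  # ['heart_flash', 'quick_bounce']
```

The effects that survive the filter are still only candidates; as described above, the voice features and/or semantic features decide which of them is actually selected.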

The video-generating device 4 believes the picture and voice according to the caption information and the displaying special efficacy The short-sighted frequency of breath generation.

Specifically, the video-generating device 4 by the picture and the voice messaging generate together comprising voice with The short-sighted frequency of image, and the caption information and the displaying special efficacy are added in the short-sighted frequency.If for example, described image For dynamic image, one can be generated and contain caption information and show the short-sighted frequency of special efficacy, the broadcasting process and language of picture Sound process is consistent；If described image is still image, one can be generated and contain caption information and show the short of special efficacy Video, picture can be as background, and is dynamically caption information and displaying special efficacy, and dynamic part is with voice process Carry out.

The short video can be saved, added to favorites, or sent.

Preferably, the generating apparatus further includes a history acquisition device (not shown), wherein the history acquisition device obtains one or more pieces of historical voice information of the user and determines a user voice feature library corresponding to the user; the effect determining device 3 determines the voice feature corresponding to the voice information according to the user voice feature library, and determines the display effect corresponding to the picture and/or the caption information according to that voice feature and/or the semantic feature of the voice information.

Specifically, the history acquisition device obtains one or more pieces of historical voice information of the user either by interacting directly with the user or by interacting with other devices capable of providing the user's historical voice information. Here, the historical voice information may be voice information that the user has provided in other scenarios or other applications; the historical voice information may correspond to voice features and/or semantic features that have or have not been confirmed.

Then, the history acquisition device builds the user voice feature library corresponding to the user from the historical voice information; for example, by analyzing and compiling statistics on multiple pieces of historical voice information, it obtains the user's usual tone, unusual tones, timbre, rhythm, and so on, and builds the user voice feature library from them.

Then, the effect determining device 3 may match or compare the user's current voice against the user voice feature library to determine the voice feature corresponding to the user's current voice information.

Then, the effect determining device 3 determines the display effect corresponding to the picture and/or the caption information according to the voice feature and/or the semantic feature of the voice information.
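The idea of comparing the current voice against a per-user baseline can be sketched as follows. This is an assumption-laden illustration: it reduces the "voice feature library" to the mean and spread of pitch values, and the function names, the pitch-in-hertz input, and the threshold `k` are all hypothetical.

```python
from statistics import mean, pstdev


def build_voice_library(history_pitches_hz):
    """Summarize the user's usual pitch from historical voice information."""
    return {"mean": mean(history_pitches_hz),
            "std": pstdev(history_pitches_hz)}


def classify_tone(current_pitch_hz, library, k=2.0):
    """Label the current voice relative to the user's own baseline."""
    spread = library["std"] or 1.0  # avoid a zero-width band
    if current_pitch_hz > library["mean"] + k * spread:
        return "higher_than_usual"
    if current_pitch_hz < library["mean"] - k * spread:
        return "lower_than_usual"
    return "usual"
```

The benefit of a per-user baseline is that a naturally high-pitched speaker is not misread as excited; only deviations from that user's own usual tone influence the choice of effect.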

Preferably, the generating apparatus further includes a conversion device (not shown) and an adding device (not shown); wherein the conversion device converts the short video into one or more application-usable formats according to configuration information of the application corresponding to the short video; and the adding device adds the short video to the application in the application-usable format.

Specifically, the conversion device may determine, from the configuration information of the application corresponding to the short video, the one or more application-usable formats required by that application, and convert the short video into those formats. For example, if the application is an input method, the short video can be converted into a dynamic-picture format for use as a dynamic picture expression; if the application is a microblog or the like, the short video can be converted into one or more usable video formats for sending as a short video.

Then, the adding device adds the short video to the application in the application-usable format, for subsequent use by the user.
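The format decision described above amounts to a lookup in per-application configuration. The sketch below assumes a hypothetical configuration table; the application keys and the concrete formats ("gif", "mp4") are illustrative choices, since the specification only speaks of dynamic-picture formats and usable video formats.

```python
# Hypothetical configuration: application type -> application-usable formats.
APP_FORMATS = {
    "input_method": ["gif"],  # used as a dynamic picture expression
    "microblog": ["mp4"],     # sent as a short video
}


def target_formats(app_type, app_formats=APP_FORMATS):
    """Return the formats a short video should be converted to for an app."""
    if app_type not in app_formats:
        raise KeyError(f"no format configuration for application {app_type!r}")
    return list(app_formats[app_type])
```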

Fig. 2 shows a schematic diagram of a generating apparatus for generating a short video according to a preferred embodiment of the present invention; wherein the generating apparatus includes an acquisition device 1', a caption determining device 2', an effect determining device 3', a video generating device 4', a related picture determining device 5', and a related video generating device 6'.

Specifically, the acquisition device 1' obtains one or more pictures and one or more pieces of voice information from the user for the one or more pictures; the caption determining device 2' determines caption information corresponding to the voice information according to the content of the voice information; the effect determining device 3' determines a display effect corresponding to the picture and/or the caption information according to the voice feature and/or semantic feature of the voice information; the video generating device 4' generates a short video from the picture and the voice information according to the caption information and the display effect; the related picture determining device 5' determines one or more related pictures associated with the picture according to the voice information and the picture; and the related video generating device 6' generates one or more related short videos from the picture, the related pictures, and the voice information according to the caption information and the display effect.

The related picture determining device 5' determines one or more related pictures associated with the picture according to the voice information and the picture.

Specifically, the related picture determining device 5' determines, according to the voice feature and/or semantic feature of the voice information and in combination with the picture feature of the picture, one or more related pictures that are associated in content or feature with the voice information and the picture.

Here, the voice feature includes but is not limited to tone, rhythm, and timbre; the semantic feature is the meaning of the user's voice. The picture feature includes but is not limited to the picture name, picture caption, picture color, dynamic information (for example, a gif dynamic picture), and picture content.

The related picture may be associated with the voice information and/or the picture in content or theme; or the related picture may be associated with the picture in color tone, and so on.

For example, if the voice information is "Very good!" and the picture is "applause", other pictures on the theme of "very good" can be recommended, such as "thumbs up" or "cheering"; alternatively, applause pictures or similar pictures featuring different characters can be selected, such as "Tuzki applauding", "Tuzki giving a like", or "ma's AC thumb up".

Preferably, the related picture determining device 5' may further screen preferred related pictures out of the selected related pictures. Continuing the example above, only the related pictures with the same subject may be picked as preferred related pictures, such as "Tuzki applauding" and "Tuzki giving a like"; or the related pictures with a similar color tone may be picked as preferred related pictures, for example those with the same background color or theme color.
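The screening of preferred related pictures can be sketched as a filter over candidate metadata. The dictionary keys (`name`, `theme`, `tone`) and the function `preferred_related` are hypothetical; the specification does not say how subject or color tone is represented.

```python
def preferred_related(candidates, theme=None, tone=None):
    """Keep related pictures that share the given subject or color tone.

    candidates: list of dicts with hypothetical keys 'name', 'theme', 'tone'.
    Returns the names of the preferred related pictures, in input order.
    """
    kept = []
    for pic in candidates:
        same_theme = theme is not None and pic.get("theme") == theme
        same_tone = tone is not None and pic.get("tone") == tone
        if same_theme or same_tone:
            kept.append(pic["name"])
    return kept
```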

The related video generating device 6' generates one or more related short videos from the picture, the related pictures, and the voice information according to the caption information and the display effect.

Specifically, the related video generating device 6' may reuse the display effect that the effect determining device 3' determined for the picture, and generate one or more related short videos from the picture, the related pictures, and the voice information according to the caption information and that display effect. Alternatively, the related video generating device 6' may send the related pictures, the voice information, and the picture to the caption determining device 2' again (as shown in Fig. 2), so that the display effect is redetermined for them; the method of determining the display effect is the same as or similar to that of the corresponding device in Fig. 1 and is therefore not repeated here.

Then, the related video generating device 6' generates one or more related short videos from the picture, the related pictures, and the voice information according to the caption information and the display effect.

A related short video may correspond to "the caption information, the display effect, one related picture, and the voice information"; that is, the caption information, the display effect, and the voice information are added to a single related picture to generate a related short video.

A related short video may also correspond to "the caption information, the display effect, multiple related pictures, and the voice information"; that is, the caption information, the display effect, and the voice information are added to multiple related pictures so that the related pictures play in succession, forming one dynamic related short video.

A related short video may also correspond to "the caption information, the display effect, the picture together with one or more related pictures, and the voice information"; that is, the picture and the one or more related pictures are taken as the pictures to be processed, and the caption information, the display effect, and the voice information are added to them so that the pictures to be processed play in succession, forming a dynamic related short video, and so on.
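The three compositions just described differ only in which pictures go into each related short video. The sketch below makes that explicit; the mode names and the function `plan_related_videos` are hypothetical labels for the three cases.

```python
def plan_related_videos(picture, related, mode):
    """Return the picture sequence used for each related short video.

    Each inner list is one video; the same caption information, display
    effect, and voice information would be applied to every video.
    """
    if mode == "one_per_related":        # one related picture per video
        return [[r] for r in related]
    if mode == "related_sequence":       # all related pictures in one video
        return [list(related)]
    if mode == "original_plus_related":  # original picture plus related ones
        return [[picture, *related]]
    raise ValueError(f"unknown mode: {mode}")
```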

Preferably, the related picture determining device 5' includes a quantity determining unit (not shown) and an association determining unit (not shown); wherein the quantity determining unit determines the number of related pictures associated with the picture; and the association determining unit determines one or more related pictures associated with the picture according to the voice information, the picture, and the number of related pictures.

Specifically, the quantity determining unit determines the number of related pictures associated with the picture according to a preset setting; or, more preferably, determines that number in one or more of the following ways:

- according to the voice length of the voice information: for example, if the voice length is 5 seconds, the number of related pictures is determined to be 5; if the voice length is 10 seconds, the number of related pictures is determined to be 10;

- according to the voice feature of the voice information: for example, if the voice feature shows that two or more intonation changes occurred (such as high pitch turning to low pitch, or low pitch turning to high pitch), the determined number of related pictures is increased;

- according to the semantic feature of the voice information: for example, if the semantic feature shows that multiple semantic keywords are present, different related pictures can be determined for different keywords, and the number of related pictures can therefore be larger.

On the basis of the voice feature and/or semantic feature of the voice information, combined with the picture feature of the picture, the association determining unit determines one or more related pictures that are associated in content or feature with the voice information and the picture and that satisfy the above number of related pictures.
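The three ways of determining the number of related pictures can be combined in a small function. The sketch rests on assumptions the specification leaves open: one picture per second of voice (generalized from the 5-second and 10-second examples), an increase of exactly one picture when two or more intonation changes occur, and at least one picture per semantic keyword.

```python
def related_picture_count(voice_seconds, intonation_changes=0, keywords=()):
    """Estimate how many related pictures to look for."""
    count = int(voice_seconds)          # assumed: one picture per second
    if intonation_changes >= 2:
        count += 1                      # assumed size of the increase
    return max(count, len(keywords))    # assumed: one picture per keyword
```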

Fig. 3 shows a flowchart of a method for generating a short video according to another aspect of the present invention. Specifically, in step S1, the generating apparatus obtains one or more pictures and one or more pieces of voice information from the user for the one or more pictures; in step S2, the generating apparatus determines caption information corresponding to the voice information according to the content of the voice information; in step S3, the generating apparatus determines a display effect corresponding to the picture and/or the caption information according to the voice feature and/or semantic feature of the voice information; in step S4, the generating apparatus generates a short video from the picture and the voice information according to the caption information and the display effect.
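The flow of steps S1 to S4 can be sketched as a small pipeline. Everything here is a hypothetical skeleton: the `ShortVideo` record and the injected callables `recognize`, `analyze`, and `pick_effect` stand in for speech recognition, feature analysis, and effect selection, none of which the specification ties to a particular implementation.

```python
from dataclasses import dataclass


@dataclass
class ShortVideo:
    picture: str
    voice: str
    caption: str
    effect: str


def generate_short_video(picture, voice, recognize, analyze, pick_effect):
    """Run steps S2 to S4 on a picture and voice obtained in step S1."""
    caption = recognize(voice)                        # S2: voice -> caption
    features = analyze(voice)                         # S3: voice/semantic features
    effect = pick_effect(features, picture, caption)  # S3: choose display effect
    return ShortVideo(picture, voice, caption, effect)  # S4: assemble the result
```

Passing the three stages in as functions keeps the sketch independent of any particular recognizer or effect library.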

In step S1, the generating apparatus obtains one or more pictures and one or more pieces of voice information from the user for the one or more pictures.

Specifically, in step S1, the generating apparatus may obtain one or more pictures in one or more ways, such as calling built-in default pictures, obtaining pictures provided by the user through uploading or shooting, searching for pictures on the network, or downloading pictures. The pictures may be obtained in real time or in advance. The pictures include static pictures (for example, pictures in jpg or bmp format) and/or dynamic pictures (for example, pictures in gif format).

In step S1, the generating apparatus obtains the user's one or more pieces of voice information for the one or more pictures by real-time recording, by calling up historical recordings, or the like. Here, those skilled in the art will understand that one picture may correspond to one or more pieces of voice information, and one piece of voice information may also correspond to one or more pictures. The association between pictures and voice information may be determined according to the user's settings.

In step S2, the generating apparatus determines caption information corresponding to the voice information according to the content of the voice information.

Specifically, in step S2, the generating apparatus identifies the voice content of the voice information through speech recognition and then determines the text corresponding to the recognized voice content as the caption information corresponding to the voice information.

Preferably, in step S2, the generating apparatus may also take the length of the voice information into account to decide whether to break the caption information into lines; it may determine punctuation, line breaks, and similar content in the caption information according to the voice content together with voice features of the voice information such as tone and rhythm; and it may also interact with the user and provide a proofreading input function so that the user can proofread the caption information.
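One way rhythm could drive line breaks is to start a new caption line wherever the speaker pauses. The sketch below assumes the recognizer supplies per-word timestamps and uses a hypothetical pause threshold; neither detail comes from the specification.

```python
def split_caption(words, pause_threshold=0.5):
    """Break a caption into lines at pauses in the speech.

    words: list of (text, start_seconds, end_seconds) in spoken order.
    A new line starts when the silence before a word exceeds the threshold.
    """
    lines, current, prev_end = [], [], None
    for text, start, end in words:
        if current and prev_end is not None and start - prev_end > pause_threshold:
            lines.append(" ".join(current))
            current = []
        current.append(text)
        prev_end = end
    if current:
        lines.append(" ".join(current))
    return lines
```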

Preferably, in step S2, the generating apparatus may itself analyze the voice feature, voice length, and so on; alternatively, it may use the execution result fed back from step S3, that is, step S3's feedback on the voice feature, semantic feature, display effect, and so on, to iteratively adjust the line breaks, punctuation, and similar content of the caption information.

In step S3, the generating apparatus determines the display effect corresponding to the picture and/or the caption information according to the voice feature and/or semantic feature of the voice information.

Specifically, in step S3, the generating apparatus analyzes the voice information to determine its voice feature and/or semantic feature.

Then, in step S3, according to the voice feature and/or semantic feature of the voice information, the generating apparatus selects one or more preset effects as the display effect corresponding to the picture and/or the caption information, or obtains the display effect corresponding to the picture and/or the caption information by interacting with a server or other third-party device.

Preferably, in step S3, the generating apparatus determines the display effect corresponding to the picture and/or the caption information according to the voice feature and/or semantic feature of the voice information, in combination with the picture feature of the picture.

Specifically, in step S3, the generating apparatus may also analyze the picture to determine its picture feature, wherein the picture feature includes but is not limited to the picture name, picture caption, picture color, dynamic information (for example, a gif dynamic picture), and picture content.

Then, in step S3, the generating apparatus weighs the voice feature and/or semantic feature of the voice information together with the picture feature of the picture to determine the display effect corresponding to the picture and/or the caption information.

Preferably, in step S3, the generating apparatus determines the display effect corresponding to the picture and/or the caption information according to the voice feature and/or semantic feature of the voice information, in combination with the voice length of the voice information.

Specifically, the voice length is the duration of the voice information. In step S3, the generating apparatus may take the voice length into account when determining a display effect that matches the voice feature and/or semantic feature.

In step S4, the generating apparatus generates a short video from the picture and the voice information according to the caption information and the display effect.

Specifically, in step S4, the generating apparatus combines the picture and the voice information into a short video containing both voice and image, and adds the caption information and the display effect to the short video. For example, if the picture is a dynamic picture, a short video containing the caption information and the display effect can be generated, in which the playback of the picture keeps pace with the voice; if the picture is a static picture, a short video containing the caption information and the display effect can likewise be generated, in which the picture serves as the background, the caption information and the display effect are the dynamic parts, and the dynamic parts progress along with the voice.

The short video can be saved, added to favorites, or sent.

Preferably, the method further includes step S7 (not shown), wherein, in step S7, the generating apparatus obtains one or more pieces of historical voice information of the user and determines a user voice feature library corresponding to the user; in step S3, the generating apparatus determines the voice feature corresponding to the voice information according to the user voice feature library, and determines the display effect corresponding to the picture and/or the caption information according to that voice feature and/or the semantic feature of the voice information.

Specifically, in step S7, the generating apparatus obtains one or more pieces of historical voice information of the user either by interacting directly with the user or by interacting with other devices capable of providing the user's historical voice information. Here, the historical voice information may be voice information that the user has provided in other scenarios or other applications; the historical voice information may correspond to voice features and/or semantic features that have or have not been confirmed.

Then, in step S7, the generating apparatus builds the user voice feature library corresponding to the user from the historical voice information; for example, by analyzing and compiling statistics on multiple pieces of historical voice information, it obtains the user's usual tone, unusual tones, timbre, rhythm, and so on, and builds the user voice feature library from them.

Then, in step S7, the generating apparatus may match or compare the user's current voice against the user voice feature library to determine the voice feature corresponding to the user's current voice information.

Then, in step S3, the generating apparatus determines the display effect corresponding to the picture and/or the caption information according to the voice feature and/or the semantic feature of the voice information.

Preferably, the method further includes step S8 (not shown) and step S9 (not shown); wherein, in step S8, the generating apparatus converts the short video into one or more application-usable formats according to configuration information of the application corresponding to the short video; in step S9, the generating apparatus adds the short video to the application in the application-usable format.

Specifically, in step S8, the generating apparatus may determine, from the configuration information of the application corresponding to the short video, the one or more application-usable formats required by that application, and convert the short video into those formats. For example, if the application is an input method, the short video can be converted into a dynamic-picture format for use as a dynamic picture expression; if the application is a microblog or the like, the short video can be converted into one or more usable video formats for sending as a short video.

Then, in step S9, the generating apparatus adds the short video to the application in the application-usable format, for subsequent use by the user.

Specifically, in step S1', the generating apparatus obtains one or more pictures and one or more pieces of voice information from the user for the one or more pictures; in step S2', the generating apparatus determines caption information corresponding to the voice information according to the content of the voice information; in step S3', the generating apparatus determines a display effect corresponding to the picture and/or the caption information according to the voice feature and/or semantic feature of the voice information; in step S4', the generating apparatus generates a short video from the picture and the voice information according to the caption information and the display effect; in step S5', the generating apparatus determines one or more related pictures associated with the picture according to the voice information and the picture; in step S6', the generating apparatus generates one or more related short videos from the picture, the related pictures, and the voice information according to the caption information and the display effect.

In step S5', the generating apparatus determines one or more related pictures associated with the picture according to the voice information and the picture.

Specifically, in step S5', the generating apparatus determines, according to the voice feature and/or semantic feature of the voice information and in combination with the picture feature of the picture, one or more related pictures that are associated in content or feature with the voice information and the picture.

Preferably, in step S5', the generating apparatus may further screen preferred related pictures out of the selected related pictures. Continuing the example above, only the related pictures with the same subject may be picked as preferred related pictures, such as "Tuzki applauding" and "Tuzki giving a like"; or the related pictures with a similar color tone may be picked as preferred related pictures, for example those with the same background color or theme color.

In step S6', the generating apparatus generates one or more related short videos from the picture, the related pictures, and the voice information according to the caption information and the display effect.

Specifically, in step S6', the generating apparatus may reuse the display effect determined for the picture in step S3', and generate one or more related short videos from the picture, the related pictures, and the voice information according to the caption information and that display effect. Alternatively, in step S6', the generating apparatus may re-execute step S2' (as shown in Fig. 2) on the related pictures, the voice information, and the picture, so that the display effect is redetermined for them; the method of determining the display effect is the same as or similar to that of the corresponding step in Fig. 3 and is therefore not repeated here.

Then, in step S6', the generating apparatus generates one or more related short videos from the picture, the related pictures, and the voice information according to the caption information and the display effect.

Preferably, step S5' includes step S51' (not shown) and step S52' (not shown); wherein, in step S51', the generating apparatus determines the number of related pictures associated with the picture; in step S52', the generating apparatus determines one or more related pictures associated with the picture according to the voice information, the picture, and the number of related pictures.

Specifically, in step S51', the generating apparatus determines the number of related pictures associated with the picture according to a preset setting; or, more preferably, determines that number in one or more of the ways described above for the quantity determining unit, namely according to the voice length, the voice feature, or the semantic feature of the voice information.

In step S52', on the basis of the voice feature and/or semantic feature of the voice information, combined with the picture feature of the picture, the generating apparatus determines one or more related pictures that are associated in content or feature with the voice information and the picture and that satisfy the above number of related pictures.

It should be noted that the present invention can be implemented in software and/or a combination of software and hardware; for example, it can be implemented using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example, RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform each step or function.

In addition, part of the present invention may be applied as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present invention.

It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments and can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced in the present invention. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in an apparatus claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims

1. a kind of method for generating short-sighted frequency, wherein, this method comprises the following steps：

C according to the phonetic feature and/or semantic feature of the voice messaging, it is determined that with the picture and/or the caption information Corresponding displaying special efficacy；

2. according to the method described in claim 1, wherein, the step c includes：

3. method according to claim 1 or 2, wherein, the step c includes：

4. according to the method in any one of claims 1 to 3, wherein, this method also includes：

X is according to the voice messaging and the picture, it is determined that the one or more picture concerneds associated with the picture；

Wherein, this method also includes：

- according to the caption information and the displaying special efficacy, by the picture, the picture concerned and the voice messaging, Generate one or more related short-sighted frequencies.

5. method according to claim 4, wherein, the step x includes：

X1 determines the picture concerned quantity associated with the picture；

- according to the voice messaging, the picture and the picture concerned quantity, it is determined that one associated with the picture Or multiple picture concerneds.

6. method according to claim 5, wherein, the step x1 includes following at least any one：

7. method according to any one of claim 1 to 6, wherein, this method also includes：

The one or more history voice messagings of-acquisition user, it is determined that the user vocal feature corresponding with the user Storehouse；

Wherein, the step c includes：

- according to the semantic feature of the phonetic feature and/or the voice messaging, it is determined that with the picture and/or the captions Displaying special efficacy corresponding to information.

8. The method according to any one of claims 1 to 7, wherein the method further comprises:

- converting the short video into one or more application-usable formats according to relevant configuration information of the application corresponding to the short video;
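As a non-limiting illustration, the conversion recited above can be driven by a per-application configuration table. The sketch below only plans the conversion jobs and does not execute them; the table `APP_CONFIG`, the application names and the function `plan_conversions` are hypothetical, and the use of the `ffmpeg` command-line tool as the converter is an assumption of this sketch rather than something stated in the claims.

```python
from pathlib import Path

# Hypothetical configuration: formats each target application accepts.
APP_CONFIG = {
    "chat_app": ["gif"],
    "forum_app": ["mp4", "webm"],
}


def plan_conversions(video_path, app_name, config=APP_CONFIG):
    """Return one ffmpeg argument list per format the application needs."""
    src = Path(video_path)
    jobs = []
    for ext in config.get(app_name, []):
        dst = src.with_suffix("." + ext)
        if dst == src:
            continue  # already in a usable format, nothing to convert
        # "-y" overwrites an existing output file, "-i" names the input.
        jobs.append(["ffmpeg", "-y", "-i", str(src), str(dst)])
    return jobs


if __name__ == "__main__":
    for job in plan_conversions("clip.mp4", "forum_app"):
        print(" ".join(job))
```

Each returned list could be passed to `subprocess.run` on a system where `ffmpeg` is installed; an application missing from the table simply yields no jobs.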

9. The method according to any one of claims 1 to 8, wherein the display special effect includes one or more dynamic effects.

10. A generating device for generating a short video, wherein the generating device comprises:

an acquisition device, configured to obtain one or more pictures and one or more pieces of voice information of a user regarding the one or more pictures;

a caption determining device, configured to determine, according to the content of the voice information, caption information corresponding to the voice information;

a special effect determining device, configured to determine, according to a voice feature and/or a semantic feature of the voice information, a display special effect corresponding to the picture and/or the caption information;

a video generating device, configured to generate a short video from the picture and the voice information according to the caption information and the display special effect.
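As a non-limiting illustration, the four devices recited above can be read as four stages of a pipeline. In the sketch below, speech recognition is not performed: the `transcript` field stands in for its output, the result is a plain description of the video rather than an encoded file, and the keyword table `SEMANTIC_EFFECTS`, the loudness threshold and the effect names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VoiceMessage:
    transcript: str    # assumed output of a speech recognizer
    volume_db: float   # assumed voice feature
    duration_s: float


@dataclass
class ShortVideo:
    picture: str
    caption: str
    effects: list
    duration_s: float


# Hypothetical mapping from semantic cues to display special effects.
SEMANTIC_EFFECTS = {"happy": "confetti", "angry": "flames"}


def determine_caption(voice):
    """Caption determining stage: caption text follows the spoken content."""
    return voice.transcript.strip()


def determine_effects(voice, loud_db=70.0):
    """Effect determining stage: combine semantic and voice features."""
    text = voice.transcript.lower()
    effects = [fx for word, fx in SEMANTIC_EFFECTS.items() if word in text]
    if voice.volume_db >= loud_db:
        effects.append("enlarge_caption")
    return effects


def generate_short_video(picture, voice):
    """Video generating stage: bundle picture, caption, effects and timing."""
    return ShortVideo(
        picture=picture,
        caption=determine_caption(voice),
        effects=determine_effects(voice),
        duration_s=voice.duration_s,
    )


if __name__ == "__main__":
    voice = VoiceMessage(" So happy today! ", volume_db=75.0, duration_s=3.2)
    print(generate_short_video("cat.jpg", voice))
```

Keeping caption determination and effect determination as separate functions mirrors the separate devices of the claim, so either can be replaced independently.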

11. The generating device according to claim 10, wherein the special effect determining device is configured to:

12. The generating device according to claim 10 or 11, wherein the special effect determining device is configured to:

13. The generating device according to any one of claims 10 to 12, wherein the generating device further comprises:

a related picture determining device, configured to determine, according to the voice information and the picture, one or more related pictures associated with the picture;

wherein the generating device further comprises:

a related video generating device, configured to generate one or more related short videos from the picture, the related pictures and the voice information, according to the caption information and the display special effect.

14. The generating device according to claim 13, wherein the related picture determining device comprises:

an association determining unit, configured to determine, according to the voice information, the picture and the number of related pictures, one or more related pictures associated with the picture.

15. The generating device according to claim 14, wherein the quantity determining unit is configured for at least any one of the following:

16. The generating device according to any one of claims 10 to 15, wherein the generating device further comprises:

a history acquisition device, configured to obtain one or more pieces of historical voice information of the user and to determine a user voice feature library corresponding to the user;

wherein the special effect determining device is configured to:

17. The generating device according to any one of claims 10 to 16, wherein the generating device further comprises:

a conversion device, configured to convert the short video into one or more application-usable formats according to relevant configuration information of the application corresponding to the short video;

18. The generating device according to any one of claims 10 to 17, wherein the display special effect includes one or more dynamic effects.

19. An input device, comprising the generating device according to any one of claims 10 to 18.