CN109462768A - Subtitle display method and terminal device - Google Patents

Subtitle display method and terminal device Download PDF

Info

Publication number
CN109462768A
CN109462768A CN201811253131.5A CN201811253131A
Authority
CN
China
Prior art keywords
subtitle
voice
tone
target
display mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811253131.5A
Other languages
Chinese (zh)
Inventor
胡吉祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN201811253131.5A priority Critical patent/CN109462768A/en
Publication of CN109462768A publication Critical patent/CN109462768A/en
Pending legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72439User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams

Abstract

Embodiments of the present invention provide a subtitle display method and a terminal device, relating to the field of communication technology, to solve the problem that existing terminal devices display subtitles with poor effect. The method includes: obtaining voice characteristic information of target voice data, the voice characteristic information including at least one of the following: a speech volume in the target voice data, a speech speed in the target voice data, a speech pitch in the target voice data, and a voice tone in the target voice data, where the voice characteristic information is used to indicate the mood of the voice corresponding to the target voice data; determining, according to the voice characteristic information, a target display mode corresponding to the mood, the target display mode being used for displaying a subtitle corresponding to the target voice data; and displaying the subtitle in the target display mode. The method is applied to scenarios in which a terminal device displays subtitles.

Description

Subtitle display method and terminal device
Technical field
Embodiments of the present invention relate to the field of communication technology, and in particular to a subtitle display method and a terminal device.
Background technique
With the rapid development of communication technology, terminal devices are used ever more widely, and users' requirements for terminal device performance are increasingly high.

Currently, when a terminal device plays a video, the display interface can show not only the video picture but also caption information related to the video, such as the title, credits, dialogue, and lyrics (hereinafter collectively referred to as subtitles). All subtitles in a video are configured uniformly by the producer during post-production. Generally, while playing a video, the terminal device can display the dialogue subtitles in synchronization with the video picture, so that the user can clearly follow the content of the video from the subtitles and the picture together.

However, because all subtitles in a video are configured uniformly by the producer during post-production, the terminal device displays them in a uniform style and format. As a result, the way the terminal device displays subtitles is rather monotonous, and the subtitle display effect is poor.
Summary of the invention
Embodiments of the present invention provide a subtitle display method and a terminal device, to solve the problem that a terminal device displays subtitles with poor effect.
In order to solve the above-mentioned technical problem, the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a subtitle display method, including: obtaining voice characteristic information of target voice data, the voice characteristic information including at least one of the following: a speech volume in the target voice data, a speech speed in the target voice data, a speech pitch in the target voice data, and a voice tone in the target voice data, where the voice characteristic information is used to indicate the mood of the voice corresponding to the target voice data; determining, according to the voice characteristic information, a target display mode corresponding to the mood, the target display mode being used for displaying a subtitle corresponding to the target voice data; and displaying the subtitle in the target display mode.

In a second aspect, an embodiment of the present invention provides a terminal device, including an obtaining module, a determining module, and a display module. The obtaining module is configured to obtain voice characteristic information of target voice data, the voice characteristic information including at least one of the following: a speech volume in the target voice data, a speech speed in the target voice data, a speech pitch in the target voice data, and a voice tone in the target voice data, where the voice characteristic information is used to indicate the mood of the voice corresponding to the target voice data. The determining module is configured to determine, according to the voice characteristic information obtained by the obtaining module, a target display mode corresponding to the mood, the target display mode being used for displaying a subtitle corresponding to the target voice data. The display module is configured to display the subtitle in the target display mode determined by the determining module.

In a third aspect, an embodiment of the present invention provides a terminal device including a processor, a memory, and a computer program stored on the memory and runnable on the processor, where the computer program, when executed by the processor, implements the steps of the subtitle display method in the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the subtitle display method in the first aspect.

In the embodiments of the present invention, voice characteristic information of target voice data can be obtained (used to indicate the mood of the voice corresponding to the target voice data); according to the voice characteristic information, a target display mode corresponding to the indicated mood can be determined (used for displaying the subtitle corresponding to the target voice data); and the subtitle corresponding to the target voice data can be displayed in the target display mode. The voice characteristic information includes at least one of the following: a speech volume in the target voice data, a speech speed in the target voice data, a speech pitch in the target voice data, and a voice tone in the target voice data. With this solution, since the voice characteristic information of the target voice data can indicate the mood of the corresponding voice, a target display mode corresponding to that mood can be determined according to the voice characteristic information and used to display the subtitle corresponding to the target voice data. Because the moods of the voices corresponding to different voice data may differ, the display modes determined from the voice characteristic information of different voice data also differ, so the terminal device can display the subtitles corresponding to different voice data in different display modes. This enriches the ways in which the terminal device displays subtitles and improves the subtitle display effect.
Brief description of the drawings
Fig. 1 is an architectural diagram of an Android operating system according to an embodiment of the present invention;
Fig. 2 is a first schematic diagram of a subtitle display method according to an embodiment of the present invention;
Fig. 3 is a first schematic diagram of an interface to which a subtitle display method according to an embodiment of the present invention is applied;
Fig. 4 is a second schematic diagram of an interface to which a subtitle display method according to an embodiment of the present invention is applied;
Fig. 5 is a third schematic diagram of an interface to which a subtitle display method according to an embodiment of the present invention is applied;
Fig. 6 is a second schematic diagram of a subtitle display method according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present invention;
Fig. 8 is a schematic hardware diagram of a terminal device according to an embodiment of the present invention.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

The term "and/or" herein describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: A alone, both A and B, and B alone. The symbol "/" herein indicates an "or" relationship between the associated objects; for example, "A/B" means A or B.

The terms "first" and "second" in the specification and claims are used to distinguish different objects, rather than to describe a particular order of objects. For example, a first position and a second position are used to distinguish different positions, not to describe a particular order of positions.

In the embodiments of the present invention, words such as "illustrative" or "for example" are used to give an example, illustration, or explanation. Any embodiment or design described as "illustrative" or "for example" in the embodiments of the present invention should not be construed as being preferred over, or more advantageous than, other embodiments or designs. Rather, such words are intended to present related concepts in a specific way.

In the description of the embodiments of the present invention, unless otherwise stated, "plurality" means two or more; for example, a plurality of elements means two or more elements.
First, some nouns and/or terms used in the embodiments of the present invention are explained below.

Subtitle: in the broad sense, characters (such as text and symbols) displayed on a screen; in the narrow sense, characters that are displayed on a screen and correspond to voice data.

The subtitles involved in the embodiments of the present invention follow the narrow-sense definition above.

Streaming media: media played in a streaming manner. Specifically, streaming media may include streaming media saved on the terminal device and streaming media played on the terminal device in real time over a network.
Embodiments of the present invention provide a subtitle display method and a terminal device. Voice characteristic information of target voice data (used to indicate the mood of the voice corresponding to the target voice data) can be obtained; according to the voice characteristic information, a target display mode corresponding to the mood indicated by the voice characteristic information can be determined (used for displaying the subtitle corresponding to the target voice data); and the subtitle corresponding to the target voice data can be displayed in the target display mode. With this solution, since the voice characteristic information of the target voice data can indicate the mood of the corresponding voice, a target display mode corresponding to that mood can be determined according to the voice characteristic information and used to display the corresponding subtitle. Because the moods of the voices corresponding to different voice data may differ, the display modes determined from their voice characteristic information also differ, so the terminal device can display the subtitles corresponding to different voice data in different display modes. This enriches the subtitle display modes of the terminal device and improves the subtitle display effect.
The terminal device in the embodiments of the present invention may be a terminal device with an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present invention.

Taking the Android operating system as an example, the following introduces the software environment to which the subtitle display method provided in the embodiments of the present invention is applied.
As shown in Fig. 1, which is an architectural diagram of a possible Android operating system according to an embodiment of the present invention, the architecture of the Android operating system includes four layers: an application layer, an application framework layer, a system runtime library layer, and a kernel layer (which may specifically be a Linux kernel layer).

The application layer includes the applications in the Android operating system (including system applications and third-party applications).

The application framework layer is the framework of the applications; developers can develop applications based on the application framework layer while complying with the development principles of that framework.

The system runtime library layer includes libraries (also called system libraries) and the Android runtime environment. The libraries mainly provide the various resources needed by the Android operating system. The Android runtime environment provides the software environment for the Android operating system.

The kernel layer is the operating system layer of the Android operating system and is the lowest level of the Android software hierarchy. Based on the Linux kernel, the kernel layer provides core system services and hardware-related drivers for the Android operating system.

Taking the Android operating system as an example, in the embodiments of the present invention, developers can develop, based on the system architecture of the Android operating system shown in Fig. 1, a software program implementing the subtitle display method provided in the embodiments of the present invention, so that the subtitle display method can run on the Android operating system shown in Fig. 1. That is, a processor or a terminal device can implement the subtitle display method provided in the embodiments of the present invention by running the software program on the Android operating system.
The terminal device in the embodiments of the present invention may be a mobile terminal or a non-mobile terminal. Illustratively, the mobile terminal may be a mobile phone, a tablet computer, a laptop computer, a palmtop computer, a vehicle-mounted terminal, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA); the non-mobile terminal may be a personal computer (PC), a television (TV), an automated teller machine, or a self-service machine. The embodiments of the present invention are not specifically limited in this respect.
The subject performing the subtitle display method provided in the embodiments of the present invention may be the above terminal device, or a functional module and/or functional entity in the terminal device capable of implementing the method, which can be determined according to actual use requirements and is not limited in the embodiments of the present invention. Taking the terminal device as an example, the subtitle display method provided in the embodiments of the present invention is illustratively described below.
The subtitle display method provided in the embodiments of the present invention can be applied to any one of the following three scenarios.

Scenario 1: displaying subtitles in streaming media (the voice data in the streaming media is the target voice data provided in the embodiments of the present invention).

Scenario 2: displaying subtitles corresponding to content input by a user (the content input by the user is the target voice data provided in the embodiments of the present invention).

Scenario 3: displaying subtitles in streaming media together with subtitles corresponding to content input by a user (the voice data in the streaming media is first voice data provided in the embodiments of the present invention, the content input by the user is second voice data provided in the embodiments of the present invention, and the target voice data provided in the embodiments of the present invention includes the first voice data and the second voice data).

In any one of the above scenarios, before the terminal device displays the subtitle corresponding to the target voice data, the terminal device can first determine the display mode of the subtitle and then display the subtitle in that display mode. Specifically, after obtaining the target voice data, the terminal device can first obtain the voice characteristic information of the target voice data, determine, according to the voice characteristic information, the target display mode corresponding to the mood indicated by the voice characteristic information, and then display the subtitle corresponding to the target voice data in the target display mode. In this way, the target display mode can express the mood indicated by the voice characteristic information of the target voice data.

The subtitle display method provided in the embodiments of the present invention is illustratively described below with reference to the above three scenarios and the accompanying drawings.
As shown in Fig. 2, an embodiment of the present invention provides a subtitle display method, which may include the following steps S201-S203.

S201: the terminal device obtains voice characteristic information of target voice data.
The voice characteristic information may include at least one of the following: voice characters in the target voice data, a speech volume in the target voice data, a speech speed in the target voice data, a speech pitch in the target voice data, and a voice tone in the target voice data. The voice characteristic information can be used to indicate the mood of the voice corresponding to the target voice data.

Optionally, in the embodiments of the present invention, the target voice data may include at least one of first voice data and second voice data, where the first voice data may be voice data in streaming media, and the second voice data may be voice data collected by the terminal device.

It should be understood that, in the embodiments of the present invention, the target voice data may be voice data in streaming media, voice data collected by the terminal device, or both.

Optionally, in the embodiments of the present invention, the voice data in the streaming media may be voice data saved on the terminal device (for example, voice data downloaded from a network by the terminal device or recorded by the terminal device), or voice data played on the terminal device in real time over a network (for example, obtained by the terminal device from a server and played in real time). This can be determined according to actual use requirements; the embodiments of the present invention are not limited in this respect.

In the embodiments of the present invention, the streaming media may be audio, video, or any other streaming media that may include voice data, determined according to actual use requirements; the embodiments of the present invention are not limited in this respect.

In the embodiments of the present invention, the voice data collected by the terminal device may be voice data in the environment of the terminal device, collected by the terminal device through an audio collection apparatus.

Optionally, in the embodiments of the present invention, the audio collection apparatus may be a microphone (specifically, a microphone on the terminal device) or any other possible audio collection apparatus, determined according to actual use requirements; the embodiments of the present invention are not limited in this respect.

In the embodiments of the present invention, the voice characters in the target voice data can be used to indicate the content of the target voice data; illustratively, the voice characters may be keywords of the target voice data. The speech volume in the target voice data can be used to indicate the loudness of the target voice data; illustratively, the speech volume may be classified as low, medium, or high. The speech speed in the target voice data can be used to indicate how fast the speech is; illustratively, the speech speed may include slow, medium, and fast. The speech pitch in the target voice data can be used to indicate the sound vibration frequency of the target voice data; illustratively, the speech pitch may include low frequency, medium frequency, and high frequency, and a male voice usually has a lower pitch (corresponding to low frequency) than a female voice (corresponding to high frequency). The voice tone in the target voice data can be used to indicate the tonal inflection of the speech; illustratively, the voice tone may include the high-level tone, the rising tone, the falling-rising tone, and the falling tone.
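The coarse feature classes described above (volume, speed, pitch) could be estimated with very simple signal statistics. The sketch below is illustrative only: the thresholds, the zero-crossing pitch proxy, and all function names are assumptions for exposition, not details from the patent.

```python
import math

# Illustrative sketch: classify coarse voice features from raw audio samples.
# All thresholds are hypothetical; a real implementation would use a proper
# speech-analysis pipeline rather than these crude heuristics.

def classify_volume(samples):
    """Map RMS energy (samples normalized to [-1, 1]) to low / medium / high."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms < 0.1:
        return "low"
    return "medium" if rms < 0.5 else "high"

def classify_speed(word_count, duration_s):
    """Map words per second to slow / medium / fast speech."""
    wps = word_count / duration_s
    if wps < 2.0:
        return "slow"
    return "medium" if wps < 4.0 else "fast"

def classify_pitch(samples, sample_rate):
    """Estimate a pitch class from the zero-crossing rate (a crude proxy)."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    freq = crossings * sample_rate / (2 * len(samples))
    if freq < 150:
        return "low"      # roughly the typical male range
    return "mid" if freq < 255 else "high"  # female voices trend higher
```

A production system would likely replace the zero-crossing heuristic with autocorrelation or a dedicated pitch tracker; the point here is only that each feature reduces to a small discrete label that can index a display mode.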
Optionally, in the embodiments of the present invention, each item of voice characteristic information above may be the voice characteristic information of all or part of the target voice data, determined according to actual use requirements; the embodiments of the present invention are not limited in this respect. Illustratively, taking the voice characteristic information being the voice characters in the target voice data as an example, the voice characters may be all the characters or part of the characters (such as keywords) in the target voice data. Taking the voice characteristic information being the speech speed in the target voice data as another example, the speech speed may be the speech speed corresponding to all or part of the voice data in the target voice data.

In the embodiments of the present invention, different voice data may have different voice characteristic information. Since the voice characteristic information indicates the mood of the voice corresponding to the voice data, different voice characteristic information indicates different moods; that is, the moods of the voices corresponding to different voice data are also different.

Optionally, in the embodiments of the present invention, the mood can be used to indicate the emotion expressed by the voice data.

In the embodiments of the present invention, the mood may be a mood corresponding to any possible mood type, such as the declarative mood, the interrogative mood, the imperative mood, or the exclamatory mood. Specifically, each mood type may correspond to multiple moods, and each mood may express at least one emotion.

Illustratively, when the mood type is the declarative mood, the mood type may correspond to multiple moods such as "factual, definite, and consistent with the facts"; correspondingly, the emotion indicated by each mood may be at least one of "factual, definite, and consistent with the facts".

As another example, when the mood type is the interrogative mood, the mood type may correspond to multiple moods such as "inquiring, rhetorical, and conjecturing"; correspondingly, the emotion indicated by each mood may be at least one of "inquiring, rhetorical, and conjecturing".

As another example, when the mood type is the imperative mood, the mood type may correspond to multiple moods such as "suggesting, requesting, inviting, and ordering"; correspondingly, the emotion indicated by each mood may be at least one of "suggesting, requesting, inviting, and ordering".

As another example, when the mood type is the exclamatory mood, the mood type may correspond to multiple moods such as "happy, angry, sad, and joyful"; correspondingly, the emotion indicated by each mood may be at least one of "happy, angry, sad, and joyful".
Optionally, in the embodiments of the present invention, one emotion may correspond to at least one display mode. Illustratively, assuming that the emotion expressed by the target voice data is "happy", the display mode corresponding to "happy" may be adding "hahaha" after the subtitle corresponding to the target voice data, adding a smiley mark after the subtitle, or any other possible display mode.

In the embodiments of the present invention, the terminal device can determine, according to the mood indicated by the voice characteristic information of the target voice data, the target display mode corresponding to that mood, and display the subtitle corresponding to the target voice data in the target display mode. Since the mood can indicate the emotion expressed by the target voice data, the target display mode determined from the mood can also reflect that emotion. Displaying the subtitle corresponding to the target voice data in the target display mode therefore improves the diversity of the display modes in which the terminal device displays subtitles.
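The emotion-to-display-mode correspondence sketched above can be modeled as a simple lookup table. The labels and style fields below are illustrative assumptions drawn from the examples in this description (suffix text, marks, and the red/yellow/blue color choices); they are not the patent's actual data structures.

```python
# Illustrative sketch: map a detected emotion label to a subtitle display mode.
# Styles follow the examples in the description: red for anger, yellow for
# happiness, blue for sadness, a "hahaha" suffix or smiley mark for happiness.

EMOTION_STYLES = {
    "happy": {"color": "yellow", "mark": "smiley", "suffix": "hahaha"},
    "angry": {"color": "red",    "mark": "flame",  "suffix": ""},
    "sad":   {"color": "blue",   "mark": "tears",  "suffix": ""},
}

# Fallback: the default white subtitle with no added mark.
DEFAULT_STYLE = {"color": "white", "mark": None, "suffix": ""}

def display_mode_for(emotion):
    """Return the target display mode for the given emotion label."""
    return EMOTION_STYLES.get(emotion, DEFAULT_STYLE)
```

A table keyed by emotion keeps the mood-analysis step and the rendering step decoupled, so new emotions or styles can be added without touching either side.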
S202: the terminal device determines, according to the voice feature information, a target display mode corresponding to the tone.
Here, the target display mode is used to display the subtitle corresponding to the target voice data.
It can be understood that, in embodiments of the present invention, different tones may correspond to different target display modes. After the terminal device determines different tones from different voice feature information, it can determine, from those tones, the distinct target display modes corresponding to them.
Optionally, in an embodiment of the present invention, the target display mode may include at least one of the following: displaying the subtitle with a special display effect, and adding a mark to the subtitle.
Both the special display effect and the mark may be used to indicate the tone.
Optionally, the special display effect may be any one of, or a combination of two or more of, changing the display color of the subtitle, changing the font size of the subtitle, and increasing the character spacing in the subtitle. This may be determined according to actual usage requirements, and is not limited in the embodiments of the present invention.
Optionally, the mark may be any mark capable of indicating the emotion expressed by the target voice data, such as a smiley face, a flame, a teardrop, or a frightened face. This may be determined according to actual usage requirements, and is not limited in the embodiments of the present invention.
For example, assume the default subtitle display color of the terminal device is white. When the terminal device determines that the tone indicates anger, it may set the display color of the subtitle to red; when the tone indicates happiness, it may set the display color to yellow; when the tone indicates sadness, it may set the display color to blue.
As another example, as shown in (a) of Fig. 3, if the terminal device determines that the tone indicated by the voice feature information of the target voice data is a happy tone, the target display mode of the corresponding subtitle "I am so happy" may be to add a smiley-face mark after the subtitle. As shown in (b) of Fig. 3, if the terminal device determines that the tone is a sad tone, the target display mode of the corresponding subtitle "I am so sad" may be to add a teardrop mark after the subtitle. As shown in (c) of Fig. 3, if the terminal device determines that the tone is an angry tone, the target display mode of the corresponding subtitle "I am angry" may be to add a flame mark after the subtitle.
In embodiments of the present invention, by displaying the subtitle with a special display effect or by adding a mark to the subtitle, the terminal device increases the diversity of its subtitle display modes and thereby makes subtitle display more engaging.
Optionally, in an embodiment of the present invention, the above S202 may be implemented specifically by the following S202a to S202c.
S202a: the terminal device determines the tone according to the voice feature information.
In an embodiment of the present invention, the terminal device may obtain, according to the voice feature information of the target voice data, the tone of the voice corresponding to the target voice data.
Optionally, the terminal device may determine the tone of the voice corresponding to the obtained target voice data according to at least one of the voice feature information items enumerated above.
Optionally, voice feature information may be preset in the terminal device. The preset voice feature information may include at least one of the following: a voice character database, a voice volume threshold, a voice speed threshold, a voice frequency threshold, and a voice tone-pitch library. After the terminal device obtains the voice feature information of the target voice data, it can compare that information with the preset voice feature information and, according to the comparison result, determine the tone of the voice corresponding to the target voice data.
The voice characters in the voice character database may be set according to actual usage requirements, which is not limited in the embodiments of the present invention. The voice tone-pitch library may include the Mandarin tone pitches, such as the first (high-level) tone, the second (rising) tone, the third (dipping) tone, and the fourth (falling) tone.
For example, assume the voice volume threshold is 10 decibels (dB) and the voice speed threshold is 100 characters per second, and that the terminal device judges the tone of the voice corresponding to the target voice data from the voice characters in the target voice data, the voice volume in the target voice data, and the voice speed in the target voice data. Then, when the voice characters in the obtained target voice data are "I am very angry" (voice characters present in the voice character database), the voice volume in the target voice data is 20 dB (greater than the voice volume threshold), and the voice speed in the target voice data is 120 characters per second (greater than the voice speed threshold), the terminal device may determine that the tone of the voice corresponding to the target voice data is anger.
As another example, assume the voice frequency threshold is 1000 mel, and that the terminal device judges the tone of the voice corresponding to the target voice data from the voice characters in the target voice data, the voice pitch in the target voice data, and the voice tone in the target voice data. Then, when the voice characters in the obtained target voice data are "I am so happy" (voice characters present in the voice character database), the voice pitch in the target voice data is 1250 mel (greater than the voice frequency threshold), and the voice tone is the first (high-level) tone (matching the first tone in the voice tone-pitch library), the terminal device may determine that the tone of the voice corresponding to the target voice data is happiness.
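The two worked examples above reduce to a threshold comparison against preset voice feature information. The sketch below is an assumption about how such rules might be wired together: the thresholds and phrases come from the examples in the text, while the function name, field names, and rule structure are illustrative.

```python
# Preset voice feature information, using the example values from the text.
VOLUME_THRESHOLD_DB = 10       # voice volume threshold
SPEED_THRESHOLD_CPS = 100      # voice speed threshold, characters per second
FREQUENCY_THRESHOLD_MEL = 1000 # voice frequency threshold

# Stand-ins for the voice character database entries used by each rule.
ANGRY_PHRASES = {"I am very angry"}
HAPPY_PHRASES = {"I am so happy"}

def classify_tone(characters, volume_db=0, speed_cps=0, pitch_mel=0, tone_pitch=None):
    """S202a sketch: return the tone of a voice segment, or None if no rule matches."""
    if (characters in ANGRY_PHRASES
            and volume_db > VOLUME_THRESHOLD_DB
            and speed_cps > SPEED_THRESHOLD_CPS):
        return "angry"
    if (characters in HAPPY_PHRASES
            and pitch_mel > FREQUENCY_THRESHOLD_MEL
            and tone_pitch == "high-level"):
        return "happy"
    return None

print(classify_tone("I am very angry", volume_db=20, speed_cps=120))          # → angry
print(classify_tone("I am so happy", pitch_mel=1250, tone_pitch="high-level"))  # → happy
```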
S202b: the terminal device determines, according to the tone, a preset display mode corresponding to the tone.
In an embodiment of the present invention, after the terminal device determines the tone according to the voice feature information of the target voice data, it may determine, according to the tone, the preset display mode corresponding to that tone. It can be understood that, in embodiments of the present invention, different tones correspond to different display modes.
Optionally, a correspondence between tones and display modes may be preset in the terminal device. After the terminal device determines the tone of the voice corresponding to the target voice data (hereinafter referred to as the target tone), it may determine, according to the correspondence between tones and display modes, the preset display mode corresponding to the target tone.
S202c: the terminal device determines the preset display mode corresponding to the tone as the target display mode.
Optionally, after the terminal device determines the preset display mode corresponding to the target tone according to the correspondence between tones and display modes, it may take this preset display mode as the target display mode, so that it can display the subtitle corresponding to the target voice data in the target display mode.
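Under the assumption that the preset tone-to-display-mode correspondence is a simple lookup table (the patent only states that such a correspondence exists), S202b and S202c might be sketched as follows; the colors follow the earlier examples, and the font scaling follows the "double the font size" example, but the data structure is an assumption.

```python
DEFAULT_FONT_PT = 12  # default subtitle font size from the text's example

# Preset correspondence between tones and display modes (S202b).
TONE_DISPLAY_MODES = {
    "angry": {"color": "red"},
    "happy": {"color": "yellow", "font_scale": 2},
    "sad":   {"color": "blue"},
}

DEFAULT_DISPLAY_MODE = {"color": "white", "font_scale": 1}

def target_display_mode(tone):
    """S202b/S202c sketch: look up the preset mode and adopt it as the target mode."""
    mode = dict(DEFAULT_DISPLAY_MODE)
    mode.update(TONE_DISPLAY_MODES.get(tone, {}))
    mode["font_pt"] = DEFAULT_FONT_PT * mode.pop("font_scale")
    return mode

print(target_display_mode("happy"))  # → {'color': 'yellow', 'font_pt': 24}
```

Unrecognized tones fall back to the default white, 12 pt display, matching the terminal device's default subtitle appearance.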
For example, assume the voice volume threshold is 10 dB, and that the terminal device judges the tone of the voice corresponding to the target voice data from the voice characters and the voice volume in the target voice data. When the voice characters in the obtained target voice data are "hahaha" (voice characters present in the voice character database) and the voice volume in the target voice data is 30 dB (greater than the voice volume threshold), the terminal device may determine that the tone of the voice corresponding to the target voice data is happiness. Assume the preset display mode corresponding to happiness is doubling the font size of the subtitle; the terminal device may then display the subtitle corresponding to the target voice data at twice the font size. For instance, if the default subtitle font size of the terminal device is 12 pt, the terminal device may display the subtitle at a 24 pt font size on its display interface. As shown in Fig. 4, the terminal device may display the subtitle "hahaha" corresponding to the target voice data at a 24 pt font size.
In an embodiment of the present invention, since the tone indicated by the voice feature information of the target voice data can indicate the emotion expressed by the target voice data, the target display mode determined from the tone can also convey that emotion. Displaying the subtitle corresponding to the target voice data in the target display mode therefore increases the diversity of the subtitle display modes of the terminal device.
S203: the terminal device displays the subtitle in the target display mode.
In an embodiment of the present invention, after the terminal device determines the target display mode, it may display the subtitle corresponding to the target voice data in that mode. In this way, the target display mode can express the tone indicated by the voice feature information of the target voice data.
For specific display modes of the subtitles corresponding to voice data, reference may be made to the descriptions of Fig. 3 and Fig. 4 in the foregoing embodiments and to the other related descriptions of subtitle display modes above, which will not be repeated here.
Optionally, in an embodiment of the present invention, when the target voice data includes first voice data and second voice data, the above S203 may be implemented specifically by the following S203a.
In an embodiment of the present invention, the voice feature information of the first voice data may indicate a first tone, and the voice feature information of the second voice data may indicate a second tone; the first tone is the tone of the voice corresponding to the first voice data, and the second tone is the tone of the voice corresponding to the second voice data. The target display mode may include a first display mode and a second display mode, where the first display mode is the display mode corresponding to the first tone and the second display mode is the display mode corresponding to the second tone.
S203a: the terminal device displays a first subtitle in the first display mode and displays a second subtitle in the second display mode.
Here, the first subtitle is the subtitle corresponding to the first voice data, and the second subtitle is the subtitle corresponding to the second voice data.
In an embodiment of the present invention, when the target voice data includes the first voice data and the second voice data, the terminal device may obtain the voice feature information of the first voice data and of the second voice data separately, determine the first tone from the voice feature information of the first voice data and the second tone from the voice feature information of the second voice data, determine the first display mode (corresponding to the first tone) and the second display mode (corresponding to the second tone), and then display the subtitle corresponding to the first voice data in the first display mode and the subtitle corresponding to the second voice data in the second display mode. In this way, the first display mode can express the first tone, and the second display mode can express the second tone.
For example, in an embodiment of the present invention, assume the subtitle corresponding to the first voice data is "I am so happy" and the first display mode is adding a smiley-face mark after the subtitle; the subtitle corresponding to the second voice data is "hahaha" and the second display mode is doubling the font size of the subtitle and increasing the character spacing. For instance, if the default subtitle font size of the terminal device is 12 pt and the default character spacing is 1 pt, the terminal device may display the subtitle corresponding to the second voice data at a 24 pt font size with 10 pt character spacing on its display interface. As shown in Fig. 5, the terminal device displays the subtitle "I am so happy" corresponding to the first voice data with a smiley-face mark added after it, and displays the subtitle "hahaha" corresponding to the second voice data at a 24 pt font size with 10 pt character spacing.
In an embodiment of the present invention, by displaying the subtitle corresponding to the first voice data in the first display mode and the subtitle corresponding to the second voice data in the second display mode, the terminal device can, when the target voice data includes different voice data, display the corresponding subtitles in different display modes. This makes the subtitle display modes of the terminal device richer and further improves the subtitle display effect.
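The per-source rendering in S203a can be sketched by giving each subtitle its own display-mode attributes. The plain-string render format below is purely illustrative; a real terminal device would set these attributes on a text view rather than format a string.

```python
def render(subtitle, mark=None, font_pt=12, spacing_pt=1):
    """Describe one subtitle together with its display-mode attributes."""
    text = subtitle + (" " + mark if mark else "")
    return f"{text} [font={font_pt}pt, spacing={spacing_pt}pt]"

# First display mode: smiley-face mark; second: doubled font, wider spacing,
# mirroring the Fig. 5 example.
first = render("I am so happy", mark=":)")
second = render("hahaha", font_pt=24, spacing_pt=10)
print(first)   # → I am so happy :) [font=12pt, spacing=1pt]
print(second)  # → hahaha [font=24pt, spacing=10pt]
```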
Optionally, in an embodiment of the present invention, the above S203a may be implemented specifically by the following S203a1.
S203a1: the terminal device displays the first subtitle in the first display mode, displays the second subtitle in the second display mode, and displays a target subtitle in a special display manner.
Here, the target subtitle may be the subtitle that is identical in the first subtitle and the second subtitle.
In an embodiment of the present invention, when the first subtitle and the second subtitle contain an identical subtitle (the target subtitle), the terminal device may display the first subtitle in the first display mode, display the second subtitle in the second display mode, and display the target subtitle (which may be the target subtitle in the first subtitle, the target subtitle in the second subtitle, or both) in the special display manner.
Optionally, the display effect of the special display manner may be any possible effect such as vibrating, jumping, or flashing. This may be determined according to actual usage requirements, and is not limited in the embodiments of the present invention.
For example, in an embodiment of the present invention, assume the first subtitle is "I am so happy hahaha" and the second subtitle is "so cute hahaha". The terminal device may then determine that the target subtitle is "hahaha" and display, in the special display manner, the target subtitle in the first subtitle and in the second subtitle. Assume further that the first display mode is adding a smiley-face mark after the subtitle "I am so happy hahaha"; that the second display mode is doubling the font size of the second subtitle "so cute hahaha" (for instance, if the default subtitle font size is 12 pt, displaying the second subtitle at a 24 pt font size on the display interface); and that the special display manner is displaying "hahaha" with a vibrating effect. The terminal device may then add a smiley-face mark after the first subtitle, display the second subtitle at a 24 pt font size, and display the target subtitle in the first and second subtitles with the vibrating effect.
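One way the terminal device might locate the target subtitle shared by the first and second subtitles is a longest-common-substring search. The patent does not specify any algorithm for this step, so the dynamic-programming sketch below is an illustrative assumption.

```python
def longest_common_substring(a, b):
    """Return the longest substring that occurs in both a and b."""
    best, best_end = 0, 0
    # prev[j] = length of the common suffix ending at a[i-1] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best, best_end = cur[j], i
        prev = cur
    return a[best_end - best:best_end]

first = "I am so happy hahaha"
second = "so cute hahaha"
# strip() drops the shared leading space around the common run
print(longest_common_substring(first, second).strip())  # → hahaha
```

With the target subtitle located, the terminal device could then apply the vibrating, jumping, or flashing effect to just that span of each subtitle.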
In the caption display method provided by the embodiments of the present invention, since the voice feature information of the target voice data can indicate the tone of the voice corresponding to the target voice data, the target display mode corresponding to that tone can be determined from the voice feature information and used to display the subtitle corresponding to the target voice data. Because the tones of the voices corresponding to different voice data may differ, the display modes determined from the voice feature information of different voice data also differ, so the terminal device can display the subtitles corresponding to different voice data in different display modes. This makes the subtitle display modes of the terminal device richer and improves the subtitle display effect.
Optionally, with reference to Fig. 2 and as shown in Fig. 6, before the above S201, the caption display method provided by the embodiments of the present invention may further include the following S204; and before the above S203, it may further include the following S205 and S206.
S204: the terminal device obtains the target voice data.
Optionally, in an embodiment of the present invention, the target voice data may be any one of, or a combination of two or more of: voice data saved on the terminal device (for example, voice data downloaded from a network or voice data recorded by the terminal device), voice data played in real time on the terminal device over a network, and voice data collected by the terminal device from its surrounding environment. This may be determined according to actual usage requirements, and is not limited in the embodiments of the present invention.
Specifically, in an embodiment of the present invention, when the target voice data is voice data saved on the terminal device and/or voice data played in real time on the terminal device over a network, it may be the voice data in the streaming media described in scenario 1 above (i.e., the first voice data). When the target voice data is voice data collected by the terminal device from its surrounding environment, it may be the voice data corresponding to the content input by the user described in scenario 2 above (i.e., the second voice data). When the target voice data is both the saved and/or network-played voice data and the collected voice data, it may be the voice data in the streaming media together with the voice data corresponding to the user-input content described in scenario 3 above (i.e., the target voice data includes the first voice data and the second voice data).
S205: the terminal device determines the acquisition mode used to obtain the target voice data.
Optionally, in an embodiment of the present invention, the acquisition mode of the target voice data may be obtaining from a file and/or collecting with an audio collection device. When the acquisition mode is obtaining from a file, the target voice data is voice data to be played by the terminal device (the voice data in the streaming media described in scenario 1 above). When the acquisition mode is collecting with an audio collection device, the target voice data is voice data collected by the terminal device (the voice data corresponding to the user-input content described in scenario 2 above). When the acquisition mode includes both obtaining from a file and collecting with an audio collection device, the target voice data is both the voice data to be played and the voice data collected by the terminal device (the voice data in the streaming media and the voice data corresponding to the user-input content described in scenario 3 above).
S206: the terminal device determines, according to the acquisition mode, the display position of the subtitle on the display interface of the terminal device.
In an embodiment of the present invention, different acquisition modes may yield different target voice data, and the subtitles corresponding to different voice data may be displayed at different positions on the display interface of the terminal device. The terminal device can therefore determine, according to the acquisition mode of the target voice data, the display position on its display interface of the subtitle corresponding to the obtained target voice data.
Optionally, in an embodiment of the present invention, when the acquisition mode of the target voice data is obtaining from a file, the display position of the subtitle corresponding to the target voice data on the display interface of the terminal device may be a first position; when the acquisition mode is collecting with an audio collection device, the display position may be a second position. The first position and the second position may be two different positions on the display interface of the terminal device.
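The acquisition-mode-to-position correspondence of S205/S206 can be sketched as a simple dispatch. The constant names are assumptions, and the fractional positions follow the examples in the text (lower 1/5 for the first position, top 1/4 for the second).

```python
FROM_FILE = "from_file"       # streaming-media voice data (scenario 1)
FROM_MICROPHONE = "from_mic"  # voice data collected by an audio collection device (scenario 2)

def subtitle_position(acquisition_mode, screen_height_px):
    """Return the vertical pixel offset of the subtitle for a voice source."""
    if acquisition_mode == FROM_FILE:
        # first position: e.g. within the lower 1/5 of the display interface
        return int(screen_height_px * 4 / 5)
    if acquisition_mode == FROM_MICROPHONE:
        # second position: e.g. within the top 1/4 of the display interface
        return int(screen_height_px * 1 / 4)
    raise ValueError(f"unknown acquisition mode: {acquisition_mode}")

print(subtitle_position(FROM_FILE, 2000))        # → 1600
print(subtitle_position(FROM_MICROPHONE, 2000))  # → 500
```

When the target voice data includes both sources, the device would call this once per source, placing each subtitle at its own position.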
Optionally, in an embodiment of the present invention, the first position may be any possible display position, such as the position at the lower 1/5 of the display interface of the terminal device or the position at the middle 1/2 of the display interface. The second position may be the position at the top 1/4 of the display interface of the terminal device. This may be determined according to actual usage requirements, and is not limited in the embodiments of the present invention.
Optionally, in an embodiment of the present invention, if it is determined that the display position of the subtitle corresponding to the target voice data on the display interface of the terminal device is the first position, the terminal device may, in combination with the target display mode, display the subtitle at that position with any other possible display effect such as bullet-screen (danmaku) scrolling, flashing, fading in, fading out, or rolling. This may be determined according to actual usage requirements, and is not limited in the embodiments of the present invention.
It should be noted that the embodiments of the present invention do not limit the execution order between S205-S206 and S201-S202: S205-S206 may be executed first and S201-S202 afterward, S201-S202 may be executed first and S205-S206 afterward, or S205-S206 and S201-S202 may be executed simultaneously. Fig. 6 above illustrates the case where S205-S206 are executed first, followed by S201-S202; the order may be determined according to actual usage requirements, and is not limited in the embodiments of the present invention.
In the embodiments of the present invention, the terminal device can display the subtitles corresponding to the target voice data at different positions on its display interface according to the different acquisition modes of the target voice data. The user can therefore determine the voice source of a subtitle shown on the display interface from the display position of the subtitle corresponding to the target voice data, which improves the human-machine interaction of the terminal device.
As shown in Fig. 7, an embodiment of the present invention provides a terminal device 700, which may include an obtaining module 701, a determining module 702, and a display module 703. The obtaining module 701 is configured to obtain the voice feature information of the target voice data; the determining module 702 is configured to determine, according to the voice feature information obtained by the obtaining module 701, the target display mode corresponding to the tone; the display module 703 is configured to display the subtitle in the target display mode determined by the determining module 702. The voice feature information may include at least one of the following: the voice characters in the target voice data, the voice volume in the target voice data, the voice speed in the target voice data, the voice pitch in the target voice data, and the voice tone in the target voice data; the voice feature information can indicate the tone of the voice corresponding to the target voice data. The target display mode is used to display the subtitle corresponding to the target voice data.
Optionally, the target voice data may include at least one of first voice data and second voice data, where the first voice data is voice data in streaming media, and the second voice data is voice data collected by the terminal device.
Optionally, the target voice data may include the first voice data and the second voice data; the voice feature information of the first voice data indicates a first tone, and the voice feature information of the second voice data indicates a second tone, where the first tone is the tone of the voice corresponding to the first voice data and the second tone is the tone of the voice corresponding to the second voice data. The target display mode includes a first display mode corresponding to the first tone and a second display mode corresponding to the second tone. The display module 703 is specifically configured to display a first subtitle in the first display mode and a second subtitle in the second display mode, where the first subtitle is the subtitle corresponding to the first voice data and the second subtitle is the subtitle corresponding to the second voice data.
Optionally, the display module 703 is specifically configured to display the first subtitle in the first display mode, display the second subtitle in the second display mode, and display a target subtitle in a special display manner, where the target subtitle is the subtitle that is identical in the first subtitle and the second subtitle.
Optionally, the target display mode includes at least one of the following: displaying the subtitle with a special display effect, and adding a mark to the subtitle, where both the special display effect and the mark may be used to indicate the tone.
The terminal device provided by the embodiments of the present invention can implement each process executed by the terminal device in the above caption display method embodiments and achieve the same technical effects, which will not be repeated here.
For the terminal device provided by the embodiments of the present invention, since the voice feature information of the target voice data can indicate the tone of the voice corresponding to the target voice data, the target display mode corresponding to that tone can be determined from the voice feature information and used to display the subtitle corresponding to the target voice data. Because the tones of the voices corresponding to different voice data may differ, the display modes determined from the voice feature information of different voice data also differ, so the terminal device can display the subtitles corresponding to different voice data in different display modes. This makes the subtitle display modes of the terminal device richer and improves the subtitle display effect.
Fig. 8 is a hardware schematic diagram of a terminal device implementing each embodiment of the present invention. As shown in Fig. 8, the terminal device 100 includes, but is not limited to: a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, a power supply 111, and other components. Those skilled in the art will understand that the terminal device structure shown in Fig. 8 does not constitute a limitation on the terminal device; the terminal device may include more or fewer components than shown, combine certain components, or arrange the components differently. In the embodiments of the present invention, the terminal device includes, but is not limited to, a mobile phone, a tablet computer, a laptop, a palmtop computer, an in-vehicle terminal, a wearable device, a pedometer, and the like.
The processor 110 is configured to obtain the voice feature information of the target voice data and to determine, according to the voice feature information, the target display mode corresponding to the tone indicated by the voice feature information; the display unit 106 is configured to display, in the target display mode determined by the processor 110, the subtitle corresponding to the target voice data. The voice feature information may include at least one of the following: the voice characters in the target voice data, the voice volume in the target voice data, the voice speed in the target voice data, the voice pitch in the target voice data, and the voice tone in the target voice data; the voice feature information can indicate the tone of the voice corresponding to the target voice data.
For the terminal device provided by the embodiments of the present invention, since the voice feature information of the target voice data can indicate the tone of the voice corresponding to the target voice data, the target display mode corresponding to that tone can be determined from the voice feature information and used to display the subtitle corresponding to the target voice data. Because the tones of the voices corresponding to different voice data may differ, the display modes determined from the voice feature information of different voice data also differ, so the terminal device can display the subtitles corresponding to different voice data in different display modes. This makes the subtitle display modes of the terminal device richer and improves the subtitle display effect.
It should be understood that, in the embodiments of the present invention, the radio frequency unit 101 can be used to receive and send signals during information transmission and reception or during a call. Specifically, after receiving downlink data from a base station, it delivers the data to the processor 110 for processing; in addition, it sends uplink data to the base station. In general, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. The radio frequency unit 101 can also communicate with a network and other devices through a wireless communication system.
The terminal device provides the user with wireless broadband Internet access through the network module 102, for example helping the user to send and receive e-mail, browse web pages, and access streaming media.
The audio output unit 103 can convert audio data received by the radio frequency unit 101 or the network module 102, or stored in the memory 109, into an audio signal and output it as sound. Moreover, the audio output unit 103 can also provide audio output related to a specific function performed by the terminal device 100 (for example, a call signal reception sound or a message reception sound). The audio output unit 103 includes a loudspeaker, a buzzer, a receiver, and the like.
The input unit 104 is used to receive audio or video signals. The input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or another storage medium) or sent via the radio frequency unit 101 or the network module 102. The microphone 1042 can receive sound and process it into audio data. In a telephone call mode, the processed audio data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 101.
The terminal device 100 further includes at least one sensor 105, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor includes an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 1061 according to the ambient light, and the proximity sensor can turn off the display panel 1061 and/or the backlight when the terminal device 100 is moved to the ear. As a kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally three axes) and can detect the magnitude and direction of gravity when stationary; it can be used to identify the posture of the terminal device (such as landscape/portrait switching, related games, and magnetometer posture calibration) and for vibration-recognition functions (such as a pedometer or tap detection). The sensor 105 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described in detail here.
The display unit 106 is used to display information input by the user or information provided to the user. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The user input unit 107 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the terminal device. Specifically, the user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, collects touch operations by the user on or near it (for example, operations by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory). The touch panel 1071 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 110, and receives and executes commands sent by the processor 110. The touch panel 1071 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may also include other input devices 1072, which may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here.
Further, the touch panel 1071 can be overlaid on the display panel 1061. After the touch panel 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although in Fig. 8 the touch panel 1071 and the display panel 1061 are two independent components implementing the input and output functions of the terminal device, in some embodiments the touch panel 1071 and the display panel 1061 can be integrated to implement the input and output functions of the terminal device, which is not specifically limited here.
The interface unit 108 is an interface through which an external device is connected to the terminal device 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 can be used to receive input from an external device (for example, data information or electric power) and transmit the received input to one or more elements within the terminal device 100, or can be used to transmit data between the terminal device 100 and an external device.
The memory 109 can be used to store software programs and various data. The memory 109 may mainly include a program storage area and a data storage area. The program storage area can store an operating system and an application program required by at least one function (such as a sound playing function or an image playing function); the data storage area can store data created according to the use of the mobile phone (such as audio data and a phone book). In addition, the memory 109 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
The processor 110 is the control center of the terminal device. It connects the various parts of the entire terminal device through various interfaces and lines, and performs the functions of the terminal device and processes data by running or executing the software programs and/or modules stored in the memory 109 and calling the data stored in the memory 109, thereby monitoring the terminal device as a whole. The processor 110 may include one or more processing units. Optionally, the processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 110.
The terminal device 100 may also include a power supply 111 (such as a battery) that supplies power to the various components. Optionally, the power supply 111 can be logically connected to the processor 110 through a power management system, so as to implement functions such as charging management, discharging management, and power consumption management through the power management system.
In addition, the terminal device 100 includes some functional modules that are not shown, which are not described in detail here.
Optionally, an embodiment of the present invention also provides a terminal device, including a processor 110, a memory 109, and a computer program stored in the memory 109 and executable on the processor 110. When the computer program is executed by the processor 110, each process of the above subtitle display method embodiments is implemented, and the same technical effect can be achieved. To avoid repetition, the details are not repeated here.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the above subtitle display method embodiments is implemented, and the same technical effect can be achieved. To avoid repetition, the details are not repeated here.
The computer-readable storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article, or apparatus. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes that element.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention are described above with reference to the accompanying drawings, but the invention is not limited to the above specific embodiments. The above specific embodiments are only illustrative, not restrictive. Inspired by the present invention, those of ordinary skill in the art can make many other forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (10)

1. A subtitle display method, applied to a terminal device, wherein the method comprises:
obtaining voice characteristic information of target voice data, the voice characteristic information including at least one of the following: a speech volume in the target voice data, a speech speed in the target voice data, a speech pitch in the target voice data, and a voice intonation in the target voice data, the voice characteristic information being used to indicate a tone of a voice corresponding to the target voice data;
determining, according to the voice characteristic information, a target display mode corresponding to the tone, the target display mode being used to display a subtitle corresponding to the target voice data; and
displaying the subtitle in the target display mode.
2. The subtitle display method according to claim 1, wherein the target voice data includes at least one of first voice data and second voice data, the first voice data is voice data in streaming media, and the second voice data is voice data collected by the terminal device.
3. The subtitle display method according to claim 2, wherein the target voice data includes the first voice data and the second voice data, the voice characteristic information of the first voice data is used to indicate a first tone, the voice characteristic information of the second voice data is used to indicate a second tone, the first tone is a tone of a voice corresponding to the first voice data, and the second tone is a tone of a voice corresponding to the second voice data;
the target display mode includes a first display mode and a second display mode, the first display mode is a display mode corresponding to the first tone, and the second display mode is a display mode corresponding to the second tone; and
the displaying the subtitle in the target display mode comprises:
displaying a first subtitle in the first display mode, and displaying a second subtitle in the second display mode, the first subtitle being a subtitle corresponding to the first voice data, and the second subtitle being a subtitle corresponding to the second voice data.
4. The subtitle display method according to claim 3, wherein the displaying a first subtitle in the first display mode and displaying a second subtitle in the second display mode comprises:
displaying the first subtitle in the first display mode, displaying the second subtitle in the second display mode, and displaying a target subtitle in a special display manner, the target subtitle being a subtitle that is identical in the first subtitle and the second subtitle.
5. The subtitle display method according to any one of claims 1 to 4, wherein the target display mode includes at least one of the following:
displaying the subtitle with a special display effect, and adding a mark to the subtitle; the special display effect and the mark being used to indicate the tone.
6. A terminal device, comprising an obtaining module, a determining module, and a display module, wherein:
the obtaining module is configured to obtain voice characteristic information of target voice data, the voice characteristic information including at least one of the following: a speech volume in the target voice data, a speech speed in the target voice data, a speech pitch in the target voice data, and a voice intonation in the target voice data, the voice characteristic information being used to indicate a tone of a voice corresponding to the target voice data;
the determining module is configured to determine, according to the voice characteristic information obtained by the obtaining module, a target display mode corresponding to the tone, the target display mode being used to display a subtitle corresponding to the target voice data; and
the display module is configured to display the subtitle in the target display mode determined by the determining module.
7. The terminal device according to claim 6, wherein the target voice data includes at least one of first voice data and second voice data, the first voice data is voice data in streaming media, and the second voice data is voice data collected by the terminal device.
8. The terminal device according to claim 7, wherein the target voice data includes the first voice data and the second voice data, the voice characteristic information of the first voice data is used to indicate a first tone, the voice characteristic information of the second voice data is used to indicate a second tone, the first tone is a tone of a voice corresponding to the first voice data, and the second tone is a tone of a voice corresponding to the second voice data;
the target display mode includes a first display mode and a second display mode, the first display mode is a display mode corresponding to the first tone, and the second display mode is a display mode corresponding to the second tone; and
the display module is specifically configured to display a first subtitle in the first display mode and display a second subtitle in the second display mode, the first subtitle being a subtitle corresponding to the first voice data, and the second subtitle being a subtitle corresponding to the second voice data.
9. The terminal device according to claim 8, wherein the display module is specifically configured to display the first subtitle in the first display mode, display the second subtitle in the second display mode, and display a target subtitle in a special display manner, the target subtitle being a subtitle that is identical in the first subtitle and the second subtitle.
10. The terminal device according to any one of claims 6 to 9, wherein the target display mode includes at least one of the following:
displaying the subtitle with a special display effect, and adding a mark to the subtitle; the special display effect and the mark being used to indicate the tone.
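The behavior recited in claims 3 and 4 can be illustrated with a short sketch: subtitles for the first voice data (streaming media) and the second voice data (voice collected by the terminal device) are rendered in distinct display modes, and any subtitle text common to both (the target subtitle) is rendered in a special display manner. The function and mode names below are assumptions for illustration, not terms defined by the patent.

```python
def render_subtitles(first_subs, second_subs):
    """Assign a display mode to each subtitle line.

    first_subs: subtitles for the first voice data (streaming media).
    second_subs: subtitles for the second voice data (terminal microphone).
    Subtitle text appearing in both lists is the "target subtitle" and is
    shown in a special display manner.
    """
    common = set(first_subs) & set(second_subs)  # the target subtitle(s)
    rendered = []
    for text in first_subs:
        rendered.append((text, "special" if text in common else "mode_1"))
    for text in second_subs:
        rendered.append((text, "special" if text in common else "mode_2"))
    return rendered
```

For instance, if both the streaming audio and the user say "ok", that line is flagged `"special"` while the remaining lines keep their per-source display modes.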
CN201811253131.5A 2018-10-25 2018-10-25 A kind of caption presentation method and terminal device Pending CN109462768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811253131.5A CN109462768A (en) 2018-10-25 2018-10-25 A kind of caption presentation method and terminal device


Publications (1)

Publication Number Publication Date
CN109462768A true CN109462768A (en) 2019-03-12

Family

ID=65608462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811253131.5A Pending CN109462768A (en) 2018-10-25 2018-10-25 A kind of caption presentation method and terminal device

Country Status (1)

Country Link
CN (1) CN109462768A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110171372A (en) * 2019-05-27 2019-08-27 广州小鹏汽车科技有限公司 Interface display method, device and the vehicle of car-mounted terminal
CN110198468A (en) * 2019-05-15 2019-09-03 北京奇艺世纪科技有限公司 A kind of video caption display methods, device and electronic equipment
CN112653919A (en) * 2020-12-22 2021-04-13 维沃移动通信有限公司 Subtitle adding method and device
CN112887779A (en) * 2021-01-20 2021-06-01 杭州小众圈科技有限公司 Method, system and device for automatically rolling subtitles based on voice rhythm
CN115484503A (en) * 2021-05-31 2022-12-16 上海幻电信息科技有限公司 Bullet screen generation method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060037545A (en) * 2004-10-28 2006-05-03 주식회사 엔터기술 The mathod of give marks by voice recognition
CN103312897A (en) * 2010-09-02 2013-09-18 联想(北京)有限公司 Mobile terminal and sending processing method thereof
CN103324662A (en) * 2013-04-18 2013-09-25 中国科学院计算技术研究所 Visual method and equipment for dynamic view evolution of social media event
CN104834750A (en) * 2015-05-28 2015-08-12 瞬联软件科技(北京)有限公司 Method for generating character curves
CN104853257A (en) * 2015-04-30 2015-08-19 北京奇艺世纪科技有限公司 Subtitle display method and device
CN106024014A (en) * 2016-05-24 2016-10-12 努比亚技术有限公司 Voice conversion method and device and mobile terminal
CN107450746A (en) * 2017-08-18 2017-12-08 联想(北京)有限公司 A kind of insertion method of emoticon, device and electronic equipment
CN108091324A (en) * 2017-12-22 2018-05-29 北京百度网讯科技有限公司 Tone recognition methods, device, electronic equipment and computer readable storage medium
US10063691B1 (en) * 2016-06-30 2018-08-28 Sorenson Ip Holdings, Llc Detecting dial tone on a telephone line



Similar Documents

Publication Publication Date Title
CN109462768A (en) A kind of caption presentation method and terminal device
CN114205324B (en) Message display method, device, terminal, server and storage medium
CN107707828B (en) A kind of method for processing video frequency and mobile terminal
CN108289244A (en) Video caption processing method, mobile terminal and computer readable storage medium
CN109634700A (en) A kind of the content of text display methods and terminal device of audio
CN108924464A (en) Generation method, device and the storage medium of video file
CN108966004A (en) A kind of method for processing video frequency and terminal
CN107908765B (en) Game resource processing method, mobile terminal and server
CN107864353B (en) A kind of video recording method and mobile terminal
KR101972939B1 (en) Method and apparatus for displaying past chat history
CN109543099A (en) A kind of content recommendation method and terminal device
CN109215007A (en) A kind of image generating method and terminal device
CN113485617B (en) Animation display method and device, electronic equipment and storage medium
CN109032719A (en) A kind of object recommendation method and terminal
CN109215655A (en) The method and mobile terminal of text are added in video
CN109871164A (en) A kind of message method and terminal device
CN108874352A (en) A kind of information display method and mobile terminal
CN110245246A (en) A kind of image display method and terminal device
CN110083737A (en) A kind of searching method and terminal device
CN110379428A (en) A kind of information processing method and terminal device
CN107862059A (en) A kind of song recommendations method and mobile terminal
CN110471589A (en) Information display method and terminal device
CN109815462A (en) A kind of document creation method and terminal device
CN108600079A (en) A kind of chat record methods of exhibiting and mobile terminal
CN109525704A (en) A kind of control method and mobile terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190312