CN105791087A - Media segmentation method, and terminal - Google Patents

Media segmentation method, and terminal

Info

Publication number
CN105791087A
CN105791087A
Authority
CN
China
Prior art keywords
media information
voice content
cutting operation
segmentation
split
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610109802.5A
Other languages
Chinese (zh)
Inventor
金妍敏
钟婉平
郭雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jinli Communication Equipment Co Ltd
Original Assignee
Shenzhen Jinli Communication Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jinli Communication Equipment Co Ltd
Priority to CN201610109802.5A
Publication of CN105791087A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10Multimedia information
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/06Message adaptation to terminal or network requirements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the present invention provides a media segmentation method. The method includes: receiving a segmentation operation for a media message, where the media message carries voice content and the segmentation operation is used to split the media message according to the voice content; obtaining the split position at which the segmentation operation acts on the media message; and determining the split time point, corresponding to the split position, at which the media message is to be split, then splitting the media message according to the voice content at that split time point. The media segmentation method simplifies the user's operation of splitting the media message.

Description

Media segmentation method and terminal
Technical field
The present invention relates to the field of human-computer interaction, and in particular to a media segmentation method and a terminal.
Background technology
Existing instant messaging applications provide a convenient way for people to communicate by voice. As shown in Fig. 1, in a voice chat interface 101, the user can exchange voice messages 102 with contact A. Specifically, when the user wants to send voice to contact A, the user long-presses the recording control 103 to trigger the phone to record; when the user releases the recording control 103, the phone automatically sends the voice message 102, which contains the recorded voice content, to contact A. After receiving the voice message 102, contact A can tap the voice message 102 to play the voice content it contains.
In some application scenarios, the user needs to save the voice content contained in a voice message or share it with other users. Sometimes, the user only wants to save or share part of that voice content. For example, the total duration of the voice content is 60", but the user actually wants to share only the first 30". At present, the user has to re-record the first 30" of content and then forward it to the other users, which is cumbersome and makes for a poor user experience.
Summary of the invention
The embodiments of the present invention provide a media segmentation method and a terminal that split a media message carrying voice content according to a segmentation operation input by the user, which simplifies the user's operation of splitting the media message.
A first aspect of the embodiments of the present invention provides a media segmentation method, the method including:
receiving a segmentation operation for a media message, where the media message carries voice content, and the segmentation operation is used to split the media message according to the voice content;
obtaining the split position at which the segmentation operation acts on the media message;
determining the split time point, corresponding to the split position, at which the media message is to be split, and splitting the media message according to the voice content at that split time point.
A second aspect of the embodiments of the present invention provides a terminal, the terminal including:
an input unit, configured to receive a segmentation operation for a media message, where the media message carries voice content, and the segmentation operation is used to split the media message according to the voice content;
an acquiring unit, configured to obtain the split position at which the segmentation operation acts on the media message;
an analyzing unit, configured to determine the split time point, corresponding to the split position, at which the media message is to be split;
a splitting unit, configured to split the media message according to the voice content at the split time point. Implementing the embodiments of the present invention simplifies the user's operation of splitting the media message: a segmentation operation for a media message carrying voice content is received, where the segmentation operation is used to split the media message according to the voice content; the split position at which the segmentation operation acts on the media message is obtained; the split time point corresponding to the split position is then determined; and the media message is split according to the voice content at that split time point.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
Fig. 1 is a schematic diagram of a chat interface involved in the embodiments of the present invention;
Fig. 2 is a schematic flowchart of the media segmentation method provided by an embodiment of the present invention;
Figs. 3A-3B are schematic diagrams of one operation for splitting a media message provided by an embodiment of the present invention;
Fig. 3C is a schematic diagram of another operation for splitting a media message provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of finding the division position nearest to the segmentation operation, provided by an embodiment of the present invention;
Fig. 5A is a schematic diagram of one method of determining division positions provided by an embodiment of the present invention;
Fig. 5B is a schematic diagram of another method of determining division positions provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a first embodiment of the terminal provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a second embodiment of the terminal provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a third embodiment of the terminal provided by an embodiment of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The present invention can be implemented by a mobile terminal, or by a computing device such as a PC or a network device. The following description takes a mobile terminal as an example.
Preferably, the solution of the present invention can be implemented by an APP installed and running on the mobile terminal. Specifically, the solution of the present invention can be implemented, for example, by an APP that always runs in the background. Further, the solution of the present invention can be integrated into an instant messaging application as a sub-function module.
Here, the mobile terminal includes, but is not limited to, any hand-held electronic product based on a smart operating system that can interact with the user through input devices such as a keyboard, a virtual keyboard, a touchpad, a touch screen, or a voice-control device, for example a smartphone or a tablet computer. The smart operating system includes, but is not limited to, any operating system that enriches device functions by providing various mobile applications, such as Android, iOS, or Windows Phone.
Referring to Fig. 2, which is a flowchart of the media segmentation method provided by an embodiment of the present invention. As shown in Fig. 2, the method includes:
Step S101: receive a segmentation operation for a media message.
Step S103: obtain the split position at which the segmentation operation acts on the media message.
Step S105: determine the split time point, corresponding to the split position, at which the media message is to be split.
Step S107: at the split time point, split the media message according to the voice content.
In the embodiments of the present invention, the media message can be a voice message, or another type of media message carrying voice content, for example a video message; no limitation is imposed here.
In the embodiments of the present invention, the displayed length of the media message is positively correlated with the voice duration of the voice content it carries. For example, as shown in Fig. 1, when the media message is a voice message, a voice message whose voice duration is 58" is displayed longer on the interface than one whose voice duration is 38". Furthermore, it can be understood that if the media message is a video message, then, because image playback is synchronized with voice playback, the voice duration of the voice content carried by the video message can likewise be positively correlated with the displayed length of the video message on the interface. In the embodiments of the present invention, the segmentation operation can be used to split the media message according to the voice content. The split position at which the segmentation operation acts on the media message corresponds to a split time point for splitting the voice content carried by the media message. Generally, the split time point refers to the time point at which the voice content is cut. For example, for a media message whose duration is 58", if the split time point is 25", the media message is cut at the 25" position into two sub-messages, one with a duration of 25" (i.e. 0-25") and the other with a duration of 33" (i.e. 26"-58").
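The cut at a split time point described above can be sketched as follows. This is an illustrative sketch only; the patent prescribes no implementation, and all function and variable names are hypothetical.

```python
def split_durations(total_seconds, split_at):
    """Return the (start, end) second ranges of the two sub-messages
    produced by cutting a message at a given split time point."""
    if not 0 < split_at < total_seconds:
        raise ValueError("split time point must fall strictly inside the message")
    return (0, split_at), (split_at + 1, total_seconds)

# A 58" message split at the 25" point yields a 25" sub-message (0-25")
# and a 33" sub-message (26"-58"), matching the example above.
first, second = split_durations(58, 25)
```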
In specific implementations, the segmentation operation can include a mouse operation, a sliding operation, a motion-sensing operation, and so on. For example, the segmentation operation can be an operation in which the user moves a cursor with the mouse to split the media message, and the intersection of the cursor's motion track with the media message can be the split position. For another example, the segmentation operation can be a sliding operation in which the user splits the media message with a touch point such as a finger or a stylus, and the intersection of the sliding track with the media message can be the split position. For another example, the segmentation operation can be a motion-sensing operation (such as a splitting gesture) with which the user splits the media message, and the intersection of the gesture's motion track, as mapped onto the screen, with the media message can be the split position.
The above examples are only one implementation of the embodiments of the present invention; practical applications may differ, and the examples should not be construed as limiting.
Below in conjunction with Fig. 3 A-3C, for slide, describe the present invention program in detail.
In one implementation of the present invention, as shown in Figs. 3A-3B, the segmentation operation can be a two-point touch operation, i.e. the segmentation operation corresponds to two touch points, for example finger contact point 1 and finger contact point 2 shown in the drawings. It should be noted that the touch points here can also be produced by a stylus or the like, and are not limited to the user's fingers.
In the embodiments of the present invention, if the segmentation operation is a two-point touch operation, the position on the media message vertically aligned with the midpoint of the line joining the two touch points can be determined as the split position.
For example, in the segmentation operation shown in Fig. 3A, finger contact point 1 and finger contact point 2 slide outward, similar to a two-point zoom-in operation. In specific implementations, finger contact point 1 and finger contact point 2 can each be the center point at which a finger contacts the touch device (such as a touch screen). As shown in Fig. 3B, the split position at which the segmentation operation acts on the media message can be the center of the line joining finger contact point 1 and finger contact point 2, i.e. its midpoint.
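The midpoint rule for the two-touch gesture can be sketched as below. This is an illustrative sketch; the coordinate values and names are assumptions, not from the patent.

```python
def midpoint(p1, p2):
    """Midpoint of the line joining two touch points, given as (x, y) pairs."""
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

# Two finger contact points straddling the message bubble; the x coordinate
# of the midpoint gives the horizontal split position on the message.
split_pos = midpoint((100.0, 200.0), (180.0, 240.0))
```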
Because the displayed length of the media message is positively correlated with the voice duration of the voice content corresponding to the media message, the split time point, corresponding to the split position, for splitting the voice content can be determined.
Assume the media message is a voice message and the voice duration of the voice content is T. If the split position is at m/n of the length of the voice message (where the value of m/n is greater than 0 and less than 1), then the split time point corresponding to the split position is T*m/n. For example, as shown in Fig. 3B, the voice duration of the voice content is 50" and the split position is at 7/10 of the length of the voice message, so the split time point corresponding to the split position is (7/10)*50", i.e. 35". In other words, the segmentation operation shown in Figs. 3A and 3B splits the voice content at the 35" position, dividing the voice content contained in the voice message into two audio segments, one with a duration of 35" (i.e. 0-35") and the other with a duration of 15" (i.e. 36"-50").
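The T*m/n mapping from displayed position to split time point can be written out directly. This is an illustrative sketch with hypothetical names.

```python
def split_time_point(voice_duration, m, n):
    """Split time point T*m/n for a split position at fraction m/n of the
    message's displayed length."""
    if not 0 < m < n:
        raise ValueError("m/n must lie strictly between 0 and 1")
    return voice_duration * m / n

# Fig. 3B example: a 50" voice message split at 7/10 of its length gives 35".
t = split_time_point(50, 7, 10)
```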
In specific implementations, if the media message is a video message carrying voice content, the segmentation operation shown in Figs. 3A-3B can also be used to split the video message. For example, if the video message shown in Figs. 3A and 3B carries voice content and the illustrated split position is at 35", then the illustrated segmentation operation can also divide the video corresponding to the video message at the 35" position into two sub-videos, one with a duration of 35" (i.e. 0-35") and the other with a duration of 15" (i.e. 36"-50").
The above examples are used only to explain the embodiments of the present invention and should not be construed as limiting.
It should be noted that, besides the outward sliding direction of finger contact point 1 and finger contact point 2 shown in Figs. 3A and 3B, the embodiments of the present invention impose no limitation on the sliding direction of the two touch points. For example, finger contact point 1 and finger contact point 2 can also slide inward, similar to a two-point zoom-out operation.
In another implementation of the present invention, as shown in Fig. 3C, the segmentation operation can be a one-way sliding operation crossing the media message; the position at which the sliding track of the segmentation operation intersects the media message can then be determined as the split position. For example, in Fig. 3C, the sliding track intersects the media message at 1/2 of the length of the media message, so the split position is located at 1/2 of the length of the media message.
Because the displayed length of the media message is positively correlated with the voice duration of the voice content corresponding to the media message, the split time point, corresponding to the split position, for splitting the voice content can be determined.
As in the implementation shown in Figs. 3A-3B, assume the voice duration of the voice content is T. If the split position is at m/n of the length of the media message (where the value of m/n is greater than 0 and less than 1), then the split time point corresponding to the split position is T*m/n. For example, in Fig. 3C, the voice duration of the voice content is 50" and the split position is located at 1/2 of the length of the media message, so the split time point corresponding to the split position is (1/2)*50", i.e. 25".
It should be noted that if the media message is a video message carrying voice content, the one-way sliding operation shown in Fig. 3C can split not only the voice content carried by the video message but also the video corresponding to the video message.
It should be noted that, besides the one-way sliding direction shown in the drawings for splitting the media message, the segmentation operation involved in the embodiments of the present invention can correspond to any sliding direction; the embodiments of the present invention impose no limitation on this.
Further, in specific implementations, in order to avoid confusing the segmentation operation with an ordinary one-way sliding operation (for example sliding up or down to scroll the chat interface), a one-way sliding operation crossing the media message can be recognized as the segmentation operation only while the media message is in a selected state. Specifically, the user can select the media message by long-pressing it, or in other ways; no limitation is imposed here. It should be noted that, in practical applications, the segmentation operation and an ordinary one-way sliding operation can also be distinguished in other ways, for example by the press pressure during sliding: a one-way sliding operation whose press pressure exceeds a preset pressure threshold can be recognized as the segmentation operation for splitting the media message. The above examples are used only to explain the embodiments of the present invention and should not be construed as limiting.
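The disambiguation rule just described, where a swipe counts as a segmentation operation only if the message is selected or the press pressure exceeds a preset threshold, can be sketched as below. Function names, the pressure scale, and the default threshold are all hypothetical; the patent specifies no API.

```python
def is_split_gesture(media_selected, press_pressure, pressure_threshold=0.5):
    """Recognize a swipe across the message as a segmentation operation only
    when the message is in a selected state (e.g. after a long-press) or the
    press pressure exceeds a preset threshold; otherwise treat the swipe as
    an ordinary scroll."""
    return media_selected or press_pressure > pressure_threshold
```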
In another implementation of the present invention, implementation corresponding for Fig. 3 A-3B implementation corresponding with Fig. 3 C can be applied in the application scenarios that described cutting operation is somatosensory operation.
Specifically, the segmentation operation shown in Figs. 3A-3B can be a motion-sensing operation: the user can input, by gesture, a segmentation operation similar to the zoom-in operation, without directly touching the touch screen, and the terminal can respond to this gesture input and synchronously display on the screen the motion track corresponding to the gesture input. In specific implementations, the split position can be the center point of the line joining the two action points (such as the user's two palms as mapped onto the screen).
Specifically, the split time point corresponding to the split position can be determined with reference to the implementation corresponding to Figs. 3A-3B above, and is not repeated here.
Similarly, the segmentation operation shown in Fig. 3C can also be a motion-sensing operation: the user can input, by gesture, a segmentation operation similar to the one-way slide crossing the media message, without directly touching the touch screen, and the terminal can respond to this gesture input and synchronously display on the screen the motion track corresponding to the gesture input. In specific implementations, the split position can be the position at which this motion track intersects the media message.
Specifically, the split time point corresponding to the split position can be determined with reference to the implementation corresponding to Fig. 3C above, and is not repeated here.
In another implementation of the present invention, implementation corresponding for Fig. 3 C can be applied in the application scenarios that described cutting operation is mouse action.That is, user can be moved by mouse control cursor and produce described cutting operation.Described smooth target motion track can be similar with the cutting operation of described unidirectional slip, intersects with described media information.Concrete, described split position can be the position that described cursor motion track intersects with described media information.
It should be understood that the voice content corresponding to the media message can contain multiple linguistic units. Here, a linguistic unit refers to a unit with independent semantics, for example a word, a phrase, or a sentence. To avoid a split that might destroy the integrity of a statement and impair the user's listening comprehension, the terminal can determine the division positions contained in the voice content. After receiving the segmentation operation, the terminal can find, among the division positions contained in the voice content, the division position nearest to the split position, and split the media message at that nearest division position. A division position can be used to separate two adjacent linguistic units in the voice content. In other words, the split time point corresponding to the split position should fall between two adjacent linguistic units rather than inside a linguistic unit. Here, a division position, like the split time point, is a concept of time.
For example, as shown in Fig. 4, the split time point corresponding to the split position may happen to fall exactly on a division position, or it may fall inside a linguistic unit (such as sentence 4). So as not to impair the intelligibility of the audio after splitting, if the split time point corresponding to the split position falls inside sentence 4, the voice content can be split at division position 3, the division position nearest to the split position. The above example is used only to explain the embodiments of the present invention and should not be construed as limiting.
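Snapping a raw split time point to the nearest division position can be sketched as below. The boundary times are made-up values for illustration; the patent prescribes no implementation.

```python
def snap_to_nearest(split_time, division_positions):
    """Snap a raw split time point to the nearest boundary between
    adjacent linguistic units."""
    return min(division_positions, key=lambda b: abs(b - split_time))

# A raw split point of 17" falling inside a sentence snaps to the nearest
# division position (15") so the sentence is not cut mid-way.
snapped = snap_to_nearest(17.0, [5.0, 9.0, 15.0, 22.0])
```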
Understandably, if the media message is a video message, then, because image playback is synchronized with voice playback, the above method of splitting the media message at the nearest division position can also ensure the integrity of the meaning expressed by the sub-videos obtained after splitting. For example, the complete video segment corresponding to an action, such as a nod, will not be broken apart. The above example is used only to explain the embodiments of the present invention and should not be construed as limiting.
In the embodiments of the present invention, in order to determine the division positions contained in the voice content, the terminal can first identify the volume level at each moment of the voice content. As shown in Figs. 5A-5B, the horizontal axis is time and the vertical axis is volume. Generally, moments of low volume are pauses in speech. Generally, the pause between words is shorter than the pause between phrases, and the pause between phrases is in turn shorter than the pause between sentences.
In one implementation, as shown in Fig. 5A, the terminal can determine the moments at which the volume is lower than a first volume threshold as the division positions. Here, the first volume threshold can be derived empirically and is used to distinguish the pauses between phrases, between words, or between a word and a phrase. With this implementation, a division position close to the split position can be found, so that the actual position at which the voice content is split is closer to the position at which the user performed the segmentation operation.
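The first-threshold rule can be sketched as below. This is an illustrative sketch with made-up volume samples; the sampling scheme and names are assumptions.

```python
def division_positions(volumes, first_volume_threshold):
    """Indices of sample moments whose volume falls below the threshold,
    taken as candidate division positions."""
    return [i for i, v in enumerate(volumes) if v < first_volume_threshold]

# Low-volume moments (indices 2 and 4) become candidate division positions.
pauses = division_positions([0.8, 0.7, 0.1, 0.9, 0.05, 0.6], 0.2)
```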
In another implementation, as shown in Fig. 5B, the terminal can determine as a division position a time period during which the volume is lower than a second volume threshold and whose duration exceeds a time threshold. In this implementation, a division position includes at least two moments. Here, the second volume threshold and the time threshold can be derived empirically and are used to distinguish the pauses between sentences. With this implementation, after the voice content is split, the resulting sub-audio segments are closer to complete sentences, ensuring sentence integrity. It should be noted that, by controlling the size of the time threshold, this implementation can also be used to distinguish the pauses between phrases, between words, or between a word and a phrase. For example, a time threshold of 0.1 second distinguishes the pauses between words; a time threshold of 0.5 second distinguishes the pauses between phrases; a time threshold of 1 second distinguishes the pauses between sentences. The above examples are used only to explain the embodiments of the present invention and should not be construed as limiting.
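The second implementation, which additionally requires the low-volume period to last longer than a time threshold, can be sketched as below. The sample values are made up, and expressing the time threshold as a sample count is an assumption; the patent prescribes no implementation.

```python
def pause_periods(volumes, second_volume_threshold, time_threshold):
    """(start, end) index ranges where volume stays below the threshold for
    at least time_threshold consecutive samples."""
    periods, start = [], None
    for i, v in enumerate(volumes + [second_volume_threshold]):  # sentinel closes a trailing run
        if v < second_volume_threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= time_threshold:
                periods.append((start, i - 1))
            start = None
    return periods

# With a 3-sample time threshold, only the long pause (samples 1-3) qualifies;
# the single low-volume sample at index 5 is too short to be a sentence break.
long_pauses = pause_periods([0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.7], 0.2, 3)
```

Shortening the time threshold (e.g. to one sample) would make the same function pick up the shorter word- or phrase-level pauses, mirroring the 0.1"/0.5"/1" examples above.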
It should be noted that, in practical applications, the division positions contained in the voice content can also be determined by other speech recognition technologies; no limitation is imposed here.
By implementing the embodiments of the present invention, a segmentation operation for a media message carrying voice content is received, where the segmentation operation is used to split the media message according to the voice content; the split position at which the segmentation operation acts on the media message is obtained; the split time point corresponding to the split position is then determined; and the media message is split according to the voice content at that split time point, which simplifies the user's operation of splitting the media message.
Referring to Fig. 6, which is a schematic structural diagram of the first embodiment of the terminal provided by the present invention. The terminal 60 shown in Fig. 6 can include an input unit 601, an acquiring unit 603, an analyzing unit 605, and a splitting unit 607. Specifically:
The input unit 601 is configured to receive a segmentation operation for a media message, where the media message carries voice content, and the segmentation operation is used to split the media message according to the voice content.
The acquiring unit 603 is configured to obtain the split position at which the segmentation operation acts on the media message.
The analyzing unit 605 is configured to determine the split time point, corresponding to the split position, at which the media message is to be split.
The splitting unit 607 is configured to split the media message according to the voice content at the split time point.
In the embodiment of the present invention, the media information may be a speech message, or any other type of media information carrying voice content, for instance a video message; no limitation is imposed here.
In the embodiment of the present invention, the display length of the media information is positively correlated with the voice duration of the voice content it carries. Furthermore, it can be understood that if the media information is a video message, then, owing to the synchronicity of image playback and speech playback, the voice duration of the voice content carried by the video message is likewise positively correlated with the display length of the video message on the interface.
In the embodiment of the present invention, the cutting operation can be used to split the media information according to the voice content. The split position at which the cutting operation acts on the media information corresponds to a sliced time point for splitting the voice content carried by the media information. Generally, the sliced time point refers to the time point at which the voice content is intercepted. For example, for a media information item of 58 seconds, if the sliced time point is 25 seconds, the media information is cut at the 25-second position into 2 sub-media-information segments, one of duration 25 seconds (i.e. 0–25 s) and the other of duration 33 seconds (i.e. 25–58 s).
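The interception arithmetic in this example can be sketched as follows; the helper name is hypothetical and not part of the patent:

```python
def split_durations(total_s: int, cut_s: int) -> tuple[int, int]:
    """Split a media duration (in seconds) at a sliced time point.

    Returns the durations of the two resulting sub-segments.
    Illustrative helper; the name is not from the patent text.
    """
    if not 0 < cut_s < total_s:
        raise ValueError("cut point must fall strictly inside the media duration")
    return cut_s, total_s - cut_s

# The example from the text: a 58-second item cut at 25 seconds
# yields sub-segments of 25 s and 33 s.
print(split_durations(58, 25))  # → (25, 33)
```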
In specific implementations, the cutting operation may include: a mouse operation, a slide operation, a motion-sensing operation, etc. For example, the cutting operation may be an operation in which the user moves a cursor with the mouse to split the media information, in which case the intersection of the cursor's motion track with the media information may be the split position. As another example, the cutting operation may be a slide operation in which the user splits the media information with a touch point such as a finger or a stylus, in which case the intersection of the sliding track with the media information may be the split position. As yet another example, the cutting operation may be a motion-sensing operation (such as a cutting gesture) with which the user splits the media information, in which case the intersection of the gesture's motion track, as mapped onto the screen, with the media information may be the split position.
These examples are only one implementation of the embodiment of the present invention; practical applications may differ, and the examples should not be construed as limiting.
In one implementation of the present invention, if the cutting operation received by the input unit 601 consists of 2 touch operations, the acquiring unit 603 may specifically be configured to: determine, as the split position, the position on the media information corresponding to the perpendicular through the midpoint of the line connecting the 2 touch points. For details of the acquiring unit 603 in this implementation, refer to the related content of the Fig. 2 embodiment, which is not repeated here.
It should be noted that the touch points mentioned here may also be produced by a stylus or the like and are not limited to the user's fingers.
In another implementation of the present invention, if the cutting operation received by the input unit 601 is a unidirectional slide operation intersecting the media information, the acquiring unit 603 may specifically be configured to: determine, as the split position, the position at which the sliding track of the cutting operation intersects the media information. For example, if the sliding track intersects the media information at 1/2 of its length, the split position lies at 1/2 of the length of the media information. For details of the acquiring unit 603 in this implementation, refer to the related content of the Fig. 2 embodiment, which is not repeated here.
Since the display length of the media information is positively correlated with the voice duration of the corresponding voice content, the analytic unit 605 can derive the sliced time point, corresponding to the split position, for splitting the voice content.
Suppose the media information is a speech message and the voice duration of the voice content is T. If the split position lies at m/n of the length of the speech message (where the value of m/n is greater than 0 and less than 1), then the sliced time point corresponding to the split position is: T*m/n.
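Purely as an illustration of the T*m/n relation, the mapping could be sketched as below; the function name and argument layout are assumptions for the sketch:

```python
def sliced_time_point(voice_duration_s: float, m: int, n: int) -> float:
    """Map a split position at m/n of the message's display length to a
    time point in the voice content (the T*m/n relation from the text).

    Requires 0 < m/n < 1, as stated in the description.
    """
    frac = m / n
    if not 0 < frac < 1:
        raise ValueError("split position must lie strictly inside the message")
    return voice_duration_s * frac

# A cut at the midpoint (m/n = 1/2) of a 58-second message.
print(sliced_time_point(58.0, 1, 2))  # → 29.0
```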
In specific implementations, if the media information is a video message carrying voice content, the cutting operation can be used not only to split the voice content carried by the video message but also to split the video corresponding to the video message.
It should be understood that the voice content corresponding to the media information may contain multiple linguistic units. Here, a linguistic unit refers to a unit with independent semantics, for instance a word, a phrase or a sentence.
To avoid splits that might destroy the integrity of a statement and impair the user's listening comprehension, as shown in Figure 7, the terminal 60 may further include a determining unit 609 configured to determine the segmentation positions contained in the voice content, where a segmentation position can be used to divide 2 adjacent linguistic units in the voice content.
Specifically, the cutting unit 607 can be configured to: find, among the segmentation positions contained in the voice content, the segmentation position nearest to the split position, and split the media information at that nearest segmentation position. In other words, the sliced time point corresponding to the split position falls between 2 adjacent linguistic units rather than inside a linguistic unit. Here, a segmentation position, like the sliced time point, is a temporal concept.
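The nearest-boundary behaviour described above can be illustrated with a small sketch; the helper name and the example pause times are hypothetical:

```python
def snap_to_nearest_boundary(cut_s: float, boundaries: list[float]) -> float:
    """Snap the user's cut time to the nearest segmentation position so the
    cut falls between two linguistic units rather than inside one.

    Illustrative helper for the behaviour in the text, not patented code.
    """
    if not boundaries:
        return cut_s  # no known pauses: keep the user's cut as-is
    return min(boundaries, key=lambda b: abs(b - cut_s))

# Pauses detected at 4.2 s, 9.8 s and 15.1 s; the user cuts at 10.5 s.
print(snap_to_nearest_boundary(10.5, [4.2, 9.8, 15.1]))  # → 9.8
```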
It is understandable that if the media information is a video message, then, owing to the synchronicity of image playback and speech playback, the method by which the cutting unit 607 splits the voice content at the nearest segmentation position can also guarantee the integrity of the meaning expressed by the sub-videos obtained after splitting.
As shown in Figure 7, the determining unit 609 may further include: a recognition unit 6091, a first segmentation position determination unit 6093 or a second segmentation position determination unit 6095, wherein:
the recognition unit 6091 can be used to identify the volume level at each moment in the voice content;
the first segmentation position determination unit 6093 can be used to determine, as a segmentation position, a moment at which the volume is below a first volume threshold;
the second segmentation position determination unit 6095 can be used to determine, as a segmentation position, a time period during which the volume is below a second volume threshold and whose duration exceeds a time threshold; such a segmentation position includes at least 2 moments.
It will be appreciated that for the specific implementation of each functional unit in the terminal 60, reference may be made to the detailed content of the Fig. 2 method embodiment, which is not repeated here.
To facilitate implementing the embodiment of the present invention, the present invention provides a terminal for realizing the method described in the Fig. 2 embodiment.
Referring to Fig. 8, the terminal 100 may include: a baseband chip 100, a memory 105 (which may include one or more computer-readable storage media), a radio frequency (RF) module 106, a peripheral system 107, a display (LCD) 113, a camera 114, an audio circuit 115, a touch screen 116 and sensors 117 (which may include one or more sensors). The baseband chip 100 may integrate: one or more processors 101, a clock module 102 and a power management module 103. These components can communicate over one or more communication buses 104.
It should be appreciated that the terminal 100 is only one example of the present invention; the terminal 100 may have more or fewer components than those illustrated, may combine two or more components, or may realize the components in different configurations.
The memory 105 is coupled with the processor 101 and is used to store various software programs and/or sets of instructions. In specific implementations, the memory 105 may include high-speed random access memory, and may also include non-volatile memory, for instance one or more magnetic disk storage devices, flash memory devices or other non-volatile solid-state storage devices.
The radio frequency (RF) module 106 is used to receive and send radio frequency signals, and communicates with communication networks and other communication apparatuses via radio frequency signals. In specific implementations, the RF module 106 may include, but is not limited to: an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chip, a SIM card and storage media, etc. In certain embodiments, the RF module 106 can be realized on a separate chip.
The peripheral system 107 is mainly used to realize the interactive functions between the terminal 100 and the user/external environment. In specific implementations, the peripheral system 107 may include: a display (LCD) controller 108, a camera controller 109, an audio controller 110, a touch screen controller 111 and a sensor management module 112, where each controller can couple with its corresponding peripheral device. In certain embodiments, the peripheral system 107 may also include controllers for other I/O peripherals.
The clock module 102 integrated in the baseband chip 100 is mainly used to produce the clocks required by the processor 101 for data transmission and timing control. The power management module 103 integrated in the baseband chip 100 is mainly used to provide stable, high-accuracy voltage for the processor 101, the RF module 106 and the peripheral system. The processor 101 integrated in the baseband chip 100 is mainly used to call the voice segmentation program stored in the memory 105 and perform the following steps:
receiving, through an input device such as the touch screen 116 or the camera 114, a cutting operation for media information;
obtaining the split position at which the cutting operation acts on the media information;
deriving the sliced time point, corresponding to the split position, for splitting the media information, and splitting the media information according to the voice content at the sliced time point.
In the embodiment of the present invention, the media information may be a speech message, or any other type of media information carrying voice content, for instance a video message; no limitation is imposed here.
In the embodiment of the present invention, the display length of the media information is positively correlated with the voice duration of the corresponding voice content. Furthermore, it can be understood that if the media information is a video message, then, owing to the synchronicity of image playback and speech playback, the voice duration of the voice content carried by the video message is likewise positively correlated with the display length of the video message on the interface.
In the embodiment of the present invention, the cutting operation can be used to split the media information according to the voice content. The split position at which the cutting operation acts on the media information corresponds to a sliced time point for splitting the voice content carried by the media information. Generally, the sliced time point refers to the time point at which the voice content is intercepted. For example, for a media information item of 58 seconds, if the sliced time point is 25 seconds, the media information is cut at the 25-second position into 2 sub-media-information segments, one of duration 25 seconds (i.e. 0–25 s) and the other of duration 33 seconds (i.e. 25–58 s).
In specific implementations, the cutting operation may include: a mouse operation, a slide operation, a motion-sensing operation, etc. For example, the cutting operation may be a cursor-move operation for splitting the media information, received by the processor 101 via the mouse, in which case the intersection of the cursor's motion track with the media information may be the split position. As another example, the cutting operation may be a touch slide operation for splitting the media information, received by the processor 101 via the touch screen 116, in which case the intersection of the sliding track with the media information may be the split position. As yet another example, the cutting operation may be a motion-sensing operation (such as a cutting gesture) for splitting the media information, received by the processor 101 via the camera 114, in which case the intersection of the gesture's motion track, as mapped onto the screen, with the media information may be the split position.
These examples are only one implementation of the embodiment of the present invention; practical applications may differ, and the examples should not be construed as limiting.
In one implementation of the present invention, the cutting operation may consist of 2 touch operations, i.e. the cutting operation corresponds to 2 touch points. The touch points mentioned here may also be produced by a stylus or the like and are not limited to the user's fingers.
In the embodiment of the present invention, if the cutting operation consists of 2 touch operations, the processor 101 may determine, as the split position, the position on the media information corresponding to the perpendicular through the midpoint of the line connecting the 2 touch points.
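The midpoint-perpendicular rule can be illustrated as follows, assuming a horizontally laid-out message bubble; the function name and the pixel coordinates are illustrative assumptions:

```python
def split_x_from_two_touches(p1, p2):
    """Project the midpoint of the line joining two touch points onto the
    horizontal axis of the message: for a horizontally laid-out message,
    the perpendicular through that midpoint crosses the media information
    at this x-coordinate. Points are (x, y) tuples in pixels; hypothetical
    helper, not patented code.
    """
    (x1, _y1), (x2, _y2) = p1, p2
    return (x1 + x2) / 2

# Two fingers at x = 120 and x = 200 cut the message at x = 160.
print(split_x_from_two_touches((120, 40), (200, 80)))  # → 160.0
```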
In another implementation of the present invention, the cutting operation may be a unidirectional slide operation intersecting the media information, in which case the processor 101 may determine, as the split position, the position at which the sliding track of the cutting operation intersects the media information.
It should be noted that, except for a unidirectional slide parallel to the media information (which cannot intersect it), the cutting operation involved in the embodiments of the present invention may correspond to any sliding direction; the embodiment of the present invention imposes no limitation.
Further, in specific implementations, to avoid confusing the cutting operation with an ordinary unidirectional slide operation (such as sliding the chat interface up or down), the processor 101 may treat a unidirectional slide operation intersecting the media information as the cutting operation only while the media information is in a selected state. Specifically, the user may select the media information by a long press, or by other means; no limitation is imposed here.
It should be noted that, in practical applications, the processor 101 may also distinguish the cutting operation from an ordinary unidirectional slide operation by other means, for instance by the press pressure during the slide: the processor 101 may recognize a unidirectional slide operation whose press pressure exceeds a preset pressure threshold as the cutting operation for splitting the media information. These examples merely explain the embodiment of the present invention and should not be construed as limiting.
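The pressure-based disambiguation could look roughly like the sketch below; the normalized pressure scale, the threshold value and all names are assumptions, not taken from the patent:

```python
def classify_slide(pressure: float, crosses_message: bool,
                   pressure_threshold: float = 0.6) -> str:
    """Treat a unidirectional slide as a cutting operation only when its
    press pressure exceeds a preset threshold AND its track crosses the
    media information; otherwise treat it as ordinary scrolling.

    Pressure is assumed normalized to [0, 1]; illustrative sketch only.
    """
    if crosses_message and pressure > pressure_threshold:
        return "cutting operation"
    return "ordinary slide"

print(classify_slide(0.9, True))   # hard press across the bubble
print(classify_slide(0.3, True))   # light press: just scrolling
```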
In specific implementations, if the media information is a video message carrying voice content, the cutting operation involved in the embodiments of the present invention can be used not only to split the voice content carried by the video message but also to split the video corresponding to the video message.
It should be understood that the voice content carried by the media information may contain multiple linguistic units. Here, a linguistic unit refers to a unit with independent semantics, for instance a word, a phrase or a sentence. To avoid splits that might destroy the integrity of a statement and impair the user's listening comprehension, the processor 101 may determine the segmentation positions contained in the voice content; after receiving the cutting operation, the processor 101 may find, among those segmentation positions, the one nearest to the split position and split the media information at that nearest segmentation position. A segmentation position can be used to divide 2 adjacent linguistic units in the voice content. In other words, the sliced time point corresponding to the split position falls between 2 adjacent linguistic units rather than inside a linguistic unit. Here, a segmentation position, like the sliced time point, is a temporal concept.
It is understandable that if the media information is a video message, then, owing to the synchronicity of image playback and speech playback, the method by which the processor 101 splits the media information at the nearest segmentation position can also guarantee the integrity of the meaning expressed by the sub-videos obtained after splitting.
In one implementation, the processor 101 may determine, as a segmentation position, a moment at which the volume is below a first volume threshold. Here, the first volume threshold can be derived empirically and serves to distinguish pauses between phrases, pauses between words, or pauses between a word and a phrase. With this implementation, a segmentation position close to the split position can be found, so that the actual position at which the voice content is split more closely matches the position at which the user performed the cutting operation.
In another implementation, the processor 101 may determine as a segmentation position any time period during which the volume stays below a second volume threshold and whose duration exceeds a time threshold. In this implementation, a segmentation position includes at least 2 moments. Here, the second volume threshold and the time threshold can be derived empirically and serve to distinguish the pauses between sentences. With this implementation, the sub-audio segments obtained after the voice content is split are closer to complete sentences, which preserves sentence integrity. It should be noted that, by adjusting the size of the time threshold, this implementation can also be used to distinguish pauses between phrases, pauses between words, or pauses between a word and a phrase. For example, a time threshold of 0.1 seconds distinguishes pauses between words; 0.5 seconds distinguishes pauses between phrases; 1 second distinguishes pauses between sentences. These examples merely explain the embodiment of the present invention and should not be construed as limiting.
It will be appreciated that for the steps performed by the processor 101, reference may also be made to the specific implementations in the above Fig. 2 method embodiment, which are not repeated here.
In summary, by implementing the embodiment of the present invention — receiving a cutting operation for media information that carries voice content, where the cutting operation is used to split the media information according to the voice content; obtaining the split position at which the cutting operation acts on the media information; then deriving the sliced time point corresponding to the split position and splitting the media information according to the voice content at that sliced time point — the user's operation of splitting the media information can be simplified.
The modules or submodules in all embodiments of the present invention can be realized by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application-Specific Integrated Circuit).
The sequence of the steps of the method of the embodiment of the present invention can be adjusted, merged or deleted according to actual needs. The units and modules of the terminal of the embodiment of the present invention can be combined, further divided or deleted according to actual needs.
A person of ordinary skill in the art will appreciate that all or part of the flows in the above method embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of each of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), etc.
What is disclosed above is only a preferred embodiment of the present invention and certainly cannot limit the scope of the rights of the present invention; equivalent variations made according to the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (10)

1. A media segmentation method, characterized by comprising:
receiving a cutting operation for media information, wherein the media information carries voice content and the cutting operation is used to split the media information according to the voice content;
obtaining the split position at which the cutting operation acts on the media information;
deriving the sliced time point, corresponding to the split position, for splitting the media information, and splitting the media information according to the voice content at the sliced time point.
2. The method of claim 1, characterized by further comprising:
determining segmentation positions contained in the voice content, wherein a segmentation position is used to divide 2 adjacent linguistic units in the voice content;
wherein said splitting the media information according to the voice content comprises:
finding, among the segmentation positions contained in the voice content, the segmentation position nearest to the split position;
splitting the media information at the nearest segmentation position.
3. The method of claim 2, characterized in that said determining segmentation positions contained in the voice content comprises:
identifying the volume level at each moment in the voice content;
determining, as a segmentation position, a moment at which the volume is below a first volume threshold, or determining, as a segmentation position, a time period during which the volume is below a second volume threshold and whose duration exceeds a time threshold, such a segmentation position including at least 2 moments.
4. The method of claim 1, characterized in that said obtaining the split position at which the cutting operation acts on the media information comprises:
if the cutting operation corresponds to 2 touch points, determining, as the split position, the position on the media information corresponding to the perpendicular through the midpoint of the line connecting the 2 touch points.
5. The method of claim 1, characterized in that said obtaining the split position at which the cutting operation acts on the media information comprises:
if the cutting operation is a unidirectional slide operation intersecting the media information, determining, as the split position, the position at which the sliding track of the cutting operation intersects the media information.
6. A terminal, characterized by comprising:
an input unit, configured to receive a cutting operation for media information, wherein the media information carries voice content and the cutting operation is used to split the media information according to the voice content;
an acquiring unit, configured to obtain the split position at which the cutting operation acts on the media information;
an analytic unit, configured to derive the sliced time point, corresponding to the split position, for splitting the media information;
a cutting unit, configured to split the media information according to the voice content at the sliced time point.
7. The terminal of claim 6, characterized by further comprising: a determining unit, configured to determine segmentation positions contained in the voice content, wherein a segmentation position is used to divide 2 adjacent linguistic units in the voice content;
the cutting unit being specifically configured to: find, among the segmentation positions contained in the voice content, the segmentation position nearest to the split position; and split the media information at the nearest segmentation position.
8. The terminal of claim 7, characterized in that the determining unit specifically includes: a recognition unit, a first segmentation position determination unit or a second segmentation position determination unit, wherein:
the recognition unit is configured to identify the volume level at each moment in the voice content;
the first segmentation position determination unit is configured to determine, as a segmentation position, a moment at which the volume is below a first volume threshold;
the second segmentation position determination unit is configured to determine, as a segmentation position, a time period during which the volume is below a second volume threshold and whose duration exceeds a time threshold, such a segmentation position including at least 2 moments.
9. The terminal of claim 6, characterized in that the acquiring unit is specifically configured to: if the cutting operation corresponds to 2 touch points, determine, as the split position, the position on the media information corresponding to the perpendicular through the midpoint of the line connecting the 2 touch points.
10. The terminal of claim 6, characterized in that the acquiring unit is specifically configured to: if the cutting operation is a unidirectional slide operation intersecting the media information, determine, as the split position, the position at which the sliding track of the cutting operation intersects the media information.
CN201610109802.5A 2016-02-27 2016-02-27 Media segmentation method, and terminal Pending CN105791087A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610109802.5A CN105791087A (en) 2016-02-27 2016-02-27 Media segmentation method, and terminal


Publications (1)

Publication Number Publication Date
CN105791087A true CN105791087A (en) 2016-07-20

Family

ID=56403901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610109802.5A Pending CN105791087A (en) 2016-02-27 2016-02-27 Media segmentation method, and terminal

Country Status (1)

Country Link
CN (1) CN105791087A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106373581A (en) * 2016-09-28 2017-02-01 成都奥克特科技有限公司 Data encoding processing method for speech signals
CN108446389A (en) * 2018-03-22 2018-08-24 平安科技(深圳)有限公司 Speech message searching and displaying method, device, computer equipment and storage medium
CN109428805A (en) * 2017-08-29 2019-03-05 阿里巴巴集团控股有限公司 Audio message processing method and equipment in instant messaging

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996627A (en) * 2009-08-21 2011-03-30 索尼公司 Speech processing apparatus, speech processing method and program
CN104394422A (en) * 2014-11-12 2015-03-04 华为软件技术有限公司 Video segmentation point acquisition method and device
CN104519401A (en) * 2013-09-30 2015-04-15 华为技术有限公司 Video division point acquiring method and equipment
CN105119803A (en) * 2015-07-10 2015-12-02 小米科技有限责任公司 Processing method and device of voice message
CN105161094A (en) * 2015-06-26 2015-12-16 徐信 System and method for manually adjusting cutting point in audio cutting of voice



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160720