CN106357929A - Previewing method based on audio file and mobile terminal - Google Patents
- Publication number
- CN106357929A (application CN201610991972.0A)
- Authority
- CN
- China
- Prior art keywords
- sound bite
- voice
- voice document
- mobile terminal
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72403—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
- H04M1/7243—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
- H04M1/72433—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/64—Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
- H04M1/65—Recording arrangements for recording a message from the calling party
- H04M1/6505—Recording arrangements for recording a message from the calling party storing speech in digital form
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72403—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
- H04M1/72442—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality for playing music files
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Computer Networks & Wireless Communication (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Multimedia (AREA)
- Telephone Function (AREA)
Abstract
The invention discloses a previewing method based on an audio file, and a mobile terminal. The mobile terminal comprises a segmentation module, an acquiring module and a previewing module. The segmentation module segments the audio file into voice segments, either at blank parts of the audio file or by equal division; the acquiring module determines visual description information for each voice segment according to its content; and the previewing module displays the visual description information of each voice segment as the identifier of the corresponding segment. The technical scheme of the invention enables quick indexed audition and quick clipping of speech, adds functionality to terminal products, and increases the efficiency of finding audio.
Description
Technical field
The present invention relates to the technical field of mobile terminals, and more particularly to a preview method based on a voice file, and a mobile terminal.
Background technology
Current mobile terminals can record very conveniently, but when playing back a long recording, the user still has to retrieve useful content by dragging a slider bar, and empty content in between cannot be effectively excluded. Screening for useful content therefore becomes very inefficient: the slider bar must be dragged and auditioned repeatedly before the desired content is found. And if the user wants to clip out some useful parts, professional tools and a considerable amount of time are typically required.
Summary of the invention
The main objective of the present invention is to propose a preview method based on a voice file, and a mobile terminal, so as to solve the problem in the prior art that previewing a voice file is difficult.
To achieve the above objective, the present invention provides a mobile terminal, comprising:
a segmentation module, configured to segment a voice file into voice segments according to blank parts of the voice file or by equal division;
an acquiring module, configured to determine the visual description information of a voice segment according to the content of the voice segment;
a previewing module, configured to display the visual description information of each voice segment as the identifier of the corresponding voice segment.
Optionally, the segmentation module comprises:
a blank segmenting unit, configured to divide the voice file into voice segments according to blank parts in the voice file that reach a set duration; or,
an equal segmenting unit, configured to divide the voice file into voice segments at a set time interval or according to a set size of each voice segment.
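The two segmentation strategies above can be illustrated with a minimal Python sketch. It assumes mono PCM samples and illustrative silence thresholds; names and values are assumptions for illustration, not part of the patent's disclosure.

```python
def split_on_silence(samples, rate, min_silence_s=1.0, threshold=0.01):
    """Blank segmenting: split mono PCM samples into voice segments,
    using runs of low-amplitude samples of at least min_silence_s
    seconds as segment boundaries."""
    min_run = int(min_silence_s * rate)
    segments, start, silent_run = [], None, 0
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            silent_run += 1
            # Once the silent run reaches the set duration, close the segment.
            if start is not None and silent_run >= min_run:
                segments.append((start, i - silent_run + 1))
                start = None
        else:
            if start is None:
                start = i  # a new voiced segment begins
            silent_run = 0
    if start is not None:
        segments.append((start, len(samples)))
    return segments  # list of (start_index, end_index) pairs

def split_equally(n_samples, rate, interval_s=10.0):
    """Equal segmenting: split into fixed-duration segments."""
    step = int(interval_s * rate)
    return [(i, min(i + step, n_samples)) for i in range(0, n_samples, step)]
```

Note that silences shorter than `min_silence_s` do not split a segment, matching the claim's "blank parts that reach a set duration".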
Optionally, when the segmentation module uses the blank segmenting unit, the previewing module is further configured to: hide the identifiers of the voice segments corresponding to the blank parts.
Optionally, the acquiring module comprises:
a character acquiring unit, configured to convert a voice segment into a corresponding character string, extract a string summary from the character string, and use the string summary as the visual description information of the voice segment; or,
an image acquiring unit, configured to, when the voice file comes from an audio-video file, extract an image frame from the video file corresponding to the voice segment in the audio-video file as the visual description information of the voice segment.
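The character acquiring path above can be sketched as follows, with `transcribe` standing in for any real speech-to-text engine; the summarization rule (first sentence, capped at a character limit) is an illustrative assumption rather than the patent's method.

```python
def summarize_transcript(text, max_chars=20):
    """Derive a short visible label from a voice segment's transcript:
    take the first sentence, truncated to max_chars characters."""
    first = text.split(".")[0].strip()
    return first if len(first) <= max_chars else first[:max_chars] + "..."

def describe_segment(segment_audio, transcribe, max_chars=20):
    """Return the visual description information for one voice segment.
    transcribe(segment_audio) -> str is a hypothetical pluggable engine."""
    return summarize_transcript(transcribe(segment_audio), max_chars)
```

A stub engine such as `lambda audio: "Hello world. Goodbye"` would yield the label `"Hello world"` for the segment.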
Optionally, the visual description information further includes the start and stop time positions of the voice segment within the voice file.
Optionally, the device further comprises:
a processing module, configured to receive and respond to an operation instruction directed at the identifier of each voice segment, where the operation instruction includes: select, deselect, delete, sort, or play.
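The processing module's non-playback operations on segment identifiers can be sketched as a small state holder; the class and field names are illustrative assumptions, not the patent's implementation.

```python
class MarkerList:
    """Holds the on-screen segment identifiers and applies the claimed
    operations: select, deselect, delete, and sort."""

    def __init__(self, markers):
        # each marker: {"id": ..., "start": seconds, "selected": bool}
        self.markers = list(markers)

    def select(self, mid, on=True):
        """Select (or deselect, with on=False) a marker by id."""
        for m in self.markers:
            if m["id"] == mid:
                m["selected"] = on

    def delete(self, mid):
        """Remove a marker by id."""
        self.markers = [m for m in self.markers if m["id"] != mid]

    def sort_by_start(self):
        """Sort markers by their start position in the voice file."""
        self.markers.sort(key=lambda m: m["start"])

    def selected_ids(self):
        """Ids of the currently selected markers, in display order."""
        return [m["id"] for m in self.markers if m["selected"]]
```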
Optionally, the processing module is further configured to:
receive a save instruction for the identifiers of all displayed voice segments, and generate a voice clip file from all the displayed voice segments based on the save instruction.
Optionally, when the operation instruction is play and the voice file comes from an audio-video file, the processing module plays the video file corresponding to the voice segment in the audio-video file together with the voice segment.
In addition, to achieve the above objective, the present invention also proposes a preview method based on a voice file, comprising:
segmenting a voice file into voice segments according to blank parts of the voice file or by equal division;
determining the visual description information of a voice segment according to the content of the voice segment;
displaying the visual description information of each voice segment as the identifier of the corresponding voice segment.
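The three steps of the method can be combined in a self-contained sketch, here using equal division and a stub transcriber; all names are illustrative assumptions. Each resulting marker record carries the label plus the start and stop positions, as the optional claims below also describe.

```python
def equal_segments(duration_s, interval_s):
    """Step 1: split [0, duration_s) into fixed-length segments."""
    t, out = 0.0, []
    while t < duration_s:
        out.append((t, min(t + interval_s, duration_s)))
        t += interval_s
    return out

def make_markers(duration_s, interval_s, transcribe):
    """Steps 2-3: derive a description per segment and attach its
    start/stop position, forming the on-screen marker records.
    transcribe(start, end) -> str is a hypothetical stand-in engine."""
    return [
        {"label": transcribe(a, b), "start": a, "end": b}
        for a, b in equal_segments(duration_s, interval_s)
    ]
```

For a 25-second file split at 10-second intervals, this yields three markers, the last one covering the shorter tail segment.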
Optionally, segmenting the voice file into voice segments according to blank parts of the voice file or by equal division comprises:
dividing the voice file into voice segments according to blank parts in the voice file that reach a set duration; or,
dividing the voice file into voice segments at a set time interval or according to a set size of each voice segment.
Optionally, when the voice file is divided into voice segments according to blank parts that reach a set duration, the method further includes:
hiding the identifiers of the voice segments corresponding to the blank parts.
Optionally, determining the visual description information of the voice segment according to its content comprises:
converting the voice segment into a corresponding character string, extracting a string summary from the character string, and using the string summary as the visual description information of the voice segment; or,
when the voice file comes from an audio-video file, extracting an image frame from the video file corresponding to the voice segment in the audio-video file as the visual description information of the voice segment.
Optionally, the visual description information further includes the start and stop time positions of the voice segment within the voice file.
Optionally, the method further includes:
receiving and responding to an operation instruction directed at the identifier of each voice segment, where the operation instruction includes: select, deselect, delete, sort, or play.
Optionally, the method further includes:
receiving a save instruction for the identifiers of all displayed voice segments, and generating a voice clip file from all the displayed voice segments based on the save instruction.
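The save operation described above can be sketched as concatenating the sample ranges of the displayed segments, in their current display order; writing an actual audio container is omitted, and the function name is an illustrative assumption.

```python
def build_clip(samples, segments):
    """Generate a voice clip from the displayed segments.
    segments: (start, end) sample-index pairs in display order;
    deleted, deselected or blank segments are simply absent."""
    clip = []
    for start, end in segments:
        clip.extend(samples[start:end])
    return clip
```

Because the pairs are taken in display order, a user-applied reordering of the markers carries directly into the generated clip.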
Optionally, when the operation instruction is play and the voice file comes from an audio-video file, the video file corresponding to the voice segment in the audio-video file is played together with the voice segment.
The preview method based on a voice file and the mobile terminal proposed by the present invention break the voice file into fragments and can exclude blank fragments, presenting the visual description information of the voice file's content as bubbles on the screen through a specific trigger mode and a human-computer interaction interface. The user can thus quickly find fragments of interest for audition, select or deselect them by clicking, and permute and combine fragments by sorting, ultimately generating a rearranged and recombined sound clip. Quick indexed audition and quick clipping of speech are thereby achieved, terminal product functionality is extended, and the efficiency of finding audio is improved.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of an optional mobile terminal implementing embodiments of the present invention;
Fig. 2 is a schematic diagram of a wireless communication system for the mobile terminal shown in Fig. 1;
Fig. 3 is a schematic diagram of the mobile terminal of embodiments of the present invention being held by a user;
Fig. 4 is a schematic structural diagram of a mobile terminal according to the first embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another mobile terminal according to the first embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the mobile terminal according to the second embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the mobile terminal according to the third embodiment of the present invention;
Fig. 8 is a flowchart of the preview method based on a voice file according to the fourth embodiment of the present invention;
Fig. 9 is a flowchart of the preview method based on a voice file according to the fifth embodiment of the present invention;
Fig. 10 is a flowchart of the preview method based on a voice file according to the sixth embodiment of the present invention;
Fig. 11 is a schematic diagram of the voice file preview effect of the seventh embodiment of the present invention;
The realization of the objectives, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiments
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
The mobile terminal implementing the embodiments of the present invention will now be described with reference to the drawings. In the following description, suffixes such as "module", "part" or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Therefore, "module" and "part" may be used interchangeably.
Mobile terminals may be implemented in various forms. For example, the terminals described in the present invention may include mobile terminals such as mobile phones, smart phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players) and navigation devices, as well as fixed terminals such as digital TVs and desktop computers. In the following, it is assumed that the terminal is a mobile terminal. However, those skilled in the art will understand that, apart from elements used specifically for mobile purposes, constructions according to the embodiments of the present invention can also be applied to fixed-type terminals.
Fig. 1 is a schematic diagram of the hardware structure of an optional mobile terminal implementing embodiments of the present invention.
The mobile terminal 100 may include a wireless communication unit 110, an A/V (audio/video) input unit 120, a user input unit 130, a sensing unit 140, an output unit 150, a memory 160, an interface unit 170, a controller 180, a power supply unit 190, and so on. Fig. 1 shows a mobile terminal with various components, but it should be understood that not all of the illustrated components are required; more or fewer components may alternatively be implemented. The elements of the mobile terminal are described in detail below.
The wireless communication unit 110 generally includes one or more components that allow radio communication between the mobile terminal 100 and a wireless communication system or network. For example, the wireless communication unit may include at least one of a broadcast receiving module 111, a mobile communication module 112, a wireless internet module 113, a short-range communication module 114 and a location information module 115.
The broadcast receiving module 111 receives broadcast signals and/or broadcast-related information from an external broadcast management server via a broadcast channel. The broadcast channel may include a satellite channel and/or a terrestrial channel. The broadcast management server may be a server that generates and sends broadcast signals and/or broadcast-related information, or a server that receives previously generated broadcast signals and/or broadcast-related information and sends them to a terminal. The broadcast signal may include a TV broadcast signal, a radio broadcast signal, a data broadcast signal and the like, and may further include a broadcast signal combined with a TV or radio broadcast signal. The broadcast-related information may also be provided via a mobile communication network, in which case it may be received by the mobile communication module 112. The broadcast signal may exist in various forms; for example, it may exist in the form of an electronic program guide (EPG) of digital multimedia broadcasting (DMB), an electronic service guide (ESG) of digital video broadcasting-handheld (DVB-H), and so on. The broadcast receiving module 111 can receive signal broadcasts using various types of broadcast systems; in particular, it can receive digital broadcasts using digital broadcast systems such as digital multimedia broadcasting-terrestrial (DMB-T), digital multimedia broadcasting-satellite (DMB-S), digital video broadcasting-handheld (DVB-H), the forward link media (MediaFLO) data broadcast system, and integrated services digital broadcasting-terrestrial (ISDB-T). The broadcast receiving module 111 may be constructed to be suitable for the various broadcast systems providing broadcast signals as well as the above-mentioned digital broadcast systems. Broadcast signals and/or broadcast-related information received via the broadcast receiving module 111 may be stored in the memory 160 (or another type of storage medium).
The mobile communication module 112 sends radio signals to and/or receives radio signals from at least one of a base station (for example, an access point, a Node B, etc.), an external terminal and a server. Such radio signals may include voice call signals, video call signals, or various types of data sent and/or received according to text and/or multimedia messages.
The wireless internet module 113 supports wireless internet access for the mobile terminal and may be internally or externally coupled to the terminal. The wireless internet access technologies involved may include WLAN (wireless LAN, Wi-Fi), WiBro (wireless broadband), WiMAX (worldwide interoperability for microwave access), HSDPA (high-speed downlink packet access) and so on.
The short-range communication module 114 is a module for supporting short-range communication. Some examples of short-range communication technologies include Bluetooth™, radio frequency identification (RFID), Infrared Data Association (IrDA), ultra-wideband (UWB), ZigBee™ and so on.
The location information module 115 is a module for checking or obtaining the location information of the mobile terminal; a typical example is GPS (global positioning system). With current technology, the GPS module 115 calculates distance information from three or more satellites together with accurate time information, and applies triangulation to the calculated information, thereby accurately computing a three-dimensional current position according to longitude, latitude and altitude. At present, the method for calculating position and time information uses three satellites and corrects errors in the calculated position and time information using a further satellite. In addition, the GPS module 115 can calculate speed information by continuously computing the current position in real time.
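The passage above notes that speed can be derived by continuously computing the current position. A minimal sketch of that idea, under the assumption of two timestamped position fixes and the haversine great-circle distance (not the patent's method), is:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0  # mean Earth radius, metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def speed_mps(fix_a, fix_b):
    """Speed between two GPS fixes; each fix is (lat, lon, t_seconds)."""
    (la1, lo1, t1), (la2, lo2, t2) = fix_a, fix_b
    return haversine_m(la1, lo1, la2, lo2) / (t2 - t1)
```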
The A/V input unit 120 is used to receive audio or video signals and may include a camera 121 and a microphone 122. The camera 121 processes image data of still pictures or video obtained by an image capture device in a video capture mode or an image capture mode; the processed image frames may be displayed on the display unit 151. Image frames processed by the camera 121 may be stored in the memory 160 (or another storage medium) or sent via the wireless communication unit 110, and two or more cameras 121 may be provided according to the construction of the mobile terminal. The microphone 122 can receive sound (audio data) in operating modes such as a phone call mode, a recording mode and a speech recognition mode, and can process such sound into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format that can be sent to a mobile communication base station via the mobile communication module 112 for output. The microphone 122 may implement various types of noise elimination (or suppression) algorithms to eliminate (or suppress) noise or interference generated while receiving and sending audio signals.
The user input unit 130 may generate key input data according to commands input by the user to control various operations of the mobile terminal. The user input unit 130 allows the user to input various types of information and may include a keyboard, a dome switch, a touch pad (for example, a touch-sensitive component that detects changes in resistance, pressure, capacitance and the like caused by being touched), a jog wheel, a joystick and so on. In particular, when the touch pad is superimposed on the display unit 151 as a layer, a touch screen may be formed.
The sensing unit 140 detects the current state of the mobile terminal 100 (for example, its open or closed state), the position of the mobile terminal 100, the presence or absence of user contact with the mobile terminal 100 (i.e., touch input), the orientation of the mobile terminal 100, the acceleration or deceleration and direction of movement of the mobile terminal 100, and so on, and generates commands or signals for controlling the operation of the mobile terminal 100. For example, when the mobile terminal 100 is implemented as a slide-type mobile phone, the sensing unit 140 can sense whether the slide-type phone is open or closed. In addition, the sensing unit 140 can detect whether the power supply unit 190 supplies power and whether the interface unit 170 is coupled with an external device. The sensing unit 140 may include a proximity sensor 141 and the like.
The interface unit 170 serves as an interface through which at least one external device can connect with the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and so on. The identification module may store various information for verifying the user's use of the mobile terminal 100 and may include a user identity module (UIM), a subscriber identity module (SIM), a universal subscriber identity module (USIM), and so on. In addition, the device with the identification module (hereinafter referred to as the "identifying device") may take the form of a smart card; therefore, the identifying device can be connected with the mobile terminal 100 via a port or other connection means. The interface unit 170 can be used to receive input (for example, data information, power, etc.) from an external device and transfer the received input to one or more elements within the mobile terminal 100, or can be used to transfer data between the mobile terminal and an external device.
In addition, when the mobile terminal 100 is connected with an external cradle, the interface unit 170 can serve as a path through which power is supplied from the cradle to the mobile terminal 100, or as a path through which various command signals input from the cradle are transferred to the mobile terminal. The various command signals or power input from the cradle may serve as signals for recognizing whether the mobile terminal is accurately mounted on the cradle. The output unit 150 is constructed to provide output signals (for example, audio signals, video signals, alarm signals, vibration signals, etc.) in a visual, audio and/or tactile manner.
The output unit 150 may include a display unit 151, an audio output module 152, an alarm unit 153, and so on.
The display unit 151 may display information processed in the mobile terminal 100. For example, when the mobile terminal 100 is in a phone call mode, the display unit 151 may display a user interface (UI) or graphical user interface (GUI) related to the call or other communication (such as text messaging, multimedia file downloading, etc.). When the mobile terminal 100 is in a video call mode or an image capture mode, the display unit 151 may display captured images and/or received images, a UI or GUI showing video or images and related functions, and so on.
Meanwhile, when the display unit 151 and the touch pad are superimposed on each other as a layer to form a touch screen, the display unit 151 can serve as both an input device and an output device. The display unit 151 may include at least one of a liquid crystal display (LCD), a thin-film transistor LCD (TFT-LCD), an organic light-emitting diode (OLED) display, a flexible display and a three-dimensional (3D) display. Some of these displays may be constructed to be transparent to allow the user to view from the outside; these may be termed transparent displays, and a typical transparent display may be, for example, a TOLED (transparent organic light-emitting diode) display. According to the particular desired embodiment, the mobile terminal 100 may include two or more display units (or other display devices); for example, the mobile terminal may include an external display unit (not shown) and an internal display unit (not shown). The touch screen can be used to detect touch input pressure as well as touch input position and touch input area.
The audio output module 152 may, when the mobile terminal is in a mode such as a call signal receiving mode, a call mode, a recording mode, a speech recognition mode or a broadcast receiving mode, convert audio data received by the wireless communication unit 110 or stored in the memory 160 into an audio signal and output it as sound. Moreover, the audio output module 152 may provide audio output related to a particular function performed by the mobile terminal 100 (for example, a call signal receiving sound, a message receiving sound, etc.). The audio output module 152 may include a speaker, a buzzer and so on.
The alarm unit 153 can provide output to notify the mobile terminal 100 of the occurrence of an event. Typical events may include call reception, message reception, key signal input, touch input and so on. In addition to audio or video output, the alarm unit 153 can provide output in different ways to notify the occurrence of an event. For example, the alarm unit 153 can provide output in the form of vibration; when a call, a message or some other incoming communication is received, the alarm unit 153 can provide a tactile output (i.e., vibration) to notify the user. By providing such tactile output, the user can recognize the occurrence of various events even when the user's mobile phone is in the user's pocket. The alarm unit 153 can also provide output notifying the occurrence of an event via the display unit 151 or the audio output module 152.
The memory 160 may store software programs for processing and control operations executed by the controller 180 and the like, or may temporarily store data that has been or will be output (for example, a phone book, messages, still images, video, etc.). Moreover, the memory 160 may store data on the vibrations and audio signals of various modes output when a touch is applied to the touch screen.
The memory 160 may include at least one type of storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc and so on. Moreover, the mobile terminal 100 may cooperate over a network connection with a network storage device that performs the storage function of the memory 160.
The controller 180 generally controls the overall operation of the mobile terminal. For example, the controller 180 performs control and processing related to voice calls, data communication, video calls and so on. In addition, the controller 180 may include a multimedia module 181 for reproducing (or playing back) multimedia data; the multimedia module 181 may be constructed within the controller 180 or separately from it. The controller 180 can perform pattern recognition processing to recognize handwriting input or drawing input performed on the touch screen as characters or images.
The power supply unit 190 receives external power or internal power under the control of the controller 180 and provides the appropriate power required to operate the various elements and components.
The various embodiments described herein may be implemented in a computer-readable medium using, for example, computer software, hardware or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein; in some cases, such embodiments may be implemented in the controller 180. For a software implementation, embodiments such as procedures or functions may be implemented with separate software modules that allow at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in the memory 160 and executed by the controller 180.
So far, the mobile terminal has been described in terms of its functions. Below, for the sake of brevity, a slide-type mobile terminal among various types of mobile terminals such as folder-type, bar-type, swing-type and slide-type mobile terminals will be described as an example. Accordingly, the present invention can be applied to any type of mobile terminal and is not limited to the slide-type mobile terminal.
The mobile terminal 100 as shown in Fig. 1 may be constructed to operate with communication systems that transmit data via frames or packets, including wired and wireless communication systems as well as satellite-based communication systems.
A communication system in which the mobile terminal according to the present invention is operable will now be described with reference to Fig. 2.
Such communication systems may use different air interfaces and/or physical layers. For example, the air interfaces used by the communication systems include frequency division multiple access (FDMA), time division multiple access (TDMA), code division multiple access (CDMA), universal mobile telecommunications system (UMTS) (in particular, long term evolution (LTE)), global system for mobile communications (GSM) and so on. As a non-limiting example, the following description relates to a CDMA communication system, but such teaching applies equally to other types of systems.
Referring to Fig. 2, the CDMA wireless communication system may include a plurality of mobile terminals 100, a plurality of base stations (BS) 270, base station controllers (BSC) 275 and a mobile switching center (MSC) 280. The MSC 280 is constructed to form an interface with a public switched telephone network (PSTN) 290, and also to form an interface with the BSCs 275, which can be coupled to the base stations 270 via backhaul links. The backhaul links may be constructed according to any of several known interfaces, including, for example, E1/T1, ATM, IP, PPP, frame relay, HDSL, ADSL or xDSL. It will be understood that the system as shown in Fig. 2 may include a plurality of BSCs 275.
Each BS 270 may serve one or more sectors (or regions), with each sector covered by an omnidirectional antenna or an antenna pointing in a particular direction radially away from the BS 270. Alternatively, each sector may be covered by two or more antennas for diversity reception. Each BS 270 may be constructed to support a plurality of frequency assignments, each frequency assignment having a particular spectrum (for example, 1.25 MHz, 5 MHz, etc.). The intersection of a sector and a frequency assignment may be referred to as a CDMA channel. The BS 270 may also be referred to as a base transceiver station (BTS) or other equivalent term. In this case, the term "base station" may be used to broadly denote a single BSC 275 and at least one BS 270. A base station may also be referred to as a "cell site". Alternatively, the individual sectors of a particular BS 270 may be referred to as a plurality of cell sites.
As shown in Fig. 2, a broadcasting transmitter (BT) 295 transmits a broadcast signal to the mobile terminals 100 operating within the system. The broadcast receiving module 111 as shown in Fig. 1 is provided at the mobile terminal 100 to receive broadcast signals transmitted by the BT 295. In Fig. 2, several global positioning system (GPS) satellites 300 are shown. The satellites 300 help locate at least one of the plurality of mobile terminals 100.
Although several satellites 300 are depicted in Fig. 2, it is understood that useful positioning information may be obtained with any number of satellites. The GPS module 115 as shown in Fig. 1 is typically configured to cooperate with the satellites 300 to obtain the desired positioning information. Instead of or in addition to GPS tracking techniques, other technologies capable of tracking the location of the mobile terminal may be used. In addition, at least one GPS satellite 300 may alternatively or additionally handle satellite DMB transmissions.
As one typical operation of the wireless communication system, the BSs 270 receive reverse-link signals from various mobile terminals 100. The mobile terminals 100 typically engage in calls, messaging, and other types of communications. Each reverse-link signal received by a given base station 270 is processed within that BS 270, and the resulting data is forwarded to an associated BSC 275. The BSC provides call resource allocation and mobility management functionality, including the coordination of soft handoff procedures between the BSs 270. The BSCs 275 also route the received data to the MSC 280, which provides additional routing services for interfacing with the PSTN 290. Similarly, the PSTN 290 interfaces with the MSC 280, the MSC interfaces with the BSCs 275, and the BSCs 275 in turn control the BSs 270 to transmit forward-link signals to the mobile terminals 100.
Taking a mobile phone as an example of the mobile terminal, Fig. 3 shows a situation in which a user grips the mobile terminal.
Based on the above mobile terminal hardware structure and communication system, the various embodiments of the present invention are proposed.
As shown in Figs. 4 and 5, a first embodiment of the present invention proposes a mobile terminal, including:
1) a segmentation module 401, configured to segment a voice file into voice segments according to blank parts of the voice file or in an equal-division manner;
2) an acquisition module 402, configured to determine visual description information of the voice segments according to the content of the voice segments;
3) a previewing module 403, configured to display the visual description information of each voice segment as an identifier of the corresponding voice segment, for example, identifying each voice segment in the form of a list or of bubbles.
Optionally, as shown in Fig. 4, the segmentation module 401 includes:
a blank segmenting unit 41, configured to divide the voice file into voice segments according to blank parts in the voice file that reach a set duration. For example, blank parts are identified based on the audio time-domain waveform, and whether a blank part reaches the set duration then determines whether that blank part is used as a basis for segmentation. When the segmentation module 401 uses the blank segmenting unit 41, the previewing module 403 is further configured to hide the identifiers of the voice segments corresponding to the blank parts.
Alternatively, as shown in Fig. 5, the segmentation module 401 includes:
a unit segmenting unit 42, configured to divide the voice file into voice segments at a set time interval or at a set size of individual voice segments.
This embodiment of the present invention uses audio time-domain waveform analysis to break a voice file into segments, excluding blank segments, and presents the visual description information of the voice segment content on screen as identifiers of the voice segments, so that a user can quickly find a voice segment of interest and listen to it, improving the efficiency of audio lookup.
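The equal-division behavior of the unit segmenting unit 42 can be sketched as below. This is an illustrative sketch only; the patent does not prescribe a data representation, and the function name and millisecond units are assumptions.

```python
def split_equally(total_ms: int, interval_ms: int) -> list[tuple[int, int]]:
    """Divide a voice file of total_ms milliseconds into (start, end)
    segments of at most interval_ms each, as the unit segmenting unit
    42 would when a set time interval is used."""
    segments = []
    start = 0
    while start < total_ms:
        end = min(start + interval_ms, total_ms)
        segments.append((start, end))
        start = end
    return segments

# A 65-second file split at a 20-second interval; the last segment is shorter.
print(split_equally(65_000, 20_000))
# → [(0, 20000), (20000, 40000), (40000, 60000), (60000, 65000)]
```

Splitting at a set individual segment size works the same way, with the interval derived from the byte size and the audio bitrate.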
As shown in Fig. 6, a second embodiment of the present invention proposes a mobile terminal, including:
1) a segmentation module 401, configured to segment a voice file into voice segments according to blank parts of the voice file or in an equal-division manner;
2) a character acquisition module 402-a, configured to convert a voice segment into a corresponding character string, extract a character string digest from the character string corresponding to the voice segment, and use the character string digest as the visual description information of the voice segment. It can be understood that, in this embodiment of the present invention, the character acquisition module 402-a is one specific implementation of the acquisition module 402 of the first embodiment.
Optionally, the visual description information further includes: the time start-stop position of the voice segment within the voice file.
3) a previewing module 403, configured to display the visual description information of each voice segment as an identifier of the corresponding voice segment.
Optionally, the device further includes:
4) a processing module 404, configured to receive and respond to operation instructions directed at the identifier of each voice segment, the operation instructions including: select, deselect, delete, sort, or play.
Optionally, the processing module 404 is further configured to:
receive a save instruction directed at the identifiers of all displayed voice segments, and generate a voice clip file from all the displayed voice segments based on the save instruction.
In the case that the operation instruction is play, optionally, the processing module 404 is specifically configured to: if the voice file comes from an audio-video file, play the video file corresponding to the voice segment in the audio-video file together with the voice segment.
This embodiment of the present invention uses audio time-domain waveform analysis to break a voice file into segments, excluding blank segments, and, in cooperation with speech recognition technology, presents the visual description information of the voice segment content on screen as identifiers of the voice segments. A user can thus quickly find a voice segment of interest and listen to it, select and deselect segments by tapping, and rearrange and combine segments by sorting them, finally generating rearranged and recombined voice segments. This achieves quick retrieval and audition of voice as well as quick clipping, adds terminal product functionality, and improves the efficiency of audio lookup.
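The character string digest produced by module 402-a can be sketched as follows. The patent does not fix a summarization method, so this sketch simply truncates the recognized text; the function name, the 12-character limit, and the sample transcript are all illustrative assumptions.

```python
def make_digest(transcript: str, max_chars: int = 12) -> str:
    """Reduce a segment's recognized character string to a short digest
    for display as the segment's identifier (naive truncation; a real
    implementation might use keyword or sentence extraction instead)."""
    text = " ".join(transcript.split())  # collapse stray whitespace
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + "..."

# A hypothetical recognizer result for one voice segment:
print(make_digest("remember to send the quarterly report on Friday"))
# → remember to...
```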
As shown in Fig. 7, a third embodiment of the present invention proposes a mobile terminal, including:
1) a segmentation module 401, configured to segment a voice file into voice segments according to blank parts of the voice file or in an equal-division manner;
2) an image acquisition module 402-b, configured to, in the case that the voice file comes from an audio-video file, extract an image frame from the video file corresponding to the voice segment in the audio-video file as the visual description information of the voice segment. It can be understood that, in this embodiment of the present invention, the image acquisition module 402-b is one specific implementation of the acquisition module 402 of the first embodiment.
Optionally, the visual description information further includes: the time start-stop position of the voice segment within the voice file.
3) a previewing module 403, configured to display the visual description information of each voice segment as an identifier of the corresponding voice segment.
Optionally, the device further includes:
4) a processing module 404, configured to receive and respond to operation instructions directed at the identifier of each voice segment, the operation instructions including: select, deselect, delete, sort, or play.
Optionally, the processing module 404 is further configured to:
receive a save instruction directed at the identifiers of all displayed voice segments, and generate a voice clip file from all the displayed voice segments based on the save instruction.
In the case that the operation instruction is play, optionally, the processing module 404 is specifically configured to: if the voice file comes from an audio-video file, play the video file corresponding to the voice segment in the audio-video file together with the voice segment.
This embodiment of the present invention uses audio time-domain waveform analysis to break a voice file into segments, excluding blank segments, and presents image frames from the video files corresponding to the voice segments on screen as identifiers of the voice segments. A user can thus quickly find a voice segment of interest and listen to it, select and deselect segments by tapping, and rearrange and combine segments by sorting them, finally generating rearranged and recombined voice segments. This achieves quick retrieval and audition of voice as well as quick clipping, adds terminal product functionality, and improves the efficiency of audio lookup.
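Since the patent only requires that *a* frame be extracted from the corresponding video, one simple policy is to take the frame at the segment's midpoint. The following sketch computes that frame index; the midpoint policy and the default frame rate are assumptions, not requirements of the patent.

```python
def preview_frame_index(start_ms: int, end_ms: int, fps: float = 25.0) -> int:
    """Pick the video frame used as a segment's visual identifier:
    the frame at the segment's temporal midpoint, for a video at fps
    frames per second."""
    midpoint_ms = (start_ms + end_ms) // 2
    return int(midpoint_ms * fps / 1000)

# Segment spanning 4 s to 10 s: midpoint 7 s, frame 175 at 25 fps.
print(preview_frame_index(4_000, 10_000))  # → 175
```

A real implementation would then decode that frame from the container with the terminal's media framework and render it as a thumbnail.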
As shown in Fig. 8, a fourth embodiment of the present invention proposes a previewing method based on a voice file, including:
Step s101: segmenting a voice file into voice segments according to blank parts of the voice file or in an equal-division manner;
Step s102: determining visual description information of the voice segments according to the content of the voice segments;
Step s103: displaying the visual description information of each voice segment as an identifier of the corresponding voice segment, for example, identifying each voice segment in the form of a list or of bubbles.
The purpose of the visual description information is to let the user intuitively see, on the terminal screen, the text information in a voice segment or the image frame information of the video clip corresponding to the voice segment, so that the user can quickly search and preview.
Optionally, the segmentation in step s101 specifically includes the following two modes.
First mode: dividing the voice file into voice segments according to blank parts in the voice file that reach a set duration.
Specifically, blank parts are identified based on the audio time-domain waveform of the voice file, and whether a blank part reaches the set duration then determines whether that blank part is used as a basis for segmentation. A blank part refers to a portion of the audio time-domain waveform outside a set audio amplitude range. For example, the audio amplitude range can be set according to the loudness characteristics of human speech: sound below the minimum of the audio amplitude range is likely irrelevant background sound, while sound above the maximum of the audio amplitude range is likely interfering noise. Preferably, in addition to loudness, corresponding amplitude ranges can be set according to the timbre and pitch of the human voice to determine blank parts more accurately.
Second mode: dividing the voice file into voice segments at a set time interval or at a set size of individual voice segments.
Specifically, the user can set the time interval or the size of individual voice segments manually. For example, when the user acts in a particular way (such as a long press on the voice file), a settings interface is triggered and displayed; the user sets the time interval or the segment size manually on that interface, and the background then segments the voice file based on the user's settings. Alternatively, the user sets the time interval or segment size in advance, and when the user acts in the particular way (such as a long press on the voice file), the voice file is segmented directly according to the preset information. In addition to the first mode of segmenting the voice file according to blank parts, this embodiment of the present invention thus also provides a user-defined segmentation mode: the user can segment the voice file coarsely or finely according to his or her own memory of it, so that correspondingly fewer or more items of voice segment visual description information are displayed for the user to search through.
This embodiment of the present invention uses audio time-domain waveform analysis to break a voice file into segments, excluding blank segments, and presents the visual description information of the voice segment content on screen as identifiers of the voice segments, so that a user can quickly find a voice segment of interest and listen to it, improving the efficiency of audio lookup.
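The first mode of step s101 can be sketched over a sequence of amplitude samples as follows. Per the description above, a sample is "blank" when its magnitude falls outside the set audio amplitude range, and only blank runs reaching the set duration become segmentation points. The threshold values and the sample-count duration are illustrative assumptions; the timbre/pitch refinement is not modeled here.

```python
def split_on_blanks(samples, min_amp=500, max_amp=20_000, min_blank_run=3):
    """Segment a voice file by blank parts.  A sample is blank when its
    magnitude is outside [min_amp, max_amp] (below: likely background
    sound; above: likely interfering noise).  Only blank runs of at
    least min_blank_run samples are used as segmentation points, so
    short pauses stay inside their segment.  Returns (start, end) index
    pairs of the non-blank voice segments; blank stretches themselves
    get no segment, matching the hidden-identifier behavior."""
    def is_blank(s):
        return not (min_amp <= abs(s) <= max_amp)

    segments, start, run = [], None, 0
    for i, s in enumerate(samples):
        if is_blank(s):
            run += 1
            if run == min_blank_run and start is not None:
                # Close the segment where this qualifying blank run began.
                segments.append((start, i - min_blank_run + 1))
                start = None
        else:
            if start is None:
                start = i
            run = 0
    if start is not None:
        segments.append((start, len(samples)))
    return segments

# Two voiced stretches separated by a 4-sample blank run; the single
# blank sample inside the first stretch is too short to split on.
print(split_on_blanks([0, 0, 800, 900, 0, 700, 0, 0, 0, 0, 650, 600]))
# → [(2, 6), (10, 12)]
```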
As shown in Fig. 9, a fifth embodiment of the present invention proposes a previewing method based on a voice file, including:
Step s201: segmenting a voice file into voice segments according to blank parts of the voice file or in an equal-division manner.
Specifically, step s201 includes:
dividing the voice file into voice segments according to blank parts in the voice file that reach a set duration; or,
dividing the voice file into voice segments at a set time interval or at a set size of individual voice segments.
Optionally, in the case that the voice file is divided into voice segments according to blank parts in the voice file that reach the set duration, the identifiers of the voice segments corresponding to the blank parts are hidden.
Step s202: determining visual description information of the voice segments according to the content of the voice segments.
Specifically, step s202 includes:
converting a voice segment into a corresponding character string, extracting a character string digest from the character string corresponding to the voice segment, and using the character string digest as the visual description information of the voice segment; or,
in the case that the voice file comes from an audio-video file, extracting an image frame from the video file corresponding to the voice segment in the audio-video file as the visual description information of the voice segment.
Optionally, the visual description information further includes: the time start-stop position of the voice segment within the voice file. In addition to providing the user with visual information about the voice segment content itself, this embodiment of the present invention also provides the start and end time information of each voice segment as an auxiliary reference for the user; for example, the user can find a desired voice segment more quickly from a roughly remembered moment at which the voice occurred. Subsequently, when the user taps a voice segment to play it, only the voice within the range of that segment's time start-stop position in the voice file is played.
Step s203: displaying the visual description information of each voice segment as an identifier of the corresponding voice segment.
This embodiment of the present invention uses audio time-domain waveform analysis to break a voice file into segments, excluding blank segments, and presents on screen the character string information corresponding to the voice segment content, or an image frame from the video file corresponding to the voice segment, as identifiers of the voice segments. A user can thus quickly find a voice segment of interest and listen to it, select and deselect segments by tapping, and rearrange and combine segments by sorting them, finally generating rearranged and recombined voice segments. This achieves quick retrieval and audition of voice as well as quick clipping, adds terminal product functionality, and improves the efficiency of audio lookup.
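The identifier data that the previewing module displays per segment, including the time start-stop position used to bound playback, can be sketched as a small record. Field and method names here are illustrative, not terms from the patent.

```python
from dataclasses import dataclass

@dataclass
class SegmentIdentifier:
    """One displayed identifier: the visual description (a text digest
    or a frame reference) plus the segment's time start-stop position
    within the voice file."""
    description: str
    start_ms: int
    end_ms: int

    def playback_range(self) -> tuple[int, int]:
        # Tapping the identifier plays only this range of the voice file.
        return (self.start_ms, self.end_ms)

seg = SegmentIdentifier("meeting moved to 3 pm", 12_000, 19_500)
print(seg.playback_range())  # → (12000, 19500)
```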
As shown in Fig. 10, a sixth embodiment of the present invention proposes a previewing method based on a voice file, including:
Step s301: segmenting a voice file into voice segments according to blank parts of the voice file or in an equal-division manner.
Specifically, step s301 includes:
dividing the voice file into voice segments according to blank parts in the voice file that reach a set duration; or,
dividing the voice file into voice segments at a set time interval or at a set size of individual voice segments.
Optionally, in the case that the voice file is divided into voice segments according to blank parts in the voice file that reach the set duration, the identifiers of the voice segments corresponding to the blank parts are hidden.
Step s302: determining visual description information of the voice segments according to the content of the voice segments.
Specifically, step s302 includes:
converting a voice segment into a corresponding character string, extracting a character string digest from the character string corresponding to the voice segment, and using the character string digest as the visual description information of the voice segment; or,
in the case that the voice file comes from an audio-video file, extracting an image frame from the video file corresponding to the voice segment in the audio-video file as the visual description information of the voice segment.
Optionally, the visual description information further includes: the time start-stop position of the voice segment within the voice file.
Step s303: displaying the visual description information of each voice segment as an identifier of the corresponding voice segment.
Step s304: receiving and responding to operation instructions directed at the identifier of each voice segment, the operation instructions including: select, deselect, delete, sort, or play.
Optionally, step s304 further includes:
receiving a save instruction directed at the identifiers of all displayed voice segments, and generating a voice clip file from all the displayed voice segments based on the save instruction.
In the case that the operation instruction is play, optionally, if the voice file comes from an audio-video file, the video file corresponding to the voice segment in the audio-video file is played together with the voice segment.
This embodiment of the present invention uses audio time-domain waveform analysis to break a voice file into segments, excluding blank segments, and presents on screen the character string information corresponding to the voice segment content, or an image frame from the video file corresponding to the voice segment, as identifiers of the voice segments. A user can thus quickly find a voice segment of interest and listen to it, select and deselect segments by tapping, and rearrange and combine segments by sorting them, finally generating rearranged and recombined voice segments. This achieves quick retrieval and audition of voice as well as quick clipping, adds terminal product functionality, and improves the efficiency of audio lookup.
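The save instruction of step s304 can be sketched as below: keep only the selected segments, apply the user's sort order, and concatenate the result into one new clip. The dictionary keys and the flat-list "file" are illustrative assumptions; a real implementation would re-encode actual audio data.

```python
def build_clip_file(segments, order):
    """From the displayed voice segments, keep the selected ones in the
    user's chosen order and concatenate them into one voice clip
    (modeled here as a flat list of samples)."""
    chosen = [segments[i] for i in order if segments[i]["selected"]]
    clip = []
    for seg in chosen:
        clip.extend(seg["samples"])
    return clip

segments = [
    {"samples": [1, 2], "selected": True},
    {"samples": [3, 4], "selected": False},  # deselected by the user
    {"samples": [5, 6], "selected": True},
]
# The user dragged segment 2 ahead of segment 0; segment 1 is skipped.
print(build_clip_file(segments, order=[2, 0, 1]))  # → [5, 6, 1, 2]
```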
A seventh embodiment of the present invention proposes an application example of previewing based on a voice file. The innovation of this application example's technical scheme lies in using interactive means such as a touch screen and a pressure-sensitive screen, combined with speech recognition technology and recording, to provide on the human-computer interaction interface a rapid preview method and an arbitrary clip-and-combine method for voice files, which can very efficiently skip blank spaces. When the user long-presses a voice file, or hard-presses the voice file, the background parses the audio time-domain waveform, removes the waveforms of the blank spaces from it, and decomposes the file into several voice segments according to the blank spaces. The voice assistant obtains the audio stream of each voice segment, performs a speech-to-text operation on all the voice segments, and records the time points of each voice segment. On the human-computer interaction interface, the voice segments become several bubbles distributed on the screen; each bubble displays the word digest corresponding to its voice segment, and the start and end times of the voice segment are displayed beside the bubble. Through this bubble-based initial screening, the user can press the bubble of a voice segment to audition and preview it, and can also arbitrarily rearrange and combine the bubbles, including their selected and deselected states, finally piecing together a new, roughly clipped audio file.
The implementation steps of this application example are as follows.
Step a. Voice file segmentation.
When the user reaches the trigger condition in a particular way, for example by a hard press or a long press, the system analyzes the time-domain waveform of the audio and performs an initial screening and sectioning according to the blank fragments.
Step b. Speech-to-text operation on the voice segments.
The voice assistant uploads all voice segment information to the cloud and waits for a callback() function to return the character string associated with each voice segment; meanwhile, the background records the start and end times of all voice segments within the whole voice file.
Step c. Conversion of the voice segments to the human-computer interaction interface.
According to the number of voice segments, the character string corresponding to each voice segment's content, and the start and end times of each voice segment, a corresponding number of bubbles is generated on the human-computer interaction interface, as shown in Fig. 11: each bubble displays a character string digest, and the start time is marked beside the bubble. When the user presses a segment's bubble, a preview of that voice segment is played according to its start and end time information; meanwhile, the user can slide the bubbles around the screen area, arbitrarily arrange, select, deselect, and combine them, and finally generate a preliminarily screened clip file.
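Steps a through c above can be sketched end to end as follows. The recognizer callback stands in for the cloud round-trip and its callback() results; the dictionary keys, the 10-character digest, and the sample transcripts are illustrative assumptions.

```python
def make_bubbles(segment_times, recognize):
    """Given (start_ms, end_ms) pairs from blank-part screening (step a)
    and a speech-to-text function (step b), build the bubble records
    shown on the human-computer interaction interface (step c)."""
    bubbles = []
    for start_ms, end_ms in segment_times:
        text = recognize(start_ms, end_ms)  # cloud round-trip in reality
        bubbles.append({
            "digest": text[:10],     # character string digest inside the bubble
            "start_ms": start_ms,    # start time labelled beside the bubble
            "end_ms": end_ms,        # bounds the audition preview
            "selected": False,       # toggled by tapping the bubble
        })
    return bubbles

# Hypothetical recognizer results for two screened segments:
fake_results = {(0, 4000): "hello everyone", (6000, 9000): "next item"}
bubbles = make_bubbles([(0, 4000), (6000, 9000)],
                       lambda s, e: fake_results[(s, e)])
print([b["digest"] for b in bubbles])  # → ['hello ever', 'next item']
```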
It should be noted that, herein, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements not only includes those elements but also includes other elements not expressly listed, or further includes elements inherent to such a process, method, article, or device. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
The serial numbers of the embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and certainly also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit the scope of the claims of the present invention. Any equivalent structural or flow transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A mobile terminal, characterized in that the mobile terminal includes:
a segmentation module, configured to segment a voice file into voice segments according to blank parts of the voice file or in an equal-division manner;
an acquisition module, configured to determine visual description information of the voice segments according to the content of the voice segments;
a previewing module, configured to display the visual description information of each voice segment as an identifier of the corresponding voice segment.
2. The mobile terminal according to claim 1, characterized in that the segmentation module includes:
a blank segmenting unit, configured to divide the voice file into voice segments according to blank parts in the voice file that reach a set duration; or,
a unit segmenting unit, configured to divide the voice file into voice segments at a set time interval or at a set size of individual voice segments.
3. The mobile terminal according to claim 1, characterized in that the acquisition module includes:
a character acquiring unit, configured to convert a voice segment into a corresponding character string, extract a character string digest from the character string corresponding to the voice segment, and use the character string digest as the visual description information of the voice segment; or,
an image acquiring unit, configured to, in the case that the voice file comes from an audio-video file, extract an image frame from the video file corresponding to the voice segment in the audio-video file as the visual description information of the voice segment.
4. The mobile terminal according to claim 3, characterized in that the visual description information further includes: the time start-stop position of the voice segment within the voice file.
5. The mobile terminal according to any one of claims 1 to 4, characterized in that the mobile terminal further includes:
a processing module, configured to receive and respond to operation instructions directed at the identifier of each voice segment, the operation instructions including: select, deselect, delete, sort, or play; and
to receive a save instruction directed at the identifiers of all displayed voice segments, and generate a voice clip file from the displayed voice segments based on the save instruction.
6. A previewing method based on a voice file, characterized by including:
segmenting a voice file into voice segments according to blank parts of the voice file or in an equal-division manner;
determining visual description information of the voice segments according to the content of the voice segments;
displaying the visual description information of each voice segment as an identifier of the corresponding voice segment.
7. The previewing method based on a voice file according to claim 6, characterized in that segmenting the voice file into voice segments according to blank parts of the voice file or in an equal-division manner includes:
dividing the voice file into voice segments according to blank parts in the voice file that reach a set duration; or,
dividing the voice file into voice segments at a set time interval or at a set size of individual voice segments.
8. The previewing method based on a voice file according to claim 6, characterized in that determining the visual description information of the voice segments according to the content of the voice segments includes:
converting a voice segment into a corresponding character string, extracting a character string digest from the character string corresponding to the voice segment, and using the character string digest as the visual description information of the voice segment; or,
in the case that the voice file comes from an audio-video file, extracting an image frame from the video file corresponding to the voice segment in the audio-video file as the visual description information of the voice segment.
9. The previewing method based on a voice file according to claim 8, characterized in that the visual description information further includes: the time start-stop position of the voice segment within the voice file.
10. The previewing method based on a voice file according to any one of claims 6 to 9, characterized in that the method further includes:
receiving and responding to operation instructions directed at the identifier of each voice segment, the operation instructions including: select, deselect, delete, sort, or play; and
receiving a save instruction directed at the identifiers of all displayed voice segments, and generating a voice clip file from the displayed voice segments based on the save instruction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610991972.0A CN106357929A (en) | 2016-11-10 | 2016-11-10 | Previewing method based on audio file and mobile terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610991972.0A CN106357929A (en) | 2016-11-10 | 2016-11-10 | Previewing method based on audio file and mobile terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106357929A true CN106357929A (en) | 2017-01-25 |
Family
ID=57862896
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610991972.0A Pending CN106357929A (en) | 2016-11-10 | 2016-11-10 | Previewing method based on audio file and mobile terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106357929A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108305622A (en) * | 2018-01-04 | 2018-07-20 | 海尔优家智能科技(北京)有限公司 | A kind of audio summary texts creation method and its creating device based on speech recognition |
CN108874172A (en) * | 2017-05-12 | 2018-11-23 | 北京搜狗科技发展有限公司 | input method and device |
CN110099332A (en) * | 2019-05-21 | 2019-08-06 | 科大讯飞股份有限公司 | A kind of audio environment methods of exhibiting and device |
CN110138654A (en) * | 2019-06-06 | 2019-08-16 | 北京百度网讯科技有限公司 | Method and apparatus for handling voice |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060239130A1 (en) * | 2005-03-30 | 2006-10-26 | Kabushiki Kaisha Toshiba | Information processing apparatus and method |
US20120053937A1 (en) * | 2010-08-31 | 2012-03-01 | International Business Machines Corporation | Generalizing text content summary from speech content |
CN102624647A (en) * | 2012-01-12 | 2012-08-01 | 百度在线网络技术(北京)有限公司 | Method for processing messages of mobile terminal |
CN104078044A (en) * | 2014-07-02 | 2014-10-01 | 深圳市中兴移动通信有限公司 | Mobile terminal and sound recording search method and device of mobile terminal |
CN104157301A (en) * | 2014-07-25 | 2014-11-19 | 广州三星通信技术研究有限公司 | Method, device and terminal deleting voice information blank segment |
CN104599692A (en) * | 2014-12-16 | 2015-05-06 | 上海合合信息科技发展有限公司 | Recording method and device and recording content searching method and device |
2016
- 2016-11-10: CN application CN201610991972.0A filed; published as CN106357929A (en); status: Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108874172A (en) * | 2017-05-12 | 2018-11-23 | 北京搜狗科技发展有限公司 | Input method and device |
CN108305622A (en) * | 2018-01-04 | 2018-07-20 | 海尔优家智能科技(北京)有限公司 | Method and device for creating audio summary text based on speech recognition |
CN110099332A (en) * | 2019-05-21 | 2019-08-06 | 科大讯飞股份有限公司 | Audio environment display method and device |
CN110138654A (en) * | 2019-06-06 | 2019-08-16 | 北京百度网讯科技有限公司 | Method and apparatus for processing speech |
CN110138654B (en) * | 2019-06-06 | 2022-02-11 | 北京百度网讯科技有限公司 | Method and apparatus for processing speech |
US11488603B2 (en) | 2019-06-06 | 2022-11-01 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for processing speech |
Similar Documents
Publication | Title |
---|---|
CN105100892B (en) | Video playing device and method |
CN106603823A (en) | Content sharing method, device and terminal |
CN105653729A (en) | Device and method for indexing sound recording files |
CN106453538A (en) | Screen sharing apparatus and method |
CN106406996A (en) | Management method and device for background application programs |
CN106356065A (en) | Mobile terminal and voice conversion method |
CN105718071A (en) | Terminal and method for recommending associated words in an input method |
CN106371788A (en) | Screen projection connection device and method |
CN106157970A (en) | Audio identification method and terminal |
CN106412339A (en) | Notification message processing method and apparatus |
CN106357929A (en) | Previewing method based on audio file and mobile terminal |
CN106227415A (en) | Icon display method, device and terminal |
CN106909681A (en) | Information processing method and device |
CN105739873A (en) | Screen capturing method and terminal |
CN106372607A (en) | Method for extracting pictures from videos, and mobile terminal |
CN104731508B (en) | Audio playing method and device |
CN106790942A (en) | Intelligent voice message storage method and device |
CN106598538A (en) | Method and system for updating an instruction set |
CN106383707A (en) | Picture display method and system |
CN105183830B (en) | Picture browsing method and device |
CN104915230B (en) | Application control method and device |
CN106453542A (en) | Screen sharing apparatus and method |
CN105049916A (en) | Video recording method and device |
CN104732218B (en) | Image display method and device |
CN106534498A (en) | Control device and method for application folders, and mobile terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 2017-01-25 |