CN106385548A - Mobile terminal and method for generating video captions - Google Patents

Mobile terminal and method for generating video captions

Info

Publication number
CN106385548A
Authority
CN
China
Prior art keywords
talker
file
name
video file
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610801534.3A
Other languages
Chinese (zh)
Inventor
李林兴
陈鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nubia Technology Co Ltd filed Critical Nubia Technology Co Ltd
Priority to CN201610801534.3A priority Critical patent/CN106385548A/en
Publication of CN106385548A publication Critical patent/CN106385548A/en
Pending legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278Subtitling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/93Document management systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72439User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Telephone Function (AREA)

Abstract

The present invention provides a mobile terminal. The mobile terminal comprises a speech recognition module, a caption generation module and a face recognition module. The speech recognition module is configured to identify the speech content in a video file through a speech recognition function; the caption generation module is configured to convert the speech content into text content and make the text content into a caption file; and the face recognition module is configured to determine the speaker's name in the video file through a face recognition function and mark the caption file with the corresponding speaker's name. The present invention further provides a method for generating video captions. The mobile terminal and the method for generating video captions reduce the time users spend searching for related video files, and the caption files marked with speaker names can serve as written records of the video files, making it easier for users to look up video content.

Description

Mobile terminal and method for generating video captions
Technical field
The present invention relates to the field of communications, and more particularly to a mobile terminal and a method for generating video captions.
Background art
With the popularization of electronic equipment having a camera function, more and more people use the camera function to record footage and obtain video file data. However, existing electronic equipment cannot provide corresponding captions after the video file data is obtained, and so cannot meet users' need to read captions. After a meeting is recorded, the speech content of each speaker cannot be distinguished, and no detailed meeting-minutes document can be provided.
Content of the invention
The present invention provides a mobile terminal that can generate a caption file for a video file and distinguish the speech content of the speakers in the video file. The mobile terminal comprises:
a speech recognition module, configured to identify the speech content in a video file through a speech recognition function;
a caption generation module, configured to convert the speech content into text content and make the text content into a caption file; and
a face recognition module, configured to determine the facial features of a speaker in the video file through a face recognition function, determine the speaker's name according to the facial features, and mark the caption file with the corresponding speaker's name.
Further, the speech recognition module is also configured to identify the voice features of a speaker through the speech recognition function, determine the speaker's name in the video file according to the voice features, and mark the caption file with the corresponding speaker's name.
Further, the mobile terminal also comprises:
a retrieval module, configured to obtain a keyword, retrieve the text passages related to the keyword from the text content, and find the video file corresponding to the text passages.
Further, the mobile terminal also comprises:
a document creation module, configured to arrange the caption file, the time at which the caption file occurs in the video file, and the speaker name corresponding to the caption file into document information; and
a display module, configured to display the video file and the caption file synchronously.
Further, the mobile terminal also comprises:
a control module, configured to receive a first control signal, close the face recognition function and open the speech recognition function according to the first control signal, so that the speaker's name in the video file is determined through the speech recognition function; and to receive a second control signal, close the speech recognition function and open the face recognition function according to the second control signal, so that the speaker's name in the video file is determined through the face recognition function.
The mobile terminal provided by the present invention identifies the speech content in a video file through a speech recognition function, converts the speech content into text content, organizes the text content into a caption file, displays the caption file synchronously with the video file, determines the names of the speakers in the video file, and marks the speaker's name in the caption file corresponding to that speaker's speech content. This makes it convenient for users to read the caption content of the video file, allows the speech content of each speaker to be distinguished after a video is recorded, provides a detailed written record, and simplifies the workflow of transcribing a video file.
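As an illustration only, the caption file described above can be thought of as a sequence of entries, each carrying the text, the time it occupies in the video, and the speaker name marked by the recognition modules. The following Python sketch is not part of the patent; the CaptionEntry structure and the SRT-style serialization are assumptions chosen purely to make the idea concrete.

```python
from dataclasses import dataclass

@dataclass
class CaptionEntry:
    """One caption: text content plus its time span and the marked speaker name."""
    start_s: float      # position in the video file, in seconds
    end_s: float
    text: str           # text converted from the speech content
    speaker: str = ""   # name marked by the face/speech recognition module

def format_timestamp(seconds: float) -> str:
    """Render seconds as the HH:MM:SS,mmm form used by SRT subtitles."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_srt(entries: list[CaptionEntry]) -> str:
    """Serialize caption entries to SRT text, prefixing each line with the speaker name."""
    blocks = []
    for i, e in enumerate(entries, start=1):
        line = f"{e.speaker}: {e.text}" if e.speaker else e.text
        blocks.append(f"{i}\n{format_timestamp(e.start_s)} --> {format_timestamp(e.end_s)}\n{line}\n")
    return "\n".join(blocks)
```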
The present invention also provides a method for generating video captions, which can generate a caption file for a video file and distinguish the speech content of the speakers in the video file. The method for generating video captions comprises:
identifying the speech content in a video file through a speech recognition function;
converting the speech content into text content, and making the text content into a caption file; and
determining the facial features of a speaker in the video file through a face recognition function, determining the speaker's name according to the facial features, and marking the caption file with the corresponding speaker's name.
Further, the method for generating video captions also comprises:
identifying the voice features of a speaker through the speech recognition function, determining the speaker's name in the video file according to the voice features, and marking the caption file with the corresponding speaker's name.
Further, the method for generating video captions also comprises:
obtaining a keyword, retrieving the text passages related to the keyword from the text content, and finding the video file corresponding to the text passages.
Further, the method for generating video captions also comprises:
arranging the caption file, the time at which the caption file occurs in the video file, and the speaker name corresponding to the caption file into document information; and
displaying the video file and the caption file synchronously.
Further, the method for generating video captions also comprises:
receiving a first control signal, closing the face recognition function and opening the speech recognition function according to the first control signal, and determining the speaker's name in the video file through the speech recognition function; and
receiving a second control signal, closing the speech recognition function and opening the face recognition function according to the second control signal, and determining the speaker's name in the video file through the face recognition function.
The method for generating video captions provided by the present invention identifies the speech content in a video file through a speech recognition function, converts the speech content into text content, organizes the text content into a caption file, displays the caption file synchronously with the video file, determines the names of the speakers in the video file, and marks the speaker's name in the caption file corresponding to that speaker's speech content. This makes it convenient for users to read the caption content of the video file, allows the speech content of each speaker to be distinguished after a video is recorded, provides a detailed written record, and simplifies the workflow of transcribing a video file.
Brief description of the drawings
Fig. 1 is a hardware architecture diagram of a mobile terminal implementing embodiments of the present invention;
Fig. 2 is a schematic diagram of a wireless communication system for the mobile terminal shown in Fig. 1;
Fig. 3 is a functional block diagram of a mobile terminal according to a first embodiment of the present invention;
Fig. 4 is a functional block diagram of a mobile terminal according to a second embodiment of the present invention;
Fig. 5 is an application environment diagram of a mobile terminal according to a third embodiment of the present invention;
Fig. 6 is a flowchart of a method for generating video captions according to a fourth embodiment of the present invention;
Fig. 7 is a flowchart of a method for generating video captions according to a fifth embodiment of the present invention.
The realization of the objects, functional characteristics and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiments
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
A mobile terminal implementing embodiments of the present invention will now be described with reference to the drawings. In the following description, the suffixes used to denote elements, such as "module", "part" or "unit", are used only to facilitate the description of the present invention and have no specific meaning of their own. Therefore, "module" and "part" may be used interchangeably.
Mobile terminals may be implemented in various forms. For example, the terminals described in the present invention may include mobile terminals such as mobile phones, smart phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players) and navigation devices, as well as fixed terminals such as digital TVs and desktop computers. In the following it is assumed that the terminal is a mobile terminal. However, those skilled in the art will understand that, apart from elements used particularly for mobile purposes, the construction according to the embodiments of the present invention can also be applied to terminals of the fixed type.
Fig. 1 is a hardware architecture diagram of a mobile terminal implementing embodiments of the present invention.
The mobile terminal 10 may include, but is not limited to, a memory 20, a controller 30, a wireless communication unit 40, an output unit 50, an input unit 60, a camera 70, a microphone 71, an interface unit 80 and a power supply unit 90. Fig. 1 shows the mobile terminal 10 with various components, but it should be understood that not all of the illustrated components are required; more or fewer components may alternatively be implemented. The elements of the mobile terminal 10 will be described in more detail below.
The wireless communication unit 40 typically includes one or more components that allow radio communication between the mobile terminal 10 and a wireless communication system or network. For example, the wireless communication unit may include at least one of a broadcast receiving module, a mobile communication module, a wireless internet module, a short-range communication module and a location information module.
The broadcast receiving module receives broadcast signals and/or broadcast-related information from an external broadcast management server via a broadcast channel. The broadcast channel may include a satellite channel and/or a terrestrial channel. The broadcast management server may be a server that generates and transmits broadcast signals and/or broadcast-related information, or a server that receives previously generated broadcast signals and/or broadcast-related information and transmits them to a terminal. The broadcast signal may include a TV broadcast signal, a radio broadcast signal, a data broadcast signal and the like, and may further include a broadcast signal combined with a TV or radio broadcast signal. The broadcast-related information may also be provided via a mobile communication network, in which case it may be received by the mobile communication module. The broadcast signal may exist in various forms, for example in the form of an electronic program guide (EPG) of digital multimedia broadcasting (DMB), an electronic service guide (ESG) of digital video broadcasting-handheld (DVB-H), and so on. The broadcast receiving module may receive broadcast signals by using various types of broadcast systems. In particular, it may receive digital broadcasts by using digital broadcast systems such as digital multimedia broadcasting-terrestrial (DMB-T), digital multimedia broadcasting-satellite (DMB-S), digital video broadcasting-handheld (DVB-H), the data broadcasting system of media forward link only (MediaFLO), integrated services digital broadcasting-terrestrial (ISDB-T), and so on. The broadcast receiving module may be constructed to be suitable for the various broadcast systems providing broadcast signals as well as the above-mentioned digital broadcast systems. Broadcast signals and/or broadcast-related information received via the broadcast receiving module may be stored in the memory 20 (or another type of storage medium).
The mobile communication module transmits radio signals to and/or receives radio signals from at least one of a base station (e.g. an access point, a Node B and the like), an external terminal and a server. Such radio signals may include voice call signals, video call signals, or various types of data transmitted and/or received according to text and/or multimedia messages.
The wireless internet module supports wireless internet access for the mobile terminal. This module may be internally or externally coupled to the terminal. The wireless internet access technology involved in this module may include WLAN (wireless LAN, Wi-Fi), WiBro (wireless broadband), WiMAX (worldwide interoperability for microwave access), HSDPA (high-speed downlink packet access) and so on.
The short-range communication module is a module for supporting short-range communication. Some examples of short-range communication technologies include Bluetooth™, radio frequency identification (RFID), Infrared Data Association (IrDA), ultra-wideband (UWB), ZigBee™ and so on.
The location information module is a module for checking or obtaining the location information of the mobile terminal. A typical example of the location information module is a GPS (global positioning system) module. According to current technology, the GPS module calculates distance information from three or more satellites and accurate time information, and applies triangulation to the calculated information, thereby accurately calculating three-dimensional current location information according to longitude, latitude and altitude. Currently, the method for calculating position and time information uses three satellites and corrects errors in the calculated position and time information by using a further satellite. In addition, the GPS module can calculate speed information by continuously calculating current location information in real time.
The output unit 50 is configured to provide output signals (e.g. audio signals, video signals, alarm signals, vibration signals and the like) in a visual, audible and/or tactile manner. The output unit 50 may include a display unit 51, an audio output module 52, an alarm unit 53 and so on.
The display unit 51 may display information processed in the mobile terminal 10. For example, when the mobile terminal 10 is in a phone call mode, the display unit 51 may display a user interface (UI) or a graphical user interface (GUI) related to the call or other communication (e.g. text messaging, multimedia file downloading and the like). When the mobile terminal 10 is in a video call mode or an image capture mode, the display unit 51 may display captured and/or received images, and a UI or GUI showing the video or image and the related functions, and so on.
Meanwhile, when the display unit 51 and a touch pad are superposed on each other as layers to form a touch screen, the display unit 51 can serve as both an input device and an output device. The display unit 51 may include at least one of a liquid crystal display (LCD), a thin film transistor LCD (TFT-LCD), an organic light-emitting diode (OLED) display, a flexible display, a three-dimensional (3D) display and the like. Some of these displays may be constructed to be transparent to allow viewing from the outside; these may be called transparent displays, a typical example being a TOLED (transparent organic light-emitting diode) display. According to particular embodiments, the mobile terminal 10 may include two or more display units (or other display devices); for example, the mobile terminal may include an external display unit (not shown) and an internal display unit (not shown). The touch screen may be used to detect the pressure of a touch input as well as the position and area of the touch input.
The audio output module 52 may, when the mobile terminal is in a call signal receiving mode, a call mode, a recording mode, a speech recognition mode, a broadcast receiving mode and the like, convert audio data received by the wireless communication unit 40 or stored in the memory 20 into an audio signal and output it as sound. Moreover, the audio output module 52 may provide audio output related to a particular function performed by the mobile terminal 10 (e.g. a call signal receiving sound, a message receiving sound and the like). The audio output module 52 may include a speaker, a buzzer and so on.
The alarm unit 53 may provide an output to notify the mobile terminal 10 of the occurrence of an event. Typical events may include call reception, message reception, key signal input, touch input and so on. In addition to audio or video output, the alarm unit 53 may provide output in a different manner to notify the occurrence of an event. For example, the alarm unit 53 may provide output in the form of vibration; when a call, a message or some other incoming communication is received, the alarm unit 53 may provide a tactile output (i.e. vibration) to notify the user. By providing such a tactile output, the user can recognize the occurrence of various events even when the user's mobile phone is in the user's pocket. The alarm unit 53 may also provide output notifying the occurrence of an event via the display unit 51 or the audio output module 52.
The input unit 60 may generate key input data according to commands input by the user, to control various operations of the mobile terminal. The input unit 60 allows the user to input various types of information and may include a keyboard, a dome switch, a touch pad (e.g. a touch-sensitive component that detects changes in resistance, pressure, capacitance and the like caused by being touched), a jog wheel, a jog switch and so on. In particular, when the touch pad is superposed on the display unit 51 as a layer, a touch screen can be formed. In an embodiment of the present invention, the input unit 60 includes a touch screen and an e-ink screen. The camera 70 is used for capturing image data, and the microphone 71 is used for recording audio data.
The interface unit 80 serves as an interface through which at least one external device can be connected to the mobile terminal 10. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port and so on. The identification module may store various information for verifying that the user is authorized to use the mobile terminal 10, and may include a user identification module (UIM), a subscriber identity module (SIM), a universal subscriber identity module (USIM) and so on. In addition, the device having the identification module (hereinafter referred to as the "identification device") may take the form of a smart card; therefore, the identification device can be connected to the mobile terminal 10 via a port or other connecting means. The interface unit 80 may be used to receive input (e.g. data information, electric power and the like) from an external device and transmit the received input to one or more elements in the mobile terminal 10, or may be used to transmit data between the mobile terminal and the external device.
In addition, when the mobile terminal 10 is connected to an external cradle, the interface unit 80 may serve as a path through which power is supplied from the cradle to the mobile terminal 10, or as a path through which various command signals input from the cradle are transmitted to the mobile terminal. Various command signals or power input from the cradle may serve as signals for recognizing whether the mobile terminal is accurately mounted on the cradle.
The memory 20 may store software programs for processing and control operations executed by the controller 30, or may temporarily store data that has been output or is to be output (e.g. a phone book, messages, still images, videos and the like). Moreover, the memory 20 may store data regarding vibrations and audio signals of various patterns to be output when a touch is applied to the touch screen.
The memory 20 may include at least one type of storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g. SD or DX memory and the like), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk and the like. Moreover, the mobile terminal 10 may cooperate with a network storage device that performs the storage function of the memory 20 over a network connection.
The controller 30 typically controls the overall operation of the mobile terminal. For example, the controller 30 performs control and processing related to voice calls, data communication, video calls and the like. In addition, the controller 30 may include a multimedia module for reproducing (or playing back) multimedia data; the multimedia module may be constructed within the controller 30 or may be constructed separately from the controller 30. The controller 30 may perform pattern recognition processing to recognize handwriting input or drawing input performed on the touch screen as characters or images.
The power supply unit 90 receives external power or internal power under the control of the controller 30 and provides the appropriate electric power required to operate the elements and components.
The various embodiments described herein may be implemented in a computer-readable medium using, for example, computer software, hardware or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented by using at least one of application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors and electronic units designed to perform the functions described herein; in some cases, such embodiments may be implemented in the controller 30. For a software implementation, embodiments such as procedures or functions may be implemented with separate software modules that allow at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in the memory 20 and executed by the controller 30.
The mobile terminal has so far been described in terms of its functions. Hereinafter, for brevity, a slide-type mobile terminal among various types of mobile terminals, such as folder-type, bar-type, swing-type and slide-type mobile terminals, will be described as an example. Accordingly, the present invention can be applied to any type of mobile terminal and is not limited to slide-type mobile terminals.
The mobile terminal 10 as shown in Fig. 1 may be constructed to operate with communication systems that transmit data via frames or packets, such as wired and wireless communication systems as well as satellite-based communication systems.
A communication system in which a mobile terminal according to the present invention is operable will now be described with reference to Fig. 2.
Such communication systems may use different air interfaces and/or physical layers. For example, the air interfaces used by communication systems include frequency division multiple access (FDMA), time division multiple access (TDMA), code division multiple access (CDMA), universal mobile telecommunications system (UMTS) (in particular, long term evolution (LTE)), global system for mobile communications (GSM) and so on. As a non-limiting example, the following description relates to a CDMA communication system, but such teachings are equally applicable to other types of systems.
Referring to Fig. 2, a CDMA wireless communication system may include a plurality of mobile terminals 10, a plurality of base stations (BS) 270, base station controllers (BSC) 275 and a mobile switching center (MSC) 280. The MSC 280 is configured to form an interface with a public switched telephone network (PSTN) 290. The MSC 280 is also configured to form an interface with the BSCs 275, which may be coupled to the base stations 270 via backhaul lines. The backhaul lines may be constructed according to any one of several known interfaces, including, for example, E1/T1, ATM, IP, PPP, frame relay, HDSL, ADSL or xDSL. It will be understood that the system as shown in Fig. 2 may include a plurality of BSCs 275.
Each BS 270 may serve one or more sectors (or regions), each sector covered by an omnidirectional antenna or an antenna pointing in a particular direction radially away from the BS 270. Alternatively, each sector may be covered by two or more antennas for diversity reception. Each BS 270 may be constructed to support a plurality of frequency assignments, with each frequency assignment having a particular spectrum (e.g. 1.25 MHz, 5 MHz and so on).
The intersection of a sector and a frequency assignment may be referred to as a CDMA channel. The BS 270 may also be referred to as a base transceiver station (BTS) or by another equivalent term. In this case, the term "base station" may be used to broadly denote a single BSC 275 and at least one BS 270. The base station may also be referred to as a "cell site". Alternatively, each sector of a particular BS 270 may be referred to as a plurality of cell sites.
As shown in Fig. 2, a broadcast transmitter (BT) 295 transmits broadcast signals to the mobile terminals 10 operating within the system. The broadcast receiving module 111 as shown in Fig. 1 is provided at the mobile terminal 10 to receive the broadcast signals transmitted by the BT 295. Fig. 2 also shows several global positioning system (GPS) satellites 300. The satellites 300 help locate at least one of the plurality of mobile terminals 10.
In Fig. 2, a plurality of satellites 300 are depicted, but it is understood that useful location information may be obtained with any number of satellites. The GPS module 115 as shown in Fig. 1 is typically configured to cooperate with the satellites 300 to obtain the desired location information. Instead of or in addition to GPS tracking technology, other technologies that can track the position of the mobile terminal may be used. In addition, at least one GPS satellite 300 may selectively or additionally handle satellite DMB transmission.
As a typical operation of the wireless communication system, the BS 270 receives reverse link signals from various mobile terminals 10. The mobile terminals 10 typically engage in calls, messaging and other types of communication. Each reverse link signal received by a particular base station 270 is processed within that particular BS 270. The resulting data is forwarded to the associated BSC 275. The BSC provides call resource allocation and mobility management functions including the coordination of soft handoff procedures between BSs 270. The BSC 275 also routes the received data to the MSC 280, which provides additional routing services for forming an interface with the PSTN 290. Similarly, the PSTN 290 forms an interface with the MSC 280, the MSC forms an interface with the BSCs 275, and the BSCs 275 correspondingly control the BSs 270 to transmit forward link signals to the mobile terminals 10.
Based on the above mobile terminal hardware structure and communication system, various embodiments of the method of the present invention are proposed.
Referring to Fig. 3, Fig. 3 is a functional block diagram of a mobile terminal according to a first embodiment of the present invention. The mobile terminal 10 shown in Fig. 3 includes a speech recognition module 101, a caption generation module 103 and a face recognition module 105. The speech recognition module 101 identifies the speech content in a video file through a speech recognition function and transmits the speech content to the caption generation module 103. The caption generation module 103 converts the speech content into text content and makes the text content into a caption file. The face recognition module 105 determines the facial features of a speaker in the video file through a face recognition function, determines the speaker's name according to the facial features, and marks the caption file with the corresponding speaker's name. In addition, the speech recognition module 101 identifies the voice features of a speaker through the speech recognition function, determines the speaker's name in the video file according to the voice features, and marks the caption file with the corresponding speaker's name.
The mobile terminal provided by this embodiment identifies the speech content in a video file through a speech recognition function, converts the speech content into text content, organizes the text content into a caption file, displays the caption file synchronously with the video file, determines the names of the speakers in the video file, and marks the speaker's name in the caption file corresponding to that speaker's speech content, making it convenient for users to read the caption content of the video file.
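For illustration, the interaction of the three modules above can be sketched as a simple pipeline. The sketch below is not from the patent; the recognizer signatures and the CaptionEntry structure from the earlier sketch are hypothetical placeholders standing in for whatever speech and face recognition functions the terminal actually provides.

```python
from typing import Callable, Iterable

# Hypothetical recognizer signatures; each is an assumption for this sketch.
SpeechRecognizer = Callable[[str], Iterable[tuple[float, float, str]]]   # video path -> (start, end, text)
FaceRecognizer   = Callable[[str, float], str]                           # (video path, time) -> speaker name

def generate_captions(video_path: str,
                      recognize_speech: SpeechRecognizer,
                      recognize_face: FaceRecognizer) -> list[CaptionEntry]:
    """Speech recognition -> text content -> caption entries, then mark each entry
    with the speaker name returned by face recognition at that point in the video."""
    captions = []
    for start_s, end_s, text in recognize_speech(video_path):
        speaker = recognize_face(video_path, start_s)   # who is on screen when the utterance starts
        captions.append(CaptionEntry(start_s, end_s, text, speaker))
    return captions
```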
Referring to Fig. 4, Fig. 4 is a functional block diagram of a mobile terminal according to a second embodiment of the present invention. The mobile terminal 10 shown in Fig. 4 includes a speech recognition module 101, a caption generation module 103, a face recognition module 105, a retrieval module 109, a document creation module 111, a display module 113 and a control module 115. In this embodiment, the speech recognition module 101 identifies the speech content in a video file through a speech recognition function and transmits the speech content to the caption generation module 103. The caption generation module 103 converts the speech content into text content and makes the text content into a caption file. The face recognition module 105 determines the names of the speakers in the video file through a face recognition function and marks the caption file with the corresponding speaker names. In addition, the speech recognition module 101 identifies the voice features of a speaker through the speech recognition function, determines the speaker's name in the video file, and marks the caption file with the corresponding speaker's name.
The retrieval module 109 obtains a keyword, retrieves the text passages related to the keyword from the text content, and finds the video file corresponding to the text passages. The caption file can thus be searched, either broadly or in a targeted manner, according to the keyword, which enables fast positioning within video files, finds the video files related to the keyword for the user, and reduces the time users spend searching for related video files.
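As a rough illustration of the retrieval module, the sketch below searches the caption entries of several video files for a keyword and returns the matching passages together with the file and time they come from. Plain substring matching over the CaptionEntry list defined earlier is an assumption made here for simplicity, not the matching strategy specified by the patent.

```python
def search_captions(keyword: str,
                    caption_index: dict[str, list[CaptionEntry]]
                    ) -> list[tuple[str, CaptionEntry]]:
    """Return (video_path, entry) pairs whose text contains the keyword.

    caption_index maps each video file path to its caption entries."""
    hits = []
    for video_path, entries in caption_index.items():
        for entry in entries:
            if keyword in entry.text:
                hits.append((video_path, entry))
    return hits

# Usage: list the video files (and positions) where a keyword was mentioned.
# for path, entry in search_captions("budget", caption_index):
#     print(path, format_timestamp(entry.start_s), entry.speaker, entry.text)
```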
The document creation module 111 arranges the caption file, the time at which the caption file occurs in the video file, and the speaker name corresponding to the caption file into document information. For example, when a meeting attended by several people is recorded as a conference video, the document creation module 111 arranges the speech content of the participants in the conference video, organizing the caption file, the time at which the caption file occurs in the conference video, and the conference speaker name corresponding to the caption file into detailed document information that serves as the meeting minutes, making it easy for the participants to look up the content of the meeting.
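The document information described above amounts to a per-utterance record of time, speaker and text. As a non-authoritative illustration, the sketch below renders the CaptionEntry list of a conference recording into a plain-text minutes document; the layout is an assumption chosen for readability.

```python
def build_minutes(title: str, entries: list[CaptionEntry]) -> str:
    """Arrange caption entries into a minutes-style document:
    one line per utterance with its time in the video and the speaker name."""
    lines = [title, "=" * len(title)]
    for entry in sorted(entries, key=lambda e: e.start_s):
        stamp = format_timestamp(entry.start_s)
        speaker = entry.speaker or "Unknown speaker"
        lines.append(f"[{stamp}] {speaker}: {entry.text}")
    return "\n".join(lines)
```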
After the caption file is generated, the display module 113 displays the video file and the caption file synchronously, so that the user can read the caption file while watching the video file, which ensures that the user understands the speech content of the video when its volume is low or the accent is unclear.
It should be noted that, in this embodiment, the speaker's name can be distinguished either through the speech recognition function of the speech recognition module 101 or through the face recognition function of the face recognition module 105. The control module 115 of the mobile terminal 10 selects one of these two ways of distinguishing the speaker's name. Specifically, when the control module 115 receives a first control signal, the control module 115 closes the face recognition function and opens the speech recognition function according to the first control signal, and the speaker's name in the video file is determined through the speech recognition function; when the control module 115 receives a second control signal, the control module 115 closes the speech recognition function and opens the face recognition function according to the second control signal, and the speaker's name in the video file is determined through the face recognition function.
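A minimal sketch of that mode switch is given below, assuming the two control signals arrive as simple numbered events; the enum, signal values and default mode are placeholders, not part of the patent.

```python
from enum import Enum

class SpeakerIdMode(Enum):
    VOICE = "speech_recognition"   # identify the speaker by voice features
    FACE = "face_recognition"      # identify the speaker by facial features

class ControlModule:
    """Switches between the two speaker-identification functions on control signals."""
    def __init__(self) -> None:
        self.mode = SpeakerIdMode.FACE   # assumed default

    def on_control_signal(self, signal: int) -> None:
        if signal == 1:      # first control signal: close face recognition, open speech recognition
            self.mode = SpeakerIdMode.VOICE
        elif signal == 2:    # second control signal: close speech recognition, open face recognition
            self.mode = SpeakerIdMode.FACE
```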
The mobile terminal provided by this embodiment identifies the speech content in a video file through a speech recognition function, converts the speech content into text content, organizes the text content into a caption file, displays the caption file synchronously with the video file, determines the names of the speakers in the video file, and marks the speaker's name in the caption file corresponding to that speaker's speech content. This makes it convenient for users to read the caption content of the video file, allows the speech content of each speaker to be distinguished after a video is recorded, can automatically provide a detailed written record, and simplifies the workflow of transcribing a video file.
Referring to Fig. 5, Fig. 5 is an application environment diagram of a mobile terminal according to a third embodiment of the present invention. The application environment in this embodiment includes the mobile terminal 10 and users A, B, C and D, where the mobile terminal 10 is the mobile terminal 10 of Fig. 3 or Fig. 4. In this embodiment, while users A, B, C and D are holding a meeting, the mobile terminal 10 records the whole meeting: it opens the camera 70 to record video of users A, B, C and D, and opens the microphone 71 to record users A, B, C and D, so that the speeches made during the meeting are captured on video.
While shooting the video, the user can send a control signal through the touch screen of the mobile terminal 10 to select the way of distinguishing the speaker's name. According to the user's selection, the speaker's name can be distinguished through the speech recognition function of the speech recognition module 101, or identified through the face recognition function of the face recognition module 105. Specifically, when the control module 115 receives a first control signal sent from the touch screen, the control module 115 closes the face recognition function and opens the speech recognition function according to the first control signal, and the speaker's name in the video file is determined through the speech recognition function; when the control module 115 receives a second control signal sent from the touch screen, the control module 115 closes the speech recognition function and opens the face recognition function according to the second control signal, and the speaker's name in the video file is determined through the face recognition function.
It should be added that the specific steps by which the speech recognition function of the speech recognition module 101 distinguishes the speaker's name include: obtaining the voice features of users A, B, C and D from the video file; establishing a correspondence between the voice features and the user names; and detecting the speech content in the video file, so that when the detected speech content matches the voice features of user A, that speech content is determined to have been uttered by user A, and the speech content uttered by users B, C and D is determined in the same way. The specific steps of determining the speaker's name in the video file through the face recognition function include: obtaining the facial features of users A, B, C and D from the video file; establishing a correspondence between the facial features and the user names; and performing facial recognition on the faces appearing in the video file, so that when the detected facial features match the facial features of user A, the speech content at that moment is determined to have been uttered by user A, and the speech content uttered by users B, C and D is determined in the same way.
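As a non-authoritative sketch of that matching step, the code below keeps an enrolled feature vector per user name and assigns a detected feature to the closest enrolled user by cosine similarity. The similarity measure and threshold are assumptions for illustration; the patent only requires that voice or facial features be put in correspondence with user names.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_speaker(detected: list[float],
                  enrolled: dict[str, list[float]],
                  threshold: float = 0.8) -> str:
    """Return the enrolled user name whose voice/face feature vector best matches
    the detected feature, or an empty string if nothing is similar enough."""
    best_name, best_score = "", 0.0
    for name, feature in enrolled.items():
        score = cosine_similarity(detected, feature)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else ""
```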
Before or after the way of distinguishing the speaker's name in the video file is determined, the speech recognition module 101 identifies the speech content in the video file through the speech recognition function and transmits the speech content to the caption generation module 103. The caption generation module 103 converts the speech content into text content and makes the text content into a caption file. The face recognition module 105 determines the names of the speakers in the video file through the face recognition function and marks the caption file with the corresponding speaker names. The speech recognition module 101 identifies the voice features of a speaker through the speech recognition function, determines the speaker's name in the video file, and marks the caption file with the corresponding speaker's name.
The user can input the keyword to be retrieved through the touch screen. After the retrieval module 109 obtains the keyword, it retrieves the text passages related to the keyword from the text content, finds the video files corresponding to the text passages, and displays the related video files in the form of a list. The caption file can thus be searched, either broadly or in a targeted manner, according to the keyword, which enables fast positioning within video files, finds the video files related to the keyword for the user, and reduces the time users spend searching for related video files.
After the conference video is recorded, the document creation module 111 arranges the caption file, the time at which the caption file occurs in the video file, and the speaker name corresponding to the caption file into document information. The speech content of users A, B, C and D is arranged according to the order of speaking during the meeting, and the caption file, the time at which the caption file occurs in the conference video, and the conference speaker name corresponding to the caption file are organized into detailed document information that serves as the meeting minutes, making it easy for the participants to look up the content of the meeting.
After the caption file is generated, the display module 113 displays the video file and the caption file synchronously while the conference video is recorded, so that the user can read the caption file while watching the video file, which ensures that the user understands the speech content of the video when its volume is low or the accent is unclear.
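Synchronous display essentially means picking, for the current playback time, the caption entry whose time span covers it. A minimal sketch under that assumption, reusing the CaptionEntry structure from the earlier sketch:

```python
def caption_at(entries: list[CaptionEntry], playback_s: float) -> str:
    """Return the caption line to overlay on the video at the given playback time."""
    for entry in entries:
        if entry.start_s <= playback_s <= entry.end_s:
            return f"{entry.speaker}: {entry.text}" if entry.speaker else entry.text
    return ""   # no one is speaking at this moment
```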
The mobile terminal provided by this embodiment identifies the speech content in a video file through a speech recognition function, converts the speech content into text content, organizes the text content into a caption file, displays the caption file synchronously with the video file, determines the names of the speakers in the video file, and marks the speaker's name in the caption file corresponding to that speaker's speech content. This makes it convenient for users to read the caption content of the video file, allows the speech content of each speaker to be distinguished after a video is recorded, can automatically provide a detailed written record, and simplifies the workflow of transcribing a video file.
The present invention also provides a method for generating video captions, which is applied to the mobile terminal 10 shown in Fig. 3 or Fig. 4. The method for generating video captions of this embodiment is described in detail below.
Referring to Fig. 6, Fig. 6 is a flowchart of a method for generating video captions according to a fourth embodiment of the present invention.
In step S601, the speech recognition module 101 identifies the speech content in a video file through a speech recognition function and transmits the speech content to the caption generation module 103.
In step S603, the caption generation module 103 converts the speech content into text content and makes the text content into a caption file.
In step S605, the face recognition module 105 determines the names of the speakers in the video file through a face recognition function and marks the caption file with the corresponding speaker names.
In step S607, the speech recognition module 101 identifies the voice features of a speaker through the speech recognition function, determines the speaker's name in the video file, and marks the caption file with the corresponding speaker's name.
It should be added that, in this embodiment, it is possible to select only the speech recognition function of the speech recognition module 101 to distinguish the speaker's name, or to select only the face recognition function of the face recognition module 105 to identify the speaker's name. Specifically, when the control module 115 of the mobile terminal 10 receives a first control signal, the control module 115 closes the face recognition function and opens the speech recognition function according to the first control signal, and the speaker's name in the video file is determined through the speech recognition function; when the control module 115 receives a second control signal, the control module 115 closes the speech recognition function and opens the face recognition function according to the second control signal, and the speaker's name in the video file is determined through the face recognition function.
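Putting steps S601 to S607 together, a rough end-to-end sketch might look as follows. It reuses the hypothetical helpers from the earlier sketches (CaptionEntry, SpeechRecognizer, FaceRecognizer, ControlModule, SpeakerIdMode and the Callable import) and is an illustration only, not the patent's implementation; the voice_name_at lookup is likewise a placeholder.

```python
def run_caption_pipeline(video_path: str,
                         control: ControlModule,
                         recognize_speech: SpeechRecognizer,
                         face_name_at: FaceRecognizer,
                         voice_name_at: Callable[[str, float], str]) -> list[CaptionEntry]:
    """S601/S603: speech -> text -> caption entries; S605/S607: mark each entry with a
    speaker name, obtained from facial or voice features depending on the selected mode."""
    captions = []
    for start_s, end_s, text in recognize_speech(video_path):
        if control.mode is SpeakerIdMode.FACE:
            speaker = face_name_at(video_path, start_s)    # S605: facial features -> name
        else:
            speaker = voice_name_at(video_path, start_s)   # S607: voice features -> name
        captions.append(CaptionEntry(start_s, end_s, text, speaker))
    return captions
```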
The method for generating video captions provided by this embodiment identifies the speech content in a video file through a speech recognition function, converts the speech content into text content, organizes the text content into a caption file, displays the caption file synchronously with the video file, determines the names of the speakers in the video file, and marks the speaker's name in the caption file corresponding to that speaker's speech content, making it convenient for users to read the caption content of the video file and to know the name of the speaker in real time.
Referring to Fig. 7, Fig. 7 is a flowchart of a method for generating video captions according to a fifth embodiment of the present invention. The method is applied to the mobile terminal 10 shown in Fig. 3 or Fig. 4, and is described in detail below.
In step S701, the retrieval module 109 obtains a keyword, retrieves the text passages related to the keyword from the text content, and finds the video file corresponding to the text passages. In this step, the keyword may be obtained by inputting it through the touch screen of the mobile terminal 10. The caption file can thus be searched, either broadly or in a targeted manner, according to the keyword, which enables fast positioning within video files, finds the video files related to the keyword for the user, and reduces the time users spend searching for related video files.
In step S703, the document creation module 111 arranges the caption file, the time at which the caption file occurs in the video file, and the speaker name corresponding to the caption file into document information. For example, when a meeting attended by several people is recorded as a conference video, the document creation module 111 arranges the speech content of the participants in the conference video, organizing the caption file, the time at which the caption file occurs in the conference video, and the conference speaker name corresponding to the caption file into detailed document information that serves as the meeting minutes, making it easy for the participants to look up the content of the meeting.
In step S705, after the caption file is generated, the display module 113 displays the video file and the caption file synchronously, so that the user can read the caption file while watching the video file, which ensures that the user understands the speech content of the video when its volume is low or the accent is unclear.
The method for generating video captions provided by this embodiment identifies the speech content in a video file through a speech recognition function, converts the speech content into text content, organizes the text content into a caption file, displays the caption file synchronously with the video file, determines the names of the speakers in the video file, and marks the speaker's name in the caption file corresponding to that speaker's speech content. This makes it convenient for users to read the caption content of the video file, allows the speech content of each speaker to be distinguished after a video is recorded, provides a detailed written record, and simplifies the workflow of transcribing a video file.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A mobile terminal, characterized by comprising:
a speech recognition module, configured to identify the speech content in a video file through a speech recognition function;
a caption generation module, configured to convert the speech content into text content and make the text content into a caption file; and
a face recognition module, configured to determine the facial features of a speaker in the video file through a face recognition function, determine the speaker's name according to the facial features, and mark the caption file with the corresponding speaker's name.
2. The mobile terminal according to claim 1, characterized in that the speech recognition module is further configured to identify the voice features of a speaker through the speech recognition function, determine the speaker's name in the video file according to the voice features, and mark the caption file with the corresponding speaker's name.
3. The mobile terminal according to any one of claims 1-2, characterized by further comprising:
a retrieval module, configured to obtain a keyword, retrieve the text passages related to the keyword from the text content, and find the video file corresponding to the text passages.
4. The mobile terminal according to any one of claims 1-2, characterized by further comprising:
a document creation module, configured to arrange the caption file, the time at which the caption file occurs in the video file, and the speaker name corresponding to the caption file into document information; and
a display module, configured to display the video file and the caption file synchronously.
5. The mobile terminal according to any one of claims 1-2, characterized by further comprising:
a control module, configured to receive a first control signal, close the face recognition function and open the speech recognition function according to the first control signal, and determine the speaker's name in the video file through the speech recognition function; and to receive a second control signal, close the speech recognition function and open the face recognition function according to the second control signal, and determine the speaker's name in the video file through the face recognition function.
6. A method for generating video captions, characterized by comprising:
identifying the speech content in a video file through a speech recognition function;
converting the speech content into text content, and making the text content into a caption file; and
determining the facial features of a speaker in the video file through a face recognition function, determining the speaker's name according to the facial features, and marking the caption file with the corresponding speaker's name.
7. The method for generating video captions according to claim 6, characterized by further comprising:
identifying the voice features of a speaker through the speech recognition function, determining the speaker's name in the video file according to the voice features, and marking the caption file with the corresponding speaker's name.
8. The method for generating video captions according to any one of claims 6-7, characterized by further comprising:
obtaining a keyword, retrieving the text passages related to the keyword from the text content, and finding the video file corresponding to the text passages.
9. The method for generating video captions according to any one of claims 6-7, characterized by further comprising:
arranging the caption file, the time at which the caption file occurs in the video file, and the speaker name corresponding to the caption file into document information; and
displaying the video file and the caption file synchronously.
10. the method for the generation video caption described in arbitrary one of claim 6-7 is it is characterised in that also include:
Receive the first control signal, close described face identification functions, open described voice knowledge according to described first control signal Other function, judges the talker's name in described video file by described speech identifying function;
Receive the second control signal, close described speech identifying function, open described face knowledge according to described second control signal Other function, judges the talker's name in described video file by described face identification functions.
CN201610801534.3A 2016-09-05 2016-09-05 Mobile terminal and method for generating video captions Pending CN106385548A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610801534.3A CN106385548A (en) 2016-09-05 2016-09-05 Mobile terminal and method for generating video captions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610801534.3A CN106385548A (en) 2016-09-05 2016-09-05 Mobile terminal and method for generating video captions

Publications (1)

Publication Number Publication Date
CN106385548A true CN106385548A (en) 2017-02-08

Family

ID=57939007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610801534.3A Pending CN106385548A (en) 2016-09-05 2016-09-05 Mobile terminal and method for generating video captions

Country Status (1)

Country Link
CN (1) CN106385548A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102572372A (en) * 2011-12-28 2012-07-11 中兴通讯股份有限公司 Extraction method and device for conference summary
US20150046164A1 (en) * 2013-08-07 2015-02-12 Samsung Electronics Co., Ltd. Method, apparatus, and recording medium for text-to-speech conversion
CN103856689A (en) * 2013-10-31 2014-06-11 北京中科模识科技有限公司 Character dialogue subtitle extraction method oriented to news video
CN105704538A (en) * 2016-03-17 2016-06-22 广东小天才科技有限公司 Method and system for generating audio and video subtitles

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109587429A (en) * 2017-09-29 2019-04-05 北京国双科技有限公司 Audio-frequency processing method and device
CN109920428A (en) * 2017-12-12 2019-06-21 杭州海康威视数字技术股份有限公司 A kind of notes input method, device, electronic equipment and storage medium
CN108366182A (en) * 2018-02-13 2018-08-03 京东方科技集团股份有限公司 Text-to-speech synchronizes the calibration method reported and device, computer storage media
US10580410B2 (en) 2018-04-27 2020-03-03 Sorenson Ip Holdings, Llc Transcription of communications
CN109151597A (en) * 2018-09-03 2019-01-04 聚好看科技股份有限公司 information display method and device
WO2020048275A1 (en) * 2018-09-03 2020-03-12 聚好看科技股份有限公司 Information display method and apparatus
US11418832B2 (en) 2018-11-27 2022-08-16 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video processing method, electronic device and computer-readable storage medium
CN111354349A (en) * 2019-04-16 2020-06-30 深圳市鸿合创新信息技术有限责任公司 Voice recognition method and device and electronic equipment
CN112135197A (en) * 2019-06-24 2020-12-25 腾讯科技(深圳)有限公司 Subtitle display method and device, storage medium and electronic equipment
CN112135197B (en) * 2019-06-24 2022-12-09 腾讯科技(深圳)有限公司 Subtitle display method and device, storage medium and electronic equipment
CN110544491A (en) * 2019-08-30 2019-12-06 上海依图信息技术有限公司 Method and device for real-time association of speaker and voice recognition result thereof
CN110797024A (en) * 2019-11-07 2020-02-14 大连海事大学 VHF (very high frequency) maritime safety communication system based on voice recognition and subtitle display
CN111986656A (en) * 2020-08-31 2020-11-24 上海松鼠课堂人工智能科技有限公司 Teaching video automatic caption processing method and system
CN112489683A (en) * 2020-11-24 2021-03-12 广州市久邦数码科技有限公司 Method and device for realizing fast forward and fast backward of audio based on key word positioning
CN112672099A (en) * 2020-12-31 2021-04-16 深圳市潮流网络技术有限公司 Subtitle data generation and presentation method, device, computing equipment and storage medium
CN112672099B (en) * 2020-12-31 2023-11-17 深圳市潮流网络技术有限公司 Subtitle data generating and presenting method, device, computing equipment and storage medium
CN114979054A (en) * 2022-05-13 2022-08-30 维沃移动通信有限公司 Video generation method and device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN106385548A (en) Mobile terminal and method for generating video captions
CN105119971A (en) Mobile terminal and fast mutual-picture-taking sharing system and method
CN106341817A (en) Access control system, access control method, mobile terminals and access server
CN106356065A (en) Mobile terminal and voice conversion method
CN105430258B Method and apparatus for taking a selfie group photo
CN106453056A (en) Mobile terminal and method for safely sharing picture
CN106254617B Mobile terminal and control method
CN104951549A (en) Mobile terminal and photo/video sort management method thereof
CN106909681A Information processing method and device
CN106372607A (en) Method for reading pictures from videos and mobile terminal
CN106657643B Mobile terminal and call communication display method
CN106383707A (en) Picture display method and system
CN104967749A (en) Device and method for processing picture and text information
CN107133508A (en) Application management method and mobile terminal
CN107071161A Aggregated display method for status bar icons and mobile terminal
CN106469221A (en) Picture searching method, device and terminal
CN106161790A Mobile terminal and control method thereof
CN106227454B Touch trajectory detection system and method
CN104735259A (en) Mobile terminal shooting parameter setting method and device and mobile terminal
CN107194243A Mobile terminal and method for installing an application program
CN106792644A (en) Mobile terminal, server and information processing method
CN106534446A (en) Mobile terminal dialing apparatus and method
CN106385502A (en) Picture arrangement method and mobile terminal
CN106487976A (en) Mobile terminal dialing mechanism and method
CN105208203A (en) Mobile terminal, and data information management system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170208