CN106776836A - Apparatus for processing multimedia data and method - Google Patents

Apparatus for processing multimedia data and method

Info

Publication number
CN106776836A
CN106776836A
Authority
CN
China
Prior art keywords
photo
data
speech data
shooting time
multimedia
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611062262.6A
Other languages
Chinese (zh)
Inventor
赵欣 (Zhao Xin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nubia Technology Co Ltd filed Critical Nubia Technology Co Ltd
Priority to CN201611062262.6A priority Critical patent/CN106776836A/en
Publication of CN106776836A publication Critical patent/CN106776836A/en
Pending legal-status Critical Current


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 — Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/48 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/489 — Retrieval characterised by using metadata, using time information
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 — Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 — Querying
    • G06F 16/432 — Query formulation
    • G06F 16/433 — Query formulation using audio data
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 — Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 — Querying
    • G06F 16/432 — Query formulation
    • G06F 16/434 — Query formulation using image data, e.g. images, photos, pictures taken by a user

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multimedia data processing apparatus, including: a first acquisition module, which obtains pre-collected speech data and photos and extracts the shooting times of the photos; a matching module, which matches the speech data against reference speech data of a pre-collected target person and obtains the matching first speech data; a first conversion module, which converts the first speech data into multiple corresponding sub-text data segments according to the shooting times; and a first insertion module, which inserts all the sub-text data and the photos into a preset multimedia document according to the shooting times. The invention also discloses a multimedia data processing method. By converting collected speech data into text data and then automatically adding the text data and the captured photos to a multimedia document in chronological order, the invention solves the prior-art technical problem that multimedia data cannot be processed automatically.

Description

Apparatus for processing multimedia data and method
Technical field
The present invention relates to the field of multimedia technology, and more particularly to a multimedia data processing apparatus and method.
Background technology
At present, with rising quality of life, travel has become an indispensable part of people's lives. After busy work, travel relaxes body and mind, broadens horizons, refreshes the spirit, and offers insight into local customs. After a pleasant trip ends, most people want to record their fine memories of the journey, for example by writing a travelogue to share with family and friends.
However, completing a full travelogue is often quite troublesome. Especially after a long journey, it is nearly impossible for people to record every detail of the trip, and omissions are unavoidable. Moreover, when sorting photos, facing hundreds or even thousands of them, people often do not know where to start and must spend a great deal of time on classification.
Summary of the invention
The main object of the present invention is to provide a multimedia data processing apparatus and method, aiming to solve the prior-art technical problem that multimedia data cannot be processed automatically and must be handled manually by the user, which wastes time and effort.
To achieve the above object, the present invention provides a multimedia data processing apparatus, including:
a first acquisition module, configured to obtain pre-collected speech data and photos, and to extract the shooting times of the photos;
a matching module, configured to match the obtained speech data against reference speech data of a pre-collected target person, and to obtain first speech data matching the reference speech data;
a first conversion module, configured to convert the first speech data into multiple corresponding sub-text data segments according to the shooting times;
a first insertion module, configured to insert all the sub-text data and the photos into a preset multimedia document according to the shooting times.
Optionally, the first conversion module includes:
a division module, configured to divide the first speech data into multiple first sub-speech data segments using the shooting times as division points;
a second conversion module, configured to convert each first sub-speech data segment into a corresponding sub-text data segment.
Optionally, the first insertion module includes:
a second insertion module, configured to insert all the sub-text data into the preset multimedia document;
a third insertion module, configured to insert a photo between two adjacent sub-text data segments, such that the shooting time corresponding to the division point between the two sub-text data segments corresponds to the shooting time of the photo.
Optionally, the apparatus further includes:
a second acquisition module, configured to obtain geographical location information collected while the speech data and the user's photos were being captured, wherein during that period the geographical location of the user's area is collected at preset time intervals;
a fourth insertion module, configured to insert the geographical location information into the multimedia document according to the chronological order of the collection times of the geographical location information, the shooting times of the photos, and the times corresponding to the multiple sub-text data segments.
Optionally, the first insertion module is further configured to:
after all the sub-text data and the photos have been inserted into the preset multimedia document, create a folder named with the current date, and save the multimedia document, the collected speech data, and the photos taken by the user into that folder.
In addition, to achieve the above object, the present invention also provides a multimedia data processing method, the method including:
obtaining pre-collected speech data and photos, and extracting the shooting times of the photos;
matching the obtained speech data against reference speech data of a pre-collected target person, and obtaining first speech data matching the reference speech data;
converting the first speech data into multiple corresponding sub-text data segments according to the shooting times;
inserting all the sub-text data and the photos into a preset multimedia document according to the shooting times.
Optionally, the step of converting the first speech data into multiple corresponding sub-text data segments according to the shooting times includes:
dividing the first speech data into multiple first sub-speech data segments using the shooting times as division points;
converting each first sub-speech data segment into a corresponding sub-text data segment.
Optionally, the step of inserting all the sub-text data and the photos into the preset multimedia document according to the shooting times includes:
inserting all the sub-text data into the preset multimedia document;
inserting a photo between two adjacent sub-text data segments, such that the shooting time corresponding to the division point between the two sub-text data segments corresponds to the shooting time of the photo.
Optionally, after the step of inserting all the sub-text data and the photos into the preset multimedia document according to the shooting times, the method further includes:
obtaining geographical location information collected while the speech data and the user's photos were being captured, wherein during that period the geographical location of the user's area is collected at preset time intervals;
inserting the geographical location information into the multimedia document according to the chronological order of the collection times of the geographical location information, the shooting times of the photos, and the times corresponding to the multiple sub-text data segments.
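The chronological ordering described in this optional step can be sketched as a simple merge of three timestamped streams. This is an illustrative sketch only; the tuple layout and function name are assumptions, not part of the patent.

```python
from datetime import datetime

def merge_events_chronologically(texts, photos, locations):
    """Merge sub-text, photo, and geolocation events into one timeline.

    Each input is a list of (timestamp, payload) pairs with datetime
    timestamps. Returns one list sorted by time, which is the order the
    items would be inserted into the multimedia document.
    """
    events = (
        [(t, "text", p) for t, p in texts]
        + [(t, "photo", p) for t, p in photos]
        + [(t, "location", p) for t, p in locations]
    )
    return sorted(events, key=lambda e: e[0])
```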
Optionally, after inserting all the sub-text data and the photos into the preset multimedia document according to the shooting times, the method further includes:
creating a folder named with the current date, and saving the multimedia document, the collected speech data, and the photos taken by the user into that folder.
The multimedia data processing apparatus provided by the present invention includes: a first acquisition module, configured to obtain pre-collected speech data and photos, and to extract the shooting times of the photos; a matching module, configured to match the obtained speech data against reference speech data of a pre-collected target person, and to obtain first speech data matching the reference speech data; a first conversion module, configured to convert the first speech data into multiple corresponding sub-text data segments according to the shooting times; and a first insertion module, configured to insert all the sub-text data and the photos into a preset multimedia document according to the shooting times. The present invention also provides a multimedia data processing method. By collecting the user's speech data and the photos taken, converting the collected speech data into text data, and then automatically adding the text data and the photos to the multimedia document in chronological order, the present invention requires no manual handling by the user and solves the prior-art technical problem that multimedia data cannot be processed automatically.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of a mobile terminal implementing various embodiments of the present invention;
Fig. 2 is a schematic diagram of the wireless communication system for the mobile terminal of Fig. 1;
Fig. 3 is a module schematic diagram of a first embodiment of the multimedia data processing apparatus of the present invention;
Fig. 4 is a refined module schematic diagram of the first conversion module 30 in the apparatus shown in Fig. 3;
Fig. 5 is a refined module schematic diagram of the first insertion module 40 in the apparatus shown in Fig. 3;
Fig. 6 is a scenario diagram of inserting a photo between two adjacent sub-text data segments in the present invention;
Fig. 7 is another scenario diagram of inserting a photo between two adjacent sub-text data segments in the present invention;
Fig. 8 is a module schematic diagram of a second embodiment of the multimedia data processing apparatus of the present invention;
Fig. 9 is a scenario diagram of inserting collected geographical location information into the multimedia document in the present invention;
Fig. 10 is a flow chart of a first embodiment of the multimedia data processing method of the present invention;
Fig. 11 is a refined flow chart of step S30 in the method shown in Fig. 10;
Fig. 12 is a refined flow chart of step S40 in the method shown in Fig. 10;
Fig. 13 is a flow chart of a second embodiment of the multimedia data processing method of the present invention.
The realization of the objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
Mobile terminals implementing various embodiments of the present invention will now be described with reference to the accompanying drawings. In the following description, suffixes such as "module", "part", or "unit" used to denote elements are used only to facilitate the description of the invention and have no specific meaning in themselves; "module" and "part" may therefore be used interchangeably.
Mobile terminals may be implemented in various forms. For example, the terminals described in the present invention may include mobile terminals such as mobile phones, smart phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players), and navigation devices, as well as fixed terminals such as digital TVs and desktop computers. Hereinafter, the terminal is assumed to be a mobile terminal. However, those skilled in the art will understand that, except for elements specifically intended for mobile use, constructions according to embodiments of the present invention can also be applied to fixed-type terminals.
Fig. 1 is a schematic diagram of the hardware structure of a mobile terminal implementing various embodiments of the present invention.
The mobile terminal 100 may include a wireless communication unit 110, an A/V (audio/video) input unit 120, a user input unit 130, a sensing unit 140, an output unit 150, a multimedia data processing apparatus 400, a memory 160, a controller 180, a power supply unit 190, and the like. Fig. 1 shows a mobile terminal with various components, but it should be understood that not all of the illustrated components are required; more or fewer components may alternatively be implemented. The elements of the mobile terminal are discussed in detail below.
The wireless communication unit 110 typically includes one or more components that permit wireless communication between the mobile terminal 100 and a wireless communication system or network. For example, the wireless communication unit may include at least one of a broadcast receiving module, a mobile communication module, a wireless Internet module, a short-range communication module, and a location information module.
The A/V input unit 120 is configured to receive audio or video signals. The A/V input unit 120 may include a camera 121 and a microphone 122. The camera 121 processes image data of still pictures or video obtained by an image capture device in a video capture mode or an image capture mode. The processed image frames may be displayed on a display unit 151, stored in the memory 160 (or other storage medium), or transmitted via the wireless communication unit 110; two or more cameras 121 may be provided depending on the construction of the mobile terminal. The microphone 122 can receive sound (audio data) in operational modes such as a phone call mode, a recording mode, or a voice recognition mode, and process such sound into audio data. The microphone 122 may implement various types of noise elimination (or suppression) algorithms to eliminate (or suppress) noise or interference generated while receiving and transmitting audio signals.
The user input unit 130 may generate key input data according to commands input by the user to control various operations of the mobile terminal. The user input unit 130 allows the user to input various types of information and may include a keyboard, a dome switch, a touch pad (for example, a touch-sensitive component that detects changes in resistance, pressure, capacitance, and the like caused by contact), a jog wheel, a jog switch, and so on. In particular, when the touch pad is superimposed on the display unit 151 in the form of a layer, a touch screen may be formed.
The sensing unit 140 detects the current state of the mobile terminal 100 (for example, an open or closed state), the position of the mobile terminal 100, the presence or absence of user contact with the mobile terminal 100 (that is, touch input), the orientation of the mobile terminal 100, the acceleration or deceleration movement and direction of the mobile terminal 100, and so on, and generates commands or signals for controlling the operation of the mobile terminal 100. In addition, the sensing unit 140 can detect whether the power supply unit 190 supplies power.
The display unit 151 may display information processed in the mobile terminal 100. For example, when the mobile terminal 100 is in a phone call mode, the display unit 151 may display a user interface (UI) or graphical user interface (GUI) related to the call or other communication (for example, text messaging or multimedia file downloading). When the mobile terminal 100 is in a video call mode or an image capture mode, the display unit 151 may display captured and/or received images, a UI or GUI showing video or images and related functions, and so on.
Meanwhile, when the display unit 151 and the touch pad are superimposed on each other in the form of a layer to form a touch screen, the display unit 151 can serve as both an input device and an output device. The display unit 151 may include at least one of a liquid crystal display (LCD), a thin-film-transistor LCD (TFT-LCD), an organic light-emitting diode (OLED) display, a flexible display, a three-dimensional (3D) display, and the like. Some of these displays may be constructed to be transparent to allow viewing from the outside; these may be called transparent displays, a typical example being a TOLED (transparent organic light-emitting diode) display. Depending on the particular desired implementation, the mobile terminal 100 may include two or more display units (or other display devices); for example, the mobile terminal may include an external display unit (not shown) and an internal display unit (not shown). The touch screen can be used to detect touch input pressure as well as touch input position and touch input area.
The audio output module 152 may convert audio data received by the wireless communication unit 110 or stored in the memory 160 into an audio signal and output it as sound when the mobile terminal is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or similar modes. Moreover, the audio output module 152 may provide audio output related to a specific function performed by the mobile terminal 100 (for example, a call signal reception sound or a message reception sound). The audio output module 152 may include a speaker, a buzzer, and so on.
The memory 160 may store software programs for the processing and control operations performed by the controller 180, or may temporarily store data that has been or will be output (for example, a phone book, messages, still images, video, and so on). Moreover, the memory 160 may store data on various types of vibration and audio signals output when a touch is applied to the touch screen.
The memory 160 may include at least one type of storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and so on. Moreover, the mobile terminal 100 may cooperate, via a network connection, with a network storage device that performs the storage function of the memory 160.
The controller 180 generally controls the overall operation of the mobile terminal. For example, the controller 180 performs control and processing related to voice calls, data communication, video calls, and so on. In addition, the controller 180 may include a multimedia module 181 for reproducing (or playing back) multimedia data; the multimedia module 181 may be constructed within the controller 180 or separately from it. The controller 180 may also perform pattern recognition processing to recognize handwriting input or drawing input performed on the touch screen as characters or images.
The power supply unit 190 receives external or internal power under the control of the controller 180 and provides the appropriate power required to operate each element and component.
The various embodiments described herein may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and electronic units designed to perform the functions described herein; in some cases, such embodiments may be implemented in the controller 180. For a software implementation, embodiments such as processes or functions may be implemented with separate software modules that allow at least one function or operation to be performed. Software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in the memory 160 and executed by the controller 180.
The mobile terminal 100 shown in Fig. 1 may be constructed to operate with wired and wireless communication systems that transmit data via frames or packets, as well as with satellite-based communication systems.
The communication systems in which the mobile terminal according to the present invention can operate will now be described with reference to Fig. 2.
Such communication systems may use different air interfaces and/or physical layers. For example, air interfaces used by communication systems include frequency division multiple access (FDMA), time division multiple access (TDMA), code division multiple access (CDMA), the Universal Mobile Telecommunications System (UMTS) (in particular, Long Term Evolution (LTE)), the Global System for Mobile Communications (GSM), and so on. As a non-limiting example, the following description relates to a CDMA communication system, but such teachings apply equally to other types of systems.
Referring to Fig. 2, the CDMA wireless communication system may include a plurality of mobile terminals 100, a plurality of base stations (BS) 270, base station controllers (BSC) 275, and a mobile switching center (MSC) 280. The MSC 280 is configured to form an interface with a public switched telephone network (PSTN) 290. The MSC 280 is also configured to form an interface with the BSCs 275, which can be coupled to the BSs 270 via backhaul lines. The backhaul lines may be constructed according to any of several known interfaces, including, for example, E1/T1, ATM, IP, PPP, Frame Relay, HDSL, ADSL, or xDSL. It will be understood that the system as shown in Fig. 2 may include a plurality of BSCs 275.
Each BS 270 may serve one or more sectors (or areas), each sector covered by an omnidirectional antenna or an antenna pointed in a specific direction radially away from the BS 270. Alternatively, each sector may be covered by two or more antennas for diversity reception. Each BS 270 may be constructed to support a plurality of frequency assignments, each frequency assignment having a specific spectrum (for example, 1.25 MHz, 5 MHz, and so on).
The intersection of a sector and a frequency assignment may be referred to as a CDMA channel. A BS 270 may also be referred to as a base transceiver station (BTS) or by another equivalent term. In such a case, the term "base station" may be used to refer broadly to a single BSC 275 and at least one BS 270. A base station may also be referred to as a "cell site"; alternatively, each sector of a particular BS 270 may be referred to as a plurality of cell sites.
As shown in Fig. 2, a broadcast transmitter (BT) 295 transmits broadcast signals to the mobile terminals 100 operating within the system. Fig. 2 also shows several Global Positioning System (GPS) satellites 300; the satellites 300 help locate at least one of the plurality of mobile terminals 100.
In Fig. 2, several satellites 300 are depicted, but it will be understood that useful location information may be obtained with any number of satellites. As a typical operation of the wireless communication system, the BSs 270 receive reverse-link signals from various mobile terminals 100. The mobile terminals 100 typically engage in calls, messaging, and other types of communication. Each reverse-link signal received by a particular BS 270 is processed within that BS 270, and the resulting data is forwarded to the associated BSC 275. The BSC provides call resource allocation and mobility management functions, including coordination of soft handoff procedures between BSs 270. The BSC 275 also routes the received data to the MSC 280, which provides additional routing services for forming an interface with the PSTN 290. Similarly, the PSTN 290 forms an interface with the MSC 280, the MSC forms an interface with the BSCs 275, and the BSCs 275 correspondingly control the BSs 270 to transmit forward-link signals to the mobile terminals 100.
Based on the above mobile terminal hardware structure and communication system structure, the embodiments of the multimedia data processing apparatus of the present invention are proposed; the multimedia data processing apparatus is a part of the mobile terminal.
Referring to Fig. 3, Fig. 3 is a module schematic diagram of the first embodiment of the multimedia data processing apparatus of the present invention. In this embodiment, the multimedia data processing apparatus 400 includes:
A first acquisition module 10, configured to obtain pre-collected speech data and photos, and to extract the shooting times of the photos.
In this embodiment, taking travel as an example, when a user encounters graceful scenery or a unique attraction during a trip, feelings often well up in the heart, and the user may express these feelings to companions in words or describe the scene in his or her own language; during the description, the user may also want to take photos to record the view, to keep as a souvenir or to share with relatives and friends. Therefore, in this embodiment, after the user starts the multimedia data processing application, the speech data in the area near the mobile terminal and the photos taken by the user are collected in real time and saved to a preset area. The collection times corresponding to the speech data and the shooting times corresponding to the photos are also saved at the same time. After the user closes the multimedia data processing application, the collected speech data and photos are obtained automatically, and the shooting times of the photos are extracted.
A matching module 20, configured to match the obtained speech data against reference speech data of a pre-collected target person, and to obtain first speech data matching the reference speech data.
In this embodiment, it is considered that during the collection of speech data, when the user is in a crowd, the collected speech data may include the speech of all visitors in the area near the mobile terminal, whereas the user only wants to record what he or she and companions say, not what other visitors say. Therefore, in this embodiment, the obtained speech data is matched against the reference speech data of the pre-collected target person to obtain the first speech data matching the reference speech data; that is, the obtained speech data is filtered so that only the speech data of the target person specified by the user is retained. The first speech data matching the reference speech data can be obtained by speech recognition technology, voiceprint recognition technology, or the like.
In this embodiment, there may be two or more target persons.
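The matching step above can be sketched as a filter over timestamped speech segments. The patent names speech recognition or voiceprint recognition without specifying an algorithm, so this sketch assumes each segment already carries an embedding vector from some unspecified voiceprint model, and uses cosine similarity against the target person's reference embedding with an arbitrary threshold.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two voiceprint embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_segments_by_speaker(segments, reference_embedding, threshold=0.75):
    """Keep only speech segments whose embedding matches the reference voiceprint.

    `segments` is a list of (start_time, end_time, embedding) tuples; the
    embeddings are assumed to come from a voiceprint model (not shown here).
    Returns the (start_time, end_time) pairs that match the target person.
    """
    return [
        (start, end) for start, end, emb in segments
        if cosine_similarity(emb, reference_embedding) >= threshold
    ]
```

For multiple target persons, the same filter could be run with each reference embedding and the kept segments merged.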
A first conversion module 30, configured to convert the first speech data into multiple corresponding sub-text data segments according to the shooting times.
In this embodiment, a user typically takes photos while chatting with companions during a trip; therefore, the first speech data is converted into multiple sub-text data segments according to the shooting times of the collected photos. The speech data can be converted into the sub-text data segments by means of speech recognition technology, speech-to-text software, or the like.
A first insertion module 40, configured to insert all the sub-text data and the photos into a preset multimedia document according to the shooting times.
In this embodiment, a multimedia document is built in advance, into which photos, text, geographical information, and so on can be inserted. According to the shooting times of the collected photos, the multiple sub-text data segments and the photos are inserted into the preset multimedia document in chronological order and saved.
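The insertion logic described above — sub-text segments in chronological order, with each photo placed at the division point carrying its shooting time — might be sketched as follows. The data shapes here are illustrative assumptions, not the patent's actual document format.

```python
def build_document(sub_texts, photos):
    """Interleave sub-text segments and photos in shooting-time order.

    `sub_texts` is a list of (end_time, text) pairs, where end_time is the
    shooting time that served as the segment's division point; `photos`
    maps a shooting time to a photo path. A photo is placed right after
    the sub-text segment that ends at its shooting time.
    """
    document = []
    for end_time, text in sorted(sub_texts):
        document.append(("text", text))
        if end_time in photos:
            document.append(("photo", photos[end_time]))
    return document
```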
After all the sub-text data and the photos have been inserted into the preset multimedia document according to the shooting times, a folder named with the current date is created, and the multimedia document, the collected speech data, and the photos taken by the user are saved into that folder. In this embodiment, saving the multimedia document together with the collected speech data and the user's photos into the folder makes it convenient for the user to personalize or modify the saved multimedia document.
Specifically, the multimedia document saved in the present embodiment may serve as a "travel diary" to be shared with others or uploaded to a network.
The apparatus for processing multimedia data 400 described in the present embodiment includes: the first acquisition module 10, which obtains the pre-collected speech data and photos and extracts the shooting times of the photos; the matching module 20, which matches the acquired speech data against the pre-collected reference speech data of the target person to obtain the first speech data matching the reference speech data; the first conversion module 30, which converts the first speech data into a plurality of corresponding sub-text data items according to the shooting times; and the first insertion module 40, which inserts all of the sub-text data and the photos into the preset multimedia document according to the shooting times. By collecting speech data and photos, converting the collected speech data into text data, and automatically adding the text data and the photos to the multimedia document in chronological order, the present embodiment requires no manual handling by the user, solving the prior-art problem that multimedia data cannot be processed automatically.
Further, referring to Fig. 4, which is a schematic diagram of the refinement modules of the first conversion module 30 of the apparatus for processing multimedia data shown in Fig. 3, based on the embodiment of Fig. 3 the first conversion module 30 includes:

The division module 31, configured to divide the first speech data into a plurality of first sub-speech data segments, using the shooting times as split points.
In the present embodiment, a user travelling usually takes photos while exchanging thoughts with companions, so the impression the user wishes to express at the moment each photo is taken will differ from photo to photo. For example, if between 9:00 and 9:10 the user is detected to have taken a photo at 9:02, 9:04, 9:05, and 9:08, then 9:02, 9:04, 9:05, and 9:08 are used as split points to divide the first speech data acquired during that period into 5 first sub-speech data segments.
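The division by split points can be sketched as follows, using the 9:00–9:10 example with times expressed as minutes past 9:00; representing a segment as a (start, end) pair is an assumption made only for illustration.

```python
def split_speech(speech_start, speech_end, photo_times):
    """Split the recording interval [speech_start, speech_end) into
    sub-segments, using each photo's shooting time as a split point.
    N split points inside the interval yield N + 1 segments."""
    cuts = sorted(t for t in photo_times if speech_start < t < speech_end)
    bounds = [speech_start, *cuts, speech_end]
    return list(zip(bounds[:-1], bounds[1:]))  # consecutive (start, end) pairs
```

For the example above, photos at minutes 2, 4, 5, and 8 of a 0–10 minute recording produce exactly 5 segments.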
The second conversion module 32, configured to convert each first sub-speech data segment into a corresponding sub-text data item.

In the present embodiment, after the collected speech data has been divided into a plurality of first sub-speech data segments, the segments are converted into the corresponding sub-text data items by speech recognition technology, speech-to-text software, or the like.
The first conversion module 30 described in the present embodiment includes: the division module 31, configured to divide the first speech data into a plurality of first sub-speech data segments using the shooting times as split points; and the second conversion module 32, configured to convert each first sub-speech data segment into a corresponding sub-text data item. By dividing the first speech data into a plurality of first sub-speech data segments according to the shooting times of the photos and converting those segments into a plurality of text data items, the present embodiment requires no manual handling by the user, further simplifying the automatic processing of multimedia data and saving the user's effort.
Further, referring to Fig. 5, which is a schematic diagram of the refinement modules of the first insertion module 40 of the apparatus for processing multimedia data shown in Fig. 3, based on the embodiment of Fig. 3 the first insertion module 40 includes:

The second insertion module 41, configured to insert all of the sub-text data into the preset multimedia document.

In the present embodiment, the acquisition time of the first sub-speech data segment corresponding to each sub-text data item is taken as the time of that sub-text data item, and the converted sub-text data items are inserted into the preset multimedia document in chronological order of those times.
The third insertion module 42, configured to insert a photo between two adjacent sub-text data items, such that the shooting time corresponding to the split point between the two sub-text data items corresponds to the shooting time of the photo.

In the present embodiment, the shooting time corresponding to the split point between two adjacent sub-text data items is first obtained; the photo whose shooting time matches that split point is then obtained; finally, that photo is inserted between the two sub-text data items, so that the split point between the two sub-text data items corresponds to the shooting time of the inserted photo.
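One possible sketch of this interleaving, assuming each sub-text item carries the (start, end) times of the segment it was converted from and that a photo's shooting time equals the end of the segment it terminated; both representations are illustrative.

```python
def interleave(sub_texts, photos):
    """sub_texts: list of (start, end, text) in chronological order.
    photos: dict mapping shooting time -> photo name.
    A photo whose shooting time equals the split point between two
    adjacent sub-text items is inserted between them."""
    doc = []
    for i, (start, end, text) in enumerate(sub_texts):
        doc.append(("text", text))
        # the split point between this item and the next is this item's end
        if i + 1 < len(sub_texts) and end in photos:
            doc.append(("photo", photos[end]))
    return doc
```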
To better understand the technical scheme provided by this embodiment, refer to Fig. 6, which is a scenario diagram of inserting a photo between two adjacent sub-text data items according to the present invention.
In addition, in the present embodiment, if the user wants the multimedia document to record a travel experience shared with companions, i.e., the user presets two or more target persons for speech data collection, then the collected speech data is matched against the pre-collected reference speech data of each target person to obtain the first speech data matching each target person's reference speech data, and the first speech data of different target persons is marked in different ways. The first speech data of each target person is then divided into a plurality of sub-speech data segments according to the shooting times of the collected photos, and after those segments are converted into text data, the text data corresponding to each target person is likewise marked in a different way.

The plurality of sub-text data items are then inserted into the preset multimedia document with each target person's text data marked differently; for example, the sub-text data corresponding to target person A may be marked blue and the sub-text data corresponding to target person B marked red.
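A sketch of the per-person marking, assuming the multimedia document is HTML-like so that the colour can be expressed with a `<span>` style; the `SPEAKER_COLORS` mapping and the fallback colour are hypothetical details not given by the embodiment.

```python
SPEAKER_COLORS = {"A": "blue", "B": "red"}  # hypothetical per-person marks

def mark_sub_text(speaker, text):
    """Wrap a sub-text item in a span whose colour identifies which
    target person the text was converted from."""
    color = SPEAKER_COLORS.get(speaker, "black")
    return '<span style="color:%s">%s</span>' % (color, text)
```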
To better understand the technical scheme described in the present embodiment, refer to Fig. 7, another scenario diagram of inserting a photo between two adjacent sub-text data items according to the present invention. In Fig. 7, the text data converted from the speech of target person A and of target person B is distinguished by the weight of the text.
The first insertion module 40 described in the present embodiment includes: the second insertion module 41, configured to insert all of the sub-text data into the preset multimedia document; and the third insertion module 42, configured to insert a photo between two adjacent sub-text data items such that the shooting time corresponding to the split point between them corresponds to the shooting time of the photo. By inserting all of the sub-text data into the preset multimedia document in chronological order and inserting photos between adjacent sub-text data items according to their shooting times, the present embodiment generates the multimedia document automatically, requiring no manual arrangement by the user and greatly saving the user's effort.
Further, referring to Fig. 8, which is a module schematic diagram of the second embodiment of the apparatus for processing multimedia data of the present invention, based on the embodiment of Fig. 3, in the present embodiment the apparatus for processing multimedia data 400 further includes:

The second acquisition module 50, configured to obtain the geographic location information collected while the speech data and the user's photos are being collected; during that collection, the geographic location of the user's area is sampled at a preset time interval.

In the present embodiment, while the speech data and the user's photos are being collected, the user's geographic location is sampled at the preset time interval; for example, the user's location is sampled once every 30 minutes.
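The sampling schedule can be sketched as follows; times are plain seconds and the function name is illustrative, since the embodiment only specifies "every preset time interval".

```python
def sample_times(trip_start, trip_end, interval):
    """Times (e.g. epoch seconds) at which the user's geographic
    location is sampled: every `interval` seconds from trip start
    until the trip ends."""
    t, times = trip_start, []
    while t <= trip_end:
        times.append(t)
        t += interval
    return times
```

For a 90-minute trip with the 30-minute interval of the example, this yields samples at 0, 30, 60, and 90 minutes.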
The fourth insertion module 60, configured to insert the geographic location information into the multimedia document according to the chronological order of the acquisition times of the geographic location information, the shooting times of the photos, and the times corresponding to the plurality of sub-text data items.

In the present embodiment, the geographic location information is inserted into the preset multimedia document according to the chronological order of its acquisition times, the shooting times of the collected photos, and the times corresponding to the plurality of sub-text data items.

To better understand the technical scheme described in the present embodiment, refer to Fig. 9, a scenario diagram of inserting collected geographic location information into the multimedia document according to the present invention. In Fig. 9, assume the user's location is detected to be "the Forbidden City"; then "-the Forbidden City-" is inserted into the multimedia document at the time point at which that location information was acquired. That is, in Fig. 9, the time point corresponding to the location entry "-the Forbidden City-" is later than that of the "photo" above it and earlier than that of the "text data" below it.
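The chronological merge that produces this ordering can be sketched as a single sort over timestamped entries; the (timestamp, kind, payload) tuple representation is an assumption made for illustration.

```python
def merge_timeline(entries):
    """entries: list of (timestamp, kind, payload) where kind is one
    of "text", "photo", or "location". Sorting by timestamp yields the
    final order of items in the multimedia document."""
    return [(kind, payload)
            for _, kind, payload in sorted(entries, key=lambda e: e[0])]
```

With the Fig. 9 example, a location acquired between a photo and a sub-text lands between them in the document.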
By periodically sampling the user's geographic location and inserting the collected geographic location information into the preset multimedia document in chronological order, the present embodiment effectively records the user's travel route in the generated multimedia document without requiring the user to add it manually, saving the user's effort.
The present invention also provides a multimedia data processing method, which is mainly applied in a mobile terminal. Referring to Fig. 10, which is a flow diagram of the first embodiment of the multimedia data processing method of the present invention, in the present embodiment the method includes:

Step S10: obtain the pre-collected speech data and the photos taken by the user, and extract the shooting times of the photos.

In the present embodiment, taking travel as an example: when a user encounters graceful scenery or a unique attraction while travelling, he or she is often moved, and may express those feelings to companions in words or describe the scene before him or her in his or her own words; during that description the user may also want to record the view by taking photos, to keep as a souvenir or to share with relatives and friends. Therefore, in the present embodiment, after the user launches the multimedia data processing application, the speech data in the area near the mobile terminal and the photos taken by the user are collected in real time and saved to a preset region, and the acquisition times of the speech data and the shooting times of the photos are saved at the same time. When the user ends the trip and closes the multimedia data processing application, the collected speech data and photos are obtained automatically and the shooting times of the photos are extracted.
Step S20: match the acquired speech data against the pre-collected reference speech data of the target person to obtain the first speech data matching the reference speech data.

In the present embodiment, it is considered that, during the collection of speech data, when the user is in a crowd the collected speech data may include the speech of every visitor in the area near the mobile terminal, whereas the user typically only wants to record what he or she and his or her companions say, not what other visitors say. Therefore, in the present embodiment, the acquired speech data is matched against the pre-collected reference speech data of the target person to obtain the first speech data matching the reference speech data; that is, the acquired speech data is filtered so that only the speech data of the target person specified by the user is retained. The first speech data matching the reference speech data may be obtained by speech recognition technology, voiceprint recognition technology, or the like.

In the present embodiment, there may be two or more target persons.
Step S30: convert the first speech data into a plurality of corresponding sub-text data items according to the shooting times.

In the present embodiment, a user travelling will typically take photos while chatting with companions; therefore, when the multimedia data is processed, the first speech data is converted into a plurality of sub-text data items according to the shooting times of the collected photos. The speech data may be converted into sub-text data by speech recognition technology, speech-to-text software, or the like.
Step S40: insert all of the sub-text data and the photos into the preset multimedia document according to the shooting times.

In the present embodiment, a multimedia document is created in advance, into which photos, text, geographic information, and the like may be inserted. According to the shooting times of the collected photos, the plurality of sub-text data items and the photos are inserted into the preset multimedia document in chronological order, and the document is saved.

After all of the sub-text data and the photos have been inserted into the preset multimedia document according to the shooting times, the following is also performed: a folder named with the current date is created, and the multimedia document, the collected speech data, and the photos taken by the user are each saved into that folder. Saving them together makes it convenient for the user to personalize or modify the saved multimedia document.

Specifically, the multimedia document saved in the present embodiment may serve as a "travel diary" to be shared with others or uploaded to a network.
The multimedia data processing method described in the present embodiment includes: obtaining the pre-collected speech data and photos and extracting the shooting times of the photos; matching the acquired speech data against the pre-collected reference speech data of the target person to obtain the first speech data matching the reference speech data; converting the first speech data into a plurality of corresponding sub-text data items according to the shooting times; and inserting all of the sub-text data and the photos into the preset multimedia document according to the shooting times. By collecting the speech data and the photos taken by the user during travel, converting the collected speech data into text data, and automatically adding the text data and the photos to the multimedia document in chronological order, the present embodiment requires no manual handling by the user, solving the prior-art problem that a multimedia document cannot be generated automatically during travel.
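Steps S10–S40 can be tied together in a minimal end-to-end sketch. The `transcribe` callback stands in for whatever speech recognition technique is actually used, and the list-of-tuples timeline is an illustrative representation only.

```python
def build_document(photo_times, speech_start, speech_end, transcribe):
    """Minimal end-to-end sketch of steps S10-S40: split the recording
    at the photo shooting times, transcribe each sub-segment with the
    (hypothetical) `transcribe` callback, and interleave the resulting
    sub-text items with the photos in chronological order."""
    cuts = sorted(t for t in photo_times if speech_start < t < speech_end)
    bounds = [speech_start, *cuts, speech_end]
    doc = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        doc.append(("text", transcribe(start, end)))
        if end in set(photo_times):           # a photo marks this split point
            doc.append(("photo", end))
    return doc
```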
Further, referring to Fig. 11, which is a flow diagram of the refinement steps of step S30 of the multimedia data processing method shown in Fig. 10, based on the embodiment of Fig. 10 the step S30 includes:

Step S31: divide the first speech data into a plurality of first sub-speech data segments, using the shooting times as split points.

In the present embodiment, a user travelling usually takes photos while exchanging thoughts with companions, so the impression the user wishes to express at the moment each photo is taken will differ from photo to photo. For example, if between 9:00 and 9:10 the user is detected to have taken a photo at 9:02, 9:04, 9:05, and 9:08, then 9:02, 9:04, 9:05, and 9:08 are used as split points to divide the first speech data acquired during that period into 5 first sub-speech data segments.
Step S32: convert each first sub-speech data segment into a corresponding sub-text data item.

In the present embodiment, after the collected speech data has been divided into a plurality of first sub-speech data segments, the segments are converted into the corresponding sub-text data items by speech recognition technology, speech-to-text software, or the like.

In the multimedia data processing method described in the present embodiment, step S30 includes: dividing the first speech data into a plurality of first sub-speech data segments using the shooting times as split points, and converting each first sub-speech data segment into a corresponding sub-text data item. By dividing the first speech data into a plurality of first sub-speech data segments according to the shooting times of the photos and converting those segments into a plurality of text data items, the present embodiment requires no manual handling by the user, further simplifying the automatic processing of multimedia data and saving the user's effort.
Further, referring to Fig. 12, which is a flow diagram of the refinement steps of step S40 of the multimedia data processing method shown in Fig. 10, based on the embodiment of Fig. 10 the step S40 includes:

Step S41: insert all of the sub-text data into the preset multimedia document.

In the present embodiment, the acquisition time of the first sub-speech data segment corresponding to each sub-text data item is taken as the time of that sub-text data item, and the converted sub-text data items are inserted into the preset multimedia document in chronological order of those times.
Step S42: insert a photo between two adjacent sub-text data items, such that the shooting time corresponding to the split point between the two sub-text data items corresponds to the shooting time of the photo.

In the present embodiment, the shooting time corresponding to the split point between two adjacent sub-text data items is first obtained; the photo whose shooting time matches that split point is then obtained; finally, that photo is inserted between the two sub-text data items, so that the split point between the two sub-text data items corresponds to the shooting time of the inserted photo.

To better understand the technical scheme provided by this embodiment, refer to Fig. 6, which is a scenario diagram of inserting a photo between two adjacent sub-text data items according to the present invention.
In addition, in the present embodiment, if the user wants the multimedia document to record a travel experience shared with companions, i.e., the user presets two or more target persons for speech data collection, then the collected speech data is matched against the pre-collected reference speech data of each target person to obtain the first speech data matching each target person's reference speech data, and the first speech data of different target persons is marked in different ways. The first speech data of each target person is then divided into a plurality of sub-speech data segments according to the shooting times of the collected photos, and after those segments are converted into text data, the text data corresponding to each target person is likewise marked in a different way.

The plurality of sub-text data items are then inserted into the preset multimedia document with each target person's text data marked differently; for example, the sub-text data corresponding to target person A may be marked blue and the sub-text data corresponding to target person B marked red.

To better understand the technical scheme described in the present embodiment, refer to Fig. 7, another scenario diagram of inserting a photo between two adjacent sub-text data items according to the present invention. In Fig. 7, the text data converted from the speech of target person A and of target person B is distinguished by the weight of the text.
Step S40 described in the present embodiment includes: inserting all of the sub-text data into the preset multimedia document, and inserting a photo between two adjacent sub-text data items such that the shooting time corresponding to the split point between them corresponds to the shooting time of the photo. By inserting all of the sub-text data into the preset multimedia document in chronological order and inserting photos between adjacent sub-text data items according to their shooting times, the present embodiment generates the multimedia document automatically, requiring no manual arrangement by the user and greatly saving the user's effort.
Further, referring to Fig. 13, which is a flow diagram of the second embodiment of the multimedia data processing method of the present invention, based on the embodiment of Fig. 10, in the present embodiment the method further includes, after the step of inserting all of the sub-text data and the photos into the preset multimedia document according to the shooting times:

Step S50: obtain the geographic location information collected while the speech data and the user's photos were being collected; during that collection, the geographic location of the user's area is sampled at a preset time interval.

In the present embodiment, while the speech data and the user's photos are being collected, the user's geographic location during travel is sampled at the preset time interval; for example, the user's location is sampled once every 30 minutes.
Step S60: insert the geographic location information into the multimedia document according to the chronological order of the acquisition times of the geographic location information, the shooting times of the photos, and the times corresponding to the plurality of sub-text data items.

In the present embodiment, the geographic location information is inserted into the preset multimedia document according to the chronological order of its acquisition times, the shooting times of the collected photos, and the times corresponding to the plurality of sub-text data items.

To better understand the technical scheme described in the present embodiment, refer to Fig. 9, a scenario diagram of inserting collected geographic location information into the multimedia document according to the present invention. In Fig. 9, assume the user's location is detected to be "the Forbidden City"; then "-the Forbidden City-" is inserted into the multimedia document at the time point at which that location information was acquired. That is, in Fig. 9, the time point corresponding to the location entry "-the Forbidden City-" is later than that of the "photo" above it and earlier than that of the "text data" below it.

By periodically sampling the user's geographic location during travel and inserting the collected geographic location information into the preset multimedia document in chronological order, the present embodiment effectively records the user's travel route in the generated multimedia document without requiring the user to add it manually, saving the user's effort.
It should be noted that, as used herein, the terms "comprising", "including", or any other variant thereof are intended to be non-exclusive, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes it.

The numbering of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments.

Through the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical scheme of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not thereby limit the scope of its claims; any equivalent structure or equivalent process transformation made using the content of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. An apparatus for processing multimedia data, characterized in that the apparatus comprises:
a first acquisition module, configured to obtain pre-collected speech data and photos and to extract the shooting times of the photos;
a matching module, configured to match the acquired speech data against pre-collected reference speech data of a target person to obtain first speech data matching the reference speech data;
a first conversion module, configured to convert the first speech data into a plurality of corresponding sub-text data items according to the shooting times; and
a first insertion module, configured to insert all of the sub-text data and the photos into a preset multimedia document according to the shooting times.
2. The apparatus for processing multimedia data according to claim 1, characterized in that the first conversion module comprises:
a division module, configured to divide the first speech data into a plurality of first sub-speech data segments, using the shooting times as split points; and
a second conversion module, configured to convert each first sub-speech data segment into a corresponding sub-text data item.
3. The apparatus for processing multimedia data according to claim 2, characterized in that the first insertion module comprises:
a second insertion module, configured to insert all of the sub-text data into the preset multimedia document; and
a third insertion module, configured to insert a photo between two adjacent sub-text data items, such that the shooting time corresponding to the split point between the two sub-text data items corresponds to the shooting time of the photo.
4. The apparatus for processing multimedia data according to claim 1, characterized in that the apparatus further comprises:
a second acquisition module, configured to obtain geographic location information collected while the speech data and the user's photos are being collected, wherein, during that collection, the geographic location of the user's area is sampled at a preset time interval; and
a fourth insertion module, configured to insert the geographic location information into the multimedia document according to the chronological order of the acquisition times of the geographic location information, the shooting times of the photos, and the times corresponding to the plurality of sub-text data items.
5. The apparatus for processing multimedia data according to claim 1, characterized in that the first insertion module is further configured to:
after all of the sub-text data and the photos have been inserted into the preset multimedia document, create a folder named with the current date, and save the multimedia document, the collected speech data, and the photos taken by the user into the folder.
6. A multimedia data processing method, characterized in that the method comprises:
obtaining pre-collected speech data and photos, and extracting the shooting times of the photos;
matching the acquired speech data against pre-collected reference speech data of a target person to obtain first speech data matching the reference speech data;
converting the first speech data into a plurality of corresponding sub-text data items according to the shooting times; and
inserting all of the sub-text data and the photos into a preset multimedia document according to the shooting times.
7. The multimedia data processing method according to claim 6, characterized in that the step of converting the first speech data into a plurality of corresponding sub-text data items according to the shooting times comprises:
dividing the first speech data into a plurality of first sub-speech data segments, using the shooting times as split points; and
converting each first sub-speech data segment into a corresponding sub-text data item.
8. The multimedia data processing method according to claim 7, characterized in that the step of inserting all of the sub-text data and the photos into the preset multimedia document according to the shooting times comprises:
inserting all of the sub-text data into the preset multimedia document; and
inserting a photo between two adjacent sub-text data items, such that the shooting time corresponding to the split point between the two sub-text data items corresponds to the shooting time of the photo.
9. The multimedia data processing method according to claim 6, characterized in that, after the step of inserting all of the sub-text data and the photos into the preset multimedia document according to the shooting times, the method further comprises:
obtaining the geographical location information collected while the speech data and the user-shot photos were being collected, wherein, during that collection, the geographical location information of the user's current region is collected at preset time intervals;
inserting the geographical location information into the multimedia document according to the chronological order of the collection times of the geographical location information, the shooting times of the photos, and the times corresponding to the plurality of sub-text data.
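The merge in claim 9 can be sketched as a stable timestamp sort over the existing document entries and the periodically sampled locations. This is an illustrative reading, not the patent's implementation; entry layouts are hypothetical.

```python
def insert_locations(entries, location_samples):
    """Merge periodically sampled locations into a time-sorted document
    (claim 9 sketch).

    entries: list of (kind, time, payload) already in the document
             (sub-texts and photos).
    location_samples: list of (sample_time, place) collected every
                      preset interval during recording.
    """
    merged = entries + [("location", t, place)
                        for (t, place) in location_samples]
    # sorted() is stable, so entries with equal timestamps keep their
    # original relative order.
    return sorted(merged, key=lambda e: e[1])
```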
10. The multimedia data processing method according to claim 6, characterized in that, after inserting all of the sub-text data and the photos into the preset multimedia document according to the shooting times, the method further comprises:
creating a folder named with the current date, and saving the multimedia document, the collected speech data, and the user-shot photos into the folder, respectively.
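The archiving step of claim 10 maps directly onto standard-library file operations; a minimal sketch, with the function name and `root` parameter being assumptions of this example rather than anything in the filing:

```python
import datetime
import pathlib
import shutil

def archive_outputs(document_path, speech_path, photo_paths, root="."):
    """Create a folder named with the current date and save the
    multimedia document, the collected speech data, and the user-shot
    photos into it (claim 10 sketch)."""
    folder = pathlib.Path(root) / datetime.date.today().isoformat()
    folder.mkdir(parents=True, exist_ok=True)
    for src in (document_path, speech_path, *photo_paths):
        shutil.copy(src, folder / pathlib.Path(src).name)
    return folder
```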
CN201611062262.6A 2016-11-25 2016-11-25 Apparatus for processing multimedia data and method Pending CN106776836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611062262.6A CN106776836A (en) 2016-11-25 2016-11-25 Apparatus for processing multimedia data and method


Publications (1)

Publication Number Publication Date
CN106776836A true CN106776836A (en) 2017-05-31

Family

ID=58901684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611062262.6A Pending CN106776836A (en) 2016-11-25 2016-11-25 Apparatus for processing multimedia data and method

Country Status (1)

Country Link
CN (1) CN106776836A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112422808A (en) * 2019-08-23 2021-02-26 阿里巴巴集团控股有限公司 Method and device for acquiring photos and processing media objects
WO2022057870A1 (en) * 2020-09-17 2022-03-24 华为技术有限公司 Human-computer interaction method, apparatus and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6250928B1 (en) * 1998-06-22 2001-06-26 Massachusetts Institute Of Technology Talking facial display method and apparatus
CN101630324A (en) * 2009-08-18 2010-01-20 北京航空航天大学 Method for accessing geographic position information in multimedia resource
US8340440B2 (en) * 2006-06-07 2012-12-25 Samsung Electronics Co., Ltd Apparatus and method for inserting additional data into image file in electronic device
CN103165131A (en) * 2011-12-17 2013-06-19 富泰华工业(深圳)有限公司 Voice processing system and voice processing method
CN104284219A (en) * 2013-07-11 2015-01-14 Lg电子株式会社 Mobile terminal and method of controlling the mobile terminal
CN104516927A (en) * 2013-09-30 2015-04-15 腾讯科技(深圳)有限公司 Document processing method, device and terminal
CN104794104A (en) * 2015-04-30 2015-07-22 努比亚技术有限公司 Multimedia document generating method and device
CN105138578A (en) * 2015-07-30 2015-12-09 北京奇虎科技有限公司 Sorted storage method for target picture and terminal employing sorted storage method



Similar Documents

Publication Publication Date Title
CN104902212B (en) A kind of video communication method and device
CN106791204A (en) Mobile terminal and its image pickup method
CN106412324A (en) Apparatus and method for prompting focusing object
CN106937039A (en) A kind of imaging method based on dual camera, mobile terminal and storage medium
CN106911883A (en) Camera structure and mobile terminal
CN105224925A (en) Video process apparatus, method and mobile terminal
CN108200273A (en) Desktop icons display methods, terminal and computer readable storage medium
CN106328139A (en) Voice interaction method and voice interaction system
CN105187724B (en) A kind of mobile terminal and method handling image
CN106851128A (en) A kind of video data handling procedure and device based on dual camera
CN106791198A (en) The method of intelligent terminal and fingerprint recognition positioning
CN106776774A (en) Mobile terminal chooses picture device and method
CN106303290A (en) A kind of terminal and the method obtaining video
CN108174236A (en) A kind of media file processing method, server and mobile terminal
CN107948530A (en) A kind of image processing method, terminal and computer-readable recording medium
CN106982273A (en) Mobile terminal and its control method
CN106569709A (en) Device and method for controlling mobile terminal
CN106850941A (en) Method, photo taking and device
CN107592460A (en) A kind of video recording method, equipment and computer-readable storage medium
CN106708321A (en) Touch screen touch method and device and terminal
CN108282578A (en) Shoot based reminding method, mobile terminal and computer readable storage medium
CN109842723A (en) Terminal and its screen brightness control method and computer readable storage medium
CN108124061A (en) The storage method and device of voice data
CN110032887A (en) A kind of picture method for secret protection, terminal and computer readable storage medium
CN108307111A (en) A kind of zoom photographic method, mobile terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170531