CN110505403A - Video recording processing method and device - Google Patents

Video recording processing method and device

Info

Publication number
CN110505403A
CN110505403A
Authority
CN
China
Prior art keywords
target
reference object
sound source
information
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910770432.3A
Other languages
Chinese (zh)
Inventor
Chen Xiaoming (陈晓明)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN201910770432.3A
Publication of CN110505403A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/275 Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present invention provides a video recording processing method and device applied to an electronic device having at least two cameras. The method comprises: collecting, during video recording, an audio signal and at least two images captured simultaneously by the at least two cameras; processing the audio signal to generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source; determining second spatial position information of a target subject from the at least two images; determining, from the second spatial position information and each piece of first spatial position information, the target sound source corresponding to the target subject and the target audio sequence corresponding to that sound source; and generating recording audio from the target audio sequence. The invention improves the audio quality of recorded video.

Description

Video recording processing method and device
Technical field
The present invention relates to the technical field of audio and video, and in particular to a video recording processing method and device.
Background art
At present, when an electronic device records video, it mainly collects images through a camera and collects sound through a microphone to generate the recorded video. The image acquisition channel and the audio acquisition channel are two entirely independent channels; the video is generated mainly by superimposing the data received on the two channels.
In practice, however, the microphone picks up the sound emitted by many sources in the shooting scene. A video recorded in a noisy environment therefore contains not only the sound of the main subject but also the noise emitted by other people, machines, and so on in that scene, so the subject's voice is easily drowned out by ambient sound, degrading the audio of the recorded video.
Summary of the invention
The embodiments of the present invention provide a video recording processing method and device to solve the problem in the related art that the target subject's voice in a recorded video is masked by ambient sound, resulting in poor audio quality.
To solve the above technical problem, the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a video recording processing method applied to an electronic device having at least two cameras, the method comprising:
collecting an audio signal during video recording and at least two images captured simultaneously by the at least two cameras;
processing the audio signal to generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source;
determining second spatial position information of a target subject from the at least two images;
determining, from the second spatial position information and each piece of first spatial position information, the target sound source corresponding to the target subject and the target audio sequence corresponding to the target sound source; and
generating recording audio from the target audio sequence.
In a second aspect, an embodiment of the present invention further provides a video recording processing device having at least two cameras, the device comprising:
a collection module for collecting an audio signal during video recording and at least two images captured simultaneously by the at least two cameras;
a processing module for processing the audio signal to generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source;
a first determining module for determining second spatial position information of a target subject from the at least two images;
a second determining module for determining, from the second spatial position information and each piece of first spatial position information, the target sound source corresponding to the target subject and the target audio sequence corresponding to the target sound source; and
a generating module for generating recording audio from the target audio sequence.
In a third aspect, an embodiment of the present invention further provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the video recording processing method.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the video recording processing method.
In the embodiments of the present invention, at least two images captured by at least two cameras are used to determine the second spatial position information of the target subject, and sound-source separation is performed on the audio signal recorded during video capture to generate multiple audio sequences of different sound sources and the first spatial position information of each source. By combining the first spatial position information with the second spatial position information, the target audio sequence belonging to the target subject can be accurately identified and used to generate the recording audio. Sounds of sources unrelated to the target subject can thus be filtered out of the audio signal during recording, yielding a high-quality video in which the subject's voice is clear. This avoids the problem that the subject's voice in a recorded video is masked by ambient sound, which causes poor audio quality, and improves the audio quality of the recorded video.
Detailed description of the invention
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the video recording processing method of one embodiment of the invention;
Fig. 2 is a block diagram of the video recording processing device of one embodiment of the invention;
Fig. 3 is a schematic diagram of the hardware structure of the electronic device of one embodiment of the invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art without creative effort, based on the embodiments of the invention, fall within the protection scope of the invention.
Referring to Fig. 1, a flowchart of the video recording processing method of one embodiment of the invention is shown. The method is applied to an electronic device having at least two cameras and may specifically comprise the following steps:
Step 101: collect an audio signal during video recording and at least two images captured simultaneously by the at least two cameras.
The at least two cameras of the electronic device face the same direction (for example, both are front cameras or both are rear cameras), and they capture the same scene.
For ease of description, the electronic device is assumed below to have two rear cameras that shoot the same scene simultaneously, yielding two images of that scene. Because there is a certain distance between the two cameras, the first image and the second image obtained when they shoot the same scene at the same time exhibit parallax.
During video recording, the two images captured by the two cameras can be obtained from the cameras, and the audio signal can be collected from the microphone of the electronic device.
Note that the electronic device may have one microphone or several; with several, the collected audio signal consists of the multiple audio channels received by the microphones during recording.
In addition, because the shooting scene contains many sound sources, any single audio channel is very likely a mixture of the sounds of multiple sources.
Therefore, when the collected audio signal comprises multiple channels received by multiple microphones, step 102 can process the multi-channel signal to generate the multiple audio sequences corresponding to different sound sources and the first spatial position information of each source. The implementation principle is similar to S201 and S202 below and is not repeated here.
Compared with performing sound-source separation and sound-source localization on a single channel received by one microphone, performing them on the multi-channel signal received by multiple microphones improves the accuracy of both the separation and the localization.
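With multiple microphones, the arrival-time difference of a source between channels is what constrains its position. The patent does not specify a localization method; the following is a minimal sketch of one standard ingredient, estimating the inter-channel delay by cross-correlation (the signals and the `estimate_delay` helper are illustrative assumptions, not the claimed implementation):

```python
import numpy as np

def estimate_delay(sig_a, sig_b):
    """Estimate the sample delay of sig_b relative to sig_a by locating
    the peak of their full cross-correlation."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Shift the peak index so that zero delay maps to 0.
    return int(np.argmax(corr)) - (len(sig_a) - 1)

# Toy example: the same pulse arrives 3 samples later at microphone B,
# which constrains the source to lie closer to microphone A.
rng = np.random.default_rng(0)
pulse = rng.standard_normal(64)
mic_a = np.concatenate([pulse, np.zeros(8)])
mic_b = np.concatenate([np.zeros(3), pulse, np.zeros(5)])
delay = estimate_delay(mic_a, mic_b)
```

In practice, array-localization systems typically use a weighted variant such as GCC-PHAT for robustness to reverberation; the plain correlation above only illustrates the principle.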
Step 102: process the audio signal to generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source.
Any conventional or future sound-source separation and localization technique can be used to analyze the audio signal, so as to isolate from it the multiple audio sequences emitted by different sources and the first spatial position information of the source corresponding to each sequence.
The first spatial position information may be a three-dimensional coordinate range.
Optionally, in one embodiment, step 102 can be implemented by S201 and S202:
S201: extract feature information of the audio signal.
The feature information includes, but is not limited to, phase, frequency, and intensity.
S202: process the feature information to generate the multiple audio sequences corresponding to different sound sources and the first spatial position information of each source.
The feature information can be processed by deep learning, for example by learning on the audio signal based on the features, so as to separate from the signal the multiple audio sequences of different sources and the first spatial position information of each source.
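As a concrete illustration of S201, per-frequency intensity and phase features can be read off a short-time Fourier transform. This is a minimal numpy sketch; the frame length and hop size are arbitrary choices, and the patent does not prescribe any particular feature extractor:

```python
import numpy as np

def stft_features(signal, frame_len=256, hop=128):
    """Frame the signal, window it, and return per-frame magnitude
    (intensity per frequency bin) and phase -- raw material for
    separation and localization."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)
    return np.abs(spectrum), np.angle(spectrum)

# Toy example: a 1 kHz tone sampled at 8 kHz concentrates its energy
# in one frequency bin (1000 / (8000/256) = bin 32) of every frame.
sr = 8000
t = np.arange(sr) / sr
magnitude, phase = stft_features(np.sin(2 * np.pi * 1000 * t))
peak_bin = int(np.argmax(magnitude[0]))
```

A learned separator would consume such time-frequency features (often as magnitude plus phase, or the complex spectrogram) and emit one masked spectrogram per source.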
For example, suppose the recording is meant to capture a video of user A, but the shooting scene is an amusement park. During recording there is not only user A's voice but also the clamor of other users and the environment. Every individual picked up by the microphone is a sound source, yet only user A's voice is the main sound of this recording in that scene. To improve the audio quality of the recording and prevent user A's voice from being masked by other background sounds, this step must separate the received audio channel(s) into audio sequences by source and determine the spatial position of each source.
In the embodiments of the present invention, extracting the feature information of the audio signal and using it to determine the multiple audio sequences corresponding to different sources in the signal improves the accuracy of classifying the sequences by source, and using the same feature information to determine the spatial position information of the source of each sequence improves the accuracy of sound-source localization.
Step 103: determine second spatial position information of the target subject from the at least two images.
Continuing the example above, the target subject is user A. Because of the spacing between the at least two cameras, the two captured images differ in parallax; since they capture identical content and differ only in parallax, the two images can be used to determine user A's spatial information, such as three-dimensional position and azimuth.
Note that the invention places no restriction on the execution order of steps 102 and 103; both are executed after step 101, and step 104 is executed after both have finished.
Optionally, in one embodiment, step 103 can be implemented as follows:
S301: perform face recognition on either one of the at least two images (the target image) to determine the target subject and the two-dimensional position information of the target subject in the target image.
Because the two images capture the same objects and differ only in parallax, face recognition can be performed on either one of them, called the target image here, to determine the target subject in the target image and the subject's two-dimensional position information in that image.
Since the images are two-dimensional, once the target subject, e.g. user A's face region, is found in the target image, the region where user A lies in that image, i.e. a two-dimensional coordinate range (the two-dimensional position information here), can be determined.
When determining the target subject in the target image, if the image contains multiple faces, it must be decided which of them is user A's face.
Optionally, in one embodiment, considering that in most scenes the target subject is the object closest to the camera, the area of the subject's face image is larger than that of any other object's face image. The subject (i.e. the person region) corresponding to the face image with the largest area can therefore be taken as the target subject, and its two-dimensional coordinate range in the target image determined. This improves the efficiency of determining the target subject and its two-dimensional coordinate range.
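The largest-face heuristic of this embodiment reduces to a one-line selection over detector output. A minimal sketch, assuming bounding boxes come from some unspecified face detector as (x, y, width, height) tuples:

```python
def pick_subject_face(face_boxes):
    """Heuristic from the embodiment: the subject is usually closest to
    the camera, so its face image has the largest bounding-box area."""
    return max(face_boxes, key=lambda box: box[2] * box[3])

# Toy example: three detected faces; the 80x90 box wins.
faces = [(10, 10, 40, 50), (200, 40, 80, 90), (300, 20, 30, 30)]
subject = pick_subject_face(faces)
```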
Optionally, in another embodiment, the multiple face images recognized in the target image can each be matched against a pre-stored face image of user A (the target subject), so that the face image belonging to user A is found among them and user A's two-dimensional coordinate range in the target image is determined. This improves the accuracy of recognizing the target subject and of recognizing its two-dimensional coordinate range.
S302: perform three-dimensional reconstruction on the at least two images to obtain a three-dimensional model.
The three-dimensional reconstruction of the at least two two-dimensional images into a three-dimensional model can use any known or future reconstruction method; the invention places no restriction on it.
Three-dimensional reconstruction is illustrated below, taking the at least two images to be a first image and a second image.
Since the first and second images are two-dimensional images of the three-dimensional objects in the shooting scene, an effective imaging model can be established by calibrating the two cameras and solving their intrinsic and extrinsic parameters; combined with the image matching results, the three-dimensional coordinates of points in space can then be obtained, achieving three-dimensional reconstruction. Next, features are extracted from the first and second images, mainly feature points, feature lines, and feature regions. A correspondence between the image pair is established from the extracted features, that is, the imaging points of the same physical point in the two different images are put into one-to-one correspondence, completing the stereo matching process. With an accurate matching result and the intrinsic and extrinsic parameters from camera calibration, the three-dimensional scene information can be recovered.
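Under the standard rectified-stereo model that such calibration produces, the parallax (disparity) of a matched point converts to depth as z = f·B/d. A worked sketch with illustrative numbers; the focal length and baseline are assumptions, not values from the patent:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Rectified stereo: a point that shifts by `disparity_px` pixels
    between the two views lies at depth f * B / d."""
    return focal_px * baseline_m / disparity_px

# Toy example: 700 px focal length, 12 mm baseline (plausible for a
# dual-camera phone); a 10-pixel disparity puts the point 0.84 m away.
z = depth_from_disparity(disparity_px=10.0, focal_px=700.0, baseline_m=0.012)
```

Applying this relation to every matched pixel yields exactly the dense depth information the three-dimensional model of S302 is said to provide.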
S303: determine the second spatial position information of the target subject from the two-dimensional position information and the three-dimensional model.
S302 yields a three-dimensional model of the shooting scene, which may contain three-dimensional information such as the three-dimensional coordinates and azimuth of every point (corresponding to the image pixels), and S301 yields user A's two-dimensional position information (i.e. two-dimensional coordinate range) in the two-dimensional target image of the scene. Since the model and the image both correspond to the same scene, user A's three-dimensional information (e.g. depth information, two-dimensional coordinate range, azimuth) can be determined from the model and user A's two-dimensional position information.
In the embodiments of the present invention, the at least two images of the same scene captured by the at least two cameras are used to generate a three-dimensional model of the scene, and the model together with the target subject's two-dimensional position information is used to determine the subject's spatial position information, which can be obtained accurately in this way. Moreover, because the constructed model shows the depth of every pixel, no demanding camera configuration is needed (the camera's parameter requirements are modest, and no TOF (time-of-flight) hardware is required to generate a three-dimensional model and obtain the subject's depth), reducing the cost of the electronic device.
In S303, the two-dimensional position information can be matched against the three-dimensional model to determine the depth information in the model corresponding to that two-dimensional position information, and the second spatial position information of the target subject is then determined from the two-dimensional position information and the depth information.
Because the model shows three-dimensional information such as the three-dimensional coordinates and azimuth of each pixel in the scene, the subject's two-dimensional position information can be matched with the model's three-dimensional coordinates to determine the corresponding depth (taking one coordinate (x1, y1) of the two-dimensional position as an example: if the z coordinate of the model point whose two-dimensional coordinates are (x1, y1) is z1, then the depth value corresponding to (x1, y1) in the model is z1). The three-dimensional coordinate of that pixel of the target subject is thus (x1, y1, z1); since the subject occupies a two-dimensional coordinate range in the image, its three-dimensional coordinate range can be obtained in the same way. The second spatial position information of the target subject may include this three-dimensional coordinate range and may further include azimuth information.
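The (x1, y1) → z1 lookup described above, extended over the subject's whole two-dimensional coordinate range, can be sketched as follows (the depth map standing in for the three-dimensional model is a toy array; a real one would come from S302):

```python
import numpy as np

def region_to_3d(depth_map, region):
    """Look up the depth of every pixel in the subject's 2D region and
    return its 3D coordinate range (per-axis min and max), mirroring
    the (x1, y1) -> z1 lookup described above."""
    x0, y0, x1, y1 = region
    points = [(x, y, float(depth_map[y, x]))
              for y in range(y0, y1) for x in range(x0, x1)]
    pts = np.array(points)
    return pts.min(axis=0), pts.max(axis=0)

# Toy example: the subject occupies a flat, 2 m-deep patch of a 4x4
# depth map; its 3D coordinate range spans x,y in [1, 2] at z = 2.
depth = np.full((4, 4), 2.0)
lo, hi = region_to_3d(depth, region=(1, 1, 3, 3))
```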
Step 104: determine, from the second spatial position information and each piece of first spatial position information, the target sound source corresponding to the target subject and the target audio sequence corresponding to the target sound source.
Optionally, in one embodiment, step 104 can be implemented as follows: match the multiple first spatial positions of the different sound sources against the second spatial position; determine the target first spatial position with the highest matching degree; take the sound source corresponding to that target first spatial position as the target sound source of the target subject; and take the audio sequence corresponding to the target sound source, among the multiple audio sequences, as the target audio sequence.
Step 102 yields the first spatial position information of each sound source in the recorded scene, and step 103 yields the second spatial position information of the target subject in the same scene. To determine which source among those of step 102 is the target subject, each of the multiple first spatial positions can be matched against the second spatial position, producing multiple matching degrees; the source corresponding to the target first spatial position with the highest matching degree is taken as the target subject's sound source, and that source's audio sequence is the target subject's audio sequence (i.e. the voice data the subject produced during recording).
In the embodiments of the present invention, the spatial positions of the audio sequences of the different sources in the same shooting scene, together with the spatial position of the target subject, make it possible to accurately identify the target sound source corresponding to the target subject and its audio sequence, so that the subject's audio sequence can be accurately found among the separated sequences.
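Taking Euclidean distance as one plausible matching degree (the patent leaves the matching function unspecified), the position matching of step 104 can be sketched as:

```python
def match_target_source(sources, subject_pos):
    """Step 104: compare each source's first spatial position with the
    subject's second spatial position and keep the best match
    (here: smallest Euclidean distance = highest matching degree)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return min(sources, key=lambda s: dist(s["position"], subject_pos))

# Toy example: three separated sources; the subject standing near
# (2.0, 0.1, 3.0) matches the source localized at (2.1, 0.0, 2.9).
sources = [
    {"position": (0.0, 0.0, 1.0), "audio": "seq-A"},
    {"position": (2.1, 0.0, 2.9), "audio": "seq-B"},
    {"position": (-1.0, 0.5, 4.0), "audio": "seq-C"},
]
target = match_target_source(sources, subject_pos=(2.0, 0.1, 3.0))
```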
Optionally, in another embodiment, step 104 can also be implemented as follows:
match the multiple first spatial positions of the different sound sources against the second spatial position, and determine the target first spatial positions whose matching degree exceeds a preset threshold (there may be one or several);
when there is one target first spatial position, take the sound source corresponding to it as the target sound source of the target subject, and take the audio sequence corresponding to the target sound source, among the multiple audio sequences, as the target audio sequence;
when there are several target first spatial positions, identify the multiple first sound sources corresponding to them; determine the multiple first audio sequences, among the multiple audio sequences, that correspond to these first sources; match the voiceprint feature of each first audio sequence against the voiceprint feature of the target subject; determine the first audio sequence with the highest voiceprint matching degree among the multiple first audio sequences; take that first audio sequence as the target audio sequence; and take its sound source as the target sound source of the target subject.
In this way, when several target first spatial positions lie close to the target subject's second spatial position, the subject's voiceprint feature can be used to screen the multiple first sources corresponding to those positions, thereby determining the subject's target sound source and target audio sequence and improving the accuracy of recognizing the target source's audio sequence among the multiple sequences.
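The voiceprint tie-break can be sketched with cosine similarity over embedding vectors (the fixed-length embeddings and the similarity measure are assumptions; the patent does not specify a voiceprint model):

```python
import numpy as np

def pick_by_voiceprint(candidates, subject_print):
    """Tie-break among several position-matched sources: keep the audio
    sequence whose voiceprint embedding is most cosine-similar to the
    subject's stored voiceprint."""
    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return max(candidates, key=lambda c: cosine(c["voiceprint"], subject_print))

# Toy example: two candidates near the subject; the embeddings are
# hypothetical vectors from an unspecified voiceprint extractor.
subject = np.array([1.0, 0.0, 1.0])
cands = [
    {"audio": "seq-1", "voiceprint": np.array([0.9, 0.1, 1.1])},
    {"audio": "seq-2", "voiceprint": np.array([0.0, 1.0, 0.0])},
]
best = pick_by_voiceprint(cands, subject)
```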
Step 105: generate the recording audio from the target audio sequence.
Optionally, in one embodiment, the target audio sequence can be taken directly as the recording audio.
Optionally, in another embodiment, noise reduction can also be applied to the target audio sequence to generate the recording audio.
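One simple form such noise reduction could take is spectral gating, which suppresses frequency bins below an estimated noise floor (purely illustrative; the embodiment does not name a denoising method, and the noise-floor value here is an assumption):

```python
import numpy as np

def denoise(audio, noise_floor):
    """Very simple noise reduction for the separated target sequence:
    zero out frequency bins whose magnitude stays under an estimated
    noise floor, keeping only the dominant spectral content."""
    spec = np.fft.rfft(audio)
    mask = np.abs(spec) >= noise_floor
    return np.fft.irfft(spec * mask, n=len(audio))

# Toy example: a strong tone plus weak wideband noise; gating keeps the
# tone's bin and discards the noise energy in the other bins.
rng = np.random.default_rng(1)
n = 1024
tone = np.sin(2 * np.pi * 50 * np.arange(n) / n)
noisy = tone + 0.01 * rng.standard_normal(n)
clean = denoise(noisy, noise_floor=3.0)
```

Real recorders would more likely use a perceptually tuned suppressor, but the gate shows how residual background energy in the target sequence can be reduced before muxing the audio into the video.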
In the embodiments of the present invention, at least two images captured by at least two cameras are used to determine the second spatial position information of the target subject, and sound-source separation is performed on the audio signal recorded during video capture to generate multiple audio sequences of different sound sources and the first spatial position information of each source. By combining the first spatial position information with the second spatial position information, the target audio sequence belonging to the target subject can be accurately identified and used to generate the recording audio. Sounds of sources unrelated to the target subject can thus be filtered out of the audio signal during recording, yielding a high-quality video in which the subject's voice is clear, avoiding the problem that the subject's voice is masked by ambient sound and the recorded audio quality suffers, and improving the audio quality of the recorded video.
In addition, in the embodiments of the present invention, unlike the simple combination of image and audio used in conventional recording, the method uses the at least two images captured by the at least two cameras to generate a three-dimensional model, making it easy to determine the target subject's depth information, and combines the face in the image with the audio in the recording. It fully exploits the spatial-position association between the shared environment, the target subject, and the sound source to obtain a recording that contains only the main subject's sound, significantly improving the recording audio quality of the electronic device.
Referring to Fig. 2, the block diagram of the record processing device of one embodiment of the invention is shown.The record processing device With at least two cameras, the record processing device of the embodiment of the present invention is able to achieve the video record processing side in above-described embodiment The details of method, and reach identical effect.Record processing device shown in Fig. 2 includes:
an acquisition module 21, configured to acquire an audio signal during video recording and at least two images captured simultaneously by the at least two cameras;
a processing module 22, configured to process the audio signal to generate multiple audio sequences corresponding to different sound sources and the first spatial location information of each sound source;
a first determining module 23, configured to determine the second spatial location information of the target reference object according to the at least two images;
a second determining module 24, configured to determine, according to the second spatial location information and each piece of first spatial location information, the target sound source corresponding to the target reference object and the target audio sequence corresponding to the target sound source;
a generation module 25, configured to generate the recording audio according to the target audio sequence.
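Taken together, modules 21 through 25 form a single pipeline: separate the audio into per-source sequences with positions, locate the subject visually, match positions, and build the recording audio. A minimal sketch of that flow is given below; every callable is a hypothetical placeholder for the corresponding sub-step, not an implementation the patent specifies.

```python
def process_recording(frames, audio, separate, localize_subject, match, mix):
    """Sketch of the five-module pipeline (all callables are placeholders)."""
    sequences, positions = separate(audio)                 # processing module 22
    subject_pos = localize_subject(frames)                 # first determining module 23
    target_seq = match(subject_pos, positions, sequences)  # second determining module 24
    return mix(target_seq)                                 # generation module 25

# Toy stand-ins (hypothetical): two sources, subject nearest the first one.
separate = lambda audio: ([[1, 2], [9, 9]], [(0.0, 0.0, 2.0), (2.0, 0.0, 5.0)])
localize_subject = lambda frames: (0.1, 0.0, 2.0)
match = lambda pos, ps, seqs: seqs[
    min(range(len(ps)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pos, ps[i])))]
mix = lambda seq: seq

out = process_recording(None, None, separate, localize_subject, match, mix)
# out is the audio sequence of the source nearest the subject
```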
Optionally, the first determining module 23 includes:
an identification submodule, configured to perform face recognition on any one target image of the at least two images to determine the target reference object and the two-dimensional position information of the target reference object in the target image;
a reconstruction submodule, configured to perform three-dimensional reconstruction on the at least two images to obtain a three-dimensional model;
a first determining submodule, configured to determine the second spatial location information of the target reference object according to the two-dimensional position information and the three-dimensional model.
Optionally, the first determining submodule includes:
a first determining unit, configured to match the two-dimensional position information against the three-dimensional model to determine the depth information in the three-dimensional model corresponding to the two-dimensional position information;
a second determining unit, configured to determine the second spatial location information of the target reference object according to the two-dimensional position information and the depth information.
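The step of combining the face's two-dimensional position with the depth recovered from the three-dimensional model can be sketched as a pinhole back-projection. This is a minimal sketch under assumptions the patent does not fix: a calibrated pinhole camera (intrinsics `fx`, `fy`, `cx`, `cy`) and a dense per-pixel depth map derived from the reconstruction.

```python
import numpy as np

def to_3d_position(face_xy, depth_map, fx, fy, cx, cy):
    """Back-project a 2D face location to a 3D camera-space point.

    face_xy        : (u, v) pixel coordinates of the target subject
    depth_map      : HxW array of per-pixel depth (assumed derived
                     from the stereo 3D model)
    fx, fy, cx, cy : pinhole intrinsics (assumed known from calibration)
    """
    u, v = face_xy
    z = float(depth_map[v, u])    # depth at the face location
    x = (u - cx) * z / fx         # standard pinhole back-projection
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Toy depth map: everything 2 m from the camera.
depth = np.full((480, 640), 2.0)
p = to_3d_position((320, 240), depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
# p → [0. 0. 2.] (a face at the principal point, 2 m away)
```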
Optionally, the processing module 22 includes:
an extraction submodule, configured to extract feature information of the audio signal;
a processing submodule, configured to process the feature information to generate multiple audio sequences corresponding to different sound sources and the first spatial location information of each sound source.
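One common way to obtain per-source spatial information from a multi-microphone signal is time-difference-of-arrival (TDOA) estimation. The sketch below uses plain cross-correlation between two microphone channels to estimate a bearing angle; this is an illustrative assumption, since the patent does not specify which feature extraction or separation algorithm is used, and the sign convention depends on the microphone geometry.

```python
import numpy as np

def tdoa_bearing(sig_a, sig_b, fs, mic_distance, c=343.0):
    """Estimate a source's bearing (degrees) from the arrival-time
    difference between two microphones via cross-correlation.
    A toy stand-in for the patent's first spatial location information."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)  # delay in samples
    tau = lag / fs                                  # delay in seconds
    sin_theta = np.clip(tau * c / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))

fs = 48000
t = np.arange(0, 0.01, 1 / fs)           # 10 ms of a 1 kHz tone
tone = np.sin(2 * np.pi * 1000 * t)
delay = 3                                 # channel b lags by 3 samples
sig_a = np.concatenate([tone, np.zeros(delay)])
sig_b = np.concatenate([np.zeros(delay), tone])
angle = tdoa_bearing(sig_a, sig_b, fs, mic_distance=0.1)
# a 3-sample lag at 48 kHz over a 10 cm baseline gives roughly a 12° bearing
```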
Optionally, the recording processing apparatus has multiple microphones;
the acquisition module 21 is further configured to acquire multiple channels of audio signals received by the multiple microphones during video recording.
Optionally, the second determining module 24 includes:
a matching submodule, configured to match each of the multiple pieces of first spatial location information corresponding to the different sound sources against the second spatial location information, and determine the target first spatial location information with the highest matching degree;
a second determining submodule, configured to determine the sound source corresponding to the target first spatial location information as the target sound source of the target reference object;
a third determining submodule, configured to determine the audio sequence corresponding to the target sound source among the multiple audio sequences as the target audio sequence.
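The matching step can be illustrated as nearest-neighbour matching between the subject's visually determined position and each separated source's estimated position. Nearest-neighbour distance is an assumption for illustration; the patent only requires selecting the first spatial location information with the "highest matching degree".

```python
import numpy as np

def pick_target_source(target_pos, source_positions, audio_sequences):
    """Return (index, sequence) of the source whose estimated position
    best matches the subject's 3D position (smallest Euclidean distance,
    assumed here as the matching-degree criterion)."""
    dists = [np.linalg.norm(np.array(target_pos) - np.array(p))
             for p in source_positions]
    idx = int(np.argmin(dists))
    return idx, audio_sequences[idx]

# Hypothetical data: the subject stands at the first source's position.
sources = [(0.0, 0.0, 2.0), (1.5, 0.2, 3.0)]
seqs = ["voice_track", "background_track"]
idx, seq = pick_target_source((0.1, 0.0, 2.1), sources, seqs)
# idx == 0: the subject's own voice track is selected
```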
The recording processing apparatus provided by the embodiment of the present invention can implement each process implemented by the electronic device in the above method embodiment; to avoid repetition, details are not described herein again.
Fig. 3 is a schematic diagram of the hardware structure of an electronic device implementing each embodiment of the present invention. The electronic device 400 has at least two cameras and at least one microphone; preferably, the electronic device 400 has multiple microphones.
The electronic device 400 includes, but is not limited to: a radio frequency unit 401, a network module 402, an audio output unit 403, an input unit 404, a sensor 405, a display unit 406, a user input unit 407, an interface unit 408, a memory 409, a processor 410, a power supply 411, and other components. Those skilled in the art will understand that the electronic device structure shown in Fig. 3 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than illustrated, combine certain components, or use a different component arrangement. In embodiments of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle terminal, a wearable device, a pedometer, and the like.
The input unit 404 is configured to acquire the audio signal during video recording and at least two images captured simultaneously by the at least two cameras.
The processor 410 is configured to process the audio signal acquired by the input unit 404 to generate multiple audio sequences corresponding to different sound sources and the first spatial location information of each sound source; determine the second spatial location information of the target reference object according to the at least two images acquired by the input unit 404; determine, according to the second spatial location information and each piece of first spatial location information, the target sound source corresponding to the target reference object and the target audio sequence corresponding to the target sound source; and generate the recording audio according to the target audio sequence.
It should be understood that, in the embodiments of the present invention, the radio frequency unit 401 may be configured to receive and send signals during information transmission and reception or during a call. Specifically, after receiving downlink data from a base station, it delivers the data to the processor 410 for processing, and it sends uplink data to the base station. In general, the radio frequency unit 401 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 401 may also communicate with a network and other devices through a wireless communication system.
The electronic device provides the user with wireless broadband Internet access through the network module 402, for example, helping the user send and receive e-mail, browse web pages, and access streaming media.
The audio output unit 403 may convert audio data received by the radio frequency unit 401 or the network module 402, or stored in the memory 409, into an audio signal and output it as sound. Moreover, the audio output unit 403 may also provide audio output related to a specific function performed by the electronic device 400 (for example, a call signal reception sound or a message reception sound). The audio output unit 403 includes a loudspeaker, a buzzer, a receiver, and the like.
The input unit 404 is configured to receive an audio or video signal. The input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042. The graphics processing unit 4041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 406. The image frames processed by the graphics processing unit 4041 may be stored in the memory 409 (or other storage medium) or sent via the radio frequency unit 401 or the network module 402. The microphone 4042 can receive sound and process such sound into audio data. In a telephone call mode, the processed audio data may be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 401 and output.
The electronic device 400 further includes at least one sensor 405, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor includes an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 4061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 4061 and/or the backlight when the electronic device 400 is moved close to the ear. As a kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the electronic device (such as landscape/portrait switching, related games, and magnetometer pose calibration) and for vibration-recognition functions (such as a pedometer or tap detection). The sensor 405 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like; details are not described herein.
The display unit 406 is configured to display information input by the user or information provided to the user. The display unit 406 may include a display panel 4061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
The user input unit 407 may be configured to receive input numeric or character information and to generate key signal input related to user settings and function control of the electronic device. Specifically, the user input unit 407 includes a touch panel 4071 and other input devices 4072. The touch panel 4071, also referred to as a touch screen, collects touch operations by the user on or near it (for example, operations performed by the user on or near the touch panel 4071 using a finger, a stylus, or any other suitable object or accessory). The touch panel 4071 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the touch position of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends it to the processor 410, and receives and executes commands sent by the processor 410. Furthermore, the touch panel 4071 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 4071, the user input unit 407 may also include other input devices 4072. Specifically, the other input devices 4072 may include, but are not limited to, a physical keyboard, function keys (such as a volume control key and a switch key), a trackball, a mouse, and a joystick; details are not described herein.
Further, the touch panel 4071 may cover the display panel 4061. After detecting a touch operation on or near it, the touch panel 4071 transmits it to the processor 410 to determine the type of the touch event, and the processor 410 then provides a corresponding visual output on the display panel 4061 according to the type of the touch event. Although in Fig. 3 the touch panel 4071 and the display panel 4061 implement the input and output functions of the electronic device as two independent components, in some embodiments the touch panel 4071 and the display panel 4061 may be integrated to implement the input and output functions of the electronic device; this is not specifically limited herein.
The interface unit 408 is an interface through which an external apparatus is connected to the electronic device 400. For example, the external apparatus may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting an apparatus having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 408 may be configured to receive input (for example, data information or power) from an external apparatus and transfer the received input to one or more elements within the electronic device 400, or may be configured to transmit data between the electronic device 400 and an external apparatus.
The memory 409 may be configured to store software programs and various data. The memory 409 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book). In addition, the memory 409 may include a high-speed random access memory, and may also include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.
The processor 410 is the control center of the electronic device. It connects all parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 409 and invoking the data stored in the memory 409, thereby monitoring the electronic device as a whole. The processor 410 may include one or more processing units; preferably, the processor 410 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 410.
The electronic device 400 may further include a power supply 411 (such as a battery) that supplies power to each component. Preferably, the power supply 411 may be logically connected to the processor 410 through a power management system, so as to implement functions such as charge management, discharge management, and power consumption management through the power management system.
In addition, the electronic device 400 includes some functional modules that are not shown; details are not described herein.
Preferably, an embodiment of the present invention further provides an electronic device, including a processor 410, a memory 409, and a computer program stored on the memory 409 and executable on the processor 410. When executed by the processor 410, the computer program implements each process of the above video recording processing method embodiment and can achieve the same technical effect; to avoid repetition, details are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above video recording processing method embodiment and can achieve the same technical effect; to avoid repetition, details are not described herein again. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article, or apparatus. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes the element.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above specific embodiments. The above specific embodiments are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can make many other forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (10)

1. A video recording processing method, applied to an electronic device having at least two cameras, characterized in that the method comprises:
acquiring an audio signal during video recording and at least two images captured simultaneously by the at least two cameras;
processing the audio signal to generate multiple audio sequences corresponding to different sound sources and first spatial location information of each sound source;
determining second spatial location information of a target reference object according to the at least two images;
determining, according to the second spatial location information and each piece of first spatial location information, a target sound source corresponding to the target reference object and a target audio sequence corresponding to the target sound source;
generating recording audio according to the target audio sequence.
2. The method according to claim 1, characterized in that determining the second spatial location information of the target reference object according to the at least two images comprises:
performing face recognition on any one target image of the at least two images to determine the target reference object and two-dimensional position information of the target reference object in the target image;
performing three-dimensional reconstruction on the at least two images to obtain a three-dimensional model;
determining the second spatial location information of the target reference object according to the two-dimensional position information and the three-dimensional model.
3. The method according to claim 2, characterized in that determining the second spatial location information of the target reference object according to the two-dimensional position information and the three-dimensional model comprises:
matching the two-dimensional position information against the three-dimensional model to determine depth information in the three-dimensional model corresponding to the two-dimensional position information;
determining the second spatial location information of the target reference object according to the two-dimensional position information and the depth information.
4. The method according to claim 1, characterized in that the electronic device has multiple microphones, and acquiring the audio signal during video recording comprises:
acquiring multiple channels of audio signals received by the multiple microphones during video recording.
5. The method according to claim 1, characterized in that determining, according to the second spatial location information and each piece of first spatial location information, the target sound source corresponding to the target reference object and the target audio sequence corresponding to the target sound source comprises:
matching each of the multiple pieces of first spatial location information corresponding to the different sound sources against the second spatial location information, and determining target first spatial location information with the highest matching degree;
determining a sound source corresponding to the target first spatial location information as the target sound source of the target reference object;
determining an audio sequence corresponding to the target sound source among the multiple audio sequences as the target audio sequence.
6. A recording processing apparatus, characterized in that the recording processing apparatus has at least two cameras, and the recording processing apparatus comprises:
an acquisition module, configured to acquire an audio signal during video recording and at least two images captured simultaneously by the at least two cameras;
a processing module, configured to process the audio signal to generate multiple audio sequences corresponding to different sound sources and first spatial location information of each sound source;
a first determining module, configured to determine second spatial location information of a target reference object according to the at least two images;
a second determining module, configured to determine, according to the second spatial location information and each piece of first spatial location information, a target sound source corresponding to the target reference object and a target audio sequence corresponding to the target sound source;
a generation module, configured to generate recording audio according to the target audio sequence.
7. The recording processing apparatus according to claim 6, characterized in that the first determining module comprises:
an identification submodule, configured to perform face recognition on any one target image of the at least two images to determine the target reference object and two-dimensional position information of the target reference object in the target image;
a reconstruction submodule, configured to perform three-dimensional reconstruction on the at least two images to obtain a three-dimensional model;
a first determining submodule, configured to determine the second spatial location information of the target reference object according to the two-dimensional position information and the three-dimensional model.
8. The recording processing apparatus according to claim 7, characterized in that the first determining submodule comprises:
a first determining unit, configured to match the two-dimensional position information against the three-dimensional model to determine depth information in the three-dimensional model corresponding to the two-dimensional position information;
a second determining unit, configured to determine the second spatial location information of the target reference object according to the two-dimensional position information and the depth information.
9. The recording processing apparatus according to claim 6, characterized in that the recording processing apparatus has multiple microphones;
the acquisition module is further configured to acquire multiple channels of audio signals received by the multiple microphones during video recording.
10. The recording processing apparatus according to claim 6, characterized in that the second determining module comprises:
a matching submodule, configured to match each of the multiple pieces of first spatial location information corresponding to the different sound sources against the second spatial location information, and determine target first spatial location information with the highest matching degree;
a second determining submodule, configured to determine a sound source corresponding to the target first spatial location information as the target sound source of the target reference object;
a third determining submodule, configured to determine an audio sequence corresponding to the target sound source among the multiple audio sequences as the target audio sequence.
CN201910770432.3A 2019-08-20 2019-08-20 A kind of video record processing method and device Pending CN110505403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910770432.3A CN110505403A (en) 2019-08-20 2019-08-20 A kind of video record processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910770432.3A CN110505403A (en) 2019-08-20 2019-08-20 A kind of video record processing method and device

Publications (1)

Publication Number Publication Date
CN110505403A true CN110505403A (en) 2019-11-26

Family

ID=68589018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910770432.3A Pending CN110505403A (en) 2019-08-20 2019-08-20 A kind of video record processing method and device

Country Status (1)

Country Link
CN (1) CN110505403A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113099031A (en) * 2021-02-26 2021-07-09 华为技术有限公司 Sound recording method and related equipment
CN113329138A (en) * 2021-06-03 2021-08-31 维沃移动通信有限公司 Video shooting method, video playing method and electronic equipment
WO2021175165A1 (en) * 2020-03-06 2021-09-10 华为技术有限公司 Audio processing method and device
CN113473057A (en) * 2021-05-20 2021-10-01 华为技术有限公司 Video recording method and electronic equipment
CN113472943A (en) * 2021-06-30 2021-10-01 维沃移动通信有限公司 Audio processing method, device, equipment and storage medium
CN113658254A (en) * 2021-07-28 2021-11-16 深圳市神州云海智能科技有限公司 Method and device for processing multi-modal data and robot
WO2023045980A1 (en) * 2021-09-24 2023-03-30 北京有竹居网络技术有限公司 Audio signal playing method and apparatus, and electronic device
CN116095254A (en) * 2022-05-30 2023-05-09 荣耀终端有限公司 Audio processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077804A (en) * 2014-06-09 2014-10-01 广州嘉崎智能科技有限公司 Method for constructing three-dimensional human face model based on multi-frame video image
CN105578097A (en) * 2015-07-10 2016-05-11 宇龙计算机通信科技(深圳)有限公司 Video recording method and terminal
CN105847660A (en) * 2015-06-01 2016-08-10 维沃移动通信有限公司 Dynamic zoom method, device and intelligent device
US20180124320A1 (en) * 2014-05-12 2018-05-03 Gopro, Inc. Dual-microphone camera
CN108876835A (en) * 2018-03-28 2018-11-23 北京旷视科技有限公司 Depth information detection method, device and system and storage medium
CN109683135A (en) * 2018-12-28 2019-04-26 科大讯飞股份有限公司 A kind of sound localization method and device, target capturing system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180124320A1 (en) * 2014-05-12 2018-05-03 Gopro, Inc. Dual-microphone camera
CN104077804A (en) * 2014-06-09 2014-10-01 广州嘉崎智能科技有限公司 Method for constructing three-dimensional human face model based on multi-frame video image
CN105847660A (en) * 2015-06-01 2016-08-10 维沃移动通信有限公司 Dynamic zoom method, device and intelligent device
CN105578097A (en) * 2015-07-10 2016-05-11 宇龙计算机通信科技(深圳)有限公司 Video recording method and terminal
CN108876835A (en) * 2018-03-28 2018-11-23 北京旷视科技有限公司 Depth information detection method, device and system and storage medium
CN109683135A (en) * 2018-12-28 2019-04-26 科大讯飞股份有限公司 A kind of sound localization method and device, target capturing system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021175165A1 (en) * 2020-03-06 2021-09-10 华为技术有限公司 Audio processing method and device
CN113099031A (en) * 2021-02-26 2021-07-09 华为技术有限公司 Sound recording method and related equipment
CN113473057A (en) * 2021-05-20 2021-10-01 华为技术有限公司 Video recording method and electronic equipment
CN113329138A (en) * 2021-06-03 2021-08-31 维沃移动通信有限公司 Video shooting method, video playing method and electronic equipment
CN113472943A (en) * 2021-06-30 2021-10-01 维沃移动通信有限公司 Audio processing method, device, equipment and storage medium
CN113472943B (en) * 2021-06-30 2022-12-09 维沃移动通信有限公司 Audio processing method, device, equipment and storage medium
CN113658254A (en) * 2021-07-28 2021-11-16 深圳市神州云海智能科技有限公司 Method and device for processing multi-modal data and robot
WO2023045980A1 (en) * 2021-09-24 2023-03-30 北京有竹居网络技术有限公司 Audio signal playing method and apparatus, and electronic device
CN116095254A (en) * 2022-05-30 2023-05-09 荣耀终端有限公司 Audio processing method and device

Similar Documents

Publication Publication Date Title
CN110505403A (en) A kind of video record processing method and device
CN108628985B (en) Photo album processing method and mobile terminal
CN106648118A (en) Virtual teaching method based on augmented reality, and terminal equipment
CN108989672A (en) A kind of image pickup method and mobile terminal
CN107864336B (en) A kind of image processing method, mobile terminal
CN107566749A (en) Image pickup method and mobile terminal
CN110166691A (en) A kind of image pickup method and terminal device
CN113365085B (en) Live video generation method and device
CN109005336A (en) A kind of image capturing method and terminal device
CN108683850A (en) A kind of shooting reminding method and mobile terminal
CN108052819A (en) A kind of face identification method, mobile terminal and computer readable storage medium
CN108462826A (en) A kind of method and mobile terminal of auxiliary photo-taking
CN110519699A (en) A kind of air navigation aid and electronic equipment
CN109618218A (en) A kind of method for processing video frequency and mobile terminal
CN108881544A (en) A kind of method taken pictures and mobile terminal
CN108564613A (en) A kind of depth data acquisition methods and mobile terminal
CN110536479A (en) Object transmission method and electronic equipment
CN110110571A (en) A kind of barcode scanning method and mobile terminal
CN109241832A (en) A kind of method and terminal device of face In vivo detection
CN108257104A (en) A kind of image processing method and mobile terminal
CN109544445A (en) A kind of image processing method, device and mobile terminal
CN109684277A (en) A kind of image display method and terminal
CN108174081B (en) A kind of image pickup method and mobile terminal
CN109803110A (en) A kind of image processing method, terminal device and server
CN108763475A (en) A kind of method for recording, record device and terminal device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191126

RJ01 Rejection of invention patent application after publication