CN110505403A - Video recording processing method and apparatus - Google Patents
- Publication number: CN110505403A
- Application number: CN201910770432.3A
- Authority
- CN
- China
- Prior art keywords
- target
- reference object
- sound source
- information
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/611—Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The present invention provides a video recording processing method and apparatus for an electronic device having at least two cameras. The method comprises: during video recording, capturing an audio signal and at least two images shot simultaneously by the at least two cameras; processing the audio signal to generate multiple audio sequences corresponding to different sound sources, together with first spatial position information of each sound source; determining second spatial position information of a target subject from the at least two images; determining, from the second spatial position information and each piece of first spatial position information, the target sound source corresponding to the target subject and the target audio sequence corresponding to that sound source; and generating the recording audio from the target audio sequence. The invention improves the audio quality of recorded video.
Description
Technical field
The present invention relates to the field of audio and video technology, and in particular to a video recording processing method and apparatus.
Background technique
At present, when an electronic device records video, it mainly collects images through a camera and collects sound through a microphone to generate the recording. The image capture channel and the audio input channel are two entirely independent channels; the video is generated mainly by superimposing the data received on the two channels.

In practice, however, the microphone picks up sound from many sources in the shooting scene. A video recorded in a noisy environment therefore contains not only the sound made by the main subject but also the noise made by other people, machines, and so on in the scene. The subject's voice is easily drowned out by the ambient sound, degrading the audio of the recorded video.
Summary of the invention
Embodiments of the present invention provide a video recording processing method and apparatus, to solve the problem in the related art that the target subject's voice in a recorded video is masked by ambient sound, resulting in poor audio quality.
In order to solve the above-mentioned technical problem, the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a video recording processing method applied to an electronic device having at least two cameras. The method comprises:

capturing an audio signal during video recording and at least two images shot simultaneously by the at least two cameras;

processing the audio signal to generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source;

determining second spatial position information of a target subject according to the at least two images;

determining, according to the second spatial position information and each piece of first spatial position information, the target sound source corresponding to the target subject and the target audio sequence corresponding to the target sound source;

generating recording audio according to the target audio sequence.
In a second aspect, an embodiment of the present invention further provides a recording processing apparatus having at least two cameras. The apparatus comprises:

a capture module, configured to capture an audio signal during video recording and at least two images shot simultaneously by the at least two cameras;

a processing module, configured to process the audio signal and generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source;

a first determining module, configured to determine second spatial position information of a target subject according to the at least two images;

a second determining module, configured to determine, according to the second spatial position information and each piece of first spatial position information, the target sound source corresponding to the target subject and the target audio sequence corresponding to the target sound source;

a generating module, configured to generate recording audio according to the target audio sequence.
In a third aspect, an embodiment of the present invention further provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the computer program, when executed by the processor, implements the steps of the video recording processing method.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the video recording processing method.
In the embodiments of the present invention, at least two images shot by at least two cameras are used to determine the second spatial position information of the target subject, and sound source separation is performed on the audio signal of the recording to generate multiple audio sequences of different sound sources and the first spatial position information of each sound source. By combining the first spatial position information with the second spatial position information, the target audio sequence belonging to the target subject can be determined accurately, and the recording audio is generated from that sequence. During recording, sounds from other sources unrelated to the target subject are thus filtered out of the audio signal, yielding a high-quality video in which the target subject's voice is clear. This avoids the problem of the subject's voice being masked by ambient sound, which degrades the audio of the recorded video, and improves the audio quality of the recording.
Detailed description of the invention
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art may obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a video recording processing method according to an embodiment of the present invention;

Fig. 2 is a block diagram of a recording processing apparatus according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, a flowchart of a video recording processing method according to an embodiment of the present invention is shown. The method is applied to an electronic device having at least two cameras and may specifically include the following steps:

Step 101: capture an audio signal during video recording and at least two images shot simultaneously by the at least two cameras.
The at least two cameras of the electronic device face the same direction (for example, both are front cameras or both are rear cameras), and they shoot the same scene.

For ease of description, the following takes an electronic device with two rear cameras as an example. The two cameras shoot the same scene simultaneously, producing two images of that scene. Because there is a certain distance between the two cameras, the first image and the second image obtained when they shoot the same scene simultaneously exhibit parallax.
During video recording, the two images shot by the two cameras can be obtained from the cameras, and the audio signal can be collected from the microphone of the electronic device.

It should be noted that the electronic device may have one microphone or multiple microphones. With multiple microphones, the audio signal collected here consists of the multiple channels of audio received by those microphones during recording. Moreover, because the sound sources in the scene are numerous and mixed, any single channel of audio is very likely a mixture of the sounds of multiple sources.

Therefore, when the collected audio is a multi-channel signal received by multiple microphones, step 102 processes the multi-channel signal to generate the multiple audio sequences corresponding to different sound sources and the first spatial position information of each source. The implementation principle is similar to S201 and S202 below and is not repeated here.

Compared with performing sound source separation and localization on a single channel of audio received by one microphone, performing them on the multi-channel audio received by multiple microphones improves the accuracy of both separation and localization.
Step 102: process the audio signal to generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source.

Sound source separation and localization techniques, whether conventional or developed in the future, can be used to analyze the audio signal, isolating the multiple audio sequences emitted by different sources and the first spatial position information of the source corresponding to each sequence. The first spatial position information may be a three-dimensional coordinate range.
Optionally, in one embodiment, step 102 can be implemented through S201 and S202:

S201: extract feature information of the audio signal. The feature information includes, but is not limited to, phase, frequency, and intensity.

S202: process the feature information to generate the multiple audio sequences corresponding to different sound sources and the first spatial position information of each source. The feature information can be processed by deep learning: based on the features, the audio signal is analyzed so that the audio sequences of the different sources are separated from it, along with the first spatial position information of each source.
For example, suppose the recording is meant to capture user A, but the shooting scene is an amusement park. In that scene, the recording contains not only user A's voice but also other users and the clamor of the environment; every individual picked up by the microphone is a sound source, yet only user A's voice is the main sound of the recording. To improve the audio quality and keep user A's voice from being masked by background sound, this step separates the audio sequences of the different sources from the one or more received audio channels and determines the spatial position of each source.
In the embodiments of the present invention, extracting feature information from the audio signal and using it to determine the multiple audio sequences of the different sources improves the accuracy with which the sequences are attributed to their sources; using the same feature information to determine the spatial position of each source improves the accuracy of localization.
Step 103: determine second spatial position information of the target subject according to the at least two images.

Continuing the example above, the target subject is user A. Because of the spacing between the at least two cameras, the two images they shoot differ by a parallax while depicting the same content. The two images can therefore be used to determine user A's spatial position information, such as three-dimensional position and azimuth.

It should be noted that the present invention places no restriction on the execution order of steps 102 and 103; both are executed after step 101, and step 104 is executed after both have finished.
Optionally, in one embodiment, step 103 can be implemented as follows:

S301: perform face recognition on any one target image of the at least two images, and determine the target subject and the subject's two-dimensional position information in the target image.

Since the two images capture the same objects and differ only by parallax, face recognition can be performed on either of them (called the target image here) to determine the target subject and the subject's two-dimensional position in it. Because the images are two-dimensional, once the target subject is found in the target image, for example user A's face region, the region where user A is located, i.e. a two-dimensional coordinate range (the two-dimensional position information), can be determined.

When determining the target subject in the target image, if the image contains multiple faces, it must be decided which of them is user A's.
Optionally, in one embodiment, considering that in most scenes the target subject is the object closest to the camera, the area of the target subject's face image is larger than that of the other objects. The subject (the person region) corresponding to the face image with the largest area can therefore be taken as the target subject, and its two-dimensional coordinate range in the target image determined. This improves the efficiency of determining the target subject and its coordinate range.

Optionally, in another embodiment, the multiple face images recognized in the target image can each be matched against a pre-stored face image of user A (the target subject), so that the face belonging to user A is found among them and user A's two-dimensional coordinate range in the target image is determined. This improves the accuracy of recognizing the target subject and its two-dimensional coordinate range.
S302: perform three-dimensional reconstruction on the at least two images to obtain a three-dimensional model.

Any known or future three-dimensional reconstruction method can be used to generate the model from the at least two two-dimensional images; the present invention does not limit this. Taking the first image and the second image as the at least two images, reconstruction proceeds as follows. Since both are two-dimensional images of three-dimensional objects in the scene, an effective imaging model can be established by calibrating the two cameras and solving their intrinsic and extrinsic parameters; combined with the image matching results, the three-dimensional coordinates of points in space can then be obtained, achieving the reconstruction. Features, mainly feature points, feature lines, and feature regions, are extracted from the first and second images, and a correspondence between the image pair is established from them: each physical point in space is put into one-to-one correspondence with its imaging points in the two different images, which completes the stereo matching process. With an accurate matching result and the intrinsic and extrinsic parameters from calibration, the three-dimensional scene information can be recovered.
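The parallax argument above can be made concrete with the standard rectified-stereo relation Z = f·B/d: after calibration and stereo matching, a point's depth follows from its disparity between the two views. The focal length, baseline, and pixel coordinates below are illustrative assumptions, not values from the patent:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a matched point for rectified stereo cameras.

    focal_px:    focal length in pixels (from calibration)
    baseline_m:  distance between the two cameras in metres
    disparity_px: horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("zero or negative disparity: point at infinity or bad match")
    return focal_px * baseline_m / disparity_px

# Same physical point imaged at x=640 in the left view and x=600 in the
# right view gives a disparity of 40 px.
z = depth_from_disparity(focal_px=800.0, baseline_m=0.06, disparity_px=40.0)
```

Applying this per matched feature yields the dense depth that the three-dimensional model of S302 expresses, which is why, as the text notes later, no TOF hardware is required.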
S303: determine the second spatial position information of the target subject according to the two-dimensional position information and the three-dimensional model.

S302 yields the three-dimensional model of the scene, which may include three-dimensional information such as the three-dimensional coordinates and azimuth of each point (each corresponding to an image pixel). S301 yields user A's two-dimensional position information (the two-dimensional coordinate range) in the two-dimensional target image of the same scene. Since the model and the image correspond to the same scene, user A's three-dimensional information (such as depth, coordinate range, and azimuth) can be determined from the model and the two-dimensional position information.

In the embodiments of the present invention, at least two images of the same scene shot by at least two cameras are used to generate a three-dimensional model of the scene, and the model together with the target subject's two-dimensional position information determines the subject's spatial position accurately. Moreover, because the constructed model already expresses the depth of every pixel, no high-end camera configuration is required (the camera's parameter requirements are modest, and no TOF (time-of-flight) hardware is needed to build the model and obtain the subject's depth), which reduces the cost of the electronic device.
When executing S303, the two-dimensional position information can be matched against the three-dimensional model to determine the depth information in the model corresponding to the two-dimensional position; the second spatial position information of the target subject is then determined from the two-dimensional position information and the depth information.

Because the model expresses three-dimensional information such as the three-dimensional coordinates and azimuth of every pixel in the scene, the target subject's two-dimensional position information can be matched with the model's coordinates to find the corresponding depth. Taking one coordinate (x1, y1) of the two-dimensional position information as an example: if the z-coordinate of the model point whose two-dimensional coordinates are (x1, y1) is z1, then the depth corresponding to (x1, y1) is z1, and the three-dimensional coordinate of that pixel of the target subject is (x1, y1, z1). Since the target subject occupies a two-dimensional coordinate range in the image, a three-dimensional coordinate range of the subject is obtained in this way. The second spatial position information of the target subject may include this three-dimensional coordinate range and may further include azimuth information.
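The lookup just described, from a two-dimensional coordinate range to a three-dimensional coordinate range via per-pixel depth, might be sketched as follows. Representing the model as a plain dict of per-pixel depths is a deliberate simplification; a real reconstruction would be a dense depth map or point cloud:

```python
def region_to_3d(box, depth_map):
    """Lift a 2D box to a 3D coordinate range using per-pixel depths.

    box:       (x0, y0, x1, y1), inclusive pixel bounds of the subject
    depth_map: dict mapping (x, y) -> depth z of that pixel in the model
    Returns the near and far corners (x0, y0, z_min) and (x1, y1, z_max).
    """
    x0, y0, x1, y1 = box
    depths = [depth_map[(x, y)]
              for x in range(x0, x1 + 1)
              for y in range(y0, y1 + 1)
              if (x, y) in depth_map]
    return (x0, y0, min(depths)), (x1, y1, max(depths))

# Toy model: depth increases slightly with x.
depth_map = {(x, y): 1.0 + 0.01 * x for x in range(5) for y in range(5)}
near, far = region_to_3d((1, 1, 3, 3), depth_map)
```

The resulting pair of corners is one concrete form the "three-dimensional coordinate range" of the second spatial position information could take.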
Step 104: determine, according to the second spatial position information and each piece of first spatial position information, the target sound source corresponding to the target subject and the target audio sequence corresponding to the target sound source.

Optionally, in one embodiment, step 104 can be implemented as follows: match the multiple pieces of first spatial position information of the different sound sources against the second spatial position information, and determine the target first spatial position information with the highest matching degree; take the sound source corresponding to that information as the target sound source of the target subject; and take the audio sequence corresponding to the target sound source, among the multiple audio sequences, as the target audio sequence.

Step 102 provides the first spatial position information of each sound source in the recorded scene, and step 103 provides the second spatial position information of the target subject in the same scene. To determine which of the sources from step 102 is the target subject, the multiple pieces of first spatial position information are each matched against the second spatial position information, yielding multiple matching degrees. The source corresponding to the target first spatial position information with the highest matching degree is taken as the target subject's sound source, and that source's audio sequence is the target subject's audio sequence (the voice data the subject produced during recording).
In the embodiments of the present invention, the spatial positions of the audio sequences of the different sound sources in the same scene, together with the spatial position of the target subject, make it possible to accurately identify the target sound source corresponding to the target subject and its audio sequence, so that the subject's audio sequence is accurately found among the separated sequences.
Optionally, in another embodiment, step 104 can also be implemented as follows:

Match the multiple pieces of first spatial position information of the different sound sources against the second spatial position information, and determine the target first spatial position information whose matching degree exceeds a preset threshold (there may be one or more such pieces).

When there is exactly one piece of target first spatial position information, take the corresponding sound source as the target sound source of the target subject, and take the corresponding audio sequence among the multiple audio sequences as the target audio sequence.

When there are multiple pieces, identify the multiple first sound sources corresponding to them, and determine the multiple first audio sequences corresponding to those sources among the multiple audio sequences. Match the voiceprint features of each first audio sequence against the voiceprint features of the target subject, and determine the first audio sequence with the highest voiceprint matching degree. Take that sequence as the target audio sequence, and take the sound source corresponding to it as the target sound source of the target subject.
In this way, when several pieces of target first spatial position information lie close to the target subject's second spatial position, the target subject's voiceprint features are used to screen the multiple first sound sources, determining the subject's target sound source and target audio sequence and improving the accuracy of recognizing the target source's sequence among the multiple audio sequences.
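The voiceprint tie-break might look like the following sketch, assuming fixed-length voiceprint embeddings compared by cosine similarity; the patent specifies neither the feature representation nor the comparison, so both are illustrative choices here:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def pick_by_voiceprint(enrolled, candidates):
    """candidates: list of (source_id, embedding); return the best source_id."""
    return max(candidates, key=lambda c: cosine(enrolled, c[1]))[0]

# Hypothetical 3-dimensional embeddings (real ones are hundreds of dims).
enrolled = [0.9, 0.1, 0.4]                       # target subject's voiceprint
candidates = [("src_a", [0.1, 0.9, 0.2]),        # spatially close bystander
              ("src_b", [0.8, 0.2, 0.5])]        # the subject's own voice
best = pick_by_voiceprint(enrolled, candidates)
```

Only the sources that passed the spatial threshold are compared, so the voiceprint check resolves ambiguity rather than replacing the spatial matching.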
Step 105: generate recording audio according to the target audio sequence.

Optionally, in one embodiment, the target audio sequence itself can be taken as the recording audio. Optionally, in another embodiment, noise reduction can additionally be applied to the target audio sequence to generate the recording audio.
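The optional noise-reduction step could be as simple as a toy amplitude gate, shown below; real implementations would use spectral methods, and this sketch only illustrates the shape of the step:

```python
def noise_gate(samples, threshold):
    """Silence samples whose magnitude falls below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]

# Small-amplitude residue between the louder voice samples is zeroed out.
cleaned = noise_gate([0.01, 0.5, -0.02, -0.7, 0.03], threshold=0.05)
```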
In the embodiments of the present invention, at least two images shot by at least two cameras are used to determine the second spatial position information of the target subject, and sound source separation is performed on the audio signal of the recording to generate multiple audio sequences of different sound sources and the first spatial position information of each sound source. By combining the first spatial position information with the second spatial position information, the target audio sequence belonging to the target subject can be determined accurately, and the recording audio is generated from that sequence. During recording, sounds from other sources unrelated to the target subject are thus filtered out of the audio signal, yielding a high-quality video in which the target subject's voice is clear. This avoids the problem of the subject's voice being masked by ambient sound, which degrades the audio of the recorded video, and improves the audio quality of the recording.
In addition, in the embodiments of the present invention, unlike the traditional approach of simply combining the recorded images and audio, the method uses at least two images shot by at least two cameras to generate a three-dimensional model, making it easy to determine the target subject's depth information, and combines the face in the image with the audio of the recording. By fully exploiting the spatial association between the environment, the target subject, and the sound sources, a recording containing only the main subject's sound is obtained, significantly improving the recording audio quality of the electronic device.
Referring to Fig. 2, a block diagram of a recording processing apparatus according to an embodiment of the present invention is shown. The apparatus has at least two cameras, can implement the details of the video recording processing method of the above embodiment, and achieves the same effect. The apparatus shown in Fig. 2 includes:

a capture module 21, configured to capture an audio signal during video recording and at least two images shot simultaneously by the at least two cameras;

a processing module 22, configured to process the audio signal and generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source;

a first determining module 23, configured to determine second spatial position information of the target subject according to the at least two images;

a second determining module 24, configured to determine, according to the second spatial position information and each piece of first spatial position information, the target sound source corresponding to the target subject and the target audio sequence corresponding to the target sound source;

a generating module 25, configured to generate recording audio according to the target audio sequence.
Optionally, the first determining module 23 includes:
an identification submodule, configured to perform face recognition on any one target image of the at least two images, and determine a target reference object and two-dimensional position information of the target reference object in the target image;
a reconstruction submodule, configured to perform three-dimensional reconstruction on the at least two images to obtain a three-dimensional model;
a first determining submodule, configured to determine the second spatial position information of the target reference object according to the two-dimensional position information and the three-dimensional model.
Optionally, the first determining submodule includes:
a first determination unit, configured to match the two-dimensional position information against the three-dimensional model, and determine depth information in the three-dimensional model corresponding to the two-dimensional position information;
a second determination unit, configured to determine the second spatial position information of the target reference object according to the two-dimensional position information and the depth information.
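For illustration, once the depth corresponding to the face's two-dimensional position is known, the second spatial position can be recovered by standard pinhole back-projection. The intrinsics below (`fx`, `fy`, `cx`, `cy`) are assumed values, not parameters disclosed by the embodiment:

```python
import numpy as np

def pixel_to_camera_point(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth (meters) into 3D camera
    coordinates via the pinhole model: X = (u - cx) * Z / fx, etc."""
    return np.array([(u - cx) * depth / fx,
                     (v - cy) * depth / fy,
                     depth])

# Face center detected at pixel (400, 300), depth 1.5 m taken from the
# three-dimensional model, with assumed intrinsics:
second_position = pixel_to_camera_point(400, 300, 1.5,
                                        fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

The resulting 3D point serves as the second spatial position information that is later matched against the sound sources' first spatial position information.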
Optionally, the processing module 22 includes:
an extraction submodule, configured to extract characteristic information of the audio signal;
a processing submodule, configured to process the characteristic information to generate the multiple audio sequences corresponding to different sound sources and the first spatial position information of each sound source.
Optionally, the record processing device has multiple microphones;
the acquisition module 21 is further configured to acquire multiple channels of audio signals received by the multiple microphones during video recording.
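One common way to derive a spatial cue for a sound source from multi-microphone signals is a GCC-PHAT time-difference-of-arrival estimate; this is an assumption for illustration only, as the embodiment does not name a localization algorithm:

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs):
    """Estimate the time difference of arrival between two microphone
    channels with GCC-PHAT; TDOAs across microphone pairs constrain a
    sound source's spatial position."""
    n = len(sig) + len(ref)
    f_sig = np.fft.rfft(sig, n=n)
    f_ref = np.fft.rfft(ref, n=n)
    r = f_sig * np.conj(f_ref)
    r /= np.abs(r) + 1e-15                 # PHAT weighting: keep phase only
    cc = np.fft.irfft(r, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

fs = 16000
rng = np.random.default_rng(0)
mic_a = rng.standard_normal(fs)            # 1 s of broadband noise
mic_b = np.roll(mic_a, 8)                  # simulate an 8-sample inter-mic delay
tdoa = gcc_phat_tdoa(mic_b, mic_a, fs)
```

With a known microphone geometry, such per-pair delays can be converted into the first spatial position information of each separated source.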
Optionally, the second determining module 24 includes:
a matching submodule, configured to match the multiple pieces of first spatial position information corresponding to different sound sources against the second spatial position information respectively, and determine a target first spatial position information with the highest matching degree;
a second determining submodule, configured to determine the sound source corresponding to the target first spatial position information as the target sound source of the target reference object;
a third determining submodule, configured to determine, among the multiple audio sequences, the audio sequence corresponding to the target sound source as the target audio sequence.
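The embodiment does not fix the metric behind the "matching degree"; a minimal sketch, assuming Euclidean distance in 3D (smallest distance = highest matching degree):

```python
import numpy as np

def pick_target_source(first_positions, second_position):
    """Return the index of the sound source whose first spatial position
    best matches the target reference object's second spatial position."""
    target = np.asarray(second_position, dtype=float)
    dists = [np.linalg.norm(np.asarray(p, dtype=float) - target)
             for p in first_positions]
    return int(np.argmin(dists))

# Three localized sound sources vs. the face position from the 3D model:
source_positions = [(0.2, 0.1, 1.4), (-1.0, 0.0, 3.2), (2.5, 0.3, 0.8)]
face_position = (0.25, 0.05, 1.5)
target_index = pick_target_source(source_positions, face_position)
# the audio sequence at target_index is the target audio sequence
```

Any monotone similarity (e.g. angular agreement when only a direction of arrival is known) could replace the Euclidean distance here.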
The record processing device provided in the embodiment of the present invention can implement each process implemented by the electronic device in the above method embodiment; to avoid repetition, details are not described herein again.
In the embodiment of the present invention, at least two images shot by at least two cameras are used to determine the second spatial position information of the target reference object, and sound source separation is performed on the audio signal in the recording to generate multiple audio sequences of different sound sources and the first spatial position information of each sound source. Then, by combining the first spatial position information with the second spatial position information, the target audio sequence belonging to the target reference object can be accurately determined, and the recording audio is generated from the target audio sequence. During video recording, sounds of other sound sources irrelevant to the target reference object can thus be filtered out of the audio signal, yielding a high-quality video with a clear target reference object sound. This prevents the sound of the target reference object in the recorded video from being masked by ambient sound, avoids the problem of poor audio quality in the recorded video, and improves the audio quality in the recorded video.
Fig. 3 is a schematic diagram of the hardware structure of an electronic device implementing each embodiment of the present invention. The electronic device 400 has at least two cameras and at least one microphone; preferably, the electronic device 400 has multiple microphones.
The electronic device 400 includes, but is not limited to: a radio frequency unit 401, a network module 402, an audio output unit 403, an input unit 404, a sensor 405, a display unit 406, a user input unit 407, an interface unit 408, a memory 409, a processor 410, a power supply 411, and other components. Those skilled in the art will understand that the electronic device structure shown in Fig. 3 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or use a different component arrangement. In the embodiment of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle terminal, a wearable device, a pedometer, and the like.
The input unit 404 is configured to acquire an audio signal during video recording and at least two images shot simultaneously by the at least two cameras.
The processor 410 is configured to process the audio signal acquired by the input unit 404 to generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source; determine second spatial position information of a target reference object according to the at least two images acquired by the input unit 404; determine, according to the second spatial position information and each piece of first spatial position information, a target sound source corresponding to the target reference object and a target audio sequence corresponding to the target sound source; and generate recording audio according to the target audio sequence.
In the embodiment of the present invention, at least two images shot by at least two cameras are used to determine the second spatial position information of the target reference object, and sound source separation is performed on the audio signal in the recording to generate multiple audio sequences of different sound sources and the first spatial position information of each sound source. Then, by combining the first spatial position information with the second spatial position information, the target audio sequence belonging to the target reference object can be accurately determined, and the recording audio is generated from the target audio sequence. During video recording, sounds of other sound sources irrelevant to the target reference object can thus be filtered out of the audio signal, yielding a high-quality video with a clear target reference object sound. This prevents the sound of the target reference object in the recorded video from being masked by ambient sound, avoids the problem of poor audio quality in the recorded video, and improves the audio quality in the recorded video.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 401 may be used to receive and send signals during information transmission and reception or during a call; specifically, it receives downlink data from a base station and delivers the data to the processor 410 for processing, and sends uplink data to the base station. In general, the radio frequency unit 401 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 401 may also communicate with a network and other devices through a wireless communication system.
The electronic device provides the user with wireless broadband Internet access through the network module 402, for example helping the user send and receive e-mails, browse web pages, and access streaming media.
The audio output unit 403 may convert audio data received by the radio frequency unit 401 or the network module 402, or stored in the memory 409, into an audio signal and output it as sound. Moreover, the audio output unit 403 may also provide audio output related to a specific function performed by the electronic device 400 (for example, a call signal reception sound or a message reception sound). The audio output unit 403 includes a loudspeaker, a buzzer, a receiver, and the like.
The input unit 404 is configured to receive audio or video signals. The input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042. The graphics processor 4041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 406. The image frames processed by the graphics processor 4041 may be stored in the memory 409 (or another storage medium) or sent via the radio frequency unit 401 or the network module 402. The microphone 4042 can receive sound and process such sound into audio data. In a telephone call mode, the processed audio data may be converted into a format transmittable to a mobile communication base station via the radio frequency unit 401 and output.
The electronic device 400 further includes at least one sensor 405, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor includes an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 4061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 4061 and/or the backlight when the electronic device 400 is moved close to the ear. As a kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the electronic device (such as landscape/portrait switching, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer or tap detection). The sensor 405 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which will not be described in detail here.
The display unit 406 is configured to display information input by the user or information provided to the user. The display unit 406 may include a display panel 4061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The user input unit 407 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 407 includes a touch panel 4071 and other input devices 4072. The touch panel 4071, also referred to as a touch screen, collects touch operations of the user on or near it (for example, operations of the user on or near the touch panel 4071 using a finger, a stylus, or any other suitable object or accessory). The touch panel 4071 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the touch orientation of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 410, and receives and executes commands sent by the processor 410. In addition, the touch panel 4071 may be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 4071, the user input unit 407 may also include other input devices 4072. Specifically, the other input devices 4072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which will not be described in detail here.
Further, the touch panel 4071 may cover the display panel 4061. When the touch panel 4071 detects a touch operation on or near it, it transmits the operation to the processor 410 to determine the type of the touch event, and the processor 410 then provides corresponding visual output on the display panel 4061 according to the type of the touch event. Although in Fig. 3 the touch panel 4071 and the display panel 4061 serve as two independent components to implement the input and output functions of the electronic device, in some embodiments the touch panel 4071 and the display panel 4061 may be integrated to implement the input and output functions of the electronic device, which is not specifically limited here.
The interface unit 408 is an interface through which an external device is connected to the electronic device 400. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 408 may be used to receive input (for example, data information or electric power) from an external device and transmit the received input to one or more elements in the electronic device 400, or may be used to transmit data between the electronic device 400 and the external device.
The memory 409 may be used to store software programs and various data. The memory 409 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data or a phone book), and the like. In addition, the memory 409 may include a high-speed random access memory, and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.
The processor 410 is the control center of the electronic device. It connects all parts of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 409 and invoking data stored in the memory 409, thereby performing overall monitoring of the electronic device. The processor 410 may include one or more processing units; preferably, the processor 410 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 410.
The electronic device 400 may further include a power supply 411 (such as a battery) for supplying power to each component. Preferably, the power supply 411 may be logically connected to the processor 410 through a power management system, so as to implement functions such as charge management, discharge management, and power consumption management through the power management system.
In addition, the electronic device 400 includes some functional modules not shown, which will not be described in detail here.
Preferably, an embodiment of the present invention further provides an electronic device, including a processor 410, a memory 409, and a computer program stored in the memory 409 and executable on the processor 410. When the computer program is executed by the processor 410, each process of the above video record processing method embodiment is implemented and the same technical effect can be achieved; to avoid repetition, details are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, each process of the above video record processing method embodiment is implemented and the same technical effect can be achieved; to avoid repetition, details are not described herein again. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article, or apparatus. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes the element.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or certainly by hardware, but in many cases the former is the preferable implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention are described above with reference to the accompanying drawings, but the present invention is not limited to the above specific embodiments. The above specific embodiments are merely illustrative rather than restrictive. Under the inspiration of the present invention, those of ordinary skill in the art can also make many forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.
Claims (10)
1. A video record processing method, applied to an electronic device having at least two cameras, wherein the method comprises:
acquiring an audio signal during video recording and at least two images shot simultaneously by the at least two cameras;
processing the audio signal to generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source;
determining second spatial position information of a target reference object according to the at least two images;
determining, according to the second spatial position information and each piece of first spatial position information, a target sound source corresponding to the target reference object and a target audio sequence corresponding to the target sound source; and
generating recording audio according to the target audio sequence.
2. The method according to claim 1, wherein determining the second spatial position information of the target reference object according to the at least two images comprises:
performing face recognition on any one target image of the at least two images, and determining a target reference object and two-dimensional position information of the target reference object in the target image;
performing three-dimensional reconstruction on the at least two images to obtain a three-dimensional model; and
determining the second spatial position information of the target reference object according to the two-dimensional position information and the three-dimensional model.
3. The method according to claim 2, wherein determining the second spatial position information of the target reference object according to the two-dimensional position information and the three-dimensional model comprises:
matching the two-dimensional position information against the three-dimensional model, and determining depth information in the three-dimensional model corresponding to the two-dimensional position information; and
determining the second spatial position information of the target reference object according to the two-dimensional position information and the depth information.
4. The method according to claim 1, wherein the electronic device has multiple microphones, and acquiring the audio signal during video recording comprises:
acquiring multiple channels of audio signals received by the multiple microphones during video recording.
5. The method according to claim 1, wherein determining, according to the second spatial position information and each piece of first spatial position information, the target sound source corresponding to the target reference object and the target audio sequence corresponding to the target sound source comprises:
matching the multiple pieces of first spatial position information corresponding to different sound sources against the second spatial position information respectively, and determining a target first spatial position information with the highest matching degree;
determining the sound source corresponding to the target first spatial position information as the target sound source of the target reference object; and
determining, among the multiple audio sequences, the audio sequence corresponding to the target sound source as the target audio sequence.
6. A record processing device, wherein the record processing device has at least two cameras, and the record processing device comprises:
an acquisition module, configured to acquire an audio signal during video recording and at least two images shot simultaneously by the at least two cameras;
a processing module, configured to process the audio signal to generate multiple audio sequences corresponding to different sound sources and first spatial position information of each sound source;
a first determining module, configured to determine second spatial position information of a target reference object according to the at least two images;
a second determining module, configured to determine, according to the second spatial position information and each piece of first spatial position information, a target sound source corresponding to the target reference object and a target audio sequence corresponding to the target sound source; and
a generation module, configured to generate recording audio according to the target audio sequence.
7. The record processing device according to claim 6, wherein the first determining module comprises:
an identification submodule, configured to perform face recognition on any one target image of the at least two images, and determine a target reference object and two-dimensional position information of the target reference object in the target image;
a reconstruction submodule, configured to perform three-dimensional reconstruction on the at least two images to obtain a three-dimensional model; and
a first determining submodule, configured to determine the second spatial position information of the target reference object according to the two-dimensional position information and the three-dimensional model.
8. The record processing device according to claim 7, wherein the first determining submodule comprises:
a first determination unit, configured to match the two-dimensional position information against the three-dimensional model, and determine depth information in the three-dimensional model corresponding to the two-dimensional position information; and
a second determination unit, configured to determine the second spatial position information of the target reference object according to the two-dimensional position information and the depth information.
9. The record processing device according to claim 6, wherein the record processing device has multiple microphones; and
the acquisition module is further configured to acquire multiple channels of audio signals received by the multiple microphones during video recording.
10. The record processing device according to claim 6, wherein the second determining module comprises:
a matching submodule, configured to match the multiple pieces of first spatial position information corresponding to different sound sources against the second spatial position information respectively, and determine a target first spatial position information with the highest matching degree;
a second determining submodule, configured to determine the sound source corresponding to the target first spatial position information as the target sound source of the target reference object; and
a third determining submodule, configured to determine, among the multiple audio sequences, the audio sequence corresponding to the target sound source as the target audio sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910770432.3A CN110505403A (en) | 2019-08-20 | 2019-08-20 | A kind of video record processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110505403A true CN110505403A (en) | 2019-11-26 |
Family
ID=68589018
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910770432.3A Pending CN110505403A (en) | 2019-08-20 | 2019-08-20 | A kind of video record processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110505403A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104077804A (en) * | 2014-06-09 | 2014-10-01 | 广州嘉崎智能科技有限公司 | Method for constructing three-dimensional human face model based on multi-frame video image |
CN105578097A (en) * | 2015-07-10 | 2016-05-11 | 宇龙计算机通信科技(深圳)有限公司 | Video recording method and terminal |
CN105847660A (en) * | 2015-06-01 | 2016-08-10 | 维沃移动通信有限公司 | Dynamic zoom method, device and intelligent device |
US20180124320A1 (en) * | 2014-05-12 | 2018-05-03 | Gopro, Inc. | Dual-microphone camera |
CN108876835A (en) * | 2018-03-28 | 2018-11-23 | 北京旷视科技有限公司 | Depth information detection method, device and system and storage medium |
CN109683135A (en) * | 2018-12-28 | 2019-04-26 | 科大讯飞股份有限公司 | A kind of sound localization method and device, target capturing system |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021175165A1 (en) * | 2020-03-06 | 2021-09-10 | 华为技术有限公司 | Audio processing method and device |
CN113099031A (en) * | 2021-02-26 | 2021-07-09 | 华为技术有限公司 | Sound recording method and related equipment |
CN113473057A (en) * | 2021-05-20 | 2021-10-01 | 华为技术有限公司 | Video recording method and electronic equipment |
CN113329138A (en) * | 2021-06-03 | 2021-08-31 | 维沃移动通信有限公司 | Video shooting method, video playing method and electronic equipment |
CN113472943A (en) * | 2021-06-30 | 2021-10-01 | 维沃移动通信有限公司 | Audio processing method, device, equipment and storage medium |
CN113472943B (en) * | 2021-06-30 | 2022-12-09 | 维沃移动通信有限公司 | Audio processing method, device, equipment and storage medium |
CN113658254A (en) * | 2021-07-28 | 2021-11-16 | 深圳市神州云海智能科技有限公司 | Method and device for processing multi-modal data and robot |
WO2023045980A1 (en) * | 2021-09-24 | 2023-03-30 | 北京有竹居网络技术有限公司 | Audio signal playing method and apparatus, and electronic device |
CN116095254A (en) * | 2022-05-30 | 2023-05-09 | 荣耀终端有限公司 | Audio processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110505403A (en) | A kind of video record processing method and device | |
CN108628985B (en) | Photo album processing method and mobile terminal | |
CN106648118A (en) | Virtual teaching method based on augmented reality, and terminal equipment | |
CN108989672A (en) | A kind of image pickup method and mobile terminal | |
CN107864336B (en) | A kind of image processing method, mobile terminal | |
CN107566749A (en) | Image pickup method and mobile terminal | |
CN110166691A (en) | A kind of image pickup method and terminal device | |
CN113365085B (en) | Live video generation method and device | |
CN109005336A (en) | A kind of image capturing method and terminal device | |
CN108683850A (en) | A kind of shooting reminding method and mobile terminal | |
CN108052819A (en) | A kind of face identification method, mobile terminal and computer readable storage medium | |
CN108462826A (en) | A kind of method and mobile terminal of auxiliary photo-taking | |
CN110519699A (en) | A kind of air navigation aid and electronic equipment | |
CN109618218A (en) | A kind of method for processing video frequency and mobile terminal | |
CN108881544A (en) | A kind of method taken pictures and mobile terminal | |
CN108564613A (en) | A kind of depth data acquisition methods and mobile terminal | |
CN110536479A (en) | Object transmission method and electronic equipment | |
CN110110571A (en) | A kind of barcode scanning method and mobile terminal | |
CN109241832A (en) | A kind of method and terminal device of face In vivo detection | |
CN108257104A (en) | A kind of image processing method and mobile terminal | |
CN109544445A (en) | A kind of image processing method, device and mobile terminal | |
CN109684277A (en) | A kind of image display method and terminal | |
CN108174081B (en) | A kind of image pickup method and mobile terminal | |
CN109803110A (en) | A kind of image processing method, terminal device and server | |
CN108763475A (en) | A kind of method for recording, record device and terminal device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191126 |