WO2016082199A1 - Method and mobile terminal for recording the sound of a video-recorded object - Google Patents

Method and mobile terminal for recording the sound of a video-recorded object (录取录像对象的声音的方法和移动终端)

Info

Publication number
WO2016082199A1
WO2016082199A1 PCT/CN2014/092534 CN2014092534W WO2016082199A1 WO 2016082199 A1 WO2016082199 A1 WO 2016082199A1 CN 2014092534 W CN2014092534 W CN 2014092534W WO 2016082199 A1 WO2016082199 A1 WO 2016082199A1
Authority
WO
WIPO (PCT)
Prior art keywords
mobile terminal
recording
sound
information
configuration information
Prior art date
Application number
PCT/CN2014/092534
Other languages
English (en)
French (fr)
Inventor
康俊腾
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2014/092534 (WO2016082199A1)
Priority to CN201480083698.6A (CN107004426B)
Publication of WO2016082199A1
Priority to US15/607,124 (US10062393B2)

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • G10L21/028Voice signal separating using properties of sound source
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G3/00Gain control in amplifiers or frequency changers without distortion of the input signal
    • H03G3/20Automatic control
    • H03G3/30Automatic control in amplifiers having semiconductor devices
    • H03G3/3005Automatic control in amplifiers having semiconductor devices in amplifiers suitable for low-frequencies, e.g. audio amplifiers
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G3/00Gain control in amplifiers or frequency changers without distortion of the input signal
    • H03G3/20Automatic control
    • H03G3/30Automatic control in amplifiers having semiconductor devices
    • H03G3/32Automatic control in amplifiers having semiconductor devices the control being dependent upon ambient noise level or sound level
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G5/00Tone control or bandwidth control in amplifiers
    • H03G5/005Tone control or bandwidth control in amplifiers of digital signals
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G5/00Tone control or bandwidth control in amplifiers
    • H03G5/16Automatic control

Definitions

  • The present invention relates to the field of data processing, and in particular to a method and a mobile terminal for recording the sound of a video-recorded object.
  • Video recording refers to capturing images by optical, electromagnetic, or other means, for example recording the movement of a child or the feeding of an animal. With the development of electronic technology, audio is generally recorded at the same time during video recording, so that complete audio and video material is obtained.
  • at present, because the position of the recorded object changes dynamically during recording, omnidirectional recording is generally used so that the sound of the recorded object can be captured, that is, sound entering the microphone from any direction between 0 and 360 degrees produces no obvious change in the output.
  • however, in practical applications, omnidirectional recording captures sound from all other angles along with the sound of the video-recorded object, so the background noise in the recording is too large and seriously degrades the recording quality.
  • the embodiment of the invention provides a method for recording the sound of the recorded object and a mobile terminal, which is used for reducing the background noise in the sound of the recorded video object and improving the recording quality.
  • a first aspect of the embodiments of the present invention provides a method for recording a sound of a video recording object, including:
  • during video recording, the mobile terminal obtains location information of the video-recorded object relative to the mobile terminal by using face recognition, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal;
  • the mobile terminal converts the location information into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology;
  • the mobile terminal performs beamforming processing on the recorded sound signal according to the beam configuration information, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object.
  • the beam configuration information includes a sound source azimuth angle, a beam direction, and a beam width;
  • the converting, by the mobile terminal, the location information into the beam configuration information specifically includes:
  • the mobile terminal converts the angle information of the video-recorded object relative to the terminal into a sound source azimuth angle and a beam direction; and
  • the mobile terminal converts the distance information of the video-recorded object relative to the terminal into a beam width, where the farther the distance, the narrower the beam width.
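  • For illustration only (not part of the patent text): the patent does not give a concrete conversion formula, so the sketch below assumes a simple mapping in which the recognised face angle is used directly as the azimuth and steering direction, and the beam width shrinks in inverse proportion to the distance within an assumed clamp range.

```python
# Illustrative sketch only: the clamping range and the inverse distance-to-width
# rule below are assumptions, not values taken from the patent.
from dataclasses import dataclass

@dataclass
class BeamConfig:
    azimuth_deg: float     # sound source azimuth angle
    direction_deg: float   # beam (steering) direction
    width_deg: float       # beam width

def position_to_beam_config(angle_deg: float, distance_m: float) -> BeamConfig:
    """Convert the face-recognition position (angle, distance) into beam configuration."""
    # The azimuth and steering direction follow the recognised face angle directly.
    azimuth = direction = angle_deg
    # "The farther the distance, the narrower the beam width": one simple choice is
    # an inverse-proportional rule, clamped to a practical range (assumed values).
    width = max(10.0, min(90.0, 60.0 / max(distance_m, 0.5)))
    return BeamConfig(azimuth, direction, width)

# Example: a face 30 degrees to the right and 2 m away -> a ~30-degree-wide beam at +30 deg.
print(position_to_beam_config(30.0, 2.0))
```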
  • the mobile terminal includes at least two microphones
  • the performing, by the mobile terminal, the beamforming processing on the received sound signal according to the beam configuration information specifically includes:
  • the mobile terminal adjusts, according to the beam configuration information, the parameters with which each microphone collects sound signals, so that after the sound signals collected by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains.
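  • For illustration only: the patent does not name a specific beamformer, so the sketch below uses a plain delay-and-sum combination for a uniform linear array as one common way of adjusting how the microphone signals are combined so that the output favours the target direction; the array geometry, sample rate, and steering convention are assumptions.

```python
# Minimal delay-and-sum sketch for a uniform linear microphone array (assumed geometry).
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(mic_signals: np.ndarray, fs: int, mic_spacing: float,
                  steer_deg: float) -> np.ndarray:
    """Combine signals from M microphones (shape: M x N samples) so that sound arriving
    from steer_deg adds coherently while sound from other directions tends to cancel."""
    n_mics, n_samples = mic_signals.shape
    out = np.zeros(n_samples)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    for m in range(n_mics):
        # Steering delay for microphone m under a plane-wave model.
        delay = m * mic_spacing * np.sin(np.deg2rad(steer_deg)) / SPEED_OF_SOUND
        # Apply a fractional-sample delay in the frequency domain, then accumulate.
        spectrum = np.fft.rfft(mic_signals[m]) * np.exp(-2j * np.pi * freqs * delay)
        out += np.fft.irfft(spectrum, n=n_samples)
    return out / n_mics

# Usage: two microphones 2 cm apart, steered toward the face angle from the beam config.
# enhanced = delay_and_sum(np.stack([mic_left, mic_right]), fs=48000,
#                          mic_spacing=0.02, steer_deg=30.0)
```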
  • in a third implementation manner of the first aspect of the embodiments of the present invention, before the step of tracking the video-recorded object by face recognition, the method further includes:
  • the mobile terminal compares each object in the video recording with the stored preset object, and determines that the same object in the recording screen as the preset object is the recording object.
  • the second aspect of the embodiment of the present invention provides a mobile terminal, which is used for recording a sound of a video object, including:
  • an identification module, configured to obtain, by using face recognition during video recording, location information of the video-recorded object relative to the mobile terminal, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal;
  • a conversion module configured to convert location information obtained by the identification module into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology
  • a processing module, configured to perform beamforming processing on the recorded sound signal according to the beam configuration information, to enhance the signal strength of the sound signal from the direction of the video-recorded object and attenuate the signal strength of sound signals from other directions, thereby obtaining the sound from the direction of the video-recorded object.
  • the beam configuration information includes a sound source azimuth angle, a beam direction, and a beam width.
  • the conversion module specifically includes:
  • a first converting unit configured to convert angle information of the recording object relative to the mobile terminal into a sound source azimuth angle and a beam direction;
  • a second converting unit configured to convert the distance information of the recording object relative to the mobile terminal into a beam width, wherein the farther the distance, the narrower the beam width.
  • the mobile terminal includes at least two microphones
  • the processing module is specifically configured to adjust, according to the beam configuration information, the parameters with which each microphone collects sound signals, so that after the sound signals recorded by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains, and the sound from that direction is obtained.
  • the mobile terminal further includes:
  • the determining module is configured to compare each object in the recorded picture with the stored preset object, and determine that the same object in the recording picture as the preset object is the recording object.
  • a third aspect of the present invention provides a mobile terminal, which is used for recording a sound of a video object, including:
  • a camera, a microphone, a processor, and a memory;
  • during video recording, the camera obtains location information of the video-recorded object relative to the mobile terminal by using face recognition, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal, and at the same time,
  • the microphone captures a sound signal around the mobile terminal;
  • the camera transmits the obtained location information to the processor
  • by invoking the operation instructions stored in the memory, the processor, after receiving the location information, converts the location information into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology;
  • the processor performs beamforming processing on the sound signal captured by the microphone according to the beam configuration information obtained by the conversion, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object.
  • the beam configuration information includes a sound source azimuth angle, a beam direction, and a beam width.
  • converting the location information into beam configuration information specifically includes:
  • after receiving the location information, the processor converts the angle information of the video-recorded object in the location information relative to the mobile terminal into a sound source azimuth angle and a beam direction; and
  • the distance information of the video object in the location information relative to the mobile terminal is converted into a beam width, wherein the further the distance, the narrower the beam width.
  • there are at least two microphones;
  • the processor performs beamforming processing on the sound signal recorded by the microphone according to the beam configuration information obtained by the conversion, and specifically includes:
  • the processor obtains the beam configuration information according to the conversion, and adjusts parameters of the sound signal collected by each microphone, so that after the sound signals collected by the microphones are combined, only the sound signal of the orientation of the recording object exists.
  • the memory further stores information about the preset object.
  • before the camera obtains the location information of the video-recorded object by face recognition, the processor compares each object in the recording picture with the stored preset object and determines that the object in the recording picture that matches the preset object is the video-recorded object.
  • it can be seen from the above technical solutions that the embodiments of the present invention have the following advantage: the mobile terminal obtains the location information of the video-recorded object through face recognition tracking, converts the location information into beam configuration information that serves as an input parameter of the beamforming technology, and performs beamforming processing on the recorded sound signal, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, and the sound from that direction is obtained. This avoids the influence of sound from other directions on the sound of the video-recorded object, reduces the background noise in the recorded sound of the video-recorded object, and improves the recording quality.
  • FIG. 1 is a schematic flowchart of a method for recording sound during video recording according to an embodiment of the present invention;
  • FIG. 2 is another schematic flowchart of a method for recording sound during video recording according to an embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of a terminal in an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of another terminal in an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of another terminal in an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of another terminal in an embodiment of the present invention.
  • face recognition refers to a biometric identification technique that identifies a person based on facial feature information: a camera captures images or a video stream containing faces, automatically detects and tracks the faces in the images, and then performs a series of facial processing and recognition operations on the detected faces; it is also commonly called portrait recognition or facial recognition.
  • beamforming which may also be referred to as spatial domain filtering, is a signal processing technique that uses a sensor array to direct the transmission and reception of signals.
  • the beamforming technique adjusts the parameters of the basic unit of the phase array so that signals of certain angles obtain constructive interference, while signals of other angles acquire destructive interference. Beamforming can be used both for the signal transmitter and for the signal receiver.
  • the beamformer controls the phase and signal amplitude of each of the transmitting devices to obtain the desired constructive and destructive interference modes in the transmitted signal array.
  • the signals received by the different receivers are combined in an appropriate manner to obtain the desired signal radiation pattern.
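  • For illustration only: a minimal sketch of the receive-side behaviour just described, showing how phase weights steered toward one angle give constructive interference (gain near 1) in that direction and partial cancellation elsewhere; the element count, spacing, and frequency are assumed values, not taken from the patent.

```python
# Receive beam pattern of a phase-steered uniform linear array (assumed parameters).
import numpy as np

def array_response(n_elems: int, spacing_m: float, freq_hz: float,
                   steer_deg: float, look_deg: np.ndarray) -> np.ndarray:
    """Magnitude response of an N-element uniform linear array steered to steer_deg."""
    c = 343.0                                   # speed of sound, m/s
    k = 2 * np.pi * freq_hz / c                 # wavenumber
    m = np.arange(n_elems)[:, None]             # element indices
    # Phase weights that align (constructively combine) signals arriving from steer_deg.
    weights = np.exp(-1j * k * m * spacing_m * np.sin(np.deg2rad(steer_deg)))
    # Response to a plane wave arriving from each candidate look direction.
    steering = np.exp(1j * k * m * spacing_m * np.sin(np.deg2rad(look_deg)[None, :]))
    return np.abs(np.sum(weights * steering, axis=0)) / n_elems

angles = np.linspace(-90, 90, 181)
pattern = array_response(n_elems=4, spacing_m=0.05, freq_hz=3000,
                         steer_deg=30, look_deg=angles)
i_30 = int(np.argmin(np.abs(angles - 30)))
i_neg60 = int(np.argmin(np.abs(angles + 60)))
print("gain toward +30 deg:", round(float(pattern[i_30]), 2))     # ~1.0 (constructive)
print("gain toward -60 deg:", round(float(pattern[i_neg60]), 2))  # much lower (destructive)
```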
  • an embodiment of a method for recording a sound of a video recording object in an embodiment of the present invention includes:
  • the mobile terminal obtains location information of the video object relative to the mobile terminal by using face recognition during the recording process;
  • during video recording, the mobile terminal captures an image signal and a sound signal; based on the captured image signal, the mobile terminal performs face recognition on the recording picture represented by the image signal to obtain the location information of the video-recorded object relative to the mobile terminal, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal.
  • it can be understood that when the position of the video-recorded object changes during recording, the location information changes with it in real time.
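  • For illustration only: the patent does not say how face recognition yields the angle and distance, so the sketch below shows one plausible pinhole-camera estimate from a detected face bounding box; the image width, horizontal field of view, and average face width are assumptions.

```python
# Hypothetical estimate of (angle, distance) from a detected face box; all camera
# parameters and the average real face width below are assumptions for illustration.
import math

def face_box_to_position(box_center_x: float, box_width_px: float,
                         image_width_px: int = 1920, hfov_deg: float = 70.0,
                         real_face_width_m: float = 0.16):
    """Return (angle_deg, distance_m) of a detected face relative to the camera axis."""
    focal_px = (image_width_px / 2) / math.tan(math.radians(hfov_deg / 2))
    # Horizontal offset of the face centre from the image centre -> bearing angle.
    angle_deg = math.degrees(math.atan((box_center_x - image_width_px / 2) / focal_px))
    # Similar triangles: the apparent face width shrinks in proportion to distance.
    distance_m = real_face_width_m * focal_px / box_width_px
    return angle_deg, distance_m

# Example: a 200-pixel-wide face detected 400 px right of centre in a 1920 px frame.
print(face_box_to_position(1360.0, 200.0))  # roughly (16 degrees, 1.1 m)
```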
  • the mobile terminal converts the location information into beam configuration information.
  • after obtaining the location information, the mobile terminal converts the location information into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology.
  • the mobile terminal performs beamforming processing on the recorded sound signal according to the beam configuration information, so that the signal strength of the sound signal of the position of the recording object is enhanced, and the signal strength of the sound signal of other orientations is attenuated. The sound of the orientation of the recorded object is obtained.
  • after obtaining the beam configuration information, the terminal performs beamforming processing on the recorded sound signal according to the beam configuration information, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, and the sound from the direction of the video-recorded object is obtained.
  • in this embodiment of the present invention, the mobile terminal obtains the location information of the video-recorded object through face recognition, converts the location information into beam configuration information that serves as an input parameter of the beamforming technology, and performs beamforming processing on the recorded sound signal, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, and the sound from that direction is obtained. This avoids the influence of sound from other directions on the sound of the video-recorded object, reduces the background noise in the recorded sound of the moving video-recorded object, and improves the recording quality.
  • FIG. 2 Another embodiment of the method for recording the sound of the recorded object in the embodiment of the present invention includes:
  • the mobile terminal compares each object in the recording screen with the stored preset object, and determines that the same object in the recording screen as the preset object is a recording object;
  • during video recording, the mobile terminal captures an image signal and a sound signal; based on the captured image signal, the mobile terminal compares each object in the recording picture represented by the image signal with the stored preset object, and determines that the object in the recording picture that matches the preset object is the video-recorded object.
  • the preset object may be stored in various forms; for example, it may be a picture containing the video-recorded object. Before recording, the terminal may receive or store a picture containing the video-recorded object and designate a specific object in that picture as the video-recorded object; image information of the preset object may also be input directly, and other methods may be used, which are not limited herein.
  • the mobile terminal can also specify a specific object in the recording screen as a recording object by using the object confirmation information, and there are many other ways, which are not limited herein.
  • it can be understood that, in practical applications, step 201 may alternatively not be performed, and the mobile terminal may automatically treat every object in the recording picture that face recognition can recognize as a video-recorded object; this is not limited herein.
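  • For illustration only: the patent does not say how the comparison with the stored preset object is performed; one common approach is to compare face embeddings by cosine similarity, as in the hypothetical sketch below (the embedding source and the threshold are assumptions, and compute_embedding is an assumed helper).

```python
# Hedged sketch of the "compare with a stored preset object" step using face embeddings.
from typing import List, Optional
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def pick_recorded_object(frame_embeddings: List[np.ndarray],
                         preset_embedding: np.ndarray,
                         threshold: float = 0.6) -> Optional[int]:
    """Return the index of the detected face that matches the stored preset object,
    or None if no face in the recording picture is similar enough."""
    scores = [cosine_similarity(e, preset_embedding) for e in frame_embeddings]
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None
```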
  • the mobile terminal obtains location information of the video recording object relative to the mobile terminal by using face recognition.
  • after determining the video-recorded object, the mobile terminal obtains the location information of the video-recorded object relative to the mobile terminal by performing face recognition on the video-recorded object in the recording picture, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal.
  • specifically, the location information may include the angle and distance of the face of the video-recorded object recognized by face recognition; it can be understood that the location information may also include other information, such as a motion trend, which is not limited herein.
  • the mobile terminal converts the location information into beam configuration information.
  • after obtaining the location information, the mobile terminal converts the location information into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology.
  • the beam configuration information may include a sound source azimuth angle, a beam direction, and a beam width, and may further include other parameters, such as a sampling rate, a microphone pitch, a maximum noise reduction amount, and the like, which are not limited herein.
  • specifically, when the mobile terminal converts the location information into beam configuration information, it may convert the angle information of the video-recorded object relative to the mobile terminal into a sound source azimuth angle and a beam direction, and convert the distance information of the video-recorded object relative to the mobile terminal into a beam width, where the farther the distance, the narrower the beam width.
  • the mobile terminal adjusts, according to the beam configuration information, the parameters with which each microphone collects sound signals, so that after the sound signals collected by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains, and the sound from the direction of the video-recorded object is obtained.
  • the mobile terminal includes at least two microphones. After obtaining the beam configuration information, the mobile terminal uses the beamforming technology to adjust the parameters of the sound signals collected by the microphones according to the beam configuration information, and enhances the signal strength of the sound signals corresponding to the orientation of the recording object. Attenuating the signal strength of the sound signals of other orientations, so that after the sound signals acquired by the microphones in the mobile terminal are combined, only the sound signals of the position of the recording object exist, and the sound of the position of the recording object is obtained.
  • in this embodiment of the present invention, the mobile terminal can compare each object in the recording picture with the stored preset object and determine that the object matching the preset object is the video-recorded object, so that the sound of the desired object can be recorded more accurately.
  • an embodiment of the mobile terminal in the embodiment of the present invention includes:
  • the identification module 301 is configured to obtain, by using face recognition during video recording, location information of the video-recorded object relative to the mobile terminal, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal;
  • the conversion module 302 is configured to convert the location information obtained by the identification module 301 into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology;
  • the processing module 303 is configured to perform beamforming processing on the recorded sound signal according to the beam configuration information, and gain a signal strength of the sound signal of the position of the recording object, and attenuate the signal strength of the sound signal of other orientations, thereby obtaining The sound of the orientation of the recording object.
  • in this embodiment of the present invention, the identification module 301 obtains the location information of the video-recorded object through face recognition tracking, the conversion module 302 converts the location information into beam configuration information that serves as an input parameter of the beamforming technology, and the processing module 303 performs beamforming processing on the recorded sound signal, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, and the sound from that direction is obtained; this avoids the influence of sound from other directions on the sound of the video-recorded object, reduces the background noise in the recorded sound of the moving video-recorded object, and improves the recording quality.
  • in the foregoing embodiment, the conversion module 302 converts the location information into beam configuration information.
  • in practical applications, the beam configuration information may include a sound source azimuth angle, a beam direction, and a beam width.
  • referring to FIG. 4, as another embodiment of the mobile terminal in the embodiments of the present invention,
  • the conversion module 302 in the foregoing mobile terminal specifically includes:
  • a first converting unit 401 configured to convert angle information of the recording object relative to the mobile terminal into a sound source azimuth angle and a beam direction;
  • the second converting unit 402 is configured to convert distance information of the recording object relative to the mobile terminal into a beam width, wherein the farther the distance, the narrower the beam width.
  • the mobile terminal includes at least two microphones
  • the processing module 303 is specifically configured to adjust, according to the beam configuration information, the parameters with which each microphone collects sound signals, so that after the sound signals recorded by the microphones in the terminal are combined, only the sound signal from the direction of the video-recorded object remains, and the sound from that direction is obtained.
  • in this embodiment, the conversion module 302 converts the specific parameters in the location information obtained by face recognition into the corresponding parameters in the beam configuration information; further, the processing module 303 can adjust the parameters of each microphone according to the beam configuration information, so that after the sound signals recorded by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains, and only the sound corresponding to the video-recorded object is recorded.
  • the identification module 301 tracks the recorded object by face recognition.
  • in practical applications, the video-recorded object may be any object that appears in the recording, or may be a pre-stored preset object. Referring to FIG. 5, as another embodiment of the mobile terminal in the embodiments of the present invention, the mobile terminal further includes:
  • the determining module 501 is configured to compare each object in the recorded picture with the stored preset object, and determine that the same object in the recording picture as the preset object is the recording object.
  • the determining module 501 can compare and determine the recording object according to the stored preset object, and can more accurately record the sound of the required recording object.
  • referring to FIG. 6, another embodiment of the mobile terminal 600 in the embodiments of the present invention includes: a camera 601, a microphone 602, a processor 603, and a memory 604.
  • the mobile terminal may further include an RF circuit 605, an audio circuit 606, a speaker 607, a power management chip 608, an input/output (I/O) subsystem 609, other input/control devices 610, a peripheral interface 611, and an external port 612. These components communicate via one or more communication buses or signal lines 613.
  • the camera 601 can be connected to the processor 603 through the peripheral interface 611, and the microphone 602 can be connected to the audio circuit 606 and the processor 603 through the peripheral interface 611.
  • it should be noted that this embodiment provides only one example of a mobile terminal; the mobile terminal involved in the embodiments of the present invention may have more or fewer components than shown in the figure, two or more components may be combined, or there may be a different configuration or arrangement of components, and each component may be implemented in hardware, software, or a combination of hardware and software that includes one or more signal-processing and/or application-specific integrated circuits.
  • the mobile terminal provided in this embodiment is described in detail below.
  • the memory 604 can be accessed by the CPU 603, the peripheral interface 611, and the like; the memory 604 may include a high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other volatile solid-state storage devices.
  • a peripheral interface 611 that can connect the input and output peripherals of the device to the CPU 603 and the memory 604.
  • I/O subsystem 609 can connect input and output peripherals on the device, such as touch screen 614 and other input/control devices 610, to peripheral interface 611.
  • I/O subsystem 609 can include display controller 6091 and one or more input controllers 6092 for controlling other input/control devices 610.
  • one or more input controllers 6092 receive electrical signals from, or send electrical signals to, the other input/control devices 610; the other input/control devices 610 may include physical buttons (press buttons, rocker buttons, and the like), dials, slide switches, joysticks, and click wheels.
  • the input controller 6092 can be connected to any of the following: a keyboard, an infrared port, a USB interface, and a pointing device such as a mouse.
  • the touch screen 614 is an input interface and an output interface between the mobile terminal and the user, and displays the visual output to the user.
  • the visual output may include graphics, text, icons, videos, and the like.
  • Display controller 6091 in I/O subsystem 609 receives an electrical signal from touch screen 614 or an electrical signal to touch screen 614.
  • the touch screen 614 detects contact on the touch screen, and the display controller 6091 converts the detected contact into interaction with a user interface object displayed on the touch screen 614, that is, human-computer interaction is achieved; the user interface object displayed on the touch screen 614 may be an icon for running a game, an icon for connecting to a corresponding network, and the like.
  • the device may also include a light mouse, which is a touch sensitive surface that does not display a visual output, or an extension of a touch sensitive surface formed by the touch screen.
  • the RF circuit 605 is mainly used to establish communication between the mobile terminal and the wireless network (that is, the network side) and to receive and send data between the mobile terminal and the wireless network, for example, sending and receiving short messages, e-mails, and the like. Specifically, the RF circuit 605 receives and sends RF signals, which are also referred to as electromagnetic signals; the RF circuit 605 converts an electrical signal into an electromagnetic signal or converts an electromagnetic signal into an electrical signal, and communicates with the communication network and other devices through the electromagnetic signal.
  • the RF circuit 605 may include known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chipset, a Subscriber Identity Module (SIM), and the like.
  • the audio circuit 606 is primarily for receiving audio data from the peripheral interface 604, converting the audio data to an electrical signal, and transmitting the electrical signal to the speaker 607.
  • the speaker 607 is configured to restore the voice signal received by the mobile phone from the wireless network through the RF circuit 605 to sound and play the sound to the user.
  • the power management chip 608 is used for power supply and power management of the hardware connected to the CPU 603, the I/O subsystem 609, and the peripheral interface 611.
  • the camera 601 obtains position information of the recording object relative to the mobile terminal by using face recognition, and the location information includes angle information and distance information of the recording object relative to the mobile terminal, and
  • the microphone 602 captures a sound signal around the mobile terminal;
  • the camera 601 transmits the obtained location information to the processor 603;
  • by invoking the operation instructions stored in the memory 601, the processor 603, after receiving the location information, converts the location information into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology;
  • the processor 603 performs beamforming processing on the sound signal captured by the microphone 602 according to the beam configuration information obtained by the conversion, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object.
  • the beam configuration information includes a sound source azimuth angle, a beam direction, and a beam width.
  • after receiving the location information, the processor 603 may convert the angle information of the video-recorded object in the location information relative to the mobile terminal into a sound source azimuth angle and a beam direction, and convert the distance information of the video-recorded object in the location information relative to the mobile terminal into a beam width, where the farther the distance, the narrower the beam width.
  • optionally, there are at least two microphones 602, and the processor 603 can adjust, according to the beam configuration information obtained by the conversion, the parameters with which each microphone 602 collects the sound signal, so that after the sound signals collected by the microphones 602 are combined, only the sound signal from the direction of the video-recorded object remains.
  • the memory 604 further stores information of the preset object.
  • before the camera 601 obtains the location information of the video-recorded object relative to the mobile terminal by face recognition, the processor 603 can compare each object in the recording picture with the stored preset object and determine that the object in the recording picture that matches the preset object is the video-recorded object.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a division by logical function; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present invention which is essential or contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • a number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the storage medium includes: a U disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like, which can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Studio Devices (AREA)

Abstract

Embodiments of the present invention disclose a method and a mobile terminal for recording the sound of a video-recorded object, which are used to reduce the background noise in the recorded sound of the video-recorded object and improve the recording quality. The method in the embodiments of the present invention includes: the mobile terminal obtains location information of the video-recorded object by face recognition, converts the location information into beam configuration information that serves as an input parameter of a beamforming technology, and performs beamforming processing on the recorded sound signal, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object.

Description

Method and mobile terminal for recording the sound of a video-recorded object
Technical Field
The present invention relates to the field of data processing, and in particular to a method and a mobile terminal for recording the sound of a video-recorded object.
Background
Video recording refers to capturing images by optical, electromagnetic, or other means, for example recording the movement of a child or the feeding of an animal. With the development of electronic technology, audio is generally recorded at the same time during video recording, so as to obtain complete audio and video material.
At present, because the position of the recorded object changes dynamically during recording, omnidirectional recording is generally used in order to capture the sound of the recorded object, that is, sound entering the microphone from any direction between 0 and 360 degrees produces no obvious change in the output.
However, in practical applications, omnidirectional recording captures sound from all other angles along with the sound of the video-recorded object, so the background noise in the recording is too large and seriously degrades the recording quality.
Summary of the Invention
Embodiments of the present invention provide a method and a mobile terminal for recording the sound of a video-recorded object, which are used to reduce the background noise in the recorded sound of the video-recorded object and improve the recording quality.
A first aspect of the embodiments of the present invention provides a method for recording the sound of a video-recorded object, including:
during video recording, a mobile terminal obtains location information of the video-recorded object relative to the mobile terminal by face recognition, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal;
the mobile terminal converts the location information into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology;
the mobile terminal performs beamforming processing on the recorded sound signal according to the beam configuration information, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object.
With reference to the first aspect of the embodiments of the present invention, in a first implementation manner of the first aspect, the beam configuration information includes a sound source azimuth angle, a beam direction, and a beam width;
the converting, by the mobile terminal, the location information into beam configuration information specifically includes:
converting, by the mobile terminal, the angle information of the video-recorded object relative to the terminal into a sound source azimuth angle and a beam direction; and
converting, by the mobile terminal, the distance information of the video-recorded object relative to the terminal into a beam width, where the farther the distance, the narrower the beam width.
With reference to the first aspect or the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the mobile terminal includes at least two microphones;
the performing, by the mobile terminal, beamforming processing on the recorded sound signal according to the beam configuration information specifically includes:
adjusting, by the mobile terminal according to the beam configuration information, the parameters with which each microphone collects sound signals, so that after the sound signals collected by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains.
With reference to any one of the first aspect to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, before the step of tracking the video-recorded object by face recognition, the method further includes:
comparing, by the mobile terminal, each object in the recording picture with a stored preset object, and determining that the object in the recording picture that matches the preset object is the video-recorded object.
A second aspect of the embodiments of the present invention provides a mobile terminal for recording the sound of a video-recorded object, including:
an identification module, configured to obtain, by face recognition during video recording, location information of the video-recorded object relative to the mobile terminal, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal;
a conversion module, configured to convert the location information obtained by the identification module into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology; and
a processing module, configured to perform beamforming processing on the recorded sound signal according to the beam configuration information, to enhance the signal strength of the sound signal from the direction of the video-recorded object and attenuate the signal strength of sound signals from other directions, to obtain the sound from the direction of the video-recorded object.
With reference to the second aspect of the embodiments of the present invention, in a first implementation manner of the second aspect, the beam configuration information includes a sound source azimuth angle, a beam direction, and a beam width;
the conversion module specifically includes:
a first conversion unit, configured to convert the angle information of the video-recorded object relative to the mobile terminal into a sound source azimuth angle and a beam direction; and
a second conversion unit, configured to convert the distance information of the video-recorded object relative to the mobile terminal into a beam width, where the farther the distance, the narrower the beam width.
With reference to the second aspect or the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the mobile terminal includes at least two microphones;
the processing module is specifically configured to adjust, according to the beam configuration information, the parameters with which each microphone collects sound signals, so that after the sound signals recorded by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains, and the sound from the direction of the video-recorded object is obtained.
With reference to any one of the second aspect to the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the mobile terminal further includes:
a determining module, configured to compare each object in the recording picture with a stored preset object and determine that the object in the recording picture that matches the preset object is the video-recorded object.
A third aspect of the embodiments of the present invention provides a mobile terminal for recording the sound of a video-recorded object, including:
a camera, a microphone, a processor, and a memory;
during video recording, the camera obtains location information of the video-recorded object relative to the mobile terminal by face recognition, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal; at the same time, the microphone records sound signals around the mobile terminal;
the camera transmits the obtained location information to the processor;
by invoking operation instructions stored in the memory, the processor, after receiving the location information, converts the location information into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology;
the processor performs beamforming processing on the sound signal recorded by the microphone according to the beam configuration information obtained by the conversion, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object.
With reference to the third aspect of the embodiments of the present invention, in a first implementation manner of the third aspect, the beam configuration information includes a sound source azimuth angle, a beam direction, and a beam width;
the converting, by the processor after receiving the location information, the location information into beam configuration information specifically includes:
converting, by the processor after receiving the location information, the angle information of the video-recorded object in the location information relative to the mobile terminal into a sound source azimuth angle and a beam direction; and
converting the distance information of the video-recorded object in the location information relative to the mobile terminal into a beam width, where the farther the distance, the narrower the beam width.
With reference to the third aspect or the first implementation manner of the third aspect, in a second implementation manner of the third aspect, there are at least two microphones;
the performing, by the processor, beamforming processing on the sound signal recorded by the microphone according to the beam configuration information obtained by the conversion specifically includes:
adjusting, by the processor according to the beam configuration information obtained by the conversion, the parameters with which each microphone collects sound signals, so that after the sound signals collected by the microphones are combined, only the sound signal from the direction of the video-recorded object remains.
With reference to the third aspect to the second implementation manner of the third aspect, in a third implementation manner of the third aspect, the memory further stores information about a preset object;
before the camera obtains the location information of the video-recorded object relative to the mobile terminal by face recognition, the processor compares each object in the recording picture with the stored preset object and determines that the object in the recording picture that matches the preset object is the video-recorded object.
It can be seen from the above technical solutions that the embodiments of the present invention have the following advantage: the mobile terminal obtains the location information of the video-recorded object through face recognition tracking, converts the location information into beam configuration information that serves as an input parameter of the beamforming technology, and performs beamforming processing on the recorded sound signal, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object; this avoids the influence of sound from other directions on the sound of the video-recorded object, reduces the background noise in the recorded sound of the video-recorded object, and improves the recording quality.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for recording sound during video recording according to an embodiment of the present invention;
FIG. 2 is another schematic flowchart of a method for recording sound during video recording according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 4 is another schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 5 is another schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 6 is another schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly below with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
The term "face recognition" refers to a biometric identification technique that identifies a person based on facial feature information. A camera captures images or a video stream containing faces, automatically detects and tracks the faces in the images, and then performs a series of facial processing and recognition operations on the detected faces; it is also commonly called portrait recognition or facial recognition.
The term "beamforming", which may also be called spatial filtering, is a signal processing technique that uses a sensor array to transmit and receive signals directionally. Beamforming adjusts the parameters of the basic elements of a phased array so that signals at certain angles experience constructive interference while signals at other angles experience destructive interference. Beamforming can be used at both the transmitting end and the receiving end of a signal. At the transmitting end, the beamformer controls the phase and amplitude of the signal at each transmitting element so as to obtain the required pattern of constructive and destructive interference in the transmitted wavefront. At the receiving end, the signals received by the different receivers are combined in an appropriate manner to obtain the expected signal radiation pattern.
Referring to FIG. 1, an embodiment of the method for recording the sound of a video-recorded object in the embodiments of the present invention includes the following steps.
101. During video recording, the mobile terminal obtains location information of the video-recorded object relative to the mobile terminal by face recognition.
During video recording, the mobile terminal captures an image signal and a sound signal. Based on the captured image signal, the mobile terminal performs face recognition on the recording picture represented by the image signal to obtain the location information of the video-recorded object relative to the mobile terminal, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal.
It can be understood that when the position of the video-recorded object changes during recording, the location information changes with it in real time.
102. The mobile terminal converts the location information into beam configuration information.
After obtaining the location information, the mobile terminal converts the location information into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology.
103. The mobile terminal performs beamforming processing on the recorded sound signal according to the beam configuration information, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object.
After obtaining the beam configuration information, the terminal performs beamforming processing on the recorded sound signal according to the beam configuration information, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, and the sound from the direction of the video-recorded object is obtained.
In this embodiment of the present invention, the mobile terminal obtains the location information of the video-recorded object by face recognition, converts the location information into beam configuration information that serves as an input parameter of the beamforming technology, and performs beamforming processing on the recorded sound signal, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, and the sound from the direction of the video-recorded object is obtained. This avoids the influence of sound from other directions on the sound of the video-recorded object, reduces the background noise in the recorded sound of the moving video-recorded object, and improves the recording quality.
The method for recording the sound of a video-recorded object in the embodiments of the present invention is described in detail below. Referring to FIG. 2, another embodiment of the method includes the following steps.
201. During video recording, the mobile terminal compares each object in the recording picture with a stored preset object and determines that the object in the recording picture that matches the preset object is the video-recorded object.
During video recording, the mobile terminal captures an image signal and a sound signal. Based on the captured image signal, the mobile terminal compares each object in the recording picture represented by the image signal with the stored preset object, and determines that the object in the recording picture that matches the preset object is the video-recorded object.
Specifically, the preset object may be stored in various forms; for example, it may be a picture containing the video-recorded object. Before recording, the terminal may receive or store a picture containing the video-recorded object and designate a specific object in the picture as the video-recorded object; image information of the preset object may also be input directly, and other methods may be used, which are not limited herein.
In addition, during recording, the mobile terminal may also designate a specific object in the recording picture as the video-recorded object through object confirmation information; many other ways are possible and are not limited herein.
It can be understood that, in practical applications, step 201 may alternatively not be performed, and the mobile terminal may automatically treat every object in the recording picture that face recognition can recognize as a video-recorded object; this is not limited herein.
202. The mobile terminal obtains location information of the video-recorded object relative to the mobile terminal by face recognition.
After determining the video-recorded object, the mobile terminal obtains the location information of the video-recorded object relative to the mobile terminal by performing face recognition on the video-recorded object in the recording picture, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal.
It can be understood that when the position of the video-recorded object changes during recording, the location information changes with it in real time.
Specifically, the location information may include the angle and distance of the face of the video-recorded object recognized by face recognition. It can be understood that the location information may also include other information, such as a motion trend, which is not limited herein.
203. The mobile terminal converts the location information into beam configuration information.
After obtaining the location information, the mobile terminal converts the location information into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology.
Specifically, the beam configuration information may include a sound source azimuth angle, a beam direction, and a beam width, and may also include other parameters, such as the sampling rate, the microphone spacing, and the maximum noise reduction amount, which are not limited herein.
Specifically, converting the location information into beam configuration information may be that the mobile terminal converts the angle information of the video-recorded object relative to the mobile terminal into a sound source azimuth angle and a beam direction, and converts the distance information of the video-recorded object relative to the mobile terminal into a beam width, where the farther the distance, the narrower the beam width.
204. The mobile terminal adjusts, according to the beam configuration information, the parameters with which each microphone collects sound signals, so that after the sound signals collected by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains, and the sound from the direction of the video-recorded object is obtained.
The mobile terminal includes at least two microphones. After obtaining the beam configuration information, the mobile terminal uses the beamforming technology to adjust, according to the beam configuration information, the parameters with which each microphone collects sound signals, enhancing the signal strength of the sound signal from the direction of the video-recorded object and attenuating the signal strength of sound signals from other directions, so that after the sound signals acquired by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains, and the sound from the direction of the video-recorded object is obtained.
In this embodiment of the present invention, the mobile terminal can compare each object in the recording picture with a stored preset object and determine that the object matching the preset object is the video-recorded object, so that the sound of the desired object can be recorded more accurately.
The mobile terminal in the embodiments of the present invention is described below. Referring to FIG. 3, an embodiment of the mobile terminal in the embodiments of the present invention includes:
an identification module 301, configured to obtain, by face recognition during video recording, location information of the video-recorded object relative to the mobile terminal, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal;
a conversion module 302, configured to convert the location information obtained by the identification module 301 into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology; and
a processing module 303, configured to perform beamforming processing on the recorded sound signal according to the beam configuration information, to enhance the signal strength of the sound signal from the direction of the video-recorded object and attenuate the signal strength of sound signals from other directions, to obtain the sound from the direction of the video-recorded object.
In this embodiment of the present invention, the identification module 301 obtains the location information of the video-recorded object through face recognition tracking, the conversion module 302 converts the location information into beam configuration information that serves as an input parameter of the beamforming technology, and the processing module 303 performs beamforming processing on the recorded sound signal, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, and the sound from the direction of the video-recorded object is obtained. This avoids the influence of sound from other directions on the sound of the video-recorded object, reduces the background noise in the recorded sound of the moving video-recorded object, and improves the recording quality.
In the above embodiment, the conversion module 302 converts the location information into beam configuration information. In practical applications, the beam configuration information may include a sound source azimuth angle, a beam direction, and a beam width. Referring to FIG. 4, as another embodiment of the mobile terminal in the embodiments of the present invention, the conversion module 302 in the above mobile terminal specifically includes:
a first conversion unit 401, configured to convert the angle information of the video-recorded object relative to the mobile terminal into a sound source azimuth angle and a beam direction; and
a second conversion unit 402, configured to convert the distance information of the video-recorded object relative to the mobile terminal into a beam width, where the farther the distance, the narrower the beam width.
Specifically, the mobile terminal includes at least two microphones, and the processing module 303 may be specifically configured to adjust, according to the beam configuration information, the parameters with which each microphone collects sound signals, so that after the sound signals recorded by the microphones in the terminal are combined, only the sound signal from the direction of the video-recorded object remains, and the sound from the direction of the video-recorded object is obtained.
In this embodiment, the conversion module 302 converts the specific parameters in the location information obtained by face recognition into the corresponding parameters in the beam configuration information. Further, the processing module 303 can adjust the parameters of each microphone according to the beam configuration information, so that after the sound signals recorded by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains, and only the sound corresponding to the video-recorded object is recorded.
In the above embodiment, the identification module 301 tracks the video-recorded object by face recognition. In practical applications, the video-recorded object may be any object that appears in the recording, or may be a pre-stored preset object. Referring to FIG. 5, as another embodiment of the mobile terminal in the embodiments of the present invention, the above mobile terminal further includes:
a determining module 501, configured to compare each object in the recording picture with a stored preset object and determine that the object in the recording picture that matches the preset object is the video-recorded object.
In this embodiment of the present invention, the determining module 501 can determine the video-recorded object by comparison with the stored preset object, so that the sound of the desired object can be recorded more accurately.
Referring to FIG. 6, another embodiment of the mobile terminal 600 in the embodiments of the present invention includes:
a camera 601, a microphone 602, a processor 603, and a memory 604.
The mobile terminal may further include an RF circuit 605, an audio circuit 606, a speaker 607, a power management chip 608, an input/output (I/O) subsystem 609, other input/control devices 610, a peripheral interface 611, and an external port 612; these components communicate via one or more communication buses or signal lines 613.
The camera 601 may be connected to the processor 603 through the peripheral interface 611, and the microphone 602 may be connected to the audio circuit 606 and the processor 603 through the peripheral interface 611.
It should be noted that this embodiment provides only one example of a mobile terminal; the mobile terminal involved in the embodiments of the present invention may have more or fewer components than shown in the figure, two or more components may be combined, or there may be a different configuration or arrangement of components, and each component may be implemented in hardware, software, or a combination of hardware and software that includes one or more signal-processing and/or application-specific integrated circuits.
The mobile terminal provided in this embodiment is described in detail below.
Memory 604: the memory 604 can be accessed by the CPU 603, the peripheral interface 611, and the like. The memory 604 may include a high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other volatile solid-state storage devices.
Peripheral interface 611: the peripheral interface may connect the input and output peripherals of the device to the CPU 603 and the memory 604.
I/O subsystem 609: the I/O subsystem 609 may connect input and output peripherals on the device, such as the touch screen 614 and other input/control devices 610, to the peripheral interface 611. The I/O subsystem 609 may include a display controller 6091 and one or more input controllers 6092 for controlling the other input/control devices 610. The one or more input controllers 6092 receive electrical signals from, or send electrical signals to, the other input/control devices 610, which may include physical buttons (press buttons, rocker buttons, and the like), dials, slide switches, joysticks, and click wheels. It should be noted that an input controller 6092 may be connected to any of the following: a keyboard, an infrared port, a USB interface, or a pointing device such as a mouse.
Touch screen 614: the touch screen 614 is the input and output interface between the mobile terminal and the user and displays visual output to the user; the visual output may include graphics, text, icons, video, and the like.
The display controller 6091 in the I/O subsystem 609 receives electrical signals from, or sends electrical signals to, the touch screen 614. The touch screen 614 detects contact on the touch screen, and the display controller 6091 converts the detected contact into interaction with a user interface object displayed on the touch screen 614, that is, human-computer interaction is achieved; the user interface object displayed on the touch screen 614 may be an icon for running a game, an icon for connecting to a corresponding network, and the like. It should be noted that the device may also include an optical mouse, which is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by the touch screen.
RF circuit 605: mainly used to establish communication between the mobile terminal and the wireless network (that is, the network side) and to receive and send data between the mobile terminal and the wireless network, for example, sending and receiving short messages, e-mails, and the like. Specifically, the RF circuit 605 receives and sends RF signals, which are also called electromagnetic signals; the RF circuit 605 converts an electrical signal into an electromagnetic signal or converts an electromagnetic signal into an electrical signal, and communicates with the communication network and other devices through the electromagnetic signal. The RF circuit 605 may include known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chipset, a Subscriber Identity Module (SIM), and the like.
Audio circuit 606: mainly used to receive audio data from the peripheral interface 604, convert the audio data into an electrical signal, and send the electrical signal to the speaker 607.
Speaker 607: used to restore the voice signal received by the mobile phone from the wireless network through the RF circuit 605 into sound and play the sound to the user.
Power management chip 608: used to supply power to and manage the power of the hardware connected to the CPU 603, the I/O subsystem 609, and the peripheral interface 611.
Specifically, during video recording, the camera 601 obtains location information of the video-recorded object relative to the mobile terminal by face recognition, where the location information includes angle information and distance information of the video-recorded object relative to the mobile terminal; at the same time, the microphone 602 records sound signals around the mobile terminal;
the camera 601 transmits the obtained location information to the processor 603;
by invoking the operation instructions stored in the memory 601, the processor 603, after receiving the location information, converts the location information into beam configuration information, where the beam configuration information is an input parameter of a beamforming technology;
the processor 603 performs beamforming processing on the sound signal recorded by the microphone 602 according to the beam configuration information obtained by the conversion, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object.
Optionally, the beam configuration information includes a sound source azimuth angle, a beam direction, and a beam width. After receiving the location information, the processor 603 may convert the angle information of the video-recorded object in the location information relative to the mobile terminal into a sound source azimuth angle and a beam direction, and convert the distance information of the video-recorded object in the location information relative to the mobile terminal into a beam width, where the farther the distance, the narrower the beam width.
Optionally, there are at least two microphones 602, and the processor 603 may adjust, according to the beam configuration information obtained by the conversion, the parameters with which each microphone 602 collects sound signals, so that after the sound signals collected by the microphones 602 are combined, only the sound signal from the direction of the video-recorded object remains.
Optionally, the memory 604 further stores information about a preset object. Before the camera 601 obtains the location information of the video-recorded object relative to the mobile terminal by face recognition, the processor 603 may compare each object in the recording picture with the stored preset object and determine that the object in the recording picture that matches the preset object is the video-recorded object.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a division by logical function, and in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above embodiments are merely intended to describe the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

  1. A method for recording the sound of a video-recorded object, comprising:
    obtaining, by a mobile terminal during video recording, location information of the video-recorded object relative to the mobile terminal by face recognition, wherein the location information comprises angle information and distance information of the video-recorded object relative to the mobile terminal;
    converting, by the mobile terminal, the location information into beam configuration information, wherein the beam configuration information is an input parameter of a beamforming technology; and
    performing, by the mobile terminal, beamforming processing on the recorded sound signal according to the beam configuration information, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object.
  2. The method according to claim 1, wherein the beam configuration information comprises a sound source azimuth angle, a beam direction, and a beam width;
    the converting, by the mobile terminal, the location information into beam configuration information specifically comprises:
    converting, by the mobile terminal, the angle information of the video-recorded object relative to the terminal into a sound source azimuth angle and a beam direction; and
    converting, by the mobile terminal, the distance information of the video-recorded object relative to the terminal into a beam width, wherein the farther the distance, the narrower the beam width.
  3. The method according to claim 1 or 2, wherein the mobile terminal comprises at least two microphones;
    the performing, by the mobile terminal, beamforming processing on the recorded sound signal according to the beam configuration information specifically comprises:
    adjusting, by the mobile terminal according to the beam configuration information, the parameters with which each microphone collects sound signals, so that after the sound signals collected by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains.
  4. The method according to any one of claims 1 to 3, wherein before the step of tracking the video-recorded object by face recognition, the method further comprises:
    comparing, by the mobile terminal, each object in the recording picture with a stored preset object, and determining that the object in the recording picture that matches the preset object is the video-recorded object.
  5. A mobile terminal for recording the sound of a video-recorded object, comprising:
    an identification module, configured to obtain, by face recognition during video recording, location information of the video-recorded object relative to the mobile terminal, wherein the location information comprises angle information and distance information of the video-recorded object relative to the mobile terminal;
    a conversion module, configured to convert the location information obtained by the identification module into beam configuration information, wherein the beam configuration information is an input parameter of a beamforming technology; and
    a processing module, configured to perform beamforming processing on the recorded sound signal according to the beam configuration information, to enhance the signal strength of the sound signal from the direction of the video-recorded object and attenuate the signal strength of sound signals from other directions, to obtain the sound from the direction of the video-recorded object.
  6. The mobile terminal according to claim 5, wherein the beam configuration information comprises a sound source azimuth angle, a beam direction, and a beam width;
    the conversion module specifically comprises:
    a first conversion unit, configured to convert the angle information of the video-recorded object relative to the mobile terminal into a sound source azimuth angle and a beam direction; and
    a second conversion unit, configured to convert the distance information of the video-recorded object relative to the mobile terminal into a beam width, wherein the farther the distance, the narrower the beam width.
  7. The mobile terminal according to claim 5 or 6, wherein the mobile terminal comprises at least two microphones;
    the processing module is specifically configured to adjust, according to the beam configuration information, the parameters with which each microphone collects sound signals, so that after the sound signals recorded by the microphones in the mobile terminal are combined, only the sound signal from the direction of the video-recorded object remains, and the sound from the direction of the video-recorded object is obtained.
  8. The mobile terminal according to any one of claims 5 to 7, wherein the mobile terminal further comprises:
    a determining module, configured to compare each object in the recording picture with a stored preset object, and determine that the object in the recording picture that matches the preset object is the video-recorded object.
  9. A mobile terminal for recording the sound of a video-recorded object, comprising:
    a camera, a microphone, a processor, and a memory;
    wherein, during video recording, the camera obtains location information of the video-recorded object relative to the mobile terminal by face recognition, the location information comprising angle information and distance information of the video-recorded object relative to the mobile terminal, and at the same time the microphone records sound signals around the mobile terminal;
    the camera transmits the obtained location information to the processor;
    by invoking operation instructions stored in the memory, the processor, after receiving the location information, converts the location information into beam configuration information, wherein the beam configuration information is an input parameter of a beamforming technology; and
    the processor performs beamforming processing on the sound signal recorded by the microphone according to the beam configuration information obtained by the conversion, so that the signal strength of the sound signal from the direction of the video-recorded object is enhanced and the signal strength of sound signals from other directions is attenuated, to obtain the sound from the direction of the video-recorded object.
  10. The mobile terminal according to claim 9, wherein the beam configuration information comprises a sound source azimuth angle, a beam direction, and a beam width;
    the converting, by the processor after receiving the location information, the location information into beam configuration information specifically comprises:
    converting, by the processor after receiving the location information, the angle information of the video-recorded object in the location information relative to the mobile terminal into a sound source azimuth angle and a beam direction; and
    converting the distance information of the video-recorded object in the location information relative to the mobile terminal into a beam width, wherein the farther the distance, the narrower the beam width.
  11. The mobile terminal according to claim 9 or 10, wherein there are at least two microphones;
    the performing, by the processor, beamforming processing on the sound signal recorded by the microphone according to the beam configuration information obtained by the conversion specifically comprises:
    adjusting, by the processor according to the beam configuration information obtained by the conversion, the parameters with which each microphone collects sound signals, so that after the sound signals collected by the microphones are combined, only the sound signal from the direction of the video-recorded object remains.
  12. The mobile terminal according to any one of claims 9 to 11, wherein the memory further stores information about a preset object;
    before the camera obtains the location information of the video-recorded object relative to the mobile terminal by face recognition, the processor compares each object in the recording picture with the stored preset object and determines that the object in the recording picture that matches the preset object is the video-recorded object.
PCT/CN2014/092534 2014-11-28 2014-11-28 Method and mobile terminal for recording sound of video-recorded object WO2016082199A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2014/092534 WO2016082199A1 (zh) 2014-11-28 2014-11-28 Method and mobile terminal for recording sound of video-recorded object
CN201480083698.6A CN107004426B (zh) 2014-11-28 2014-11-28 Method and mobile terminal for recording sound of video-recorded object
US15/607,124 US10062393B2 (en) 2014-11-28 2017-05-26 Method for recording sound of video-recorded object and mobile terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/092534 WO2016082199A1 (zh) 2014-11-28 2014-11-28 Method and mobile terminal for recording sound of video-recorded object

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/607,124 Continuation US10062393B2 (en) 2014-11-28 2017-05-26 Method for recording sound of video-recorded object and mobile terminal

Publications (1)

Publication Number Publication Date
WO2016082199A1 true WO2016082199A1 (zh) 2016-06-02

Family

ID=56073393

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/092534 WO2016082199A1 (zh) 2014-11-28 2014-11-28 Method and mobile terminal for recording sound of video-recorded object

Country Status (3)

Country Link
US (1) US10062393B2 (zh)
CN (1) CN107004426B (zh)
WO (1) WO2016082199A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110602424A (zh) * 2019-08-28 2019-12-20 维沃移动通信有限公司 视频处理方法及电子设备
CN112165591A (zh) * 2020-09-30 2021-01-01 联想(北京)有限公司 一种音频数据的处理方法、装置及电子设备
CN112165590A (zh) * 2020-09-30 2021-01-01 联想(北京)有限公司 视频的录制实现方法、装置及电子设备
CN113676593A (zh) * 2021-08-06 2021-11-19 Oppo广东移动通信有限公司 视频录制方法、装置、电子设备及存储介质
CN116055869A (zh) * 2022-05-30 2023-05-02 荣耀终端有限公司 一种视频处理方法和终端
WO2023202431A1 (zh) * 2022-04-19 2023-10-26 华为技术有限公司 一种定向拾音方法及设备

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110121048A (zh) * 2018-02-05 2019-08-13 青岛海尔多媒体有限公司 一种会议一体机的控制方法及控制系统和会议一体机
CN110740259B (zh) * 2019-10-21 2021-06-25 维沃移动通信有限公司 视频处理方法及电子设备
CN113994426B (zh) * 2020-05-28 2023-08-01 深圳市大疆创新科技有限公司 音频处理方法、电子设备及计算机可读存储介质
CN111970625B (zh) * 2020-08-28 2022-03-22 Oppo广东移动通信有限公司 录音方法和装置、终端和存储介质
CN113014844A (zh) * 2021-02-08 2021-06-22 Oppo广东移动通信有限公司 一种音频处理方法、装置、存储介质及电子设备
CN115225840A (zh) * 2021-04-17 2022-10-21 华为技术有限公司 一种视频录制方法和电子设备
CN115272839A (zh) * 2021-04-29 2022-11-01 华为技术有限公司 一种收音方法、装置及相关电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070265850A1 (en) * 2002-06-03 2007-11-15 Kennewick Robert A Systems and methods for responding to natural language speech utterance
US7873349B1 (en) * 2009-10-06 2011-01-18 Sur-Tec, Inc. System, method, and device for intelligence gathering and position tracking
CN102685339A (zh) * 2011-03-04 2012-09-19 米特尔网络公司 音频会议电话的主持人模式
CN102903362A (zh) * 2011-09-02 2013-01-30 微软公司 集成的本地和基于云的语音识别

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6449593B1 (en) * 2000-01-13 2002-09-10 Nokia Mobile Phones Ltd. Method and system for tracking human speakers
KR100677554B1 (ko) * 2005-01-14 2007-02-02 삼성전자주식회사 비임형성 방식을 이용한 녹음 장치 및 그 방법
DE202013005408U1 (de) * 2012-06-25 2013-10-11 Lg Electronics Inc. Mikrophonbefestigungsanordnung eines mobilen Endgerätes
EP2680616A1 (en) * 2012-06-25 2014-01-01 LG Electronics Inc. Mobile terminal and audio zooming method thereof
CN104049721B (zh) * 2013-03-11 2019-04-26 联想(北京)有限公司 信息处理方法及电子设备
CN103414988B (zh) 2013-05-21 2016-11-23 杭州联汇科技股份有限公司 一种室内扩声录音设备及语音追踪调整方法
CN103679155A (zh) * 2013-12-30 2014-03-26 深圳锐取信息技术股份有限公司 多目标自动跟踪系统及方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070265850A1 (en) * 2002-06-03 2007-11-15 Kennewick Robert A Systems and methods for responding to natural language speech utterance
US7873349B1 (en) * 2009-10-06 2011-01-18 Sur-Tec, Inc. System, method, and device for intelligence gathering and position tracking
CN102685339A (zh) * 2011-03-04 2012-09-19 米特尔网络公司 音频会议电话的主持人模式
CN102903362A (zh) * 2011-09-02 2013-01-30 微软公司 集成的本地和基于云的语音识别

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110602424A (zh) * 2019-08-28 2019-12-20 维沃移动通信有限公司 视频处理方法及电子设备
CN112165591A (zh) * 2020-09-30 2021-01-01 联想(北京)有限公司 一种音频数据的处理方法、装置及电子设备
CN112165590A (zh) * 2020-09-30 2021-01-01 联想(北京)有限公司 视频的录制实现方法、装置及电子设备
CN112165591B (zh) * 2020-09-30 2022-05-31 联想(北京)有限公司 一种音频数据的处理方法、装置及电子设备
CN112165590B (zh) * 2020-09-30 2022-05-31 联想(北京)有限公司 视频的录制实现方法、装置及电子设备
CN113676593A (zh) * 2021-08-06 2021-11-19 Oppo广东移动通信有限公司 视频录制方法、装置、电子设备及存储介质
CN113676593B (zh) * 2021-08-06 2022-12-06 Oppo广东移动通信有限公司 视频录制方法、装置、电子设备及存储介质
WO2023202431A1 (zh) * 2022-04-19 2023-10-26 华为技术有限公司 一种定向拾音方法及设备
CN116055869A (zh) * 2022-05-30 2023-05-02 荣耀终端有限公司 一种视频处理方法和终端
CN116055869B (zh) * 2022-05-30 2023-10-20 荣耀终端有限公司 一种视频处理方法和终端

Also Published As

Publication number Publication date
US20170263264A1 (en) 2017-09-14
US10062393B2 (en) 2018-08-28
CN107004426A (zh) 2017-08-01
CN107004426B (zh) 2020-09-11

Similar Documents

Publication Publication Date Title
WO2016082199A1 (zh) 录取录像对象的声音的方法和移动终端
CN110740259B (zh) 视频处理方法及电子设备
WO2020078237A1 (zh) 音频处理方法和电子设备
US20180227658A1 (en) Headset
US20210217433A1 (en) Voice processing method and apparatus, and device
US20130262687A1 (en) Connecting a mobile device as a remote control
EP4131931A1 (en) Image capturing method and electronic device
KR20210017229A (ko) 오디오 줌 기능을 갖는 전자 장치 및 이의 동작 방법
CN113132863B (zh) 立体声拾音方法、装置、终端设备和计算机可读存储介质
CN113676592B (zh) 录音方法、装置、电子设备及计算机可读介质
CN109029252B (zh) 物体检测方法、装置、存储介质及电子设备
CN114697812A (zh) 声音采集方法、电子设备及系统
CN110572600A (zh) 一种录像处理方法及电子设备
JP2022545933A (ja) 目標ユーザのロック方法および電子デバイス
CN113918004A (zh) 手势识别方法及其装置、介质和系统
CN108600623B (zh) 重聚焦显示方法以及终端设备
KR20130054131A (ko) 디스플레이장치 및 그 제어방법
CN113395451B (zh) 视频拍摄方法、装置、电子设备以及存储介质
US11368611B2 (en) Control method for camera device, camera device, camera system, and storage medium
CN113542597B (zh) 对焦方法以及电子设备
US10152985B2 (en) Method for recording in video chat, and terminal
US20220337945A1 (en) Selective sound modification for video communication
WO2021159943A1 (zh) 一种拍摄控制方法、装置及终端设备
CN114943242A (zh) 事件检测方法及装置、电子设备、存储介质
CN108174101B (zh) 一种拍摄方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14906880

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14906880

Country of ref document: EP

Kind code of ref document: A1