WO2020078237A1 - Audio processing method and electronic device - Google Patents

Audio processing method and electronic device

Info

Publication number
WO2020078237A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
microphone
audio
electronic device
target
Application number
PCT/CN2019/110095
Other languages
English (en)
French (fr)
Inventor
陶凯
鲍光照
陈松
尹明婕
缪海波
胡伟湘
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2020078237A1

Classifications

    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; control thereof
    • G10L 21/0208: Noise filtering (speech enhancement, e.g. noise reduction or echo cancellation)
    • H04R 3/00: Circuits for transducers, loudspeakers or microphones
    • H04R 3/04: Circuits for transducers, loudspeakers or microphones, for correcting frequency response
    • H04S 3/00: Systems employing more than two channels, e.g. quadraphonic
    • H04S 7/00: Indicating arrangements; control arrangements, e.g. balance control

Definitions

  • the present application relates to the field of electronic technology, in particular to an audio processing method and electronic equipment.
  • The recording application is one of the most important multimedia audio and video experiences for users of electronic devices. Because recording scenes are complex and users' recording purposes are diverse, users' requirements for recording effects differ across scenarios. For example, in scenes such as classrooms and conferences, the speaker's voice needs to be enhanced while other noise interference is attenuated in order to improve the clarity of recordings. In music recording occasions such as classical instrument performances, fidelity is emphasized to avoid the sound-quality damage caused by excessive processing. In near-field vocal recording scenes such as selfie recording and live broadcasting, far-field sound needs to be weakened to keep the near-field sound clean and clear.
  • Parameter processing may include, for example, digital filtering, gain control, and equalizer (EQ) frequency response control.
  • the user can select various recording modes on the electronic device.
  • The recording modes can include a "conference mode" for scenes such as classrooms and meetings, a "music mode" for music recording occasions, a "vocal mode" for near-field vocal recording, an "interview mode" for interviews, a "distant mode" for when the recorded target is far away, a "natural environment mode" for when the recorded target is the natural environment, and so on.
  • the user can select different modes on the electronic device to adapt to different recording scenes and different recording purposes.
  • The user may select a mode by touch on a touch screen, or with a remote control device corresponding to the electronic device.
  • The growing number of recording modes increases the complexity of user operations, and the refinement of recording scenes makes different scenes hard for users to tell apart and scene-selection mistakes likely, thereby increasing the complexity of determining the recording scene.
  • the technical solution of the present application discloses an audio processing method and an electronic device, which can improve the convenience of audio processing strategy selection.
  • The technical solution of the present application provides an audio processing method. The method includes: performing image recognition on a first image acquired by a camera component to obtain the target type of the captured target in the first image, the orientation of the captured target relative to the microphone, and the distance of the captured target relative to the microphone; determining an audio processing strategy according to the target type, the orientation, and the distance; and processing the audio signal picked up by the microphone according to the audio processing strategy.
  • In this way, the audio processing strategy for processing the audio signal picked up by the microphone can be determined by means of image recognition. This improves both the convenience of selecting an audio processing strategy and the processing effect on audio signals.
  • The determining of an audio processing strategy according to the target type of the subject, the orientation of the subject relative to the microphone, and the distance of the subject relative to the microphone includes: determining the orientation of spatial enhancement according to the orientation of the subject relative to the microphone; determining the filter according to the target type of the subject; and determining the first gain control curve and the first equalizer frequency response curve according to the target type of the subject and the distance of the subject relative to the microphone. The audio processing strategy includes the spatial-enhancement orientation, the filter, the first gain control curve, and the first equalizer frequency response curve.
  • The technical solution of the present application does not limit the order in which the processor in the electronic device determines the spatial-enhancement orientation, the filter, the first gain control curve, and the first equalizer frequency response curve.
  • The processing of the audio signal picked up by the microphone according to the audio processing strategy includes: performing spatial enhancement, filtering, gain control, and equalizer frequency response control on the audio signal picked up by the microphone according to the audio processing strategy.
  • Performing spatial enhancement, filtering, gain control, and equalizer frequency response control on the audio signal picked up by the microphone according to the audio processing strategy includes: performing spatial enhancement on the original audio signal in the spatial-enhancement orientation to obtain a first audio signal, the original audio signal being the audio signal picked up by the microphone; filtering the first audio signal with the filter to obtain a second audio signal; performing gain control on the second audio signal with the first gain control curve to obtain a third audio signal; and performing equalizer frequency response control on the third audio signal with the first equalizer frequency response curve to obtain a fourth audio signal.
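  • As a minimal sketch of this claimed chain (spatial enhancement, then filtering, then gain control, then EQ frequency response control), the following Python fragment shows one plausible implementation. The function and field names, the FIR filter, and the band-split EQ are illustrative assumptions, not details taken from the patent:

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable

@dataclass
class AudioProcessingStrategy:
    orientation_deg: float                # spatial-enhancement orientation (assumed representation)
    filter_taps: np.ndarray               # FIR taps standing in for "the filter"
    gain_curve: Callable[[float], float]  # input amplitude -> gain (first gain control curve)
    eq_gains: np.ndarray                  # per-band gains (first EQ frequency response curve)

def process(original: np.ndarray, s: AudioProcessingStrategy) -> np.ndarray:
    # Spatial enhancement needs multi-microphone input; in this single-channel
    # sketch we pass the signal through unchanged (see the beamforming sketch later).
    first = original
    second = np.convolve(first, s.filter_taps, mode="same")      # filtering -> second audio signal
    third = second * s.gain_curve(float(np.abs(second).max()))   # gain control -> third audio signal
    spectrum = np.fft.rfft(third)
    for idx, g in zip(np.array_split(np.arange(spectrum.size), s.eq_gains.size), s.eq_gains):
        spectrum[idx] *= g                                       # EQ frequency response control
    fourth = np.fft.irfft(spectrum, n=third.size)                # -> fourth audio signal
    return fourth
```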
  • the processor in the electronic device may also use image recognition to obtain an image scene as a recording scene.
  • the processor in the electronic device may determine the first gain control curve and the first equalizer frequency response curve according to one or more of the following: the target type of the recorded target, the recording scene, and the distance between the recorded target and the microphone.
  • The determining of the spatial-enhancement orientation according to the orientation of the subject relative to the microphone includes: determining the orientation of the subject relative to the microphone as the spatial-enhancement orientation for the audio signal picked up by the microphone. The determining of the filter according to the target type of the captured target includes: obtaining the filter from the first mapping table according to the target type of the captured target, wherein the first mapping table contains multiple target types and the filter corresponding to each of the multiple target types, and the multiple target types include the target type of the photographed target.
  • The determining of the first gain control curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may be: obtaining the first gain control curve from the second mapping table according to the target type of the photographed target and the distance of the photographed target relative to the microphone, wherein the second mapping table contains multiple target types, multiple distances, and the gain control curve corresponding to target type i and distance j, target type i being any one of the multiple target types and distance j being any one of the multiple distances; the multiple target types contain the target type of the subject, and the multiple distances include the distance of the subject relative to the microphone.
  • The determining of the first gain control curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may also be: obtaining the second gain control curve from the third mapping table according to the target type of the photographed target, wherein the third mapping table contains multiple target types and the gain control curve corresponding to each of the multiple target types, the multiple target types including the target type of the photographed target; and obtaining a first gain compensation curve from a fourth mapping table according to the distance of the photographed target relative to the microphone, wherein the fourth mapping table contains multiple distances and the gain compensation curve corresponding to each of the multiple distances, the multiple distances including the distance of the subject relative to the microphone.
  • The determining of the first EQ frequency response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may be: obtaining the first EQ frequency response curve from the fifth mapping table according to the target type of the photographed target and the distance of the photographed target relative to the microphone, wherein the fifth mapping table contains multiple target types, multiple distances, and the EQ frequency response curve corresponding to target type i and distance j, target type i being any one of the multiple target types and distance j being any one of the multiple distances; the multiple target types include the target type of the photographed target, and the multiple distances include the distance of the photographed target relative to the microphone.
  • The determining of the first EQ frequency response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may also be: obtaining the second EQ frequency response curve from the sixth mapping table according to the target type of the photographed target, wherein the sixth mapping table contains multiple target types and the EQ frequency response curve corresponding to each of the multiple target types, the multiple target types including the target type of the photographed target; and obtaining the first EQ frequency response compensation curve from a seventh mapping table according to the distance of the photographed target relative to the microphone, wherein the seventh mapping table contains multiple distances and the EQ frequency response compensation curve corresponding to each of the multiple distances, the multiple distances including the distance of the subject relative to the microphone.
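  • The mapping tables above are essentially lookup tables keyed by target type and distance. A minimal sketch with Python dictionaries follows; every entry is hypothetical, since the patent does not publish concrete table contents:

```python
# First mapping table: target type -> filter (identifiers are made up for illustration).
FILTER_TABLE = {"voice": "speech_denoise_filter", "piano": "music_fidelity_filter"}

# Second mapping table: (target type, distance bucket) -> gain control curve.
GAIN_TABLE = {("voice", "near"): "gain_voice_near", ("voice", "far"): "gain_voice_far"}

# Fifth mapping table: (target type, distance bucket) -> EQ frequency response curve.
EQ_TABLE = {("voice", "near"): "eq_voice_near", ("voice", "far"): "eq_voice_far"}

def select_strategy(target_type: str, distance: str):
    """Assemble the strategy pieces for one (target type, distance) pair."""
    return (FILTER_TABLE[target_type],
            GAIN_TABLE[(target_type, distance)],
            EQ_TABLE[(target_type, distance)])

print(select_strategy("voice", "near"))
```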
  • The performing of image recognition on the first image acquired by the camera component to obtain the target type of the subject, the orientation of the subject relative to the microphone, and the distance of the subject relative to the microphone may be specifically implemented as follows: performing image recognition on the first image to obtain the image content of the subject, and obtaining the target type of the subject from the eighth mapping table according to the image content of the subject, wherein the eighth mapping table contains multiple image contents and the target type corresponding to each of the multiple image contents, the multiple image contents containing the image content of the photographed target; and obtaining the distance of the subject relative to the microphone from the ninth mapping table according to the image content of the subject and the size of the two-dimensional frame obtained by focusing on the subject in the first image, wherein the ninth mapping table contains multiple image contents, multiple two-dimensional frame sizes, and the distance corresponding to image content k and two-dimensional frame size l, image content k being any one of the multiple image contents and two-dimensional frame size l being any one of the multiple two-dimensional frame sizes.
  • The two-dimensional image frame obtained by focusing on the subject can be obtained using the autofocus principle of the electronic device.
  • the focusing of the subject can also be achieved in response to the user's manual focusing operation, that is, the two-dimensional image frame obtained by focusing the subject can also be obtained in response to the user's manual focusing operation.
  • the distance of the photographed target relative to the microphone can also be determined using multi-camera distance measurement.
  • Z = f · t / d is used to determine the distance from the subject to the camera, where Z is the distance from the subject to the camera, f is the focal length of the two cameras, d is the disparity (the difference between the coordinate positions of the subject on the two cameras' images), and t is the physical distance between the two cameras.
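  • A direct transcription of this two-camera relation, with assumed units (focal length and disparity in pixels, baseline in metres):

```python
def stereo_distance(f_px: float, t_m: float, d_px: float) -> float:
    """Z = f * t / d: distance from the subject to the camera.

    f_px: focal length of the two cameras, in pixels
    t_m:  physical distance (baseline) between the two cameras, in metres
    d_px: disparity, i.e. the difference between the subject's coordinate
          positions on the two cameras' images, in pixels
    """
    if d_px == 0:
        raise ValueError("zero disparity: subject is effectively at infinity")
    return f_px * t_m / d_px

print(stereo_distance(f_px=1400.0, t_m=0.012, d_px=8.0))  # -> 2.1 (metres)
```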
  • When the distance between the subject and the electronic device is large enough, the distance between the camera and the microphone can be ignored; no coordinate-system conversion is then required, the distance between the subject and the camera can be used directly as the distance between the subject and the microphone, and the orientation of the subject relative to the camera can be taken as the orientation of the subject relative to the microphone.
  • The distance between the subject and the microphone can also be measured in other ways, for example using structured-light ranging.
  • the technical solution of the present application does not limit the measurement method of the distance between the subject and the microphone.
  • The method further includes: superimposing the audio signal picked up by the microphone with the fourth audio signal to obtain a fifth audio signal; the fourth audio signal is the audio signal obtained after the audio signal picked up by the microphone undergoes spatial enhancement, filtering, gain control, and EQ frequency response control.
  • the fifth audio signal may be an audio signal used for audio output after the processing is completed.
  • the process of processing the audio signal picked up by the microphone does not limit the sequence of spatial enhancement, filtering, gain control, and EQ frequency response control.
  • Performing spatial enhancement, filtering, gain control, and EQ frequency response control on the audio signal picked up by the microphone according to the audio processing strategy may be specifically implemented as follows: determining the original audio signal of each of multiple channels according to the audio signal picked up by the microphone; and performing spatial enhancement, filtering, gain control, and EQ frequency response control on the original audio signal of each channel according to the audio processing strategy.
  • For example, the left and right channels undergo spatial enhancement, filtering, gain control, and EQ frequency response control separately.
  • the audio signal processing and playback between the two channels do not affect each other, thereby improving the stereoscopic effect of the output audio signal.
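  • A sketch of this per-channel independence, assuming any callable implementing the single-channel strategy (such as the `process` sketch earlier):

```python
import numpy as np

def process_multichannel(mixture: np.ndarray, per_channel) -> np.ndarray:
    """Process each channel independently; `mixture` has shape (channels, samples).

    `per_channel` is a callable applying the audio processing strategy to one
    channel, so the left and right channels do not affect each other.
    """
    return np.stack([per_channel(mixture[ch]) for ch in range(mixture.shape[0])])

# e.g. stereo = np.zeros((2, 48000)); process_multichannel(stereo, lambda x: x * 0.5)
```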
  • Before the audio signal picked up by the microphone is processed according to the audio processing strategy, the method further includes: displaying the audio processing strategy. The processing of the audio signal picked up by the microphone according to the audio processing strategy then includes: in response to the user's operation on the audio processing strategy, performing spatial enhancement, filtering, gain control, and EQ frequency response control on the audio signal picked up by the microphone according to the audio processing strategy.
  • Having the user confirm the audio processing strategy automatically obtained by the electronic device can improve the accuracy and the convenience of audio-processing-strategy recognition.
  • The processor in the electronic device may process the audio signal picked up by the microphone when audio and video recording starts, processing the audio signal picked up by the sound pickup component in real time. In recording and live-broadcast scenes, the audio processing strategy can be selected automatically in real time, which improves the convenience of audio-processing-strategy selection and the audio-signal processing effect for different target types and different recording scenes.
  • The processor in the electronic device may instead process the audio signal picked up by the microphone after audio and video recording ends. This reduces the load on the processor during recording, improves the smoothness of the recording process, improves the convenience of audio-processing-strategy selection, and improves the processing effect on the audio signal under different recording target types and recording scenes.
  • The processor in the electronic device may also process the audio signal picked up by the microphone when audio and video recording ends and the recorded audio and video signals are stored in the memory. This reduces the load on the processor during recording and improves the smoothness of the recording process. In this way, the audio signal picked up by the microphone is processed only when the audio and video signals need to be saved, which avoids wasting processor resources on recordings that are not saved.
  • The technical solution of the present application provides an audio processing method. The method includes: performing image recognition on a first image acquired by a camera component to obtain the target type of the captured target in the first image; determining a filter according to the target type of the captured target; and filtering the audio signal picked up by the microphone with the filter.
  • an image recognition method may be used to determine a filter for processing the audio signal picked up by the microphone. Therefore, the convenience of audio processing strategy selection can be improved, and the processing effect of audio signals can be improved.
  • Before the filter is used to filter the audio signal picked up by the microphone, the method further includes: obtaining the orientation of the photographed target relative to the microphone according to the image recognition; and performing spatial enhancement on the original audio signal in the orientation of the subject relative to the microphone to obtain a first audio signal, the original audio signal being the audio signal picked up by the microphone. The use of the filter to filter the audio signal picked up by the microphone may then be specifically implemented as: filtering the first audio signal with the filter to obtain a second audio signal.
  • an image recognition method can also be used to determine the direction of spatial enhancement, so that the processing effect of the audio signal can be further improved.
  • performing spatial enhancement before filtering the audio signal can increase the proportion of the audio signal from the captured object in the processed audio signal and reduce the proportion of noise, thereby improving the processing effect on the audio signal.
  • The technical solution of the present application does not limit the order in which the processor in the electronic device determines the spatial-enhancement orientation and the filter.
  • The method further includes: obtaining the distance of the subject relative to the microphone according to the image recognition; and determining the first gain control curve and the first equalizer frequency response curve according to the target type of the subject and the distance of the subject relative to the microphone. After the audio signal picked up by the microphone is filtered with the filter, the method further includes: performing gain control on the second audio signal with the first gain control curve to obtain a third audio signal, the second audio signal being the audio signal obtained by filtering the audio signal picked up by the microphone with the filter; and performing equalizer frequency response control on the third audio signal with the first equalizer frequency response curve to obtain a fourth audio signal.
  • Performing spatial enhancement and filtering on the audio signal before gain control and EQ control can increase the proportion of the audio signal from the subject in the processed audio signal and reduce the proportion of noise, thereby improving the audio-signal processing effect.
  • the technical solution of the present application does not limit the order in which the processor in the electronic device determines the spatially enhanced orientation, the filter, the first gain control curve, and the first equalizer frequency response curve.
  • the processor in the electronic device may also use image recognition to obtain an image scene as a recording scene.
  • the processor in the electronic device may determine the first gain control curve and the first equalizer frequency response curve according to one or more of the following: the target type of the recorded target, the recording scene, and the distance between the recorded target and the microphone.
  • The determining of the filter according to the target type of the photographed target may be specifically implemented as follows: acquiring the filter from the first mapping table according to the target type of the photographed target, wherein the first mapping table includes multiple target types and the filter corresponding to each of the multiple target types, the multiple target types including the target type of the photographed target.
  • The determining of the first gain control curve according to the target type of the subject and the distance of the subject relative to the microphone may be: obtaining the first gain control curve from the second mapping table according to the target type of the shooting target and the distance of the shooting target relative to the microphone; the second mapping table contains multiple target types, multiple distances, and the gain control curve corresponding to target type i and distance j, target type i being any one of the multiple target types and distance j being any one of the multiple distances; the multiple target types include the target type of the photographed target, and the multiple distances include the distance of the photographed target relative to the microphone.
  • The determining of the first gain control curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may also be: obtaining the second gain control curve from the third mapping table according to the target type of the captured target, wherein the third mapping table contains multiple target types and the gain control curve corresponding to each of the multiple target types, the multiple target types including the target type of the photographed target; and obtaining the first gain compensation curve from a fourth mapping table according to the distance of the photographed target relative to the microphone, wherein the fourth mapping table contains multiple distances and the gain compensation curve corresponding to each of the multiple distances, the multiple distances including the distance of the subject relative to the microphone.
  • The determining of the first equalizer frequency response curve according to the target type of the captured target and the distance of the captured target relative to the microphone may be: obtaining the first equalizer frequency response curve from a fifth mapping table according to the target type of the subject and the distance of the subject relative to the microphone, wherein the fifth mapping table contains multiple target types, multiple distances, and the equalizer frequency response curve corresponding to target type i and distance j, target type i being any one of the multiple target types and distance j being any one of the multiple distances; the multiple target types include the target type of the photographed target, and the multiple distances include the distance of the photographed target relative to the microphone.
  • The determining of the first equalizer frequency response curve according to the target type of the captured target and the distance of the captured target relative to the microphone may also be: obtaining the second equalizer frequency response curve from the sixth mapping table according to the target type of the captured target, wherein the sixth mapping table contains multiple target types and the equalizer frequency response curve corresponding to each of the multiple target types, the multiple target types including the target type of the photographed target; and obtaining the first equalizer frequency response compensation curve from the seventh mapping table according to the distance of the photographed target relative to the microphone, wherein the seventh mapping table contains multiple distances and the equalizer frequency response compensation curve corresponding to each of the multiple distances, the multiple distances including the distance of the subject relative to the microphone.
  • The performing of image recognition on the first image acquired by the camera component to obtain the target type of the captured target in the first image may be: performing image recognition on the first image to obtain the image content of the captured target, and obtaining the target type of the captured target from the eighth mapping table according to the image content of the captured target, wherein the eighth mapping table contains multiple image contents and the target type corresponding to each of the multiple image contents, the multiple image contents including the image content of the photographed target.
  • The obtaining of the orientation of the subject relative to the microphone according to the image recognition may be: acquiring the coordinate points contained in the two-dimensional frame obtained by focusing on the subject; and obtaining the orientation of the points on the subject relative to the microphone from the tenth mapping table according to the coordinate points contained in the two-dimensional frame, wherein the tenth mapping table contains multiple coordinate points and the orientation corresponding to each of the multiple coordinate points, the multiple coordinate points including the coordinate points contained in the two-dimensional frame.
  • The obtaining of the distance of the subject relative to the microphone according to the image recognition may be: obtaining the distance of the subject relative to the microphone from the ninth mapping table according to the image content of the subject and the size of the two-dimensional frame obtained by focusing on the subject in the first image, wherein the ninth mapping table contains multiple image contents, multiple two-dimensional frame sizes, and the distance corresponding to image content k and two-dimensional frame size l, image content k being any one of the multiple image contents and two-dimensional frame size l being any one of the multiple two-dimensional frame sizes; the multiple image contents include the image content of the photographed target, and the multiple two-dimensional frame sizes include the size of the two-dimensional frame obtained by focusing on the photographed target.
  • The two-dimensional image frame obtained by focusing on the subject can be obtained using the autofocus principle of the electronic device.
  • the focusing of the subject can also be achieved in response to the manual focusing operation of the user, that is, the two-dimensional image frame obtained by focusing the subject can also be obtained in response to the manual focusing operation of the user.
  • the distance of the subject relative to the microphone can also be determined using multi-camera ranging.
  • Z = f · t / d is used to determine the distance from the subject to the camera, where Z is the distance from the subject to the camera, f is the focal length of the two cameras, d is the disparity (the difference between the coordinate positions of the subject on the two cameras' images), and t is the physical distance between the two cameras.
  • When the distance between the subject and the electronic device is large enough, the distance between the camera and the microphone can be ignored; no coordinate-system conversion is then required, the distance between the subject and the camera can be used directly as the distance between the subject and the microphone, and the orientation of the subject relative to the camera can be taken as the orientation of the subject relative to the microphone.
  • The distance between the subject and the microphone can also be measured in other ways, for example using structured-light ranging.
  • the technical solution of the present application does not limit the measurement method of the distance between the subject and the microphone.
  • The method further includes: superimposing the original audio signal and the fourth audio signal to obtain a fifth audio signal; the original audio signal is the audio signal picked up by the microphone.
  • the fifth audio signal may be an audio signal used for audio output after the processing is completed.
  • The use of the filter to filter the audio signal picked up by the microphone may be specifically implemented as follows: determining the original audio signal of each of multiple channels according to the audio signal picked up by the microphone, and processing the original audio signal of each channel, the processing including filtering with the filter.
  • For example, the left and right channels undergo spatial enhancement, filtering, gain control, and EQ frequency response control separately.
  • the audio signal processing and playback between the two channels do not affect each other, thereby improving the stereoscopic effect of the output audio signal.
  • the method before using the filter to filter the audio signal picked up by the microphone, the method further includes: displaying the audio processing strategy; the audio processing strategy includes the filter;
  • the use of the filter to filter the audio signal picked up by the microphone may be specifically implemented as follows: in response to a user's operation of the audio processing strategy, the filter is used to filter the audio signal picked up by the microphone.
  • Having the user confirm the audio processing strategy automatically obtained by the electronic device can improve the accuracy and the convenience of audio-processing-strategy recognition.
  • The processor in the electronic device may process the audio signal picked up by the microphone when audio and video recording starts, processing the audio signal picked up by the sound pickup component in real time. The audio processing strategy can then be selected automatically in real time, which improves the convenience of audio-processing-strategy selection and the audio-signal processing effect for different target types and different recording scenes.
  • The processor in the electronic device may instead process the audio signal picked up by the microphone after audio and video recording ends. This reduces the load on the processor during recording, improves the smoothness of the recording process, improves the convenience of audio-processing-strategy selection, and improves the processing effect on the audio signal under different recording target types and recording scenes.
  • The processor in the electronic device may also process the audio signal picked up by the microphone when audio and video recording ends and the recorded audio and video signals are stored in the memory. This reduces the load on the processor during recording and improves the smoothness of the recording process. In this way, the audio signal picked up by the microphone is processed only when the audio and video signals need to be saved, which avoids wasting processor resources on recordings that are not saved.
  • The technical solution of the present application provides an audio processing method. The method includes: performing image recognition on a first image acquired by a camera component to obtain the target type of the captured target in the first image and the distance of the captured target relative to the microphone; determining an audio processing strategy according to the target type of the subject and the distance of the subject relative to the microphone; and processing the audio signal picked up by the microphone according to the audio processing strategy.
  • an image recognition method may be used to determine an audio processing strategy for processing the audio signal picked up by the microphone. Therefore, the convenience of audio processing strategy selection can be improved, and the processing effect of audio signals can be improved.
  • the target type of the photographed target may include a voice type and a non-voice type.
  • If the image recognition finds that the subject includes a "person", it can be determined that the target type of the subject is the voice type.
  • Otherwise, the target type of the subject is the non-voice type.
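  • A minimal sketch of this voice/non-voice decision, assuming the image recognizer returns a set of label strings:

```python
def target_type_from_labels(labels: set) -> str:
    """Voice type if image recognition found a person in the frame, else non-voice."""
    return "voice" if "person" in labels else "non-voice"

print(target_type_from_labels({"person", "desk"}))  # -> voice
print(target_type_from_labels({"piano"}))           # -> non-voice
```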
  • The determining of an audio processing strategy according to the target type of the subject and the distance of the subject relative to the microphone includes: determining the filter according to the target type of the subject; and determining the first gain control curve and the first equalizer frequency response curve according to the target type of the subject and the distance of the subject relative to the microphone. The audio processing strategy includes the filter, the first gain control curve, and the first equalizer frequency response curve.
  • The processing of the audio signal picked up by the microphone according to the audio processing strategy includes: performing filtering, gain control, and equalizer frequency response control on the audio signal picked up by the microphone according to the audio processing strategy.
  • Before the filter is used to filter the audio signal picked up by the microphone, the method further includes: obtaining the orientation of the photographed target relative to the microphone according to the image recognition; and performing spatial enhancement on the original audio signal in the orientation of the subject relative to the microphone to obtain a first audio signal, the original audio signal being the audio signal picked up by the microphone. The use of the filter to filter the audio signal picked up by the microphone may then be specifically implemented as follows: filtering the first audio signal with the filter to obtain a second audio signal.
  • an image recognition method can also be used to determine the direction of spatial enhancement, so that the processing effect of the audio signal can be further improved.
  • Performing spatial enhancement, filtering, gain control, and equalizer frequency response control on the audio signal picked up by the microphone according to the audio processing strategy includes: performing spatial enhancement on the original audio signal in the spatial-enhancement orientation to obtain a first audio signal, the original audio signal being the audio signal picked up by the microphone; filtering the first audio signal with the filter to obtain a second audio signal; performing gain control on the second audio signal with the first gain control curve to obtain a third audio signal; and performing equalizer frequency response control on the third audio signal with the first equalizer frequency response curve to obtain a fourth audio signal.
  • The method further includes: superimposing the audio signal picked up by the microphone with the fourth audio signal to obtain a fifth audio signal; the fourth audio signal is the audio signal obtained after the audio signal picked up by the microphone undergoes spatial enhancement, filtering, gain control, and EQ frequency response control.
  • the fifth audio signal may be an audio signal used for audio output after the processing is completed.
  • the process of processing the audio signal picked up by the microphone does not limit the sequence of spatial enhancement, filtering, gain control, and EQ frequency response control.
  • Before gain control and EQ frequency response control, spatial enhancement and filtering are performed, which can increase the proportion of the audio signal from the captured target in the processed audio signal and reduce the proportion of noise, thereby improving the processing effect on audio signals.
  • Performing spatial enhancement, filtering, gain control, and EQ frequency response control on the audio signal picked up by the microphone according to the audio processing strategy may be specifically implemented as follows: determining the original audio signal of each of multiple channels according to the audio signal picked up by the microphone; and performing spatial enhancement, filtering, gain control, and EQ frequency response control on the original audio signal of each channel according to the audio processing strategy.
  • For example, the left and right channels undergo spatial enhancement, filtering, gain control, and EQ frequency response control separately.
  • the audio signal processing and playback between the two channels do not affect each other, thereby improving the stereoscopic effect of the output audio signal.
  • Before the audio signal picked up by the microphone is processed according to the audio processing strategy, the method further includes: displaying the audio processing strategy. The processing of the audio signal picked up by the microphone according to the audio processing strategy then includes: in response to the user's operation on the audio processing strategy, performing spatial enhancement, filtering, gain control, and EQ frequency response control on the audio signal picked up by the microphone according to the audio processing strategy.
  • Having the user confirm the audio processing strategy automatically obtained by the electronic device can improve the accuracy and the convenience of audio-processing-strategy recognition.
  • The processor in the electronic device may process the audio signal picked up by the microphone when audio and video recording starts, processing the audio signal picked up by the sound pickup component in real time. The audio processing strategy can then be selected automatically in real time, which improves the convenience of audio-processing-strategy selection and the audio-signal processing effect for different target types and different recording scenes.
  • The processor in the electronic device may instead process the audio signal picked up by the microphone after audio and video recording ends. This reduces the load on the processor during recording, improves the smoothness of the recording process, improves the convenience of audio-processing-strategy selection, and improves the processing effect on the audio signal under different recording target types and recording scenes.
  • The processor in the electronic device may also process the audio signal picked up by the microphone when audio and video recording ends and the recorded audio and video signals are stored in the memory. This reduces the load on the processor during recording and improves the smoothness of the recording process. In this way, the audio signal picked up by the microphone is processed only when the audio and video signals need to be saved, which avoids wasting processor resources on recordings that are not saved.
  • the technical solution of the present application provides an electronic device, including one or more processors and one or more memories.
  • The one or more memories are coupled to the one or more processors.
  • the one or more memories are used to store computer program code.
  • the computer program codes include computer instructions.
  • When the one or more processors execute the computer instructions, the electronic device is caused to perform the method provided in any possible technical solution of any one of the first to third aspects.
  • The technical solution of the present application provides an electronic device including a module or unit for performing the method provided in any possible technical solution of any one of the first to third aspects.
  • The technical solution of the present application provides a chip system including at least one processor, a memory, and an interface circuit; the memory, the interface circuit, and the at least one processor are interconnected, and the memory stores program instructions; when the program instructions are executed by the processor, the method provided in any possible technical solution of any one of the first to third aspects is implemented.
  • The technical solution of the present application provides a computer-readable storage medium that stores program instructions; when the program instructions are executed by a processor, the method provided in any possible technical solution of any one of the first to third aspects is implemented.
  • The technical solution of the present application provides a computer program product; when the computer program product runs on a processor, the method provided in any possible technical solution of any one of the first to third aspects is implemented.
  • the camera component may be at least one camera.
  • The electronic device may include one, two, three, or four cameras.
  • the camera in the solution may be located on the same side of the electronic device, for example, on the rear side of the electronic device.
  • FIG. 1 is a schematic diagram of selecting a recording mode on an electronic device provided by the prior art
  • FIG. 2 is a schematic diagram of an audio and video shooting scene provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a space enhancement implementation provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an audio processing method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of identifying a photographed target provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of determining the orientation between a subject and a microphone through a coordinate system conversion provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a principle for determining the orientation of a subject relative to a microphone provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • During audio and video shooting, the processor in the electronic device can call the camera to capture video clips of the target and call the microphone to collect the target's audio during the recording process.
  • FIG. 2 is a schematic diagram of an audio and video shooting scene provided by an embodiment of the present application.
  • The camera in the electronic device is associated with the "camera" icon displayed on the display screen, such as a camera app; in response to the user's operation on the icon, such as touching to select it, the processor in the electronic device may start the camera to capture images.
  • In the "video" mode of the camera, the camera can be configured to capture video.
  • The processor in the electronic device may collect the video captured by the camera and the audio captured by the microphone after a touch operation is detected on the first key.
  • the first display control may be displayed on the display screen, and the first display control is used to time the duration of recording the video after the first key is touched.
  • In response to the user's operation on the second key, such as a touch operation, the electronic device stops collecting the video captured by the camera and the audio captured by the microphone, and the first display control stops timing.
  • the first key and the second key may be the same key or different keys.
  • the electronic device completes an audio and video shooting process
  • the playback duration of the obtained audio and video clip is the difference between the time when the user operates the second key and the time when the user operates the first key.
  • the video part contains the picture continuously captured by the camera from the start of the user's operation on the first key to the end of the user's operation on the second key.
  • the audio part contains audio continuously captured by the microphone from the start of the user's operation of the first key to the end of the user's operation of the second key.
  • the time displayed by the first display control is 00:00, and the first display control continues to time until the user operates the second key.
  • The first display control stops timing and displays 02:15.
  • the playback duration of the audio and video clip obtained after the user operates the second button may be 02:15.
  • the electronic device may call one or more cameras to complete the video signal pickup.
  • the multiple cameras are all associated with the "camera" icon displayed on the display screen, and the video signals picked up by the multiple cameras can be acquired by the processor.
  • The collection of audio signals depends on the microphone.
  • the processor in the electronic device performs different parameter processing on the original audio signal collected by the microphone.
  • Parameter processing may include, for example, digital filtering, spatial enhancement, gain control, and EQ frequency response control.
  • In order to improve the convenience of recording-scene selection and the accuracy of parameter processing in different recording scenes, embodiments of the present application provide an audio processing method.
  • This audio processing method can be applied to scenes where audio and video shooting is performed using electronic devices, such as the scenes described in FIG. 2.
  • The processor in the electronic device can focus on the subject in the image picked up by the camera and recognize the subject in the image.
  • The audio signal picked up by the sound pickup component is spatially enhanced to increase the audio intensity in the orientation of the subject relative to the microphone, while the audio intensity in other directions decreases.
  • A filter corresponding to the target type of the subject is determined; the filter can filter noise out of the subject's audio signal.
  • the audio processing strategy may include the selected filter, gain control curve and EQ frequency response curve.
  • The processing effect on the audio signal can thus combine the target type of the captured object from which the audio signal comes with the recording scene, so that the result conforms to the user's listening habits for that type of audio in that recording environment.
  • the filter can be used to reduce noise in the frequency domain, retain audio signals from the recorded target, and filter out audio signals from targets other than the recorded target.
  • the filter involved in the embodiment of the present application may be a digital filter, which is implemented by the processor calling an algorithm model.
  • Different types of audio signals have different frequency domain distribution probability characteristics, and the frequency domain distribution probability characteristics of microphone pickup audio signals in different recording scenarios are different.
  • The frequency-domain distribution probability characteristics of different types of audio signals can be summarized as prior information to guide the gain estimation of each type of audio at various frequency points, and the frequency-domain distribution probability characteristics of audio signals in different recording scenes can likewise be summarized as prior information to guide the gain estimation of each recording scene's audio signal at various frequency points.
  • Different types of audio may include, for example, voice, bird song, running water, piano music, music, and so on.
  • The frequency range of human speech is 85 Hz to 8 kHz.
  • An important feature of speech signals is the pitch period.
  • The pitch period is the time interval (or, equivalently, the inverse of the frequency) of the opening and closing of the human glottis.
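  • The patent only names the pitch period as a feature; one standard way to estimate it (an assumption here, not the patent's disclosed method) is autocorrelation over the lag range implied by typical speech pitch:

```python
import numpy as np

def pitch_period(frame: np.ndarray, sr: int, fmin: float = 85.0, fmax: float = 400.0) -> float:
    """Estimate the pitch period (seconds) of a voiced frame by autocorrelation,
    searching lags that correspond to plausible speech pitch frequencies."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[frame.size - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return lag / sr

sr = 16000
t = np.arange(0, 0.04, 1 / sr)
print(pitch_period(np.sin(2 * np.pi * 100 * t), sr))  # ~0.01 s for a 100 Hz tone
```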
  • Different recording scenes can include conference scenes, karaoke scenes, remote scenes, and so on. For example, the required voice frequency range and gain differ greatly between a conference scene and a karaoke scene.
  • Each audio signal target type can correspond to a filter, and each recording scene can also correspond to a filter.
  • the target type of the audio signal is the target type of the audio from the recorded target, that is, the target type of the recorded target in the context.
  • a filter corresponding to the recording scene may be an algorithm model implemented by a processor, and the filter may be determined through machine learning.
  • The clean speech signal is used as the supervision signal, and the filter parameters are iteratively optimized until the filter's output on the mixed audio signal approaches the supervision signal and converges, thereby generating a frequency-domain noise-reduction filter for the speech-type target signal.
  • the mixed audio signal includes a voice signal and other types of audio signals.
  • The processor in the electronic device can filter out the other types of audio signals in the mixed audio signal through the trained filter corresponding to the voice signal, so that only the voice signal is retained.
  • other types of audio signals are noise signals relative to the voice signals.
  • the mixed audio signal used for machine learning can be obtained by superimposing a clean speech signal with a noise signal.
  • the clean speech signal is used as the supervision signal, and the mixed audio signal is used as the input signal of the filter to iteratively optimize the parameters of the filter.
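  • A toy version of this supervised procedure follows: a real per-frequency-bin gain (a crude stand-in for the patent's filter model, which is not disclosed) is iteratively optimized so that its output on the mixed signal approaches the clean supervision signal. The signals, loss, and optimizer here are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sr, n = 8000, 2048
t = np.arange(n) / sr
clean = np.sin(2 * np.pi * 220 * t)        # stand-in for the clean speech (supervision) signal
noise = 0.5 * rng.standard_normal(n)       # stand-in for the other (noise) audio types
mixed = clean + noise                      # mixed audio signal = clean + noise

C, X = np.fft.rfft(clean), np.fft.rfft(mixed)
w = np.ones(X.size)                        # filter parameters: one real gain per frequency bin
step = 0.5 / np.maximum(np.abs(X) ** 2, 1e-8)

for _ in range(200):                       # iterative optimisation toward the supervision signal
    err = w * X - C                        # filter output minus supervision, per bin
    w -= step * np.real(err * np.conj(X))  # gradient step on |w*X - C|^2 w.r.t. real w
    w = np.clip(w, 0.0, 1.5)

print(float(np.mean(np.abs(w * X - C) ** 2)))  # residual should be small after convergence
```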
  • Spatial enhancement can enhance the audio signal in a specific direction and weaken the audio signal in directions other than that specific direction.
  • the specific direction may be the orientation of the recorded object relative to the microphone.
  • The processor in the electronic device can process the original audio signal received by the microphone, or adjust the direction of the microphone, so that in the collected audio signal the audio intensity in the orientation of the target is enhanced while the audio intensity in the remaining directions is weakened; that is, the audio of the recorded target is spatially enhanced.
  • the spatially enhanced azimuth may include the direction center and angle range.
  • the direction center represents the center position of the azimuth
  • the angle range represents the angle area covered by the azimuth.
  • the processor in the electronic device can adjust the direction of the microphone to the direction center of the target;
  • the processor in the electronic device can use an algorithm to increase the audio intensity in the direction of the target.
  • the intensity of the audio signal captured by the microphone is related to the orientation of the recorded object relative to the microphone.
  • FIG. 3 is a schematic diagram of a space enhancement implementation provided by an embodiment of the present application.
  • the direction of the microphone can coincide with the center of the direction where the target is located.
  • In this case, the intensity of the audio signal the microphone collects from the recorded target is the strongest, and the noise signal is the weakest.
  • The direction of the microphone may refer to the direction in which the microphone captures the audio signal; the direction center and angle range of the target's azimuth may be expressed relative to the microphone.
  • When there are multiple microphones, the processor in the electronic device can adjust the direction of each microphone to the direction center of the target relative to that microphone, so that each of the multiple microphones spatially enhances the audio of the recorded target.
  • The audio signal propagating in space is a signal generated by vibration. Because the distances between the electronic device's multiple microphones and the sound source differ, the multiple microphones receive different audio signals from the sound source at the same moment. Specifically, when the multiple microphones capture the sound source, they obtain multiple audio signals whose phases differ because of the time delays. When these audio signals are superimposed, components with the same phase reinforce one another while components with opposite phase cancel.
  • The processor in the electronic device can perform time-delay compensation or phase compensation on the multiple audio signals picked up by the multiple microphones so that the signals cancel when superimposed, thereby reducing the strength of audio signals from non-target directions.
  • the non-target direction is a direction other than the bearing of the target.
  • the processor in the electronic device can perform time delay compensation or phase compensation on the multiple audio signals picked up by multiple microphones, so that the multiple audio signals are enhanced when superimposed, thereby enhancing the azimuth from the target The strength of the audio signal.
  • multiple microphones are used to spatially enhance the audio of the recorded object.
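  • a minimal delay-and-sum sketch of the superposition described above, assuming the per-microphone arrival delays for the target direction are already known from the array geometry (the function names and signal layout are illustrative):

```python
import numpy as np

def delay_and_sum(mic_signals, fs, delays):
    """Compensate each microphone's arrival delay (seconds), then superimpose.

    Signals from the target direction align in phase and reinforce; signals
    from other directions stay misaligned and partially cancel."""
    n = len(mic_signals[0])
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, tau in zip(mic_signals, delays):
        # phase compensation equivalent to a time-delay compensation of tau
        compensated = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * tau)
        out += np.fft.irfft(compensated, n)
    return out / len(mic_signals)
```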
  • Gain control refers to adjusting the signal strength of the audio signal picked up by the microphone. Gain control can adjust the amplification applied to signals of various amplitudes, and the gains corresponding to different signal amplitudes can be different. Gain control is related to one or more of the following factors: the target type of the recorded target, the recording scene, and the distance between the recorded target and the microphone. The factors that affect gain control are described below.
  • the target type of the recorded target is the target type of the captured target obtained by the image recognition in the context.
  • the recorded target is the captured target obtained by the image recognition in the context.
  • for audio signals of different target types, users have different requirements for the recording signal strength at different input sound pressure levels, and the corresponding gains are different.
  • the processor in the electronic device can amplify the recorded audio signal at each sound pressure level to a fixed signal strength.
  • Sound pressure level refers to the size of the effective sound pressure relative to a reference value measured by a logarithmic scale, and the relationship between the effective sound pressure and the reference value is described in decibels (dB).
  • the human hearing threshold (that is, the lowest sound pressure that produces hearing) for a 1 kHz sound is 20 μPa, and this hearing threshold is usually used as the reference value for the sound pressure level.
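  • in standard acoustic terms (a well-known definition, not specific to this application), the sound pressure level of an effective sound pressure p_e is SPL = 20·log10(p_e / p_ref) dB, where p_ref = 20 μPa.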
  • the abscissa of the gain control curve may be the magnitude of the input audio signal, and the ordinate may be the gain.
  • the gain control curve can be set for the corresponding target type, so that the audio signal of the target type conforms to the listening rule of the user after gain control.
  • the corresponding gain control curve can achieve a constant signal strength of the audio signal output after gain control.
  • for a large input signal, the gain can be reduced so that the signal intensity of the voice signal output after gain control is not too large.
  • for a small input signal, the gain can be increased so that the signal intensity of the voice signal output after gain control is not too small.
  • each recording scene can also correspond to a gain control curve.
  • the abscissa of the gain control curve may be the amplitude of the picked-up signal, and the ordinate may be the magnitude of the gain.
  • the gain control curve can be set for the corresponding recording scene so that this type of audio signal conforms to the user's listening rule after gain control. For example, for the same voice signal, the gain control curves corresponding to the karaoke recording scene and the far-field recording scene are completely different. In the karaoke recording scene, the corresponding gain control curve can realize that, after gain control, the signal strength of the audio signal corresponding to signals other than small signals collected by the microphone is constant, while small signals are suppressed, that is, the gain of small signals is reduced.
  • in the far-field recording scene, the corresponding gain control curve can realize that, after gain control, the small signals collected by the microphone are amplified, that is, the gain of small signals is increased.
  • "small signal" may be a collective term for signals whose amplitude is less than a preset amplitude.
  • the mapping relationship may be stored in the electronic device: the target type of a recorded target and a recording scene jointly map a gain control curve.
  • the mapping relationship can also be stored in the electronic device: a target type of a recorded target maps a gain control curve.
  • the mapping relationship can also be stored in the electronic device: a recording scene is mapped to a gain control curve.
  • the following mapping relationship may also be stored in the electronic device: the target type A of the recorded target, the recording scene B, and the distance C between the recorded target and the microphone map a gain control curve.
  • the target type A is any target type
  • the recording scene B is any recording scene
  • the distance C between the recorded target and the microphone is any distance gradient.
  • the processor in the electronic device can set the gain to be proportional to the distance of the sound source target.
  • the gain compensation curve can be superimposed on the gain control curve to compensate for the effect of the distance between the recorded target and the microphone on the gain, and jointly complete the gain control.
  • the distance gradient may include, for example: far, relatively far, medium, relatively close, and close (a gain-control sketch using these gradients follows below).
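  • a sketch of such a lookup-and-apply step, with illustrative placeholder curve values (the patent does not publish its curves):

```python
import numpy as np

amp_points  = np.array([0.001, 0.01, 0.1, 1.0])  # abscissa: input signal amplitude
gain_points = np.array([4.0,   2.0,  1.2, 1.0])  # ordinate: gain
distance_compensation = {"far": 1.5, "relatively far": 1.2, "medium": 1.0,
                         "relatively close": 0.9, "close": 0.8}

def apply_gain_control(frame, distance="medium"):
    """Interpolate the gain control curve over the frame's peak amplitude and
    superimpose the distance-dependent compensation."""
    amp = float(np.max(np.abs(frame))) + 1e-12
    gain = np.interp(amp, amp_points, gain_points)
    return frame * gain * distance_compensation[distance]
```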
  • the adjustment of EQ frequency response can compensate the defects of the speaker and sound field, and accurately restore the original recorded audio signal.
  • EQ frequency response can adjust the magnification of audio signals of various frequency components in the audio signal.
  • EQ frequency response is also related to one or more of the following factors: the target type of the recorded target, the recording scene, and the distance between the recorded target and the microphone. The following describes the factors that affect the frequency response of EQ.
  • the gain of the 5kHz component in the speech signal can be increased to improve the clarity of the speech signal.
  • the gain of the 1.8kHz and 2.5kHz components in the speech signal can be reduced to soften and purify the speech signal.
  • the audio signal of the piano music is mostly concentrated in the intermediate frequency region, such as 3kHz or 4kHz. A slight increase in the gain near the 8kHz component in the audio signal of piano music can make the treble keys sound brighter.
  • the abscissa of the EQ frequency response curve may be the frequency of the input audio signal, and the ordinate may be the gain.
  • the EQ frequency response curve can be set for the corresponding target type, so that the audio signal of the target type conforms to the listening rule of the user after the EQ frequency response.
  • EQ frequency response adjustment can adjust the timbre of the audio signal.
  • users have different gain requirements for different frequency components in the audio signal.
  • the vocal signal can be highlighted by increasing the gain of the intermediate frequency component.
  • the intermediate frequency component may include 1-4 kHz, for example.
  • if the sound needs to be as thick as possible, as many low-frequency components as possible can be retained, that is, the gain of the low-frequency components is increased. If the sound needs to be loud, the gains of the 60 Hz and 120 Hz components can be increased, together with the gain of the high-frequency components near 7 kHz.
  • the abscissa of the EQ frequency response curve may be the frequency of the input audio signal, and the ordinate may be the gain.
  • the EQ frequency response curve can be set for the corresponding recording scene to make the audio signal of the recording scene conform to the user's listening rules after the EQ frequency response.
  • for the same distance between the recorded target and the microphone, the compensation gain for high-frequency signals is greater than that for low-frequency signals, since high-frequency components attenuate more than low-frequency components as the propagation distance grows.
  • each distance between the recorded target and the microphone can also correspond to an EQ frequency response compensation curve.
  • the abscissa of the EQ frequency response compensation curve can be the frequency of the audio signal, and the ordinate can be the gain.
  • the EQ frequency response compensation curve can be superimposed on the EQ frequency response curve as the final curve for adjusting the audio signal.
  • the electronic device may also store the following mapping relationship: the target type A of the recorded target, the recording scene B, and the distance C between the recorded target and the microphone map an EQ frequency response curve.
  • the target type A is any target type
  • the recording scene B is any recording scene
  • the distance C between the recorded target and the microphone is any distance gradient.
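  • a sketch of applying an EQ frequency response curve as per-frequency gains; the control points below are illustrative, echoing the speech examples above:

```python
import numpy as np

def apply_eq(frame, fs, freq_points, gain_db_points):
    """Interpolate (frequency, gain-in-dB) control points across the spectrum
    and scale each frequency component accordingly."""
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    gains = 10.0 ** (np.interp(freqs, freq_points, gain_db_points) / 20.0)
    return np.fft.irfft(spec * gains, len(frame))

# e.g. a speech-oriented curve: soften 1.8-2.5 kHz, lift 5 kHz for clarity
# out = apply_eq(frame, 48000, [0, 1800, 2500, 5000, 24000], [0, -2, -2, 3, 0])
```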
  • the process of image recognition technology can include: information acquisition, preprocessing, feature extraction and selection, classifier design and classification decision. The following are introduced separately.
  • the acquisition of information refers to converting optical information into electrical information through sensors, that is, obtaining the basic information of the research object and transforming it by some method into information that the machine can recognize.
  • Preprocessing mainly refers to the operations of denoising, smoothing, transforming, etc. in image processing, thereby enhancing the important features of the image.
  • Feature extraction and selection are required steps in pattern recognition. Since different images need to be classified, they can be distinguished by their features, and the process of acquiring these features is feature extraction. Not all features obtained by feature extraction are necessarily useful for a given recognition task; at that point the useful features must be selected, which is feature selection.
  • Classifier design refers to a recognition rule obtained through training. Through this recognition rule, the processor in the electronic device can obtain a feature classification, so that the image recognition technology can obtain a high recognition rate. Classification decision refers to classifying the identified objects in the feature space, so as to better identify the specific category of the object under study.
  • Image recognition technology can be implemented using computer vision algorithms.
  • Computer vision algorithm is a mathematical model that helps computers understand images. The core idea of computer vision algorithms is to use data-driven methods to learn statistical characteristics and patterns from big data. Generally, a large number of training samples are required to train the model. Specifically, computer vision algorithms can be used to model image features including textures, colors, shapes, spatial relationships, and high-level semantics.
  • the initial model is trained through training samples, and the parameters in the initial model are adjusted to converge the error of image recognition to construct a new model. After the training is completed, the processor in the electronic device can predict the image classification and classification probability through the new model, thereby performing image recognition.
  • Computer vision algorithms can be implemented using deep learning algorithms based on artificial neural networks.
  • the deep learning algorithm of the artificial neural network can extract image features through multiple neural network layers and calculate the probability that the image contains preset image features.
  • the deep learning algorithm of the artificial neural network may be, for example, a convolutional neural network (CNN).
  • Deep learning algorithms can extract image features through convolutional neural networks and calculate the probability that the image contains preset image features.
  • the convolutional neural network used for image recognition can be regarded as a classifier: it classifies the image input to the convolutional neural network and obtains the probability of each classification.
  • the convolutional neural network may be a new model obtained by training: the parameters in the initial model are adjusted with training samples until the recognition error converges.
  • the parameters in the model may include, for example, the size of the convolution kernel, the size of the pooling kernel, the number of fully connected layers, and so on.
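  • a minimal convolutional-neural-network classifier sketch of the kind described above (the architecture, input size, and class count are illustrative, not the patent's model):

```python
import torch
import torch.nn as nn

class TargetTypeCNN(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),    # pooling kernel size: one of the tunable parameters
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # fully connected layer

    def forward(self, x):  # x: (batch, 3, 224, 224)
        h = self.features(x).flatten(1)
        return torch.softmax(self.classifier(h), dim=1)  # per-class probabilities

probs = TargetTypeCNN()(torch.randn(1, 3, 224, 224))     # classification probabilities
```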
  • the target type of the photographed target may be determined through information acquisition, preprocessing, feature extraction and selection, classifier design, and classification decision.
  • the image recognition in the embodiment of the present application may further include: determining the distance from the subject to the camera according to the size of the two-dimensional frame, and determining the orientation of the subject according to the intersection of the image grid lines where the target is located.
  • the sound pickup component may include a microphone or a microphone array composed of multiple microphones.
  • the microphone array is a system composed of a certain number of microphones used to sample and process the spatial sound field.
  • the processor in the electronic device can use the difference between the phases of the audio signals received by the multiple microphones to filter the sound waves, which can remove the ambient background sound to the maximum extent, leaving the audio signal from the recorded target.
  • the sound pickup component may also include a dedicated processing chip connected to the microphone, and the dedicated processing chip may be used to implement one or more of the following: filters, spatial enhancement, gain control, and EQ frequency response.
  • the camera component may include a camera, and the camera is used to pick up images within a range of viewing angles, and these images are accumulated in time to obtain a video signal.
  • the number of cameras in the camera assembly may be one or more.
  • the video signals picked up by the multiple cameras can be acquired by the processor.
  • the processor in the electronic device can collect the images picked up by the camera and store these images and video signals in a buffer or storage device.
  • the camera component may also include a dedicated processing chip connected to the camera; the dedicated processing chip may be used to implement one or more of the following: subject recognition, target type recognition, scene recognition, recognition of the target's orientation on the image, and recognition of the target's distance relative to the camera.
  • the target type of the captured object may be obtained by performing image recognition on the image picked up by the camera.
  • the image content is obtained by image recognition, and the image content may be, for example, a portrait, a bird, a waterfall, a piano, a band, or the like.
  • the target type of the subject can be determined according to the audio type associated with each image content.
  • different target types of the photographed target correspond to different types of audio signals. Specifically, please refer to Table 1, which is an example of the mapping relationship between the image content and the target type of the captured target provided by the embodiment of the present application.
  • Table 1 An example of the mapping relationship between the image content and the target type of the captured target provided by the embodiment of the present application
  • the electronic device may pre-store the mapping table.
  • for example, if the image content obtained by image recognition is a portrait, the target type corresponding to the captured object obtained through Table 1 is the "Voice" type.
  • multiple image contents may correspond to the target type of one subject, because the same target type may be associated with multiple image contents. For example, “water flowing sound” may correspond to the image contents “waterfall” and “river”.
  • piano music can correspond to the image content “piano” and “piano score”.
  • similarly, one target type can correspond to the image contents “band” and “player”.
  • Table 1 may be preset in the memory of the electronic device according to a priori experience, and Table 1 may be called by the processor in the electronic device to determine the target type of the photographed target. Table 1 above is an example of the eighth mapping table in the context.
  • FIG. 4 is a schematic flowchart of an audio processing method according to an embodiment of the present application.
  • the audio processing method is applied to an electronic device, and the electronic device includes a camera component and a sound pickup component.
  • the camera component is used to pick up video signals and perform image recognition.
  • the pickup component is used to pick up audio signals.
  • the audio processing method involved in the embodiment of the present application will be described below with reference to FIG. 4.
  • the processor in the electronic device collects the image picked up by the camera component, performs image focusing, and obtains the photographed target.
  • the processor in the electronic device uses image recognition to determine the target type of the photographed target, the orientation relative to the microphone, and the distance relative to the microphone.
  • the processor in the electronic device determines the space-enhanced orientation according to the orientation of the subject relative to the microphone.
  • the processor in the electronic device determines the filter according to the target type of the photographed target.
  • the processor in the electronic device determines the gain control curve and the EQ frequency response curve according to the distance of the subject relative to the microphone and the target type to which the subject belongs.
  • the processor in the electronic device obtains the original audio signal picked up by the sound pickup component.
  • the processor in the electronic device spatially enhances the audio signal picked up by the sound pickup component according to the determined spatially enhanced orientation, and outputs the first audio signal.
  • the processor in the electronic device filters the first audio signal according to the determined filter to filter out the noise signal to obtain the second audio signal.
  • the processor in the electronic device performs gain control on the second audio signal according to the determined gain control curve to obtain a third audio signal.
  • the processor in the electronic device performs EQ frequency response control on the third audio signal according to the determined EQ frequency response curve to obtain the fourth audio signal.
  • the processor in the electronic device adds the fourth audio signal and the original audio signal picked up by the sound pickup component to obtain a fifth audio signal.
  • the fifth audio signal may be an audio signal used for audio output after the processing is completed.
  • using image recognition to obtain the target type of the subject, the orientation relative to the microphone, and the distance relative to the microphone can improve the accuracy of scene recognition and subject recognition in the recording scene. The audio processing strategy is then determined according to the target type, the orientation relative to the microphone, and the distance relative to the microphone, which can filter out the interference signals in the audio signal and improve the processing effect of the audio signal (a pipeline sketch follows below).
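  • a sketch of the processing chain, with trivial stand-in stages; real versions would follow the concepts described in this application:

```python
import numpy as np

def spatial_enhance(x, orientation):    return x        # placeholder beamformer
def noise_filter(x, target_type):       return x        # placeholder trained filter
def gain_control(x, target_type, dist): return 1.2 * x  # placeholder gain curve
def eq_response(x, target_type, dist):  return x        # placeholder EQ curve

def process_recording(original, orientation, target_type, distance):
    first = spatial_enhance(original, orientation)      # spatial enhancement
    second = noise_filter(first, target_type)           # filter out noise signals
    third = gain_control(second, target_type, distance) # gain control
    fourth = eq_response(third, target_type, distance)  # EQ frequency response
    return fourth + original   # superimpose the original to get the fifth signal

fifth = process_recording(np.zeros(1024), (0.0, 0.0), "voice", "medium")
```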
  • The following describes step S101, with reference to the schematic diagram of identifying a photographed target provided by an embodiment of the present application.
  • the processor starts to collect the video signal picked up by the camera
  • the camera can focus on the subject and display the focused subject through the two-dimensional pixel area framed by the two-dimensional frame.
  • the electronic device stops collecting video captured by the camera and stops capturing audio captured by the microphone, and the first display control stops timing.
  • the two-dimensional image frame obtained by focusing the subject can be realized by using the principle of autofocus in an electronic device.
  • one autofocus principle is as follows: a motor can drive the lens in the camera to move along the optical axis to achieve focusing. The motor driver chip outputs a corresponding current, and the motor makes a corresponding displacement; at this displacement, the camera picks up an image, and the clarity of the picked-up image is used to judge whether the lens has reached a position where the captured image is clear (such as the clearest position). If the clear position has not been reached, the motor driver chip is notified to adjust the output current, and the above process is repeated until the lens reaches the clear position. Focusing is accomplished through this closed-loop adjustment process (a sketch follows below).
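  • a sketch of such a closed-loop focus search, with hypothetical stand-ins for the motor driver chip and the camera readout:

```python
import numpy as np

def sharpness(image):
    """Clarity metric: mean gradient magnitude (one common choice)."""
    gy, gx = np.gradient(image.astype(float))
    return float(np.mean(np.hypot(gx, gy)))

def autofocus(set_lens_position, capture_image, candidate_positions):
    """Drive the lens through candidate positions, judge each captured image's
    clarity, and settle on the clearest position."""
    best_pos, best_score = candidate_positions[0], -1.0
    for pos in candidate_positions:      # adjust drive current -> lens displacement
        set_lens_position(pos)
        score = sharpness(capture_image())
        if score > best_score:
            best_pos, best_score = pos, score
    set_lens_position(best_pos)          # lens reaches the clear position
    return best_pos
```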
  • the focusing of the subject may also be achieved in response to a manual focusing operation by the user.
  • the two-dimensional pixel area corresponding to the two-dimensional frame can be used as the image area corresponding to the subject.
  • the two-dimensional pixel area can be used to determine the target type of the subject, the distance between the subject and the microphone, and the orientation between the subject and the microphone. The following describes how each is determined through image recognition.
  • Image recognition determines the target type of the subject
  • the processor in the electronic device can obtain the image content by performing image recognition on the two-dimensional pixel area, or it can be the image content obtained by performing image recognition on the image of the subject.
  • the processor in the electronic device can look up Table 1 to obtain the target type of the photographed target.
  • Table 1 is an example of the eighth mapping table in the context.
  • Image recognition can determine the distance between the subject and the camera. Since both the camera and the microphone are installed in the electronic device, the distance between the subject and the camera can be approximately regarded as the distance between the subject and the microphone. The distance determined by image recognition can be a distance gradient, for example: far, relatively far, medium, relatively close, and close. Image recognition may determine the distance using the size of the two-dimensional frame and the image content: for the same image content, the larger the two-dimensional frame obtained by focusing, the closer the subject is to the camera; the smaller the two-dimensional frame, the farther the subject is from the camera.
  • the mapping relationship between the size of the two-dimensional frame obtained by focusing and the distance gradient can be pre-stored.
  • Table 2 is an example of the mapping relationship between the size of the two-dimensional frame and the distance gradient when the image content is a portrait provided by an embodiment of the present application.
  • Table 2 An example of the mapping relationship between the size of the two-dimensional frame and the distance gradient when the image content is a portrait
  • the size of the two-dimensional frame can be expressed by the number of pixels in the pixel area occupied by the two-dimensional frame.
  • a, b, c, d, e, and f respectively represent the number of pixels, and a ⁇ b ⁇ c ⁇ d ⁇ e ⁇ f.
  • when the size of the two-dimensional frame is in the range a×a to b×b, the distance gradient obtained by the mapping is “far”; when the size of the two-dimensional frame is in the ranges b×b to c×c, c×c to d×d, d×d to e×e, and e×e to f×f, the corresponding distance gradients are “relatively far”, “medium”, “relatively close”, and “close”.
  • the processor in the electronic device finds, according to the image content (a portrait), the mapping relationship between the size of the two-dimensional frame corresponding to the portrait and the distance gradient, that is, Table 2; then, according to the size of the two-dimensional frame, it finds the corresponding distance gradient from Table 2.
  • Table 2 may also be specifically implemented as a two-dimensional mapping table, which includes multiple image contents, multiple two-dimensional frame sizes, and the distance jointly corresponding to image content k and two-dimensional frame size l; wherein the image content k is any one of the multiple image contents, and the two-dimensional frame size l is any one of the multiple two-dimensional frame sizes; the multiple image contents include the image content of the subject, and the multiple two-dimensional frame sizes include the size of the two-dimensional frame obtained by focusing the subject.
  • the two-dimensional mapping table is the ninth mapping table in the context.
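  • a sketch of the Table 2 lookup; the thresholds a..f stand for the patent's unspecified pixel counts, so placeholder values are used:

```python
a, b, c, d, e, f = 1_000, 5_000, 20_000, 80_000, 200_000, 1_000_000  # placeholders

def distance_gradient(frame_pixels):
    """Map the pixel count of the focused two-dimensional frame to a gradient:
    a smaller frame means a farther subject."""
    thresholds = [(b, "far"), (c, "relatively far"), (d, "medium"),
                  (e, "relatively close"), (f, "close")]
    for upper, label in thresholds:
        if frame_pixels < upper:
            return label
    return "close"
```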
  • the distance between the subject and the microphone can also be determined using the principle of multiple cameras ranging.
  • the distance between the subject and the camera is measured by multiple cameras as the distance between the subject and the microphone.
  • the distance between the subject and the camera can be determined using the disparity of the subject's imaging in multiple cameras.
  • the distance from the subject to the camera is inversely proportional to the disparity between the subject's imaging positions on the two cameras. In the scenario of two cameras, the relation is Z = f · t / d, where:
  • Z is the distance from the subject to the camera
  • f is the focal length of the two cameras
  • d is the distance difference between the coordinate positions of the subject on the images of the two cameras
  • t is the physical distance between the two cameras.
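  • assembled from the definitions above, this is the standard two-camera (stereo) depth relation; a sketch, with the units as assumptions:

```python
def stereo_depth(f_pixels, t_meters, d_pixels):
    """Z = f * t / d: focal length f (pixels), camera baseline t (meters),
    disparity d (pixels) -> subject-to-camera distance Z (meters)."""
    return f_pixels * t_meters / d_pixels

Z = stereo_depth(f_pixels=1400.0, t_meters=0.012, d_pixels=35.0)  # ~0.48 m
```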
  • the above example for determining the distance between the subject and the microphone is only used to explain the embodiment of the present application, and should not be construed as a limitation.
  • the distance between the subject and the microphone can also adopt other methods, such as using structured light ranging.
  • the embodiment of the present application does not limit the measurement method of the distance between the subject and the microphone.
  • the conversion relationship between the two three-dimensional coordinate systems can be determined according to the fixed position relationship between the camera and the microphone.
  • the processor in the electronic device can obtain the coordinates in the three-dimensional coordinate system corresponding to the camera on the subject through image recognition.
  • the processor in the electronic device can use the conversion relationship between the two coordinate systems to determine the coordinates of the subject in the three-dimensional coordinate system corresponding to the microphone.
  • the processor in the electronic device may determine the orientation between the subject and the microphone according to the coordinates of the subject in the three-dimensional coordinate system corresponding to the microphone.
  • the coordinates of the subject in the three-dimensional coordinate system corresponding to the camera may be the coordinates, in that coordinate system, of each of multiple points on the subject.
  • after the processor in the electronic device converts them to the three-dimensional coordinate system corresponding to the microphone, it can obtain the coordinates of the multiple points on the subject in the three-dimensional coordinate system corresponding to the microphone.
  • the processor in the electronic device may determine the orientation between the subject and the microphone according to the coordinates of multiple points on the subject in the three-dimensional coordinate system corresponding to the microphone.
  • FIG. 6 is a schematic diagram of determining the orientation between a subject and a microphone through a coordinate system conversion provided by an embodiment of the present application.
  • a three-dimensional coordinate system OXYZ is established.
  • Image recognition can determine the coordinates (i, j, k) of the point A on the subject within the three-dimensional coordinate system OXYZ.
  • the X axis is parallel to the horizontal plane and parallel to the display plane of the mobile phone
  • the Y axis is parallel to the display plane of the mobile phone and perpendicular to the direction of the Z axis
  • the Z axis is the direction of the optical axis of the camera.
  • the X1 axis is the direction parallel to the horizontal plane and the display plane of the mobile phone
  • the Y1 axis is the direction parallel to the display plane of the mobile phone and perpendicular to the Z1 axis
  • the Z1 axis is the direction perpendicular to the display plane of the mobile phone. It can be obtained that the X axis is parallel to the X1 axis
  • the Y axis is parallel to the Y1 axis
  • the Z axis is parallel to the Z1 axis.
  • the values of i0, j0, and k0 can be determined according to the fixed position relationship between the camera and the microphone in the electronic device.
  • the coordinate value k in the direction of the distance between the subject and the camera can be estimated from the distance gradient in Table 2.
  • the coordinate value k in the direction of the distance between the subject and the camera may also be obtained by distance measurement using dual cameras.
  • the processor in the electronic device can use the above coordinate system conversion to determine the coordinates of multiple points on the subject within the three-dimensional coordinate system O1X1Y1Z1. Then the processor in the electronic device can calculate the orientation between the subject and the microphone according to the coordinates of multiple points on the subject in the three-dimensional coordinate system O1X1Y1Z1.
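  • a sketch of this conversion and orientation computation; it assumes formula (2) is a pure translation (the axes of the two coordinate systems are parallel, per the description above), and the offset values are placeholders:

```python
import numpy as np

offset = np.array([0.01, -0.03, 0.0])   # (i0, j0, k0): camera-to-microphone offset

def camera_to_mic(points_camera):
    """Translate points from OXYZ (camera) into O1X1Y1Z1 (microphone)."""
    return np.asarray(points_camera, dtype=float) - offset

def orientation(points_mic):
    """Zenith and azimuth of the subject's centroid relative to the microphone."""
    x, y, z = np.mean(points_mic, axis=0)
    r = float(np.sqrt(x * x + y * y + z * z))
    return float(np.arccos(z / r)), float(np.arctan2(y, x))  # (theta, phi)

theta, phi = orientation(camera_to_mic([[0.1, 0.2, 1.5], [0.12, 0.18, 1.5]]))
```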
  • the processor in the electronic device may use the position of the subject in the image picked up by the camera to determine the orientation of the subject relative to the camera.
  • the orientation of the subject relative to the camera is then used as the orientation of the subject relative to the microphone. Understandably, when the distance between the subject and the electronic device is sufficiently large, the distance between the camera and the microphone can be ignored, so no coordinate system conversion is required, and the orientation of the subject relative to the camera can be directly used as the orientation of the subject relative to the microphone.
  • FIG. 7 is a schematic diagram of a principle for determining the orientation of a subject relative to a microphone according to an embodiment of the present application.
  • the processor in the electronic device may discretize the screen shot by the camera, and pre-store the orientation corresponding to the intersection point on the intersection point of each grid line.
  • the processor in the electronic device may determine the orientation corresponding to the intersection according to one or more intersections in the focused two-dimensional image area. As shown in FIG. 7, the coordinates of grid intersection points A and B in the two-dimensional image area are (x0, y0) and (x1, y1).
  • the direction corresponding to the intersection point A (x0, y0) is expressed as (θ0, φ0)
  • the direction corresponding to the intersection point B (x1, y1) is expressed as (θ1, φ1).
  • the orientation of the subject relative to the camera is obtained from the directions (θ0, φ0) and (θ1, φ1).
  • the processor in the electronic device uses the position of the subject in the image picked up by the camera to determine the orientation of the subject relative to the camera, which may be specifically implemented as: acquiring the coordinate points contained in the two-dimensional frame obtained by focusing the subject. Obtain the orientation of the point on the subject relative to the microphone from the tenth mapping table according to the coordinate points contained in the two-dimensional frame.
  • the tenth mapping table contains multiple coordinate points and the orientation corresponding to each coordinate point in the multiple coordinate points; the multiple coordinate points include the coordinate points contained in the two-dimensional frame.
  • the intersection points A and B of the grid lines are the coordinate points contained in the two-dimensional frame obtained by focusing the subject.
  • θ0 and φ0 are the zenith angle and azimuth angle of the point corresponding to point A on the subject, in the spherical coordinate representation corresponding to the coordinate system OXYZ.
  • the radial distance of the point corresponding to point A on the subject is r0.
  • θ1 and φ1 are the zenith angle and azimuth angle of the point corresponding to point B on the subject, in the spherical coordinate representation corresponding to the coordinate system OXYZ.
  • the radial distance of the point corresponding to point B on the subject is r1.
  • the processor in the electronic device can obtain the coordinates of points A and B on the subject in the coordinate system OXYZ as (r0·sinθ0·cosφ0, r0·sinθ0·sinφ0, r0·cosθ0) and (r1·sinθ1·cosφ1, r1·sinθ1·sinφ1, r1·cosθ1).
  • the processor in the electronic device may determine the orientation of the subject relative to the camera by using the subject's position in each of the multiple images picked up by the camera.
  • the intersection points of the grid lines and the orientations corresponding to those intersection points, prestored in the electronic device, may be pre-measured: the intersection point (xi, yi) of the grid lines and its corresponding orientation (θi, φi).
  • the intersection point (xi, yi) is any intersection point.
  • the number of grid line intersections can be k, k is a positive integer, and i is a positive integer that satisfies 1 ⁇ i ⁇ k.
  • An example of the process of measuring the orientation corresponding to the intersection point C (xi, yi) may be: first place the subject directly in front of the camera, that is, the zenith angle and azimuth angle of the subject in the spherical coordinate representation corresponding to the coordinate system OXYZ are both 0. Keeping the subject's position unchanged, rotate the electronic device until the subject appears at the intersection point C (xi, yi) in the image captured by the camera. The rotation angles θi and φi of the electronic device are recorded; this is the orientation corresponding to the intersection point C (xi, yi).
  • the radial distance of the point on the object can be measured by multiple cameras. Then use the coordinate transformation in formula (2) to get the coordinates of the point on the subject in the coordinate system O1X1Y1Z1, and then get the orientation of the subject relative to the microphone.
  • the radial distance r0 of the point on the subject corresponding to point A and the radial distance r1 of the point on the subject corresponding to point B can be obtained by dual camera distance measurement.
  • the coordinates of points A and B on the subject in the coordinate system OXYZ are (r0·sinθ0·cosφ0, r0·sinθ0·sinφ0, r0·cosθ0) and (r1·sinθ1·cosφ1, r1·sinθ1·sinφ1, r1·cosθ1).
  • the orientation between the subject and the microphone is calculated according to the coordinates of the two points on the subject in the three-dimensional coordinate system O1X1Y1Z1.
  • For step S103, the specific determination processes of the spatially enhanced azimuth, the filter, the gain control curve, and the EQ frequency response curve are introduced respectively below.
  • the processor in the electronic device may determine the orientation of the subject obtained in step S102 relative to the microphone as the orientation of spatial enhancement.
  • the spatially enhanced orientation is used to spatially enhance the original audio signal.
  • the processor in the electronic device may determine the filter according to the target type of the photographed target obtained in step S102.
  • the first mapping table may be stored in the memory in the electronic device, and the filter may be obtained from the first mapping table according to the target type of the photographed target.
  • the first mapping table includes multiple target types, and a filter corresponding to each target type in the multiple target types.
  • the multiple target types include the target type of the photographed target.
  • the processor in the electronic device may determine the filter based on the content stored in the memory and the target type of the photographed target.
  • For the specific description of the filter please refer to the foregoing specific description of the concept, which will not be repeated here.
  • the processor in the electronic device may also use image recognition to obtain an image scene as a recording scene.
  • the process of obtaining an image scene by image recognition can refer to the process of obtaining the target type of the captured object by image recognition in step S102.
  • the processor in the electronic device can determine the filter according to one or more of the following: the target type of the photographed target and the recording scene.
  • the processor in the electronic device may determine the gain control curve and the EQ frequency response curve according to the distance of the subject relative to the microphone obtained in step S102 and the target type to which the subject belongs.
  • the processor in the electronic device may also use image recognition to obtain an image scene as a recording scene.
  • the process of obtaining an image scene by image recognition can refer to the process of obtaining the target type of the captured object by image recognition in step S102.
  • the processor in the electronic device may determine the gain control curve according to one or more of the following: the target type of the recorded target, the recording scene, and the distance between the recorded target and the microphone.
  • For a specific description of the gain control curve, reference may be made to the foregoing specific description of the concept of gain control, which will not be repeated here.
  • the processor in the electronic device determines the first gain control curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone, which may be specifically implemented as follows: according to the target type of the photographed target and the distance of the photographed target relative to the microphone, Obtain the first gain control curve from the second mapping table.
  • the second mapping table includes multiple target types, multiple distances, and a gain control curve corresponding to the target type i and the distance j.
  • the target type i is any one of the multiple target types
  • the distance j is any one of the multiple distances.
  • the multiple target types include the target type of the photographed target, and the multiple distances include the distance of the photographed target relative to the microphone.
  • the first gain control curve is a gain control curve selected from a plurality of gain control curves.
  • the processor in the electronic device determines the first gain control curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone, which may also be specifically implemented as follows: obtain the second gain control curve from the third mapping table according to the target type of the photographed target.
  • the third mapping table contains multiple target types and the gain control curve corresponding to each of the multiple target types, where the multiple target types include the target type of the captured target. The first gain compensation curve is obtained from the fourth mapping table according to the distance of the captured target relative to the microphone; wherein the fourth mapping table includes multiple distances and the gain compensation curve corresponding to each of the multiple distances, and the multiple distances include the distance of the photographed target relative to the microphone.
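  • a sketch of these two lookup styles; all table contents are placeholders:

```python
# Joint lookup: (target type, distance) -> first gain control curve
second_mapping_table = {("voice", "close"): "voice_close_curve",
                        ("voice", "far"): "voice_far_curve"}
# Decomposed lookup: type -> curve, distance -> gain compensation curve
third_mapping_table = {"voice": "voice_base_curve"}
fourth_mapping_table = {"close": "close_compensation", "far": "far_compensation"}

def first_gain_control_curve(target_type, distance):
    key = (target_type, distance)
    if key in second_mapping_table:
        return second_mapping_table[key]
    # otherwise superimpose the second gain control curve and the compensation
    return (third_mapping_table[target_type], fourth_mapping_table[distance])
```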
  • the processor in the electronic device can also determine the EQ frequency response curve according to one or more of the following: the target type of the recorded target, the recording scene, the distance between the recorded target and the microphone, and the frequency of the audio signal.
  • For a specific description of the EQ frequency response curve, reference may be made to the foregoing specific description of the EQ frequency response concept, which will not be repeated here.
  • the processor in the electronic device determines the first EQ frequency response curve according to the target type of the target and the distance of the target relative to the microphone, which may be specifically implemented as follows: according to the target type of the target and the distance of the target relative to the microphone, the first EQ frequency response curve is obtained from the fifth mapping table.
  • the fifth mapping table contains multiple target types, multiple distances, and the EQ frequency response curve corresponding to the target type i and the distance j.
  • the target type i is any one of multiple target types
  • the distance j is any one of multiple distances.
  • Multiple target types include the target type of the target, and multiple distances include the distance of the target relative to the microphone.
  • the first EQ frequency response curve is an EQ frequency response curve selected from a plurality of EQ frequency response curves.
  • the processor in the electronic device determines the first EQ frequency response curve according to the target type of the captured target and the distance of the captured target relative to the microphone, which may be specifically implemented as follows: from the sixth mapping table according to the target type of the captured target To obtain the second EQ frequency response curve.
  • the sixth mapping table contains multiple target types and the EQ frequency response curve corresponding to each target type in the multiple target types.
  • the multiple target types include the target type of the captured target; the first EQ frequency response compensation curve is obtained from the seventh mapping table according to the distance of the captured target relative to the microphone.
  • the seventh mapping table contains multiple distances and the EQ frequency response compensation curve corresponding to each distance in the multiple distances; multiple distances include the distance of the subject to the microphone.
  • the embodiment of the present application does not limit the order in which the processor in the electronic device determines the spatially enhanced orientation, the filter, the first gain control curve, and the first EQ frequency response curve.
  • the following audio processing strategies can be determined: the spatially enhanced azimuth, the filter, the gain control curve, and the EQ frequency response curve.
  • the determined audio processing strategy can be implemented through steps S104-S108 to process the audio signal.
  • the original audio signal obtained from the sound pickup component is sequentially subjected to spatial enhancement, enhanced filtering, gain control, and EQ equalization.
  • spatial enhancement please refer to the specific description of the aforementioned spatial enhancement concept, which will not be repeated here.
  • the enhanced filtering please refer to the detailed description of the aforementioned filter concept, which will not be repeated here.
  • gain control and EQ equalization reference may be made to the foregoing specific description of the concepts of gain control and EQ equalization, which will not be repeated here.
  • the order of performing steps S105-S107 may also be other orders, which is not limited in the embodiment of the present application.
  • performing spatial enhancement and filtering first can increase the proportion of the audio signal from the subject in the output and reduce the proportion of noise, thereby improving the processing effect of the audio signal.
  • In step S108, since the audio signals of sound sources other than the subject in the space can enhance the stereoscopic sense of the spatial sound field, the original audio signal picked up by the sound pickup component can be superimposed on the fourth audio signal to improve the stereoscopic sense of the output audio signal.
  • the processor in the electronic device may determine the original audio signal of each of the multiple channels according to the original audio signal picked up by the sound pickup component.
  • in order to improve the stereoscopic effect of the audio signal, the sound pickup component can be used to form a pair of orthogonal directivity outputs, directing the two outputs to the front-left and front-right of the electronic device, respectively.
  • the output audio signal directed to the left front is regarded as the original audio signal of the left channel
  • the output audio signal directed to the front right is regarded as the original audio signal of the right channel.
  • Steps S101-S107 are performed on the original audio signal of the left channel to obtain the fourth audio signal of the left channel.
  • Steps S101-S107 are performed on the original audio signal of the right channel to obtain the fourth audio signal of the right channel. Then, the original audio signal of the left channel and the fourth audio signal of the left channel are superimposed to obtain the fifth audio signal of the left channel, and the original audio signal of the right channel and the fourth audio signal of the right channel are superimposed to obtain the fifth audio signal of the right channel. The fifth audio signal of the left channel is played through the left channel, and the fifth audio signal of the right channel is played through the right channel.
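  • a sketch of this two-channel arrangement (the processing chain is a stand-in identity here):

```python
import numpy as np

def chain(x):
    return x  # stand-in for spatial enhancement, filtering, gain control, EQ

left_raw, right_raw = np.zeros(1024), np.zeros(1024)  # orthogonal directivity outputs
left_fifth = chain(left_raw) + left_raw               # played via the left channel
right_fifth = chain(right_raw) + right_raw            # played via the right channel
```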
  • the algorithm by which the processor in the electronic device determines the original audio signal of each channel from the original audio signal picked up by the sound pickup component can also be another algorithm; the number of microphones included in the sound pickup component can also be more or fewer, and the number of sound channels in the electronic device can also be more or fewer, which is not limited in the embodiments of the present application.
  • the number of the subject obtained by focusing may be multiple.
  • for steps S102-S108, one of the following ways can be selected according to the positions of the plurality of captured objects: a. perform steps S102-S108 with the multiple captured objects treated as one captured target; b. perform steps S102-S108 separately for each of the multiple subjects.
  • the processor in the electronic device may use the method b to perform steps S102-S108 separately for each of the multiple objects.
  • when the processor in the electronic device detects that the angle range of the multiple subjects with respect to the microphone is less than or equal to a preset angle threshold, it indicates that the orientations of the multiple subjects with respect to the microphone are relatively concentrated, and they can be processed as one subject.
  • the processor in the electronic device may use the method a to perform steps S102-S108 by using a plurality of subjects as one subject.
  • the processor in the electronic device when the processor in the electronic device detects that the number or proportion of the multiple target objects belonging to the same target type is greater than or equal to the set threshold, it indicates that the multiple target objects can be processed as the same target type.
  • the processor in the electronic device may use the method a to perform steps S102-S108 by using a plurality of subjects as one subject.
  • the processor in the electronic device may use method b to perform steps S102-S108 separately for each of the multiple subjects.
  • the processor in the electronic device may call the display screen to display the audio processing strategy for the user to select.
  • FIGS. 8 and 9 are examples of an audio processing strategy user interaction interface provided by an embodiment of the present application. The following are introduced separately.
  • the processor in the electronic device uses the display screen to display the recognized prompt operation control related to the target content and the recording scene, that is, the first operation control.
  • the processor in the electronic device executes steps S103-S108.
  • the processor in the electronic device uses the display screen to display the first display interface.
  • the first display interface displays an audio processing strategy adjustment area, in which the display screen can display, according to the detected user operation, the target type selected by the user, the target orientation of the captured target, and the distance between the captured target and the microphone.
  • the selection list corresponding to the “type” in the audio processing strategy adjustment area displays the target types of multiple captured targets, and the target types of the multiple captured targets are available for the user to select.
  • the direction selection bar corresponding to the "direction of the target” contains angle values of multiple direction centers for the user to select.
  • the selection list corresponding to "distance” displays multiple distance gradients, which can be selected by the user.
  • the target type, the target orientation of the photographed target, and the distance between the photographed target and the microphone can be selected by default as parameter values obtained by the processor in the electronic device according to step S102.
  • the processor in the electronic device can adjust the target type of the target, the target orientation of the target, and the distance between the target and the microphone according to the received user operation.
  • when the user operates the "confirmation" control on the display screen, for example by touching it, it indicates that the user has completed the parameter adjustment, and the processor in the electronic device performs steps S103-S108 according to the values of the adjusted parameters.
  • the user's confirmation of the audio processing strategy automatically obtained by the electronic device can improve the accuracy and convenience of audio processing strategy recognition.
  • the examples of the audio processing strategy user interaction interface shown in FIG. 8 and FIG. 9 are only used to explain the embodiments of the present application, and should not constitute a limitation.
  • the audio processing strategy user interaction interface may also have other designs.
  • in the audio processing strategy adjustment area of FIG. 9, not only is the target direction displayed for the user to choose, but the angle range is also displayed for the user to choose.
  • the embodiment of the present application does not limit the specific design of the user interaction interface of the audio processing strategy.
  • the processor in the electronic device uses image recognition to determine the target type of the photographed target.
  • the processor in the electronic device can also use image recognition to determine one or more of the following: the orientation of the photographed target relative to the microphone and the distance relative to the microphone.
  • the processor in the electronic device determines the filter using the target type of the photographed target, and the processor in the electronic device can optionally perform: determine the orientation of the spatial enhancement according to the orientation of the photographed target relative to the microphone.
  • the processor in the electronic device can optionally perform: determining the gain control curve and the EQ frequency response curve according to the distance of the subject relative to the microphone and the type of target to which the subject belongs.
  • the processor in the electronic device may perform the audio processing method shown in FIG. 4 when starting to record audio and video. That is, after responding to the user operation of the first key in the scenario shown in FIG. 2, the processor in the electronic device executes the audio processing method shown in FIG. 4.
  • when the processor in the electronic device executes the audio processing method shown in FIG. 4 at the start of video recording, it can process the audio signal picked up by the sound pickup component in real time, and can automatically select the audio processing strategy in real time in recording and broadcasting scenarios, improving the convenience of audio processing strategy selection and the processing effect of the audio signal under different recording target types or different recording scenes.
  • the processor in the electronic device may perform the audio processing method shown in FIG. 4 after recording audio and video ends. That is, after responding to the user operation of the second key in the scenario shown in FIG. 2, the processor in the electronic device executes the audio processing method shown in FIG. 4.
  • when the processor in the electronic device executes the audio processing method shown in FIG. 4 at the end of video recording, it can reduce the occupation of the processor during audio and video recording, improve the smoothness of the audio and video recording process, improve the convenience of audio processing strategy selection, and improve the processing effect of the audio signal under different recording target types or different recording scenes.
  • the processor in the electronic device may also perform the audio processing method shown in FIG. 4 when the audio and video recording ends and the audio and video signals recorded are stored in the memory.
  • executing the audio processing method shown in FIG. 4 at this point can reduce the occupation of the processor during audio and video recording and improve the fluency of the recording process.
  • the audio processing method shown in FIG. 4 is performed on the audio and video signals only when they need to be saved, which avoids wasting processor resources on audio and video signals that are not saved.
  • FIG. 10 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2 , Mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, sensor module 180, key 190, motor 191, indicator 192, camera 193, display 194, and Subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, and ambient light Sensor 180L, bone conduction sensor 180M, etc.
  • the electronic device 100 may be a mobile phone, a tablet computer, an independent camera device, or another device including a camera and a microphone. It can be understood that the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 is configured to read the program code stored in the memory and execute the audio processing method provided in the embodiments of the present application, for example the audio processing method described in FIG. 4.
  • the processor 110 is configured to read the program code stored in the memory, perform image recognition on the first image acquired by the camera component, and obtain the target type of the photographed object in the first image and the azimuth and distance of the photographed object relative to the microphone.
  • the processor 110 is also configured to read the program code stored in the memory, determine the audio processing strategy according to the target type of the subject, the orientation of the subject relative to the microphone 170C, and the distance of the subject relative to the microphone 170C, and process the audio signal picked up by the microphone according to the audio processing strategy. Specifically, spatial enhancement, filtering, gain control, and equalizer frequency response control are performed on the audio signal picked up by the microphone 170C according to the audio processing strategy.
  • the processor 110 is further configured to read the program code stored in the memory and superimpose the audio signal picked up by the microphone 170C and the fourth audio signal to obtain a fifth audio signal; the fourth audio signal is the audio signal obtained after the audio signal picked up by the microphone 170C has undergone spatial enhancement, filtering, gain control, and equalizer frequency response control.
  • the fifth audio signal can be played using the speaker 170A, and the fifth audio signal can also be played through a wired headset connected to the headphone interface 170D.
  • the fifth audio signal may be an audio signal played synchronously when playing a video.
  • the camera component may include a camera 193.
  • the camera component may also include a video codec.
  • the sound pickup assembly may include a microphone 170C.
  • the sound pickup assembly may also include an audio module 170.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller can generate the operation control signal according to the instruction operation code and the timing signal to complete the control of fetching instructions and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory may store instructions or data that the processor 110 has just used or used cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated access, reduces the waiting time of the processor 110, and thereby improves system efficiency.
  • the processor 110 may include one or more interfaces.
  • the interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may include multiple sets of I2C buses.
  • the processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the electronic device 100.
  • the I2S interface can be used for audio communication.
  • the processor 110 may include multiple sets of I2S buses.
  • the processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, to realize the function of answering the phone call through the Bluetooth headset.
  • the PCM interface can also be used for audio communication, sampling, quantizing and encoding analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface to realize the function of answering the call through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus may be a bidirectional communication bus that converts the data to be transmitted between serial communication and parallel communication.
  • the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 to peripheral devices such as the display screen 194 and the camera 193.
  • the MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and so on.
  • the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate through the DSI interface to realize the display function of the electronic device 100.
  • the GPIO interface can be configured via software.
  • the GPIO interface can be configured as a control signal or a data signal.
  • the GPIO interface may be used to connect the processor 110 to the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
  • GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that conforms to the USB standard, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect headphones and play audio through the headphones.
  • the interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiments of the present invention is only a schematic description, and does not constitute a limitation on the structure of the electronic device 100.
  • the electronic device 100 may also use different interface connection methods in the foregoing embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
  • the charging management module 140 may receive wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and / or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor battery capacity, battery cycle times, battery health status (leakage, impedance) and other parameters.
  • the power management module 141 may also be disposed in the processor 110.
  • the power management module 141 and the charging management module 140 may also be set in the same device.
  • the wireless communication function of the electronic device 100 can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide a wireless communication solution including 2G / 3G / 4G / 5G and the like applied to the electronic device 100.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like.
  • the mobile communication module 150 can receive the electromagnetic wave from the antenna 1, filter and amplify the received electromagnetic wave, and transmit it to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into an electromagnetic wave for radiation through the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be transmitted into a high-frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110, and may be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), etc.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives the electromagnetic wave via the antenna 2, frequency-modulates and filters the electromagnetic wave signal, and sends the processed signal to the processor 110.
  • the wireless communication module 160 may also receive the signal to be transmitted from the processor 110, frequency-modulate it, amplify it, and convert it to electromagnetic waves through the antenna 2 to radiate it out.
  • the antenna 1 of the electronic device 100 and the mobile communication module 150 are coupled, and the antenna 2 and the wireless communication module 160 are coupled so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or the satellite-based augmentation systems (SBAS).
  • the electronic device 100 realizes a display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, connecting the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations, and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • the display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP processes the data fed back by the camera 193. For example, when taking a picture, the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • the ISP can also perform algorithm optimization for image noise, brightness, and skin color, and can also optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be set in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and projects it onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (charge coupled device, CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other image signals.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1. When N is greater than or equal to 2, the electronic device can use these N cameras to measure the distance between the subject and the camera.
  • the digital signal processor is used to process digital signals. In addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the energy at the frequency point.
  • Video codec is used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in various encoding formats, for example: moving picture experts group (MPEG) 1, MPEG-2, MPEG-3, MPEG-4, etc.
  • NPU is a neural-network (NN) computing processor.
  • the NPU can realize applications such as intelligent recognition of the electronic device 100, such as image recognition, face recognition, voice recognition, and text understanding.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area may store an operating system, an application program required by at least one function (for example, a sound playback function, an image playback function, etc.), and so on.
  • the storage data area may store data (such as audio data, phone book, etc.) created during use of the electronic device 100 and the like.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (UFS), and so on.
  • the electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, and an application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and also used to convert analog audio input into digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
  • the speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the user can listen to music, or listen to a hands-free call, through the speaker 170A of the electronic device 100.
  • the receiver 170B, also known as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be heard by bringing the receiver 170B close to the ear.
  • the microphone 170C, also known as a "mic" or "mouthpiece", is used to convert sound signals into electrical signals.
  • the user can make a sound with the mouth close to the microphone 170C, thereby inputting a sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C. In addition to collecting sound signals, it may also implement a noise reduction function. In other embodiments, the electronic device 100 may also be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the headset interface 170D is used to connect wired headsets.
  • the earphone interface 170D may be the USB interface 130, or a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • the capacitive pressure sensor may include at least two parallel plates with conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position based on the detection signal of the pressure sensor 180A.
  • touch operations that act on the same touch position but have different touch operation intensities may correspond to different operation instructions. For example, when a touch operation with a touch operation intensity less than the first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
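  A minimal sketch of this intensity-dependent dispatch is shown below; the threshold value and handler names are illustrative placeholders, not values from the patent.

```python
# Hypothetical sketch of intensity-dependent touch dispatch on the
# short-message application icon; units and names are assumptions.
FIRST_PRESSURE_THRESHOLD = 0.5  # normalized pressure units (assumed)

def dispatch_touch_on_sms_icon(pressure: float) -> str:
    """Map a touch on the short-message icon to an instruction."""
    if pressure < FIRST_PRESSURE_THRESHOLD:
        return "view_short_message"       # light press: view messages
    return "create_new_short_message"     # firm press: compose a message
```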
  • the gyro sensor 180B may be used to determine the movement posture of the electronic device 100.
  • in some embodiments, the angular velocities of the electronic device 100 around three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for shooting anti-shake.
  • the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance that the lens module needs to compensate based on the angle, and allows the lens to counteract the shaking of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude by using the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can detect the opening and closing of the flip holster using the magnetic sensor 180D.
  • the electronic device 100 may detect the opening and closing of the clamshell according to the magnetic sensor 180D, and then set features such as automatic unlocking upon flipping open according to the detected opening and closing state of the holster or clamshell.
  • the acceleration sensor 180E can detect the magnitude of acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and be used in applications such as horizontal and vertical screen switching and pedometers.
  • the distance sensor 180F is used to measure the distance.
  • the electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting scenes, the electronic device 100 may use the distance sensor 180F to measure distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light outward through the light emitting diode.
  • the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it may be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of ambient light.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, access to application locks, fingerprint taking pictures, fingerprint answering calls, and the like.
  • the temperature sensor 180J is used to detect the temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown caused by low temperature. In some other embodiments, when the temperature is below yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
  • the touch sensor 180K is also known as a "touch panel".
  • the touch sensor 180K may be provided on the display screen 194, and the touch sensor 180K and the display screen 194 constitute a touch screen, also called a "touchscreen".
  • the touch sensor 180K is used to detect a touch operation acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100, which is different from the location where the display screen 194 is located.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone block of the human vocal part.
  • the bone conduction sensor 180M can also contact the pulse of the human body and receive the blood pressure beating signal.
  • the bone conduction sensor 180M may also be provided in the earphone and combined into a bone conduction earphone.
  • the audio module 170 may parse out the voice signal based on the vibration signal of the vibrating bone block of the voice part acquired by the bone conduction sensor 180M to realize the voice function.
  • the application processor may analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M to implement the heart rate detection function.
  • the key 190 includes a power-on key, a volume key, and the like.
  • the key 190 may be a mechanical key. It can also be a touch button.
  • the electronic device 100 can receive key input and generate key signal input related to user settings and function control of the electronic device 100.
  • the motor 191 may generate a vibration prompt.
  • the motor 191 can be used for vibration notification of incoming calls and can also be used for touch vibration feedback.
  • touch operations applied to different applications may correspond to different vibration feedback effects.
  • for touch operations applied to different areas of the display screen 194, the motor 191 can also produce different vibration feedback effects.
  • different application scenarios (for example: time reminder, receiving a message, alarm clock, game, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate a charging state, a power change, and may also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be inserted into or removed from the SIM card interface 195 to achieve contact and separation with the electronic device 100.
  • the electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • the same SIM card interface 195 can insert multiple cards at the same time. The types of the multiple cards may be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 can also be compatible with external memory cards.
  • the electronic device 100 interacts with the network through a SIM card to realize functions such as call and data communication.
  • the electronic device 100 uses eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
  • An embodiment of the present application also provides an electronic device including a module or unit for implementing the audio processing method described in FIG. 4.
  • An embodiment of the present application further provides a chip system. The chip system includes at least one processor, a memory, and an interface circuit; the memory and the interface circuit are connected to the at least one processor, and the memory stores program instructions. When the program instructions are executed by the processor, the audio processing method described in FIG. 4 can be realized.
  • Embodiments of the present application also provide a computer-readable storage medium, in which program instructions are stored, and when the program instructions are executed by a processor, the audio processing method described in FIG. 4 is implemented.
  • all or part of the functions may be implemented by software, hardware, or a combination of software and hardware.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media.
  • the usable medium may be a magnetic medium (eg, floppy disk, hard disk, magnetic tape), optical medium (eg, DVD), or semiconductor medium (eg, solid state disk (SSD)), or the like.
  • the process may be completed by a computer program instructing relevant hardware.
  • the program may be stored in a computer-readable storage medium.
  • when the program is executed, the processes of the foregoing method embodiments may be included.
  • the foregoing storage media include various media that can store program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.


Abstract

Embodiments of the present application provide an audio processing method and an electronic device. The audio processing method includes: performing image recognition on a first image acquired by a camera component to obtain the target type of the photographed target in the first image, the orientation of the photographed target relative to a microphone, and the distance of the photographed target relative to the microphone; determining an audio processing strategy according to the target type of the photographed target, the orientation of the photographed target relative to the microphone, and the distance of the photographed target relative to the microphone; and processing the audio signal picked up by the microphone according to the audio processing strategy. Implementing the embodiments of the present application can improve the convenience of audio processing strategy selection.

Description

Audio processing method and electronic device
Technical Field
The present application relates to the field of electronic technology, and in particular to an audio processing method and an electronic device.
Background
Recording applications are among the most important multimedia audio and video experiences for users of electronic devices. Because recording scenes are complex and users record for diverse purposes, users have diverse requirements for the recording effect in different scenes. For example, in classroom and conference scenes, to improve the clarity of the recording, the speaker's voice needs to be enhanced while other noise interference is attenuated. In music recording occasions such as classical instrument performances, the fidelity of the recording is emphasized, avoiding the sound-quality damage caused by over-processing. In near-field vocal recording scenes such as selfie video and live streaming, far-field sound needs to be weakened to keep the near-field sound clean and clear.
To improve the user experience, in recent years more and more recording modes have appeared on electronic devices to adapt to different recording scenes and different recording purposes. In different modes, the electronic device processes the parameters of the received original audio signal differently. Parameter processing may include, for example, digital filtering, gain control, and equalizer (EQ) frequency response control.
For example, as shown in FIG. 1, the user can select among the different recording modes on the electronic device. The recording modes may include a "conference mode" for classroom and conference scenes, a "music mode" for music recording occasions, a "voice mode" for near-field recording scenes, an "interview mode" for talks and interviews, a "long-distance mode" for distant recorded targets, a "natural environment mode" for natural environments, and so on. The user can select different modes on the electronic device to adapt to different recording scenes and recording purposes. The user may select a mode by touch on a touchscreen, or remotely using a remote control device corresponding to the electronic device.
The growing number of recording modes increases the complexity of user operation, and the refinement of recording scenes makes the different scenes hard for users to understand, so scene-selection errors occur easily, which increases the complexity of determining the recording scene.
Summary
The technical solution of the present application discloses an audio processing method and an electronic device, which can improve the convenience of audio processing strategy selection.
In a first aspect, the technical solution of the present application provides an audio processing method. The method includes: performing image recognition on a first image acquired by a camera component to obtain the target type of the photographed target in the first image, the orientation of the photographed target relative to a microphone, and the distance of the photographed target relative to the microphone; determining an audio processing strategy according to the target type of the photographed target, the orientation of the photographed target relative to the microphone, and the distance of the photographed target relative to the microphone; and processing the audio signal picked up by the microphone according to the audio processing strategy.
In the above audio processing method, image recognition can be used to determine the audio processing strategy for processing the audio signal picked up by the microphone. This improves the convenience of audio processing strategy selection and improves the processing effect on the audio signal.
As a possible technical solution, determining the audio processing strategy according to the target type of the photographed target, the orientation of the photographed target relative to the microphone, and the distance of the photographed target relative to the microphone includes: determining the orientation for spatial enhancement according to the orientation of the photographed target relative to the microphone; determining a filter according to the target type of the photographed target; and determining a first gain control curve and a first equalizer frequency response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone. The audio processing strategy includes the orientation for spatial enhancement, the filter, the first gain control curve, and the first equalizer frequency response curve.
The technical solution of the present application may place no limitation on the order in which the processor in the electronic device determines the orientation for spatial enhancement, the filter, the first gain control curve, and the first equalizer frequency response curve.
As a possible technical solution, processing the audio signal picked up by the microphone according to the audio processing strategy includes: performing spatial enhancement, filtering, gain control, and equalizer frequency response control on the audio signal picked up by the microphone according to the audio processing strategy.
As a possible technical solution, performing spatial enhancement, filtering, gain control, and equalizer frequency response control on the audio signal picked up by the microphone according to the audio processing strategy includes: performing spatial enhancement on the original audio signal in the orientation for spatial enhancement to obtain a first audio signal, where the original audio signal is the audio signal picked up by the microphone; filtering the first audio signal with the filter to obtain a second audio signal; performing gain control on the second audio signal with the first gain control curve to obtain a third audio signal; and performing equalizer frequency response control on the third audio signal with the first equalizer frequency response curve to obtain a fourth audio signal.
The processor in the electronic device may also use image recognition to obtain the image scene as the recording scene. The processor may determine the first gain control curve and the first equalizer frequency response curve according to one or more of the following: the target type of the recorded target, the recording scene, and the distance between the recorded target and the microphone.
As a possible technical solution, determining the orientation for spatial enhancement according to the orientation of the photographed target relative to the microphone includes: determining the orientation of the photographed target relative to the microphone as the orientation for spatial enhancement of the audio signal picked up by the microphone. Determining the filter according to the target type of the photographed target includes: obtaining the filter from a first mapping table according to the target type of the photographed target, where the first mapping table contains multiple target types and the filter corresponding to each of the multiple target types, the multiple target types including the target type of the photographed target.
Determining the first gain control curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may be: obtaining the first gain control curve from a second mapping table according to the target type and the distance, where the second mapping table contains multiple target types, multiple distances, and the gain control curve jointly corresponding to target type i and distance j; target type i is any one of the multiple target types and distance j is any one of the multiple distances; the multiple target types include the target type of the photographed target, and the multiple distances include the distance of the photographed target relative to the microphone.
Alternatively, determining the first gain control curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may be: obtaining a second gain control curve from a third mapping table according to the target type of the photographed target, where the third mapping table contains multiple target types and the gain control curve corresponding to each of them, the multiple target types including the target type of the photographed target; and obtaining a first gain compensation curve from a fourth mapping table according to the distance of the photographed target relative to the microphone, where the fourth mapping table contains multiple distances and the gain compensation curve corresponding to each of them, the multiple distances including the distance of the photographed target relative to the microphone.
Determining the first EQ frequency response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may be: obtaining the first EQ frequency response curve from a fifth mapping table according to the target type and the distance, where the fifth mapping table contains multiple target types, multiple distances, and the EQ frequency response curve jointly corresponding to target type i and distance j; target type i is any one of the multiple target types and distance j is any one of the multiple distances; the multiple target types include the target type of the photographed target, and the multiple distances include the distance of the photographed target relative to the microphone.
Alternatively, determining the first EQ frequency response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may be: obtaining a second EQ frequency response curve from a sixth mapping table according to the target type of the photographed target, where the sixth mapping table contains multiple target types and the EQ frequency response curve corresponding to each of them, the multiple target types including the target type of the photographed target; and obtaining a first EQ frequency response compensation curve from a seventh mapping table according to the distance of the photographed target relative to the microphone, where the seventh mapping table contains multiple distances and the EQ frequency response compensation curve corresponding to each of them, the multiple distances including the distance of the photographed target relative to the microphone.
As a possible technical solution, performing image recognition on the first image acquired by the camera component to obtain the target type of the photographed target, the orientation of the photographed target relative to the microphone, and the distance of the photographed target relative to the microphone may be specifically implemented as: performing image recognition on the first image to obtain the image content of the photographed target, and obtaining the target type of the photographed target from an eighth mapping table according to the image content, where the eighth mapping table contains multiple image contents and the target type corresponding to each image content, the multiple image contents including the image content of the photographed target; obtaining the distance of the photographed target relative to the microphone from a ninth mapping table according to the image content of the photographed target and the size of the two-dimensional frame obtained by focusing on the photographed target in the first image, where the ninth mapping table contains multiple image contents, multiple two-dimensional frame sizes, and the distance jointly corresponding to image content k and two-dimensional frame size l; image content k is any one of the multiple image contents and two-dimensional frame size l is any one of the multiple two-dimensional frame sizes; the multiple image contents include the image content of the photographed target, and the multiple two-dimensional frame sizes include the size of the two-dimensional frame obtained by focusing on the photographed target; obtaining the coordinate points contained in the two-dimensional frame obtained by focusing on the photographed target; and obtaining the orientation of the points on the photographed target relative to the microphone from a tenth mapping table according to the coordinate points contained in the two-dimensional frame, where the tenth mapping table contains multiple coordinate points and the orientation corresponding to each coordinate point, the multiple coordinate points including the coordinate points contained in the two-dimensional frame.
The two-dimensional frame obtained by focusing on the photographed target may be obtained using the autofocus principle in the electronic device. Focusing on the photographed target may also be achieved in response to a manual focusing operation by the user; that is, the two-dimensional frame obtained by focusing on the photographed target may also be obtained in response to the user's manual focusing operation.
The distance of the photographed target relative to the microphone may also be determined by multi-camera ranging. For example, in a scene with two cameras, the distance from the photographed target to the cameras is determined by the formula Z = ft/d, where Z is the distance from the photographed target to the cameras, f is the focal length of the two cameras, d is the difference between the coordinate positions of the photographed target on the images of the two cameras, and t is the physical distance between the two cameras.
When the photographed target is far enough from the electronic device, the distance between the camera and the microphone can be ignored, so no coordinate system conversion is needed: the distance of the photographed target relative to the camera can be directly taken as the distance of the photographed target relative to the microphone, and the orientation of the photographed target relative to the camera can be taken as the orientation of the photographed target relative to the microphone.
When the distance between the camera and the microphone cannot be ignored, formula (2) can be used for coordinate system conversion to obtain the coordinates of the points on the photographed target in the three-dimensional coordinate system with the microphone as the origin, and then the distance and the orientation of the photographed target relative to the microphone.
[Formula (2) appears here as an image in the original (PCTCN2019110095-appb-000001); it expresses the coordinate conversion from the camera coordinate system to the microphone coordinate system.]
It can be understood that the distance between the photographed target and the microphone can also be obtained in other ways, for example by structured-light ranging. The technical solution of the present application does not limit the way in which the distance between the photographed target and the microphone is measured.
As a possible technical solution, after performing spatial enhancement, filtering, gain control, and EQ frequency response control on the audio signal picked up by the microphone according to the audio processing strategy, the method further includes: superimposing the audio signal picked up by the microphone and the fourth audio signal to obtain a fifth audio signal, where the fourth audio signal is the audio signal obtained after the audio signal picked up by the microphone has undergone spatial enhancement, filtering, gain control, and EQ frequency response control.
The fifth audio signal may be the audio signal used for audio output after processing is completed.
It can be understood that, in the process of processing the audio signal picked up by the microphone, the technical solution of the present application does not limit the order of spatial enhancement, filtering, gain control, and EQ frequency response control.
The processing may be executed in the order of spatial enhancement, filtering, gain control, and EQ frequency response control. Performing spatial enhancement and filtering first can increase the proportion of the audio signal coming from the photographed target in the processed audio signal and reduce the proportion of noise, thereby improving the processing effect on the audio signal.
As a possible technical solution, performing spatial enhancement, filtering, gain control, and EQ frequency response control on the audio signal picked up by the microphone according to the audio processing strategy may be specifically implemented as: determining the original audio signal of each of multiple sound channels according to the audio signal picked up by the microphone, and performing spatial enhancement, filtering, gain control, and EQ frequency response control on the original audio signal of each channel according to the audio processing strategy. The left and right channels perform spatial enhancement, filtering, gain control, and EQ frequency response control separately, and the audio signal processing and playback of the two channels do not affect each other, which can improve the stereo impression of the output audio signal.
As a possible technical solution, before processing the audio signal picked up by the microphone according to the audio processing strategy, the method further includes: displaying the audio processing strategy. Processing the audio signal picked up by the microphone according to the audio processing strategy then includes: in response to a user operation on the audio processing strategy, performing spatial enhancement, filtering, gain control, and EQ frequency response control on the audio signal picked up by the microphone according to the audio processing strategy. Having the user confirm the audio processing strategy automatically identified by the electronic device can improve the accuracy and convenience of audio processing strategy identification.
The processor in the electronic device may process the audio signal picked up by the microphone when starting to record audio and video. The audio signal picked up by the sound pickup component can be processed in real time; in a record-while-playing scene, the audio processing strategy can be automatically selected in real time, which improves the convenience of audio processing strategy selection and improves the processing effect on the audio signal under different target types of the photographed target or different recording scenes.
The processor in the electronic device may process the audio signal picked up by the microphone after audio and video recording ends. This can reduce processor occupation during recording, improve the smoothness of the recording process, improve the convenience of audio processing strategy selection, and improve the processing effect on the audio signal under different target types or different recording scenes.
The processor in the electronic device may also process the audio signal picked up by the microphone when recording ends and the recorded audio and video signals are stored in the memory. This can reduce processor occupation during recording and improve the smoothness of the recording process. In this way, the audio signal picked up by the microphone is processed only when the recorded audio and video signals need to be saved, which avoids wasting processor resources when the recorded signals do not need to be saved, thereby saving processor resources.
In a second aspect, the technical solution of the present application provides an audio processing method. The method includes: performing image recognition on a first image acquired by a camera component to obtain the target type of the photographed target in the first image; determining a filter according to the target type of the photographed target; and filtering the audio signal picked up by a microphone with the filter.
In the above audio processing method, image recognition can be used to determine the filter for processing the audio signal picked up by the microphone, which improves the convenience of audio processing strategy selection and the processing effect on the audio signal.
In a possible technical solution, before filtering the audio signal picked up by the microphone with the filter, the method further includes: obtaining the orientation of the photographed target relative to the microphone according to the image recognition; and performing spatial enhancement on the original audio signal in the orientation of the photographed target relative to the microphone to obtain a first audio signal, where the original audio signal is the audio signal picked up by the microphone. Filtering the audio signal picked up by the microphone with the filter may then be specifically implemented as: filtering the first audio signal with the filter to obtain a second audio signal. In the above audio processing method, image recognition can also be used to determine the direction of spatial enhancement, which can further improve the processing effect on the audio signal.
Performing spatial enhancement before filtering the audio signal can increase the proportion of the audio signal coming from the photographed target in the processed audio signal and reduce the proportion of noise, thereby improving the processing effect on the audio signal.
The technical solution of the present application does not limit the order in which the processor in the electronic device determines the orientation for spatial enhancement and the filter.
In a possible technical solution, the method further includes: obtaining the distance of the photographed target relative to the microphone according to the image recognition; and determining the first gain control curve and the first equalizer frequency response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone. After filtering the audio signal picked up by the microphone with the filter, the method further includes: performing gain control on the second audio signal with the first gain control curve to obtain a third audio signal, where the second audio signal is the audio signal obtained by filtering the audio signal picked up by the microphone with the filter; and performing equalizer frequency response control on the third audio signal with the first equalizer frequency response curve to obtain a fourth audio signal.
Performing spatial enhancement and filtering before gain control and EQ control can increase the proportion of the audio signal coming from the photographed target in the processed audio signal and reduce the proportion of noise, thereby improving the processing effect on the audio signal.
The technical solution of the present application does not limit the order in which the processor in the electronic device determines the orientation for spatial enhancement, the filter, the first gain control curve, and the first equalizer frequency response curve.
The processor in the electronic device may also use image recognition to obtain the image scene as the recording scene, and may determine the first gain control curve and the first equalizer frequency response curve according to one or more of the following: the target type of the recorded target, the recording scene, and the distance between the recorded target and the microphone.
In a possible technical solution, determining the filter according to the target type of the photographed target may be specifically implemented as: obtaining the filter from a first mapping table according to the target type of the photographed target, where the first mapping table contains multiple target types and the filter corresponding to each of them, the multiple target types including the target type of the photographed target.
In a possible technical solution, determining the first gain control curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may be: obtaining the first gain control curve from a second mapping table according to the target type and the distance, where the second mapping table contains multiple target types, multiple distances, and the gain control curve jointly corresponding to target type i and distance j; target type i is any one of the multiple target types and distance j is any one of the multiple distances; the multiple target types include the target type of the photographed target, and the multiple distances include the distance of the photographed target relative to the microphone.
In a possible technical solution, determining the first gain control curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may also be: obtaining a second gain control curve from a third mapping table according to the target type of the photographed target, where the third mapping table contains multiple target types and the gain control curve corresponding to each of them, the multiple target types including the target type of the photographed target; and obtaining a first gain compensation curve from a fourth mapping table according to the distance of the photographed target relative to the microphone, where the fourth mapping table contains multiple distances and the gain compensation curve corresponding to each of them, the multiple distances including the distance of the photographed target relative to the microphone.
In a possible technical solution, determining the first equalizer frequency response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may be: obtaining the first equalizer frequency response curve from a fifth mapping table according to the target type and the distance, where the fifth mapping table contains multiple target types, multiple distances, and the equalizer frequency response curve jointly corresponding to target type i and distance j; target type i is any one of the multiple target types and distance j is any one of the multiple distances; the multiple target types include the target type of the photographed target, and the multiple distances include the distance of the photographed target relative to the microphone.
In a possible technical solution, determining the first equalizer frequency response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone may also be: obtaining a second equalizer frequency response curve from a sixth mapping table according to the target type of the photographed target, where the sixth mapping table contains multiple target types and the equalizer frequency response curve corresponding to each of them, the multiple target types including the target type of the photographed target; and obtaining a first equalizer frequency response compensation curve from a seventh mapping table according to the distance of the photographed target relative to the microphone, where the seventh mapping table contains multiple distances and the equalizer frequency response compensation curve corresponding to each of them, the multiple distances including the distance of the photographed target relative to the microphone.
In a possible technical solution, performing image recognition on the first image acquired by the camera component to obtain the target type of the photographed target in the first image may be: performing image recognition on the first image to obtain the image content of the photographed target, and obtaining the target type of the photographed target from an eighth mapping table according to the image content of the photographed target, where the eighth mapping table contains multiple image contents and the target type corresponding to each image content, the multiple image contents including the image content of the photographed target.
In a possible technical solution, obtaining the orientation of the photographed target relative to the microphone according to the image recognition may be: obtaining the coordinate points contained in the two-dimensional frame obtained by focusing on the photographed target; and obtaining the orientation of the points on the photographed target relative to the microphone from a tenth mapping table according to the coordinate points contained in the two-dimensional frame, where the tenth mapping table contains multiple coordinate points and the orientation corresponding to each coordinate point, the multiple coordinate points including the coordinate points contained in the two-dimensional frame.
In a possible technical solution, obtaining the distance of the photographed target relative to the microphone according to the image recognition may be: obtaining the distance of the photographed target relative to the microphone from a ninth mapping table according to the image content of the photographed target and the size of the two-dimensional frame obtained by focusing on the photographed target in the first image, where the ninth mapping table contains multiple image contents, multiple two-dimensional frame sizes, and the distance jointly corresponding to image content k and two-dimensional frame size l; image content k is any one of the multiple image contents and two-dimensional frame size l is any one of the multiple two-dimensional frame sizes; the multiple image contents include the image content of the photographed target, and the multiple two-dimensional frame sizes include the size of the two-dimensional frame obtained by focusing on the photographed target.
The two-dimensional frame obtained by focusing on the photographed target may be obtained using the autofocus principle in the electronic device. Focusing on the photographed target may also be achieved in response to a manual focusing operation by the user; that is, the two-dimensional frame obtained by focusing on the photographed target may also be obtained in response to the user's manual focusing operation.
In a possible technical solution, the distance of the photographed target relative to the microphone may also be determined by multi-camera ranging. For example, in a scene with two cameras, the distance from the photographed target to the cameras is determined by the formula Z = ft/d, where Z is the distance from the photographed target to the cameras, f is the focal length of the two cameras, d is the difference between the coordinate positions of the photographed target on the images of the two cameras, and t is the physical distance between the two cameras.
When the photographed target is far enough from the electronic device, the distance between the camera and the microphone can be ignored, so no coordinate system conversion is needed: the distance of the photographed target relative to the camera can be directly taken as the distance of the photographed target relative to the microphone, and the orientation of the photographed target relative to the camera can be taken as the orientation of the photographed target relative to the microphone.
When the distance between the camera and the microphone cannot be ignored, formula (2) can be used for coordinate system conversion to obtain the coordinates of the points on the photographed target in the three-dimensional coordinate system with the microphone as the origin, and then the distance and the orientation of the photographed target relative to the microphone.
[Formula (2) appears here as an image in the original (PCTCN2019110095-appb-000002); it expresses the coordinate conversion from the camera coordinate system to the microphone coordinate system.]
The distance between the photographed target and the microphone can also be obtained in other ways, for example by structured-light ranging. The technical solution of the present application does not limit the way in which the distance between the photographed target and the microphone is measured.
In a possible technical solution, after performing equalizer frequency response control on the third audio signal with the first equalizer frequency response curve to obtain the fourth audio signal, the method further includes: superimposing the original audio signal and the fourth audio signal to obtain a fifth audio signal, where the original audio signal is the audio signal picked up by the microphone.
The fifth audio signal may be the audio signal used for audio output after processing is completed.
In a possible technical solution, filtering the audio signal picked up by the microphone with the filter may be specifically implemented as: determining the original audio signal of each of multiple sound channels according to the audio signal picked up by the microphone, and processing the original audio signal of each channel, the processing including filtering with the filter. The left and right channels perform spatial enhancement, filtering, gain control, and EQ frequency response control separately, and the audio signal processing and playback of the two channels do not affect each other, which can improve the stereo impression of the output audio signal.
In a possible technical solution, before filtering the audio signal picked up by the microphone with the filter, the method further includes: displaying the audio processing strategy, the audio processing strategy including the filter. Filtering the audio signal picked up by the microphone with the filter may then be specifically implemented as: in response to a user operation on the audio processing strategy, filtering the audio signal picked up by the microphone with the filter. Having the user confirm the audio processing strategy automatically identified by the electronic device can improve the accuracy and convenience of audio processing strategy identification.
As a possible technical solution, the processor in the electronic device may process the audio signal picked up by the microphone when starting to record audio and video. The audio signal picked up by the sound pickup component can be processed in real time; in a record-while-playing scene, the audio processing strategy can be automatically selected in real time, which improves the convenience of audio processing strategy selection and improves the processing effect on the audio signal under different target types of the photographed target or different recording scenes.
In a possible technical solution, the processor in the electronic device may process the audio signal picked up by the microphone after audio and video recording ends. This can reduce processor occupation during recording, improve the smoothness of the recording process, improve the convenience of audio processing strategy selection, and improve the processing effect on the audio signal under different target types or different recording scenes.
As a possible technical solution, the processor in the electronic device may also process the audio signal picked up by the microphone when recording ends and the recorded audio and video signals are stored in the memory. This can reduce processor occupation during recording and improve the smoothness of the recording process. In this way, the audio signal picked up by the microphone is processed only when the recorded audio and video signals need to be saved, which avoids wasting processor resources when the recorded signals do not need to be saved, thereby saving processor resources.
In a third aspect, the technical solution of the present application provides an audio processing method. The method includes: performing image recognition on a first image acquired by a camera component to obtain the target type of the photographed target in the first image and the distance of the photographed target relative to the microphone; determining an audio processing strategy according to the target type of the photographed target and the distance of the photographed target relative to the microphone; and processing the audio signal picked up by the microphone according to the audio processing strategy.
In the above audio processing method, image recognition can be used to determine the audio processing strategy for processing the audio signal picked up by the microphone, which improves the convenience of audio processing strategy selection and the processing effect on the audio signal.
As a possible technical solution, the target type of the photographed target may include a speech type and a non-speech type.
In a possible technical solution, when image recognition finds that the first image contains a "person" as the photographed target, the target type of the photographed target may be determined to be the speech type. When image recognition finds that the first image does not contain a "person" as the photographed target, the target type of the photographed target may be determined to be the non-speech type.
As a possible technical solution, determining the audio processing strategy according to the target type of the photographed target and the distance of the photographed target relative to the microphone includes: determining a filter according to the target type of the photographed target; and determining a first gain control curve and a first equalizer frequency response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone. The audio processing strategy includes the filter, the first gain control curve, and the first equalizer frequency response curve.
It can be understood that the technical solution of the present application does not limit the order in which the processor in the electronic device determines the filter, the first gain control curve, and the first equalizer frequency response curve.
As a possible technical solution, processing the audio signal picked up by the microphone according to the audio processing strategy includes: performing filtering, gain control, and equalizer frequency response control on the audio signal picked up by the microphone according to the audio processing strategy.
As a possible technical solution, before filtering the audio signal picked up by the microphone with the filter, the method further includes: obtaining the orientation of the photographed target relative to the microphone according to the image recognition; and performing spatial enhancement on the original audio signal in the orientation of the photographed target relative to the microphone to obtain a first audio signal, where the original audio signal is the audio signal picked up by the microphone. Filtering the audio signal picked up by the microphone with the filter may then be specifically implemented as: filtering the first audio signal with the filter to obtain a second audio signal. In the above audio processing method, image recognition can also be used to determine the direction of spatial enhancement, which can further improve the processing effect on the audio signal.
As a possible technical solution, performing spatial enhancement, filtering, gain control, and equalizer frequency response control on the audio signal picked up by the microphone according to the audio processing strategy includes: performing spatial enhancement on the original audio signal in the orientation for spatial enhancement to obtain a first audio signal, where the original audio signal is the audio signal picked up by the microphone; filtering the first audio signal with the filter to obtain a second audio signal; performing gain control on the second audio signal with the first gain control curve to obtain a third audio signal; and performing equalizer frequency response control on the third audio signal with the first equalizer frequency response curve to obtain a fourth audio signal.
After performing spatial enhancement, filtering, gain control, and EQ frequency response control on the audio signal picked up by the microphone according to the audio processing strategy, the method further includes: superimposing the audio signal picked up by the microphone and the fourth audio signal to obtain a fifth audio signal, where the fourth audio signal is the audio signal obtained after the audio signal picked up by the microphone has undergone spatial enhancement, filtering, gain control, and EQ frequency response control.
The fifth audio signal may be the audio signal used for audio output after processing is completed.
As a possible technical solution, in the process of processing the audio signal picked up by the microphone, the technical solution of the present application does not limit the order of spatial enhancement, filtering, gain control, and EQ frequency response control.
As a possible technical solution, the processing may be executed in the order of spatial enhancement, filtering, gain control, and EQ frequency response control. Performing spatial enhancement and filtering first can increase the proportion of the audio signal coming from the photographed target in the processed audio signal and reduce the proportion of noise, thereby improving the processing effect on the audio signal.
As a possible technical solution, performing spatial enhancement, filtering, gain control, and EQ frequency response control on the audio signal picked up by the microphone according to the audio processing strategy may be specifically implemented as: determining the original audio signal of each of multiple sound channels according to the audio signal picked up by the microphone, and performing spatial enhancement, filtering, gain control, and EQ frequency response control on the original audio signal of each channel according to the audio processing strategy. The left and right channels perform these steps separately, and the audio signal processing and playback of the two channels do not affect each other, which can improve the stereo impression of the output audio signal.
As a possible technical solution, before processing the audio signal picked up by the microphone according to the audio processing strategy, the method further includes: displaying the audio processing strategy. Processing the audio signal picked up by the microphone according to the audio processing strategy then includes: in response to a user operation on the audio processing strategy, performing spatial enhancement, filtering, gain control, and EQ frequency response control on the audio signal picked up by the microphone according to the audio processing strategy. Having the user confirm the audio processing strategy automatically identified by the electronic device can improve the accuracy and convenience of audio processing strategy identification.
As a possible technical solution, the processor in the electronic device may process the audio signal picked up by the microphone when starting to record audio and video. The audio signal picked up by the sound pickup component can be processed in real time; in a record-while-playing scene, the audio processing strategy can be automatically selected in real time, which improves the convenience of audio processing strategy selection and improves the processing effect on the audio signal under different target types of the photographed target or different recording scenes.
As a possible technical solution, the processor in the electronic device may process the audio signal picked up by the microphone after audio and video recording ends. This can reduce processor occupation during recording, improve the smoothness of the recording process, improve the convenience of audio processing strategy selection, and improve the processing effect on the audio signal under different target types or different recording scenes.
As a possible technical solution, the processor in the electronic device may also process the audio signal picked up by the microphone when recording ends and the recorded audio and video signals are stored in the memory. This can reduce processor occupation during recording and improve the smoothness of the recording process. In this way, the audio signal picked up by the microphone is processed only when the recorded audio and video signals need to be saved, which avoids wasting processor resources when the recorded signals do not need to be saved, thereby saving processor resources.
In a fourth aspect, the technical solution of the present application provides an electronic device including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors and store computer program code comprising computer instructions. When the one or more processors execute the computer instructions, the electronic device performs the method provided by any possible technical solution of the first, second, or third aspect.
In a fifth aspect, the technical solution of the present application provides an electronic device including modules or units for performing the method provided by any possible technical solution of the first, second, or third aspect.
In a sixth aspect, the technical solution of the present application provides a chip system. The chip system includes at least one processor, a memory, and an interface circuit; the memory and the interface circuit are connected to the at least one processor, and the memory stores program instructions. When the program instructions are executed by the processor, the method provided by any possible technical solution of the first, second, or third aspect is implemented.
In a seventh aspect, the technical solution of the present application provides a computer-readable storage medium storing program instructions. When the program instructions are run by a processor, the method provided by any possible technical solution of the first, second, or third aspect is implemented.
In an eighth aspect, the technical solution of the present application provides a computer program product. When the computer program product runs on a processor, the method provided by any possible technical solution of the first, second, or third aspect is implemented.
In the above technical solutions, the camera component may be at least one camera: for example, the electronic device includes one camera, two cameras, three cameras, four cameras, and so on, and in these options the cameras may be located on the same side of the electronic device, for example on the rear side.
Brief Description of the Drawings
The drawings used in the embodiments of the present application are introduced below.
FIG. 1 is a schematic diagram of recording mode selection on an electronic device provided by the prior art;
FIG. 2 is a schematic diagram of an audio/video shooting scene provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of an implementation of spatial enhancement provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of an audio processing method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of recognizing a photographed target provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of determining the orientation between the photographed target and the microphone by coordinate system conversion provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the principle of determining the orientation of the photographed target relative to the microphone provided by an embodiment of the present application;
FIG. 8 is an example of an audio processing strategy user interaction interface provided by an embodiment of the present application;
FIG. 9 is an example of another audio processing strategy user interaction interface provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The embodiments of the present application are described below with reference to the drawings. The terms used in the implementation part are only used to explain the specific embodiments of the present application and are not intended to limit the present application.
The application scenes involved in the embodiments are introduced first. In electronic devices such as mobile phones, tablet computers, camera devices, or other devices containing a camera and a microphone, the processor in the electronic device can call the camera to shoot a video clip of a target and call the microphone to collect the audio of the target during recording.
Please refer to FIG. 2, a schematic diagram of an audio/video shooting scene provided by an embodiment of the present application. As shown in FIG. 2, the camera in the electronic device is associated with the "Camera" icon displayed on the display screen, for example a camera app. In response to a user operation on the icon, such as a touch selection, the processor in the electronic device can start the camera to capture images. In the "Video" mode of the camera, the camera can be configured for shooting video. In response to a user operation on a first key, for example the user touching the first key, the processor can collect the video captured by the camera and the audio captured by the microphone from the moment the touch operation is detected on the first key. After the processor detects the touch operation, a first display control can be displayed on the display screen; the first display control times the duration of video recording since the first key was touched. When a user operation on a second key, such as a touch operation, is detected, the electronic device stops collecting the video captured by the camera and the audio captured by the microphone, and the first display control stops timing. It can be understood that the first key and the second key may be the same key or different keys.
In the above process the electronic device completes one audio/video shooting, and the playback duration of the resulting clip is the difference between the moment of the user operation on the second key and the moment of the user operation on the first key. In the clip, the video part contains the pictures continuously captured by the camera from the user operation on the first key to the user operation on the second key, and the audio part contains the audio continuously captured by the microphone during the same period. As shown in FIG. 2, when the user operates the first key, the first display control shows 00:00; it keeps timing until the user operates the second key, at which point it stops and shows a duration of 02:15. The playback duration of the clip obtained after the user operates the second key (for example, taps it) may then be 02:15.
Optionally, during the above shooting the electronic device may call one or more cameras to pick up the video signal. Where multiple cameras are used, all of them are associated with the "Camera" icon displayed on the display screen, and the video signals they pick up can be obtained by the processor.
It can be understood that the example application scene of FIG. 2 is only used to explain the embodiments of the present application and should not constitute a limitation. The embodiments are equally applicable to other scenes in which a camera picks up a video signal and a microphone picks up an audio signal.
In the shooting process described in FIG. 2, audio collection relies on the microphone. To adapt to different recording scenes and recording purposes, the processor in the electronic device performs different parameter processing on the original audio signal collected by the microphone. Parameter processing may include, for example, digital filtering, spatial enhancement, gain control, and EQ frequency response control.
To improve the convenience of recording scene selection and the accuracy of parameter processing in different recording scenes, an embodiment of the present application provides an audio processing method. The method can be applied in scenes where an electronic device is used for audio/video shooting, for example the shooting scene described in FIG. 2.
In this audio processing method, the processor in the electronic device can focus on the photographed target in the image picked up by the camera and recognize the photographed target in the image. Image recognition is used to determine the target type of the photographed target, its orientation relative to the microphone, and its distance relative to the microphone. According to the orientation of the photographed target relative to the microphone, spatial enhancement is performed on the audio signal picked up by the sound pickup component, so that the audio intensity in that orientation is increased and the audio intensity in other directions is weakened. According to the target type of the photographed target, the filter corresponding to that target type is determined; the filter can remove the noise signal from the audio signal of the photographed target. According to the distance of the photographed target relative to the microphone and the target type, the gain control curve and the EQ frequency response curve are determined, gain control and EQ frequency response processing are performed on the audio signal of the photographed target according to these curves, and the processed audio signal is output. To increase the sense of space of the sound, the original audio signal picked up by the microphone component is superimposed on the processed audio signal.
In the above audio processing method, using image recognition to obtain the type of the photographed target, its orientation relative to the microphone, and its distance relative to the microphone can improve the accuracy of scene recognition and target recognition in the recording scene. Determining the audio processing strategy from the type, orientation, and distance can then filter out interference signals in the audio signal and improve the processing effect. The audio processing strategy may include the selected filter, gain control curve, and EQ frequency response curve. A good processing effect means the result, considering the target type of the audio source and the recording scene, matches the user's listening habits for that type and recording environment.
To help understand the embodiments of the present application, the concepts involved are introduced below.
(1) Filter
A filter can be used for frequency-domain noise reduction, retaining the audio signal from the recorded target and filtering out the audio signals from targets other than the recorded target. The filter involved in the embodiments of the present application may be a digital filter implemented by an algorithm model called by the processor.
Different types of audio signals have different frequency-domain distribution probability features, and the audio signals picked up by the microphone in different recording scenes also have different frequency-domain distribution probability features. The frequency-domain distribution probability features of different audio types can be summarized as prior information to guide the gain estimation of each audio type at each frequency point, and the features of different recording scenes can likewise be summarized as prior information to guide the gain estimation of the audio signal at each frequency point in each recording scene. Different audio types may include, for example: speech, birdsong, running water, piano music, and music. Specifically, the frequency range of human speech is 85 Hz to 8 kHz, and an important feature of a speech signal is the pitch period, which is the time interval between two adjacent openings and closings of the glottis, or the frequency of opening and closing. Different recording scenes may include: a conference scene, a karaoke scene, a long-distance scene, and so on. For example, the speech frequency range and gain requirements in a conference scene and a karaoke scene differ greatly.
Each target type of audio signal may correspond to one filter, and each recording scene may also correspond to one filter. The target type of the audio signal is the target type of the audio coming from the recorded target, that is, the target type of the recorded target in the preceding and following text. In the electronic device, one target type of audio signal and one recording scene may also jointly correspond to one filter. The filter corresponding to a recording scene may be an algorithm model implemented by the processor, and the filter may be determined through machine learning. For the speech type, for example, a clean speech signal is used as the supervision signal, and the filter parameters are iteratively optimized until the filter's output for a mixed audio signal approaches the supervision signal and converges, thereby generating a frequency-domain noise reduction filter for speech-type target signals. The mixed audio signal contains the speech signal and other types of audio signals; with the trained filter corresponding to the speech signal, the processor can filter out the other types of audio signals in the mixed audio signal and keep only the speech signal. In this mixed audio signal, the other types of audio signals are noise signals relative to the speech signal. In the machine learning process, the mixed audio signal used for training can be obtained by superimposing a noise signal on a clean speech signal; the clean speech signal serves as the supervision signal and the mixed audio signal serves as the filter input for iterative parameter optimization.
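A minimal sketch of such a frequency-domain noise-reduction filter is shown below. In practice the per-frequency gains would come from the trained, type-specific model described above; here a fixed prior gain curve stands in for it, and all names, the sampling rate, and the band limits are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def stft(x, frame=512, hop=256):
    """Naive STFT: windowed FFT frames of signal x (sketch, no COLA normalization)."""
    w = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    return np.stack([np.fft.rfft(w * x[i*hop:i*hop+frame]) for i in range(n)])

def istft(frames, frame=512, hop=256):
    """Overlap-add inverse of the naive STFT above."""
    out = np.zeros(hop * (len(frames) - 1) + frame)
    w = np.hanning(frame)
    for i, f in enumerate(frames):
        out[i*hop:i*hop+frame] += w * np.fft.irfft(f, frame)
    return out

def denoise(x, prior_gain):
    """Apply a per-frequency gain curve (the 'filter' for one target type)."""
    return istft(stft(x) * prior_gain[None, :])

# prior_gain models the trained frequency-domain filter for, e.g., speech;
# here: a crude band-pass prior keeping roughly 85 Hz - 8 kHz at fs = 48 kHz.
fs = 48000
freqs = np.fft.rfftfreq(512, 1 / fs)
prior_gain = ((freqs > 85) & (freqs < 8000)).astype(float)
noisy = np.random.randn(fs)            # stand-in for a mixed audio signal
clean_estimate = denoise(noisy, prior_gain)
```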
(2) Spatial enhancement
Spatial enhancement can strengthen the audio signal in a specific orientation and weaken the audio signals in directions other than that specific direction. The specific direction may be the orientation of the recorded target relative to the microphone.
Because the recorded target can be in different orientations relative to the microphone, the processor in the electronic device can process the original audio signal received by the microphone, or adjust the directivity of the microphone, so that the audio intensity of the collected audio signal is enhanced in the orientation of the target and weakened in the other orientations; that is, spatial enhancement is performed on the audio of the recorded target. The orientation for spatial enhancement may include a direction center and an angle range: the direction center represents the center position of the orientation, and the angle range represents the angular region covered by the orientation. Depending on whether the microphone's directivity is variable, spatial enhancement can be implemented in two ways: (a) when the microphone's directivity is variable, the processor can steer the microphone toward the direction center of the target; (b) when the microphone's directivity is fixed, the processor can use an algorithm to enhance the audio intensity in the orientation of the target. These are described separately below.
(a) When the microphone's directivity is variable, steer the microphone toward the direction center of the target
The intensity of the audio signal captured by the microphone is related to the orientation of the recorded target relative to the microphone. Please refer to FIG. 3, a schematic diagram of an implementation of spatial enhancement provided by an embodiment of the present application. As shown in FIG. 3, when the microphone points directly at the recorded target, the microphone's direction can coincide with the direction center of the target; the intensity of the audio signal collected from the recorded target is then strongest, and the intensity of noise signals from outside the recorded target is weakest. The direction of the microphone may refer to the direction in which the microphone captures audio signals. In the orientation of the target, the direction center and the angle range may be relative to the microphone.
When the electronic device contains multiple microphones with variable directivity, to achieve spatial enhancement the processor can steer each microphone to the direction center of the target relative to that microphone, so that each of the multiple microphones performs spatial enhancement on the audio of the recorded target.
(b) When the microphone's directivity is fixed, use an algorithm to enhance the audio intensity in the orientation of the target
An audio signal propagating in space is a signal produced by vibration. Because the distances between the multiple microphones in the electronic device and the sound source differ, the audio signals received by the multiple microphones from the sound source at the same moment differ. Specifically, the multiple microphones capture the sound source and obtain multiple audio signals, and because of the time delays, the phases of these audio signals differ. When these audio signals are superimposed, signals with the same phase reinforce each other, and signals with opposite phases cancel each other.
Using this principle, for audio signals from non-target directions, the processor can apply delay compensation or phase compensation to the multiple audio signals picked up by the multiple microphones so that they cancel when superimposed, thereby weakening the intensity of audio signals from non-target directions. Non-target directions are directions other than the orientation of the target. For audio signals from the orientation of the target, the processor can apply delay compensation or phase compensation so that the signals reinforce each other when superimposed, thereby enhancing the intensity of the audio signal from the target orientation. The above algorithm principle enables the multiple microphones to perform spatial enhancement on the audio of the recorded target.
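A minimal delay-and-sum sketch of approach (b) follows, assuming the per-microphone arrival delay toward the target is already known; the sampling rate and function names are illustrative, not the patent's implementation. For a far-field source and a linear array, the delay for microphone m would be roughly m times the spacing times cos(azimuth) divided by the speed of sound.

```python
import numpy as np

def delay_and_sum(mic_signals, delays_s, fs=48000):
    """Time-align each microphone channel toward the target and average.

    mic_signals: array (n_mics, n_samples); delays_s: per-mic arrival delay
    in seconds for the target direction. Fractional delays are compensated
    as FFT phase shifts, so target components add in phase while off-axis
    sound adds incoherently and is weakened.
    """
    n_mics, n = mic_signals.shape
    freqs = np.fft.rfftfreq(n, 1 / fs)
    out = np.zeros(n)
    for m in range(n_mics):
        spec = np.fft.rfft(mic_signals[m])
        # advance the channel by its arrival delay so the target aligns
        spec *= np.exp(2j * np.pi * freqs * delays_s[m])
        out += np.fft.irfft(spec, n)
    return out / n_mics
```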
It can be understood that the implementation of spatial enhancement involved in the embodiments of the present application may include any one or more of (a) and (b) above.
(3) Gain control and gain control curve
Gain control refers to adjusting the strength of the audio signal picked up by the microphone. Gain control can adjust the amplification factor for signals of various amplitudes, and the gains for different signal amplitudes can differ. Gain control is related to one or more of the following factors: the target type of the recorded target, the recording scene, and the distance between the recorded target and the microphone. These factors are introduced below.
In the embodiments of the present application, the target type of the recorded target is the target type of the photographed target obtained by image recognition in the preceding and following text, and the recorded target is the photographed target obtained by image recognition.
(a) Gain control and the target type of the recorded target
For audio signals of different target types, users have different requirements for the recorded signal strength at different input sound pressure levels, and the corresponding gains differ. For classical music, for example, the user wants to preserve the fidelity of the original recorded signal picked up by the microphone and avoid dynamic compression distortion caused by excessive gain adjustment. Therefore, for the classical music recording type, the recorded audio signals at the various sound pressure levels need not all be amplified to a fixed signal strength; the amplification factors at the various sound pressure levels can be equal, preserving the dynamic range of the classical music recording. For pop music, the user cares more that the recorded signal strength is large enough, so for the pop music recording type the processor can amplify the recorded audio signals at the various sound pressure levels to a fixed signal strength.
The strength of an audio signal can represent its vibration amplitude. The sound pressure level (SPL) measures the effective sound pressure relative to a reference value on a logarithmic scale, described in decibels (dB). The human hearing threshold for a 1 kHz sound (the lowest sound pressure that produces hearing) is 20 μPa, which is usually used as the reference value of the sound pressure level.
Each sound source type can correspond to one gain control curve. The horizontal axis of the gain control curve can be the amplitude of the input audio signal and the vertical axis can be the gain. The gain control curve can be set for the corresponding target type so that, after gain control, the audio signal of that target type matches the user's listening habits. For example, for a speech signal, the corresponding gain control curve can keep the output signal strength constant after gain control: when the speech signal picked up by the microphone is too strong, the gain can be reduced according to the speech gain control curve so that the output speech signal is not too strong; when it is too weak, the gain can be increased so that the output speech signal is not too weak.
(b) Gain control and the recording scene
In different recording scenes, users also have different requirements for the signal strength at different frequency points, and the corresponding gains differ. In selfie video, interview, and karaoke live recording scenes, the user wants to weaken far-field audio signals and keep near-field audio signals clear. In these scenes, a far-field audio signal picked up by the microphone is a small signal, so the gain for small signals can be reduced and the gain for ordinary signals increased.
Each recording scene can also correspond to one gain control curve, with the amplitude of the picked-up signal on the horizontal axis and the gain on the vertical axis. The gain control curve can be set for the corresponding recording scene so that the audio signal matches the user's listening habits after gain control. For example, for the same speech signal, the gain control curves for a karaoke recording scene and a far-field recording scene are completely different. In the karaoke scene, after gain control, the output strength for signals other than small signals is constant and small signals are suppressed, that is, the gain of small signals is reduced. In the far-field scene, after gain control, small signals collected by the microphone are amplified, that is, the gain of small signals is increased. A small signal may be the sum of signals whose amplitude is smaller than a preset amplitude.
In the embodiments of the present application, the electronic device may store a mapping relationship in which one target type of the recorded target and one recording scene jointly map to one gain control curve. The electronic device may also store a mapping in which one target type maps to one gain control curve, or a mapping in which one recording scene maps to one gain control curve.
Optionally, the electronic device may also store the following mapping relationship: target type A of the recorded target, recording scene B, and distance C between the recorded target and the microphone map to one gain control curve, where target type A is any target type, recording scene B is any recording scene, and distance C is any distance gradient.
(c) Gain control and the distance between the recorded target and the microphone
The farther the distance, the greater the attenuation of the audio signal, so the processor in the electronic device can set the gain proportional to the distance of the sound source target.
Each distance gradient can correspond to one gain compensation curve. The gain compensation curve can be superimposed on the gain control curve to compensate for the influence of the distance between the recorded target and the microphone on the gain, jointly completing the gain control. The distance gradients may include, for example: far, relatively far, medium, relatively near, and near.
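A sketch of applying a gain control curve with a distance-gradient compensation offset is shown below. The curve points and compensation values are illustrative placeholders, not the patent's stored tables; the flat-output shape mimics the speech curve described above.

```python
import numpy as np

# Illustrative gain control curve for the "speech" target type:
# input level (dBFS) -> gain (dB); quiet input is boosted, loud input cut.
SPEECH_CURVE_IN_DB = [-60.0, -40.0, -20.0, 0.0]
SPEECH_CURVE_GAIN_DB = [20.0, 10.0, 0.0, -10.0]

# Illustrative distance-gradient gain compensation added on top of the curve.
DISTANCE_COMP_DB = {"near": 0.0, "mid": 3.0, "far": 9.0}

def apply_gain_control(frame, distance_gradient="mid"):
    """Gain-control one audio frame (float samples in [-1, 1])."""
    rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
    level_db = 20 * np.log10(rms)
    gain_db = np.interp(level_db, SPEECH_CURVE_IN_DB, SPEECH_CURVE_GAIN_DB)
    gain_db += DISTANCE_COMP_DB[distance_gradient]   # distance compensation
    return frame * 10 ** (gain_db / 20)
```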
(4) EQ frequency response and EQ frequency response curve
Adjusting the EQ frequency response can compensate for defects of the speaker and the sound field and accurately restore the originally recorded audio signal.
The EQ frequency response can adjust the amplification factor of the various frequency components of the audio signal.
The EQ frequency response requirements differ for different target types of the recorded target, different recording scenes, and different distances between the recorded target and the microphone. Therefore the EQ frequency response is also related to one or more of the following factors: the target type of the recorded target, the recording scene, and the distance between the recorded target and the microphone. These factors are introduced below.
(a) EQ frequency response and the type of the recorded target
For audio signals of different target types, users have different gain requirements for the different frequency components. For example, for the speech target type, the gain of the 5 kHz component can be raised to improve the clarity of the speech signal, and the gains of the 1.8 kHz and 2.5 kHz components can be lowered to soften and purify it. For the piano music target type, the audio signal is mostly concentrated in the mid-frequency region, such as 3 kHz or 4 kHz; slightly raising the gain around the 8 kHz component can make the high keys sound brighter.
For each target type, the electronic device can store one corresponding EQ frequency response curve, with the frequency of the input audio signal on the horizontal axis and the gain on the vertical axis. The curve can be set for the corresponding target type so that, after EQ frequency response control, the audio signal of that type matches the user's listening habits.
(b) EQ frequency response and the recording scene
EQ frequency response adjustment can adjust the timbre of the audio signal. For audio signals in different recording scenes, users also have different gain requirements for the different frequency components. For example, in a karaoke scene, the vocal signal can be highlighted by raising the gain of the mid-frequency components, which may include, for example, 1 to 4 kHz. In a conference scene, where the sound should be as full as possible, as many low-frequency components as possible can be kept, that is, the gain of the low-frequency components is raised. If a loud, resonant sound is needed, the gains of the 60 Hz and 120 Hz components can be raised, together with the gain of the high-frequency components around 7 kHz.
For each recording scene, the electronic device can store one corresponding EQ frequency response curve, with the frequency of the input audio signal on the horizontal axis and the gain on the vertical axis, set so that the audio signal of that scene matches the user's listening habits after EQ frequency response control.
(c) EQ frequency response and the distance between the recorded target and the microphone
Because high-frequency signals attenuate with distance faster than low-frequency signals, at an equal distance from the microphone the gain of high-frequency signals is larger than the gain of low-frequency signals.
The EQ frequency response can also correspond to an EQ frequency response curve, with the frequency of the audio signal on the horizontal axis and the gain on the vertical axis. Each target type can correspond to one EQ frequency response curve, and each recording scene can also correspond to one. Each distance gradient can correspond to one EQ frequency response compensation curve, which can be superimposed on the EQ frequency response curve as the final curve used to adjust the audio signal.
Optionally, the electronic device may also store the following mapping relationship: target type A of the recorded target, recording scene B, and distance C between the recorded target and the microphone map to one EQ frequency response curve, where target type A is any target type, recording scene B is any recording scene, and distance C is any distance gradient.
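A sketch of EQ frequency response control as a frequency-dependent gain applied per FFT bin, with a distance compensation term added in dB, is shown below; all curve values are illustrative, and the speech curve loosely follows the examples given above (lift around 5 kHz, dip around 1.8 kHz and 2.5 kHz).

```python
import numpy as np

def apply_eq(frame, fs, eq_points_hz, eq_gain_db, comp_gain_db=0.0):
    """Apply an EQ frequency response curve to one audio frame.

    eq_points_hz/eq_gain_db define the curve by breakpoints; comp_gain_db
    models a distance-based compensation added on top of the curve.
    """
    n = len(frame)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    gain_db = np.interp(freqs, eq_points_hz, eq_gain_db) + comp_gain_db
    spec = np.fft.rfft(frame) * 10 ** (gain_db / 20)
    return np.fft.irfft(spec, n)

# Illustrative speech EQ curve derived from the examples in the text.
speech_points_hz = [0, 1800, 2500, 5000, 24000]
speech_gains_db = [0.0, -3.0, -3.0, 4.0, 0.0]
out = apply_eq(np.random.randn(1024), 48000, speech_points_hz, speech_gains_db)
```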
(5) Image recognition
The process of image recognition technology may include: information acquisition, preprocessing, feature extraction and selection, classifier design, and classification decision. These are introduced below.
Information acquisition refers to converting light information into electrical information through a sensor, that is, obtaining the basic information of the research object and converting it by some method into information that the machine can recognize.
Preprocessing mainly refers to operations such as denoising, smoothing, and transformation in image processing, which strengthen the important features of the image.
Feature extraction and selection means that, in pattern recognition, features need to be extracted and selected. Since different images need to be classified, they can be distinguished by the features they possess; the process of obtaining these features is feature extraction. The features obtained in feature extraction may not all be useful for a given recognition, and the useful features must then be picked out; this is feature selection.
Classifier design means obtaining a recognition rule through training; through this recognition rule, the processor in the electronic device can obtain a feature classification so that the image recognition technology achieves a high recognition rate. Classification decision means classifying the recognized object in the feature space, so as to better determine which class the studied object belongs to.
Image recognition technology can be implemented with computer vision algorithms. A computer vision algorithm is a mathematical model that helps a computer understand an image. Its core idea is to learn statistical characteristics and patterns from big data in a data-driven way, and it generally needs a large number of training samples to train the model. Specifically, computer vision algorithms can model image features including texture, color, shape, spatial relationships, and high-level semantics. The initial model is trained on training samples and its parameters are adjusted so that the image recognition error converges, building a new model. After training, the processor in the electronic device can use the new model to predict the image classification and the probability of each classification, thereby performing image recognition.
Computer vision algorithms can be implemented with deep learning algorithms based on artificial neural networks, which extract image features through multiple neural network layers and compute the probability that an image contains preset image features. The deep learning algorithm may be, for example, a convolutional neural network (CNN). The CNN used for image recognition can be regarded as a classifier: it classifies the input image and obtains the probability of each class. The CNN may be an initial model with a certain network architecture whose parameters (for example, the convolution kernel size, the pooling kernel size, and the number of fully connected layers) are adjusted on training samples until the recognition error converges, yielding a new model.
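As a toy sketch of such a convolutional classifier, the following outputs one probability per image-content class; the architecture, input size, and class list are illustrative assumptions, not the patent's model.

```python
import torch
import torch.nn as nn

CLASSES = ["portrait", "bird", "waterfall", "piano", "band"]  # illustrative

class TinyClassifier(nn.Module):
    """Minimal CNN producing a probability per image-content class."""
    def __init__(self, n_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 56 * 56, n_classes)  # for 224x224 input

    def forward(self, x):
        h = self.features(x).flatten(1)
        return torch.softmax(self.head(h), dim=1)  # class probabilities

probs = TinyClassifier()(torch.rand(1, 3, 224, 224))  # shape (1, 5)
```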
In the embodiments of the present application, the target type of the photographed target can be determined through information acquisition, preprocessing, feature extraction and selection, classifier design, and classification decision. In addition, the image recognition of the embodiments may also include: determining the distance from the photographed target to the camera according to the size of the two-dimensional frame, and determining the orientation of the photographed target according to the intersection points of the image grid lines where the target is located.
(6) Sound pickup component and camera component
In the embodiments of the present application, the sound pickup component may contain one microphone or a microphone array composed of multiple microphones. A microphone array is a system composed of a certain number of microphones, used to sample and process the spatial sound field. The processor in the electronic device can use the phase differences between the audio signals received by these microphones to filter the sound waves, removing the ambient background sound to the greatest extent and keeping the audio signal from the recorded target.
Optionally, the sound pickup component may also contain a dedicated processing chip connected to the microphone, which may implement one or more of the following: the filter, spatial enhancement, gain control, and the EQ frequency response.
The camera component may contain a camera, which picks up images within the field of view; accumulated over time, these images form a video signal. The number of cameras in the camera component may be one or more. Where multiple cameras are used to pick up the video signal, the video signals picked up by the multiple cameras can be obtained by the processor. The processor in the electronic device can collect the images picked up by the camera and store these images and video signals in a cache or a storage device.
Optionally, the camera component may also contain a dedicated processing chip connected to the camera, which may implement one or more of the following: photographed target recognition, target type recognition, photographed scene recognition, recognition of the target's position in the image, and recognition of the target's distance relative to the camera.
(7) Target type of the photographed target
The target type of the photographed target may be obtained by performing image recognition on the image picked up by the camera. Image recognition yields the image content, which may be, for example, a portrait, a bird, a waterfall, a piano, a band, and so on. The target type of the photographed target can be determined according to the audio type associated with each image content; the target type corresponds to different types of audio signals. Specifically, please refer to Table 1, an example of the mapping between image content and the target type of the photographed target provided by an embodiment of the present application.
Table 1: Example mapping between image content and the target type of the photographed target provided by an embodiment of the present application
[Table 1, rendered as an image in the original (PCTCN2019110095-appb-000003); reconstructed from the surrounding description:]
Image content          | Target type of the photographed target
portrait               | speech
bird                   | birdsong
waterfall, river       | running water
piano, sheet music     | piano music
band, performer        | music
As shown in Table 1, the electronic device can prestore this mapping table; when image recognition yields the image content "portrait", Table 1 gives the target type "speech". In Table 1, multiple image contents can correspond to one target type, because the same target type may correspond to multiple image contents: for example, "running water" can correspond to the image contents "waterfall" and "river", "piano music" can correspond to "piano" and "sheet music", and "music" can correspond to "band" and "performer". Table 1 can be preset in the memory of the electronic device according to prior experience and can be called by the processor to determine the target type of the photographed target. Table 1 is an example of the eighth mapping table in the preceding and following text.
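A minimal sketch of the Table 1 lookup (the eighth mapping table) as a dictionary from image content to target type; the entries mirror the reconstructed table above and the function name is illustrative.

```python
# Illustrative encoding of Table 1; keys are recognized image contents.
EIGHTH_MAPPING_TABLE = {
    "portrait": "speech",
    "bird": "birdsong",
    "waterfall": "running water", "river": "running water",
    "piano": "piano music", "sheet music": "piano music",
    "band": "music", "performer": "music",
}

def target_type_of(image_content: str) -> str:
    """Look up the target type for a recognized image content."""
    return EIGHTH_MAPPING_TABLE[image_content]

assert target_type_of("portrait") == "speech"
```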
The specific implementation flow of the embodiments of this application is introduced below. Please refer to FIG. 4, a schematic flowchart of an audio processing method provided by an embodiment of this application. The audio processing method is applied to an electronic device that contains a camera component and a sound pickup component; the camera component is used to pick up the video signal and perform image recognition, and the sound pickup component is used to pick up the audio signal. The method is introduced below with reference to FIG. 4.

S101. The processor in the electronic device collects the images picked up by the camera component and performs image focusing to obtain the photographed target.

S102. The processor in the electronic device uses image recognition to determine the target type of the photographed target, its direction relative to the microphone, and its distance relative to the microphone.

S103. The processor in the electronic device determines the direction of spatial enhancement according to the direction of the photographed target relative to the microphone; determines the filter according to the target type of the photographed target; and determines the gain control curve and the EQ frequency-response curve according to the distance of the photographed target relative to the microphone and the target type to which the photographed target belongs.

S104. The processor in the electronic device obtains the original audio signal picked up by the sound pickup component and, according to the determined direction of spatial enhancement, performs spatial enhancement on the audio signal picked up by the sound pickup component, outputting a first audio signal.

S105. The processor in the electronic device filters the first audio signal with the determined filter to remove noise signals, obtaining a second audio signal.

S106. The processor in the electronic device performs gain control on the second audio signal according to the determined gain control curve, obtaining a third audio signal.

S107. The processor in the electronic device performs EQ frequency-response control on the third audio signal according to the determined EQ frequency-response curve, obtaining a fourth audio signal.

S108. The processor in the electronic device superimposes the fourth audio signal and the original audio signal picked up by the sound pickup component, obtaining a fifth audio signal.

The fifth audio signal can be the processed audio signal used for audio output.

In the above audio processing method, using image recognition to obtain the type of the photographed target and its direction and distance relative to the microphone improves the accuracy of scene recognition and target recognition in the recording scenario. Determining the audio processing strategy from the type, direction, and distance of the photographed target then filters interference out of the audio signal and improves the processing effect on the audio signal.
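Steps S104–S108 amount to a four-stage chain plus a final mix. A compact sketch of that chain follows; the stage callables are assumed placeholders standing in for the real spatial-enhancement, filtering, gain, and EQ blocks:

```python
import numpy as np

def process_audio(raw, enhance, noise_filter, gain, eq):
    """S104-S108: spatial enhancement -> filtering -> gain -> EQ -> mix."""
    first = enhance(raw)          # S104: steer the pickup toward the target
    second = noise_filter(first)  # S105: remove noise signals
    third = gain(second)          # S106: apply the gain control curve
    fourth = eq(third)            # S107: apply the EQ frequency-response curve
    return fourth + raw           # S108: superimpose raw signal for ambience

# Toy run with identity stages standing in for the real processing blocks.
raw = np.random.randn(480)
out = process_audio(raw, lambda x: x, lambda x: x, lambda x: 1.2 * x, lambda x: x)
```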
Regarding step S101, please refer to FIG. 5, a schematic diagram of recognizing the photographed target provided by an embodiment of this application. In an audio/video shooting scenario, after the processor starts collecting the video signal picked up by the camera, the camera can focus on the photographed target and display the focused target through the two-dimensional pixel region framed by a two-dimensional frame. When the user operates the second key, for example with a touch operation, the electronic device stops collecting the video captured by the camera, stops collecting the audio captured by the microphone, and the first display control stops timing.

Optionally, as shown in FIG. 5, the two-dimensional frame obtained by focusing on the photographed target can be produced by the autofocus mechanism of the electronic device. One autofocus principle is listed here: a motor can drive the lens in the camera to move along the optical axis to achieve focus. The motor driver chip outputs a corresponding current, the motor makes a corresponding displacement, and the camera picks up an image at that displacement. The sharpness of the picked-up image is used to judge whether the lens has reached a position where the shot image is sharp (for example, the sharpest position); if not, the motor driver chip is instructed to adjust the output current, and the above flow is repeated until the lens reaches a position where the shot image is sharp. Focusing is completed through this closed-loop adjustment.
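A minimal sketch of that closed loop as contrast-based hill climbing follows; the motor/camera interfaces and the sharpness metric are assumed stand-ins, not the actual autofocus implementation:

```python
def autofocus(set_motor_position, capture_image, sharpness, positions):
    """Closed-loop focus: try lens positions, settle at the sharpest one."""
    best_pos, best_score = None, float("-inf")
    for pos in positions:
        set_motor_position(pos)              # driver chip outputs a current
        score = sharpness(capture_image())   # e.g. variance of the Laplacian
        if score > best_score:
            best_pos, best_score = pos, score
    set_motor_position(best_pos)             # lens at the sharp position
    return best_pos

# Toy demo: a fake camera whose sharpness peaks at position 5.
state = {}
best = autofocus(lambda p: state.update(p=p), lambda: state["p"],
                 lambda img: -(img - 5) ** 2, range(11))
print(best)  # -> 5
```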
Optionally, in the embodiments of this application, focusing on the photographed target can also be achieved in response to a manual focusing operation by the user.
Regarding step S102, as shown in FIG. 5, the two-dimensional pixel region corresponding to the two-dimensional frame can serve as the image region corresponding to the photographed target. From this two-dimensional pixel region, the target type of the photographed target, the distance between the photographed target and the microphone, and the direction between the photographed target and the microphone can be determined. The following describes, in turn, how image recognition determines the target type of the photographed target, the distance between the photographed target and the microphone, and the direction between the photographed target and the microphone.

(1) Determining the target type of the photographed target by image recognition

First, the processor in the electronic device can obtain the image content by performing image recognition on the two-dimensional pixel region, or by performing image recognition on the image of the photographed target. Then, the processor can look up the target type of the photographed target in Table 1. In the embodiments of this application, Table 1 is an example of the eighth mapping table referred to elsewhere in this text.
(2) Determining the distance between the photographed target and the microphone by image recognition

Image recognition can determine the distance between the photographed target and the camera. Since the camera and the microphone are both installed in the electronic device, this distance can be regarded approximately as the distance between the photographed target and the microphone. The distance determined by image recognition can be a distance gradient, which can include, for example: far, relatively far, medium, relatively near, and near. The distance can be determined from the size of the two-dimensional frame together with the image content. For the same image content, the larger the two-dimensional frame obtained by focusing, the closer the photographed target is to the camera; the smaller the frame, the farther away the photographed target is. Using this regularity, a mapping between the size of the focused two-dimensional frame and the distance gradient can be prestored for each kind of image content. See Table 2, an example mapping between the two-dimensional frame size and the distance gradient when the image content is a portrait, provided by an embodiment of this application.

Table 2. Example mapping between two-dimensional frame size and distance gradient when the image content is a portrait

  Frame size (pixels) | Distance gradient
  --------------------+------------------
  a×a – b×b           | far
  b×b – c×c           | relatively far
  c×c – d×d           | medium
  d×d – e×e           | relatively near
  e×e – f×f           | near

As shown in Table 2, the size of the two-dimensional frame can be expressed by the number of pixels in the pixel region it occupies, where a, b, c, d, e, and f denote pixel counts with a < b < c < d < e < f. A frame size in the range a×a–b×b maps to the distance gradient "far", and the ranges b×b–c×c, c×c–d×d, d×d–e×e, and e×e–f×f map in turn to "relatively far", "medium", "relatively near", and "near".

After recognizing that the image content is a portrait, the processor in the electronic device looks up the mapping between two-dimensional frame size and distance gradient for portraits, that is, Table 2, and from Table 2 and the size of the two-dimensional frame finds the corresponding distance gradient. In the embodiments of this application, Table 2 can also be implemented as a two-dimensional mapping table that contains multiple image contents, multiple two-dimensional frame sizes, and the distance jointly corresponding to image content k and two-dimensional frame size l, where image content k is any one of the multiple image contents and frame size l is any one of the multiple frame sizes; the multiple image contents include the image content of the photographed target, and the multiple frame sizes include the size of the two-dimensional frame obtained by focusing on the photographed target. This two-dimensional mapping table is the ninth mapping table referred to elsewhere in this text. A sketch of the lookup follows.
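The following sketch maps a frame side length to a distance gradient; the thresholds (standing in for b through f) and the function name are illustrative assumptions:

```python
import bisect

# Ninth-mapping-table slice for "portrait": upper pixel bounds per gradient.
PORTRAIT_THRESHOLDS = [40, 80, 160, 320, 640]          # b, c, d, e, f (assumed)
GRADIENTS = ["far", "relatively far", "medium", "relatively near", "near"]

def distance_gradient(frame_side_pixels: int) -> str:
    """Map the side length of the focus frame to a distance gradient."""
    idx = bisect.bisect_left(PORTRAIT_THRESHOLDS, frame_side_pixels)
    return GRADIENTS[min(idx, len(GRADIENTS) - 1)]

assert distance_gradient(50) == "relatively far"
```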
Optionally, the distance between the photographed target and the microphone can also be determined by the principle of multi-camera ranging: the distance between the photographed target and the cameras, measured with multiple cameras, is taken as the distance between the photographed target and the microphone. Specifically, the disparity of the photographed target's image across multiple cameras can be used to determine the distance between the photographed target and the cameras; the distance from the photographed target to the cameras is inversely proportional to the disparity. In the two-camera scenario:

Z = f t / d  (1)

where Z is the distance from the photographed target to the cameras, f is the focal length of the two cameras, d is the difference between the coordinate positions of the photographed target in the two cameras' images, and t is the physical distance between the two cameras.
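A one-line helper following formula (1); the numeric values in the example are invented for illustration:

```python
def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Formula (1): Z = f * t / d, depth from stereo disparity."""
    return focal_px * baseline_m / disparity_px

# Example: f = 1000 px, baseline t = 2 cm, disparity d = 25 px -> Z = 0.8 m.
print(depth_from_disparity(1000.0, 0.02, 25.0))
```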
It can be understood that the above examples of determining the distance between the photographed target and the microphone are only used to explain the embodiments of this application and should not constitute a limitation. The distance can also be obtained in other ways, for example by structured-light ranging. The embodiments of this application do not limit how the distance between the photographed target and the microphone is measured.
(3) Determining the direction between the photographed target and the microphone by image recognition

Two three-dimensional coordinate systems are established with the camera and the microphone as their respective origins. The conversion relationship between these two coordinate systems can be determined from the fixed positional relationship between the camera and the microphone. Through image recognition, the processor in the electronic device can obtain the coordinates of the photographed target in the coordinate system corresponding to the camera. Using the conversion relationship between the two coordinate systems, the processor can determine the coordinates of the photographed target in the coordinate system corresponding to the microphone, and from these coordinates determine the direction between the photographed target and the microphone.

Since the shape of the photographed target captured by the camera can be two-dimensional, the coordinates of the photographed target in the camera's coordinate system can be the coordinates of each of multiple points on the photographed target. After conversion into the microphone's coordinate system, the processor obtains the coordinates of these multiple points in the microphone's coordinate system, and determines from them the direction between the photographed target and the microphone.

An example of determining the direction between the photographed target and the microphone by coordinate-system conversion is introduced below. Please refer to FIG. 6, a schematic diagram provided by an embodiment of this application. As shown in FIG. 6, a three-dimensional coordinate system OXYZ is established with the camera position as origin O and three mutually perpendicular axes X, Y, and Z. Image recognition can determine the coordinates (i, j, k) of point A on the photographed target in OXYZ. Similarly, another three-dimensional coordinate system O1X1Y1Z1 is established with the microphone position as origin O1 and three mutually perpendicular axes X1, Y1, and Z1. Using the coordinates (i, j, k) of point A in OXYZ and the positional relationship between the two coordinate systems, a coordinate conversion yields the coordinates of point A in O1X1Y1Z1.

In a possible implementation, as shown in FIG. 6, in the scenario where the electronic device is a mobile phone, the X axis is parallel to the horizontal plane and to the phone's display plane, the Y axis is parallel to the display plane and perpendicular to the Z axis, and the Z axis is the direction of the camera's optical axis. The X1 axis is parallel to the horizontal plane and to the display plane, the Y1 axis is parallel to the display plane and perpendicular to the Z1 axis, and the Z1 axis is perpendicular to the display plane. It follows that X is parallel to X1, Y to Y1, and Z to Z1.
If the coordinates of origin O of coordinate system OXYZ in coordinate system O1X1Y1Z1 are (i0, j0, k0), and the coordinates of point A in O1X1Y1Z1 are denoted (i1, j1, k1), then, since the corresponding axes are parallel, the coordinates of point A in O1X1Y1Z1 are:

(i1, j1, k1) = (i + i0, j + j0, k + k0)  (2)
The values of i0, j0, and k0 can be determined from the fixed positional relationship between the camera and the microphone in the electronic device.

When image recognition yields the coordinates (i, j, k) of point A in OXYZ, the coordinate value k along the direction of the distance between the photographed target and the camera can be estimated from the distance gradient obtained from Table 2, or obtained by dual-camera ranging.

Using the above coordinate-system conversion, the processor in the electronic device can determine the coordinates of multiple points on the photographed target in O1X1Y1Z1, and can then compute from these coordinates the direction between the photographed target and the microphone.
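A sketch of that conversion and of deriving a direction from the converted points follows; the camera offset value and the azimuth/elevation convention are assumptions, not specified by this application:

```python
import numpy as np

CAMERA_ORIGIN_IN_MIC_FRAME = np.array([0.01, 0.002, 0.0])  # (i0, j0, k0), assumed

def to_mic_frame(points_cam):
    """Formula (2): translate camera-frame points into the microphone frame
    (corresponding axes are parallel, so no rotation is needed)."""
    return np.asarray(points_cam) + CAMERA_ORIGIN_IN_MIC_FRAME

def direction_to_target(points_cam):
    """Direction of the target centroid as (azimuth, elevation) in radians."""
    x, y, z = to_mic_frame(points_cam).mean(axis=0)
    return np.arctan2(x, z), np.arctan2(y, np.hypot(x, z))

print(direction_to_target([[0.1, 0.0, 1.0], [0.12, 0.02, 1.0]]))
```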
It can be understood that the above example of determining the direction between the photographed target and the microphone by coordinate transformation is only used to explain the embodiments of this application and should not constitute a limitation. The axes of OXYZ and O1X1Y1Z1 can also point in other directions. Moreover, the coordinate systems established at the camera and the microphone are not limited to rectangular coordinate systems; they can also be of other types, for example spherical coordinate systems.

In a possible embodiment, the processor in the electronic device can determine the direction of the photographed target relative to the camera from the target's position in the image picked up by the camera, and then take this direction as the direction of the photographed target relative to the microphone. It can be understood that when the photographed target is far enough from the electronic device, the distance between the camera and the microphone can be ignored; no coordinate-system conversion is needed, and the direction relative to the camera can be used directly as the direction relative to the microphone.
Specifically, please refer to FIG. 7, a schematic diagram of the principle for determining the direction of the photographed target relative to the microphone, provided by an embodiment of this application. As shown in FIG. 7, the processor in the electronic device can discretize the picture shot by the camera into a grid and prestore, for each intersection of the grid lines, the direction corresponding to that intersection. The processor can then determine the directions corresponding to one or more intersections inside the two-dimensional image region obtained by focusing. As shown in FIG. 7, grid intersections A and B inside the two-dimensional image region have coordinates (x0, y0) and (x1, y1). From the prestored mapping between grid-line intersections and directions, the direction corresponding to intersection A (x0, y0) is expressed as (θ0, ψ0), and the direction corresponding to intersection B (x1, y1) as (θ1, ψ1). The direction of the photographed target relative to the camera is obtained from the target directions (θ0, ψ0) and (θ1, ψ1).

The processor's use of the target's position in the picked-up image to determine its direction relative to the camera can be implemented specifically as: obtaining the coordinate points contained in the two-dimensional frame obtained by focusing on the photographed target, and obtaining from the tenth mapping table, according to those coordinate points, the directions of points on the photographed target relative to the microphone.

The tenth mapping table contains multiple coordinate points and the direction corresponding to each of them; the multiple coordinate points include the coordinate points contained in the two-dimensional frame. Grid-line intersections A and B are coordinate points contained in the two-dimensional frame obtained by focusing on the photographed target.

Here, θ0 and ψ0 are the zenith and azimuth angles, in the spherical representation corresponding to coordinate system OXYZ, of the point on the photographed target corresponding to point A, whose radial distance is r0; θ1 and ψ1 are the zenith and azimuth angles of the point on the photographed target corresponding to point B, whose radial distance is r1.
From the direction representations (θ0, ψ0) and (θ1, ψ1), the processor in the electronic device can obtain the coordinates of points A and B on the photographed target in coordinate system OXYZ as (r0 sinθ0 cosψ0, r0 sinθ0 sinψ0, r0 cosθ0) and (r1 sinθ1 cosψ1, r1 sinθ1 sinψ1, r1 cosθ1).
It can be understood that the processor in the electronic device can use the target's position in each of multiple images picked up by the camera to determine the direction of the photographed target relative to the camera.

Optionally, the direction prestored in the electronic device for each grid-line intersection can be measured in advance: intersection (xi, yi) is paired with its corresponding direction (θi, ψi), where (xi, yi) is any intersection, the number of grid-line intersections can be k, k is a positive integer, and i is a positive integer satisfying 1 ≤ i ≤ k.

An example procedure for measuring intersection C (xi, yi) and the direction corresponding to intersection C (xi, yi) can be: first place the photographed target directly in front of the camera, so that its zenith and azimuth angles in the spherical representation of OXYZ are both 0; keeping the camera position fixed, rotate the electronic device until the photographed target appears at intersection C (xi, yi) in the image shot by the camera; and record the rotation angles θi and ψi of the electronic device, which form the direction corresponding to intersection C (xi, yi).
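A sketch of the tenth-mapping-table lookup with the spherical-to-Cartesian conversion used above follows; the calibration values are invented for illustration:

```python
import numpy as np

# Tenth mapping table: grid intersection (x, y) -> (zenith, azimuth), radians.
GRID_DIRECTIONS = {(0, 0): (1.40, -0.20), (1, 0): (1.40, 0.00)}  # assumed

def target_direction(intersections_in_frame):
    """Average the calibrated directions of the intersections in the frame."""
    dirs = np.array([GRID_DIRECTIONS[p] for p in intersections_in_frame])
    return dirs.mean(axis=0)  # (zenith theta, azimuth psi)

def spherical_to_cartesian(r, zenith, azimuth):
    """Point coordinates in OXYZ from radial distance and direction."""
    return (r * np.sin(zenith) * np.cos(azimuth),
            r * np.sin(zenith) * np.sin(azimuth),
            r * np.cos(zenith))

theta, psi = target_direction([(0, 0), (1, 0)])
print(spherical_to_cartesian(1.0, theta, psi))
```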
Optionally, in scenarios where the distance between the camera and the microphone cannot be ignored, the radial distances of points on the photographed target can be measured by multi-camera ranging. The coordinate transformation of formula (2) then yields the coordinates of the points on the photographed target in coordinate system O1X1Y1Z1, and from these the direction of the photographed target relative to the microphone. For example, the radial distance r0 of the point corresponding to point A and the radial distance r1 of the point corresponding to point B in the previous example can be obtained by dual-camera ranging. The coordinates of points A and B on the photographed target in OXYZ are (r0 sinθ0 cosψ0, r0 sinθ0 sinψ0, r0 cosθ0) and (r1 sinθ1 cosψ1, r1 sinθ1 sinψ1, r1 cosθ1); formula (2) converts these two points into their coordinates in O1X1Y1Z1, from which the direction between the photographed target and the microphone is computed.
Regarding step S103, the specific processes for determining the direction of spatial enhancement, the filter, the gain control curve, and the EQ frequency-response curve are introduced below in turn.

(a) Determining the direction of spatial enhancement

The processor in the electronic device can take the direction of the photographed target relative to the microphone obtained in step S102 as the direction of spatial enhancement, which is used to perform spatial enhancement on the original audio signal. For a specific description of spatial enhancement, refer to the concept description above; details are not repeated here.
(b) Determining the filter

The processor in the electronic device can determine the filter according to the target type of the photographed target obtained in step S102. The memory in the electronic device can store a first mapping table, and the filter is obtained from the first mapping table according to the target type of the photographed target. The first mapping table contains multiple target types and the filter corresponding to each of them; the multiple target types include the target type of the photographed target. The processor in the electronic device can determine the filter from the content stored in the memory and the target type of the photographed target. For a specific description of the filter, refer to the concept description above; details are not repeated here.

Optionally, during the image recognition of step S102, the processor in the electronic device can also use image recognition to obtain the image scene as the recording scene; this process is analogous to obtaining the target type of the photographed target in step S102. The processor can then determine the filter according to one or both of the following: the target type of the photographed target and the recording scene.
(c) Determining the gain control curve and the EQ frequency-response curve

The processor in the electronic device can determine the gain control curve and the EQ frequency-response curve according to the distance of the photographed target relative to the microphone obtained in step S102 and the target type to which the photographed target belongs.

Optionally, during the image recognition of step S102, the processor can also use image recognition to obtain the image scene as the recording scene, analogously to obtaining the target type of the photographed target in step S102. The processor can determine the gain control curve according to one or more of the following: the target type of the recorded target, the recording scene, and the distance between the recorded target and the microphone. For a specific description of the gain control curve, refer to the concept description of gain control above; details are not repeated here.

The processor's determination of the first gain control curve from the target type of the photographed target and its distance relative to the microphone can be implemented specifically as: obtaining the first gain control curve from a second mapping table according to the target type of the photographed target and its distance relative to the microphone.

The second mapping table contains multiple target types, multiple distances, and the gain control curve jointly corresponding to target type i and distance j, where target type i is any of the multiple target types and distance j is any of the multiple distances. The multiple target types include the target type of the photographed target, and the multiple distances include the distance of the photographed target relative to the microphone. The first gain control curve is a gain control curve selected from multiple gain control curves.

In a specific implementation, the processor's determination of the first gain control curve from the target type of the photographed target and its distance relative to the microphone can also be implemented as: obtaining a second gain control curve from a third mapping table according to the target type of the photographed target, where the third mapping table contains multiple target types and the gain control curve corresponding to each of them, the multiple target types including the target type of the photographed target; and obtaining a first gain compensation curve from a fourth mapping table according to the distance of the photographed target relative to the microphone, where the fourth mapping table contains multiple distances and the gain compensation curve corresponding to each of them, the multiple distances including the distance of the photographed target relative to the microphone.
The processor in the electronic device can also determine the EQ frequency-response curve according to one or more of the following: the target type of the recorded target, the recording scene, the distance between the recorded target and the microphone, and the frequency of the audio signal. For a specific description of the EQ frequency-response curve, refer to the concept description of EQ above; details are not repeated here.

In a specific implementation, the processor's determination of the first EQ frequency-response curve from the target type of the photographed target and its distance relative to the microphone can be implemented as: obtaining the first EQ frequency-response curve from a fifth mapping table according to the target type of the photographed target and its distance relative to the microphone.

The fifth mapping table contains multiple target types, multiple distances, and the EQ frequency-response curve jointly corresponding to target type i and distance j, where target type i is any of the multiple target types and distance j is any of the multiple distances. The multiple target types include the target type of the photographed target, and the multiple distances include the distance of the photographed target relative to the microphone.

The first EQ frequency-response curve is an EQ frequency-response curve selected from multiple EQ frequency-response curves.

In a specific implementation, the processor's determination of the first EQ frequency-response curve from the target type of the photographed target and its distance relative to the microphone can also be implemented as: obtaining a second EQ frequency-response curve from a sixth mapping table according to the target type of the photographed target, where the sixth mapping table contains multiple target types and the EQ frequency-response curve corresponding to each of them, the multiple target types including the target type of the photographed target; and obtaining a first EQ frequency-response compensation curve from a seventh mapping table according to the distance of the photographed target relative to the microphone, where the seventh mapping table contains multiple distances and the EQ frequency-response compensation curve corresponding to each of them, the multiple distances including the distance of the photographed target relative to the microphone.
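A compact sketch of the two alternative lookups, joint (second/fifth tables) versus base-plus-compensation (third/fourth and sixth/seventh tables), follows. All table contents and names are placeholders:

```python
# Joint lookup: (target type, distance) -> curve identifier.
SECOND_MAPPING_TABLE = {("speech", "far"): "gain_curve_speech_far"}

# Base-plus-compensation lookup: per-type base curve + per-distance offset.
THIRD_MAPPING_TABLE = {"speech": "gain_curve_speech"}
FOURTH_MAPPING_TABLE = {"far": "gain_comp_far"}

def first_gain_curve(target_type, distance, joint=True):
    if joint:
        return SECOND_MAPPING_TABLE[(target_type, distance)]
    # Superimpose the compensation curve on the base curve (here: name pair).
    return (THIRD_MAPPING_TABLE[target_type], FOURTH_MAPPING_TABLE[distance])

print(first_gain_curve("speech", "far"))
print(first_gain_curve("speech", "far", joint=False))
```

The EQ curves (fifth, sixth, and seventh mapping tables) can follow the same two patterns.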
It can be understood that the embodiments of this application do not limit the order in which the processor in the electronic device determines the direction of spatial enhancement, the filter, the first gain control curve, and the first equalizer frequency-response curve.
According to steps S101–S103, the following audio processing strategy can be determined: the direction of spatial enhancement, the filter, the gain control curve, and the EQ frequency-response curve. The determined audio processing strategy can then be applied to the audio signal through steps S104–S108. Specifically, the original audio signal obtained from the sound pickup component is subjected, in turn, to spatial enhancement, enhancement filtering, gain control, and EQ equalization. For specific descriptions of spatial enhancement, enhancement filtering, gain control, and EQ equalization, refer to the respective concept descriptions above; details are not repeated here.

In addition, steps S105–S107 can also be executed in other orders, which the embodiments of this application do not limit. Executing them in the order shown in FIG. 4 (spatial enhancement, filtering, gain control, then EQ frequency-response control), with spatial enhancement and filtering performed first, raises the proportion of the output audio signal that comes from the photographed target and lowers the proportion of noise, thereby improving the processing effect on the audio signal.

Regarding step S108, because audio signals from sound sources in the space other than the photographed target can enhance the stereo feel of the spatial sound field, the original audio signal picked up by the sound pickup component can be superimposed onto the fourth audio signal to improve the stereo feel of the output audio signal.
In a specific implementation, to improve the stereo feel of the audio played by the electronic device, the processor can determine the original audio signal of each of multiple sound channels from the original audio signal picked up by the sound pickup component.

For example, in a scenario with two channels (left and right) and two microphones (left-side and right-side), the sound pickup component can be used to form a pair of orthogonal directional outputs pointing to the front-left and front-right of the electronic device. The output audio signal pointing to the front-left is taken as the left channel's original audio signal, and the output pointing to the front-right as the right channel's original audio signal. Steps S101–S107 are executed on the left channel's original audio signal to obtain the left channel's fourth audio signal, which is then superimposed with the left channel's original audio signal to obtain the left channel's fifth audio signal; the right channel is processed in the same way to obtain the right channel's fifth audio signal. The left channel's fifth audio signal is played through the left channel, and the right channel's through the right channel. In this process, where the audio processing strategy of steps S101–S107 is executed separately for the left and right channels, the processing and playback of the two channels do not affect each other, which improves the stereo feel of the output audio signal.
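A sketch of that per-channel processing follows, reusing the stage-chain idea sketched after the flow above; the identity stages are placeholders for the real per-channel strategy:

```python
import numpy as np

def process_channels(channel_signals, stages):
    """Run the S101-S107 stage chain independently per channel, then mix in
    each channel's own original signal (per-channel fifth audio signal)."""
    outputs = {}
    for name, raw in channel_signals.items():
        fourth = raw
        for stage in stages:          # spatial enhance, filter, gain, EQ
            fourth = stage(fourth)
        outputs[name] = fourth + raw  # superimpose for stereo ambience
    return outputs

mix = process_channels(
    {"left": np.random.randn(480), "right": np.random.randn(480)},
    [lambda x: x, lambda x: x, lambda x: 1.2 * x, lambda x: x])
```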
It can be understood that the above example of executing the audio processing strategy of steps S101–S107 separately for multiple channels is only used to explain the embodiments of this application and should not constitute a limitation. The algorithm by which the processor determines each channel's original audio signal from the audio picked up by the sound pickup component can also be another algorithm, the number of microphones in the sound pickup component can be larger or smaller, and so can the number of channels in the electronic device; the embodiments of this application do not limit these.
Optionally, the number of photographed targets obtained by focusing can be more than one, as shown in FIG. 5. When executing steps S102–S108, one of the following modes can be chosen according to the positions of these targets: a. treat the multiple photographed targets as one photographed target and execute steps S102–S108; b. execute steps S102–S108 separately for each of the multiple photographed targets.

Two examples of choosing between modes a and b according to the positions of the multiple photographed targets are listed below; a sketch of both decision rules follows the examples.

① Choosing mode a or b according to the angular range of the targets relative to the microphone

Specifically, when the processor in the electronic device detects that the angular range of the multiple photographed targets relative to the microphone is greater than or equal to a preset angle threshold, the targets' directions relative to the microphone are relatively scattered, and the processor can use mode b, executing steps S102–S108 separately for each target. When the processor detects that the angular range is less than the preset angle threshold, the targets' directions relative to the microphone are relatively clustered and can be treated as one photographed target, so the processor can use mode a, treating the multiple targets as one target when executing steps S102–S108.

② Choosing mode a or b according to the target types of the targets

Specifically, when the processor in the electronic device detects that the number or proportion of the multiple photographed targets belonging to the same target type is greater than or equal to a set threshold, the targets can be processed as the same target type, and the processor can use mode a, treating the multiple targets as one target when executing steps S102–S108. When the processor detects that the number or proportion belonging to the same target type is less than the set threshold, it can use mode b, executing steps S102–S108 separately for each target.

It can be understood that the above two examples of choosing between modes a and b according to the positions of the multiple photographed targets are only used to explain the embodiments of this application and should not constitute a limitation.
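Below is a minimal sketch of the two decision rules; the threshold values are assumptions:

```python
from collections import Counter

ANGLE_THRESHOLD_DEG = 30.0   # assumed preset angle threshold
SAME_TYPE_RATIO = 0.5        # assumed set threshold for rule 2

def choose_mode(azimuths_deg, target_types):
    """Return 'a' (process targets jointly) or 'b' (process separately)."""
    # Rule 1: scattered directions -> separate processing.
    if max(azimuths_deg) - min(azimuths_deg) >= ANGLE_THRESHOLD_DEG:
        return "b"
    # Rule 2: mostly the same target type -> joint processing.
    _, count = Counter(target_types).most_common(1)[0]
    return "a" if count / len(target_types) >= SAME_TYPE_RATIO else "b"

print(choose_mode([10, 15, 12], ["speech", "speech", "music"]))  # -> 'a'
```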
Optionally, after step S102, the processor in the electronic device can invoke the display to show the audio processing strategy for the user to choose, and after detecting a user operation, determine from it whether to execute steps S104–S108. The relevant interfaces for displaying the audio processing strategy for user selection are introduced below.

Please refer to FIG. 8 and FIG. 9, examples of an audio-processing-strategy user interface provided by an embodiment of this application. They are introduced below in turn.

As shown in FIG. 8, after the processor determines the audio processing strategy, it uses the display to show a prompt operation control related to the recognized target content and recording scene, that is, the first operation control. When the first operation control on the display detects a user operation, for example a touch operation, the processor executes steps S103–S108.

As shown in FIG. 9, after the processor determines the audio processing strategy, it uses the display to show a first display interface containing an audio-processing-strategy adjustment area. In this area, the display can show, according to detected user operations, the target type, the target direction, and the target-to-microphone distance selected by the user. Specifically, as shown in FIG. 9, the selection list for "Type" in the adjustment area shows multiple target types for the user to choose from; the direction-selection bar for "Target direction" contains the angle values of multiple direction centers for the user to choose from; and the selection list for "Distance" shows multiple distance gradients for the user to choose from.

The default selected values for the target type, the target direction, and the target-to-microphone distance can be the parameter values the processor recognized in step S102. As shown in FIG. 9, the processor can adjust the values of the target type, target direction, and target-to-microphone distance according to received user operations. When the user operates the "Confirm" control on the display, for example with a touch operation, indicating that parameter adjustment is complete, the processor executes steps S103–S108 with the adjusted parameter values.

Having the user confirm the automatically recognized audio processing strategy improves the accuracy and convenience of audio-processing-strategy recognition.

It can be understood that the user-interface examples shown in FIG. 8 and FIG. 9 are only used to explain the embodiments of this application and should not constitute a limitation. The audio-processing-strategy user interface can also have other designs; for example, the adjustment area in FIG. 9 can display not only the target direction but also an angle range for the user to choose. The embodiments of this application do not limit the specific design of the audio-processing-strategy user interface.
The processor in the electronic device uses image recognition to determine the target type of the photographed target; optionally, it can also use image recognition to determine one or both of the following: the target's direction relative to the microphone and its distance relative to the microphone. The processor then determines the filter from the target type of the photographed target; it can optionally determine the direction of spatial enhancement according to the target's direction relative to the microphone, and optionally determine the gain control curve and the EQ frequency-response curve according to the target's distance relative to the microphone and the target type to which it belongs.
Optionally, the processor in the electronic device can execute the audio processing method shown in FIG. 4 when recording of audio and video starts, that is, after responding to the user operation on the first key in the scenario shown in FIG. 2. Executing the method at the start of recording allows the audio signal picked up by the sound pickup component to be processed in real time; in record-while-playing scenarios, the audio processing strategy can be selected automatically in real time, which improves the convenience of strategy selection and the processing effect on the audio signal for different target types or recording scenes.

Optionally, the processor in the electronic device can execute the audio processing method shown in FIG. 4 after recording of audio and video ends, that is, after responding to the user operation on the second key in the scenario shown in FIG. 2. Executing the method at the end of recording reduces processor occupancy during recording, makes the recording process smoother, improves the convenience of strategy selection, and improves the processing effect on the audio signal for different target types or recording scenes.

Optionally, the processor in the electronic device can also execute the audio processing method shown in FIG. 4 when recording ends and the recorded audio/video signal is being stored into the memory. This likewise reduces processor occupancy during recording and makes the recording process smoother. The method is then executed only when the recorded audio/video signal needs to be saved, avoiding wasted processor resources when it does not, and thereby saving processor resources.
The apparatus of the embodiments of this application is introduced below. Please refer to FIG. 10, a schematic structural diagram of an electronic device 100 provided by an embodiment of this application.

The electronic device 100 can include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 can include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

The electronic device 100 can be a mobile phone, a tablet computer, a standalone camera device, or another device containing a camera and a microphone. It can be understood that the structure illustrated in this embodiment of the invention does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 can include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

The processor 110 is configured to read program code stored in the memory and execute the audio processing method provided by the embodiments of this application, for example the audio processing method described in FIG. 4.

Specifically, the processor 110 is configured to read the stored program code and perform image recognition on the first image obtained by the camera component, obtaining the target type of the photographed target in the first image, the target's direction relative to the microphone, and its distance relative to the microphone.

The processor 110 is further configured to read the stored program code and determine the audio processing strategy according to the target type of the photographed target and the target's direction and distance relative to the microphone 170C, and to process the audio signal picked up by the microphone according to the audio processing strategy. Specifically, it performs spatial enhancement, filtering, gain control, and equalizer frequency-response control on the audio signal picked up by the microphone 170C according to the audio processing strategy.

The processor 110 is further configured to read the stored program code and superimpose the audio signal picked up by the microphone 170C with the fourth audio signal to obtain a fifth audio signal, the fourth audio signal being the audio signal obtained after the signal picked up by the microphone 170C undergoes spatial enhancement, filtering, gain control, and equalizer frequency-response control.

The fifth audio signal can be played through the speaker 170A, and can also be played through a wired headset connected via the headset jack 170D. The fifth audio signal can be the audio signal played synchronously when a video is played.

In the embodiments of this application, the camera component can include the camera 193 and, in some embodiments, a video codec. The sound pickup component can include the microphone 170C and, in some embodiments, the audio module 170.
The processor 110 can include one or more processing units. For example, the processor 110 can include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. Different processing units can be separate devices or integrated into one or more processors.

The controller can be the nerve center and command center of the electronic device 100. It can generate operation control signals according to instruction opcodes and timing signals, completing the control of instruction fetching and execution.

A memory can also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache, which can hold instructions or data the processor 110 has just used or uses cyclically. If the processor 110 needs the instruction or data again, it can call it directly from this memory, avoiding repeated accesses, reducing the processor's waiting time, and thus improving system efficiency.

In some embodiments, the processor 110 can include one or more interfaces. The interfaces can include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others.

The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 can contain multiple sets of I2C buses and can be coupled through different I2C bus interfaces to the touch sensor 180K, a charger, a flash, the camera 193, and so on. For example, the processor 110 can be coupled to the touch sensor 180K through an I2C interface so that the two communicate over the I2C bus interface, implementing the touch function of the electronic device 100.

The I2S interface can be used for audio communication. In some embodiments, the processor 110 can contain multiple sets of I2S buses and can be coupled to the audio module 170 through an I2S bus, implementing communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 can pass audio signals to the wireless communication module 160 through the I2S interface, implementing the function of answering calls through a Bluetooth headset.

The PCM interface can also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 can be coupled through a PCM bus interface. In some embodiments, the audio module 170 can also pass audio signals to the wireless communication module 160 through the PCM interface, implementing the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.

The UART interface is a universal serial data bus used for asynchronous communication. The bus can be a bidirectional communication bus that converts the data to be transmitted between serial and parallel communication. In some embodiments, the UART interface is typically used to connect the processor 110 and the wireless communication module 160; for example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface, implementing the Bluetooth function. In some embodiments, the audio module 170 can pass audio signals to the wireless communication module 160 through the UART interface, implementing the function of playing music through a Bluetooth headset.

The MIPI interface can be used to connect the processor 110 with peripheral devices such as the display 194 and the camera 193, and includes a camera serial interface (CSI), a display serial interface (DSI), and so on. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface, implementing the shooting function of the electronic device 100, and the processor 110 and the display 194 communicate through a DSI interface, implementing its display function.

The GPIO interface can be configured by software as a control signal or as a data signal. In some embodiments, the GPIO interface can be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on. The GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.

The USB interface 130 is an interface conforming to the USB standard specification, and can specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, and so on. The USB interface 130 can be used to connect a charger to charge the electronic device 100, to transfer data between the electronic device 100 and peripheral devices, or to connect a headset and play audio through it. The interface can also be used to connect other electronic devices, for example AR devices.

It can be understood that the interface connection relationships between the modules illustrated in this embodiment of the invention are only schematic and do not constitute a structural limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 can also use interface connection modes different from those in the above embodiments, or a combination of multiple interface connection modes.
The charging management module 140 is configured to receive charging input from a charger, which can be a wireless or a wired charger. In some wired-charging embodiments, the charging management module 140 can receive the charging input of a wired charger through the USB interface 130; in some wireless-charging embodiments, it can receive wireless charging input through the wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 can also supply power to the electronic device through the power management module 141.

The power management module 141 is configured to connect the battery 142 and the charging management module 140 with the processor 110. It receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and so on. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In some other embodiments, the power management module 141 can also be placed in the processor 110; in other embodiments, the power management module 141 and the charging management module 140 can also be placed in the same device.

The wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and so on.

The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 can be used to cover a single communication frequency band or multiple bands, and different antennas can also be multiplexed to improve antenna utilization; for example, the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In some other embodiments, an antenna can be used in combination with a tuning switch.

The mobile communication module 150 can provide solutions for wireless communication including 2G/3G/4G/5G applied on the electronic device 100. The mobile communication module 150 can include at least one filter, switch, power amplifier, low noise amplifier (LNA), and so on. It can receive electromagnetic waves through the antenna 1, filter and amplify the received waves, and pass them to the modem processor for demodulation; it can also amplify signals modulated by the modem processor and convert them into electromagnetic waves radiated through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 can be placed in the processor 110; in some embodiments, at least some of its functional modules can be placed in the same device as at least some modules of the processor 110.

The modem processor can include a modulator and a demodulator. The modulator is used to modulate the low-frequency baseband signal to be sent into a medium/high-frequency signal; the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal, which it then passes to the baseband processor for processing. After processing by the baseband processor, the low-frequency baseband signal is passed to the application processor, which outputs sound signals through audio devices (not limited to the speaker 170A and the receiver 170B) or displays images or video through the display 194. In some embodiments, the modem processor can be a separate device; in other embodiments, it can be independent of the processor 110 and placed in the same device as the mobile communication module 150 or other functional modules.

The wireless communication module 160 can provide solutions for wireless communication applied on the electronic device 100, including wireless local area networks (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 can be one or more devices integrating at least one communication processing module. It receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110; it can also receive signals to be sent from the processor 110, frequency-modulate and amplify them, and convert them into electromagnetic waves radiated through the antenna 2.

In some embodiments, the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150 and the antenna 2 with the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. These technologies can include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. The GNSS can include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
The electronic device 100 implements the display function through the GPU, the display 194, the application processor, and so on. The GPU is a microprocessor for image processing and connects the display 194 and the application processor; it performs mathematical and geometric calculations for graphics rendering. The processor 110 can include one or more GPUs that execute program instructions to generate or change display information.

The display 194 is used to display images, video, and the like, and includes a display panel. The display panel can use a liquid crystal display (LCD), organic light-emitting diodes (OLED), active-matrix organic light-emitting diodes (AMOLED), flexible light-emitting diodes (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum-dot light-emitting diodes (QLED), and so on. In some embodiments, the electronic device 100 can include 1 or N displays 194, N being a positive integer greater than 1.

The electronic device 100 can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and so on.

The ISP is used to process the data fed back by the camera 193. For example, when taking a photo, the shutter opens, light is transmitted through the lens to the camera's photosensitive element, the light signal is converted into an electrical signal, and the photosensitive element passes the electrical signal to the ISP for processing and conversion into an image visible to the eye. The ISP can also algorithmically optimize the noise, brightness, and skin tone of the image, and can optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP can be provided in the camera 193.

The camera 193 is used to capture still images or video. An optical image of the object is generated through the lens and projected onto the photosensitive element, which can be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal and passes it to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts it into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 can include 1 or N cameras 193, N being a positive integer greater than 1. When N is greater than 2, the electronic device can use these N cameras to measure the distance between the photographed target and the cameras.

The digital signal processor is used to process digital signals; besides digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform and the like on the frequency-point energy.

The video codec is used to compress or decompress digital video. The electronic device 100 can support one or more video codecs, so that it can play or record video in multiple encoding formats, for example moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.

The NPU is a neural-network (NN) computing processor. By borrowing the structure of biological neural networks, for example the transfer mode between neurons in the human brain, it processes input information rapidly and can also learn continuously by itself. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, for example image recognition, face recognition, speech recognition, and text understanding.

The external memory interface 120 can be used to connect an external memory card, for example a Micro SD card, to extend the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120, implementing the data storage function, for example saving files such as music and video in the external memory card.

The internal memory 121 can be used to store computer-executable program code, the executable program code including instructions. By running the instructions stored in the internal memory 121, the processor 110 executes the various functional applications and data processing of the electronic device 100. The internal memory 121 can include a program storage area and a data storage area. The program storage area can store the operating system and applications required by at least one function (such as a sound playing function or an image playing function); the data storage area can store data created during use of the electronic device 100 (such as audio data and a phone book). In addition, the internal memory 121 can include high-speed random access memory and can also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS).
The electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and so on.

The audio module 170 is used to convert digital audio information into an analog audio signal output and to convert analog audio input into a digital audio signal; it can also be used to encode and decode audio signals. In some embodiments, the audio module 170 can be provided in the processor 110, or some of its functional modules can be provided in the processor 110.

The speaker 170A, also called a "horn", is used to convert audio electrical signals into sound signals. The electronic device 100 can be used to listen to music or hands-free calls through the speaker 170A.

The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or a voice message, the voice can be heard by bringing the receiver 170B close to the ear.

The microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C, inputting the sound signal into it. The electronic device 100 can be provided with at least one microphone 170C. In other embodiments, the electronic device 100 can be provided with two microphones 170C, which besides collecting sound signals can also implement a noise-reduction function. In still other embodiments, the electronic device 100 can be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement a directional recording function, among others.

The headset jack 170D is used to connect wired headsets. The headset jack 170D can be the USB interface 130, or a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense pressure signals and can convert them into electrical signals. In some embodiments, the pressure sensor 180A can be provided on the display 194. There are many kinds of pressure sensors 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor can include at least two parallel plates with conductive material; when a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the intensity of the pressure from the change in capacitance. When a touch operation acts on the display 194, the electronic device 100 detects the intensity of the touch operation through the pressure sensor 180A, and can also compute the touched position from the sensor's detection signal. In some embodiments, touch operations acting on the same touch position but with different intensities can correspond to different operation instructions. For example, a touch on the short-message application icon with intensity below a first pressure threshold executes the instruction for viewing short messages, while a touch with intensity at or above the first pressure threshold executes the instruction for creating a new short message.

The gyroscope sensor 180B can be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 around three axes (that is, the x, y, and z axes) can be determined through the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization while shooting. Illustratively, when the shutter is pressed, the gyroscope sensor 180B detects the shaking angle of the electronic device 100, computes from that angle the distance the lens module needs to compensate, and lets the lens counteract the shaking of the electronic device 100 through reverse motion, achieving stabilization. The gyroscope sensor 180B can also be used in navigation and motion-sensing game scenarios.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 computes altitude from the air pressure measured by the air pressure sensor 180C, assisting positioning and navigation.

The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip leather case. In some embodiments, when the electronic device 100 is a flip phone, it can detect the opening and closing of the flip cover with the magnetic sensor 180D and, according to the detected open/closed state of the case or cover, set features such as automatic unlocking when flipped open.

The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in all directions (generally three axes), and can detect the magnitude and direction of gravity when the electronic device 100 is stationary. It can also be used to identify the attitude of the electronic device and is applied in landscape/portrait switching, pedometers, and similar applications.

The distance sensor 180F is used to measure distance. The electronic device 100 can measure distance by infrared or laser. In some embodiments, in shooting scenarios, the electronic device 100 can use the distance sensor 180F to measure distance to achieve fast focusing.

The proximity light sensor 180G can include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The LED can be an infrared LED. The electronic device 100 emits infrared light outward through the LED and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100; when insufficient reflected light is detected, the electronic device 100 can determine that there is no object nearby. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the device close to the ear during a call, so as to automatically turn off the screen and save power. The proximity light sensor 180G can also be used in leather-case mode and pocket mode for automatic unlocking and screen locking.

The ambient light sensor 180L is used to sense the brightness of ambient light. The electronic device 100 can adaptively adjust the brightness of the display 194 according to the sensed ambient brightness. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking photos, and can cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, preventing accidental touches.

The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, fingerprint photographing, fingerprint call answering, and so on.

The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 executes a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to lower power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown caused by low temperature. In still other embodiments, when the temperature is below yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.

The touch sensor 180K is also called a "touch panel". The touch sensor 180K can be provided on the display 194, and together they form the touchscreen, also called a "touch-controlled screen". The touch sensor 180K is used to detect touch operations acting on or near it, and can pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation can be provided through the display 194. In other embodiments, the touch sensor 180K can also be placed on the surface of the electronic device 100, at a position different from that of the display 194.

The bone conduction sensor 180M can obtain vibration signals. In some embodiments, the bone conduction sensor 180M can obtain the vibration signal of the vibrating bone of the human vocal part; it can also contact the human pulse and receive the blood-pressure beating signal. In some embodiments, the bone conduction sensor 180M can also be provided in a headset, combined into a bone-conduction headset. The audio module 170 can parse out a voice signal based on the vibration signal of the vocal-part vibrating bone obtained by the bone conduction sensor 180M, implementing a voice function; the application processor can parse heart-rate information based on the blood-pressure beating signal obtained by the bone conduction sensor 180M, implementing a heart-rate detection function.
The keys 190 include a power key, volume keys, and so on. The keys 190 can be mechanical keys or touch keys. The electronic device 100 can receive key input and generate key signal input related to user settings and function control of the electronic device 100.

The motor 191 can generate vibration prompts. The motor 191 can be used for incoming-call vibration prompts as well as touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playing) can correspond to different vibration feedback effects, and so can touch operations acting on different areas of the display 194. Different application scenarios (for example time reminders, receiving messages, alarm clocks, games) can also correspond to different vibration feedback effects, and the touch vibration feedback effect can also be customized.

The indicator 192 can be an indicator light and can be used to indicate the charging status and battery-level changes, as well as messages, missed calls, notifications, and so on.

The SIM card interface 195 is used to connect SIM cards. A SIM card can be inserted into or pulled out of the SIM card interface 195 to make contact with or separate from the electronic device 100. The electronic device 100 can support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time; the types of these cards can be the same or different. The SIM card interface 195 can also be compatible with different types of SIM cards, and can also be compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card, implementing functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM, that is, an embedded SIM card; the eSIM card can be embedded in the electronic device 100 and cannot be separated from it.

An embodiment of this application further provides an electronic device that includes modules or units for implementing the audio processing method described in FIG. 4.

An embodiment of this application further provides a chip system. The chip system includes at least one processor, a memory, and an interface circuit, the memory and the interface circuit being connected to the at least one processor, with program instructions stored in the at least one memory; when the program instructions are executed by the processor, the audio processing method described in FIG. 4 can be implemented.

An embodiment of this application further provides a computer-readable storage medium storing program instructions which, when run by a processor, implement the audio processing method described in FIG. 4.

In the above embodiments, all or some of the functions can be implemented by software, hardware, or a combination of software and hardware. When implemented using software, they can be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are generated wholly or partly. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions can be stored in a computer-readable storage medium, which can be any usable medium the computer can access, or a data storage device such as a server or data center integrating one or more usable media. The usable medium can be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)), and so on.

A person of ordinary skill in the art can understand that all or some of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, random access memory (RAM), magnetic disks, or optical discs.

Claims (26)

  1. An audio processing method, characterized in that the method comprises:
    performing image recognition on a first image obtained by a camera component to obtain a target type of a photographed target in the first image, a direction of the photographed target relative to a microphone, and a distance of the photographed target relative to the microphone;
    determining an audio processing strategy according to the target type of the photographed target, the direction of the photographed target relative to the microphone, and the distance of the photographed target relative to the microphone;
    processing, according to the audio processing strategy, an audio signal picked up by the microphone.
  2. The method according to claim 1, characterized in that the determining an audio processing strategy according to the target type of the photographed target, the direction of the photographed target relative to the microphone, and the distance of the photographed target relative to the microphone comprises:
    determining a direction of spatial enhancement according to the direction of the photographed target relative to the microphone;
    determining a filter according to the target type of the photographed target;
    determining a first gain control curve and a first equalizer frequency-response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone;
    the audio processing strategy comprising the direction of spatial enhancement, the filter, the first gain control curve, and the first equalizer frequency-response curve.
  3. The method according to claim 1 or 2, characterized in that the processing, according to the audio processing strategy, the audio signal picked up by the microphone comprises: performing spatial enhancement, filtering, gain control, and equalizer frequency-response control on the audio signal picked up by the microphone according to the audio processing strategy.
  4. The audio processing method according to claim 3, characterized in that the performing spatial enhancement, filtering, gain control, and equalizer frequency-response control on the audio signal picked up by the microphone according to the audio processing strategy comprises:
    performing spatial enhancement on an original audio signal in the direction of spatial enhancement to obtain a first audio signal, the original audio signal being the audio signal picked up by the microphone;
    filtering the first audio signal with the filter to obtain a second audio signal;
    performing gain control on the second audio signal with the first gain control curve to obtain a third audio signal;
    performing equalizer frequency-response control on the third audio signal with the first equalizer frequency-response curve to obtain a fourth audio signal.
  5. The audio processing method according to claim 4, characterized in that after the performing spatial enhancement, filtering, gain control, and equalizer frequency-response control on the audio signal picked up by the microphone according to the audio processing strategy, the method further comprises:
    superimposing the audio signal picked up by the microphone with the fourth audio signal to obtain a fifth audio signal.
  6. The audio processing method according to any one of claims 1 to 5, characterized in that the processing, according to the audio processing strategy, the audio signal picked up by the microphone comprises:
    determining an original audio signal of each of multiple sound channels according to the audio signal picked up by the microphone;
    processing the original audio signal of each channel according to the audio processing strategy.
  7. The audio processing method according to any one of claims 1 to 6, characterized in that before the processing, according to the audio processing strategy, the audio signal picked up by the microphone, the method further comprises:
    displaying the audio processing strategy;
    the processing, according to the audio processing strategy, the audio signal picked up by the microphone comprising:
    in response to a user operation on the audio processing strategy, processing the audio signal picked up by the microphone according to the audio processing strategy.
  8. An audio processing method, characterized in that the method comprises:
    performing image recognition on a first image obtained by a camera component to obtain a target type of a photographed target in the first image;
    determining a filter according to the target type of the photographed target;
    filtering, with the filter, an audio signal picked up by a microphone.
  9. The audio processing method according to claim 8, characterized in that before the filtering, with the filter, the audio signal picked up by the microphone, the method further comprises: obtaining, according to the image recognition, a direction of the photographed target relative to the microphone;
    performing spatial enhancement on an original audio signal in the direction of the photographed target relative to the microphone to obtain a first audio signal, the original audio signal being the audio signal picked up by the microphone;
    the filtering, with the filter, the audio signal picked up by the microphone comprising:
    filtering the first audio signal with the filter to obtain a second audio signal.
  10. The audio processing method according to claim 8 or 9, characterized in that the method further comprises:
    obtaining, according to the image recognition, a distance of the photographed target relative to the microphone;
    determining the first gain control curve and a first equalizer frequency-response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone;
    after the filtering, with the filter, the audio signal picked up by the microphone, the method further comprising:
    performing gain control on a second audio signal with the first gain control curve to obtain a third audio signal, the second audio signal being the audio signal obtained by the filter filtering the audio signal picked up by the microphone;
    performing equalizer frequency-response control on the third audio signal with the first equalizer frequency-response curve to obtain a fourth audio signal.
  11. The audio processing method according to claim 10, characterized in that after the performing equalizer frequency-response control on the third audio signal with the first equalizer frequency-response curve to obtain the fourth audio signal, the method further comprises:
    superimposing the original audio signal with the fourth audio signal to obtain a fifth audio signal, the original audio signal being the audio signal picked up by the microphone.
  12. The audio processing method according to any one of claims 8 to 11, characterized in that the filtering, with the filter, the audio signal picked up by the microphone comprises:
    determining an original audio signal of each of multiple sound channels according to the audio signal picked up by the microphone;
    processing the original audio signal of each channel, the processing comprising filtering with the filter.
  13. The audio processing method according to any one of claims 8 to 12, characterized in that before the filtering, with the filter, the audio signal picked up by the microphone, the method further comprises:
    displaying the audio processing strategy;
    the filtering, with the filter, the audio signal picked up by the microphone comprising:
    in response to a user operation on the audio processing strategy, filtering, with the filter, the audio signal picked up by the microphone.
  14. An electronic device, characterized in that the electronic device comprises a processor and a memory, the memory being configured to store program instructions, and the processor invoking the program instructions to:
    perform image recognition on a first image obtained by a camera component to obtain a target type of a photographed target in the first image, a direction of the photographed target relative to a microphone, and a distance of the photographed target relative to the microphone;
    determine an audio processing strategy according to the target type of the photographed target, the direction of the photographed target relative to the microphone, and the distance of the photographed target relative to the microphone;
    process, according to the audio processing strategy, an audio signal picked up by the microphone.
  15. The electronic device according to claim 14, characterized in that the processor invokes the program instructions to:
    determine a direction of spatial enhancement according to the direction of the photographed target relative to the microphone;
    determine a filter according to the target type of the photographed target;
    determine a first gain control curve and a first equalizer frequency-response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone;
    the audio processing strategy comprising the direction of spatial enhancement, the filter, the first gain control curve, and the first equalizer frequency-response curve.
  16. The electronic device according to claim 14 or 15, characterized in that the processor invokes the program instructions to: perform spatial enhancement, filtering, gain control, and equalizer frequency-response control on the audio signal picked up by the microphone according to the audio processing strategy.
  17. The electronic device according to claim 16, characterized in that the processor invokes the program instructions to:
    perform spatial enhancement on an original audio signal in the direction of spatial enhancement to obtain a first audio signal, the original audio signal being the audio signal picked up by the microphone;
    filter the first audio signal with the filter to obtain a second audio signal;
    perform gain control on the second audio signal with the first gain control curve to obtain a third audio signal;
    perform equalizer frequency-response control on the third audio signal with the first equalizer frequency-response curve to obtain a fourth audio signal.
  18. The electronic device according to claim 16 or 17, characterized in that after the processor performs spatial enhancement, filtering, gain control, and equalizer frequency-response control on the audio signal picked up by the microphone according to the audio processing strategy, the processor invokes the program instructions to:
    superimpose the audio signal picked up by the microphone with a fourth audio signal to obtain a fifth audio signal, the fourth audio signal being the audio signal obtained after the signal picked up by the microphone undergoes spatial enhancement, filtering, gain control, and equalizer frequency-response control.
  19. The electronic device according to any one of claims 14 to 18, characterized in that the processor invokes the program instructions to:
    determine an original audio signal of each of multiple sound channels according to the audio signal picked up by the microphone;
    process the original audio signal of each channel according to the audio processing strategy.
  20. The electronic device according to any one of claims 14 to 19, characterized in that the electronic device further comprises a display; before the processor processes, according to the audio processing strategy, the audio signal picked up by the microphone, the display is configured to display the audio processing strategy;
    the processor invoking the program instructions to:
    in response to a user operation on the audio processing strategy, process the audio signal picked up by the microphone according to the audio processing strategy.
  21. An electronic device, characterized in that the electronic device comprises a processor and a memory, the memory being configured to store program instructions, and the processor invoking the program instructions to:
    perform image recognition on a first image obtained by a camera component to obtain a target type of a photographed target in the first image;
    determine a filter according to the target type of the photographed target;
    filter, with the filter, an audio signal picked up by a microphone.
  22. The electronic device according to claim 21, characterized in that before the processor filters, with the filter, the audio signal picked up by the microphone, the processor invokes the program instructions to:
    obtain, according to the image recognition, a direction of the photographed target relative to the microphone;
    perform spatial enhancement on an original audio signal in the direction of the photographed target relative to the microphone to obtain a first audio signal, the original audio signal being the audio signal picked up by the microphone;
    the processor being further configured to invoke the program instructions to perform the following operation:
    filter the first audio signal with the filter to obtain a second audio signal.
  23. The electronic device according to claim 21 or 22, characterized in that the processor invokes the program instructions to:
    obtain, according to the image recognition, a distance of the photographed target relative to the microphone;
    determine the first gain control curve and a first equalizer frequency-response curve according to the target type of the photographed target and the distance of the photographed target relative to the microphone;
    after the processor filters, with the filter, the audio signal picked up by the microphone, the processor being further configured to invoke the program instructions to perform the following operations:
    perform gain control on a second audio signal with the first gain control curve to obtain a third audio signal, the second audio signal being the audio signal obtained by the filter filtering the audio signal picked up by the microphone;
    perform equalizer frequency-response control on the third audio signal with the first equalizer frequency-response curve to obtain a fourth audio signal.
  24. The electronic device according to claim 23, characterized in that after the processor performs equalizer frequency-response control on the third audio signal with the first equalizer frequency-response curve to obtain the fourth audio signal, the processor invokes the program instructions to:
    superimpose the original audio signal with the fourth audio signal to obtain a fifth audio signal, the original audio signal being the audio signal picked up by the microphone.
  25. The electronic device according to any one of claims 21 to 24, characterized in that the processor invokes the program instructions to:
    determine an original audio signal of each of multiple sound channels according to the audio signal picked up by the microphone;
    process the original audio signal of each channel, the processing comprising filtering with the filter.
  26. The electronic device according to any one of claims 21 to 24, characterized in that the electronic device further comprises a display:
    the display being configured to display the audio processing strategy;
    the processor invoking the program instructions to:
    in response to a user operation on the audio processing strategy, filter, with the filter, the audio signal picked up by the microphone.
PCT/CN2019/110095 2018-10-15 2019-10-09 Audio processing method and electronic device WO2020078237A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811196568.X 2018-10-15
CN201811196568.XA CN111050269B (zh) Audio processing method and electronic device


Also Published As

Publication number Publication date
CN111050269A (zh) 2020-04-21
CN111050269B (zh) 2021-11-19

