WO2023124200A1 - Video processing method and electronic device - Google Patents

Video processing method and electronic device

Info

Publication number
WO2023124200A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
amplitude spectrum
electronic device
camera
energy
Prior art date
Application number
PCT/CN2022/117323
Other languages
English (en)
French (fr)
Inventor
刘镇亿
玄建永
曹国智
Original Assignee
北京荣耀终端有限公司 (Beijing Honor Device Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202210320689.0A (published as CN116405774A)
Application filed by 北京荣耀终端有限公司 (Beijing Honor Device Co., Ltd.)
Priority to EP22882090.8A (published as EP4231622A4)
Publication of WO2023124200A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/45: Generating image signals from two or more image sensors being of different type or operating in different modes, e.g. with a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images
    • H04N 23/60: Control of cameras or camera modules
    • H04N 23/62: Control of parameters via user interfaces
    • H04N 23/63: Control of cameras or camera modules by using electronic viewfinders
    • H04N 23/631: Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N 23/632: GUIs for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters
    • H04N 23/667: Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
    • H04N 23/68: Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N 23/681: Motion detection
    • H04N 23/6812: Motion detection based on additional sensors, e.g. acceleration sensors
    • H04N 23/90: Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums

Definitions

  • the present application relates to the field of video processing, and in particular, to a video processing method and electronic equipment.
  • the present application provides a video processing method and an electronic device, which can complete video recording without the need for the user to switch the shooting mode of the electronic device, thereby improving the user's shooting experience.
  • a video processing method which is applied to an electronic device, the electronic device includes at least two sound pickup devices, and the video processing method includes:
  • the first image is an image collected when the electronic device is in a first shooting mode
  • Audio data is data collected by the at least two sound pickup devices
  • the switching instruction being used to instruct the electronic device to switch from the first shooting mode to a second shooting mode
  • the second image is an image collected when the electronic device is in the second shooting mode.
  • In the embodiment of the present application, the electronic device can collect audio data in the shooting environment through at least two sound pickup devices (for example, microphones); a switching instruction is generated based on the audio data, and based on the switching instruction the electronic device automatically switches from the current first shooting mode to the second shooting mode and displays the second image collected in the second shooting mode. Without the user having to switch the shooting mode manually, the electronic device can automatically switch the shooting mode to complete the video recording, improving the user's shooting experience.
  • Since the electronic device needs to judge the directionality of the audio data, the electronic device in the embodiment of the present application includes at least two sound pickup devices; no restriction is placed on the specific number of sound pickup devices.
  • The first shooting mode may refer to a single-camera mode or any one of the multi-camera modes; the single-camera mode may include a front single-camera mode or a rear single-camera mode;
  • the multi-camera modes can include a front/rear dual-camera mode, a rear/front dual-camera mode, a picture-in-picture front-main-picture mode, or a picture-in-picture rear-main-picture mode.
  • In the front single-camera mode, a front camera in the electronic device is used for video shooting; in the rear single-camera mode, a rear camera is used; in the front-and-rear dual-camera mode, a front camera and a rear camera are used together. In the picture-in-picture front-main mode, a front camera and a rear camera are both used, the picture taken by the rear camera is embedded in the picture taken by the front camera, and the picture taken by the front camera is the main picture; in the picture-in-picture rear-main mode, a front camera and a rear camera are both used, the picture taken by the front camera is embedded in the picture taken by the rear camera, and the picture taken by the rear camera is the main picture.
  • the multi-camera mode may also include a front dual-camera mode, a rear dual-camera mode, a front picture-in-picture mode, or a rear picture-in-picture mode.
  • The first shooting mode and the second shooting mode may be the same shooting mode or different shooting modes: if the switching instruction defaults to the current shooting mode, the second shooting mode and the first shooting mode may be the same shooting mode; in other cases, the second shooting mode and the first shooting mode may be different shooting modes.
  • The electronic device includes a first camera and a second camera, and the first camera and the second camera are located in different directions of the electronic device; the obtaining of the switching instruction based on the audio data includes:
  • the audio data includes a target keyword, where the target keyword is text information corresponding to the switching instruction;
  • If the audio data includes the target keyword, the electronic device switches the shooting mode to the second shooting mode corresponding to the target keyword; if the audio data does not include the target keyword, the electronic device can obtain the switching instruction based on the audio data in the first direction and/or the audio data in the second direction. For example, if the user is in front of the electronic device, images are generally collected through the front camera; if the user's audio information exists in the forward direction of the electronic device, the user can be considered to be in the forward direction, and the front camera can be turned on at this time. Likewise, if the user is behind the electronic device, images are generally collected through the rear camera; if the user's audio information exists in the backward direction of the electronic device, the user can be considered to be in the backward direction, and the rear camera can be turned on at this time.
  • the processing the audio data to obtain the audio data in the first direction and/or the audio data in the second direction includes:
  • The probability of audio data in each direction can be calculated, so that the audio data can be separated by direction to obtain the audio data in the first direction and the audio data in the second direction; a switching instruction can then be obtained based on the audio data in the first direction and/or the audio data in the second direction, and the electronic device can automatically switch the shooting mode based on the switching instruction.
  • the obtaining the switching instruction based on the audio data in the first direction and/or the audio data in the second direction includes:
  • the switching instruction is obtained based on the energy of the first amplitude spectrum and/or the energy of the second amplitude spectrum, where the first amplitude spectrum is the amplitude spectrum of the audio data in the first direction, and the second amplitude spectrum is the amplitude spectrum of the audio data in the second direction.
  • When a video is recorded, the direction with the greater audio-data energy can generally be considered the main shooting direction, so the main shooting direction can be obtained from the energies of the amplitude spectra of the audio data in different directions. For example, if the energy of the amplitude spectrum of the audio data in the first direction is greater than the energy of the amplitude spectrum of the audio data in the second direction, the first direction can be considered the main shooting direction; at this time, the electronic device can turn on the camera corresponding to the first direction.
  • The switching instruction includes the current shooting mode, the first picture-in-picture mode, the second picture-in-picture mode, the first dual-view mode, the second dual-view mode, the single-camera mode of the first camera, or the single-camera mode of the second camera; obtaining the switching instruction based on the energy of the first amplitude spectrum and/or the energy of the second amplitude spectrum includes the following cases:
  • the switching instruction is to maintain the current shooting mode;
  • the switching instruction is to switch to the single-camera shooting mode of the first camera;
  • the switching instruction is to switch to the single-camera shooting mode of the second camera;
  • the switching instruction is to switch to the first picture-in-picture mode;
  • the switching instruction is to switch to the second picture-in-picture mode;
  • the switching instruction is to switch to the first dual-view mode;
  • the switching instruction is to switch to the second dual-view mode;
  • each case depends on how the two energies compare with a first preset threshold and a second preset threshold, where the second preset threshold is greater than the first preset threshold.
  • The first picture-in-picture mode refers to the shooting mode in which the image collected by the first camera is the main picture;
  • the second picture-in-picture mode refers to the shooting mode in which the image collected by the second camera is the main picture;
  • the first dual-view mode refers to the shooting mode in which the image captured by the first camera is located on the upper side or the left side of the display screen of the electronic device;
  • the second dual-view mode refers to the shooting mode in which the image captured by the second camera is located on the upper side or the left side of the display screen of the electronic device.
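  • The energy-threshold mode selection described above can be sketched as follows. This is a minimal illustration only: the function name, the threshold values, and the exact mapping from energy comparisons to modes are assumptions, since the text does not spell out each condition.

```python
def choose_mode(e_first, e_second, t1, t2):
    """Pick a shooting mode from the amplitude-spectrum energies of the
    first and second directions.  t1/t2 are the first/second preset
    thresholds (t2 > t1); the mapping below is illustrative only."""
    if e_first < t1 and e_second < t1:
        return "current mode"                     # neither direction active
    if e_first >= t2 and e_second < t1:
        return "single-camera mode (first camera)"
    if e_second >= t2 and e_first < t1:
        return "single-camera mode (second camera)"
    if e_first >= t2 and t1 <= e_second < t2:
        return "first picture-in-picture mode"    # first camera is the main picture
    if e_second >= t2 and t1 <= e_first < t2:
        return "second picture-in-picture mode"   # second camera is the main picture
    if e_first >= t2 and e_second >= t2:
        return "dual-view mode"                   # both directions are loud
    return "current mode"
```

For example, with t1 = 1 and t2 = 5, energies (10, 0.5) would select the first camera's single-camera mode, while (10, 10) would select a dual-view mode.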
  • the first amplitude spectrum is a first average amplitude spectrum obtained by averaging amplitude spectra corresponding to each frequency point in the audio data in the first direction; and /or,
  • the second amplitude spectrum is a second average amplitude spectrum obtained by averaging amplitude spectra corresponding to each frequency point in the audio data in the second direction.
  • The amplitude spectrum obtained after averaging the amplitude spectra of different frequency points in the audio data in the first direction may be called the first average amplitude spectrum; the amplitude spectrum obtained after averaging the amplitude spectra of different frequency points in the audio data in the second direction may be called the second average amplitude spectrum. Since the first average amplitude spectrum and/or the second average amplitude spectrum are obtained by averaging the amplitude spectra over different frequency points, the accuracy of the information in the audio data in the first direction and/or the second direction can be improved.
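  • As a sketch, averaging the amplitude spectrum over frequency points might look like this (the FFT size, window, and sample rate are illustrative assumptions, not fixed by the text):

```python
import numpy as np

def average_amplitude_spectrum(frame):
    """Return the average, over all frequency points, of the amplitude
    spectrum of one audio frame (magnitude of a windowed FFT)."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    amplitude = np.abs(spectrum)      # amplitude at each frequency point
    return amplitude.mean()           # the averaged amplitude-spectrum value

# A tone yields a larger averaged amplitude than silence.
t = np.arange(1024) / 16000.0
tone = np.sin(2 * np.pi * 440 * t)
silence = np.zeros(1024)
```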
  • the first amplitude spectrum is an amplitude spectrum obtained after performing the first amplification processing and/or the second amplification processing on the first average amplitude spectrum, and the first average amplitude spectrum is obtained by averaging the amplitude spectra corresponding to each frequency point in the audio data in the first direction.
  • the video processing method further includes:
  • if the first detection result indicates that the audio data in the first direction includes audio information of the user, the first amplification process is performed on the amplitude spectrum of the audio data in the first direction; and/or,
  • if the predicted angle information includes angle information in the first preset angle range, the second amplification process is performed on the amplitude spectrum of the audio data in the first direction.
  • the second amplitude spectrum is an amplitude spectrum obtained after performing the first amplification processing and/or the second amplification processing on the second average amplitude spectrum, and the second average amplitude spectrum is obtained by averaging the amplitude spectra corresponding to each frequency point in the audio data in the second direction.
  • the video processing method further includes:
  • if the second detection result indicates that the audio data in the second direction includes audio information of the user, the first amplification process is performed on the amplitude spectrum of the audio data in the second direction; and/or,
  • if the predicted angle information includes angle information in the second preset angle range, the second amplification process is performed on the amplitude spectrum of the audio data in the second direction.
  • When a video is recorded, the direction in which the user is located can usually be considered the main shooting direction; if the detection result indicates that a direction includes the user's audio information, the user can be considered to be in that direction;
  • the audio data is then subjected to the first amplification process, which can improve the accuracy of the acquired user audio information.
  • Direction of arrival estimation refers to an algorithm that performs a spatial Fourier transform on the received signal, takes the squared modulus to obtain the spatial spectrum, and thereby estimates the direction of arrival of the signal.
  • The direction in which the user is located can usually be considered the main shooting direction; if the detection result indicates that a direction includes the user's audio information, the user can be considered to be in that direction;
  • the audio data is subjected to the first amplification process, which can improve the accuracy of the acquired user audio information. When the predicted angle information includes the first preset angle range and/or the second preset angle range, it indicates that there is audio information in the forward first direction and/or the rear second direction of the electronic device; the second amplification process can then improve the accuracy of the first amplitude spectrum or the second amplitude spectrum. When the accuracy of the amplitude spectrum and of the user audio information is improved, the switching instruction can be obtained accurately.
  • the identifying whether the audio data includes the target keyword includes:
  • The audio data collected by the at least two sound pickup devices can first be separated to obtain N pieces of audio information from different sources; whether the target keyword is included in the N pieces of audio information is then identified, which can improve the accuracy of identifying the target keyword.
  • the first image is a preview image captured when the electronic device is in multi-lens video recording.
  • the first image is a video frame captured by the electronic device when it is in multi-lens video recording.
  • the audio data refers to data collected by the sound pickup device in a shooting environment where the electronic device is located.
  • In a second aspect, an electronic device is provided, including one or more processors, a memory, and at least two sound pickup devices; the memory is coupled with the one or more processors and is used to store computer program code, the computer program code includes computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to perform:
  • the first image is an image collected when the electronic device is in a first shooting mode
  • Audio data is data collected by the at least two sound pickup devices
  • the switching instruction being used to instruct the electronic device to switch from the first shooting mode to a second shooting mode
  • the second image is an image collected when the electronic device is in the second shooting mode.
  • the electronic device includes a first camera and a second camera, and the first camera and the second camera are located in different directions of the electronic device;
  • the one or more processors invoke the computer instructions to cause the electronic device to perform:
  • the audio data includes a target keyword, where the target keyword is text information corresponding to the switching instruction;
  • the processing the audio data to obtain the audio data in the first direction and/or the audio data in the second direction includes:
  • the one or more processors call the computer instructions so that the electronic device executes:
  • the switching instruction is obtained based on the energy of the first amplitude spectrum and/or the energy of the second amplitude spectrum, where the first amplitude spectrum is the amplitude spectrum of the audio data in the first direction, and the second amplitude spectrum is the amplitude spectrum of the audio data in the second direction.
  • The switching instruction includes the current shooting mode, the first picture-in-picture mode, the second picture-in-picture mode, the first dual-view mode, the second dual-view mode, the single-camera mode of the first camera, or the single-camera mode of the second camera; the one or more processors call the computer instructions to make the electronic device execute:
  • the switching instruction is to maintain the current shooting mode;
  • the switching instruction is to switch to the single-camera shooting mode of the first camera;
  • the switching instruction is to switch to the single-camera shooting mode of the second camera;
  • the switching instruction is to switch to the first picture-in-picture mode;
  • the switching instruction is to switch to the second picture-in-picture mode;
  • the switching instruction is to switch to the first dual-view mode;
  • the switching instruction is to switch to the second dual-view mode;
  • each case depends on how the two energies compare with a first preset threshold and a second preset threshold, where the second preset threshold is greater than the first preset threshold.
  • The first picture-in-picture mode refers to the shooting mode in which the image collected by the first camera is the main picture;
  • the second picture-in-picture mode refers to the shooting mode in which the image collected by the second camera is the main picture;
  • the first dual-view mode refers to the shooting mode in which the image captured by the first camera is located on the upper side or the left side of the display screen of the electronic device;
  • the second dual-view mode refers to the shooting mode in which the image captured by the second camera is located on the upper side or the left side of the display screen of the electronic device.
  • the first amplitude spectrum is a first average amplitude spectrum obtained by averaging amplitude spectra corresponding to each frequency point in the audio data in the first direction; and /or,
  • the second amplitude spectrum is a second average amplitude spectrum obtained by averaging amplitude spectra corresponding to each frequency point in the audio data in the second direction.
  • the first amplitude spectrum is an amplitude spectrum obtained after performing the first amplification processing and/or the second amplification processing on the first average amplitude spectrum, and the first average amplitude spectrum is obtained by averaging the amplitude spectra corresponding to each frequency point in the audio data in the first direction.
  • the one or more processors call the computer instructions so that the electronic device executes:
  • if the first detection result indicates that the audio data in the first direction includes audio information of the user, the first amplification process is performed on the amplitude spectrum of the audio data in the first direction; and/or,
  • if the predicted angle information includes angle information in the first preset angle range, the second amplification process is performed on the amplitude spectrum of the audio data in the first direction.
  • the second amplitude spectrum is an amplitude spectrum obtained after performing the first amplification processing and/or the second amplification processing on the second average amplitude spectrum, and the second average amplitude spectrum is obtained by averaging the amplitude spectra corresponding to each frequency point in the audio data in the second direction.
  • the one or more processors call the computer instructions so that the electronic device executes:
  • if the second detection result indicates that the audio data in the second direction includes audio information of the user, the first amplification process is performed on the amplitude spectrum of the audio data in the second direction; and/or,
  • if the predicted angle information includes angle information in the second preset angle range, the second amplification process is performed on the amplitude spectrum of the audio data in the second direction.
  • the one or more processors call the computer instructions so that the electronic device executes:
  • the first image is a preview image captured when the electronic device is in multi-lens video recording.
  • the first image is a video frame captured by the electronic device when it is in multi-lens video recording.
  • the audio data refers to data collected by the sound pickup device in the shooting environment where the electronic device is located.
  • In a third aspect, an electronic device is provided, including a module/unit for executing any video processing method in the first aspect.
  • In a fourth aspect, an electronic device is provided, including one or more processors and a memory; the memory is coupled to the one or more processors and is used to store computer program code, the computer program code includes computer instructions, and the one or more processors call the computer instructions to make the electronic device execute any method in the first aspect.
  • In a further aspect, a chip system is provided, which is applied to an electronic device; the chip system includes one or more processors, and the processors are used to call computer instructions so that the electronic device executes any method in the first aspect.
  • In a further aspect, a computer-readable storage medium is provided, which stores computer program code; when the computer program code is run by an electronic device, the electronic device is caused to execute any method in the first aspect.
  • In a further aspect, a computer program product is provided, comprising computer program code; when the computer program code is run by an electronic device, the electronic device is caused to execute any method in the first aspect.
  • In the embodiments of the present application, the electronic device can collect audio data in the shooting environment through at least two sound pickup devices (for example, microphones); a switching instruction is generated based on the audio data, and based on the switching instruction the electronic device automatically switches from the current first shooting mode to the second shooting mode and displays the second image collected in the second shooting mode. Without the user having to switch the shooting mode manually, the electronic device can automatically switch the shooting mode to complete the video recording, improving the user's shooting experience.
  • FIG. 1 is a schematic diagram of a hardware system applicable to an electronic device of the present application
  • FIG. 2 is a schematic diagram of a software system applicable to the electronic device of the present application
  • FIG. 3 is a schematic diagram of an application scenario applicable to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an application scenario applicable to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an application scenario applicable to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an application scenario applicable to an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a video processing method provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a video processing method provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a video processing method provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a target angle of an electronic device provided in an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a method for identifying a switching instruction provided in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a direction of arrival estimation provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a graphical user interface applicable to an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a graphical user interface applicable to an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a graphical user interface applicable to an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The Fourier transform is a linear integral transform used to represent the transformation of a signal between the time domain (or spatial domain) and the frequency domain.
  • FFT refers to the fast algorithm of discrete Fourier transform, which can transform a signal from the time domain to the frequency domain.
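  • For instance, the FFT of a pure tone concentrates its energy at the tone's frequency bin. A small numpy illustration (the sample rate and tone frequency are arbitrary):

```python
import numpy as np

sr = 8000                                  # sample rate in Hz (illustrative)
t = np.arange(sr) / sr                     # one second of samples
signal = np.sin(2 * np.pi * 1000 * t)      # a 1 kHz tone in the time domain

spectrum = np.fft.rfft(signal)             # FFT: time domain -> frequency domain
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
peak_hz = freqs[np.argmax(np.abs(spectrum))]
# The spectral peak lands on the 1 kHz bin.
```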
  • Blind signal separation refers to algorithms for recovering independent source signals from an acquired mixture of signals (usually the outputs of multiple sensors).
  • Beam results at different angles can be obtained based on the frequency-domain signals, which are obtained by applying the FFT to the input signals collected by the sound pickup devices (for example, microphones), and the filter coefficients at different angles:
  • y(ω) = Σ_{i=1..M} w_i*(ω)·x_i(ω), where y(ω) represents the beam result at a given angle; w_i(ω) represents the filter coefficient of that angle for the i-th sound pickup device; x_i(ω) represents the frequency-domain signal obtained by applying the FFT to the input signal of the i-th sound pickup device; and M represents the number of microphones.
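  • The filter-and-sum formula above can be written directly in numpy. This is a sketch only: how the filter coefficients w_i(ω) are designed for each angle is not specified in the text.

```python
import numpy as np

def beam_result(X, W):
    """y(w) = sum over i of conj(w_i(w)) * x_i(w).

    X: (M, F) array of frequency-domain microphone signals (FFT outputs).
    W: (M, F) array of filter coefficients for one steering angle.
    Returns the (F,) beam result y(w) for that angle."""
    return np.sum(np.conj(W) * X, axis=0)

# With unit filters, the beam result is simply the sum over microphones.
X = np.ones((2, 4), dtype=complex)
y = beam_result(X, np.ones((2, 4), dtype=complex))
```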
  • Voice activity detection (VAD)
  • Voice activity detection is a technique used in speech processing to detect the presence or absence of a speech signal.
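  • A minimal energy-based detector illustrates the idea (real VADs also use spectral and statistical features; the threshold here is arbitrary):

```python
import numpy as np

def vad(frame, threshold=0.01):
    """Return True if the frame's mean energy suggests speech activity."""
    energy = np.mean(np.asarray(frame, dtype=float) ** 2)
    return energy > threshold

active = vad(0.5 * np.ones(160))   # loud frame: energy 0.25 > threshold
silent = vad(np.zeros(160))        # silent frame: energy 0 <= threshold
```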
  • Direction of arrival estimation refers to an algorithm that performs a spatial Fourier transform on the received signal, takes the squared modulus to obtain the spatial spectrum, and thereby estimates the direction of arrival of the signal.
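  • A sketch of that idea for a uniform linear array at half-wavelength spacing (the array geometry and the angle grid are assumptions; the text does not specify them):

```python
import numpy as np

def spatial_spectrum(snapshot, n_angles=181):
    """Spatial spectrum of one narrowband snapshot from M microphones:
    a steered spatial Fourier transform followed by the squared modulus."""
    M = len(snapshot)
    angles = np.linspace(-90.0, 90.0, n_angles)
    m = np.arange(M)
    # Steering matrix: per-sensor phase pi * m * sin(theta) at
    # half-wavelength spacing.
    A = np.exp(-1j * np.pi * np.outer(m, np.sin(np.radians(angles))))
    return angles, np.abs(A.conj().T @ snapshot) ** 2

# A plane wave arriving from 30 degrees peaks near 30 in the spectrum.
m = np.arange(8)
snapshot = np.exp(-1j * np.pi * m * np.sin(np.radians(30.0)))
angles, spec = spatial_spectrum(snapshot)
doa = angles[np.argmax(spec)]
```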
  • TDOA is used to represent the time difference between the arrival of a sound source at different microphones in an electronic device.
  • GCC-PHAT (generalized cross-correlation with phase transform) is an algorithm for calculating the angle of arrival (AOA), as shown in Figure 12.
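  • A common textbook form of GCC-PHAT for two microphones, sketched below; the parameters are illustrative, not the patent's exact procedure:

```python
import numpy as np

def gcc_phat(sig, ref):
    """Estimate the delay (in samples) of sig relative to ref using
    generalized cross-correlation with phase-transform weighting."""
    n = len(sig) + len(ref)
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-15              # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)           # generalized cross-correlation
    shift = int(np.argmax(np.abs(cc)))
    return shift - n if shift > n // 2 else shift

# A 5-sample delayed copy of white noise yields a TDOA of 5 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(512)
delayed = np.concatenate([np.zeros(5), x[:-5]])
tdoa = gcc_phat(delayed, x)
```

The estimated delay, combined with the microphone spacing and the speed of sound, gives the angle of arrival.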
  • ESPRIT (estimation of signal parameters via rotational invariance techniques) is an algorithm whose principle is to estimate the signal parameters based on the rotational invariance of the signal subspace.
  • The principle of the positioning algorithm of the controllable (steered) beamforming method is to filter, weight, and sum the signals received by the microphones to form a beam, and to search for the sound source position according to a certain rule; the searched sound source position is taken as the real sound source direction.
  • The cepstrum algorithm is a method in signal processing and signal detection; the so-called cepstrum is the power spectrum of the logarithmic power spectrum of the signal;
  • for voiced speech, the cepstrum contains periodic impulses, from which the pitch period can be obtained;
  • the second impulse in the cepstrum waveform (the first is the envelope information) is considered to correspond to the fundamental frequency of the excitation source.
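  • A cepstral pitch estimate can be sketched as follows; the frame length, window, and quefrency search bounds are illustrative assumptions:

```python
import numpy as np

def cepstral_pitch(frame, sr, fmin=60, fmax=400):
    """Estimate the fundamental frequency from the cepstrum: the impulse
    after the low-quefrency envelope marks the pitch period."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    log_power = np.log(spectrum ** 2 + 1e-12)   # logarithmic power spectrum
    cepstrum = np.fft.irfft(log_power)          # spectrum of the log spectrum
    lo, hi = int(sr / fmax), int(sr / fmin)     # plausible pitch periods
    period = lo + int(np.argmax(cepstrum[lo:hi]))
    return sr / period                          # fundamental frequency in Hz

# A 200 Hz harmonic signal serves as a crude voiced-speech stand-in.
sr = 16000
t = np.arange(2048) / sr
voiced = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 8))
f0 = cepstral_pitch(voiced, sr)
```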
  • IDFT refers to the inverse discrete Fourier transform, which is the inverse process of the discrete Fourier transform.
  • Angular central Gaussian mixture model (complex angular central gaussian mixture model, cACGMM)
  • cACGMM is a complex angular central Gaussian mixture model;
  • a Gaussian mixture model quantifies things with Gaussian probability density functions (for example, normal distribution curves), decomposing a thing into a model formed from several Gaussian probability density functions.
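  • A one-dimensional illustration of the mixture idea: the cACGMM used for per-direction probabilities is complex-valued, but the posterior computation has the same shape. All numbers here are made up.

```python
import numpy as np

def gmm_posteriors(x, means, stds, weights):
    """Posterior probability of each Gaussian component for a sample x."""
    densities = weights * np.exp(-0.5 * ((x - means) / stds) ** 2) \
        / (stds * np.sqrt(2.0 * np.pi))
    return densities / densities.sum()   # normalize to a probability vector

# A sample at the second component's mean is assigned almost entirely to it.
post = gmm_posteriors(5.0,
                      means=np.array([0.0, 5.0]),
                      stds=np.array([1.0, 1.0]),
                      weights=np.array([0.5, 0.5]))
```

In direction separation, the analogous posteriors serve as the per-direction probabilities of each time-frequency point.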
  • The magnitude spectrum can be obtained by taking the modulus of the complex frequency-domain signal.
  • Multi-lens video recording can refer to a camera mode, similar to video recording or photo shooting, in the camera application; multi-lens video recording can include multiple different shooting modes. For example, as shown in (b) in Figure 4, the shooting modes may include but are not limited to: front/rear dual-camera mode, rear/front dual-camera mode, picture-in-picture 1 mode, picture-in-picture 2 mode, rear single-camera mode, front single-camera mode, and so on.
  • Fig. 1 shows a hardware system applicable to the electronic device of this application.
  • the electronic device 100 may be a mobile phone, a smart screen, a tablet computer, a wearable electronic device, a vehicle-mounted electronic device, an augmented reality (AR) device, a virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a projector, etc.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the audio module 170 is used to convert digital audio information into an output analog audio signal, and may also be used to convert an analog audio input into a digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 or some functional modules of the audio module 170 may be set in the processor 110 .
  • the audio module 170 may send the audio data collected by the microphone to the processor 110 .
  • the structure shown in FIG. 1 does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than those shown in FIG. 1, or the electronic device 100 may include a combination of some of the components shown in FIG. 1, or the electronic device 100 may include subcomponents of some of the components shown in FIG. 1.
  • the components shown in FIG. 1 can be realized in hardware, software, or a combination of software and hardware.
  • Processor 110 may include one or more processing units.
  • the processor 110 may include at least one of the following processing units: an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and a neural-network processing unit (NPU).
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • the memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from the memory. This avoids repeated access and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
  • the processor 110 can be used to execute the video processing method of the embodiment of the present application; for example: run the camera application program in the electronic device; display a first image, where the first image is an image collected when the electronic device is in the first shooting mode; acquire audio data, where the audio data is data collected by at least two sound pickup devices; obtain a switching instruction based on the audio data, where the switching instruction is used to instruct the electronic device to switch from the first shooting mode to the second shooting mode; and display a second image, where the second image is an image collected when the electronic device is in the second shooting mode.
  • connection relationship between the modules shown in FIG. 1 is only a schematic illustration, and does not constitute a limitation on the connection relationship between the modules of the electronic device 100 .
  • each module of the electronic device 100 may also adopt a combination of various connection modes in the foregoing embodiments.
  • the wireless communication function of the electronic device 100 may be realized by components such as the antenna 1 , the antenna 2 , the mobile communication module 150 , the wireless communication module 160 , a modem processor, and a baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the electronic device 100 can realize the display function through the GPU, the display screen 194 and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • Display 194 may be used to display images or video.
  • the electronic device 100 can realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 , and the application processor.
  • the ISP is used for processing the data fed back by the camera 193 .
  • light is transmitted through the lens to the photosensitive element of the camera, where the optical signal is converted into an electrical signal; the photosensitive element of the camera transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
  • the ISP can run algorithm optimizations on image noise, brightness, and color, and the ISP can also optimize parameters such as the exposure and color temperature of the shooting scene.
  • the ISP may be located in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard format such as red green blue (RGB) or YUV.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • the electronic device may include multiple cameras 193; the multiple cameras may include a front camera and a rear camera.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3 and MPEG4.
  • the gyro sensor 180B can be used to determine the motion posture of the electronic device 100 .
  • the angular velocity of the electronic device 100 around three axes may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. For example, when the shutter is pressed, the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shaking of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used in scenarios such as navigation and somatosensory games.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally x-axis, y-axis and z-axis). The magnitude and direction of gravity can be detected when the electronic device 100 is stationary. The acceleration sensor 180E can also be used to identify the posture of the electronic device 100 as an input parameter for application programs such as horizontal and vertical screen switching and pedometer.
  • the distance sensor 180F is used to measure distance.
  • the electronic device 100 may measure the distance by infrared or laser. In some embodiments, for example, in a shooting scene, the electronic device 100 can use the distance sensor 180F for distance measurement to achieve fast focusing.
  • the ambient light sensor 180L is used for sensing ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket, so as to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement functions such as unlocking, accessing the application lock, taking pictures, and answering incoming calls.
  • the touch sensor 180K is also referred to as a touch device.
  • the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch panel".
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor 180K may transmit the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation can be provided through the display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 and disposed at a different position from the display screen 194 .
  • the hardware system of the electronic device 100 is described in detail above, and the software system of the electronic device 100 is introduced below.
  • Fig. 2 is a schematic diagram of a software system of an electronic device provided by an embodiment of the present application.
  • the system architecture may include an application layer 210 , an application framework layer 220 , a hardware abstraction layer 230 , a driver layer 240 and a hardware layer 250 .
  • the application layer 210 can include application programs such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message; the application layer 210 can be divided into an application interface and application logic.
  • the application interface of the camera application can include single-view mode, dual-view mode, picture-in-picture mode, etc., corresponding to different video shooting modes.
  • the application framework layer 220 provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer; the application framework layer may include some predefined functions.
  • the application framework layer 220 may include a camera access interface; the camera access interface may include camera management and camera equipment.
  • the camera management can be used to provide an access interface for managing the camera; the camera device can be used to provide an interface for accessing the camera.
  • the hardware abstraction layer 230 is used to abstract hardware.
  • the hardware abstraction layer can include the camera abstraction layer and other hardware device abstraction layers; the camera hardware abstraction layer can call the camera algorithm.
  • the hardware abstraction layer 230 includes a camera hardware abstraction layer and camera algorithms; the camera algorithms may include software algorithms for video processing or image processing.
  • the algorithms in the camera algorithms may refer to implementations that do not depend on specific hardware; for example, code that can usually be run on a CPU.
  • the driver layer 240 is used to provide drivers for different hardware devices.
  • the driver layer may include camera drivers.
  • the hardware layer 250 is located at the bottom layer of the operating system; as shown in FIG. 2 , the hardware layer 250 may include camera 1 , camera 2 , camera 3 and so on. Wherein, the camera 1, the camera 2, and the camera 3 may correspond to multiple cameras on the electronic device.
  • the video processing method and the electronic device provided by the embodiments of the present application can run on the hardware abstraction layer; or, can run on the application framework layer; or, can run on the digital signal processor.
  • the switching of the shooting mode (for example, the camera) of the electronic device depends on the manual operation of the user, so the user is required to be relatively close to the electronic device during shooting; if the user is relatively far from the electronic device, the shooting mode of the electronic device must be switched based on Bluetooth technology; when switching the shooting mode based on Bluetooth technology, the corresponding operations on the lens of the electronic device must be performed through a control device. On the one hand, the operation is complicated; on the other hand, the control device is easily exposed in the video and affects the appearance of the video, resulting in a poor user experience.
  • the embodiment of the present application provides a video processing method.
  • the electronic device can obtain a switching instruction according to the audio data in the shooting environment, and can automatically switch the shooting mode of the electronic device based on the switching instruction; for example, it can automatically switch between different cameras in the electronic device; for example, the electronic device can automatically determine whether to switch cameras, whether to enable multi-lens video recording, or whether to switch between different shooting modes in multi-lens video recording, etc. This enables the video recording to be completed without the user switching the shooting mode of the electronic device, achieving a one-shot-to-the-end video experience.
  • one shot to the end means that after the user selects a certain shooting mode, the user does not need to perform corresponding operations to switch the shooting mode; the electronic device can automatically generate a switching instruction based on the audio data collected in the shooting environment, and automatically switch the shooting mode based on the switching instruction.
  • the video processing method in the embodiment of the present application can be applied to the field of recording video, the field of video calling, or other image processing fields; at least two sound pickup devices (for example, microphones) collect audio data in the shooting environment; a switching instruction is generated based on the audio data, and the electronic device automatically switches from the first shooting mode to the second shooting mode based on the switching instruction and displays the second image collected in the second shooting mode; without the user switching the shooting mode of the electronic device, the electronic device can automatically switch the shooting mode to complete video recording, thereby improving the user's shooting experience.
  • the video processing method in the embodiment of the present application may be applied to a preview state of a recorded video.
  • the electronic device is in the preview state of multi-mirror video recording, and the current shooting mode of the electronic device can default to the front/rear dual-view shooting mode, where the foreground picture can be as shown in image 251 and the background picture can be as shown in image 252; the foreground picture may refer to the image collected by the front camera of the electronic device, and the background picture may refer to the image collected by the rear camera of the electronic device; the following takes the foreground picture being image 251 and the background picture being image 252 as an example for illustration.
  • the electronic device can display a variety of different shooting modes in the multi-mirror video recording; the different shooting modes can include but are not limited to: front/rear dual-camera mode, rear/front dual-camera mode, picture-in-picture 1 mode (rear picture-in-picture mode), picture-in-picture 2 mode (front picture-in-picture mode), rear single-camera mode, front single-camera mode, etc., as shown in (b) in Figure 4; through the video processing method in the embodiment of the application, when the electronic device is in the preview state of multi-mirror video recording, at least two sound pickup devices (for example, microphones) in the electronic device collect audio data in the shooting environment; a switching instruction is generated based on the audio data, and the electronic device automatically switches from the first shooting mode to the second shooting mode based on the switching instruction, displaying the second image collected in the second shooting mode.
  • single camera mode can include front single camera mode, rear single camera mode, etc.
  • multi-camera mode can include front/rear dual-camera mode, rear/front dual-camera mode, picture-in-picture 1 mode, picture-in-picture 2 mode, etc.
  • the multi-camera mode may also include a front-facing dual-camera mode, or a rear-mounted dual-camera mode.
  • in the single-camera mode, one camera in the electronic device is used for video shooting; in the multi-camera mode, two or more cameras in the electronic device are used for video shooting.
  • in the front single-camera mode, a front camera is used for video shooting; in the rear single-camera mode, a rear camera is used for video shooting; in the front dual-camera mode, two front cameras are used for video shooting; in the rear dual-camera mode, two rear cameras are used for video shooting; in the front/rear dual-camera mode, a front camera and a rear camera are used for video shooting; in the front picture-in-picture mode, two front cameras are used for video shooting, and the picture taken by one front camera is placed in the picture taken by the other front camera; in the rear picture-in-picture mode, two rear cameras are used for video shooting, and the picture taken by one rear camera is placed in the picture taken by the other rear camera; in the front/rear picture-in-picture mode, a front camera and a rear camera are used for video shooting, and the picture taken by the front camera (or the rear camera) is placed in the picture taken by the rear camera (or the front camera).
  • FIG. 4 may show the shooting interfaces of the different shooting modes of multi-mirror video recording when the electronic device is in the vertical screen state;
  • FIG. 5 may show the shooting interfaces of the different shooting modes of multi-mirror video recording when the electronic device is in the horizontal screen state;
  • wherein, (a) in Figure 4 corresponds to (a) in Figure 5, and (b) in Figure 4 corresponds to (b) in Figure 5; the electronic device can determine whether to use vertical screen display or horizontal screen display according to the state in which the user uses the electronic device.
  • the video processing method in the embodiment of the present application may be applied in the process of recording video.
  • the electronic device is in the video recording state of multi-mirror video recording, and the current shooting mode of the electronic device can default to the front/rear dual-view shooting mode, as shown in (a) in Figure 6; at the 5th second of recording the video, after detecting the operation on the control 270 of the shooting mode of the multi-camera video, the electronic device can display multiple different shooting modes in the multi-camera video, as shown in (b) in Figure 6;
  • through the video processing method in the embodiment of the present application, when the electronic device is in the recording state of multi-mirror video recording, at least two sound pickup devices (for example, microphones) in the electronic device collect audio data in the shooting environment; a switching instruction is generated based on the audio data, and the electronic device automatically switches from the first shooting mode to the second shooting mode based on the switching instruction and displays the second image collected in the second shooting mode; for example, assume that the electronic device is currently recording a video and that the default shooting mode when the electronic device starts recording is the front/rear dual-camera mode.
  • the video processing method in the embodiment of the present application can also be applied to: video calls, video conferencing applications, long and short video applications, live video applications, video online class applications, portrait intelligent movement Mirror application scenarios, system camera recording function recording video, video surveillance, or smart peephole and other shooting scenarios.
  • the video processing method in the embodiment of the present application can also be applied to the video recording state of the electronic device; for example, when the electronic device is in the video recording state, the default rear single-shot mode can be adopted, and at least two sound pickup devices (for example, microphones) in the electronic device collect audio data in the shooting environment; a switching instruction is generated based on the audio data, and the electronic device can automatically switch from the rear single-shot mode to the front single-shot mode based on the switching instruction; or, based on the switching instruction, the electronic device can automatically switch from the single-shot mode to the multi-shot mode and display the second image captured in the second shooting mode; the second image can be a preview image, or the second image can also be a video frame.
  • the video processing method in the embodiment of the present application can also be applied to the field of photographing; for example, when the electronic device is in the video recording state, the default rear single-shot shooting mode can be adopted, and at least two sound pickup devices (for example, microphones) in the electronic device collect audio data in the shooting environment; a switching instruction is generated based on the audio data, and the electronic device automatically switches from the rear single-shot mode to the front single-shot mode based on the switching instruction and displays the second image captured in the second shooting mode; the second image may be a preview image, or the second image may also be a video frame.
  • Fig. 7 is a schematic flowchart of a video processing method provided by an embodiment of the present application.
  • the video processing method 300 may be executed by the electronic device shown in FIG. 1; the video processing method includes steps S310 to S350, and the steps S310 to S350 will be described in detail below.
  • Step S310 running the camera application program of the electronic device.
  • the user may instruct the electronic device to run the camera application by clicking the icon of the "camera” application.
  • the user may instruct the electronic device to run the camera application by sliding right on the display screen of the electronic device.
  • the electronic device is in a locked screen state, and the lock screen interface includes an icon of the camera application program, and the user instructs the electronic device to run the camera application program by clicking the icon of the camera application program.
  • when an application has the permission to call the camera application program, the user can instruct the electronic device to run the camera application program by clicking a corresponding control.
  • the electronic device is running an instant messaging application, the user may instruct the electronic device to run the camera application by selecting a control of the camera function.
  • Step S320 displaying the first image.
  • the first image is an image collected when the electronic device is in the first shooting mode.
  • the first shooting mode may refer to single-shot mode or any one of multi-shot modes; wherein, single-shot mode may include front single-shot mode or rear single-shot mode; multi-shot mode may include front /Rear dual camera mode, rear/front dual camera mode, picture-in-picture front main picture mode, or picture-in-picture rear main picture mode.
  • in the front single-camera mode, a front camera in the electronic device is used for video shooting; in the rear single-camera mode, a rear camera in the electronic device is used for video shooting; in the front/rear dual-camera mode, a front camera and a rear camera are used for video shooting; in the picture-in-picture front main picture mode, a front camera and a rear camera are used for video shooting, the picture taken by the rear camera is placed in the picture taken by the front camera, and the picture taken by the front camera is the main picture; in the picture-in-picture rear main picture mode, a front camera and a rear camera are used for video shooting, the picture taken by the front camera is placed in the picture taken by the rear camera, and the picture taken by the rear camera is the main picture.
  • the multi-camera mode may also include a front dual-camera mode, a rear dual-camera mode, a front picture-in-picture mode, or a rear picture-in-picture mode.
  • the first image is a preview image.
  • the first image is a video frame.
  • Step S330 acquiring audio data.
  • the audio data is data collected by at least two sound pickup devices in the electronic device; for example, data collected by at least two microphones.
  • since the electronic device needs to judge the directionality of the audio data, the electronic device in the embodiment of the present application includes at least two sound pickup devices, and no restriction is placed on the specific number of sound pickup devices.
  • the electronic equipment includes three sound pickup devices.
  • the audio data may refer to data collected by a sound pickup device in a shooting environment where the electronic device is located.
  • Step S340 obtaining a switching instruction based on the audio data.
  • the switching instruction is used to instruct the electronic device to switch from the first shooting mode to the second shooting mode.
  • the first shooting mode and the second shooting mode may be the same shooting mode or different shooting modes; if the switching instruction indicates keeping the current shooting mode by default, the second shooting mode and the first shooting mode may be the same shooting mode, as indicated by identifier 0 in Table 1; in other cases, the second shooting mode and the first shooting mode may be different shooting modes, as indicated by identifiers 1 to 6 in Table 1.
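Table 1 itself is not reproduced in this excerpt; the mapping below is a purely hypothetical illustration of how an identifier carried by the switching instruction could be dispatched to a shooting mode, using the modes listed earlier (identifier 0 keeps the current mode, per the text above):

```python
# Hypothetical identifier-to-mode table; the real Table 1 may differ.
SHOOTING_MODES = {
    1: "front/rear dual-camera mode",
    2: "rear/front dual-camera mode",
    3: "picture-in-picture 1 mode",
    4: "picture-in-picture 2 mode",
    5: "rear single-camera mode",
    6: "front single-camera mode",
}

def apply_switch(identifier, current_mode):
    """Return the second shooting mode for a switching-instruction identifier."""
    if identifier == 0:
        return current_mode            # keep the first shooting mode
    return SHOOTING_MODES[identifier]

print(apply_switch(0, "front single-camera mode"))  # mode unchanged
```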
  • Step S350, displaying the second image.
  • the second image is an image collected when the electronic device is in the second shooting mode.
  • the electronic device can collect audio data in the shooting environment through at least two sound pickup devices (for example, microphones); a switching instruction is generated based on the audio data, and the electronic device automatically switches from the current first shooting mode to the second shooting mode based on the switching instruction and displays the second image collected in the second shooting mode; without the user switching the shooting mode of the electronic device, the electronic device can automatically switch the shooting mode to complete the video recording, improving the shooting experience of the user.
  • the electronic device may include a first camera (for example, a front camera) and a second camera (for example, a rear camera), and the first camera and the second camera may be located in different directions of the electronic device; obtaining the switching instruction includes:
  • obtaining a switching instruction based on a target keyword in the audio data; or
  • processing the audio data to obtain audio data of a first direction and/or audio data of a second direction, where the first direction is used to represent a first preset angle range corresponding to the first camera and the second direction is used to represent a second preset angle range corresponding to the second camera; and obtaining a switching instruction based on the audio data of the first direction and/or the audio data of the second direction.
  • if the audio data includes the target keyword, the electronic device will switch the shooting mode to the second shooting mode corresponding to the target keyword; if the audio data does not include the target keyword, the electronic device can obtain the switching instruction based on the audio data of the first direction and/or the audio data of the second direction; for example, if the user is in front of the electronic device, the image is generally collected through the front camera; if the user's audio information exists in the forward direction of the electronic device, it can be considered that the user is in the forward direction of the electronic device, and the front camera can be turned on at this time; if the user is behind the electronic device, the image is generally collected through the rear camera; if the user's audio information exists in the backward direction of the electronic device, it can be considered that the user is in the backward direction of the electronic device, and the rear camera can be turned on at this time.
  • the target keyword may include but not limited to: front camera, rear camera, front video, rear video, dual-view video, picture-in-picture video, etc.;
  • the first direction may refer to the forward direction of the electronic device,
  • the first preset angle range may be -30 degrees to 30 degrees;
  • the second direction may be the backward direction of the electronic device, and the second preset angle range may be 150 degrees to 210 degrees, as shown in FIG. 10 .
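As an illustrative sketch (not part of the claimed method), the forward and backward preset angle ranges above can be checked with a small helper; the function name and the wrap-around handling at 0 degrees are assumptions:

```python
# Preset angle ranges from the description (degrees).
FORWARD_RANGE = (-30, 30)    # first preset angle range (forward direction)
BACKWARD_RANGE = (150, 210)  # second preset angle range (backward direction)

def classify_direction(angle_deg):
    """Map an estimated sound angle to 'forward', 'backward', or 'other'."""
    a = angle_deg % 360  # normalize to [0, 360)
    # The forward range wraps around 0 degrees: [330, 360) U [0, 30].
    if a <= FORWARD_RANGE[1] or a >= FORWARD_RANGE[0] % 360:
        return "forward"
    if BACKWARD_RANGE[0] <= a <= BACKWARD_RANGE[1]:
        return "backward"
    return "other"
```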
  • audio data may be processed based on a sound direction probability calculation algorithm to obtain audio data in a first direction (for example, a forward direction) and/or audio data in a second direction (for example, a backward direction).
  • the probability of audio data in each direction can be calculated, so that the audio data can be separated by direction to obtain audio data in the first direction and audio data in the second direction; a switching instruction can be obtained based on the audio data in the first direction and/or the audio data in the second direction; the electronic device can automatically switch the shooting mode based on the switching instruction.
  • the switching instruction is obtained, including:
  • the switching instruction is obtained, where the first amplitude spectrum is the amplitude spectrum of the audio data in the first direction, and the second amplitude spectrum is the amplitude spectrum of the audio data in the second direction.
  • the direction with greater audio-data energy can generally be considered the main shooting direction; the main shooting direction can thus be obtained based on the energy of the amplitude spectra of the audio data in different directions. For example, if the energy of the amplitude spectrum of the audio data in the first direction is greater than the energy of the amplitude spectrum of the audio data in the second direction, the first direction can be considered the main shooting direction; in this case, the electronic device can turn on the camera corresponding to the first direction.
  • the switching instruction may include the current shooting mode, the first picture-in-picture mode, the second picture-in-picture mode, the first dual-view mode, the second dual-view mode, the single-shot mode of the first camera, or the single-shot mode of the second camera; obtaining a switching instruction based on the energy of the first amplitude spectrum and/or the energy of the second amplitude spectrum includes:
  • the switching instruction is to switch to the single-shot mode of the first camera
  • the switching instruction is to switch to the second camera single shot mode
  • the switching instruction is to switch to the first picture-in-picture mode
  • the switching instruction is to switch to the second picture-in-picture mode
  • the switching instruction is to switch to the first dual-view mode
  • the switching instruction is to switch to the second dual-view mode
  • the second preset threshold is greater than the first preset threshold.
  • the first picture-in-picture mode refers to the shooting mode in which the image collected by the first camera is the main picture;
  • the second picture-in-picture mode refers to the shooting mode in which the image collected by the second camera is the main picture;
  • the first dual-view mode refers to the shooting mode in which the image collected by the first camera is located on the upper side or the left side of the display screen of the electronic device;
  • the second dual-view mode refers to the shooting mode in which the image collected by the second camera is located on the upper side or the left side of the display screen of the electronic device.
  • step S515 shown in FIG. 9 .
  • the first amplitude spectrum is a first average amplitude spectrum obtained by averaging the amplitude spectrum corresponding to each frequency point in the audio data in the first direction; and/or,
  • the second amplitude spectrum is a second average amplitude spectrum obtained by averaging amplitude spectra corresponding to each frequency point in the audio data in the second direction.
  • the amplitude spectrum obtained after averaging the amplitude spectra of different frequency points in the audio data in the first direction may be called the first average amplitude spectrum; the amplitude spectrum obtained after averaging the amplitude spectra of different frequency points in the audio data in the second direction may be called the second average amplitude spectrum; since the first average amplitude spectrum and/or the second average amplitude spectrum is obtained by averaging the amplitude spectra of different frequency points, the accuracy of the information in the audio data in the first direction and/or the second direction can be improved.
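To make the averaging step concrete, a minimal NumPy sketch (names are illustrative, not from the patent) computes the amplitude spectrum of each audio frame and averages it over its frequency points:

```python
import numpy as np

def average_amplitude_spectrum(frames):
    """Average amplitude spectrum per frame.

    frames: (n_frames, frame_len) array of time-domain audio frames.
    Returns an (n_frames,) array: for each frame, the amplitude spectrum
    averaged over its frequency points (bins).
    """
    amplitude = np.abs(np.fft.rfft(frames, axis=1))  # amplitude per frequency bin
    return amplitude.mean(axis=1)                    # average over frequency bins
```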
  • the first amplitude spectrum is the amplitude spectrum obtained after performing the first amplification process and/or the second amplification process on the first average amplitude spectrum, and the first average amplitude spectrum is obtained by averaging the amplitude spectra corresponding to each frequency point in the audio data in the first direction.
  • the above video processing method further includes:
  • if the first detection indicates that the audio data in the first direction includes the audio information of the user, perform a first amplification process on the amplitude spectrum of the audio data in the first direction; and/or, if the predicted angle information includes the first preset angle range, perform a second amplification process on the amplitude spectrum of the audio data in the first direction.
  • the second amplitude spectrum is the amplitude spectrum obtained after performing the first amplification process and/or the second amplification process on the second average amplitude spectrum, and the second average amplitude spectrum is obtained by averaging the amplitude spectra corresponding to each frequency point in the audio data in the second direction.
  • the above video processing method further includes:
  • if the second detection indicates that the audio data in the second direction includes the audio information of the user, perform the first amplification process on the amplitude spectrum of the audio data in the second direction; and/or, if the predicted angle information includes the second preset angle range, perform the second amplification process on the amplitude spectrum of the audio data in the second direction.
  • the direction in which the user is located can usually be considered the main shooting direction; if the detection result indicates that a direction includes the user's audio information, it can be considered that the user is in that direction; the first amplification process is performed on the audio data of that direction, and the accuracy of the acquired user audio information can be improved through the first amplification process.
  • when the predicted angle information includes the first preset angle range and/or the second preset angle range, it indicates that audio information exists in the forward first direction and/or the backward second direction of the electronic device; through the second amplification process, the accuracy of the first amplitude spectrum or the second amplitude spectrum can be improved; when the accuracy of the amplitude spectrum and of the user audio information is improved, the switching instruction can be obtained accurately.
  • for the specific process, refer to the related description of step S511 or step S513 in FIG. 9 below.
  • for the specific process of the above first amplification process and/or second amplification process, refer to the related description of step S515 in FIG. 9 .
  • direction-of-arrival estimation refers to an algorithm that performs a spatial Fourier transform on the received signal, takes the squared modulus to obtain a spatial spectrum, and thereby estimates the direction of arrival of the signal.
  • for details, refer to the related description of step S407 in FIG. 8 or step S514 in FIG. 9 .
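As a hedged illustration of the spatial-spectrum idea (the steering-vector formulation and the simulated linear array below are generic assumptions, not the patent's implementation), a conventional delay-and-sum spatial spectrum for a narrowband signal can be sketched as:

```python
import numpy as np

def doa_spatial_spectrum(X, mic_x, freq, c=343.0):
    """Delay-and-sum spatial spectrum for a linear microphone array.

    X: (n_mics, n_snapshots) complex narrowband snapshots.
    mic_x: microphone positions along one axis (meters).
    freq: narrowband center frequency (Hz); c: speed of sound (m/s).
    Returns (estimated_angle_deg, spectrum over -90..90 degrees).
    """
    angles = np.arange(-90, 91)
    spectrum = np.empty(angles.shape)
    for i, theta in enumerate(angles):
        tau = mic_x * np.sin(np.deg2rad(theta)) / c  # per-mic delays
        a = np.exp(-2j * np.pi * freq * tau)         # steering vector
        # Squared modulus of the steered output, averaged over snapshots.
        spectrum[i] = np.mean(np.abs(a.conj() @ X) ** 2)
    return int(angles[np.argmax(spectrum)]), spectrum
```

The spectrum peaks where the steering vector matches the true propagation delays, which is the "square of the modulus" spatial-spectrum step described above.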
  • identifying whether the target keyword is included in the audio data includes:
  • the audio data is separated and processed to obtain N audio information, and the N audio information is audio information of different users;
  • Each audio information in the N pieces of audio information is identified, and it is determined whether the N pieces of audio information include the target keyword.
  • the audio data collected by the at least two sound pickup devices can first be separated to obtain N pieces of audio information from different sources; identifying whether the target keyword is included in each of the N pieces of audio information can improve the accuracy of identifying the target keyword.
  • a blind signal separation algorithm refers to an algorithm for recovering independent source signals from an acquired mixed signal (usually the output of multiple sensors).
  • for details, refer to the related description of step S405 in FIG. 8 or step S504 in FIG. 9 .
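A minimal numeric sketch of the blind-separation idea (a symmetric FastICA on a 2×2 instantaneous mixture — a generic BSS algorithm, not necessarily the one used by this method):

```python
import numpy as np

def fastica_2x2(X, n_iter=200, seed=0):
    """Minimal symmetric FastICA for a 2-source instantaneous mixture.

    X: (2, n_samples) mixed observations. Returns (2, n_samples) estimated
    sources, up to permutation, sign, and scale (the usual BSS ambiguities).
    """
    X = X - X.mean(axis=1, keepdims=True)            # center
    d, E = np.linalg.eigh(np.cov(X))                 # whiten
    Z = (E @ np.diag(d ** -0.5) @ E.T) @ X
    W = np.random.default_rng(seed).standard_normal((2, 2))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                           # contrast nonlinearity
        W_new = (G @ Z.T) / Z.shape[1] - np.diag((1 - G**2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W_new)              # symmetric decorrelation
        W = U @ Vt
    return W @ Z
```

Recovered sources match the originals only up to ordering and sign, which is why downstream steps (keyword spotting per channel) must not rely on a fixed source order.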
  • the electronic device can collect audio data in the shooting environment through at least two sound pickup devices (for example, microphones); a switching instruction is generated based on the audio data, and based on the switching instruction the electronic device automatically switches from the current first shooting mode to the second shooting mode and displays the second image collected in the second shooting mode; the electronic device can thus automatically switch the shooting mode to complete the video recording without requiring the user to switch it manually, improving the user's shooting experience.
  • Fig. 8 is a schematic flowchart of a video processing method provided by an embodiment of the present application.
  • the video processing method 400 may be executed by the electronic device shown in FIG. 1 ; the video processing method includes steps S401 to S410, and the steps S401 to S410 will be described in detail below.
  • Step S401 Acquire audio data collected by N sound pickup devices (for example, microphones).
  • Step S402 performing sound source separation processing on the audio data to obtain M pieces of audio information.
  • sound source separation may also be called audio source separation; for example, the collected N channels of audio data can be Fourier transformed, and the frequency-domain data of the N channels of audio data, together with hyperparameters, can then be sent to the separator for sound source separation to obtain M pieces of audio information.
  • Step S403 judging whether each piece of audio information includes a switching instruction; if a switching instruction is included, execute step S404; if no switching instruction is included, execute steps S405 to S410.
  • it is determined whether each piece of audio information among the M pieces includes a switching instruction (an example of a target keyword); if any of the M pieces of audio information includes a switching instruction, step S404 is performed; if none of the M pieces of audio information includes a switching instruction, steps S405 to S410 are performed.
  • the switching instruction may include but not limited to: switching to the front camera, switching to the rear camera, front recording, rear recording, dual-view recording, picture-in-picture recording, and the like.
  • the method for identifying the switching instruction may be as shown in FIG. 11 .
  • Step S404 execute the switching instruction.
  • the execution of the switching instruction by the electronic device may mean that the electronic device can automatically switch its camera based on the switching instruction, without the user manually operating in the camera application.
  • Step S405 performing direction separation processing on the audio data to obtain forward audio information and/or backward audio information.
  • direction separation processing is performed on the audio data collected by the N microphones to obtain forward audio information (an example of audio data in the first direction) and/or backward audio information (an example of audio data in the second direction).
  • if a switching instruction is detected in the M pieces of audio information, the electronic device automatically executes the switching instruction; if no switching instruction is detected in the M pieces of audio information, the electronic device can obtain, from the collected N channels of audio data, the forward audio information within the target angle of the forward direction of the electronic device and/or the backward audio information within the target angle of the backward direction; the switching instruction can then be obtained by analyzing the energy of the forward audio information and/or the energy of the backward audio information, and the electronic device can execute the corresponding switching instruction.
  • the forward voice beam may refer to audio data in the forward direction of the electronic device, where the target angle of the forward direction (an example of the first preset angle range) may be [-30, 30]; the backward voice beam may refer to audio data in the backward direction of the electronic device, where the target angle of the backward direction (an example of the second preset angle range) may be [150, 210].
  • the N channels of audio data can be separated into forward audio data and/or backward audio data; for the specific implementation, refer to steps S507 to S511 shown in FIG. 9 .
  • Step S406 voice detection processing.
  • voice detection processing is performed on the forward audio information and/or the backward audio information to obtain a detection result.
  • the purpose of performing voice detection processing on the forward audio information and/or backward audio information is to determine whether the user's audio information is included in them; if the forward audio information (or the backward audio information) includes the user's audio information, it may be amplified, so as to ensure that the user's audio information can be accurately acquired.
  • voice detection processing may include but not limited to: voice activity detection, or other user audio information detection methods, which are not limited in this application.
  • Step S407 perform DOA estimation on the audio data to obtain predicted angle information.
  • direction of arrival estimation is performed on the audio data collected by the N microphones to obtain predicted angle information.
  • the N channels of audio data collected by the sound pickup devices can be divided into forward audio information and/or backward audio information; further, performing direction-of-arrival estimation on the N channels of audio data can yield the angle information corresponding to the audio data, so as to determine whether the audio data acquired by the sound pickup devices is within the target angle range, for example, within the target angle range of the forward direction of the electronic device or within the target angle range of the backward direction.
  • the specific implementation method of performing direction-of-arrival estimation on audio data collected by N microphones to obtain predicted angle information may refer to step S514 shown in FIG. 9 .
  • step S405, step S406, step S408 to step S410 may be performed in the case that each audio information does not include a switching instruction.
  • Step S408 performing amplification processing on the amplitude spectrum of the forward audio information and/or the backward audio information.
  • the amplitude spectrum of the forward audio information and/or the backward audio information may be amplified based on the detection result of the voice detection processing.
  • when the voice detection result corresponding to the forward audio information (or backward audio information) indicates that it includes the user's audio information, the amplitude spectrum of the forward audio information (or backward audio information) can be amplified, thereby improving the accuracy of the acquired user audio information.
  • the amplitude spectrum of the forward audio information and/or the backward audio information may be amplified based on the detection result of the voice detection processing and the predicted angle information.
  • for example, when the predicted angle information indicates that the audio data includes the target angle of the forward direction, the amplitude spectrum of the forward audio information can be amplified, thereby improving the accuracy of the amplitude spectrum; when the voice detection result corresponding to the forward audio information (or backward audio information) indicates that it includes the user's audio information, the amplitude spectrum of the forward audio information (or backward audio information) can be amplified, thereby improving the accuracy of the acquired user audio information; when the accuracy of the amplitude spectrum and of the user's audio information is improved, the accuracy of the obtained switching instruction can be improved.
  • the amplitude spectra of the forward audio information and the backward audio information are calculated respectively; when the voice activity detection result indicates that the forward audio information includes the user's audio information, the amplitude spectrum of the forward audio information can be subjected to the first amplification process; or, when the voice activity detection result indicates that the backward audio information includes the user's audio information, the amplitude spectrum of the backward audio information can be subjected to the first amplification process; for example, the amplification factor of the first amplification process is α (1<α<2).
  • when the predicted angle information obtained based on direction-of-arrival estimation indicates that the N channels of audio data collected by the pickup devices include a target angle in the forward direction, a second amplification process may be performed on the amplitude spectrum of the forward audio information; or, when the predicted angle information indicates that the N channels of audio data include a target angle in the backward direction, the second amplification process may be performed on the amplitude spectrum of the backward audio information; for example, the amplification factor of the second amplification process is β (1<β<2); the amplitude spectrum of the forward audio information and/or the backward audio information after amplification is thus obtained.
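The two amplification steps above reduce to simple gains on the average amplitude spectrum; the sketch below is illustrative, and the specific factor values (and the symbol β for the second step) are assumptions within the stated (1, 2) range:

```python
import numpy as np

def amplify_spectrum(avg_spec, user_detected, angle_in_range,
                     alpha=1.5, beta=1.5):
    """Apply the first and/or second amplification process.

    user_detected: voice activity detection found user audio in this direction.
    angle_in_range: DOA-predicted angles fall inside this direction's preset
    angle range. alpha/beta are illustrative factors with 1 < factor < 2.
    """
    spec = np.asarray(avg_spec, dtype=float)
    if user_detected:
        spec = alpha * spec   # first amplification process
    if angle_in_range:
        spec = beta * spec    # second amplification process
    return spec
```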
  • Step S409 obtaining a switching instruction based on the amplified forward audio information and/or backward audio information.
  • the switching instruction is obtained based on the energy of the magnitude spectrum of the amplified forward audio information and/or backward audio information.
  • the electronic device keeps the default lens to record video; for example, this switching instruction may correspond to flag 0.
  • the electronic device determines that the direction corresponding to the amplitude spectrum whose energy is greater than the first preset threshold is the direction of the main sound source, and switches the lens of the electronic device to that direction; for example, the switching instruction may be to switch to the front lens, and the switching instruction may correspond to flag 2.
  • the electronic device can determine that the direction corresponding to the amplitude spectrum whose energy is greater than the second preset threshold is the main sound source direction, and that the direction corresponding to the amplitude spectrum whose energy is greater than the first preset threshold is the secondary sound source direction.
  • the electronic device can start the picture-in-picture recording mode, using the picture in the direction corresponding to the amplitude spectrum whose energy is greater than or equal to the second preset threshold as the main picture, and the picture in the direction corresponding to the amplitude spectrum whose energy is greater than or equal to the first preset threshold as the secondary picture.
  • the switching instruction of the electronic device may be picture-in-picture with the front picture as the main picture, and the switching instruction may correspond to flag 3.
  • the switching instruction of the electronic device may be picture-in-picture with the rear picture as the main picture, and the switching instruction may correspond to flag 4.
  • the electronic device may determine to enable dual-view recording, that is, to turn on both the front lens and the rear lens; optionally, the image captured by the lens corresponding to the direction with greater energy may be displayed on the upper side or the left side of the display screen.
  • the switching instruction of the electronic device may be front-rear dual-view recording, with the picture captured by the front lens of the electronic device displayed on the upper side or the left side of the display screen; this switching instruction may correspond to flag 5.
  • the switching instruction of the electronic device may be rear-front dual-view recording, with the picture captured by the rear lens of the electronic device displayed on the upper side or the left side of the display screen; this switching instruction may correspond to flag 6.
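Putting the flag descriptions together, one plausible decision function is sketched below; the rule ordering and the rear-lens flag value 1 are assumptions pieced together from the description (flags 0 and 2-6 follow the text):

```python
def switching_flag(e_front, e_back, t1, t2):
    """Map forward/backward amplitude-spectrum energies to a mode flag.

    t1/t2: first/second preset thresholds (t2 > t1). Flags 0 and 2-6 follow
    the description; flag 1 for "switch to rear lens" and the exact rule
    ordering are assumptions.
    """
    if e_front >= t2 and e_back >= t2:
        return 5 if e_front >= e_back else 6  # dual-view recording
    if e_front >= t2 and e_back >= t1:
        return 3                              # picture-in-picture, front main
    if e_back >= t2 and e_front >= t1:
        return 4                              # picture-in-picture, rear main
    if e_front >= t1 and e_front > e_back:
        return 2                              # single shot, front lens
    if e_back >= t1 and e_back > e_front:
        return 1                              # single shot, rear lens (assumed)
    return 0                                  # keep default lens
```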
  • Step S410 execute the switching instruction.
  • the electronic device can obtain the switching instruction based on the amplitude spectrum of the amplified forward audio information and/or the amplified backward audio information, and automatically execute the switching instruction; that is, the camera of the electronic device is switched automatically based on the switching instruction, without the user manually operating in the camera application.
  • in a video shooting scenario, the switching instruction can be obtained from the audio data in the shooting environment, so that the electronic device can automatically judge whether to switch the lens, whether to enable multi-lens recording, and the like; a "one-take" video experience can thus be realized without manual operation by the user, improving the user experience.
  • FIG. 9 is a schematic flowchart of a video processing method provided by an embodiment of the present application.
  • the video processing method 500 may be executed by the electronic device shown in FIG. 1 ; the video processing method includes steps S501 to S515, and the steps S501 to S515 will be described in detail below.
  • the video processing method shown in FIG. 9 is illustrated with an electronic device including three sound pickup devices; since the electronic device needs to judge the directionality of the audio information, in the embodiments of this application the electronic device includes at least two sound pickup devices, and the specific number of sound pickup devices is not limited in any way.
  • Step S501 the sound pickup device 1 collects audio data.
  • Step S502 the sound pickup device 2 collects audio data.
  • Step S503 the sound pickup device 3 collects audio data.
  • the sound pickup device 1, the sound pickup device 2, and the sound pickup device 3 may be located at different positions in the electronic device for collecting audio information in different directions; for example, the sound pickup device 1, the sound pickup device 2, or the sound pickup device 3 may refer to a microphone.
  • when the electronic device detects that the user has selected the video recording mode and started recording video, the sound pickup device 1, the sound pickup device 2, and the sound pickup device 3 are started to collect audio data.
  • step S501 to step S503 may be executed simultaneously.
  • Step S504 blind signal separation.
  • blind signal separation is performed on the audio data collected by the sound pickup device to obtain M channels of audio information.
  • blind signal separation can also be called blind signal/source separation (BSS), which refers to estimating the source signal from the mixed signal without knowing the source signal and signal mixing parameters.
  • audio information from different sources, that is, audio signals of different objects, can be obtained by performing blind signal separation on the collected audio data.
  • the shooting environment where the electronic device is located includes three users, namely user A, user B, and user C; through blind signal separation, the audio information of user A, the audio information of user B, and the audio information of user C can be obtained from the audio data.
  • Step S505 judging whether a switch instruction is included.
  • it is determined whether a switching instruction is included in the M channels of audio information; if a switching instruction is included, step S506 is performed; if no switching instruction is included, steps S507 to S515 are performed.
  • M pieces of audio information can be obtained through step S504; switching-instruction identification is performed on each audio signal in the M pieces of audio information to determine whether it includes a switching instruction; the switching instructions may include but are not limited to: switching to the front camera, switching to the rear camera, front recording, rear recording, dual-view recording, picture-in-picture recording, and the like.
  • FIG. 11 is a schematic flowchart of a method for identifying a switching instruction provided in an embodiment of the present application.
  • the identification method 600 includes steps S601 to S606, which will be described in detail below respectively.
  • Step S601 acquiring M pieces of audio information.
  • M pieces of audio information after separation and processing are acquired.
  • in step S601, the audio data collected by the sound pickup devices may also be acquired directly, as in step S401 in FIG. 8 .
  • Step S602 noise reduction processing.
  • noise reduction processing is performed on the M pieces of audio information respectively.
  • the noise reduction processing can adopt any noise reduction algorithm; for example, spectral subtraction or the Wiener filtering algorithm; the principle of spectral subtraction is to subtract the spectrum of the noise signal from the spectrum of the noisy signal to obtain the spectrum of the clean signal; the principle of the Wiener filtering algorithm is to transform the noisy signal through a linear filter to approximate the original signal, finding the linear filter parameters that minimize the mean square error.
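Both mentioned algorithms reduce to simple per-bin operations on magnitude or power spectra; a minimal sketch (the function and parameter names are illustrative):

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, floor=0.0):
    """Spectral subtraction: subtract the estimated noise magnitude spectrum
    from the noisy one, flooring so magnitudes never go negative."""
    return np.maximum(noisy_mag - noise_mag, floor)

def wiener_gain(noisy_power, noise_power, eps=1e-12):
    """Wiener-style per-bin gain: SNR / (SNR + 1), with the SNR estimated as
    (noisy power - noise power) / noise power."""
    snr = np.maximum(noisy_power - noise_power, 0.0) / (noise_power + eps)
    return snr / (snr + 1.0)
```

In a pipeline, the gain or subtracted magnitude would be applied per frequency bin before resynthesis; the floor keeps residual "musical noise" bounded.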
  • Step S603 an acoustic model.
  • the M pieces of audio information after noise reduction processing are respectively input to the acoustic model, wherein the acoustic model is a pre-trained deep neural network.
  • Step S604 output confidence.
  • a confidence degree is output for each channel of audio information in the M pieces of audio information, and the confidence degree is used to indicate the degree of confidence that a channel of audio information includes a certain switching instruction.
  • Step S605 determining that the confidence level is greater than a preset threshold.
  • the confidence level is compared with a preset threshold; when the confidence level is greater than the preset threshold, step S606 is performed.
  • Step S606 obtaining a switching instruction.
  • steps S601 to S606 are examples; other identification methods can also be used to identify whether the audio information includes a switching instruction, and this application does not make any limitation on this.
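Steps S604 to S606 amount to thresholding per-instruction confidences; a trivial sketch (the instruction names and the threshold value are illustrative assumptions):

```python
def detect_switching_instruction(confidences, threshold=0.5):
    """Return the highest-confidence switching instruction above the preset
    threshold, or None if no candidate clears it.

    confidences: dict mapping candidate instruction -> acoustic-model confidence.
    """
    best = max(confidences, key=confidences.get, default=None)
    if best is not None and confidences[best] > threshold:
        return best
    return None
```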
  • Step S506 execute the switching instruction.
  • the electronic device automatically executes the switching instruction.
  • the automatic execution of the switching instruction by the electronic device may mean that the electronic device can automatically switch its camera based on the switching instruction, without the user manually operating in the camera application.
  • steps S507 to S509 are used to output the directivity of the M pieces of audio information, that is, to determine the forward audio signal and the backward audio signal in the M pieces of audio information; the forward audio signal may refer to the audio signal within the preset angle range of the front camera of the electronic device, and the backward audio signal may refer to the audio signal within the preset angle range of the rear camera of the electronic device.
  • Step S507 calculating the sound direction probability.
  • the sound direction probability calculation is performed on the M pieces of audio information.
  • the probability values of the frequency points of the current input audio data existing in each direction can be calculated.
  • cACGMM (complex angular central Gaussian mixture model) is a type of Gaussian mixture model;
  • a Gaussian mixture model refers to a model that quantifies a quantity precisely with Gaussian probability density functions (for example, normal distribution curves), decomposing it into a combination of several Gaussian probability density functions.
  • the probability values of the frequency points of the audio data in each direction satisfy the following constraint: for each time-frequency point, the probability values over all K directions sum to 1, that is, P 1 (t, f)+P 2 (t, f)+...+P K (t, f)=1; wherein:
  • P k (t, f) represents the probability value in the k-th direction;
  • t represents a speech frame (for example, a frame of audio data);
  • f represents a frequency point (for example, a frequency bin of a frame of audio data).
  • a frequency point may refer to a time-frequency point; the time-frequency point may include time information, frequency range information, and energy information corresponding to audio data.
  • K may be 36; since a full circle around the electronic device is 360 degrees, if K is 36, every 10 degrees may be set as one direction.
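The normalization constraint and the 10-degree quantization can be sketched as follows (the function names are illustrative):

```python
import numpy as np

def normalize_direction_probs(scores):
    """Normalize per-direction scores over axis 0 so that, for every
    time-frequency point, the K direction probabilities sum to 1."""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.sum(axis=0, keepdims=True)

def direction_index(angle_deg, K=36):
    """Map an angle in degrees to one of K = 36 directions (10-degree bins)."""
    return int((angle_deg % 360) // (360 // K))
```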
  • Step S508 spatial clustering.
  • the probability value of the audio data within the viewing angle range of the camera of the electronic device can be determined through spatial clustering.
  • the front of the screen of the electronic device is usually in the direction of 0 degrees.
  • the target angle of the forward direction can be set to [-30, 30], and the target angle of the backward direction of the electronic device can be set to [150, 210]; the corresponding direction indexes are k1 to k2, respectively, and the spatial clustering probability is the sum of the per-direction probability values over those indexes, that is, P(t, f)=P k1 (t, f)+...+P k2 (t, f); wherein:
  • P(t, f) represents the probability of the frequency point of the audio data at the target angle
  • P k (t, f) represents the probability value of the frequency point of the audio data in the k direction.
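The spatial-clustering sum P(t, f) = Σ_{k=k1}^{k2} P_k(t, f) can be sketched as below. The mapping from the target angle ranges to direction indices (with 36 directions of 10 degrees each, and the forward range wrapping around 0 degrees) is an assumption for illustration.

```python
import numpy as np

def cluster_probability(P_k, indices):
    """Sum the per-direction probabilities P_k(t, f) over the direction
    indices k1..k2 covering the target angle range."""
    return P_k[indices].sum(axis=0)

K = 36                                  # 10 degrees per direction index
P_k = np.full((K, 1, 1), 1.0 / K)       # toy uniform distribution
# assumed index mapping: forward [-30, 30] deg wraps around 0 degrees,
# backward [150, 210] deg sits opposite it
forward = cluster_probability(P_k, [33, 34, 35, 0, 1, 2])
backward = cluster_probability(P_k, [15, 16, 17, 18, 19, 20])
```

With a uniform toy distribution each 60-degree target range collects 6 of the 36 direction bins.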
  • Step S509 gain calculation
  • g mask (t, f) represents the frequency point gain of the audio data
  • P th1 represents the first probability threshold
  • P th2 represents the second probability threshold
  • g mask-min represents the frequency point gain of the audio data in the non-target angle.
  • when the probability of a frequency point of the audio data at the target angle is greater than the first probability threshold, it may indicate that the frequency point is within the target angle range; when the probability of a frequency point of the audio data at the target angle is less than or equal to the second probability threshold, it may indicate that the frequency point is in the non-target angle range; for example, the first probability threshold can be 0.8 and the second probability threshold can be 0.1; the frequency point gain of the audio data in the non-target angle can be a pre-configured parameter, for example, 0.2.
  • smoothing the audio data can be achieved through the gain calculation of the audio data; the frequency points of the audio data in the target angle range are enhanced, and the frequency points of the audio data in the non-target angle range are weakened.
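The gain rule above can be sketched with the example thresholds (P_th1 = 0.8, P_th2 = 0.1, g_mask-min = 0.2). The description only fixes the behavior at the two extremes; the linear interpolation between the thresholds is an assumed smoothing choice, not the patent's exact rule.

```python
def frequency_gain(p, p_th1=0.8, p_th2=0.1, g_min=0.2):
    """Per-frequency-point gain g_mask(t, f), given the probability p
    that the point lies within the target angle range."""
    if p > p_th1:      # confidently within the target angle: enhance
        return 1.0
    if p <= p_th2:     # confidently outside: weaken with g_mask-min
        return g_min
    # between the thresholds: interpolate (assumed smoothing rule)
    return g_min + (1.0 - g_min) * (p - p_th2) / (p_th1 - p_th2)
```

Applying this gain per frequency point enhances in-target bins and attenuates out-of-target bins, matching the smoothing behavior described above.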
  • Step S510 backward voice beam.
  • a backward voice beam that is, backward audio data, can be obtained.
  • the backward audio data may refer to audio data in the backward direction of the electronic device; wherein, the target angle of the electronic device in the backward direction may be [150, 210].
  • y_back(t, f) = g_back-mask(t, f) × x_back(t, f); wherein, y_back(t, f) can represent the backward audio data; g_back-mask(t, f) represents the frequency point gain of the backward audio data; x_back(t, f) represents the Fourier transform of the backward audio data.
  • Step S511 voice activity detection.
  • voice activity detection is performed on the backward voice beam (eg, backward audio data).
  • voice detection can be performed on the backward audio data by the cepstrum algorithm to obtain the voice activity detection result; if the fundamental frequency is detected, it is determined that the voice information of the user is included in the backward voice beam; if the fundamental frequency is not detected, it is determined that the voice information of the user is not included in the backward voice beam.
  • the backward audio data refers to the audio data in the angle range of the backward direction collected by the electronic device; the backward audio data may include audio information in the shooting environment (for example, the whistle of a vehicle, etc.), or the voice information of the user; performing voice detection on the backward audio data is to determine whether the voice information of the user is included in the backward voice data; when the voice information of the user is included in the backward voice data, the backward voice data can be amplified when performing subsequent step S515, so that the accuracy of acquiring the user's voice information can be improved.
  • the cepstrum algorithm is a method in signal processing and signal detection; the so-called cepstrum refers to the power spectrum of the logarithmic power spectrum of a signal.
  • the principle of obtaining speech through the cepstrum is: since the voiced sound signal is periodically excited, the voiced sound signal appears as a periodic impulse in the cepstrum, so that the pitch period can be obtained; generally the second impulse in the cepstrum waveform (the first is the envelope information) is considered to be the fundamental frequency of the excitation source; the fundamental frequency is one of the characteristics of speech, and if a fundamental frequency is present, there is speech in the current audio data.
  • Step S512 forward voice beam.
  • a forward voice beam that is, forward audio data, may be obtained.
  • forward audio data may refer to audio data in the forward direction of the electronic device; wherein, the target angle in the forward direction of the electronic device may be [-30, 30].
  • y_front(t, f) = g_front-mask(t, f) × x_front(t, f); wherein, y_front(t, f) can represent the forward voice beam; g_front-mask(t, f) represents the frequency point gain of the forward audio data; x_front(t, f) represents the Fourier transform of the forward audio data.
  • voice activity detection is performed on the forward voice beam (eg, forward audio data).
  • voice detection may be performed on the forward audio data by the cepstrum algorithm to obtain a voice activity detection result; if the fundamental frequency is detected, it is determined that the voice information of the user is included in the forward voice beam; if the fundamental frequency is not detected, it is determined that the voice information of the user is not included in the forward voice beam.
  • the forward audio data refers to the audio data collected by the electronic device in the angle range of the forward direction; the forward audio data may include audio information in the shooting environment (for example, the sound of the whistle of a vehicle, etc.), or the voice information of the user; performing voice detection on the forward audio data is to determine whether the voice information of the user is included in the forward voice data; when the voice information of the user is included in the forward voice data, the forward voice data can be amplified when performing subsequent step S515, so that the accuracy of acquiring the user's voice information can be improved.
  • Step S5 direction of arrival estimation.
  • direction of arrival estimation is performed on the audio data collected by the sound pickup device.
  • the angle information corresponding to the audio data can be obtained by estimating the direction of arrival of the audio data collected by the sound pickup device, so that it can be determined whether the audio data acquired by the sound pickup device is within the target angle range ; For example, determining whether the audio data is within a target angle range in a forward direction of the electronic device, or within a target angle range in a backward direction.
  • a positioning algorithm based on high-resolution spectral estimation (for example, estimating signal parameters via rotational invariance techniques, ESPRIT), a positioning algorithm based on steerable beamforming, or a time difference of arrival (time difference of arrival, TDOA) positioning algorithm, etc., may be used to estimate the direction of arrival of the audio data collected by the pickup device.
  • ESPRIT refers to a rotational invariance technique algorithm; its principle is mainly to estimate signal parameters based on the rotational invariance of the signal.
  • the principle of the steerable-beamforming positioning algorithm is to filter, weight and sum the signals received by the microphones to form a beam, and to search for the sound source position according to a certain rule; the searched position at which the microphone output power reaches its maximum is the real sound source direction.
  • TDOA is used to represent the time difference between the arrival of a sound source at different microphones in an electronic device.
  • the positioning algorithm of TDOA can include the GCC-PHAT algorithm; taking the GCC-PHAT algorithm as an example, the direction of arrival estimation based on audio data is described. As shown in Figure 12, the sound pickup device 1 and the sound pickup device 2 collect audio data; the distance between the sound pickup device 1 and the sound pickup device 2 is d; the angle information between the audio data and the electronic device can then be obtained according to the GCC-PHAT algorithm.
  • the angle θ shown in Figure 12 can be obtained based on the following formula:
  • IDFT represents the inverse discrete Fourier transform
  • x a (t, f) represents the frequency domain information obtained by Fourier transforming the audio data collected by the sound pickup device 1
  • x_b(t, f) represents the frequency domain information obtained by Fourier transforming the audio data collected by the sound pickup device 2
  • arg represents the variable (arg is the abbreviation of argument, i.e. the independent variable)
  • arg max represents the value of the variable at which the expression that follows it reaches its maximum value.
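The GCC-PHAT time-difference estimate described above can be sketched as follows. The sign convention, the circular correlation (practical code would zero-pad), and the far-field angle mapping noted in the comment are assumptions for illustration.

```python
import numpy as np

def gcc_phat_delay(sig_a, sig_b, fs):
    """Estimate the arrival-time difference between two microphone
    signals with GCC-PHAT: whiten the cross-spectrum (phase transform),
    take the inverse DFT, and locate the peak lag. Circular correlation
    is used here for simplicity."""
    n = len(sig_a)
    Xa, Xb = np.fft.rfft(sig_a), np.fft.rfft(sig_b)
    cross = Xa * np.conj(Xb)
    cc = np.fft.irfft(cross / np.maximum(np.abs(cross), 1e-12), n=n)
    shift = int(np.argmax(np.abs(cc)))
    if shift > n // 2:                  # map large indices to negative lags
        shift -= n
    return -shift / fs                  # positive: sig_b arrives later

fs, d_samples = 16000, 8
src = np.random.default_rng(1).standard_normal(2048)
mic1, mic2 = src, np.roll(src, d_samples)   # mic2 hears the source 8 samples later
tau = gcc_phat_delay(mic1, mic2, fs)
# far-field mapping to an angle would be theta = arcsin(c * tau / d_mics)
```

The recovered delay equals the simulated 8-sample offset, which combined with the microphone spacing d yields the angle θ of Figure 12.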
  • data analysis may be performed on the forward voice beam and the backward voice beam to obtain a switching instruction.
  • the average amplitude spectra of the forward voice beam and the backward voice beam are calculated respectively; when the voice activity detection result indicates that the audio information of the user is included in the forward voice beam, the first amplification process can be performed on the average amplitude spectrum of the forward voice beam; or, when the voice activity detection result indicates that the audio information of the user is included in the backward voice beam, the first amplification process can be performed on the average amplitude spectrum of the backward voice beam; for example, the amplification factor of the first amplification process is α (1 < α < 2).
  • the amplitude spectrum obtained after averaging the amplitude spectra of different frequency points in the forward voice beam can be called the average amplitude spectrum of the forward beam; the amplitude spectrum obtained after averaging the amplitude spectra of different frequency points in the backward voice beam can be called the average amplitude spectrum of the backward beam; data analysis based on the average amplitude spectrum of the forward voice beam and/or the average amplitude spectrum of the backward voice beam can improve the accuracy of the information in the forward voice beam and/or the backward voice beam.
  • when the direction of arrival estimation indicates that the audio data falls within the target angle range of the forward direction, the second amplification process may be performed on the average amplitude spectrum of the forward speech beam; or, when the direction of arrival estimation indicates that the audio data falls within the target angle range of the backward direction, the second amplification process can be performed on the average amplitude spectrum of the backward speech beam; for example, the amplification factor of the second amplification process is β (1 < β < 2); the amplitude spectrum of the forward speech beam and the amplitude spectrum of the backward speech beam after the amplification process are thereby obtained.
  • the purpose of amplifying the forward speech beam or the backward speech beam is to improve the accuracy of the amplitude spectrum; amplifying the amplitude spectrum of the voice beam can improve the accuracy of the acquired user audio information; when the accuracy of the amplitude spectrum and of the user audio information is improved, the switch command can be accurately obtained from the voice beam.
  • the amplitude spectrum corresponding to a frequency point in audio data can be calculated by the following formula:
  • Mag(i) represents the amplitude spectrum corresponding to the i-th frequency point; i represents the i-th frequency point; K represents the frequency point range; K_{i-1} to K_i represents the frequency point range required for averaging; it should be understood that the average of some frequency points may be obtained without averaging all frequency points.
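The Mag(i) band averaging described above (averaging the per-frequency-point amplitude spectra over the range K_{i-1} to K_i) can be sketched as below; the helper name and band edges are illustrative assumptions.

```python
import numpy as np

def band_average_magnitude(x_tf, k_start, k_end):
    """Average the amplitude spectrum |x(t, f)| over the frequency
    points in [k_start, k_end), one value per frame, per the Mag(i)
    description above."""
    return np.abs(x_tf[:, k_start:k_end]).mean(axis=1)

# one frame with 4 frequency points of a complex spectrum
x_tf = np.array([[1 + 0j, 3 + 0j, 0 + 4j, 2 + 0j]])
```

Averaging bins 1..2 gives (|3| + |4j|) / 2 = 3.5, illustrating that only part of the frequency range needs to be averaged.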
  • the average amplitude spectrum of the amplified forward voice beam is: MagFront = MagFront1 × α × β, where each factor is applied only when its corresponding amplification condition holds
  • MagFront represents the average amplitude spectrum of the amplified forward speech beam
  • MagFront1 represents the average amplitude spectrum of the original forward speech beam
  • α represents the preset first amplification factor
  • β represents the preset second amplification factor.
  • the amplitude spectrum obtained after averaging the amplitude spectra of different frequency points in the forward voice beam may be called the average amplitude spectrum of the forward beam.
  • the average amplitude spectrum of the amplified backward voice beam is: MagBack = MagBack1 × α × β, where each factor is applied only when its corresponding amplification condition holds
  • MagBack represents the average amplitude spectrum of the amplified backward speech beam
  • MagBack1 represents the average amplitude spectrum of the original backward speech beam
  • α represents the preset first amplification factor
  • β represents the preset second amplification factor.
  • the amplitude spectrum obtained by averaging the amplitude spectra of different frequency points in the backward speech beam may be called the average amplitude spectrum of the backward beam.
  • the switching instruction may correspond to identifier 0.
  • the electronic device determines that the direction corresponding to the amplitude spectrum with energy greater than the first preset threshold is the main sound source direction, and switches the lens of the electronic device to this direction; for example, as shown in Table 1, the switch command can be to switch to the rear lens, corresponding to identifier 1; or, the switch command can be to switch to the front camera, corresponding to identifier 2.
  • the electronic device can determine that the direction corresponding to the amplitude spectrum with energy greater than the second preset threshold is the direction of the main sound source, and the direction corresponding to the amplitude spectrum with energy greater than the first preset threshold is the direction of the secondary sound source.
  • in this case, the electronic device can start the picture-in-picture recording mode: the picture in the direction corresponding to the amplitude spectrum with energy greater than or equal to the second preset threshold is used as the main picture, and the picture in the direction corresponding to the amplitude spectrum with energy greater than or equal to the first preset threshold is used as the secondary picture.
  • the switching command of the electronic device can be picture-in-picture with the front picture as the main picture, and the switching command can correspond to identifier 3.
  • or, the switching instruction of the electronic device may be picture-in-picture with the rear picture as the main picture.
  • the electronic device may determine to enable dual-view recording, that is, enable the front camera and the rear camera.
  • the picture captured by the lens corresponding to the direction with higher energy may be displayed on the upper side or the left side of the display screen.
  • for example, the switching command of the electronic device can be front-and-rear dual-view recording, with the picture collected by the front lens of the electronic device displayed on the upper side or the left side of the display screen; as shown in Table 1, this switching instruction may correspond to identifier 5.
  • or, the switching command of the electronic device can be rear-and-front dual-view recording, with the picture collected by the rear camera of the electronic device displayed on the upper side or the left side of the display screen; as shown in Table 1, this switching instruction may correspond to identifier 6.
  • Table 1 is an example of the identifiers corresponding to the recording scenes, which is not limited in this application; the electronic device may automatically switch between different cameras of the electronic device in different recording scenes.
  • the electronic device can obtain the switching instruction based on the amplitude spectrum of the amplified forward audio information and/or the amplified backward audio information, and automatically execute the switching instruction; that is, the electronic device can automatically switch its camera based on the switching command without the user manually operating the camera application.
  • in the scene of video shooting, the switching instruction can be obtained according to the audio data in the shooting environment, so that the electronic device can automatically judge whether to switch the lens or whether to enable multi-lens video recording; in this way, a one-take video experience can be realized without manual operation by the user, improving the user experience.
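A sketch of how the threshold rules above could map the forward/backward beam energies to Table 1 style identifiers (0 to 6). The concrete threshold values, the tie-breaking choices, and the function name are illustrative assumptions, not the patent's exact decision logic.

```python
TH1, TH2 = 1.0, 3.0   # illustrative first/second energy thresholds (TH2 > TH1)

def switch_id(front_energy, back_energy):
    """Map the front/back average-amplitude-spectrum energies to a
    Table-1 style switching identifier; sketch only."""
    f_hi, b_hi = front_energy >= TH2, back_energy >= TH2
    f_lo, b_lo = front_energy >= TH1, back_energy >= TH1
    if f_hi and b_lo and not b_hi:
        return 3            # picture-in-picture, front main picture
    if b_hi and f_lo and not f_hi:
        return 4            # picture-in-picture, rear main picture
    if f_lo and b_lo:
        # dual-view: the higher-energy side shown on the top/left
        return 5 if front_energy >= back_energy else 6
    if b_lo:
        return 1            # switch to the rear lens
    if f_lo:
        return 2            # switch to the front camera
    return 0                # keep the current shooting mode
```

For example, a dominant rear-direction energy alone triggers the rear lens, while two comparable energies trigger a dual-view layout.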
  • FIG. 13 shows a graphical user interface (graphical user interface, GUI) of an electronic device.
  • a control 601 for indicating settings can be included in the preview interface of multi-mirror video recording; upon detecting the operation of the user clicking the control 601, the setting interface is displayed in response to the user operation, as shown in Fig. 13.
  • the setting interface includes a control 610 for voice-activated photography, and it is detected that the user turns on voice-activated photography; a control 620 for automatically switching the shooting mode is included under voice-activated photography. After detecting that the user clicks the control 620 for automatically switching the shooting mode, the electronic device can start the automatic switching shooting mode of the camera application; that is, the video processing method provided by the embodiment of the present application can be executed. In the scene of video shooting, the switching instruction can be obtained according to the audio data in the shooting environment, so that the electronic device can automatically judge whether to switch the shooting mode; video recording is completed without the user switching the shooting mode of the electronic device, improving the user's shooting experience.
  • the preview interface of the multi-camera video may include a control 630 indicating to enable automatic switching of shooting modes.
  • after detecting that the user clicks the control 630, the electronic device may start the automatic switching shooting mode of the camera application program; that is, the video processing method provided by the embodiment of the present application can be executed.
  • in the scene of video shooting, the switching instruction can be obtained according to the audio data in the shooting environment, so that the electronic device can automatically judge whether to switch the shooting mode; video recording is completed without the need for the user to switch the shooting mode of the electronic device, thereby improving the shooting experience of the user.
  • FIG. 15 shows a graphical user interface (graphical user interface, GUI) of an electronic device.
  • the GUI shown in (a) in Figure 15 is the desktop 640 of the electronic device; after the electronic device detects the operation of the user clicking the settings icon 650 on the desktop 640, it can display another GUI, shown in (b) in Figure 15; the GUI shown in (b) in Figure 15 can be a settings display interface, which can include options such as wireless network, Bluetooth, or camera; clicking the camera option enters the camera settings interface, shown in (c) in Figure 15; the camera settings interface may include a control 660 for automatically switching the shooting mode; after detecting that the user clicks the control 660 for automatically switching the shooting mode, the electronic device can start the automatic switching shooting mode of the camera application; that is, the video processing method provided by the embodiment of the present application can be executed. In the scene of video shooting, the switching instruction can be obtained according to the audio data in the shooting environment, so that the electronic device can automatically determine whether to switch the shooting mode; the video recording is completed without the user switching the shooting mode of the electronic device, thereby improving the user's shooting experience.
  • FIG. 16 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 700 includes a processing module 710 and a display module 720; the electronic device 700 may also include at least two sound pickup devices; for example, at least two microphones.
  • the processing module 710 is used to start a camera application in the electronic device; the display module 720 is used to display a first image, and the first image is an image collected when the electronic device is in a first shooting mode; The processing module 710 is also used to acquire audio data, the audio data is data collected by the at least two sound pickup devices; a switching instruction is obtained based on the audio data, and the switching instruction is used to instruct the electronic device to switch from the The first shooting mode is switched to the second shooting mode; the display module 720 is also used to display a second image, and the second image is an image collected when the electronic device is in the second shooting mode.
  • the electronic device includes a first camera and a second camera, the first camera and the second camera are located in different directions of the electronic device, and the processing module 710 is specifically used to :
  • the audio data includes a target keyword, where the target keyword is text information corresponding to the switching instruction;
  • processing module 710 is specifically configured to:
  • the audio data is processed based on a sound direction probability calculation algorithm to obtain audio data in the first direction and/or audio data in the second direction.
  • processing module 710 is specifically configured to:
  • the switching instruction is obtained based on the energy of the first amplitude spectrum and/or the energy of the second amplitude spectrum, the first amplitude spectrum is the amplitude spectrum of the audio data in the first direction, and the second amplitude spectrum is the amplitude spectrum of the audio data in the second direction.
  • the switching instruction includes the current shooting mode, the first picture-in-picture mode, the second picture-in-picture mode, the first dual-view mode, the second dual-view mode, the single-shot mode of the first camera, or the single-shot mode of the second camera; the processing module 710 is specifically used for:
  • the switching instruction is obtained to maintain the current shooting mode
  • the switching instruction is to switch to the single-shot mode of the first camera
  • the switching instruction is to switch to the single-shot mode of the second camera
  • the switching instruction is to switch to the first picture-in-picture mode
  • the switching instruction is to switch to the second picture-in-picture mode
  • the switching instruction is to switch to the first dual-view mode
  • the switching instruction is to switch to the second dual-view mode
  • the second preset threshold is greater than the first preset threshold
  • the first picture-in-picture mode refers to the shooting mode in which the image collected by the first camera is the main picture
  • the second picture-in-picture mode refers to the shooting mode in which the image captured by the second camera is the main picture
  • the first dual-view mode refers to the shooting mode in which the image captured by the first camera is located on the upper side or the left side of the display screen of the electronic device
  • the second dual-view mode refers to the shooting mode in which the image captured by the second camera is located on the upper side or the left side of the display screen of the electronic device.
  • the first amplitude spectrum is a first average amplitude spectrum obtained by averaging amplitude spectra corresponding to each frequency point in the audio data in the first direction; and/or,
  • the second amplitude spectrum is a second average amplitude spectrum obtained by averaging amplitude spectra corresponding to each frequency point in the audio data in the second direction.
  • the first amplitude spectrum is an amplitude spectrum obtained after performing first amplification processing and/or second amplification processing on the first average amplitude spectrum
  • the first average amplitude spectrum is obtained by averaging the amplitude spectra corresponding to the frequency points in the audio data in the first direction.
  • processing module 710 is specifically configured to:
  • when the first detection indicates that the audio data in the first direction includes audio information of the user, perform the first amplification process on the amplitude spectrum of the audio data in the first direction; and/or,
  • when the predicted angle information includes angle information in the first preset angle range, perform the second amplification process on the amplitude spectrum of the audio data in the first direction.
  • the second amplitude spectrum is an amplitude spectrum obtained after performing first amplification processing and/or second amplification processing on the second average amplitude spectrum
  • the second average amplitude spectrum is obtained by averaging the amplitude spectra corresponding to the frequency points in the audio data in the second direction.
  • processing module 710 is specifically configured to:
  • when the second detection indicates that the audio data in the second direction includes audio information of the user, perform the first amplification process on the amplitude spectrum of the audio data in the second direction; and/or,
  • when the predicted angle information includes angle information in the second preset angle range, perform the second amplification process on the amplitude spectrum of the audio data in the second direction.
  • processing module 710 is specifically configured to:
  • the first image is a preview image captured when the electronic device is in multi-mirror video recording.
  • the first image is a video frame captured when the electronic device is in multi-mirror video recording.
  • the audio data refers to data collected by the sound pickup device in a shooting environment where the electronic device is located.
  • module here may be implemented in the form of software and/or hardware, which is not specifically limited.
  • a “module” may be a software program, a hardware circuit or a combination of both to realize the above functions.
  • the hardware circuitry may include application specific integrated circuits (ASICs), electronic circuits, processors for executing one or more software or firmware programs (such as shared processors, dedicated processors, or group processors), memory, merged logic circuits, and/or other suitable components to support the described functionality.
  • the units of each example described in the embodiments of the present application can be realized by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • FIG. 17 shows a schematic structural diagram of an electronic device provided by the present application.
  • the dotted line in FIG. 17 indicates that this unit or this module is optional; the electronic device 800 can be used to implement the methods described in the foregoing method embodiments.
  • the electronic device 800 includes one or more processors 801, and the one or more processors 801 can support the electronic device 800 to implement the video processing method in the method embodiment.
  • Processor 801 may be a general purpose processor or a special purpose processor.
  • the processor 801 may be a central processing unit (central processing unit, CPU), a digital signal processor (digital signal processor, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices such as discrete gates, transistor logic devices, or discrete hardware components.
  • the processor 801 may be used to control the electronic device 800, execute software programs, and process data of the software programs.
  • the electronic device 800 may also include a communication unit 805, configured to implement input (reception) and output (send) of signals.
  • the electronic device 800 can be a chip, and the communication unit 805 can be an input and/or output circuit of the chip, or the communication unit 805 can be a communication interface of the chip, and the chip can be used as a component of a terminal device or other electronic devices .
  • the electronic device 800 may be a terminal device, and the communication unit 805 may be a transceiver of the terminal device, or the communication unit 805 may be a transceiver circuit of the terminal device.
  • the electronic device 800 may include one or more memories 802, on which a program 804 is stored; the program 804 may be run by the processor 801 to generate instructions 803, so that the processor 801 executes, according to the instructions 803, the video processing method described in the above method embodiment.
  • data may also be stored in the memory 802 .
  • the processor 801 may also read data stored in the memory 802, the data may be stored at the same storage address as the program 804, or the data may be stored at a different storage address from the program 804.
  • the processor 801 and the memory 802 may be set independently, or may be integrated together, for example, integrated on a system-on-chip (system on chip, SOC) of a terminal device.
  • the memory 802 can be used to store the related program 804 of the video processing method provided in the embodiment of the present application
  • the processor 801 can be used to call the related program 804 of the video processing method stored in the memory 802, and execute the video processing method of the embodiment of the present application; for example: start the camera application program in the electronic device; display the first image, where the first image is the image collected when the electronic device is in the first shooting mode; acquire audio data, where the audio data is data collected by at least two sound pickup devices in the electronic device; obtain a switching instruction based on the audio data, where the switching instruction is used to instruct the electronic device to switch from the first shooting mode to the second shooting mode; and display a second image, where the second image is the image captured when the electronic device is in the second shooting mode.
  • the present application also provides a computer program product, which implements the video processing method in any method embodiment of the present application when the computer program product is executed by the processor 801 .
  • the computer program product may be stored in the memory 802 , such as a program 804 , and the program 804 is finally converted into an executable object file executable by the processor 801 through processes such as preprocessing, compiling, assembling and linking.
  • The present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a computer, the video processing method described in any method embodiment of the present application is implemented.
  • the computer program may be a high-level language program or an executable object program.
  • the computer readable storage medium is, for example, the memory 802 .
  • The memory 802 may be a volatile memory or a non-volatile memory, or the memory 802 may include both volatile and non-volatile memory.
  • The non-volatile memory may be a read-only memory (read-only memory, ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or a flash memory.
  • The volatile memory may be a random access memory (random access memory, RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM) and direct rambus random access memory (direct rambus RAM, DR RAM).
  • The disclosed systems, devices and methods may be implemented in other ways.
  • The embodiments of the electronic device described above are only illustrative.
  • The division of the modules is only a logical function division; there may be other division methods in actual implementation.
  • Multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • The mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical or other forms.
  • The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • Each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit.
  • The sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation processes of the embodiments of the present application.
  • If the functions described above are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • Based on such an understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk or an optical disc.


Abstract

涉及视频处理领域,一种视频处理方法与电子设备,该视频处理方法包括:运行电子设备中的相机应用程序;显示第一图像,第一图像为电子设备处于第一拍摄模式时采集的图像;获取音频数据,音频数据为至少两个拾音装置采集的数据;基于音频数据得到切换指令,切换指令用于指示电子设备从第一拍摄模式切换至第二拍摄模式;显示第二图像,第二图像为电子设备处于第二拍摄模式时采集的图像。由此能够在无需用户切换电子设备的拍摄模式的情况下完成视频的录制,提高用户的拍摄体验。

Description

视频处理方法与电子设备
本申请要求于2021年12月27日提交国家知识产权局、申请号为202111636357.5、申请名称为“视频处理方法与电子设备”的中国专利申请的优先权,以及要求于2022年03月29日提交国家知识产权局、申请号为202210320689.0、申请名称为“视频处理方法与电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及视频处理领域,具体地,涉及一种视频处理方法与电子设备。
背景技术
电子设备在录像或者视频通话的场景中,经常会面临镜头转换的需求,需要对拍摄模式进行切换;例如,前置镜头与后置镜头的切换、多镜头录像或者单镜头录像的切换;目前,电子设备的镜头切换依赖于用户手动操作,因此需要拍摄者在拍摄过程中与电子设备之间的距离较近;若用户与电子设备之间的距离较远,则需要基于蓝牙技术实现电子设备的镜头切换;基于蓝牙技术实现电子设备的镜头切换时,需要通过控制设备对电子设备的镜头进行相应的操作,一方面操作较复杂;另一方面控制设备容易暴露在视频中,影响视频的美感,从而导致用户体验较差。
因此,在视频场景中,电子设备如何基于用户需求自动切换镜头成为一个亟需解决的问题。
发明内容
本申请提供了一种视频处理方法与电子设备,能够在无需用户切换电子设备的拍摄模式的情况下完成视频的录制,提高用户的拍摄体验。
第一方面,提供了一种视频处理方法,应用于电子设备,所述电子设备包括至少两个拾音装置,所述视频处理方法包括:
运行所述电子设备中的相机应用程序;
显示第一图像,所述第一图像为所述电子设备处于第一拍摄模式时采集的图像;
获取音频数据,所述音频数据为所述至少两个拾音装置采集的数据;
基于所述音频数据得到切换指令,所述切换指令用于指示所述电子设备从所述第一拍摄模式切换至第二拍摄模式;
显示第二图像,所述第二图像为所述电子设备处于所述第二拍摄模式时采集的图像。
在本申请的实施例中,电子设备可以通过至少两个拾音装置(例如,麦克风)采集拍摄环境中的音频数据;基于音频数据生成切换指令,电子设备基于切换指令自动从当前的第一拍摄模式切换至第二拍摄模式,显示在第二拍摄模式采集的第二图像; 在无需用户切换电子设备的拍摄模式的情况下,电子设备能够自动切换拍摄模式完成视频的录制,提高用户的拍摄体验。
应理解,在本申请的实施例中,由于电子设备需要对音频数据进行方向性判断,因此在本申请的实施例中电子设备中至少包括两个拾音装置,对拾音装置的具体数量不作任何限定。
在一种可能的实现方式中,第一拍摄模式可以是指单摄模式,或者多摄模式中的任意一种;其中,单摄模式可以包括前置单摄模式或者后置单摄模式;多摄模式可以包括前/后双摄模式、后/前双摄模式、画中画前置主画模式,或者画中画后置主画模式。
例如,在前置单摄模式下,采用电子设备中的一个前置摄像头进行视频拍摄;在后置单摄模式下,采用电子设备中的一个后置摄像头进行视频拍摄;在前后双摄模式下,采用一个前置摄像头和一个后置摄像头进行视频拍摄;在画中画前置模式下,采用一个前置摄像头和一个后置摄像头进行视频拍摄,且将后置摄像头拍摄的画面置于前置摄像头拍摄的画面之中,前置摄像头拍摄的画面为主画面;在画中画后置模式下,采用一个前置摄像头和一个后置摄像头进行视频拍摄,且将前置摄像头拍摄的画面置于后置摄像头拍摄的画面之中,后置摄像头拍摄的画面为主画面。
可选地,多摄模式还可以包括前置双摄模式、后置双摄模式、前置画中画模式或者后置画中画模式等。
应理解,第一拍摄模式与第二拍摄模式可以是指相同的拍摄模式或者不同的拍摄模式;若切换指令为默认当前拍摄模式,则第二拍摄模式与第一拍摄模式可以为相同的拍摄模式;在其他情况下,第二拍摄模式与第一拍摄模式可以为不同的拍摄模式。
结合第一方面,在第一方面的某些实现方式中,所述电子设备包括第一摄像头与第二摄像头,所述第一摄像头与所述第二摄像头位于所述电子设备的不同方向,所述基于音频数据得到切换指令,包括:
识别所述音频数据中是否包括目标关键词,所述目标关键词为所述切换指令对应的文本信息;
在所述音频数据中识别到所述目标关键词的情况下,基于所述目标关键词得到所述切换指令;
在所述音频数据中未识别所述目标关键词的情况下,对所述音频数据进行处理,得到第一方向的音频数据和/或第二方向的音频数据,所述第一方向用于表示所述第一摄像头对应的第一预设角度范围,所述第二方向用于表示所述第二摄像头对应的第二预设角度范围;基于所述第一方向的音频数据和/或所述第二方向音频数据,得到所述切换指令。
在本申请的实施例中,可以先识别音频数据中是否包括目标关键词;若音频数据中包括目标关键词,则电子设备将拍摄模式切换至目标关键词对应的第二拍摄模式;若音频数据中不包括目标关键词,则电子设备可以基于第一方向的音频数据和/或第二方向的音频数据得到切换指令;例如,若用户在电子设备的前方,则一般通过前置摄像头采集图像;若电子设备的前向方向存在用户的音频信息,则可以认为用户在电子设备的前向方向,此时可以开启前置摄像头;若用户在电子设备的后方,则一般通过后置摄像头采集图像;若电子设备的后向方向存在用户的音频信息,则可以认为用户在电子设备的后向方向,此时可以开启后置摄像头。
结合第一方面,在第一方面的某些实现方式中,所述对所述音频数据进行处理,得到第一方向的音频数据和/或第二方向的音频数据,包括:
基于声音方向概率计算算法对所述音频数据进行处理,得到所述第一方向的音频数据和/或所述第二方向的音频数据。
在本申请的实施例中,可以计算音频数据在各个方向的概率大小,从而将音频数据进行方向分离,得到第一方向的音频数据与第二方向的音频数据;基于第一方向的音频数据和/或第二方向的音频数据可以得到切换指令;电子设备基于切换指令可以自动实现拍摄模式的切换。
结合第一方面,在第一方面的某些实现方式中,所述基于所述第一方向的音频数据和/或所述第二方向音频数据,得到所述切换指令,包括:
基于第一幅度谱的能量和/或第二幅度谱的能量,得到所述切换指令,所述第一幅度谱为所述第一方向的音频数据的幅度谱,所述第二幅度谱为所述第二方向的音频数据的幅度谱。
应理解,在录制视频场景中,通常可以认为音频数据的能量越大的方向(例如,音频信息的音量越大的方向)为主要拍摄方向;可以基于不同方向的音频数据的幅度谱的能量得到主要拍摄方向;比如,若第一方向的音频数据的幅度谱的能量大于第二方向的音频数据的幅度谱的能量,则可以认为第一方向为主要拍摄方向;此时,可以开启电子设备中第一方向对应的摄像头。
结合第一方面,在第一方面的某些实现方式中,所述切换指令包括当前拍摄模式、第一画中画模式、第二画中画模式、第一双景模式、第二双景模式、所述第一摄像头的单摄模式或者所述第二摄像头的单摄模式,所述基于第一幅度谱的能量和/或第二幅度谱的能量,得到所述切换指令,包括:
若所述第一幅度谱的能量与所述第二幅度谱的能量均小于第一预设阈值,得到所述切换指令为保持所述当前拍摄模式;
若所述第一幅度谱的能量大于第二预设阈值,且所述第二幅度谱的能量小于或者等于所述第二预设阈值,所述切换指令为切换为所述第一摄像头的单摄模式;
若所述第二幅度谱的能量大于第二预设阈值,且所述第一幅度谱的能量小于或者等于所述第二预设阈值,所述切换指令为切换为所述第二摄像头的单摄模式;
若所述第一幅度谱的能量大于第二预设阈值,且所述第二幅度谱的能量大于或者等于第一预设阈值,所述切换指令为切换为所述第一画中画模式;
若所述第二幅度谱的能量大于第二预设阈值,且所述第一幅度谱的能量大于或者等于第一预设阈值,所述切换指令为切换为所述第二画中画模式;
若所述第一幅度谱的能量与所述第二幅度谱的能量均大于或者等于第二预设阈值,且所述第一幅度谱的能量大于所述第二幅度谱的能量,所述切换指令为切换为所述第一双景模式;
若所述第一幅度谱的能量与所述第二幅度谱的能量均大于或者等于第二预设阈值,且所述第二幅度谱的能量大于所述第一幅度谱的能量,所述切换指令为切换为所述第二双景模式;
其中,所述第二预设阈值大于所述第一预设阈值,所述第一画中画模式是指所述第一摄像头采集的图像为主画面的拍摄模式,所述第二画中画模式是指所述第二摄像头采集的图像为主画面的拍摄模式,所述第一双景模式是指所述第一摄像头采集的图像位于所述电子设备的显示屏的上侧或者左侧的拍摄模式,所述第二双景模式是指所述第二摄像头采集的图像位于所述电子设备的显示屏的上侧或者左侧的拍摄模式。
结合第一方面,在第一方面的某些实现方式中,所述第一幅度谱为所述第一方向的音频数据中各个频点对应的幅度谱取平均得到的第一平均幅度谱;和/或,
所述第二幅度谱为所述第二方向的音频数据中各个频点对应的幅度谱取平均得到的第二平均幅度谱。
在本申请的实施例中,对第一方向的音频数据中的不同频点的幅度谱取平均值后得到的幅度谱可以称为第一平均幅度谱;对第二方向的音频数据中的不同频点的幅度谱取平均值后得到的幅度谱可以称为第二平均幅度谱;由于第一平均幅度谱和/或第二平均幅度谱是对不同频点的幅度谱取均值得到的幅度谱;因此,可以提高第一方向的音频数据和/或第二方向的音频数据中信息的准确性。
结合第一方面,在第一方面的某些实现方式中,所述第一幅度谱为对第一平均幅度谱进行第一放大处理和/或第二放大处理后得到的幅度谱,所述第一平均幅度谱为对所述第一方向的音频数据中各个频点对应的幅度谱取平均得到的。
结合第一方面,在第一方面的某些实现方式中,所述视频处理方法还包括:
对所述第一方向的音频数据进行语音检测,得到第一检测结果;
对所述至少两个拾音装置采集的数据进行波达方向估计,得到预测角度信息;
若所述第一检测结果指示所述第一方向的音频数据包括用户的音频信息,对所述第一方向的音频数据的幅度谱进行所述第一放大处理;和/或,
若所述预测角度信息包括所述第一预设角度范围中的角度信息,对所述第一方向的音频数据的幅度谱进行所述第二放大处理。
结合第一方面,在第一方面的某些实现方式中,所述第二幅度谱为对第二平均幅度谱进行第一放大处理和/或第二放大处理后得到的幅度谱,所述第二平均幅度谱为对所述第二方向的音频数据中各个频点对应的幅度谱取平均得到的。
结合第一方面,在第一方面的某些实现方式中,所述视频处理方法还包括:
对所述第二方向的音频数据进行语音检测,得到第二检测结果;
对所述至少两个拾音装置采集的数据进行波达方向估计,得到预测角度信息;
若所述第二检测结果指示所述第二方向的音频数据包括用户的音频信息,对所述第二方向的音频数据的幅度谱进行所述第一放大处理;和/或,
若所述预测角度信息包括所述第二预设角度范围中的角度信息,对所述第二方向的音频数据的幅度谱进行所述第二放大处理。
应理解,在录制视频场景中,通常可以认为用户所在的方向为主要拍摄方向;若检测结果中指示该方向包括用户的音频信息,则可以认为用户在该方向;此时,可以对该方向的音频数据进行第一放大处理,通过第一放大处理能够提高获取的用户音频信息的准确性。
其中,波达方向估计是指将接收的信号进行空间傅里叶变换,进而取模的平方得到空间谱,估计出信号的到达方向的算法。
应理解,在录制视频场景中,通常可以认为用户所在的方向为主要拍摄方向;若检测结果中指示该方向包括用户的音频信息,则可以认为用户在该方向;此时,可以对该方向的音频数据进行第一放大处理,通过第一放大处理能够提高获取的用户音频信息的准确性;当预测角度信息包括第一预设角度范围和/或第二预设角度范围,则可以说明电子设备的第一方向和/或第二方向存在音频信息;通过第二放大处理,能够提高第一幅度谱或者第二幅度谱的准确性;在幅度谱与用户音频信息的准确性提升的情况下,能够准确地得到切换指令。
结合第一方面,在第一方面的某些实现方式中,所述识别所述音频数据中是否包括目标关键词,包括:
基于盲信号分离算法对所述音频数据进行分离处理,得到N个音频信息,所述N个音频信息为不同用户的音频信息;
对所述N个音频信息中的每个音频信息进行识别,确定所述N个音频信息中是否包括所述目标关键词。
在本申请的实施例中,可以先将至少两个拾音装置采集的音频数据进行分离处理,得到N个不同源的音频信息;在N个音频信息中分别识别是否包括目标关键词,从而能够提高识别目标关键词的准确性。
结合第一方面,在第一方面的某些实现方式中,所述第一图像为所述电子设备处于多镜录像时采集的预览图像。
结合第一方面,在第一方面的某些实现方式中,所述第一图像为所述电子设备处于多镜录像时采集的视频画面。
结合第一方面,在第一方面的某些实现方式中,所述音频数据是指在所述电子设备所处的拍摄环境中所述拾音装置采集的数据。
第二方面,提供了一种电子设备,所述电子设备包括一个或多个处理器、存储器、至少两个拾音装置;所述存储器与所述一个或多个处理器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行:
运行所述电子设备中的相机应用程序;
显示第一图像,所述第一图像为所述电子设备处于第一拍摄模式时采集的图像;
获取音频数据,所述音频数据为所述至少两个拾音装置采集的数据;
基于所述音频数据得到切换指令,所述切换指令用于指示所述电子设备从所述第一拍摄模式切换至第二拍摄模式;
显示第二图像,所述第二图像为所述电子设备处于所述第二拍摄模式时采集的图像。
结合第二方面,在第二方面的某些实现方式中,所述电子设备包括第一摄像头与第二摄像头,所述第一摄像头与所述第二摄像头位于所述电子设备的不同方向,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行:
识别所述音频数据中是否包括目标关键词,所述目标关键词为所述切换指令对应的文本信息;
在所述音频数据中识别到所述目标关键词的情况下,基于所述目标关键词得到所述切换指令;
在所述音频数据中未识别所述目标关键词的情况下,对所述音频数据进行处理,得到第一方向的音频数据和/或第二方向的音频数据,所述第一方向用于表示所述第一摄像头对应的第一预设角度范围,所述第二方向用于表示所述第二摄像头对应的第二预设角度范围;基于所述第一方向的音频数据和/或所述第二方向音频数据,得到所述切换指令。
结合第二方面,在第二方面的某些实现方式中,所述对所述音频数据进行处理,得到第一方向的音频数据和/或第二方向的音频数据,包括:
基于声音方向概率计算算法对所述音频数据进行处理,得到所述第一方向的音频数据和/或所述第二方向的音频数据。
结合第二方面,在第二方面的某些实现方式中,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行:
基于第一幅度谱的能量和/或第二幅度谱的能量,得到所述切换指令,所述第一幅度谱为所述第一方向的音频数据的幅度谱,所述第二幅度谱为所述第二方向的音频数据的幅度谱。
结合第二方面,在第二方面的某些实现方式中,所述切换指令包括当前拍摄模式、第一画中画模式、第二画中画模式、第一双景模式、第二双景模式、所述第一摄像头的单摄模式或者所述第二摄像头单摄模式,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行:
若所述第一幅度谱的能量与所述第二幅度谱的能量均小于第一预设阈值,得到所述切换指令为保持所述当前拍摄模式;
若所述第一幅度谱的能量大于第二预设阈值,且所述第二幅度谱的能量小于或者等于所述第二预设阈值,所述切换指令为切换为所述第一摄像头的单摄模式;
若所述第二幅度谱的能量大于第二预设阈值,且所述第一幅度谱的能量小于或者等于所述第二预设阈值,所述切换指令为切换为所述第二摄像头的单摄模式;
若所述第一幅度谱的能量大于第二预设阈值,且所述第二幅度谱的能量大于或者等于第一预设阈值,所述切换指令为切换为所述第一画中画模式;
若所述第二幅度谱的能量大于第二预设阈值,且所述第一幅度谱的能量大于或者等于第一预设阈值,所述切换指令为切换为所述第二画中画模式;
若所述第一幅度谱的能量与所述第二幅度谱的能量均大于或者等于第二预设阈值,且所述第一幅度谱的能量大于所述第二幅度谱的能量,所述切换指令为切换为所述第一双景模式;
若所述第一幅度谱的能量与所述第二幅度谱的能量均大于或者等于第二预设阈值,且所述第二幅度谱的能量大于所述第一幅度谱的能量,所述切换指令为切换为所述第二双景模式;
其中,所述第二预设阈值大于所述第一预设阈值,所述第一画中画模式是指所述第一摄像头采集的图像为主画面的拍摄模式,所述第二画中画模式是指所述第二摄像头采集的图像为主画面的拍摄模式,所述第一双景模式是指所述第一摄像头采集的图像位于所述电子设备的显示屏的上侧或者左侧的拍摄模式,所述第二双景模式是指所述第二摄像头采集的图像位于所述电子设备的显示屏的上侧或者左侧的拍摄模式。
结合第二方面,在第二方面的某些实现方式中,所述第一幅度谱为所述第一方向的音频数据中各个频点对应的幅度谱取平均得到的第一平均幅度谱;和/或,
所述第二幅度谱为所述第二方向的音频数据中各个频点对应的幅度谱取平均得到的第二平均幅度谱。
结合第二方面,在第二方面的某些实现方式中,所述第一幅度谱为对第一平均幅度谱进行第一放大处理和/或第二放大处理后得到的幅度谱,所述第一平均幅度谱为对所述第一方向的音频数据中各个频点对应的幅度谱取平均得到的。
结合第二方面,在第二方面的某些实现方式中,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行:
对所述第一方向的音频数据进行语音检测,得到第一检测结果;
对所述至少两个拾音装置采集的数据进行波达方向估计,得到预测角度信息;
若所述第一检测结果指示所述第一方向的音频数据包括用户的音频信息,对所述第一方向的音频数据的幅度谱进行所述第一放大处理;和/或,
若所述预测角度信息包括所述第一预设角度范围中的角度信息,对所述第一方向的音频数据的幅度谱进行所述第二放大处理。
结合第二方面,在第二方面的某些实现方式中,所述第二幅度谱为对第二平均幅度谱进行第一放大处理和/或第二放大处理后得到的幅度谱,所述第二平均幅度谱为对所述第二方向的音频数据中各个频点对应的幅度谱取平均得到的。
结合第二方面,在第二方面的某些实现方式中,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行:
对所述第二方向的音频数据进行语音检测,得到第二检测结果;
对所述至少两个拾音装置采集的数据进行波达方向估计,得到预测角度信息;
若所述第二检测结果指示所述第二方向的音频数据包括用户的音频信息,对所述第二方向的音频数据的幅度谱进行所述第一放大处理;和/或,
若所述预测角度信息包括所述第二预设角度范围中的角度信息,对所述第二方向的音频数据的幅度谱进行所述第二放大处理。
结合第二方面,在第二方面的某些实现方式中,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行:
基于盲信号分离算法对所述音频数据进行分离处理,得到N个音频信息,所述N个音频信息为不同用户的音频信息;
对所述N个音频信息中的每个音频信息进行识别,确定所述N个音频信息中是否包括所述目标关键词。
结合第二方面,在第二方面的某些实现方式中,所述第一图像为所述电子设备处于多镜录像时采集的预览图像。
结合第二方面,在第二方面的某些实现方式中,所述第一图像为所述电子设备处于多镜录像时采集的视频画面。
结合第二方面,在第二方面的某些实现方式中,所述音频数据是指在所述电子设备所处的拍摄环境中所述拾音装置采集的数据。
第三方面,提供了一种电子设备,包括用于执行第一方面或者第一方面中任一种视频处理方法的模块/单元。
第四方面,提供一种电子设备,所述电子设备包括一个或多个处理器、存储器;所述存储器与所述一个或多个处理器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行第一方面或者第一方面中的任一种方法。
第五方面,提供了一种芯片系统,所述芯片系统应用于电子设备,所述芯片系统包括一个或多个处理器,所述处理器用于调用计算机指令以使得所述电子设备执行第一方面或第一方面中的任一种方法。
第六方面,提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序代码,当所述计算机程序代码被电子设备运行时,使得该电子设备执行第一方面或第一方面中的任一种方法。
第七方面,提供了一种计算机程序产品,所述计算机程序产品包括:计算机程序代码,当所述计算机程序代码被电子设备运行时,使得该电子设备执行第一方面或第一方面中的任一种方法。
在本申请的实施例中,电子设备可以通过至少两个拾音装置(例如,麦克风)采集拍摄环境中的音频数据;基于音频数据生成切换指令,电子设备基于切换指令自动从当前的第一拍摄模式切换至第二拍摄模式,显示在第二拍摄模式采集的第二图像;在无需用户切换电子设备的拍摄模式的情况下,电子设备能够自动切换拍摄模式完成视频的录制,提高用户的拍摄体验。
附图说明
图1是一种适用于本申请的电子设备的硬件系统的示意图;
图2是一种适用于本申请的电子设备的软件系统的示意图;
图3是一种适用于本申请实施例的应用场景的示意图;
图4是一种适用于本申请实施例的应用场景的示意图;
图5是一种适用于本申请实施例的应用场景的示意图;
图6是一种适用于本申请实施例的应用场景的示意图;
图7是本申请实施例提供的一种视频处理方法的示意性流程图;
图8是本申请实施例提供的一种视频处理方法的示意性流程图;
图9是本申请实施例提供的一种视频处理方法的示意性流程图;
图10是本申请实施例提供的一种电子设备的目标角度的示意图;
图11是本申请实施例提供的一种切换指令的识别方法的示意性流程图;
图12是本申请实施例提供的一种波达方向估计的示意图;
图13是一种适用于本申请实施例的图形用户界面的示意图;
图14是一种适用于本申请实施例的图形用户界面的示意图;
图15是一种适用于本申请实施例的图形用户界面的示意图;
图16是本申请实施例提供的一种电子设备的结构示意图;
图17是本申请实施例提供的一种电子设备的结构示意图。
具体实施方式
在本申请的实施例中,以下术语“第一”、“第二”等仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。在本实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
为了便于对本申请实施例的理解,首先对本申请实施例中涉及的相关概念进行简要说明。
1、傅立叶变换
傅立叶变换是一种线性积分变换,用于表示信号在时域(或者,空域)与频域之间的变换。
2、快速傅立叶变换(fast Fourier transform,FFT)
FFT是指离散傅立叶变换的快速算法,可以将一个信号由时域变换到频域。
3、盲信号分离(blind signal separation,BSS)
盲信号分离是指从获取的混合信号(通常是多个传感器的输出)中恢复独立的源信号的算法。
4、波束形成
基于不同拾音装置(例如,麦克风)采集的输入信号进行FFT变换后所得到的频域信号与不同角度的滤波器系数,可以得到不同角度的波束结果。
例如,

$$y(\omega)=\sum_{i=1}^{M} w_i(\omega)\,x_i(\omega)$$

其中,y(ω)表示不同角度的波束结果;w_i(ω)表示不同角度的滤波器系数;x_i(ω)表示第i个拾音装置采集的输入信号做FFT变换后所得到的频域信号;i表示麦克风的序号;M表示麦克风的数量。
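示例性地,上述按滤波器系数加权求和的波束形成过程可以用如下Python代码示意(仅为概念性示意,并非本申请实施例的具体实现;函数名与数据的组织方式均为假设):

```python
def beamform(freq_signals, weights):
    """对各拾音装置的频域信号按某一角度的滤波器系数加权求和, 得到该角度的波束结果。

    freq_signals: M 路拾音装置信号做FFT后的频域数据, 每路为各频点的复数列表
    weights:      该角度对应的 M 组滤波器系数, 形状与 freq_signals 相同
    """
    num_bins = len(freq_signals[0])
    y = [0j] * num_bins
    for x_i, w_i in zip(freq_signals, weights):
        for k in range(num_bins):
            y[k] += w_i[k] * x_i[k]  # 逐频点累加 w_i(w) * x_i(w)
    return y
```

对每一个候选角度换一组滤波器系数重复上述求和,即可得到各个角度的波束输出。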
5、语音活性检测(voice activity detection,VAD)
语音活性检测是一项用于语音处理的技术,目的是检测语音信号是否存在。
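示例性地,一种最简单的语音活性检测可以基于短时能量实现;如下代码仅为示意,阈值取值为假设,实际的VAD算法通常还会结合过零率、频谱特征或神经网络模型:

```python
def voice_activity(frame, energy_threshold=0.01):
    """基于短时能量的语音活性检测示意: 帧平均能量超过阈值即认为存在语音。"""
    energy = sum(v * v for v in frame) / len(frame)  # 帧内样本的平均能量
    return energy > energy_threshold
```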
6、波达方向估计(direction of arrival,DOA)
波达方向估计是指将接收的信号进行空间傅里叶变换,进而取模的平方得到空间谱,估计出信号的到达方向的算法。
8、到达时间差(time difference of arrival,TDOA)
TDOA用于表示声源到达电子设备中不同麦克风的时间差。
8、广义互相关-相位变换(generalized cross correlation-phase transform,GCC-PHAT)
GCC-PHAT是一种计算到达角(angle of arrival,AOA)的算法,如图12所示。
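示例性地,GCC-PHAT估计两路麦克风信号之间采样延迟的核心步骤可以用如下Python代码示意(为便于说明采用朴素DFT与长度为8的冲激信号,函数名与示例信号均为假设;实际实现通常基于FFT并做峰值插值细化):

```python
import cmath

def dft(x, inverse=False):
    # 朴素离散傅里叶(反)变换, 仅作示意, 复杂度 O(N^2)
    N = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(N):
        s = sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
        out.append(s / N if inverse else s)
    return out

def gcc_phat_delay(sig, ref):
    """广义互相关-相位变换: 返回 sig 相对 ref 的整数采样延迟(TDOA的离散估计)。"""
    N = len(sig)
    X_sig, X_ref = dft(sig), dft(ref)
    # 互功率谱做相位变换(逐频点幅度归一化), 只保留相位信息
    cross = []
    for a, b in zip(X_sig, X_ref):
        c = a * b.conjugate()
        cross.append(c / abs(c) if abs(c) > 1e-12 else 0j)
    cc = dft(cross, inverse=True)  # 广义互相关函数
    shift = max(range(N), key=lambda n: cc[n].real)
    return shift - N if shift > N // 2 else shift

# 假设的双麦克风信号: 第二路(sig)比第一路(ref)滞后 3 个采样点
ref = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
sig = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

得到采样延迟后乘以采样周期即为到达时间差,再结合麦克风间距与声速即可换算为到达角。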
9、基于旋转不变技术的信号参数估计(estimating signal parameter via rotational invariance techniques,ESPRIT)
ESPRIT是指一种旋转不变性技术算法,其原理主要是基于信号的旋转不变性估算信号参数。
10、可控波束形成的定位算法
可控波束形成方法的定位算法的原理是将麦克风接收到的信号进行滤波加权求和来形成波束,按照一定的规律对声源位置进行搜索,当麦克风达到最大输出功率时,搜索到的声源位置即为真实的声源方位。
11、倒谱算法
倒谱算法是信号处理和信号检测中的方法;所谓倒谱是指信号对数功率谱的功率谱;通过倒谱求语音的原理为:由于浊音信号是周期性激励的,因此浊音信号在倒谱上是周期的冲激,从而可以求得基音周期;一般把倒谱波形中第二个冲激(第一个是包络信息),认为是激励源的基频。
12、离散傅里叶反变换(inverse discrete fouriertransform,IDFT)
IDFT是指离散傅里叶变换的逆过程。
13、复角度中心高斯混合模型(complex angular central Gaussian mixture model,cACGMM)
cACGMM是一种高斯混合模型;高斯混合模型是指用高斯概率密度函数(例如,正态分布曲线)精确地量化事物,将一个事物分解为若干的基于高斯概率密度函数(例如,正态分布曲线)形成的模型。
14、幅度谱
将信号变换到频域上之后,对信号进行取模操作即可获取幅度谱。
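示例性地,幅度谱以及后文用到的"平均幅度谱"的计算可以用如下代码示意(采用朴素DFT,仅作概念说明,函数名为假设):

```python
import cmath

def magnitude_spectrum(frame):
    """将一帧时域信号做DFT后逐频点取模, 得到幅度谱。"""
    N = len(frame)
    mags = []
    for k in range(N):
        X_k = sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        mags.append(abs(X_k))  # 取模得到该频点的幅度
    return mags

def mean_magnitude(mags):
    """对各个频点的幅度谱取平均, 即对应文中的平均幅度谱。"""
    return sum(mags) / len(mags)
```

对 magnitude_spectrum 的输出再做 mean_magnitude,即可得到用于能量比较的平均幅度谱。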
15、多镜录像
如图4中的(a)所示,多镜录像可以是指相机应用程序中与录像、拍摄等类似的一种相机模式;在多镜录像中可以包括多种不同的拍摄模式;例如,如图4中的(b)所示,拍摄模式可以包括但不限于:前/后双摄模式、后/前双摄模式、画中画1模式、画中画2模式、后置单摄模式或者前置单摄模式等。
下面将结合附图,对本申请实施例中的视频处理方法与电子设备进行描述。
图1示出了一种适用于本申请的电子设备的硬件系统。
电子设备100可以是手机、智慧屏、平板电脑、可穿戴电子设备、车载电子设备、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)、投影仪等等,本申请实施例对电子设备100的具体类型不作任何限制。
电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
示例性地,音频模块170用于将数字音频信息转换成模拟音频信号输出,也可以用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170或者音频模块170的部分功能模块可以设置于处理器110中。
例如,在本申请的实施例中,音频模块170可以将麦克风采集的音频数据向处理器110发送。
需要说明的是,图1所示的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图1所示的部件更多或更少的部件,或者,电子设备100可以包括图1所示的部件中某些部件的组合,或者,电子设备100可以包括图1所示的部件中某些部件的子部件。图1所示的部件可以以硬件、软件、或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元。例如,处理器110可以包括以下处理单元中的至少一个:应用处理器(application processor,AP)、调制解调处理器、图形处理器(graphics processing unit,GPU)、图像信号处理器(image signal processor,ISP)、控制器、视频编解码器、数字信号处理器(digital signal processor,DSP)、基带处理器、神经网络处理器(neural-network processing unit,NPU)。其中,不同的处理单元可以是独立的器件,也可以是集成的器件。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
示例性地,处理器110可以用于执行本申请实施例的视频处理方法;例如,运行电子设备中的相机应用程序;显示第一图像,第一图像为电子设备处于第一拍摄模式时采集的图像;获取音频数据,音频数据为至少两个拾音装置采集的数据;基于音频数据得到切换指令,切换指令用于指示电子设备从第一拍摄模式切换至第二拍摄模式;显示第二图像,第二图像为电子设备处于第二拍摄模式时采集的图像。
图1所示的各模块间的连接关系只是示意性说明,并不构成对电子设备100的各模块间的连接关系的限定。可选地,电子设备100的各模块也可以采用上述实施例中多种连接方式的组合。
电子设备100的无线通信功能可以通过天线1、天线2、移动通信模块150、无线通信模块160、调制解调处理器以及基带处理器等器件实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
电子设备100可以通过GPU、显示屏194以及应用处理器实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194可以用于显示图像或视频。
电子设备100可以通过ISP、摄像头193、视频编解码器、GPU、显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP可以对图像的噪点、亮度和色彩进行算法优化,ISP还可以优化拍摄场景的曝光和色温等参数。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的红绿蓝(red green blue,RGB),YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
示例性地,在本申请的实施例中,电子设备可以包括多个摄像头193;多个摄像头中可以包括前置摄像头与后置摄像头。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1、MPEG2、MPEG3和MPEG4。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x轴、y轴和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。例如,当快门被按下时,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航和体感游戏等场景。
加速度传感器180E可检测电子设备100在各个方向上(一般为x轴、y轴和z轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。加速度传感器180E还可以用于识别电子设备100的姿态,作为横竖屏切换和计步器等应用程序的输入参数。
距离传感器180F用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,例如在拍摄场景中,电子设备100可以利用距离传感器180F测距以实现快速对焦。
环境光传感器180L用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现解锁、访问应用锁、拍照和接听来电等功能。
触摸传感器180K,也称为触控器件。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,触摸屏也称为触控屏。触摸传感器180K用于检测作用于其上或其附近的触摸操作。触摸传感器180K可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,并且与显示屏194设置于不同的位置。
上文详细描述了电子设备100的硬件系统,下面介绍电子设备100的软件系统。
图2是本申请实施例提供的电子设备的软件系统的示意图。
如图2所示,系统架构中可以包括应用层210、应用框架层220、硬件抽象层230、驱动层240以及硬件层250。
应用层210可以包括相机应用程序、图库、日历、通话、地图、导航、WLAN、蓝牙、音乐、视频、短信息等应用程序;应用层210又可以分为应用界面与应用逻辑;其中,相机应用的应用界面可以包括单景模式、双景模式、画中画模式等,对应于不同的视频拍摄模式。
应用框架层220为应用层的应用程序提供应用程序编程接口(application programming interface,API)和编程框架;应用框架层可以包括一些预定义的函数。
例如,应用框架层220可以包括相机访问接口;相机访问接口中可以包括相机管理与相机设备。其中,相机管理可以用于提供管理相机的访问接口;相机设备可以用于提供访问相机的接口。
硬件抽象层230用于将硬件抽象化。比如,硬件抽象层可以包括相机硬件抽象层以及其他硬件设备抽象层;相机硬件抽象层可以调用相机算法。
例如,硬件抽象层230中包括相机硬件抽象层与相机算法;相机算法中可以包括用于视频处理或者图像处理的软件算法。
示例性地,相机算法中的算法可以是指不依赖特定硬件实现;比如,通常可以在CPU中运行的代码等。
驱动层240用于为不同硬件设备提供驱动。例如,驱动层可以包括摄像头驱动。
硬件层250位于操作系统的最底层;如图2所示,硬件层250可以包括摄像头1、摄像头2、摄像头3等。其中,摄像头1、摄像头2、摄像头3可对应于电子设备上的多个摄像头。
示例性地,本申请实施例提供的视频处理方法与电子设备可以在硬件抽象层中运行;或者,可以在应用框架层运行;或者,可以在数字信号处理器中运行。
目前,电子设备的拍摄模式(例如,摄像头)的切换依赖于用户手动操作,因此需要用户在拍摄过程中与电子设备之间的距离较近;若用户与电子设备之间的距离较远,则需要基于蓝牙技术实现电子设备的拍摄模式的切换;基于蓝牙技术实现电子设备的拍摄模式切换时,需要通过控制设备对电子设备的镜头进行相应的操作,一方面操作较复杂;另一方面控制设备容易暴露在视频中,影响视频的美感,从而导致用户体验较差。
有鉴于此,本申请的实施例提供了一种视频处理方法,用户在使用电子设备拍摄视频的过程中,电子设备可以根据拍摄环境中的音频数据得到切换指令,基于切换指令电子设备能够自动切换电子设备的拍摄模式;例如,可以自动切换电子设备中的不同摄像头;比如,电子设备能够自动判断是否切换摄像头,或者是否开启多镜录像,或者是否在多镜录像中切换不同的拍摄模式等,使得在无需用户切换电子设备的拍摄模式的情况下完成视频的录制,实现一镜到底的录像体验。
应理解,“一镜到底”是指用户在选择某一拍摄模式后,用户无需再进行相应操作切换拍摄模式;电子设备可以基于采集的拍摄环境中的音频数据,自动生成切换指令;电子设备基于切换指令,自动切换拍摄模式。
下面结合图3至图15对本申请实施例提供的视频处理方法进行详细说明。
示例性地,本申请实施例中的视频处理方法可以应用于录制视频领域、视频通话领域或者其他图像处理领域;在本申请实施例中,通过电子设备中的至少两个拾音装置(例如,麦克风)采集拍摄环境中的音频数据;基于音频数据生成切换指令,电子设备基于切换指令自动从第一拍摄模式切换至第二拍摄模式,显示在第二拍摄模式采集的第二图像;在无需用户切换电子设备的拍摄模式的情况下,电子设备能够自动切换拍摄模式完成视频的录制,提高用户的拍摄体验。
在一个示例中,本申请实施例中的视频处理方法可以应用于录制视频的预览状态。
如图3所示,电子设备处于多镜录像的预览状态,电子设备当前的拍摄模式可以默认为前/后双景的拍摄模式,其中,前景画面可以如图像251所示,后景画面可以如图像252所示;前景画面可以是指电子设备的前置摄像头采集的图像,后景画面可以是指电子设备的后置摄像头采集的图像;下面以前景画面为图像251,后景画面为图像252为例进行举例说明。
如图4中的(a)所示,在电子设备检测到对多镜录像的拍摄模式的控件260的操作后,电子设备可以显示多镜录像中的多种不同的拍摄模式;例如,多种不同的拍摄模式可以包括但不限于:前/后双摄模式、后/前双摄模式、画中画1模式(后置画中画模式)、画中画2模式(前置画中画模式)、后置摄像头单摄模式、或者前置摄像头单摄模式等,如图4中的(b)所示;通过本申请实施例中的视频处理方法,在电子设备处于多镜录像的预览状态的情况下,电子设备中的至少两个拾音装置(例如,麦克风)采集拍摄环境中的音频数据;基于音频数据生成切换指令,电子设备基于切换指令自动从第一拍摄模式切换至第二拍摄模式,显示在第二拍摄模式采集的第二图像;例如,假设电子设备进入多镜录像的预览状态时,默认的拍摄模式为如图3所示的前/后双摄模式,基于电子设备中至少两个拾音装置采集的音频数据得到的切换指令为,将电子设备的拍摄模式切换至后置画中画模式;则在无需用户操作的情况下,电子设备可以自动从前/后双摄模式切换至后置画中画模式,显示第二图像,第二图像为预览图像。
其中,单摄模式可以包括前置单摄模式、后置单摄模式等;多摄模式可以包括前/后双摄模式、后/前双摄模式、画中画1模式、画中画2模式等。
可选地,多摄模式还可以包括前置双摄模式,或者后置双摄模式等。
应理解,在单摄模式下,采用电子设备中的一个摄像头进行视频拍摄;在多摄模式下,采用电子设备中的两个或两个以上摄像头进行视频拍摄。
示例性地,在前置单摄模式下,采用一个前置摄像头进行视频拍摄;在后置单摄模式下,采用一个后置摄像头进行视频拍摄;在前置双摄模式下,采用两个前置摄像头进行视频拍摄;在后置双摄模式下,采用两个后置摄像头进行视频拍摄;在前后双摄模式下,采用一个前置摄像头和一个后置摄像头进行视频拍摄;在前置画中画模式下,采用两个前置摄像头进行视频拍摄,且将一个前置摄像头拍摄的画面置于另一个前置摄像头拍摄的画面之中;在后置画中画模式下,采用两个后置摄像头进行视频拍摄,且将一个后置摄像头拍摄的画面置于另一个后置摄像头拍摄的画面之中;在前后画中画模式下,采用一个前置摄像头和一个后置摄像头进行视频拍摄,且将前置摄像头或后置摄像头拍摄的画面置于后置摄像头或前置摄像头拍摄的画面之中。
应理解,图4所示可以为电子设备在竖屏状态下,多镜录像的不同拍摄模式的拍摄界面;图5所示可以为电子设备在横屏状态下,多镜录像的不同拍摄模式的拍摄界面;其中,图4中的(a)与图5中的(a)对应,图4中的(b)与图5中的(b)对应;电子设备可以根据用户使用电子设备的状态确定竖屏显示或者横屏显示。
在一个示例中,本申请实施例中的视频处理方法可以应用于录制视频的过程中。
如图6所示,电子设备处于多镜录像的视频录制状态,电子设备当前的拍摄模式可以默认为前/后双景的拍摄模式,如图6中的(a)所示,在电子设备录制视频的第5秒时,检测到对多镜录像的拍摄模式的控件270的操作后,电子设备可以显示多镜录像中的多种不同的拍摄模式,如图6中的(b)所示;通过本申请实施例中的视频处理方法,在电子设备处于多镜录像的录制状态的情况下,电子设备中的至少两个拾音装置(例如,麦克风)采集拍摄环境中的音频数据;基于音频数据生成切换指令,电子设备基于切换指令自动从第一拍摄模式切换至第二拍摄模式,显示在第二拍摄模式采集的第二图像;例如,假设电子设备当前正在录制视频,电子设备开始录制视频时采用的默认的前/后双摄模式的拍摄模式,基于电子设备中至少两个拾音装置采集的音频数据得到的切换指令为,将电子设备的拍摄模式切换至后置画中画模式;则在无需用户操作的情况下,电子设备可以自动从前/后双摄模式切换至后置画中画模式,显示第二图像,第二图像为视频画面。
应理解,上述以多镜录像进行举例说明,本申请实施例中的视频处理方法还可以应用于:视频通话、视频会议应用、长短视频应用、视频直播类应用、视频网课应用、人像智能运镜应用场景、系统相机录像功能录制视频、视频监控或者智能猫眼等拍摄类场景中。
在一个示例中,本申请实施例中的视频处理方法还可以应用于电子设备的录像状态;例如,电子设备处于录像状态时可以采用默认的后置单摄的拍摄模式,电子设备中的至少两个拾音装置(例如,麦克风)采集拍摄环境中的音频数据;基于音频数据生成切换指令,电子设备基于切换指令可以自动从后置单摄模式切换至前置单摄模式;或者,电子设备基于切换指令可以自动从单摄模式切换至多摄模式,显示在第二拍摄模式采集的第二图像;第二图像可以为预览图像,或者第二图像也可以为视频画面。可选地,本申请实施例中的视频处理方法还可以应用于拍照领域;例如,电子设备处于录像状态时可以采用默认的后置单摄的拍摄模式,电子设备中的至少两个拾音装置(例如,麦克风)采集拍摄环境中的音频数据;基于音频数据生成切换指令,电子设备基于切换指令自动从后置单摄模式切换至前置单摄模式,显示在第二拍摄模式采集的第二图像;第二图像可以为预览图像,或者第二图像也可以为视频画面。
应理解,上述为对应用场景的举例说明,并不对本申请的应用场景作任何限定。
图7是本申请实施例提供的视频处理方法的示意性流程图。该视频处理方法300可以由图1所示的电子设备执行;该视频处理方法包括步骤S310至步骤S350,下面分别对步骤S310至步骤S350进行详细的描述。
步骤S310、运行电子设备的相机应用程序。
示例性地,用户可以通过单击“相机”应用程序的图标,指示电子设备运行相机应用。或者,电子设备处于锁屏状态时,用户可以通过在电子设备的显示屏上向右滑动的手势,指示电子设备运行相机应用。又或者,电子设备处于锁屏状态,锁屏界面上包括相机应用程序的图标,用户通过点击相机应用程序的图标,指示电子设备运行相机应用程序。又或者,电子设备在运行其他应用时,该应用具有调用相机应用程序的权限;用户通过点击相应的控件可以指示电子设备运行相机应用程序。例如,电子设备正在运行即时通信类应用程序时,用户可以通过选择相机功能的控件,指示电子设备运行相机应用程序等。
步骤S320、显示第一图像。
其中,第一图像为电子设备处于第一拍摄模式时采集的图像。
示例性地,第一拍摄模式可以是指单摄模式,或者多摄模式中的任意一种;其中,单摄模式可以包括前置单摄模式或者后置单摄模式;多摄模式可以包括前/后双摄模式、后/前双摄模式、画中画前置主画模式,或者画中画后置主画模式。
例如,在前置单摄模式下,采用电子设备中的一个前置摄像头进行视频拍摄;在后置单摄模式下,采用电子设备中的一个后置摄像头进行视频拍摄;在前后双摄模式下,采用一个前置摄像头和一个后置摄像头进行视频拍摄;在画中画前置模式下,采用一个前置摄像头和一个后置摄像头进行视频拍摄,且将后置摄像头拍摄的画面置于前置摄像头拍摄的画面之中,前置摄像头拍摄的画面为主画面;在画中画后置模式下,采用一个前置摄像头和一个后置摄像头进行视频拍摄,且将前置摄像头拍摄的画面置于后置摄像头拍摄的画面之中,后置摄像头拍摄的画面为主画面。
可选地,多摄模式还可以包括前置双摄模式、后置双摄模式、前置画中画模式或者后置画中画模式等。
可选地,在电子设备处于录像预览时,第一图像为预览图像。
可选地,在电子设备处于录像时,第一图像为视频画面。
可选地,在电子设备处于多镜录像预览时,第一图像为预览图像。
可选地,在电子设备处于多镜录像时,第一图像为视频画面。
步骤S330、获取音频数据。
其中,音频数据为电子设备中的至少两个拾音装置采集的数据;例如,至少两个麦克风采集的数据。
应理解,在本申请的实施例中,由于电子设备需要对音频数据进行方向性判断,因此在本申请的实施例中电子设备中至少包括两个拾音装置,对拾音装置的具体数量不作任何限定。
示例性地,如后续图9所示,电子设备中包括3个拾音装置。
示例性地,音频数据可以是指在电子设备所处的拍摄环境中拾音装置采集的数据。
步骤S340、基于音频数据得到切换指令。
其中,切换指令用于指示电子设备从第一拍摄模式切换至第二拍摄模式。
应理解,第一拍摄模式与第二拍摄模式可以为相同的拍摄模式,或者不同的拍摄模式;若切换指令为默认当前拍摄模式,则第二拍摄模式与第一拍摄模式可以为相同的拍摄模式,如表1所示的标识0;在其他情况下,第二拍摄模式与第一拍摄模式可以为不同的拍摄模式,如表1所示的标识1至标识6。
步骤S350、显示第二图像。
其中,第二图像为电子设备处于第二拍摄模式时采集的图像。
在本申请的实施例中,电子设备可以通过至少两个拾音装置(例如,麦克风)采集拍摄环境中的音频数据;基于音频数据生成切换指令,电子设备基于切换指令自动从当前的第一拍摄模式切换至第二拍摄模式,显示在第二拍摄模式采集的第二图像;在无需用户切换电子设备的拍摄模式的情况下,电子设备能够自动切换拍摄模式完成视频的录制,提高用户的拍摄体验。
示例性地,电子设备可以包括第一摄像头(例如,前置摄像头)与第二摄像头(例如,后置摄像头),第一摄像头与第二摄像头可以位于电子设备的不同方向;上述基于音频数据得到切换指令,包括:
识别音频数据中是否包括目标关键词,目标关键词为切换指令对应的文本信息;
在音频数据中识别到目标关键词的情况下,基于目标关键词得到切换指令;
在音频数据中未识别目标关键词的情况下,对音频数据进行处理,得到第一方向的音频数据和/或第二方向的音频数据,第一方向用于表示第一摄像头对应的第一预设角度范围,第二方向用于表示第二摄像头对应的第二预设角度范围;基于第一方向的音频数据和/或第二方向音频数据,得到切换指令。
在本申请的实施例中,可以先识别音频数据中是否包括目标关键词;若音频数据中包括目标关键词,则电子设备将拍摄模式切换至目标关键词对应的第二拍摄模式;若音频数据中不包括目标关键词,则电子设备可以基于第一方向的音频数据和/或第二方向的音频数据得到切换指令;例如,若用户在电子设备的前方,则一般通过前置摄像头采集图像;若电子设备的前向方向存在用户的音频信息,则可以认为用户在电子设备的前向方向,此时可以开启前置摄像头;若用户在电子设备的后方,则一般通过后置摄像头采集图像;若电子设备的后向方向存在用户的音频信息,则可以认为用户在电子设备的后向方向,此时可以开启后置摄像头。
其中,目标关键词可以包括但不限于:前置摄像头、后置摄像头、前置录像、后置录像、双景录像、画中画录像等;第一方向可以是指电子设备的前向方向,第一预设角度范围可以是指-30度至30度;第二方向可以是指电子设备的后向方向,第二预设角度范围可以是指150度至210度,如图10所示。
示例性地,可以基于声音方向概率计算算法对音频数据进行处理,得到第一方向的音频数据(例如,前向方向)和/或第二方向(例如,后向方向)的音频数据。具体过程可以参见后续图9所示的步骤S507至步骤S510,以及步骤S512,此处不再赘述。
在本申请的实施例中,可以计算音频数据在各个方向的概率大小,从而将音频数据进行方向分离,得到第一方向的音频数据与第二方向的音频数据;基于第一方向的音频数据和/或第二方向的音频数据可以得到切换指令;电子设备基于切换指令可以自动实现拍摄模式的切换。
示例性地,基于第一方向的音频数据和/或第二方向音频数据,得到切换指令,包括:
基于第一幅度谱的能量和/或第二幅度谱的能量,得到切换指令,第一幅度谱为第一方向的音频数据的幅度谱,第二幅度谱为第二方向的音频数据的幅度谱。
应理解,在录制视频场景中,通常可以认为音频数据的能量越大的方向(例如,音频信息的音量越大的方向)为主要拍摄方向;可以基于不同方向的音频数据的幅度谱的能量得到主要拍摄方向;比如,若第一方向的音频数据的幅度谱的能量大于第二方向的音频数据的幅度谱的能量,则可以认为第一方向为主要拍摄方向;此时,可以开启电子设备中第一方向对应的摄像头。
示例性地,切换指令可以包括当前拍摄模式、第一画中画模式、第二画中画模式、第一双景模式、第二双景模式、所述第一摄像头的单摄模式或者所述第二摄像头的单摄模式,基于第一幅度谱的能量和/或第二幅度谱的能量,得到切换指令,包括:
若第一幅度谱的能量与第二幅度谱的能量均小于第一预设阈值,得到切换指令为保持当前拍摄模式;
若第一幅度谱的能量大于第二预设阈值,且第二幅度谱的能量小于或者等于第二预设阈值,切换指令为切换为第一摄像头单摄模式;
若第二幅度谱的能量大于第二预设阈值,且第一幅度谱的能量小于或者等于第二预设阈值,切换指令为切换为第二摄像头单摄模式;
若第一幅度谱的能量大于第二预设阈值,且第二幅度谱的能量大于或者等于第一预设阈值,切换指令为切换为第一画中画模式;
若第二幅度谱的能量大于第二预设阈值,且第一幅度谱的能量大于或者等于第一预设阈值,切换指令为切换为第二画中画模式;
若第一幅度谱的能量与第二幅度谱的能量均大于或者等于第二预设阈值,且第一幅度谱的能量大于第二幅度谱的能量,切换指令为切换为第一双景模式;
若第一幅度谱的能量与第二幅度谱的能量均大于或者等于第二预设阈值,且第二幅度谱的能量大于第一幅度谱的能量,切换指令为切换为第二双景模式;
其中,第二预设阈值大于第一预设阈值,第一画中画模式是指第一摄像头采集的图像为主画面的拍摄模式,第二画中画模式是指第二摄像头采集的图像为主画面的拍摄模式,第一双景模式是指第一摄像头采集的图像位于电子设备的显示屏的上侧或者左侧的拍摄模式,第二双景模式是指第二摄像头采集的图像位于电子设备的显示屏的上侧或者左侧的拍摄模式。
可选地,上述过程的具体实现方式可以参见图9所示的步骤S515的相关描述。
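示例性地,上述基于两级预设阈值的判决逻辑可以用如下Python代码示意;文中"单摄"与"画中画"的条件在另一方向能量介于两个阈值之间时存在交叠,此处按"画中画优先"进行消歧,该顺序、函数名以及对未覆盖情形默认不切换的处理均为本示意的假设:

```python
def switch_decision(e1, e2, t1, t2):
    """根据第一/第二幅度谱能量 e1、e2 与两级预设阈值(t2 > t1)返回切换指令。"""
    assert t2 > t1
    if e1 < t1 and e2 < t1:
        return "保持当前拍摄模式"
    if e1 >= t2 and e2 >= t2:
        # 两个方向能量均不小于第二预设阈值: 双景, 能量大的一路显示在上侧/左侧
        return "第一双景模式" if e1 > e2 else "第二双景模式"
    if e1 > t2:
        # 第二方向能量介于两阈值之间时开启画中画, 否则仅开启第一摄像头
        return "第一画中画模式" if e2 >= t1 else "第一摄像头单摄模式"
    if e2 > t2:
        return "第二画中画模式" if e1 >= t1 else "第二摄像头单摄模式"
    return "保持当前拍摄模式"  # 文中未明确覆盖的其余情形, 默认不切换(假设)
```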
示例性地,第一幅度谱为第一方向的音频数据中各个频点对应的幅度谱取平均得到的第一平均幅度谱;和/或,
第二幅度谱为第二方向的音频数据中各个频点对应的幅度谱取平均得到的第二平均幅度谱。
在本申请的实施例中,对第一方向的音频数据中的不同频点的幅度谱取平均值后得到的幅度谱可以称为第一平均幅度谱;对第二方向的音频数据中的不同频点的幅度谱取平均值后得到的幅度谱可以称为第二平均幅度谱;由于第一平均幅度谱和/或第二平均幅度谱是对不同频点的幅度谱取均值得到的幅度谱;因此,可以提高第一方向的音频数据和/或第二方向的音频数据中信息的准确性。
可选地,第一幅度谱为对第一平均幅度谱进行第一放大处理和/或第二放大处理后得到的幅度谱,第一平均幅度谱为对第一方向的音频数据中各个频点对应的幅度谱取平均得到的。
示例性地,上述视频处理方法还包括:
对第一方向的音频数据进行语音检测,得到第一检测结果;
对至少两个拾音装置采集的数据进行波达方向估计,得到预测角度信息;
若第一检测结果指示第一方向的音频数据包括用户的音频信息,对第一方向的音频数据的幅度谱进行第一放大处理;和/或,若预测角度信息包括第一预设角度范围中的角度信息,对第一方向的音频数据的幅度谱进行第二放大处理。
可选地,第二幅度谱为对第二平均幅度谱进行第一放大处理和/或第二放大处理后得到的幅度谱,第二平均幅度谱为对所述第二方向的音频数据中各个频点对应的幅度谱取平均得到的。
示例性地,上述视频处理方法还包括:
对第二方向的音频数据进行语音检测,得到第二检测结果;
对至少两个拾音装置采集的数据进行波达方向估计,得到预测角度信息;
若第二检测结果指示第二方向的音频数据包括用户的音频信息,对第二方向的音频数据的幅度谱进行第一放大处理;和/或,若预测角度信息包括第二预设角度范围中的角度信息,对第二方向的音频数据的幅度谱进行第二放大处理。
应理解,在录制视频场景中,通常可以认为用户所在的方向为主要拍摄方向;若检测结果中指示该方向包括用户的音频信息,则可以认为用户在该方向;此时,可以对该方向的音频数据进行第一放大处理,通过第一放大处理能够提高获取的用户音频信息的准确性。
还应理解,在录制视频场景中,通常可以认为用户所在的方向为主要拍摄方向;若检测结果中指示该方向包括用户的音频信息,则可以认为用户在该方向;此时,可以对该方向的音频数据进行第一放大处理,通过第一放大处理能够提高获取的用户音频信息的准确性;当预测角度信息包括第一预设角度范围和/或第二预设角度范围,则可以说明电子设备的第一方向和/或第二方向存在音频信息;通过第二放大处理,能够提高第一幅度谱或者第二幅度谱的准确性;在幅度谱与用户音频信息的准确性提升的情况下,能够准确地得到切换指令。
可选地,上述语音检测的具体过程可以参见后续图9中的步骤S511,或者步骤S513的相关描述。
可选地,上述第一放大处理和/或第二放大处理的具体过程可以参见图9中的步骤S515的相关描述。
示例性地,波达方向估计是指将接收的信号进行空间傅里叶变换,进而取模的平方得到空间谱,估计出信号的到达方向的算法。
可选地,波达方向估计的具体过程可以参见后续图8中步骤S407的相关描述,或者,图9中步骤S514的相关描述。
示例性地,识别音频数据中是否包括目标关键词,包括:
基于盲信号分离算法对音频数据进行分离处理,得到N个音频信息,N个音频信息为不同用户的音频信息;
对N个音频信息中的每个音频信息进行识别,确定N个音频信息中是否包括目标关键词。
在本申请的实施例中,可以先将至少两个拾音装置采集的音频数据进行分离处理,得到N个不同源的音频信息;在N个音频信息中分别识别是否包括目标关键词,从而能够提高识别目标关键词的准确性。
示例性地,盲信号分离算法是指从获取的混合信号(通常是多个传感器的输出)中恢复独立的源信号的算法。
可选地,盲信号分离算法的具体过程可以参见后续图8中的步骤S405的相关描述,或者图9中的步骤S504的相关描述。
可选地,识别音频数据中的目标关键词的具体过程可以参见后续图11的相关描述。
在本申请的实施例中,电子设备可以通过至少两个拾音装置(例如,麦克风)采集拍摄环境中的音频数据;基于音频数据生成切换指令,电子设备基于切换指令自动从当前的第一拍摄模式切换至第二拍摄模式,显示在第二拍摄模式采集的第二图像;在无需用户切换电子设备的拍摄模式的情况下,电子设备能够自动切换拍摄模式完成视频的录制,提高用户的拍摄体验。
图8是本申请实施例提供的视频处理方法的示意性流程图。该视频处理方法400可以由图1所示的电子设备执行;该视频处理方法包括步骤S401至步骤S410,下面分别对步骤S401至步骤S410进行详细的描述。
步骤S401、获取N个拾音装置(例如,麦克风)采集的音频数据。
步骤S402、对音频数据进行音源分离处理,得到M个音频信息。
应理解,音源分离也可以称为声源分离;例如,可以将采集的N路音频数据做傅里叶变换,之后将N路音频数据的频域数据加上超参数送入分离器进行声源分离,得到M个音频信息。
步骤S403、判断每个音频信息中是否包括切换指令;若包括切换指令,则执行步骤S404;若不包括切换指令,则执行步骤S405至步骤S410。
可选地,判断M个音频信息中每个音频信息中是否包括切换指令(目标关键词的一个示例);若M个音频信息中任意一个音频包括切换指令,则执行步骤S404;若M个音频信息中均不包括切换指令,则执行步骤S405至步骤S410。
示例性地,切换指令可以包括但不限于:切换至前置摄像头、切换至后置摄像头、前置录像、后置录像、双景录像、画中画录像等。可选地,切换指令的识别方法可以后续图11所示。
步骤S404、执行切换指令。
应理解,电子设备执行切换指令可以是指无需用户在手动切换相机应用程序的情况下,电子设备可以基于切换指令自动切换电子设备的摄像头。
步骤S405、对音频数据进行方向分离处理,得到前向音频信息和/或后向音频信息。
可选地,对N个麦克风采集的音频数据进行方向分离处理,得到前向音频信息(第一方向的音频数据的一个示例)和/或后向音频信息(第二方向的音频数据的一个示例)。
在本申请的实施例中,若在M个音频信息中检测到切换指令,则电子设备自动执行该切换指令;若在M个音频信息中未检测到切换指令,电子设备可以根据拾音装置采集的N路音频数据得到在电子设备的前向方向的目标角度内的前向音频信息,和/或在电子设备的后向方向的目标角度内的后向音频信息;基于前向音频信息的能量与后向音频信息的能量,进行分析可以得到切换指令;使得电子设备执行相应的切换指令。
示例性地,如图10所示,前向语音波束可以是指在电子设备的前向方向上的音频数据;其中,电子设备的前向方向的目标角度(第一预设角度范围的一个示例)可以为[-30,30];后向语音波束可以是指在电子设备的后向方向上的音频数据;其中,电子设备的后向方向的目标角度(第二预设角度范围的一个示例)可以为[150,210]。
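示例性地,判断波达方向估计得到的角度是否落入前向目标角度[-30,30]或者后向目标角度[150,210]范围内,可以用如下代码示意(返回值的命名为假设):

```python
def direction_of_angle(theta_deg):
    """判断波达方向角(单位: 度)落在前向目标角度还是后向目标角度范围内。"""
    t = theta_deg % 360.0  # 将角度归一化到 [0, 360)
    if t <= 30.0 or t >= 330.0:   # 对应 [-30, 30]
        return "front"
    if 150.0 <= t <= 210.0:
        return "back"
    return "none"
```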
可选地,可以基于拾音装置采集的N路音频数据在电子设备的各个方向的声音方向概率,将N路音频数据分离为前向音频数据和/或后向音频数据;例如,具体实现方法可以参见图6所示的步骤S507至步骤S511。
步骤S406、语音检测处理。
可选地,对前向音频信息和/或后向音频信息进行语音检测处理,得到检测结果。
在本申请的实施例中,对前向音频信息和/或后向音频信息进行语音检测处理是为了确定前向音频信息和/或后向音频信息中是否包括用户的音频信息;若前向音频信息(或者,后向音频信息)包括用户的音频信息,则可以对前向音频信息(或者,后向音频信息)进行放大处理,从而确保能够准确地获取用户的音频信息。
示例性地,语音检测处理可以包括但不限于:语音活性检测,或者其他用户音频信息的检测方法,本申请对此不作任何限定。
步骤S407、对音频数据进行波达方向估计,得到预测角度信息。
可选地,对N个麦克风采集的音频数据进行波达方向估计,得到预测角度信息。
在本申请的实施例中,通过步骤S405与步骤S406可以将拾音装置采集的N路音频数据分为前向音频信息和/或后向音频信息;进一步,通过对拾音装置采集的N路音频数据进行波达方向估计可以得到音频数据对应的角度信息,从而可以确定拾音装置获取的音频数据是否在目标角度范围内;例如,确定音频数据是否在电子设备的前向方向的目标角度范围内,或者,后向方向的目标角度范围内。
可选地,对N个麦克风采集的音频数据进行波达方向估计,得到预测角度信息的具体实现方法可以参见图9所示的步骤S514。
可选地,在每个音频信息中不包括切换指令的情况下,可以执行步骤S405、步骤S406、步骤S408至步骤S410。
步骤S408、对前向音频信息和/或后向音频信息的幅度谱进行放大处理。
示例性地,可以基于语音检测处理的检测结果可以对前向音频信息和/或后向音频信息的幅度谱进行放大处理。
在本申请的实施例中,在前向音频信息(或者,后向音频信息)对应的语音检测的检测结果为包括用户的音频信息时,可以对前向音频信息(或者,后向音频信息)的幅度谱进行放大处理,从而提高获取的用户音频信息的准确性。
示例性地,可以基于语音检测处理的检测结果与预测角度信息对前向音频信息和/或后向音频信息的幅度谱进行放大处理。
在本申请的实施例中,在预测角度信息包括电子设备的前向方向或者后向方向的目标角度时,可以对前向音频信息(或者,后向音频信息)的幅度谱进行放大处理,从而提高幅度谱的准确性;此外,在前向音频信息(或者,后向音频信息)对应的语音检测的检测结果为包括用户的音频信息时,可以对前向音频信息(或者,后向音频信息)的幅度谱进行放大处理,从而提高获取的用户音频信息的准确性;在幅度谱的准确性与用户的音频信息的准确性提升的情况下,能够提高得到的切换指令的准确性。
示例性地,分别计算前向音频信息与后向音频信息的幅度谱;当语音检测处理的检测结果表示前向音频信息中包括用户的音频信息时,可以对前向音频信息的幅度谱进行第一放大处理;或者,当语音活性检测结果表示后向音频信息中包括用户的音频信息时,可以对后向音频信息的幅度谱进行第一放大处理;例如,第一放大处理的放大系数为α(1<α<2)。
示例性地,当基于波达方向估计得到的预测角度信息表示拾音装置采集的N路音频数据包括前向方向的目标角度时,可以对前向音频信息的幅度谱进行第二放大处理;或者,当基于波达方向估计得到的预测角度信息表示拾音装置采集的N路音频数据包括后向方向的目标角度时,可以对后向音频信息的幅度谱进行第二放大处理;例如,第二放大处理的放大系数为β(1<β<2),得到放大处理后的前向音频信息和/或后向音频信息的幅度谱。
可选地,放大处理的具体实现可以参见后续图9所示的步骤S515。
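示例性地,上述两级放大处理可以用如下代码示意;其中α、β的具体取值为假设,仅需满足1<α<2、1<β<2:

```python
def amplify_spectrum(spectrum, voice_detected, angle_in_range, alpha=1.5, beta=1.2):
    """对某一方向的幅度谱做放大处理:
    voice_detected 为该方向语音检测的结果, 为真时乘以第一放大系数 alpha;
    angle_in_range 表示预测角度信息是否落入该方向的目标角度, 为真时再乘以第二放大系数 beta。
    """
    gain = 1.0
    if voice_detected:
        gain *= alpha
    if angle_in_range:
        gain *= beta
    return [v * gain for v in spectrum]
```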
步骤S409、基于放大处理后的前向音频信息和/或后向音频信息得到切换指令。
可选地,基于放大处理后的前向音频信息和/或后向音频信息的幅度谱的能量得到切换指令。
在一个示例中,若放大处理后的前向音频信息的幅度谱与放大处理后的后向音频信息的幅度谱的能量均小于第一预设阈值,则认为电子设备的前向方向上与后向方向上均没有音频数据,则电子设备保持默认镜头录制视频;例如,该切换指令可以对应标识0。
在一个示例中,若放大处理后的前向音频信息的幅度谱,或者放大处理后的后向音频信息的幅度谱中仅有一个能量大于第二预设阈值,则电子设备确定能量大于第二预设阈值的幅度谱对应的方向为主声源方向,将电子设备的镜头切换至该方向;例如,该切换指令可以是切换至后置镜头,该切换指令可以对应标识1;或者,该切换指令可以是切换至前置镜头,该切换指令可以对应标识2。
在一个示例中,若放大处理后的前向音频信息的幅度谱,或者放大处理后的后向音频信息的幅度谱中仅有一个能量大于或者等于第二预设阈值,且另一个能量大于或者等于第一预设阈值,其中,第二预设阈值大于第一预设阈值,则电子设备可以确定能量大于第二预设阈值的幅度谱对应的方向为主声源方向,能量大于第一预设阈值的幅度谱对应的方向为第二声源方向,此时电子设备可以启动画中画录制模式;将能量大于或者等于第二预设阈值的幅度谱对应的方向的画面作为主画面,将能量大于或者等于第一预设阈值的幅度谱对应的方向的画面作为副画面。
例如,若前向音频信息对应的幅度谱的能量大于或者等于第二预设阈值,且后向音频信息对应的幅度谱的能量大于或者等于第一预设阈值,则电子设备的切换指令可以为画中画前置主画,该切换指令可以对应标识3。
例如,若后向音频信息对应的幅度谱的能量大于或者等于第二预设阈值,且前向音频信息对应的幅度谱的能量大于或者等于第一预设阈值,则电子设备的切换指令可以为画中画后置主画,该切换指令可以对应标识4。
在一个示例中,若放大处理后的前向音频信息的幅度谱与放大处理后的后向音频信息的幅度谱的能量均大于或者等于第二预设阈值,电子设备可以确定开启双景录制,即开启前置镜头与后置镜头;可选地,可以将能量较大的方向对应的镜头采集的画面显示在显示屏的上侧或者左侧。
例如,若前向音频信息与后向音频信息对应的幅度谱的能量均大于或者等于第二预设阈值,且前向音频信息对应的幅度谱的能量大于后向音频信息对应的幅度谱的能量,则电子设备的切换指令可以为前后双景录制,将电子设备的前置镜头采集的画面显示在显示屏的上侧或者左侧,该切换指令可以对应标识5。
例如,若前向音频信息与后向音频信息对应的幅度谱的能量均大于或者等于第二预设阈值,且后向音频信息对应的幅度谱的能量大于前向音频信息对应的幅度谱的能量,则电子设备的切换指令可以为后前双景录制,将电子设备的后置镜头采集的画面显示在显示屏的上侧或者左侧,该切换指令可以对应标识6。
步骤S410、执行切换指令。
示例性地,电子设备可以基于放大处理后的前向音频信息和/或放大处理后的后向音频信息的幅度谱得到切换指令,并自动执行该切换指令;即电子设备可以在无需用户手动切换相机应用程序的情况下,基于切换指令自动切换电子设备的摄像头。
在本申请的实施例中,在视频拍摄的场景中,可以根据拍摄环境内的音频数据得到切换指令,使得电子设备能够自动判断是否切换镜头,或者是否开启多镜录像等,使得在用户无需手动操作的情况下实现一镜到底的录像体验,提高用户体验。
图9是本申请实施例提供的视频处理方法的示意性流程图。该视频处理方法500可以由图1所示的电子设备执行;该视频处理方法包括步骤S501至步骤S515,下面分别对步骤S501至步骤S515进行详细的描述。
应理解,图9所示的视频处理方法以电子设备中包括3个拾音装置进行举例说明;由于电子设备需要对音频信息进行方向性判断,因此在本申请的实施例中电子设备中至少包括两个拾音装置,对拾音装置的具体数量不作任何限定。
步骤S501、拾音装置1采集音频数据。
步骤S502、拾音装置2采集音频数据。
步骤S503、拾音装置3采集音频数据。
示例性地,拾音装置1、拾音装置2或者拾音装置3可以位于电子设备中的不同位置,用于采集不同方向的音频信息;例如,拾音装置1、拾音装置2或者拾音装置3可以是指麦克风。
可选地,可以在电子设备检测到用户选择录像模式,开启录制视频后,启动拾音装置1、拾音装置2、拾音装置3开始采集音频数据。
应理解,上述步骤S501至步骤S503可以是同时执行的。
步骤S504、盲信号分离。
可选地,对拾音装置采集的音频数据进行盲信号分离,得到M路音频信息。
应理解,盲信号分离也可以称为盲源分离(blind signal/source separation,BSS),是指在不知道源信号及信号混合参数的情况下,根据混合信号估计源信号。在本申请的实施例中,通过对采集的音频数据进行盲信号分离可以得到不同源的音频信息,即不同对象的音频信号。
示例性地,在电子设备所处的拍摄环境中包括3个用户,分别为用户A、用户B与用户C;通过盲信号分离,可以得到音频数据中的用户A的音频信息、用户B的音频信息与用户C的音频信息。
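作为盲信号分离的一个原理性示意,下面给出一个极简的两源FastICA实现(Python示例;FastICA仅是盲源分离的一种可选算法,混合矩阵、迭代次数等均为示意性假设,并非本申请限定的实现):

```python
import numpy as np

def fastica_2src(X, iters=200, seed=0):
    """极简两源FastICA示意:X为形状(2, n_samples)的混合信号,返回分离后的两路信号。"""
    X = X - X.mean(axis=1, keepdims=True)            # 去均值
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])      # 白化:协方差矩阵特征分解
    Z = (E @ np.diag(d ** -0.5) @ E.T) @ X
    W = np.random.default_rng(seed).standard_normal((2, 2))
    for _ in range(iters):
        g = np.tanh(W @ Z)                            # 非线性函数及其导数(1 - g**2)
        W = (g @ Z.T) / Z.shape[1] - np.diag((1 - g ** 2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)                   # 对称去相关,保持W正交
        W = U @ Vt
    return W @ Z
```

例如,将正弦信号与方波信号经混合矩阵混合后输入该函数,可以近似恢复出两路源信号(恢复结果存在幅度与排列的不确定性,这是盲源分离的固有性质)。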
步骤S505、判断是否包括切换指令。
可选地,判断M路音频信息中是否包括切换指令;若M路音频信息中包括切换指令,则执行步骤S506;若M路音频信息中未包括切换指令,则执行步骤S507至步骤S515。
示例性地,通过步骤S504可以得到M个音频信息;通过对M个音频信息中的每路音频信号进行切换指令识别,确定M个音频信息中的每路音频信号是否包括切换指令;其中,切换指令可以包括但不限于:切换至前置摄像头、切换至后置摄像头、前置录像、后置录像、双景录像、画中画录像等。
可选地,图11是本申请实施例提供的一种切换指令的识别方法的示意性流程图。该识别方法600包括步骤S601至步骤S606,下面分别对步骤S601至步骤S606进行详细的描述。
步骤S601、获取M个音频信息。
可选地,获取分离处理后的M个音频信息。
可选地,步骤S601中也可以是获取拾音装置采集的音频数据,如图8所示的步骤S401。
步骤S602、降噪处理。
可选地,对M个音频信息分别进行降噪处理。
示例性地,降噪处理可以采用任意一种降噪处理算法;例如,降噪处理算法可以包括谱减法或者维纳滤波算法;其中,谱减法的原理是用带噪信号的频谱减去噪声信号的频谱,得到干净信号的频谱;维纳滤波算法的原理是将带噪信号经过线性滤波器变换来逼近原信号,并求均方误差最小时的线性滤波器参数。
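谱减法的基本思路可以用如下最小示例示意(假设噪声幅度谱已由无话段估计得到;谱减下限floor等参数为示意性假设):

```python
import numpy as np

def spectral_subtract(noisy, noise_mag, floor=0.01):
    """谱减法示意:带噪信号的幅度谱减去噪声幅度谱,保留原相位后重建时域信号。"""
    spec = np.fft.rfft(noisy)
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)  # 下限保护,防止出现负幅度
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy))
```

其中noise_mag为与rfft频点数等长的噪声幅度谱估计;噪声估计为零时,输出与输入一致。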
步骤S603、声学模型。
可选地,将降噪处理后的M个音频信息分别输入至声学模型,其中,声学模型为预先训练的深度神经网络。
步骤S604、输出置信度。
可选地,对于M个音频信息中的每一路音频信息输出一个置信度,该置信度用于表示该路音频信息中包括某一切换指令的可能性大小。
步骤S605、确定置信度大于预设阈值。
可选地,将置信度与预设阈值进行比较;在置信度大于预设阈值时,执行步骤S606。
步骤S606、得到切换指令。
应理解,上述步骤S601至步骤S606为举例说明;也可以通过其他识别方法识别音频信息中是否包括切换指令,本申请对此不作任何限定。
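步骤S601至步骤S606的整体流程可以示意如下(其中acoustic_model为假设的声学模型接口,输入一路音频、输出(切换指令, 置信度),模型本身与降噪细节不在本示例范围内):

```python
def detect_switch_command(audio_channels, acoustic_model, threshold=0.5):
    """步骤S601~S606的流程示意:对每一路音频信息经声学模型输出(指令, 置信度),
    置信度大于预设阈值时返回该切换指令;全部未超过阈值则返回None,表示未识别到切换指令。"""
    for audio in audio_channels:
        command, confidence = acoustic_model(audio)  # acoustic_model为假设的模型接口
        if confidence > threshold:
            return command
    return None
```

例如,声学模型对第二路音频给出置信度0.9的"切换至前置摄像头"时,该函数即返回该切换指令。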
步骤S506、执行切换指令。
示例性地,基于步骤S505识别到的切换指令,电子设备自动执行该切换指令。
应理解,电子设备自动执行该切换指令可以是指在无需用户手动操作相机应用程序的情况下,电子设备基于切换指令自动切换电子设备的摄像头。
应理解,步骤S507至步骤S509用于输出M个音频信息的指向性,即确定M个音频信息中前向音频信号与后向音频信号;其中,前向音频信号可以是指在电子设备的前置摄像头的预设角度范围内的音频信号;后向音频信号可以是指在电子设备的后置摄像头的预设角度范围内的音频信号。
步骤S507、声音方向概率计算。
可选地,在M个音频信息不包括切换指令的情况下,对M个音频信息进行声音方向概率计算。
示例性地,基于cACGMM以及3个拾音装置采集的音频数据,可以计算当前输入音频数据的频点在各个方向存在的概率值。
应理解,cACGMM(complex angular central Gaussian mixture model,复角中心高斯混合模型)是一种高斯混合模型;高斯混合模型是指用高斯概率密度函数(例如,正态分布曲线)精确地量化事物,将一个事物分解为若干个基于高斯概率密度函数形成的模型。
例如,音频数据的频点在各个方向的概率值满足以下约束条件:
∑_{k=1}^{K} P_k(t,f)=1
其中,P_k(t,f)表示在k方向的概率值;t表示语音帧(例如,一帧音频数据),f表示频点(例如,一帧音频数据的频率角度);K表示方向的总数。
应理解,在本申请的实施例中,频点可以是指时频点;时频点可以包括音频数据对应的时间信息、频率范围信息与能量信息。
示例性地,在本申请的实施例中K可以为36;由于电子设备的一周为360度,K为36时每隔10度可以设置一个方向。
应理解,上述约束条件可以表示某个频点在所有方向的概率总和为1。
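上述概率约束(某个频点在所有方向的概率总和为1)可以示意如下(K=36对应每隔10度一个方向;由非负得分归一化得到概率仅为一种示意,并非cACGMM的完整实现):

```python
import numpy as np

K = 36  # 每隔10度一个方向,共36个方向

def direction_probs(scores):
    """将每个频点在K个方向上的非负得分归一化为概率,使其满足和为1的约束。"""
    scores = np.asarray(scores, dtype=float)
    return scores / scores.sum(axis=-1, keepdims=True)
```

例如,对形状为(频点数, K)的得分矩阵按最后一维归一化,即可保证每个频点的概率和为1。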
步骤S508、空间聚类。
应理解,在本申请的实施例中,通过空间聚类可以确定音频数据在电子设备的摄像头的视角范围内的概率值。
示例性地,电子设备的屏幕正前方通常为0度方向,为了确保电子设备的摄像头内的音频数据不损失,如图10所示,可以将前向方向的目标角度设置为[-30,30];将电子设备的后向方向的目标角度设置为[150,210];对应的角度方向索引分别为k1~k2,空间聚类概率为:
P(t,f)=∑_{k=k1}^{k2} P_k(t,f)
其中,P(t,f)表示音频数据的频点在目标角度的概率;P_k(t,f)表示音频数据的频点在k方向的概率值。
步骤S509、增益计算。
例如,
g_mask(t,f)=1,当P(t,f)>P_th1时;g_mask(t,f)=g_mask-min,当P(t,f)≤P_th2时;
其中,g_mask(t,f)表示音频数据的频点增益;P_th1表示第一概率阈值;P_th2表示第二概率阈值;g_mask-min表示非目标角度中的音频数据的频点增益。
应理解,当音频数据的频点在目标角度的概率大于第一概率阈值时,可以表示该频点在目标角度范围内;当音频数据的频点在目标角度的概率小于或者等于第二概率阈值,则可以表示该频点在非目标角度范围;例如,第一概率阈值可以为0.8;非目标角度中的音频数据的频点增益可以是预先配置的参数;例如,0.2;第二概率阈值可以为0.1。
还应理解,通过上述音频数据的增益计算可以实现对音频数据的平滑处理;使得目标角度范围内的音频数据的频点增强,非目标角度范围内的音频数据的频点减弱。
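上述增益计算与平滑处理可以示意如下(两个阈值之间采用线性过渡仅为示意性假设;P_th1=0.8、P_th2=0.1、g_mask-min=0.2取自上文示例):

```python
import numpy as np

def freq_bin_gain(P, p_th1=0.8, p_th2=0.1, g_min=0.2):
    """按概率P(t,f)计算频点增益:大于P_th1时增益为1(目标角度内增强),
    小于等于P_th2时增益为g_min(非目标角度内减弱),两者之间线性过渡(示意性假设)。"""
    P = np.asarray(P, dtype=float)
    w = np.clip((P - p_th2) / (p_th1 - p_th2), 0.0, 1.0)
    return g_min + (1.0 - g_min) * w
```

该函数既可作用于标量概率,也可按频点逐点作用于整帧的概率数组。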
步骤S510、后向语音波束。
可选地,基于音频数据的频点增益与傅里叶变换处理,可以得到后向语音波束,即后向音频数据。
示例性地,如图10所示,后向音频数据可以是指在电子设备的后向方向上的音频数据;其中,电子设备的后向方向的目标角度可以为[150,210]。
例如,y_back(t,f)=g_back-mask(t,f)*x_back(t,f);其中,y_back(t,f)可以表示后向音频数据;g_back-mask(t,f)表示后向音频数据的频点增益;x_back(t,f)表示后向音频数据的傅里叶变换。
步骤S511、语音活性检测。
可选地,对后向语音波束(例如,后向音频数据)进行语音活性检测。
示例性地,可以通过倒谱算法对后向音频数据进行语音检测,得到语音活性检测结果;若检测到基频,则确定后向语音波束中包括用户的语音信息;若未检测到基频,则确定后向语音波束中不包括用户的语音信息。
需要说明的是,后向音频数据是指电子设备采集的在后向方向的角度范围中的音频数据;后向音频数据中可以包括拍摄环境中的音频信息(例如,车辆的鸣笛声等),或者用户的语音信息;通过对后向音频数据进行语音检测是为了确定后向语音数据中是否包括用户的语音信息;在后向语音数据中包括用户的语音信息时,在执行后续步骤S515时可以对后向语音数据进行放大处理,从而使得能够提高获取用户的语音信息的准确性。
应理解,倒谱算法是信号处理和信号检测中的方法;所谓倒谱是指信号对数功率谱的功率谱;通过倒谱求语音基频的原理为:由于浊音信号是周期性激励的,因此浊音信号在倒谱上是周期的冲激,从而可以求得基音周期;一般把倒谱波形中的第二个冲激(第一个是包络信息)认为是激励源的基频;基频是语音的特征之一,若存在基频则表示当前音频数据中存在语音。
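倒谱法检测基频的过程可以用如下最小示例示意(帧长、采样率、基频搜索范围与判决阈值均为示意性假设,并非本申请限定的实现):

```python
import numpy as np

def cepstral_pitch_detect(frame, fs=16000, fmin=80, fmax=400):
    """倒谱基频检测示意:对加窗帧的对数幅度谱做逆FFT得到倒谱,
    在基音周期对应的倒谱区间内找冲激峰;峰值足够突出则认为存在语音基频。"""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + 1e-12
    cep = np.fft.irfft(np.log(spec))
    lo, hi = int(fs / fmax), int(fs / fmin)          # 基音周期对应的采样点(倒频)范围
    peak = lo + int(np.argmax(cep[lo:hi]))
    has_pitch = cep[peak] > 3 * np.std(cep[lo:hi])   # 判决阈值为示意性假设
    return has_pitch, fs / peak
```

例如,对一个基频为200Hz的多谐波帧,倒谱在约fs/200个采样点处出现冲激峰,由峰位置即可换算出基频。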
步骤S512、前向语音波束。
可选地,基于音频数据的频点增益与傅里叶变换处理,可以得到前向语音波束,即前向音频数据。
示例性地,如图10所示,前向音频数据可以是指在电子设备的前向方向上的音频数据;其中,电子设备的前向方向的目标角度可以为[-30,30]。
例如,y_front(t,f)=g_front-mask(t,f)*x_front(t,f);其中,y_front(t,f)可以表示前向语音波束;g_front-mask(t,f)表示前向音频数据的频点增益;x_front(t,f)表示前向音频数据的傅里叶变换。
步骤S513、语音活性检测。
可选地,对前向语音波束(例如,前向音频数据)进行语音活性检测。
示例性地,可以通过倒谱算法对前向音频数据进行语音检测,得到语音活性检测结果;若检测到基频,则确定前向语音波束中包括用户的语音信息;若未检测到基频,则确定前向语音波束中不包括用户的语音信息。
需要说明的是,前向音频数据是指电子设备采集的在前向方向的角度范围中的音频数据;前向音频数据中可以包括拍摄环境中的音频信息(例如,车辆的鸣笛声等),或者用户的语音信息;通过对前向音频数据进行语音检测是为了确定前向语音数据中是否包括用户的语音信息;在前向语音数据中包括用户的语音信息时,在执行后续步骤S515时可以对前向语音数据进行放大处理,从而使得能够提高获取用户的语音信息的准确性。
步骤S514、波达方向估计。
可选地,对拾音装置采集的音频数据进行波达方向估计。
应理解,在本申请的实施例中,通过对拾音装置采集的音频数据进行波达方向估计可以得到音频数据对应的角度信息,从而可以确定拾音装置获取的音频数据是否在目标角度范围内;例如,确定音频数据是否在电子设备的前向方向的目标角度范围内,或者,后向方向的目标角度范围内。
示例性地,可以采用高分辨率谱估计的定位算法(例如,基于旋转不变技术的信号参数估计(estimating signal parameter via rotational invariance techniques,ESPRIT))、可控波束形成的定位算法或者基于到达时间差(time difference of arrival,TDOA)的定位算法等,对拾音装置采集的音频数据进行波达方向估计。
其中,ESPRIT是指一种旋转不变性技术算法,其原理主要是基于信号的旋转不变性估算信号参数。可控波束形成的定位算法的原理是将麦克风接收到的信号进行滤波加权求和来形成波束,按照一定的规律对声源位置进行搜索,当波束输出功率达到最大时,搜索到的声源位置即为真实的声源方位。TDOA用于表示声源到达电子设备中不同麦克风的时间差。
在一个示例中,TDOA的定位算法可以包括GCC-PHAT算法;以GCC-PHAT算法为例,对基于音频数据的波达方向估计进行说明;如图12所示,拾音装置1与拾音装置2采集到音频数据;拾音装置1与拾音装置2之间的距离为d,则可以根据GCC-PHAT算法得到音频数据与电子设备之间的角度信息。
例如,图12所示的角度θ可以基于以下公式得到:
τ=arg max IDFT(x_a(t,f)·x_b*(t,f)/|x_a(t,f)·x_b*(t,f)|),θ=arccos(τ·c/d);
其中,τ表示两路音频数据的到达时间差;c表示声速;IDFT表示离散傅里叶变换的逆运算(即离散傅里叶逆变换);x_a(t,f)表示拾音装置1采集的音频数据进行傅立叶变换后得到的频域信息;x_b(t,f)表示拾音装置2采集的音频数据进行傅立叶变换后得到的频域信息;x_b*(t,f)表示x_b(t,f)的共轭;arg表示变元(即自变量argument的英文缩写);arg max表示使后面的公式达到最大值时的变量的取值。
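GCC-PHAT由两路麦克风信号估计时延并换算角度的过程可以示意如下(假设远场模型、声速c=343m/s,角度换算采用arccos形式,均为示意性假设):

```python
import numpy as np

def gcc_phat_angle(x_a, x_b, fs, d, c=343.0):
    """GCC-PHAT示意:由两路麦克风信号估计时延tau(秒),再换算为入射角theta(度)。"""
    n = len(x_a) + len(x_b)
    Xa, Xb = np.fft.rfft(x_a, n=n), np.fft.rfft(x_b, n=n)
    R = Xa * np.conj(Xb)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)   # PHAT加权后的广义互相关
    max_shift = int(fs * d / c)                        # 物理上可能的最大时延(采样点)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs
    theta = np.degrees(np.arccos(np.clip(tau * c / d, -1.0, 1.0)))
    return tau, theta
```

例如,对一路信号及其延迟若干采样点的副本,估计出的时延与设置的延迟一致,再由麦克风间距d换算出入射角。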
步骤S515、数据分析。
可选地,根据波达方向估计得到的角度信息与语音活性检测结果,可以对前向语音波束与后向语音波束进行数据分析,得到切换指令。
示例性地,分别计算前向语音波束与后向语音波束的平均幅度谱;当语音活性检测结果表示前向语音波束中包括用户的音频信息时,可以对前向语音波束的平均幅度谱进行第一放大处理;或者,当语音活性检测结果表示后向语音波束中包括用户的音频信息时,可以对后向语音波束的平均幅度谱进行第一放大处理;例如,第一放大处理的放大系数为α(1<α<2)。
应理解,对前向语音波束中的不同频点的幅度谱取平均值后得到的幅度谱可以称为前向波束的平均幅度谱;对后向语音波束中的不同频点的幅度谱取平均值后得到的幅度谱可以称为后向波束的平均幅度谱;基于前向语音波束的平均幅度谱和/或后向语音波束的平均幅度谱进行数据分析,可以提高前向语音波束和/或后向语音波束中信息的准确性。
进一步地,当基于波达方向估计得到的角度信息确定前向语音波束在前向的目标角度范围时,可以对前向语音波束的平均幅度谱进行第二放大处理;或者,当基于波达方向估计得到的角度信息确定后向语音波束在后向的目标角度范围时,可以对后向语音波束的平均幅度谱进行第二放大处理;例如,第二放大处理的放大系数为β(1<β<2),得到放大处理后的前向语音波束的幅度谱与后向语音波束的幅度谱。
应理解,在本申请的实施例中,对前向语音波束或者后向语音波束进行放大处理是为了调整幅度谱的准确性;此外,在语音波束(例如,前向语音波束和/或后向语音波束)中包括用户的音频信息时,对语音波束的幅度谱进行放大处理,能够提高获取的用户音频信息的准确性;在幅度谱与用户音频信息的准确性提升的情况下,能够准确地得到语音波束中的切换指令。
例如,音频数据中一个频点对应的幅度谱可以通过以下公式计算:
Mag(i)=(1/(K_i−K_{i-1}))·∑_{f=K_{i-1}}^{K_i} |y(t,f)|
其中,Mag(i)表示第i个频点对应的幅度谱;i表示第i个频点;K表示频点范围;K_{i-1}~K_i表示取平均所需要的频点范围;|y(t,f)|表示频点(t,f)处的幅度;应理解,可以无需对所有频点取平均,获取部分频点的平均值。
例如,当语音活性检测结果表示前向语音波束中包括用户的音频信息,且前向语音波束在前向的目标角度范围内时,放大处理后的前向语音波束的平均幅度谱为:
MagFront=MagFront_1*α*β;
其中,MagFront表示放大处理后的前向语音波束的平均幅度谱;MagFront_1表示原始的前向语音波束的平均幅度谱;α表示预设的第一放大系数;β表示预设的第二放大系数。
应理解,对前向语音波束中的不同频点的幅度谱取平均值后得到的幅度谱可以称为前向波束的平均幅度谱。
例如,当语音活性检测结果表示后向语音波束中包括用户的音频信息,且后向语音波束在后向的目标角度范围内时,放大处理后的后向语音波束的平均幅度谱为:
MagBack=MagBack_1*α*β;
其中,MagBack表示放大处理后的后向语音波束的平均幅度谱;MagBack_1表示原始的后向语音波束的平均幅度谱;α表示预设的第一放大系数;β表示预设的第二放大系数。
应理解,对后向语音波束中的不同频点的幅度谱取平均值后得到的幅度谱可以称为后向波束的平均幅度谱。
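平均幅度谱的计算与第一/第二放大处理可以示意如下(分带边界K_i与放大系数α、β的取值为示意性假设):

```python
import numpy as np

def band_avg_mag(spec_mag, edges):
    """按频点边界edges=[K_0, K_1, ..., K_n]对幅度谱分带取平均,得到各带的Mag(i)。"""
    return np.array([spec_mag[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])

def amplified_avg_mag(spec_mag, edges, has_speech, in_target, alpha=1.5, beta=1.5):
    """平均幅度谱的放大处理示意:MagFront = MagFront_1 * α * β。"""
    mag = band_avg_mag(spec_mag, edges).mean()   # 原始平均幅度谱(对应MagFront_1)
    if has_speech:                               # 语音活性检测通过:第一放大处理
        mag *= alpha
    if in_target:                                # 波达方向在目标角度内:第二放大处理
        mag *= beta
    return mag
```

两个判断条件均不成立时,输出即为原始平均幅度谱,不做放大。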
在一个示例中,若MagFront与MagBack的能量均小于第一预设阈值,则认为电子设备的前向方向与后向方向上均没有音频数据,电子设备保持默认镜头录制视频;例如,如表1所示,该切换指令可以对应标识0。
在一个示例中,若MagFront或者MagBack中仅有一个能量大于第二预设阈值,则电子设备确定能量大于第二预设阈值的幅度谱对应的方向为主声源方向,将电子设备的镜头切换至该方向;例如,如表1所示,切换指令可以是切换至后置镜头,该切换指令可以对应标识1;或者,切换指令可以是切换至前置镜头,该切换指令可以对应标识2。
在一个示例中,若MagFront或者MagBack中仅有一个能量大于或者等于第二预设阈值,且另一个能量大于或者等于第一预设阈值,其中,第二预设阈值大于第一预设阈值,则电子设备可以确定能量大于第二预设阈值的幅度谱对应的方向为主声源方向,能量大于第一预设阈值的幅度谱对应的方向为第二声源方向,此时电子设备可以启动画中画录制模式;将能量大于或者等于第二预设阈值的幅度谱对应的方向的画面作为主画面,将能量大于或者等于第一预设阈值的幅度谱对应的方向的画面作为副画面。
例如,若MagFront的能量大于或者等于第二预设阈值,且MagBack的能量大于或者等于第一预设阈值,则电子设备的切换指令可以为画中画前置主画,该切换指令可以对应标识3。
例如,若后向语音波束对应的幅度谱的能量大于或者等于第二预设阈值,且前向语音波束对应的幅度谱的能量大于或者等于第一预设阈值,则电子设备的切换指令可以为画中画后置主画;例如,如表1所示,该切换指令可以对应标识4。
在一个示例中,若MagFront与MagBack的能量均大于或者等于第二预设阈值,电子设备可以确定开启双景录制,即开启前置镜头与后置镜头。可选地,可以将能量较大的方向对应的镜头采集的画面显示在显示屏的上侧或者左侧。
例如,若MagFront与MagBack的能量均大于或者等于第二预设阈值,且MagFront的能量大于MagBack的能量,则电子设备的切换指令可以为前后双景录制,将电子设备的前置镜头采集的画面显示在显示屏的上侧或者左侧;例如,如表1所示,该切换指令可以对应标识5。
例如,若MagFront与MagBack的能量均大于或者等于第二预设阈值,且MagBack的能量大于MagFront的能量,则电子设备的切换指令可以为后前双景录制,将电子设备的后置镜头采集的画面显示在显示屏的上侧或者左侧;例如,如表1所示,该切换指令可以对应标识6。
表1
标识  录制场景
0    保持默认镜头录制
1    切换至后置镜头
2    切换至前置镜头
3    画中画前置主画
4    画中画后置主画
5    前后双景录制
6    后前双景录制
应理解,表1中是对录制场景对应的标识进行举例说明,本申请对此不作任何限定;电子设备在不同的录制场景中,可以自动切换电子设备中的不同摄像头。
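上述基于MagFront与MagBack能量的判决逻辑可以示意如下(阈值取值为示意;返回的标识0~6与表1对应):

```python
def switch_decision(mag_front, mag_back, th1=0.1, th2=0.5):
    """由前/后向平均幅度谱的能量得到切换指令标识(0~6),判决逻辑与表1对应。"""
    if mag_front < th1 and mag_back < th1:
        return 0                                    # 前后均无声源:保持默认镜头
    if mag_front >= th2 and mag_back >= th2:
        return 5 if mag_front > mag_back else 6     # 前后双景 / 后前双景
    if mag_front >= th2:
        return 3 if mag_back >= th1 else 2          # 画中画前置主画 / 切换至前置镜头
    if mag_back >= th2:
        return 4 if mag_front >= th1 else 1         # 画中画后置主画 / 切换至后置镜头
    return 0                                        # 其余情形保持当前拍摄模式(示意)
```

其中th1、th2分别对应第一预设阈值与第二预设阈值(th2大于th1)。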
示例性地,电子设备可以基于放大处理后的前向音频信息和/或放大处理后的后向音频信息的幅度谱得到切换指令,并自动执行该切换指令;即电子设备可以在无需用户手动切换相机应用程序的情况下,基于切换指令自动切换电子设备的摄像头。
在本申请的实施例中,在视频拍摄的场景中,可以根据拍摄环境内的音频数据得到切换指令,使得电子设备能够自动判断是否切换镜头,或者是否开启多镜录像等,使得在用户无需手动操作的情况下实现一镜到底的录像体验,提高用户体验。
图13示出了电子设备的一种图形用户界面(graphical user interface,GUI)。
如图13中的(a)所示,在多镜录像的预览界面中可以包括用于指示设置的控件601;检测到用户点击控件601的操作,响应于用户操作显示设置界面,如图13中的(b)所示;在设置界面上包括声控拍照的控件610;检测到用户开启声控拍照后,在声控拍照中包括自动切换拍摄模式的控件620;检测到用户点击自动切换拍摄模式的控件620后,电子设备可以开启相机应用程序的自动切换拍摄模式;即可以执行本申请实施例提供的视频处理方法,在视频拍摄的场景中,可以根据拍摄环境内的音频数据得到切换指令,使得电子设备能够自动判断是否切换拍摄模式;在无需用户切换电子设备的拍摄模式的情况下完成视频的录制,提高用户的拍摄体验。
在一个示例中,如图14所示,在多镜录像的预览界面中可以包括指示开启自动切换拍摄模式的控件630,检测到用户点击自动切换拍摄模式的控件630后,电子设备可以开启相机应用程序的自动切换拍摄模式;即可以执行本申请实施例提供的视频处理方法,在视频拍摄的场景中,可以根据拍摄环境内的音频数据得到切换指令,使得电子设备能够自动判断是否切换拍摄模式;在无需用户切换电子设备的拍摄模式的情况下完成视频的录制,提高用户的拍摄体验。
图15示出了电子设备的一种图形用户界面(graphical user interface,GUI)。
图15中的(a)所示的GUI为电子设备的桌面640;当电子设备检测到用户点击桌面640上的设置的图标650的操作后,可以显示如图15中的(b)所示的另一GUI;图15中的(b)所示的GUI可以是设置的显示界面,在设置的显示界面中可以包括无线网络、蓝牙或者相机等选项;点击相机选项,进入相机的设置界面,显示如图15中的(c)所示的相机设置界面;在相机设置界面中可以包括自动切换拍摄模式的控件660;检测到用户点击自动切换拍摄模式的控件660后,电子设备可以开启相机应用程序的自动切换拍摄模式;即可以执行本申请实施例提供的视频处理方法,在视频拍摄的场景中,可以根据拍摄环境内的音频数据得到切换指令,使得电子设备能够自动判断是否切换拍摄模式;在无需用户切换电子设备的拍摄模式的情况下完成视频的录制,提高用户的拍摄体验。
应理解,上述举例说明是为了帮助本领域技术人员理解本申请实施例,而非要将本申请实施例限于所例示的具体数值或具体场景。本领域技术人员根据所给出的上述举例说明,显然可以进行各种等价的修改或变化,这样的修改或变化也落入本申请实施例的范围内。
上文结合图1至图15详细描述了本申请实施例提供的视频处理方法;下面将结合图16与图17详细描述本申请的装置实施例。应理解,本申请实施例中的装置可以执行前述本申请实施例的各种方法,即以下各种产品的具体工作过程,可以参考前述方法实施例中的对应过程。
图16是本申请实施例提供的一种电子设备的结构示意图。该电子设备700包括处理模块710与显示模块720;电子设备700还可以包括至少两个拾音装置;例如,至少两个麦克风。
其中,所述处理模块710用于启动所述电子设备中的相机应用程序;显示模块720用于显示第一图像,所述第一图像为所述电子设备处于第一拍摄模式时采集的图像;处理模块710还用于获取音频数据,所述音频数据为所述至少两个拾音装置采集的数据;基于所述音频数据得到切换指令,所述切换指令用于指示所述电子设备从所述第一拍摄模式切换至第二拍摄模式;所述显示模块720还用于显示第二图像,所述第二图像为所述电子设备处于所述第二拍摄模式时采集的图像。
可选地,作为一个实施例,所述电子设备包括第一摄像头与第二摄像头,所述第一摄像头与所述第二摄像头位于所述电子设备的不同方向,所述处理模块710具体用于:
识别所述音频数据中是否包括目标关键词,所述目标关键词为所述切换指令对应的文本信息;
在所述音频数据中识别到所述目标关键词的情况下,基于所述目标关键词得到所述切换指令;
在所述音频数据中未识别所述目标关键词的情况下,对所述音频数据进行处理,得到第一方向的音频数据和/或第二方向的音频数据,所述第一方向用于表示所述第一摄像头对应的第一预设角度范围,所述第二方向用于表示所述第二摄像头对应的第二预设角度范围;基于所述第一方向的音频数据和/或所述第二方向的音频数据,得到所述切换指令。
可选地,作为一个实施例,所述处理模块710具体用于:
基于声音方向概率计算算法对所述音频数据进行处理,得到所述第一方向的音频 数据和/或所述第二方向的音频数据。
可选地,作为一个实施例,所述处理模块710具体用于:
基于第一幅度谱的能量和/或第二幅度谱的能量,得到所述切换指令,所述第一幅度谱为所述第一方向的音频数据的幅度谱,所述第二幅度谱为所述第二方向的音频数据的幅度谱。
可选地,作为一个实施例,所述切换指令包括当前拍摄模式、第一画中画模式、第二画中画模式、第一双景模式、第二双景模式、所述第一摄像头的单摄模式或者所述第二摄像头的单摄模式,所述处理模块710具体用于:
若所述第一幅度谱的能量与所述第二幅度谱的能量均小于第一预设阈值,得到所述切换指令为保持所述当前拍摄模式;
若所述第一幅度谱的能量大于第二预设阈值,且所述第二幅度谱的能量小于或者等于所述第二预设阈值,所述切换指令为切换为所述第一摄像头的单摄模式;
若所述第二幅度谱的能量大于第二预设阈值,且所述第一幅度谱的能量小于或者等于所述第二预设阈值,所述切换指令为切换为所述第二摄像头的单摄模式;
若所述第一幅度谱的能量大于第二预设阈值,且所述第二幅度谱的能量大于或者等于第一预设阈值,所述切换指令为切换为所述第一画中画模式;
若所述第二幅度谱的能量大于第二预设阈值,且所述第一幅度谱的能量大于或者等于第一预设阈值,所述切换指令为切换为所述第二画中画模式;
若所述第一幅度谱的能量与所述第二幅度谱的能量均大于或者等于第二预设阈值,且所述第一幅度谱的能量大于所述第二幅度谱的能量,所述切换指令为切换为所述第一双景模式;
若所述第一幅度谱的能量与所述第二幅度谱的能量均大于或者等于第二预设阈值,且所述第二幅度谱的能量大于所述第一幅度谱的能量,所述切换指令为切换为所述第二双景模式;
其中,所述第二预设阈值大于所述第一预设阈值,所述第一画中画模式是指所述第一摄像头采集的图像为主画面的拍摄模式,所述第二画中画模式是指所述第二摄像头采集的图像为主画面的拍摄模式,所述第一双景模式是指所述第一摄像头采集的图像位于所述电子设备的显示屏的上侧或者左侧的拍摄模式,所述第二双景模式是指所述第二摄像头采集的图像位于所述电子设备的显示屏的上侧或者左侧的拍摄模式。
可选地,作为一个实施例,所述第一幅度谱为所述第一方向的音频数据中各个频点对应的幅度谱取平均得到的第一平均幅度谱;和/或,
所述第二幅度谱为所述第二方向的音频数据中各个频点对应的幅度谱取平均得到的第二平均幅度谱。
可选地,作为一个实施例,所述第一幅度谱为对第一平均幅度谱进行第一放大处理和/或第二放大处理后得到的幅度谱,所述第一平均幅度谱为对所述第一方向的音频数据中各个频点对应的幅度谱取平均得到的。
可选地,作为一个实施例,所述处理模块710具体用于:
对所述第一方向的音频数据进行语音检测,得到第一检测结果;
对所述至少两个拾音装置采集的数据进行波达方向估计,得到预测角度信息;
若所述第一检测结果指示所述第一方向的音频数据包括用户的音频信息,对所述第一方向的音频数据的幅度谱进行所述第一放大处理;和/或,
若所述预测角度信息包括所述第一预设角度范围中的角度信息,对所述第一方向的音频数据的幅度谱进行所述第二放大处理。
可选地,作为一个实施例,所述第二幅度谱为对第二平均幅度谱进行第一放大处理和/或第二放大处理后得到的幅度谱,所述第二平均幅度谱为对所述第二方向的音频数据中各个频点对应的幅度谱取平均得到的。
可选地,作为一个实施例,所述处理模块710具体用于:
对所述第二方向的音频数据进行语音检测,得到第二检测结果;
对所述至少两个拾音装置采集的数据进行波达方向估计,得到预测角度信息;
若所述第二检测结果指示所述第二方向的音频数据包括用户的音频信息,对所述第二方向的音频数据的幅度谱进行所述第一放大处理;和/或
若所述预测角度信息包括所述第二预设角度范围中的角度信息,对所述第二方向的音频数据的幅度谱进行所述第二放大处理。
可选地,作为一个实施例,所述处理模块710具体用于:
基于盲信号分离算法对所述音频数据进行分离处理,得到N个音频信息,所述N个音频信息为不同用户的音频信息;
对所述N个音频信息中的每个音频信息进行识别,确定所述N个音频信息中是否包括所述目标关键词。
可选地,作为一个实施例,所述第一图像为所述电子设备处于多镜录像时采集的预览图像。
可选地,作为一个实施例,所述第一图像为所述电子设备处于多镜录像时采集的视频画面。
可选地,作为一个实施例,所述音频数据是指在所述电子设备所处的拍摄环境中所述拾音装置采集的数据。
需要说明的是,上述电子设备700以功能模块的形式体现。这里的术语“模块”可以通过软件和/或硬件形式实现,对此不作具体限定。
例如,“模块”可以是实现上述功能的软件程序、硬件电路或二者结合。所述硬件电路可能包括专用集成电路(application specific integrated circuit,ASIC)、电子电路、用于执行一个或多个软件或固件程序的处理器(例如共享处理器、专有处理器或组处理器等)和存储器、合并逻辑电路和/或其它支持所描述的功能的合适组件。
因此,在本申请的实施例中描述的各示例的单元,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
图17示出了本申请提供的一种电子设备的结构示意图。图17中的虚线表示该单元或该模块为可选的;电子设备800可以用于实现上述方法实施例中描述的方法。
电子设备800包括一个或多个处理器801,该一个或多个处理器801可支持电子设备800实现方法实施例中的视频处理方法。处理器801可以是通用处理器或者专用处理器。例如,处理器801可以是中央处理器(central processing unit,CPU)、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其它可编程逻辑器件,如分立门、晶体管逻辑器件或分立硬件组件。
处理器801可以用于对电子设备800进行控制,执行软件程序,处理软件程序的数据。电子设备800还可以包括通信单元805,用以实现信号的输入(接收)和输出(发送)。
例如,电子设备800可以是芯片,通信单元805可以是该芯片的输入和/或输出电路,或者,通信单元805可以是该芯片的通信接口,该芯片可以作为终端设备或其它电子设备的组成部分。
又例如,电子设备800可以是终端设备,通信单元805可以是该终端设备的收发器,或者,通信单元805可以是该终端设备的收发电路。
电子设备800中可以包括一个或多个存储器802,其上存有程序804,程序804可被处理器801运行,生成指令803,使得处理器801根据指令803执行上述方法实施例中描述的视频处理方法。
可选地,存储器802中还可以存储有数据。
可选地,处理器801还可以读取存储器802中存储的数据,该数据可以与程序804存储在相同的存储地址,该数据也可以与程序804存储在不同的存储地址。
处理器801和存储器802可以单独设置,也可以集成在一起,例如,集成在终端设备的系统级芯片(system on chip,SOC)上。
示例性地,存储器802可以用于存储本申请实施例中提供的视频处理方法的相关程序804,处理器801可以用于在执行视频处理方法时调用存储器802中存储的视频处理方法的相关程序804,执行本申请实施例的视频处理方法;例如,启动电子设备中的相机应用程序;显示第一图像,第一图像为电子设备处于第一拍摄模式时采集的图像;获取音频数据,音频数据为电子设备中的至少两个拾音装置采集的数据;基于音频数据得到切换指令,切换指令用于指示电子设备从第一拍摄模式切换至第二拍摄模式;显示第二图像,第二图像为电子设备处于第二拍摄模式时采集的图像。
本申请还提供了一种计算机程序产品,该计算机程序产品被处理器801执行时实现本申请中任一方法实施例的视频处理方法。
该计算机程序产品可以存储在存储器802中,例如是程序804,程序804经过预处理、编译、汇编和链接等处理过程最终被转换为能够被处理器801执行的可执行目标文件。
本申请还提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被计算机执行时实现本申请中任一方法实施例所述的视频处理方法。该计算机程序可以是高级语言程序,也可以是可执行目标程序。
该计算机可读存储介质例如是存储器802。存储器802可以是易失性存储器或非易失性存储器,或者,存储器802可以同时包括易失性存储器和非易失性存储器。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM, EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的电子设备的实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
应理解,在本申请的各种实施例中,各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请的实施例的实施过程构成任何限定。
另外,本文中的术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。总之,以上所述仅为本申请技术方案的较佳实施例而已,并非用于限定本申请的保护范围。凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (17)

  1. 一种视频处理方法,其特征在于,应用于电子设备,所述电子设备包括至少两个拾音装置,所述视频处理方法包括:
    运行所述电子设备中的相机应用程序;
    显示第一图像,所述第一图像为所述电子设备处于第一拍摄模式时采集的图像;
    获取音频数据,所述音频数据为所述至少两个拾音装置采集的数据;
    基于所述音频数据得到切换指令,所述切换指令用于指示所述电子设备从所述第一拍摄模式切换至第二拍摄模式;
    显示第二图像,所述第二图像为所述电子设备处于所述第二拍摄模式时采集的图像。
  2. 如权利要求1所述的视频处理方法,其特征在于,所述电子设备包括第一摄像头与第二摄像头,所述第一摄像头与所述第二摄像头位于所述电子设备的不同方向,所述基于所述音频数据得到切换指令,包括:
    识别所述音频数据中是否包括目标关键词,所述目标关键词为所述切换指令对应的文本信息;
    在所述音频数据中识别到所述目标关键词的情况下,基于所述目标关键词得到所述切换指令;
    在所述音频数据中未识别所述目标关键词的情况下,对所述音频数据进行处理,得到第一方向的音频数据和/或第二方向的音频数据,所述第一方向用于表示所述第一摄像头对应的第一预设角度范围,所述第二方向用于表示所述第二摄像头对应的第二预设角度范围;
    基于所述第一方向的音频数据和/或所述第二方向的音频数据,得到所述切换指令。
  3. 如权利要求2所述的视频处理方法,其特征在于,所述对所述音频数据进行处理,得到第一方向的音频数据和/或第二方向的音频数据,包括:
    基于声音方向概率计算算法对所述音频数据进行处理,得到所述第一方向的音频数据和/或所述第二方向的音频数据。
  4. 如权利要求2或3所述的视频处理方法,其特征在于,所述基于所述第一方向的音频数据和/或所述第二方向的音频数据,得到所述切换指令,包括:
    基于第一幅度谱的能量和/或第二幅度谱的能量,得到所述切换指令,所述第一幅度谱为所述第一方向的音频数据的幅度谱,所述第二幅度谱为所述第二方向的音频数据的幅度谱。
  5. 如权利要求4所述的视频处理方法,其特征在于,所述切换指令包括当前拍摄模式、第一画中画模式、第二画中画模式、第一双景模式、第二双景模式、所述第一摄像头的单摄模式或者所述第二摄像头的单摄模式,所述基于第一幅度谱的能量和/或第二幅度谱的能量,得到所述切换指令,包括:
    若所述第一幅度谱的能量与所述第二幅度谱的能量均小于第一预设阈值,得到所述切换指令为保持所述当前拍摄模式;
    若所述第一幅度谱的能量大于第二预设阈值,且所述第二幅度谱的能量小于或者等于所述第二预设阈值,所述切换指令为切换为所述第一摄像头的单摄模式;
    若所述第二幅度谱的能量大于第二预设阈值,且所述第一幅度谱的能量小于或者等于所述第二预设阈值,所述切换指令为切换为所述第二摄像头的单摄模式;
    若所述第一幅度谱的能量大于第二预设阈值,且所述第二幅度谱的能量大于或者等于第一预设阈值,所述切换指令为切换为所述第一画中画模式;
    若所述第二幅度谱的能量大于第二预设阈值,且所述第一幅度谱的能量大于或者等于第一预设阈值,所述切换指令为切换为所述第二画中画模式;
    若所述第一幅度谱的能量与所述第二幅度谱的能量均大于或者等于第二预设阈值,且所述第一幅度谱的能量大于所述第二幅度谱的能量,所述切换指令为切换为所述第一双景模式;
    若所述第一幅度谱的能量与所述第二幅度谱的能量均大于或者等于第二预设阈值,且所述第二幅度谱的能量大于所述第一幅度谱的能量,所述切换指令为切换为所述第二双景模式;
    其中,所述第二预设阈值大于所述第一预设阈值,所述第一画中画模式是指所述第一摄像头采集的图像为主画面的拍摄模式,所述第二画中画模式是指所述第二摄像头采集的图像为主画面的拍摄模式,所述第一双景模式是指所述第一摄像头采集的图像位于所述电子设备的显示屏的上侧或者左侧的拍摄模式,所述第二双景模式是指所述第二摄像头采集的图像位于所述电子设备的显示屏的上侧或者左侧的拍摄模式。
  6. 如权利要求4或5所述的视频处理方法,其特征在于,所述第一幅度谱为所述第一方向的音频数据中各个频点对应的幅度谱取平均得到的第一平均幅度谱;和/或,
    所述第二幅度谱为所述第二方向的音频数据中各个频点对应的幅度谱取平均得到的第二平均幅度谱。
  7. 如权利要求4或5所述的视频处理方法,其特征在于,所述第一幅度谱为对第一平均幅度谱进行第一放大处理和/或第二放大处理后得到的幅度谱,所述第一平均幅度谱为对所述第一方向的音频数据中各个频点对应的幅度谱取平均得到的。
  8. 如权利要求7所述的视频处理方法,其特征在于,所述视频处理方法还包括:
    对所述第一方向的音频数据进行语音检测,得到第一检测结果;
    对所述至少两个拾音装置采集的数据进行波达方向估计,得到预测角度信息;
    若所述第一检测结果指示所述第一方向的音频数据包括用户的音频信息,对所述第一方向的音频数据的幅度谱进行所述第一放大处理;和/或,
    若所述预测角度信息包括所述第一预设角度范围中的角度信息,对所述第一方向的音频数据的幅度谱进行所述第二放大处理。
  9. 如权利要求4或5所述的视频处理方法,其特征在于,所述第二幅度谱为对第二平均幅度谱进行第一放大处理和/或第二放大处理后得到的幅度谱,所述第二平均幅度谱为对所述第二方向的音频数据中各个频点对应的幅度谱取平均得到的。
  10. 如权利要求9所述的视频处理方法,其特征在于,所述视频处理方法还包括:
    对所述第二方向的音频数据进行语音检测,得到第二检测结果;
    对所述至少两个拾音装置采集的数据进行波达方向估计,得到预测角度信息;
    若所述第二检测结果指示所述第二方向的音频数据包括用户的音频信息,对所述第二方向的音频数据的幅度谱进行所述第一放大处理;和/或,
    若所述预测角度信息包括所述第二预设角度范围中的角度信息,对所述第二方向的音频数据的幅度谱进行所述第二放大处理。
  11. 如权利要求2至10中任一项所述的视频处理方法,其特征在于,所述识别所述音频数据中是否包括目标关键词,包括:
    基于盲信号分离算法对所述音频数据进行分离处理,得到N个音频信息,所述N个音频信息为不同用户的音频信息;
    对所述N个音频信息中的每个音频信息进行识别,确定所述N个音频信息中是否包括所述目标关键词。
  12. 如权利要求1至11中任一项所述的视频处理方法,其特征在于,所述第一图像为所述电子设备处于多镜录像时采集的预览图像。
  13. 如权利要求1至11中任一项所述的视频处理方法,其特征在于,所述第一图像为所述电子设备处于多镜录像时采集的视频画面。
  14. 如权利要求1至13中任一项所述的视频处理方法,其特征在于,所述音频数据是指在所述电子设备所处的拍摄环境中所述拾音装置采集的数据。
  15. 一种电子设备,其特征在于,包括:
    一个或多个处理器和存储器;
    所述存储器与所述一个或多个处理器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行如权利要求1至14中任一项所述的视频处理方法。
  16. 一种芯片系统,其特征在于,所述芯片系统应用于电子设备,所述芯片系统包括一个或多个处理器,所述处理器用于调用计算机指令以使得所述电子设备执行如权利要求1至14中任一项所述的视频处理方法。
  17. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储了计算机程序,当所述计算机程序被处理器执行时,使得处理器执行权利要求1至14中任一项所述的视频处理方法。
PCT/CN2022/117323 2021-12-27 2022-09-06 视频处理方法与电子设备 WO2023124200A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22882090.8A EP4231622A4 (en) 2021-12-27 2022-09-06 VIDEO PROCESSING METHOD AND ELECTRONIC DEVICE

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111636357.5 2021-12-27
CN202111636357 2021-12-27
CN202210320689.0A CN116405774A (zh) 2021-12-27 2022-03-29 视频处理方法与电子设备
CN202210320689.0 2022-03-29

Publications (1)

Publication Number Publication Date
WO2023124200A1 true WO2023124200A1 (zh) 2023-07-06

Family

ID=86997448

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117323 WO2023124200A1 (zh) 2021-12-27 2022-09-06 视频处理方法与电子设备

Country Status (2)

Country Link
EP (1) EP4231622A4 (zh)
WO (1) WO2023124200A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063807A1 (en) * 2008-09-10 2010-03-11 Texas Instruments Incorporated Subtraction of a shaped component of a noise reduction spectrum from a combined signal
CN105306815A (zh) * 2015-09-30 2016-02-03 努比亚技术有限公司 一种拍摄模式切换装置、方法及移动终端
CN105681660A (zh) * 2016-01-20 2016-06-15 广东欧珀移动通信有限公司 一种拍摄模式的切换方法和装置
WO2019241920A1 (zh) * 2018-06-20 2019-12-26 优视科技新加坡有限公司 一种终端控制方法和装置
CN113422903A (zh) * 2021-06-16 2021-09-21 荣耀终端有限公司 拍摄模式切换方法、设备、存储介质和程序产品

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007312039A (ja) * 2006-05-17 2007-11-29 Nec Saitama Ltd Tv電話機能付き携帯端末
US8451312B2 (en) * 2010-01-06 2013-05-28 Apple Inc. Automatic video stream selection
CN106303260A (zh) * 2016-10-18 2017-01-04 北京小米移动软件有限公司 摄像头切换方法、装置及终端设备
CN108073381A (zh) * 2016-11-15 2018-05-25 腾讯科技(深圳)有限公司 一种对象控制方法、装置及终端设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063807A1 (en) * 2008-09-10 2010-03-11 Texas Instruments Incorporated Subtraction of a shaped component of a noise reduction spectrum from a combined signal
CN105306815A (zh) * 2015-09-30 2016-02-03 努比亚技术有限公司 一种拍摄模式切换装置、方法及移动终端
CN105681660A (zh) * 2016-01-20 2016-06-15 广东欧珀移动通信有限公司 一种拍摄模式的切换方法和装置
WO2019241920A1 (zh) * 2018-06-20 2019-12-26 优视科技新加坡有限公司 一种终端控制方法和装置
CN113422903A (zh) * 2021-06-16 2021-09-21 荣耀终端有限公司 拍摄模式切换方法、设备、存储介质和程序产品

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4231622A4

Also Published As

Publication number Publication date
EP4231622A4 (en) 2024-04-03
EP4231622A1 (en) 2023-08-23

Similar Documents

Publication Publication Date Title
WO2021196401A1 (zh) 图像重建方法及装置、电子设备和存储介质
WO2021155632A1 (zh) 图像处理方法及装置、电子设备和存储介质
US9692959B2 (en) Image processing apparatus and method
US9235916B2 (en) Image processing device, imaging device, computer-readable storage medium, and image processing method
WO2020192252A1 (zh) 图像生成方法及装置、电子设备和存储介质
US9912859B2 (en) Focusing control device, imaging device, focusing control method, and focusing control program
WO2021056808A1 (zh) 图像处理方法及装置、电子设备和存储介质
WO2020134866A1 (zh) 关键点检测方法及装置、电子设备和存储介质
WO2021031609A1 (zh) 活体检测方法及装置、电子设备和存储介质
WO2019033411A1 (zh) 一种全景拍摄方法及装置
KR102314594B1 (ko) 이미지 디스플레이 방법 및 전자 장치
KR20160026251A (ko) 촬영 방법 및 전자 장치
WO2017124899A1 (zh) 一种信息处理方法及装置、电子设备
WO2017114048A1 (zh) 移动终端及联系人标识方法
CN110661970B (zh) 拍照方法、装置、存储介质及电子设备
WO2020192209A1 (zh) 一种基于Dual Camera+TOF的大光圈虚化方法
WO2020103353A1 (zh) 多波束选取方法及装置
WO2023142830A1 (zh) 切换摄像头的方法与电子设备
WO2024087804A1 (zh) 切换摄像头的方法与电子设备
TW201708928A (zh) 影像產生系統及影像產生方法
CN116405774A (zh) 视频处理方法与电子设备
CN113711123B (zh) 一种对焦方法、装置及电子设备
CN112508959A (zh) 视频目标分割方法、装置、电子设备及存储介质
WO2023124200A1 (zh) 视频处理方法与电子设备
CN115767290B (zh) 图像处理方法和电子设备

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022882090

Country of ref document: EP

Effective date: 20230504


121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22882090

Country of ref document: EP

Kind code of ref document: A1