WO2022068613A1 - Audio processing method and electronic device - Google Patents

Audio processing method and electronic device

Info

Publication number
WO2022068613A1
WO2022068613A1 (PCT/CN2021/119048)
Authority
WO
WIPO (PCT)
Prior art keywords
audio
result
analysis
electronic device
video
Prior art date
Application number
PCT/CN2021/119048
Other languages
English (en)
French (fr)
Inventor
陈代挺
Original Assignee
荣耀终端有限公司 (Honor Device Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 荣耀终端有限公司
Priority to EP21874274.0A (EP4044578A4)
Publication of WO2022068613A1
Priority to US17/740,114 (US11870941B2)

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 3/00: Automatic or semi-automatic exchanges
    • H04M 3/42: Systems providing special services or facilities to subscribers
    • H04M 3/56: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M 3/563: User guidance or feature selection
    • H04M 3/566: User guidance or feature selection relating to a participant's right to speak
    • H04M 3/567: Multimedia conference systems
    • H04M 3/568: Audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; control thereof
    • H04N 23/45: Generating image signals from two or more image sensors of different type or operating in different modes, e.g. a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images
    • H04N 5/00: Details of television systems
    • H04N 5/76: Television signal recording
    • H04N 5/765: Interface circuits between an apparatus for recording and another apparatus
    • H04N 5/77: Interface circuits between a recording apparatus and a television camera
    • H04N 5/772: The recording apparatus and the television camera being placed in the same enclosure
    • H04N 5/91: Television signal processing therefor
    • H04N 5/92: Transformation of the television signal for recording, e.g. modulation, frequency changing; inverse transformation for playback
    • H04N 5/9201: Involving the multiplexing of an additional signal and the video signal
    • H04N 5/9202: The additional signal being a sound signal
    • H04N 5/926: Transformation by pulse code modulation
    • H04N 5/9265: With processing of the sound signal
    • H04N 9/00: Details of colour television systems
    • H04N 9/79: Processing of colour television signals in connection with recording
    • H04N 9/80: Transformation of the television signal for recording; inverse transformation for playback
    • H04N 9/82: The individual colour picture signal components being recorded simultaneously only
    • H04N 9/8205: Involving the multiplexing of an additional signal and the colour video signal
    • H04N 9/8211: The additional signal being a sound signal
    • H04N 9/8227: The additional signal being at least another television signal
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00: Circuits for transducers, loudspeakers or microphones
    • H04R 3/005: Circuits for combining the signals of two or more microphones
    • H04R 2420/00: Details of connection covered by H04R, not provided for in its groups
    • H04R 2420/01: Input selection or mixing for amplifiers or loudspeakers
    • H04R 2499/00: Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R 2499/10: General applications
    • H04R 2499/11: Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDAs, cameras
    • H04S: STEREOPHONIC SYSTEMS
    • H04S 2400/00: Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/15: Aspects of sound capture and related signal processing for recording or reproduction

Definitions

  • the present application relates to the field of audio processing, and in particular, to an audio processing method and electronic device.
  • the present application provides an audio processing method and electronic device. By detecting, from the captured video image, the event that a speaker starts to speak, the weight of the audio corresponding to that video image in the combined multi-channel audio is adjusted, so as to solve the following problem:
  • when the electronic device plays an audio/video file, audio switching must be performed in order to obtain the desired audio content, resulting in a sense of sudden change in the sound.
  • an audio processing method is applied to an electronic device that includes a first camera and a second camera, wherein the first camera captures a first angle of view and the second camera captures a second angle of view, and the method includes:
  • a video recording mode is entered; in the video recording mode, the first camera records a first video picture for the first angle of view; audio of multiple sound channels is recorded, and the audio of the multiple sound channels includes a first audio corresponding to the first angle of view and a second audio corresponding to the second angle of view; at a first moment, a first speaker speaks, and the first speaker is located within the second angle of view;
  • a target video file is generated, the target video file including a third audio and the first video picture, wherein the third audio includes at least part of the first audio and at least part of the second audio;
  • the target video file is played;
  • the first camera is a rear camera
  • the second camera is a front camera.
  • the electronic device records video images from the rear view angle through the rear camera; and the first speaker is located within the front view angle range, and the first speaker may be, for example, a user holding the electronic device.
  • the first camera is a front camera
  • the second camera is a rear camera.
  • the electronic device can record the video picture of the front angle of view through the front camera; and the first speaker is located within the range of the rear angle of view. In this case, the first speaker may be, for example, a photographed subject far away from the electronic device.
  • the speaker in the embodiment of the present application may be a person who speaks during the video recording process and whose voice is recorded, such as: a user holding the electronic device; a photographed subject appearing in the video picture; or a person who does not appear in the video footage but whose voice is recorded.
  • the audios of the multiple sound channels may be audios corresponding to different perspectives, for example, the multi-channel audios respectively correspond to multiple shooting perspectives.
  • multiple audio channels may be simultaneously collected by multiple microphones respectively.
  • different audio streams can be collected through the local microphone and a wireless microphone of the electronic device, and the two channels of audio can correspond to two shooting angles of view respectively.
  • the local microphone may be a microphone installed inside the electronic device
  • the wireless microphone may be a microphone that establishes a wireless connection with the electronic device.
  • the target video file may be a video file, obtained by the electronic device in the video recording mode, that has undergone video and audio processing, such as a file in MP4 format.
  • the third audio in the target video file is the audio obtained by combining the audios of multiple sound channels, including at least part of the first audio and at least part of the second audio.
  • the audio of each channel may be given a gain (weight) of a different proportion.
  • the weight of the second audio can be set to be lower, such as 0.2 or 0.
  • the third audio is encoded according to the encoding manner of other audios in the multiple sound channels.
  • the third audio is encoded according to the encoding method of the second audio.
  • after the electronic device receives the input audios of the multiple sound channels, it can further encode the audio of each channel separately.
  • after adjustment, the sum of the weights of the audio channels in the third audio should be 1.
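The weighted combination described above can be sketched as follows. This is a minimal illustration only; the function names, frame representation, and weight values are assumptions, not the patent's implementation.

```python
# Illustrative sketch of combining multi-channel audio frames with per-channel
# weights that sum to 1 (all names and values are assumptions).

def mix_frames(frames, weights):
    """Mix one audio frame per channel into a single output frame.

    frames  -- list of per-channel frames, each a list of float samples
    weights -- per-channel gains; expected to sum to 1
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("channel weights must sum to 1")
    n = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(n)]

# De-emphasise the second channel with a low weight such as 0.2, as in the
# text above; the first channel then carries the remaining weight 0.8.
first = [0.5, 0.5, 0.5]   # e.g. an audio frame of the first angle of view
second = [1.0, 1.0, 1.0]  # e.g. an audio frame of the second angle of view
mixed = mix_frames([first, second], [0.8, 0.2])  # each sample close to 0.6
```

Setting a channel's weight to 0, as the text allows, simply drops that channel from the mix while keeping the total weight at 1.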
  • with the audio processing method provided by the embodiment of the present application, by detecting the event of the speaker starting to speak based on the captured video picture and adjusting, in the third audio, the audio characteristics of the audio corresponding to that video picture, the switching effect between audios is optimized on the basis of presenting the complete audio, natural and smooth switching between audios is achieved, and the key content in the multi-channel audio is highlighted in a targeted manner to improve the user's listening experience.
  • the audio feature includes volume
  • playing the target video file specifically includes: when the video picture corresponding to the first moment is played, the volume of the second audio in the third audio is increased.
  • taking the moment when the speaker starts to speak as a reference, the method rolls back a preset time period from this moment, i.e. starts i audio frames in advance, and adjusts the weight of the second audio in the third audio until the target weight is reached.
  • the adjusted target weight of the second audio is greater than the weights of the other channels of audio, so that the third audio presents more content of the second audio.
  • the playing audio in the third audio can be switched to the audio corresponding to the speaker's angle of view, so that the user can clearly hear the speaker's voice.
  • the volume of the second audio gradually increases.
  • the volume of the second audio in the currently playing third audio will gradually increase, so that the played audio is gradually switched to the second audio.
  • taking the moment when the speaker starts to speak as the benchmark and rolling back the preset time period from this moment, the weight of the second audio is dynamically increased starting i audio frames in advance.
  • the volume of the second audio can be gradually increased from weak to strong during recording and playback, so as to achieve the effect of switching naturally from the other audio to the second audio and to avoid a sudden change of sound when the video is played.
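The rolled-back, gradual weight ramp described above can be sketched as follows. A linear per-frame ramp is assumed here for illustration; the ramp shape, names, and values are not specified by the patent.

```python
# Illustrative sketch: the second audio's weight ramps from its starting value
# to the target value over the i frames before the detected speaking moment.

def ramp_weights(speak_frame, ramp_frames, start_w, target_w):
    """Return a function giving the second audio's weight at each frame index.

    speak_frame -- frame where the speaker starts to speak
    ramp_frames -- i, the number of frames rolled back before that moment
    """
    def weight_at(frame):
        ramp_start = speak_frame - ramp_frames
        if frame < ramp_start:
            return start_w
        if frame >= speak_frame:
            return target_w
        t = (frame - ramp_start) / ramp_frames  # position inside the ramp
        return start_w + t * (target_w - start_w)
    return weight_at

# Weight rises from 0.2 to 0.8 over the 5 frames before frame 100.
w = ramp_weights(speak_frame=100, ramp_frames=5, start_w=0.2, target_w=0.8)
```

By the speaking frame the weight has already reached its target, so the first spoken syllable is heard at full prominence.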
  • in the recording mode, the second camera records a second video picture from the second angle of view, the electronic device displays a shooting interface, and the shooting interface includes the first video picture and the second video picture;
  • the target video file also includes the second video picture
  • when the electronic device plays the target video file, the electronic device displays the first video picture and the second video picture.
  • the electronic device may, for example, simultaneously play a front-view picture and a rear-view picture, or simultaneously play dual front-view or dual rear-view video pictures.
  • by displaying multiple video pictures, the electronic device enables the user to watch video pictures from different angles of view, and when the speaker in one of the angles of view starts to speak, the played audio starts to switch to the audio corresponding to that angle of view, producing an audio transition effect that matches the content of the video picture.
  • the second camera records a second video image from the second perspective
  • the electronic device displays a shooting interface, and the shooting interface does not include the second video picture.
  • the electronic device plays the target video file, the electronic device does not display the second video image.
  • the electronic device may collect video pictures from different angles of view through multiple cameras. However, during the recording process, the electronic device may display only part of the video pictures; the video pictures that are not displayed may be used by the electronic device for image recognition, to judge whether the speaker within the angle of view corresponding to the undisplayed video picture is speaking.
  • the electronic device collects the video pictures corresponding to the front angle of view and the video pictures corresponding to the rear angle of view through the front camera and the rear camera respectively.
  • in the shooting preview interface of the electronic device, only the video picture corresponding to the rear angle of view may be displayed; and/or, when the video is played, only the video picture corresponding to the rear angle of view may be played.
  • the electronic device can run the front camera in the background to capture the video picture corresponding to the front angle of view. For example, the electronic device does not transmit the data of the front video picture to the display, so during the recording process the shooting preview interface does not show the front picture. In addition, the data of the front video picture is not written into the target video file, so the front video picture is not played during playback.
  • the electronic device uses the front video picture to determine whether the speaker is speaking. When the speaker starts to speak, the volume of the second audio in the third audio is increased, and the played audio is switched to the audio corresponding to the front angle of view.
  • with the audio processing method, when a video is played, only the video picture from part of the angles of view is played, and when a speaker within the range of an unplayed angle of view starts to speak, the played audio can still be switched to the audio corresponding to that speaker's angle of view.
  • this can meet the user's needs for viewing different video pictures and ensures that the audio switching matches the played content.
  • in the recording mode, the second camera records a second video picture from the second angle of view, and at the first moment, the first speaker in the second video picture opens his mouth.
  • the moment when the first speaker opens his mouth can be taken as the moment when the first speaker starts to speak.
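Treating "mouth opens" as "starts to speak" can be sketched as below. This assumes per-frame lip landmarks are already available from some face-tracking step; the landmark representation and the threshold value are illustrative assumptions, not from the patent.

```python
# Illustrative sketch: detect the first frame in which the speaker's mouth is
# open, using a lip-gap-to-face-height ratio (threshold is an assumption).

def mouth_open(upper_lip_y, lower_lip_y, face_height, ratio_threshold=0.05):
    """True if the lip gap exceeds a fraction of the face height."""
    gap = abs(lower_lip_y - upper_lip_y)
    return gap / face_height > ratio_threshold

def first_speaking_frame(landmark_frames):
    """Index of the first frame whose mouth is open, or None if none is."""
    for i, (upper, lower, height) in enumerate(landmark_frames):
        if mouth_open(upper, lower, height):
            return i
    return None

# Three frames of (upper_lip_y, lower_lip_y, face_height); the lip gap widens
# in the last frame, which is taken as the moment speaking starts.
frames = [(100, 102, 200), (100, 103, 200), (100, 115, 200)]
```

A production system would likely smooth this signal over several frames to avoid false triggers, but the single-frame threshold conveys the idea.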
  • at a second moment, a second speaker speaks, and the second speaker is located within the first angle of view;
  • the electronic device plays the target video file;
  • when the picture corresponding to the second moment is played, the audio feature of the first audio in the third audio changes.
  • the first perspective is a rear perspective
  • the second speaker may be a subject within the rear perspective
  • the audio of the played third audio is switched to the audio corresponding to the rear perspective.
  • the volume of the audio corresponding to the rear angle of view increases, highlighting the second speaker's voice.
  • when the video is played and a different speaker starts to speak, the played audio is switched to the audio corresponding to the current speaker's angle of view, so that the user can obtain the content of the current speech in a timely and complete manner.
  • the user does not need to manually switch the playback audio track, which improves the user's listening experience.
  • the volume of the first audio in the third audio gradually increases.
  • the volume of the first audio can be dynamically increased over time, so that when the video is played, the volume of the first audio is gradually increased from weak to strong, achieving a natural switch to the first audio so that the user can clearly hear the voice of the second speaker.
  • the audio processing method provided by the embodiment of the present application, by gradually increasing the volume of the first audio, the effect of naturally switching from other audios to the first audio in the third audio can be realized, avoiding the sudden change of sound when playing the video.
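Switching the mix toward whichever speaker is currently active can be sketched as a choice of target weights. The channel names and the dominant weight below are illustrative assumptions, not values from the patent.

```python
# Illustrative sketch: the channel of the active speaker's angle of view gets
# the dominant target weight; the remainder is split evenly over the other
# channels so that the weights still sum to 1.

def target_weights(channels, active_channel, dominant=0.8):
    """Map each channel name to its target mixing weight."""
    if active_channel not in channels:
        raise ValueError("unknown channel")
    rest = (1.0 - dominant) / (len(channels) - 1)
    return {c: (dominant if c == active_channel else rest) for c in channels}

# The second speaker (within the rear angle of view) starts to speak, so the
# rear channel becomes dominant and the front channel is de-emphasised.
w = target_weights(["front", "rear"], "rear")  # rear near 0.8, front near 0.2
```

Each time a different speaker starts to speak, recomputing the targets and ramping toward them reproduces the per-speaker switching described above.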
  • the electronic device includes a first microphone and a second microphone
  • the first microphone records the first audio
  • the second microphone records the second audio
  • the first microphone records the second audio
  • the second microphone records the first audio
  • the first microphone and the second microphone may be microphone devices installed inside the electronic device, which are local microphones of the electronic device.
  • an electronic device can record audio from different angles of view through multiple local microphones.
  • multiple local microphones can be installed in different positions of the electronic device, and can record audio in different viewing angles.
  • the electronic device includes a first microphone, and the second microphone is wirelessly connected to the electronic device;
  • the first microphone records the first audio
  • the second microphone records the second audio
  • the first microphone records the second audio
  • the second microphone records the first audio
  • the first microphone may be a microphone device installed inside the electronic device, i.e. a local microphone of the electronic device; the second microphone may be a wireless microphone, for example a Bluetooth headset, a Bluetooth speaker, or another user's mobile phone or other device with a microphone function.
  • the electronic device may record audio corresponding to the front view angle through a local microphone, and audio corresponding to the rear view angle may be recorded by the wireless microphone.
  • the wireless microphone may be worn by, for example, the subject within the range of the rear view angle, or the wireless microphone may be placed in a position that is convenient for recording the audio of the rear view angle.
  • the electronic device can be wirelessly connected with the wireless microphone, so that the electronic device can record audio from different locations through the wireless microphone, especially audio from a location far away from the electronic device, thereby increasing the flexibility of audio recording and improving the quality of audio recorded from different angles of view.
  • both the first microphone and the second microphone are wirelessly connected to the electronic device
  • the first microphone records the first audio
  • the second microphone records the second audio
  • the first microphone records the second audio
  • the second microphone records the first audio
  • the first microphone and the second microphone are both wireless microphones and are wirelessly connected to the electronic device.
  • the wireless microphones can be flexibly arranged at different positions, so the wireless microphones can be arranged at positions convenient for recording audio corresponding to different viewing angles according to the shooting angle of view, thereby improving the audio quality and the flexibility of audio recording.
  • when performing dual-channel recording, the first microphone can be worn by the speaker in the front angle of view and the second microphone by the speaker in the rear angle of view, to record the audio of the different speakers respectively. In this case, even if the distance between a speaker and the electronic device changes, the audio recording is not affected.
  • the audio frame of the first audio, the audio frame of the second audio, and the video frame of the first video picture are buffered;
  • the adjustment may be performed from a certain frame before the current frame.
  • the first audio frame may be an audio frame buffered in the buffer at the moment when the mouth opening of the first speaker is detected.
  • determining the starting moment for adjusting the audio features of each channel of audio specifically includes: using the currently buffered first audio frame as a reference, rolling back a preset time length, and starting to combine the multiple channels of audio from that point.
  • the preset time length may be, for example, 100 ms.
  • this avoids the problem that, due to the processing delay of the electronic device, the third audio cannot completely include the target audio content.
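The rollback over buffered frames by a preset duration such as 100 ms can be sketched as follows. The frame duration and function names are assumptions for illustration.

```python
# Illustrative sketch: with audio frames buffered, back up a preset duration
# from the frame in which the mouth opening was detected, so that the weight
# adjustment begins before the first spoken syllable despite processing delay.

import math

def rollback_start(detected_frame, preset_ms=100, frame_ms=20):
    """Index of the buffered frame at which the adjustment should begin."""
    frames_back = math.ceil(preset_ms / frame_ms)  # i frames to rewind
    return max(0, detected_frame - frames_back)

start = rollback_start(detected_frame=50)  # rewinds 5 frames, so start is 45
```

Because the adjustment operates on already-buffered frames, rewinding is just a change of starting index; no audio has been committed to the target file yet.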
  • the first viewing angle and the second viewing angle are any two viewing angles among a front viewing angle, a wide viewing angle, and a zoom viewing angle.
  • an audio processing method is provided, which is applied to an electronic device.
  • the electronic device includes a first camera and a second camera, wherein the first camera captures a first angle of view and the second camera captures a second angle of view, and the method includes:
  • a video recording mode is entered; in the video recording mode, the first camera records a first video picture for the first angle of view; audio of multiple sound channels is recorded, and the audio of the multiple sound channels includes a first audio corresponding to the first angle of view and a second audio corresponding to the second angle of view; at a first moment, a first speaker speaks, and the first speaker is located within the first angle of view;
  • the target video file including a third audio and a first video image, wherein the third audio includes at least part of the first audio and at least part of the second audio;
  • the target video file is played;
  • the first camera is a rear camera
  • the first viewing angle is a rear viewing angle
  • the first video picture is a picture of the rear viewing angle
  • the first audio is sound within the range of the rear viewing angle
  • the first audio may include the speaking voice of the first speaker
  • the first speaker is a photographed subject located within a rear view angle
  • the second viewing angle is the front viewing angle
  • the second audio is the sound within the range of the front viewing angle.
  • the first camera may also be a front camera of the electronic device, the first viewing angle is the front viewing angle, the first video image is the image of the front viewing angle, and the first audio is the sound within the range of the front viewing angle.
  • the second viewing angle is the rear viewing angle, and the second audio is the sound within the range of the rear viewing angle.
  • the third audio is audio after combining audios of multiple sound channels, including at least part of the first audio and at least part of the second audio.
  • with the audio processing method, by detecting the event of the speaker starting to speak based on the captured video picture and dynamically adjusting the weight, in the third audio, of the audio corresponding to that video picture, the complete audio can be presented.
  • on this basis, the switching effect between audios is optimized to achieve natural and smooth switching between audios, and the key content in the multi-channel audio is highlighted in a targeted manner to improve the listening experience of users.
  • the audio feature includes volume
  • playing the target video file includes:
  • when the video picture corresponding to the first moment is played, the volume of the first audio is increased.
  • the playing audio in the third audio can be switched to the audio corresponding to the speaker's angle of view, so that the user can clearly hear the speaker's voice.
  • the volume of the first audio gradually increases.
  • the weight of the first audio can be dynamically increased over time, so that when the video is played, the first audio can be gradually switched from weak to strong to achieve natural switching.
  • the audio processing method provided by the embodiment of the present application, by gradually increasing the volume of the first audio, the effect of naturally switching from other audios to the first audio in the third audio can be realized, avoiding the sudden change of sound when playing the video.
  • an electronic device, including: a plurality of cameras for capturing video pictures;
  • an audio playback component for playing audio;
  • one or more processors;
  • one or more computer programs stored in the memory, the one or more computer programs comprising instructions that, when executed by the electronic device, cause the electronic device to perform the following steps:
  • a video recording mode is entered; in the video recording mode, the first camera records a first video picture for the first angle of view; audio of multiple sound channels is recorded;
  • the audio of the multiple sound channels includes a first audio corresponding to the first angle of view and a second audio corresponding to the second angle of view; at the first moment, the first speaker speaks, and the first speaker is located within the second angle of view;
  • the target video file including a third audio and a first video image, wherein the third audio includes at least part of the first audio and at least part of the second audio;
  • the target video file is played;
  • the audio feature includes volume
  • when the instruction is executed by the electronic device, the electronic device is caused to perform the following step: when the video picture corresponding to the first moment is played, the volume of the second audio is increased.
  • when the instruction is executed by the electronic device, the electronic device is caused to perform the following step: when the video picture corresponding to the first moment is played, the volume of the second audio gradually increases.
  • when the instruction is executed by the electronic device, the electronic device is caused to perform the following steps: in the recording mode, the second camera records a second video picture from the second angle of view, the electronic device displays a shooting interface, and the shooting interface includes the first video picture and the second video picture;
  • the target video file also includes the second video picture
  • when the electronic device plays the target video file, the electronic device displays the first video picture and the second video picture.
  • when the instruction is executed by the electronic device, the electronic device is caused to perform the following steps: in the recording mode, the second camera records a second video picture from the second angle of view, the electronic device displays a shooting interface, and the shooting interface does not include the second video picture;
  • the electronic device plays the target video file, the electronic device does not display the second video image.
  • when the instruction is executed by the electronic device, the electronic device is caused to perform the following steps: in the recording mode, the second camera records a second video picture from the second perspective, and at the first moment, the first speaker in the second video picture opens his mouth.
  • when the instruction is executed by the electronic device, the electronic device is caused to perform the following step: when the picture corresponding to the second moment is played, the volume of the first audio in the third audio is gradually increased.
  • the electronic device includes a first microphone and a second microphone; when the instruction is executed by the electronic device, the electronic device is caused to perform the following steps: In the recording mode, the first microphone records the first audio, and the second microphone records the second audio; or,
  • the first microphone records the second audio
  • the second microphone records the first audio
  • the electronic device includes a first microphone, and the second microphone is wirelessly connected to the electronic device; when the instruction is executed by the electronic device, the electronic device performs the following steps: in the recording mode, the first microphone records the first audio, and the second microphone records the second audio; or,
  • the first microphone records the second audio
  • the second microphone records the first audio
  • both the first microphone and the second microphone are wirelessly connected to the electronic device; when the instruction is executed by the electronic device, the electronic device is caused to execute The following steps: in the recording mode, the first microphone records the first audio, and the second microphone records the second audio; or,
  • the first microphone records the second audio
  • the second microphone records the first audio
  • when the instruction is executed by the electronic device, the electronic device is caused to perform the following step: in the recording mode, buffer the audio frames of the first audio, the audio frames of the second audio, and the video frames of the first video picture;
  • the audio feature of the first audio and the audio feature of the second audio in the third audio are adjusted starting from the i-th audio frame preceding the current audio frame, where i is greater than or equal to 1.
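The buffering and retroactive adjustment described above can be sketched as follows. This is only an illustrative sketch, not the claimed implementation; the buffer length, the linear ramp shape, and the value of i are assumptions chosen for the example:

```python
from collections import deque

# Hypothetical sketch: buffer recent audio frames and, when a speaker is
# detected in the current frame, start adjusting the channel gain i frames
# before the current frame so the transition sounds gradual.
BUFFER_LEN = 8   # frames kept in the buffer (assumed value)
I_BACK = 3       # start the adjustment i frames before the current frame

def apply_retroactive_fade(buffer, i_back):
    """Scale the last `i_back` buffered frames with a linear ramp (0..1)."""
    n = min(i_back, len(buffer))
    for k in range(n):
        # k = 0 is the oldest frame in the ramp, k = n - 1 the current frame
        weight = (k + 1) / n
        idx = len(buffer) - n + k
        buffer[idx] = [sample * weight for sample in buffer[idx]]
    return buffer

buf = deque(maxlen=BUFFER_LEN)
for _ in range(5):
    buf.append([1.0, 1.0])          # dummy two-sample frames of the first audio
faded = apply_retroactive_fade(list(buf), I_BACK)
# the last 3 frames are scaled by 1/3, 2/3 and 3/3; earlier frames are untouched
```

Because the adjustment reaches back into the buffer, the gain change begins before the moment the speaker is detected, which is what makes the volume appear to rise gradually at playback.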
  • an electronic device, comprising: a plurality of cameras for capturing video images; a screen for displaying an interface; an audio playing component for playing audio; one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, and the one or more computer programs comprise instructions that, when executed by the electronic device, cause the electronic device to perform the following steps:
  • a video recording mode is entered; in the video recording mode, the first camera records a first video image from the first viewing angle; audio of multiple sound channels is recorded, and the audio of the multiple sound channels includes the first audio corresponding to the first perspective and the second audio corresponding to the second perspective; at the first moment, the first speaker speaks, and the first speaker is located within the first angle of view;
  • the target video file including a third audio and a first video image, wherein the third audio includes at least part of the first audio and at least part of the second audio;
  • the target video file is played;
  • the audio feature includes volume
  • when the instruction is executed by the electronic device, the electronic device is caused to perform the following step: when the video picture corresponding to the first moment is played, the volume of the first audio is increased.
  • when the instruction is executed by the electronic device, the electronic device is caused to perform the following step: when the video picture corresponding to the first moment is played, the volume of the first audio gradually increases.
  • an audio processing system, comprising an electronic device and at least one wireless microphone, wherein the electronic device is wirelessly connected to the wireless microphone, and the electronic device is configured to perform the audio processing method according to any one of the first aspect or the second aspect;
  • the wireless microphone is used to record audio, and send the recorded audio to the electronic device.
  • in a sixth aspect, an apparatus is provided, the apparatus is included in an electronic device, and the apparatus has a function of implementing the behavior of the electronic device in the above-mentioned aspects and the possible implementations of the above-mentioned aspects.
  • the functions can be implemented by hardware, or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules or units corresponding to the above functions. For example, a display module or unit, a detection module or unit, a processing module or unit, and the like.
  • a computer-readable storage medium, comprising computer instructions that, when executed on an electronic device, cause the electronic device to perform the audio processing method according to any one of the implementations of the first aspect or the second aspect.
  • a computer program product which, when the computer program product runs on a computer, causes the computer to perform the audio processing method according to any one of the first aspect or the second aspect.
  • an electronic device including a screen, a computer memory, and a camera, for implementing the audio processing method according to any one of the implementation manners of the first aspect or the second aspect.
  • FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a software structure of an electronic device according to an embodiment of the present application.
  • FIGS. 3A to 3D are schematic diagrams of user interfaces provided by embodiments of the present application.
  • FIGS. 4A to 4C are schematic diagrams of possible application scenarios of some audio processing methods provided by embodiments of the present application.
  • FIGS. 5A and 5B are schematic diagrams of possible application scenarios of other audio processing methods provided by embodiments of the present application.
  • FIG. 6 is a schematic diagram of a possible application scenario of an audio processing method provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of an audio processing method provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an audio weight change provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of another audio processing method provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a multi-channel audio combination provided by an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of another audio processing method provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a multi-channel audio combination provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of another multi-channel audio combining provided by an embodiment of the present application.
  • the terms "first" and "second" are used for descriptive purposes only, and should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features.
  • a feature defined with "first" or "second" may expressly or implicitly include one or more of that feature.
  • plural means two or more.
  • the recording modes can be divided into single-channel recording mode and multi-channel recording mode (or multi-view recording mode).
  • the electronic device can record a single-channel video image during the recording process, that is, record one channel of video images.
  • the single-channel recording mode can be divided into the following two situations: (1) the recording mode in which the shooting angle of view is the front shooting angle of view (hereinafter referred to as the front single-channel recording mode); (2) the recording mode in which the shooting angle of view is the rear shooting angle of view (hereinafter referred to as the rear single-channel recording mode).
  • the electronic device can record multi-channel video images during the recording process, that is, record multiple channels of video images.
  • the video images of different channels may correspond to different shooting angles of view.
  • the shooting angle of view may be divided according to whether the object to be shot is a front object or a rear object, and/or the size of the zoom factor.
  • the shooting angle of view may include a front angle of view and a rear angle of view; and according to the size of the zoom factor, the rear angle of view may include a wide-angle angle of view (or a rear wide-angle angle of view) and a zoom angle of view (or called the rear zoom angle).
  • the wide-angle angle of view may be a shooting angle of view corresponding to a scene where the zoom factor is less than or equal to the preset value K.
  • the preset value K can be 2, 1.5 or 1, etc.
  • the zoom angle of view may be the shooting angle of view corresponding to the scene where the zoom factor is greater than the preset value K.
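The classification of shooting angles of view by zoom factor can be summarized in a short sketch (illustrative only; the function name `shooting_view` and the chosen preset value K are assumptions, with K = 2, 1.5, 1, etc. as stated above):

```python
# Illustrative sketch: classify the shooting angle of view from the zoom
# factor, using the preset value K from the description.
K = 1.5  # assumed preset value

def shooting_view(zoom_factor, front_facing=False):
    """Return the shooting angle of view for the given camera state."""
    if front_facing:
        return "front"          # front-facing scenes such as a selfie
    # zoom factor <= K -> wide-angle (rear) view; > K -> zoom (rear) view
    return "wide-angle" if zoom_factor <= K else "zoom"
```

For example, under this assumed K, a rear camera at 1x falls in the wide-angle view while 3x falls in the zoom view.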
  • the front angle of view is the shooting angle of view corresponding to the front-facing shooting scene such as a selfie.
  • the shooting angle of view corresponding to each channel of video images is fixed during this video recording process.
  • the multi-channel video recording in this case may also be called multi-view video recording.
  • the multi-channel recording mode in this situation can be further divided into the following situations: (1) the recording mode in which the shooting angle of view includes the front shooting angle of view and the rear shooting angle of view (hereinafter referred to as the front-rear multi-channel recording mode); (2) the recording mode in which the shooting angle of view includes multiple front shooting angles of view but does not include the rear shooting angle of view (hereinafter referred to as the front multi-channel recording mode); (3) the recording mode in which the shooting angle of view includes multiple rear shooting angles of view but does not include the front shooting angle of view (hereinafter referred to as the rear multi-channel recording mode).
  • the shooting modes and the shooting angles of view will be described by taking as an example that the rear shooting angle of view is a wide-angle angle of view and/or a zoom angle of view. Table 1 shows the shooting modes and the corresponding shooting angles of view.
  • the shooting angle of view corresponding to a shooting mode may be any one, or a combination of one or more, of the wide-angle angle of view, the zoom angle of view, and the front angle of view.
  • each shooting mode may include one or more channels, and each channel may correspond to a shooting angle of view.
  • Shooting modes 1-4 are multi-channel recording modes
  • shooting modes 5-6 are single-channel recording modes.
  • the video images recorded in the shooting mode in the multi-channel recording mode can include any combination of the video images in the wide-angle viewing angle, the video images in the zoom viewing angle, or the video images in the front-viewing angle.
  • the shooting angle of view during the video recording process can be changed during the current video recording process.
  • when the speakers in other shooting angles of view do not speak, only the angle of view of the current speaker may be photographed to obtain the corresponding video image;
  • the wide-angle viewing angle and the front viewing angle can be switched.
  • the electronic device displays the video image corresponding to the wide-angle viewing angle on the shooting preview interface; then, when the first speaker stops speaking and the second speaker starts to speak, the shooting viewing angle is switched to the front viewing angle, and the shooting preview interface of the electronic device displays the front video image.
  • the shooting preview interface of the electronic device can simultaneously display the video pictures corresponding to the two viewing angles.
  • in the single-channel recording mode, while recording a single video picture, the electronic device can also record multiple channels of audio (that is, the audio of multiple sound channels), and the multi-channel audio includes the audio corresponding to each video picture.
  • the electronic device can record the audio corresponding to the front perspective (hereinafter referred to as the audio corresponding to the front perspective) while recording the video image corresponding to the front perspective.
  • the electronic device may also record audio corresponding to other perspective ranges outside the front perspective range (hereinafter referred to as audio from other perspectives), for example, recording audio corresponding to the rear perspective.
  • the audio in the front-view range may be the speaker's speaking voice; the audio from other perspectives may be, for example, the voices of other people located outside the front-view range, or sounds in the environment, etc.
  • the speaker in the embodiments of the present application may be a person who speaks during the video recording process and whose voice is recorded, such as: a user holding the electronic device; or a photographed object appearing in the video picture; or a person who does not appear in the video picture but whose voice is recorded.
  • the electronic device in the rear single-channel recording mode, can record the video image corresponding to the rear perspective, and simultaneously record the audio corresponding to the rear perspective (hereinafter referred to as the audio corresponding to the rear perspective).
  • the electronic device may also record audio from other perspectives outside the range of the rear perspective, for example, record audio corresponding to the front perspective.
  • the audio in the rear perspective range can be the speaking voice of the speaker; the audio from other perspectives may be, for example, the voices of other people located outside the rear perspective range, or other sounds in the environment, etc.
  • the electronic device in the multi-channel recording mode, can record audio corresponding to different shooting angles and video images while recording video images corresponding to multiple shooting angles respectively.
  • the electronic device in the front and rear multi-channel recording mode, can separately record the video images corresponding to the front perspective and the rear perspective, and simultaneously record the audio corresponding to the front perspective and the audio corresponding to the rear perspective.
  • the electronic device can also record audio from other perspectives outside the range of the front and rear perspectives.
  • the audio corresponding to the front view angle may be the voice of the speaker in the front video frame
  • the audio corresponding to the rear perspective may be the speaking voice of the speaker in the rear video image
  • the audio corresponding to the front perspective or the audio corresponding to the rear perspective may also include other sounds in the environment.
  • the audio content corresponding to the wide-angle angle of view may include panoramic sounds (that is, the surrounding 360-degree sound) in all directions, and the audio content corresponding to the zoom angle of view mainly includes the sound within the zoom range.
  • the audio content corresponding to the front viewing angle is mainly the sound within the range of the front viewing angle.
  • the electronic device can record the video image corresponding to the wide-angle viewing angle of channel 1, and record the audio corresponding to channel 1 according to the wide-angle viewing angle; the electronic device can record the video image corresponding to the zoom viewing angle of channel 2, and record the audio corresponding to channel 2 according to the zoom viewing angle; the electronic device can record the video image corresponding to the front viewing angle of channel 3, and record the audio corresponding to channel 3 according to the front viewing angle.
  • in the front multi-channel recording mode, the electronic device can record multiple video images corresponding to different front perspectives, and simultaneously record the audio corresponding to the multiple front perspectives. In addition, the electronic device may also record audio from other perspectives outside the range of each front perspective. In this mode, if a front video image includes one or more speakers, the audio corresponding to the front perspective may be the speaking voice of the speaker; or, the audio of the front perspective may also include other sounds in the environment, and so on.
  • the electronic device in the rear multi-channel recording mode, can record multiple video images corresponding to different rear viewing angles, and simultaneously record audio corresponding to the multiple rear viewing angles corresponding to the video images.
  • the electronic device can also record audio from other perspectives outside the range of each rear perspective.
  • the audio corresponding to the rear perspective may be the speaking voice of the speaker; or, the audio of the rear perspective may also include other sounds in the environment, and so on.
  • the correspondence between audio and video images from different viewing angles recorded by the electronic device may be: the audio is mainly audio within the viewing angle range corresponding to the video image.
  • the audio content of the audio corresponding to the front viewing angle mainly includes the sound within the range of the front viewing angle
  • the audio corresponding to the rear viewing angle mainly includes the sound within the range of the rear viewing angle.
  • an embodiment of the present application provides an audio processing method, which can be applied to the video recording mode described above.
  • after the electronic device enters the video recording mode, it can record video images corresponding to different viewing angles, and simultaneously record multi-channel audio in different viewing angle ranges. Then, the electronic device generates an audio-video file including the video picture and a third audio formed from the multi-channel audio.
  • the electronic device also plays the third audio while playing the video; during video playback, if a speaker starts to speak, the volume of the speaker in the third audio will gradually increase, so that the third audio gradually switches from other sounds to the speaker's voice, and each speaker's voice can be played clearly.
  • the third audio of the audio corresponding to the front perspective and the audio from other perspectives is also played.
  • if the speaker in the front view does not start to speak, it can be considered that the voice of the speaker in the front view does not need to be recorded at this time.
  • at this time, the volume of the audio from other perspectives in the third audio is higher, and more audio from other perspectives is presented, such as sounds in the environment outside the front perspective range or the voices of other people, so that the sounds that need to be recorded are captured; then, when the speaker in the front video picture starts to speak, the volume of the audio corresponding to the front perspective in the third audio gradually increases, and the volume of the audio from other perspectives gradually decreases. At this time, the played audio gradually switches to the audio corresponding to the front perspective, the user can hear the speaker's voice more clearly, and noise from other viewing angles (for example, noise in the rear viewing angle) can be effectively avoided.
  • when the speaker in the front video picture stops speaking, the volume of the audio from other perspectives in the third audio can gradually increase again, while the volume of the audio corresponding to the front perspective gradually decreases, and the audio played at this time gradually switches to the speaking voices of other people or other sounds in the environment.
  • when the video is played back, the video images corresponding to the front perspective and the rear perspective are played, and the electronic device also plays the third audio of the audio corresponding to the front perspective and the audio corresponding to the rear perspective.
  • if the speaker in the front video picture is not speaking but the speaker in the rear video picture is speaking, the volume of the audio corresponding to the rear perspective in the third audio is relatively high, and the volume of the audio corresponding to the front perspective is low or even silent; then, when the speaker in the front video picture starts to speak, the volume of the audio corresponding to the front perspective in the third audio gradually increases, and the volume of the audio corresponding to the rear perspective gradually decreases.
  • the third audio is gradually switched from the audio corresponding to the rear perspective to the audio corresponding to the front perspective, so that the third audio presents more of the audio content corresponding to the front perspective;
  • the volume of the audio corresponding to the rear perspective in the third audio gradually increases again, while the volume of the audio corresponding to the front perspective gradually decreases, and the third audio is gradually switched from the audio corresponding to the front perspective to the audio corresponding to the rear perspective.
  • the audio corresponding to the front perspective and the audio corresponding to the rear perspective can be switched back and forth by repeating the above process, so as to achieve the effect of gradually switching from the audio corresponding to the rear perspective to the audio corresponding to the front perspective, and vice versa. When the speaker in the front video picture and the speaker in the rear video picture speak at the same time, both the voice of the speaker in the front video picture and the voice of the speaker in the rear video picture are played.
  • when the video is played, if the played video picture is a multi-channel video picture composed of a wide-angle video picture and a front-view video picture, the audio played by the electronic device can be the third audio of the panoramic audio and the audio corresponding to the front viewing angle; if the played video picture is switched to a zoom-view picture and a front picture, the audio played by the electronic device may be the third audio of the audio corresponding to the zoom range and the audio corresponding to the front viewing angle.
  • the switching process of each channel of audio in the third audio is similar to the process of switching each channel of audio in the front-rear multi-channel recording mode described above, and will not be repeated here.
  • the audio switching scenarios during video playback in other recording modes are similar to those described above.
  • when a speaker starts to speak, the volume of the speaker's speaking voice will gradually increase, and the played third audio gradually switches to the speaker's voice.
  • when another speaker starts to speak, the volume of the voice of the latest speaker will gradually increase, and the volume of the previous speaker will gradually decrease.
  • the third audio is switched from the previous speaker's voice to the current speaker's voice, so that the user clearly hears the different speakers.
  • the electronic device plays the third audio of the multi-channel audio, and each channel of the audio in the third audio can be switched naturally, thereby improving the user's audio experience of video recording.
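The gradual switching described above amounts to mixing the recorded channels into the third audio with time-varying weights. A minimal sketch, assuming frame-based processing, a linear ramp, and hypothetical names (`mix_third_audio`, `RAMP_FRAMES`); this is not the patented implementation:

```python
# Sketch: the third audio is a weighted sum of two recorded channels, and the
# weights ramp linearly over a few frames when the active speaker changes, so
# the switch sounds gradual rather than abrupt.
RAMP_FRAMES = 4  # assumed ramp length, in audio frames

def mix_third_audio(first_audio, second_audio, switch_frame):
    """first_audio / second_audio: lists of frames (lists of samples).
    Before switch_frame the second audio dominates; from switch_frame on,
    the first-audio weight ramps from 0 up to 1 over RAMP_FRAMES frames."""
    mixed = []
    for t, (f1, f2) in enumerate(zip(first_audio, second_audio)):
        if t < switch_frame:
            w1 = 0.0
        else:
            w1 = min(1.0, (t - switch_frame + 1) / RAMP_FRAMES)
        w2 = 1.0 - w1                     # the weights always sum to 1
        mixed.append([w1 * a + w2 * b for a, b in zip(f1, f2)])
    return mixed

a1 = [[1.0]] * 8   # first audio: constant 1.0 (e.g. the new speaker's channel)
a2 = [[0.0]] * 8   # second audio: constant 0.0 (e.g. the previous channel)
out = mix_third_audio(a1, a2, switch_frame=2)
# out ramps 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0 across the 8 frames
```

Keeping the two weights summing to one keeps the overall loudness roughly constant while one channel fades in and the other fades out, which matches the "natural switching" effect described above.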
  • the audio processing methods provided in the embodiments of the present application can be applied to electronic devices.
  • the electronic device may specifically be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or a special camera (for example, a single-lens reflex camera or a card camera), etc.
  • the embodiments of the present application do not impose any restrictions on the specific type of the electronic device.
  • FIG. 1 shows a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
  • there may be multiple microphones 170C and multiple cameras 193, such as a front-facing camera, a rear-facing camera, and the like.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or some components may be combined, or some components may be separated, or the components may be arranged differently.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), an audio processor, a controller, a memory, a video codec, an audio codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in the processor 110 is a cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instructions or data again, they can be called directly from the memory. Repeated accesses are avoided, and the waiting time of the processor 110 is reduced, thereby improving the efficiency of the system.
  • a firmware program is stored in the memory, so that the controller or the processor can implement the audio processing method of the present application through an interface or a protocol.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identification module interface, and/or a universal serial bus interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may contain multiple sets of I2C buses.
  • the processor 110 can be respectively coupled to the touch sensor 180K, the microphone, the camera 193 and the like through different I2C bus interfaces.
  • the processor 110 may be coupled to the touch sensor 180K through an I2C interface, and the processor 110 communicates with the touch sensor 180K through an I2C bus interface to implement the touch function of the electronic device 100 .
  • the I2S interface can be used for audio data transmission.
  • the processor 110 may contain multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 may receive audio signals through an I2S interface to implement the function of recording audio.
  • the PCM interface can also be used for audio communications, sampling, quantizing and encoding analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through a Bluetooth headset; the audio module 170 may also receive, through the PCM interface, audio data collected by the microphone of the Bluetooth headset.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is typically used to connect the processor 110 with the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 can receive the audio signal transmitted by the Bluetooth module through the UART interface, so as to realize the function of recording audio through the wireless microphone in the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
  • the processor 110 communicates with the camera 193 through a CSI interface, so as to realize the photographing function of the electronic device 100 .
  • the processor 110 communicates with the display screen 194 through the DSI interface to implement the display function of the electronic device 100 .
  • the GPIO interface can be configured by software. GPIO can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
  • the GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones.
  • the interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 to supply power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), BeiDou navigation satellite system (BDS), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the electronic device 100 implements a display function through a graphics processing unit (graphics processing unit, GPU), a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform data and geometric calculations, and is used for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a quantum dot light-emitting diode (QLED), etc.
  • electronic device 100 may include one or more display screens 194 .
  • the electronic device 100 may implement a shooting function through an image signal processor (image signal processor, ISP), a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the camera 193 may include a front camera and a rear camera of the electronic device 100, which may be optical zoom lenses or the like; this is not limited in this application.
  • the ISP may be set in the camera 193, which is not limited in this application.
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element can be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
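The conversion from YUV to RGB that the passage attributes to the DSP can be sketched per pixel. This sketch uses the full-range BT.601 coefficients as one common choice; the actual coefficients and color spaces used by the device are not specified in this application, and a real DSP processes whole frames, not single samples:

```python
def yuv_to_rgb(y, u, v):
    # Full-range BT.601 conversion of one 8-bit YUV sample to 8-bit RGB.
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda x: max(0, min(255, round(x)))
    return clamp(r), clamp(g), clamp(b)
```

With u = v = 128 the chroma terms vanish, so a gray YUV sample maps to the same gray level in RGB.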
  • the electronic device 100 may include one or more cameras 193 .
  • the electronic device 100 may include multiple cameras 193, such as at least one front-facing camera and a rear-facing camera, multiple front-facing cameras or multiple rear-facing cameras, and the like.
  • a digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy, and so on.
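The per-frequency-point energy computation mentioned above can be illustrated with a single DFT bin. A real DSP would use an optimized FFT, and the test tone below is only an illustration:

```python
import cmath, math

def bin_energy(samples, k):
    # Energy (squared magnitude) of DFT bin k of a real-valued signal.
    n = len(samples)
    coeff = sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, s in enumerate(samples))
    return abs(coeff) ** 2

# A pure tone at bin 3 concentrates its energy in that frequency point.
tone = [math.sin(2 * math.pi * 3 * i / 32) for i in range(32)]
e3, e5 = bin_energy(tone, 3), bin_energy(tone, 5)
```

Comparing e3 and e5 shows how examining bin energies lets a device decide which frequency point carries the signal.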
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos of various encoding formats, for example, moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, for example, image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, saving audio and video files in the external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as an audio playback function, an image playback function, etc.), and the like.
  • the storage data area can store data (such as audio data, a phone book, etc.) created during the use of the electronic device 100, and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. For example, audio playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into an analog audio signal output, and also for converting an analog audio input into a digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • Speaker 170A, also referred to as a "speaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can enable the user to listen to audio, or listen to a hands-free call or the like through the speaker 170A.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be received by placing the receiver 170B against the human ear.
  • Microphone 170C, also referred to as a "mike" or a "mic", is used to convert sound signals into electrical signals.
  • the user can make a sound with the mouth close to the microphone 170C, inputting the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least two microphones 170C, such as a local microphone or a wireless microphone. In other embodiments, the electronic device 100 may be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the electronic device may collect multiple audio channels through the plurality of microphones 170C.
  • the electronic device can also capture audio through a wireless microphone that is wirelessly connected to the electronic device.
  • the plurality of microphones 170C can convert the acquired sound signals into electrical signals and transmit them to the processor 110 .
  • after receiving the multi-channel audio signals, the audio processor in the processor 110 performs processing on the multi-channel audio signals, for example, encoding each audio channel through an audio codec, etc.
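Before each channel's audio is handed to a codec, multi-channel samples are commonly arranged frame by frame. The sketch below shows such an interleaving step under the assumption of equal-length integer sample lists; the patent does not specify the codec or sample layout actually used:

```python
def interleave(channels):
    # Arrange equal-length mono channels into one frame-ordered stream:
    # [L0, R0, L1, R1, ...] for a two-channel input.
    assert len({len(c) for c in channels}) == 1, "channels must be equal length"
    return [sample for frame in zip(*channels) for sample in frame]

# Two three-sample channels interleaved into one stream.
stream = interleave([[1, 2, 3], [10, 20, 30]])
```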
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the electronic device 100 can measure the distance through infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, accessing application locks, taking pictures with fingerprints, answering incoming calls with fingerprints, and the like.
  • Touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
  • FIG. 2 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into five layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime (Android runtime) and system library, a hardware abstraction layer (HAL), and a kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message and so on.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include window managers, content providers, view systems, telephony managers, resource managers, notification managers, and the like.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • Data can include videos, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • for example, a display interface including a short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide the communication function of the electronic device 100 .
  • for example, the management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
  • Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
  • a system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the HAL layer is an interface layer between the operating system kernel and the hardware circuit, which can abstract the hardware.
  • the HAL layer includes audio processing modules.
  • the audio processing module can be used to process the analog audio electrical signal obtained by the microphone according to the shooting angle of view, and generate audio corresponding to different shooting angles of view and video images.
  • the audio processing module may include a timbre correction module, a stereo beamforming module, a gain control module, and the like.
  • the audio processing module may include a timbre correction module, a stereo/mono beamforming module, an ambient noise control module, a gain control module, and the like.
  • the audio processing module may include a timbre correction module, a stereo/mono beam rendering module, a vocal enhancement module and a gain control module, etc.
  • the kernel layer is the layer between the hardware layer and the aforementioned software layer.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • the hardware layer may include a camera, a display screen, a microphone, a processor, and a memory.
  • the display screen in the hardware layer can display the shooting preview interface, the video preview interface and the shooting interface during video recording.
  • the camera in the hardware layer can be used to capture multiple video images.
  • Microphones in the hardware layer can be used to collect sound signals and generate analog audio electrical signals.
  • the audio processing module in the HAL layer can be used to process the digital audio data converted from the analog audio electrical signal, so as to generate audio corresponding to different shooting angles and video images.
  • the display screen can display the video playback interface, and the speaker can play the multi-channel audio that the user cares about and the third audio of the multi-channel audio, thereby improving the audio experience of the user's multi-channel recording.
  • FIGS. 3A to 3D provide schematic diagrams of a graphical user interface (graphical user interface, GUI) in the audio processing process.
  • the screen display system of the mobile phone displays the currently output interface content 301
  • the interface content 301 is the main interface of the mobile phone.
  • the interface content 301 displays multiple applications (application, App), such as gallery, settings, music, camera and other applications. It should be understood that the interface content 301 may also include other more applications, which are not limited in this application.
  • the shooting interface 303 may include a viewfinder frame, an album icon, a shooting control 304, a rotation control, and the like.
  • the viewfinder frame is used to obtain the image of the shooting preview, and display the preview image in real time, such as the preview image of the person in the rear perspective as shown in FIG. 3B .
  • the album icon is used to quickly enter the album.
  • the shooting control 304 is used for shooting or video recording.
  • when the mobile phone detects that the user clicks the shooting control 304 in the photographing mode, the mobile phone performs a shooting operation and saves the shot; when the mobile phone detects that the user clicks the shooting control 304 in the video recording mode, the mobile phone performs the video recording operation and saves the recorded video.
  • the camera rotation controls are used to control the switching between the front camera and the rear camera.
  • the photographing interface 303 also includes function controls for setting photographing modes, such as aperture photographing mode, night scene photographing mode, portrait photographing mode, photographing mode, video recording mode, professional mode and more as shown in FIG. 3B .
  • more modes may also include slow motion mode, panorama mode, black and white art mode, dual-scene video mode, filter mode, high-dynamic range (HDR) mode, multi-channel recording mode (not shown in the figure), etc.
  • the mobile phone defaults to the camera mode after opening the camera application, which is not limited in this application.
  • when the electronic device detects that the user clicks the recording icon on the camera's shooting interface 303, it can enter the single-channel recording mode, for example, enter the rear single-channel recording mode by default; when the electronic device detects that the user clicks the camera rotation control, the viewing angle of the recording is switched from the rear viewing angle to the front viewing angle, and the recording mode is switched to the front single-channel recording mode.
  • an interface as shown in FIG. 3D is displayed, and the interface may be referred to as a more function interface.
  • the electronic device detects that the user clicks the dual-view recording icon on the more function interface, it enters the dual-view recording mode.
  • the image preview interface of the electronic device displays a video image from a front perspective and a video image from a rear perspective (for example, a zoom perspective) by default.
  • the video picture displayed on the image preview interface can be switched. For example, if it is detected that the user clicks the camera rotation control once, the image preview interface displays dual front video pictures; if it is detected that the user clicks the camera rotation control again, the image preview interface displays dual rear video pictures, and so on.
  • speaker 1 is the first speaker
  • speaker 2 is the second speaker
  • audio 2 is the first audio
  • audio 1 is the second audio
  • speaker 1 is the second speaker
  • speaker 2 is the first speaker
  • audio 1 is the first audio
  • audio 2 is the second audio
  • speaker 1 is the first speaker
  • speaker 2 is the second speaker
  • audio 1 is the first audio
  • audio 2 is the second audio.
  • the multi-channel audio can be recorded by a plurality of microphones.
  • the electronic device includes a plurality of microphones (the microphones of the electronic device may be referred to as local microphones), and the plurality of local microphones may be arranged at different positions in the electronic device to record audio from different perspectives; in one embodiment, the electronic device can be wirelessly connected to at least one wireless microphone, the electronic device can use the audio collected by one wireless microphone to record audio from one perspective, and the electronic device can also use the audio collected by multiple wireless microphones to record audio from multiple perspectives.
  • the wireless microphone may be, for example, a wireless headset with a recording function, a wireless speaker, a tablet computer, a wearable device, or a mobile phone of other users. Recording audio using the audio captured by a wireless microphone allows for a clearer capture of the voice of the speaker in the camera's perspective.
  • the video recording modes in the embodiments of the present application can be divided into the following modes: front single-channel recording mode, rear single-channel recording mode, front-and-rear multi-channel recording mode, front multi-channel recording mode, and rear multi-channel recording mode.
  • by venue, the video recording in the embodiment of the present application can be divided into: outdoor video recording and indoor video recording.
  • the video recording in the embodiments of the present application can be divided into: a situation in which multiple local microphones participate in recording and no wireless microphone participates; a situation in which a plurality of wireless microphones participate in recording and the local microphone does not; and a situation in which both the local microphone and the wireless microphone participate in the recording.
  • the electronic device can record multi-channel audio through the audio collected by at least one local microphone and/or at least one wireless microphone, and the multi-channel audio includes at least the sound of each shooting angle of view.
  • the audio processing method provided by the embodiment of the present application can be applied to various combination scenarios of the above-mentioned recording venue, recording mode, and recording type.
  • the video recording process involved in the audio processing method provided by the embodiment of the present application is described below with reference to some of the combination scenarios.
  • the electronic device may establish a wireless connection with the wireless microphone in advance.
  • the electronic device may display a prompt message, prompting the user which microphones to use for recording, and may prompt the user whether a wireless microphone is required to participate in this recording; the user can click the confirm or cancel button as needed. When the electronic device detects that the user clicks the cancel button, multiple local microphones can be activated to record multi-channel audio; when the electronic device detects that the user clicks the confirm button, it can continue to prompt the user to choose which wireless microphone to record with, so that the user can choose among the available wireless microphones, and the electronic device can also prompt the user whether the local microphone is required to participate in the recording. When the electronic device detects that the user does not need the local microphone to participate in the recording, multiple wireless microphones are used for recording during the recording process; when the electronic device detects that the user chooses that the local microphone is required to participate in the recording, both the local microphone and the selected wireless microphones are used for recording.
  • the electronic device can first prompt the user whether the local microphone is required to participate in the recording and detect the selection input by the user, and then prompt the user whether a wireless microphone is required to participate in the recording, which microphone or microphones should participate, and so on. This application does not limit this.
  • FIG. 4A it is a schematic diagram of a recording scene.
  • This scenario can be an outdoor venue where the electronic device uses the local microphone and the wireless microphone to record multi-channel audio in the front single-channel recording mode.
  • after the electronic device enters the front single-channel recording mode, as shown in FIG. 4A , the electronic device records the image of speaker 1 in the front view through the front camera, and the shooting preview interface of the electronic device displays the front video picture. During the recording process, the local microphone of the electronic device records the speaking voice of speaker 1 (referred to as audio 1), and the wireless microphone located at position 1 (which may be the wireless headset of speaker 2, a microphone on a mobile phone, or another device) records the sound within its pickup range, such as the speaking voice of speaker 2 (recorded as audio 2). Audio 1 and audio 2 may be stored in the electronic device's buffer.
  • position 1 may be outside the range of the front viewing angle, for example, position 1 is located in the rear viewing angle. However, in some other implementations, position 1 may also be within the range of the front viewing angle.
  • the front video image displayed on the shooting preview interface may also include the image of the speaker 2 .
  • the wireless microphone can send the audio 2 to the electronic device through the wireless connection.
  • the electronic device stops recording and exits the recording mode in response to the user's click operation.
  • the electronic device packages the audio and video to generate a video file, where the video file includes the front video image and a third audio, and the third audio includes at least part of audio 1 and at least part of audio 2 .
  • in the case where audio 1 and audio 2 are being recorded all the time, the third audio includes part of audio 1 and part of audio 2, formed by combining part of audio 1 with part of audio 2; during the switching between audio 1 and audio 2, the third audio is formed by combining audio 1 and audio 2 according to a set weight.
  • in the case where audio 1 is recorded only when speaker 1 is speaking and audio 2 is recorded only when speaker 2 is speaking, the third audio includes all of audio 1 and all of audio 2, formed by combining audio 1 and audio 2; in the combined audio, audio 1 and audio 2 are combined according to the set weight.
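The "combined according to the set weight" step above can be sketched sample by sample. The weight values and the linear crossfade used during switching are illustrative assumptions, since the application does not specify them:

```python
def mix_weighted(audio1, audio2, w1=0.5, w2=0.5):
    # Combine two equal-length tracks with fixed weights.
    return [w1 * a + w2 * b for a, b in zip(audio1, audio2)]

def crossfade(audio1, audio2):
    # During switching, ramp audio 1 out and audio 2 in linearly.
    n = len(audio1)
    return [((n - 1 - i) / (n - 1)) * a + (i / (n - 1)) * b
            for i, (a, b) in enumerate(zip(audio1, audio2))]

mixed = mix_weighted([1.0, 1.0], [0.0, 2.0])
faded = crossfade([1.0] * 5, [0.0] * 5)
```

The crossfade starts entirely on audio 1 and ends entirely on audio 2, which is one simple way to realize the weighted transition between the two tracks.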
  • the electronic device can save the video file, which can be stored in an internal memory (internal memory) or an external memory (external memory), such as in an album icon.
  • the video file (that is, the target video file) finally saved in the album is the video file processed by the electronic device; for example, the electronic device may increase the volume of audio 1 in the third audio, so that the volume of audio 1 increases when the speaker starts to speak, and so on.
  • the above processing process can be completed inside the electronic device until the final video file is obtained and saved in the album.
  • in some embodiments, while the electronic device records the front view through the front camera, the rear camera also records the rear view in the background, and the shooting preview interface of the electronic device does not display the rear video picture
  • the rear video picture recorded by the rear camera is stored, for example, in the cache of the electronic device, so as to detect the mouth-opening action of speaker 2; for example, at time t1, speaker 2 opens his mouth and starts to speak.
  • the electronic device displays the front video picture, and when the picture corresponding to time t1 is played, the audio characteristics of the audio 2 change, for example, the sound of the audio 2 increases.
  • the front video picture recorded by the front camera of the electronic device is stored, for example, in the cache of the electronic device, so as to detect the mouth-opening action of speaker 1; for example, at time t2, speaker 1 opens his mouth and starts to speak.
  • the electronic device displays the front video picture, and when the picture corresponding to time t2 is played, the audio characteristics of the audio 1 change, for example, the sound of the audio 1 increases.
  • in some embodiments, while the electronic device records the front view through the front camera, the rear camera also records the rear view in the background, and the shooting preview interface does not display the rear video picture; both the front video picture and the rear video picture are stored so as to detect the actions of speaker 1 and speaker 2; for example, at time t3, speaker 1 opens his mouth and starts to speak, and at time t4, speaker 2 opens his mouth and starts to speak.
  • the electronic device displays the front video picture, and when the picture corresponding to time t3 is played, the audio characteristics of the audio 1 change, for example, the sound of the audio 1 increases.
  • when the picture corresponding to time t4 is played, the audio characteristics of audio 2 change; for example, the volume of audio 2 increases.
  • the above processing process can also be completed on a cloud server.
  • the electronic device and the wireless microphone can send the acquired video and audio to the cloud server; alternatively, the wireless microphone first sends the recorded audio to the electronic device, which then forwards it to the cloud server. The cloud server completes the above processing, generates the final video file, and sends it to the electronic device, which saves the video file in the album.
  • this processing method can be adopted in each recording scene; to avoid repetition, it is not described again in the descriptions of the other scenes below.
  • FIG. 4B is a schematic diagram of a scenario in which the electronic device uses two wireless microphones to record multi-channel audio in the rear single-channel recording mode.
  • the wireless microphone 1 can be, for example, a wireless headset worn by speaker 1 within the front viewing angle
  • the wireless microphone 2 can be, for example, the mobile phone (or wireless headset) of speaker 2, carried by speaker 2 within the rear viewing angle.
  • alternatively, the audio of the front viewing angle can also be recorded through the local microphone.
  • the types of wireless microphones are not limited to the wireless earphones and mobile phones shown in FIG. 4B , but may also be other devices with a recording function, which are not limited in this application.
  • after the electronic device enters the rear single-channel recording mode, as shown in FIG. 4B, the electronic device records the video picture of speaker 2 in the rear viewing angle through the rear camera, and the shooting preview interface displays the rear video picture; during recording, the wireless microphone 1 worn by speaker 1 records the speaking voice of speaker 1 (denoted as audio 1), and the wireless microphone 2 carried by speaker 2 records the speaking voice of speaker 2 (denoted as audio 2).
  • meanwhile, the front camera of the electronic device is turned on in the background to record the image of speaker 1, where the image of speaker 1 is used by the electronic device during audio processing to identify whether speaker 1 is speaking.
  • "the front camera is turned on in the background" here means that during recording the front camera captures the front-view video picture in real time, but the shooting preview interface does not display the front video picture; the video file does not include the front video picture, and the playback interface does not display it during subsequent playback.
  • in some embodiments, while the electronic device records the rear view through the rear camera, the front camera also records the front view in the background, and the shooting preview interface of the electronic device does not display the front video picture
  • the front video picture recorded by the front camera is stored, for example, in the cache of the electronic device, so as to detect the mouth-opening action of speaker 1; for example, at time t5, speaker 1 opens his mouth and starts to speak.
  • when playing the target video file, the electronic device displays the rear video picture, and when the picture corresponding to time t5 is played, the audio characteristics of audio 1 change; for example, the volume of audio 1 increases.
  • the rear video picture recorded by the electronic device through the rear camera is stored, for example, in the cache of the electronic device, so as to detect the mouth-opening action of speaker 2; for example, at time t6, speaker 2 opens his mouth and starts to speak.
  • when playing the target video file, the electronic device displays the rear video picture, and when the picture corresponding to time t6 is played, the audio characteristics of audio 2 change; for example, the volume of audio 2 increases.
  • in some embodiments, while the electronic device records the rear view through the rear camera, the front camera also records the front view in the background, and the shooting preview interface does not display the front video picture; both the front and rear video pictures are stored to detect the mouth movements of speaker 1 and speaker 2; for example, at time t7, speaker 2 opens his mouth and starts to speak, and at time t8, speaker 1 opens his mouth and starts to speak.
  • when playing the target video file, the electronic device displays the rear video picture, and when the picture corresponding to time t7 is played, the audio characteristics of audio 2 change; for example, the volume of audio 2 increases.
  • when the picture corresponding to time t8 is played, the audio characteristics of audio 1 change; for example, the volume of audio 1 increases.
  • the wireless microphone 1 sends the audio 1 to the electronic device, and the wireless microphone 2 sends the audio 2 to the electronic device.
  • the electronic device stops recording and exits the recording mode in response to the user's click operation.
  • after the video recording is completed, the electronic device generates a video file, where the video file includes the rear video picture and a third audio, and the third audio is a combination of audio 1 and audio 2.
  • the electronic device can save the video file, which can be accessed, for example, through the album icon.
  • the video file finally saved in the album is the video file processed by the electronic device.
  • for example, the multi-channel audio is merged, image recognition is performed on speaker 1, and the volume of audio 1 in the third audio is increased, so that the volume of audio 1 increases when speaker 1 starts to speak.
  • the above processing process can be completed inside the electronic device until the final video file is obtained and saved in the album.
  • FIG. 4C is a schematic diagram of a scenario in which the electronic device uses two wireless microphones to record multi-channel audio in the front-and-rear multi-channel recording mode.
  • the wireless microphone 1 can be, for example, a wireless headset worn by speaker 1 within the front viewing angle
  • the wireless microphone 2 can be, for example, the wireless headset (or mobile phone) of speaker 2, carried by speaker 2 within the rear viewing angle.
  • alternatively, the audio of the front viewing angle can also be recorded through the local microphone.
  • the types of wireless microphones are not limited to the wireless earphones and mobile phones shown in FIG. 4C , but may also be other devices with a recording function, which are not limited in this application.
  • after the electronic device enters the front-and-rear multi-channel recording mode, as shown in FIG. 4C, the electronic device records the video picture of speaker 1 in the front viewing angle through the front camera and the video picture of speaker 2 in the rear viewing angle through the rear camera, and the shooting preview interface of the electronic device displays both the front and rear video pictures; during recording, the wireless microphone 1 worn by speaker 1 records the speaking voice of speaker 1 (denoted as audio 1)
  • the wireless microphone 2 carried by speaker 2 records the voice of speaker 2 (denoted as audio 2).
  • an image of the speaker 1 is recorded, wherein the image of the speaker 1 is used for the electronic device to identify whether the speaker 1 speaks during audio processing.
  • the front video picture and the rear video picture are stored at the same time, so as to detect the mouth-opening actions of speaker 1 and speaker 2; for example, at time t9, speaker 2 opens his mouth and starts to speak, and at time t10, speaker 1 opens his mouth and starts to speak.
  • the electronic device displays the front video picture and the rear video picture.
  • when the picture corresponding to time t9 is played, the audio characteristics of audio 2 change; for example, the volume of audio 2 increases
  • when the picture corresponding to time t10 is played, the audio characteristics of audio 1 change; for example, the volume of audio 1 increases.
  • the wireless microphone 1 sends the audio 1 to the electronic device, and the wireless microphone 2 sends the audio 2 to the electronic device.
  • when the user clicks the stop-recording control, the electronic device exits the recording mode in response to the click operation. After the video recording is completed, the electronic device generates a video file, where the video file includes the front and rear video pictures and a third audio, and the third audio is a combination of audio 1 and audio 2. The electronic device can save the video file, which can be accessed, for example, through the album icon.
  • the video file finally saved in the album is the video file processed by the electronic device.
  • for example, the multi-channel audio is merged, image recognition is performed on speaker 1, and the volume of audio 1 in the third audio is increased, so that the volume of audio 1 increases when speaker 1 starts to speak.
  • the above processing process can be completed inside the electronic device until the final video file is obtained and saved in the album.
  • the electronic device records two channels of audio when recording.
  • when the electronic device performs video recording, it can also record three or more channels of audio, and the third audio can include three or more channels of audio.
  • the first audio, the second audio and the third audio can also be stored in the internal memory or the external memory, and the user can select and synthesize different audios to increase flexibility.
  • the user may also be prompted to select a wireless microphone at a suitable location for recording based on the positioning function between the electronic device and the wireless microphone.
  • FIG. 5A and FIG. 5B are schematic diagrams of an indoor scene in which, in the front-and-rear dual-channel recording mode, the electronic device uses the local microphone and wireless microphones to jointly participate in recording.
  • the electronic device and the wireless microphone can be connected to the same access point AP, or use the same WI-FI.
  • the electronic device sends a broadcast message requesting establishment of a wireless connection (such as pairing); after receiving the broadcast message, the wireless microphone establishes a wireless connection with the electronic device according to the broadcast message, that is, pairing is achieved. Alternatively, the wireless microphone sends the broadcast message requesting a wireless connection, and after receiving it, the electronic device establishes the wireless connection with the wireless microphone according to the broadcast message.
  • the above process of establishing the wireless connection may occur when the electronic device starts the recording mode, for example, the electronic device sends the above broadcast message in response to starting the recording mode and performs the above pairing process; alternatively, the wireless connection can also be established before recording.
  • when performing front-and-rear dual-view recording, the user operates as shown in FIG. 3A to FIG. 3C; when the electronic device detects that the user clicks the dual-view recording icon, in response to the click, the electronic device can display a prompt message, as shown in FIG. 5A.
  • the content of the prompt message can be, for example: "It is found that there are available wireless microphones around, do you want to select a wireless microphone to participate in video recording?"
  • when the electronic device detects that the user clicks the "select" option, it can continue to display a prompt message, as shown in FIG. 5B; the content of this prompt message can be the name, model, etc. of the currently available wireless microphone devices.
  • the currently available wireless microphone devices can include "paired devices" and "available devices"; the user can select a suitable wireless microphone to participate in recording, and after the electronic device detects the one or more microphones clicked by the user, it establishes the wireless connection with those microphones.
  • a "paired device" is a device that has been paired with the electronic device and is within wireless communication range. If the user selects any one or more of the paired devices (such as a smart speaker, a wireless headset, or another device with a microphone, that is, a wireless microphone), a wireless connection is established between the electronic device and the paired device, and data is transmitted between them; when the electronic device shoots a video, the paired device can transmit the data collected by its microphone to the electronic device.
  • an "available device" is a device that can be paired with the electronic device and is within wireless communication range. If the user selects any one or more of the available devices (likewise, devices with a microphone), the electronic device first pairs with them; after pairing is completed, a wireless connection is established between the electronic device and the available device, data is transmitted between them, and when the electronic device shoots a video, the available device can transmit the data collected by its microphone to the electronic device.
  • the electronic device can locate the wireless microphones based on a positioning or ranging function, and then automatically select the wireless microphone within the viewing-angle range for recording according to the recording viewing angle.
  • the electronic device detects the wireless microphone devices (such as smart speaker 1 and smart speaker 2) selected by the user according to FIG. 5A and FIG. 5B.
  • the audio corresponding to the front viewing angle is recorded by the local microphone of the electronic device, and the audio corresponding to the rear viewing angle is recorded by smart speaker 1 or smart speaker 2. Assume that, in the initial stage, the rear viewing angle is shooting angle 1 shown in FIG. 6; the electronic device learns, based on positioning of smart speaker 1, that smart speaker 1 is located within shooting angle 1, so the electronic device automatically establishes the connection with smart speaker 1, and smart speaker 1 performs the rear-view recording. Then, during recording, if the electronic device rotates and the rear viewing angle switches from shooting angle 1 to shooting angle 2, the electronic device can disconnect from smart speaker 1 and automatically establish a wireless connection with smart speaker 2, and the audio corresponding to the rear viewing angle is then recorded by smart speaker 2.
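  • The viewing-angle-based selection described above can be sketched as follows. This is a hypothetical illustration: the bearings, field of view, device names, and the `select_microphone` helper are assumptions, not part of the patent; the positioning service that supplies the bearings is outside the sketch.

```python
# Hypothetical sketch: pick the wireless microphone whose bearing (relative
# to the camera axis) falls inside the current shooting angle. Bearings and
# the field of view are assumed inputs from a positioning/ranging service.
def select_microphone(mics, camera_bearing_deg, fov_deg):
    """Return names of mics whose bearing lies within the camera's field of view."""
    half = fov_deg / 2.0
    chosen = []
    for name, bearing in mics.items():
        # smallest signed angular difference, normalized to [-180, 180)
        diff = (bearing - camera_bearing_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= half:
            chosen.append(name)
    return chosen

mics = {"smart_speaker_1": 10.0, "smart_speaker_2": 120.0}
print(select_microphone(mics, 0.0, 60.0))    # shooting angle 1 -> ['smart_speaker_1']
print(select_microphone(mics, 110.0, 60.0))  # after rotating -> ['smart_speaker_2']
```

  • When the camera rotates so that a different microphone falls within the field of view, the selection result changes, which corresponds to the automatic switch from smart speaker 1 to smart speaker 2 described above.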
  • FIG. 7 is a schematic diagram of an example of an audio processing process provided by an embodiment of the present application.
  • the audio processing method is applied to an electronic device, and the electronic device includes a first camera and a second camera, wherein the first camera captures a first view angle, and the second camera captures a second view angle, and the method includes the following steps:
  • the recording operation may be a single-channel recording operation or a multi-channel recording operation.
  • the electronic device enters the corresponding single-channel recording mode or multi-channel recording mode in response to the recording operation.
  • the user clicks the camera application on the main interface, and in response to the user's click operation, the electronic device displays a shooting preview interface.
  • the photographing preview interface may correspond to FIG. 3B .
  • the recording operation here can be: in the shooting preview interface, the user clicks the shooting control 304; or, in the more-functions interface shown in FIG. 3D, the user clicks the dual-view recording control; or, in the more-functions interface, the user clicks the multi-channel recording control.
  • after the electronic device detects the user's recording operation, it enters the corresponding recording mode. For example, after detecting that the user clicks the shooting control in FIG. 3A, it enters the single-channel recording mode; or, after detecting that the user clicks the dual-channel recording control on the interface of FIG. 3D, it enters the dual-channel (or dual-view) recording mode.
  • the first camera records a first video picture of the first viewing angle; audio of multiple sound channels is recorded, where the audio of the multiple sound channels includes first audio corresponding to the first viewing angle and second audio corresponding to the second viewing angle. At a first moment, a first speaker is speaking, and the first speaker is located in the second viewing angle.
  • the first camera may be a front camera, the first viewing angle is the front viewing angle, and the first video picture is the front video picture; the second viewing angle is the rear viewing angle, and the first speaker is located in the rear viewing angle
  • the second audio may include the speaking voice of the first speaker.
  • it may correspond to the scene shown in FIG. 4B .
  • the first speaker corresponds to the speaker 2 in FIG. 4B .
  • the first camera may be a rear camera, the first viewing angle is the rear viewing angle, and the first video picture is the rear video picture; the second viewing angle is the front viewing angle, and the first speaker is located in the front viewing angle
  • the second audio may include the speaking voice of the first speaker.
  • it may correspond to the scene shown in FIG. 4B .
  • the first speaker corresponds to the speaker 1 in FIG. 4B .
  • the audio of the multiple sound channels may be separately recorded by at least two microphones.
  • the at least two microphones may include a local microphone and/or a wireless microphone of the mobile phone.
  • the audio of the multiple sound channels may be collected by a local microphone and a wireless microphone of the electronic device, respectively; alternatively, it may be collected by multiple wireless microphones; or, it may also be collected by multiple local microphones.
  • the wireless microphone in the present application may be various devices having a microphone function, and the wireless microphone may establish a wireless connection with the mobile phone before the recording operation.
  • the wireless microphone may be, for example, a wireless earphone, a wireless speaker, or another mobile phone, or a device capable of implementing a microphone function. This application does not specifically limit this.
  • the wireless connection between the wireless microphone and the mobile phone may be established in various ways, such as Bluetooth, wireless fidelity (Wi-Fi), the fifth generation mobile communication technology (5G), or the fourth generation mobile communication technology (4G).
  • the first speaker speaks, which may include: at the first moment, the first speaker opens his mouth.
  • S703: Generate a target video file, where the target video file includes a third audio and the first video picture, and the third audio includes at least part of the first audio and at least part of the second audio.
  • the third audio is the audio obtained after combining the audio of the multiple channels; in other words, the third audio is a combination of the multi-channel audio.
  • the electronic device may combine the multiple audio channels according to preset weights to obtain the third audio.
  • the audio processor performs merging and encoding on the multi-channel audio to obtain the third audio.
  • the preset weight of each channel of audio may be set in combination with whether the speaker starts to speak.
  • the weight of the first audio may be lower than a first threshold, and the first threshold may be, for example, 0 or 0.2.
  • the third audio may be encoded according to the encoding method of the other audio in the dual audio.
  • if the speaker's mouth-opening action is not detected in the first video picture, it means that the speaker has not made a sound, or that the audio corresponding to the first video picture does not contain content the user needs; in this case, the gain ratio (or weight) of the audio corresponding to the first viewing angle in the third audio may be reduced, so as to present more of the other audio content.
  • conversely, when the speaker is detected to start speaking, the weight of the audio corresponding to the first viewing angle in the third audio is adjusted to a target weight. For example, when the user holding the electronic device starts to speak during front-and-rear recording, the gain ratio of the audio corresponding to the user in the third audio is increased to highlight the content of the user's speech.
  • the process of judging whether the speaker starts to speak may be as follows: the electronic device performs image recognition on the image of the speaker collected by the camera and determines whether the speaker performs a target action such as opening the mouth; when the target action is detected, it means that the speaker starts to speak.
  • for example, the NPU of the electronic device recognizes the target action based on the ISP's processing result of the speaker image, such as detecting whether the mouth is open.
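  • A minimal sketch of such a mouth-open check, assuming a face-landmark detector already supplies mouth keypoints (the detector itself, the `(x, y)` keypoint format, and the 0.35 threshold are illustrative assumptions, not part of the patent):

```python
# Illustrative sketch: approximate "mouth open" by the ratio of the vertical
# lip gap to the horizontal mouth width, from assumed (x, y) landmarks.
def mouth_open(upper_lip, lower_lip, left_corner, right_corner, threshold=0.35):
    gap = abs(lower_lip[1] - upper_lip[1])         # vertical lip distance
    width = abs(right_corner[0] - left_corner[0])  # horizontal mouth width
    if width == 0:
        return False
    return gap / width > threshold

# closed mouth (small gap) vs. open mouth (large gap)
print(mouth_open((50, 100), (50, 104), (40, 102), (60, 102)))  # False
print(mouth_open((50, 100), (50, 112), (40, 106), (60, 106)))  # True
```

  • In a real pipeline this decision would be made per video frame, and the result would drive the weight adjustment described above.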
  • the weight of the multi-channel audio of a specific frame is adjusted based on the current buffered audio frame.
  • a weight adjustment strategy may be preset, and when a target action is detected, the weight of each audio channel is adjusted according to the weight adjustment strategy.
  • for example, the weight of the first audio may increase over time, while the weights of the other audio correspondingly decrease over time, so that the mix gradually switches from the other audio to the first audio, achieving smooth switching between audio channels and avoiding sudden changes in sound.
  • the weight of the first audio may vary linearly with time, as shown in FIG. 8.
  • the horizontal axis is the time axis
  • the vertical axis is the weight of the first audio; from the moment the first audio begins to be merged into the third audio (frame 1) to frame i, the weight of the first audio is a linear function of time.
  • the relationship between the weight of each audio channel and time is not limited to a linear relationship; the relationship between weight and audio merging time may also be any of a variety of nonlinear relationships, which is not limited in this application.
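  • The linear ramp of FIG. 8 can be sketched as follows (the frame count and the exact weight schedule are illustrative assumptions): the incoming track's weight rises linearly from 0 to 1 over i frames while the outgoing track's weight falls correspondingly, so the mix switches without an abrupt jump.

```python
# Illustrative sketch of a linear crossfade schedule: per-frame weight pairs
# (incoming, outgoing) whose sum is always 1, as in the smooth switch above.
def ramp_weights(num_frames):
    """Per-frame (incoming, outgoing) weight pairs for a linear crossfade."""
    pairs = []
    for frame in range(1, num_frames + 1):
        w_in = frame / num_frames          # grows linearly with time
        pairs.append((w_in, 1.0 - w_in))   # complementary weights sum to 1
    return pairs

for w_in, w_out in ramp_weights(4):
    print(f"in={w_in:.2f} out={w_out:.2f}")
```

  • A nonlinear schedule (for example, an exponential or cosine curve) could be substituted for the `frame / num_frames` term without changing the rest of the scheme.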
  • the target video file includes the first video picture and the third audio, therefore, when playing the target video file, the electronic device plays the third audio while playing the first video picture.
  • the target video file may further include multiple other video images, so that when the target video file is played, the electronic device can simultaneously play video images from multiple viewing angles and third audio.
  • the speaker starts to speak, and at this time, the audio feature corresponding to the perspective where the speaker is located changes.
  • the audio feature includes volume, and in the process of playing the target video file, when the video image corresponding to the first moment is played, the volume of the second audio increases.
  • the volume of the second audio gradually increases.
  • the electronic device when the electronic device plays the target video file, the electronic device displays a first video picture and a second video picture.
  • the electronic device when the electronic device plays the target video file, the electronic device displays the first video picture but does not display the second video picture.
  • the first speaker in the second video image opens his mouth.
  • the electronic device may set the playback audio track for playing the third audio as the default audio track of the video, so that when the video work is played, the third audio is played by default; or, when the video work is shared, the third audio is shared by default.
  • the playback track is the playback channel during audio playback.
  • specifically, the mobile phone may store the acquired multi-channel audio in the memory and combine the multi-channel audio to obtain the third audio.
  • the mobile phone may set different preset weights for different audio at different playback times, and weight the sampled data of the multi-channel audio according to the preset weights to obtain the third audio.
  • the mobile phone uses the front camera to obtain the image of the speaker and judges from it whether the speaker starts to speak. If so, the weight of the audio corresponding to the front picture in the third audio can be adjusted, for example by dynamically increasing the proportion of the near-end audio of the mobile phone (such as the speaker's audio), so that the third audio gradually switches to the near-end audio of the mobile phone to highlight its content.
  • in this solution, the weight of the audio corresponding to a video picture in the third audio is adjusted according to the target action detected in the captured video picture, which optimizes the switching effect between audio channels while still presenting the complete audio; this solves the sudden change in sound that occurs when an electronic device that does not support multi-channel audio playback has to switch audio channels during video playback.
  • the following describes the internal implementation process and processing flow of the audio processing method provided by the embodiments of the present application by taking a scenario of front and rear dual-channel recording of a mobile phone as an example with reference to the accompanying drawings.
  • the audio processing method provided by the embodiment of the present application may be performed in real time during the recording process, or may be performed after the recording.
  • the following takes audio processing during recording as an example for description.
  • each process may include the following contents.
  • the video recording and video processing flow may include: in the current front-and-rear dual-channel recording mode, the electronic device captures a frame of front video picture (denoted as front video frame ZX) through the front camera and a frame of rear video picture (denoted as rear video frame ZY) through the rear camera; the two cameras transmit the collected video data to the ISP of the electronic device; the electronic device can then splice front video frame ZX and rear video frame ZY, for example through an open graphics interface (OpenGL), after which the video codec performs video encoding and writes the target video file according to a certain file specification (such as the MP4 container file specification).
  • the recording and audio processing flow may include: in the current front-and-rear dual-channel recording mode, the local microphone of the electronic device records a frame of audio (denoted as audio frame X) and the wireless microphone records a frame of audio (denoted as audio frame Y); after receiving the audio data, the electronic device can buffer it (for example, in a buffer area in memory), where the audio data of different sound channels can be buffered in different buffer areas, such as buffering audio frame X in buffer area QX and audio frame Y in buffer area QY; after receiving the multi-channel audio data, the audio processor can independently encode the audio data of each channel and write the encoded audio data of the current frame to the multi-channel audio file.
  • the encoding manner may include, for example, pulse code modulation (PCM), advanced audio coding (AAC), and the like.
  • the format of the encoded target audio may include the waveform audio file format (WAV), the MP3 format, and the like.
  • the audio processor may combine audio frame X and audio frame Y according to preset weights, for example, combining and encoding the two channels of audio according to a certain proportional gain to obtain the third audio.
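As a hedged illustration of the proportional-gain merge described above (the patent does not fix the exact weights; the 0.5/0.5 and 0.25/0.75 values below are assumptions for illustration), a per-sample weighted sum might look like:

```python
def merge_frames(frame_x, frame_y, weight_x=0.5, weight_y=0.5):
    """Combine two equal-length audio frames sample by sample.

    weight_x + weight_y should equal 1 so that the merged samples
    stay within the range of the input channels (no overflow).
    """
    assert len(frame_x) == len(frame_y)
    return [weight_x * x + weight_y * y for x, y in zip(frame_x, frame_y)]
```

For example, `merge_frames([100], [60], 0.25, 0.75)` yields `[70.0]`, weighting the second channel more heavily.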
  • the sampling rate of each channel of audio may be the same or different; the embodiments of the present application are described assuming each channel of audio has the same sampling rate (for example, 8 bits).
  • the image-recognition-based audio mixing process may include: in the current front and rear dual recording mode, the front video image captured by the front camera of the electronic device includes the speaker; after the video frames captured by the electronic device are transmitted to the ISP and processed by the ISP, the video stream can be divided into two channels, where one video stream is used for merging with the rear video image, and the other video stream is used by the electronic device for image recognition to determine whether the speaker is speaking.
  • the processing of video images in the hardware abstraction layer (HAL) is used as an example for introduction.
  • the above-mentioned video processing process, audio processing process and face recognition process are not limited to being implemented in the HAL layer.
  • They can also be implemented in the middle layer or the application layer, which is not limited in this application.
  • the HAL layer here can be the interface layer between the kernel layer and the hardware layer shown in Figure 2; the middle layer can be the system library and application framework layer shown in Figure 2; the application layer can be the application layer shown in Figure 2.
  • the front camera transmits the collected image signal of the front video frame ZX to the ISP for image processing
  • the rear camera transmits the collected image signal of the rear video frame ZY to the ISP for image processing
  • transmit the rear video stream to the post-processing unit, for example, to the beautification processing unit to perform beautification processing on the rear video image
  • the ISP can transmit the front video stream to the face recognition unit and to the front-image post-processing unit respectively; the face recognition unit determines whether the speaker's lips are open and thus whether the speaker is speaking, while the post-processing unit performs beautification processing and anti-shake processing on the front video image.
  • judging whether the speaker is speaking according to the pre-video image may further include the following specific content:
  • the front video frame is passed to the NPU computing processor for image recognition.
  • after receiving the image input information of the current frame, the NPU computing processor performs fast processing on the input information, such as performing face detection on the speaker based on the acquired current video frame ZX, including using a face-coordinate AI algorithm to determine whether the speaker has performed the target action; if it is determined that the speaker has performed the target action in the current video frame, indicating that the speaker has started to speak, the audio processor takes this as the speaker's speaking moment.
  • if no target action is detected, the combined audio still combines and encodes the audio recorded by the local microphone and the audio recorded by the wireless microphone according to the preset proportional gain.
  • the post-processing in the above process includes, for example: combining the face coordinates and optimizing the image color through YUV processing to obtain front and rear video frames with a beautification effect; anti-shake processing can then be performed on the current frame of the video image.
  • the number of video frames transmitted per second is the same, for example, 30 fps.
  • when the electronic device detects that the speaker has started to speak, the detection may lag the time when the speaker actually started to speak; that is, by the time the electronic device determines that the speaker has started to speak, the audio frame corresponding to the actual start of speech has already been buffered. Therefore, the weight of each channel of audio is adjusted i frames in advance in order to overcome the delay caused by the detection process, thereby ensuring the integrity of the audio content.
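The pre-roll described above can be sketched as follows; the 100 ms window and 20 ms frame duration are assumptions for illustration, not values fixed by this embodiment:

```python
def preroll_frame_count(preroll_ms, frame_ms):
    # Number of whole audio frames covered by the roll-back window
    # (ceiling division, so a partial frame still counts as one frame).
    return -(-preroll_ms // frame_ms)

def adjustment_start(detected_frame, preroll_ms=100, frame_ms=20):
    """Frame index at which the weight adjustment should begin:
    i frames before the frame where the target action was detected,
    clamped so it never goes before the start of the recording."""
    i = preroll_frame_count(preroll_ms, frame_ms)
    return max(0, detected_frame - i)
```

With these assumed values, detecting the target action at frame 30 starts the weight ramp at frame 25.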
  • each channel of audio is encoded starting from i frames (i is an integer greater than or equal to 1) before the current audio frame X, and the encoded audio data is multiplexed into the audio file.
  • the multi-channel audio data obtained above is written into the target audio/video file corresponding to the current frame, obtaining a file that includes the current video and the third audio corresponding to the video.
  • by processing video and audio with the above method, in addition to preserving the independent audio of each sound channel, a complete merged audio corresponding to the video picture is obtained, with a smooth audio transition effect.
  • FIG. 11 shows a schematic flowchart of another audio processing method provided by an embodiment of the present application. As shown in FIG. 11 , the method may include the following steps:
  • the audio frame currently acquired by each microphone is recorded as the first audio frame.
  • the sampling rate of each channel of audio may be the same or different, and here, the sampling rate of each channel of audio is the same (for example, 8 bits) for description.
  • the audio frame currently recorded by the local microphone can be stored in the first buffer (denoted as QX), and the audio frame currently recorded by the wireless microphone (denoted as audio frame Y) can be stored in the second buffer (denoted as QY).
  • the local-microphone audio and the wireless-microphone audio within a preset time period before the current moment are also cached at the above-mentioned corresponding positions.
  • the audio data within a certain period of time before the current moment can be buffered; or, a certain number of frames before the current audio frame can be buffered, such as buffering the local-microphone audio frames [X-i, X] and the wireless-microphone audio frames [Y-i, Y], where i is an integer greater than or equal to 1 and less than X and Y.
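A minimal sketch of this per-channel buffering, assuming bounded buffers that keep only the last i+1 frames (the roll-back depth of 5 and the frame payloads here are placeholders, not values from the patent):

```python
from collections import deque

I_FRAMES = 5  # assumed roll-back depth i

# One bounded buffer per sound channel: QX for the local microphone,
# QY for the wireless microphone. Old frames are evicted automatically.
buffer_qx = deque(maxlen=I_FRAMES + 1)
buffer_qy = deque(maxlen=I_FRAMES + 1)

for frame_index in range(20):
    buffer_qx.append(("X", frame_index))
    buffer_qy.append(("Y", frame_index))

# After 20 frames, each buffer holds only frames 14..19,
# i.e. the current frame and the I_FRAMES frames before it.
```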
  • the motion of the first speaker is detected.
  • when the action of the first speaker opening the mouth is detected, it is considered that the first speaker starts to speak.
  • the speaker starting to speak may be indicated by the speaker's target action, such as the action of opening the mouth.
  • the electronic device may determine that the speaker starts speaking based on the target action of the speaker.
  • the audio frame corresponding to the moment the target action is detected may be later than the time when the target action actually occurred. Therefore, in order to present complete audio content, the combination of the multiple audio channels in this embodiment of the present application may start from a frame before the current frame.
  • the first audio frame may be the audio frame buffered in the buffer when the target action is detected. Based on the first audio frame, the starting moment for combining the multiple audio channels can be determined. Specifically, with the currently buffered first audio frame as a reference, the merging of the multiple audio channels can start after rolling back a preset time length.
  • the preset time length may be, for example, 100 ms.
  • the currently buffered first audio frame may be used as a reference, and the merging of the multiple audio channels may start after rolling back i audio frames.
  • one frame of audio frame in the embodiment of the present application may correspond to a time interval.
  • the audio frame recorded by the local microphone is [X]
  • the audio frame recorded by the wireless microphone is [Y]
  • i frames can be rolled back; that is to say, the audio frames [X-i, X] and [Y-i, Y] are merged, and the audio frames corresponding to the obtained third audio can be [M-i, M].
  • the video frame corresponding to the audio in this time period may be [Z-i, Z].
  • i is an integer greater than or equal to 1
  • X, Y, M, and Z are all integers greater than i.
  • with the audio processing method provided by the embodiment of the present application, by starting to combine the multiple audio channels a certain time in advance relative to the time when the target action is detected, the delay caused by the detection process can be prevented from causing incomplete or incoherent audio content.
  • audio 1 (that is, audio track 1), audio 2 (that is, audio track 2), and audio 3 (that is, the mixed audio track)
  • Audio 1 may be, for example, audio recorded by a local microphone
  • audio 2 may be, for example, audio recorded by a wireless microphone.
  • the sampling rate of audio 1 and audio 2 is 8 bits
  • the audio frames to be combined are [X-i, X] and [Y-i, Y] respectively, where the audio data of the [X-i]th frame is 11, the audio data of the [(X-i)+1]th frame is 12, and the audio data of the [(X-i)+2]th frame is 200; the audio data of the [Y-i]th frame is 21, the audio data of the [(Y-i)+1]th frame is 22, and the audio data of the [(Y-i)+2]th frame is 202.
  • audio 3 (or the mixed audio track) is obtained by merging audio 1 and audio 2.
  • the weight of each of the two audio channels can be set to 0.5.
  • the audio data corresponding to each frame of audio 3 is obtained starting from the [Z-i]th frame.
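Using the sample values above and an assumed plain weighted sum with equal 0.5 weights (the patent describes a proportional-gain merge but does not spell out the arithmetic), the merged frames of audio 3 can be computed as:

```python
# Frames [X-i], [(X-i)+1], [(X-i)+2] of audio 1 and
# frames [Y-i], [(Y-i)+1], [(Y-i)+2] of audio 2 (values from the text).
audio1 = [11, 12, 200]
audio2 = [21, 22, 202]

# Weighted sum with both weights set to 0.5 (an assumed mixing rule).
audio3 = [0.5 * a + 0.5 * b for a, b in zip(audio1, audio2)]
# audio3 == [16.0, 17.0, 201.0]
```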
  • changing the volume weight of audio 1 and audio 2 is taken as an example for description; in other embodiments, other audio features may also be adjusted.
  • the weight changes dynamically with time as shown in Figure 8
  • the weight of audio 1 changes linearly with time.
  • the adjustment process of audio 1 and audio 2 is as follows:
  • the weights can be expressed as follows:
  • the weight of each channel of audio can also be adjusted by a method similar to the combination of two channels of audio. Assuming that the audio data of the 1st to nth frames of each audio buffer are as shown in Figure 10, the audio weight W of the ith frame of the third audio and the audio data Zi of the ith frame can respectively satisfy the following formulas:
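The exact formulas are left to Figure 10; as a hedged generalization of the two-track case, an n-channel merge in which the per-channel weights sum to 1 might be sketched as:

```python
def mix_multichannel(frames, weights):
    """Mix one frame from each of n channels with per-channel weights.

    frames:  list of n equal-length sample lists, one per channel
    weights: list of n gains; they should sum to 1 to avoid overflow
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    length = len(frames[0])
    return [sum(w * ch[k] for w, ch in zip(weights, frames))
            for k in range(length)]
```

For two channels with equal weights this reduces to the two-track example above.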
  • the audio sampling rate in the audio processing method provided by the embodiment of the present application may be 8 bits, 16 bits or 24 bits, which is not limited in the present application.
  • the audio recorded by multiple sound channels can be played completely through one audio track, so that smooth switching between videos can be realized while ensuring the integrity of the audio content, and key content can be highlighted in a targeted manner.
  • the electronic device includes corresponding hardware and/or software modules for executing each function.
  • the present application can be implemented in the form of hardware, or of hardware combined with computer software, in conjunction with the algorithm steps of each example described in the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functionality for each particular application, but such implementations should not be considered beyond the scope of this application.
  • the electronic device can be divided into functional modules according to the above method examples.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division; there may be other division manners in actual implementation.
  • Embodiments of the present application further provide an electronic device, including one or more processors and one or more memories.
  • the one or more memories are coupled to the one or more processors and are used for storing computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the above related method steps to implement the audio processing method in the above embodiments.
  • Embodiments of the present application further provide a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium; when the computer instructions are executed on an electronic device, the electronic device executes the above related method steps to implement the audio processing method in the above embodiments.
  • Embodiments of the present application also provide a computer program product, which, when the computer program product runs on a computer, causes the computer to execute the above-mentioned relevant steps, so as to implement the audio processing method executed by the electronic device in the above-mentioned embodiment.
  • embodiments of the present application also provide an apparatus, which may specifically be a chip, a component, a module, or a chip system; the apparatus may include a processor and a memory connected to it, where the memory is used to store computer-executable instructions, and when the apparatus runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the audio processing method executed by the electronic device in the above method embodiments.
  • the electronic device, computer-readable storage medium, computer program product, or chip provided in this embodiment are all used to execute the corresponding method provided above; therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding method provided above, which will not be repeated here.
  • Embodiment 1 An audio processing method, which is applied to an electronic device, the electronic device including a first camera and a second camera, wherein the first camera captures a first angle of view and the second camera captures a second angle of view, the method including:
  • in response to a recording operation input by the user, a video recording mode is entered; in the video recording mode, the first camera records a first video image for the first viewing angle; audio of multiple sound channels is recorded, and the audio of the multiple sound channels includes the first audio corresponding to the first perspective and the second audio corresponding to the second perspective; at the first moment, the first speaker speaks, and the first speaker is located within the second angle of view;
  • a target video file is generated, the target video file including a third audio and a first video image, wherein the third audio includes at least part of the first audio and at least part of the second audio;
  • when the picture corresponding to the first moment is played, the audio characteristic of the second audio in the third audio changes.
  • Embodiment 2 The method according to Embodiment 1, wherein the audio feature includes volume, and playing the target video file specifically includes:
  • the volume of the second audio is increased.
  • Embodiment 3 The method according to Embodiment 2, wherein when the video image corresponding to the first moment is played, the volume of the second audio is gradually increased.
  • Embodiment 4 The method according to any one of Embodiments 1-3, wherein, in the video recording mode, the second camera records a second video image from the second viewing angle, and the electronic device displays a shooting interface, the shooting interface includes the first video picture and the second video picture;
  • the target video file also includes the second video picture
  • when the electronic device plays the target video file, the electronic device displays the first video picture and the second video picture.
  • Embodiment 5 The method according to any one of Embodiments 1-3, wherein, in the video recording mode, the second camera records a second video image from the second viewing angle, and the electronic device displays a shooting interface, the shooting interface does not include the second video image;
  • the electronic device plays the target video file, the electronic device does not display the second video image.
  • Embodiment 6 The method according to any one of Embodiments 1-5, wherein, in the video recording mode, the second camera records a second video image from the second viewing angle, and at the first moment, the first speaker in the second video frame opens his mouth.
  • Embodiment 7 The method according to any one of Embodiments 1-6, wherein, in the video recording mode, at a second moment, a second speaker speaks, and the second speaker is located within the first angle of view;
  • when the electronic device plays the target video file and the picture corresponding to the second moment is played, the audio feature of the first audio in the third audio changes.
  • Embodiment 8 The method according to Embodiment 7, wherein when the screen corresponding to the second moment is played, the volume of the first audio in the third audio gradually increases.
  • Embodiment 9 The method of any one of Embodiments 1-8, wherein the electronic device includes a first microphone and a second microphone; in the video recording mode, the first microphone records the first audio and the second microphone records the second audio; or, the first microphone records the second audio and the second microphone records the first audio.
  • Embodiment 10 The method according to any one of Embodiments 1-8, wherein the electronic device includes a first microphone, and a second microphone is wirelessly connected to the electronic device; in the video recording mode, the first microphone records the first audio, the second microphone records the second audio, and the second audio is sent to the electronic device through the wireless connection; or, the first microphone records the second audio, the second microphone records the first audio, and the first audio is sent to the electronic device through the wireless connection.
  • Embodiment 11 The method of any one of Embodiments 1-8, wherein both the first microphone and the second microphone are wirelessly connected to the electronic device, and the first audio and the second audio are sent to the electronic device through the wireless connection; in the video recording mode, the first microphone records the first audio and the second microphone records the second audio; or, the first microphone records the second audio and the second microphone records the first audio.
  • Embodiment 12 The method according to any one of Embodiments 1-11, wherein, in the recording mode, the audio frames of the first audio, the audio frames of the second audio, and the video frames of the first video picture are buffered;
  • Embodiment 13 The method according to any one of Embodiments 1-12, wherein the first viewing angle and the second viewing angle are any two viewing angles among a front viewing angle, a wide angle viewing angle, and a zoom viewing angle.
  • Embodiment 14 An audio processing method, wherein the method is applied to an electronic device, the electronic device including a first camera and a second camera, wherein the first camera captures a first angle of view and the second camera captures a second angle of view, the method including:
  • in response to a recording operation input by the user, a video recording mode is entered; in the video recording mode, the first camera records a first video image for the first viewing angle; audio of multiple sound channels is recorded, and the audio of the multiple sound channels includes the first audio corresponding to the first perspective and the second audio corresponding to the second perspective; at the first moment, the first speaker speaks, and the first speaker is located within the first angle of view;
  • the target video file includes a third audio and a first video picture, wherein the third audio includes at least a portion of the first audio and at least a portion of the second audio;
  • the audio feature of the first audio in the third audio changes.
  • Embodiment 15 The method according to Embodiment 14, wherein the audio feature includes volume, and playing the target video file specifically includes:
  • the volume of the first audio is increased.
  • Embodiment 16 The method according to Embodiment 15, wherein when the video picture corresponding to the first moment is played, the volume of the first audio is gradually increased.
  • Embodiment 17 The method of any one of Embodiments 14-16, wherein the electronic device includes a first microphone and a second microphone; in the video recording mode, the first microphone records the first audio and the second microphone records the second audio; or, the first microphone records the second audio and the second microphone records the first audio.
  • Embodiment 18 The method of any one of Embodiments 14-16, wherein the electronic device includes a first microphone, and a second microphone is wirelessly connected to the electronic device; in the video recording mode, the first microphone records the first audio, the second microphone records the second audio, and the second audio is sent to the electronic device through the wireless connection; or, the first microphone records the second audio, the second microphone records the first audio, and the first audio is sent to the electronic device through the wireless connection.
  • Embodiment 19 The method of any one of Embodiments 14-16, wherein both the first microphone and the second microphone are wirelessly connected to the electronic device, and the first audio and the second audio are sent to the electronic device through the wireless connection; in the video recording mode, the first microphone records the first audio and the second microphone records the second audio; or, the first microphone records the second audio and the second microphone records the first audio.
  • Embodiment 20 The method according to Embodiment 14, wherein, in the recording mode, the audio frame of the first audio, the audio frame of the second audio, and the video frame of the first video picture are buffered;
  • Embodiment 21 The method according to Embodiment 14, wherein the first viewing angle and the second viewing angle are any two viewing angles among a front viewing angle, a wide angle viewing angle, and a zoom viewing angle.
  • Embodiment 22 An electronic device, comprising: multiple cameras for capturing video pictures; a screen for displaying an interface; an audio playback component for playing audio; one or more processors; a memory; and one or more computer programs stored in the memory, the one or more computer programs comprising instructions.
  • Embodiment 23 A computer-readable storage medium, comprising computer instructions that, when executed on an electronic device, cause the electronic device to perform the audio processing method according to any one of Embodiments 1 to 21.
  • Embodiment 24 A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to execute the audio processing method according to any one of Embodiments 1 to 21.
  • Embodiment 25 An electronic device, comprising a screen, a computer memory, and a camera, for implementing the audio processing method according to any one of Embodiments 1 to 21.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that includes an integration of one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid state disks (SSDs)), and the like.
  • the process can be completed by a computer program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium.
  • when the program is executed, the processes of the foregoing method embodiments may be included.
  • the aforementioned storage medium includes: a ROM, a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.


Abstract

An embodiment of the present application provides an audio processing method and an electronic device, belonging to the technical field of audio processing. The method is applied to an electronic device including cameras. In a video recording mode, the electronic device records multiple channels of audio and the video pictures corresponding to the audio; during playback, it plays part of the video pictures together with the audio obtained by merging the multiple audio channels, where, when a speaker in a video picture starts to speak, the played audio switches to the audio corresponding to the video picture in which the speaker is located. By detecting, based on the captured video images, that the speaker has started to speak, and then adjusting the weight, within the merged audio, of the audio corresponding to the speaker's angle of view, the embodiments of the present application optimize the switching effect between audio channels while presenting complete audio, and resolve the abrupt sound change caused when an electronic device that does not support multi-track playback has to switch audio to obtain the audio content during video playback.

Description

Audio processing method and electronic device
This application claims priority to the Chinese patent application No. 202011063369.6, filed with the China National Intellectual Property Administration on September 30, 2020 and entitled "Audio processing method and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of audio processing, and in particular, to an audio processing method and an electronic device.
Background
As sharing videos on social platforms becomes increasingly common, more and more users use electronic devices to shoot videos for sharing. When shooting a video, a user can enable the multi-microphone function of the electronic device to record audio from different angles or objects, such as a speaker's voice or sounds in the environment. Taking video recording as an example, more and more electronic devices have begun to support dual-view recording (including front-and-rear dual-view recording, etc.). During dual-view recording, there are generally two corresponding recording methods: one is the single-channel recording of traditional normal recording; the other is dual-channel recording, in which one audio channel can be recorded normally by the electronic device through its local microphone, and the other can be recorded by a wireless microphone, such as the microphone function of a Bluetooth headset or three-microphone audio zoom. With single-channel recording, only one channel of audio can be obtained, so the user cannot obtain the audio corresponding to different video pictures, resulting in incomplete audio content. With dual-channel recording, although the multiple audio channels corresponding to the video pictures can be recorded during dual-view recording, when the video is shared, the receiving device may not support dual-track playback, so only one of the recordings can be selected for sharing; or, even if dual-track playback is possible, the sounds of the tracks interfere with each other, preventing the user from obtaining a good listening experience.
Summary
The present application provides an audio processing method and an electronic device. By detecting, based on captured video images, the event that a speaker starts to speak, the weight of the audio corresponding to the video image within the audio merged from multiple channels is adjusted, which resolves the abrupt sound change caused when an electronic device has to switch audio to obtain the audio content while playing an audio/video file.
According to a first aspect, an audio processing method is provided. The electronic device includes a first camera and a second camera, where the first camera captures a first angle of view and the second camera captures a second angle of view. The method includes:
entering a video recording mode in response to a recording operation input by the user; in the video recording mode, recording, by the first camera, a first video picture for the first angle of view; recording audio of multiple sound channels, the audio of the multiple sound channels including first audio corresponding to the first angle of view and second audio corresponding to the second angle of view; at a first moment, a first speaker speaks, the first speaker being located within the second angle of view;
generating a target video file, the target video file including third audio and the first video picture, where the third audio includes at least part of the first audio and at least part of the second audio; and
playing the target video file in response to a playback operation input by the user for the target video file; where,
when the picture corresponding to the first moment is played, an audio feature of the second audio changes.
In one implementation, the first camera is a rear camera and the second camera is a front camera. During recording, the electronic device records the video picture of the rear angle of view through the rear camera, while the first speaker is located within the front angle of view; in this case, the first speaker may be, for example, the user holding the electronic device.
Alternatively, in one implementation, the first camera is a front camera and the second camera is a rear camera. During recording, the electronic device can record the video picture of the front angle of view through the front camera, while the first speaker is located within the rear angle of view; in this case, the first speaker may be, for example, a photographed subject far away from the electronic device.
It should be understood that a speaker in the embodiments of the present application may be a person who speaks during recording and whose voice is recorded, such as the user holding the electronic device, a photographed subject appearing in the video picture, or a person who does not appear in the video picture but whose voice is recorded.
In one implementation, the audio of the multiple sound channels (also called multi-channel audio) may be audio corresponding to different angles of view; for example, the multiple audio channels correspond respectively to multiple shooting angles.
In one implementation, during recording, the multiple audio channels may be captured simultaneously by multiple microphones. For example, during dual-view recording, different audio can be captured respectively through the local microphone of the electronic device and a wireless microphone, and the two audio channels can correspond respectively to the two shooting angles. The local microphone may be a microphone installed inside the electronic device, and the wireless microphone may be a microphone that has established a wireless connection with the electronic device.
In one implementation, the target video file may be a recording file obtained after the electronic device processes the video or audio acquired in the recording mode, such as a file in the MP4 format. The third audio in the target video file is the audio obtained by merging the audio of the multiple sound channels, and includes at least part of the first audio and at least part of the second audio.
In one implementation, when the audio of the multiple sound channels is merged, different weights can be set for each channel of audio; in other words, in the third audio, each channel of audio can occupy a different proportion of gain. For example, when the first speaker within the second angle of view is not speaking, the weight of the second audio can be set relatively low, such as 0.2 or 0.
In one implementation, when the weight of the second audio is 0, the third audio is encoded according to the encoding method of the other channels among the multiple sound channels. For example, in a dual-channel audio processing scenario, when the weight of the first audio is 0, the third audio is encoded according to the encoding method of the second audio.
In one implementation, after the electronic device receives the input audio of the multiple sound channels, it can also encode each channel of audio separately.
It should be understood that, to prevent the merged audio channels from overflowing and degrading the user's listening experience, the adjusted weights of the channels in the third audio should sum to 1.
According to the audio processing method provided by the embodiments of the present application, by detecting, based on captured video images, the event that a speaker starts to speak, and adjusting the audio features of the audio corresponding to the video image within the third audio, the switching effect between audio channels can be optimized while presenting complete audio, achieving natural and smooth switching between audio channels, highlighting the key content in the multi-channel audio in a targeted manner, and improving the user's listening experience.
With reference to the first aspect, in some implementations of the first aspect, the audio feature includes volume, and playing the target video file specifically includes: when the video picture corresponding to the first moment is played, the volume of the second audio increases.
In one implementation, during audio processing, when it is detected that the speaker starts to speak, with the moment the speaker starts to speak as a reference, a preset time period is rolled back from that moment, and the weight of the second audio in the third audio is adjusted i audio frames in advance until the target weight is reached. For example, the adjusted target weight of the second audio is greater than the weights of the other channels, so that the third audio presents more of the content of the second audio.
According to the audio processing method provided by the embodiments of the present application, by increasing the volume of the second audio when the speaker speaks, the played audio in the third audio can be switched to the audio corresponding to the speaker's angle of view, so that the user hears the speaker's voice clearly.
With reference to the first aspect, in some implementations of the first aspect, when the video picture corresponding to the first moment is played, the volume of the second audio gradually increases.
In one implementation, when the target video file is played and the first speaker starts to speak, the volume of the second audio in the currently played third audio gradually increases, so that the played audio gradually switches to the second audio.
Specifically, during audio processing, when it is detected that the first speaker starts to speak, with the moment the speaker starts to speak as a reference, a preset time period is rolled back from that moment and, starting i audio frames in advance, the weight of the second audio is dynamically increased.
According to the audio processing method provided by the embodiments of the present application, by gradually increasing the volume of the second audio, the volume of the second audio goes from weak to strong during playback, achieving a natural switch from the other audio to the second audio and avoiding an abrupt sound change when the recording is played.
With reference to the first aspect, in some implementations of the first aspect, in the video recording mode, the second camera records a second video picture for the second angle of view, and the electronic device displays a shooting interface including the first video picture and the second video picture;
the target video file further includes the second video picture;
when the electronic device plays the target video file, the electronic device displays the first video picture and the second video picture.
It should be understood that, during playback, the electronic device may, for example, simultaneously play the front-view picture and the rear-view picture, or simultaneously play the video pictures of two front angles of view or of two rear angles of view.
In this scenario, by displaying multiple video pictures, the electronic device enables the user to watch video pictures from different angles of view, and when a speaker in one of the angles of view starts to speak, the played audio starts to switch to the audio corresponding to that angle of view, achieving an audio switching effect that matches the content of the video pictures.
With reference to the first aspect, in some implementations of the first aspect, the second camera records a second video picture for the second angle of view, and the electronic device displays a shooting interface that does not include the second video picture;
when the electronic device plays the target video file, the electronic device does not display the second video picture.
It should be understood that the electronic device can capture video pictures of different angles of view through multiple cameras; during recording, however, the electronic device may display only some of those video pictures, and an undisplayed video picture can be used by the electronic device for image recognition, to determine whether a speaker in the angle of view corresponding to that undisplayed picture is speaking.
For example, when the first camera is a rear camera and the second camera is a front camera, during recording the electronic device captures the video pictures corresponding to the front angle of view and to the rear angle of view through the front camera and the rear camera respectively; however, the shooting preview interface of the electronic device may display only the video picture corresponding to the rear angle of view, and/or only the video picture corresponding to the rear angle of view may be played during playback.
In this case, the electronic device can run the front camera in the background to capture the video picture corresponding to the front angle of view. For example, the electronic device does not transmit the data of the front video picture to the display, so the shooting preview interface does not display the front video picture during recording; and the data of the front video picture is not written into the target video file, so the front video picture is not played when the recording is played back.
The electronic device uses the front video picture to determine whether the speaker included in it is speaking; at the moment the speaker starts to speak, the volume of the second audio in the third audio is increased, and the played audio switches to the audio corresponding to the front angle of view.
According to the audio processing method provided by the embodiments of the present application, when only the video pictures of some angles of view are played during playback and a speaker within an unplayed angle of view starts to speak, the played audio can still switch to the audio corresponding to that speaker's angle of view, which satisfies the user's needs for watching different video pictures while ensuring that the audio switching matches the audio content.
With reference to the first aspect, in some implementations of the first aspect, in the video recording mode, the second camera records a second video picture for the second angle of view, and at the first moment, the first speaker in the second video picture opens his or her mouth.
It should be understood that when the first speaker opens the mouth, it can indicate that the first speaker has started to speak; therefore, the moment the first speaker opens the mouth can be taken as the moment the first speaker starts to speak. Through the embodiments of the present application, whether a speaker has started to speak can be determined from the speaker's image, and the audio feature of the audio corresponding to that speaker's angle of view can then be changed.
With reference to the first aspect, in some implementations of the first aspect, in the video recording mode, at a second moment, a second speaker speaks, the second speaker being located within the first angle of view;
when the electronic device plays the target video file and the picture corresponding to the second moment is played, the audio feature of the first audio in the third audio changes.
In one implementation, the first angle of view is a rear angle of view, and the second speaker may be a photographed subject within the rear angle of view.
When the target video file is played and the second speaker starts to speak, the played third audio switches to the audio corresponding to the rear angle of view; for example, the volume of the audio corresponding to the rear angle of view increases, highlighting the second speaker's voice.
According to the audio processing method provided by the embodiments of the present application, during video playback, whenever a different speaker starts to speak, the played audio switches to the audio corresponding to the current speaker's angle of view, so that the user can obtain the current speaker's speech content completely and in time without manually switching the playback track, improving the user's listening experience.
With reference to the first aspect, in some implementations of the first aspect, when the picture corresponding to the second moment is played, the volume of the first audio in the third audio gradually increases.
For example, when it is detected that the second speaker starts to speak, the volume of the first audio can increase dynamically over time, so that when the recording is played, the volume of the first audio goes from weak to strong, achieving a natural switch to the first audio and allowing the user to hear the second speaker's voice clearly.
According to the audio processing method provided by the embodiments of the present application, by gradually increasing the volume of the first audio, a natural switch from the other audio to the first audio within the third audio can be achieved, avoiding an abrupt sound change when the recording is played.
With reference to the first aspect, in some implementations of the first aspect, the electronic device includes a first microphone and a second microphone;
in the video recording mode, the first microphone records the first audio and the second microphone records the second audio; or,
in the video recording mode, the first microphone records the second audio and the second microphone records the first audio.
For example, the first microphone and the second microphone may be microphone devices installed inside the electronic device, i.e., local microphones of the electronic device.
In other words, the electronic device can record audio of different angles of view through multiple local microphones, which can be installed at different positions of the electronic device and can record audio within different angle-of-view ranges.
With reference to the first aspect, in some implementations of the first aspect, the electronic device includes a first microphone, and a second microphone is wirelessly connected to the electronic device;
in the video recording mode, the first microphone records the first audio and the second microphone records the second audio; or,
in the video recording mode, the first microphone records the second audio and the second microphone records the first audio.
For example, the first microphone may be a microphone device installed inside the electronic device, i.e., a local microphone of the electronic device; the second microphone may be a wireless microphone, such as a Bluetooth headset, a Bluetooth speaker, another user's mobile phone, or another device with a recording function.
In one implementation, the electronic device can record the audio corresponding to the front angle of view through the local microphone, while the audio corresponding to the rear angle of view is recorded by the wireless microphone. The wireless microphone may, for example, be worn by a photographed subject within the rear angle of view, or be placed at a position convenient for recording the rear-view audio.
According to the audio processing method provided by the embodiments of the present application, the electronic device can be wirelessly connected to a wireless microphone, so that it can record audio at different positions through the wireless microphone, especially at positions far from the electronic device, thereby increasing the flexibility of audio recording and improving the quality of audio recorded from different angles of view.
With reference to the first aspect, in some implementations of the first aspect, both the first microphone and the second microphone are wirelessly connected to the electronic device;
in the video recording mode, the first microphone records the first audio and the second microphone records the second audio; or,
in the video recording mode, the first microphone records the second audio and the second microphone records the first audio.
For example, the first microphone and the second microphone are both wireless microphones and are wirelessly connected to the electronic device.
It should be understood that wireless microphones can be flexibly placed at different positions; therefore, according to the shooting angles, the wireless microphones can be placed at positions convenient for recording the audio corresponding to different angles of view, improving audio quality and the flexibility of audio recording.
For example, during front-and-rear dual recording, the first microphone can be worn by the speaker in the front angle of view and the second microphone by the speaker in the rear angle of view, recording the audio of the different speakers respectively; in this case, even if the distance between a speaker and the electronic device changes, the recording effect is not affected.
With reference to the first aspect, in some implementations of the first aspect, in the video recording mode, audio frames of the first audio, audio frames of the second audio, and video frames of the first video picture are buffered;
an action of the first speaker is detected;
when it is detected that the first speaker starts to speak, starting from the i-th audio frame before the current audio frame, the audio feature of the first audio in the third audio is adjusted, and the audio feature of the second audio in the third audio is adjusted, where i is greater than or equal to 1.
It should be understood that a certain amount of time elapses between the moment a speaker actually starts speaking and the moment the electronic device detects that event, so the audio frame corresponding to the detection of speech may be later than the moment the speaker actually started speaking. Therefore, to present complete audio content, when the audio features of the audio channels in the third audio are adjusted in the embodiments of this application, the adjustment may start from a frame preceding the current frame.
Optionally, the first audio frame may be the audio frame buffered at the moment the first speaker is detected to open their mouth.
Based on the first audio frame, the starting moment for adjusting the audio features of the channels can be determined. Specifically, taking the currently buffered first audio frame as a reference, the process rolls back by a preset duration and begins merging the multiple audio channels from there. The preset duration may be, for example, 100 ms.
Therefore, the audio processing method provided in the embodiments of this application can avoid the problem that, due to the processing delay of the electronic device, the third audio fails to completely include the target audio content.
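The rollback described above amounts to converting the preset duration into a frame count i and stepping back that many cached frames. A minimal sketch, assuming a fixed frame duration (the 20 ms value and the function name are illustrative assumptions, not from the patent):

```python
def rollback_start_frame(current_frame: int, preset_ms: int = 100,
                         frame_ms: int = 20) -> int:
    """Index of the cached frame at which weight adjustment should begin.

    Rolls back preset_ms (e.g. 100 ms) from the frame that was cached when
    the speaker's mouth-open event was detected, so that the merged audio
    still contains the start of the utterance despite detection latency.
    """
    i = max(1, preset_ms // frame_ms)  # i >= 1 frames to roll back
    return max(0, current_frame - i)
```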
With reference to the first aspect, in some implementations of the first aspect, the first view and the second view are any two of a front view, a wide-angle view, and a zoom view.
According to a second aspect, an audio processing method is provided, applied to an electronic device, where the electronic device includes a first camera and a second camera, the first camera shoots a first view, and the second camera shoots a second view. The method includes:
in response to a video recording operation input by the user, entering a video recording mode; in the video recording mode, recording, by the first camera, a first video picture of the first view; recording audio of multiple sound channels, the audio of the multiple sound channels including first audio corresponding to the first view and second audio corresponding to the second view; at a first moment, a first speaker speaks, the first speaker being located within the first view;
generating a target video file, the target video file including third audio and the first video picture, where the third audio includes at least part of the first audio and at least part of the second audio; and
in response to a playback operation input by the user on the target video file, playing the target video file; where,
when playback reaches the picture corresponding to the first moment, an audio feature of the first audio changes.
In one implementation, the first camera is a rear-facing camera, the first view is a rear view, the first video picture is a picture of the rear view, and the first audio is sound within the rear view range, where the first audio may include the first speaker's voice and the first speaker is a photographed subject located within the rear view range. The second view is a front view, and the second audio is sound within the front view range.
Alternatively, the first camera may be a front-facing camera of the electronic device, the first view is a front view, the first video picture is a picture of the front view, and the first audio is sound within the front view range. The second view is a rear view, and the second audio is sound within the rear view range.
For example, the third audio is audio obtained by merging the audio of the multiple sound channels, and includes at least part of the first audio and at least part of the second audio.
According to the audio processing method provided in the embodiments of this application, by detecting, from the captured video images, the event that a speaker starts to speak, and dynamically adjusting the weight of the audio corresponding to the video images in the third audio, the switching between audio channels can be optimized on the basis of presenting complete audio, achieving natural and smooth switching between audio channels, highlighting the key content among the multiple audio channels in a targeted manner, and improving the user's listening experience.
With reference to the second aspect, in some implementations of the second aspect, the audio feature includes volume, and playing the target video file specifically includes:
when playback reaches the video picture corresponding to the first moment, the volume of the first audio increases.
According to the audio processing method provided in the embodiments of this application, by raising the volume of the first audio when the speaker speaks, the audio played in the third audio can switch to the audio corresponding to the view in which the speaker is located, so that the user clearly hears the speaker's voice.
With reference to the second aspect, in some implementations of the second aspect, when playback reaches the video picture corresponding to the first moment, the volume of the first audio gradually increases.
Specifically, when it is detected that the first speaker starts to speak, the weight of the first audio may increase dynamically over time, so that during playback the first audio fades in from weak to strong, achieving a natural switch.
According to the audio processing method provided in the embodiments of this application, gradually increasing the volume of the first audio achieves a natural switch from the other audio to the first audio within the third audio, avoiding an abrupt change in sound during playback.
According to a third aspect, an electronic device is provided, including: multiple cameras configured to capture video pictures;
a screen configured to display an interface;
an audio playback component configured to play audio;
one or more processors;
a memory;
and one or more computer programs, where the one or more computer programs are stored in the memory and include instructions that, when executed by the electronic device, cause the electronic device to perform the following steps:
in response to a video recording operation input by the user, entering a video recording mode; in the video recording mode, recording, by the first camera, a first video picture of the first view; recording audio of multiple sound channels, the audio of the multiple sound channels including first audio corresponding to the first view and second audio corresponding to the second view; at a first moment, a first speaker speaks, the first speaker being located within the second view;
generating a target video file, the target video file including third audio and the first video picture, where the third audio includes at least part of the first audio and at least part of the second audio; and
in response to a playback operation input by the user on the target video file, playing the target video file; where,
when playback reaches the picture corresponding to the first moment, an audio feature of the second audio changes.
With reference to the third aspect, in some implementations of the third aspect, the audio feature includes volume, and when the instructions are executed by the electronic device, the electronic device performs the following step: when playback reaches the video picture corresponding to the first moment, the volume of the second audio increases.
With reference to the third aspect, in some implementations of the third aspect, when the instructions are executed by the electronic device, the electronic device performs the following step: when playback reaches the video picture corresponding to the first moment, the volume of the second audio gradually increases.
With reference to the third aspect, in some implementations of the third aspect, when the instructions are executed by the electronic device, the electronic device performs the following steps: in the video recording mode, the second camera records a second video picture of the second view, and the electronic device displays a shooting interface that includes the first video picture and the second video picture;
the target video file further includes the second video picture;
when the electronic device plays the target video file, the electronic device displays the first video picture and the second video picture.
With reference to the third aspect, in some implementations of the third aspect, when the instructions are executed by the electronic device, the electronic device performs the following steps: in the video recording mode, the second camera records a second video picture of the second view, and the electronic device displays a shooting interface that does not include the second video picture;
when the electronic device plays the target video file, the electronic device does not display the second video picture.
With reference to the third aspect, in some implementations of the third aspect, when the instructions are executed by the electronic device, the electronic device performs the following step: in the video recording mode, the second camera records a second video picture of the second view, and at the first moment, the first speaker in the second video picture opens their mouth.
With reference to the third aspect, in some implementations of the third aspect, when the instructions are executed by the electronic device, the electronic device performs the following step: when playback reaches the picture corresponding to the second moment, the volume of the first audio in the third audio gradually increases.
With reference to the third aspect, in some implementations of the third aspect, the electronic device includes a first microphone and a second microphone; when the instructions are executed by the electronic device, the electronic device performs the following steps: in the video recording mode, the first microphone records the first audio and the second microphone records the second audio; or,
in the video recording mode, the first microphone records the second audio and the second microphone records the first audio.
With reference to the third aspect, in some implementations of the third aspect, the electronic device includes a first microphone, and a second microphone is wirelessly connected to the electronic device; when the instructions are executed by the electronic device, the electronic device performs the following steps: in the video recording mode, the first microphone records the first audio and the second microphone records the second audio; or,
in the video recording mode, the first microphone records the second audio and the second microphone records the first audio.
With reference to the third aspect, in some implementations of the third aspect, both the first microphone and the second microphone are wirelessly connected to the electronic device; when the instructions are executed by the electronic device, the electronic device performs the following steps: in the video recording mode, the first microphone records the first audio and the second microphone records the second audio; or,
in the video recording mode, the first microphone records the second audio and the second microphone records the first audio.
With reference to the third aspect, in some implementations of the third aspect, when the instructions are executed by the electronic device, the electronic device performs the following steps: in the video recording mode, buffering audio frames of the first audio, audio frames of the second audio, and video frames of the first video picture;
detecting an action of the first speaker;
when it is detected that the first speaker starts to speak, starting from the i-th audio frame before the current audio frame, adjusting the audio feature of the first audio in the third audio and adjusting the audio feature of the second audio in the third audio, where i is greater than or equal to 1.
According to a fourth aspect, an electronic device is provided, including: multiple cameras configured to capture video pictures; a screen configured to display an interface; an audio playback component configured to play audio; one or more processors; a memory; and one or more computer programs, where the one or more computer programs are stored in the memory and include instructions that, when executed by the electronic device, cause the electronic device to perform the following steps:
in response to a video recording operation input by the user, entering a video recording mode; in the video recording mode, recording, by the first camera, a first video picture of the first view; recording audio of multiple sound channels, the audio of the multiple sound channels including first audio corresponding to the first view and second audio corresponding to the second view; at a first moment, a first speaker speaks, the first speaker being located within the first view;
generating a target video file, the target video file including third audio and the first video picture, where the third audio includes at least part of the first audio and at least part of the second audio; and
in response to a playback operation input by the user on the target video file, playing the target video file; where,
when playback reaches the picture corresponding to the first moment, an audio feature of the first audio changes.
With reference to the fourth aspect, in some implementations of the fourth aspect, the audio feature includes volume, and when the instructions are executed by the electronic device, the electronic device performs the following step: when playback reaches the video picture corresponding to the first moment, the volume of the first audio increases.
With reference to the fourth aspect, in some implementations of the fourth aspect, when the instructions are executed by the electronic device, the electronic device performs the following step: when playback reaches the video picture corresponding to the first moment, the volume of the first audio gradually increases.
According to a fifth aspect, an audio processing system is provided, including an electronic device and at least one wireless microphone, the electronic device being wirelessly connected to the wireless microphone, where the electronic device is configured to perform the audio processing method according to any implementation of the first aspect or the second aspect, and the wireless microphone is configured to record audio and send the recorded audio to the electronic device.
According to a sixth aspect, an apparatus is provided, the apparatus being included in an electronic device and having the function of implementing the behavior of the electronic device in the above aspects and their possible implementations. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above function, for example, a display module or unit, a detection module or unit, or a processing module or unit.
According to a seventh aspect, a computer-readable storage medium is provided, including computer instructions that, when run on an electronic device, cause the electronic device to perform the audio processing method according to any implementation of the first aspect or the second aspect.
According to an eighth aspect, a computer program product is provided that, when run on a computer, causes the computer to perform the audio processing method according to any implementation of the first aspect or the second aspect.
According to a ninth aspect, an electronic device is provided, including a screen, a computer memory, and a camera, configured to implement the audio processing method according to any implementation of the first aspect or the second aspect.
BRIEF DESCRIPTION OF DRAWINGS
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of this application.
Fig. 2 is a schematic diagram of the software structure of an electronic device according to an embodiment of this application.
Fig. 3A to Fig. 3D are schematic diagrams of user interfaces according to embodiments of this application.
Fig. 4A to Fig. 4C are schematic diagrams of possible application scenarios of some audio processing methods according to embodiments of this application.
Fig. 5A and Fig. 5B are schematic diagrams of possible application scenarios of other audio processing methods according to embodiments of this application.
Fig. 6 is a schematic diagram of a possible application scenario of an audio processing method according to an embodiment of this application.
Fig. 7 is a schematic flowchart of an audio processing method according to an embodiment of this application.
Fig. 8 is a schematic diagram of audio weight changes according to an embodiment of this application.
Fig. 9 is a schematic flowchart of another audio processing method according to an embodiment of this application.
Fig. 10 is a schematic diagram of multi-channel audio merging according to an embodiment of this application.
Fig. 11 is a schematic flowchart of yet another audio processing method according to an embodiment of this application.
Fig. 12 is a schematic diagram of multi-channel audio merging according to an embodiment of this application.
Fig. 13 is a schematic diagram of another multi-channel audio merging according to an embodiment of this application.
DETAILED DESCRIPTION OF EMBODIMENTS
The embodiments of this application are described below with reference to the accompanying drawings. The terms used in the implementation part of the embodiments are only intended to explain specific embodiments of this application, not to limit this application. In the description of the embodiments of this application, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of this application, "multiple" means two or more, and "multi-channel" means two or more channels.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of these embodiments, unless otherwise stated, "multiple" means two or more.
With the development of the video recording function of electronic devices, more and more users are accustomed to recording their lives or sharing fun by video. According to the number of channels of recorded video pictures, video recording modes can be divided into a single-channel recording mode and a multi-channel recording mode (also called multi-scene recording mode).
In the single-channel recording mode, the electronic device records a single channel of video pictures during recording, that is, one line of video pictures. Depending on the shooting view, the single-channel recording mode can be further divided into the following two cases: (1) a recording mode in which the shooting view is the front shooting view (hereinafter referred to as the front single-channel recording mode); (2) a recording mode in which the shooting view is the rear shooting view (hereinafter referred to as the rear single-channel recording mode).
In the multi-channel recording mode, the electronic device records multiple channels of video pictures during recording, that is, multiple lines of video pictures, where different lines of video pictures may correspond to different shooting views.
The shooting views may be divided according to whether the subject to be shot is a front subject or a rear subject, and/or according to the zoom factor. For example, in the embodiments of this application, the shooting views may include a front view and a rear view; according to the zoom factor, the rear view may further include a wide-angle view (also called a rear wide-angle view) and a zoom view (also called a rear zoom view). The wide-angle view may be the shooting view corresponding to scenarios in which the zoom factor is less than or equal to a preset value K, where the preset value K may be, for example, 2, 1.5, or 1. The zoom view may be the shooting view corresponding to scenarios in which the zoom factor is greater than the preset value K. The front view is the shooting view corresponding to front shooting scenarios such as selfies.
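The view classification above can be summarized in a few lines. A minimal sketch (the function name and the default K = 2.0 are illustrative choices; the text allows K = 2, 1.5, 1, etc.):

```python
K = 2.0  # preset zoom threshold; the text gives 2, 1.5, or 1 as examples

def classify_view(is_front: bool, zoom: float, k: float = K) -> str:
    """Classify a shooting view as front, wide-angle, or zoom.

    Rear views with zoom factor <= k are wide-angle; rear views with
    zoom factor > k are zoom views, as described in the text."""
    if is_front:
        return "front"
    return "wide-angle" if zoom <= k else "zoom"
```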
In one possible implementation of the multi-channel recording mode, the shooting view corresponding to each channel of video pictures is fixed during a given recording. Multi-channel recording in this case may also be called multi-view recording. Depending on the shooting views, the multi-channel recording mode in this case can be further divided into the following cases: (1) a recording mode whose shooting views include a front shooting view and a rear shooting view (hereinafter referred to as the front-and-rear multi-channel recording mode); (2) a recording mode whose shooting views include multiple front shooting views and no rear shooting view (hereinafter referred to as the front multi-channel recording mode); (3) a recording mode whose shooting views include multiple rear shooting views and no front shooting view (hereinafter referred to as the rear multi-channel recording mode).
For example, taking the rear shooting view being a wide-angle view and/or a zoom view as an example, the correspondence between shooting modes and shooting views is described. Table 1 shows the shooting modes and their corresponding shooting views. The shooting view corresponding to a shooting mode may be any one of, or a combination of, the wide-angle view, the zoom view, and the front view. Each shooting mode may include one or more lines, and each line may correspond to one shooting view. Shooting modes 1-4 are multi-channel recording modes, and shooting modes 5-6 are single-channel recording modes. Video pictures recorded in a multi-channel shooting mode may include any combination of multiple channels among video pictures in the wide-angle view, video pictures in the zoom view, and video pictures in the front view.
Table 1
Figure PCTCN2021119048-appb-000001
In another possible implementation of the multi-channel recording mode, the shooting views may change during a given recording. For example, when it is detected that a speaker within one shooting view starts to speak while speakers in the other shooting views do not, only that view may be shot to obtain the corresponding video pictures; if it is detected that a speaker in another view starts to speak, shooting may switch to the view corresponding to the current speaker to obtain new video pictures.
For example, in shooting mode 2 of Table 1, the wide-angle view and the front view can switch. Suppose there is a first speaker in the wide-angle view and a second speaker in the front view, and in the initial recording stage the first speaker is speaking while the second speaker is not; in this case only the wide-angle view may be shot, and the electronic device displays the video pictures corresponding to the wide-angle view on the shooting preview interface. Later, when the first speaker stops speaking and the second speaker starts to speak, the shooting view switches to the front view, and the shooting preview interface of the electronic device displays the video pictures corresponding to the front view.
If the first speaker and the second speaker speak at the same time, video pictures of both lines, the wide-angle view and the front view, may be shot simultaneously; in this case the shooting preview interface of the electronic device may display the video pictures corresponding to both views at the same time.
In some embodiments of this application, in the single-channel recording mode, while recording a single channel of video pictures, the electronic device may also record multiple channels of audio (that is, audio of multiple sound channels), the multiple channels of audio including audio respectively corresponding to multiple video pictures.
For example, in the front single-channel recording mode (such as a user's selfie), while recording the video pictures corresponding to the front view, the electronic device may also record the audio corresponding to the front view (hereinafter referred to as the front-view audio). In addition, to obtain audio of other view ranges in the environment, the electronic device may also record audio corresponding to view ranges other than the front view (hereinafter referred to as other-view audio), for example, the audio corresponding to the rear view. In this mode, if the front video pictures include one or more speakers, the audio of the front view range may be the speakers' voices; the other-view audio may be, for example, the voices of other people located outside the front view range, or sounds in the environment.
It should be understood that a speaker in the embodiments of this application may be a person who speaks and whose voice is recorded during recording, such as the user holding the electronic device, a photographed subject appearing in the video pictures, or a person who does not appear in the video pictures but whose voice is recorded.
As another example, in the rear single-channel recording mode, the electronic device may record the video pictures corresponding to the rear view while recording the audio corresponding to the rear view (hereinafter referred to as the rear-view audio). In addition, the electronic device may also record other-view audio outside the rear view range, for example, the audio corresponding to the front view. In this mode, if the rear video pictures include one or more speakers, the audio of the rear view range may be the speakers' voices; the other-view audio may be, for example, the voices of other people located outside the rear view range, or other sounds in the environment.
In other embodiments of this application, in the multi-channel recording mode, while recording the video pictures corresponding to multiple shooting views, the electronic device may also record the audio corresponding to the different shooting views and video pictures.
In one possible implementation, in the front-and-rear multi-channel recording mode, the electronic device may record the video pictures corresponding to the front view and the rear view, while recording the audio corresponding to the front view and the audio corresponding to the rear view. In addition, the electronic device may also record other-view audio outside the front and rear view ranges. In this mode, if the front video pictures include one or more speakers, the front-view audio may be the voices of the speakers in the front video pictures; if the rear video pictures include one or more speakers, the rear-view audio may be the voices of the speakers in the rear video pictures; alternatively, the front-view audio or the rear-view audio may also include other sounds in the environment.
For example, in shooting mode 4 of Table 1 above, the audio content corresponding to the wide-angle view may include panoramic sound from all surrounding directions (that is, 360-degree surrounding sound), the audio content corresponding to the zoom view mainly includes sound within the zoom range, and the audio content corresponding to the front view is mainly sound within the front view range. In shooting mode 4, the electronic device may record the video pictures in the wide-angle view corresponding to line 1 and record the audio of line 1 according to the wide-angle view; record the video pictures in the zoom view corresponding to line 2 and record the audio of line 2 according to the zoom view; and record the video pictures in the front view corresponding to line 3 and record the audio of line 3 according to the front view.
In one possible implementation, in the front multi-channel recording mode, the electronic device may record the video pictures corresponding to multiple different front views while recording the audio corresponding to the multiple front views. In addition, the electronic device may also record other-view audio outside the front view ranges. In this mode, if the front video pictures include one or more speakers, the front-view audio may be the speakers' voices; alternatively, the front-view audio may also include other sounds in the environment.
In one possible implementation, in the rear multi-channel recording mode, the electronic device may record the video pictures corresponding to multiple different rear views while recording the multiple channels of rear-view audio corresponding to the video pictures. In addition, the electronic device may also record other-view audio outside the rear view ranges. In this mode, if the rear video pictures include one or more speakers, the rear-view audio may be the speakers' voices; alternatively, the rear-view audio may also include other sounds in the environment.
It should be understood that, in the embodiments of this application, in each recording mode, the correspondence between the audio of different views recorded by the electronic device and the video pictures may be that the audio is mainly audio within the view range corresponding to the video pictures. For example, the audio content of the front-view audio mainly includes sound within the front view range, and the rear-view audio mainly includes sound within the rear view range.
针对上述问题,本申请实施例提供了一种音频处理的方法,该方法可以应用于上述介绍的录像模式下。在上述不同录像场景中,电子设备进入录像模式后,可以录制不同视角对应的视频画面,同时录制不同视角范围的多路音频。而后,电子设备生成包括视频画面和多路音频的第三音频的音视频文件。在视频回放时,电子设备在播放视频画面的同时,还播放第三音频;在录像播放过程中,如果某一说话人开始说话,则第三音频中该说话人的音量会逐渐增加,使得第三音频由其他声音逐步切换为说话人的声音,使得每一个说话人的声音均能够被清楚地播放出来。
例如,在上述前置单路录音模式下,当视频播放(或视频回放)时,在播放前置视角对应的视频画面的同时,还播放前置视角对应的音频和其他视角音频的第三音频。示例性的,假设录像播放的初始阶段,前置视角的说话人没有开始说话,可以认为此时并不需要录制前置视角的说话人的声音,此时第三音频中其它视角音频(例如后置视角对应的音频)的音量较高,更多呈现的是其它视角音频,例如前置视角范围之外的环境中的声音或者其他人的说话声音,以获得更需要录制的声音;而后,当前置视频画面中的说话人开始说话时,第三音频中前置视角对应的音频的音量逐渐增加,其 它视角音频的音量可以逐渐降低,此时,播放的音频逐渐切换为前置视角对应的音频,用户可以更清楚地听到说话人的声音从而可以有效地避免其他视角中的杂音(例如后置视角中的杂音)。之后,如果说话人停止说话,第三音频中的其它视角音频的音量可以又逐渐增加,而前置视角对应的音频的音量逐渐降低,此时播放的音频逐渐切换为其他人的说话声音或者环境中的其它声音。
又如,在上述前后置多路录音模式下,当视频回放时,播放前置视角和后置视角分别对应的视频画面,同时电子设备还播放前置视角对应的音频和后置视角对应的音频的第三音频。示例性的,假设在视频回放的初始阶段,前置视频画面中的说话人没有说话,而后置视频画面中的说话人在说话,则第三音频中后置视角对应的音频的音量较高,前置视角对应的音频的音量较低甚至无声音;而后,当前置视频画面中的说话人开始说话时,第三音频中前置视角对应的音频的音量开始逐渐增加,后置视角对应的音频开始逐渐降低,第三音频由后置视角对应的音频逐渐向前置视角对应的音频切换,使得第三音频更多地呈现前置视角对应的音频的内容;之后,当后置视频画面中的说话人再次开始说话时,第三音频中后置视角对应的音频的音量再次逐渐增加,而前置视角对应的音频的音量可以逐渐降低,第三音频由前置视角对应的音频逐渐切换为后置视角对应的音频。如果前置视频画面的说话人之后再次开始说话时,则第三音频中,前置视角对应的音频和后置视角对应的音频的切换可以重复上述相应过程,实现由后置视角对应的音频逐渐切换为前置视角对应的音频的效果。当前置视频画面中的说话人和后置视频画面中的说话人同时说话时,前置视频画面中说话人的声音和后置视频画面中说话人的声音被播放出来。
示例性的,在前后多路录像模式下,在视频播放时,如果播放的视频画面为广角视频画面和前置视角组成的多路视频画面,那么电子设备播放的音频可以为全景音频和前置视角对应的音频的第三音频;如果播放的视频画面切换为变焦视角画面和前置画面时,那么电子设备播放的音频可以为广角范围对应的音频和前置视角对应的音频的第三音频。其中,第三音频中各路音频的切换过程与上述介绍的前后置多路录音模式下的各路音频切换过程类似,此处不再赘述。
其它录像模式下视频回放时的音频切换场景与上述描述的场景类似。示例性的,在视频回放时,当某一说话人开始说话时,则该说话人的说话声音的音量会逐渐增大,播放的第三音频逐渐切换为该说话人的声音,而当另一说话人开始说话时,则最新开始说话的说话人的声音的音量会逐渐增大,之前说话人的音量则会逐渐降低,第三音频由之前说话人的声音切换为当前说话人的声音,使用户清楚地听到不同说话人的声音。
根据本申请实施例提供的音频处理的方法,当视频回放时,电子设备播放多路音频的第三音频,并且第三音频中的各路音频能够自然切换,从而提高用户的录像的音频体验。
According to the audio processing method provided in the embodiments of this application, during video playback the electronic device plays a third audio of multiple audio channels, and the channels within the third audio switch naturally, improving the user's audio experience for recordings. The audio processing method provided in the embodiments of this application can be applied to an electronic device. For example, the electronic device may specifically be a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or a dedicated camera (such as a single-lens reflex camera or a compact camera), etc. The embodiments of this application impose no limitation on the specific type of the electronic device.
For example, Fig. 1 shows a schematic structural diagram of an electronic device 100 according to an embodiment of this application. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
There may be multiple microphones 170C, and there may also be multiple cameras 193, for example, a front-facing camera and a rear-facing camera.
It can be understood that the structure illustrated in the embodiments of this application does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), an audio processor/digital processor, a controller, a memory, a video codec, an audio codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or may be integrated in one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to instruction operation codes and timing signals, completing the control of fetching and executing instructions.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache, which can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from this memory, avoiding repeated accesses and reducing the waiting time of the processor 110, thereby improving system efficiency.
For example, in this application, the memory stores a firmware program used to enable the controller or processor to implement the audio processing method of this application through interfaces or protocols.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identification module interface, and/or a universal serial bus interface, etc.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple groups of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the microphone, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement the touch function of the electronic device 100.
The I2S interface can be used for audio data transmission. In some embodiments, the processor 110 may include multiple groups of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may receive audio signals through the I2S interface to implement the audio recording function.
The PCM interface can also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface to implement answering calls through a Bluetooth headset, or receive audio signals input by the wireless communication module 160 through the PCM interface to obtain audio data collected by a wireless microphone.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus that converts the data to be transmitted between serial and parallel communication. In some embodiments, the UART interface is typically used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may receive audio signals transmitted by the Bluetooth module through the UART interface to implement recording audio through the wireless microphone in a Bluetooth headset.
The MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 communicates with the camera 193 through the CSI interface to implement the shooting function of the electronic device 100, and with the display screen 194 through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface can be configured through software, either as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, etc.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, etc. The USB interface 130 can be used to connect a charger to charge the electronic device 100, to transfer data between the electronic device 100 and peripheral devices, or to connect a headset and play audio through the headset. The interface can also be used to connect other electronic devices, such as AR devices.
It should be understood that the interface connection relationships between the modules illustrated in the embodiments of this application are only schematic and do not constitute a structural limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may also adopt interface connection manners different from those in the above embodiments, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger, which may be a wireless or wired charger. The power management module 141 is configured to connect the battery 142 and the charging management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may be provided in the same device.
The wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 can be used to cover a single communication band or multiple bands, and different antennas can also be multiplexed to improve antenna utilization. The mobile communication module 150 can provide solutions for wireless communication applied to the electronic device 100, including 2G/3G/4G/5G.
The modem processor may include a modulator and a demodulator. The modulator modulates the low-frequency baseband signal to be sent into a medium-high-frequency signal, and the demodulator demodulates the received electromagnetic wave signal into a low-frequency baseband signal. The wireless communication module 160 can provide solutions for wireless communication applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), the BeiDou navigation satellite system (BDS), the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
The electronic device 100 implements the display function through the graphics processing unit (GPU), the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing, connecting the display screen 194 and the application processor. The GPU performs data and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel, which may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), quantum dot light-emitting diodes (QLED), etc. In some embodiments, the electronic device 100 may include one or more display screens 194.
The electronic device 100 can implement the shooting function through the image signal processor (ISP), the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like. In this application, the camera 193 may include the front-facing camera and the rear-facing camera of the electronic device 100, which may be optical zoom lenses or the like; this application imposes no limitation on this.
In some embodiments, the ISP may be provided in the camera 193; this application imposes no limitation on this.
The camera 193 is used to capture still images or video. An object's optical image is generated through the lens and projected onto the photosensitive element, which may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then passed to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include one or more cameras 193.
The electronic device 100 may include multiple cameras 193, for example, at least one front-facing camera and one rear-facing camera, multiple front-facing cameras, or multiple rear-facing cameras.
The digital signal processor is used to process digital signals; besides digital image signals, it can process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform and the like on the frequency point energy.
The video codec is used to compress or decompress digital video. The electronic device 100 can support one or more video codecs, so the electronic device 100 can play or record video in multiple encoding formats, for example, moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer mode between neurons in the human brain, it processes input information quickly and can also continuously self-learn. Applications such as intelligent cognition of the electronic device 100, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example saving audio, video, and other files on the external memory card.
The internal memory 121 can be used to store computer-executable program code, the executable program code including instructions. By running the instructions stored in the internal memory 121, the processor 110 executes the various functional applications and data processing of the electronic device 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system and applications required for at least one function (such as an audio playback function and an image playback function), and the data storage area can store data created during use of the electronic device 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory and may also include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The electronic device 100 can implement audio functions, such as audio playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and to convert an analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
The speaker 170A, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal. Through the speaker 170A, the electronic device 100 allows the user to listen to audio or to hands-free calls.
The receiver 170B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or a voice message, the voice can be heard by bringing the receiver 170B close to the ear.
The microphone 170C, also called a "mic" or "mouthpiece", is used to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. In this application, the electronic device 100 may be provided with at least two microphones 170C, for example local microphones or wireless microphones. In other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement a directional recording function, among other things.
In this application, the electronic device can collect multiple channels of audio through multiple microphones 170C. In addition to local microphones installed inside the electronic device, the electronic device can also collect audio through wireless microphones wirelessly connected to the electronic device.
In the embodiments of this application, the multiple microphones 170C can convert the collected sound signals into electrical signals and pass them to the processor 110; after the audio processor in the processor 110 receives the multiple channels of audio signals, it processes them, for example encoding each channel of audio through the audio codec.
The headset jack 170D is used to connect a wired headset. The headset jack 170D may be the USB interface 130, or a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The distance sensor 180F is used to measure distance. The electronic device 100 can measure distance by infrared or laser. In some embodiments, in shooting scenarios, the electronic device 100 can use the distance sensor 180F to measure distance to achieve fast focusing.
The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, fingerprint photographing, fingerprint call answering, and the like.
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be provided on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, also called a "touch screen". The touch sensor 180K is used to detect touch operations on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of the touch event. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be provided on the surface of the electronic device 100 at a position different from that of the display screen 194.
Fig. 2 is a block diagram of the software structure of the electronic device 100 according to an embodiment of this application. The layered architecture divides the software into several layers, each with clear roles and responsibilities, and the layers communicate through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, the hardware abstraction layer (HAL), and the kernel layer. The application layer may include a series of application packages.
As shown in Fig. 2, the application packages may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messaging.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in Fig. 2, the application framework layer may include a window manager, content providers, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used to manage window programs. It can obtain the display screen size, determine whether there is a status bar, lock the screen, capture the screen, and the like.
Content providers are used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, placed and received calls, browsing history and bookmarks, phone books, and the like.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures, and can be used to build applications. A display interface may be composed of one or more views; for example, a display interface including a messaging notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the electronic device 100, for example, managing the call state (including connecting, hanging up, etc.).
The resource manager provides applications with various resources, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar. It can be used to convey informational messages, which can disappear automatically after a short stay without user interaction; for example, the notification manager is used to notify download completion, message reminders, and the like. The notification manager can also present notifications as charts or scroll-bar text in the system status bar at the top, such as notifications from applications running in the background, or as notifications appearing on the screen in the form of dialog windows; for example, text information prompted in the status bar, a prompt sound, vibration of the electronic device, or blinking of the indicator light.
The Android runtime includes core libraries and a virtual machine and is responsible for the scheduling and management of the Android system.
The core libraries comprise two parts: one part contains the functions that the Java language needs to call, and the other part is the Android core libraries.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files and performs functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example: the surface manager, the media libraries, the 3D graphics processing library (e.g., OpenGL ES), and the 2D graphics engine (e.g., SGL).
The surface manager is used to manage the display subsystem and provides fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of many common audio and video formats, as well as static image files. The media libraries can support multiple audio and video encoding formats, for example MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, layer processing, and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The HAL layer is an interface layer between the operating system kernel and the hardware circuits and can abstract the hardware. The HAL layer includes an audio processing module. The audio processing module can be used to process the analog audio electrical signals obtained by the microphones according to the shooting view, generating audio corresponding to different shooting views and video pictures. For example, for the wide-angle view, the audio processing module may include a timbre correction module, a stereo beamforming module, a gain control module, and the like. For the zoom view, the audio processing module may include a timbre correction module, a stereo/mono beamforming module, an ambient noise control module, a gain control module, and the like. For the front view, the audio processing module may include a timbre correction module, a stereo/mono beamforming module, a voice enhancement module, a gain control module, and the like.
The kernel layer is the layer between the hardware layer and the above software layers. The kernel layer contains at least the display driver, camera driver, audio driver, and sensor driver. The hardware layer may include the camera, the display screen, the microphones, the processor, the memory, and the like.
In the embodiments of this application, in a recording mode that records multiple channels of audio, the display screen in the hardware layer can display the shooting preview interface, the recording preview interface, and the shooting interface during recording. The camera in the hardware layer can be used to collect multiple channels of video pictures. The microphones in the hardware layer can be used to collect sound signals and generate analog audio electrical signals. The audio processing module in the HAL layer can be used to process the digital audio data converted from the analog audio electrical signals, thereby generating audio corresponding to different shooting views and video pictures. During video playback, the display screen can display the video playback interface, and the speaker can play the multiple channels of audio that the user cares about as well as the third audio of the multiple channels, improving the user's audio experience for multi-channel recording.
For ease of understanding, the embodiments of this application take a mobile phone as the electronic device and first introduce the human-computer interaction process during recording. For example, Fig. 3A to Fig. 3D provide schematic diagrams of graphical user interfaces (GUIs) during audio processing.
Fig. 3A shows that, in the unlocked state of the mobile phone, the phone's screen display system displays the currently output interface content 301, which is the home screen of the phone. The interface content 301 displays multiple applications (apps), such as Gallery, Settings, Music, and Camera. It should be understood that the interface content 301 may also include other applications; this application imposes no limitation on this.
When the phone detects the user's operation of tapping the icon 302 of the Camera application on the home screen 301, the Camera application can be launched, displaying the interface shown in Fig. 3B, which can be called the camera's shooting interface 303. The shooting interface 303 may include a viewfinder frame, an album icon, a shooting control 304, a rotation control, and the like.
The viewfinder frame is used to obtain the shooting preview image and display the preview image in real time, such as the preview image of the person in the rear view shown in Fig. 3B. The album icon is used for quick entry to the album; when the phone detects that the user taps the album icon, the photos or videos already taken can be shown on the touchscreen. The shooting control 304 is used for photographing or recording; when the phone detects that the user taps the shooting control 304, the phone performs the photographing operation and saves the captured photo; or, when the phone is in the recording mode and the user taps the shooting control 304, the phone performs the recording operation and saves the recorded video. The camera rotation control is used to control switching between the front-facing camera and the rear-facing camera.
In addition, the shooting interface 303 also includes functional controls for setting shooting modes, such as the aperture mode, night mode, portrait mode, photo mode, video mode, professional mode, and "More" shown in Fig. 3B. As shown in Fig. 3C, "More" may further include slow-motion mode, panorama mode, black-and-white art mode, dual-scene recording mode, filter mode, high-dynamic range (HDR) mode, multi-channel recording mode (not shown in the figure), and the like. It should be understood that, after the user taps the icon 302, the phone opens the Camera application in photo mode by default in response to the tap; this application imposes no limitation on this.
For example, when the electronic device detects that the user taps the video icon on the camera's shooting interface 303, it can enter the single-channel recording mode, for example entering the rear single-channel recording mode by default; when the electronic device detects that the user taps the camera rotation control, the recording view switches from the rear view to the front view, and the recording mode switches to the front single-channel recording mode.
Alternatively, when the electronic device detects that the user taps the "More" icon on the camera's shooting interface 303, the interface shown in Fig. 3D, which can be called the "More" functions interface, is displayed. For example, when the electronic device detects on the "More" functions interface that the user taps the dual-scene recording icon, it enters the dual-scene recording mode. In the dual-scene recording mode, the image preview interface of the electronic device displays by default the video pictures of the front view and of a rear view (for example, the zoom view); when the electronic device detects that the user taps the camera rotation control, the video pictures displayed on the image preview interface can switch. For example, when the user taps the camera rotation control once, the image preview interface displays dual front video pictures; when the user taps the camera rotation control once again, the image preview interface displays dual rear video pictures, and so on.
For ease of understanding, the multi-channel audio recording process during video recording is introduced below with reference to the accompanying drawings. Fig. 4A to Fig. 4C show schematic diagrams of some recording scenarios. In Fig. 4, speaker 1 is the first speaker, speaker 2 is the second speaker, audio 2 is the first audio, and audio 1 is the second audio. Alternatively, speaker 1 is the second speaker, speaker 2 is the first speaker, audio 1 is the first audio, and audio 2 is the second audio. Alternatively, speaker 1 is the first speaker, speaker 2 is the second speaker, audio 1 is the first audio, and audio 2 is the second audio.
It should be understood that the multiple channels of audio may be recorded by multiple microphones. In one embodiment, the electronic device includes multiple microphones (the microphones of the electronic device may be called local microphones), and the multiple local microphones may be placed at different positions on the electronic device to record audio of different views. In one embodiment, the electronic device may be wirelessly connected to at least one wireless microphone; the electronic device may use audio collected by one wireless microphone to record the audio of one view, and may also use audio collected by multiple wireless microphones to record the audio of multiple views respectively. The wireless microphone may be, for example, a wireless headset, a wireless speaker, a tablet computer, a wearable device, or another user's mobile phone with a recording function. Using audio collected by a wireless microphone to record audio allows the voice of the speaker in the shot view to be collected more clearly.
In combination with the above introduction, according to the shooting views and the number of recording lines, the recording modes in the embodiments of this application can be divided into the following modes: the front single-channel recording mode, the rear single-channel recording mode, the front-and-rear multi-channel recording mode, the front multi-channel recording mode, and the rear multi-channel recording mode.
According to the recording venue, the recording in the embodiments of this application can be divided into outdoor recording and indoor recording.
According to the type of microphone collecting the audio (also called the recording type), the recording in the embodiments of this application can be divided into: cases where multiple local microphones participate in recording and no wireless microphone participates; cases where multiple wireless microphones participate in recording and no local microphone participates; and cases where local microphones and wireless microphones jointly participate in recording.
In the different recording modes and venues, the electronic device can record multiple channels of audio through audio collected by at least one local microphone and/or at least one wireless microphone, the multiple channels of audio at least including sound within each shooting view range.
The audio processing method provided in the embodiments of this application can be applied in various combinations of the above recording venues, recording modes, and recording types. The recording process involved in the audio processing method provided in the embodiments of this application is introduced below with reference to some of these combination scenarios.
In one implementation, the electronic device may establish a wireless connection with a wireless microphone in advance. For example, after the electronic device enters the recording mode following the relevant steps shown in Fig. 3, in response to the mode change the electronic device may display a prompt message prompting the user which microphones to use for recording and asking whether a wireless microphone should participate in this recording; the user can tap the confirm or cancel button as needed. When the electronic device detects that the user taps the cancel button, it can start multiple local microphones to record multiple channels of audio. When the electronic device detects that the user taps the confirm button, it can continue prompting the user to select which wireless microphone to record with, allowing the user to choose among the available wireless microphones, and may also ask whether the local microphone should still participate. If the electronic device detects that the user chooses not to have the local microphone participate, then during recording the multiple wireless microphones do the recording; if the user chooses to have the local microphone participate, then during recording the local microphone and the wireless microphones jointly participate. The wireless microphone can record the sound within its pickup range during recording.
It should be understood that the above microphone-type selection process of the electronic device is only an example; the process may have many other implementations. For example, the electronic device may first prompt the user whether the local microphone should participate in recording and, after detecting the user's selection, then prompt the user whether a wireless microphone should participate and which microphone or microphones to select. This application imposes no limitation on this.
As one example, Fig. 4A is a schematic diagram of a recording scenario. The scenario may be an outdoor venue in which the electronic device, in the front single-channel recording mode, uses a local microphone and a wireless microphone to record multiple channels of audio.
In this case, after the electronic device enters the front single-channel recording mode, as shown in Fig. 4A, the electronic device records images of speaker 1, who is in the front view, through the front-facing camera, and the shooting preview interface of the electronic device displays the front video picture. During recording, the local microphone of the electronic device records speaker 1's voice (denoted audio 1), and the wireless microphone at position 1 (which may be a microphone on speaker 2's wireless headset, mobile phone, or similar device) records the sound within its pickup range, such as speaker 2's voice (denoted audio 2). Audio 1 and audio 2 can be stored in the electronic device's buffer.
It should be understood that, in the scenario shown in Fig. 4A, position 1 may be outside the front view range; for example, position 1 is located in the rear view. In some other implementations, position 1 may also be within the front view range, in which case the front video picture displayed on the shooting preview interface may also include the image of speaker 2.
During recording or after recording is completed, the wireless microphone can send audio 2 to the electronic device over the wireless connection.
When the user taps the control to end recording, the electronic device, in response to the tap, stops recording and exits the recording mode. The electronic device packages the audio and video and generates a video file that includes the front video picture and the third audio, the third audio including at least part of audio 1 and at least part of audio 2. For example, if audio 1 and audio 2 are recorded continuously, the third audio includes part of audio 1 and part of audio 2, is merged from those parts, and during the switch between audio 1 and audio 2 is merged from audio 1 and audio 2 according to set weights. As another example, if audio 1 is recorded only while speaker 1 speaks and audio 2 only while speaker 2 speaks, the third audio includes all of audio 1 and all of audio 2, is merged from them, and during the switch between audio 1 and audio 2 is merged from audio 1 and audio 2 according to set weights. The electronic device can save the video file in internal memory or external memory, for example under the album icon.
It should be understood that the video file finally saved in the album (that is, the target video file) is the file after processing by the electronic device, for example after merging the multiple channels of audio, performing image recognition on the speakers, and, upon detecting that a speaker starts to speak, increasing the volume of audio 1 in the third audio so that the volume of audio 1 rises as the speaker begins to speak. The above processing can be completed inside the electronic device until the final video file is obtained and saved in the album.
In the scenario shown in Fig. 4A, in one embodiment, while the electronic device records the front view through the front-facing camera, the rear-facing camera also records the rear view in the background; the shooting preview interface of the electronic device does not display the rear video picture, but the rear video picture recorded by the rear-facing camera is stored, for example in the electronic device's buffer, so that speaker 2's mouth-opening action can be detected. For example, at moment t1, speaker 2 opens their mouth and starts to speak. When the video file is played, the electronic device displays the front video picture, and when the picture corresponding to moment t1 is played, an audio feature of audio 2 changes, for example the sound of audio 2 becomes louder.
In one embodiment, the front video picture recorded by the front-facing camera of the electronic device is stored, for example in the electronic device's buffer, so that speaker 1's mouth-opening action can be detected. For example, at moment t2, speaker 1 opens their mouth and starts to speak. When the video file is played, the electronic device displays the front video picture, and when the picture corresponding to moment t2 is played, an audio feature of audio 1 changes, for example the sound of audio 1 becomes louder.
In one embodiment, while the electronic device records the front view through the front-facing camera, the rear-facing camera also records the rear view in the background; the shooting preview interface does not display the rear video picture, and both the front and rear video pictures are stored so that the mouth-opening actions of speaker 1 and speaker 2 can be detected. For example, at moment t3, speaker 1 opens their mouth and starts to speak; at moment t4, speaker 2 opens their mouth and starts to speak. When the video file is played, the electronic device displays the front video picture; when the picture corresponding to moment t3 is played, an audio feature of audio 1 changes, for example the sound of audio 1 becomes louder, and when the picture corresponding to moment t4 is played, an audio feature of audio 2 changes, for example the sound of audio 2 becomes louder.
In one possible implementation, the above processing can also be completed on a cloud server. For example, during or after recording, the electronic device and the wireless microphone can send the obtained video and audio to the cloud server; or the wireless microphone first sends the recorded audio to the electronic device, which then sends it to the cloud server. The cloud server then completes the above processing, generates the final video file, and sends it back to the electronic device, which saves it in the album. It should be understood that this processing approach can be adopted in each recording scenario; to avoid repetition, it is not described again in the descriptions of the other scenarios below.
In another scenario, Fig. 4B is a schematic diagram of the electronic device, in the rear single-channel recording mode, using two wireless microphones to record multiple channels of audio. Wireless microphone 1 may be, for example, a wireless headset worn by speaker 1, who is in the front view, and wireless microphone 2 may be, for example, speaker 2's mobile phone (or wireless headset) carried by speaker 2, who is within the rear view range. In addition, the front view can also be recorded by the local microphone.
It should be understood that, in practical applications, the types of wireless microphones are not limited to the wireless headset and mobile phone shown in Fig. 4B; they may also be other devices with a recording function. This application imposes no limitation on this.
For example, in this case, after the electronic device enters the rear single-channel recording mode, as shown in Fig. 4B, the electronic device records the video images of speaker 2 in the rear view through the rear-facing camera, and the shooting preview interface of the electronic device displays the rear video picture. During recording, wireless microphone 1 worn by speaker 1 records speaker 1's voice (denoted audio 1), and wireless microphone 2 carried by speaker 2 records speaker 2's voice (denoted audio 2).
In one implementation, during this rear single-channel recording, the front-facing camera of the electronic device is opened in the background and records images of speaker 1, where the images of speaker 1 are used by the electronic device during audio processing to identify whether speaker 1 is speaking. It should be understood that "the front-facing camera is opened in the background" means that during recording the front-facing camera collects the video pictures of the front view in real time, but the shooting preview interface does not display the front video picture; after the video file is generated, the file does not include the front video picture, and during later video playback the playback interface does not display the front video picture either.
In the scenario shown in Fig. 4B, in one embodiment, while the electronic device records the rear view through the rear-facing camera, the front-facing camera also records the front view in the background; the shooting preview interface of the electronic device does not display the front video picture, but the front video picture recorded by the front-facing camera is stored, for example in the electronic device's buffer, so that speaker 1's mouth-opening action can be detected. For example, at moment t5, speaker 1 opens their mouth and starts to speak. When the video file is played, the electronic device displays the rear video picture, and when the picture corresponding to moment t5 is played, an audio feature of audio 1 changes, for example the sound of audio 1 becomes louder.
In one embodiment, the rear video picture recorded by the rear-facing camera of the electronic device is stored, for example in the electronic device's buffer, so that speaker 2's mouth-opening action can be detected. For example, at moment t6, speaker 2 opens their mouth and starts to speak. When the video file is played, the electronic device displays the rear video picture, and when the picture corresponding to moment t6 is played, an audio feature of audio 2 changes, for example the sound of audio 2 becomes louder.
In one embodiment, while the electronic device records the rear view through the rear-facing camera, the front-facing camera also records the front view in the background; the shooting preview interface does not display the front video picture, and both the front and rear video pictures are stored so that the mouth-opening actions of speaker 1 and speaker 2 can be detected. For example, at moment t7, speaker 2 opens their mouth and starts to speak; at moment t8, speaker 1 opens their mouth and starts to speak. When the video file is played, the electronic device displays the rear video picture; when the picture corresponding to moment t7 is played, an audio feature of audio 2 changes, for example the sound of audio 2 becomes louder, and when the picture corresponding to moment t8 is played, an audio feature of audio 1 changes, for example the sound of audio 1 becomes louder.
During recording or after recording is completed, wireless microphone 1 sends audio 1 to the electronic device, and wireless microphone 2 sends audio 2 to the electronic device.
When the user taps the control to end recording, the electronic device, in response to the tap, stops recording and exits the recording mode. After recording ends, the electronic device generates a video file that includes the rear video picture and the third audio, the third audio being a merge of audio 1 and audio 2. The electronic device can save the video file, for example under the album icon.
It should be understood that the video file finally saved in the album is the file after processing by the electronic device, for example after merging the multiple channels of audio, performing image recognition on speaker 1, and, upon detecting that speaker 1 starts to speak, increasing the volume of audio 1 in the third audio so that the volume of audio 1 rises as the speaker begins to speak. The above processing can be completed inside the electronic device until the final video file is obtained and saved in the album.
In one scenario, Fig. 4C is a schematic diagram of the electronic device, in the front-and-rear multi-channel recording mode, using two wireless microphones to record multiple channels of audio. Wireless microphone 1 may be, for example, a wireless headset worn by speaker 1, who is in the front view, and wireless microphone 2 may be, for example, speaker 2's wireless headset (or mobile phone) carried by speaker 2, who is within the rear view range. In addition, the front view can also be recorded by the local microphone.
It should be understood that, in practical applications, the types of wireless microphones are not limited to the wireless headset and mobile phone shown in Fig. 4C; they may also be other devices with a recording function. This application imposes no limitation on this.
For example, in this case, after the electronic device enters the front-and-rear multi-channel recording mode, as shown in Fig. 4C, the electronic device records the video images of speaker 1 in the front view through the front-facing camera and the video images of speaker 2 in the rear view through the rear-facing camera, and the shooting preview interface of the electronic device displays the front video picture and the rear video picture. During recording, wireless microphone 1 worn by speaker 1 records speaker 1's voice (denoted audio 1), and wireless microphone 2 carried by speaker 2 records speaker 2's voice (denoted audio 2).
In one implementation, during recording, images of speaker 1 are recorded, where the images of speaker 1 are used by the electronic device during audio processing to identify whether speaker 1 is speaking.
In one embodiment, both the front video picture and the rear video picture are stored so that the mouth-opening actions of speaker 1 and speaker 2 can be detected. For example, at moment t9, speaker 2 opens their mouth and starts to speak; at moment t10, speaker 1 opens their mouth and starts to speak. When the video file is played, the electronic device displays the front video picture and the rear video picture; when the picture corresponding to moment t9 is played, an audio feature of audio 2 changes, for example the sound of audio 2 becomes louder, and when the picture corresponding to moment t10 is played, an audio feature of audio 1 changes, for example the sound of audio 1 becomes louder.
During recording or after recording is completed, wireless microphone 1 sends audio 1 to the electronic device, and wireless microphone 2 sends audio 2 to the electronic device.
When the user taps the control to end recording, the electronic device, in response to the tap, exits the recording mode. After recording ends, the electronic device generates a video file that includes the front video picture, the rear video picture, and the third audio, the third audio being a merge of audio 1 and audio 2. The electronic device can save the video file, for example under the album icon.
It should be understood that the video file finally saved in the album is the file after processing by the electronic device, for example after merging the multiple channels of audio, performing image recognition on speaker 1, and, upon detecting that speaker 1 starts to speak, increasing the volume of audio 1 in the third audio so that the volume of audio 1 rises as the speaker begins to speak. The above processing can be completed inside the electronic device until the final video file is obtained and saved in the album.
In the above scenarios, the electronic device records two channels of audio while recording video. In addition, in some embodiments, the electronic device may also record three or more channels of audio while recording, and the third audio may include three or more audio channels. In some scenarios, at least two of the first audio, the second audio, and the third audio may also be stored in internal or external memory, so that the user can select and merge different audio channels as desired, increasing flexibility.
In one implementation, when a local microphone and a wireless microphone jointly record during video recording, the user can also be prompted, based on the positioning function between the electronic device and the wireless microphones, to select a wireless microphone at a suitable position for recording.
Taking the front-and-rear dual-scene recording mode as an example, Fig. 5A and Fig. 5B are schematic diagrams of an indoor scenario in which the electronic device, in the front-and-rear dual-channel recording mode, has the local microphone and a wireless microphone jointly participate in recording.
The electronic device and the wireless microphone may access the same access point (AP) or use the same Wi-Fi network.
In one implementation, the electronic device sends a broadcast message requesting to establish a wireless connection (such as pairing); after receiving the broadcast message, the wireless microphone establishes a wireless connection with the electronic device according to the message, thereby achieving pairing. Alternatively, the wireless microphone sends a broadcast message requesting a wireless connection, and the electronic device, upon receiving the broadcast message, establishes a wireless connection with the wireless microphone according to it.
The above process of establishing a wireless connection may occur when the electronic device starts the recording mode, for example the electronic device sends the above broadcast message and performs the above pairing process in response to the recording mode starting; or the process of establishing a wireless connection may also occur before recording.
For example, when performing front-and-rear dual-scene recording, the user operates according to Fig. 3A to Fig. 3C, and then the electronic device detects that the user taps the dual-scene recording icon; in response to the tap, the electronic device may display a prompt message, as shown in Fig. 5A, whose content may be, for example: "Available wireless microphones found nearby. Select a wireless microphone to participate in recording?" When the electronic device detects that the user taps the "Select" option, it can continue to display a prompt message, as shown in Fig. 5B, whose content may be the names and models of the currently available wireless microphone devices, possibly including "Paired devices" and "Available devices". The user can select a suitable wireless microphone to participate in the recording; when the electronic device detects that the user taps one or more microphones, the electronic device establishes the wireless connection for this session with those one or more microphones.
"Paired devices" means devices that have already been paired with the electronic device and are within wireless communication range. If the user selects any one or more paired devices (for example devices with microphones such as smart speakers or wireless headsets, that is, wireless microphones), a wireless connection is established between the electronic device and the paired device, and data is transmitted between them; when the electronic device is shooting video, the paired device can transmit the data collected by its microphone to the electronic device.
"Available devices" means devices that can be paired with the electronic device and are within wireless communication range. If the user selects any one or more available devices (for example devices with microphones such as smart speakers or wireless headsets, that is, wireless microphones), the electronic device pairs with the available device; after pairing, a wireless connection is established between them and data is transmitted; when the electronic device is shooting video, the available device can transmit the data collected by its microphone to the electronic device.
In one implementation, the electronic device and the wireless microphones can use the positioning or ranging function to locate the wireless microphones, and then automatically select, according to the recording view, a wireless microphone within that view range for recording. For example, as shown in Fig. 6, the electronic device records in the front-and-rear dual-scene recording mode, and upon entering that mode the electronic device detects the wireless microphone devices (such as smart speaker 1 and smart speaker 2) selected by the user according to Fig. 5A and Fig. 5B.
During recording, the audio corresponding to the front view is recorded by the local microphone of the electronic device, and the audio corresponding to the rear view is recorded by smart speaker 1 or smart speaker 2. Suppose that in the initial stage the rear view is shooting view 1 shown in Fig. 6; based on the positioning of smart speaker 1, the electronic device learns that smart speaker 1 is within the range of shooting view 1, automatically connects to smart speaker 1 for this session, and smart speaker 1 performs the rear-view recording. Later, during recording, if the rear view of the electronic device rotates and switches from shooting view 1 to shooting view 2, the electronic device can disconnect from smart speaker 1 and automatically establish this session's wireless connection with smart speaker 2, so that wireless microphone 2 records the audio corresponding to the rear view.
The above, with reference to the accompanying drawings, described the recording scenarios possibly involved in the audio processing method provided in the embodiments of this application and embodiments of the human-computer interaction during recording. To better understand the audio processing method provided by this application, the specific implementation process and algorithms are introduced below from the implementation level.
Fig. 7 is a schematic diagram of an example audio processing process according to an embodiment of this application. The audio processing method is applied to an electronic device including a first camera and a second camera, where the first camera shoots a first view and the second camera shoots a second view. The method includes the following steps:
S701: In response to a video recording operation input by the user, enter a video recording mode.
The video recording operation may be a single-channel or multi-channel recording operation. Accordingly, the electronic device enters the corresponding single-channel or multi-channel recording mode in response to the operation.
For example, as shown in Fig. 3A, the user taps the Camera application on the home screen, and in response to the tap the electronic device displays the shooting preview interface, which may correspond to Fig. 3B.
The video recording operation here may be: on the shooting preview interface, the user's operation of tapping the shooting control 304; or, on the "More" functions interface shown in Fig. 3D, the user's operation of tapping the dual-scene recording control; or, on the "More" functions interface, the user's operation of tapping the multi-channel recording control.
After detecting the user's recording operation, the electronic device enters the corresponding recording mode. For example, after detecting the user's operation of tapping the shooting control in Fig. 3B, it enters the single-channel recording mode; or, after detecting the user's operation of tapping the dual-channel recording control on the interface of Fig. 3D, it enters the dual-channel recording (also called dual-scene recording) mode, and so on.
S702: In the video recording mode, the first camera records a first video picture of the first view; audio of multiple sound channels is recorded, the audio of the multiple sound channels including first audio corresponding to the first view and second audio corresponding to the second view; at a first moment, a first speaker speaks, the first speaker being located within the second view.
In one implementation, the first camera may be a front-facing camera, the first view is a front view, and the first video picture is a front video picture; the second view is a rear view, where the first speaker is within the rear view range and the second audio may include the first speaker's voice. This may correspond, for example, to the scenario shown in Fig. 4B, in which case the first speaker corresponds to speaker 2 in Fig. 4B.
In one implementation, the first camera may be a rear-facing camera, the first view is a rear view, and the first video picture is a rear video picture; the second view is a front view, where the first speaker is within the front view range and the second audio may include the first speaker's voice. This may correspond, for example, to the scenario shown in Fig. 4B, in which case the first speaker corresponds to speaker 1 in Fig. 4B.
In one implementation, the audio of the multiple sound channels may be recorded by at least two microphones respectively. The at least two microphones may include the phone's local microphone and/or wireless microphones. Specifically, the audio of the multiple sound channels may be collected by the electronic device's local microphone and a wireless microphone respectively; or by multiple wireless microphones; or by multiple local microphones.
It should be understood that the wireless microphone in this application may be any of various devices with a microphone function, and the wireless microphone may establish a wireless connection with the phone before the recording operation. The wireless microphone may be, for example, a wireless headset, a wireless speaker, or another mobile phone capable of functioning as a microphone. This application imposes no specific limitation on this.
Optionally, the wireless connection between the wireless microphone and the phone may take many forms, for example: Bluetooth, wireless fidelity (Wi-Fi), the fifth-generation mobile communication technology (5G), the fourth-generation mobile communication technology (4G), and the like.
In one implementation, "at the first moment, the first speaker speaks" may include: at the first moment, the first speaker opens their mouth.
S703: Generate a target video file, the target video file including third audio and the first video picture, where the third audio includes at least part of the first audio and at least part of the second audio.
The third audio is the audio obtained by merging the audio of the multiple channels; in other words, the third audio is a merged audio of the multiple channels. For example, during recording or after recording ends, the electronic device can merge the multiple channels of audio according to preset weights to obtain the third audio. During the merging, the audio processor merge-encodes the multiple channels of audio to obtain the third audio.
In one implementation, the preset weight of each audio channel can be set according to whether a speaker has started to speak. For example, when it is not detected in the first video picture that a speaker has started to speak, the weight of the first audio may be lower than a first threshold; the first threshold may be, for example, 0 or 0.2. When the weight of the first audio is 0, the third audio can be encoded according to the encoding of the other channel of the dual-channel audio.
It should be understood that when no speaker is detected to be speaking in the first video picture, for example no mouth-opening action is detected, this indicates that the speaker is making no sound, or that the audio corresponding to the first video picture contains no content the user needs; in this case the third audio can reduce the gain proportion (or weight) of the audio corresponding to the first view, so as to present more of the other audio content.
When it is detected from the first video picture that a speaker has started to speak, the weight of the audio corresponding to the first view in the third audio is adjusted to a target weight. For example, during front-and-rear shooting, when the user holding the electronic device starts to speak, the gain proportion of that user's audio in the third audio is increased to highlight that user's speech content.
In one implementation, the process of judging whether a speaker has started to speak may be: the electronic device performs image recognition on the speaker's images collected by the camera to judge whether the speaker performs a target action such as opening the mouth; if the speaker is detected performing that target action, it indicates that the speaker has started to speak.
Specifically, the NPU computing processor of the electronic device recognizes the target action based on the ISP's graphics-processing result for the speaker's images, for example detecting whether the photographed subject opens their mouth. When the speaker is detected performing the target action, the weights of the multiple channels of audio for specific frames are adjusted, taking the currently buffered audio frame as a reference.
In one implementation, a weight adjustment policy can be preset; when the target action is detected, the weight of each audio channel is adjusted according to that policy. For example, in the policy, the weight of the first audio may increase over time, and correspondingly the weights of the other audio channels may decrease over time, achieving the effect of gradually switching from the other audio to the first audio, realizing smooth switching between audio channels and avoiding abrupt changes in sound.
For example, the weight of the first audio may vary linearly with time, as shown in Fig. 8. In the weight-versus-time schematic of Fig. 8, the horizontal axis is time and the vertical axis is the weight of the first audio; from the moment the third audio starts (frame 1) until frame i, the weight of the first audio is linear in time.
It should be understood that the relationship between each channel's weight and time is not limited to a linear one; as long as the channels switch gradually, the relationship between the weight and the merging time may also include various nonlinear relationships. This application imposes no limitation on this.
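The Fig. 8-style linear ramp, together with the complementary weight of the other channel, can be sketched as follows. This is an illustrative sketch under our own naming; the patent only requires that one weight rise and the other fall gradually.

```python
def channel_weights(frame: int, ramp_frames: int):
    """Return (w_first, w_other) during a switch, per the linear ramp
    of Fig. 8: w_first rises from 0 to 1 over ramp_frames, and the
    other channel's weight falls correspondingly so that the two
    weights always sum to 1 and the mix switches smoothly."""
    w = min(1.0, max(0.0, frame / ramp_frames))
    return w, 1.0 - w
```

A nonlinear curve (for example an ease-in/ease-out shape) could be substituted without changing the rest of the pipeline.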
S704: In response to a playback operation input by the user on the target video file, play the target video file; where, when playback reaches the picture corresponding to the first moment, an audio feature of the second audio changes.
It should be understood that the target video file includes the first video picture and the third audio; therefore, when playing the target video file, the electronic device plays the third audio while playing the first video picture.
In one implementation, the target video file may also include multiple other video pictures, so that during playback the electronic device can simultaneously play video pictures of multiple views along with the third audio.
In one implementation, when playback reaches the picture corresponding to the first moment, the speaker starts to speak; at this point, the audio feature of the audio corresponding to the view in which the speaker is located changes.
In one implementation, the audio feature includes volume; during playback of the target video file, when playback reaches the video picture corresponding to the first moment, the volume of the second audio increases.
In one implementation, during playback of the target video file, when playback reaches the video picture corresponding to the first moment, the volume of the second audio gradually increases.
In one implementation, when the electronic device plays the target video file, the electronic device displays the first video picture and the second video picture.
In one implementation, when the electronic device plays the target video file, the electronic device displays the first video picture but not the second video picture.
In one implementation, during playback of the target video file, at the first moment, the first speaker in the second video picture opens their mouth.
The electronic device can set the playback track of the third audio as the default track of the recording, so that when the recorded work is played, the third audio is played by default; or, when the recorded work is shared, the third audio is shared by default. The playback track is the playback channel used when playing audio.
In the audio processing method provided in the embodiments of this application, the phone can store the obtained multiple channels of audio in the memory and merge them to obtain the third audio of the multiple channels. Specifically, the phone can set different preset weights for different audio at different playback times, and weight the data (such as the samples) of the multiple channels of audio according to the preset weights to obtain the third audio.
The phone uses the front-facing camera to obtain images of the speaker and judges from those images whether the speaker has started to speak. If it determines that the speaker has started to speak, it can adjust the weight, in the third audio, of the audio corresponding to the front picture, for example dynamically increasing the proportion of the phone's near-end audio (such as the speaker's audio), so that the third audio gradually switches to the phone's near-end audio and highlights its content.
According to the audio processing method provided in the embodiments of this application, by adjusting the weight of the audio corresponding to the video images in the third audio based on the target result detected from the captured video images, the switching between audio channels is optimized on the basis of presenting complete audio, solving the abrupt change in sound caused when an electronic device that does not support playing multiple audio channels must switch tracks to obtain the audio content during playback.
以下结合附图,以手机前后双路录像的场景为例,对本申请实施例提供的音频处理方法的内部实现过程以及处理流程进行介绍。本申请实施例提供的音频处理的方法,可以在录像过程中实时进行,也可以在录像之后进行。以下以在录像过程中进行音频处理为例,进行说明。
在用户通过手机进行录像的过程中,电子设备可以同时进行视频处理、音频处理以及基于图像识别的混音等操作流程。为便于理解,以前后双路录像模式为例,并以一帧音频和一帧视频的处理为例,对处理流程进行说明。其中,如图9所示,各流程可以包括以下内容。
在一种实现方式中,录像及视频处理流程可以包括:在当前的前后双路录像模式下,电子设备通过前置摄像头和后置摄像头分别采集一帧前置视频画面(记为前置视频帧ZX)和一帧后置视频画面(记为后置视频帧ZY);前置摄像头和后置摄像头分别将采集到的视频数据传递至电子设备的ISP;电子设备例如可以是通过开放式图形接口(openGL接口)对前置视频帧ZX和后置视频帧ZY进行拼接,再由视频编解码器进行视频编码,而后按照一定的文件规范(如MP4 container文件规范)写入目标录像文件。
在一种实现方式中,录音及音频处理过程可以包括:在当前的前后双路录像模式下,电子设备可以由本机麦克风录制一帧音频(记为音频帧X),无线麦克风录制一帧音频(记为音频帧Y);电子设备在接收到音频数据后,可以将音频数据缓存至缓存区(例如内存的缓存区),其中,不同声音通道的音频数据可以缓存至不同的缓存区,比如将音频帧X缓存至缓存区QX,将音频帧Y缓存至缓存区QY;音频处理器 接收到多路音频数据后,可以对各路音频数据分别进行独立编码,并将编码后的各路音频的当前帧音频数据写入多路音频文件。其中,编码方式例如可以包括:脉冲编码调制(pulse code modulation,PCM)、高级音频编码(advanced audio coding,AAC)等。编码后目标音频的格式可以包括波形声音文件WAV、MP3格式等。完成音频帧X和音频帧Y的上述处理后,可以将处理后的音频帧X和音频帧Y写入目标录像文件,或者将上述多路音频音频文件写入目标录音文件。
此外,音频处理器可以根据预设权重,对音频帧X和音频帧Y进行合并,比如按照一定比例增益对两路音频进行合并编码,获取第三音频。
其中，各路音频的采样精度可以相同或不同，本申请实施例以各路音频采样精度相同（如均为8bit）进行说明。
在一种实现方式中，基于图像识别的混音流程可以包括：在当前的前后双路录像模式下，电子设备的前置摄像头采集的前置视频画面包括说话人，电子设备将采集的视频帧传输至ISP，由ISP处理后，可以将视频流分为两路：一路视频流数据用于实现与后置视频图像合并，另一路视频流用于电子设备进行图像识别，判断说话人是否说话。
为更好地理解该过程,结合图10示出的软件架构示意图进行具体介绍。
应理解,这里以视频图像的处理在硬件抽象HAL层进行处理为例进行介绍,然而在实际应用时,上述所说的视频处理过程、音频处理过程以及人脸识别过程不限于在HAL层实现,还可以在中间层或应用层实现,本申请对此不作限定。这里的HAL层可以为图2示出的内核层和硬件层之间的接口层;中间层可以为图2示出的系统库及应用程序框架层;应用层可以为图2示出的应用程序层。
其中，前置摄像头将采集到的前置视频帧ZX的图像信号传递至ISP进行图像处理，后置摄像头将采集到的后置视频帧ZY的图像信号传递至ISP进行图像处理；ISP处理完之后，将后置视频流传输至后处理单元，例如先传输至美颜处理单元，以对后置视频图像进行美颜处理，而后再传输至防抖处理单元，以对后置视频图像进行防抖处理。同时，ISP可以将前置视频流分别向人脸识别单元以及前置图像后处理单元传输，其中，人脸识别单元用于对前置视频画面中的说话人进行人脸识别，判断说话人是否张开嘴唇，进而确定说话人是否说话；后处理单元则对前置视频图像进行美颜处理、防抖处理。
示例性的,根据前置视频图像判断说话人是否说话还可以包括以下具体内容:
前置视频帧被传递至NPU计算处理器进行图像识别,NPU计算处理器接收到当前帧的图像输入信息后,对该输入信息进行快速处理,如基于获取的当前视频帧ZX对说话人进行人脸检测,包括利用人脸坐标AI算法,判断说话人是否发生目标动作,其中:若确定在当前视频帧说话人发生目标动作,表示说话人开始说话,则音频处理器以检测到说话人说话时刻为基准,提前i帧调整各路音频在第三音频中的权重,也即调整音频帧[X-i,X]、音频帧[Y-i,Y]在第三音频中的组成权重;若未检测到目标动作,则该合并后的音频仍按照预设比例增益对本机麦克风录制的音频和无线麦克风录制的音频进行合并编码,其中,此时本机麦克风录制的音频帧X的增益例如可以设置为0。
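上述“检测到目标动作后，以检测时刻为基准提前i帧调整各路音频权重”的思路，可以用如下示意性Python片段表达（非本申请原文，函数名与权重起止值均为示例假设）：检测到说话动作时，从缓存中取出最近i+1帧，并对这些帧按线性渐变权重重新合并。

```python
def remix_on_speech_detected(buf_x, buf_y, i, w_start=0.0, w_end=1.0):
    """检测到说话动作时回退i帧：对缓存中的音频帧[X-i, X]与[Y-i, Y]
    以线性渐变权重重新合并，返回合并后的i+1帧第三音频数据（i≥1）。"""
    frames_x = buf_x[-(i + 1):]   # 本机麦克风最近i+1帧
    frames_y = buf_y[-(i + 1):]   # 无线麦克风最近i+1帧
    mixed = []
    for h, (x, y) in enumerate(zip(frames_x, frames_y)):
        w1 = w_start + (w_end - w_start) * h / i   # 音频帧X的权重逐帧增大
        mixed.append(w1 * x + (1.0 - w1) * y)
    return mixed
```

例如i=2、本机音频各帧为100、无线音频各帧为80时，输出由80逐帧过渡到100，避免声音突变。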
此外,上述过程中的后处理例如包括:结合人脸坐标,通过YUV对图像颜色进行优化,获得具有美颜效果的前置视频帧和后置视频帧;而后可以再对当前帧视频图像进行防抖处理。
示例性的,上述过程中,视频图像每秒传输帧数相同,例如均为30fps。
上述对当前帧视频画面进行美颜处理和防抖处理的过程,可以参见已有技术,此处不再赘述。
应理解，电子设备检测到说话人开始说话的时刻，可能比说话人实际开始说话的时刻有所滞后，也即当电子设备确定说话人开始说话时，实际开始说话时对应的音频帧早已缓存至缓存区。因此，提前i帧对各路音频的权重进行调整，是为了克服电子设备确定说话人开始说话的过程中产生的时延，从而保证音频内容的完整性。
可选地，由当前音频帧X分别提前i帧（i为大于或等于1的整数）对各路音频进行编码，并将编码后的音频数据写入多路音频文件。此外，将上述获取的多路音频数据写入当前帧对应的目标音视频文件，获取包括当前视频画面和与之对应的第三音频的目标音视频文件。
应理解,对于各音频帧和视频帧,均可采用上述方法进行视频和音频的处理,进而在保证各声音通道的独立音频之外,获取与视频画面对应的完整合并后的音频以及流畅光滑的音频切换效果。
结合以上实施例及相关附图,本申请实施例还提供了一种音频处理的方法,该方法可以在如图1、图2所示的具有摄像头、麦克风的电子设备(如手机、平板电脑等)中实现。图11示出了本申请实施例提供的另一种音频处理的方法的示意性流程图,如图11所示,该方法可以包括以下步骤:
S1101,在录像模式下,缓存第一音频的音频帧、第二音频的音频帧和第一视频画面的视频帧。
在一种实现方式中，将各麦克风当前获取的音频帧记为第一音频帧。其中，各路音频的采样精度可以相同或不同，这里以各路音频采样精度相同（如均为8bit）进行说明。
示例性的,可以将本机麦克风当前录制的音频帧(记为音频帧X)存储至第一缓冲区(记为QX),将无线麦克风当前录制的音频帧(记为音频帧Y)存储至第二缓冲区(记为QY)。并且当前时刻之前的预设时间段内的本机音频和无线麦克风音频也缓存在上述对应的位置。
在一种实现方式中，记当前时刻为N，则可以对当前时刻之前一定时间段内，如对[N-2s,N]或[N-1s,N]这一时间段内的音频数据进行缓存；或者，对当前音频帧之前的一定帧数，如对本机麦克风音频帧[X-i,X]和无线麦克风音频帧[Y-i,Y]的音频数据进行缓存，i为大于或等于1且小于X、Y的整数。
应理解,通过缓存一定时间段内的音频数据,能够在无需存储所有音频内容的情况下,保证第三音频能获取完整音频内容的基础上,节省存储空间,提高音频处理效率。
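缓存固定窗口内音频帧、自动淘汰旧帧的做法，可以用Python标准库的有界双端队列示意（非本申请原文，缓存深度等数值仅为示例假设）：

```python
from collections import deque

FRAMES_TO_KEEP = 50  # 示例值：相当于“回退i帧”所需的最大缓存深度

# 每个声音通道一个有界缓存区（对应缓存区QX、QY），写满后自动淘汰最旧的帧
buffer_qx = deque(maxlen=FRAMES_TO_KEEP)  # 本机麦克风（音频帧X）
buffer_qy = deque(maxlen=FRAMES_TO_KEEP)  # 无线麦克风（音频帧Y）

def cache_frame(buffer: deque, frame: bytes) -> None:
    """缓存当前音频帧；超出窗口的旧帧由deque自动丢弃，无需存储全部音频。"""
    buffer.append(frame)

# 模拟持续写入100帧：缓存区中始终只保留最近的50帧
for n in range(100):
    cache_frame(buffer_qx, bytes([n % 256]))
```

这样既能保证第三音频合并时可回溯到完整的音频内容，又避免了无限增长的内存占用。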
S1102,检测第一说话人的动作。
例如,通过人脸识别,对第一说话人的动作进行检测。其中,当检测到第一说话人发生张口的动作时,认为该第一说话人开始说话。
S1103，在检测到第一说话人开始说话时，从当前音频帧的前i帧音频帧开始，调整第三音频中第一音频的音频特征，并调整第三音频中所述第二音频的音频特征，i大于或等于1。
其中,说话人开始说话可以指说话人发生目标动作,如张口的动作。电子设备可以基于说话人的目标动作确定其开始说话。
应理解,说话人从发生目标动作到电子设备检测到该目标动作,期间需要一定的时间,导致检测到目标动作时对应的音频帧可能会晚于目标动作实际发生的时刻,因此,为了呈现完整的音频内容,本申请实施例对多路音频进行合并时,可以由当前帧之前的某一帧开始执行。
第一音频帧可以为检测到目标动作时,对应的缓存至缓冲区的音频帧。基于第一音频帧可以确定合并多路音频的起始时刻。具体地,可以由当前缓存的第一音频帧为基准,回退预设时间长度,开始对多路音频进行合并。其中,预设时间长度例如可以是100ms。
在一种实现方式中,可以由当前缓存的第一音频帧为基准,回退i帧音频帧,开始对多路音频进行合并。
应理解,本申请实施例中的一帧音频帧可以对应一段时间间隔。
作为一个示例,如图12所示,假如检测到目标动作时,本机麦克风录制的音频恰好缓存的音频帧为[X],无线麦克风录制的音频恰好缓存的音频帧为[Y],那么对该双路音频的音频进行合并时,可以回退i帧,也就是说将音频帧[X-i,X]和音频帧[Y-i,Y]进行合并,获取的第三音频对应的音频帧可以是[M-i,M]。更进一步地,这一时间段音频对应的视频帧可以是[Z-i,Z]。其中,i为大于或等于1的整数,X,Y,M,Z均为大于i的整数。
根据本申请实施例提供的音频处理的方法,通过相对于检测到目标动作的时刻,提前一定时间对多路音频进行合并,可以避免由于检测目标动作过程造成的时延,导致音频内容不完整或者音效不连贯的问题。
以双路音频合并的场景为例,对调整第三音频中各路音频的权重的具体过程进行详细的介绍。为便于描述,将双路音频分别记为音频1(即音轨1)和音频2(即音轨2),合并后的音频记为音频3(或称混合音轨),在实际应用时,音频1例如可以是本机麦克风录制的音频,音频2例如可以是无线麦克风录制的音频。
作为一个示例，如图13所示，音频1和音频2的采样精度为8bit，待合并的音频帧分别为[X-i,X]和[Y-i,Y]，其中，第[X-i]帧的音频数据为11，第[(X-i)+1]帧的音频数据为12，第[(X-i)+2]帧的音频数据为200；第[Y-i]帧的音频数据为21，第[(Y-i)+1]帧的音频数据为22，第[(Y-i)+2]帧的音频数据为202。
对音频1和音频2进行合并得到的音频3(或称混合音轨)中,例如可以设置双路音频的权重均为0.5,此时,音频3各帧对应的音频数据如下:第[Z-i]帧的音频数据为(11+21)/2=16,第[(Z-i)+1]帧的音频数据为(12+22)/2=17,第[(Z-i)+2]帧的音频数据为(200+202)/2=201。
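上述按双路权重均为0.5合并音频1和音频2的计算过程，可以用如下示意性Python片段复现（非本申请原文，函数名仅为示例）：

```python
def mix_two_tracks(track1, track2, w1=0.5):
    """按固定权重逐帧合并两路同长度的音频数据，w2 = 1 - w1。"""
    w2 = 1.0 - w1
    return [int(w1 * a + w2 * b) for a, b in zip(track1, track2)]

audio1 = [11, 12, 200]   # 音频1的音频帧 [X-i, (X-i)+2]
audio2 = [21, 22, 202]   # 音频2的音频帧 [Y-i, (Y-i)+2]
print(mix_two_tracks(audio1, audio2))  # → [16, 17, 201]
```

计算结果与正文示例一致：(11+21)/2=16、(12+22)/2=17、(200+202)/2=201。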
以下,以改变音频1和音频2的音量的权重为例进行说明,在其它实施例中,还可以调整其它音频特征。当权重随时间动态变化时,如图8所示,音频1的权重随时间呈线性变化,此时音频1和音频2的调整过程如下:
音频1的第一帧，权重W_11=0.2，此时，音频2的第一帧，权重W_21=1-W_11=0.8；
……
音频1的第i帧，权重W_1i=0.8，此时，音频2的第i帧，权重W_2i=1-W_1i=0.2。
因此,对于音频1和音频2的第h帧(1≤h≤i),其权重可以表示如下:
音频1的第h帧：权重 W_1h = 0.2 + (0.8 - 0.2) × (h - 1)/(i - 1)；
音频2的第h帧：权重 W_2h = 1 - W_1h。
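上述线性渐变权重的计算可以用如下示意性Python片段表达（非本申请原文，起止权重0.2、0.8沿用上文示例，函数名为示意）：

```python
def weight_track1(h: int, i: int, w_start=0.2, w_end=0.8) -> float:
    """音频1第h帧（1 ≤ h ≤ i）的线性渐变权重；音频2权重为 1 - 该值。"""
    if i == 1:
        return w_end  # 只有一帧时直接取终值
    return w_start + (w_end - w_start) * (h - 1) / (i - 1)

i = 5
for h in range(1, i + 1):
    w1 = weight_track1(h, i)   # 音频1权重：0.2 → 0.8
    w2 = 1.0 - w1              # 音频2权重：0.8 → 0.2
```

h=1时权重为0.2，h=i时权重为0.8，与正文的首尾帧取值一致。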
此外，当对n路音频合并时，也可以采用与双路音频合并类似的方法调整各路音频的权重。假设各音频缓存的第1帧至第n帧的音频数据如图10所示，则第三音频中第i帧中各音频权重W和第i帧的音频数据Z_i可以分别满足如下公式：
W_1i + W_2i + … + W_ni = 1
Z_i = W_1i×X_1i + W_2i×X_2i + … + W_ni×X_ni
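上述n路音频合并公式可以用如下示意性Python片段表达（非本申请原文，函数名与数值均为示例假设）：

```python
def mix_n_tracks(frames, weights):
    """合并n路音频的同一帧：Z = Σ W_k × X_k，要求 Σ W_k = 1。"""
    assert abs(sum(weights) - 1.0) < 1e-6, "各路权重之和应为1"
    return sum(w * x for w, x in zip(frames, weights))

# 三路音频某一帧的数据与对应权重
z = mix_n_tracks([100, 200, 300], [0.5, 0.3, 0.2])  # ≈ 170.0
```

权重之和为1保证了合并后音频数据不会整体放大或缩小。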
应理解，本申请实施例提供的音频处理的方法中的音频采样精度可以为8bit、16bit或者24bit，本申请对此不作限定。
通过本申请实施例提供的音频处理的方法，可以通过一个音轨完整播放多个声音通道录制的音频，能够在保证音频内容完整的基础上，实现各路音频之间自然平滑的切换，并有针对性地凸显多路音频中的重点内容，给用户带来良好的收听体验。
可以理解的是,为了实现上述功能,电子设备包含了执行各个功能相应的硬件和/或软件模块。结合本文中所公开的实施例描述的各示例的算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。本领域技术人员可以结合实施例对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本实施例可以根据上述方法示例对电子设备进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块可以采用硬件的形式实现。需要说明的是,本实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
本申请实施例还提供一种电子设备,包括一个或多个处理器以及一个或多个存储器。该一个或多个存储器与一个或多个处理器耦合,一个或多个存储器用于存储计算机程序代码,计算机程序代码包括计算机指令,当一个或多个处理器执行计算机指令时,使得电子设备执行上述相关方法步骤实现上述实施例中的音频处理方法。
本申请的实施例还提供一种计算机可读存储介质,该计算机可读存储介质中存储有计算机指令,当该计算机指令在电子设备上运行时,使得电子设备执行上述相关方法步骤实现上述实施例中的音频处理方法。
本申请的实施例还提供了一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述相关步骤,以实现上述实施例中电子设备执行的音频处理方法。
另外,本申请的实施例还提供一种装置,这个装置具体可以是芯片,组件,模块或芯片系统,该装置可包括相连的处理器和存储器;其中,存储器用于存储计算机执 行指令,当装置运行时,处理器可执行存储器存储的计算机执行指令,以使芯片执行上述各方法实施例中电子设备执行的音频处理方法。
其中,本实施例提供的电子设备、计算机可读存储介质、计算机程序产品或芯片均用于执行上文所提供的对应的方法,因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
结合上文,本申请还提供如下实施例:
实施例1、一种音频处理的方法,其中,应用于电子设备,所述电子设备包括第一摄像头、第二摄像头,其中,所述第一摄像头拍摄第一视角,所述第二摄像头拍摄第二视角,所述方法包括:
响应于用户输入的录像操作，进入录像模式；在所述录像模式下，所述第一摄像头对所述第一视角录制第一视频画面；录制多个声音通道的音频，所述多个声音通道的音频包括所述第一视角对应的第一音频和所述第二视角对应的第二音频；在第一时刻，第一说话人说话，所述第一说话人位于所述第二视角内；
生成目标录像文件,所述目标录像文件包括第三音频和第一视频画面,其中,所述第三音频包括至少部分所述第一音频和至少部分第二音频;以及
响应于用户输入对所述目标录像文件的播放操作,播放所述第三音频和第一视频画面;其中,
当播放到所述第一时刻对应的画面时,所述第三音频中的第二音频的音频特征发生变化。
实施例2、根据实施例1所述的方法,其中,所述音频特征包括音量,播放所述目标录像文件,具体包括:
当播放到所述第一时刻对应的视频画面时,所述第二音频的音量增大。
实施例3、根据实施例2所述的方法,其中,当播放到所述第一时刻对应的视频画面时,所述第二音频的音量逐渐增大。
实施例4、根据实施例1-3中任一项所述的方法,其中,在所述录像模式下,所述第二摄像头对所述第二视角录制第二视频画面,所述电子设备显示拍摄界面,所述拍摄界面包括所述第一视频画面和第二视频画面;
所述目标录像文件还包括所述第二视频画面;
所述电子设备播放所述目标录像文件时,所述电子设备显示所述第一视频画面和所述第二视频画面。
实施例5、根据实施例1-3中任一项所述的方法,其中,在所述录像模式下,所述第二摄像头对所述第二视角录制第二视频画面,所述电子设备显示拍摄界面,所述拍摄界面不包括所述第二视频画面;
所述电子设备播放所述目标录像文件时,所述电子设备不显示所述第二视频画面。
实施例6、根据实施例1-5中任一项所述的方法,其中,在所述录像模式下,所述第二摄像头对所述第二视角录制第二视频画面,在所述第一时刻,所述第二视频画面中的所述第一说话人张口。
实施例7、根据实施例1-6中任一项所述的方法,其中,在所述录像模式下,在第二时刻,第二说话人说话,所述第二说话人位于所述第一视角内;
所述电子设备播放所述目标录像文件时,当播放到所述第二时刻对应的画面时,所述第三音频中所述第一音频的音频特征发生变化。
实施例8、根据实施例7所述的方法,其中,当播放到所述第二时刻对应的画面时,所述第三音频中所述第一音频的音量逐渐增大。
实施例9、根据实施例1-8中任一项所述的方法,其中,所述电子设备包括第一麦克风和第二麦克风;
在所述录像模式下,所述第一麦克风录制所述第一音频,所述第二麦克风录制所述第二音频;或,
在所述录像模式下,所述第一麦克风录制所述第二音频,所述第二麦克风录制所述第一音频。
实施例10、根据实施例1-8中任一项所述的方法,其中,所述电子设备包括第一麦克风,第二麦克风与所述电子设备无线连接;
在所述录像模式下,所述第一麦克风录制所述第一音频,所述第二麦克风录制所述第二音频,所述第二音频通过所述无线连接发送给所述电子设备;或,
在所述录像模式下,所述第一麦克风录制所述第二音频,所述第二麦克风录制所述第一音频,所述第一音频通过所述无线连接发送给所述电子设备。
实施例11、根据实施例1-8中任一项所述的方法,其中,第一麦克风和第二麦克风均与所述电子设备无线连接,所述第一音频和所述第二音频通过所述无线连接发送给所述电子设备;
在所述录像模式下,所述第一麦克风录制所述第一音频,所述第二麦克风录制所述第二音频;或,
在所述录像模式下,所述第一麦克风录制所述第二音频,所述第二麦克风录制所述第一音频。
实施例12、根据实施例1-11中任一所述的方法,其中,在所述录像模式下,缓存所述第一音频的音频帧、第二音频的音频帧和第一视频画面的视频帧;
检测所述第一说话人的动作;
在检测到所述第一说话人开始说话时,在当前音频帧的前i帧音频帧开始,调整所述第三音频中所述第一音频的音频特征,并调整所述第三音频中所述第二音频的音频特征,i大于等于1。
实施例13、根据实施例1-12中任一所述的方法,其中,所述第一视角和第二视角是前置视角、广角视角、变焦视角中的任意两个视角。
实施例14、一种音频处理的方法,其中,应用于电子设备,所述电子设备包括第一摄像头、第二摄像头,其中,所述第一摄像头拍摄第一视角,所述第二摄像头拍摄第二视角,所述方法包括:
响应于用户输入的录像操作，进入录像模式；在所述录像模式下，所述第一摄像头对所述第一视角录制第一视频画面；录制多个声音通道的音频，所述多个声音通道的音频包括所述第一视角对应的第一音频和所述第二视角对应的第二音频；在第一时刻，第一说话人说话，所述第一说话人位于所述第一视角内；
生成目标录像文件,所述目标录像文件包括第三音频和第一视频画面,其中,所 述第三音频包括至少部分所述第一音频和至少部分第二音频;以及
响应于用户输入对所述目标录像文件的播放操作,播放所述第三音频和第一视频画面;其中,
当播放到所述第一时刻对应的画面时,所述第三音频中的第一音频的音频特征发生变化。
实施例15、根据实施例14所述的方法,其中,所述音频特征包括音量,播放所述目标录像文件,具体包括:
当播放到所述第一时刻对应的视频画面时,所述第一音频的音量增大。
实施例16、根据实施例15所述的方法,其中,当播放到所述第一时刻对应的视频画面时,所述第一音频的音量逐渐增大。
实施例17、根据实施例14-16中任一项所述的方法,其中,所述电子设备包括第一麦克风和第二麦克风;
在所述录像模式下,所述第一麦克风录制所述第一音频,所述第二麦克风录制所述第二音频;或,
在所述录像模式下,所述第一麦克风录制所述第二音频,所述第二麦克风录制所述第一音频。
实施例18、根据实施例14-16中任一项所述的方法,其中,所述电子设备包括第一麦克风,第二麦克风与所述电子设备无线连接;
在所述录像模式下,所述第一麦克风录制所述第一音频,所述第二麦克风录制所述第二音频,所述第二音频通过所述无线连接发送给所述电子设备;或,
在所述录像模式下,所述第一麦克风录制所述第二音频,所述第二麦克风录制所述第一音频,所述第一音频通过所述无线连接发送给所述电子设备。
实施例19、根据实施例14-16中任一项所述的方法,其中,第一麦克风和第二麦克风均与所述电子设备无线连接,所述第一音频和所述第二音频通过所述无线连接发送给所述电子设备;
在所述录像模式下,所述第一麦克风录制所述第一音频,所述第二麦克风录制所述第二音频;或,
在所述录像模式下,所述第一麦克风录制所述第二音频,所述第二麦克风录制所述第一音频。
实施例20、根据实施例14所述的方法,其中,在所述录像模式下,缓存所述第一音频的音频帧、第二音频的音频帧和第一视频画面的视频帧;
检测所述第一说话人的动作;
在检测到所述第一说话人开始说话时,在当前音频帧的前i帧音频帧开始,调整所述第三音频中所述第一音频的音频特征,并调整所述第三音频中所述第二音频的音频特征,i大于等于1。
实施例21、根据实施例14所述的方法,其中,所述第一视角和第二视角是前置视角、广角视角、变焦视角中的任意两个视角。
实施例22、一种电子设备,其中,包括:
多个摄像头,用于采集视频画面;
屏幕,用于显示界面;
音频播放部件,用于播放音频;
一个或多个处理器;
存储器;
以及一个或多个计算机程序,其中所述一个或多个计算机程序被存储在所述存储器中,所述一个或多个计算机程序包括指令,当所述指令被所述电子设备执行时,使得所述电子设备执行如实施例1-21任一项所述的音频处理方法。
实施例23、一种计算机可读存储介质,其中,包括计算机指令,当所述计算机指令在电子设备上运行时,使得所述电子设备执行如实施例1至21中任一项所述的音频处理的方法。
实施例24、一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如实施例1至21中任一项所述的音频处理的方法。
实施例25、一种电子设备,包括屏幕、计算机存储器、摄像头,用于实现如实施例1至21中任一项所述的音频处理的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者通过所述计算机可读存储介质进行传输。所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如,固态硬盘(solid state disk,SSD))等。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,该流程可以由计算机程序来指令相关的硬件完成,该程序可存储于计算机可读取存储介质中,该程序在执行时,可包括如上述各方法实施例的流程。而前述的存储介质包括:ROM或随机存储记忆体RAM、磁碟或者光盘等各种可存储程序代码的介质。
以上所述,仅为本申请实施例的具体实施方式,但本申请实施例的保护范围并不局限于此,任何在本申请实施例揭露的技术范围内的变化或替换,都应涵盖在本申请实施例的保护范围之内。因此,本申请实施例的保护范围应以所述权利要求的保护范围为准。

Claims (43)

  1. [援引加入(细则20.6) 06.12.2021]
    一种音频处理的方法,其特征在于,应用于电子设备,所述电子设备包括第一摄像头、第二摄像头,其中,所述第一摄像头拍摄第一视角,所述第二摄像头拍摄第二视角,所述方法包括:
    响应于所述用户输入的录像操作时,进入录像模式;在所述录像模式下,
    所述第一摄像头对所述第一视角录制第一视频画面;
    录制多个声音通道的音频,所述多个声音通道的音频包括所述第一视角对应的第一音频和所述第二视角对应的第二音频;
    在第一时刻,第一说话人说话,所述第一说话人位于所述第二视角内;
    生成目标录像文件,所述目标录像文件包括第三音频和第一视频画面,其中,所述第三音频包括至少部分所述第一音频和至少部分第二音频;以及
    响应于用户输入对所述目标录像文件的播放操作,播放所述第三音频和第一视频画面;其中,
    当播放到所述第一时刻对应的画面时,所述第三音频中的第二音频的音频特征发生变化。
  2. [援引加入(细则20.6) 06.12.2021]根据权利要求1所述的方法,其特征在于,所述音频特征包括音量,播放所述目标录像文件,具体包括:
    当播放到所述第一时刻对应的视频画面时,所述第二音频的音量增大。
  3. [援引加入(细则20.6) 06.12.2021]根据权利要求2所述的方法,其特征在于,当播放到所述第一时刻对应的视频画面时,所述第二音频的音量逐渐增大。
  4. [援引加入(细则20.6) 06.12.2021]根据权利要求1-3中任一项所述的方法,其特征在于,在所述录像模式下,所述第二摄像头对所述第二视角录制第二视频画面,所述电子设备显示拍摄界面,所述拍摄界面包括所述第一视频画面和第二视频画面;
    所述目标录像文件还包括所述第二视频画面;
    播放所述目标录像文件时,显示所述第一视频画面和所述第二视频画面。
  5. [援引加入(细则20.6) 06.12.2021]根据权利要求1-3中任一项所述的方法,其特征在于,在所述录像模式下,所述第二摄像头对所述第二视角录制第二视频画面,所述电子设备显示拍摄界面,所述拍摄界面不包括所述第二视频画面;
    播放所述目标录像文件时,不显示所述第二视频画面。
  6. [援引加入(细则20.6) 06.12.2021]根据权利要求1-5中任一项所述的方法,其特征在于,在所述录像模式下,所述第二摄像头对所述第二视角录制第二视频画面,在所述第一时刻,所述第二视频画面中的所述第一说话人张口。
  7. [援引加入(细则20.6) 06.12.2021]根据权利要求1-6中任一项所述的方法,其特征在于,在所述录像模式下,在第二时刻,第二说话人说话,所述第二说话人位于所述第一视角内;
    播放所述目标录像文件时,当播放到所述第二时刻对应的画面时,所述第三音频中所述第一音频的音频特征发生变化。
  8. [援引加入(细则20.6) 06.12.2021]根据权利要求7所述的方法,其特征在于,当播放到所述第二时刻对应的画面时,所述第三音频中所述第一音频的音量逐渐增大。
  9. [援引加入(细则20.6) 06.12.2021]根据权利要求1-8中任一项所述的方法,其特征在于,所述电子设备包括第一麦克风和第二麦克风;
    在所述录像模式下,所述第一麦克风录制所述第一音频,所述第二麦克风录制所述第二音频;或,
    在所述录像模式下,所述第一麦克风录制所述第二音频,所述第二麦克风录制所述第一音频。
  10. [援引加入(细则20.6) 06.12.2021]根据权利要求1-8中任一项所述的方法,其特征在于,所述电子设备包括第一麦克风,第二麦克风与所述电子设备无线连接;
    在所述录像模式下,所述第一麦克风录制所述第一音频,所述第二麦克风录制所述第二音频,所述第二音频通过所述无线连接发送给所述电子设备;或,
    在所述录像模式下,所述第一麦克风录制所述第二音频,所述第二麦克风录制所述第一音频,所述第一音频通过所述无线连接发送给所述电子设备。
  11. [援引加入(细则20.6) 06.12.2021]根据权利要求1-8中任一项所述的方法,其特征在于,第一麦克风和第二麦克风均与所述电子设备无线连接,所述第一音频和所述第二音频通过所述无线连接发送给所述电子设备;
    在所述录像模式下,所述第一麦克风录制所述第一音频,所述第二麦克风录制所述第二音频;或,
    在所述录像模式下,所述第一麦克风录制所述第二音频,所述第二麦克风录制所述第一音频。
  12. [援引加入(细则20.6) 06.12.2021]根据权利要求1-11中任一所述的方法,其特征在于,在所述录像模式下,缓存所述第一音频的音频帧、第二音频的音频帧和第一视频画面的视频帧;
    检测所述第一说话人的动作;
    在检测到所述第一说话人开始说话时,在当前音频帧的前i帧音频帧开始,调整所述第三音频中所述第一音频的音频特征,并调整所述第三音频中所述第二音频的音频特征,i大于等于1。
  13. [援引加入(细则20.6) 06.12.2021]根据权利要求1-12中任一所述的方法,其特征在于,所述第一视角和第二视角是前置视角、广角视角、变焦视角中的任意两个视角。
  14. [援引加入(细则20.6) 06.12.2021]一种音频处理的方法,其特征在于,应用于电子设备,所述电子设备包括第一摄像头、第二摄像头,其中,所述第一摄像头拍摄第一视角,所述第二摄像头拍摄第二视角,所述方法包括:
    响应于所述用户输入的录像操作时,进入录像模式;在所述录像模式下,
    所述第一摄像头对所述第一视角录制第一视频画面;
    录制多个声音通道的音频,所述多个声音通道的音频包括所述第一视角对应的第一音频和所述第二视角对应的第二音频;
    在第一时刻,第一说话人说话,所述第一说话人位于所述第一视角内;
    生成目标录像文件,所述目标录像文件包括第三音频和第一视频画面,其中,所述第三音频包括至少部分所述第一音频和至少部分第二音频;以及
    响应于用户输入对所述目标录像文件的播放操作,播放所述第三音频和第一视频画面;其中,
    当播放到所述第一时刻对应的画面时,所述第三音频中的第一音频的音频特征发生变化。
  15. [援引加入(细则20.6) 06.12.2021]根据权利要求14所述的方法,其特征在于,所述音频特征包括音量,播放所述目标录像文件,具体包括:
    当播放到所述第一时刻对应的视频画面时,所述第一音频的音量增大。
  16. [援引加入(细则20.6) 06.12.2021]根据权利要求15所述的方法,其特征在于,当播放到所述第一时刻对应的视频画面时,所述第一音频的音量逐渐增大。
  17. [援引加入(细则20.6) 06.12.2021]一种电子设备,其特征在于,包括:
    多个摄像头,用于采集视频画面;
    屏幕,用于显示界面;
    音频播放部件,用于播放音频;
    一个或多个处理器;
    存储器;
    以及一个或多个计算机程序,其中所述一个或多个计算机程序被存储在所述存储器中,所述一个或多个计算机程序包括指令,当所述指令被所述电子设备执行时,使得所述电子设备执行如权利要求1-13或14-16任一项所述的音频处理方法。
  18. [援引加入(细则20.6) 06.12.2021]一种计算机可读存储介质,其特征在于,包括计算机指令,当所述计算机指令在电子设备上运行时,使得所述电子设备执行如权利要求1-13或14-16中任一项所述的音频处理的方法。
  19. [援引加入(细则20.6) 06.12.2021]一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1-13或14-16中任一项所述的音频处理的方法。
  20. [援引加入(细则20.6) 06.12.2021]一种电子设备,包括屏幕、计算机存储器、摄像头,用于实现如权利要求1-13或14-16中任一项所述的音频处理的方法。
  21. 一种基于多分析任务的数据分析方法,其特征在于,包括:
    获取待处理数据,触发并行的至少一个第一分析任务和至少一个第二分析任务,其中,所述第一分析任务和所述第二分析任务均用于对所述待处理数据进行分析;
    获取各个所述第一分析任务生成的第一结果,以及对各个所述第二分析任务进行结果预测得到的第二结果;
    获取对所述第一结果和所述第二结果进行优先级排序后得到的排序结果,并根据所述排序结果判断是否需要等待所述第二分析任务生成分析结果;
    若根据所述排序结果判定无需等待所述第二分析任务生成分析结果,则将所述排序结果中优先级最高的结果作为输出结果。
  22. 根据权利要求1所述的基于多分析任务的数据分析方法,其特征在于,所述根据所述排序结果判断是否需要等待所述第二分析任务生成分析结果,包括:
    若所述排序结果中优先级最高的结果为所述第一结果,则判定为无需等待所述第二分析任务生成分析结果。
  23. 根据权利要求1所述的基于多分析任务的数据分析方法,其特征在于,所述根据所述排序结果判断是否需要等待所述第二分析任务生成分析结果,包括:
    若所述排序结果中优先级最高的结果为所述第二结果,将该优先级最高的结果对应的所述第二分析任务作为目标分析任务,并判定为需要等待所述目标分析任务生成分析结果;
    相应的,所述基于多分析任务的数据分析方法还包括:
    若根据所述排序结果判定需要等待所述目标分析任务生成分析结果,则获取所述目标分析任务生成的分析结果,并将该分析结果作为输出结果。
  24. 根据权利要求1至3任意一项所述的基于多分析任务的数据分析方法,其特征在于,所述获取对所述第一结果和所述第二结果进行优先级排序后得到的排序结果,包括:
    若获取到由一个或多个所述第二分析任务生成的第三结果,则获取对所述第一结果、所述第二结果和所述第三结果进行优先级排序后得到的排序结果;
    相应的,所述根据所述排序结果判断是否需要等待所述第二分析任务生成分析结果,包括:
    若所述排序结果中优先级最高的结果为所述第一结果或者所述第三结果,则判定为无需等待所述第二分析任务生成分析结果。
  25. 根据权利要求1至3任意一项所述的基于多分析任务的数据分析方法,其特征在于,所述获取对所述第一结果和所述第二结果进行优先级排序后得到的排序结果,包括:
    若获取到由一个或多个所述第二分析任务生成的第三结果,则从获取到的所述第二结果中剔除与所述第三结果对应的结果,并获取对所述第一结果、所述第三结果以及剔除操作之后剩余的所述第二结果进行优先级排序后,得到所述排序结果;
    相应的,所述根据所述排序结果判断是否需要等待所述第二分析任务生成分析结果,包括:
    若所述排序结果中优先级最高的结果为所述第一结果或者所述第三结果,则判定为无需等待所述第二分析任务生成分析结果。
  26. 根据权利要求1至5任意一项所述的基于多分析任务的数据分析方法,其特征在于,获取对所述第二分析任务进行结果预测得到的第二结果的操作,包括:
    获取预先训练的与所述第二分析任务一一对应的数据分析模型,并将所述待处理数据输入至所述数据分析模型中进行处理,其中,所述数据分析模型用于对对应的所述第二分析任务进行结果预测;
    获取由所述数据分析模型生成的所述第二结果。
  27. 根据权利要求6所述的基于多分析任务的数据分析方法,其特征在于,对单个所述数据分析模型的训练过程,还包括:
    获取样本数据,并基于所述样本数据对所述数据分析模型进行参数更新;
    在每次参数更新操作完成后,基于预设的分类器分析当次参数更新操作完成后,所述数据分析模型对所述样本数据分析的准确性数据;
    基于所述准确性数据,对所述数据分析模型进行参数迭代更新训练,直至所述数据分析模型满足预设的收敛条件,完成训练。
  28. 根据权利要求6或7所述的基于多分析任务的数据分析方法,其特征在于,还包括:
    获取各个所述第二分析任务生成的第三结果,并基于所述待处理数据和所述第三结果,对各个所述数据分析模型进行参数更新。
  29. 根据权利要求8所述的基于多分析任务的数据分析方法,其特征在于,所述基于所述待处理数据和所述第三结果,对各个所述数据分析模型进行参数更新的操作中,对单个所述第二分析任务对应的所述数据分析模型的参数更新操作,包括:
    对所述待处理数据和该第二分析任务生成的第三结果进行匹配分析;
    若所述待处理数据与该第二分析任务生成的第三结果相匹配,则基于所述待处理数据和该第二分析任务生成的第三结果,对该第二分析任务对应的数据分析模型进行参数更新。
  30. 一种基于多分析任务的数据分析方法,其特征在于,包括:
    获取待处理数据,触发并行的至少一个第一分析任务和至少一个第二分析任务,其中,所述第一分析任务和所述第二分析任务均用于对所述待处理数据进行分析;
    获取各个所述第一分析任务生成的第一结果,获取对各个所述第二分析任务进行结果预测得到的第二结果,以及各个所述第二结果对应的置信度;
    获取对所述第一结果和所述第二结果进行优先级排序后得到的第一排序结果,并根据所述第一排序结果和所述置信度判断是否需要等待所述第二分析任务生成分析结果;
    若根据所述第一排序结果和所述置信度判定无需等待所述第二分析任务生成分析结果,则将所述第一排序结果中优先级最高的结果作为输出结果。
  31. 根据权利要求10所述的基于多分析任务的数据分析方法,其特征在于,所述根据所述第一排序结果和所述置信度判断是否需要等待所述第二分析任务生成分析结果,包括:
    若所述第一排序结果内优先级最高的前n个结果中不包含目标结果,且所述第一排序结果中优先级最高的结果为所述第一结果,则判定为无需等待所述第二分析任务生成分析结果,其中,所述目标结果为所述置信度高于预设第一阈值的所述第二结果,n为大于1的整数。
  32. 根据权利要求10或11所述的基于多分析任务的数据分析方法,其特征在于,所述根据所述第一排序结果和所述置信度判断是否需要等待所述第二分析任务生成分析结果,包括:
    若所述第一排序结果内优先级最高的前n个结果中不包含目标结果,且所述第一排序结果中优先级最高的结果为所述第二结果,将该优先级最高的结果对应的所述第二分析任务作为目标分析任务,并判定为需要等待所述目标分析任务生成分析结果,其中,所述目标结果为所述置信度高于预设第一阈值的所述第二结果,n为大于1的整数;
    相应的,所述基于多分析任务的数据分析方法还包括:
    若根据所述第一排序结果和所述置信度判定需要等待所述目标分析任务生成分析结果,则获取所述目标分析任务生成的分析结果,并将该分析结果作为输出结果。
  33. 根据权利要求10至12任意一项所述的基于多分析任务的数据分析方法,其特征在于,所述根据所述第一排序结果和所述置信度判断是否需要等待所述第二分析任务生成分析结果,包括:
    若所述第一排序结果内优先级最高的前n个结果中包含目标结果,且所述第一排序结果中优先级最高的结果为所述目标结果,将该优先级最高的结果对应的所述第二分析任务作为目标分析任务,并判定为需要等待所述目标分析任务生成分析结果,其中,所述目标结果为所述置信度高于预设第一阈值的所述第二结果,n为大于1的整数;
    相应的,所述基于多分析任务的数据分析方法还包括:
    若根据所述第一排序结果和所述置信度判定需要等待所述目标分析任务生成分析结果,则获取所述目标分析任务生成的分析结果,并将该分析结果作为输出结果。
  34. 根据权利要求10至13任意一项所述的基于多分析任务的数据分析方法,其特征在于,所述根据所述第一排序结果和所述置信度判断是否需要等待所述第二分析任务生成分析结果,包括:
    若所述第一排序结果内优先级最高的前n个结果中包含目标结果,且所述第一排序结果中优先级最高的结果不为所述目标结果,则判定为需要等待所述第二分析任务生成分析结果,其中,所述目标结果为所述置信度高于预设第一阈值的所述第二结果,n为大于1的整数。
  35. 根据权利要求14所述的基于多分析任务的数据分析方法,其特征在于,还包括:
    若根据所述第一排序结果和所述置信度判定需要等待所述第二分析任务生成分析结果,获取所述前n个结果中各个所述目标结果对应的第三结果,其中,该第三结果为所述目标结果对应的所述第二分析任务生成的分析结果;
    对所述前n个结果中的所述第一结果、所述置信度低于或等于所述第一阈值的所 述第二结果以及获取到的所述第三结果进行优先级排序,得到对应的第二排序结果;
    若所述第二排序结果中优先级最高的结果为所述第一结果或者所述第三结果,则将所述第二排序结果中优先级最高的结果作为输出结果。
  36. 根据权利要求15所述的基于多分析任务的数据分析方法,其特征在于,还包括:
    若所述第二排序结果中优先级最高的结果为所述第二结果,获取该第二结果对应的第三结果,并将获取到的所述第三结果作为输出结果,其中,该第三结果为,由所述第二排序结果中优先级最高的结果对应的第二分析任务生成的分析结果。
  37. 根据权利要求14所述的基于多分析任务的数据分析方法,其特征在于,还包括:
    若根据所述第一排序结果和所述置信度判定需要等待所述第二分析任务生成分析结果,获取所述前n个结果中各个所述第二结果对应的第三结果,其中,所述第三结果为所述第二结果对应的所述第二分析任务生成的分析结果;
    对所述前n个结果中的所述第一结果以及获取到的所述第三结果进行优先级排序,得到对应的第二排序结果;
    将所述第二排序结果中优先级最高的结果作为输出结果。
  38. 根据权利要求10至17任意一项所述的基于多分析任务的数据分析方法,其特征在于,获取对所述第二分析任务进行结果预测得到的第二结果的操作,包括:
    获取预先训练的与所述第二分析任务一一对应的数据分析模型,并将所述待处理数据输入至所述数据分析模型中进行处理,其中,所述数据分析模型用于对对应的所述第二分析任务进行结果预测;
    获取由所述数据分析模型生成的所述第二结果。
  39. 根据权利要求18所述的基于多分析任务的数据分析方法,其特征在于,对单个所述数据分析模型的训练过程,还包括:
    获取样本数据,并基于所述样本数据对所述数据分析模型进行参数更新;
    在每次参数更新操作完成后,基于预设的分类器分析当次参数更新操作完成后,所述数据分析模型对所述样本数据分析的准确性数据;
    基于所述准确性数据,对所述数据分析模型进行参数迭代更新训练,直至所述数据分析模型满足预设的收敛条件,完成训练。
  40. 根据权利要求18或19所述的基于多分析任务的数据分析方法,其特征在于,还包括:
    获取各个所述第二分析任务生成的第三结果,并基于所述待处理数据和所述第三结果,对各个所述数据分析模型进行参数更新。
  41. 根据权利要求20所述的基于多分析任务的数据分析方法,其特征在于,所述基于所述待处理数据和所述第三结果,对各个所述数据分析模型进行参数更新的操作中,对单个所述第二分析任务对应的所述数据分析模型的参数更新操作,包括:
    对所述待处理数据和该第二分析任务生成的第三结果进行匹配分析;
    若所述待处理数据与该第二分析任务生成的第三结果相匹配,则基于所述待处理数据和该第二分析任务生成的第三结果,对该第二分析任务对应的数据分析模型进行 参数更新。
  42. 一种电子设备,其特征在于,所述电子设备包括存储器、处理器,所述存储器上存储有可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现根据权利要求1至9任一项所述方法,或者实现根据权利要求10至21任一项所述方法。
  43. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现根据权利要求1至9任一项所述方法,或者实现根据权利要求10至21任一项所述方法。
PCT/CN2021/119048 2020-09-30 2021-09-17 音频处理的方法及电子设备 WO2022068613A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21874274.0A EP4044578A4 (en) 2020-09-30 2021-09-17 SOUND PROCESSING METHOD AND ELECTRONIC DEVICE
US17/740,114 US11870941B2 (en) 2020-09-30 2022-05-09 Audio processing method and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011063396.6 2020-09-30
CN202011063396.6A CN114338965B (zh) 2020-09-30 2020-09-30 音频处理的方法及电子设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/740,114 Continuation US11870941B2 (en) 2020-09-30 2022-05-09 Audio processing method and electronic device

Publications (1)

Publication Number Publication Date
WO2022068613A1 true WO2022068613A1 (zh) 2022-04-07

Family

ID=80949564

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/119048 WO2022068613A1 (zh) 2020-09-30 2021-09-17 音频处理的方法及电子设备

Country Status (4)

Country Link
US (1) US11870941B2 (zh)
EP (1) EP4044578A4 (zh)
CN (2) CN114338965B (zh)
WO (1) WO2022068613A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11838652B2 (en) * 2021-07-15 2023-12-05 Samsung Electronics Co., Ltd. Method for storing image and electronic device supporting the same
CN117135299A (zh) * 2023-04-27 2023-11-28 荣耀终端有限公司 视频录制方法和电子设备
CN116958331B (zh) * 2023-09-20 2024-01-19 四川蜀天信息技术有限公司 一种音画同步的调整方法、装置和电子设备
CN118042329A (zh) * 2024-04-11 2024-05-14 深圳波洛斯科技有限公司 基于会议场景的多麦克风阵列降噪方法及其系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120062729A1 (en) * 2010-09-10 2012-03-15 Amazon Technologies, Inc. Relative position-inclusive device interfaces
CN104699445A (zh) * 2013-12-06 2015-06-10 华为技术有限公司 一种音频信息处理方法及装置
CN110336968A (zh) * 2019-07-17 2019-10-15 广州酷狗计算机科技有限公司 视频录制方法、装置、终端设备及存储介质
CN111343420A (zh) * 2020-02-18 2020-06-26 维沃移动通信有限公司 一种语音增强方法及穿戴设备

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5375165B2 (ja) * 2009-02-19 2013-12-25 株式会社ニコン 撮像装置
JP2012120128A (ja) * 2010-12-03 2012-06-21 Canon Inc 再生装置及び方法
JP5241865B2 (ja) * 2011-01-21 2013-07-17 日立コンシューマエレクトロニクス株式会社 ビデオカメラ
KR20140114238A (ko) * 2013-03-18 2014-09-26 삼성전자주식회사 오디오와 결합된 이미지 표시 방법
CN106027933B (zh) * 2016-06-21 2019-02-15 维沃移动通信有限公司 一种视频的录制、播放方法及移动终端
US10825480B2 (en) * 2017-05-31 2020-11-03 Apple Inc. Automatic processing of double-system recording
CN107205130B (zh) * 2017-06-29 2020-02-11 努比亚技术有限公司 一种基于双摄像头的录像方法、终端及计算机可读介质
CN115762579A (zh) * 2018-09-29 2023-03-07 华为技术有限公司 一种声音处理方法、装置与设备
CN109413342B (zh) * 2018-12-21 2021-01-08 广州酷狗计算机科技有限公司 音视频处理方法、装置、终端及存储介质
CN110248240A (zh) * 2019-06-11 2019-09-17 北京达佳互联信息技术有限公司 视频的播放方法、装置、电子设备及存储介质


Also Published As

Publication number Publication date
US11870941B2 (en) 2024-01-09
US20220272200A1 (en) 2022-08-25
EP4044578A1 (en) 2022-08-17
EP4044578A4 (en) 2023-01-25
CN114338965A (zh) 2022-04-12
CN116887015A (zh) 2023-10-13
CN114338965B (zh) 2023-05-23

Similar Documents

Publication Publication Date Title
EP4099688A1 (en) Audio processing method and device
WO2022068613A1 (zh) 音频处理的方法及电子设备
KR102577396B1 (ko) 녹화 프레임 레이트 제어 방법 및 관련 장치
JP7355941B2 (ja) 長焦点シナリオにおける撮影方法および端末
WO2021175165A1 (zh) 一种音频处理方法及设备
WO2022042168A1 (zh) 音频处理方法及电子设备
CN112398855B (zh) 应用内容跨设备流转方法与装置、电子设备
CN112532892A (zh) 图像处理方法及电子装置
CN110602312B (zh) 通话方法、电子设备及计算机可读存储介质
CN114040242A (zh) 投屏方法和电子设备
CN114185503A (zh) 多屏交互的系统、方法、装置和介质
CN113810589A (zh) 电子设备及其视频拍摄方法和介质
WO2022262416A1 (zh) 音频的处理方法及电子设备
CN113593567B (zh) 视频声音转文本的方法及相关设备
WO2021244368A1 (zh) 一种视频播放的方法及设备
CN117133306A (zh) 立体声降噪方法、设备及存储介质
WO2022161006A1 (zh) 合拍的方法、装置、电子设备和可读存储介质
CN113923372B (zh) 曝光调整方法及相关设备
CN115550559A (zh) 视频画面显示方法、装置、设备和存储介质
WO2023202431A1 (zh) 一种定向拾音方法及设备
CN117221708A (zh) 一种拍摄方法及相关电子设备
CN115700463A (zh) 一种投屏方法、系统及电子设备
CN117221709A (zh) 一种拍摄方法及相关电子设备
CN117221707A (zh) 一种视频处理方法和终端
CN115379039A (zh) 视频拍摄方法、装置和电子设备

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021874274

Country of ref document: EP

Effective date: 20220428

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21874274

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE